In a society driven by visual information, and with the rapid spread of low-priced cameras, text recognition is a fast-changing field. This work focuses on natural scene text understanding, which aims to extract text from everyday images. From text extraction to the correction of recognition errors, each sub-step is studied in depth so that the system can handle most images, even the most complex ones. Whether in color camera-based images or in low-resolution thumbnails, inherent degradations such as complex backgrounds, artistic fonts, uneven lighting, or insufficient resolution must be taken into account. To circumvent or correct them, a study of image formation and of degradation sources led to moving beyond overly constrained definitions of color spaces. The resulting selective metric text extraction attempts to combine magnitude and directional processing of colors in an unsupervised framework. Text extraction from the background is linked to the subsequent steps of character segmentation and recognition. This intertwined chain mainly aims at combining the color, intensity, and spatial information of pixels for robustness and accuracy. Each of these features addresses a different issue: the first serves text extraction, and the latter two serve to recover the initial separation between characters through log-Gabor filtering. To reach higher-quality results, natural scene text understanding requires pre- and post-processing. Pre-processing relies on Teager-based super-resolution, the SURETEXT proposition, which assumes a simple affine motion between frames. Post-processing combines recognition outputs with linguistic information through lightweight finite state machines. At the end of each step, results are reported to highlight the effectiveness of the methods.
Moreover, results are assessed on several databases, so as not to depend on a particular one, and on a public, well-known data set, and are compared with recent competing algorithms. Finally, a broad discussion covers the achievements presented in this work and the future extensions that natural scene text understanding requires to enable exciting applications, such as a reading tool for the visually impaired or innovative web image search engines in a life-log context.
Foreword
Abstract
Acknowledgement
Table of Contents
List of Acronyms
1 Introduction
1.1 Current Document Analysis
1.2 What is Natural Scene Text?
1.3 Numerous Applications
1.4 Text Understanding System: Main Steps
1.5 Challenges and Overview of Problem Bounds
1.6 Overall Structure
2 Image Formation and Representation
2.1 Image Formation: Why do Colors Vary for the same Object?
2.1.1 Light
2.1.2 Object
2.1.3 Camera
2.2 Image Representation: Why do Different Color Spaces Exist?
2.3 To Summarize...
3 Background and Literature Survey of Text Understanding
3.1 State-of-the-Art of Text Extraction
3.1.1 Thresholding-based methods
3.1.2 Grouping-based methods
3.1.3 Extensively used clustering methods in text extraction
3.1.4 Challenges
3.2 Required Pre- and Post-Processing Steps for Efficient Text Understanding
3.2.1 Pre-processing steps of text extraction
3.2.2 Post-processing steps of text extraction
3.2.3 Challenges
4 Text Understanding System
4.1 Text Understanding Chain
4.2 Material and Databases
5 Resolution Enhancement
5.1 Resolution Enhancement for Still Images
5.2 Super-Resolution for Video Frames
5.2.1 Context of super-resolution algorithms
5.2.2 Color super-resolution text
5.3 SURETEXT - Super-Resolution Text
5.3.1 Motion estimation using the Taylor series
5.3.2 Unsharp masking using the Teager filter
5.3.3 Outlier frame removal
5.3.4 Median denoising
5.4 Experiments and Results
5.4.1 Evaluation of SURETEXT
5.4.2 Comparison with state-of-the-art SR algorithms
5.4.3 Computation cost
5.5 Conclusions
6 Text Extraction
6.1 Impact of Color Spaces and Clustering Algorithms
6.1.1 Is there a better color space for NS text extraction?
6.1.2 Considerations on different clustering algorithms
6.1.3 Evaluation of color representation with state-of-the-art clustering algorithms
6.2 Role of Metrics in K-means
6.2.1 Definition of some metrics, either distances or similarities
6.2.2 Noteworthy properties of angle-based similarities and complementarity with the Euclidean distance
6.2.3 Evaluation of several metrics
6.3 SMC - Selective Metric Clustering for Text Extraction
6.3.1 Color reduction and color inversion
6.3.2 Utilization of a multi-hypothesis text extraction
6.3.3 Extraction-by-segmentation
6.3.4 SMC evaluation and results
6.4 Conclusion of the Selective Metric Clustering Technique
7 Unit-based Segmentation
7.1 Line and Word Segmentation
7.1.1 Line segmentation
7.1.2 Word segmentation
7.2 Character Segmentation using Log-Gabor Filters
7.2.1 Is character segmentation still useful?
7.2.2 Why are log-Gabor filters appropriate for NS character segmentation?
7.2.3 Character segmentation-by-recognition
7.2.4 Evaluation
7.3 Conclusion of the Log-Gabor-based Character Segmentation
8 Considerations on NS Character Recognition and Correction
8.1 NS Character Recognition
8.1.1 What is done in NS character recognition?
8.1.2 Description of the exploited recognition system
8.1.3 Conclusion on considerations of character recognition
8.2 Recognition-by-Correction
8.2.1 Context of OCR correction
8.2.2 Lexicon-based non-word error correction
8.2.3 Evaluation
8.2.4 Conclusion on recognition-by-correction
8.3 Conclusion
9 Conclusion
9.1 Conclusions and Contributions
9.2 Interesting Prolongations and Discussion
A Color Spaces Conversion
B Expectation-Maximization