OCR

Accessing the OCR System
Principles
Limitations
Live Video OCR
Still Image Recognition
1. Scroll Recognition Area

Accessing the OCR System

The Pleco Optical Character Recognizer system is a paid add-on module; if purchased, you can access it through the “Live Video” and “Still Image” options in the sidebar menu. There’s also a demo version available; it has no limit on lookups, but it only shows Pinyin, without definitions. You can download the demo or purchase the full version through “Add-ons,” or buy it directly from our online store.

We strongly recommend that you try the demo version before purchasing this module. This is a relatively new feature, not just for us but for Chinese dictionary software in general, and while we’re working hard to improve it, it has a number of limitations; you may find that it doesn’t yet work well enough to be usable for you.

Principles

Much like our handwriting recognizer, our OCR system works by matching characters to templates in a database: it converts the image of each character into a simple mathematical structure, identifies its key features (lengths / positions / curvatures of strokes, etc.), then searches its database of 10,000+ Chinese characters for the one that most closely matches that pattern.
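As a rough illustration of template matching, here is a minimal nearest-neighbor sketch. It assumes each character image has already been reduced to a short feature vector; the TEMPLATES table, its four made-up feature values, and the function names are hypothetical and are not Pleco’s actual features, database, or code.

```python
import math

# Hypothetical template database: character -> feature vector.
# A real recognizer would use many more features (stroke lengths,
# positions, curvatures, ...) and thousands of characters.
TEMPLATES = {
    "木": [0.9, 0.5, 0.1, 0.7],
    "林": [0.8, 0.9, 0.1, 0.6],
    "本": [0.9, 0.5, 0.4, 0.7],
    "人": [0.2, 0.4, 0.8, 0.3],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matches(features, templates=TEMPLATES, k=3):
    """Return the k template characters closest to `features`, best first."""
    ranked = sorted(templates, key=lambda ch: distance(features, templates[ch]))
    return ranked[:k]


# A slightly noisy reading of 木 should still rank 木 first.
print(best_matches([0.88, 0.5, 0.15, 0.7]))
```

Ranking every template rather than returning only the winner is what lets a recognizer offer alternatives, like the top 5 matches the handwriting recognizer shows.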

However, while the handwriting recognizer always has a very clear picture of the character you drew - it knows exactly where every stroke is located, where it starts / ends, what order strokes were drawn in, where it overlaps other strokes - the OCR system has to contend with a much murkier one; characters on a camera image can be small, grainy, and out-of-focus, and the same calligraphic flourishes that make printed Chinese text so pretty to look at also make it harder to see the underlying structure of each character.

OCR also faces some psychological hurdles compared to handwriting input. A mis-recognized handwritten character can be chalked up to poor handwriting or incorrect stroke order, but with a printed character there’s nobody to blame but the recognition software. On top of that, because OCR recognizes multiple characters at a time, it has less opportunity than the handwriting recognizer to show you its other, less likely matches. Handling many characters at once also means that even if it gets a higher percentage of them right on the first try, getting just a few wrong will still make it feel as if it got the entire block of text wrong. So while handwriting input only has to deal with one character at a time, and is forgiven for getting that character wrong as long as the correct one is among its top 5 matches, OCR has to get every one of several characters exactly right in order to seem like it’s doing its job.

(this is all a roundabout way of asking you to be patient if things don’t work perfectly every time; we’re steadily working to bring this closer to perfect recognition, but in the meantime we hope you’ll find it accurate enough to be useful in its current form)
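The “get every character right” problem is easy to quantify if we assume, purely for illustration, that each character is recognized correctly with the same probability p and independently of the others: a block of n characters is then entirely correct with probability p^n. The 95% figure below is a made-up example, not a measured accuracy for our recognizer.

```python
def block_accuracy(per_char: float, n_chars: int) -> float:
    """Chance that all n_chars are correct, assuming independent errors
    and the same per-character accuracy for each one."""
    return per_char ** n_chars


for n in (1, 2, 4, 10):
    print(f"{n:2d} chars: {block_accuracy(0.95, n):.1%} chance all correct")
```

Even at a hypothetical 95% per character, a ten-character line would come out completely correct only about 60% of the time.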

Limitations

Here are some specific limitations to keep in mind when using our OCR system:

Printed text only - the templates which our system matches characters to are based on common printed Chinese fonts rather than on handwritten characters. Very neat handwriting might occasionally work, but officially only printed text is supported.
Limited character set - our system recognizes a total of 6,763 simplified Chinese characters and 5,401 traditional ones (for Chinese computing geeks, that’s all of the characters in the GB 2312 standard plus the commonly-used first level of the Big5 standard), so some rare characters may not be recognized simply because they’re not in the database. Every additional character we support is one more potential source of false-positive matches, so we have to keep the numbers limited for the sake of accuracy.
Background interference - our system has a very hard time distinguishing Chinese characters from other things it sees: background images / patterns, intense shadows / bright spots, and even simple rectangular borders around signs can all create problems. Resizing the recognition area to include only characters and leave out any extraneous lines / patterns / etc. can help with borders, but there may be some types of text (e.g. white characters with black outlines against a light-colored background) that you simply can’t get it to recognize reliably.
Jitter - at the moment, the system can feel very “jittery,” frequently changing which characters it thinks it sees even when you’re pointing the camera at the same text. Turning on the “Hide unused chars” option in Settings can make things feel a bit smoother (though it doesn’t actually change the algorithm), and increasing the “Word detect samples” value in Settings can at least make the dictionary definition change less often. It can also help to turn on the built-in flash (on phones that have one) or a nearby lamp, as better lighting tends to make the camera see images more clearly and with less background noise. The history function also helps if the definition changes before you have a chance to finish reading it.

In most cases, our motion detection system will recognize that you’re pointing at the same text and pause the recognizer until you move the camera, eliminating jitter; failing that, still image OCR avoids the problem altogether. If neither solution is satisfactory, you might want to consider a setup where your phone remains stationary and only the text you’re recognizing moves: for example, clipping your phone to a table and sliding a book around underneath it. In our testing, this produces much better results in moving vehicles and other shake-intensive situations (since any shaking that does occur affects the phone and the book equally), and if you use the built-in zoom function, you can easily position the phone far enough from the text to see every corner of the page without having to move the phone.

Another cure for jitter is to enable “Pulse Mode” in Settings / OCR / Live Video - this will turn the pause button into a “capture” button, so that the recognizer will only run when that button is pressed.

It may also be worth buying an add-on macro lens (search the internet for “Android macro lens” and you should find plenty of places selling them). These let you hold your device considerably closer to the text you’re reading than the normal built-in lens allows, which reduces the impact of your hands shaking and might even let you rest the device right on the book.
Focus - this one’s actually more of a cell phone camera problem than an OCR problem. At close distances, most small camera lenses have a very shallow depth of field (the range of distances at which objects are in focus is quite small), so if you move your phone even slightly farther from / closer to the page, the image may quickly go out of focus. If you’ve enabled continuous autofocus, it will usually refocus automatically after a few seconds; if it doesn’t, tap the “focus” button at the bottom left corner of the screen to refocus manually.

Some newer phones include a macro focus option, which can improve this situation a bit; on phones that support it, macro focus is turned on by default, but you can tap the macro button to turn it off when viewing farther-away objects.
Line spanning - in English and other alphabetic languages, each word is generally entirely on one line of text; only rarely does a word wrap around (with a hyphen) to the next line. In Chinese, however, every line of text generally has the same number of characters, so you routinely encounter words that start on one line and end on the next: for example, the first character of a word may be the last character on one line, and its second character the first on the next line. To look up such a word with our OCR system, you need to point it at each half of the word separately and combine them, which you can do with the span lines command. It’s slightly annoying, but there’s nothing we can really do to “fix” it, since it’s inherent in the nature of Chinese text.

Live Video OCR

In the main Pleco dictionary screen, tap the menu button at the top left corner of the screen and tap “Live OCR” to bring up the live OCR screen.

Lots of options here, but most of them are fairly straightforward.

Top Bar

Camera - bring up an additional toolbar (see below) with various options relating to camera capture.
History - view a history of your recently looked-up words.

Tapping on the camera button brings up a toolbar with additional options:

Zoom Out - reduce the camera magnification factor (compatible devices only).
Zoom In - increase the camera magnification factor (compatible devices only). This is strictly a digital zoom (Pleco can only enlarge the camera image, not magnify it optically), so it usually only goes up to a maximum zoom factor of 4x; tap the zoom in button once to go from 1x to 2x, then again to go to 4x.
Macro - put the camera into macro focus mode, which improves its ability to focus on close-up objects. On by default (compatible devices only).
Continuous Autofocus - enable continuous autofocusing, which tries to keep the image in focus at all times and should usually eliminate the need for the manual “focus” button.
Flash - toggle the built-in camera flash on and off (compatible devices only). This works quite well to illuminate objects at close range (like most of the text you’re likely to look up with OCR), but it can also confuse the recognizer by making some parts of an image much brighter than others, so that it’s difficult to see where the text is; be careful to hold your phone so that the characters you want to look up are evenly illuminated.
Mode - enable / disable crosshairs mode. With crosshairs mode active, instead of enclosing characters in a box you simply point at them with a + and Pleco automatically detects the line height / word boundary. Since this is a bit less reliable than the recognition box, though, we keep the box active by default.
Invert - toggle between black-on-white and white-on-black text. Pleco normally detects this automatically based on the relative numbers of dark and light pixels in the image, but you can manually override it with this button if it gets it wrong (you’ll know because the recognized characters will have absolutely nothing to do with the text you’re looking at).
Orientation - toggle between horizontal (left-to-right) and vertical (top-to-bottom) text. Pleco normally detects this automatically based on how you resize the Recognition Area, but you can use this button to manually override it.
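The two automatic guesses described in this list (inversion from the balance of dark and light pixels, orientation from the shape of the recognition area) can be approximated with very simple rules. The sketch below assumes a brightness cutoff of 128 on a 0 to 255 grayscale and treats an area as vertical when it is more than 1.5 times as tall as it is wide; both numbers and both function names are hypothetical, not Pleco’s actual values. The inversion rule relies on background pixels usually outnumbering text pixels.

```python
def looks_inverted(gray_pixels, cutoff=128):
    """Guess white-on-black text from 0-255 grayscale pixel values.

    Background usually covers more of the image than the strokes do, so if
    dark pixels outnumber light ones, the background is probably dark.
    """
    dark = sum(1 for p in gray_pixels if p < cutoff)
    light = len(gray_pixels) - dark
    return dark > light


def guess_orientation(width, height, ratio=1.5):
    """Treat a recognition area much taller than it is wide as vertical text."""
    return "vertical" if height > width * ratio else "horizontal"


print(looks_inverted([10, 20, 30, 200]), guess_orientation(100, 400))
```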

Middle

Recognition Area - the OCR system will look up characters inside of this box. Drag any corner to resize it. Characters that are part of the recognized word will be shown in blue, other characters in green. See below for more information.
Switch Dictionary - tap on this button to view a definition for the same word in a different dictionary. This selection is “sticky,” so the OCR system will default to the newly-selected dictionary for later word lookups as well. (the standard previous / next entry buttons will appear below this button when more than one matching entry is found in the same dictionary)

Bottom Bar

Focus - tap this button to refocus the phone’s camera if the image gets blurry / out of focus.
Pause - tap this button to pause recognition and freeze on the current word; this brings up a few additional options in place of the focus / span buttons.
Span Lines - “lock in” the first half of a word that wraps around to the next line of text: point the camera at that first half, tap this button, then point it at the second half to see a definition for the combined word.

Basic Operation

To look up a word, point your phone’s camera at the word you want to look up and square up that word within the recognition area. It’s OK if there are additional characters in the recognition area too; just make sure that the left edge of the recognition area is lined up with the first character in the word, and (if there’s more than one line of text visible) that the top edge of the word is lined up with the top of the recognition area.

The OCR system will show you every character it recognizes within the recognition area in green, and once it’s confident enough in a particular group of characters, it will “lock on” to those characters, show them in blue instead of green, and display their definition. If you point at a different set of characters it’ll quickly lock on to those instead, so you can scan along a whole line of text and read definitions as you go. (both colors can be changed in Settings)
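One simple way to get that “lock on” behavior is to keep the last few recognition results and only report a word once enough of them agree. The sketch below assumes exactly that; the WordLock class and its samples / min_agree parameters are hypothetical, and while the “Word detect samples” setting may serve a similar purpose, this is not a description of how Pleco implements it.

```python
from collections import Counter, deque


class WordLock:
    """Hypothetical lock-on rule: report a word only once it has been
    recognized in at least `min_agree` of the last `samples` frames."""

    def __init__(self, samples=5, min_agree=3):
        self.recent = deque(maxlen=samples)  # oldest results drop off automatically
        self.min_agree = min_agree

    def update(self, candidate):
        """Add this frame's best guess; return the locked word or None."""
        self.recent.append(candidate)
        word, count = Counter(self.recent).most_common(1)[0]
        return word if count >= self.min_agree else None


lock = WordLock()
for guess in ["已经", "己经", "已经", "已经"]:
    print(lock.update(guess))
```

Raising `samples` and `min_agree` makes the display steadier but slower to follow a new word, which is the same trade-off described under Jitter.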

Both horizontal and vertical text are supported; if the recognition area is resized to be vertical (significantly taller than it is wide), the orientation indicator at the top of the screen will change to indicate that Pleco is now recognizing text vertically. To pause the system and temporarily stop recognizing characters, tap on the pause button, or to combine characters from two different lines of text, tap on span lines. Tap on the history button (second from the right at the top of the screen) to scroll through the last few words recognized.

If you find that this is too jittery / difficult to control and prefer a system where the recognizer only updates when you press a button (instead of updating continuously until you pause it), enable “Pulse Mode” in Settings / OCR / Live Video.

Recognition Area

The recognition area is the bright green box in the center of the screen; it can be resized by dragging any of the four corners (which resize it symmetrically but don’t move it around - it always remains centered in the same spot). Pleco’s recognizer will only attempt to recognize characters within that area; it doesn’t look outside of it at all, so it won’t pick up a character that’s half-in, half-out (or at least won’t do so accurately).
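The geometry described here (corners that resize the box symmetrically about a fixed center, and a recognizer that ignores anything not fully inside) can be modeled as follows. This is a hypothetical sketch: the RecognitionArea class and its fields are our own illustration, using screen coordinates with y increasing downward.

```python
from dataclasses import dataclass


@dataclass
class RecognitionArea:
    """Hypothetical model: the box is stored as a fixed center plus half-sizes."""
    cx: float
    cy: float
    half_w: float
    half_h: float

    def drag_corner_to(self, x, y):
        """Dragging any corner to (x, y) changes only the half-sizes, so the
        box grows or shrinks symmetrically and stays centered."""
        self.half_w = abs(x - self.cx)
        self.half_h = abs(y - self.cy)

    def fully_contains(self, left, top, right, bottom):
        """True only if a character's bounding box lies entirely inside;
        characters that are half-in, half-out are ignored."""
        return (self.cx - self.half_w <= left and right <= self.cx + self.half_w
                and self.cy - self.half_h <= top and bottom <= self.cy + self.half_h)


area = RecognitionArea(cx=100, cy=100, half_w=50, half_h=20)
area.drag_corner_to(180, 70)  # box now spans x 20..180, y 70..130
print(area.fully_contains(30, 80, 60, 110), area.fully_contains(10, 80, 40, 110))
```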

It’s perfectly OK if the recognition area is longer than necessary for a particular word, as long as the word is aligned with the left side (or the top if you’re recognizing vertical text). In fact, it can even help with recognition accuracy - seeing more characters helps the system get a better picture of their typical size / darkness / etc - so it’s quite reasonable to resize it as large as it will go and just leave it that way all the time. Since it won’t look outside of the box, though, resizing it to just one character wide is an easy way to look up the meanings of individual characters by themselves, and can also help to avoid “cheating” if you’re looking at a word that you’re supposed to know; looking up one character of a forgotten word may give you a hint without revealing the whole word’s meaning / pronunciation.

If you find that characters are too small for the recognition area, try zooming in (though this can reduce accuracy), or just hold the phone closer to the text you want to recognize. If you find that the recognizer sometimes thinks a compound character like 林 is actually two characters (木木), it may be that it’s having a tough time detecting the size of the font; making the recognition area wider may help with this, or if you turn off the “Allow multiple lines” option in Settings (and make sure that the recognition area never stretches down to part of the next line of text) that should help also.

Pause Commands

Tap the “pause” button at the bottom of the screen to stop recognizing characters and bring up an alternate toolbar at the bottom of the screen.

Tap on resume to start recognizing characters again. The other buttons let you do all sorts of useful things with the recognized text:

Search - exit OCR and search for the recognized text in the dictionary.
Copy - copy the recognized text to the clipboard.
Reader - pull up the recognized text in the document reader, from which you can save it to a text file.
Share - share the recognized text with another app.
Audio - play audio for the recognized text; if text-to-speech is installed, you can tap-hold this button to play audio for the entire selection rather than just the highlighted word.
Go - bring up the current recognized dictionary entry in a separate screen.
Flashcard - add the recognized dictionary entry to flashcards.

While paused, you can also tap on any character in the recognition area to look up the word starting at that character, useful if you’ve captured more than one word at a time. You can tap on the first character of the currently selected word to shrink the length of the selection, useful for looking up the individual characters that make up words.

While paused, you can also tap-hold a character to bring up a screen listing similar-looking characters.

Tap on the correct character to replace it in the recognized text. (this will be forgotten as soon as you un-pause the recognizer)

Span Lines

Often when reading Chinese you’ll encounter a word that starts on one line and ends on the next, much like a hyphenated word in English (though much more common). For example:

我要给阿Ｑ做正传，已
经不止一两年了。

已经 is a single word, but since it starts on one line and ends on another, there’s no way to simply point the recognizer at it and recognize the whole word.

Our solution to this is the conveniently-located “span lines” button. To use it, point the camera at the first part of the word (已 in the example above, or 合 if you were looking up 合适), then tap “span lines”; the captured character(s) will appear just above the recognition area.

After that, point the camera at the second part of the word (经 in the example, or 适 for 合适) to see the result for the entire word. Tap the “span lines” button again (now renamed “cancel span”) to return to normal recognition.
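Conceptually, span lines just remembers the first half of the word and prepends it to whatever is recognized next. A minimal sketch, where the SpanLines class and its method names are hypothetical:

```python
class SpanLines:
    """Hypothetical model of the span lines / cancel span workflow."""

    def __init__(self):
        self.locked = ""  # first half of the word, captured at the end of a line

    def span(self, first_half):
        """Tapping "span lines": remember the currently recognized characters."""
        self.locked = first_half

    def cancel(self):
        """Tapping "cancel span": go back to normal recognition."""
        self.locked = ""

    def text_to_look_up(self, recognized):
        """Combine the locked first half with the text now in the recognition area."""
        return self.locked + recognized


spanner = SpanLines()
spanner.span("已")                               # end of the first line
print(spanner.text_to_look_up("经不止一两年了"))  # start of the next line
```

The combined text is then looked up like any other recognized text, so the dictionary can match 已经 as a single word.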

Motion Detection

By default, Pleco will pause the recognizer when it detects that your device has stopped moving, and resume recognizing characters once you start moving again. This is the best solution we’ve come up with so far to the problem of OCR output being “jittery” and changing the recognized characters when users’ hands are shaking.

When the system detects that the device has stopped moving, it will change the color / thickness of the box surrounding the recognition area (slightly lighter by default) and stop updating the character display until you start moving again.

There are lots of settings to customize the threshold at which the system detects that the device has moved, so hopefully with a little tweaking you’ll be able to get it to consistently pause when you want it to pause and resume when you want it to resume.
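The detector’s internals aren’t documented here, so the sketch below is only one plausible approach, assumed for illustration: it compares consecutive camera frames by their average per-pixel brightness change, pauses after a few nearly identical frames, and resumes as soon as the change exceeds a threshold. The MotionGate class, the threshold of 8 brightness levels, and the still_frames count are all hypothetical; a real implementation might instead use the device’s motion sensors.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel brightness change between two equal-size frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


class MotionGate:
    """Pause recognition after `still_frames` consecutive low-motion frames;
    resume as soon as the change between frames exceeds `threshold` again."""

    def __init__(self, threshold=8.0, still_frames=3):
        self.threshold = threshold
        self.still_frames = still_frames
        self.prev = None
        self.still_count = 0
        self.paused = False

    def feed(self, frame):
        """Process one frame; return True if the recognizer should run on it."""
        if self.prev is not None:
            if mean_abs_diff(self.prev, frame) < self.threshold:
                self.still_count += 1
            else:
                self.still_count = 0
            self.paused = self.still_count >= self.still_frames
        self.prev = frame
        return not self.paused
```

The two parameters play the role of the sensitivity settings mentioned above: a higher threshold pauses more readily, and a larger still_frames count waits longer before pausing.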

Scan Flashcard List

While using live OCR, tap on the menu button at the top right corner of the screen and choose “Scan Flashcard List” to quickly scan a list of vocabulary words to add to your flashcard database. You’ll see a prompt asking you to select a category for your new flashcards; choose that category and you’ll be returned to our standard live OCR interface. However, in this mode, after pointing at the same word for a second or so (this interval can be changed in Settings, though it only applies if motion detection is turned off - otherwise it captures as soon as it detects a pause), you’ll hear a beep and the screen will flash a message telling you you’ve created a new flashcard. This is especially useful for digitizing a long list of words at the end of a textbook chapter - you can enter each word in a fraction of the time it would take to enter it manually.

Your new flashcard will be based on the currently-displayed dictionary definition; tapping on the Switch Dictionary button will change the dictionary used for the current and subsequent cards, though you can also go back and change their definitions later through Organize Flashcards.

One important Settings option specific to scanning flashcard lists is “Unknown word handling.” With the default behavior, “Truncate,” the system creates a card based on the longest match it can find for the word in the recognition area; if it only matches the first character, it will only create a card for that character. If you change this option to “Create Custom,” you’ll instead be prompted to create a brand new custom flashcard, with the headword pre-populated with the recognized characters; this is especially useful for items like character names that aren’t likely to appear in a dictionary.
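The difference between the two “Unknown word handling” options can be sketched with a longest-prefix dictionary lookup. This is a hypothetical model (the toy dictionary, the mode strings, and the function names are illustrative only), assuming “Truncate” keeps the longest headword that the recognized text starts with, and “Create Custom” keeps the full recognized text whenever it isn’t itself a dictionary headword.

```python
def longest_match(text, dictionary):
    """Longest prefix of `text` that is a dictionary headword ('' if none)."""
    for end in range(len(text), 0, -1):
        if text[:end] in dictionary:
            return text[:end]
    return ""


def card_headword(text, dictionary, mode="truncate"):
    """Return (headword, needs_custom_card) for the recognized text."""
    match = longest_match(text, dictionary)
    if mode == "truncate" or match == text:
        return match, False  # use the dictionary entry that matched
    return text, True  # "create_custom": keep every recognized character


toy_dictionary = {"已", "已经", "经", "张"}
print(card_headword("张三丰", toy_dictionary, "truncate"))
print(card_headword("张三丰", toy_dictionary, "create_custom"))
```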

Still Image Recognition

Pleco’s OCR system also supports a still image recognition mode, much like a more conventional OCR system but optimized to work well with images from a camera (as opposed to images from a scanner) and to facilitate easily looking up unknown words.

The easiest way to access still image mode is to simply “Share” a photo from another app, like your device’s built-in photo gallery. In some apps (like Dropbox) you’ll need to “Export” the photo instead before you can share it with Pleco OCR. You can also access it by choosing “Still OCR” from the sidebar menu; this gives you the choice of taking a picture, loading an image from the system gallery or a file, or reopening the last viewed image (in which case you should be returned to the same position as before).

This works very similarly to the live video mode, but instead of pointing your phone’s camera at words, you drag / zoom a still image around to position the words in the recognition area.

Tap the ‘photo’ icon in the top bar to open a row of additional options.

The first two buttons here rotate the image left and right; the other two buttons control the text orientation and inversion (black-on-white or white-on-black), just as in Live OCR.

Most of the bottom bar icons are also identical to those from live mode, but there are two new ones:

Scroll Recognition Area - see below.
Hide Definition - hide the definition area to make it easier to scroll around.

(there’s also a smaller version of the Span Lines button)

Also as in live mode, you can tap on a character to select it and view its definition, or tap-hold to correct it (choose another similar-looking character). You can tap on the first character of the currently selected word to shrink the selection, useful for focusing on a single character. The “Send to reader” menu command from Live Mode is also available to send captured text to our document reader.

Scroll Recognition Area

To recognize larger blocks of text at a time, as in a more traditional OCR system, tap the “Scroll Recognition Area” button, second from the right at the bottom of the screen. This “attaches” the recognition area to the page so that it stays in the same position relative to the image even as you scroll the image around, which is useful for capturing large blocks of text.

You can still zoom in / tap on a word to look it up, or choose “Send to reader” from the menu to view the whole block in our document reader. Tap on the “scroll recognition area” button again to exit this mode. (this is the streamlined Android equivalent to the “Block Recognizer” command in our iPhone software)
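The difference between the normal and “attached” recognition area comes down to which coordinate system it is stored in: normally it stays fixed in screen coordinates, while in this mode it is stored in image coordinates and re-projected onto the screen after every scroll or zoom. A minimal sketch of that conversion, assuming a simple pan offset (the screen position of the image’s top-left corner) plus a uniform zoom scale; the function names are hypothetical:

```python
def screen_to_image(x, y, offset_x, offset_y, scale):
    """Convert a screen point to image coordinates."""
    return (x - offset_x) / scale, (y - offset_y) / scale


def image_to_screen(x, y, offset_x, offset_y, scale):
    """Where an image point currently appears on screen (inverse of the above)."""
    return x * scale + offset_x, y * scale + offset_y


# Attach a corner of the recognition area to the page at 2x zoom...
corner = screen_to_image(300, 200, offset_x=100, offset_y=50, scale=2.0)
# ...then, after the image is scrolled 150 px to the left, find it on screen again.
print(image_to_screen(*corner, offset_x=-50, offset_y=50, scale=2.0))
```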
