Pleco for Android Instruction Manual: OCR

Optical Character Recognizer Reference

  1. Accessing the OCR System
  2. Introduction
  3. Live Video OCR
    1. Lookup Words
    2. Scan Flashcard List
    3. Send to Reader
  4. Still Image OCR
    1. Scroll Recognition Area

 

Accessing the OCR System

The Pleco Optical Character Recognizer system is a paid add-on module; if you've purchased it, you can access it by tapping your device's menu button and then tapping "OCR." There's also a demo version available, with no limit on lookups, but it only shows Pinyin, not definitions. You can download the demo or purchase the full version through Add-ons, or buy it directly from our online store.

We strongly recommend that you try the demo version before purchasing this module. This is a relatively new feature, not just for us but for Chinese dictionary software in general, and while we're working hard to improve it, it has a number of limitations, and you may find that it doesn't yet work well enough to be usable for you.

 

Introduction

Our new Optical Character Recognizer system is our first attempt since the debut of our original Palm OS software, way back in 2001, to introduce a totally new way to look up unknown characters. Rather than handwriting unknown characters or tediously looking them up in a radical index, with OCR you can simply point your phone's camera at a word to look it up. You don't even need to tap a shutter button: just line up the camera with the word and its definition appears instantly.

 

Principles

Much like our handwriting recognizer, our OCR system works by matching characters to templates in a database: it turns the image of each character into a simple mathematical structure, identifies its key features (lengths / positions / curvatures of strokes, etc.), then searches its database of 10,000+ Chinese characters for the one that most closely matches that pattern.
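As a rough illustration of the matching step, here is a minimal Python sketch of nearest-template matching. It assumes each character has already been reduced to a short numeric feature vector; the feature values, the `TEMPLATES` table, and the function names are all hypothetical and are not Pleco's actual features, database, or code.

```python
import math

# Hypothetical feature vectors: made-up numbers standing in for
# stroke lengths / positions / curvatures extracted from each template.
TEMPLATES = {
    "木": [0.9, 0.5, 0.2, 0.1],
    "林": [0.9, 0.9, 0.3, 0.2],
    "人": [0.2, 0.4, 0.8, 0.7],
}

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matches(features, k=3):
    """Return the k templates closest to `features`, best match first."""
    scored = [(char, distance(features, tmpl)) for char, tmpl in TEMPLATES.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# A slightly noisy observation; its closest template is 林, then 木.
print(best_matches([0.85, 0.88, 0.3, 0.25], k=2))
```

Returning a ranked list rather than a single answer is what makes it possible to offer alternative candidates, the way the handwriting recognizer shows its top matches.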

However, while the handwriting recognizer always has a very clear picture of the character you drew - it knows exactly where every stroke is located, where it starts / ends, what order strokes were drawn in, where it overlaps other strokes - the OCR system has to contend with a much murkier one; characters on a camera image can be small, grainy, and out-of-focus, and the same calligraphic flourishes that make printed Chinese text so pretty to look at also make it harder to see the underlying structure of each character.

OCR also faces some psychological hurdles compared to handwriting input. A mis-recognized handwritten character can be chalked up to poor handwriting or incorrect stroke order, but with a printed character there's nobody to blame but the recognition software. On top of that, because OCR must recognize multiple characters at a time, it has no opportunity to show you its other, less likely matches the way the handwriting recognizer does. Handling lots of characters at once also means that even if it gets a higher percentage of them right on the first try, if just a few are wrong it will still feel as if it got the entire block of text wrong. So while handwriting recognition only has to deal with one character at a time, and can even be forgiven for getting that character wrong as long as the correct one is among its top 5 matches, OCR has to deal with multiple characters and get every one of them exactly right in order to seem like it's doing its job.
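The arithmetic behind this is simple. Assuming, purely for illustration, that each character is recognized correctly with the same independent probability p, a block of n characters comes out entirely correct with probability p^n. The 95% figure below is a hypothetical number, not a measured accuracy for our recognizer:

```python
def block_accuracy(per_char: float, n_chars: int) -> float:
    """Chance that every one of n_chars is right, assuming independent errors."""
    return per_char ** n_chars

# Hypothetical 95% per-character accuracy, for blocks of increasing length
for n in (1, 2, 5, 10):
    print(f"{n:2d} characters: {block_accuracy(0.95, n):.0%} all correct")
```

Even at that hypothetical rate, a ten-character block is entirely correct only about 60% of the time (0.95^10 ≈ 0.599).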

(this is all a convoluted way of asking you to be patient if things don't work perfectly every time; we're new at this, we're working on all sorts of cool new image processing / analytical tools to bring this even closer to character recognizer perfection, but in the meantime we hope you'll find it accurate enough to be useful in its current form)

 

Limitations

Here are some specific limitations to keep in mind when using our OCR system:

  1. Printed text only - the templates which our system matches characters to are based on common printed Chinese fonts rather than on handwritten characters. Very neat handwriting might occasionally work, but officially only printed text is supported.
  2. Limited character set - our system recognizes a total of 6,763 simplified Chinese characters and 5,401 traditional ones (for Chinese computing geeks, it's all of the characters in the GB-2312 standard and the characters from the commonly-used half of the Big5 standard), so some rare characters may not be recognized simply because they're not in the database. Every new character we support is another character that might potentially result in a false positive match, so we have to keep the numbers limited for the sake of accuracy.
  3. Background interference - our system has a very hard time distinguishing Chinese characters from other things it sees: background images / patterns, intense shadows / bright spots, and even simple rectangular borders around signs can all cause problems. Resizing the recognition area to include only characters and leave out any extraneous lines / patterns / etc. can help with borders, but there may be some types of text (for example, white characters with black outlines against a light-colored background) that you simply can't get it to recognize reliably.
  4. Jitter - at the moment, the system can feel very "jittery," frequently changing which characters it thinks it sees even when you're pointing the camera at the same text. Turning on the "Hide unused chars" option in Settings can make things feel a bit smoother (though it doesn't actually change the algorithm), and increasing the "Word detect samples" value in Settings can make the dictionary definition change less often at least; it can also help to turn on the built-in flash (on phones that have one) or simply turn on a nearby lamp, as this tends to make the camera see images more clearly and with less background noise. The history function also helps if you find that the definition changes before you have a chance to finish reading it.

    Our new motion detection is the best solution we've come up with to this problem so far, so if you experience "jitter" issues we strongly recommend you try that out; failing that, still image OCR avoids the problem altogether. If neither of these solutions is satisfactory, you might want to consider a setup where your phone remains stationary and only the text you're recognizing moves; for example, clipping your phone to a table and sliding a book around underneath it. In our testing at least, we've found that this produces much better results in moving vehicles and other shake-intensive situations (as any shaking that does occur affects the phone and the book equally), and if you employ the built-in zoom function, you can easily position the phone far enough away from the text to allow you to see every corner of the page without having to move the phone.

    Another cure for jitter is to enable "Pulse Mode" in Settings / OCR / Live Video - this will turn the pause button into a "capture" button, so that the recognizer will only run when that button is pressed.

    It may also be worth looking into purchasing an add-on macro lens (do an internet search for "Android macro lens" and you should find a bunch of places selling them) - these allow you to hold your device considerably closer to the text you're reading than you can with the normal built-in lens, which reduces the impact of your hands shaking and might also allow you to rest them right on a book.
  5. Focus - this one's actually more of a cell phone camera problem than an OCR problem. At close distances, most small camera lenses have a very shallow depth of field (in other words, the range of distances at which objects will be in focus is quite small), so even if you move your phone just a little bit farther from or closer to the page, you may find that it quickly goes out of focus. If you've enabled continuous autofocus, most of the time it'll refocus automatically after a few seconds; if it doesn't, tap the "focus" button at the bottom left corner of the screen to refocus manually.

    Some newer phones include a macro focus option, which can improve this situation a bit; if it's enabled, macro focus will be turned on by default, but you can tap the macro button at the corner of the screen to toggle it off when viewing farther-away objects.
  6. Line spanning - in English and other alphabetic languages, each word is generally entirely on one line of text; only very rarely does a word wrap around (with a hyphen) to the next line. In Chinese, however, every line of text generally has exactly the same number of characters, so you routinely encounter words that start on one line and end on the next: for example, the first character of a word may be the last character on one line, and its second character the first character on the next line. To look up such a word with our OCR system, you need to point the camera at each half of the word separately and combine them, which you can do with the span lines command. It's slightly annoying, but there's nothing we can really do to "fix" it, since it's inherent in the nature of Chinese text.

 

Live Video OCR

In the main Pleco dictionary screen, press your device's menu button and tap "OCR" to bring up the OCR system, which initially launches in live video mode on devices with built-in cameras. (if you prefer that it default to still image mode, turn on "Default to still image" in Settings / OCR)

Lots of options here, but most of them are fairly straightforward:

Top Bar:

Tapping on the camera button brings up a toolbar with additional options:

 

Middle:

 

Bottom Bar:

 

Basic Operation

To look up a word, point your phone's camera at the word you want to look up and square up that word within the recognition area. It's OK if there are additional characters in the recognition area too; just make sure that the left edge of the recognition area is lined up with the first character in the word, and (if there's more than one line of text visible) that the top edge of the word is lined up with the top of the recognition area.

The OCR system will show you every character it recognizes within the recognition area in green, and once it's confident enough in a particular couple of characters, it will "lock on" to those characters, show them in blue instead of green, and display their definition. If you point at a different set of characters it'll quickly lock on to those instead, so you can scan along a whole line of text and read definitions as you go. (both the blue and green colors used can be changed in Settings)
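One simple way to get this kind of "lock on" behavior, and to damp the jitter described under Limitations, is to accept a reading only once it has appeared consistently across several recent frames. The sketch below is a hypothetical illustration loosely inspired by the "Word detect samples" setting; the class name, window size, and majority rule are assumptions, not Pleco's actual algorithm.

```python
from collections import Counter, deque
from typing import Optional

class LockOn:
    """Hypothetical smoother: report a reading once it dominates recent samples."""

    def __init__(self, samples: int = 5, required: int = 3) -> None:
        self.window = deque(maxlen=samples)  # most recent per-frame readings
        self.required = required             # matching readings needed to lock on
        self.locked: Optional[str] = None    # text currently "locked on" (blue)

    def update(self, reading: str) -> Optional[str]:
        """Feed one frame's recognized text; return the locked-on text, if any."""
        self.window.append(reading)
        text, count = Counter(self.window).most_common(1)[0]
        if count >= self.required:
            self.locked = text  # confident enough: switch to this reading
        return self.locked

tracker = LockOn(samples=5, required=3)
results = [tracker.update(frame) for frame in ["已经", "己经", "已经", "已轻", "已经"]]
print(results[-1])  # 已经
```

In this sketch, a larger window makes the locked-on word change less often, at the cost of reacting more slowly when you move on to a new word.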

Both horizontal and vertical text are supported; if the recognition area is resized to be vertical (significantly taller than it is wide), the orientation indicator at the top of the screen will change to indicate that Pleco is now recognizing text vertically. To pause the system and temporarily stop recognizing characters, tap on the pause button, or to combine characters from two different lines of text, tap on span lines. Tap on the history button (second from the right at the top of the screen) to scroll through the last few words recognized.

If you find that this is too jittery / difficult to control and prefer a system where the recognizer only updates when you press a button (instead of updating continuously until you pause it), enable "Pulse Mode" in Settings / OCR / Live Video.

 

Recognition Area

The recognition area is the bright green box in the center of the screen; it can be resized by dragging any of the four corners (which resize it symmetrically but don't move it around - it always remains centered in the same spot). Pleco's recognizer will only attempt to recognize characters within that area; it doesn't look outside of it at all, so it won't pick up a character that's half-in, half-out (or at least won't do so accurately).
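The resizing behavior can be modeled as a rectangle that stores only its center and half-extents: dragging any corner changes the half-width and half-height but never the center, so the box stays put. The sketch also includes the horizontal / vertical switch described under Basic Operation; the `VERTICAL_RATIO` of 1.5 is a made-up stand-in for "significantly taller than it is wide," and all names here are hypothetical.

```python
from dataclasses import dataclass

VERTICAL_RATIO = 1.5  # hypothetical threshold for "significantly taller than wide"

@dataclass
class RecognitionArea:
    cx: float      # center x (never changes while resizing)
    cy: float      # center y (never changes while resizing)
    half_w: float  # half of the box's width
    half_h: float  # half of the box's height

    def drag_corner(self, x: float, y: float) -> None:
        """Resize symmetrically so a corner lands at (x, y); the center stays fixed."""
        self.half_w = abs(x - self.cx)
        self.half_h = abs(y - self.cy)

    def contains(self, x: float, y: float) -> bool:
        """Only points inside the box are considered for recognition."""
        return abs(x - self.cx) <= self.half_w and abs(y - self.cy) <= self.half_h

    def orientation(self) -> str:
        """Treat the text as vertical when the box is much taller than it is wide."""
        return "vertical" if self.half_h >= VERTICAL_RATIO * self.half_w else "horizontal"

area = RecognitionArea(cx=200, cy=300, half_w=150, half_h=40)
print(area.orientation())                    # horizontal
area.drag_corner(230, 100)                   # drag a corner up and toward the center
print(area.orientation(), area.cx, area.cy)  # vertical 200 300
```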

It's perfectly OK if the recognition area is longer than necessary for a particular word, as long as the word is aligned with the left side (or the top if you're recognizing vertical text). In fact, it can even help with recognition accuracy - seeing more characters helps the system get a better picture of their typical size / darkness / etc - so it's quite reasonable to resize it as large as it will go and just leave it that way all the time. Since it won't look outside of the box, though, resizing it to just one character wide is an easy way to look up the meanings of individual characters by themselves, and can also help to avoid "cheating" if you're looking at a word that you're supposed to know; looking up one character of a forgotten word may give you a hint without revealing the whole word's meaning / pronunciation.

If you find that characters are too small for the recognition area, try zooming in (though this can reduce accuracy), or just hold the phone closer to the text you want to recognize. If you find that the recognizer sometimes thinks a compound character like 林 is actually two characters (木木), it may be that it's having a tough time detecting the size of the font; making the recognition area wider may help with this, or if you turn off the "Allow multiple lines" option in Settings (and make sure that the recognition area never stretches down to part of the next line of text) that should help also.

 

Pause Commands

Tap on the "pause" button at the bottom of the screen to stop recognizing characters and bring up this alternate toolbar at the bottom of the screen:

Tap on resume to start recognizing characters again. The other buttons let you do all sorts of useful things with the recognized text:

While paused, you can also tap on any character in the recognition area to look up the word starting at that character, useful if you've captured more than one word at a time. You can tap on the first character of the currently selected word to shrink the length of the selection, useful for looking up the individual characters that make up words.

Also while paused, you can tap-hold on a character to bring up a screen listing similar-looking characters:

Tap on the correct character to replace it in the recognized text. (this will be forgotten as soon as you un-pause the recognizer)

 

Span Lines

Often when reading Chinese you'll encounter a word that starts on one line and ends on the next, much like a hyphenated word in English (though much more common). For example:

我要给阿Q做正传，已
经不止一两年了。

已经 is a single word, but since it starts on one line and ends on another, there's no way to simply point the recognizer at it and recognize the whole word.

Our solution to this is the conveniently-located "span lines" button. To use it, point the camera at the first part of the word (已 in the above example, 合 in the following screenshot), then tap on "span lines" - you'll see that character / characters appear just above the recognition area, like this:

After that, point the camera at the second part of the word (经 in the example, 适 in the screenshot) to see the result for the entire word. Tap on the "span lines" button again (renamed to "cancel span") to return to normal recognition.
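Conceptually, span lines just holds on to the characters from the first line and puts them in front of whatever is recognized next. The hypothetical sketch below models that state with a small class; the names are illustrative only and do not describe Pleco's internals.

```python
from typing import Optional

class SpanLines:
    """Hypothetical model: hold the first part of a word, join it with the next."""

    def __init__(self) -> None:
        self.prefix: Optional[str] = None  # characters captured from the first line

    def start(self, current_text: str) -> None:
        """'span lines': remember what is currently recognized."""
        self.prefix = current_text

    def cancel(self) -> None:
        """'cancel span': return to normal recognition."""
        self.prefix = None

    def text_to_look_up(self, recognized: str) -> str:
        """Stored prefix (if any) followed by the newly recognized characters."""
        return (self.prefix or "") + recognized

span = SpanLines()
span.start("已")                       # end of the first line
print(span.text_to_look_up("经不止"))  # 已经不止
span.cancel()
print(span.text_to_look_up("经不止"))  # 经不止
```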

 

Motion Detection

Pleco includes an option to pause the recognizer when it detects that your device has stopped moving, and resume recognizing characters once you start moving again. This is the best solution we've come up with so far to the problem of OCR output being "jittery" and changing the recognized characters when users' hands are shaking.

You'll be prompted to turn on this option the first time you launch OCR; you can enable / disable it later in Settings / OCR / Live Video / Motion Detection / Enable. When the system detects that the device has stopped moving, it will change the color / thickness of the box surrounding the recognition area (slightly lighter by default) and stop updating the character display until you start moving again.

There are lots of settings to customize the threshold at which the system detects that the device has moved, so hopefully with a little tweaking you'll be able to get it to consistently pause when you want it to pause and resume when you want it to resume.
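The underlying idea can be sketched as a threshold with hysteresis: pause when the measured motion stays below one threshold for a few consecutive readings, and resume only when it rises above a higher threshold, so small tremors don't toggle the recognizer on and off. The motion values, thresholds, and names below are hypothetical; Pleco's actual sensors and settings may work differently.

```python
class MotionGate:
    """Hypothetical gate: pause recognition when still, resume when moving."""

    def __init__(self, still_below: float = 0.2, moving_above: float = 0.5,
                 still_readings: int = 3) -> None:
        self.still_below = still_below        # motion under this counts as "still"
        self.moving_above = moving_above      # motion over this means "moving again"
        self.still_readings = still_readings  # consecutive still readings to pause
        self.quiet_count = 0
        self.paused = False

    def update(self, motion: float) -> bool:
        """Feed one motion magnitude; return True while recognition is paused."""
        if self.paused:
            if motion > self.moving_above:
                self.paused = False
                self.quiet_count = 0
        else:
            self.quiet_count = self.quiet_count + 1 if motion < self.still_below else 0
            if self.quiet_count >= self.still_readings:
                self.paused = True
        return self.paused

gate = MotionGate()
print([gate.update(m) for m in [0.9, 0.1, 0.15, 0.1, 0.3, 0.6]])
# [False, False, False, True, True, False]
```

The gap between the two thresholds is what keeps a reading of 0.3 from un-pausing the recognizer in this example.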

 

Scan Flashcard List

While using live OCR, press your device's menu button and choose "Scan Flashcard List" to quickly scan a list of vocabulary words into your flashcard database. You'll see a prompt asking you to select a category for your new flashcards; choose a category and you'll be returned to our standard live OCR interface. In this mode, however, after you point at the same word for a second or so, you'll hear a beep and the screen will flash a message telling you that you've created a new flashcard. (this interval can be changed in Settings, though it only applies if motion detection is turned off; otherwise a card is captured as soon as a pause is detected) This is especially useful for digitizing a long list of words at the end of a textbook chapter: you can enter each word in a fraction of the time it would take to enter it manually.
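The capture rule can be summarized as: with motion detection on, capture when a pause is detected; otherwise, capture once the same word has stayed in view for the configured interval, and only once per word. Here is a hypothetical sketch of that rule; the class, the one-second default, and the `paused` flag are assumptions for illustration.

```python
from typing import List, Optional

class FlashcardScanner:
    """Hypothetical model of when Scan Flashcard List creates a card."""

    def __init__(self, motion_detection: bool, dwell_seconds: float = 1.0) -> None:
        self.motion_detection = motion_detection
        self.dwell_seconds = dwell_seconds  # only used when motion detection is off
        self.word: Optional[str] = None     # word currently in view
        self.since = 0.0                    # when that word first appeared
        self.done = False                   # already captured this word?
        self.captured: List[str] = []       # cards created so far

    def update(self, word: str, now: float, paused: bool = False) -> None:
        """Feed the current word at time `now` (seconds) and capture if ready."""
        if word != self.word:
            self.word, self.since, self.done = word, now, False  # restart the timer
        if self.done:
            return
        if self.motion_detection:
            ready = paused  # capture as soon as a pause is detected
        else:
            ready = now - self.since >= self.dwell_seconds
        if ready:
            self.captured.append(word)  # a real app would beep and add a card here
            self.done = True

scanner = FlashcardScanner(motion_detection=False)
for t, w in [(0.0, "学习"), (0.5, "学习"), (1.2, "学习"), (1.5, "学习"), (1.6, "老师"), (2.7, "老师")]:
    scanner.update(w, t)
print(scanner.captured)  # ['学习', '老师']
```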

Your new flashcard will be based on the currently-displayed dictionary definition; tapping on the Switch Dictionary button will change the dictionary used for the current and subsequent cards, though you can also go back and change their definitions later through Organize Flashcards.

One important Settings option specific to Capture Flashcards is "Unknown word handling." With the default behavior, "Truncate," the system will create a card based on the longest match it can find for the word in the recognition area; if it only matches the first character then it'll only create a card for that character. However, if you change this option to "Create Custom," you'll be prompted to create a brand new custom flashcard instead, with the headword prepopulated with the recognized characters; this is especially useful for items like character names that aren't likely to appear in a dictionary.
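The "Truncate" behavior amounts to a longest-prefix dictionary match. Here is a minimal sketch with a tiny hypothetical word list standing in for a real dictionary. Modeling "Create Custom" as keeping every recognized character whenever no entry covers the whole string is an assumption about when the custom-card prompt appears.

```python
from typing import Set

# Hypothetical mini-dictionary standing in for a real dictionary's headwords
WORDS: Set[str] = {"阿", "已", "已经", "经", "不止"}

def longest_match(text: str, words: Set[str]) -> str:
    """Longest prefix of `text` that is a known word ('' if none is)."""
    for end in range(len(text), 0, -1):
        if text[:end] in words:
            return text[:end]
    return ""

def headword_for_card(recognized: str, handling: str = "truncate") -> str:
    """Pick a flashcard headword for the recognized characters."""
    match = longest_match(recognized, WORDS)
    if match == recognized or handling == "truncate":
        return match  # known word, or fall back to the longest known prefix
    return recognized  # "create custom": prefill with everything recognized

print(headword_for_card("已经"))                  # 已经
print(headword_for_card("阿Q"))                   # 阿
print(headword_for_card("阿Q", "create custom"))  # 阿Q
```

The 阿Q example mirrors the character-name case: "Truncate" can only offer 阿, while "Create Custom" keeps the whole name.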

 

Still Image Recognition

Pleco's OCR system also supports a still image recognition mode, much like a more conventional OCR system but optimized to work well with images from a camera (as opposed to images from a scanner) and to facilitate easily looking up unknown words.

The easiest way to access still image mode is to simply "Share" a photo from another app, like your device's built-in photo gallery. In some apps (like Dropbox) you'll have to "Export" the photo instead before it can be shared with Pleco OCR.

Alternatively, to get to still image mode from within Pleco, open up live OCR, then press your device's menu button and choose "Load Picture" or "Take Picture." You can set this mode to come up by default in Settings / OCR.

This works very similarly to the live video mode, but instead of pointing your phone's camera at words, you drag / zoom a still image around to position the words in the recognition area:

The first two buttons in the top bar rotate the image left and right; the other three buttons control text orientation, inversion (black-on-white or white-on-black), and history, as in live mode.

Most of these icons are identical to those from live mode, but there are two new ones:

Also as in live mode, you can tap on a character to select it and view its definition, or tap-hold to correct it (choose another similar-looking character). You can tap on the first character of the currently selected word to shrink the selection, useful for focusing on a single character. The "Send to reader" menu command from Live Mode is also available to send captured text to our document reader.

 

Scroll Recognition Area

To recognize larger blocks of text at a time, as in a more traditional OCR system, tap on the "Scroll Recognition Area" button - second from the right at the bottom of the screen. This "attaches" the recognition area to the page so that it will stay in the same position relative to the image even as you scroll the image around, useful for capturing large blocks of text:
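Attaching the recognition area to the page can be thought of as storing it in image coordinates instead of screen coordinates, and converting it with the current pan offset and zoom each time the screen is redrawn. A minimal hypothetical sketch of that conversion (the `View` class and its fields are illustrative, not Pleco's code):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class View:
    offset_x: float  # screen position of the image's top-left corner
    offset_y: float
    zoom: float      # screen pixels per image pixel

    def to_screen(self, x: float, y: float) -> Tuple[float, float]:
        """Image coordinates -> screen coordinates."""
        return (self.offset_x + x * self.zoom, self.offset_y + y * self.zoom)

    def to_image(self, sx: float, sy: float) -> Tuple[float, float]:
        """Screen coordinates -> image coordinates."""
        return ((sx - self.offset_x) / self.zoom, (sy - self.offset_y) / self.zoom)

view = View(offset_x=0, offset_y=0, zoom=2.0)
# "Attach": convert the box's on-screen corner once, then keep it in image space.
anchor = view.to_image(100, 60)    # (50.0, 30.0)
view.offset_y -= 200               # the user scrolls the image up by 200 pixels
print(view.to_screen(*anchor))     # (100.0, -140.0): the box moved with the page
```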

You can still zoom in / tap on a word to look it up, or choose "Send to reader" from the menu to view the whole block in our document reader. Choose "Scroll rec area" again to exit this mode. (this is the streamlined Android equivalent to the "Block Recognizer" command in our iPhone software)
