My 186 blog: Pre-processing Text

In this activity, we use what we learned in previous activities to extract handwritten text from an image full of lines, like the one shown below.

To keep things simple, we used only this part of the image for the text extraction:

First, we converted the image to a binary image:
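As a rough illustration of this step, here is a minimal sketch of thresholding in plain Python. The function name `binarize`, the threshold of 128, and the toy pixel values are all assumptions for illustration; the post does not say which threshold was actually used.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 image: 1 where the pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]


# Toy grayscale patch (made-up values): dark ink on light paper.
gray = [
    [250, 245, 30, 248],
    [240, 25, 20, 251],
    [35, 40, 247, 249],
]

binary = binarize(gray)
for row in binary:
    print(row)
```

With a threshold like this, the paper comes out as 1 and the ink as 0, which is why the next step flips the values.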

Then, to make it easier to work with, we inverted the pixel values:
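Inverting a binary image just swaps 0 and 1. A small sketch, assuming the image is stored as a list of rows of 0/1 values (the name `invert` and the sample data are hypothetical):

```python
def invert(binary):
    """Swap foreground and background in a 0/1 image."""
    return [[1 - px for px in row] for row in binary]


# Toy binary patch (made-up values): 0 = ink, 1 = paper.
binary = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]

inverted = invert(binary)
for row in inverted:
    print(row)
```

After this step the ink is the 1-valued foreground, which is the convention most morphological operations expect.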

To remove the lines, we applied a binary closing with a straight line as the structuring element:
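To show how a straight-line structuring element can wipe out a thin ruled line, here is a minimal sketch in plain Python. It assumes a white (1) foreground and uses an opening (erosion followed by dilation) with a vertical line, which on such an image is the counterpart of closing a dark-on-white image. It is not necessarily the exact call used in this activity, and the names `erode`, `dilate`, `opening`, and `VERTICAL_LINE` are made up for illustration.

```python
# Vertical line structuring element, 3 pixels tall, as (row, col) offsets.
VERTICAL_LINE = [(-1, 0), (0, 0), (1, 0)]


def _get(img, r, c):
    """Pixel value, treating everything outside the image as background (0)."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def erode(img, se):
    """Keep a pixel only if every element offset lands on foreground."""
    return [[1 if all(_get(img, r + dr, c + dc) for dr, dc in se) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img, se):
    """Set a pixel if any element offset lands on foreground.

    The element here is symmetric, so no reflection is needed.
    """
    return [[1 if any(_get(img, r + dr, c + dc) for dr, dc in se) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def opening(img, se):
    """Erosion followed by dilation: removes shapes the element cannot fit in."""
    return dilate(erode(img, se), se)


# Toy 7x7 image: a 1-pixel-thick horizontal line on row 3 (the ruled line)
# crossing a vertical stroke in column 2 (part of a letter).
img = [[1 if (r == 3 or c == 2) else 0 for c in range(7)] for r in range(7)]

opened = opening(img, VERTICAL_LINE)
for row in opened:
    print(row)
```

The horizontal line is too thin for the vertical element to fit inside, so the erosion deletes it, while the tall stroke survives and is restored by the dilation. Strokes that are themselves thin in the vertical direction get removed too, which would explain letters coming out fragmented.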

It can be observed that the characters D and E are readable; however, the characters M and O are fragmented to the point that they could be mistaken for different characters. Paired with a powerful pattern recognition algorithm, though, it may still be possible to detect the letters correctly.

The other goal of this activity is to recognize text patterns in the image. We try to find every instance of the word “description” in the whole image, using a sample image of the word as a template.

This is the sample I used, chosen because imcorrcoef() requires a square image:

Applying imcorrcoef() with the sample gives the image below:
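For readers curious about what a correlation-coefficient map looks like in code, here is a minimal sketch of template matching in plain Python. It slides the template over the image and computes the Pearson correlation coefficient for each window. This is only an illustration of the general idea; I am not claiming it is how imcorrcoef() works internally, and the names `corrcoef` and `match_template` and the toy data are assumptions.

```python
import math


def corrcoef(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat window or template: coefficient undefined, use 0
    return cov / math.sqrt(var_a * var_b)


def match_template(image, template):
    """Correlation coefficient of the template with every window of the image."""
    th, tw = len(template), len(template[0])
    flat_t = [px for row in template for px in row]
    scores = []
    for r in range(len(image) - th + 1):
        row_scores = []
        for c in range(len(image[0]) - tw + 1):
            window = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            row_scores.append(corrcoef(window, flat_t))
        scores.append(row_scores)
    return scores


# Toy data: a 2x2 "word" pasted into the image at (0, 0) and (1, 4).
template = [[1, 0],
            [0, 1]]
image = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

scores = match_template(image, template)
for row in scores:
    print([round(s, 2) for s in row])
```

The score is close to 1 only where the window looks like the template, so the matches show up as bright spots in the map.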

Converting it to a binary image, we obtain:
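Thresholding the correlation map keeps only the strongest peaks, and their coordinates tell us where the matches are. A small sketch, assuming the map is a list of rows of scores; the cutoff of 0.9, the name `find_matches`, and the sample scores are made up for illustration.

```python
def find_matches(scores, threshold=0.9):
    """Binarize a correlation map and list the (row, col) of each peak."""
    binary = [[1 if s >= threshold else 0 for s in row] for row in scores]
    locations = [(r, c)
                 for r, row in enumerate(binary)
                 for c, hit in enumerate(row) if hit]
    return binary, locations


# Toy correlation map (made-up values) with two strong peaks.
scores = [
    [0.10, 0.95, 0.20],
    [-0.30, 0.40, 0.05],
    [0.15, 0.00, 0.98],
]

binary, locations = find_matches(scores)
print(binary)
print(locations)
```

A lower cutoff finds more candidates but also lets in false matches, so the value usually has to be tuned per image.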

It can be observed that the algorithm was able to locate all the instances of the word “Description” in the image. This was a relatively easy activity, except for the mogrify function, which I wasn’t able to use properly. Therefore, I give myself an 8/10 for this activity.


Friday, October 14, 2011
