Friday, October 14, 2011

Basic Video Processing

In this activity we try to obtain relevant scientific data from a video. We use a video of a free-falling object to estimate the acceleration due to gravity g.



We obtain and crop the relevant frames from the video:


To make the processing easier, we binarize the images:


Then, using a center-of-mass algorithm, we reduce the objects to points and graph their locations with respect to time^2:


Comparing the obtained equation to 0.5gt^2 + vt, we can see that 0.5g = 4.6624, or g = 9.3248 m/s^2.
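The fitting step can be sketched in Python (the activity itself was done in Scilab); `estimate_g` is a hypothetical helper that assumes the centroid positions have already been extracted and converted to meters:

```python
import numpy as np

def estimate_g(t, y):
    """Fit y = 0.5*g*t^2 + v*t to tracked centroid positions.

    t : frame times in seconds, y : fall distance in meters.
    A plain least-squares fit with no intercept, since the
    object starts at y = 0. Returns (g, v).
    """
    # Design matrix for the model 0.5*g*t^2 + v*t
    A = np.column_stack([0.5 * t**2, t])
    (g, v), *_ = np.linalg.lstsq(A, y, rcond=None)
    return g, v

# Synthetic check: free fall sampled at 30 fps
t = np.arange(10) / 30.0
y = 0.5 * 9.81 * t**2 + 0.2 * t
g, v = estimate_g(t, y)
```

On noiseless synthetic data the fit recovers g exactly; on real tracked centroids the residual reflects the binarization and centroid errors.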

This was an easy activity so I give myself a 9/10


Pattern Recognition 3: Neural networks

A neural network is a computational model of how neurons work. Like an actual brain, neural networks 'learn' the recognition rules used to perform an operation through examples. The larger the training set, the more accurate the operation. This technique is often preferred over linear discriminant analysis because it processes faster once it has learned.

The basic mathematical construct of a neuron is shown below:
A neuron accepts weighted inputs and sums them up. This sum is then acted on by an activation function g, which outputs the new signal z.
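As a minimal sketch of this construct (in Python rather than the Scilab used in class), with a sigmoid as the activation function g:

```python
import numpy as np

def neuron(x, w, g):
    """Single artificial neuron: weighted sum of inputs, then activation."""
    a = np.dot(w, x)      # weighted sum of the inputs
    return g(a)           # activation function g produces the output z

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Two inputs with weights 2.0 and -1.0
z = neuron(np.array([1.0, 0.5]), np.array([2.0, -1.0]), sigmoid)
```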

By connecting many neurons together, we are able to create a neural network. A typical network consists of an input layer, a hidden layer, and an output layer:


By applying the neural network algorithm to my object classes, I was able to obtain a 37.5% recognition rate. The poor recognition rate may be due to the small training sample size used. Neural networks need a large training sample size in order to accurately process incoming information.

Since the code for the neural network was already given to us, and because I hardly understood what it does, I give myself a 6/10 for this activity.

Pattern Recognition 2

In this activity we discuss the pattern recognition technique of linear discriminant analysis (LDA).

The purpose of Discriminant Analysis is to classify objects into one of two or more groups based on a set of features that describe the objects. In general, we assign an object to one of a number of predetermined groups based on observations made on the object.

To use LDA for pattern recognition we need to know the conditional probability P(i|x) that an object belongs to group i. However, this is often hard to obtain. What we can obtain is the probability of observing certain features given that the object is from group i, which is P(x|i). The relation between the two probabilities is given by:

This equation, however, is impractical to use directly since we need a large sample size to obtain the relative P(x|i) for each group. A more practical way is to assume a distribution and compute the probability theoretically. This is where we get the LDA formula:
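A Python sketch of the classification step, assuming the standard LDA form with a pooled covariance matrix C and class priors p_i (the helper name `lda_classify` is mine, not from the activity):

```python
import numpy as np

def lda_classify(X_train, y_train, X_test):
    """Linear discriminant analysis with a pooled covariance matrix.

    Assigns each test vector x to the class with the largest discriminant
    f_i(x) = mu_i C^-1 x - 0.5 mu_i C^-1 mu_i + ln(p_i).
    """
    classes = np.unique(y_train)
    n = len(y_train)
    means, priors = [], []
    C = np.zeros((X_train.shape[1], X_train.shape[1]))
    for c in classes:
        Xc = X_train[y_train == c]
        means.append(Xc.mean(axis=0))
        priors.append(len(Xc) / n)
        # Pooled covariance: class covariances weighted by class proportion
        C += (len(Xc) / n) * np.cov(Xc, rowvar=False, bias=True)
    Cinv = np.linalg.inv(C)
    scores = []
    for mu, p in zip(means, priors):
        f = X_test @ Cinv @ mu - 0.5 * mu @ Cinv @ mu + np.log(p)
        scores.append(f)
    return classes[np.argmax(scores, axis=0)]

# Two toy classes of normalized features
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = lda_classify(X_train, y_train, np.array([[0.05, 0.05], [5.05, 5.05]]))
```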

Again, we apply the above equation to the 1-peso, 5-peso, leaf, and card classes. As before, the features I used were the RGB information and the area of each object. From these features, I was able to obtain a 100% recognition rate for all classes. This may be because objects within each class look very much alike. I also solved the problem of the object areas being of a different order of magnitude from the other features by normalizing them, which may have helped the algorithm. This was an easy activity since the steps on what to do were already given, so I give myself a 10/10.

Pattern recognition 1

Pattern recognition is an important aspect of quality control for products in today's society. Therefore it is important to have accurate machine vision, else subpar products may be produced. In the next few activities, we will discuss different techniques for pattern recognition.

In this activity, we discuss the use of minimum distance classification for pattern recognition.

If we define the representative of class ωj to be its mean feature vector, then:

where xj is the set of all feature vectors in class ωj and Nj is the number of samples in class ωj. The 'closeness' of an object to the representative can then be defined by the Euclidean distance:

To determine which class an object belongs to, we compute its distance to each class representative; the object belongs to the class with the smallest distance.
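The rule above can be sketched in a few lines of Python (a stand-in for the Scilab code), assuming the class mean vectors have already been computed:

```python
import numpy as np

def min_distance_classify(means, x):
    """Assign x to the class whose mean feature vector is closest
    in Euclidean distance."""
    d = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(d))

# Two toy class representatives (mean feature vectors)
means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
label = min_distance_classify(means, np.array([1.0, 2.0]))
```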

We test this algorithm on 4 classes:
1 peso coin

Leaf

Card

5 peso coin


The patterns were recognized with an accuracy of 75%. Oddly, however, all the leaves were classified as 1-peso coins, and this was the source of my errors.

All in all this was an easy activity, so I give myself an 8/10 since my algorithm only achieved 75% accuracy.





Color Image Segmentation

Normally we use thresholding to separate a specific object from its background. However, when the object has the same gray-level value as the background, this can be a problem. Instead, we use the difference in color information to separate background and foreground. 3-D objects will have shading variations, so the segmentation needs to be done regardless of the brightness of the color. This can be done using a color space that separates brightness and color information, such as the normalized chromaticity coordinates:
 where:

In this color space, it can be observed that r + g + b = 1, so b = 1 - r - g. With this we have a 2-D color coordinate (r, g) plus the intensity information, which makes segmentation easier.
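The conversion can be sketched as follows (Python stand-in for the Scilab code; `rgb_to_ncc` is a hypothetical helper name):

```python
import numpy as np

def rgb_to_ncc(img):
    """Convert an RGB image (H, W, 3) to normalized chromaticity coordinates.

    Returns (r, g, I): the two chromaticity channels and the intensity
    I = R + G + B. Since r + g + b = 1, b is implied and not stored.
    """
    img = img.astype(float)
    I = img.sum(axis=2)
    I[I == 0] = 1e-12          # avoid division by zero on black pixels
    r = img[..., 0] / I
    g = img[..., 1] / I
    return r, g, I

# A single pixel with R=100, G=50, B=50
px = np.array([[[100, 50, 50]]], dtype=np.uint8)
r, g, I = rgb_to_ncc(px)
```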

In this activity, we will use two techniques in order to segment an object from the background.

First we obtain an image containing our region of interest:


where our region of interest is:

The first of the two techniques is parametric probability distribution estimation. First, we obtain the mean and standard deviation of the r and g coordinates of the region of interest, and then plug them into the equation:
where r is either the red or green color information of the whole image. After thresholding, we obtain the image:
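A sketch of the parametric technique, assuming independent Gaussians in r and g and an illustrative likelihood threshold:

```python
import numpy as np

def parametric_segment(r, g, roi_r, roi_g, thresh=1e-3):
    """Parametric segmentation in normalized chromaticity space.

    Fits independent Gaussians to the ROI's r and g values and keeps
    pixels where the joint likelihood p(r)*p(g) exceeds `thresh`.
    """
    def gauss(x, mu, sigma):
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    mu_r, s_r = roi_r.mean(), roi_r.std()
    mu_g, s_g = roi_g.mean(), roi_g.std()
    p = gauss(r, mu_r, s_r) * gauss(g, mu_g, s_g)
    return p > thresh

# Toy ROI clustered around (r, g) = (0.5, 0.3)
roi_r = np.array([0.48, 0.50, 0.52])
roi_g = np.array([0.28, 0.30, 0.32])
# One in-distribution pixel and one far-off pixel
r = np.array([[0.50, 0.90]])
g = np.array([[0.30, 0.10]])
mask = parametric_segment(r, g, roi_r, roi_g)
```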

The other technique is histogram backprojection. We create the histogram of the region of interest:
Then using this histogram to eliminate the colors not in the histogram we obtain:

Quality-wise, parametric probability distribution estimation is better than histogram backprojection. It is also easier to use.

This activity is not that hard and although I hate histogram manipulation, having the code in 1-D from the previous activity made it a lot easier to extend into 2-D. I give myself a 9/10 for this activity.


Image Compression


In this activity we will use principal component analysis (PCA) in order to compress an image.

First, we convert the image to grayscale as shown below:

The image is then cut into 10x10 blocks, which are concatenated and fed to the PCA algorithm.

From this we get the eigenvalues of the image as shown below:


From this we reconstruct the image using different number of principal components:


It can be observed that the fewer principal components we use, the more degraded the image. However, by increasing the number of eigenimages used, we also increase the size of the compressed image.
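The block-based compression can be sketched in Python, with numpy's SVD standing in for the Scilab PCA routine (the block size and component count here are illustrative):

```python
import numpy as np

def pca_compress(img, block=10, n_components=5):
    """Compress a grayscale image with PCA on non-overlapping blocks.

    Each block x block patch becomes one observation; the image is
    rebuilt from its projection onto the top principal components.
    """
    h, w = img.shape
    h, w = h - h % block, w - w % block
    img = img[:h, :w].astype(float)
    # Flatten each block into a row vector
    patches = (img.reshape(h // block, block, w // block, block)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, block * block))
    mean = patches.mean(axis=0)
    centered = patches - mean
    # Principal components via SVD of the centered data
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    V = Vt[:n_components]
    recon = centered @ V.T @ V + mean
    # Reassemble the blocks into an image
    return (recon.reshape(h // block, w // block, block, block)
                 .transpose(0, 2, 1, 3)
                 .reshape(h, w))

# An image whose blocks are constant is rank-1 in patch space,
# so a single component reconstructs it exactly
img = np.kron(np.arange(4.0).reshape(2, 2), np.ones((10, 10)))
out = pca_compress(img, block=10, n_components=1)
```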


This activity was a bit confusing but I was able to do it in the end so I give myself a 9/10

Pre-processing Text


In this activity, we will use what we have previously learned in order to extract handwritten text from an image full of lines like the one shown below.

To make it less complicated, we used this part for text extraction:

First we transformed the image into a binary image:

Then, to make it easier to modify, we invert the pixel values:

In order to remove the line, we use binary closing using a straight line as a structuring element:

It can be observed that the characters for D and E are readable; however, the characters for M and O are fragmented to the point that they may be mistaken for different characters. With a powerful pattern recognition algorithm, though, it may still be possible to detect the letters correctly.

Another point of this activity is to try to recognize text patterns from the image. We try to find multiple instances of the word “description” throughout the whole image using a sample image of the word.
This is the sample I used, because imcorrcoef() requires a square image:

Using imcorrcoef() with the sample, we obtain the image below:

Then converting it to binary we obtain:
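A Python sketch of what this step computes, assuming imcorrcoef() returns Pearson's correlation coefficient at each valid template position (a brute-force loop, fine for small images):

```python
import numpy as np

def corr_coef_map(image, template):
    """Normalized cross-correlation map of a template over an image.

    Slides the template over the image ('valid' positions only) and
    computes Pearson's correlation coefficient at each position.
    Thresholding the map locates the template occurrences.
    """
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t ** 2).sum())
    H = image.shape[0] - th + 1
    W = image.shape[1] - tw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            win = image[i:i + th, j:j + tw]
            w = win - win.mean()
            denom = np.sqrt((w ** 2).sum()) * tnorm
            out[i, j] = (w * t).sum() / denom if denom > 0 else 0.0
    return out

# Embed the template in a blank image; the map peaks at its location
image = np.zeros((8, 8))
template = np.array([[1.0, 0.0], [0.0, 1.0]])
image[3:5, 4:6] = template
m = corr_coef_map(image, template)
```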

It can be observed that the algorithm was able to locate all the instances of the word “Description” in the image. This was a relatively easy activity except for using the mogrify function which I wasn’t able to use properly. Therefore I give myself an 8/10 for this activity.

Monday, September 12, 2011

Binary operation

Morphological operations like erosion and dilation are very useful for image isolation. However, these treatments often disfigure the image so much that it becomes of little use for further processing. Binary operations based on these basic functions were developed to compensate for this flaw.

Opening is where one erodes the image and then dilates it using the same structuring element. It removes all white regions whose dimensions don't fit the structuring element. If we consider the regions with a value of zero as holes, the operator would 'open' those holes, hence the name.

Closing, on the other hand, is a dilation followed by an erosion using the same structuring element. It removes black regions smaller than the structuring element. It is like the closing of the holes in an image.

For this activity, we will use these operators to estimate the area of simulated cells. We will also try to isolate cancer cells mixed in with the normal cells.

First we cut the image into twelve 256 x 256 pixel subimages to reduce the computational burden on the computer. Then we convert them into binary images as shown below.


As we can see, the resulting images are very noisy. This can be fixed by opening each image using, as a structuring element, a circle slightly smaller than the cells. The images produced are shown below.


We then recombine these images back into one image:



We can see that although the noise has been dealt with, many of the cells are clumped together, which may cause our average to have a large error. To obtain the area, we use bwlabel to index each structure and then wrote a program to count the pixels in each structure. The value we obtained was 710.28 ± 522.88 pixels. As expected, the standard deviation of the area was large.
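The labeling and pixel-counting step can be sketched in Python, with scipy's ndimage.label standing in for Scilab's bwlabel:

```python
import numpy as np
from scipy import ndimage

def cell_areas(binary):
    """Label connected blobs (like bwlabel) and measure their areas in pixels."""
    labels, n = ndimage.label(binary)
    # Area of each blob = number of pixels carrying its label
    areas = ndimage.sum(binary, labels, index=range(1, n + 1))
    return np.asarray(areas)

# Two toy 'cells': a 9-pixel blob and a 6-pixel blob
img = np.zeros((10, 10), dtype=int)
img[1:4, 1:4] = 1
img[6:8, 6:9] = 1
areas = cell_areas(img)
```

The mean and standard deviation of `areas` then give the cell-size estimate reported above.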

Using the average we obtained, we will try to isolate the simulated cancer cells from the image below.


Using a structuring element with a radius slightly larger than that of the original cells, we obtain the image below:



As you can see, I was able to isolate the circles larger than the original cells (the cancer cells). Since this was an easy activity, I give myself a 10/10.

Monday, September 5, 2011

Morphological Operations

Morphological operations are post-processing done on binary images in order to extract information or remove unnecessary structures. In this activity, we discuss the different types of morphological operations that can be done in Scilab.


Dilation

This morphological operation increases the area of the regions that have a value of 1. The dilation of set A by B is defined as the set of all z's which are translations of a reflected B that, when intersected with A, are not empty. That sounds pretty complicated; to my understanding, it means using B to increase the area of A, as seen below:



The red regions are the parts of A that overlap with B, while the yellow shaded region is the dilated part of the original structure. Note, however, that this is only my understanding of dilation. When I wrote a code for dilation, what happened is this:

We can see that although it was correct for the most part, there were some errors in my predictions. The most obvious error is when using either the cross or diagonal structuring element. In my predictions, the corners of the dilated image would be chipped like a sawtooth. Instead, what I obtained is a corner with only 1 pixel removed.


Erosion

As an opposite to dilation, erosion is used to decrease the area of the regions with a value of 1. The erosion of A by B is defined as the set of all points z such that B translated by z is contained in A. Erosion is designed such that the reduction in the area of A is determined by the structuring element B.

Unlike for dilation, where I just traced the edge of the image with the structuring element, for erosion I chose an origin for the structuring element. This origin is then placed on each pixel of the image, and if the whole structuring element does not fit inside the region, that pixel is set to zero. We then obtain an eroded image as shown below:


Where the red regions are the eroded parts of the image. Writing the code for erosion we obtain these images:

Again, most of my predictions were correct; however, for the eroded images of the annulus and the cross with the diagonal structuring element, the program obtained a different image. This may be due to my arbitrary assignment of the origin.
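The two operators can be reproduced with scipy standing in for Scilab's SIP functions; here a 5x5 square is dilated and eroded with a cross-shaped structuring element:

```python
import numpy as np
from scipy import ndimage

# A 5x5 square inside a 9x9 canvas
A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True

# 3x3 cross structuring element
B = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]], dtype=bool)

dilated = ndimage.binary_dilation(A, structure=B)
eroded = ndimage.binary_erosion(A, structure=B)
```

With the cross, dilation adds the 4-neighbours of the square (25 + 20 = 45 pixels) without filling in the diagonal corners, while erosion keeps only the 3x3 interior where the whole cross fits.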

Thin/Skel

a) Original image, b) Skel, c) Thin
Thin and skel are other morphological transforms available through the SIP toolbox in Scilab. The skel function "skeletonizes" the image by creating a hypothetical frame for it. The thin function, as its name says, thins the image by eroding its borders.

All in all it was an easy activity, however, I found drawing my predictions (the original plan) quite a hassle so I used paint for my predictions. For this activity I give myself an 8/10

Thursday, September 1, 2011

Enhancement by Histogram manipulation

I HATE THIS ACTIVITY!! There are two reasons why I'm late in posting my activities for 186: one is the submission for SPP, and the other is this activity. I only got the code for this to work now, so I was only able to post this now.


Manipulation of an image’s histogram is one of the ways in which we can improve the quality of an image by enhancing some features of the image which are not normally seen with the naked eye. This is done by back projection using the cumulative distribution function (CDF) of the image. This back projection is shown below:
By doing so, we can enhance the image in such a way that under- and overexposed parts of the image are normalized, thus increasing the amount of detail in the picture.
Take the figure below, for example. It is an underexposed picture of a seaside restaurant, with its CDF shown:

Now we take an ideal, in this case linear, CDF and use it to enhance the image like so:
We can see that the CDF of the fixed image has the same form as the ideal CDF that we used. Note, however, that the image only brightened a little. This may be attributed to the fact that the original image was saved in the .jpg format. We learned from previous activities that .jpg uses lossy compression, so there is little information left in the dark regions of the picture. Even with histogram manipulation, we cannot recover information that is no longer present.
Also note that human senses are generally non-linear, so I tried different CDFs, with results shown below:
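The backprojection can be sketched in Python; `match_cdf` is a hypothetical helper that maps each gray level through the image's CDF and then through the inverse of the desired CDF (an identity `ideal_cdf` gives plain histogram equalization):

```python
import numpy as np

def match_cdf(img, ideal_cdf):
    """Backproject pixel values through the image CDF onto an ideal CDF.

    img       : 2-D uint8 grayscale image
    ideal_cdf : callable mapping [0, 1] -> [0, 1], monotonically increasing
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size          # CDF of the original image
    # For each gray level, find where its CDF value lands on the ideal CDF
    ideal = ideal_cdf(np.linspace(0, 1, 256))
    lut = np.interp(cdf, ideal, np.arange(256))
    return lut[img].astype(np.uint8)

# An 'underexposed' image using only gray levels 0..63
img = np.tile(np.arange(64, dtype=np.uint8), (4, 1))
out = match_cdf(img, lambda x: x)             # linear ideal CDF
```

After matching, the gray levels are stretched over the full 0-255 range.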
Finally I think creating a code for histogram manipulation is quite a hassle, thankfully some graphics manipulation software have already incorporated this technique in their programs. One of those programs is Gimp:


It is actually quite easy to use histogram manipulation in Gimp. It was so easy that it made me think 'what the hell was I writing all those codes for… '
All in all, I think this activity was quite the hassle. Backprojection might seem simple when looking at Fig. 1, but when actually turned into code, it was quite confusing. My pride in not asking others for help also didn't help; that's the reason I'm posting this only now. Still, I give myself a 6/10 for this activity, at least I finished it… >.<






Monday, July 18, 2011

Properties of the 2D Fourier Transform


For this activity, we investigated the properties of the 2D Fourier transform (FT) of an image. For the first part of the activity, we produced different images and obtained their FTs. As we can see from figure 1, a straight edge (i.e. a square) produces a line in the FT perpendicular to the edge in the image. An annulus likewise produces a ring in its FT, and the FT of an annulus with a straight edge has a broken line perpendicular to the image's edge. The FT of a double slit is a single straight slit along the horizontal, while the FT of a double pinhole along the x-axis produced a series of slits of different widths and spacings.

For the second part of the activity, we simulated a sinusoid and obtained its FT. What we obtained is an image with two pinholes on the y-axis, a few pixels apart.


However, if we increase the frequency of the sinusoid, we can see in the FT that the pinholes move farther apart. This is because the pinholes are the spatial frequencies of the sinusoid. They lie on the y-axis because the sinusoid propagates along the y-axis. The reason they move apart is that the center of the FT corresponds to a DC signal, i.e. a frequency of zero. The frequency increases as we move away from the center, so higher-frequency structures are found farther from it.
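This behaviour is easy to verify numerically; the sketch below (Python, with the frequency f given in cycles per image) locates the FT peak of a sinusoid propagating along y:

```python
import numpy as np

def sinusoid_ft_peak(f, n=128):
    """Return the (ky, kx) offset of the FT peak from the DC term at the
    center, for a sinusoid of frequency f propagating along y."""
    y = np.arange(n).reshape(-1, 1) / n
    img = np.sin(2 * np.pi * f * y) * np.ones((1, n))
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    ky, kx = np.unravel_index(np.argmax(F), F.shape)
    return abs(ky - n // 2), abs(kx - n // 2)
```

Doubling the sinusoid frequency doubles the peak's distance from the center, exactly as observed in the activity.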



Rotating the sinusoid also causes a rotation in its FT. As before, the structures in the FT align with the direction in which the sinusoid propagates.


If we take the FT of two superimposed sinusoids, we obtain something like four dots at the corners of a square. I wasn't actually expecting this, since I thought the FT would look like a cross at the center.


By superimposing yet another sinusoid, things get even weirder. My prediction was that adding another sinusoid would add another pair of dots to the FT. Instead, there were 8 dots, all shifted from the center.




All in all, it wasn’t a particularly hard activity, so I would give myself an 8/10.