Querying Text in Image
Retrieving relevant images by the text attached to them may not be sufficient.
On the other hand, the text contained in the image itself is more specific and
informative. For example, the word “restaurant” is more likely to appear inside
the image, as a label describing the place, than in the attached text.
Figure 1: the word “restaurant” is found in the image
The goal of this project is to implement the idea of the article “Image
retrieval using Textual Cues”, whose main focus is on finding ways to search
for a query text in a large collection of images and to retrieve all
occurrences of that text.
Recognizing text in images is not a solved problem. The article works with a
large collection of images that contain the same text in different positions,
viewpoints and font styles, which could help in getting better results.
Implementation:
We implemented querying for single characters in an image, so that the idea can
be extended to searching for multiple characters (text) as well.
We used HOG features to describe every character, and an SVM trained on
positive and negative sets of images to detect the queried character or text.
The idea is to scan every image with a sliding window. The sliding window
contains blocks, every block contains cells, and every cell has a fixed number
of bins.
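The scanning step can be sketched as follows. This is a minimal illustration only: the function name, the image size and the window stride are assumptions, since this report does not state the stride used while scanning. It enumerates the top-left corners at which a window fits entirely inside the image.

```cpp
#include <cassert>
#include <vector>

// Top-left corner of one window position (illustrative type).
struct WindowPos { int x; int y; };

// Enumerate every position at which a win_w x win_h window fits inside an
// img_w x img_h image when it is moved in steps of `stride` pixels.
// The stride is a hypothetical parameter, not a value taken from the project.
std::vector<WindowPos> slidingWindowPositions(int img_w, int img_h,
                                              int win_w, int win_h,
                                              int stride) {
    assert(stride > 0);
    std::vector<WindowPos> positions;
    for (int y = 0; y + win_h <= img_h; y += stride)
        for (int x = 0; x + win_w <= img_w; x += stride)
            positions.push_back(WindowPos{x, y});
    return positions;
}
```

For a hypothetical 64x48 image, a 32x32 window and a stride of 8, slidingWindowPositions(64, 48, 32, 32, 8) returns 5 x 3 = 15 positions; a HOG descriptor would then be computed for each of them.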
In the project we used a sliding window of size 160x96, a block of size 16x16,
a block stride of 8x8, a cell of size 8x8, and 9 orientation bins for every
cell.
This gives 4 cells in every block and 209 blocks in every window, calculated as
follows: (96/8 - 1)*(160/8 - 1) = 11*19 = 209.
In total, the HOG descriptor size for a window is 209x4x9 = 7524.
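The arithmetic above can be checked with a small helper. This is a sketch: the function and parameter names are illustrative, and it assumes square blocks and cells. It counts blocks as (window - block) / stride + 1 per axis, which equals the (size/8 - 1) form used above when the block is twice the stride.

```cpp
#include <cassert>

// Number of values in the HOG descriptor of one window.
// Assumes square blocks and cells, and that the blocks tile the window
// exactly at the given stride.
int hogDescriptorSize(int win_w, int win_h, int block, int block_stride,
                      int cell, int bins) {
    assert(block % cell == 0);
    assert((win_w - block) % block_stride == 0);
    assert((win_h - block) % block_stride == 0);
    int blocks_x = (win_w - block) / block_stride + 1;      // 19 for width 160
    int blocks_y = (win_h - block) / block_stride + 1;      // 11 for height 96
    int cells_per_block = (block / cell) * (block / cell);  // 4 for 16/8
    return blocks_x * blocks_y * cells_per_block * bins;
}
```

With the project's values, hogDescriptorSize(160, 96, 16, 8, 8, 9) returns 7524.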
HOG example:
Figure 2: example of a HOG image
SVM:
The goal is to create a file containing the data of all images related to the
current positive character, which is then used as input for the SVM trainer.
Every line in the file should have the following format:
<label> <index>:<value> <index>:<value> …
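A minimal sketch of producing one such line from a descriptor vector is shown here. The function name is hypothetical, and using 1 and -1 as labels for positive and negative images is an assumption. Feature indices start at 1 and are written in ascending order, as libsvm expects.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Format one descriptor as a line of libsvm training input:
//   <label> 1:<v1> 2:<v2> ...
// Every feature is written, including zeros.
std::string toLibsvmLine(int label, const std::vector<float>& descriptor) {
    std::ostringstream line;
    line << label;
    for (std::size_t i = 0; i < descriptor.size(); ++i)
        line << ' ' << (i + 1) << ':' << descriptor[i];
    return line.str();
}
```

For a two-value descriptor {0.5, 0.25} with label 1 this produces the line "1 1:0.5 2:0.25"; a real window descriptor would contribute 7524 index:value pairs.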
We use hog.compute(img, descriptors, Size(1, 1), Size(0, 0)) to calculate the
descriptors for each image of a positive character, where:
- Size(1,1) is the sliding window’s stride. It has no effect here, because the
  window covers the whole image and is evaluated at a single position.
- Size(0,0) is the padding size. In this case it is zero because we chose the
  sliding window size to be the same as the image size, so there are no
  margins.
Every vector in descriptors is copied to the SVM trainer’s input file.
To train on the positive characters we used svm-train.exe from the libsvm
library.
The arguments we used:
1. Input file of calculated HOGs of the characters’ positive images.
2. -s svm_type: C_SVC, which deals with imperfect separation of classes.
3. -t kernel_type: linear
4. -c cost: a value between 0.001 and 10. We will show later that this
   parameter did not have a significant impact on the predicted results.
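With a linear kernel, the trained model can be reduced to a weight vector and a bias, so scoring one window comes down to a dot product with its HOG descriptor. The sketch below illustrates that idea only: the function names are hypothetical, the weights and bias stand in for a trained model, and treating a positive score as the queried character is an assumption. How the model was actually applied to the windows is not detailed in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decision value of a linear SVM for one descriptor: w . x + b.
// `weights` and `bias` are placeholders for a trained model.
double linearSvmScore(const std::vector<double>& weights, double bias,
                      const std::vector<double>& x) {
    assert(weights.size() == x.size());
    double score = bias;
    for (std::size_t i = 0; i < x.size(); ++i)
        score += weights[i] * x[i];
    return score;
}

// Assumed convention: a positive score marks the window as a detection.
bool isDetection(const std::vector<double>& weights, double bias,
                 const std::vector<double>& x) {
    return linearSvmScore(weights, bias, x) > 0.0;
}
```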
Results:
In Figure 3 the marked regions are those detected by the SVM detector as the
letter “D”. The wrong detections of “E” and “IJ” as “D” are caused by the
different variations among the positive images of “D”.
Figure 3: result of querying letter "D"
Figure 4 shows images from the positive set of “D” that could match the
combination “I J”.
Figure 4: from left to right I J, I I