I have an application written in Visual Basic that uses Tesseract OCR, and I am stuck on one problem: localizing the text in the image.
Here is the breakdown, to help you understand better. I take an RGB image, binarize it (pure black/white), and feed it to Tesseract OCR, which compares the image against its libraries and returns the text output.
The problem is that I am working with real-world images, such as sign boards, so no single threshold works for every image. Setting that aside, every image also has regions around the text that come out the same color as the text after binarization, which makes the library produce unwanted characters in addition to the actual text. So what I want is a way to filter out the parts of the picture that aren't text but have the same color (black/white) as the text and cause these errors.
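To make the threshold part concrete: one common way to pick a threshold per image instead of hard-coding it is Otsu's method, which chooses the gray level that best separates the histogram into two classes. Below is a minimal sketch of that idea, not what my application currently does. It is in Python only because it is compact; the same loops should carry over to a VB bitmap. The function names and the flat list-of-gray-levels input are assumptions made for illustration.

```python
def otsu_threshold(gray):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    gray: iterable of integer gray levels in 0..255 (assumed input format).
    Pixels with value > t are treated as white, the rest as black.
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of the gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue            # no dark pixels yet, nothing to split
        w_fg = total - w_bg
        if w_fg == 0:
            break               # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map each gray level to 255 (white) if it is above t, else 0 (black)."""
    return [255 if v > t else 0 for v in gray]
```

A global threshold like this still struggles with uneven lighting, so it only addresses the "different threshold per image" issue, not the stray regions around the text.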
Attachment 98515
This is my current output, just before it is fed to Tesseract. What I need to do is leave the region containing "ANIMAL HOUSE" alone and turn every other white pixel black, for example the sky. For this particular image I could hand-craft a pattern, but for a different image I will need some general procedure that removes any "minority colored pixels" by changing them to the majority color while leaving the text untouched. I have run out of ideas thinking about this. Could I get some help, or some ideas that I could try to implement?
If anyone has prior experience with something similar, say a license plate recognition project, that would be a great help! I know this isn't the image processing side of the forum, but is there some function that could do this kind of processing, perhaps with PictureBox? I doubt there is one, so I'd take no for an answer too. I am just trying to improve my application's accuracy, and I hope you can throw in some ideas if something strikes you.
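To show the kind of general procedure I mean, here is a rough sketch of one possible approach: group the white pixels into connected blobs, then blacken any blob whose size does not look like a character (too small to be a letter, or too wide or tall, like the sky). It is in Python for brevity rather than VB, and the function name, the 0/1 list-of-rows image format, and all the size limits are assumptions I made up for illustration. They would certainly need tuning on real images.

```python
from collections import deque


def remove_non_text_blobs(img, min_area=8, max_height_frac=0.5, max_width_frac=0.5):
    """Return a copy of img with white blobs that do not look like characters set to 0.

    img: list of rows, each a list of 0 (black) or 1 (white) -- assumed format.
    A blob is kept only if it has at least min_area pixels and its bounding box
    is no taller/wider than the given fractions of the image (hypothetical limits).
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one 8-connected white blob.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))

            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            box_h = max(ys) - min(ys) + 1
            box_w = max(xs) - min(xs) + 1
            looks_like_char = (len(pixels) >= min_area
                               and box_h <= max_height_frac * h
                               and box_w <= max_width_frac * w)
            if not looks_like_char:
                for py, px in pixels:
                    out[py][px] = 0   # repaint the blob as background
    return out
```

This would fail whenever a letter touches a large white region, since the two merge into one blob, so it is only a starting point.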