iPhone: Tesseract OCR for Tamil not working in Swift

on Tuesday, March 31, 2015

I am using Tesseract for English and Tamil. It works perfectly in Swift for English. In the tessdata folder there are many files for English, such as eng.cube.nn and eng.cube.lm, but I could not find files like that for Tamil. All I have is the tam.traineddata file. I downloaded all the files from Google Code, and they are all up to date. There are some applications in the App Store that extract Tamil text from images, so I have no idea how people are doing this.


When I pass an image containing Tamil text to Tesseract, I get errors saying that files such as tam.cube.lm and tam.cube.size do not exist. I searched the internet extensively but could not find these files for Tamil.
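To narrow this down, a small check of the tessdata folder can show which of the requested languages actually ship Cube files. This is only a sketch in current Swift syntax: `languagesMissingCubeData` and `languageCodes` are hypothetical helper names, not part of Tesseract-OCR-iOS, and it assumes the errors come purely from the missing `tam.cube.*` files, as the messages suggest.

```swift
import Foundation

/// Hypothetical helper: splits a Tesseract language string such as
/// "eng+tam" into its individual language codes.
func languageCodes(from languageString: String) -> [String] {
    return languageString.split(separator: "+").map(String.init)
}

/// Hypothetical helper: returns the codes from `languages` that have no
/// `<code>.cube.*` file inside the given tessdata directory.
func languagesMissingCubeData(in tessdataURL: URL, languages: [String]) -> [String] {
    // An unreadable directory is treated as empty, so every language is reported.
    let names = (try? FileManager.default.contentsOfDirectory(atPath: tessdataURL.path)) ?? []
    return languages.filter { code in
        !names.contains(where: { $0.hasPrefix("\(code).cube.") })
    }
}
```

If the returned list is non-empty (for example, it contains "tam"), one option is to set the engine mode to G8OCREngineMode.TesseractOnly instead of a Cube mode, since that mode should only need the .traineddata file for each language.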


Please help me out: where can I find these files?


The code is given below:



import UIKit

protocol ValueFromTesseractProtocol
{
    func textRecognizedFromImage(text : String, booleanValue : Bool)
}

class TesseractModel: NSObject
{
    var delegate : ValueFromTesseractProtocol!

    //MARK: - Creating sharedInstance

    class var sharedInstance: TesseractModel {

        struct Static {
            static var sharedInstance: TesseractModel?
            static var token: dispatch_once_t = 0
        }

        dispatch_once(&Static.token) {
            Static.sharedInstance = TesseractModel()
        }

        return Static.sharedInstance!
    }

    //MARK: - imageRecognition

    func imageRecognition(image : UIImage)
    {
        let tesseract = G8Tesseract()
        tesseract.language = "eng+tam"
        tesseract.engineMode = G8OCREngineMode.CubeOnly
        tesseract.maximumRecognitionTime = 60.0
        tesseract.pageSegmentationMode = G8PageSegmentationMode.Auto
        tesseract.image = image.g8_blackAndWhite()
        tesseract.recognize()

        if let recognizedText = tesseract.recognizedText
        {
            // Call delegate - Pass value
            self.delegate.textRecognizedFromImage(recognizedText, booleanValue: true)
        }
        else
        {
            // Call delegate - Nil Value
            self.delegate.textRecognizedFromImage("", booleanValue: false)
        }
    }
}
