Hi all,
I am working on a project to extract tabular data from scanned PDF documents. I have read through the Aspose handbook and run text recognition with Aspose.OCR on several scanned pages, and the recognition seems reasonably accurate. The difficulty I am having is retaining the tabular structure.
At this stage, the output is essentially plain text: the row/column distinction is barely there, if at all.
I came across this thread: https://forum.aspose.app/t/word-with-Devops-tutorial-table-inside-convert-to-txt/67824 but I am still facing the issue.
Has anyone here who works for Aspose, or who has used Aspose.OCR or other Aspose components, had reliable success extracting tables from scanned PDFs? More specifically, I need output that preserves the table structure in a form I can process programmatically, such as CSV or JSON. Are there any Aspose.OCR parameters or preprocessing settings you would recommend?
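To make the target concrete, here is a minimal Python sketch of the kind of result I am after. It assumes the OCR step can give me each word together with the pixel position of its top-left corner; the `words` list, the tolerances, and all function names are made up for illustration and are not Aspose.OCR API calls. It groups words into rows by vertical position, infers columns from left edges, and writes CSV.

```python
import csv
import io

# Hypothetical OCR output: (text, x_left, y_top) per recognized word, in pixels.
# These values are invented for illustration; they do not come from Aspose.OCR.
words = [
    ("Item", 10, 12), ("Qty", 200, 10), ("Price", 320, 11),
    ("Widget", 10, 52), ("4", 200, 50), ("9.99", 320, 51),
    ("Gadget", 10, 91), ("12", 200, 90), ("24.50", 320, 92),
]

def group_rows(words, y_tol=10):
    """Cluster words into rows: a word joins the current row when its
    top edge is within y_tol pixels of that row's first word."""
    rows = []
    for word in sorted(words, key=lambda w: w[2]):
        if rows and abs(word[2] - rows[-1][0][2]) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows

def column_starts(words, x_tol=20):
    """Infer column positions: a left edge more than x_tol pixels past the
    previous column start opens a new column."""
    starts = []
    for x in sorted(w[1] for w in words):
        if not starts or x - starts[-1] > x_tol:
            starts.append(x)
    return starts

def to_table(words, y_tol=10, x_tol=20):
    """Place every word in the cell whose column start is nearest to it."""
    cols = column_starts(words, x_tol)
    table = []
    for row in group_rows(words, y_tol):
        cells = [""] * len(cols)
        for text, x, _ in sorted(row, key=lambda w: w[1]):
            idx = min(range(len(cols)), key=lambda i: abs(cols[i] - x))
            cells[idx] = (cells[idx] + " " + text).strip()
        table.append(cells)
    return table

def to_csv(table):
    """Serialize the table as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()

table = to_table(words)
print(to_csv(table))
```

This only works for clean, left-aligned columns with no skew, which is exactly why I am asking whether Aspose.OCR can handle the table structure itself.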
Any and all recommendations are welcome, including approaches that combine Aspose components with other tools. To reiterate, accuracy is critical to my use case, so if you have any sample code, I would be extremely appreciative.
Thanks in advance for any assistance or advice.
Best,
williamclark