
Q: How to troubleshoot poor OCR quality

Question: I’ve been trying to search my production database for a document, "Sample Summary". I know the OCR is poor, so I wanted to run it as a fuzzy search. I tried searching for "Sample Summary", but that didn’t bring up everything. Any thoughts on how to conduct this search?

Answer: Unfortunately, because the OCR on some documents is of poor quality, the system is not pulling up every instance of the phrase 'sample summary'. It does, however, seem to find each of the two words separately. Try the following options:

  1. Search for 'sample' or 'summary' and review the results for instances of 'sample summary'.
  2. Narrow the search by adding another search parameter. For example, if it is known that any documents containing 'sample summary' should be associated with the custodian Andy Zipper, that information can be added to the search syntax in any of the following ways:

    custodian("Andy Zipper") & "Sample Summary"
    OR
    custodian("Andy Zipper") & "Sample"
    OR
    custodian("Andy Zipper") & "Summary"
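
For option 1, reviewing a large result set by hand can be slow. The following is a minimal sketch of one way to pre-screen the results, assuming the extracted text of the matching documents has been exported to plain text outside the review platform. It is not a feature of the search tool; the function name `find_near_matches` and the 0.75 threshold are hypothetical choices for illustration. It slides a two-word window over the text and uses Python's standard `difflib` module to score each window against the target phrase, so that OCR misreads such as "rn" for "m" can still be flagged.

```python
from difflib import SequenceMatcher


def find_near_matches(text, phrase="sample summary", threshold=0.75):
    """Return (window, score) pairs for word windows that roughly match phrase.

    threshold is an illustrative cutoff, not a recommended value; tune it
    against a sample of your own documents.
    """
    target = phrase.lower()
    words = text.split()
    width = len(target.split())  # compare windows with the same word count
    hits = []
    for i in range(len(words) - width + 1):
        window = " ".join(words[i:i + width])
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the strings share fewer characters in order.
        score = SequenceMatcher(None, window.lower(), target).ratio()
        if score >= threshold:
            hits.append((window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical OCR output in which "m" was misread as "rn".
    sample_text = "Attached is the Sarnple Sumrnary for review."
    for window, score in find_near_matches(sample_text):
        print(window, score)
```

A lower threshold catches more badly garbled text but also returns more false positives, so any flagged passages still need a manual check.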

As a professional service, we can attempt to extract better quality text by re-OCRing the documents. Contact support@csdisco.com for an estimate.
