Subject: | Nice, but works only on some pdf... |
Summary: | Package rating comment |
Messages: | 3 |
Author: | Issam |
Date: | 2010-08-18 15:30:38 |
Update: | 2013-05-05 12:26:10 |
Issam rated this package as follows:
Utility: | Sufficient |
Consistency: | Good |
Issam - 2010-08-18 15:30:38
Nice, but it works only on some PDF documents, not all of them.
Thanks.
Juan - 2010-09-12 23:52:14 - In reply to message 1 from Issam
Hi. How did you get this to work? I've tried it with many PDF documents but had no luck at all. Thanks.
Carsten Jensen - 2013-05-05 12:26:10 - In reply to message 2 from Juan
You don't mention whether the PDFs you are trying to parse actually have text in them, or whether they are only scanned images.
For scanned images you probably want to do OCR.
It's easy to check whether the documents contain text (meaning they went through a PDF "printer" or were saved directly as PDF): just zoom in. If the text starts to pixelate, the pages are scanned images; if it stays sharp, it is real text.
The Select Text tool in Acrobat can't be trusted for this test, as it actually does OCR.
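If zooming in by hand is not practical for many files, the same check can be roughly approximated in code. The following is only a crude sketch in Python, not part of this package: the function name `has_text_layer` is made up for illustration, and it assumes that page content streams are either uncompressed or Flate-compressed. It looks for PDF text objects (`BT` ... `ET` containing a `Tj` or `TJ` text-showing operator). It does not handle object streams, encryption, or other filters, it can be fooled by binary data that happens to contain those byte sequences, and a scanned PDF that already carries a hidden OCR text layer will also be reported as having text.

```python
import re
import zlib

# Raw bytes between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) that contains a text-showing operator.
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        candidates = [raw]  # the stream may be stored uncompressed
        try:
            # Assumption: compressed streams use FlateDecode (zlib).
            candidates.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            pass  # not zlib data, only the raw bytes are scanned
        if any(TEXT_OBJECT_RE.search(data) for data in candidates):
            return True
    return False
```

Usage would be along the lines of `has_text_layer(open("some.pdf", "rb").read())`, where the file name is a placeholder. A result of False suggests the pages are probably images and need OCR; a result of True only suggests that some text operators are present.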