PHP Classes

Nice, but works only on some pdf documents, not all of them.

Subject:Nice, but works only on some pdf...
Summary:Package rating comment
Messages:3
Author:Issam
Date:2010-08-18 15:30:38
Update:2013-05-05 12:26:10
 

Issam rated this package as follows:

Utility: Sufficient
Consistency: Good

  1. Nice, but works only on some pdf...
Issam - 2010-08-18 15:30:38
Nice, but works only on some pdf documents, not all of them.

Thanks.

  2. Re: Nice, but works only on some pdf...
Juan - 2010-09-12 23:52:14 - In reply to message 1 from Issam
Hi. How did you get this to work? I've tried with many PDF docs but had no luck at all. Thanks.

  3. Re: Nice, but works only on some pdf...
Carsten Jensen - 2013-05-05 12:26:10 - In reply to message 2 from Juan
You don't mention whether the PDFs you are trying to parse actually have text in them, or whether they are only scanned images.

For scanned images you probably want to do OCR.

It's easy to see whether the docs have text in them (that is, they went through a PDF "printer" or were saved directly as PDF): just zoom in. If the text starts to pixelate, the pages are scanned. If the text still looks sharp, it's real text.

Using the Select Text tool in Acrobat can't be trusted for this test, as it actually does OCR.
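
The zoom test above can be roughly approximated in code by checking whether the file references any fonts at all. The following is only a sketch in Python, not part of the PDF Text Extractor package: the function name `classify_pdf` and its return labels are made up for illustration. It assumes the PDF keeps its object dictionaries uncompressed; files that pack them into compressed object streams will come back as "unknown". A scan that has already been through OCR usually embeds a hidden text layer with fonts, so it would be reported as "text".

```python
import re

# Dictionary keys that hint at what a PDF contains.
# \b keeps /Font from matching /FontDescriptor or /FontFile.
FONT_RE = re.compile(rb"/Font\b")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def classify_pdf(data: bytes) -> str:
    """Guess whether raw PDF bytes hold real text or only scanned images.

    Hypothetical helper, heuristic only: it looks at the raw bytes and
    does not decode or parse any content streams.
    """
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    has_fonts = FONT_RE.search(data) is not None
    has_images = IMAGE_RE.search(data) is not None
    if has_fonts:
        return "text"      # fonts are referenced, so text is likely extractable
    if has_images:
        return "scanned"   # images but no fonts: probably needs OCR
    return "unknown"       # neither marker visible in the raw bytes
```

To try it on a file, read the file in binary mode and pass the bytes in, for example `classify_pdf(open("sample.pdf", "rb").read())`, where `sample.pdf` is a placeholder name.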