PHP Classes

Another conversion returning blank

Package:PHP PDF to Text
Subject:Another conversion returning blank
Summary:Version 1.4.15 doesn't completely fix the issue
Messages:8
Author:rob webster
Date:2017-03-20 12:49:06
 

  1. Another conversion returning blank
rob webster - 2017-03-20 12:49:06
I've been watching a similar thread and updated to version 1.4.15 (dated 2017/03/17). I'd had a PDF where most pages converted fine but at least one was omitted from the output text. That's now fine; however, I've got another batch of PDFs that still all come back blank. Here is a link to an example file: be9.uk/1871-census-a.pdf

I hope that can be fixed too...

  2. Re: Another conversion returning blank
Christian Vigh - 2017-03-20 20:43:23 - In reply to message 1 from rob webster
Hi Rob,

I'm glad that the previous version solved some of your problems.

Regarding the file be9.uk/1871-census-a.pdf: it uses the LZW algorithm instead of a ZIP-like one to compress its page-drawing instructions.

I had been waiting months for such a sample so that I could implement LZW decompression, so thank you very much for sending it.

I have implemented this enhancement in version 1.4.16 of my class, so the text output should now be correct.

Please feel free to contact me if you have any other issues or questions.

With kind regards,
Christian.
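
To check which compression a batch of PDFs uses, as discussed in message 2, the raw file can be scanned for filter names. The sketch below is only a rough check written in Python, not part of the PdfToText class: it assumes the filter names appear uncompressed and unescaped in the file, and the function name is made up for illustration. `/LZWDecode` and `/FlateDecode` are the standard PDF names for the LZW and ZIP-like filters.

```python
import re
from collections import Counter

# Standard PDF filter names for LZW and ZIP-like (Flate) compression.
FILTER_NAME = re.compile(rb"/(LZWDecode|FlateDecode)\b")


def count_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each filter name occurs in the raw PDF bytes.

    This is a heuristic: names hidden inside compressed object streams,
    or written with '#' escapes, will not be found.
    """
    return Counter(
        match.group(1).decode("ascii")
        for match in FILTER_NAME.finditer(pdf_bytes)
    )
```

Calling it on the bytes of a locally saved copy of a PDF gives a quick indication of whether LZW is involved; a file that reports `LZWDecode` would have come back blank in versions that skipped such streams.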

  3. Re: Another conversion returning blank
rob webster - 2017-03-27 09:26:30 - In reply to message 2 from Christian Vigh
Thanks for your rapid response, and apologies for not acknowledging it sooner. I've now tried 1.4.16 and it's usually fine, but it struggled on a couple of files. An example is 1871-census---d.pdf, with the results in extracted.txt and error_log.txt (all in the same location: be9.uk/PDFextract).

  4. Re: Another conversion returning blank
Christian Vigh - 2017-03-27 10:04:15 - In reply to message 3 from rob webster
Hi Rob,

I've tried 1871-census---d.pdf with the latest version, 1.4.19, and the output is fine. (I used the local copy I kept on my hard drive, because be9.uk/PDFextract only shows me the file name; or maybe it is also showing an empty result from version 1.4.16?)

In versions 1.4.*, I tried to fix issues that occurred in some rare cases with particularly tricky PDF files. Unfortunately, I introduced some regressions. I've spent the last 4 days running batches against more than 1000 PDF files and comparing the results of both versions (the current one and the previous one). This is how I discovered those regressions.

Can you try again with version 1.4.19? Everything should be better now.

Of course, if this is not the case, please feel free to contact me!

Christian.
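
The batch comparison described in message 4 can be approximated with a small script. This is a minimal sketch in Python, assuming the text extracted by each version has already been saved as one `.txt` file per PDF in two directories; the directory layout, the function name, and the report labels are hypothetical and are not Christian's actual tooling.

```python
from pathlib import Path


def find_regressions(old_dir: Path, new_dir: Path) -> dict[str, str]:
    """Compare per-PDF text outputs from two versions of an extractor.

    Returns a mapping of file name to a short label for every output
    that is missing, became blank, or changed in the new version.
    """
    report: dict[str, str] = {}
    for old_file in sorted(old_dir.glob("*.txt")):
        new_file = new_dir / old_file.name
        if not new_file.exists():
            report[old_file.name] = "missing"
            continue
        old_text = old_file.read_text(encoding="utf-8", errors="replace")
        new_text = new_file.read_text(encoding="utf-8", errors="replace")
        if old_text.strip() and not new_text.strip():
            # The previous version produced text, the new one did not.
            report[old_file.name] = "became blank"
        elif old_text != new_text:
            report[old_file.name] = "changed"
    return report
```

A "changed" entry is not necessarily a regression, since the new output may be the corrected one, so those files still need a manual look.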

  5. Re: Another conversion returning blank
rob webster - 2017-03-27 20:31:16 - In reply to message 4 from Christian Vigh
Sorry, I didn't make myself clear enough. The files were visible at be9.uk/PDFextract/extracted.txt etc., but for now I've renamed index.php and enabled directory listing on be9.uk/PDFextract to make it easier to see everything.

You should now be able to see the PDF, the text output and the error log. (Yes, I'm getting a problem with 1.4.19.)

  6. Re: Another conversion returning blank
Christian Vigh - 2017-03-27 21:11:39 - In reply to message 5 from rob webster
OK, I get it now; I was confused by the small difference between the filename I had and the one you provided ("1871-census-a.pdf" versus "1871-census---d.pdf").

I implemented the LZW decompression algorithm in version 1.4.16. It is not a widely used algorithm (apparently only two of the 2000 sample PDF files I already have use it).

Your PDF file contains one part that is compressed using this algorithm.

Before version 1.4.16, you were not able to see these contents because they were simply ignored (only a warning was issued).

Starting from version 1.4.16, you are able to see them. However, there seems to be a bug in my current implementation, and the decompression algorithm in some cases generates bad data.

I've added that to my to-do list and I will get back to you when a fix is available.

With kind regards,
Christian.
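
For readers wondering what the LZW decompression mentioned in message 6 involves, here is a minimal Python sketch of a decoder for PDF-style LZW data. It is not the code from the PdfToText class (which is written in PHP) and does not reproduce the bug discussed here; it assumes the standard PDF conventions of variable-length codes from 9 to 12 bits packed most significant bit first, code 256 as "clear table", code 257 as "end of data", and an `early_change` value of 1 as the default.

```python
CLEAR_TABLE = 256
END_OF_DATA = 257


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode LZW data as used by the PDF LZWDecode filter (sketch)."""
    out = bytearray()
    table: list[bytes] = []
    code_len = 9
    prev = None          # previously emitted sequence, None after a clear
    bit_buf = 0          # unread bits, most significant first
    bit_count = 0

    def reset_table() -> None:
        nonlocal table, code_len, prev
        # Entries 0..255 are single bytes; 256 and 257 are reserved codes.
        table = [bytes([i]) for i in range(256)] + [b"", b""]
        code_len = 9
        prev = None

    reset_table()
    for byte in data:
        bit_buf = (bit_buf << 8) | byte
        bit_count += 8
        while bit_count >= code_len:
            bit_count -= code_len
            code = (bit_buf >> bit_count) & ((1 << code_len) - 1)
            bit_buf &= (1 << bit_count) - 1  # keep only the unread bits
            if code == CLEAR_TABLE:
                reset_table()
                continue
            if code == END_OF_DATA:
                return bytes(out)
            if prev is None:
                entry = table[code]
            elif code < len(table):
                entry = table[code]
                table.append(prev + entry[:1])
            elif code == len(table):
                # Code not yet in the table: previous sequence plus its first byte.
                entry = prev + prev[:1]
                table.append(entry)
            else:
                raise ValueError(f"invalid LZW code {code}")
            out += entry
            prev = entry
            # Widen the code length once the table is about to outgrow it.
            if len(table) + early_change >= (1 << code_len) and code_len < 12:
                code_len += 1
    return bytes(out)
```

The point at which the code length grows is easy to get wrong by one entry, and a decoder with that kind of error can work on short streams yet produce bad data on longer ones.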

  7. Re: Another conversion returning blank
Christian Vigh - 2017-04-01 12:46:43 - In reply to message 5 from rob webster
Hi Rob,

I have fixed the bug in my implementation of the LZW decompression algorithm. The fix is available in version 1.5.1.

It works fine with the file "1871-census---d.pdf": all the text contents are correctly displayed.

Of course, as usual, if you find another issue, please feel free to contact me!

With kind regards,
Christian.

  8. Re: Another conversion returning blank
rob webster - 2017-04-02 11:13:00 - In reply to message 7 from Christian Vigh
Brilliant! I've tried it on more of that set of PDFs and it's looking good. I'll let you know if I manage to break anything again!

Regards
Rob