PHP Classes

Blank output for some pdf

Subject: Blank output for some pdf
Summary: Some pdf documents are not converted
Messages: 5
Author: Rudolfo Toscano
Date: 2017-05-09 14:45:45
 

  1. Blank output for some pdf
Rudolfo Toscano - 2017-05-09 14:45:46
First of all, thank you for your great script. Unfortunately, I have come across some PDF documents that are still not converted.
Examples:
shipmentlink.com/tw/tvs2/local_file ...
and
shipmentlink.com/tw/tvs2/local_file ...
On the other hand (from the same URL) the following pdf is converted correctly:
shipmentlink.com/tw/tbo1/form/expor ...

Could you please have a look at this?
Thank you

  2. Re: Blank output for some pdf
Christian Vigh - 2017-05-09 15:13:41 - In reply to message 1 from Rudolfo Toscano
Hi Rudolfo,

PDF files can have two types of passwords:
- The User password, which will be prompted when you try to open the document
- The Owner password, which will be prompted when you try to modify something in the original PDF file. It can also define flags that prevent you from printing, or from copying/pasting information.

Unfortunately, in both cases, the text contents are encrypted. Of course, there is an algorithm to decrypt them (especially when there is only an Owner password, which should not theoretically prevent text extraction). However, I have not finished the implementation and, to tell the truth, I'm scratching my head over it a little.
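
To illustrate the point about encryption and permission flags, here is a minimal Python sketch. It is not part of PdfToText, and the names `looks_encrypted`, `decode_permissions` and `PERMISSION_BITS` are hypothetical. It assumes the standard security handler described in the PDF specification, where an encrypted file carries an `/Encrypt` entry in its trailer and the permission flags are stored as a 32-bit integer in the `/P` entry of the encryption dictionary. The byte search is only a rough heuristic, not a real PDF parser.

```python
# Hypothetical helpers for illustration only; they are not part of PdfToText.

# Permission bits of the /P entry (the PDF spec numbers bits starting at 1).
PERMISSION_BITS = {
    "print": 1 << 2,     # bit 3
    "modify": 1 << 3,    # bit 4
    "copy": 1 << 4,      # bit 5: copy or extract text
    "annotate": 1 << 5,  # bit 6
}


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Rough heuristic: an encrypted PDF references /Encrypt in its trailer."""
    return b"/Encrypt" in pdf_bytes


def decode_permissions(p: int) -> dict:
    """Decode the signed 32-bit /P value into named boolean flags."""
    unsigned = p & 0xFFFFFFFF  # /P is usually written as a negative integer
    return {name: bool(unsigned & mask) for name, mask in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # Synthetic trailer fragment, not taken from a real document.
    sample = b"trailer\n<< /Root 1 0 R /Encrypt 5 0 R >>"
    print(looks_encrypted(sample))
    print(decode_permissions(-44))
```

A check like this would at least let a converter report "document is encrypted" instead of silently returning blank text.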

But I'm confident that a future version of PdfToText will soon be able to handle such situations: this is among my top 3 BIG priorities...

With kind regards,
Christian.

  3. Re: Blank output for some pdf
Rudolfo Toscano - 2017-05-10 09:45:56 - In reply to message 1 from Rudolfo Toscano
Thank you for your immediate response. As you are the specialist, please let me ask an additional question:
Why does your converter produce blank output, while all PDF readers are able to open such protected documents?
Thank you for your explanation.

  4. Re: Blank output for some pdf
Christian Vigh - 2017-05-10 10:08:46 - In reply to message 3 from Rudolfo Toscano
Because I have not yet implemented the decryption algorithm for that. I think that even if I take inspiration from Unix tools such as xpdf or poppler, or even TCPDF in PHP, I'll need at least one week for that, not counting the various encryption algorithms and revisions found across the world of PDF!

It's not a problem of complexity: in practice, you don't even have to know the original password to decrypt the information (Adobe did not use complex cryptographic algorithms; it encrypts information by applying a sequence of transformations that should discourage the amateur from decrypting the contents).
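
As a rough illustration of why the password is not needed when only an Owner password is set: in that case the User password is empty, and the file key can be derived from values stored in the document itself. The Python sketch below follows the key computation that the PDF specification describes for the standard security handler (revisions 2 and 3, RC4). It is not PdfToText code; the function name `file_key` and its parameters are hypothetical, AES-based revisions and the `EncryptMetadata` special case are left out, and it only derives the key, it does not decrypt anything.

```python
import hashlib
import struct

# 32-byte padding string defined by the PDF specification.
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def file_key(user_password: bytes, o_entry: bytes, p: int, doc_id: bytes,
             revision: int = 2, key_len: int = 5) -> bytes:
    """Derive the file encryption key (hypothetical helper, revisions 2-3 only).

    o_entry is the /O string and p the /P integer from the encryption
    dictionary; doc_id is the first element of the trailer's /ID array.
    """
    # Pad or truncate the password to exactly 32 bytes.
    padded = (user_password + PAD)[:32]
    # /P is hashed as an unsigned 32-bit little-endian value.
    p_bytes = struct.pack("<I", p & 0xFFFFFFFF)
    digest = hashlib.md5(padded + o_entry + p_bytes + doc_id).digest()
    if revision >= 3:
        # Revision 3 re-hashes the first key_len bytes 50 more times.
        for _ in range(50):
            digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]


if __name__ == "__main__":
    # Placeholder values, not taken from a real document.
    key = file_key(b"", b"\x00" * 32, -44, b"\x01" * 16)
    print(key.hex())
```

With an empty `user_password`, every input comes from the file, which matches the remark above that the original password does not have to be known.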

So my real problem is finding at least 7 consecutive days of free time to work on that.

But I know that being able to handle PDF files with an Owner password is a must-have, so I hope I will have a solution soon...


  5. Re: Blank output for some pdf
Rudolfo Toscano - 2017-06-23 10:20:13 - In reply to message 4 from Christian Vigh
Hello again,

I don't want to pressure you, but can we still hope for your solution?
Thank you in advance for your endeavor.