Messala - 2011-08-26 20:19:18
Hi there.
This is the first time I'm dealing with PDF files. I had a quick read of the official reference, but I have not figured out EXACTLY how the formatting of the text is specified.
From what I understood, the content is divided into blocks (objects and streams), and each of these blocks has a kind of "header" that holds the formatting of the block's content (when appropriate). Correct me if I'm wrong.
So I tested some PDF handler classes, but almost all of them ignore the block specifications and extract only the text. One, from Thomas Chester, gives many options for extracting other pieces of information from a PDF file besides the text, but nothing that gives me a lead on how to filter italic text.
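To make the question more concrete, here is a rough sketch of the kind of check I am hoping to end up with. It assumes I can somehow get at each font's descriptor values, which, if I read the reference correctly, include the font name, a /Flags entry (where bit 7 marks italic) and an /ItalicAngle. The function name and the sample font data are made up for illustration and do not come from any real library or file.

```python
# Bit 7 (counting from 1) of /Flags in a font descriptor marks an italic font.
ITALIC_FLAG = 1 << 6  # decimal 64


def looks_italic(font_name, flags=0, italic_angle=0.0):
    """Guess whether a font is italic from its descriptor values.

    This is a heuristic: the name check catches fonts whose flags
    or angle were not set properly by the producing application.
    """
    name = font_name.lower()
    if "italic" in name or "oblique" in name:
        return True
    if flags & ITALIC_FLAG:
        return True
    return italic_angle != 0


# Hypothetical descriptor values, keyed by the font's resource name on a page.
fonts = {
    "F1": {"font_name": "Times-Roman", "flags": 34, "italic_angle": 0},
    "F2": {"font_name": "Times-Italic", "flags": 98, "italic_angle": -15.5},
}

# Resource names of the fonts that look italic.
italic_fonts = [key for key, desc in fonts.items() if looks_italic(**desc)]
```

What I am still missing is how to read those values out of the file and how to tell which font each piece of text was drawn with.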
Could someone help me?
Thanks in advance.
Regards