Author: Dave Smith
Posted on: 2016-04-19
Package: HTML to PDF Webkit
Converting HTML to PDF used to be a fairly simple task when HTML was simpler. With new standards based on HTML5, CSS3 and JavaScript, generating printable PDF documents from Web applications and getting the result we want has become more complicated.
Read this article to learn about a simpler solution based on Web services that takes the complication out of your PHP application.
Introduction
The Past
The Present
The Future
Conclusion
Introduction
There are libraries that can generate PDF documents by composing the page output programmatically. However, this is a painful process because you need to program the PDF document output using specific PHP code. A solution that consumes less developer time is to generate HTML and convert it to PDF with a library or service.
Our expectations are that a PDF document generated from HTML markup be a true representation of what a surfer would see when browsing that web page. We should not have to settle for something 'similar' or close to the original.
This is where the HTML to PDF Webkit package comes in very handy. It uses the up-to-date conversion techniques provided by the pdfLayer API Web service, so you do not have to rely on older libraries or services that have not been updated for the latest HTML-related technologies.
The Past
It wasn't that long ago that delivering Web pages to a browser using HTML was really simple, which also meant that converting them to PDF and achieving WYSIWYG (what you see is what you get) was not very complicated either.
There were plenty of libraries and Web services available to choose from for your development project. Times were good.
Over time, however, new standards were introduced into HTML to provide a better, more dynamic user experience. Manipulating page content through JavaScript and creating rich displays using CSS became more and more popular.
The libraries and services quickly became outdated and fell into the dreaded legacy category. They were still able to produce PDF files; however, they fell short of achieving WYSIWYG.
The Present
The solution to this problem was a browser library that is compliant with the new Web standards. That way it can interpret the delivered content exactly as most browsers surfing the Web would display it.
I say 'most' since we are all probably aware that a certain company does not follow the standards with its browsers, creating a lot of headaches for Web designers. However, that is an entirely different discussion.
Along came WebKit, an HTML5 and CSS3 compliant browser library, which saved the day. Not only is it compliant today, there is every reason to believe it will remain current as new standards are introduced.
The HTML to PDF Webkit package uses the pdfLayer service, which uses WebKit to deliver conversions as expected. To achieve this, there is more to consider than just grabbing the content. Web pages are much smarter and more dynamic now than they have ever been, and conversions must account for this.
A Web page can have a timeline to animate the delivered content. The 'delay' setting of this package lets you set how many milliseconds the service should wait before considering the content ready for conversion, allowing any animation or other tasks to complete.
A Web page can deliver different content depending on the browser and device being used. You would use the 'user_agent' setting to emulate a specific browser and the 'viewport' setting to indicate a device screen size.
A Web page can be localized, delivering different content based on the surfer's preferred language. You would use the 'accept_lang' setting to pass the language code for the localized content you want.
A Web page may deliver forms for user input. The PDF standard also supports forms, so wouldn't it be nice to be able to generate an interactive document? It is as simple as using the 'forms' setting to turn on this feature.
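As a rough illustration of the settings described above, the sketch below collects them into the query string of a conversion request. It does not use the package's own classes, since their method names are not shown in this article. The endpoint address, the build_conversion_url() helper, the 'access_key' and 'document_url' parameter names and every example value are assumptions made for illustration only, so check reference.txt for the exact names and value formats.

```php
<?php
/**
 * Build a conversion request URL from a page address and optional settings.
 * Hypothetical helper for illustration; it is not part of the package.
 */
function build_conversion_url(string $accessKey, string $pageUrl, array $settings = []): string
{
    // Assumed endpoint; confirm the real address in the service documentation.
    $endpoint = 'https://api.pdflayer.com/api/convert';

    // 'access_key' and 'document_url' are assumed parameter names.
    $query = array_merge(
        ['access_key' => $accessKey, 'document_url' => $pageUrl],
        $settings
    );

    // http_build_query() URL-encodes every name and value for us.
    return $endpoint . '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

$url = build_conversion_url('YOUR_ACCESS_KEY', 'https://example.com/report', [
    'delay'       => 2000,      // wait 2000 ms so animations can finish
    'user_agent'  => 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X)', // example value
    'viewport'    => '375x667', // assumed WIDTHxHEIGHT format
    'accept_lang' => 'fr-FR',   // assumed language code format
    'forms'       => 1,         // assumed: 1 turns interactive forms on
]);

echo $url, PHP_EOL;
```

Keeping the settings in a plain array makes it easy to add or drop one without touching the code that builds the request.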
There are also times when you will want to do more than just provide a copy of a Web page as a PDF document. You may want to change its formatting, set a specific page size for printing, and so on.
You can set the PDF document to any standard page size using the 'page_size' setting or specify your own size using the 'page_width' and 'page_height' settings.
You can set up margins using the 'margin_top', 'margin_left', 'margin_bottom' and 'margin_right' settings.
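The page size and margin settings could be gathered along these lines. This is only a sketch: page_layout_settings() is a hypothetical helper, not part of the package, and the example values (including the 'A4' size name and the numbers, whose units are not stated here) are assumptions to be checked against reference.txt.

```php
<?php
/**
 * Return page layout settings: either a named page size or custom
 * dimensions, plus whichever margins were supplied.
 * Hypothetical helper for illustration; it is not part of the package.
 */
function page_layout_settings($pageSize = null, $width = null, $height = null, array $margins = []): array
{
    if ($pageSize !== null) {
        $settings = ['page_size' => $pageSize];
    } elseif ($width !== null && $height !== null) {
        $settings = ['page_width' => $width, 'page_height' => $height];
    } else {
        throw new InvalidArgumentException('Give a page size, or both a width and a height.');
    }

    // Copy only the margin sides that were actually supplied.
    foreach (['top', 'left', 'bottom', 'right'] as $side) {
        if (isset($margins[$side])) {
            $settings['margin_' . $side] = $margins[$side];
        }
    }

    return $settings;
}

// A standard size with top and bottom margins (example values).
$standard = page_layout_settings('A4', null, null, ['top' => 20, 'bottom' => 20]);

// A custom size with left and right margins (example values).
$custom = page_layout_settings(null, 150, 200, ['left' => 10, 'right' => 10]);

echo http_build_query($standard), PHP_EOL;
echo http_build_query($custom), PHP_EOL;
```

The helper refuses to guess when neither a named size nor a full width and height pair is given, which keeps a half-specified custom size from reaching the service.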
Add your own headers and footers to the pages. You can specify your own text using the 'header_text' and 'footer_text' settings, with standard replacement tags to display the current page, total pages, and so on.
Or you can specify HTML that will be inserted by posting it using 'header_html' and 'footer_html' form field names. You can even specify markup located on the Web to be converted and inserted using the 'header_url' and 'footer_url' settings to point to it.
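Since the text and URL variants are ordinary settings while the HTML variants are posted as form fields, a request has to keep the two groups apart. The sketch below shows one way to do that. The header_footer_request() helper is hypothetical and not part of the package, and the example values are placeholders.

```php
<?php
/**
 * Split header and footer options into query-string settings and POST fields.
 * Hypothetical helper for illustration; it is not part of the package.
 */
function header_footer_request(array $options): array
{
    // These two are posted as form fields rather than passed as settings.
    $postNames = ['header_html', 'footer_html'];

    $query = [];
    $post = [];
    foreach ($options as $name => $value) {
        if (in_array($name, $postNames, true)) {
            $post[$name] = $value;
        } else {
            $query[$name] = $value;
        }
    }

    return ['query' => http_build_query($query), 'post' => $post];
}

$request = header_footer_request([
    'header_text' => 'Quarterly Report',           // plain text header
    'footer_html' => '<p>Internal use only</p>',   // HTML footer, sent in the POST body
]);

echo $request['query'], PHP_EOL;
print_r($request['post']);
```

The 'query' part can be appended to the conversion URL and the 'post' array handed to whatever HTTP client you use as the form body.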
Change the display by providing your own CSS to override the default styling of the delivered content. For example, to change the background color, use the 'css_url' setting to point to the location on the Web where the CSS can be found.
Add your own stamps and watermarks to the document. A stamp is content in the foreground and a watermark is content in the background.
Note that content in the background will not be seen unless the Web page has a transparent background. To generate a stamp, use the 'watermark_url' setting to point to the image on the Web to use. To turn it into a watermark, also enable the 'watermark_in_background' setting.
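The difference between a stamp and a watermark comes down to one extra setting, as this small sketch shows. The watermark_settings() helper is hypothetical and not part of the package, the image address is a placeholder, and using 1 to switch the background option on is an assumption to verify in reference.txt.

```php
<?php
/**
 * Return the settings for a stamp (foreground) or a watermark (background).
 * Hypothetical helper for illustration; it is not part of the package.
 */
function watermark_settings(string $imageUrl, bool $inBackground = false): array
{
    // With only the image address set, the image is placed in the foreground.
    $settings = ['watermark_url' => $imageUrl];

    if ($inBackground) {
        // Assumed: a value of 1 moves the image behind the page content.
        $settings['watermark_in_background'] = 1;
    }

    return $settings;
}

$stamp = watermark_settings('https://example.com/draft.png');
$watermark = watermark_settings('https://example.com/draft.png', true);

echo http_build_query($stamp), PHP_EOL;
echo http_build_query($watermark), PHP_EOL;
```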
The PDF format itself has many features you may want to make use of, like setting metadata, permissions or encryption. I recommend reading the reference.txt file that accompanies the package to see all the options available to you for these.
The Future
As you can see, the current state of producing PDF files from HTML is great, and the outlook for supporting future developments is bright.
What I want to see in the future is support for converting multiple HTML pages into one complete PDF publication with chapters and an automatically generated table of contents.
This will give us the ability to take Web content which is displayed in parts over multiple pages and present it as a publication, viewable in any tablet or reader which supports PDF documents.
We have the capability to do this; however, whether it is worth spending the time and money required to implement it will depend on whether there is a need. If this is something you would also like to see in the future, let me know in the comments below.
Conclusion
We can produce PDF documents that not only meet but exceed our expectations, supporting not only the latest Web standards but also the standards and features available in PDF documents.
The pdfLayer API Web service provides plenty of conversions in its free package for most of our needs, along with caching, so that we can deliver documents we have already converted an unlimited number of times without them counting against our limits.
If your needs are greater than the limits of the free package, it also offers economical premium packages, so you can find the right one at the right price.
For now, you can try the service easily from PHP using the HTML to PDF Webkit package. It comes with several examples that demonstrate how you can do things like:
The basic example of converting an existing Web page to PDF
Generating a PDF document from your given HTML markup
Serving converted PDF documents to download
Adding custom headers to PDF documents
If you liked this article or have a question about converting your HTML application's output to PDF using this package, post a comment here.
Comments:
5. Paid Content? - Sebastian (2016-04-29 03:47)
Is this paid content?... - 2 replies
4. Privacy... - James (2016-04-19 18:53)
Passing data to a 3rd party isnt an option for most devs....... - 3 replies
3. is good - Gerson (2016-04-19 16:24)
is good but similar other proyect... - 1 reply
2. Testing - Mehboob Sheikh (2016-04-19 10:32)
test... - 0 replies
1. HTML to PDF - Eion Robb (2016-04-19 06:57)
wkhtmltopdf is better... - 1 reply