PHP PDF to HTML: Convert PDF to HTML using Poppler

Last updated: 2021-05-18
Version: pdf-to-html 1.0.8
License: GNU General Publi...
PHP version: 5.4
Categories: PHP 5, Utilities and Tools, Files and..., C...

Description

This class can convert PDF to HTML using the Poppler command-line tools.

It can take the path of the Poppler program tools and execute several operations to extract information from PDF documents.

Currently the class can convert whole PDF documents or individual pages to HTML, get the document information, and return the page count.

Several parameters can be configured, such as the preferred format of the pictures inside the document, the zoom scale, and whether images and CSS are inlined within the HTML or kept as external files.

Author: Anton N Nikolaev <contact>
Classes: 1 package
Country: Russian Federation

Details

PDF to HTML PHP Class

This PHP class converts PDF files to HTML using poppler-utils.

Thanks

Big thanks to Mochamad Gufron (mgufrone)! This package is based on his package (https://github.com/mgufrone/pdf-to-html).

Important Notes

See the Usage section below for how to use the class.

Installation

From your application's root directory, run this command to add the package to your app:

  composer require tonchik-tm/pdf-to-html:~1

Or add this package to the "require" section of your composer.json:

{
  "require": {
    "tonchik-tm/pdf-to-html": "~1"
  }
}

Requirements

1. Install Poppler-Utils

Debian/Ubuntu

sudo apt-get install poppler-utils

Mac OS X

brew install poppler

Windows

The package can also be used on Windows. First download the latest poppler-utils binary for Windows from <http://blog.alivate.com.au/poppler-windows/>.

After downloading the archive, extract it.

2. Find out where the utilities are installed

Debian/Ubuntu

$ whereis pdftohtml
pdftohtml: /usr/bin/pdftohtml

$ whereis pdfinfo
pdfinfo: /usr/bin/pdfinfo

Mac OS X

$ which pdfinfo
/usr/local/bin/pdfinfo

$ which pdftohtml
/usr/local/bin/pdftohtml

Windows

Go into the extracted directory. It contains a directory called bin, which holds the executables we need.
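
The lookup above can also be done from PHP. The following is only a sketch: findPopplerTool() is a hypothetical helper, not part of the package, and the directory list is an assumption taken from the example paths shown above.

```php
<?php
// Hypothetical helper (not part of the package): returns the first path
// built from $dirs that points at an existing, executable file named $name,
// or null when no such file is found.
function findPopplerTool($name, array $dirs)
{
    foreach ($dirs as $dir) {
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Assumed candidate directories, based on the Debian/Ubuntu and Mac OS X
// outputs above. On Windows, pass the extracted bin directory and use
// 'pdftohtml.exe' / 'pdfinfo.exe' as the names.
$dirs = ['/usr/bin', '/usr/local/bin'];
$pdftohtmlPath = findPopplerTool('pdftohtml', $dirs);
$pdfinfoPath = findPopplerTool('pdfinfo', $dirs);
```

If either variable is null, the tool was not found in the listed directories and the path has to be supplied by hand.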

3. PHP configured with shell access enabled (the class executes the Poppler tools as external programs)
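
A quick way to check this requirement is to see whether PHP's command-execution functions are usable. This is a minimal sketch: isFunctionUsable() is a hypothetical helper, and because the README does not say which function the class calls, the three names checked here are assumptions.

```php
<?php
// Hypothetical helper: true if the function exists and is not listed in
// the disable_functions ini directive.
function isFunctionUsable($name)
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

// Assumed candidates; the package may rely on only one of them.
foreach (['exec', 'shell_exec', 'proc_open'] as $fn) {
    echo $fn . ': ' . (isFunctionUsable($fn) ? 'available' : 'disabled') . PHP_EOL;
}
```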

Usage

Example:

<?php
// if you are using composer, just use this
include 'vendor/autoload.php';

// initiate
$pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
    'pdftohtml_path' => '/usr/bin/pdftohtml',
    'pdfinfo_path' => '/usr/bin/pdfinfo'
]);

// example for windows
// $pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
//     'pdftohtml_path' => '/path/to/poppler/bin/pdftohtml.exe',
//     'pdfinfo_path' => '/path/to/poppler/bin/pdfinfo.exe'
// ]);

// get pdf info
$pdfInfo = $pdf->getInfo();

// get the page count
$countPages = $pdf->countPages();

// get content from one page
$contentFirstPage = $pdf->getHtml()->getPage(1);

// get content from all pages and loop over them
foreach ($pdf->getHtml()->getAllPages() as $page) {
    echo $page . '<br/>';
}
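
Instead of echoing each page, the HTML can be written to one file per page. The sketch below assumes that each item returned by getAllPages() is an HTML string, as the echo in the loop above suggests. savePages() and the output directory name are hypothetical and not part of the package.

```php
<?php
// Hypothetical helper: writes each page's HTML to page-1.html, page-2.html, ...
// inside $dir and returns the list of files written.
function savePages($pages, $dir)
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    $files = [];
    $number = 1;
    foreach ($pages as $html) {
        $file = $dir . DIRECTORY_SEPARATOR . 'page-' . $number . '.html';
        if (file_put_contents($file, $html) === false) {
            throw new RuntimeException("Cannot write file: $file");
        }
        $files[] = $file;
        $number++;
    }
    return $files;
}

// With the package installed, the pages would come from the call shown above:
// $files = savePages($pdf->getHtml()->getAllPages(), __DIR__ . '/out');

// Stand-in data so the sketch runs without the package or Poppler.
$files = savePages(['<p>first</p>', '<p>second</p>'], sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pdf-pages-demo');
```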

Full list of settings:

<?php

$full_settings = [
    'pdftohtml_path' => '/usr/bin/pdftohtml', // path to pdftohtml
    'pdfinfo_path' => '/usr/bin/pdfinfo', // path to pdfinfo

    'generate' => [ // settings for generating html
        'singlePage' => false, // we want separate pages
        'imageJpeg' => false, // we want png image
        'ignoreImages' => false, // we need images
        'zoom' => 1.5, // scale pdf
        'noFrames' => false, // we want separate pages
    ],

    'clearAfter' => true, // auto clear output dir (if removeOutputDir==false then output dir will remain)
    'removeOutputDir' => true, // remove output dir
    'outputDir' => '/tmp/'.uniqid(), // output dir

    'html' => [ // settings for processing html
        'inlineCss' => true, // replaces css classes with inline css rules
        'inlineImages' => true, // finds images in the html and replaces each src attribute with base64-encoded data
        'onlyContent' => true, // keeps only the content of the html body
    ]
];
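
When only a few settings differ from a baseline, they can be merged explicitly before being passed to the constructor. The README does not say whether the class fills in defaults for omitted keys, so this sketch does not rely on that. The values in $defaults are copied from the list above, and $overrides is a made-up example.

```php
<?php
// Baseline copied from the "generate" and "html" groups listed above.
$defaults = [
    'generate' => [
        'singlePage' => false,
        'imageJpeg' => false,
        'ignoreImages' => false,
        'zoom' => 1.5,
        'noFrames' => false,
    ],
    'html' => [
        'inlineCss' => true,
        'inlineImages' => true,
        'onlyContent' => true,
    ],
];

// Example overrides (assumed values): larger zoom, skip images.
$overrides = [
    'generate' => ['zoom' => 2.0, 'ignoreImages' => true],
];

// array_replace_recursive() keeps every default key and replaces only
// the keys present in $overrides, at any nesting level.
$settings = array_replace_recursive($defaults, $overrides);
```

The resulting $settings array has the same shape as the full list and can be combined with the 'pdftohtml_path' and 'pdfinfo_path' entries in the same way.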

Feedback & Contribute

Open an issue to suggest an improvement or report a bug. I love to help and to solve other people's problems. Thanks!

Files
File            Role   Description
src (3 files)
composer.json   Data   Auxiliary data
composer.lock   Data   Auxiliary data
LICENSE         Lic.   License text
README.md       Doc.   Documentation

Files / src
File       Role    Description
Base.php   Class   Class source
Html.php   Class   Class source
Pdf.php    Class   Class source
