File: README.md

Role: Documentation
Content type: text/markdown
Description: Auxiliary data
Class: PHP PDFBox
Extract text from PDF documents using PDFBox tool
Author: Fabian Schmengler
Last change: Added how to install with composer
Date: 9 years ago
Size: 1,446 bytes
 


PdfBox

A PHP interface for the PdfBox ExtractText utility, useful for unit-testing the contents of generated PDFs.

Requirements

  • Java Runtime Environment
  • PdfBox JAR file
      - Download: http://pdfbox.apache.org/downloads.html
      - Tested with 1.6.0, 1.7.0 and 1.8.6
  • PHP must be allowed to execute shell commands
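
The requirements above can be checked up front. The following is a minimal sketch, not part of the package: the function name checkPdfBoxPrerequisites() is hypothetical, and it assumes that exec() is the shell function that matters, which the README does not state.

```php
<?php
// Hypothetical helper, not part of the PdfBox package.

/**
 * Returns a list of human-readable problems; an empty list means the
 * basic prerequisites look satisfied.
 */
function checkPdfBoxPrerequisites(string $jarPath): array
{
    $problems = [];

    // Assumption: exec() is the function that has to be available.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('exec') || in_array('exec', $disabled, true)) {
        $problems[] = 'exec() is disabled, so PHP cannot run shell commands';
        return $problems; // the Java check below needs exec()
    }

    // "java -version" exits with status 0 when a runtime is on the PATH.
    exec('java -version 2>&1', $output, $status);
    if ($status !== 0) {
        $problems[] = 'no Java runtime found on the PATH';
    }

    if (!is_readable($jarPath)) {
        $problems[] = "PdfBox JAR not readable: $jarPath";
    }

    return $problems;
}
```

Calling checkPdfBoxPrerequisites('/usr/bin/pdfbox-app-1.7.0.jar') before the first conversion gives a clearer error than a failed shell call would.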

Install

To install with composer:

composer require sgh/pdfbox

Basic Usage

use SGH\PdfBox;

// $pdf = GENERATED_PDF; // the PDF data to convert, e.g. the output of your PDF generator
$converter = new PdfBox();
$converter->setPathToPdfBox('/usr/bin/pdfbox-app-1.7.0.jar');
$text = $converter->textFromPdfStream($pdf);
$html = $converter->htmlFromPdfStream($pdf);
$dom  = $converter->domFromPdfStream($pdf);

If the source PDF is a file, use xxxFromPdfFile() instead of xxxFromPdfStream() (xxx being text, html or dom), passing the source path as the parameter.

To save the converted output to a file, pass the destination path as the second parameter of any xxxFromPdfxxx() method.
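
As an illustration of the stream/file distinction and the optional second parameter, here is a small sketch of a wrapper. The function extractPdfText() is hypothetical and not part of the package. The method name textFromPdfFile() is inferred from the xxxFromPdfFile() naming pattern above. Treating anything that does not start with "%PDF" as a file path is a simplification made for this sketch.

```php
<?php
// Hypothetical helper, not part of the PdfBox package. $converter is
// expected to be a configured PdfBox instance; it is left untyped so
// the sketch does not depend on the exact class name.

/**
 * Extracts plain text from either raw PDF data or a PDF file path.
 *
 * @param object      $converter   Object offering textFromPdfStream() and textFromPdfFile()
 * @param string      $source      Raw PDF data (starts with "%PDF") or a path to a PDF file
 * @param string|null $destination Optional path to save the converted text to
 */
function extractPdfText($converter, string $source, ?string $destination = null)
{
    // PDF data begins with the "%PDF" signature; anything else is treated as a path.
    $method = strncmp($source, '%PDF', 4) === 0 ? 'textFromPdfStream' : 'textFromPdfFile';

    // Pass the second parameter only when a destination was requested.
    return $destination === null
        ? $converter->$method($source)
        : $converter->$method($source, $destination);
}
```

In a unit test, the value returned by extractPdfText($converter, $pdf) can then be checked for the strings the generated document is expected to contain.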

Advanced Usage

Convert a range of pages instead of the full document:

$converter->getOptions()
    ->setStartPage(2)
    ->setEndPage(5);

Ignore corrupt objects in the PDF:

$converter->getOptions()
    ->setForce(true);

Sort text:

$converter->getOptions()
    ->setSort(true);
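
These options correspond to command-line switches of PDFBox's ExtractText tool (-startPage, -endPage, -force, -sort). The sketch below shows how such a command line could be assembled. It is illustrative only and not the package's actual implementation; the function name and the array keys are chosen for this example.

```php
<?php
// Illustrative only: the package's real implementation may differ.

/**
 * Builds an ExtractText command line from a few options.
 * The option keys (startPage, endPage, force, sort) are names chosen for this sketch.
 */
function buildExtractTextCommand(string $jar, string $pdfFile, array $options = []): string
{
    $parts = ['java', '-jar', escapeshellarg($jar), 'ExtractText'];

    // Page range: both flags take a 1-based page number.
    foreach (['startPage', 'endPage'] as $name) {
        if (isset($options[$name])) {
            $parts[] = '-' . $name;
            $parts[] = (string) (int) $options[$name];
        }
    }

    // Boolean switches are added only when enabled.
    foreach (['force', 'sort'] as $name) {
        if (!empty($options[$name])) {
            $parts[] = '-' . $name;
        }
    }

    // -console writes the text to standard output instead of a file.
    $parts[] = '-console';
    $parts[] = escapeshellarg($pdfFile);

    return implode(' ', $parts);
}

// Pages 2 to 5 of report.pdf, with sorted text.
$command = buildExtractTextCommand('/usr/bin/pdfbox-app-1.7.0.jar', 'report.pdf', [
    'startPage' => 2,
    'endPage'   => 5,
    'sort'      => true,
]);
```

Paths go through escapeshellarg() and page numbers are cast to integers, so no value reaches the shell unquoted.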

PHPUnit tests

To run the unit tests, set the environment variable PDFBOX_JAR to the full path of your PdfBox JAR file. See phpunit.xml.dist.