File: vendor/smalot/pdfparser/doc/CustomConfig.md

Configuring the behavior of the parser

To change the behavior of the parser, create a Config object and pass it to the parser. The example below sets the font space limit; changing this value can help when getText() returns text with too many spaces.

$config = new \Smalot\PdfParser\Config();
$config->setFontSpaceLimit(-60);
$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf');
// output extracted text
// echo $pdf->getText();

Config options overview

The Config class has the following options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| setDecodeMemoryLimit | Integer | 0 | If parsing fails because of memory exhaustion, you can set a lower memory limit for decoding operations. |
| setFontSpaceLimit | Integer | -50 | Changing the font space limit can help when getText() returns text with too many spaces. |
| setHorizontalOffset | String | `' '` (a single space) | When words are broken up or when the structure of a table is not preserved, you may get better results by adapting `setHorizontalOffset`. |
| setPdfWhitespaces | String | \0\t\n\f\r | |
| setPdfWhitespacesRegex | String | [\0\t\n\f\r ] | |
| setRetainImageContent | Boolean | true | If parsing fails because of memory exhaustion, you can set the value to false. The parser will no longer retain image content, but it will use less memory. |
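
The table gives no description for the two whitespace options. The characters in their defaults (NUL, tab, line feed, form feed, carriage return, space) are the ones the PDF specification treats as white-space. As a rough illustration only, the sketch below applies the listed default of setPdfWhitespacesRegex to a made-up string using plain PHP. The sample string and variable names are assumptions, and how the parser applies the pattern internally is not shown here.

```php
<?php
// Default value listed for setPdfWhitespacesRegex in the options table.
$whitespaceRegex = '[\0\t\n\f\r ]';

// Hypothetical raw fragment mixing NUL, tab, space and CR/LF separators.
$raw = "BT\0/F1\t12 Tf\r\nET";

// Split on runs of one or more whitespace characters, dropping empty pieces.
$tokens = preg_split('/' . $whitespaceRegex . '+/', $raw, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```

Each of the different separators in the sample string ends a token, which is why all of them appear in the default character class.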

option setDecodeMemoryLimit + setRetainImageContent (manage memory usage)

If parsing fails because of memory exhaustion, you can use the following options.

$config = new \Smalot\PdfParser\Config();
// Whether to retain raw image data as content or discard it to save memory
$config->setRetainImageContent(false);
// Memory limit to use when de-compressing files, in bytes
$config->setDecodeMemoryLimit(1000000);
$parser = new \Smalot\PdfParser\Parser([], $config);

option setHorizontalOffset

When words are broken up or when the structure of a table is not preserved, you can use setHorizontalOffset.

$config = new \Smalot\PdfParser\Config();
// Option 1: an empty string can prevent words from breaking up
$config->setHorizontalOffset('');
// Option 2: a tab can help preserve the structure of your document
// (use one option or the other; a later call overrides an earlier one)
$config->setHorizontalOffset("\t");
$parser = new \Smalot\PdfParser\Parser([], $config);
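
To get a feel for why the offset string matters, here is a deliberately simplified model in plain PHP. It assumes that text fragments on the same line are joined with the configured string. The function joinFragments and the sample fragments are hypothetical and are not part of the library, whose real layout logic is more involved.

```php
<?php
// Simplified model, NOT the library's actual code: fragments that sit on the
// same line are joined with the configured horizontal offset string.
function joinFragments(array $fragments, string $horizontalOffset): string
{
    return implode($horizontalOffset, $fragments);
}

$brokenWord = ['Inv', 'oice'];         // one word stored as two fragments
$tableRow = ['Item', 'Qty', 'Price'];  // three cells of one table row

echo joinFragments($brokenWord, ' '), "\n";  // single space: the word stays split
echo joinFragments($brokenWord, ''), "\n";   // empty string: the word is rejoined
echo joinFragments($tableRow, "\t"), "\n";   // tab: cells stay separable
```

In this model the empty string repairs split words but would also fuse table cells, while the tab keeps cells apart, which matches the trade-off described above.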

option setFontSpaceLimit

Changing the font space limit can help when getText() returns text with too many spaces.

$config = new \Smalot\PdfParser\Config();
$config->setFontSpaceLimit(-60);
$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf');
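
Finding a good value usually takes some trial and error. The sketch below is one possible way to compare a few candidates: it parses the same file once per limit and prints the share of the extracted text made up of spaces. The helper spaceRatio, the candidate values, and the paths vendor/autoload.php and document.pdf are assumptions for illustration. Only the Config, Parser, parseFile and getText calls come from the examples above.

```php
<?php
// Hypothetical helper (not part of smalot/pdfparser): fraction of the bytes
// in $text that are plain spaces. Returns 0.0 for an empty string.
function spaceRatio(string $text): float
{
    $length = strlen($text);
    if ($length === 0) {
        return 0.0;
    }
    return substr_count($text, ' ') / $length;
}

// Candidate limits to compare; -50 is the documented default.
$candidates = [-40, -50, -60, -70];

// Runs only if Composer's autoloader and the sample file are present.
if (is_file('vendor/autoload.php') && is_file('document.pdf')) {
    require 'vendor/autoload.php';

    foreach ($candidates as $limit) {
        $config = new \Smalot\PdfParser\Config();
        $config->setFontSpaceLimit($limit);
        $parser = new \Smalot\PdfParser\Parser([], $config);
        $text = $parser->parseFile('document.pdf')->getText();

        printf("limit %d: space ratio %.3f\n", $limit, spaceRatio($text));
    }
}
```

A lower ratio is not automatically better, since legitimate word spacing also counts, so it is worth reading a sample of the output for the most promising values.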