PHP Classes

PHP HTML to Text Conversion: Parse HTML and extract text contained in it

Last Updated: 2022-10-21 (8 days ago)
Ratings: Not enough user ratings
Unique User Downloads: Total: 392, This week: 4
Download Rankings: All time: 6,589, This week: 221 (Up)
Version: html2text 1.0.18
License: GNU General Publi...
PHP version: 5
Categories: HTML, PHP 5, Text processing
Collaborate with this project: html2text - github.com

Description

This class can parse HTML and extract text contained in it.

It can take a given HTML string and parse it to extract the text in the HTML document.

The class can change the case of the text inside certain HTML elements, as well as prepend or append a given text.

Innovation Award
PHP Programming Innovation award nominee
December 2016
Number 9
Most PHP applications generate HTML, but sometimes we also need a text version of that HTML, for instance to send an email that includes both the HTML and a text version as an alternative.

This package provides a solution that lets you automatically create the text version of a given HTML document, which you can use in email messages or for other purposes.

Manuel Lemos
Name: Lars Moelleken <contact>
Classes: 25 packages by
Country: Germany
Innovation award
Nominee: 11x

Winner: 1x

Details


Html2Text

Description

Convert HTML to formatted plain text, e.g. for text mails.

Installation

The recommended way to install the package is through Composer.

$ composer require voku/html2text

Basic Usage

$html = new \voku\Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"
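Since the text version is typically meant for mail, here is a minimal sketch of how the HTML and the extracted text could be combined into a multipart/alternative message body. The helper name buildAlternativeBody and the boundary value are hypothetical and not part of this package. The $text value stands in for the result of getText(), so the sketch runs without the library installed.

```php
<?php

declare(strict_types=1);

/**
 * Build a multipart/alternative body from a plain-text and an HTML version.
 * Hypothetical helper for illustration; not part of voku/html2text.
 */
function buildAlternativeBody(string $text, string $html, string $boundary): string
{
    $parts = [];

    // The plain-text part goes first, the preferred (HTML) part last.
    foreach (['text/plain' => $text, 'text/html' => $html] as $type => $content) {
        $parts[] = '--' . $boundary . "\r\n"
            . 'Content-Type: ' . $type . '; charset=UTF-8' . "\r\n"
            . 'Content-Transfer-Encoding: base64' . "\r\n\r\n"
            . chunk_split(base64_encode($content), 76, "\r\n");
    }

    // The closing delimiter carries two extra dashes at the end.
    return implode('', $parts) . '--' . $boundary . "--\r\n";
}

$html = 'Hello, &quot;<b>world</b>&quot;';
// Stand-in for: (new \voku\Html2Text\Html2Text($html))->getText()
$text = 'Hello, "WORLD"';

// Assumption: any string that cannot occur inside the encoded parts.
$boundary = 'example-boundary';

$body = buildAlternativeBody($text, $html, $boundary);
$headers = 'MIME-Version: 1.0' . "\r\n"
    . 'Content-Type: multipart/alternative; boundary="' . $boundary . '"';

// Sending is left commented out so the sketch has no side effects:
// mail('someone@example.com', 'Subject', $body, $headers);
```

Base64 encoding both parts keeps the boundary from ever appearing inside the content, at the cost of a larger message.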

Extended Usage

Each element (h1, li, div, etc.) can have the following options:

  • 'case' => convert case (Html2Text::OPTION_NONE, Html2Text::OPTION_UPPERCASE, Html2Text::OPTION_LOWERCASE, Html2Text::OPTION_UCFIRST, Html2Text::OPTION_TITLE)
  • 'replace' => replace a string, given as array(search, replacement)
  • 'prepend' => prepend a string
  • 'append' => append a string

For example:

use voku\Html2Text\Html2Text;

$html = '<h1>Should have "AAA" changed to BBB</h1><ul><li>• Custom bullet should be removed</li></ul><img alt="The Linux Tux" src="tux.png" />';
$expected = 'SHOULD HAVE "BBB" CHANGED TO BBB' . "\n\n" . '- Custom bullet should be removed |' . "\n\n" . '[IMAGE]: "The Linux Tux"';

$html2text = new Html2Text(
    $html,
    array(
        'width'    => 0,
        'elements' => array(
            'h1' => array(
                'case'    => Html2Text::OPTION_UPPERCASE,
                'replace' => array('AAA', 'BBB'),
            ),
            'li' => array(
                'case'    => Html2Text::OPTION_NONE,
                'replace' => array('•', ''),
                'prepend' => "- ",
                'append'  => " |",
            ),
        ),
    )
);

// Prefixes used when images and links are rendered as text.
$html2text->setPrefixForImages('[IMAGE]: ');
$html2text->setPrefixForLinks('[LINKS]: ');
$html2text->getText(); // === $expected
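To make the effect of the per-element options more concrete, the following is a small sketch that applies 'replace', 'case', 'prepend' and 'append' to a single string using plain PHP string functions. It is not the library's implementation: applyElementOptions is a hypothetical helper, the case values are plain strings rather than the Html2Text::OPTION_* constants, and the trimming step is an assumption made so that the results match the expected text of the example above.

```php
<?php

declare(strict_types=1);

/**
 * Apply 'replace', 'case', 'prepend' and 'append' to the text of one element.
 * Hypothetical helper for illustration; not part of voku/html2text.
 */
function applyElementOptions(string $text, array $options): string
{
    if (isset($options['replace'])) {
        [$search, $replacement] = $options['replace'];
        $text = str_replace($search, $replacement, $text);
    }

    // Assumption: whitespace left over after the replacement is trimmed.
    $text = trim($text);

    // Byte-based case functions; multibyte text would need the mbstring variants.
    switch ($options['case'] ?? 'none') {
        case 'uppercase':
            $text = strtoupper($text);
            break;
        case 'lowercase':
            $text = strtolower($text);
            break;
        case 'ucfirst':
            $text = ucfirst($text);
            break;
        case 'title':
            $text = ucwords($text);
            break;
    }

    return ($options['prepend'] ?? '') . $text . ($options['append'] ?? '');
}

$heading = applyElementOptions('Should have "AAA" changed to BBB', [
    'case'    => 'uppercase',
    'replace' => ['AAA', 'BBB'],
]);

$item = applyElementOptions('• Custom bullet should be removed', [
    'case'    => 'none',
    'replace' => ['•', ''],
    'prepend' => '- ',
    'append'  => ' |',
]);

echo $heading, "\n"; // SHOULD HAVE "BBB" CHANGED TO BBB
echo $item, "\n";    // - Custom bullet should be removed |
```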

Live Demo

  • HTML | TEXT
  • https://moelleken.org/url_to_text.php?url=https://ADD_YOUR_URL_HERE

History

This library started life on the blog of Jon Abernathy http://www.chuggnutt.com/html2text

A number of projects picked up the library and started using it - among those was RoundCube mail. They made a number of updates to it over time to suit their webmail client.

This package is now an extended fork of the original Html2Text.

Support

For support and donations please visit GitHub | Issues | PayPal | Patreon.

For status updates and release announcements please visit Releases | Twitter | Patreon.

For professional support please contact me.

Thanks

  • Thanks to GitHub (Microsoft) for hosting the code and a good infrastructure including issue management, etc.
  • Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
  • Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
  • Thanks to StyleCI for the simple but powerful code style check.
  • Thanks to PHPStan && Psalm for really great static analysis tools and for discovering bugs in the code!
Files
File              Role     Description
src (1 file)
tests (26 files, 1 directory)
.editorconfig     Data     Auxiliary data
.scrutinizer.yml  Data     Auxiliary data
.styleci.yml      Data     Auxiliary data
.travis.yml       Data     Auxiliary data
CHANGELOG.md      Data     Auxiliary data
composer.json     Data     Auxiliary data
LICENSE.md        Lic.     License text
phpcs.php_cs      Example  Example script
phpstan.neon      Data     Auxiliary data
phpunit.xml       Data     Auxiliary data
README.md         Doc.     Documentation

Files / src
File           Role   Description
Html2Text.php  Class  Class source

Files / tests
File                     Role   Description
fixtures (32 files)
BasicTest.php            Class  Class source
BlankSpacesTest.php      Class  Class source
BlockquoteTest.php       Class  Class source
bootstrap.php            Aux.   Auxiliary script
ConstructorTest.php      Class  Class source
DefinitionListTest.php   Class  Class source
ElementsTest.php         Class  Class source
HeadingsTest.php         Class  Class source
HtmlCharsTest.php        Class  Class source
ImageTest.php            Class  Class source
LinkTest.php             Class  Class source
ListItemsTest.php        Class  Class source
ListTest.php             Class  Class source
MailTest.php             Class  Class source
NewlineSpaceTest.php     Class  Class source
NewlineTabBreakTest.php  Class  Class source
ParagraphBreakTest.php   Class  Class source
PreTest.php              Class  Class source
PrintTest.php            Class  Class source
SearchReplaceTest.php    Class  Class source
SpaceTest.php            Class  Class source
SpanTest.php             Class  Class source
StrToUpperTest.php       Class  Class source
TableTest.php            Class  Class source
UnderscoresTest.php      Class  Class source
UppercaseTest.php        Class  Class source

Files / tests / fixtures
File                      Role  Description
code.html                 Doc.  Documentation
code.txt                  Doc.  Documentation
dl_dt_dd.html             Doc.  Documentation
dl_dt_dd.txt              Doc.  Documentation
msoffice.html             Doc.  Documentation
msoffice.txt              Doc.  Documentation
nbsp.html                 Doc.  Documentation
nbsp.txt                  Doc.  Documentation
non-breaking-spaces.html  Doc.  Documentation
non-breaking-spaces.txt   Doc.  Documentation
table.html                Doc.  Documentation
table.txt                 Doc.  Documentation
test10Html.html           Doc.  Documentation
test10Html.txt            Doc.  Documentation
test1Html.html            Doc.  Documentation
test1Html.txt             Doc.  Documentation
test2Html.html            Doc.  Documentation
test2Html.txt             Doc.  Documentation
test3Html.html            Doc.  Documentation
test3Html.txt             Doc.  Documentation
test4Html.html            Doc.  Documentation
test4Html.txt             Doc.  Documentation
test5Html.html            Doc.  Documentation
test5Html.txt             Doc.  Documentation
test6Html.html            Doc.  Documentation
test6Html.txt             Doc.  Documentation
test7Html.html            Doc.  Documentation
test7Html.txt             Doc.  Documentation
test8Html.html            Doc.  Documentation
test8Html.txt             Doc.  Documentation
test9Html.html            Doc.  Documentation
test9Html.txt             Doc.  Documentation
