PHP Classes

How PHP DomDocument Class Will Be Improved in PHP 8.4 to Parse and Serialize Better HTML5 Documents


Author:

Updated on: 2024-01-05

Posted on: 2023-12-15

Categories: PHP Tutorials, PHP community, News

Up to PHP 8.3, PHP developers use the DOMDocument class to parse HTML pages and files so that their PHP applications can process them.

For instance, DOMDocument can extract information that a site makes available only on its web pages.

Unfortunately, the DOMDocument class implementation is based on the libxml2 library, which is only ready to parse HTML 4 documents.

Therefore, PHP 8.4 will use a new parser library that can process HTML5 pages and files, including pages that use element types available only in HTML5.

Most modern sites use HTML5 to generate their pages to take advantage of new HTML page elements not supported by HTML 4.

PHP 8.4 will be the next official version released by the PHP core developers, with improved features and better security protection.

Read this short article to see some PHP code and learn how to take advantage of this improvement of PHP 8.4 and when it will be available for you to use in your PHP applications.





In this article you will learn:

1. Why PHP Developers Need a Good HTML Parser Extension

2. What the PHP DOMDocument Class Does

3. How PHP 8.4 Will Be Improved to Provide Better Support to Parse and Process HTML5 Pages and Files

4. When PHP 8.4 Will Be Released


The PHP 8.4 image above is based on the PHP 8 logo design by Vincent Pointier.

1. Why PHP Developers Need a Good HTML Parser Extension

We all know that HTML is the language used to define the contents of Web pages.

When you need to obtain information from a Web page of a given site, you can retrieve the HTML contents of the Web page and then use a parser to process the page and return the structure of HTML tags on the page.

A good HTML parser helps you quickly implement tasks that traverse the HTML document structure and extract the content you want from the HTML page.

2. What the PHP DOMDocument Class Does

The DOMDocument class can be used to parse XML documents in general. HTML documents can also be parsed and then treated like XML documents.

The DOMDocument class has methods called loadHTML and loadHTMLFile that parse HTML documents and build a tree of nodes that represents the tags of the document. This tree of nodes can be easily traversed with PHP code to examine the elements of an HTML page.
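As a minimal sketch of the traversal described above, the following example parses a small HTML string with loadHTML and collects the links it contains. The HTML string and the variable names are hypothetical and exist only for illustration. The call to libxml_use_internal_errors is there because the libxml2 based parser may report warnings for markup it does not recognize.

```php
<?php
// Hypothetical HTML used only for illustration.
$html = '<html><body><h1>Products</h1>'
      . '<ul><li><a href="/a">Item A</a></li><li><a href="/b">Item B</a></li></ul>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// Discard any warnings collected while parsing.
libxml_clear_errors();

// Walk every <a> element and map its href attribute to its link text.
$links = [];
foreach ($document->getElementsByTagName('a') as $anchor) {
    $links[$anchor->getAttribute('href')] = trim($anchor->textContent);
}

print_r($links);
```

The same loop works with loadHTMLFile when the HTML is stored in a file instead of a string.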

3. How PHP 8.4 Will Be Improved to Provide Better Support to Parse and Process HTML5 Pages and Files

PHP 8.4 will introduce a new class named HTMLDocument that lets developers create HTML document objects from scratch with the createEmpty function, from an HTML string with the createFromString function, or from an HTML file with the createFromFile function.

These changes in PHP 8.4 will be backward compatible with PHP 8.3. So, applications that use the DOMDocument class will continue to work as in past PHP versions.
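Here is a short sketch of how the three factory functions might be used. It assumes PHP 8.4 or later with the DOM extension enabled. The HTML string, the temporary file, and the variable names are hypothetical. The released PHP 8.4 spells the namespace Dom, and since PHP resolves namespace and class names case-insensitively, the DOM spelling from the proposal refers to the same class. The querySelector call relies on the CSS selector support that PHP 8.4 also adds to the new DOM classes, and LIBXML_NOERROR is passed only to keep parse warnings quiet in this example.

```php
<?php
// Requires PHP 8.4 or later. The HTML below is a hypothetical sample.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><main><article><h1>Hello HTML5</h1></article></main></body></html>';

// 1. Parse an HTML5 string.
$document = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
$title = $document->querySelector('main article h1')->textContent;
echo $title, PHP_EOL;

// 2. Parse an HTML5 file (a temporary file stands in for a real page).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, $html);
$fromFile = Dom\HTMLDocument::createFromFile($path, LIBXML_NOERROR);
$fromFileTitle = $fromFile->querySelector('title')->textContent;
unlink($path);
echo $fromFileTitle, PHP_EOL;

// 3. Build a document from scratch and serialize it.
$empty = Dom\HTMLDocument::createEmpty();
$paragraph = $empty->createElement('p');
$paragraph->textContent = 'Built from scratch';
$empty->appendChild($paragraph);
$serialized = $empty->saveHtml();
echo $serialized, PHP_EOL;
```

Existing code that calls new DOMDocument() is not affected by any of this, because the new class is opt-in.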

Here is the structure of classes that will be available in PHP 8.4 to manipulate HTML and XML documents based on the original proposal to introduce the HTMLDocument class in PHP 8.4:

namespace DOM {
  // The base abstract document class
  abstract class Document extends DOM\Node implements DOM\ParentNode {
    /* all properties and methods that are common and sensible for both XML & HTML documents */
  }

  final class XMLDocument extends Document {
    /* insert specific XML methods and properties (e.g. xmlVersion, validate(), ...) here */

    private function __construct() {}

    public static function createEmpty(string $version = "1.0", string $encoding = "UTF-8"): XMLDocument;
    public static function createFromFile(string $path, int $options = 0, ?string $override_encoding = null): XMLDocument;
    public static function createFromString(string $source, int $options = 0, ?string $override_encoding = null): XMLDocument;
  }

  final class HTMLDocument extends Document {
    /* insert specific HTML methods and properties here */

    private function __construct() {}

    public static function createEmpty(string $encoding = "UTF-8"): HTMLDocument;
    public static function createFromFile(string $path, int $options = 0, ?string $override_encoding = null): HTMLDocument;
    public static function createFromString(string $source, int $options = 0, ?string $override_encoding = null): HTMLDocument;
  }
}

class DOMDocument extends DOM\Document {
  /* Keep methods, properties, and constructor the same as they are now */
}

4. When PHP 8.4 Will Be Released

If all goes well according to the PHP 8.4 plan created by the PHP Core developers, PHP 8.4 will be released on November 21, 2024.



