Author: Timothy Edwards
Updated on: 2021-11-18
Posted on: 2021-11-18
Package: PHP DOCX to Text
DOCX documents can be complex because they can contain many types of content, such as text, images, and styles.
Extracting only the text from such a document can therefore be a complex task.
Read this article to learn how to extract text from DOCX documents so that you can process that text in any PHP application.
In this article you will learn:
Introduction to the PHP DOCX to Text Package
What the PHP DOCX to Text Package Does in Practice
How the PHP DOCX to Text Package Can Convert Microsoft Word Documents in Practice
Download and Install the PHP DOCX to Text Package Using PHP Composer
Introduction to the PHP DOCX to Text Package
Recently I developed and published a PHP class to parse DOCX to HTML with images.
Then I thought that some sections of the code would be very useful as standalone classes for other uses.
Because Microsoft Word files are commonly used to transfer and store information, I thought that a class for manipulating and searching the text of a Word document could be useful.
What the PHP DOCX to Text Package Does in Practice
I created this PHP DOCX to Text class to extract all the text contained in a Microsoft Word DOCX document. The extracted text includes all footnotes and endnotes, together with list and paragraph numbering.
The output of this class is an array with each element containing a paragraph of text from the original document.
This array can be easily manipulated using PHP to carry out text searches of Word documents, extract certain sections of text, or save the text of a Word document to a database for subsequent use.
For convenience, the first element of the array shows the number of text elements contained in the array, together with the maximum length of an element of the array in the format 'number:length'.
Knowing the maximum length of a text element of the array could be useful if the text is being saved to a database.
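As a rough illustration of how the 'number:length' element might be used, the sketch below splits it into two integers and checks the maximum length against a database column size. The sample array is made up to match the format described above, parseHeader() is a hypothetical helper that is not part of the package, and the column width of 255 is only an assumption. Whether the reported length counts characters or bytes is not stated here, so it is worth checking before relying on it.

```php
<?php
// Hypothetical sample shaped like the array described above:
// element 0 is 'number:length', elements 1..n are paragraphs.
$text = [
    '3:26',
    'Annual report',
    '1. Introduction to results',
    'Closing remarks',
];

/**
 * Split the 'number:length' header into two integers.
 * parseHeader() is an illustrative helper, not part of the package.
 */
function parseHeader(array $text): array
{
    [$count, $maxLength] = array_map('intval', explode(':', $text[0]));
    return ['count' => $count, 'maxLength' => $maxLength];
}

$header = parseHeader($text);

// Example use: would the longest paragraph fit in a VARCHAR column?
$columnSize = 255; // assumed column width
$fits = $header['maxLength'] <= $columnSize;

echo "Paragraphs: {$header['count']}, longest: {$header['maxLength']}\n";
echo $fits ? "Fits in VARCHAR($columnSize)\n" : "Needs a larger column\n";
```

Converting both values to integers up front avoids comparing strings with numbers later in the script.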
How the PHP DOCX to Text Package Can Convert Microsoft Word Documents in Practice
The example textdemo.php below shows how to use this class to extract the text from a Word document. The resulting array is then processed to display each text element (paragraph) on screen along with its element number.
The example script expects a DOCX file named sample.docx.
The number of elements and the maximum length of a text element are also displayed.
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php
// Load the class and read the document into an array of paragraphs
require_once('wordtext.php');
$rt = new WordTEXT(false, 'UTF-8');
$text = $rt->readDocument('sample.docx');

// The first element holds 'number:length'
$det = explode(':', $text[0]);
echo "No of text elements in the array - " . $det[0] . "<br>";
echo "Max length of a text element in the array - " . $det[1] . "<br> <br>";

// Display each paragraph along with its element number
for ($LC = 1; $LC <= (int) $det[0]; $LC++) {
    echo "Element " . $LC . " : " . $text[$LC] . "<br>";
}
?>
</body>
</html>
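Once the array is available, a text search over the paragraphs could be sketched as follows. This is a minimal example under stated assumptions: the sample array is invented to match the format the demo reads, and findParagraphs() is a hypothetical helper that is not part of the package. It uses stripos(), which ignores case only for ASCII letters, and arrow functions, which need PHP 7.4 or later.

```php
<?php
// Hypothetical array in the shape the demo reads ('number:length' first).
$text = [
    '3:29',
    'Invoice terms and conditions',
    'Payment is due within 30 days',
    'Late payment may incur a fee',
];

/**
 * Return the paragraphs containing $needle, keyed by element number.
 * findParagraphs() is an illustrative helper, not part of the package.
 */
function findParagraphs(array $text, string $needle): array
{
    // Skip the 'number:length' header but keep the original element numbers
    $paragraphs = array_slice($text, 1, null, true);

    return array_filter(
        $paragraphs,
        fn (string $p): bool => stripos($p, $needle) !== false
    );
}

foreach (findParagraphs($text, 'payment') as $number => $paragraph) {
    echo "Element $number: $paragraph\n";
}
```

Keeping the original keys means each match can still be reported by the same element number that the demo displays.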
Download and Install the PHP DOCX to Text Package Using PHP Composer
You can download or install the PHP DOCX to Text package using the PHP Composer tool by going to this download page to get the package code. That page also contains instructions on how to install the package using PHP Composer from the PHP Classes site.