PHP Classes

PHP docx reader

Subject:PHP docx reader
Summary:A way to isolate the text from a .docx file
Messages:1
Author:Alekos Psimikakis
Date:2017-10-21 06:22:43
 

  1. PHP docx reader
Alekos Psimikakis - 2017-10-21 06:22:43
Indeed, the text the class returns for a .docx file is unacceptable. Here's one way to produce more readable output (after having read 'document.xml' into '$content'):

$txt = "";
$l = strlen($content);
$a = 0;
while ($a < $l) {
    // Find the next text run. strpos() can legitimately return 0, so test with ===.
    $a = strpos($content, "<w:t>", $a);
    if ($a === false) break;
    $a += 5; // skip past "<w:t>"
    $b = strpos($content, "</w:t>", $a);
    if ($b === false) {
        echo "Bad 'document.xml' file in ".$this->filename;
        return "";
    }
    $txt .= substr($content, $a, $b - $a)."<br>";
    $a = $b + 6; // skip past "</w:t>"
}
return $txt;

This is of course far from optimal, but the output is readable. Producing better output would require analyzing the .docx structure in depth, which is not in my immediate plans! :)
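
As a possible next step, here is a minimal sketch that goes a little further into the .docx structure. It is not part of the class discussed in this thread. It assumes the PHP zip and dom extensions are available, and the function names readDocumentXml() and docxXmlToText() are made up for illustration. Two things differ from the snippet above: text runs written as <w:t xml:space="preserve"> are also picked up (a plain search for "<w:t>" skips them), and the runs are grouped by paragraph (<w:p>), so the result has one line per paragraph rather than one per run.

```php
<?php
// Hypothetical helper: returns the raw XML of word/document.xml, or null on failure.
function readDocumentXml(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? null : $xml;
}

// Hypothetical helper: returns the document text, one line per <w:p> paragraph.
function docxXmlToText(string $xml): string
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return ''; // not well-formed XML
    }
    $xpath = new DOMXPath($doc);
    // Namespace used by WordprocessingML elements such as w:p and w:t.
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $lines = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        // Concatenate every text run inside this paragraph.
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent; // entities such as &amp; are already decoded
        }
        $lines[] = $text;
    }
    return implode("\n", $lines);
}
```

A call might look like this, with 'example.docx' standing in for a real file:

echo nl2br(htmlspecialchars(docxXmlToText(readDocumentXml('example.docx') ?? '')));

This still ignores tabs, line breaks inside a paragraph, tables, headers and footers, and paragraphs nested in text boxes would have their text repeated, so treat it as a starting point only.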