Alekos Psimikakis - 2017-10-21 06:22:43
Indeed, the text the class returns for a .docx file is unacceptable. Here's one way to produce more readable output (after having read 'document.xml' into '$content'):
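$txt = ""; $l = strlen($content); $a = 0;
while ($a < $l) {
    // Find the next text run. strpos() returns false when nothing is found,
    // so compare with === (a plain == would also treat offset 0 as "not found").
    // Note: this only matches bare <w:t> tags, not <w:t xml:space="preserve">.
    $a = strpos($content, "<w:t>", $a); if ($a === false) break;
    $a += 5; $b = strpos($content, "</w:t>", $a); // 5 = strlen("<w:t>")
    if ($b === false) {
        // An opening tag without a closing one: the XML is malformed.
        echo "Bad 'document.xml' file in ".$this->filename;
        return "";
    }
    // Append the run's text and continue after "</w:t>" (6 characters).
    $txt .= substr($content, $a, $b-$a)."<br>"; $a = $b+6;
}
return $txt;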
This is of course far from optimal, but the output is readable. Producing better output would require analyzing the .docx structure in depth, which is not in my immediate plans! :)
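For anyone who wants to go one small step further, here is a rough sketch of the same idea done with a regular expression. It is only an illustration under a few assumptions: the function name docxXmlToText is made up (it is not part of the class), it returns plain text with one line per paragraph instead of "<br>" after every run, and it decodes XML entities such as &amp;amp;, which the loop above does not do. It also accepts text tags that carry attributes, like <w:t xml:space="preserve">.

```php
<?php
// Hypothetical helper (not part of the original class): extracts plain text
// from the contents of 'document.xml', one output line per paragraph.
function docxXmlToText(string $content): string
{
    $lines = [];
    // Split on paragraph ends so that runs of one paragraph stay together.
    foreach (explode('</w:p>', $content) as $paragraph) {
        // Match <w:t> and <w:t ...attributes...>, but not <w:tab/> or <w:tbl>.
        if (preg_match_all('/<w:t(?:\s[^>\/]*)?>(.*?)<\/w:t>/s', $paragraph, $m)) {
            // Join the runs and turn entities (&amp;, &lt;, ...) back into characters.
            $lines[] = html_entity_decode(implode('', $m[1]), ENT_QUOTES | ENT_XML1, 'UTF-8');
        }
    }
    return implode("\n", $lines);
}
```

If the result is meant for a web page, something like nl2br(htmlspecialchars($text)) would give the "<br>" line breaks back. Tables, lists, footnotes and so on are still ignored, so this remains far from a real .docx parser.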