Gerhard - 2010-08-20 09:52:44
Hi!
I have observed an interesting behavior of the PHP XMLWriter class when creating an XML document whose attribute values contain special characters: it converts ALL of them to their entities.
I tried to find a part of the XML specification that would explain why this happens, but I did not find anything describing the characters allowed in an attribute value. Before I go into the problem, I will show you what I have and what I tried to do.
As an example for this description, I queried the Google Weather API with the language set to Russian and got the following (only a part of the whole XML).
(I had to replace the Russian letters with ööö to get them shown in the forum.)
<forecast_conditions>
<day_of_week data="ööö"/>
<low data="20"/>
<high data="29"/>
<icon data="/ig/images/weather/partly_cloudy.gif"/>
<condition data="Partly Cloudy"/>
</forecast_conditions>
As you can see, I get the "day_of_week" element with an attribute containing the weekday name in Russian letters (ööö). This looks like what I would expect.
But when I try to create the same XML with the XMLWriter in my script (or any other XML with special characters in an attribute value), the XMLWriter creates the following. I declared the XML as UTF-8, and the strings were defined as UTF-8 strings as well.
<forecast_conditions>
<day_of_week data="&#xF6;&#xF6;&#xF6;"/>
<low data="20"/>
<high data="29"/>
<icon data="/ig/images/weather/partly_cloudy.gif"/>
<condition data="Partly Cloudy"/>
</forecast_conditions>
For some reason, the XMLWriter class converts the "ööö" to "&#xF6;&#xF6;&#xF6;". To define the attribute I used the following method.
$xml_writer->writeAttribute('data', 'ööö');
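For anyone who wants to reproduce this, here is a minimal sketch of the kind of script involved. It is not my exact code: the helper name buildDayOfWeek(), the in-memory output, and the optional encoding argument are assumptions made for illustration. It builds the same element twice, once with an encoding passed to startDocument() and once without, so the two outputs can be compared. Which form you get may depend on the PHP and libxml2 versions installed, so I make no claim here about the exact output.

```php
<?php
// Hypothetical helper (not part of XMLWriter): writes a single <day_of_week>
// element with a "data" attribute and returns the generated XML as a string.
function buildDayOfWeek(string $value, ?string $encoding): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    // The encoding argument is the variable under test; null leaves it undeclared.
    $writer->startDocument('1.0', $encoding);
    $writer->startElement('day_of_week');
    $writer->writeAttribute('data', $value);
    $writer->endElement();
    $writer->endDocument();
    return $writer->outputMemory();
}

// "\u{F6}" is the letter ö, written as an escape so the script does not
// depend on the encoding of the source file.
$withEncoding    = buildDayOfWeek("\u{F6}\u{F6}\u{F6}", 'UTF-8');
$withoutEncoding = buildDayOfWeek("\u{F6}\u{F6}\u{F6}", null);

echo $withEncoding, $withoutEncoding;
```

Comparing the two printed documents should show whether the encoding passed to startDocument() has any influence on how the attribute value is written.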
After seeing this, I searched for an explanation of that behavior. But I could not find any specification or documentation describing the characters or encoding allowed in an attribute value. I only found a lot of descriptions of tag and attribute names.
What I would now like to find out is the following: is this the correct behavior according to the XML specification, or is it a special behavior related to PHP or the XMLWriter?
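One way to test the question empirically is to ask a parser whether it sees any difference between the two spellings. The sketch below assumes the DOM extension is available; the helper name readDataAttribute() is made up for this example. It parses one fragment with the literal characters and one with hexadecimal character references (the form "&#x...;"), then compares the decoded attribute values.

```php
<?php
// Hypothetical helper: parses an XML string and returns the decoded value
// of the "data" attribute on the root element.
function readDataAttribute(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->documentElement->getAttribute('data');
}

// Literal characters, with the document declared as UTF-8.
$literal = readDataAttribute(
    '<?xml version="1.0" encoding="UTF-8"?>'
    . '<day_of_week data="' . "\u{F6}\u{F6}\u{F6}" . '"/>'
);

// The same characters written as hexadecimal character references.
$reference = readDataAttribute('<day_of_week data="&#xF6;&#xF6;&#xF6;"/>');

// Prints whether the parser reports identical attribute values.
var_dump($literal === $reference);
```

If the comparison reports the values as identical, the two documents carry the same information for any consumer that uses an XML parser, and the difference is only in how the file is written.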