Hi,
I’ve been looking into this issue recently (regarding the application/xhtml+xml plugin, which seems to have similar aims), and I have a few points/questions:
- Can I just ask what is meant by "full Unicode support"? I am slightly confused: looking at the source of the plugin, it seems to preserve numeric character entities in the form &#[Unicode character number];. I would have thought that "adding full Unicode support" meant making sure the character set is a Unicode encoding (say UTF-8) and converting the numeric character entities into actual characters in that encoding (I’ve put a small sketch of what I mean at the end of this post).
- I see that the arrays are initialised by repeatedly increasing the size of the "simbols" array, once for each Unicode character. Isn’t it quite inefficient to keep growing the array like this? (I’m not sure, though!)
- From what I can see, the code replaces every ampersand that is not already part of a numeric character entity or an HTML entity with &amp;. It seems to do this by running str_replace over the output document once for *every* Unicode character, and once for (nearly) every HTML entity. Isn’t that extremely inefficient? Even though regular expressions have their own costs, wouldn’t a single regular-expression replace be better? (I don’t know much about regular expressions, though; there is a sketch of what I mean at the end of this post.)
- There is a line that replaces && with &&; in the document. At that point in the script I don’t see why there would be any occurrences of && in the document. What is the aim of this line, and will it ever actually do anything?
- I notice an earlier post says to include a meta http-equiv element with a content type of application/xhtml+xml. However, going by
http://www.w3.org/International/tutorials/tutorial-char-enc/#Slide0250, I don’t think the meta http-equiv tag should be used when the content type is application/xhtml+xml, so I’m not sure this is appropriate (sketch at the end of this post).
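To illustrate the first point, here is roughly what I had in mind: decoding the numeric references into real UTF-8 characters instead of keeping them as entities. This is only a sketch, not the plugin’s actual code ($buffer is a placeholder name I made up):

    <?php
    // Decode numeric character references (&#1234; / &#x4E2D;) into real
    // UTF-8 characters; ASCII is left untouched by starting the map at 0x80.
    // This assumes the page really is served with charset=utf-8.
    $convmap = array(0x80, 0x10FFFF, 0, 0x10FFFF);
    $buffer  = mb_decode_numericentity($buffer, $convmap, 'UTF-8');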
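For the ampersand point, I was imagining a single regular-expression pass over the output instead of one str_replace per character/entity, something along these lines (untested, and again $buffer is just a placeholder):

    <?php
    // Escape every ampersand that is not already the start of a numeric
    // character reference or a named entity, in one pass over the document.
    $buffer = preg_replace(
        '/&(?!(?:#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)/',
        '&amp;',
        $buffer
    );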
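And on the last point, my reading of that W3C page is that for application/xhtml+xml the content type and charset should come from the real HTTP header (or the XML declaration) rather than from a meta element, i.e. something like:

    <?php
    // Send the content type in the HTTP response itself, before any output;
    // as I read the W3C page, a meta http-equiv element shouldn't be relied
    // on when the page is served as application/xhtml+xml.
    header('Content-Type: application/xhtml+xml; charset=utf-8');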
Michal.