    • 32942
    • 141 Posts
    This is an auto-generated support/comment thread for XHTML+XML+Unicode.

    Use this forum to post any comments about this addition or any questions you have regarding its use.

    Brief Description:
    Serves the application/xhtml+xml content type and an XML prolog if the browser supports them, and adds full Unicode support.

    PS: You don’t need to change your XHTML code, but you can use the following technique in your XHTML if you need 100% security. (Beware: this sets your charset to UTF-8.)
    <meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />
    <!-- compliance patch for microsoft browsers -->
    <!--[if lt IE 7]>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <![endif]-->
    


    Please let me know if you find any bugs.
      • 18641
      • 22 Posts
      Hi there,

      This is a great plugin, and so far I’ve had no real problems. Is there any way we could send a MIME type based on the user agent?
      Something like: if the UA is W3C_Validator and the page is XHTML 1.1, then tell the W3C Validator that the page is application/xhtml+xml instead of sending text/html?

      Great work, thanks!
        • 32942
        • 141 Posts
        Yes, it is possible, but this method is not reliable because UAs can be faked.
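
For illustration, here is a rough sketch of the difference between trusting the Accept header and sniffing the user agent. It is written in Python only to keep it short; it is not the plugin's code, and the thread does not say how the plugin detects support. The function name `pick_content_type` and the substring check for `W3C_Validator` are assumptions made for this example.

```python
def pick_content_type(accept_header: str, user_agent: str = "") -> str:
    """Return the media type to serve for an XHTML page (illustrative only)."""
    # Hypothetical user-agent override, as asked about above. Any client can
    # send this string, which is why it is not a reliable signal.
    if "W3C_Validator" in user_agent:
        return "application/xhtml+xml"

    # Otherwise look for application/xhtml+xml in the Accept header with q > 0.
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not accepted"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"

    # Fall back to text/html for clients that do not advertise XHTML support.
    return "text/html"
```

A caller would pass in the request's Accept and User-Agent header values and use the result for the Content-Type response header. This sketch ignores wildcards such as `*/*` on purpose, so a client has to list application/xhtml+xml explicitly to get it.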
          • 19315
          • 84 Posts
          Hi,

          I’ve been looking into this issue recently (RE the plugin application/xhtml+xml, which seems to have similar aims). I do have a few points/questions:

          - Can I just ask what is meant by "full Unicode support"? I am slightly confused: looking at the source of the plugin, it seems to preserve numeric character entities in the form &#[unicode character number]. I would have thought that "adding full Unicode support" would mean ensuring the character set is set to a Unicode encoding (say UTF-8), and converting the numeric character entities to actual characters in that character set.

          - I see that the arrays are initialised by repeatedly increasing the size of the "simbols" array, once for each Unicode character. Is it not extremely inefficient to keep increasing the array size like this? (I’m not sure though!)

          - From what I see, the code seems to replace all ampersands that are not part of a numeric character entity or an HTML entity with &amp;. It seems to do this by running str_replace to search the output document once for *every* Unicode character, and once for (nearly) every HTML entity. Is this not extremely inefficient? Although regular expressions can be inefficient, would it not be better to do some sort of regular expression replace? (I don’t really know about regular expressions though.)

          - There is a line that replaces &&amp; with &&; in the document. At that point in the script I don’t see why there would be any occurrences of &&amp; in the document. What’s the aim of this line, and will it ever actually do anything?

          - I notice that an earlier post says to include a meta http-equiv tag with a content type of application/xhtml+xml. However, going by http://www.w3.org/International/tutorials/tutorial-char-enc/#Slide0250, I don’t think the meta http-equiv tag should be used for the content type application/xhtml+xml, so I’m not sure this is appropriate.
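
To make the regular expression idea from the third point concrete, here is a minimal sketch of a single-pass replacement. It is in Python rather than the plugin's PHP and is not taken from the plugin. The names `STRAY_AMP` and `escape_stray_ampersands` are made up for this example, and it assumes that anything shaped like `&name;` counts as an entity, without checking the name against a list of known entities.

```python
import re

# Match an ampersand that does NOT start a decimal reference (&#8364;),
# a hexadecimal reference (&#x20AC;), or something shaped like a named
# entity (&nbsp;). The negative lookahead leaves real references alone.
STRAY_AMP = re.compile(
    r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)"
)


def escape_stray_ampersands(text: str) -> str:
    """Escape stray ampersands in one pass over the document."""
    return STRAY_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_stray_ampersands("Tom & Jerry &amp; co &#8364; &nbsp;"))
```

The point of the sketch is that the document is scanned once, instead of once per character or entity. Whether that is faster in practice than many str_replace calls would need measuring on real pages.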

          Michal.
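
As a possible reading of the first point above, converting numeric character references into actual characters could look roughly like the following. This is a Python sketch under stated assumptions, not the plugin's behaviour: the names `NUMERIC_REF`, `KEEP_ESCAPED` and `decode_numeric_refs` are invented here, and references to markup-significant characters are deliberately left escaped so the output stays well-formed.

```python
import re

# Decimal (&#233;) or hexadecimal (&#xE9;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Characters that should stay escaped inside XHTML markup.
KEEP_ESCAPED = {"&", "<", ">", '"', "'"}


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def replace(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave references alone when they do not name a usable character:
        # out of range, surrogates, or control characters not allowed in XML.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        if code_point < 0x20 and code_point not in (0x09, 0x0A, 0x0D):
            return match.group(0)
        char = chr(code_point)
        if char in KEEP_ESCAPED:
            return match.group(0)
        return char

    return NUMERIC_REF.sub(replace, text)
```

The decoded text would then be written out as UTF-8, for example with `decode_numeric_refs(page).encode("utf-8")`, alongside a matching charset declaration.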