    • 22815
    • 1,097 Posts
    What exactly is wrong with <br />?
      No, I don't know what OpenGeek's saying half the time either.
      MODx Documentation: The Wiki | My Wiki contributions | Main MODx Documentation
      Forum: Where to post threads about add-ons | Forum Rules
      Like MODx? donate (and/or share your resources)
      Like me? See my Amazon wishlist
      MODx "Most Promising CMS" - so appropriate!
      • 6726
      • 7,075 Posts
      Nothing is wrong with br tags, but I’d rather have paragraphs from Word wrapped with p tags...

      Here, I copy-pasted a letter from Word, and my paragraphs were created with br tags rather than p tags. Textile, for instance, would have parsed this as a paragraph... It’s not criticism, just a small remark. Character encoding aside, the purifier works great!
        .: COO - Commerce Guys - Community Driven Innovation :.


        MODx est l'outil id
        • 1341
        • 20 Posts
        Yep, forgot that. HTML Purifier has sort-of out-of-the-box support for multiple character encodings:


        $config = HTMLPurifier_Config::createDefault();
        $config->set('Core', 'Encoding', 'ISO-8859-1');
        $purifier = new HTMLPurifier($config);

        But you really should be using UTF-8.

        As for the <br /> versus <p> tag thing, that’s TinyFCK’s fault, not HTML Purifier’s. Textile wouldn’t have preserved the formatting. ;-)

        On my end, there is some Microsoft-specific behavior HTML Purifier has to filter out. This is planned for version 1.2.
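The advice above to prefer UTF-8 can be illustrated with a short sketch. It uses only PHP's built-in iconv extension. The helper names `latin1_to_utf8` and `utf8_to_latin1` are hypothetical and are not part of HTML Purifier; they only show what converting between the two encodings looks like at the byte level.

```php
<?php
// Hypothetical helpers (NOT part of HTML Purifier): convert text between
// ISO-8859-1 and UTF-8 with PHP's built-in iconv() function.

function latin1_to_utf8(string $text): string
{
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Could not convert text to UTF-8');
    }
    return $converted;
}

function utf8_to_latin1(string $text): string
{
    // iconv() returns false when a character has no ISO-8859-1 equivalent.
    $converted = iconv('UTF-8', 'ISO-8859-1', $text);
    if ($converted === false) {
        throw new RuntimeException('Text contains characters outside ISO-8859-1');
    }
    return $converted;
}

$latin1 = "caf\xE9";                // "café" encoded as ISO-8859-1 (4 bytes)
$utf8   = latin1_to_utf8($latin1);  // same word encoded as UTF-8 (5 bytes)

echo bin2hex($latin1), "\n";
echo bin2hex($utf8), "\n";
```

The accented character takes one byte in ISO-8859-1 and two in UTF-8, which is why text saved in one encoding and read back in the other shows up garbled.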
          • 6726
          • 7,075 Posts
          Quote from: Ambush at Sep 06, 2006, 10:37 PM

          Yep, forgot that. HTML Purifier has sort-of out-of-the-box support for multiple character encodings:


          $config = HTMLPurifier_Config::createDefault();
          $config->set('Core', 'Encoding', 'ISO-8859-1');
          $purifier = new HTMLPurifier($config);


          Thanks a LOT for this one !

          Quote from: Ambush
          But you really should be using UTF-8.

          Yeah, I know. I usually use UTF-8, but MODx ships with Latin1 for French, though we have a UTF-8 mod somewhere.

          Quote from: Ambush
          As for the <br /> versus <p> tag thing, that’s TinyFCK’s fault, not HTML Purifier’s. Textile wouldn’t have preserved the formatting. ;-) On my end, there is some Microsoft-specific behavior HTML Purifier has to filter out. This is projected for the 1.2.

          Textile would have, but that’s only normal since Textile is not WYSIWYG: when you copy-paste, it’s raw text. Anyway, I didn’t mean to imply htmlpurifier was at fault there. Microsoft has always made a mess of importing Office docs into HTML pages... I don’t care one bit if MS document imports end up with <br /> tags instead of p’s.

          I must say, I am really impressed by htmlpurifier, and I will use it! I am tired of waiting for a semantically sound WYSIWYG editor; now I have an alternative solution. Thanks for that!

          One thing I was worried about was the added time to process the document, and server load, but I must say I almost don’t see any difference between normal document save and "processed" document save (maybe because I have a nice dedicated server, but nevertheless).

          Anyway, if you have a donation page, I’ll drop a few bucks to show my appreciation; this plugin will save me a lot of cleaning up.
          Thanks again for the quick response time and great code.

          Edit: Your code works perfectly.



            • 6726
            • 7,075 Posts
            Okay, now that I have tried it a bit more, I have a slight problem: a string, "rn", is inserted in my documents after cleaning; it seems to replace


            I did not notice that before adding the ISO 8859 code...
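The thread never establishes where the "rn" came from, so the following is only one possible explanation, stated as an assumption: if a line break reaches some filter as the four literal characters `\r\n` (backslash, r, backslash, n) and that filter strips backslashes, the result is exactly "rn". The sketch uses PHP's built-in `stripslashes()` purely to demonstrate the effect; nothing here claims that HTML Purifier or MODx actually does this.

```php
<?php
// ASSUMPTION (not confirmed in this thread): the line break was stored as
// literal backslash sequences, and a stripslashes()-style step ran on it.

// Single quotes: the backslashes are literal characters, not escapes.
$escaped = 'line one\r\nline two';

// stripslashes() removes the backslashes and keeps the letters after them.
$stripped = stripslashes($escaped);

echo $stripped, "\n";

// A real CRLF (double quotes) contains no backslashes, so it is untouched.
$realCrlf  = "line one\r\nline two";
$unchanged = stripslashes($realCrlf);
```

If this is the cause, the place to look is whatever escapes or unescapes the document content before it is saved, rather than the purifier itself.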
              • 1341
              • 20 Posts
              Unable to reproduce with %Core.Encoding set to "ISO-8859-1" and sample text "
              ". Could I have a minimal test case that reproduces the problem (i.e., you enter it raw and the behavior occurs)? Server configuration details, such as the PHP version, would be helpful too.

              You also may want to consider filing a bug report at the tracker: http://hp.jpsband.org/mantis/
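For anyone putting together the kind of report requested above, a small sketch like this gathers the PHP version and an exact byte dump of the sample input. The function name `describe_sample` is hypothetical; it relies only on built-in PHP (`PHP_VERSION`, `strlen`, `bin2hex`).

```php
<?php
// Hypothetical helper for a bug report: records the PHP version and the
// exact bytes of a sample, so invisible characters such as CR/LF show up.

function describe_sample(string $raw): array
{
    return [
        'php_version' => PHP_VERSION,   // version of the running interpreter
        'length'      => strlen($raw),  // length in bytes, not characters
        'hex'         => bin2hex($raw), // two hex digits per byte
    ];
}

$report = describe_sample("a\r\nb");
print_r($report);
```

In the hex dump, `0d0a` marks a carriage return followed by a line feed, which makes it easy to compare the input with what comes out after cleaning.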
                • 6726
                • 7,075 Posts
                Sorry for not replying, I have been unable to find time to dig into this, but I will... next week probably. If it is a bug, I’ll file a bug report, thanks for the pointer. Right now, I’d rather say it’s me or my config, I’ll get back to you on this thread.
                  • 1341
                  • 20 Posts
                  Looking forward to finding out what the trouble was. Even if it’s another filter colliding with HTML Purifier, that’s something good to know.

                  By the way, 1.1.0 was released recently, so be sure to upgrade.
                    • 6726
                    • 7,075 Posts
                    I will update. Also, I will definitely sort this out. This plugin is at the top of my favorites list, and I am going to depend on it heavily, since my clients insist on having WYSIWYG and I insist on having pages that validate and are semantically sound.

                    What about the donation page that I asked about? I definitely want to drop a few bucks for your project (and I guess I am not, and won’t be, the only one).
                      • 1341
                      • 20 Posts
                      I insist on having pages that validate and are semantically sound

                      Well, I can’t guarantee semantics ("Italics?" says the program. "Well, he actually meant to denote a book title..."). :-P But if the WYSIWYG editor is done right, you can get pretty darn close.

                      What about the donation page that I asked about? I definitely want to drop a few bucks for your project (and I guess I am not, and won’t be, the only one).

                      I’m a minor, so I can’t legally hold a PayPal account (or the like). You can help out by spreading the word to other people about the library. I’d value that a lot more. :-D