    • 28338
    • 46 Posts
    I have some problems with AjaxSearch and international characters. It seems modx stores the characters as they are in the title of the pages but as HTML entities in the content, thus AjaxSearch only finds matches in the titles of documents. Is there any way to make AjaxSearch search for the entities instead?

    Another problem is when AjaxSearch shows the excerpt from the page where it found a match. It sometimes cuts off in the middle of an HTML entity. It would be better if it cut between words than in the middle of them.
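
One way to picture the excerpt problem described above is the following minimal Python sketch. It is not AjaxSearch's code (the snippet itself is PHP); `safe_excerpt` and the `ENTITY` pattern are hypothetical names used only for illustration. The idea is to move the cut point back if it lands inside an entity, then back up to the previous space so that no word is split.

```python
import re

# Matches named (&eacute;), decimal (&#233;) and hex (&#xE9;) entities.
ENTITY = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def safe_excerpt(text, limit, ellipsis="..."):
    """Cut text to at most `limit` characters without splitting an entity or a word."""
    if len(text) <= limit:
        return text

    cut = limit
    # If the cut point falls inside an entity, move it back to the entity start.
    for match in ENTITY.finditer(text):
        if match.start() < cut < match.end():
            cut = match.start()
            break
        if match.start() >= cut:
            break

    head = text[:cut]
    # If the cut point is in the middle of a word, back up to the previous space.
    if not text[cut].isspace():
        space = head.rfind(" ")
        if space > 0:
            head = head[:space]
    return head.rstrip() + ellipsis


sample = "Bonne ann&eacute;e &agrave; tous les voyageurs"
print(safe_excerpt(sample, 12))
print(safe_excerpt(sample, 20))
```

A naive `sample[:12]` would end in `ann&ea`, which is the kind of broken excerpt reported here.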

    I would be very happy if you could fix those problems, as they render the snippet pretty much useless to me :(
      • 5811
      • 1,717 Posts
      I have some problems with AjaxSearch and international characters. It seems modx stores the characters as they are in the title of the pages but as HTML entities in the content, thus AjaxSearch only finds matches in the titles of documents. Is there any way to make AjaxSearch search for the entities instead?

      MODX probably stores the characters as HTML entities in the content because you use TinyMCE with the default value of the entity encoding parameter ("named"). To store international characters as they are written, set the entity encoding parameter (in the configuration folder) to "raw" instead of "named". The text will then be stored with the actual characters instead of HTML entities.

      With the TinyMCE entity encoding parameter set to "raw", the word "année" ("year" in French) will be stored as "année" instead of "ann&eacute;e". So when you search for "année" with AjaxSearch, the word "année" can be found.
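
To illustrate why the stored form matters, here is a small Python sketch (not AjaxSearch code; the function names are made up for this example). A plain substring search for "année" cannot match content stored with named entities, but it does match content stored raw. Decoding entities before comparing is shown only as a possible workaround, not as something AjaxSearch is known to do.

```python
import html

stored_named = "Bonne ann&eacute;e"  # what "named" entity encoding would store
stored_raw = "Bonne année"           # what "raw" entity encoding would store


def naive_search(term, content):
    """Plain case-insensitive substring search on the stored text."""
    return term.lower() in content.lower()


def entity_aware_search(term, content):
    """Decode HTML entities first, then search (illustrative workaround)."""
    return term.lower() in html.unescape(content).lower()


print(naive_search("année", stored_named))         # no match on the entity form
print(naive_search("année", stored_raw))           # match on the raw form
print(entity_aware_search("année", stored_named))  # match after decoding
```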


      Another problem is when AjaxSearch shows the excerpt from the page where it found a match. It sometimes cuts off in the middle of an HTML entity. It would be better if it cut between words than in the middle of them.


      Avoid HTML entities in the content (with the appropriate TinyMCE configuration) and you will solve the problem :)

      Look at http://wiki.modxcms.com/index.php/TinyMCE for tinyMCE configuration.
        • 28338
        • 46 Posts
        I’ve set TinyMCE’s Entity Encoding to "raw" now, and that bit seems to work - when I check the raw HTML in TinyMCE it shows the characters instead of entities. But searching with AjaxSearch still doesn’t work. Any ideas?
        • What has already been saved into the database is still as it was, with the characters converted to entities. Stopping the conversion from happening any more didn’t change what was already done.

          I’m looking at the same issue; I’ll be some time re-loading all the content with the proper characters.
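
          As a rough idea of how already-saved content could be converted in bulk rather than re-loaded by hand, here is a Python sketch. It assumes the content has been exported as plain strings; how to read and write the MODX tables is deliberately left out. `decode_text_entities` and the `KEEP` set are hypothetical names. Entities that carry markup meaning (such as `&amp;` and `&lt;`) are left alone so the HTML stays valid.

```python
import re
from html.entities import name2codepoint

# Entities that must stay encoded because decoding them would change the markup.
KEEP = {"amp", "lt", "gt", "quot", "apos", "nbsp"}
NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_text_entities(html_text):
    """Replace named character entities (e.g. &eacute;) with the real character."""
    def replace(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # leave unknown or markup-significant entities
        return chr(name2codepoint[name])

    return NAMED.sub(replace, html_text)


before = "<p>Bonne ann&eacute;e &amp; sant&eacute; &lt;3</p>"
print(decode_text_entities(before))
```

          Anything like this should be tried on a backup copy first.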
            • 7231
            • 4,205 Posts
            I am glad I stumbled on this information; I am in the same boat and did not know it.

            A behavior that I would like in AS: if I search for "année", then "annee" would also be returned, and the other way around, a search for "annee" would return "année" in the results as well. Anyone who uses a language with accents will know that they are often overlooked in searches, which leads to inaccurate results.

            I have no idea if this is at all possible, but it would be nice :D
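
            For what it is worth, the general technique behind this kind of accent-insensitive matching can be sketched in a few lines of Python. This is only an illustration of the idea, not AjaxSearch code, and `fold_accents` and `accent_insensitive_match` are hypothetical names: both the search term and the content are decomposed with Unicode normalization, the combining accent marks are dropped, and the results are compared.

```python
import unicodedata


def fold_accents(text):
    """Return text lowercased with accents removed (année -> annee)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_match(term, content):
    """True if term occurs in content once accents and case are ignored."""
    return fold_accents(term) in fold_accents(content)


print(accent_insensitive_match("annee", "Bonne année"))
print(accent_insensitive_match("année", "Bonne annee"))
```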
              • 5811
              • 1,717 Posts
              hi dev_cw

              A behavior that I would like in AS: if I search for "année", then "annee" would also be returned, and the other way around, a search for "annee" would return "année" in the results as well. Anyone who uses a language with accents will know that they are often overlooked in searches, which leads to inaccurate results.

              Your request is legitimate, and AjaxSearch already partially works like this ;)

              For example, try "année" and "annee" on my site (feedback on travel books; the site is encoded in UTF-8). It is in French, but you can see the differences in the search output.
              With "année" as the search string, the output shows the titles of documents and the content where the search string is found.
              With "annee" as the search string, the output shows only the titles of documents (even if "annee" is not in the title).

              I will try to understand this behaviour and improve it if possible. But I'm not sure that I can do better.
                • 7231
                • 4,205 Posts
                With "annee" as the search string, the output shows only the titles of documents (even if "annee" is not in the title).
                That's right. I had not noticed this difference, but now that you mention it I do see it.

                I will try to understand this behaviour and improve it if possible. But I'm not sure that I can do better.
                Very much appreciated. I would offer to help, but I took a look at the code and it is a bit beyond my ability at the moment.
                  • 28338
                  • 46 Posts
                  Quote from: sottwell at Nov 24, 2007, 12:06 PM

                  What has already been saved into the database is still as it was, with the characters converted to entities. Stopping the conversion from happening any more didn’t change what was already done.

                  I’m looking at the same issue; I’ll be some time re-loading all the content with the proper characters.

                  I’m not sure it behaves that way. When I check the database, it seems that TinyMCE translated all the entities back to characters when I edited the page. In any case, there are no entities to be found in the database. But AjaxSearch still won’t find anything with accented characters in it.
                    • 5811
                    • 1,717 Posts
                    Hi staed,

                    Could you give your site's page character set and the $database_connection_charset variable defined in your config file?

                    Could you check that:
                    - TinyMCE is configured with the "raw" value for the entity encoding parameter
                    and then
                    - create a new document whose content includes accented characters (e.g. "année" or anything else in your own language). The document should be configured as "searchable" (see the second document tab: page settings).
                    In TinyMCE, look at the HTML source code through the HTML view and check that "année" is stored as "année", not as "ann&eacute;e".
                    - go back to your site, clear the previously cached pages in your web browser, and search for "année" on your website

                    Let me know the results.
                      • 28338
                      • 46 Posts
                      Quote from: coroico at Nov 24, 2007, 03:08 PM

                      Hi staed,

                      Could you give your site's page character set and the $database_connection_charset variable defined in your config file?

                      The character set that’s configurable in the manager is utf-8 and $database_connection_charset is latin1, so I didn’t bother checking the other stuff you wrote. Is there any way to convert the database from latin1 to utf8? I’ve got another site with 100+ pages and a ton of TVs that has the same problem, and I don’t really fancy the idea of converting it manually...
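
                      To show what a utf-8 / latin1 mismatch like the one above can do to accented text, here is a small Python sketch. It is only an illustration under one assumption: that UTF-8 bytes ended up being read as latin1 somewhere along the way. Whether the database in this thread is actually in that state is not known, and `simulate_mismatch` and `repair_mismatch` are hypothetical helper names.

```python
def simulate_mismatch(text):
    """UTF-8 bytes read back as latin1, as a mismatched connection might do."""
    return text.encode("utf-8").decode("latin-1")


def repair_mismatch(garbled):
    """Reverse the mismatch; return the input unchanged if it was not garbled."""
    try:
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


garbled = simulate_mismatch("année")
print(garbled)                   # the two-byte "é" shows up as two characters
print(repair_mismatch(garbled))  # back to the original word
```

                      A real conversion would normally be done in the database itself and tested on a backup first. The sketch only shows why a search for "année" would not match text stored in the garbled form.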