  • Converted the database from latin1 to utf8, and now it works!

    I still think that it would be better if AjaxSearch searched for entities as well as the raw characters. Should I file a bug report or is my request noted?
    • That would be a little difficult. It would have to search for every entity possible in every word. I’m not so sure that’s even possible; it would certainly slow a search down a lot.
        Studying MODX in the desert - http://sottwell.com
        Tips and Tricks from the MODX Forums and Slack Channels - http://modxcookbook.com
        Join the Slack Community - http://modx.org
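One way to avoid searching for every possible entity is to decode entities once, before matching, so that the search only ever compares raw characters. The following is a minimal sketch of that idea in Python (AjaxSearch itself is PHP); the function name `to_raw` is hypothetical, and this is not a description of how AjaxSearch works internally.

```python
import html

def to_raw(text: str) -> str:
    """Decode HTML entities (named or numeric) into raw characters."""
    return html.unescape(text)

stored = "Bonne ann&eacute;e &#224; tous"   # content saved with entity encoding
query = "année"

print(query in stored)          # False: the raw query does not match the entity form
print(query in to_raw(stored))  # True: after decoding, it does
```

Decoding at indexing time rather than at query time would keep the cost out of the search itself, assuming the content can be reprocessed when it is saved.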
      • So, to summarize the requests:

        With "année" as the search string, it would be useful for AjaxSearch to retrieve:
        1/ année
        2/ annee
        3/ ann&eacute;e

        in the content of documents.

        1/ OK :)

        2/ The main problem is to interpret the search word as a word with one or more accented characters and then replace each of them with its unaccented equivalent. For French, Spanish, Italian and Portuguese, I know that each accented character has an unaccented equivalent. Even if the meaning of the word changes, the equivalent character can still be found. (In French, for example, "mais" means "but" and "maïs" means "corn" :D)
        But is that true for other languages? I am not sure, for example, that in Cyrillic every accented character has an unaccented equivalent that people use and understand.

        3/ Like Sottwell, I think it would be time-consuming and inefficient. I think it’s better to store the document content in "raw" format.

        With "annee" as the search string, AjaxSearch should retrieve:
        1/ année
        2/ annee
        3/ ann&eacute;e

        in the content of documents.

        1/ Unfortunately, without a dictionary (in the appropriate language) and word recognition, I think it’s not so easy to detect the equivalent accented word.
        2/ OK :)
        3/ As with 1/, I think it’s not possible without a dictionary and context analysis.
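A possible workaround for the accented/unaccented cases that does not need a dictionary is to fold accents on both the query and the content before comparing them. The sketch below is in Python rather than AjaxSearch's PHP, the function names `fold_accents` and `matches` are hypothetical, and it assumes the content is already stored as raw Unicode text.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Case-fold and strip combining marks, e.g. 'Année' becomes 'annee'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, content: str) -> bool:
    # Fold both sides so accented and unaccented spellings compare equal.
    return fold_accents(query) in fold_accents(content)

print(matches("annee", "Bonne année"))   # True
print(matches("année", "Bonne annee"))   # True
print(matches("mais", "du maïs"))        # True: folding cannot tell "mais" from "maïs"
```

The last line shows the trade-off mentioned above: words that differ only by an accent become indistinguishable. The concern about other languages also holds with this approach, since the Cyrillic "й" folds to "и", which is a different letter.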
        • I would have to agree; lexical analysis for search and retrieval of information, especially combined with internationalization requirements, is one of the toughest challenges facing web developers. I’m in the process of integrating a Solr search server (a subproject of Lucene) with a client project now, and just deciding how to process and store the content in the indexes so I can configure the index schema requires some expertise on the subject. Luckily, my project is all English for now; otherwise I’d have to learn how to configure it with various language-specific word-stem analyzers. Ouch! My brain hurts.
          • OK, I’ll stop making impossible requests now. ;)

            But if it is not possible (or feasible) to change the behaviour of AjaxSearch, doesn’t it make sense for TinyMCE to store accented characters in raw format by default? I mean, I’m certainly not the only one that starts creating a site and saves the search feature for last?

            Converting characters (and tables and databases in my case) is not the best way to spend one’s time :) If TinyMCE stored the characters in raw format by default, I would probably have noticed if the characters became garbled.

            Or would that mess things up for the English-speaking majority?
            • It is not difficult to change its behavior (simply edit the TinyMCE plugin, go to the Properties tab, and set it to "raw"), but considering how many non-English and multilanguage sites are being developed it might be better to have it set to "raw" by default.
              • I agree with your suggestion, Sottwell. From my point of view, the default behaviour should be to leave the text captured by TinyMCE untransformed.
                How could we ask for that? And what would the impact of this change be for those who rely on the current behaviour?
                In any case, the behaviour of TinyMCE should be highlighted to avoid trouble.
                • It is easy to set, but it may not be easy to find where to set it. I looked for a long time in the site settings section (where I would expect to find it along with the other editor settings); only after much hair-pulling did I remember that there were additional settings in the plugin section. So it is an easy thing to change, but not very obvious, especially for newcomers.
                    Shane Sponagle | [wiki] Snippet Call Anatomy | MODx Developer Blog | [nettuts] Working With a Content Management Framework: MODx

                    Something is happening here, but you don't know what it is.
                    Do you, Mr. Jones? - [bob dylan]
                  • Hi everybody,

                    I see that you have found a solution that works for everybody, but I have the same problem here: http://goeschke.at/suche.html (try searching for "aber", for example).

                    Evo 1.0.4 & AS 1.9.0
                    MySQL charset: UTF-8 Unicode (from beginning)
                    MySQL connection collation: utf8_unicode_ci (from beginning)
                    MODx character encoding: UTF-8 (from beginning)
                    TinyMCE Entity Encoding: raw (was "named"; I created a new document after changing it to raw and checked that it works, but the AS results are still garbled)

                    So it seems everything is set up correctly, but the issue remains. I probably need to change servers, since special characters work in AS on another server, but I’m still hoping I don’t have to transfer about 20 email accounts, with all their old webmail, to a new server :S

                    Cheers,
                    Kaspar
                    • Changed the server and everything is fine now, but I’m still wondering what server configuration could have caused it.
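The thread does not identify the cause. One common source of garbled accented characters, offered here only as a guess and not as a diagnosis of this server, is some layer (database connection, PHP, or web server) reading UTF-8 bytes as latin1. A small Python illustration of what that symptom looks like:

```python
raw = "année"

# UTF-8 bytes misread as latin1 give the typical two-character garbage.
garbled = raw.encode("utf-8").decode("latin-1")
print(garbled)  # annÃ©e

# If nothing else altered the bytes, the misreading can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == raw)  # True
```

If search results show sequences like "Ã©" where "é" is expected, a charset mismatch of this kind is a plausible place to start looking.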