AjaxSearch and accented characters

46 Posts

staed Reply #11, 16 years, 4 months ago

Converted the database from latin1 to utf8, and now it works!

I still think that it would be better if AjaxSearch searched for entities as well as the raw characters. Should I file a bug report or is my request noted?
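
The thread does not say exactly what was garbled before the conversion. A common failure mode when a latin1 layer sits between UTF-8 content and the search is sketched below. The helper names `garble` and `repair` are hypothetical and have nothing to do with AjaxSearch or MODX; the sketch only illustrates why a search for "année" can miss text that looks correct elsewhere.

```python
def garble(text: str) -> str:
    """Simulate UTF-8 bytes being read back as latin1 (assumed failure mode)."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the mis-decoding, provided nothing else has altered the bytes."""
    return text.encode("latin-1").decode("utf-8")


word = "ann\u00e9e"          # "année"
stored = garble(word)        # what a mismatched charset layer would hand back

# The raw search word no longer occurs in the garbled text.
found_before = word in stored
found_after = word in repair(stored)
```

If this assumption holds, `found_before` is false and `found_after` is true, which would be consistent with the search working once the database was converted to utf8.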

☆ A M B ☆
24,524 Posts

sottwell Reply #12, 16 years, 4 months ago

That would be a little difficult. It would have to search for every entity possible in every word. I’m not so sure that’s even possible; it would certainly slow a search down a lot.
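
For illustration only, the opposite direction is cheaper: instead of expanding the search word into every possible entity, the stored text can be decoded once and then compared with the raw search word. This is a minimal sketch using Python's standard `html.unescape`, not a description of how AjaxSearch works; `decode_entities` and `contains` are hypothetical names.

```python
import html


def decode_entities(stored: str) -> str:
    """Turn named and numeric entities back into raw characters."""
    return html.unescape(stored)


def contains(query: str, stored: str) -> bool:
    """Match a raw search word against content that may hold entities."""
    return query in decode_entities(stored)


raw_query = "ann\u00e9e"                       # "année"
entity_content = "Bonne ann&eacute;e !"        # as an editor might store it
numeric_content = "Bonne ann&#233;e !"         # numeric form of the same entity
```

Decoding on every search would still cost time on a large site, so in practice the decoded text would have to be stored or indexed ahead of time.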

Studying MODX in the desert - http://sottwell.com
Tips and Tricks from the MODX Forums and Slack Channels - http://modxcookbook.com
Join the Slack Community - http://modx.org

1,717 Posts

coroico Reply #13, 16 years, 4 months ago

So, to summarize the requests:

With "année" as the search string, it would be useful for AjaxSearch to retrieve:
1/ année
2/ annee
3/ ann&eacute;e

in the content of documents.

1/ ok

2/ the main problem is to interpret the search word as a word with one or several accented characters and then replace these characters with their unaccented equivalents. For French, Spanish, Italian and Portuguese, I know that each accented character has an unaccented equivalent. Even if the meaning of the word changes, it is still possible to find the equivalent character. (In French, for example, "mais" means "but" and "maïs" means "corn". :D)
But is this true for other languages? I am not sure, for example, that in Cyrillic all the accented characters have an unaccented equivalent that is used and understood by people.

3/ like Sottwell, I think it would be time consuming and inefficient. I think it’s better to store the document contents in a "raw" format.

With "annee" as the search string, AjaxSearch should retrieve:
1/ année
2/ annee
3/ ann&eacute;e

in the content of documents.

1/ unfortunately, without a dictionary (in the appropriate language) and word recognition, I think it’s not so easy to detect the equivalent accented word.
2/ ok

3/ as for 1/, I think it’s not possible without a dictionary and context analysis.
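
The accent replacement described above can be sketched with Unicode decomposition. This is only an illustration in Python, not AjaxSearch code, and `fold_accents` and `matches` are hypothetical names. Folding both the search word and the content sidesteps the dictionary problem for "annee" versus "année", at the price of also matching "mais" with "maïs".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFD), drop combining marks, and lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, content: str) -> bool:
    """Accent-insensitive substring match: both sides are folded first."""
    return fold_accents(query) in fold_accents(content)


# Works in both directions, because both strings are reduced to "annee".
accented_query_hits_plain = matches("ann\u00e9e", "Bonne annee")
plain_query_hits_accented = matches("annee", "Bonne ann\u00e9e")

# The Cyrillic caveat: "й" (U+0439) folds to "и" (U+0438), a different letter.
cyrillic_folded = fold_accents("\u0439")
```

The Cyrillic case shows why blind folding is not safe for every language: the result is a valid but distinct letter, so any such folding would probably need to be configurable per language.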

MODX Staff
10,725 Posts

opengeek Reply #14, 16 years, 4 months ago

I would have to agree; lexical analysis for search and retrieval of information, especially combined with internationalization requirements, is one of the toughest challenges facing web developers. I’m in the process of integrating a Solr search server (a subproject of Lucene) with a client project now, and just deciding how to process and store the content in the indexes so I can configure the index schema requires some expertise on the subject. Luckily, my project is all English for now; otherwise, I’d have to learn how to configure it with various language-specific word-stem analyzers. Ouch! My brain hurts.

Jason Coward
Chief Architect @ MODX
http://www.jasoncoward.com | http://twitter.com/drumshaman | https://github.com/opengeek

46 Posts

staed Reply #15, 16 years, 4 months ago

Ok, I’ll stop making impossible requests now. ;)

But if it is not possible (or feasible) to change the behaviour of AjaxSearch, doesn’t it make sense for TinyMCE to store accented characters in raw format by default? I mean, I’m certainly not the only one that starts creating a site and saves the search feature for last?

Converting characters (and tables and databases, in my case) is not the best way to spend one’s time.

If TinyMCE stored the characters in raw format by default I would probably have noticed if the characters became garbled.

Or would that mess things up for the English-speaking majority?

☆ A M B ☆
24,524 Posts

sottwell Reply #16, 16 years, 4 months ago

It is not difficult to change its behavior (simply edit the TinyMCE plugin, go to the Properties tab, and set it to "raw"), but considering how many non-English and multilanguage sites are being developed it might be better to have it set to "raw" by default.

1,717 Posts

coroico Reply #17, 16 years, 4 months ago

I agree with your suggestion, Sottwell. From my point of view, the default behaviour should be to not transform the text captured by TinyMCE.
How could we ask for that? And what would the impact of this change be for those who use the current behaviour?
In any case, the behaviour of TinyMCE should be highlighted to avoid trouble.

4,205 Posts

dev_cw Reply #18, 16 years, 4 months ago

It is easy to set, but it may not be easy to find where to set it. I looked for a long time in the site settings section (where I would expect to find it, along with the other editor settings); only after much hair pulling did I remember that there were additional settings in the plugin section. So it is an easy thing to change, but not very obvious, especially for newcomers.

Shane Sponagle | [wiki] Snippet Call Anatomy | MODx Developer Blog | [nettuts] Working With a Content Management Framework: MODx

Something is happening here, but you don't know what it is.
Do you, Mr. Jones? - [bob dylan]

147 Posts

e-stonia Reply #19, 13 years, 7 months ago

Hi everybody,

I see that you have found a solution for everybody, but I have the same problem here: http://goeschke.at/suche.html (try searching for "aber", for example).

Evo 1.0.4 & AS 1.9.0
MySQL charset: UTF-8 Unicode (from beginning)
MySQL connection collation: utf8_unicode_ci (from beginning)
MODx character encoding: UTF-8 (from beginning)
TinyMCE Entity Encoding: raw (it was "named" before; after changing to raw I created a new document and checked that the setting works, but the AS results are still garbage)

So it seems everything is set up correctly, but the issue remains. I probably need to change servers, since special characters work in AS on another server, but I am still hoping I won’t have to transfer about 20 email accounts, with all their old webmail messages, to a new server. :S

Cheers,
Kaspar

e-Stonia Web Agency - Freelance web design, ecommerce development and search engine marketing.

147 Posts

e-stonia Reply #20, 13 years, 7 months ago

Changed the server and everything is fine, but I am still wondering what server configuration could have caused it.
