    • 19090
    • 35 Posts
    Quote from: rfoster at May 26, 2009, 07:55 PM

    I have modified Revolution to allow UTF-8 aliases, and it worked fine. I expect that it could work on Evolution also. Almost anything is OK in a URL, except for a handful of reserved characters and a few that might cause a problem. These are:
    & = + % # < > ~ ` " ' @ ? [ ] { } | ^ (See http://www.faqs.org/rfcs/rfc1738.html)

    Of course, having the actual Russian title in the alias helps with SEO...so it is worth doing.

    Hi Rfoster!
    Do you have the actual code that needs to be modified to allow UTF-8 aliases in Evolution?
    Best Regards
    Vlad
    • Guys, FYI, work has been done to make transliteration optional (via a plugin) in the future but it is not yet committed to Evo or Revo. This will allow you to optionally apply custom transliteration for your specific needs, or simply use the basic rules to strip/replace characters based on the RFC RFoster posted below.

      Quote from: rfoster at May 26, 2009, 07:55 PM

      I have modified Revolution to allow UTF-8 aliases, and it worked fine.
      Have you provided a bug report or patch for this anywhere? Would love to incorporate it...
        • 26704
        • 115 Posts
        2memeko,

        For example, if I want to set the alias "health" in Russian, I can write it in translit, which means using Latin characters for the Russian word. So it would be "zdorove" ("здоровье" in Russian). Most Russian users will understand this word.
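
A minimal sketch of the transliteration idea described above, assuming a hand-written mapping table. The function name `transliterateRu` and the table itself are hypothetical, cover only lowercase letters, follow one of several possible romanization conventions, and are not part of MODx.

```php
<?php
// Hypothetical helper, not part of MODx: map lowercase Russian letters to Latin ones.
function transliterateRu($text) {
    $map = array(
        'а' => 'a',  'б' => 'b',    'в' => 'v',  'г' => 'g', 'д' => 'd',
        'е' => 'e',  'ё' => 'e',    'ж' => 'zh', 'з' => 'z', 'и' => 'i',
        'й' => 'j',  'к' => 'k',    'л' => 'l',  'м' => 'm', 'н' => 'n',
        'о' => 'o',  'п' => 'p',    'р' => 'r',  'с' => 's', 'т' => 't',
        'у' => 'u',  'ф' => 'f',    'х' => 'h',  'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'shch', 'ъ' => '',   'ы' => 'y', 'ь' => '',
        'э' => 'e',  'ю' => 'yu',   'я' => 'ya',
    );
    // strtr() with an array replaces each key by its value in a single pass
    return strtr($text, $map);
}

echo transliterateRu('здоровье'), "\n"; // zdorove
```

The soft sign is simply dropped here, which is why the result matches the "zdorove" spelling in the post.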

        2OpenGeek,

        Wow! This is the first time I have seen an official MODx developer here (in this part of the forum, I mean). I want to express my great appreciation for the fact that you develop this system while allowing everyone to use it for free. Although MODx is not yet very popular among web developers in Russia, I believe that one day it will become the leader among CMSs. The whole Russian community is very much looking forward to the release of the two new versions. I wish you success in all your endeavors.
          • 24935
          • 160 Posts
          It is Bug #717.

          I will try to improve the code a little and post here soon.
            • 19090
            • 35 Posts
            Thanks for sharing it!
              • 24935
              • 160 Posts
              OK, here is the new and improved version. I have tested it fairly thoroughly, and it seems to work very well.

              I wanted this to:
              1) Work well for most people without any extra configuration.
              2) Allow essentially everything that is legal, as long as it doesn’t cause a problem for browsers (or users).
              3) Allow customization via system settings and lexicon.

              The code below can replace the cleanAlias function in /core/model/modx/modresource.class.php.
              This is Revo code, but it could be modified to replace the stripAlias function in manager/processors/save_content.processor.php for MODx version 0.9.6.
                  function cleanAlias($alias) {
                      global $modx;
              
                      $charset = $this->xpdo->getOption('modx_charset',null,'UTF-8'); // determine the charset
                      $charset = !empty($charset) ? strtoupper($charset) : 'UTF-8';
              
                      $alias = html_entity_decode($alias, ENT_QUOTES, $charset); // convert html entity codes into normal text
                      $alias = strip_tags($alias); // remove html
              
                      // find the value to replace '&'
                      $modx->lexicon->load('default');
                      $and = $modx->lexicon('and');
                      $alias = str_replace('&',$and,$alias);
              
                      $alias = str_replace(html_entity_decode('&nbsp;', ENT_QUOTES, $charset),' ',$alias); // replace non-breaking spaces with normal spaces
              
                      // let user preserve uppercase (useful for CamelCaseURLs)
                      if($this->xpdo->getOption('alias_allow_uppercase',null,0) != 1 ) {
                          $alias = mb_convert_case($alias, MB_CASE_LOWER, $charset); // convert to lowercase
                      }
              
                      $unsafechars = '/[\0\x0B\t\n\r\f\a&=+%#<>"~`@\?\[\]\{\}\|\^\'\\\\]/'; // pattern to match reserved and unsafe chars
                      $alias = preg_replace($unsafechars, '', $alias); // clean the alias
                      $alias = trim($alias);
              
                      $separator = $this->xpdo->getOption('alias_word_separator',null,'-'); // let user use a special separator or no separator
                      $separator = preg_replace($unsafechars, '', $separator); // clean the separator
                      $alias = preg_replace('/\s+/', $separator, $alias); // replace whitespace with the separator
              
                      if ($this->xpdo->getOption('alias_allow_punctuation',null,0) != 1) {
                          $alias= preg_replace('/[;:!\,\.\/\(\)\*]/', '', $alias); // remove common punctuation including . and /
                      }
              
                      // collapse common separators (yes, a space is allowed as a separator if someone really wants to use it)
                      $alias = preg_replace('|-+|', '-', $alias);
                      $alias = preg_replace('|_+|', '_', $alias);
                      $alias = preg_replace('| +|', ' ', $alias);
              
                      // collapse characters that could cause directory problems (while still allowing pseudo folders)
                      $alias = preg_replace('|\/+|', '/', $alias);
                      $alias = preg_replace('|\.+|', '.', $alias);  // don't allow ..
                      $alias = preg_replace('/\.\//', '.', $alias); // don't allow ./
              
                      // clean up the beginning and end of the alias, also removing path chars from beginning and end
                      $alias = trim($alias, ' -/.');
              
                      return $alias;
                  }
              


              This should work for 99.99% of users, works well without any extra configuration, and is highly customizable via system settings and the lexicon. Please respond to this post if you find any bugs.
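
To make the default behavior concrete, here is a standalone sketch of the main steps that runs without MODx. The function name `cleanAliasSketch` is hypothetical, and the sketch hard-codes what the method above reads from system settings and the lexicon (lowercase on, '-' as separator, punctuation stripped, 'and' for '&'). It is an illustration under those assumptions, not a replacement for the method.

```php
<?php
// Hypothetical standalone sketch of the main cleanAlias steps with hard-coded defaults.
function cleanAliasSketch($alias) {
    $alias = html_entity_decode($alias, ENT_QUOTES, 'UTF-8'); // entities to plain text
    $alias = strip_tags($alias);                              // drop HTML tags
    $alias = str_replace('&', 'and', $alias);                 // the original takes this word from the lexicon
    // lowercase; without mbstring, only ASCII letters are lowercased
    $alias = function_exists('mb_convert_case')
        ? mb_convert_case($alias, MB_CASE_LOWER, 'UTF-8')
        : strtr($alias, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
    // remove reserved and unsafe characters, then surrounding whitespace
    $unsafechars = '/[\0\x0B\t\n\r\f\a&=+%#<>"~`@\?\[\]\{\}\|\^\'\\\\]/';
    $alias = trim(preg_replace($unsafechars, '', $alias));
    $alias = preg_replace('/\s+/', '-', $alias);              // whitespace to separator
    $alias = preg_replace('/[;:!\,\.\/\(\)\*]/', '', $alias); // common punctuation
    $alias = preg_replace('|-+|', '-', $alias);               // collapse repeated dashes
    return trim($alias, ' -/.');
}

echo cleanAliasSketch('Tom &amp; Jerry: The Movie!'), "\n"; // tom-and-jerry-the-movie
echo cleanAliasSketch('здоровье и спорт'), "\n";            // здоровье-и-спорт
```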

              There are a few things that would be nice to add to this, such as replacing various Unicode dashes with a regular dash.
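
One possible way to handle the dash replacement mentioned above, sketched under the assumption that the alias is valid UTF-8. The helper name `normalizeDashes` is hypothetical. It relies on the PCRE Unicode property `\p{Pd}` (dash punctuation), which needs the `/u` modifier.

```php
<?php
// Hypothetical helper: collapse any run of Unicode dash punctuation into one ASCII hyphen.
function normalizeDashes($alias) {
    // \p{Pd} matches the hyphen-minus, en dash, em dash and other dash punctuation
    $result = preg_replace('/\p{Pd}+/u', '-', $alias);
    // preg_replace() returns null if the input is not valid UTF-8; keep the input in that case
    return $result === null ? $alias : $result;
}

// "\xE2\x80\x93" is an en dash and "\xE2\x80\x94" an em dash, written as UTF-8 bytes
echo normalizeDashes("2008\xE2\x80\x932009 \xE2\x80\x94 report"), "\n"; // 2008-2009 - report
```

Because the pattern also matches the plain hyphen, a call like this could stand in for the `-+` collapsing step when the separator is a dash.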

              This is really quite awesome. It even works for RTL languages!
              • This is a nice solution, but I’d like to get this functionality at least partially delegated to an event to which we can attach plugins for custom transliteration. This will be in line with the solution that is coming in the Evolution release (1.0.0). I also do not want direct dependencies on the mb_ functions in the core code, as these are not always available, despite the fact that they probably should be...
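
A rough sketch of how the mb_ dependency could be made optional, assuming a fallback that lowercases only ASCII letters is acceptable when mbstring is missing. The helper name `lowercaseAlias` is hypothetical and is not how either MODx branch handles this.

```php
<?php
// Hypothetical helper: lowercase an alias without a hard dependency on mbstring.
function lowercaseAlias($alias, $charset = 'UTF-8') {
    if (function_exists('mb_convert_case')) {
        return mb_convert_case($alias, MB_CASE_LOWER, $charset);
    }
    // Fallback: map only ASCII A-Z byte by byte, so multibyte characters stay intact
    return strtr($alias, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

echo lowercaseAlias('CamelCaseURLs'), "\n"; // camelcaseurls
```

The fallback uses an explicit ASCII table rather than `strtolower()`, since older PHP versions make `strtolower()` locale-dependent and it can alter bytes inside multibyte characters.
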
                  • 24935
                  • 160 Posts
                  Yes, an event would be very nice for all those who want to transliterate or do any other custom processing.