    • 30343
    • 8 Posts
    Hi all,
    I am thinking about using MODx for a medium-sized project and therefore did some performance tests.
    I populated the modx_site_content table with a lot of entries, and the results suggest the system scales quite badly for sites with many pages.

    I haven’t looked at the code in detail yet, but this looks like a conceptual problem?
    E.g. why is it necessary to load the page aliases, document ids and document relations of every existing page in the system on every page request?
    Is MODx just not designed to handle larger sites, or is there something I am doing wrong or could change to get acceptable performance?

    Any ideas, experiences or suggestions?


    Here are some figures from a first quick test:

    System:
    Linux Debian Sarge
    PHP5 5.1.4
    MySQL 4.1
    Apache 2
    default install of MODx 0.9.2.1


    modx_site_content, 20 Rows, 59.2 KB
    siteCache.idx.php 143.0 KB
    PHP memory allocated: 1.3 MB 
    MySQL: 0.0171 s, 12 request(s), PHP: 0.2073 s, total: 0.2244 s, document retrieved from database.
    
    modx_site_content, 50 Rows, 110.5 KB
    siteCache.idx.php 148.8 KB
    PHP memory allocated: 1.4 MB
    MySQL: 0.0187 s, 12 request(s), PHP: 0.2449 s, total: 0.1595 s, document retrieved from database.
    
    modx_site_content, 500 Rows, 1.2 MB
    siteCache.idx.php  235.8 KB
    PHP memory allocated: 2.1 MB
    MySQL: 0.0618 s, 12 request(s), PHP: 0.2310 s, total: 0.2928 s, document retrieved from database.
    
    modx_site_content, 5000 Rows, 4.3 MB
    siteCache.idx.php 1.1 MB
    PHP memory allocated: 10.2 MB
    MySQL: 0.3139 s, 12 request(s), PHP: 0.7982 s, total: 1.1121 s, document retrieved from database.
    
    modx_site_content, 10000 Rows, 8.7 MB
    siteCache.idx.php 2.1 MB
    PHP memory allocated: 19.1 MB
    MySQL: 0.6684 s, 12 request(s), PHP: 1.9220 s, total: 2.5904 s, document retrieved from database.
    
    modx_site_content, 15000 Rows, 13.2 MB
    siteCache.idx.php 2.9 MB
    PHP memory allocated: 27.8 MB
    MySQL: 0.9567 s, 12 request(s), PHP: 2.4612 s, total: 3.4180 s, document retrieved from database.
    
    modx_site_content, 20000 Rows, 17.6 MB
    siteCache.idx.php 3.9 MB
    PHP memory allocated: 38.9 MB
    MySQL: 1.5259 s, 12 request(s), PHP: 2.8582 s, total: 4.3841 s, document retrieved from database.



    Regards,
    Tom
      • 32963
      • 1,732 Posts
      Many thanks for sharing your findings with us.

      We are currently aware of some of the limitations when it comes to very large sites. Work is currently being done to make the system perform better on such websites.

      Some users currently have the system running with over 2000 pages and 100+ users. The trick IMO is to tune your MODx site by using document caching, among other things mentioned in the forums.

        xWisdom
        www.xwisdomhtml.com
        The fear of the Lord is the beginning of wisdom:
        MODx Co-Founder - Create and do more with less.
        • 30343
        • 8 Posts
        Hi xwisdom,

        The trick IMO is to tune your MODx site by using document caching, among other things mentioned in the forums.

        Document caching (of MODx) won’t help much if you look at the figures. The time PHP alone takes is already too much.

        Regards,
        Tom
          • 32241
          • 1,495 Posts
          Quote from: tomtom at Jun 28, 2006, 08:18 AM

          Document caching (of MODx) won’t help much if you look at the figures. The time PHP alone takes is already too much.

          I’m totally interested in discussing this further with you.
          Can you explain a little bit more about how you got all those numbers? I’m assuming that the first time you load a page it requires a lot more work than the second time, if you use the page caching system in MODx.

          I’ll help you understand the big picture of how MODx works, and if you can justify the numbers again for us, that would be great.

          Basically the MODx parser will always load the whole cache file that you’ve pointed out in your benchmark results, which is why I can understand that the amount of memory PHP requires increases rapidly, following the number of pages you have loaded into the database. The reason is that it needs to load all the page ids, aliases, etc. into the cache, so if you have 20,000 pages, there will be more than 40,000 lines of code inside the cache. We might need to start thinking about a better way to cache this document information without sacrificing system performance on every new page request.
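
          To make this concrete, here is a rough, hypothetical sketch of the kind of arrays such a cache file ends up holding. The variable names are only illustrative, not the exact code MODx writes out:

          // one entry per document in a path-to-id lookup array:
          $documentListing['about'] = 2;
          $documentListing['news'] = 3;
          $documentListing['news/article-1'] = 4;
          // ... and so on for every published page ...

          // plus a second array relating each document to its parent:
          $documentMap[] = array(0 => 1);
          $documentMap[] = array(0 => 2);
          $documentMap[] = array(3 => 4);
          // ... again one line per document, so 20,000 pages means tens of
          // thousands of array entries parsed on every single request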

          In the second phase, MODx needs a way to determine the output for the current page request. During this time, MODx uses the cached array data described above to determine which document id needs to be loaded for the front end. I believe this is one of the performance penalties in MODx, considering the amount of array data that needs to be processed over and over again.

          In the third phase, the system checks the cache directory: if it finds a cache file for that specific document, it loads the page from the cache file, but if it doesn’t, it loads it from the database and parses the whole page again. Basically, the bottleneck I can see with this approach is the number of cache files that keep accumulating inside the directory; Linux is known for its problems with having a huge number of files in one directory. So from your benchmark, we would have approximately 20,000 files. This will be a pain in the neck when clearing the site cache, because the system will have to remove all those 20,000 files at once. I’m not sure about the PHP function used to read the files, but I hope it won’t be too much of a problem.
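
          Roughly, the per-document check in this third phase works like the sketch below. The file naming and the helper function are assumptions for illustration, not the exact core code:

          $cacheFile = "assets/cache/docid_" . $docId . ".pageCache.php";

          if (is_file($cacheFile)) {
              // a cached copy exists: serve it and skip the full parse
              $output = file_get_contents($cacheFile);
          } else {
              // no cache yet: fetch the document from MySQL, parse the whole page,
              // then write the result so the next request can skip this work
              $output = parseDocumentFromDatabase($docId); // hypothetical helper
              file_put_contents($cacheFile, $output);
          }
          echo $output;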

          The last phase is parsing the cached data (or the data parsed in the previous step) again, basically to run the snippets that are called uncached. I believe this won’t be a problem.

          So my conclusion: it is no wonder that the PHP processing time is so high, and that it increases quite drastically with the number of pages the site has. The amount of memory needed is also reasonable. The only thing that is quite unexpected is the MySQL processing time. I have no idea why it increases so drastically compared to the PHP processing time and the memory allocation needed.

          Could you clarify this for me: when you loaded the page, did you benchmark it on an uncached page or a cached page? It should make a difference in the MySQL processing time. I believe that with a cached page there are only 3-5 MySQL requests, but I’m not 100% sure.

          PS: I might be wrong, but I’m open to any suggestions, so we can improve the current core code. Do you have any experience in optimizing code, tomtom?
            Wendy Novianto
            PT DJAMOER Technology Media
            Xituz Media
            • 30343
            • 8 Posts
            Hi Djamoer,

            Can you explain a little bit more about how you got all those numbers?

            OK, that’s what I did:
            1. Created a few “root” pages and some child pages (about 20 in total).
            2. Disabled caching by default ( $modx->documentObject['cacheable'] = 0; )
            3. Did a “Refresh Site” in the administration.
            4. Inserted “echo memory_get_usage();” at the end of the siteCache.idx.php file, as I suspected this file to be one of the reasons for the high memory consumption ...
            5. Loaded one parent page of the site in the browser ( http://modx.tld/test.html )
            6. Noted down the results.

            After that I populated the modx_site_content table with some dummy pages, all belonging to the same parent page (a rough sketch of such a script is shown below).
            Then I repeated steps 3-6.

            And so on.....
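
            For anyone wanting to repeat the test, something like the following could generate the dummy pages. This is just a sketch under assumptions: the column list is reduced to the essentials and should be checked against the real modx_site_content schema first.

            // rough sketch for generating dummy documents; column list is an assumption
            $db = mysql_connect('localhost', 'user', 'password');
            mysql_select_db('modx', $db);

            $parentId = 3; // id of the parent page the dummy documents belong to
            for ($i = 1; $i <= 20000; $i++) {
                mysql_query(
                    "INSERT INTO modx_site_content
                        (pagetitle, alias, parent, published, deleted, content)
                     VALUES
                        ('Dummy page $i', 'dummy-$i', $parentId, 1, 0, 'Test content $i')",
                    $db);
            }
            // then do a "Refresh Site" in the administration so siteCache.idx.php is rebuilt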


            The reason is that it needs to load all the page ids, aliases, etc. into the cache, so if you have 20,000 pages, there will be more than 40,000 lines of code inside the cache. We might need to start thinking about a better way to cache this document information without sacrificing system performance on every new page request.

            IMO that’s the most important thing to improve, if we want to use MODx for larger sites.


            The only thing that is quite unexpected is the MySQL processing time.

            I found that this query from the function “getActiveChildren” is solely responsible for the bad MySQL performance:

            $sql = "SELECT DISTINCT $fields FROM $tblsc sc
                  LEFT JOIN $tbldg dg on dg.document = sc.id
                  WHERE sc.parent = '$id' AND sc.published=1 AND sc.deleted=0
                  AND ($access)
                  ORDER BY $sort $dir;";
            


            Btw this query is also executed when retrieving a cached document!


            Do you have any experience in optimizing code, tomtom?

            Not in particular, though I have some years of experience with PHP and MySQL.
            But I am really interested in MODx because I love the concept in general.
            It is the best CMS written in PHP I’ve seen so far.

            So I would really like to help out a little bit if I can...
            Thanks for all this information so far.

            Regards,
            Tom

              • 30343
              • 8 Posts
              A first concrete improvement suggestion:

              The "DropMenu" snippet calls modx->getActiveChildren. A useful improvement there would be to make it possible to query only for children that are not hidden in the menu; at the moment the DropMenu snippet filters this afterwards in PHP.
              Filtering this in MySQL could be a great speed improvement for parent elements with a lot of hidden items, which is probably quite common for parent elements containing news. A sketch of such a query is shown below.
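
              Something along these lines (an untested sketch; the hidemenu column name is taken from the default site_content schema and should be verified against the actual table):

              $sql = "SELECT DISTINCT $fields FROM $tblsc sc
                    LEFT JOIN $tbldg dg ON dg.document = sc.id
                    WHERE sc.parent = '$id' AND sc.published=1 AND sc.deleted=0
                    AND sc.hidemenu=0
                    AND ($access)
                    ORDER BY $sort $dir;";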

                • 32241
                • 1,495 Posts
                Thanks for the input, tomtom. Here is my comment on your reply.

                The amount of memory being used is way above what it should be, which is why we need a better way to cache the site id array. Personally, though, I would suggest using a third-party caching system to keep this data in memory on the server itself; a site with 20,000 pages usually belongs to a large corporate or community website, which usually has its own dedicated server.

                As for the DropMenu, I assume you have a basic installation with the default sample site, which uses a DropMenu snippet to list all the documents in the system. In my opinion, that is one of the reasons we have a caching mechanism. If you call [[DropMenu]] without making it uncached, the result will be cached on the server, so only the first user to load the page hits the database; after that the system serves the cached result. With that, I believe the MySQL processing time will be below what you benchmarked for all subsequent users. Another solution would be to cache each document together with its hidden-menu status and let the DropMenu snippet read the data from there, but that basically adds more memory to the cache system (maybe an SQL solution would be better).
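
                For reference, the difference is only in how the snippet is called in the template; as far as I know, the 0.9.x convention is:

                [[DropMenu]]   <- cached call: the snippet output is stored with the page cache and reused
                [!DropMenu!]   <- uncached call: the snippet runs and hits the database on every request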

                I believe we need to tackle the cache memory usage. The rest can be addressed with best practices, such as using the default cache system. So my conclusion: the current MODx system does scale quite well for a large website, except that we could probably improve it by 30-60% by optimizing the current core code. Tomtom, do you have the time and willingness to contribute to this? It sounds to me like you know quite well how to trace MODx logic, even though you’re a newbie to the MODx core code. If you have a good idea that you can implement in the current core code, feel free to post it in this forum, and I believe Xwisdom or OpenGeek would be more than happy to review it.

                Sincerely,
                  Wendy Novianto
                  PT DJAMOER Technology Media
                  Xituz Media
                  • 34162
                  • 1 Posts
                  I’m not an expert at all, but I also guess that DropMenu is a very "hungry" snippet.
                  (Maybe it’s also responsible for executing the "activeChildren" call for cached pages?)
                  I would very much like to see tomtom’s results without DropMenu being involved!
                  • FYI, DropMenu will be re-authored for the next release to be much less resource hungry. wink
                      Ryan Thrash, MODX Co-Founder
                      Follow me on Twitter at @rthrash or catch my occasional unofficial thoughts at thrash.me
                      • 32241
                      • 1,495 Posts
                      Quote from: ppaul at Jun 28, 2006, 04:36 PM

                      I’m not an expert at all, but I also guess that DropMenu is a very "hungry" snippet.
                      (Maybe it’s also responsible for executing the "activeChildren" call for cached pages?)
                      I would very much like to see tomtom’s results without DropMenu being involved!

                      I’m also interested in seeing the results without the DropMenu snippet, and maybe a test with caching and one without caching, so we can compare side by side how beneficial our caching system is.

                      As for DropMenu, I believe we really need to stop it from calling back to the database even when the page has been cached. Maybe we can optimize it with the MySQL call. I’m no SQL expert, but someone might be able to come up with a good solution. SOMEONE? wink Jason? Raymond? Ryan? wink
                        Wendy Novianto
                        PT DJAMOER Technology Media
                        Xituz Media