    • 14197
    • 35 Posts
    I Googled around and found something interesting: http://drupal.org/node/245255

    Is anyone familiar with Drupal? Is Drupal's taxonomy system similar in terms of database design? They have a quarter of a million rows in the drupal.org node table, so it's pretty huge...

    Also, for my pitch site at the moment: if I use Ditto to format article output, I have 250+ page numbers displayed on the page for one article section, and it loads really, really slowly. But if I just turn off all those Ditto prev/next page displays, the page actually loads OK. So I guess one possible solution is to write a module for handling large numbers of documents.

      • 33372
      • 1,611 Posts
      The problem with sites with very large numbers of documents is not really related to the number of rows that you can effectively store in a MySQL table. There are other bottlenecks in the process that will come into play long before you reach most MySQL limits. And MySQL is usually the database backend for either MODx or Drupal (and many other systems), so there probably wouldn’t be any major difference in table optimization or anything else that would be a significant factor to consider (they’re both relatively efficient, given the number of features that they include).

      In MODx the first bottleneck that you hit for very large sites is the cache files. There are ways to reduce the size of these, but sites with many thousands of pages will necessarily have very large site cache files. As I understand it, these files are loaded and parsed on every MODx request, so this can take a whole lot of processing power (so if you’re on your own fast dedicated server, you might be fine, but if you’re on shared hosting you will be constantly overloading the CPU).
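      To see why a per-request parse of the site cache hurts, here is a minimal sketch (in Python, purely for illustration; the names and the JSON format are assumptions, not the actual MODx cache layout). The point is that deserializing the whole document map costs time proportional to the entire site, even when a request only needs one entry:

```python
import json

# Hypothetical stand-in for a site cache: a document map keyed by id.
# The structure and field names here are illustrative only.
def build_cache(num_docs):
    doc_map = {str(i): {"alias": f"page-{i}", "parent": i // 50}
               for i in range(num_docs)}
    return json.dumps(doc_map)

def handle_request(serialized_cache, doc_id):
    # What "parse the cache on every request" amounts to:
    doc_map = json.loads(serialized_cache)  # cost grows with the whole site...
    return doc_map[doc_id]["alias"]         # ...though we need just one entry.

cache = build_cache(20_000)
assert handle_request(cache, "123") == "page-123"
```

With 20,000 documents, every single page view pays the full deserialization bill; a keyed lookup store (which is what the later caching rewrite moves toward) pays only for the entry it fetches.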

      This is a built-in limitation to MODx, and it will not be resolved until the 0.9.7 version is released with a totally different caching system. I don’t know how Drupal handles caching, so I can’t comment on that. But my understanding is that if you’re talking about a site that needs to handle 100,000+ pages, no standard full-featured CMS system is going to be able to deal with that efficiently and most people in your situation end up writing their own bare-bones code that is optimized for their needs.

      The MODx caching system should be very helpful for things like Ditto pages, however. In addition to the main site cache file individual pages (or parts of them) can be cached, which saves a lot of database calls and PHP parsing for any content that’s static. This system is a speed booster and resource saver for most sites, but for huge sites like yours it becomes a virtual throttle.

      So my general advice would be to radically reconsider the tools that you’re using for this project. I don’t know if you can reasonably expect a complex script like Ditto to index hundreds of thousands of pages dynamically and generate output on the fly for multiple viewers at the same time and with no long delays (and the same is true for AjaxSearch, etc.). I would be looking more at systems that are optimized on the database end for very large sites (with prefetch and search index tables, etc.), and I don’t know that you’re going to find such a system that works out-of-the-box and provides you with all of the features of a CMS system (although for your sake I hope that I’m wrong).

      You may want to consider using the upcoming MODx 0.9.7 release to build your site (since it may be more capable of dealing with what you’d throw at it), but keep in mind that it is still under development at the moment.
        "Things are not what they appear to be; nor are they otherwise." - Buddha

        "Well, gee, Buddha - that wasn't very helpful..." - ZAP

        Useful MODx links: documentation | wiki | forum guidelines | bugs & requests | info you should include with your post | commercial support options
        • 28436
        • 242 Posts
        Hello Guys,

        Quote from: andytwiz at Oct 07, 2007, 01:05 AM

        I’ve hacked my modx core to not cache document map, aliases, document listing and content types in siteCache.idx currently stored in:
        $a = &$this->aliasListing;
        $d = &$this->documentListing;
        $m = &$this->documentMap;
        $c = &$this->contentTypes;
        


        How did he do that? Does anyone have a hint for me on how to tweak the caching so that MODx does not use it?

        Or is there another way to do the following?

        It should be an archive system:

        - newspaper 1
            -1880 (year)
                - 01 (issue)
                    - page 1
                    - page 2
                    - page 4
                    - page 5
                - 02
                - ..
                - 52
            - 1881
            - ...
            - 1940
         
        - newspaper 2
        - newspaper 3
        


        Currently I have imported the first newspaper (20,000 docs, no TVs), and MODx has crashed.

        OK, I didn't take the maximum possible number of docs into account earlier... and now the sweat is running down my forehead... oh yes, one of those days...

        thanks a lot in advance for any suggestion.

        bye, Stefan
          • 28436
          • 242 Posts
          Servus Ganesh,

          Simply don’t use caching

          Unfortunately, that's not the secret... of course, that was the first thing I did. There is no cached document, but the cache file is still over 4 MB. And cleaning it was a horror... I ran out of memory first, and after giving the PHP process more RAM, the browser offered me the file as a download...

          I guess it's a classic overload. MODx, or its caching system, is not designed for such a huge number of documents. I have learned that today.

          As other ppl have mentioned, you’ll have to get creative with any kind of system, if you’re dealing with lots of data/traffic.
          Thanks for the hint. I will keep kicking at this till hell freezes over... I have already begun building a dedicated script to handle the archive files.

          tschüß, Stefan
            • 6726
            • 7,075 Posts
            It was never meant to handle that many documents, but like you, sometimes I try anyway.

            On top of what Jason said about the new caching mechanism for 0.9.7, it might be of interest to you to check out the "how to enable caching with memcached?" thread, as it is an example of what the new core will allow.
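            For readers unfamiliar with memcached, its core semantics are just keyed get/set with an expiry time. A real deployment talks to a memcached daemon through a client library; the sketch below (Python, illustration only, all names hypothetical) mimics those semantics in-process to show what moving the cache out of a flat file buys you:

```python
import time

# Minimal in-process stand-in for memcached-style get/set-with-expiry.
# A real setup would use a memcached client against a running daemon;
# this class exists only to demonstrate the semantics.
class TinyCache:
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]      # lazy eviction of stale entries
            return None
        return value

cache = TinyCache()
cache.set("doc:123", "<html>...</html>", ttl=0.05)
assert cache.get("doc:123") == "<html>...</html>"
time.sleep(0.06)
assert cache.get("doc:123") is None   # expired entries vanish
```

The relevant contrast with the 0.9.6 site cache file: each lookup here fetches one entry by key, rather than loading and parsing the whole cache on every request.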

              .: COO - Commerce Guys - Community Driven Innovation :.


              • 28436
              • 242 Posts
              Hey David!

              Thanks a lot for the links, cool! But... dammit, at the moment I need both hands to wipe the tears from my eyes.

              Have a nice weekend, guys!

              adios, Stefan
                • 11975
                • 2,542 Posts
                Hi,

                For testing, I have modified the files used by the cache system in 0.9.6.
                I could post them with instructions if you wish to test.


                :-)

                  • 33372
                  • 1,611 Posts
                  @Stefan: Have you looked into using the new 0.9.7 version?
                    "Things are not what they appear to be; nor are they otherwise." - Buddha

                    "Well, gee, Buddha - that wasn't very helpful..." - ZAP

                    Useful MODx links: documentation | wiki | forum guidelines | bugs & requests | info you should include with your post | commercial support options
                    • 34127
                    • 135 Posts
                    Quote from: ZAP at Feb 22, 2008, 12:21 AM

                    Wow indeed. Nearly 4,000 queries seems incredibly high to me. And 11 seconds is an outrageously long time.

                    Can you leave your home page cached? The cached results are perfectly acceptable, but the uncached ones are horrific. The individual page cache files are not really a problem in terms of CPU usage or response time afaik (in fact they can speed things up quite a bit, as you can see).

                    A 900k site cache is definitely too big in my opinion. Even with lots of snippets and plugins mine are usually less than 500k. Ideally I think you want to try to get it down to say 300k or less.
                    This kind of scared me... the current site I'm working on (well, getting ready to start working on) is basically an article site with a few hundred pages ranging in size from a few KB to 35 KB or so, and they probably add up to a dozen MB at least... I haven't started importing the content yet, but am I going to have issues with the cache file when I do? I'm on a dedicated Opteron 246 with 2 GB of memory, and the last thing I need is to end up with a crashed site after importing the content.
                      • 33372
                      • 1,611 Posts
                      Quote from: Ricjustsaid at Apr 25, 2008, 03:59 AM

                      This kind of scared me... the current site I'm working on (well, getting ready to start working on) is basically an article site with a few hundred pages ranging in size from a few KB to 35 KB or so, and they probably add up to a dozen MB at least... I haven't started importing the content yet, but am I going to have issues with the cache file when I do? I'm on a dedicated Opteron 246 with 2 GB of memory, and the last thing I need is to end up with a crashed site after importing the content.
                      No need to be scared. The example that I was responding to in that message isn’t really relevant to what it sounds like you want to do. MODx is perfectly happy with sites of hundreds of pages (I’ve had some over 3,500 pages with no noticeable decrease in performance).

                      MODx has a site cache file that includes a document index and whatever resources you have installed. And in addition it can create individual cache files for each page that you set to be cached. The latter files are only parsed when you load those specific pages, so they don’t cumulatively slow down your site (in fact, they should speed it up).

                      There’s no reason to think that you’d have any trouble creating the site you describe in MODx.
                        "Things are not what they appear to be; nor are they otherwise." - Buddha

                        "Well, gee, Buddha - that wasn't very helpful..." - ZAP

                        Useful MODx links: documentation | wiki | forum guidelines | bugs & requests | info you should include with your post | commercial support options