    • 1649
    • 13 Posts
    Does MODx 0.9.6 still have the 5000 document limit?

    I need to have about 250,000 documents - if so how can I modify the core to get around this limit? Where should I start looking?

    Thanks
    • Quote from: andytwiz at Sep 29, 2007, 03:14 PM

      Does MODx 0.9.6 still have the 5000 document limit?

      I need to have about 250,000 documents - if so how can I modify the core to get around this limit? Where should I start looking?

      Andy, I think you’re going to need to purchase/develop something custom to support 250,000 documents with all the features of MODx. It simply is not designed to handle that volume of document meta-data, and likely will not ever be the right choice for doing so.

      That said, I’m curious what you are doing with 250,000 documents. Are these really individual documents, or could the majority of these documents be stored in a custom database table and presented through a dynamic script (i.e. via a single MODx document, with additional scripts for searching and managing the items)? Again, without knowing details as to what these 250,000 individual documents represent, it’s hard to say what the best solution is for your challenge...
        • 33372
        • 1,611 Posts
        I was going to ask the same questions. But I was also wondering if turning off caching entirely might be a good idea for a site with an excessive number of documents. Around what size does the current cache become self-defeating?
          "Things are not what they appear to be; nor are they otherwise." - Buddha

          "Well, gee, Buddha - that wasn't very helpful..." - ZAP

          Useful MODx links: documentation | wiki | forum guidelines | bugs & requests | info you should include with your post | commercial support options
          • 1649
          • 13 Posts
          Quote from: OpenGeek at Sep 29, 2007, 03:40 PM

          Andy, I think you’re going to need to purchase/develop something custom to support 250,000 documents with all the features of MODx. It simply is not designed to handle that volume of document meta-data, and likely will not ever be the right choice for doing so.

          That said, I’m curious what you are doing with 250,000 documents. Are these really individual documents, or could the majority of these documents be stored in a custom database table and presented through a dynamic script (i.e. via a single MODx document, with additional scripts for searching and managing the items)? Again, without knowing details as to what these 250,000 individual documents represent, it’s hard to say what the best solution is for your challenge...

          Thanks for your reply.

          Yes, I’m using FeedX to import an XML feed, then DocManager to create a new document for each element of the feed. Ideally I’d like each feed element (about 250,000 of them) to be an individual document so they can be searched, cached, etc.

          I could use a dynamic script I guess - are there any snippets/plugins that make this easy for me to get started?

          If I were to use documents, can you suggest where I should start looking to modify the core? (I’m familiar with PHP but not MODx yet.)

          While I’m on the topic, I’d also like to generate an XML Google Base (Froogle) feed from these documents. I couldn’t find a snippet/plugin for this, so I assume using Ditto in a similar way to http://webbake.com/tutorials/modx-cms/google-sitemap-with-ditto would be the best way (sorry for the slightly off-topic question - I only installed MODx last weekend for our website rewrite).
          • @andytwiz:
            First, why are you importing content that already exists and can be presented via XML feeds? Isn’t that a) duplicating content and b) against copyright of the original content owners/publishers? And then republish as XML??? Why not cache it as XML data locally and simply present it through your templates?

            With that out of the way, my answer was that the core, using documents for each article, is not going to be possible. Period. No amount of core hacking is going to fix that IMHO. You would need to develop or find another tool altogether for that.

            Ditto would be useless with 250,000 documents, as would the core.


            @ZAP:
            You cannot turn off this part of MODx caching; period. This is not partial page caching we are talking about. MODx would not work without the siteCache.idx.php and in the current architecture, all 250,000 documents would need several lines each in this file, plus all of the PHP source code in the site definition (snippets, plugins, modules) would also be in there. You would have a file so large that it would likely never execute on any PHP installation, and certainly not with any reasonable performance.
              • 1649
              • 13 Posts
              Quote from: OpenGeek at Sep 30, 2007, 05:06 PM

              @andytwiz:
              First, why are you importing content that already exists and can be presented via XML feeds? Isn’t that a) duplicating content and b) against copyright of the original content owners/publishers? And then republish as XML??? Why not cache it as XML data locally and simply present it through your templates?

              With that out of the way, my answer was that the core, using documents for each article, is not going to be possible. Period. No amount of core hacking is going to fix that IMHO. You would need to develop or find another tool altogether for that.

              Ditto would be useless with 250,000 documents, as would the core.


              The XML is provided by third parties with the intention of being published on other websites - it is not a violation of copyright.

              I could cache it locally. The XML file is about 100MB and contains the 250,000 items. The contents of these items need to be modified before being displayed to the user, and the same modified information needs to be output in the Google Base XML feed. This is why using Documents for each item would be ideal: I could parse the XML, modify the items as needed, save each one in a Document (giving me searching, caching, etc.) and then use Ditto to output the Google Base feed.

              If you could point me in the direction of how I could cache the XML data locally and display it as "virtual documents" that would be much appreciated.

              Thanks for your advice.
                • 33372
                • 1,611 Posts
                100MB is a huge file, no matter how you handle it. I think that I’d split up the XML and store it in a custom MySQL table (properly indexed) and use a snippet to display the data as desired. I would think that would work with the current MODx core, and depending upon how well you optimize your table I imagine it will behave well enough. As long as you’re not outputting huge files I would think you could cache them, but you may or may not want to do that anyway. As I understand it, this should keep the MODx sitecache file from containing your XML data.
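A minimal sketch of the import step described above, under stated assumptions: the feed layout (`<items>` containing `<item>` elements with `id`, `title` and `description` children), the table name `feed_items` and the function name `importFeed` are all hypothetical, since the real feed format was never posted. `XMLReader` streams the file one `<item>` at a time, so the full 100MB never has to sit in memory. The DDL here is written for SQLite so the sketch can be tried without a database server; a MySQL table would need slightly different index syntax.

```php
<?php
// Sketch only: the feed layout, table and column names are assumptions.
// Streams a large XML file and stores each <item> as one row.
function importFeed(PDO $db, string $xmlPath): int
{
    // SQLite-flavoured DDL, used here only to keep the example self-contained.
    $db->exec('CREATE TABLE IF NOT EXISTS feed_items (
        id INTEGER PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        description TEXT NOT NULL
    )');
    $db->exec('CREATE INDEX IF NOT EXISTS idx_feed_items_title ON feed_items (title)');

    // REPLACE lets a re-import overwrite rows that already exist.
    $insert = $db->prepare(
        'REPLACE INTO feed_items (id, title, description) VALUES (?, ?, ?)'
    );

    $reader = new XMLReader();
    if (!$reader->open($xmlPath)) {
        throw new RuntimeException("Cannot open $xmlPath");
    }

    $count = 0;
    $db->beginTransaction(); // one transaction is far faster than one per row

    // Advance to the first <item> element.
    while ($reader->read() && $reader->name !== 'item') {
    }

    // Handle one <item> at a time; only that element is expanded in memory.
    while ($reader->name === 'item') {
        $item = new SimpleXMLElement($reader->readOuterXml());
        $insert->execute([
            (int) $item->id,
            trim((string) $item->title),
            trim((string) $item->description),
        ]);
        $count++;
        $reader->next('item'); // skip to the next <item>, if any
    }

    $db->commit();
    $reader->close();

    return $count;
}
```

Calling `importFeed(new PDO('sqlite::memory:'), '/path/to/feed.xml')` returns the number of items stored. Any modification of the items before display would go between parsing the element and executing the insert.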

                What you lose is the ability to search through this data using a standard MODx snippet, since those aren’t designed to index dynamic content or data in non-standard tables. But writing your own search snippet shouldn’t be all that tough, given that your data is standardized and MySQL will pretty much do the work for you. And of course you don’t get any of the other features of using the MODx system, so for example you may need to create a separate module to import or edit your XML data (since you won’t be able to do this via the Manager).
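As a rough illustration of how small such a search routine can be, here is a sketch that queries a custom table through PDO. The table `feed_items`, its columns and the function name `searchItems` are hypothetical. It uses a plain `LIKE` match with a prepared statement; with 250,000 rows in MySQL, a FULLTEXT index queried via `MATCH ... AGAINST` would probably be a better fit, because a pattern with a leading wildcard cannot use an ordinary index.

```php
<?php
// Sketch only: table and column names are assumptions.
// Returns rows whose title or description contains the search term.
function searchItems(PDO $db, string $term, int $limit = 20): array
{
    // Escape LIKE wildcards so that user input is matched literally.
    // '!' is used as the escape character to avoid backslash quoting issues.
    $escaped = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $term);
    $pattern = '%' . $escaped . '%';

    $stmt = $db->prepare(
        "SELECT id, title FROM feed_items
         WHERE title LIKE :q1 ESCAPE '!' OR description LIKE :q2 ESCAPE '!'
         ORDER BY title
         LIMIT :lim"
    );
    $stmt->bindValue(':q1', $pattern, PDO::PARAM_STR);
    $stmt->bindValue(':q2', $pattern, PDO::PARAM_STR);
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A snippet wrapping this would loop over the returned rows and build the HTML, passing each title through `htmlspecialchars()` before output.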

                You’d still benefit from MODx’s templating system, API, etc., so I guess I’d give it a shot in MODx and see what happens.
                • A quarter of a million files into MODx! If you don’t mind me asking, what’re you trying to make exactly, Amazon?!
                    MODX Ambassador for Thailand. Managing Director at Monogon, a web design and development studio based in Bangkok, Thailand. - Follow me on Twitter.
                  • I’ve dealt with ~4 MB XML files; the client refreshed the file every 15 minutes and I had a snippet that parsed the XML file and returned the desired data on demand. I was surprised at how snappy SimpleXML + XPath is at parsing such a large file! It requires PHP 5, but there are some nice-looking libraries for PHP 4 that worked almost as well. In my opinion, if you’re going to be working with XML files, it’s worth it to make an upgrade or even change service providers to get PHP 5 and SimpleXML.
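For readers who have not used the combination, a small sketch of a SimpleXML + XPath lookup follows. The feed layout (`<items>`, `<item>`, `<category>`, `<title>`) and the function name `findTitlesByCategory` are hypothetical and not taken from the snippet described above. The function takes the XML as a string to stay self-contained; `simplexml_load_file()` would be the equivalent call for a file on disk.

```php
<?php
// Sketch only: element names are assumptions about the feed layout.
// Returns the titles of all items in the given category.
function findTitlesByCategory(string $xml, string $category): array
{
    // XPath has no parameter binding, so reject values that could
    // break out of the quoted literal in the expression below.
    if (strpos($category, '"') !== false) {
        throw new InvalidArgumentException('Category must not contain double quotes');
    }

    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        throw new RuntimeException('Could not parse XML');
    }

    // Select the <title> of every <item> whose <category> matches.
    $nodes = $feed->xpath(sprintf('/items/item[category="%s"]/title', $category));

    // xpath() yields SimpleXMLElement objects; convert them to plain strings.
    return array_map('strval', $nodes ?: []);
}
```

SimpleXML loads the whole document into memory, which is fine for a file of a few megabytes but is likely to be a problem for one of 100MB.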

                    One of the things I really like about the MODx forums! I do a lot of searching to research answers to posts, and find all sorts of neat stuff! I just now found this resource, and it looks like a really good one.

                    http://hudzilla.org/phpwiki/index.php?title=Main_Page
                      Studying MODX in the desert - http://sottwell.com
                      Tips and Tricks from the MODX Forums and Slack Channels - http://modxcookbook.com
                      Join the Slack Community - http://modx.org
                      • 1649
                      • 13 Posts
                      Hi,

                      I’ve hacked my MODx core so that it no longer caches the document map, aliases, document listing and content types in siteCache.idx.php. These are currently stored in:

                      $a = &$this->aliasListing;
                      $d = &$this->documentListing;
                      $m = &$this->documentMap;
                      $c = &$this->contentTypes;
                      


                      For each requested document, my core now populates these arrays from the database: it fetches the document map, aliases, document listing and content types for that document, and the same data for its parent and all of its children.

                      I haven’t yet come across a need to get any more document data than the document’s parent and children.
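The hacked code itself was never posted, so the following is only a sketch of the general idea (load one document plus its parent and children on demand instead of caching every document), not a reconstruction of it. The table `site_content`, the columns `id`, `parent` and `alias`, and the function name `loadDocumentContext` are assumptions for illustration; how the result would be mapped onto the four arrays above is deliberately left out.

```php
<?php
// Sketch only: table name, column names and return shape are assumptions.
// Loads a document together with its parent and its direct children.
function loadDocumentContext(PDO $db, int $docId): ?array
{
    $byId = $db->prepare('SELECT id, parent, alias FROM site_content WHERE id = ?');

    $byId->execute([$docId]);
    $doc = $byId->fetch(PDO::FETCH_ASSOC);
    $byId->closeCursor();
    if ($doc === false) {
        return null; // no such document
    }

    // A parent of 0 is treated as "top level" in this sketch.
    $parent = null;
    if ((int) $doc['parent'] !== 0) {
        $byId->execute([(int) $doc['parent']]);
        $parent = $byId->fetch(PDO::FETCH_ASSOC) ?: null;
        $byId->closeCursor();
    }

    // With an index on the parent column this stays cheap
    // no matter how many documents exist in total.
    $kids = $db->prepare(
        'SELECT id, parent, alias FROM site_content WHERE parent = ? ORDER BY id'
    );
    $kids->execute([(int) $doc['id']]);

    return [
        'document' => $doc,
        'parent'   => $parent,
        'children' => $kids->fetchAll(PDO::FETCH_ASSOC),
    ];
}
```

Each request then costs three indexed queries rather than loading a file that describes every document on the site.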

                      This massively speeds up loading with 400,000 documents!

                      I’m afraid my hacked code is a real mess and I can’t remember everything I’ve changed, so I can’t post it here. May I suggest that something similar be done in a future release of MODx to make it more scalable?

                      Thanks

                      Quote from: ZAP at Sep 30, 2007, 06:59 PM

                      100MB is a huge file, no matter how you handle it. I think that I’d split up the XML and store it in a custom MySQL table (properly indexed) and use a snippet to display the data as desired. I would think that would work with the current MODx core, and depending upon how well you optimize your table I imagine it will behave well enough. As long as you’re not outputting huge files I would think you could cache them, but you may or may not want to do that anyway. As I understand it, this should keep the MODx sitecache file from containing your XML data.

                      What you lose is the ability to search through this data using a standard MODx snippet, since those aren’t designed to index dynamic content or data in non-standard tables. But writing your own search snippet shouldn’t be all that tough, given that your data is standardized and MySQL will pretty much do the work for you. And of course you don’t get any of the other features of using the MODx system, so for example you may need to create a separate module to import or edit your XML data (since you won’t be able to do this via the Manager).

                      You’d still benefit from MODx’s templating system, API, etc., so I guess I’d give it a shot in MODx and see what happens.