We launched new forums in March 2019—join us there. In a hurry for help with your website? Get Help Now!
    • 24719
    • 194 Posts
    Hi,

    The $xpdo->getCollection method returns an entire collection of objects. That’s fine if there aren’t too many objects to be retrieved, but if there are a lot php can run out of memory. The solutions I’ve found are:

    1) increase the php memory limit - not a good approach since it could potentially compromise the stability of the server,
    2) break the data into chunks by executing a query to return the ids i need, then loop through passing them to xpdo->getObject to retrieve the objects i want - not great because we’re essentially duplicating queries to the db, adding extra load to it.

    So, i was wondering if instead of doing:

    $people = $xpdo->getCollection('Person');
    foreach ($people as $person)
      $person->eat();
    


    it might be better to implement some kind of memory management that would be accessed like:

    $people = $xpdo->getBufferedCollection('Person');
    while ($person = $people->getNextObject())
      $person->eat();
    


    The getNextObject method would refill a buffer every time it runs out, meaning that we don’t have to duplicate calls to the db, and we wouldn’t run into the memory issues of a pure getCollection call for large datasets.

    Does this sound like a good idea? Is it in the works? Is it already implemented somehow in xpdo and I’ve missed it? Is there a better alternative?
    • What you want is lazy loading, which is in the latest versions. You simply get the collection with a query which selects only the id field and anytime you try and access a field (or set of fields) in the object which is not loaded from the db, a new query is issued.
        • 24719
        • 194 Posts
        Quote from: OpenGeek at Jul 10, 2008, 03:34 PM

        What you want is lazy loading, which is in the latest versions. You simply get the collection with a query which selects only the id field and anytime you try and access a field (or set of fields) in the object which is not loaded from the db, a new query is issued.

        have you got some sample code for this please?
          • 24719
          • 194 Posts
          ok i’ve got it. the following works. cheers.

          $criteria = $this->xpdo->newQuery('Person');
          $criteria->select('id');
          
          if ($people = $this->xpdo->getCollection('Person', $criteria)) 
          {
                      foreach ($people as $person)
                          $person->eat();
          }
          
            • 24719
            • 194 Posts
            ...but looking at a dump of the $people array, it still looks too big.

            i had a problem retrieving around 900 objects a while ago, so i used approach 2 that i mentioned above. The way you suggest would still mean we’d need too much RAM allocated to apache, even if the object data wasn’t completely populated.

            perhaps there is still scope for a more controlled way of getting collections?
            • Isn’t that what limits are for in the queries? Your approach below is not possible; how am I going to get the full result set if it’s not loaded into memory?
                • 28215
                • 4,149 Posts
                Redman,

                At this point in application design, if you’re having to pull 1000+ records at one time, it might be a good idea to check the necessity of pulling such large numbers of data. What’s the purpose of such a large set? If it’s to grab a long list and output it, is there a way you can paginate the UI to prevent such large data pulls?
                  shaun mccormick | bigcommerce mgr of software engineering, former modx co-architect | github | splittingred.com
                  • 24719
                  • 194 Posts
                  I need to do this for several reasons.

                  Imagine I’ve created a Hotel object (id, name, description, url, docid), a Tag object (id, tag), and a HotelTag object (hotel, tag) that relates Hotels to Tags in a One to Many relationship.

                  On the site, I’d have a standard chunk HotelTpl that formats the Hotel object for display as a modx document.

                  So the reasons I have for wanting to lazily work through a collection of data are:

                  1) To return the 900+ tags in the backend as checkboxes so we can tag Hotels with existing tags - initially I was using getCollection, but kept running out of memory (obviously). So now I use a plain SQL query to return all Tags and all HotelTags, and output all the Tags, checking ones where the id is in the HotelTag array for the given Hotel id. Using getCollection in this case was probably lazy, and doing things this way isn’t too much of a drama since I don’t really need the OO framework for this data when doing this. It’s not particularly good design to return all the tags when editing a document, but it’s quick and dirty and is how i’ve done it and does have certain benefits.

                  2) A time when I would like to use getCollection is if i alter HotelTpl and want to regenerate all the 1000+ Hotel modx documents. The whole dataset doesn’t need to be in memory at one time since I’m not performing operations between objects, I just want to operate on each object individually. So getBufferedCollection could return an object that maintains a buffer - once it pops the next object off the buffer (with getNextObject) it would check whether the total number is below a certain threshold. If it is, it could then query for the next 20 objects for instance, and be in a position to keep doing this until there is no more data in the object’s datastore. Otherwise the workaround is to retrieve all Hotel ids, then iterate through calling $xpdo->getObject(’Hotel’, $id) which would then query the db again. What i’m suggesting could optimise the number of db queries quite significantly, and allows large datasets to be operated on without consuming huge amounts of server resources.

                  Arguably scenario 2 isn’t likely to occur too frequently if only admins can edit a hotel’s description, but what i’m suggesting could have other applications, maybe in the core code.

                  I can see that in 0.9.7 $modx->getDocumentChildren uses getCollection. How will this scale for enterprise-level sites where certain documents could have hundreds of children?