    • 51020
    • 670 Posts
    Hi

    Further to my previous post (https://forums.modx.com/thread/104927/rss-feed-on-page#dis-post-564194) I'm trying to find a way to import multiple RSS feeds, and then filter stories that only include certain keywords or phrases, and then display them in date order.

    Does anyone know if this is a complicated thing to do?

    There are a few services such as Zapier which can sort of do what I want, but I would prefer if it was all controlled within modx.

    Thanks
    Andy
      • 51020
      • 670 Posts
      UPDATE: I found a PHP snippet, which can pull in multiple RSS feeds, and display them in date order - so I'm halfway there - but now I just need to filter them by keywords - e.g. if Modx AND CMS is in the title, then display story. Would also be great to be able to omit stories with certain words too - e.g. don't show if it contains 'WORDPRESS'.

      This is the PHP code I have:

      <?php
      
      $rss = new DOMDocument();
      $feed = array();
      $urlarray = array(
        array( 'name' => 'Fleet World', 'url' => 'https://fleetworld.co.uk/feed/' ),
        array( 'name' => 'Standard/Transport',            'url' => 'http://www.standard.co.uk/news/transport/rss' ),
        array( 'name' => 'londonist',      'url' => 'http://londonist.com/category/news/feed' ),
        array( 'name' => 'Daily mail',          'url' => 'http://www.dailymail.co.uk/travel/index.rss' ),
      );
      
      foreach ( $urlarray as $url ) {
        $rss->load( $url['url'] );
      
        foreach ( $rss->getElementsByTagName( 'item' ) as $node ) {
        $item = array(
          'site'  => $url['name'],
          'title' => $node->getElementsByTagName( 'title' )->item( 0 )->nodeValue,
          'desc'  => $node->getElementsByTagName( 'description' )->item( 0 )->nodeValue,
          'link'  => $node->getElementsByTagName( 'link' )->item( 0 )->nodeValue,
          'date'  => $node->getElementsByTagName( 'pubDate' )->item( 0 )->nodeValue,
        );
      
        array_push( $feed, $item );
        }
      }
      
      usort( $feed, function( $a, $b ) {
        return strtotime( $b['date'] ) - strtotime( $a['date'] );
      });
      
      $limit = 5;
      echo '<ul>';
      for ( $x = 0; $x < $limit; $x++ ) {
          $site = $feed[ $x ]['site'];
          $title = str_replace( ' & ', ' &amp; ', $feed[ $x ]['title'] );
          $link = $feed[ $x ]['link'];
          $description = $feed[ $x ]['desc'];
          $date = date( 'l F d, Y', strtotime( $feed[ $x ]['date'] ) );
      
          echo '<li style="padding:20px 0; border-bottom:1px solid #ccc;">';
          echo '<strong>'.$site.':<br><a href="'.$link.'" title="'.$title.'" target="_blank">'.$title.'</a></strong><br>'.$description.'<br>('.$date.')';
          echo '</li>';
      }
      echo '</ul>';
      


      Any thoughts on how to add filtering into this?

      Thanks
      Andy
        • 3749
        • 24,544 Posts
        How about this:

        Replace this line:

        array_push( $feed, $item );


        with this:
          $content = $item['title'] . $item['description']   
          if ( 
             (stripos($content, 'MODX') !== false) && 
             (stripos($content, 'CMS') !== false) &&
             (stripos($content, 'WordPress') === false)
          ) {
             array_push( $feed, $item );
          }
        

          Did I help you? Buy me a beer
          Get my Book: MODX:The Official Guide
          MODX info for everyone: http://bobsguides.com/modx.html
          My MODX Extras
          Bob's Guides is now hosted at A2 MODX Hosting
          • 51020
          • 670 Posts
          Quote from: BobRay at Feb 26, 2019, 06:26 PM
          How about this:

          Replace this line:

          array_push( $feed, $item );


          with this:
            $content = $item['title'] . $item['description']   
            if ( 
               (stripos($content, 'MODX') !== false) && 
               (stripos($content, 'CMS') !== false) &&
               (stripos($content, 'WordPress') === false)
            ) {
               array_push( $feed, $item );
            }
          

          Thank you so much for this Bob - but I'm getting a syntax error on the line:

            if ( 
          


          See full code below:

          
          <?php
          
          $rss = new DOMDocument();
          $feed = array();
          $urlarray = array(
            array( 'name' => 'Fleet World', 'url' => 'https://fleetworld.co.uk/feed/' ),
            array( 'name' => 'Standard/Transport',            'url' => 'http://www.standard.co.uk/news/transport/rss' ),
            array( 'name' => 'londonist',      'url' => 'http://londonist.com/category/news/feed' ),
            array( 'name' => 'Daily mail',          'url' => 'http://www.dailymail.co.uk/travel/index.rss' ),
          );
          
          foreach ( $urlarray as $url ) {
            $rss->load( $url['url'] );
          
            foreach ( $rss->getElementsByTagName( 'item' ) as $node ) {
            $item = array(
              'site'  => $url['name'],
              'title' => $node->getElementsByTagName( 'title' )->item( 0 )->nodeValue,
              'desc'  => $node->getElementsByTagName( 'description' )->item( 0 )->nodeValue,
              'link'  => $node->getElementsByTagName( 'link' )->item( 0 )->nodeValue,
              'date'  => $node->getElementsByTagName( 'pubDate' )->item( 0 )->nodeValue,
            );
          
            $content = $item['title'] . $item['description']   
              if ( 
                 (stripos($content, 'MODX') !== false) && 
                 (stripos($content, 'CMS') !== false) &&
                 (stripos($content, 'WordPress') === false)
              ) {
                 array_push( $feed, $item );
              }
            }
          }
          
          usort( $feed, function( $a, $b ) {
            return strtotime( $b['date'] ) - strtotime( $a['date'] );
          });
          
          
          $limit = 5;
          echo '<ul>';
          for ( $x = 0; $x < $limit; $x++ ) {
              $site = $feed[ $x ]['site'];
              $title = str_replace( ' & ', ' &amp; ', $feed[ $x ]['title'] );
              $link = $feed[ $x ]['link'];
              $description = $feed[ $x ]['desc'];
              $date = date( 'l dS F Y, g.ia', strtotime( $feed[ $x ]['date'] ) );
          
              echo '<li style="padding:20px 0; font-size:16px; line-height:140%; border-bottom:1px solid #ccc;">';
              echo '<strong>'.$site.':<br><a href="'.$link.'" title="'.$title.'" target="_blank">'.$title.'</a></strong><br>'.$description.'<br><p style="padding-top:5px;font-style:italic; font-size:12px;">'.$date.'</p>';
              echo '</li>';
          }
          echo '</ul>';
          
          
          [ed. note: tm2000 last edited this post 5 years, 2 months ago.]
            • 51020
            • 670 Posts
            Aha - I think it was a missing semicolon at the end of the first line:
            $content = $item['title'] . $item['description'] 
            


            I changed it to:

            $content = $item['title'] . $item['description'];
            


            ...and it does sort of work.
            But, it seems to need ALL keywords to match for it to show up any results.
            I need it to show stories that have keyword1, OR keyword2, OR keyword3.

            Also, it always shows 5 entries regardless of the number of results, so if none of the criteria matches, each entry has a blank title, blank description and includes a date of 1st Jan 1970.

            Almost there!
            I'll keep playing...
              • 3749
              • 24,544 Posts
              Sorry about the typo. Also, it should be:

              if ( 
                 ((stripos($content, 'MODX') !== false) && 
                 (stripos($content, 'CMS') !== false)) ||
                 (stripos($content, 'WordPress') === false)
              )


              That would require both (MODX and CMS) and filter out WordPress, which is how I read your request.

              You must have your RSS feed code set to limit things to 5 items. If you're using getResources anywhere in the process, the default &limit is 5.
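In the snippet as posted, the count appears to come from `$limit = 5` together with a `for` loop that runs five times even when `$feed` holds fewer items. The missing entries then come out blank, and `strtotime()` fails on the empty date, which `date()` renders as 1 Jan 1970. A minimal sketch of one way around that, assuming the same item keys as the snippet (the sample `$feed` data is made up for illustration):

```php
<?php
// Sketch only: $feed stands in for the filtered, sorted array built earlier.
// The two sample entries below are invented for illustration.
$feed = array(
    array( 'site' => 'Example', 'title' => 'First story',  'link' => 'https://example.com/1', 'desc' => 'One', 'date' => 'Tue, 26 Feb 2019 10:00:00 +0000' ),
    array( 'site' => 'Example', 'title' => 'Second story', 'link' => 'https://example.com/2', 'desc' => 'Two', 'date' => 'Mon, 25 Feb 2019 10:00:00 +0000' ),
);

$limit = 5;

// array_slice() returns at most $limit items and never pads,
// so a short (or empty) $feed produces no blank rows.
$visible = array_slice( $feed, 0, $limit );

$html = '<ul>';
foreach ( $visible as $entry ) {
    $title = htmlspecialchars( $entry['title'], ENT_QUOTES );
    $date  = date( 'l dS F Y, g.ia', strtotime( $entry['date'] ) );
    $html .= '<li><strong>' . $entry['site'] . ':</strong> '
           . '<a href="' . $entry['link'] . '">' . $title . '</a> (' . $date . ')</li>';
}
$html .= '</ul>';

echo $html;
```

Looping over the slice rather than counting to `$limit` means the list simply gets shorter when fewer stories match.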
                • 51020
                • 670 Posts
                Quote from: BobRay at Feb 27, 2019, 08:22 AM
                Sorry about the typo. Also, it should be:

                if ( 
                   ((stripos($content, 'MODX') !== false) && 
                   (stripos($content, 'CMS') !== false)) ||
                   (stripos($content, 'WordPress') === false)
                )


                That would require both (MODX and CMS) and filter out WordPress, which is how I read your request.

                You must have your RSS feed code set to limit things to 5 items. If you're using getResources anywhere in the process, the default &limit is 5.

                Thanks again for looking at this Bob.

                It doesn't seem to be filtering the results at all now.
                This is the complete code - not sure what I'm missing:

                <?php
                $rss = new DOMDocument();
                $feed = array();
                $urlarray = array(
                  array( 'name' => 'Fleet World', 'url' => 'https://fleetworld.co.uk/feed/' ),
                  array( 'name' => 'Standard/Transport',            'url' => 'http://www.standard.co.uk/news/transport/rss' ),
                  array( 'name' => 'londonist',      'url' => 'http://londonist.com/category/news/feed' ),
                  array( 'name' => 'Daily mail',          'url' => 'http://www.dailymail.co.uk/travel/index.rss' ),
                );
                 
                foreach ( $urlarray as $url ) {
                  $rss->load( $url['url'] );
                 
                  foreach ( $rss->getElementsByTagName( 'item' ) as $node ) {
                  $item = array(
                    'site'  => $url['name'],
                    'title' => $node->getElementsByTagName( 'title' )->item( 0 )->nodeValue,
                    'desc'  => $node->getElementsByTagName( 'description' )->item( 0 )->nodeValue,
                    'link'  => $node->getElementsByTagName( 'link' )->item( 0 )->nodeValue,
                    'date'  => $node->getElementsByTagName( 'pubDate' )->item( 0 )->nodeValue,
                  );
                 
                  $content = $item['title'] . $item['description'];   
                    if ( 
                       ((stripos($content, 'MODX') !== false) && 
                       (stripos($content, 'CMS') !== false)) ||
                       (stripos($content, 'Wordpress') === false)
                    ) {
                       array_push( $feed, $item );
                    }
                  }
                }
                 
                usort( $feed, function( $a, $b ) {
                  return strtotime( $b['date'] ) - strtotime( $a['date'] );
                });
                 
                 
                $limit = 5;
                echo '<ul>';
                for ( $x = 0; $x < $limit; $x++ ) {
                    $site = $feed[ $x ]['site'];
                    $title = str_replace( ' & ', ' &amp; ', $feed[ $x ]['title'] );
                    $link = $feed[ $x ]['link'];
                    $description = $feed[ $x ]['desc'];
                    $date = date( 'l dS F Y, g.ia', strtotime( $feed[ $x ]['date'] ) );
                 
                    echo '<li style="padding:20px 0; font-size:16px; line-height:140%; border-bottom:1px solid #ccc;">';
                    echo '<strong>'.$site.':<br><a href="'.$link.'" title="'.$title.'" target="_blank">'.$title.'</a></strong><br>'.$description.'<br><p style="padding-top:5px;font-style:italic; font-size:12px;">'.$date.'</p>';
                    echo '</li>';
                }
                echo '</ul>';
                
                


                  • 51020
                  • 670 Posts
                  Actually - it seems as though the last line to filter OUT words is working:
                   (stripos($content, 'Wordpress') === false)


                  But the two lines above, which should include the two words, are not working:
                   ((stripos($content, 'MODX') !== false) && 
                         (stripos($content, 'CMS') !== false)) ||
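A possible explanation, sketched with made-up titles: because the exclusion test is joined to the rest with `||`, any story that lacks the excluded word passes, whether or not the two keywords are present. Joining it with `&&` instead requires both keywords and no excluded word. The helper names `has()`, `orVariant()` and `andVariant()` are hypothetical and exist only for this comparison.

```php
<?php
// Illustrative only: the helper names and sample titles are invented.

// Case-insensitive "does $text contain $word?" check.
function has( $text, $word ) {
    return stripos( $text, $word ) !== false;
}

// Exclusion joined with ||: true for anything that lacks 'WordPress',
// even when neither keyword is present.
function orVariant( $text ) {
    return ( has( $text, 'MODX' ) && has( $text, 'CMS' ) ) || ! has( $text, 'WordPress' );
}

// Exclusion joined with &&: needs both keywords and no 'WordPress'.
function andVariant( $text ) {
    return has( $text, 'MODX' ) && has( $text, 'CMS' ) && ! has( $text, 'WordPress' );
}

var_dump( orVariant( 'Rail strike announced' ) );   // bool(true)  - unrelated story gets through
var_dump( andVariant( 'Rail strike announced' ) );  // bool(false) - unrelated story is dropped
var_dump( andVariant( 'MODX CMS 2.7 released' ) );  // bool(true)  - both keywords, no exclusion
```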
                    • 51020
                    • 670 Posts
                    I think in hindsight, I really need to be able to have a long list of keywords to include and omit in order to get a good targeted feed.
                    Ideally, I would be able to include wordA AND wordB, but also include wordA OR wordC, and also omit WordD, WordE and WordF.

                    So if there were a number of arrays for these, I could add multiple keywords?
                      • 3749
                      • 24,544 Posts
                      Sure. It just means more stripos() calls and more parentheses. If it were just two simple lists, one of words to include and one of words to reject, it would be much simpler. Requiring combinations like wordA AND wordB makes it more complicated.
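One way to keep the combinations manageable is to hold the keywords in arrays and loop over them instead of writing out each stripos() call by hand. The sketch below is an assumption about how that could look, not tested against the feeds in this thread: the function name `matchesKeywords()` and the sample keyword lists are hypothetical. Each inner array is a group whose words must all appear (AND), a story is accepted if any one group matches (OR), and any excluded word rejects it outright.

```php
<?php
// Hypothetical helper: the name and the keyword lists are illustrative only.

/**
 * Returns true when $text contains every word of at least one include group
 * and none of the excluded words. Matching is case-insensitive.
 */
function matchesKeywords( $text, array $includeGroups, array $exclude ) {
    // Reject as soon as any excluded word appears.
    foreach ( $exclude as $word ) {
        if ( stripos( $text, $word ) !== false ) {
            return false;
        }
    }

    // Accept if any one group has all of its words present
    // (AND inside a group, OR between groups).
    foreach ( $includeGroups as $group ) {
        $allFound = true;
        foreach ( $group as $word ) {
            if ( stripos( $text, $word ) === false ) {
                $allFound = false;
                break;
            }
        }
        if ( $allFound ) {
            return true;
        }
    }

    // No include group matched.
    return false;
}

$includeGroups = array(
    array( 'MODX', 'CMS' ),   // wordA AND wordB
    array( 'Revolution' ),    // OR wordC on its own
);
$exclude = array( 'WordPress', 'Joomla' );
```

Inside the feed loop the check would then be `if ( matchesKeywords( $item['title'] . ' ' . $item['desc'], $includeGroups, $exclude ) ) { array_push( $feed, $item ); }`. Note that the snippet stores the description under the key `desc`, so `$item['description']` would come back empty.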