Parsing XML including attributes

339 Posts

gallenkamp Reply #1, 10 years, 11 months ago

Dear MODX people,

I'd like to parse a bit more than just the node names of an XML file. I need the nodes and sometimes their attributes.

~~What I do by now (without the attributes) is the following:~~

OK, I have completed it so far. Here is my XML parser snippet. It puts every XML element into a MODX placeholder.

# Snippet to read and parse XML input
# USAGE: [[!xmlparser? &source=`feed.rss` &tpl=`xmlTpl`]]
	
$modx->setDebug(true);

$source = $scriptProperties['source'];

if (empty($source)) {
    $modx->log(modX::LOG_LEVEL_DEBUG,'[xmlparser] Empty source address passed, aborting.');
    return '';
}

else {

if ($xml = simplexml_load_string(file_get_contents($source))) {

$output = '';
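  foreach ($xml->channel->item as $item) {
    foreach ($item->children() as $key => $child) {
        // cast to string so the placeholder holds text, not a SimpleXMLElement
        $value = (string)$child;
        if ($key == 'pubDate') {
            // date() replaces strftime(), which is deprecated since PHP 8.1
            $value = date('d.m.Y H:i:s', strtotime($value));
        }
        $modx->setPlaceholder($key, $value);

        // expose each attribute as "element.attribute", e.g. [[+enclosure.url]]
        foreach ($child->attributes() as $attrkey => $attrval) {
          $modx->setPlaceholder($key.'.'.$attrkey, (string)$attrval);
        }

    }
    $output .= $modx->getChunk($scriptProperties['tpl']);
  }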
} else {
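    // log and return instead of exit(), which would abort rendering of the whole page
    $modx->log(modX::LOG_LEVEL_ERROR,'[xmlparser] Could not open '.$source.'.');
    return '';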
}

return $output;
}

The template chunk can have any placeholder in it, now even with attributes of the element:

<h4><a href="[[+link]]" title="[[+title]]">[[+title]]</a></h4>
<p><b>[[+pubDate:default=``]]</b><br>
[[+locationname:default=``]]<br>
[[+eventdate:default=``]][[+description:default=``]]
<img src="[[+enclosure.url]]"></p>

See the enclosure.url placeholder? It comes from an attribute in the RSS XML source:

<enclosure url="some.url">
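The mapping from elements and attributes to placeholder names can be tried outside MODX. The following is a minimal sketch, assuming plain PHP with the SimpleXML extension; `flattenItem()` is a hypothetical helper name and the sample item is made up for illustration.

```php
<?php
// Sketch: flatten one feed item into "name" and "name.attribute" keys.
// flattenItem() is a hypothetical helper, not part of MODX or the snippet above.
function flattenItem(SimpleXMLElement $item): array
{
    $values = [];
    foreach ($item->children() as $child) {
        $name = $child->getName();
        // Element text becomes the plain placeholder, e.g. [[+title]].
        $values[$name] = (string) $child;
        // Each attribute becomes "element.attribute", e.g. [[+enclosure.url]].
        foreach ($child->attributes() as $attrName => $attrValue) {
            $values[$name . '.' . $attrName] = (string) $attrValue;
        }
    }
    return $values;
}

// Made-up sample item for illustration.
$xml = simplexml_load_string(
    '<item><title>Hello</title><enclosure url="some.url" type="image/png"/></item>'
);
$values = flattenItem($xml);
print_r($values);
```

An array like this could be passed as the second argument to `$modx->getChunk()` instead of setting global placeholders one by one.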

Let me know what you think. What would you change? Would you change anything at all?

Cheers,

Guido

23 Posts

Mangesh Reply #2, 10 years, 11 months ago

Awesome!! It works...... Just need to add &limit

Developing Themes/Templates for MODx Revo. Visit http://mdxthemes.com

397 Posts

sonicpunk Reply #3, 10 years, 3 months ago

This will come in handy in a future project, thanks!

Benjamin Davis: American web designer living in Munich, Germany and a MODX Ambassador. I am also co-founder of SEDA.digital, a MODX Agency.

35 Posts

DannyFranks Reply #4, 8 years, 11 months ago

I have tried different packages, including spieFeed and GetFeed, to pull in an XML document, but neither seems to work, so I went searching and found this. I then saw that it has been turned into a package called ParseX. I have installed it and got it working. It works well and I can sort on date, but it does have a few limitations.

There are 2 things I need it to do but can't work out how to add them and was hoping someone could help.

Add a limit to the items pulled in
Get the attributes values of a node

I am pulling in this xml file.

http://xml.corporate-ir.net/irxmlclient.asp?compid=251258&reqtype=newsreleases_2

and I need to get the ReleaseID attribute from the NewsRelease node. It is not being pulled in, because it is an attribute of the node itself and not a separate node within it.

The snippet code is more or less above but I have included it below

# Snippet to read and parse XML input
# USAGE: [[!parsex? &source=`feed.rss` &tpl=`xmlTpl`]]
# author: [email protected]

//$modx->setDebug(true);
 
$source = $modx->getOption('source', $scriptProperties, 'http://modx.com/feeds/latest.rss');
$element = $modx->getOption('element', $scriptProperties, 'item');
$tpl = $modx->getOption('tpl', $scriptProperties, 'xmlTpl');
$wrapper = $modx->getOption('wrapper', $scriptProperties, 'wrapX');
$debugmode = $modx->getOption('debugmode', $scriptProperties, false);

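// initialise the output buffer so the later .= and return do not hit an undefined variable
$output = '';

if (empty($source)) {
    $modx->log(modX::LOG_LEVEL_ERROR,'[parseX] Empty source address passed, aborting.');
    return 'No source defined.';
}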

else {
    if ($xml = simplexml_load_string(file_get_contents($source))) {
        #$modx->log(modX::LOG_LEVEL_ERROR,'[parseX] can read file: '.$source);
        #$modx->log(modX::LOG_LEVEL_ERROR,'[parseX] element value '.$element);
        $nodes = $xml->xpath("//$element");

        foreach ($nodes as $node) {
            $values = array();
            foreach ($node as $key => $child) {
                $value = (string)$child;
                if ($key == 'pubDate') {
                    // date() replaces strftime(), which is deprecated since PHP 8.1
                    $value = date('d.m.Y H:i:s', strtotime($value));
                }
                $values[$key] = $value;

                // expose each attribute as "element.attribute"
                foreach ($child->attributes() as $attrkey => $attrval) {
                    $values[$key.'.'.$attrkey] = (string)$attrval;
                }
            }
            if ($debugmode == true) {
                var_dump($values);
            }
            $output .= $modx->getChunk($tpl, $values);
        }

    }
    else {
        $modx->log(modX::LOG_LEVEL_ERROR,'[parseX] can NOT read file: '.$source);
    }


$result = array("result" => $output);
return $modx->getChunk($wrapper, $result);
}

I really hope someone can help as this is the last piece of a website that has a Friday deadline. Thanks in advance.

Danny

339 Posts

gallenkamp Reply #5, 8 years, 11 months ago

I can't access the feed; I get the following error:

<IRXML><Errors><Error errorCode="0">This request failed validation. Type of Failure encountered was Unauthorized</Error></Errors><IPAddress>94.134.81.11</IPAddress><RequestedUrl>http://xml.corporate-ir.net:84/irxmlclient.asp?compid=251258&reqtype=newsreleases_2</RequestedUrl></IRXML>

35 Posts

DannyFranks Reply #6, 8 years, 11 months ago

You might have to copy and paste the whole link if it doesn't work

http://xml.corporate-ir.net/irxmlclient.asp?compid=251258&reqtype=newsreleases_2

but essentially I need to limit the items and get the ReleaseID from the NewsRelease node

<NewsRelease ReleaseID="2047271" DLU="20150512 21:00:31" ArchiveStatus="Current" RNSSource="" ContainerId="" Type="2"><Title>GasLog Ltd. Announces Election of Directors at 2015 Annual General Meeting of Shareholders</Title>
<ExternalURL/>
<Date Date="20150512" Time="17:00:31" TimeZone="ET">5/12/2015 5:00:31 PM</Date>
<DisplayDateStart Date="20150512" Time="17:00:31">May 12 2015 05:00</DisplayDateStart>
<DisplayDateEnd Date="20350512" Time="21:00:31">May 12 2035 09:00</DisplayDateEnd>
<ContentNetworkingLinks/>
<Categories>
<Category>NA</Category>
</Categories>
</NewsRelease>

Thanks for the help

Danny

1,572 Posts

Paulp Reply #7, 8 years, 11 months ago

Quote from: gallenkamp at Jun 03, 2015, 10:23 AM

I can't access the feed; I get the following error:

<IRXML><Errors><Error errorCode="0">This request failed validation. Type of Failure encountered was Unauthorized</Error></Errors><IPAddress>94.134.81.11</IPAddress><RequestedUrl>http://xml.corporate-ir.net:84/irxmlclient.asp?compid=251258&reqtype=newsreleases_2</RequestedUrl></IRXML>

Getting the same error here as well
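If the server's error document is well-formed XML, `simplexml_load_string()` will load it without complaint, and the snippet simply finds no matching nodes. The following is a minimal sketch of checking for both cases, assuming plain PHP; `parseFeedBody()` is a hypothetical helper, and the `<Errors><Error>` check mirrors the response quoted above, so it is an assumption for any other feed.

```php
<?php
// Sketch: parse a feed body and report why it is unusable.
// parseFeedBody() is a hypothetical helper; the <Errors><Error> check is
// modelled on the IRXML error response quoted in this thread.
function parseFeedBody(string $body): array
{
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $xml = simplexml_load_string($body);
    $parseErrors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $parseErrors ? trim($parseErrors[0]->message) : 'unknown parse error';
        return [null, 'Invalid XML: ' . $first];
    }
    // A well-formed document can still be an error report from the server.
    if (isset($xml->Errors->Error)) {
        return [null, 'Feed error: ' . trim((string) $xml->Errors->Error)];
    }
    return [$xml, null];
}

// Shortened, made-up version of the error response for illustration.
[$xml, $error] = parseFeedBody(
    '<IRXML><Errors><Error errorCode="0">Unauthorized</Error></Errors></IRXML>'
);
echo $error, PHP_EOL;
```

Inside a MODX snippet the error string could go to `$modx->log()` instead of `echo`. The return value of `file_get_contents()` should also be checked for `false` before it is parsed.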

339 Posts

gallenkamp Reply #8, 8 years, 11 months ago

New snippet code with limits:

# Snippet to read and parse XML input
# USAGE: [[!parsex? &source=`feed.rss` &tpl=`xmlTpl`]]
# author: [email protected]

//$modx->setDebug(true);
 
$source = $modx->getOption('source', $scriptProperties, 'http://modx.com/feeds/latest.rss');
$element = $modx->getOption('element', $scriptProperties, 'item');
$tpl = $modx->getOption('tpl', $scriptProperties, 'xmlTpl');
$wrapper = $modx->getOption('wrapper', $scriptProperties, 'wrapX');
$limit = $modx->getOption('limit', $scriptProperties, 0);
$debugmode = $modx->getOption('debugmode', $scriptProperties, false);

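// initialise the output buffer so the later .= and return do not hit an undefined variable
$output = '';

if (empty($source)) {
    $modx->log(modX::LOG_LEVEL_ERROR,'[parseX] Empty source address passed, aborting.');
    return 'No source defined.';
}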

else {
    if ($xml = simplexml_load_string(file_get_contents($source))) {
        #$modx->log(modX::LOG_LEVEL_ERROR,'[parseX] can read file: '.$source);
        #$modx->log(modX::LOG_LEVEL_ERROR,'[parseX] element value '.$element);
        $nodes = $xml->xpath("//$element");

        $nodecount = 0;
        foreach ($nodes as $node) {
            $nodecount++;
            $values = array();
            foreach ($node as $key => $child) {
                $value = (string)$child;
                if ($key == 'pubDate') {
                    // date() replaces strftime(), which is deprecated since PHP 8.1
                    $value = date('d.m.Y H:i:s', strtotime($value));
                }
                $values[$key] = $value;

                // expose each attribute as "element.attribute"
                foreach ($child->attributes() as $attrkey => $attrval) {
                    $values[$key.'.'.$attrkey] = (string)$attrval;
                }
            }
            if ($debugmode == true) {
                var_dump($values);
            }
            $output .= $modx->getChunk($tpl, $values);
            // stop once &limit items have been rendered; 0 means no limit
            if (($nodecount >= $limit) && ($limit != 0)) break;
        }

    }
    else {
        $modx->log(modX::LOG_LEVEL_ERROR,'[parseX] can NOT read file: '.$source);
    }

$result = array("result" => $output);
return $modx->getChunk($wrapper, $result);
}

Call it with the usual &limit=`5` or whatever.

35 Posts

DannyFranks Reply #9, 8 years, 11 months ago

The limit works great, thanks for that. Any idea how I can get the ReleaseID out, though? This is a really big help, thanks again.

Just another thought as well: is there any way of setting it to start at a certain item? spieFeed does it with &getItemStart=`#`. Is this possible?

All this will make it a truly useful snippet (above and beyond what it is now).
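A start position can be combined with the limit by slicing the node list before the loop. The following is a minimal sketch, assuming plain PHP; `sliceNodes()` and the idea of an `&offset` property are hypothetical and not part of the snippet in this thread.

```php
<?php
// Sketch: pick a window of nodes from the xpath() result.
// sliceNodes() and $offset are hypothetical; $limit = 0 means "no limit",
// matching the convention of the &limit property above.
function sliceNodes(array $nodes, int $offset = 0, int $limit = 0): array
{
    return array_slice($nodes, max(0, $offset), $limit > 0 ? $limit : null);
}

// Made-up feed with four items for illustration.
$xml = simplexml_load_string(
    '<feed><item>a</item><item>b</item><item>c</item><item>d</item></feed>'
);
$nodes = $xml->xpath('//item');

// Skip the first item, then take two.
$picked = array_map(
    function ($node) {
        return (string) $node;
    },
    sliceNodes($nodes, 1, 2)
);
print_r($picked);
```

In the snippet this would mean reading an extra property with `$modx->getOption()` and looping over the sliced array; the counter and `break` would then no longer be needed.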

339 Posts

gallenkamp Reply #10, 8 years, 11 months ago

~~I am already on it.~~ Dammit, I already had that integrated but didn't remember:

[[+NewsRelease.ReleaseID]]
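Whether this placeholder is filled depends on which element the snippet iterates: the loops above read attributes from the children of each matched node, not from the matched node itself. The following is a minimal sketch of also collecting the matched node's own attributes, assuming plain PHP; `ownAttributes()` is a hypothetical helper, and the `<NewsReleases>` wrapper in the sample is an assumption about the feed's outer structure.

```php
<?php
// Sketch: read the attributes of the matched element itself.
// ownAttributes() is a hypothetical helper, and the <NewsReleases> wrapper
// in the sample XML is an assumption about the feed's outer structure.
function ownAttributes(SimpleXMLElement $node): array
{
    $values = [];
    $prefix = $node->getName();
    foreach ($node->attributes() as $attrName => $attrValue) {
        // Keys follow the same "element.attribute" pattern as the snippet.
        $values[$prefix . '.' . $attrName] = (string) $attrValue;
    }
    return $values;
}

// Shortened sample based on the NewsRelease node posted earlier.
$xml = simplexml_load_string(
    '<NewsReleases>'
    . '<NewsRelease ReleaseID="2047271" ArchiveStatus="Current" Type="2">'
    . '<Title>Example title</Title>'
    . '</NewsRelease>'
    . '</NewsReleases>'
);

foreach ($xml->xpath('//NewsRelease') as $release) {
    $values = ownAttributes($release);
    echo $values['NewsRelease.ReleaseID'], PHP_EOL;
}
```

Merging this array into `$values` before the `getChunk()` call would make the placeholder available when the snippet is called with &element=`NewsRelease`.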