Custom modifier > strip tag by class or id

270 Posts

lokust Reply #1, 11 years, 2 months ago

Hi there, has anyone got a custom modifier snippet in their toolset that will strip out HTML tags by class or id?

24,544 Posts

BobRay Reply #2, 11 years, 2 months ago

Not me. You could do it with preg_replace(). It might also be possible to do it with jQuery if you just want to remove those tags from the output.

Did I help you? Buy me a beer
Get my Book: MODX:The Official Guide
MODX info for everyone: http://bobsguides.com/modx.html
My MODX Extras
Bob's Guides is now hosted at A2 MODX Hosting

270 Posts

lokust Reply #3, 11 years, 2 months ago

Hi Bob, at the moment I'm parsing a Google+ feed and using CSS to hide some of the content in the <description> field. It works, though I'd rather just zap the unwanted content at the source. I'm not entirely sure how to approach this, as I'd need to target both the opening tag (with its class or id) and its closing tag, rather than just stripping out or replacing a single string.

So something along these lines I guess:

Snippet: attrizap

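<?php
// In an output filter, $input is the placeholder value and $options is
// whatever follows the "=" in the tag (here, the regex body).
// "~" delimiters avoid clashing with the "/" in </div>; "s" lets "." span newlines.
$zap = preg_replace('~' . $options . '~s', '', $input);
return $zap;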

Usage:

[[+content:attrizap=`<div class="someClass">.+</div>`]]

24,544 Posts

BobRay Reply #4, 11 years, 2 months ago

One of the many, many great features of the PhpStorm editor is a regex find and replace that highlights what it's going to do as you type the regex. It shows all the matches it finds and what they will be replaced by. Once you get it right, you just paste the pattern into your preg_replace() call. It's really a comfort to see that you're not going to trash stuff you don't want altered, and it beats the heck out of trial and error.

I would do it with two separate preg_replaces, one for classes and one for IDs.

[[+content:attrizap=`idName,className`]]

Something like this (assuming that they're all divs):

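<?php
// Output filter: $input is the placeholder value, $options is "idName,className".
$output = '';
$replace = '';
$options = explode(',', $options);

// preg_quote() keeps characters in the names from being read as regex syntax.
$idName = preg_quote(trim($options[0]), '~');
$className = preg_quote(trim($options[1]), '~');

// "~" delimiters avoid escaping the "/" in </div>; "s" lets "." span newlines;
// ".*?" is lazy, so each match stops at the first closing </div>.
$pattern1 = '~<div\s[^>]*class\s*=\s*"[^"]*' . $className . '[^"]*"[^>]*>.*?</div>~s';
$pattern2 = '~<div\s[^>]*id\s*=\s*"[^"]*' . $idName . '[^"]*"[^>]*>.*?</div>~s';

$output = preg_replace($pattern1, $replace, $input);
$output = preg_replace($pattern2, $replace, $output);

return $output;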

The pattern could be simpler if you're absolutely sure of the format (e.g., that there will be no spaces around the equals sign and no cases of multiple classes or extra spaces in the class or id definitions).
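As a rough illustration of that simpler case, here is a minimal sketch that assumes the markup is always exactly <div class="name">...</div>, with a single class, double quotes, no extra spaces, and no nested divs. The function name zapDivByClass is made up for this example and is not part of MODX.

```php
<?php
// Hypothetical helper: strips <div class="name">...</div> blocks when the
// markup is known to have exactly this shape (one class, no extra spaces).
function zapDivByClass(string $html, string $className): string
{
    // preg_quote() keeps characters in the name from acting as regex syntax.
    $name = preg_quote($className, '~');
    // "~" delimiters avoid escaping "/"; "s" lets "." match newlines;
    // ".*?" is lazy, so it stops at the first closing </div>.
    $pattern = '~<div class="' . $name . '">.*?</div>~s';
    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$html = '<p>Keep</p><div class="meta">Zap</div><p>Also keep</p>';
echo zapDivByClass($html, 'meta'); // prints <p>Keep</p><p>Also keep</p>
```

Inside a snippet used as an output filter, the same call would presumably be made with $input as the HTML and $options as the class name.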

[update] I should also mention that if this is a displayed page, you could also just hide or empty the divs with jQuery. [ed. note: BobRay last edited this post 11 years, 2 months ago.]


534 Posts

exside Reply #5, 11 years, 2 months ago

Maybe overkill, but I've had very good experiences with phpQuery https://github.com/punkave/phpQuery (check the original Google Code repo for documentation!), which does exactly what you're trying to achieve. As I said, though, including a whole library is maybe overkill for a "little" output filter =D

☆ A M B ☆
2,475 Posts

Everett Reply #6, 11 years, 2 months ago

Might it be easier to use preg_match to grab the contents of the div (or whatever tag) that you want to keep?
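A minimal sketch of that idea, assuming the content to keep sits in a single, non-nested div identified by its class. The function name keepDivByClass is hypothetical, and the class check is a plain substring match, so "meta" would also match "metadata".

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <div> whose
// class attribute contains $className, or '' when nothing matches.
function keepDivByClass(string $html, string $className): string
{
    $name = preg_quote($className, '~');
    // The parentheses capture everything between the opening tag and the
    // first closing </div>; "s" lets "." match newlines.
    $pattern = '~<div\s[^>]*class\s*=\s*"[^"]*' . $name . '[^"]*"[^>]*>(.*?)</div>~s';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return '';
}

$html = '<div class="junk">x</div><div class="post body">Hello <b>world</b></div>';
echo keepDivByClass($html, 'body'); // prints Hello <b>world</b>
```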

Hosting stuff: http://fireproofsocks.com/
Coding stuff: http://craftsmancoding.com/
Blog stuff: http://tipsfor.us/

☆ A M B ☆
191 Posts

jgrant Reply #7, 11 years, 2 months ago

There's some great functionality in PHP for DOM manipulation. It'll save you a lot of headaches compared with trying to write your own regexes for everything. http://www.php.net/manual/en/book.dom.php

As an example, here's a little snippet I wrote to clean up the links in a feed of Facebook wall posts. It's called as an output filter, like [[+summary:cleanFBlinks]].

<?php
$dom = new DOMDocument;
$dom->loadHTML('<?xml encoding="UTF-8">' . $input);  // Declare UTF-8; otherwise some characters come out garbled
foreach ($dom->getElementsByTagName('a') as $node) {
	$node->removeAttribute('onclick');
	$node->removeAttribute('onmouseover');
	$node->removeAttribute('rel');
	$node->removeAttribute('id');
	$node->removeAttribute('title');
	$node->removeAttribute('target');
	$node->removeAttribute('style');
	$href = $node->getAttribute('href');
	$node->setAttribute('href', urldecode(substr($href, strpos($href, '=')+1, strrpos($href, '&h=') - strlen($href) )));
}
foreach ($dom->getElementsByTagName('img') as $node) {
	$node->removeAttribute('class');
}
$output = substr($dom->saveXML($dom->getElementsByTagName('body')->item(0)), 6, -7); // Strip off <body></body> tags
return $output;

Makes life pretty easy!
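The same DOM approach could probably be pointed at the original question. The following is only a sketch under assumptions: the function name zapByIdOrClass is invented for illustration, the id and class names are assumed not to contain double quotes, and the input is wrapped in a body tag so the leftover markup can be read back out. The exact whitespace of the serialized output may differ from the input.

```php
<?php
// Hypothetical helper (not part of MODX): removes every element whose id
// equals $idName or whose class list contains $className.
// Assumes neither name contains a double quote.
function zapByIdOrClass(string $html, string $idName, string $className): string
{
    libxml_use_internal_errors(true); // tolerate imperfect feed markup
    $dom = new DOMDocument;
    // Same UTF-8 hint as in the snippet above.
    $dom->loadHTML('<?xml encoding="UTF-8"><body>' . $html . '</body>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = '//*[@id="' . $idName . '"]'
        . ' | //*[contains(concat(" ", normalize-space(@class), " "), " ' . $className . ' ")]';

    // Copy the matches to an array first, then detach each node from its parent.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only what is left inside <body>.
    $output = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $output .= $dom->saveHTML($child);
    }
    return $output;
}
```

Because the parser tracks opening and closing tags itself, nested divs are removed along with their parent, which is the case the regex versions struggle with.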

Extras :: pThumb • Resizer • imageSlim • setPlaceholders

270 Posts

lokust Reply #8, 11 years, 2 months ago

Thanks for all the input - fantastic!
I've gone with Bob's code, although I might see if I can adapt it to Everett's suggestion, which cuts to the nub of what I'm looking to do in this case.

24,544 Posts

BobRay Reply #9, 11 years, 2 months ago

preg_match() would let you grab just what you want to keep, but it gets tricky if there are nested divs.
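To make the nesting problem concrete, here is a small sketch with made-up sample markup and a hypothetical function name. A lazy pattern stops at the first closing </div> it meets, which belongs to the inner div, so the capture is cut short.

```php
<?php
// Hypothetical demo: captures the contents of <div class="wrap"> with a
// lazy regex, which ends at the FIRST </div> rather than the matching one.
function grabWrapContents(string $html): string
{
    if (preg_match('~<div class="wrap">(.*?)</div>~s', $html, $matches) === 1) {
        return $matches[1];
    }
    return '';
}

$html = '<div class="wrap"><div class="inner">A</div>B</div><p>C</p>';
echo grabWrapContents($html); // prints <div class="inner">A  (the "B" is lost)
```

A greedy ".*" has the opposite problem: it runs on to the last </div> in the input, which is one reason a DOM parser tends to be the safer choice when nesting is possible.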
