Why tagging doesn't work in Revo and how to solve this

29 Posts

chunkyRice Reply #1, 11 years, 2 months ago

MODX contains two Template Variable types aimed at tagging: Tag and Auto-Tag. The latter has a more elaborate interface, but the basic functionality of the TV's are the same. Both TV types are simple text fields, allowing you to enter a number of comma separated tags. Before the TV is stored in the database, all spaces surrounding the comma's are removed. When the TV is read by a snippet like tagLister, all tags are presented as separate links. However, when clicking one of those links, you are likely to get more than you'd hoped for.

THE PROBLEM
Let's suppose we have a number of pages each containing one or more tags, like so:

Page 1 - box
Page 2 - box,container
Page 3 - cardboard box
Page 4 - boxing

When using tagLister to summarize all available tags it will create the following list (number of hits between parentheses):
box (2), container (1), cardboard box (1), boxing (1)

Every tag is treated as a separate entity, as you'd expect. When you click on 'box', using getResourcesTag, you will get a list containing the following documents:

Page 1 - box
Page 2 - box,container
Page 3 - cardboard box
Page 4 - boxing

Huh? TagLister indicated there were only 2 documents using the tag 'box', but getResourceTag shows 4. How come? Well, GetResourcesTag searches the TV's looking for the following SQL query:

LIKE %box%

This means it will see a match whenever the word 'box' shows up, whether or not it is surrounded by other characters.

We decide to use getResources to get a bit more control and look for the query:

LIKE box

This yields the following result:

Page 1 - box

Snippets can break a Tag TV up into separate tags, but when searching the database you don't have that luxury. In that case tags are treated as what they really are: a single line of comma delimited text. So as a workaround we could search for:

LIKE box||LIKE %,box,%||LIKE %,box||LIKE box,%

This leads to the following result:

Page 1 - box

Apparently comma's are neglected when searching the database. You can not use them in a search query. This leaves you empty handed.

A WORKAROUND
Instead of using a Tag type TV you could use a plain Text type TV and you can use semicolons instead of comma's. Now you can search for:

LIKE box||LIKE %;box;%||LIKE %;box||LIKE box;%

You'll get this result:

Page 1 - box
Page 2 - box,container

It works! The only drawback is that a Text TV will not remove the extra spaces surrounding the semicolons, like a Tag TV does. Users inserting spaces after the semicolons will break your query, so you need to slightly adjust it:

LIKE box||LIKE %;box;%||LIKE %; box;%||LIKE %;box||LIKE %; box||||LIKE box;%

It isn't pretty, but it works. In case you're not looking for exact matches, but for the occurrence of the whole word, it would look like this:

LIKE box||LIKE % box %||LIKE % box;%||LIKE %;box %||LIKE %;box;%||LIKE %;box||LIKE % box||LIKE box;%||LIKE box %

Even less than pretty and probably pulling quite a load on your webserver (especially with large sites), but it still works.

ANOTHER WORKAROUND
As an alternative you could put quote marks around each tag. This would make everything a lot easier, but it does require all users to be precise and consistent. Instead of using a Text TV you could create a simple MIGX TV, since MIGX stores everything as a JSON string (automatically quoting all tags). This would require users to enter each tag separately, which makes it quite cumbersome as MIGX opens a modal for each entry. Either way it does reduce the size of queries considerably. For an exact match:

LIKE %"box"%

And for the whole word:

LIKE %"box"%||LIKE % box"%||LIKE %"box "||LIKE % box %

A PERMANENT SOLUTION
The second workaround points in the direction of a possible and simple solution to the problem: a TV type that adds quote marks around each tag. I think it would be best if it were to happen automatically and transparently. Meaning that the input type would take care of adding quotes to the comma delimited tags before storing the TV in the database. And the output type would remove the quotes before parsing them to HTML. This way the TV field could look similar to a Tag TV field and the data passed to snippets would also be the same. Only the queries would need to be adjusted.

The TV type could be added to the core (preferably) or someone might make a plug-in. I've tried to do it myself, but my PHP skills are unfortunately not good enough. (a refactor request has been submitted: #9493)

A computer program is a utilitarian typographer's dream - a functioning machine composed completely of type. (John Maeda)

☆ A M B ☆
191 Posts

jgrant Reply #2, 11 years, 2 months ago

Good description of the problem.

I think the various snippets which work with tags need to be using the IN operator ( http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_in ) instead of LIKE. That would make commas work properly as delimiters.

If you look at getResourcesTag it appears to support this (lines 49-50) but I'm not sure it actually does. I've tried using within before but didn't get very far and didn't have time to look through the rest of the code to see if it was fully implemented. There's also a pull request for getResources ( https://github.com/opengeek/getResources/pull/64 ) to add support for IN.

I hope it gets done sometime. It would certainly be useful in some cases!

Extras :: pThumb • Resizer • imageSlim • setPlaceholders

5,888 Posts

Bruno17 Reply #3, 11 years, 2 months ago

you can simply add double-pipes around the tag-field with concat:
http://forums.modx.com/thread/82417/filtering-with-multiple-queries-on-the-same-column#dis-post-454882

-------------------------------

you can buy me a beer, if you like MIGX

http://webcmsolutions.de/migx.html

Thanks!

29 Posts

chunkyRice Reply #4, 11 years, 2 months ago

@Jgrant I tested the FIND_IN_SET function and that one does indeed find exact matches. Which suggests that the 'IN' operator, as proposed by jeffwhitfield, could actually work. That would be a vast improvement. It doesn't find whole words though and I can't find a way to make it so. Therefore using it to find 'cardboard box' searching for 'box' still seems impossible. I presume most people would like tags to come up with exact matches anyway, but I could imagine cases in which finding whole words would be useful. So for flexibilities sake, it would be neat to make that work as well.

@ Bruno17 Reading the post, I can't really see the benefit of using double pipes over quote marks. Don't they try to do the same thing, i.e. adding delimiters at the end and beginning of the tags, rather than just having delimiters in between?

A computer program is a utilitarian typographer's dream - a functioning machine composed completely of type. (John Maeda)

5,888 Posts

Bruno17 Reply #5, 11 years, 2 months ago

the benefit of using

$c->where("CONCAT('||',category,'||') LIKE '%||" . $cat . "||%'", xPDOQuery::SQL_AND);

is exactly that, what you are looking for.
Searching for exact words in delimited field-contents.

If you have in your TVs:

page1: box||container
page2: cardboard box||another one
page3: boxing
page4: container||box||another one

and you are searching for 'box'
this will find page1 and page4

no need to have another TV-type

of course if you have comma-delimited - TV-lists where you want to search in you can do

$c->where("CONCAT(',',category,',') LIKE '%," . $cat . ",%'", xPDOQuery::SQL_AND);

-------------------------------

you can buy me a beer, if you like MIGX

http://webcmsolutions.de/migx.html

Thanks!

29 Posts

chunkyRice Reply #6, 11 years, 2 months ago

Tried it and it works. It even allows for searching for the whole word, rather than an exact match.

Still, both solutions (the use of IN and the use of CONCAT) need to be included in snippets for them to work. Jeff Whitfield's proposal of including the IN operator in getResources is 11 months old. The last feedback from OpenGeek is 8 months old. I fear it won't happen anytime soon. And if getResources doesn't change, getResourcesTag probably won't either. A similar situation is true for getRelated. Which is why I still think a new TV-type is worth considering. It could be a quicker fix than waiting for all snippets to be modified. [ed. note: chunkyRice last edited this post 11 years, 2 months ago.]

A computer program is a utilitarian typographer's dream - a functioning machine composed completely of type. (John Maeda)

26 Posts

saerni Reply #7, 11 years, 2 months ago

Hi folks,

here my solution:
install the last TagLister package.
use [[!getResourcesTag? &tagSearchType=`within` &yourParametersHere... ]]

Then you have to modify the getResources 1.6.0 to work with the (IN) operator. Please check my fork on github: https://github.com/enzyms/getResources

Hope it helps!

32 Posts

charismeki Reply #8, 10 years, 7 months ago

We ran into the same problem and helped ourselfes by encapsulating each item with # for and after: eg #12#,#34# : we filter then with getResources: compulsary==%#[[*id]]#% which seems to work very well. TV type is AutoTag TV. Best c--

433 Posts

michelle84 Reply #9, 10 years, 6 months ago

Hi Charismeki, how did you encapsulate each item with #. Just typing it in like that, or did you use a plugin or something to do it afterwards? Could you show me an example? Thanks!

http://www.vierkante-meter.nl

5,888 Posts

Bruno17 Reply #10, 10 years, 6 months ago

you just need to add a doublepipe at front and after the tags:

||tagA||tagB||tagC||

this can be done by a plugin onDocFormSave

Than you can search by

%||tagA||%

-------------------------------

you can buy me a beer, if you like MIGX

http://webcmsolutions.de/migx.html

Thanks!