Tim, please find enclosed a proposal for release 1.4 of the searchHighlight plugin.
Many of your improvements have been integrated. The main differences from your 1.3 release are:
- the page charset (tied to the database charset) is now taken into account when calling htmlentities;
- a fix for $database_connection_charset, which was not declared as a global variable and was therefore always empty;
- a fix for the class names of the newly generated terms. When a document contains, for instance, both "alphabétisation" and "alphab&eacute;tisation", the two terms must use the same class, ajaxSearch_highlight1. To do so, the index of the original class has to be memorized.
- regarding the look-behind assertion, yours does not work properly: if you search for "ute", the following pattern does not prevent a match inside the entity &eacute; (é):
$pattern = '/(?<!&)(?<!&.)(?<!&.^;)(?<!&.^;^;)(?<!&.^;^;^;)(?<!&.^;^;^;^;)(?<!&.^;^;^;^;^;)(?<!&.^;^;^;^;^;^;)(?<!&.^;^;^;^;^;^;^;)'.preg_quote($word, '/') . '(?=[^>]*<)/' . $pcreModifier;
My proposal is to use:
$pcreModifier = ($pgCharset == 'UTF-8') ? 'iu' : 'i';
$lookBehind = '/(?<!&|&[^;]|&[^;][^;]|&[^;][^;][^;])'; // avoid a match with a html entity
$lookAhead = '(?=[^>]*<)/'; // avoid a match with a html tag
and then build the pattern with:
$pattern = $lookBehind . preg_quote($word, '/') . $lookAhead . $pcreModifier;
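This simpler solution assumes that the search term has at least 3 characters. Because the look-behind only checks up to three characters after the &, a term of at least three characters cannot sit inside an entity name of up to six letters, such as &eacute; or &agrave;, without being caught.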
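To tie these points together, here is a minimal PHP sketch, not the plugin's actual code: the function names `buildHighlightTerms()` and `highlightTerms()` and the `<span class="ajaxSearch_highlightN">` markup are hypothetical. It builds the term list so that each entity-encoded variant shares the class index of its original term, encodes with the page charset, and applies the proposed look-behind and look-ahead.

```php
<?php
// Sketch only: function names and the <span> markup are hypothetical.

// Build [term, classIndex] pairs. The entity-encoded variant of a term keeps
// the index of the term it was generated from, so "alphabétisation" and
// "alphab&eacute;tisation" share the class ajaxSearch_highlight1.
function buildHighlightTerms(array $searchTerms, string $pgCharset): array
{
    $terms = [];
    foreach (array_values($searchTerms) as $i => $term) {
        $classIndex = $i + 1; // memorize the index of the original class
        $terms[] = [$term, $classIndex];
        // Encode with the page charset so accented characters are converted correctly.
        $encoded = htmlentities($term, ENT_QUOTES, $pgCharset);
        if ($encoded !== $term) {
            $terms[] = [$encoded, $classIndex];
        }
    }
    return $terms;
}

// Wrap every match of each term, using the proposed look-behind and look-ahead.
function highlightTerms(string $html, array $terms, string $pgCharset): string
{
    $pcreModifier = ($pgCharset == 'UTF-8') ? 'iu' : 'i';
    // Reject a match that begins within the first four characters after an '&'
    // with no ';' in between (i.e. inside an entity name).
    $lookBehind = '/(?<!&|&[^;]|&[^;][^;]|&[^;][^;][^;])';
    // Require a '<' before any '>' (i.e. not inside a tag).
    $lookAhead = '(?=[^>]*<)/';
    foreach ($terms as [$word, $classIndex]) {
        $pattern = $lookBehind . preg_quote($word, '/') . $lookAhead . $pcreModifier;
        $html = preg_replace($pattern, '<span class="ajaxSearch_highlight' . $classIndex . '">$0</span>', $html);
    }
    return $html;
}

$terms = buildHighlightTerms(['alphabétisation'], 'UTF-8');
echo highlightTerms('<p>alphabétisation, alphab&eacute;tisation</p>', $terms, 'UTF-8'), "\n";
// prints: <p><span class="ajaxSearch_highlight1">alphabétisation</span>, <span class="ajaxSearch_highlight1">alphab&eacute;tisation</span></p>
```

With a two-letter term, `highlightTerms('<p>&eacute;</p>', [['te', 1]], 'UTF-8')` wraps the "te" inside &eacute;, which illustrates why the pattern assumes terms of at least three characters; "ute" in the same input is left alone.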
The advSearchHighlight plugin (a variant of the searchHighlight plugin) is running on the ajaxSearch demo site.
I have created a document named "html entities" containing many words encoded as HTML entities. This document belongs to the French document hierarchy (French is a beautiful language with lots of accented characters).
To test it, first select "french documents" (bottom left of each page). Then search, for instance, for "alphabétisation" or "éducation école" and click the "html entities" link to display the document with the search terms highlighted.
Look at the page source. Then try searching for "ute" or "rave" to check that the advSearchHighlight plugin does not match inside HTML entities such as &eacute; or &agrave;.
Here are some direct links to the results:
- alphabétisation: both alphabétisation and alphab&eacute;tisation are correctly highlighted.
- éducation école
- ute: in this document, the é characters (encoded as &eacute;) are not highlighted.
- gra: in this document, the à characters (encoded as &agrave;) are not highlighted.
Thanks in advance for your feedback on this new release of the searchHighlight plugin.