    • 51481
    • 45 Posts
    Hello all,

    I need some help! My client needs PDF files to show up in search results when their content matches the search term. In my research, I found a post suggesting an Apache Solr installation to work with SimpleSearch (https://rtfm.modx.com/extras/revo/simplesearch/simplesearch.solr). I have done my best to set it up, but I'm not sure how to finish the configuration.

    My folder of PDFs that need to be indexed is in public_html/modx/assets/pdfs. I have created a Solr core called "cmx" in /var/solr/data/cmx. I took solr.schema.xml, put it in /var/solr/data/cmx/conf, and renamed it schema.xml (do I need to alter this file?).

    I have changed the following sisea system settings:

    sisea.driver_class: SimpleSearchDriverSolr
    sisea.driver_db_specific: No
    sisea.solr.hostname: 67.231.17.10
    sisea.solr.path: solr/cmx
    sisea.solr.port: 8983
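    For reference, the Solr driver turns these settings into connection options for the PECL `SolrClient` (the driver's `initialize()` method is quoted further down this thread). The sketch below mirrors that mapping outside MODX so you can see which core URL the settings point at. The function names and fallback values are hypothetical, and the `http://host:port/path` layout is assumed from the `COMMITting ... http://67.231.17.10:8983/solr/cmx/update` line Solr printed.

```php
<?php
// Hypothetical helper: map sisea.* System Settings to SolrClient-style option keys
// (the same keys the driver's initialize() builds). Fallbacks are illustrative only.
function solrOptionsFromSettings(array $settings): array
{
    return [
        'hostname' => $settings['sisea.solr.hostname'] ?? '',
        'port'     => (int)($settings['sisea.solr.port'] ?? 8983), // 8983 is Solr's default port
        'path'     => $settings['sisea.solr.path'] ?? '',
        'login'    => $settings['sisea.solr.username'] ?? '',
        'password' => $settings['sisea.solr.password'] ?? '',
    ];
}

// Hypothetical helper: the core URL these options imply, e.g. http://host:8983/solr/cmx
function solrBaseUrl(array $options): string
{
    return sprintf('http://%s:%d/%s', $options['hostname'], $options['port'], trim($options['path'], '/'));
}

$options = solrOptionsFromSettings([
    'sisea.solr.hostname' => '67.231.17.10',
    'sisea.solr.port'     => '8983',
    'sisea.solr.path'     => 'solr/cmx',
]);
echo solrBaseUrl($options), "\n"; // http://67.231.17.10:8983/solr/cmx
```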

    I put [[SimpleSearchIndexAll]] on my homepage and loaded the page (not really sure if anything happened?).

    When I run a search I get "There were no search results for the search "bylaws". Please try using more general terms to get more results."

    I am VERY new to all of this... Is there anyone who can help me? Thanks very much in advance! [ed. note: matthewmeredith last edited this post 7 years, 10 months ago.]
      • 51481
      • 45 Posts
      UPDATE: I have indexed all my current PDFs into Solr using this command:
      bin/post -c cmx -host 67.231.17.10 -filetypes pdf /home/townofco/public_html/modx/assets/pdfs


      Everything seemed to work okay as my Solr Admin panel is showing 351 docs and there were no error messages as it was posting. The last line reads:
      COMMITting Solr index changes to http://67.231.17.10:8983/solr/cmx/update
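      One way to confirm the post landed, independent of the Admin panel, is to ask the core's `/select` handler for a count only (`q=*:*&rows=0&wt=json`) and read `response.numFound` from the JSON. This is a minimal sketch: the helper names are hypothetical, the canned response just reuses the 351 count from the Admin panel, and the live fetch via `file_get_contents` assumes `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper: URL asking Solr for the document count only (no rows returned).
function solrCountUrl(string $coreUrl): string
{
    return rtrim($coreUrl, '/') . '/select?' . http_build_query(['q' => '*:*', 'rows' => 0, 'wt' => 'json']);
}

// Hypothetical helper: read response.numFound from a Solr JSON body; -1 if it is missing.
function solrNumFound(string $json): int
{
    $data = json_decode($json, true);
    return isset($data['response']['numFound']) ? (int)$data['response']['numFound'] : -1;
}

// Canned body in the shape Solr returns, so the parser can be tried offline.
$sample = '{"responseHeader":{"status":0,"QTime":1},"response":{"numFound":351,"start":0,"docs":[]}}';
echo solrNumFound($sample), "\n"; // 351

// Against the live core (uncomment on the server):
// echo solrNumFound(file_get_contents(solrCountUrl('http://67.231.17.10:8983/solr/cmx'))), "\n";
```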


      Now the question is: How do I add my Resources to the Solr index, and how do I get the results to be displayed?

      In the System Settings for sisea, do I need a username and password for Solr? I don't remember setting one up, but maybe it just needs my server username and password?

      UPDATE #2: I just realized that there doesn't seem to be any "SimpleSearchIndexAll" snippet as stated in the SimpleSearch.Solr documentation (https://rtfm.modx.com/extras/revo/simplesearch/simplesearch.solr). I am running the latest version of SimpleSearch (1.9.2). There is, however, a Plugin called "SimpleSearchIndexer". Is this what I'm supposed to be using? If so, how do I run it?

      UPDATE #3: I went to the GitHub repository for SimpleSearch, found the SimpleSearchIndexAll snippet (https://github.com/splittingred/SimpleSearch/blob/88d9023281b8e10ca503bde2acacf0067eabf5cc/core/components/simplesearch/elements/snippets/simplesearchindexall.snippet.php), and created it in my MODX install. I called it on a page and got "Finished Indexing 0 Resources", so something is happening... Here is the snippet I created:
      <?php
      /**
       * SimpleSearch
       *
       * Copyright 2010-11 by Shaun McCormick <[email protected]>
       *
       * This file is part of SimpleSearch, a simple search component for MODx
       * Revolution. It is loosely based off of AjaxSearch for MODx Evolution by
       * coroico/kylej, minus the ajax.
       *
       * SimpleSearch is free software; you can redistribute it and/or modify it under
       * the terms of the GNU General Public License as published by the Free Software
       * Foundation; either version 2 of the License, or (at your option) any later
       * version.
       *
       * SimpleSearch is distributed in the hope that it will be useful, but WITHOUT
       * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
       * FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
       * details.
       *
       * You should have received a copy of the GNU General Public License along with
       * SimpleSearch; if not, write to the Free Software Foundation, Inc., 59 Temple Place,
       * Suite 330, Boston, MA 02111-1307 USA
       *
       * @package simplesearch
       */
      /**
       * SimpleSearchIndexAll snippet, used for indexing all resources with alternate search drivers
       *
       * @package simplesearch
       */
      require_once $modx->getOption('sisea.core_path',null,$modx->getOption('core_path').'components/simplesearch/').'model/simplesearch/simplesearch.class.php';
      $search = new SimpleSearch($modx,$scriptProperties);
      $search->loadDriver($scriptProperties);
      $memoryLimit = $modx->getOption('memory_limit',$scriptProperties,'512M');
      @ini_set('memory_limit',$memoryLimit);
      @set_time_limit(0);
      $includeTVs = $modx->getOption('includeTVs',$scriptProperties,true);
      $processTVs = $modx->getOption('processTVs',$scriptProperties,false);
      /* build query */
      $c = $modx->newQuery('modResource');
      $c->where(array(
          'searchable' => true,
          'deleted' => false,
          'published' => true,
      ));
      $c->sortby('id','ASC');
      $resources = $modx->getIterator('modResource',$c);
      $i = 0;
      foreach ($resources as $resource) {
          $resourceArray = $resource->toArray();
          $templateVars =& $resource->getMany('TemplateVars');
          if (!empty($templateVars) && $includeTVs) {
              foreach ($templateVars as $tvId => $templateVar) {
                  /* eventually change this index to TV name */
                  $resourceArray['tv'.$templateVar->get('id')] = !empty($processTVs) ? $templateVar->renderOutput($resource->get('id')) : $templateVar->get('value');
              }
          }
          if ($search->driver->index($resourceArray,false)) {
              $modx->log(modX::LOG_LEVEL_INFO,'[SimpleSearch] Indexing Resource: '.$resourceArray['pagetitle']);
              $i++;
          }
      }
      return $modx->lexicon('sisea.index_finished',array('total' => $i));
      [ed. note: matthewmeredith last edited this post 7 years, 10 months ago.]
        • 3749
        • 24,544 Posts
        Please don't double post.
          • 51481
          • 45 Posts
          Quote from: BobRay at Jun 13, 2016, 10:52 PM
          Please don't double post.

          Sorry Bob, wasn't sure which forum to post the question in! Thanks for deleting the other one.
            • 51481
            • 45 Posts
            Okay, so something is definitely happening now... When I add the "SimpleSearchIndexAll" snippet to a page and load it, I get a 500 server error, but the script definitely runs, because my Solr Admin shows an increase in "numDocs". I'm getting errors, though (I won't post the whole log, but here's the gist of it):
            [2016-06-14 00:38:14] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=11] Error adding field 'content'='[[~152]]' msg=For input string: "[[~152]]"
            [2016-06-14 00:38:14] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=13] Error adding field 'content'='http://www.discovercomoxvalley.com/' msg=For input string: "http://www.discovercomoxvalley.com/"
            [2016-06-14 00:38:14] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=14] Error adding field 'content'='http://www.investcomoxvalley.com/' msg=For input string: "http://www.investcomoxvalley.com/"
            [2016-06-14 00:38:15] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=16] Error adding field 'content'='[[SimpleSearchIndexAll]]
            [2016-06-14 00:38:15] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=25] Error adding field 'content'='http://www.comoxseniors.ca/' msg=For input string: "http://www.comoxseniors.ca/"
            [2016-06-14 00:38:15] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 248) Error adding Document to index on Solr server: ERROR: [doc=26] Error adding field 'content'='http://bctransit.com/comox-valley/home' msg=For input string: "http://bctransit.com/comox-valley/home"


            It looks like Solr doesn't like certain strings? There are a few more that show a similar error followed by the entire page's HTML.
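            The `msg=For input string: "..."` part has the format of a Java `NumberFormatException` message, which suggests (the log alone doesn't prove it) that the `content` field in the schema Solr actually loaded has a numeric type rather than a text type. Solr's Schema API reports the type it is using at `<core URL>/schema/fields/content`. The sketch reads that JSON; the helper names and the numeric-type heuristic are hypothetical, and the canned response is illustrative, not output from this server.

```php
<?php
// Hypothetical helper: field type from GET <coreUrl>/schema/fields/<name>?wt=json,
// or null if the response has no such field.
function solrFieldType(string $json): ?string
{
    $data = json_decode($json, true);
    return $data['field']['type'] ?? null;
}

// Hypothetical heuristic: rough match for Solr's stock numeric type names
// (int, tlong, plongs, tdoubles, ...). Text types such as text_general do not match.
function isNumericSolrType(?string $type): bool
{
    return $type !== null && preg_match('/^[pt]?(int|long|float|double)s?$/', $type) === 1;
}

// Illustrative response shape only; a numeric type here would explain the errors.
$sample = '{"responseHeader":{"status":0,"QTime":0},"field":{"name":"content","type":"tlongs"}}';
$type = solrFieldType($sample);
echo $type, ' numeric=', isNumericSolrType($type) ? 'yes' : 'no', "\n"; // tlongs numeric=yes

// Live check (uncomment on the server):
// echo solrFieldType(file_get_contents('http://67.231.17.10:8983/solr/cmx/schema/fields/content?wt=json')), "\n";
```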

            I also opened the file in the above error messages (simplesearchdriversolr.class.php) and changed the following lines to match the sisea System Settings:

                public function initialize() {
                    $this->_connectionOptions = array(
                        'hostname' => $this->modx->getOption('sisea.solr.hostname',null,'67.231.17.10'),
                        'port' => $this->modx->getOption('sisea.solr.port',null,'8983'),
                        'path' => $this->modx->getOption('sisea.solr.path',null,'solr/cmx'),
                        'login' => $this->modx->getOption('sisea.solr.username',null,''),
                        'password' => $this->modx->getOption('sisea.solr.password',null,''),
                        'timeout' => $this->modx->getOption('sisea.solr.timeout',null,30),
                        'secure' => $this->modx->getOption('sisea.solr.ssl',null,false),
                        'ssl_cert' => $this->modx->getOption('sisea.solr.ssl_cert',null,''),
                        'ssl_key' => $this->modx->getOption('sisea.solr.ssl_key',null,''),
                        'ssl_keypassword' => $this->modx->getOption('sisea.solr.ssl_keypassword',null,''),
                        'ssl_cainfo' => $this->modx->getOption('sisea.solr.ssl_cainfo',null,''),
                        'ssl_capath' => $this->modx->getOption('sisea.solr.ssl_capath',null,''),
                        'proxy_host' => $this->modx->getOption('sisea.solr.proxy_host',null,''),
                        'proxy_port' => $this->modx->getOption('sisea.solr.proxy_port',null,''),
                        'proxy_login' => $this->modx->getOption('sisea.solr.proxy_username',null,''),
                        'proxy_password' => $this->modx->getOption('sisea.solr.proxy_password',null,''),
                    );
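            A note on this edit: `$modx->getOption($key, null, $default)` returns the third argument only when the setting is missing, so once `sisea.solr.hostname` and the other settings exist as System Settings, changing the defaults inside `initialize()` should not change where the driver connects. The plain-PHP sketch below imitates that lookup order; `getOptionSketch` is a hypothetical stand-in, not the MODX implementation.

```php
<?php
// Hypothetical stand-in for the lookup order of $modx->getOption($key, null, $default):
// a value present in the settings wins; the default is used only when the key is absent.
function getOptionSketch(array $systemSettings, string $key, $default = null)
{
    return array_key_exists($key, $systemSettings) ? $systemSettings[$key] : $default;
}

$settings = ['sisea.solr.path' => 'solr/cmx'];

// The setting exists, so the default passed in is ignored:
echo getOptionSketch($settings, 'sisea.solr.path', 'solr/other'), "\n"; // solr/cmx
// This array has no sisea.solr.username key, so the default is returned:
var_dump(getOptionSketch($settings, 'sisea.solr.username', ''));       // string(0) ""
```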


            I really hope someone can chime in with a suggestion! It seems like I'm getting closer, but I'm in way over my head and quickly running out of ideas...

            UPDATE: Indexing Resources aside, here's the error I get when I try a search on my indexed PDFs:

            [2016-06-14 01:22:53] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 216) Error running query on Solr server: undefined field deleted
            [ed. note: matthewmeredith last edited this post 7 years, 10 months ago.]
              • 51481
              • 45 Posts
              Yet another update...
              [root@vps /]# pecl list
              Installed packages, channel pecl.php.net:
              =========================================
              Package Version State
              solr    2.4.0   stable
              

              This confirms that I have the PECL Solr package installed as per instructions here (https://rtfm.modx.com/extras/revo/simplesearch/simplesearch.solr), correct? I have a bunch of PDFs indexed in my Solr core and my System Settings are configured to use SimpleSearchDriverSolr, but I'm still not getting any results!!! What am I doing wrong?!
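              `pecl list` shows that the package is installed, but the extension also has to be enabled in the php.ini that the web server's PHP reads (for example with an `extension=solr.so` line), and the command line and the web server can use different php.ini files. A small check using standard PHP functions, meant to be loaded through the web server so it reports what MODX sees; the function name is hypothetical:

```php
<?php
// Hypothetical diagnostic: report whether the Solr extension is loaded in *this* PHP process.
// Load it through the web server (not the CLI) to see the environment MODX runs in.
function solrExtensionStatus(): array
{
    return [
        'loaded'   => extension_loaded('solr'),
        'version'  => phpversion('solr'),            // false when the extension is not loaded
        'client'   => class_exists('SolrClient', false),
        'sapi'     => PHP_SAPI,                      // e.g. "cli" vs "fpm-fcgi"
        'ini_file' => php_ini_loaded_file(),         // which php.ini this process read
    ];
}

print_r(solrExtensionStatus());
```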

              Do I need to change anything in my SimpleSearch call? Here is my search bar:
              <!-- SEARCH BAR -->
              <div class="navbar-header">
                  <div style="padding:0 25px 0 0;">
                      [[!SimpleSearchForm? &landing=`88` &tpl=`searchField`]]
                  </div>
              </div>

              And my searchField tpl:
              <form class="navbar-form" role="search" action="[[~[[+landing:default=`[[*id]]`]]]]" method="[[+method:default=`get`]]">
                  <div class="input-group add-on">
                      <input type="text" class="form-control" placeholder="Search" name="[[+searchIndex]]" id="[[+searchIndex]]" value="[[+searchValue]]" />
                      <input type="hidden" name="id" value="[[+landing:default=`[[*id]]`]]" />
                      <div class="input-group-btn">
                          <button class="btn btn-default" type="submit"><i class="glyphicon glyphicon-search"></i></button>
                      </div>
                  </div>
              </form>
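              Because `method` defaults to `get`, submitting this form should just request the landing page with the search term in the query string, so the results page can also be tested by typing that URL directly. A sketch of the URL the form would produce; it assumes `[[+searchIndex]]` renders as `search` (SimpleSearch's usual default) and uses `http://example.com/index.php` as a placeholder address, so both are assumptions to check against your setup. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: the GET URL produced by the searchField form above.
// 'search' is assumed to be what [[+searchIndex]] renders; 88 is the &landing id.
function searchLandingUrl(string $base, int $landingId, string $term, string $searchIndex = 'search'): string
{
    // Same field order as the form: text input first, hidden id second.
    return $base . '?' . http_build_query([$searchIndex => $term, 'id' => $landingId]);
}

echo searchLandingUrl('http://example.com/index.php', 88, 'bylaws'), "\n";
// http://example.com/index.php?search=bylaws&id=88
```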

              And lastly, my Search Results resource page:
              [[!SimpleSearch]]

              Please help!!!
                • 51481
                • 45 Posts
                Continuing these updates in case this helps someone else out in the future!

                I have successfully indexed (almost) all of my Resources. I removed and reinstalled SimpleSearch because the "SimpleSearchIndexAll" snippet I found on GitHub was possibly outdated. I also changed my System Setting for Search Driver Class Path to the absolute path, since I have moved my core folder above the web root (/home/townofco/core/components/simplesearch/model/simplesearch/driver/). I'm not sure which of these was the fix, but when I put [[SimpleSearchIndexAll]] into a page and loaded it, I got the "Finished indexing 131 resources" message!!! Yay!!! There were a few errors for various reasons (unpub_date not working for some reason), but I think I'll be able to chug through them in the schema.xml.

                Now, I'm still having the problem of displaying search results. I get the "There were no search results for the search "bylaws". Please try using more general terms to get more results." message and the following error:
                [2016-06-14 18:33:02] (ERROR @ /home/townofco/core/components/simplesearch/model/simplesearch/driver/simplesearchdriversolr.class.php : 216) Error running query on Solr server: undefined field deleted
                [ed. note: matthewmeredith last edited this post 7 years, 10 months ago.]
                  • 51481
                  • 45 Posts
                  Another update:

                  At first, I was reading that error message as though some undefined field had been deleted... Then I realized there's an actual field called "deleted" that is apparently undefined... Not really sure what to do about that. My schema.xml has
                  <field name="deleted" type="boolean" indexed="true" stored="true" />

                  I deleted everything in my Solr index, restarted Solr, and re-indexed everything, but I'm still getting the same error message. Oh, and I also added "solr" as the sisea.solr.username in System Settings, since the Java Properties page of my Solr Admin shows
                  user.name solr

                  way down near the bottom. No change, though.

                  I can get to the "browse" part with
                  http://67.231.17.10:8983/solr/comox_core/browse

                  run a search, and get results (of indexed Resources). But now how do I get that to translate through to SimpleSearch? Also, it still doesn't seem to be parsing through the PDF files and searching their content... Which was the whole point of this adventure in the first place...
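                  Since `/browse` returns results, the same search can be sent to the core's standard `/select` handler and the raw JSON inspected, which takes SimpleSearch out of the picture. Note that this browse URL uses the core `comox_core`, while the System Settings at the top of the thread set `sisea.solr.path` to `solr/cmx`; the sketch takes the core URL as a parameter so either can be tried. The helper names are hypothetical, the `pagetitle` field is assumed from what SimpleSearchIndexAll sends, and the canned response is illustrative.

```php
<?php
// Hypothetical helper: search URL for the core's standard /select handler.
function solrSelectUrl(string $coreUrl, string $term, int $rows = 10): string
{
    return rtrim($coreUrl, '/') . '/select?' . http_build_query(['q' => $term, 'rows' => $rows, 'wt' => 'json']);
}

// Hypothetical helper: map each hit's id to its pagetitle (or the id if there is none).
function solrHits(string $json): array
{
    $data = json_decode($json, true);
    $hits = [];
    foreach ($data['response']['docs'] ?? [] as $doc) {
        $id = (string)($doc['id'] ?? '');
        $title = $doc['pagetitle'] ?? $id; // schemaless cores may store this as an array
        $hits[$id] = is_array($title) ? (string)reset($title) : (string)$title;
    }
    return $hits;
}

// Illustrative response shape; replace with the live body from the core.
$sample = '{"response":{"numFound":1,"start":0,"docs":[{"id":"11","pagetitle":["Bylaws"]}]}}';
print_r(solrHits($sample));

// Live check (uncomment on the server):
// print_r(solrHits(file_get_contents(solrSelectUrl('http://67.231.17.10:8983/solr/comox_core', 'bylaws'))));
```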

                  HELP!!! I really wish there was someone out there to reply... At this point, I would even consider paying someone to provide me with a solution. The website is supposed to launch on Monday and this is the last thing to be done before it is ready. [ed. note: matthewmeredith last edited this post 7 years, 10 months ago.]
                    • 51481
                    • 45 Posts
                    Okay so I found something weird... in my schema.xml file, I have the field:
                    <field name="deleted" type="boolean" indexed="true" stored="true" />

                    BUT when I go through my Solr Admin panel to the Schema browser, nothing comes up for "deleted"!!! Every other field in my schema.xml is there... Why isn't "deleted"? I have a feeling this is the cause of my problems!

                    So here's the question... Can I just remove the line in my schema.xml that has the "deleted" field? I can't imagine ever needing it for searches.

                    EDIT: Okay, I deleted the line from my schema.xml, removed all indexes, restarted Solr, re-indexed... And here's a weird thing: It only indexed 134 Resources (compared to 140-something last time). However, the same error message still comes up whenever I try to run a search on my site.
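                    If the Schema browser doesn't list a field that schema.xml defines, Solr may be loading its schema from somewhere else. For example, cores created from Solr's default data-driven configset use a `managed-schema` file, and in that setup edits to schema.xml are not read. That is a possibility to check, not a confirmed diagnosis. The sketch compares the field names Solr reports at `<core URL>/schema/fields` with the ones you expect; `solrMissingFields`, the expected list, and the canned response are hypothetical/illustrative.

```php
<?php
// Hypothetical helper: which expected fields are absent from the schema Solr loaded.
// $json is the body of GET <coreUrl>/schema/fields?wt=json
function solrMissingFields(string $json, array $expected): array
{
    $data = json_decode($json, true);
    $present = array_column($data['fields'] ?? [], 'name');
    return array_values(array_diff($expected, $present));
}

// Illustrative response; replace with the live body from the core.
$sample = '{"responseHeader":{"status":0},"fields":[{"name":"id","type":"string"},{"name":"pagetitle","type":"strings"}]}';
echo implode(', ', solrMissingFields($sample, ['id', 'pagetitle', 'deleted', 'published'])), "\n"; // deleted, published
```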
                      • 51481
                      • 45 Posts
                      UPDATE: I got Solr to properly index the contents of my PDF files!!! The call was:
                      bin/post -c comox_core -host 67.231.17.10 -filetypes pdf /home/townofco/public_html/assets/pdfs -params "uprefix=attr_"

                      Pretty stoked about that. Now in the Solr Admin panel, I can query a phrase that I know is ONLY in a PDF file and it gets returned as a result. Awesome!
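                      As far as I know, `bin/post` sends PDFs to Solr's extracting request handler (`/update/extract`, built on Apache Tika). To add a single new PDF later without re-running the whole folder, that handler can be called directly. The sketch below posts one file with PHP's cURL extension. The `literal.id` value (the file path here), the helper names, and the example file name are assumptions; `uprefix=attr_` is copied from the command above. It relies on the same extraction handler that the working `bin/post` run used.

```php
<?php
// Hypothetical helper: /update/extract URL for one document.
function solrExtractUrl(string $coreUrl, string $docId): string
{
    return rtrim($coreUrl, '/') . '/update/extract?' . http_build_query([
        'literal.id' => $docId,   // unique key for this PDF in the index (assumed: its path)
        'uprefix'    => 'attr_',  // same prefix used with bin/post above
        'commit'     => 'true',   // make the document searchable right away
    ]);
}

// Hypothetical helper: post one PDF as multipart/form-data; returns Solr's reply or false.
function solrIndexPdf(string $coreUrl, string $pdfPath)
{
    $ch = curl_init(solrExtractUrl($coreUrl, $pdfPath));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => ['file' => new CURLFile($pdfPath, 'application/pdf')],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Example (run on the server; example.pdf is a placeholder name):
// echo solrIndexPdf('http://67.231.17.10:8983/solr/comox_core', '/home/townofco/public_html/assets/pdfs/example.pdf');
```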

                      Unfortunately, that is no good to me as I STILL can't get SimpleSearch to query my Solr core! No matter what I try, I keep getting the same error message and "There were no search results for the search "". Please try using more general terms to get more results." on my search results page.

                      Just to confirm, I tried SimpleSearchDriverBasic in my System Settings and it searched and displayed results perfectly fine... So something is happening either sending the search term to Solr or fetching the results.
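                      To narrow down which side fails, the same search can be run through the PECL extension that the Solr driver uses, outside of MODX. If a direct HTTP query to `/select` works but this one fails, the problem is more likely in the extension or the connection options than in the index. A sketch using the extension's `SolrClient` and `SolrQuery` classes; the function name is hypothetical, and the option values are the ones from the settings at the top of this thread.

```php
<?php
// Hypothetical diagnostic: run one query through the PECL Solr extension.
// Returns null if the extension isn't loaded, the hit count on success, or an error string.
function peclSolrSearch(array $options, string $term)
{
    if (!class_exists('SolrClient')) {
        return null; // extension not loaded in this PHP process
    }
    try {
        $client = new SolrClient($options);
        $query = new SolrQuery($term);
        $query->setRows(10);
        $result = $client->query($query)->getResponse();
        return (int)$result->response->numFound;
    } catch (Exception $e) { // SolrClientException / SolrServerException extend Exception
        return 'Solr error: ' . $e->getMessage();
    }
}

// Example (load through the web server so it uses the same php.ini as MODX):
// var_dump(peclSolrSearch([
//     'hostname' => '67.231.17.10',
//     'port'     => 8983,
//     'path'     => 'solr/cmx',
// ], 'bylaws'));
```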