Final report Course search plugin@moodle

Pencils down! (17 September – 22 September)

The project is finally finished. Here is the final report, starting with a quick summary of the project.

Introduction to advanced course search :-

This project integrates Apache Solr, as a third-party search API, with the Moodle course schema to implement a course search that is flexible, case-insensitive, fast, works with non-Latin languages, and can sort results by relevance. It should be database independent, or at least support as many databases as possible. It is delivered as a plugin that can be installed and configured to replace the basic core search.

Problem description :-

  • Non-Latin support in search queries: Database engines don’t recognize word boundaries in non-English languages and can’t do case-insensitive search.
  • Indexing: We don’t want to reinvent the wheel, as there are many excellent open source, enterprise-level search indexers available that can make our course search fast and efficient.
  • Sorting by relevance: We need to sort results by relevance. For example, a result that matches on the course name is more relevant than one that only matches on the course summary. We also need to implement spelling correction (“Did you mean?”) and fuzzy search (alternate forms of words).
  • Working consistently across database engines and content languages: Course search should work with as many databases as possible and should handle content in different languages.
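One way to express "a match on the course name outranks a match on the summary" is field boosting with Solr's eDisMax query parser. The following is only a minimal sketch: the field names fullname and summary, the helper name and the boost factor 3 are assumptions for illustration, not taken from the plugin's schema.

```php
<?php
// Build query parameters for a relevance-sorted course search.
// Assumption: "fullname" and "summary" are the indexed field names and
// the boost factor 3 is illustrative, not a tuned value.
function coursesearch_build_params($userquery, $rows = 10) {
    return array(
        'q'       => $userquery,
        'defType' => 'edismax',             // Extended DisMax query parser.
        'qf'      => 'fullname^3 summary',  // Name matches weigh more than summary matches.
        'sort'    => 'score desc',          // Most relevant results first.
        'rows'    => $rows,
        'wt'      => 'json',
    );
}

$params = coursesearch_build_params('physics');
echo http_build_query($params), "\n";
```

The resulting query string can be appended to the select handler URL of whatever Solr instance is in use.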

Requirements :-

  • Java 5 or higher (a.k.a. 1.5.x).
  • PHP 5.1.4 or higher.
  • Solr 4.x
  • Moodle 2.5 or higher

Capabilities :-

  • Works with non-latin languages too
  • Auto-suggestions
  • 10 times faster search
  • Search within almost all file types
  • Spell check capability
  • Case-insensitive search capability
  • Database independent & scalable
  • Fuzzy search (alternate forms of words)
  • Search sorting based on relevancy (score)
  • Filtering results by startdate
  • Pagination & filtering results
  • Keyword matching (searching within a specific field)
  • Proximity search
  • Quite easy to set up and use
  • Logical operators in queries
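Several of these capabilities map directly onto the Lucene/Solr query syntax. The sketch below lists one example query per feature. The field names fullname and summary are hypothetical, since the plugin's schema is not shown here.

```php
<?php
// Example query strings in Lucene/Solr query syntax.
// Assumption: "fullname" and "summary" are hypothetical field names.
$examples = array(
    'fuzzy'     => 'fullname:phisics~',       // Also matches close spellings such as "physics".
    'proximity' => '"quantum mechanics"~5',   // Both words within 5 positions of each other.
    'keyword'   => 'summary:thermodynamics',  // Search inside one specific field only.
    'logical'   => 'fullname:physics AND NOT summary:advanced',
);
foreach ($examples as $feature => $query) {
    echo str_pad($feature, 10), $query, "\n";
}
```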

Inspired by :-

Weekly Report 12 GSoC @ Moodle

week 11 (8 September – 15 September)

Hey everyone,

This week’s major task was to modify the renderer file so that it works whether Solr is up or not. I need to perform the following checks in the course search plugin.

  • Is the admin tool installed?
  • Is the admin tool configured?
  • Is Solr up? (Is the ping to Solr successful?)

If any of these checks fails, we fall back to the core Moodle search.
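The fallback logic can be sketched as a small decision function. This is not the plugin's renderer code: the function name and the three callables standing in for the real checks are hypothetical.

```php
<?php
// Decide which search backend should render the results.
// Assumption: the three callables stand in for the real checks (plugin list
// lookup, plugin settings lookup, Solr ping); all names here are hypothetical.
function coursesearch_choose_backend($isinstalled, $isconfigured, $pingsolr) {
    if (!call_user_func($isinstalled)) {
        return 'core';
    }
    if (!call_user_func($isconfigured)) {
        return 'core';
    }
    try {
        // A failed or throwing ping means Solr is unreachable.
        return call_user_func($pingsolr) ? 'solr' : 'core';
    } catch (Exception $e) {
        return 'core';
    }
}

function always_true() { return true; }
function always_false() { return false; }
function ping_throws() { throw new Exception('Connection refused'); }

echo coursesearch_choose_backend('always_true', 'always_true', 'always_true'), "\n";  // solr
echo coursesearch_choose_backend('always_true', 'always_true', 'ping_throws'), "\n";  // core
```

Checking the cheap conditions first means Solr is only pinged when the tool is actually installed and configured.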

Here is the flow diagram showing how a search works, from the input query to rendering the results on the course search page.

Decision making with advance course search plugin


Weekly Report 11 GSoC @ Moodle

week 11 (28 August – 5 September)

Task 1 Filtering results by startdate :-

This week my task was to filter course results by startdate. Solr range queries can be used to implement this.

Working with Solr range queries :-

Range Queries allow one to match documents whose field(s) values are between the lower and upper bound specified by the Range Query.

startdate:[2013-08-24T00:00:00Z TO *]  // courses starting on or after 24 August 2013
startdate:[2013-08-24T00:00:00Z TO 2013-09-24T00:00:00Z]  // courses starting between 24 August and 24 September 2013

Solr’s built-in field types are very convenient for performing range queries on numbers without requiring padding.
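Range filters like these can be generated from the Unix timestamps Moodle stores for course start dates. A minimal sketch, assuming a helper name of my own choosing and that the result is passed to Solr as a filter query:

```php
<?php
// Build a Solr range filter on the "startdate" field from Unix timestamps.
// null on either side means an open bound (*). Square brackets make both
// bounds inclusive. The function name is hypothetical.
function coursesearch_startdate_filter($from = null, $to = null) {
    $format = 'Y-m-d\TH:i:s\Z';  // Solr dates are ISO 8601 in UTC.
    $lower = ($from === null) ? '*' : gmdate($format, $from);
    $upper = ($to === null) ? '*' : gmdate($format, $to);
    return "startdate:[$lower TO $upper]";
}

echo coursesearch_startdate_filter(gmmktime(0, 0, 0, 8, 24, 2013)), "\n";
// startdate:[2013-08-24T00:00:00Z TO *]
```

Using gmdate() rather than date() keeps the output in UTC regardless of the server's timezone setting.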

Screenshot: the new course search form (coursesearch_newsearchform)

Weekly Report 10 GSoC @ Moodle

week 10 (17 August – 24 August)

Task 1 (Implementing auto-complete with Solr) :-

Solr can autocomplete on fields configured in the schema out of the box, and there are several ways to implement it.

Autocomplete with advance course search


Using the TermsComponent, only single-term suggestions are possible, and unfortunately we can’t apply any filter. Furthermore, user queries are not analyzed in any way; you only get access to the raw indexed data, so suggestions won’t work for queries that contain whitespace or differ in case.

So I found the SpellCheckComponent to be the best solution. It has its own separate index, which can be rebuilt automatically on every commit.

Configuring SpellCheckComponent handler :-

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Schema.xml :-

<fieldtype name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^\p{L}\p{M}\p{N}\p{Cs}]*[\p{L}\p{M}\p{N}\p{Cs}\_]+:)|([^\p{L}\p{M}\p{N}\p{Cs}])+"
            replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>
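To see what this analyzer chain does to a phrase, here is a rough PHP emulation for illustration only. It is not how Solr runs the chain. The function name is hypothetical, the \p{Cs} class from the schema pattern is left out because surrogate code units cannot occur in valid UTF-8 input, and the mbstring extension is assumed to be available.

```php
<?php
// Rough emulation of the phrase_suggest analyzer chain (illustrative only).
// The KeywordTokenizer keeps the whole input as one token, so the filters
// below are applied to the full phrase.
function phrase_suggest_normalize($text) {
    // PatternReplaceFilter: drop "field:" prefixes and collapse every run of
    // non-letter, non-mark, non-digit characters into a single space.
    $pattern = '/([^\p{L}\p{M}\p{N}]*[\p{L}\p{M}\p{N}_]+:)|([^\p{L}\p{M}\p{N}])+/u';
    $text = preg_replace($pattern, ' ', $text);
    $text = mb_strtolower($text, 'UTF-8');  // LowerCaseFilter.
    return trim($text);                      // TrimFilter.
}

echo phrase_suggest_normalize('  Advanced   Course-Search!  '), "\n";
// advanced course search
```

Because the whole phrase stays a single lowercased token, suggestions can span several words and are not case-sensitive.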

Demo @ Advanced course search

Hey everyone,

This week I made some minor modifications and worked on the git wiki, general user documentation, etc.

I put together a short video with a live demo showing how course search gives you a better way to search through courses. It is especially handy when you have thousands of courses. We are still working on the UI, so it will get even cooler 🙂

 

Weekly Report 9 GSoC @ Moodle

week 9 (01 August – 07 August)

Before writing about this week’s tasks and outcomes, I want to thank my mentors for their positive feedback and evaluation of my project. This week I fixed some issues, including a few bugs.

Task 1 :- Performing plugin checks :- We need to perform two checks on the plugin and behave according to the outcome.

A. Whether the plugin is able to communicate with Solr :- We ping Solr. If the ping returns true, Solr is reachable; if not, we set an error code.

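try {
    // Ping Solr; a false result means the server is unreachable.
    if (!$this->solr->ping()) {
        $this->errorcode = -1;
        $this->errormessage = 'Ping failed!';
        return false;
    }
} catch (Exception $e) {
    // Keep the exception message instead of discarding it.
    $this->errorcode = -1;
    $this->errormessage = $e->getMessage();
    return false;
}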

B. Whether the admin tool is installed :- Just retrieve the list of installed plugins. If the “Course search” plugin exists we can proceed; otherwise we set the error code.

// get_plugin_list('tool') returns the installed admin tools keyed by name.
if (!array_key_exists('coursesearch', get_plugin_list('tool'))) {
    $errorcode = 1;
}

 

Task 2 Listen to events and mark courses for re-indexing :- We need to re-index a particular course, or delete it from the index, if

  • a course is deleted
  • a course is updated
  • a course is created

The Moodle event API makes this easy, although the Moodle developers are going to rebuild the Event API for Moodle 2.6.

We just need to create an events.php file inside the plugin’s db directory. The handlers defined there are registered when the plugin is installed or upgraded, and the corresponding handler file and function are called whenever the event is triggered.

I will be writing a more detailed specification of this task 🙂

Code snippets :-

$handlers = array(
    'course_created' => array(
        'handlerfile'     => '/admin/tool/coursesearch/locallib.php',
        'handlerfunction' => 'deleteById',
        'schedule'        => 'instant',
        'internal'        => 1,
    ),
);
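A db/events.php covering all three course events could look roughly like the sketch below. The event names are the legacy Moodle 2.x ones. The three handler function names are hypothetical placeholders, not the plugin's actual functions.

```php
<?php
// db/events.php sketch: one handler per course event.
// Assumption: the handler function names are hypothetical placeholders;
// the handler file path is the one used in the snippet above.
$map = array(
    'course_created' => 'tool_coursesearch_course_created_handler',
    'course_updated' => 'tool_coursesearch_course_updated_handler',
    'course_deleted' => 'tool_coursesearch_course_deleted_handler',
);

$handlers = array();
foreach ($map as $eventname => $function) {
    $handlers[$eventname] = array(
        'handlerfile'     => '/admin/tool/coursesearch/locallib.php',
        'handlerfunction' => $function,
        'schedule'        => 'instant',  // Run as soon as the event is triggered.
        'internal'        => 1,
    );
}
```

With the 'instant' schedule, the index is updated as part of the request that changed the course rather than waiting for cron.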

 

Thanks 🙂

Weekly Report 8 GSoC @ Moodle

week 8 (20 July – 28 July)

Task 1 :- Search into compressed archives – Unfortunately, Solr doesn’t search inside compressed files; it only indexes the file names. But for course attachments it is important to search inside compressed archives such as zip, rar, gz, tar.gz etc.

After a long search and a discussion on #solr, I came across a JIRA issue reported by Jayendra, whom I want to thank. The issue and its attached patch made things work.

Link to issue :- SOLR-2416

The complete steps to make it work are as follows :-

1. Using the latest Solr trunk, apply the patch attached to the JIRA issue.

2. Rebuild Solr.

3. Replace the solr-cell jar with the one you just built.

That’s all. Now you can search inside archive contents too.

Task 2 :- Checking the code with Code checker – Everything except the solr-php-client library must follow the Moodle coding style, so I ran Code checker and removed all warnings and errors. (https://moodle.org/plugins/view.php?plugin=local_codechecker)

Task 3 :- Testing that everything works fine and submitting the source.

Task 4 :- Last and most important, submitting the mid-term evaluation.

What’s next week :-

1. Perform the checks on plugin, cleantheme.

  • Is the admin tool installed?
  • Is the Solr instance running and properly configured?

2. Study the Moodle event API.

3. Trigger events to re-index a particular course if :-

  • It is updated
  • It is deleted (remove from index)
  • A new course is added

4. Clean the code and push to GitHub.

Weekly Report 6 GSoC @ Moodle

week 6 (12 July – 19 July)

Indexing Rich Types with Solr (Integration of Tika with Apache Solr)

Moodle courses may have attachments. There are two types of attachment for Moodle courses.

1. Course summary files (only image types are allowed)

2. Course overview files (all types, with unlimited uploads)

Apache Tika Integration With Solr

Step 1:- Configure the request handler in solrconfig.xml

 
<requestHandler name="/update/extract"  startup="lazy" class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>

<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>


Step 2:- Adding the fields that need to be indexed to schema.xml

<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Step 3:- Accessing the attachments from Moodle & creating the POST request to /update/extract

$fs = get_file_storage();
$context = context_course::instance($courseid);
// Fetch the course overview files, sorted by filename, without directory entries.
$files = $fs->get_area_files($context->id, 'course', 'overviewfiles', false, 'filename', false);
// Loop through $files and extract the URL of each one.
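For each file, the POST request goes to the extract handler with the document id passed as a literal field. A minimal sketch of building that URL follows. The Solr base URL, the id scheme and the helper name are assumptions, and the file contents themselves would be sent as the POST body, for example with cURL.

```php
<?php
// Build the URL for posting one attachment to Solr's ExtractingRequestHandler.
// Assumptions: the base URL, the "course_<id>_<filename>" id scheme and the
// function name are hypothetical.
function coursesearch_extract_url($solrbase, $courseid, $filename) {
    $params = array(
        'literal.id' => 'course_' . $courseid . '_' . $filename,  // Unique document key.
        'commit'     => 'true',                                   // Make the document searchable right away.
    );
    return rtrim($solrbase, '/') . '/update/extract?' . http_build_query($params);
}

$url = coursesearch_extract_url('http://localhost:8983/solr/', 42, 'syllabus.pdf');
echo $url, "\n";
```

Committing on every file is simple but slow for large batches; a single commit after the last file would be the usual alternative.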

 
Step 4:- Search through the documents :)

So what's next week?

1. Prepare the README & installation instructions for the plugin.

2. Optimize the code.

3. Test the code with different test cases and submit.

4. Prepare for the mid-term evaluation.