Weekly Report 6 GSoC @ Moodle

Week 6 (12 July - 19 July)

Indexing Rich Types with Solr (Integration of Tika with Apache Solr)

Moodle courses may have attachments. There are two types of attachment for Moodle courses:

1. Course Summary files (only image types are allowed)

2. Course Overview files (all types with unlimited uploads)

Apache Tika Integration With Solr

Step 1:- Configure the request handler in solrconfig.xml

 
<requestHandler name="/update/extract"  startup="lazy" class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>

<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>


Step 2:- Add the fields that need to be indexed to schema.xml

<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Step 3:- Access the links of the attachments from Moodle and create the POST request to /update/extract

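$fs = get_file_storage();
$context = context_course::instance($courseid);
// Fetch the course overview files sorted by filename, skipping directory entries.
$files = $fs->get_area_files($context->id, 'course', 'overviewfiles', false, 'filename', false);
// Loop through $files and extract the URL of each one.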
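
The POST request from Step 3 could be sketched roughly as below. This is only a sketch under assumptions: the helper names build_extract_url and post_file_to_solr are hypothetical, the Solr base URL is a placeholder, and it uploads the file bytes as a multipart form through PHP's cURL extension (PHP 5.5+ for CURLFile) instead of passing a link. The literal.id and commit parameters are standard parameters of Solr's extracting request handler.

```php
<?php
// Hypothetical helper: builds the /update/extract URL for one document.
// literal.id sets the Solr document id; commit=true makes it searchable at once.
function build_extract_url($solrbase, $docid, $commit = true) {
    $params = array(
        'literal.id' => $docid,
        'commit'     => $commit ? 'true' : 'false',
    );
    return rtrim($solrbase, '/') . '/update/extract?' . http_build_query($params, '', '&');
}

// Hypothetical helper: uploads one local file to Solr so Tika can extract its text.
// Returns true when Solr answers with HTTP 200.
function post_file_to_solr($solrbase, $docid, $filepath) {
    $ch = curl_init(build_extract_url($solrbase, $docid));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('file' => new CURLFile($filepath)));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $response !== false && $status === 200;
}
```

A call such as post_file_to_solr('http://localhost:8983/solr', 'course_5', '/tmp/syllabus.pdf') would then send one attachment; both the id format and the path here are made up for illustration.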

 
Step 4:- Search through the documents :)

So what's next week?

1. Prepare the README and installation instructions for the plugin.

2. Optimize the code.

3. Test the code with different test cases and submit.

4. Prepare for the mid-term evaluation.

Weekly Report 5 GSoC @ Moodle

Week 5 (05 July - 12 July)

Task :- Write the methods that talk between the Solr instance and the SolrPhpClient library.

The task is to write a PHP file that takes the Solr parameters and performs indexing, deletion of existing index entries, and optimization of the index.

The file Ajaxcalls.php takes the parameters from the YUI module.js file and responds with the appropriate JSON response.
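
The JSON side of such a file might look roughly like the sketch below. The function name build_ajax_response and the keys action, success and error are assumptions made for illustration; the real Ajaxcalls.php may use a different response shape.

```php
<?php
// Hypothetical helper: turns the outcome of a Solr operation into a JSON string
// that the YUI module can parse. The key names are illustrative only.
function build_ajax_response($action, $success, $errormessage = '') {
    $response = array(
        'action'  => $action,
        'success' => (bool) $success,
    );
    if (!$success) {
        // Only include an error message when the operation failed.
        $response['error'] = $errormessage;
    }
    return json_encode($response);
}

// Example: report a successful optimize call back to the browser.
echo build_ajax_response('optimize', true);
```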

Code Snippets :-

try {
    require_once("SolrPhpClient/Apache/Solr/Service.php");
    require_once("SolrPhpClient/Apache/Solr/HttpTransport/Curl.php");
    // Create the cURL-based HTTP transport object
    $httpTransport = new Apache_Solr_HttpTransport_Curl();
    // Create the Solr service instance
    $this->_solr = new Apache_Solr_Service($this->_solrHost, $this->_solrPort, $this->_solrPath, $httpTransport);
} catch ( Exception $e ) {
    // On failure, record the error code and message
    $this->_lastErrorCode = $e->getCode();
    $this->_lastErrorMessage = $e->getMessage();
    return false;
}

Deletion can be performed one document at a time (by passing the id of a particular course) or all at once.

Deletion by passing the document id (if a particular course is removed)

Code Snippets:-

public function deleteById( $doc_id ) {
    try {
        $this->_solr->deleteById( $doc_id );
        $this->_solr->commit();
    } catch ( Exception $e ) {
        echo $e->getMessage();
    }
}

Delete all at once:-

public function deleteAll($optimize = false) {
    try {
        $this->_solr->deleteByQuery('*:*');
        if (!$this->commit()) {
            return false;
        }
        if ($optimize) {
            return $this->optimize();
        }
        return true;
    } catch ( Exception $e ) {
        $this->_lastErrorCode = $e->getCode();
        $this->_lastErrorMessage = $e->getMessage();
        return false;
    }
}

Optimizing the index:-

Code Snippets:-

public function optimize() {
    try {
        $this->_solr->optimize();
        return true;
    } catch ( Exception $e ) {
        $this->_lastErrorCode = $e->getCode();
        $this->_lastErrorMessage = $e->getMessage();
        return false;
    }
}

What's next week:-

1. Render the results on the search page by overriding the existing methods.

2. Write the README on how to install and configure Solr.

3. Write the install/uninstall/upgrade scripts for the admin tool.

4. Optimize the code and prepare the documentation.

Weekly Report 4 GSoC @ Moodle

Week 4 (26 June - 03 July)

After a lot of experiments with the Solr schema, I came up with the final schema.xml this week. It supports all the expected features, such as intra-word delimiters, fuzzy search and spell checking.

So what's inside?

1. The field types we need :-

Code Snippets :-

<schema name="moodle" version="1.2">
<types>
<!-- omitNorms="true" because we don't need index-time boosting -->
<fieldType name="string" sortMissingLast="true" omitNorms="true"/>
<fieldType name="text" positionIncrementGap="100">
<!-- analyzer definitions are not shown here -->
</fieldType>
</types>
</schema>

2. Fields that need to be indexed :-

Code Snippets:-

<field name="fullname" type="text" indexed="true" stored="true"/>
<field name="shortname" type="text" indexed="true" stored="true"/>

Let's spell check the courses:-

<copyField source="fullname" dest="spell"/>
<copyField source="shortname" dest="spell"/>

solrconfig.xml

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">./spellchecker1</str>
<str name="buildOnOptimize">true</str>
</lst>
</searchComponent>
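
A request that asks Solr for spelling suggestions could be built as in the sketch below. It assumes the spellcheck component is attached to the /select handler (that wiring is not shown in this post), and the helper name build_spellcheck_query and the base URL are hypothetical. The spellcheck, spellcheck.collate and wt parameters are standard Solr query parameters.

```php
<?php
// Hypothetical helper: builds a /select URL that also requests spelling suggestions.
// spellcheck=true enables the component; spellcheck.collate=true asks Solr for a
// corrected version of the whole query; wt=json requests a JSON response.
function build_spellcheck_query($solrbase, $userquery) {
    $params = array(
        'q'                  => $userquery,
        'spellcheck'         => 'true',
        'spellcheck.collate' => 'true',
        'wt'                 => 'json',
    );
    return rtrim($solrbase, '/') . '/select?' . http_build_query($params, '', '&');
}

// Example: a misspelled course name typed by a user.
echo build_spellcheck_query('http://localhost:8983/solr', 'mooodle');
```

Because buildOnOptimize is set to true in the configuration above, the spelling index is rebuilt whenever the main index is optimized, so suggestions should reflect the courses indexed before the last optimize.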

Currently the schema is responding fine with:-

  • Stemming
  • Intra-word Delimiter
  • Fuzzy Search
  • Spell Checking
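
As an illustration of the fuzzy search feature, a user's input can be turned into a fuzzy query by appending the Lucene "~" operator to each term. The helper name make_fuzzy_query is hypothetical, and this sketch does not escape Lucene special characters, which a real implementation would need to handle.

```php
<?php
// Hypothetical helper: appends "~" to every whitespace-separated term so that
// Solr matches terms that are close to, but not exactly, what the user typed.
function make_fuzzy_query($userinput) {
    $terms = preg_split('/\s+/', trim($userinput), -1, PREG_SPLIT_NO_EMPTY);
    $fuzzy = array();
    foreach ($terms as $term) {
        $fuzzy[] = $term . '~';
    }
    return implode(' ', $fuzzy);
}

// Example: prints "mooodle~ cource~"
echo make_fuzzy_query('mooodle cource');
```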

I am looking forward to making the search more powerful. Solr has lots of capabilities 🙂