THIS CONTENT IS OUT OF DATE! IF YOU’RE USING CAKE 2, YOU SHOULDN’T READ THIS.
Major update January 22/10: much of the content of this article has been updated to reflect the changes to the datasource, the latest version of which you can download on Github.
Just out of the oven – a Zend_Search_Lucene datasource for CakePHP (built with 1.2 but probably works just fine in 1.3) that I originally wrote for an in-house CMS site search plugin. I can’t release the plugin itself (and there’s so much CMS-specific code that it would need a lot of work to make it generic anyway), but I thought that someone might find the datasource itself useful. It’s pretty basic at this point and doesn’t implement some of the fancier Zend_Search_Lucene features such as custom sorting (it just returns results sorted by score, which is probably what you want anyway).
Zend_Search_Lucene is a text-based search index system for developers who don’t want to (or can’t) use a database for search indexing.
Download the current version of the ZendSearchLuceneDatasource from my GitHub repository.
I won’t go into detail about how to add data into the Lucene database since the Zend Framework documentation is so good (CakePHP should be jealous!). You’ll find all the info you need there. There are also a couple of older articles out there that show how you can integrate Zend_Search_Lucene into CakePHP:
Setup
First, copy zend_search_lucene.php to models/datasources.
Then, you’ll need to download the Zend_Search_Lucene library from the Zend Framework website and put some files into your /vendors directory:
- Zend/Search (the directory and all of its contents)
- Zend/Exception.php
You’ll also need to update your include path to include the vendors directory, since the Zend Framework loads a lot of classes on its own. I also made a little autoload function to make the loading of Zend Framework classes easier. Put the following code somewhere common, such as app/config/bootstrap.php:
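// Add the vendors directory to the include path so Zend's own includes resolve.
// Point this at APP . 'vendors' instead if you put the Zend files in app/vendors.
ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . CAKE_CORE_INCLUDE_PATH . DS . 'vendors');

// Load Zend Framework classes on demand: Zend_Search_Lucene becomes Zend/Search/Lucene.php.
function zendAutoload($className) {
    if (substr($className, 0, 5) == 'Zend_') {
        include str_replace('_', '/', $className) . '.php';
    }
}
spl_autoload_register('zendAutoload');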
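If you want to check the class-name-to-path mapping without touching the include path, here's a minimal sketch of just that step. The function name zendClassToPath is my own invention for illustration; it isn't part of the datasource or of Zend Framework.

```php
<?php
// Hypothetical helper: maps a Zend-style class name to its relative file path.
// Returns null for anything that isn't a Zend_ class so other loaders can handle it.
function zendClassToPath($className) {
    if (strpos($className, 'Zend_') !== 0) {
        return null;
    }
    return str_replace('_', '/', $className) . '.php';
}

echo zendClassToPath('Zend_Search_Lucene_Document'), "\n"; // Zend/Search/Lucene/Document.php
```

The relative path only resolves if the directory containing the Zend folder is on the include path, which is why the two steps go together.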
You also need to put the DB config for the datasource in config/database.php (updated Jan 20/2010 for better DebugKit compatibility):
var $zendSearchLucene = array(
    'datasource' => 'ZendSearchLucene',
    'indexFile' => 'lucene', // stored in the cache dir.
    'driver' => '',
    'source' => 'search_indices'
);
Then, in the model that’ll act as your search index (say, for example, SearchIndex), specify the DB config:
<?php
class SearchIndex extends AppModel {
    var $useDbConfig = 'zendSearchLucene';
}
?>
Saving/Indexing
I’ve tried to keep the datasource functions as simple and familiar as possible. When saving an item to the index, the datasource expects a multidimensional array for each item. For compatibility with CakePHP’s datasource code, the ‘meat’ of the data is nested in the third level of the array. Each sub-array contains information about a field to be stored. For example:
$saveData = array('SearchIndex' => array(
    'document' => array(
        array(
            'key' => 'name',
            'value' => $record[$Model->alias][$this->settings[$Model->alias]['name']],
            'type' => 'Text'
        ),
        array(
            'key' => 'description',
            'value' => $record[$Model->alias][$this->settings[$Model->alias]['description']],
            'type' => 'Text'
        ),
        array(
            'key' => 'url',
            'value' => $this->__constructUrl($Model, $record),
            'type' => 'Text'
        )
    )
));
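Writing that nested array by hand gets tedious, so here's a rough sketch of a helper that builds it from a flat list of fields. The function buildIndexData and the sample values are hypothetical and not part of the datasource; the only thing taken from above is the key/value/type layout under 'document'.

```php
<?php
// Hypothetical helper: builds the nested save array from a flat field list.
// Each field is either a plain value or array('value' => ..., 'type' => ...).
function buildIndexData($alias, array $fields, $defaultType = 'Text') {
    $document = array();
    foreach ($fields as $key => $field) {
        if (!is_array($field)) {
            $field = array('value' => $field);
        }
        $document[] = array(
            'key' => $key,
            'value' => $field['value'],
            'type' => isset($field['type']) ? $field['type'] : $defaultType,
        );
    }
    return array($alias => array('document' => $document));
}

// Illustrative data only.
$saveData = buildIndexData('SearchIndex', array(
    'name' => 'Chocolate cake',
    'description' => 'A rich, dense cake.',
    'url' => array('value' => '/recipes/view/1', 'type' => 'UnIndexed'),
));
print_r($saveData);
```

The result can then be passed to the model's save() call in the same way as the hand-written array.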
Passing that data in a Model::save() call will in turn execute the following Zend code (more or less – this is a very simplified version of the actual ZendSearchLuceneSource saving code):
$index = Zend_Search_Lucene::open('/path/to/the/index/set/in/dbConfig');
$doc = new Zend_Search_Lucene_Document();
foreach ($data as $field) {
    // 'type' names a static factory on Zend_Search_Lucene_Field (e.g. Text); call it by name.
    $doc->addField(call_user_func(array('Zend_Search_Lucene_Field', $field['type']), $field['key'], $field['value']));
}
$index->addDocument($doc);
Obviously that’s a basic example; you’ll probably send a whole bunch of dynamic info to the indexer. But that’s the gist of it anyway.
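Since the 'type' value ends up being used as the name of a Zend_Search_Lucene_Field factory method, a typo there will blow up at save time. Here's a small sketch of a check you could run before saving. The function name is made up, and I'm assuming the five factory methods listed in the Zend Framework 1 docs (Keyword, UnIndexed, Binary, Text, UnStored); whether the datasource does any validation of its own is not something this sketch relies on.

```php
<?php
// Hypothetical check: is $type one of the documented Zend_Search_Lucene_Field factories?
// PHP method names are case-insensitive, so the comparison is too.
function isValidLuceneFieldType($type) {
    $known = array('keyword', 'unindexed', 'binary', 'text', 'unstored');
    return is_string($type) && in_array(strtolower($type), $known, true);
}

var_dump(isValidLuceneFieldType('Text'));  // bool(true)
var_dump(isValidLuceneFieldType('Texxt')); // bool(false)
```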
Querying
You can search for records just like you would a regular datasource. Pass the search terms as a “query” condition. If you want the search terms to be highlighted in the returned results, pass ‘highlight’ => true in the array of options. Note that only indexed fields will be highlighted.
You can find all results:
function search($term) {
    // e.g. $term = 'best cakephp tutorials'
    $results = $this->SearchIndex->find('all', array('highlight' => true, 'conditions' => array('query' => $term)));
}
You can mimic Google’s I’m Feeling Lucky with find(‘first’):
function search($term) {
    // e.g. $term = 'best cakephp tutorials'
    $topResult = $this->SearchIndex->find('first', array('conditions' => array('query' => $term)));
}
You can even paginate:
function search($term) {
    // e.g. $term = 'best CakePHP tutorials'
    $this->paginate = array(
        'limit' => 10,
        'conditions' => array('query' => $term),
        'highlight' => true
    );
    $results = $this->paginate();
}
Results are returned in the expected CakePHP way, as a multidimensional array – $results[0]['MyModelAlias'] for multiple records, $results['MyModelAlias'] for one (i.e. with find(‘first’)).
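Because find('first') drops the outer numeric level, code that handles both cases has to check which shape it got. Here's a minimal sketch using hand-made sample arrays. The function extractField and the 'name' and 'url' fields are assumptions for illustration; your results will contain whatever fields you stored in the index.

```php
<?php
// Sample data shaped like the results described above (field names are hypothetical).
$all = array(
    array('SearchIndex' => array('name' => 'First', 'url' => '/a')),
    array('SearchIndex' => array('name' => 'Second', 'url' => '/b')),
);
$first = array('SearchIndex' => array('name' => 'First', 'url' => '/a'));

// Hypothetical helper: pulls one field out of either result shape.
function extractField(array $results, $alias, $field) {
    // A single record is keyed by the model alias; wrap it so both shapes loop the same way.
    if (isset($results[$alias])) {
        $results = array($results);
    }
    $values = array();
    foreach ($results as $row) {
        $values[] = $row[$alias][$field];
    }
    return $values;
}

print_r(extractField($all, 'SearchIndex', 'name'));
print_r(extractField($first, 'SearchIndex', 'name'));
```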
There you go – enjoy! As always, comments and suggestions are welcomed.
I used the RSS Feed datasource by Loadsys as a guide to good datasource design. I may have borrowed a function or two.
Neil Crookes’ Searchable plugin also helped.

Good work! I wanted to do it for a long time… too much procrastination!
Nice one Jamie, thanks for sharing (and the link). How is Lucene’s search algorithm? I know MySQL FullText search can be adjusted, but I find it a bag o’ sh1te!
Thanks Neil. Yeah, MySQL FullText is just way too slow and limited. Lucene’s indexing methods make searching and sorting way faster than a FullText search. MySQL has some benefits, like the ability to cross-join between indexes and the ability to modify existing records (since in Lucene you don’t update, you just delete and re-enter), but the speed of Lucene and its additional features like proximity searching, fuzzy searching, and term boosting/weighting just make it a better choice all around for larger search indexes.
I am working with this code and it isn’t indexing anything. I use Luke to look at the index the datasource creates and the tool says the fields don’t exist and no records are available.
What should the inside of the search index folder look like under ‘tmp’ if, for example, I tell the datasource to use an indexFile value of ‘search’?
Making a save call to a Search model using the datasource from the afterSave callback of a Document model seems to create a segments.gen, segments_1, and some .lock files in the tmp/search folder currently. Nothing else. If you like you can email me so we can take this out of the comments until we find out what I am doing wrong.
Hi Abba – it’s difficult to know what the problem is without seeing your code. I can’t promise that I’ll have a lot of time to help, but if you paste in the relevant saving code (along with a sample of the data) then I can give it a look.
Let me play around with the updated datasource and see if I can spot what I am doing wrong before I post any code.
I was reading about .seg files and Lucene and some issues are known with 32bit php builds. I might be running into that as my local dev seems to break whereas remotely I can (sometimes) get it to index.
Well, I can tell you that the first version of the datasource was woefully broken in a few different ways – you should have more success with the current version. Let me know if you’re still having trouble getting it to work.
Thanks Jamie! but i am having some difficulties implementing…i am kinda new to cakephp…i had tried the other tutorial: ‘Integrating Zend Framework Lucene with your Cake Application’ before seeing yours…i had gone as far as successfully creating the index…it worked…the challenge i had was the querying…the code doesn’t seem to work with the latest version of cakephp (which is what i am using)…Yours is more recent and better organised…Please how can i pass the index information from mysql database into your datasource….
I could show you the code i am using to index, if that would help….
Thank you very much
Sure, David – what does your code look like?
You save my day Jamie!!
thanks a lot dude !
thanks Jamie, sorry i am reply now, i was kinda on holiday…Pls find below the code i am using to index: I need to know how to use it with your datasource, that is, read the info from the database and pass the results to your indexer:
addField(Zend_Search_Lucene_Field::UnIndexed('document_id', $document->id));
$doc->addField(Zend_Search_Lucene_Field::Text('document_title', $document->company_name));
$doc->addField(Zend_Search_Lucene_Field::Text('document_description', $document->description));
// Add the document to the index
$index->addDocument($doc);
}
// Commit the index
$index->commit();
?>
i found out its truncating my code so let me put the relevant part:
// above we do the necessary bootstrapping,
// run the database query to get the records then index:
$companies_rec = mysql_query($sql);
while ($document = mysql_fetch_object($companies_rec)) {
    // print_r($document);
    // Create a new searchable document instance
    $doc = new Zend_Search_Lucene_Document();
    // Add some information
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('document_id', $document->id));
    $doc->addField(Zend_Search_Lucene_Field::Text('document_title', $document->company_name));
    $doc->addField(Zend_Search_Lucene_Field::Text('document_description', $document->description));
    // Add the document to the index
    $index->addDocument($doc);
}
// Commit the index
$index->commit();
?>
I’m struggling with this error message.
Unable to import DataSource class .ZendsearchluceneSource [CORE\cake\libs\model\connection_manager.php, line 186]
have i missed a step somehow?
I’m using 1.3 for this.
Joseph – can you post your code from config/database.php, and also any relevant code in your models?
Hi Jamie,
I previously used your copyable plugin which was fantastic to say the least.
I left a message for you in github regarding my issues with using this plugin.
I am stuck at the step for saving/indexing.
Would appreciate it greatly if I can get in touch with you.
Thank you.
Is there a way to run some kinda build to build the index of the existing data. if so. where should i define what fields and data i wanna index.
Am getting this error
ConnectionManager::loadDataSource – Unable to import DataSource class .ZendSearchLuceneSource
code:
http://stackoverflow.com/questions/4119124/cakephp-with-lucene
To everyone who’s asking questions about usage or is having problems: very sorry, I just don’t have the time right now to provide support for this datasource. For one, I don’t even use the thing anymore (that’s how web development goes I guess!). I also intentionally left the “how to fill the Lucene database” out of this post since it’s not part of the datasource and the implementation may vary according to your application’s layout.
So, sorry again but life’s just too busy right now.
The SearchIndex model needs to have $useTable = false, or else Cake will complain about missing its database table.