Finding Random Model Records in CakePHP 1.2

27 Nov
2009

There are two main methods for finding random records with PHP (and in this case CakePHP) and MySQL:

  1. Use a SELECT query with ORDER BY RAND() and LIMIT x (where x is the number of results you want).
  2. Get a list of the primary keys of every record in the table, use PHP to select x random entries from that list, and then use a SELECT query using a WHERE primary_key IN (x, y, z) clause.

While option #2 might seem to be slower – two queries and intermediary PHP work! – but ORDER BY RAND() tends to churn to an almost-grinding halt when queries tables with lots of records. So, #2 it is. And, since we’re working with CakePHP, we can abstract it to the Model::find() level with a custom __findXX method that I like to call __findRandom(). Note that usage of this function relies on Matt Curry’s implementation of custom find methods (I seem to link to that bit of code in every post!).

The basic idea is that two queries are executed: first Model::find->(‘list’) generate a list of primary keys, and then either Model::find->(‘first’) or Model::find(‘all’), depending on whether we’re looking for one or more records.

The code is available in my GitHub repository.

Usage is as simple as (using the User model as an example):

$record = $this->User->find('random');

That’ll return one random record from the User model. Like most Model::find functions, you can pass an array as an optional second argument:

$records = $this->User->find('random', array(
    'amount' => 5,
    'list' => array(
        'conditions' => array('User.active' => 1)
    ),
    'find' => array(
        'contain' => array('Group')
    )
));

As you can see, you’re actually working with two nested arrays for your query modifiers, ‘list’ and ‘find’. Use these arrays to modify the the find(‘list’) and find(‘first’)/find(‘all’) queries respectively. Use the ‘amount’ option, which defaults to 1, to specify the maximum records you want returned.

You can also bypass the find(‘list’) query entirely by passing an array of primary keys as the ‘suppliedList’ argument. For example:

$records = $this->User->find('random', array(
    'amount' => 5,
    'suppliedList' => $myIDs,
    'find' => array(
        'contain' => array('Group')
    )
));

As you can see, it’s a fairly simple function. But, it gets the job done! Here’s the source code, which, as mentioned above, you can grab on GitHub as well:

<?php
class AppModel extends Model {

    function find($type, $options = array()) {
        $method = null;
        if (is_string($type)) {
            $method = sprintf('__find%s', Inflector::camelize($type));
        }

        if ($method && method_exists($this, $method)) {
            $results = $this->{$method}($options);
        } else {
            $args = func_get_args();
            $results = call_user_func_array(array('parent', 'find'), $args);
        }

        return $results;
    }

    /**
     * __findRandom()
     *
     * Find a list of records ordered by rank.
     * Instead of executing a __findList() query to get the list of IDs,
     * you can pass an array of IDs via the $options['suppliedList']
     * argument.
     *
     * Two queries are executed, first a find('list') to generate a list of primary
     * keys, and then either a find('all') or find('first') depending on the return
     * amount specified (default 1).
     *
     * Pass find options to each query using the $options['list'] and $options['find']
     * arguments.
     *
     * Specify $options['amount'] as the maximum number of random items that should
     * be returned.
     *
     * If you already have an array of IDs(/primary keys), you can skip the find('list')
     * query by passing the array as $options['suppliedList'].
     *
     * @access  private
     * @param   $options  array of standard and function-specific find options.
     * @return  array
     */
    function __findRandom($options = array()) {
        if (!isset($options['amount'])) {
            $amount = 1;
        } else {
            $amount = $options['amount'];
        }

        $findOptions = array();
        if (isset($options['find'])) {
            $findOptions = array_merge($findOptions, $options['find']);
        }

        if (!isset($options['suppliedList'])) {
            $listOptions = array();
            if (isset($options['list'])) {
                $listOptions = array_merge($listOptions, $options['list']);
            }

            $list = $this->find('list', $listOptions);
        } else {
            $list = $options['suppliedList'];
            $list = array_flip($list);
        }        

        // Just a little failsafe.
        if (count($list) < 1) {
            return $list;
        }

        $originalAmount = null;
        if ($amount > count($list)) {
            $originalAmount = $amount;
            $amount = count($list);
        }

        $id = array_rand($list, $amount);

        if (is_array($id)) {
            shuffle($id);
        }

        if (!isset($findOptions['conditions'])) {
            $findOptions['conditions'] = array();
        }

        $findOptions['conditions'][$this->alias.'.'.$this->primaryKey] = $id;
        if ($amount == 1 && !$originalAmount) {
            return $this->find('first', $findOptions);
        } else {
            return $this->find('all', $findOptions);
        }
    }
}
?>

4 Responses to Finding Random Model Records in CakePHP 1.2

Avatar

Selino

January 19th, 2010 at 12:56 am

Warning (512): SQL Error: 1054: Unknown column ‘random’ in ‘where clause’

This is what I get when I execute : $this->set(‘tip’, $this->Tip->find(‘random’));

It looks like it’s not seeing ‘random’ as an option but as a field.

I also had to add a } to the function. When I do a balance braces on the githup code it shows that the parent function definition is not closed.

Avatar

Jamie

January 19th, 2010 at 7:28 am

Thanks Selino – looks like I forgot the closing brace for the class. Also, you’ll need to have support for custom find methods, which I should’ve included in the code. I’ve updated the code on Github to fix both issues.

Avatar

mark

June 18th, 2010 at 12:37 pm

nice idea
but i ended up with using sql internal RAND() because its way faster – especially if the records are getting more and more…

function __findRandom($options) {
$type = ‘first’;
if (isset($options['type']) && $options['type'] != ‘random’) {
$type = $options['type'];
unset($options['type']);
}
$options['order'] = ‘rand()’;
return $this->find($type, $options);
}

Avatar

mark

June 18th, 2010 at 12:41 pm

by the way:
caching the primary keys might actually make your method 100x faster…
usually those records dont change that often. and new ones dont have to appear right away.
maybe by enabling “cache”=>true/false as parameter?

Comment Form

top