Sphinx comes with a PHP and a Python API ("application programming interface") that lets you access its server with well-defined function calls. Here's a quick overview of how it works for me. (This is on CentOS 6.5; some of the file locations might be different in Ubuntu.) I'm using the RPM package for Sphinx 2.2.2-beta from the Sphinx website. There are Ubuntu packages as well for the most recent versions, or you can install the sphinxsearch package from the Ubuntu repositories.
I have a configuration file at /etc/sphinx/sphinx.conf:
Code:
source mysearch
{
type = pgsql
sql_host = localhost
sql_user = myuser
sql_pass = mypass
sql_db = MyPostgreSQLDB
sql_port = 5432
# get the messages from the database
sql_query = SELECT msgid, subject, body, listid, unixdate, author_id FROM messages
# each message belongs to one of twenty listservers identified by listid
# also index on authors and the date the message was posted
sql_attr_uint = listid
sql_attr_uint = author_id
sql_attr_timestamp = unixdate
}
index mysearch
{
source = mysearch
path = /var/lib/sphinx/mysearch
docinfo = extern
#charset_type = sbcs
stopwords = /etc/sphinx/stopwords
}
indexer
{
mem_limit = 128M
}
# stuff below pretty much boiler-plate
searchd
{
listen = 9312
log = /var/log/sphinx/searchd.log
query_log = /var/log/sphinx/query.log
read_timeout = 5
max_children = 30
pid_file = /var/lib/sphinx/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
# compat_sphinxql_magics = 0
workers = threads # for RT to work
binlog_path = /var/lib/sphinx
}
The first two stanzas define a data source and the index. There are two more stanzas that control the two programs that are part of Sphinx. One is a stand-alone indexer program which runs from cron, and the other a daemon called searchd that is started at boot.
In the PHP script that runs the search, I have code like this:
Code:
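$spx = new SphinxClient (); // class comes from sphinxapi.php, loaded earlier
// $host, $port, $weights, $listid, $query and $index are set elsewhere in the script
$mode = SPH_MATCH_ALL;
$ranker = SPH_RANK_PROXIMITY_BM25;
$spx->SetServer ( $host, $port );
$spx->SetConnectTimeout ( 1 );
$spx->SetArrayResult ( true );
$spx->SetMatchMode ( $mode );
$spx->SetFieldWeights ( $weights );
// per the Sphinx docs, the ranker is only honored in SPH_MATCH_EXTENDED mode
$spx->SetRankingMode ( $ranker );
// only return messages from the chosen list
$spx->SetFilter('listid',array($listid),FALSE);
$res = $spx->Query ( $query, $index );
// Query() returns false on failure; GetLastError() says why
if ( $res === false )
    die ( "Query failed: " . $spx->GetLastError() );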
The first line creates a client object from the API file loaded earlier (sphinxapi.php, which ships with Sphinx). Then a bunch of defaults are set and the query is run with the $spx->Query() function. The "$query" variable contains the search text and the "$index" variable tells Sphinx which index to search; in this case $index would be set to "mysearch" to match the configuration file above.
Sphinx returns a result array whose "matches" entry lists the matching document IDs (the msgid values here), sorted according to the sort mode you specified. I let my users choose to sort by relevance (the default), or by date from newest to oldest or vice versa. I then pass the array of IDs to another PHP script that displays the list of matching messages.
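To make the hand-off to the display script a bit more concrete, here is a rough sketch of pulling the IDs out of the result. The function name message_ids() and the sample data are made up for illustration; the only thing taken from the API is the shape of the result when SetArrayResult(true) is in effect, where each entry in "matches" carries "id", "weight" and "attrs".

```php
<?php
// Hypothetical helper (not part of the Sphinx API): collect the document
// IDs from a result set returned by $spx->Query().
// Assumes SetArrayResult(true), so $res['matches'] is a plain list.
function message_ids($res)
{
    // Query() returns false on failure; 'matches' is absent when nothing matched
    if (!is_array($res) || empty($res['matches'])) {
        return array();
    }
    $ids = array();
    foreach ($res['matches'] as $match) {
        $ids[] = (int) $match['id'];   // msgid from the sql_query above
    }
    return $ids;
}

// Made-up result in the shape described above, just to show the call.
$sample = array(
    'total'   => 2,
    'matches' => array(
        array('id' => 1042, 'weight' => 2, 'attrs' => array('listid' => 3)),
        array('id' => 977,  'weight' => 1, 'attrs' => array('listid' => 3)),
    ),
);
print_r(message_ids($sample));
```

The IDs come back in whatever order Sphinx sorted them, so the display script can use them as-is. The ordering itself is chosen before calling Query(), for example with $spx->SetSortMode(SPH_SORT_ATTR_DESC, 'unixdate') for newest first, SPH_SORT_ATTR_ASC for oldest first, or SPH_SORT_RELEVANCE for the default.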
In the 2.1 versions, Sphinx also came with a simple command-line client called search. It's not included in 2.2, so you might want to stick with the 2.1.7 release. It might be an easier solution to program around than using the full API.