Real-Time Full-Text Search with Sphinx

jopen, 10 years ago
     <p>Sphinx 0.9.9 and earlier have no native support for real-time indexing. The usual workaround is a main index plus a delta index, which gives only near-real-time results. The latest version, 1.10.1 (in trunk, not yet released), finally supports real-time (RT) indexes. Following the documentation in SVN, it is easy to build an on-demand-index full-text search system with Sphinx.</p>
    <p>Reference: <a href="/misc/goto?guid=4959499524632353411">http://filiptepper.com/2010/05/27/real-time-indexing-and-searching-with-sphinx-1-10-1-dev.html</a></p>
    <p>First, check out the latest code from the sphinxsearch SVN repository, then build and install it:</p>
    <div>
     <div>
      <pre>svn checkout http://sphinxsearch.googlecode.com/svn/trunk sphinx
cd sphinx/
./configure --prefix=/path/to/sphinx
make
make install</pre>
     </div>
    </div>
    <p>If the build succeeds, create a sphinx.conf configuration file in the etc directory under the Sphinx installation prefix. Be sure to include the character-set and n-gram settings for Chinese, otherwise Chinese text will not be indexed and searched correctly:</p>
    <div>
     <div>
      <pre>index rt
{
    # index type: real-time index
    type = rt
    # use UTF-8 encoding
    charset_type = utf-8
    # UTF-8 character table
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # unigram tokenization (each n-gram character is indexed on its own)
    ngram_len = 1
    # characters to be split into n-grams
    ngram_chars = U+3000..U+2FA1F
    # where the index files are stored
    path = /path/to/sphinx/data/rt
    # full-text field
    rt_field = message
    # attribute
    rt_attr_uint = message_id
}

searchd
{
    log = /path/to/sphinx/log/searchd.log
    query_log = /path/to/sphinx/log/query.log
    pid_file = /path/to/sphinx/log/searchd.pid
    workers = threads
    # Sphinx emulates the MySQL protocol, so no real MySQL server is needed;
    # mysql41 is the protocol version used by MySQL 4.1 through 5.1
    listen = 127.0.0.1:9527:mysql41
}</pre>
     </div>
    </div>
    <p>Start the Sphinx daemon:</p>
    <div>
     <div>
      <pre>/path/to/sphinx/bin/searchd --config /path/to/sphinx/etc/sphinx.conf</pre>
     </div>
    </div>
    <p>Connect with the ordinary MySQL client and insert a few rows:</p>
    <div>
     <div>
      <pre>ubuntu:chaoqun ~:mysql -h127.0.0.1 -P9527
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 1.10.1-dev (r2351)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> INSERT INTO rt VALUES (1, 'this message has a body', 1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO rt VALUES (2, '测试中文OK', 2);
Query OK, 1 row affected (0.00 sec)

mysql></pre>
     </div>
    </div>
    <p>Test full-text search:</p>
    <div>
     <div>
      <pre>mysql> SELECT * FROM rt WHERE MATCH('message');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    1 |   1643 |          1 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('OK');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM rt WHERE MATCH('中');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('我');
Empty set (0.00 sec)

mysql></pre>
     </div>
    </div>
    <p>Simple and convenient. That is all it takes.</p>
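    <p>To make the unigram setting more concrete, here is a minimal Python sketch of what ngram_len = 1 implies: every character in the ngram_chars range becomes its own token, while characters covered by charset_table are grouped into lowercase words. This is a simplified illustration and not Sphinx's actual tokenizer. The names tokenize and matches are hypothetical helpers introduced only for this example, and the matching is a plain "all query tokens present" check that ignores ranking.</p>

```python
# Toy model of the tokenization implied by the index configuration.
# Assumption: this only mimics ngram_len = 1 and the charset_table ranges;
# it is not Sphinx's real tokenizer.

NGRAM_FIRST, NGRAM_LAST = 0x3000, 0x2FA1F  # ngram_chars = U+3000..U+2FA1F


def is_ngram_char(ch):
    """True if ch falls in the ngram_chars range (indexed one character at a time)."""
    return NGRAM_FIRST <= ord(ch) <= NGRAM_LAST


def is_word_char(ch):
    """Rough stand-in for charset_table: 0-9, A-Z, a-z, underscore, Cyrillic."""
    return ch == "_" or (ch.isascii() and ch.isalnum()) or "\u0410" <= ch <= "\u044f"


def tokenize(text):
    """Split text into lowercase words plus single-character n-gram tokens."""
    tokens = []
    word = []
    for ch in text:
        if is_word_char(ch):
            word.append(ch.lower())
            continue
        # Any other character ends the current word.
        if word:
            tokens.append("".join(word))
            word = []
        if is_ngram_char(ch):
            tokens.append(ch)  # unigram: one token per character
    if word:
        tokens.append("".join(word))
    return tokens


def matches(query, document):
    """Hypothetical helper: True if every query token occurs in the document."""
    doc_tokens = set(tokenize(document))
    return all(tok in doc_tokens for tok in tokenize(query))


if __name__ == "__main__":
    print(tokenize("测试中文OK"))
    print(matches("中", "测试中文OK"))
    print(matches("我", "测试中文OK"))
```

    <p>Under these assumptions, the single-character query 中 matches the second row because that character is one of its tokens, while 我 matches nothing because it appears in neither row, which mirrors the results of the session above.</p>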