Sphinx is an open source full text search server, designed
from the ground up with performance, relevance (aka search quality),
and integration simplicity in mind. It's written in C++ and works on
Linux (RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and
a few other systems.
Sphinx lets you either batch index and search data stored
in an SQL database, NoSQL storage, or just files quickly and easily —
or index and search data on the fly, working with Sphinx pretty
much as with a database server.
A variety of text processing features enable fine-tuning Sphinx
for your particular application requirements, and a number of relevance
functions ensure you can tweak search quality as well.
Searching via SphinxAPI is as simple as 3 lines of code, and
querying via SphinxQL is even simpler, with search queries expressed
in good old SQL.
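To make the "good old SQL" point concrete, here is a minimal sketch that assembles a SphinxQL search statement as a plain string. The index name `products`, the `price` attribute, and both helper functions are hypothetical and exist only for illustration. The sketch assumes the resulting statement would then be sent over any standard MySQL client connection.

```python
def quote_sql_string(value: str) -> str:
    """Quote a string literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_search_query(index, keywords, filters=(), order_by=None, limit=10):
    """Assemble a SphinxQL SELECT with a full-text MATCH() clause.

    Hypothetical helper: `index`, `filters`, and `order_by` are inserted
    verbatim, so they must come from trusted code, not from user input.
    """
    parts = [f"SELECT * FROM {index} WHERE MATCH({quote_sql_string(keywords)})"]
    for condition in filters:
        parts.append(f"AND {condition}")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts)


# Hypothetical index and attribute names, used only for illustration.
query = build_search_query(
    index="products",
    keywords="wireless headphones",
    filters=["price < 100"],
    order_by="price ASC",
    limit=5,
)
print(query)
```

Only the keyword text is escaped here. A real application would also validate or parameterize the filter and ordering clauses.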
Sphinx clusters scale up to tens of billions of documents and hundreds
of millions of search queries per day, powering top websites such as
Craigslist, Living Social, MetaCafe, and Groupon. For a complete list of
known users, please visit our Powered-by page.
And last but not least, it's licensed under GPLv2.
-
Indexing performance. Sphinx indexes up to 10-15 MB
of text per second per CPU core, which adds up to 60+ MB/sec per
server (on a dedicated indexing machine).
-
Searching performance. Searching through the 1,000,000-document,
1.2 GB text collection that we use for everyday development and testing
runs at 500+ queries/sec on a 2-core desktop machine with 2 GB
of RAM.
-
Scalability. The biggest known Sphinx cluster
indexes 25+ billion documents, resulting in over 9 TB of data. The busiest known one is
Craigslist, serving 300+ million search queries/day.
-
Batch and Real-Time full-text indexes. Two index
backends that support both efficient offline index construction
and incremental on-the-fly index updates are available.
-
Non-text attributes support. An arbitrary number of
attributes (product IDs, company names, prices, etc) can be
stored in the index and used either just for retrieval (to avoid
hitting the DB), or for efficient Sphinx-side search result set
post-processing.
-
SQL database indexing. Sphinx can directly access
and index data stored in MySQL (all storage engines are supported),
PostgreSQL, Oracle, Microsoft SQL Server, SQLite, Drizzle, and
anything else that supports ODBC.
-
Non-SQL storage indexing. Data can also be streamed
to the batch indexer in a simple XML format called XMLpipe,
or inserted directly into an incremental RT index.
-
Easy application integration. Sphinx comes with
three different APIs: SphinxAPI, SphinxSE, and SphinxQL.
SphinxAPI is a native library available for Java, PHP, Python,
Perl, C, and other languages. SphinxSE, a pluggable storage
engine for MySQL, enables huge result sets to be shipped
directly to the MySQL server for post-processing. SphinxQL lets
the application query Sphinx using a standard MySQL client
library and query syntax.
-
Advanced full-text searching syntax. Our querying engine
supports arbitrarily complex queries combining boolean operators,
phrase, proximity, strict order, and quorum matching, field and
position limits, exact keyword form matching, substring
searches, etc.
-
Rich database-like querying features. Sphinx does not
limit you to just keyword searching. On top of full-text
search result set, you can compute arbitrary arithmetic
expressions, add WHERE conditions, do ORDER BY, GROUP BY,
use MIN/MAX/AVG/SUM aggregates, etc. Essentially, full-blown
SQL SELECT is supported.
-
Better relevance ranking. Unlike many other engines,
Sphinx does not solely rely on 30-year-old statistical ranking
that only considers keyword frequencies, nor limits you to it.
By default, Sphinx additionally analyzes keyword proximity,
and ranks closer phrase matches higher, with perfect matches
ranked on top. Also, ranking is flexible: you can choose
from a number of built-in relevance functions, tweak their
weights by using expressions, or develop new ones.
-
Flexible text processing. Sphinx indexing features
include full support for SBCS and UTF-8 encodings (meaning that
effectively all of the world's languages are supported); stopword removal
and optional hit position removal (hitless indexing); morphology
and synonym processing through word forms dictionaries and stemmers;
exceptions and blended characters; and many more.
-
Distributed searching. Searches can be distributed
across multiple machines, enabling horizontal scale-out and HA
(High Availability).
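The XMLpipe streaming mentioned under non-SQL storage indexing can be sketched as a small generator script. This is an illustration rather than an official tool. It assumes the `xmlpipe2` layout (a `sphinx:docset` wrapper, a `sphinx:schema` block with `sphinx:field` and `sphinx:attr` entries, and one `sphinx:document` element per record). The function name, the field `title`, and the attribute `price` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(fields, attrs, documents):
    """Render records as an xmlpipe2-style stream (assumed element names).

    fields:    names of full-text fields
    attrs:     mapping of attribute name -> type string (e.g. "int")
    documents: iterable of (doc_id, {name: value}) pairs; the names are
               used verbatim as element names, so they must be valid XML names
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    for name in fields:
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, attr_type in attrs.items():
        lines.append(
            f"<sphinx:attr name={quoteattr(name)} type={quoteattr(attr_type)}/>"
        )
    lines.append("</sphinx:schema>")
    for doc_id, values in documents:
        lines.append(f'<sphinx:document id="{int(doc_id)}">')
        for name, value in values.items():
            # Escape &, <, and > so the text cannot break the XML stream.
            lines.append(f"<{name}>{escape(str(value))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


# Hypothetical sample records, used only for illustration.
docs = [
    (1, {"title": "Red & blue widgets", "price": 1999}),
    (2, {"title": "Green widget", "price": 2499}),
]
stream = to_xmlpipe2(fields=["title"], attrs={"price": "int"}, documents=docs)
print(stream)
```

In a typical setup, a script like this would write to standard output for the batch indexer to read. The exact element names and attribute types should be checked against the documentation for the Sphinx version in use.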
The Sphinx Search server is dual-licensed: it can be either
commercially licensed or obtained freely via the Downloads page, provided it is used in accordance with the terms of the GPL v.2.
Commercial licensing is typically needed for
embedding Sphinx in non-GPL products (OEMs/ISVs). For additional
information, please refer to the Commercial Licensing page, or reach
out to the Sphinx Licensing team directly via our Contact page.