Riak Search: All content tagged as Riak Search in NoSQL databases and polyglot persistence

Riak at Clipboard: Why Riak and How We Made Riak Search Faster

Gary William Flake:

For me, the two most important considerations are (1) how easy it is to write effective code and (2) how bulletproof the system is operationally. Others may argue that other attributes, like performance or the particulars of the data model, are more important, but I’ll pick simplicity and robustness every time[1]. A simple and robust store can usually be finessed to map to any data model and can be scaled outward to make up for performance.

The rest of the article focuses on the solution Clipboard employed to make Riak Search scale for multi-match search queries across millions of documents. While the specific details apply only to Clipboard and Riak Search, the idea of precomputing results, or at least modeling data in ways that optimize the most frequent access scenarios, is generally applicable.

  1. My emphasis. I find these two principles to be the core of Riak. 
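To make the precomputation idea concrete, here is a minimal Python sketch. It is not Clipboard's implementation and does not use any Riak API: the class `PrecomputedTagIndex`, its methods, and the choice to precompute only one- and two-tag queries are all assumptions made for illustration. The point is only that work done at write time can turn a multi-term match into a single lookup at read time.

```python
from collections import defaultdict
from itertools import combinations


class PrecomputedTagIndex:
    """Hypothetical toy index: answers to one- and two-tag queries are built at write time."""

    def __init__(self):
        # Maps a sorted tuple of tags to the ids of documents carrying all of them.
        self._results = defaultdict(set)

    def add(self, doc_id, tags):
        """Store the document id under every single tag and every pair of tags."""
        unique = sorted(set(tags))
        for size in (1, 2):
            for combo in combinations(unique, size):
                self._results[combo].add(doc_id)

    def query(self, *tags):
        """Answer a query with one dictionary lookup instead of intersecting lists."""
        key = tuple(sorted(set(tags)))
        if len(key) > 2:
            raise ValueError("only one- and two-tag queries are precomputed")
        return set(self._results.get(key, set()))


index = PrecomputedTagIndex()
index.add("clip1", ["riak", "search"])
index.add("clip2", ["riak", "erlang"])
print(index.query("search", "riak"))
```

The trade-off is the usual one: writes and storage get more expensive (every pair of tags is stored) so that the most frequent reads become cheap.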

Original title and link: Riak at Clipboard: Why Riak and How We Made Riak Search Faster (NoSQL database©myNoSQL)


Full Text Search: What to Use?

A problem everyone using a NoSQL database faces (nb: actually I think this applies to most storage engines that don’t support full text indexing):

The problem now is: what to use? Currently I’m toying with 3 options:

  1. Use Sphinx Search; it’s pretty powerful, pretty damn fast, but requires me to feed it data through XML, but only when the indexer runs. Basically it’s quite hard to get real-time indexes going, and the delta updates are something I’d rather not mess with. 
  2. Use Solr; I’d go for this if it wasn’t for the fact it’s Java and requires Tomcat to work. Our entire application infrastructure is basically MongoDB and Perl, and I don’t want to go and set up a Tomcat instance just for Solr; on top of which I have a pathologically deep hatred for Java, but that aside…
  3. Roll my own. Full text search the way we need it doesn’t actually require things like stemming or fancy analysis of things. What it does need is the ability to search a schema-less database… Solr and Sphinx both suffer from the fact you need to tell them what to index, and even then you run into the fact that it’ll need a double pass. First pass is getting the search results, and the second pass entails the checking to see whether the user doing the search can actually see the document. 

Couple of thoughts:

  1. there are a couple of solutions out there, both relational and NoSQL databases, that support different degrees of full text indexing (e.g. Riak Search, MarkLogic)
  2. even if your database supports some form of full text search, the implementation might not be complete/optimal.
  3. initially it may sound like building a reverse index is the best solution. Twitter’s story of migrating from their own reverse indexes in MySQL to a Lucene-based solution should change your mind.
  4. some NoSQL databases provide good mechanisms for enabling full text indexing. Riak has post commit hooks, CouchDB has a _changes feed.
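As a rough illustration of the last point, the sketch below keeps a small inverted index up to date from a stream of change events, which is the role a post-commit hook or a changes feed would play. It is a toy in plain Python: the event shape (`id`, `text`, `deleted`) is a simplified assumption and not the actual payload of Riak hooks or the CouchDB `_changes` feed, and `InvertedIndex` and `tokenize` are hypothetical names.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms (no stemming)."""
    return set(TOKEN.findall(text.lower()))


class InvertedIndex:
    """Hypothetical in-memory index driven by change events from a datastore."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it
        self._terms_by_doc = {}            # doc id -> terms currently indexed for it

    def apply_change(self, change):
        """Apply one change event: an insert, an update, or a delete."""
        doc_id = change["id"]
        old_terms = self._terms_by_doc.pop(doc_id, set())
        new_terms = set() if change.get("deleted") else tokenize(change.get("text", ""))
        for term in old_terms - new_terms:
            self._postings[term].discard(doc_id)
        for term in new_terms - old_terms:
            self._postings[term].add(doc_id)
        if new_terms:
            self._terms_by_doc[doc_id] = new_terms

    def search(self, query):
        """Return ids of documents that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in terms))


index = InvertedIndex()
index.apply_change({"id": "a", "text": "Riak Search is fast"})
index.apply_change({"id": "a", "text": "Riak core"})  # an update replaces the old terms
print(index.search("riak"))
```

Even this small version has to deal with updates and deletes, which hints at why a hand-rolled reverse index tends to grow into a project of its own.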

Original title and link: Full Text Search: What to Use? (NoSQL database©myNoSQL)


Riak and Riak Search 0.14.2: Patch-Level Releases

Basho released a minor update for both Riak and Riak Search. Release notes for Riak and Riak Search are available at the following links: Riak 0.14.2 and Riak Search 0.14.2.

Original title and link: Riak and Riak Search 0.14.2: Patch-Level Releases (NoSQL databases © myNoSQL)

Riak Search Explained

35 minutes of Riak Search with Dan Reverri, which will walk you from the Riak Search basics to running a sample application:

Mark Phillips

Original title and link: Riak Search Explained (NoSQL databases © myNoSQL)

Riak Core: Building Distributed Applications without Shared State

A presentation about Riak Core as the building blocks of Dynamo-style distributed systems:

These same components have been used for Riak search.

Original title and link: Riak Core: Building Distributed Applications without Shared State (NoSQL databases © myNoSQL)

Podcast: Riak, Riak Search, GitHub with Basho

For the weekend or commute time: a conversation between Basho’s Andy Gross and Mark Phillips and John Nunemaker on Riak, Riak search, and GitHub via ☞ the changelog. MP3 downloadable from ☞ here.

Original title and link: Podcast: Riak, Riak Search, GitHub with Basho (NoSQL databases © myNoSQL)

Riak 0.13, Featuring Riak Search

I’m not very sure how I’ve managed to be the last to the Riak 0.13 party :(. And I can tell you it is a big party.

After writing about Riak search a couple of times already[1], I managed to miss exactly the release of Riak that includes Riak search.

Riak 0.13, ☞ announced a couple of days ago, brings quite a few new exciting features:

  • Riak search
  • MapReduce improvements
  • Bitcask storage backend improvements
  • improvements to the riak_core and riak_kv modules (the building blocks of Dynamo-like distributed systems) and better code organization allowing easier use of these modules

While everything in this release sounds like an important step forward for Riak, what sets it apart is Riak search, a feature that is currently unique in the NoSQL database space.

Riak search uses Lucene and builds a Solr-like API on top of it (nb: I think that reusing known interfaces and protocols is most of the time the right approach).

At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.

☞ The Basho Blog
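The flow described in the quote can be modeled in a few lines of Python. This is an in-memory toy and not the Riak client API: `ToyStore` and all of its methods are hypothetical, and the whitespace tokenizer stands in for real text analysis. It only shows the sequence: enabling search installs a pre-commit hook on a bucket, writes to that bucket get indexed, and queries return bucket/key pairs that can feed a later processing step.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical key/value store with per-bucket pre-commit hooks (not Riak's API)."""

    def __init__(self):
        self._objects = {}                   # (bucket, key) -> stored value
        self._precommit = defaultdict(list)  # bucket -> hooks run before each write
        self._index = defaultdict(set)       # (bucket, term) -> keys containing the term

    def enable_search(self, bucket):
        """Install the indexing hook so writes to this bucket are indexed."""
        self._precommit[bucket].append(self._index_object)

    def _index_object(self, bucket, key, value):
        # Naive analysis: lowercase and split on whitespace.
        # Stale terms from overwritten values are not removed in this toy.
        for term in value.lower().split():
            self._index[(bucket, term)].add(key)

    def put(self, bucket, key, value):
        for hook in self._precommit[bucket]:
            hook(bucket, key, value)
        self._objects[(bucket, key)] = value

    def get(self, bucket, key):
        return self._objects[(bucket, key)]

    def search(self, bucket, term):
        """Return sorted (bucket, key) pairs whose values contain the term."""
        keys = self._index.get((bucket, term.lower()), set())
        return sorted((bucket, key) for key in keys)


store = ToyStore()
store.enable_search("posts")
store.put("posts", "p1", "Riak search rocks")
store.put("drafts", "d1", "riak draft")  # bucket not enabled, so not indexed

pairs = store.search("posts", "riak")
# The pairs can drive a follow-up step, standing in for a MapReduce input.
lengths = [len(store.get(bucket, key)) for bucket, key in pairs]
print(pairs, lengths)
```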

The Basho blog explains this feature extensively ☞ here and ☞ here.

Riak Search shows a lot of great decisions made by the Basho team, as it avoids reinventing the wheel or creating some new protocols/interfaces. I’ve stressed these aspects a couple of times already, when writing that NoSQL databases should follow the Unix Philosophy and also when writing about how important NoSQL protocols are. Mathias Meyer has a ☞ post detailing why these are important.

Last but not least, the Ruby Riak library ripple ☞ got updated too, though I’m not sure it supports all the new features in Riak 0.13.

Here is Rusty Klophaus (Basho) talking about Riak search at the Berlin Buzzwords NoSQL event:

  1. The first post about Riak search, "Notes on scaling out with Riak and Riak search podcast", dates back to December 14th, 2009, just a couple of days after setting up myNoSQL.

Original title and link: Riak 0.13, Featuring Riak Search (NoSQL databases © myNoSQL)