


indexing: All content tagged as indexing in NoSQL databases and polyglot persistence

MongoDB Indexing in Practice

An article based on Kyle Banker’s MongoDB in Action:

Indexes are enormously important. With the right indexes in place, MongoDB can use its hardware efficiently and serve your application’s queries quickly. With the wrong indexes, you’ll see the exact opposite effect: slow queries and poorly utilized hardware. It stands to reason, then, that anyone wanting to use MongoDB effectively and make the best use of hardware resources must understand indexing. We’re going to look at some refinements on the kinds of indexes that can be created in MongoDB. We’ll then proceed to some of the niceties of administering those indexes.

The article is pretty detailed, but one thing I haven’t seen mentioned in it is that MongoDB indexes are stored in memory-mapped files (the same mechanism used for storing data). In practice, this means your data and all your indexes compete for the same system memory.
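A rough way to reason about that competition is to add up data and index sizes and compare the total with available RAM. The sketch below is only a back-of-the-envelope check: the function name `fits_in_ram` and all the numbers are hypothetical, and the dictionary merely mimics the `dataSize` and `indexSize` fields reported by MongoDB's `dbStats` command. The real working set is usually smaller than the total, so treat the result as an upper bound.

```python
GIB = 1024 ** 3

def fits_in_ram(stats, ram_bytes):
    """Return (fits, total_bytes) for data plus indexes versus available RAM."""
    # Data and indexes share the same memory, so they are summed.
    total = stats["dataSize"] + stats["indexSize"]
    return total <= ram_bytes, total

# Hypothetical figures, shaped like the output of the dbStats command.
example_stats = {"dataSize": 6 * GIB, "indexSize": 3 * GIB}

fits, total = fits_in_ram(example_stats, ram_bytes=8 * GIB)
print(fits, total / GIB)  # False 9.0
```

With these made-up numbers, the data alone would fit in 8 GiB, but data plus indexes would not.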

Original title and link: MongoDB Indexing in Practice (NoSQL database©myNoSQL)


Hybrid Word Aligned Bitmaps: Why are column oriented databases so much faster than row oriented databases?

Terence Siganakis:

I have been playing around with Hybrid Word Aligned Bitmaps for a few weeks now, and they turn out to be a rather remarkable data structure. I believe that they are utilized extensively in modern column oriented databases such as Vertica and MonetDB. Essentially HWABs are a data structure that allows you to represent a sparse bitmap (series of 0s and 1s) really efficiently in memory. The key trick here is the use of run length encoding to compress the bitmap into fewer bits while still allowing for lightning-fast operations.
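To make the run-length idea concrete, here is a minimal sketch of a word-aligned, run-length-encoded bitmap. It is a simplification in the spirit of such schemes, not the exact HWAB layout from the post: the 31-bit group size, the tuple representation, and the names `compress`, `decompress`, and `popcount` are all assumptions chosen for illustration. Long runs of all-zero or all-one groups collapse into a single "fill" word, and counting set bits works directly on the compressed form.

```python
GROUP = 31  # payload bits per word, assuming 32-bit words with one flag bit

def compress(bits):
    """Compress a list of 0/1 values into ('fill', bit, n_groups) and ('lit', value) words."""
    padded = bits + [0] * (-len(bits) % GROUP)  # pad to whole groups
    all_ones = (1 << GROUP) - 1
    words = []
    for i in range(0, len(padded), GROUP):
        value = int("".join(map(str, padded[i:i + GROUP])), 2)
        if value in (0, all_ones):
            bit = 1 if value else 0
            if words and words[-1][0] == "fill" and words[-1][1] == bit:
                # Extend the current run instead of storing another word.
                words[-1] = ("fill", bit, words[-1][2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("lit", value))
    return words

def decompress(words, n):
    """Rebuild the first n bits of the original bitmap."""
    bits = []
    for word in words:
        if word[0] == "fill":
            bits.extend([word[1]] * (word[2] * GROUP))
        else:
            bits.extend(int(c) for c in format(word[1], "031b"))
    return bits[:n]

def popcount(words):
    """Count set bits without decompressing: a fill word is handled in one step."""
    total = 0
    for word in words:
        if word[0] == "fill":
            total += word[1] * word[2] * GROUP
        else:
            total += bin(word[1]).count("1")
    return total

# A sparse bitmap: 620 bits with only two bits set.
bitmap = [0] * 310 + [1, 0, 1] + [0] * 307
words = compress(bitmap)
print(len(words), popcount(words))  # 3 2
```

Here 620 bits shrink to three words, and the count never touches the 19 all-zero groups individually.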

The comment thread discusses a couple of reasons for column databases being faster than row-oriented databases and some scenarios where this is not happening.

Terence Siganakis links to FastBit: An efficient compressed Bitmap index technology:

FastBit is an open-source data processing library following the spirit of the NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in a column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica.
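As an illustration of what a bitmap index over columns buys you, the sketch below builds one bitmap per distinct value in a column and answers a two-condition query with a single bitwise AND. It does not use FastBit's API and leaves the bitmaps uncompressed (plain Python integers serve as bitsets); the column data and the helper names `build_bitmap_index` and `rows_of` are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two hypothetical columns of the same table, stored column by column.
country = ["AU", "US", "AU", "DE", "US", "AU"]
status = ["new", "new", "old", "new", "old", "new"]

by_country = build_bitmap_index(country)
by_status = build_bitmap_index(status)

# WHERE country = 'AU' AND status = 'new' becomes one bitwise AND.
matches = rows_of(by_country["AU"] & by_status["new"])
print(matches)  # [0, 5]
```

Only the two columns named in the query are read, which is one of the reasons usually given for column stores outperforming row stores on this kind of filter.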

Original title and link: Hybrid Word Aligned Bitmaps: Why are column oriented databases so much faster than row oriented databases? - (NoSQL database©myNoSQL)


Neo4j 1.4 “Kiruna Stol” Released With Many Notable Improvements

Releasing often has too many advantages to list them all, but I think the major ones are: capturing the interest of new users (generating buzz), showing a healthy project velocity, and, probably the most important one, delivering the features and improvements users were asking for in a timely manner. Neo4j has learned these lessons[1], and since Neo4j 1.2 the team at Neo Technology has been following a very frequent release plan which also includes milestone releases. The other day, Neo4j 1.4, a.k.a. Kiruna Stol, was released:

Over the last three months, we’ve released 6 milestones in our 1.4 series. Today we’re releasing the final Neo4j 1.4 General Availability (GA) package. We’ve seen a whole host of new features going into the product during this time, along with numerous performance and stability improvements. We think this is our best release yet, and we hope you like the direction in which the product is heading.

There are some notable new features and improvements in this release:

  1. a new query language called Cypher[2]
  2. automatic indexing
  3. a Lucene upgrade leading to faster indexing
  4. self relationships
  5. REST API improvements: exposing batch execution API, paging mechanism for traversers
  6. webadmin, performance, and new server management scripts

  1. In the NoSQL space, they are not alone. 10gen follows a similarly aggressive release plan for MongoDB. Redis, even though it is supported by a two-person team, has always enjoyed frequent releases. DataStax has also started to push out Cassandra updates more often.

  2. At first glance the query language looks odd, but I haven’t looked beyond some basic examples to understand its syntax and strengths. Neo4j also supports Gremlin.

Original title and link: Neo4j 1.4 “Kiruna Stol” Released With Many Notable Improvements (NoSQL database©myNoSQL)