document database: All content tagged as document database in NoSQL databases and polyglot persistence

Drawn to Scale Announces Spire for Mongo

Bradford Stephens (CEO Drawn to Scale):

Today, we’re announcing that we’ve ported MongoDB onto Spire as a platform. What this means is:

  1. You can easily scale your MongoDB cluster to 200+ TB
  2. You don’t need to change a line of code in your app to make it scale
  3. You can use ANSI SQL (yes, joins), Mongo queries, and Hadoop on the same data.

Just a couple of thoughts:

  1. the push for NoSQL databases to gain SQL support is growing extremely fast. I still doubt this is happening because of the advantages of SQL itself; it is more likely due to the 30 years of investment in the SQL ecosystem.
  2. I don’t agree with Bradford’s “MongoQL is also great because unlike SQL, there is only one flavor”. As far as I can tell, MongoDB comes with 3 flavors of queries: the object-based query language, the aggregation framework (a combination of object-based QL and pipelining) and the JavaScript-based MapReduce.
  3. last but not least, what are Ryan Rawson’s thoughts about Drawn to Scale going Mongo?
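
To make the “three flavors” point concrete, here is a rough sketch of what each style looks like as a request. The documents are built as plain Python dicts so the snippet runs without a server; the `posts` collection and its `author`, `published` and `tags` fields are made up for illustration.

```python
# Flavor 1: object-based query, i.e. the filter document passed to find().
find_filter = {"published": True, "tags": {"$in": ["nosql", "mongodb"]}}

# Flavor 2: aggregation framework, i.e. an ordered list of pipeline stages.
pipeline = [
    {"$match": {"published": True}},
    {"$group": {"_id": "$author", "posts": {"$sum": 1}}},
    {"$sort": {"posts": -1}},
]

# Flavor 3: MapReduce, with the logic written as JavaScript source strings.
map_reduce_command = {
    "mapReduce": "posts",  # hypothetical collection name
    "map": "function () { emit(this.author, 1); }",
    "reduce": "function (key, values) { return Array.sum(values); }",
    "query": {"published": True},
    "out": {"inline": 1},
}
```

With a driver such as PyMongo these would typically be handed to `collection.find(find_filter)`, `collection.aggregate(pipeline)` and `db.command(map_reduce_command)` respectively. Three different shapes, three different sets of operators.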

Original title and link: Drawn to Scale Announces Spire for Mongo (NoSQL database©myNoSQL)

via: http://drawntoscale.com/announcing-spire-for-mongo/


10gen’s MongoDB Following the Steps of MySQL

10gen has never been shy about their plan: replacing MySQL. That’s a bold goal considering Oracle is now behind MySQL. But this could also make things a bit easier for 10gen.

Anyways, what made me write this separate post is the realization of how closely 10gen is following the MySQL path:

  1. release early and incomplete. Enhance over time
  2. position the product as developer-friendly and fast
  3. introduce an enterprise edition once your adoption has surpassed that of your immediate competitors.

I guess I already know how it’ll end: a $2 billion acquisition by a company that then gets acquired by Oracle.

While the official announcement of MongoDB 2.4 mentioned the “MongoDB Enterprise” version only in passing, other websites didn’t leave this aspect aside. Actually, it’s what got emphasized about today’s announcement. In case you wonder what’s in 10gen’s enterprise box: Kerberos-based security and an on-premise version of the MongoDB Monitoring Service.

The only question I have now is how soon Oracle will start looking into acquiring 10gen. Or how soon it will dedicate marketing and sales resources to directly address 10gen.

Original title and link: 10gen’s MongoDB Following the Steps of MySQL (NoSQL database©myNoSQL)


MongoDB 2.4 Released: Hash-Based Sharding, Geo Enhancements, Text Search

MongoDB 2.4 is out:

Highlights of MongoDB 2.4 include:

  • Hash-based Sharding
  • Capped Arrays
  • Text Search (Beta)
  • Geospatial Enhancements
  • Faster Counts
  • Working Set Analyzer
  • V8 JavaScript engine
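
A quick illustration of why hash-based sharding matters for monotonically increasing keys. This is only a sketch of the general idea: it uses MD5 as a stand-in hash and a made-up 4-shard cluster, and it does not reproduce MongoDB’s actual hash function or chunk layout.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key: int, max_key: int) -> int:
    """Range-based placement: contiguous key ranges land on the same shard."""
    return min(key * NUM_SHARDS // max_key, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Hash-based placement: hash the key first, then split the hash space."""
    digest = hashlib.md5(str(key).encode()).digest()
    hashed = int.from_bytes(digest[:8], "big")  # 64-bit value from the digest
    return hashed * NUM_SHARDS // 2**64


# Monotonically increasing keys: the newest 1,000 ids out of 100,000.
recent_keys = range(99_000, 100_000)

range_load = Counter(range_shard(k, 100_000) for k in recent_keys)
hashed_load = Counter(hashed_shard(k) for k in recent_keys)
```

With range placement all of the recent inserts hit the last shard, while the hashed variant spreads them over all four. In the mongo shell, a hashed shard key is requested with something like `sh.shardCollection("records.active", { a: "hashed" })`.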

Original title and link: MongoDB 2.4 Released: Hash-Based Sharding, Geo Enhancements, Text Search (NoSQL database©myNoSQL)

via: http://blog.mongodb.org/post/45754637343/mongodb-2-4-released


Battle-Test Your MongoDB Cluster

Kristina Chodorow[1] shared a good list of tests to put a MongoDB cluster through:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to fix it.

Netflix is using a series of tools that perform similar tests against their Cassandra clusters. With a small twist: they are run against the production clusters.


  [1] In a recent post, Kristina Chodorow, one of the most prominent figures of the MongoDB world, has announced she has decided to become a Googler. Good luck Kristina!

Original title and link: Battle-Test Your MongoDB Cluster (NoSQL database©myNoSQL)

via: http://architects.dzone.com/articles/databases-and-dragons-battle


MongoDB Touch Command

MongoDB 2.2 introduced the touch command, which loads data from the data storage layer into memory. The touch command will load a collection’s documents, indexes or both into memory. This can be ideal to preheat a newly started server, in order to avoid page faults and slow performance once the server is brought into production. You can also use this when adding a new secondary to an existing replica set to ensure speedy subsequent reads.

I could see how this command could be useful for a caching system, but I haven’t seen it in any other database. It’s probably a workaround for the memory-mapped files mechanism used by MongoDB’s persistence.
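
For reference, the shell form of the command is `db.runCommand({ touch: "records", data: true, index: true })`. A minimal sketch of a warm-up helper that builds the same command documents in Python follows; the collection names are hypothetical, and actually sending the commands (for example with PyMongo’s `db.command(cmd)`) is left out so the snippet runs without a server.

```python
def touch_command(collection: str, data: bool = True, index: bool = True) -> dict:
    """Build the command document for `touch` on one collection."""
    # The command name has to be the first key; Python dicts keep insertion order.
    return {"touch": collection, "data": data, "index": index}


# Hypothetical list of collections worth preheating after a restart.
HOT_COLLECTIONS = ["users", "sessions"]
WARMUP_COMMANDS = [touch_command(name) for name in HOT_COLLECTIONS]
```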

Original title and link: MongoDB Touch Command (NoSQL database©myNoSQL)

via: http://blog.mongodb.org/post/44706549534/mongodb-tip-the-touch-command


Comparing MongoDB New Aggregation Framework and SQL

Francois Zaninotto:

MongoDB 2.1 introduced the aggregation framework, a faster alternative to Map/Reduce for common aggregation operations. If you took a look at the documentation and examples, you may have found the feature intimidating. Once you tame it, this new feature reveals itself as a very powerful beast. So read on to discover its true power through a series of examples.

The aggregation framework is indeed an interesting feature of MongoDB. And it’s definitely more useful than MongoDB’s MapReduce, which came with quite a few limitations. Now MongoDB has 3 different query languages: the object-based query language, the aggregation framework (still object-based but using different operators and a different execution model) and the JavaScript-based MapReduce.
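
To give a feel for the “different execution model”, here is a toy sketch of how a pipeline maps to a SQL `GROUP BY`. The evaluator is mine, not MongoDB’s: it only understands equality `$match` and `$group` with `$sum`, and the `orders` data and field names are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for an "orders" collection.
ORDERS = [
    {"cust_id": "a", "status": "A", "amount": 50},
    {"cust_id": "a", "status": "A", "amount": 25},
    {"cust_id": "b", "status": "A", "amount": 10},
    {"cust_id": "b", "status": "D", "amount": 99},
]

# SQL equivalent:
#   SELECT cust_id, SUM(amount) AS total
#   FROM orders WHERE status = 'A' GROUP BY cust_id
PIPELINE = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
]


def run_pipeline(docs, pipeline):
    """Toy evaluator: each stage consumes the previous stage's output."""
    for stage in pipeline:
        (op, spec), = stage.items()
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out_field, acc in spec.items():
                    if out_field != "_id":
                        groups[d[key_field]][out_field] += d[acc["$sum"].lstrip("$")]
            docs = [{"_id": key, **sums} for key, sums in groups.items()]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


RESULT = run_pipeline(ORDERS, PIPELINE)
# RESULT == [{"_id": "a", "total": 75}, {"_id": "b", "total": 10}]
```

The `WHERE` clause becomes a `$match` stage and the `GROUP BY` plus `SUM` becomes a `$group` stage, with documents flowing from one stage to the next.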

Original title and link: Comparing MongoDB New Aggregation Framework and SQL (NoSQL database©myNoSQL)

via: http://architects.dzone.com/articles/comparing-mongodb-new


A Key-Value Cache for Flash Storage: Facebook's McDipper and What Preceded It

A post on Facebook Engineering’s blog:

The outgrowth of this was McDipper, a highly performant flash-based cache server that is Memcache protocol compatible. The main design goals of McDipper are to make efficient use of flash storage (i.e. to deliver performance as close to that of the underlying device as possible) and to be a drop-in replacement for Memcached. McDipper has been in active use in production at Facebook for nearly a year.

I know at least 3 companies that have attacked this problem with different approaches and different results:

  1. Couchbase (ex-Membase, ex-NorthScale) started as a persistent clustered Memcached implementation. It was not optimized for Flash storage though. Today’s Couchbase product is still based on the memcache protocol, but it is adding new features inspired by CouchDB.
  2. RethinkDB, a YC company and the company that I work for, worked on and released in 2011 a Memcache-compatible storage engine optimized for SSDs. Since then, RethinkDB has built and released an enhanced product, a distributed JSON store with advanced data manipulation support.
  3. Aerospike (ex Citrusleaf) sells a storage engine for flash drives. Its API is not Memcache compatible though.

People interested in this market segment have something to learn from this.
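
For context on what “Memcache protocol compatible” buys you: a drop-in replacement only has to speak the same wire format, and the memcached text protocol is small. A minimal sketch of the `set` and `get` messages follows; it says nothing about McDipper’s internals, and the key and value are made up.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a text-protocol `set`: a header line, then the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Encode a text-protocol `get` for a single key."""
    return f"get {key}\r\n".encode("ascii")


def decode_get_response(raw: bytes) -> dict:
    """Parse zero or more VALUE blocks terminated by an END line."""
    values = {}
    while not raw.startswith(b"END\r\n"):
        header, _, rest = raw.partition(b"\r\n")
        _, key, _flags, size = header.split(b" ")  # VALUE <key> <flags> <bytes>
        size = int(size)
        values[key.decode("ascii")] = rest[:size]
        raw = rest[size + 2:]  # skip the data block and its trailing CRLF
    return values


request = encode_set("user:1", b"alice")
reply = decode_get_response(b"VALUE user:1 0 5\r\nalice\r\nEND\r\n")
```

Any server that answers these messages the same way, whether it keeps data in RAM or on flash, looks identical to existing clients.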

Original title and link: A Key-Value Cache for Flash Storage: Facebook’s McDipper and What Preceded It (NoSQL database©myNoSQL)

via: http://www.facebook.com/notes/facebook-engineering/mcdipper-a-key-value-cache-for-flash-storage/10151347090423920


MongoDB Represents the Perfect Opportunity for Rackspace's Fanatical Support

Rackspace in a post explaining why they bought into MongoDB through the acquisition of ObjectRocket:

MongoDB is easy to get started, but complex to manage and scale.

I bet 10gen loves reading things like this. It also serves the adoption of MongoDB well.

Original title and link: MongoDB Represents the Perfect Opportunity for Rackspace’s Fanatical Support (NoSQL database©myNoSQL)

via: http://www.rackspace.com/blog/why-mongodb/


Rackspace Buys MongoDB Hosting Provider ObjectRocket

According to GigaOm, Rackspace has acquired MongoDB hosting provider ObjectRocket, which I heard about only recently when I learned something absolutely fascinating:

The cloud is broken. It’s not designed to properly run persistent data stores like MongoDB. ObjectRocket is designed from the ground up to fix this problem.

The first thing Rackspace should do after signing the docs is take this page down.

Original title and link: Rackspace Buys MongoDB Hosting Provider ObjectRocket (NoSQL database©myNoSQL)


MarkLogic’s New (Aggressive) Voice

MarkLogic has been around for a while. I don’t have any details about how their business is doing, but attention-wise, I’m pretty sure they’d love to get a slice of what younger NoSQL databases get.

In the last few weeks, I got the impression there’s a change of voice in MarkLogic’s message.

The first sign: “Playtime with MongoDB is Over. Upgrade to MarkLogic Enterprise NoSQL.”:

When playtime is over and it is time to seriously support the needs of your enterprise, the clear choice is to upgrade to MarkLogic Enterprise NoSQL. (We even have a Mongo2MarkLogic converter tool that speeds the import of data from MongoDB into MarkLogic so you can start using MarkLogic’s integrated search and enterprise features faster.)

To be clear, the post calls out Cassandra, MongoDB, Riak and HBase.

Second sign: “Get Your Facts Straight: We’ve Had Enterprise-Grade Security Longer”:

DataStax put out a press release today claiming that with their new release of DataStax Enterprise 3 they were the “World’s First NoSQL Big Data Platform With Comprehensive Enterprise-Grade Security.”

We’d like to set the record straight. MarkLogic has had Enterprise-grade security for well over 10 years. So, while I won’t make the claim that we were first — I certainly won’t accept that DataStax was first either.

Both these posts are bold. I like that. What I don’t like though is the aggressive and dismissive tone. That might bring you attention, but not the type that comes with new users.

Original title and link: MarkLogic’s New (Aggressive) Voice (NoSQL database©myNoSQL)


Integrating MongoDB and Hadoop: Why & How

The Mortar blog:

Mongo was built for data storage and retrieval, and Hadoop was written for data processing. So naturally, data processing is often better offloaded to Hadoop. Here’s why:

  1. Easier, more expressive language
  2. Libraries to build on
  3. Big performance improvements
  4. Separate workloads mean less load

For the how part, the post recommends their own Hadoop-as-a-Service platform and a set of libraries the Mortar platform provides.

✚ While browsing the Mortar blog and website I couldn’t find any information related to the costs of transferring data. The AWS services usually have a data transfer dimension, which most often has an important impact on the total costs of a solution.

Original title and link: Integrating MongoDB and Hadoop: Why & How (NoSQL database©myNoSQL)

via: http://blog.mortardata.com/post/43080668046/mongodb-hadoop-why-how


The State of CouchDB - Jan Lehnardt’s Comment

Jan Lehnardt posted a long reply to my comments on the State of CouchDB. I thought many would benefit from promoting it to a real post (with Jan’s permission). Before handing it over to Jan, I want to thank him for taking the time to clarify some of the things. I also want to be clear that I still stand by all my comments. Now, to Jan Lehnardt:

Hey Alex,

you are of course correct and I stand by my post. Let me explain the discrepancy.

The post is a summary of my notes for my opening talk of CouchDB Conf that I ran in Berlin in January. The target audience are the people in the actual audience. There are people who build CouchDB, people who help out with CouchDB, people who use CouchDB and a few people who want to know what’s up with CouchDB. By and large though, these are what I’d call “CouchDB People”.

You interpret the post as if it were for a general public audience and it is entirely my fault making that not more clear in the opening of my post.

To your notes:

  • Confusion: spot on, our bad, mistakes were made, it’s gonna take time to get sorted.

  • passive aggressive style: sorry if that read this way, it was definitely not intended. It was to highlight that there are people who absolutely love what CouchDB does. It isn’t a statement about quantities, which your note about “your numbers” implies. I’m not interested in discussing numbers, but I understand that people have turned away for good reasons. — Consider being an enthusiast, and you go to a conference of like-minded people and the project lead gives a talk and says, “people like you are passionate”. “Fuck yeah I am passionate” you think, or say, and get a good vibe going at the conference. (For CouchDB fans, it was really good times).

  • list of features: good stuff on there, but none of that matters until it ships. This is for people on the inside to see what we are working towards and get them rallied up to help and contribute. Your assessment that the “real” features are in the gist is misguided, but I chalk that up to differing opinions, no harm done.

  • *ouch projects: hell yeah I am excited to finally fulfil the original promise of CouchDB from fucking six years ago. The thing to highlight here isn’t that “boo a bunch of things that rhyme with ouch”, but that we are starting to see a production-ready ecosystem of a true open source data sync solution that bloody works. — I agree with you that branding/communication is key here, and that there is a lot to be done.

  • facts matter / “came out on top: nope”: again spot on, but this is where I really wished you had given me a shout before posting this. Maybe next time, you should still have my number. Your assessment that we did not come out on top is completely correct, if you look at it from a general public point of view. It’d be odd to deny the facts. But again, this isn’t meant as a post that says “hey everyone, look how great CouchDB is doing”, because a) CouchDB isn’t and b) the intended audience is not everybody. What I did mean to point out is that the open source project is aware of the challenges it is facing and is doing its utmost to set everything up so things can be resolved. We spent twelve months in relative obscurity preparing many things that are starting to see the light of day now, but most of it in the future. Only when we delivered on all of that, we can look at the facts again and see how CouchDB is doing. I am confident that it’ll look good, but there is a lot to be done until then. The “coming out on top” is a comment on that the core of the project and its community are strong, and that we are in a position to turn the boat back to former and further glory and not that we are somehow deceived by our own filter bubble and believe that all is well when it isn’t.

Thanks for giving me the opportunity to explain things. I think your assessment of CouchDB in general of the past years has been spot on and your current criticisms are also accurate and well received, it’s just that you got the intended audience for my post wrong, which I didn’t make very clear to the casual observer.

Original title and link: The State of CouchDB - Jan Lehnardt’s Comment (NoSQL database©myNoSQL)