Getting Started with NoSQL

A couple of weeks ago, I had the pleasure of sitting down with Mathias Meyer, Chief Visionary at Scalarium, a Berlin startup, to discuss NoSQL adoption. Like myself, Mathias is really excited about NoSQL and he uses every opportunity to introduce more people to the NoSQL space. Recently he gave quite a few presentations around Europe about NoSQL databases.

The discussion focused on how someone would start learning and using NoSQL databases and the path to follow in this new ecosystem. Below is a transcript of our conversation.

Alex: How does one get started with NoSQL?

Mathias: Well, that’s a question I get quite a lot, but it is not that easy to answer. For me, I just pick one tool and start playing with it. If I see a use case for it, I add it to my toolbox. If not, I have still broadened my personal horizon. That alone is always a win in my book.

From a business perspective, you are probably going to find some use cases where storing your data in a relational database doesn’t make too much sense and you’ll start looking for ways to get it out of the database. For example, think about storing log data, or collecting historical data, or page impressions.

Alex: So, as a developer you should just give yourself a chance to play with the new shiny toys. As a business, a NoSQL database can be a viable solution for scenarios where you discover that your data doesn’t really fit the relational model.

Mathias: Indeed. You have stuff in your database and it is too much for your database, or it puts too much load on your database, and you’re looking for ways to get that out of your database. Load is a relative term, but consider data like logging or statistical data that grows somewhat exponentially. Relational databases are not a great fit to keep track of that kind of data, as it gets harder and harder to maintain or clean up as it grows.

As a developer, playing with new tools and different ways of solving problems makes sense all by itself, simply because it adds to your toolbox and broadens your personal and professional horizon. That’s basically how I got into NoSQL. I stumbled upon tools, which in turn use databases that are more optimized to store data for their use case. It’s just fun playing with them, and new tools with different approaches to storing data have always managed to make me curious. And who can resist a database that allows you to connect through telnet? I think that appeals to any geek I know.

Alex: There are quite a few NoSQL databases out there. Do you have any favorites or recommendations?

Mathias: If there’s any bunch of tools I’d recommend for anyone to start playing with, it’d probably be MongoDB, CouchDB or Redis. They are excellent candidates to take data off your main database, and happily live alongside it.

If you just want to play with a NoSQL database, and you’re coming from a relational background, your easiest bet would probably be MongoDB, as it’s a good mix of what you’re used to from relational databases with the best of schemaless storage. Redis makes sense to look at because it’s a good candidate to take certain types of data out of your main database. Statistics, message queues and historical data are just some examples.

When you work with something like MongoDB and CouchDB you’ll get a good idea of what NoSQL is about, as MongoDB is halfway between a relational and NoSQL database while CouchDB is basically totally different thinking all the way. If all you’re looking for is scale, have a look at Riak or Cassandra. They follow pretty interesting models of scaling up.

Alex: These NoSQL databases are proposing some new non-relational data models. Do you like one model more than the others?

Mathias: I’d say my favorite is the document database, as it is pretty much the most versatile of all of them. You can put any data in a document database and it leaves you all the freedom to model that data and to model some of the relationships between documents. It leaves all that up to you. And it is very flexible in how you can do that.

Personally I like looking into different solutions and maybe even combining them. That’s exactly what I do in practice: I usually have something like CouchDB as my main database and something like Redis as a really nice and handy small store on the side where I put data that’s not suited for putting into CouchDB.
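
As a rough illustration of that modeling freedom, here is a minimal Python sketch of a document store. It is not the API of CouchDB or any other database: the `store` dict, the `put` and `author_of` helpers, the `_id` field name and the sample documents are all assumptions made up for this example. It shows one relationship modeled by embedding (comments inside a post) and one modeled by reference (a post pointing at its author).

```python
import json

# Hypothetical in-memory "document store": document id -> document (plain dict).
store = {}

def put(doc):
    """Save a document under its _id; no schema is enforced."""
    # Round-trip through JSON to keep an independent, JSON-safe copy.
    store[doc["_id"]] = json.loads(json.dumps(doc))
    return doc["_id"]

put({"_id": "author:mathias", "type": "author", "name": "Mathias"})
put({
    "_id": "post:1",
    "type": "post",
    "title": "Getting Started with NoSQL",
    "author_id": "author:mathias",   # relationship modeled as a reference
    "comments": [                    # relationship modeled by embedding
        {"by": "alex", "text": "Nice overview"},
    ],
})

def author_of(post_id):
    """Follow the reference by hand; the store itself does not join documents."""
    post = store[post_id]
    return store[post["author_id"]]["name"]

print(author_of("post:1"))
```

Whether to embed or reference is left to the application, which is the flexibility described above. The trade-off is that following a reference becomes application code rather than a join.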

Alex: Is there something that you should be aware of before trying any of these NoSQL projects?

Mathias: It depends on whether you are doing it for your business or for yourself, or whether you are using it on greenfield projects, because that’s usually a lot easier. The thing I always like to tell people is that they need to look at what they think their data is gonna be shaped like. Obviously you won’t know that right from the start, but you’ll still have an idea of how loose your data will be, if you need something like typed relationships, transactions, and so on.

You can’t really give a universal answer here. In the end you’ll have to get an idea of what your data will look like and how you’re going to read or write it. If a NoSQL database seemingly is a good fit for it, go for it. It’s just important to be aware of both the benefits and the potential downsides, but that should be common sense for any tool you pick for a particular use case.

Alex: Well, I’d say that based on my experience with relational databases there are at least three things I’ve really gotten used to: the relational model, the query model and transactions. So someone looking into NoSQL databases should be aware that all three of these concepts will take a different form.

Mathias: Yes, absolutely. You need to be aware that you’ll meet a different data model, which brings great power and flexibility. You’ll find that most of the tools in the NoSQL landscape removed any kind of transactional means, for the benefit of simplicity, making it a lot easier to scale up. We might not realize that transactions are not always needed, which is not to say they’re totally unnecessary; it’s merely that oftentimes their lack is not really a problem.

As for querying, for the most part you’re saying goodbye to ad-hoc queries. Most NoSQL databases removed the means to run any kind of dynamic query on your data, MongoDB being the noteworthy exception here. Data is usually pre-aggregated, e.g. by using Map/Reduce, or access is simply done by keys. Is it a problem? Only you can make that decision, based on your requirements and the features you need.

Either way, it does take a while to get used to these things, no doubt.
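
To make the pre-aggregation idea concrete, here is a minimal sketch in plain Python, not tied to any particular database. The event data and the helper names (`map_impression`, `reduce_counts`, `build_view`) are invented for illustration. The map and reduce steps run once, up front, and reads afterwards are plain key lookups instead of ad-hoc queries.

```python
from collections import defaultdict

# Hypothetical raw page-impression events (illustrative data only).
impressions = [
    {"page": "/home", "day": "2010-06-01"},
    {"page": "/home", "day": "2010-06-01"},
    {"page": "/about", "day": "2010-06-01"},
    {"page": "/home", "day": "2010-06-02"},
]

def map_impression(event):
    """Map step: emit a (key, value) pair for each event."""
    yield (event["page"], event["day"]), 1

def reduce_counts(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def build_view(events):
    """Run map and reduce once, up front, and store the result by key."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_impression(event):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}

view = build_view(impressions)

# Reading is now a lookup by key, not a dynamic query over the raw events.
print(view[("/home", "2010-06-01")])
```

A question the view was not built for, such as impressions per week, would need a new map function and a rebuild. That is the cost of giving up ad-hoc queries.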

Alex: Once you start using NoSQL databases, will you have to get rid of RDBMS?

Mathias: No. If someone comes to me asking if they should switch to a NoSQL database without having a specific problem, my answer is always no. You should look for alternative solutions only when you need to solve a real problem. Usually that means your current database is not able to keep up with all the types of data you throw at it, or you’re storing lots of data and it’s kind of a pain to get it out again, both in terms of querying and of simply removing stale data in large tables, or your data has simply reached a size where migrating your schema is too costly.

As Jan Lehnardt said, NoSQL is more about choice: you pick the tool that is right for the job, and if that tool is an RDBMS then you don’t need to look for a NoSQL database until you have a specific problem. While the new tools are shiny and tempting to throw at every problem, there’s always a learning curve involved, both in development and operations. It makes more sense to start off slow, and see how you go by just moving small parts at a time to a secondary database.

Alex: Thanks Mathias!


You can find Mathias Meyer on Twitter and blogging on paperplanes.de.