data modeling: All content tagged as data modeling in NoSQL databases and polyglot persistence

Modeling a Simple Social App Using SQL and Redis

Felix Lin sent me a link to the slides he presented at NoSQL Taiwan meetup. There are 105 of them!

The deck covers:

  • how to build a simple social site using SQL
  • what are the performance issues with SQL
  • how to use the data structures in Redis for getting the same features
  • how to solve the performance issues in SQL by using Redis
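As a rough illustration of the third point, here is a minimal sketch of follower sets and timelines built on Redis-style sets and lists. It is not taken from Felix's deck: the key names (`following:`, `followers:`, `timeline:`) and the `FakeRedis` class are hypothetical, and the class is an in-memory stand-in that only mirrors the semantics of the real SADD, SMEMBERS, SINTER, LPUSH, and LRANGE commands so the example runs without a server.

```python
from collections import defaultdict, deque


class FakeRedis:
    """In-memory stand-in for a few Redis commands (hypothetical, not a real client)."""

    def __init__(self):
        self.sets = defaultdict(set)
        self.lists = defaultdict(deque)

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets[key])

    def sinter(self, key_a, key_b):
        return self.sets[key_a] & self.sets[key_b]

    def lpush(self, key, value):
        self.lists[key].appendleft(value)

    def lrange(self, key, start, stop):
        # Redis LRANGE treats stop as inclusive; negative indexes are not handled here.
        return list(self.lists[key])[start:stop + 1]


def follow(r, follower, followee):
    # Two sets per relationship, so both directions can be read without a join.
    r.sadd(f"following:{follower}", followee)
    r.sadd(f"followers:{followee}", follower)


def post(r, author, text):
    # Fan-out on write: push the post onto every follower's timeline list.
    for follower in r.smembers(f"followers:{author}"):
        r.lpush(f"timeline:{follower}", f"{author}: {text}")


r = FakeRedis()
follow(r, "alice", "carol")
follow(r, "bob", "carol")
follow(r, "alice", "bob")
post(r, "carol", "hello")
post(r, "bob", "hi")

timeline = r.lrange("timeline:alice", 0, 9)               # newest first
mutual = r.sinter("following:alice", "followers:carol")   # people Alice follows who also follow Carol
```

The point of the sketch is that each feature maps to a single read of one precomputed structure, which is where the speedup over a multi-join SQL query would come from.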

Check them out after the break:

NO DB - the Center of Your Application Is Not the Database

Uncle Bob:

The center of your application is not the database. Nor is it one or more of the frameworks you may be using. The center of your application are the use cases of your application. […] If you get the database involved early, then it will warp your design. It’ll fight to gain control of the center, and once there it will hold onto the center like a scruffy terrier. You have to work hard to keep the database out of the center of your systems. You have to continuously say “No” to the temptation to get the database working early.

Original title and link: NO DB - the Center of Your Application Is Not the Database (NoSQL database©myNoSQL)


Neo4j Data Modeling: What Question Do You Want to Answer?

Mark Needham:

Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j and I realised that the way that I’ve been doing this is by considering what question I want to answer and then building a graph to answer it.

This same principle applies to modeling with any NoSQL database. Thinking in terms of access patterns is one of the major differences between data modeling in the NoSQL space and in the relational world, where modeling is driven, at least in theory and in the early phases, by the normalization rules.

Original title and link: Neo4j Data Modeling: What Question Do You Want to Answer? (NoSQL database©myNoSQL)


Cassandra Data Modeling Examples with Matthew F. Dennis - NoSQL videos

Continuing the Cassandra NYC 2011 video series, made available by the folks from DataStax, this week we have Matthew F. Dennis, who covers a couple of different Cassandra data modeling use cases.

Designing HBase Schema to Best Support Specific Queries

Real scenario, very good analysis of different data access requirements, and three possible solutions. What’s your pick?

The problem is fairly simple - I am storing “notifications” in hbase, each of which has a status (“new”, “seen”, and “read”). Here are the API’s I need to provide:

  • Get all notifications for a user
  • Get all “new” notifications for a user
  • Get the count of all “new” notifications for a user
  • Update status for a notification
  • Update status for all of a user’s notifications
  • Get all “new” notifications across the database
  • Notifications should be scannable in reverse chronological order and allow pagination.
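For the last requirement, one commonly used HBase technique is to put a reversed timestamp in the row key, so that a plain forward scan returns the newest rows first. The sketch below only illustrates that idea; it is not necessarily one of the three solutions from the linked thread, and the key layout (user id, a zero byte, then `MAX_LONG - timestamp`) and the `row_key` helper are assumptions made for the example. It also assumes user ids never contain a zero byte.

```python
import struct

MAX_LONG = 2**63 - 1  # largest signed 64-bit value


def row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Hypothetical key layout: <user id> 0x00 <MAX_LONG - timestamp, big-endian>."""
    reversed_ts = MAX_LONG - timestamp_ms
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def timestamp_of(key: bytes) -> int:
    # The last 8 bytes hold the reversed timestamp.
    return MAX_LONG - struct.unpack(">q", key[-8:])[0]


# Rows written in arbitrary order for one user.
keys = [row_key("user42", ts) for ts in (1000, 3000, 2000)]

# HBase keeps rows sorted by raw key bytes; sorting here simulates a scan.
scan_order = sorted(keys)
newest_first = [timestamp_of(k) for k in scan_order]

# Pagination: a page is a slice of the scan; the next page would start
# just after the last key of the previous page.
page_size = 2
first_page = scan_order[:page_size]
```

This covers only the per-user, reverse-chronological, paginated scan. Filtering by status and the global “new” query still need a decision of their own, for example a status column or a second table, which is exactly the trade-off the question is about.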

Original title and link: Designing HBase Schema to Best Support Specific Queries (NoSQL database©myNoSQL)


Modeling A/B Tests With Cassandra

A Cassandra data modeling session around a real-life scenario: tracking data for A/B tests:

With most things in life data modeling in Cassandra can be compared to learning to ride a bike. It can be scary, you might fall off, but in the end once you learn a few fundamental concepts everything will be easier to do. The goal of this article is to get you comfortable with a basic data modeling scenario that you will likely see in the real world.

Original title and link: Modeling A/B Tests With Cassandra (NoSQL database©myNoSQL)


6 Ways to Handle Relations in RavenDB and Document Databases

Daniel Lang presents 6 solutions for dealing with relations in RavenDB:

If you’re coming from the sql world, chances are you will be confused by the lack of relations in document databases. However, if you’re running RavenDB you’ve got plenty of options to address this trade-off. I personally cannot think of any situation where I’d wish back SQLServer because of this (there could be other reasons).

Two not recommended:

  • go to the database twice
  • include one document inside the other

Two RavenDB specific solutions:

  • implement a read trigger to do server-side joins
  • implement a custom responder

Two recommended solutions:

  • use the .Include<T>() method
  • denormalize your references

Couple of comments:

  • the difference between “include one document inside the other” and “denormalize your references” is very subtle—the latter suggests including only the information needed for the presentation layer.
  • I think one should consider both “include one document inside the other” and “denormalize your references” and choose between them by weighing how often the embedded documents are likely to be updated against how often the presentation layer is likely to change
  • apart from RavenDB, all other document databases seem to offer only two options: “go to the database twice” and “denormalize your references”
  • once Redis releases its version with embedded server-side Lua scripting, that could be used as a form of stored procedure
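To make the subtle difference from the first comment concrete, here is a small sketch using plain dictionaries as documents. The document shapes, field names, and the `denormalized_ref` helper are made up for illustration and are not RavenDB's API.

```python
import copy

customer = {
    "id": "customers/1",
    "name": "Ada",
    "email": "ada@example.com",
    "address": "1 Main St",
}

# "Include one document inside the other": the order carries a full copy.
order_embedded = {
    "id": "orders/1",
    "customer": copy.deepcopy(customer),
    "total": 99,
}


def denormalized_ref(doc, fields):
    """Keep the id plus only the fields the presentation layer needs (hypothetical helper)."""
    return {"id": doc["id"], **{f: doc[f] for f in fields}}


# "Denormalize your references": the order carries the id and the display name only.
order_denormalized = {
    "id": "orders/1",
    "customer": denormalized_ref(customer, ["name"]),
    "total": 99,
}

# The customer changes their email: the embedded copy is now stale,
# while the denormalized reference never held the email in the first place.
customer["email"] = "ada@new.example.com"
embedded_is_stale = order_embedded["customer"]["email"] != customer["email"]
```

The fewer fields are copied, the fewer updates have to be propagated, at the price of another round trip whenever the presentation layer starts needing a field that was not copied.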

Original title and link: 6 Ways to Handle Relations in RavenDB and Document Databases (NoSQL database©myNoSQL)


Data Modeling for Document Databases: An Auction and Bids System

Staying with data modeling, but moving to the world of document databases, Ayende has two great posts about modeling an auction system: part 1 and part 2. They are great not only because it’s not the Human-has-Bird-and-Cat-and-Dogs example, but also because he looks at different sets of requirements and offers different solutions.

That is one model for an Auction site, but another one would be a much stronger scenario, where you can’t just accept any Bid. It might be a system where you are charged per bid, so accepting a known invalid bid is not allowed (if you were outbid in the meantime). How would we build such a system? We can still use the previous design, and just defer the actual billing for a later stage, but let us assume that this is a strong constraint on the system.
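One way to picture the stronger constraint is an optimistic concurrency check: a bid is accepted only if the auction document has not changed since the bidder last read it. The sketch below is an assumption made for illustration, not Ayende's design; the in-memory `store`, the `version` field, and `place_bid` are all hypothetical names.

```python
class StaleAuction(Exception):
    """Raised when the auction changed after the bidder read it."""


# Hypothetical document store: one auction document with a version counter.
store = {"auctions/1": {"version": 1, "highest_bid": 100}}


def place_bid(store, auction_id, amount, expected_version):
    auction = store[auction_id]
    if auction["version"] != expected_version:
        # Someone else bid in the meantime: refuse instead of billing for a losing bid.
        raise StaleAuction(auction_id)
    if amount <= auction["highest_bid"]:
        raise ValueError("bid must exceed the current highest bid")
    store[auction_id] = {"version": expected_version + 1, "highest_bid": amount}


# Two bidders both read version 1; only the first write goes through.
place_bid(store, "auctions/1", 120, expected_version=1)
try:
    place_bid(store, "auctions/1", 130, expected_version=1)
    second_bid_accepted = True
except StaleAuction:
    second_bid_accepted = False
```

The check only works if the whole auction state that the rule depends on lives in a single document, which is why the constraint feeds back into the document model.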

Original title and link: Data Modeling for Document Databases: An Auction and Bids System (NoSQL database©myNoSQL)

NoSQL Screencast: HBase Schema Design

In this O’Reilly webcast, long-time HBase developer and Cloudera HBase/Hadoop architect Lars George discusses the underlying concepts of the storage layer in HBase and how to model data in HBase for the best possible performance.

MongoDB, Data Modeling, and Adoption

Micheal Shallop describes in this post how he “built and re-built” a geospatial table, replacing several tables in MySQL with MongoDB:

The mongo geospatial repository will be replacing several tables in the legacy mySQL system – as you may know, mongodb comes with full geospatial support so executing queries against a collection (table) built in this manner is shocking in terms of it’s response speeds — especially when you compare those speeds to the traditional mySQL algorithms for extracting geo-points based on distance ranges for lat/lon coordinates.  The tl;dr for this paragraph is: no more hideous trigonometric mySQL queries!

But what actually caught my attention was this paragraph:

What I learned in this exercise was that the key to architecting a mongo collection requires you to re-think how data is stored.  Mongo stores data as a collection of documents.  The key to successful thinking, at least in terms of mongo storage, is denormalization of your data objects.

This made me realize that MongoDB adoption benefits hugely from the fact that its data model and querying are the closest to those of relational databases, so neither requires a radical mind shift from developers who have touched a database at least once. It is like knowing one programming language and learning a second one that follows almost the same paradigms.

The same cannot be said about key-value stores, multi-dimensional maps, MapReduce algorithms, or graph databases. Any of these requires one to set aside pretty much everything learned from the relational model and completely remodel the world. It’s a tougher job, but when done right the effort pays off.

Original title and link: MongoDB, Data Modeling, and Adoption (NoSQL database©myNoSQL)

Implementing Closure Tables for Hierarchical Data in Redis

Interesting question raised by Hugo on the Redis group about modeling closure tables in Redis. Didier Spezia offers a solution based on sorted sets.

The OP linked to Rendering Trees with Closure Tables and Models for hierarchical data to explain closure tables. The slide deck at the second link is worth a post by itself.
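For readers who have not met closure tables: the idea is to store every ancestor/descendant pair together with its depth, so that subtree and path queries need no recursion. Below is a minimal sketch of how that could map onto sorted sets, with the depth as the score. It is not Didier's actual solution: the key names (`ancestors:`, `descendants:`) are assumptions, and the `zadd` and `zrangebyscore` functions are in-memory stand-ins that mirror the real ZADD and ZRANGEBYSCORE commands.

```python
from collections import defaultdict

# key -> {member: score}; an in-memory stand-in for Redis sorted sets.
zsets = defaultdict(dict)


def zadd(key, member, score):
    zsets[key][member] = score


def zrangebyscore(key, lo, hi):
    # Like Redis: ordered by score, ties broken by member name.
    items = [(s, m) for m, s in zsets[key].items() if lo <= s <= hi]
    return [m for s, m in sorted(items)]


def add_node(node, parent=None):
    # Every node is its own ancestor and descendant at depth 0.
    zadd(f"ancestors:{node}", node, 0)
    zadd(f"descendants:{node}", node, 0)
    if parent is None:
        return
    # Copy the parent's ancestor rows (including the parent itself), one level deeper.
    for ancestor, depth in list(zsets[f"ancestors:{parent}"].items()):
        zadd(f"ancestors:{node}", ancestor, depth + 1)
        zadd(f"descendants:{ancestor}", node, depth + 1)


add_node("root")
add_node("a", "root")
add_node("b", "root")
add_node("a1", "a")

children_of_root = zrangebyscore("descendants:root", 1, 1)
subtree_of_root = zrangebyscore("descendants:root", 1, float("inf"))
path_from_a1 = zrangebyscore("ancestors:a1", 1, float("inf"))  # nearest ancestor first
```

Inserts cost one write per ancestor, while reads become a single range query by depth, which is the usual closure-table trade-off.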

Original title and link: Implementing Closure Tables for Hierarchical Data in Redis (NoSQL database©myNoSQL)


CouchDB and DDD

Bradley Holt:

I’ve found CouchDB to be a great fit for domain-driven design (DDD). Specifically, CouchDB fits very well with the building block patterns and practices found within DDD. Two of these building blocks include Entities and Value Objects. Entities are objects defined by a thread of continuity and identity. A Value Object is an object that describes some characteristic or attribute but carries no concept of identity. Value objects should be treated as immutable.

Aggregates are groupings of associated Entities and Value Objects. Within an Aggregate, one member is designated as the Aggregate Root. External references are limited to only the Aggregate Root. Aggregates should follow transaction, distribution, and concurrency boundaries. Guess what else is defined by transaction, distribution, and concurrency boundaries? That’s right, JSON documents in CouchDB.

The way I read this is that the impedance mismatch between the object model and the document-based model is lower than what we’ve seen in the object-relational world.
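A small sketch of what that lower mismatch can look like: an Aggregate Root with its Entities and Value Objects serialized as a single JSON document. The `Order`, `LineItem`, and `Money` classes and the `to_document` helper are hypothetical names chosen for the example and do not come from Bradley's post; the only CouchDB-specific detail used is that a document's identifier lives in the `_id` field.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Money:
    """Value Object: no identity, immutable."""
    amount: int
    currency: str


@dataclass
class LineItem:
    """Entity that lives inside the aggregate."""
    line_id: int
    sku: str
    price: Money


@dataclass
class Order:
    """Aggregate Root: the only object referenced from outside."""
    order_id: str
    items: List[LineItem] = field(default_factory=list)


def to_document(order: Order) -> dict:
    # The whole aggregate becomes one document, keyed by the root's identity.
    doc = asdict(order)
    doc["_id"] = doc.pop("order_id")
    return doc


order = Order("order-1", [LineItem(1, "book", Money(1500, "USD"))])
doc = to_document(order)
body = json.dumps(doc, sort_keys=True)  # what would be sent to the database
```

Because the document boundary and the aggregate boundary coincide, saving the aggregate is one write, with no mapping of nested objects onto separate tables.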

Original title and link: CouchDB and DDD (NoSQL database©myNoSQL)