


PostgreSQL: All content tagged as PostgreSQL in NoSQL databases and polyglot persistence

Standalone Heroku Postgres’ Unanswered Question

While the offer is clear and valuable in itself:

  • 99.99% uptime
  • 99.999999999% (eleven nines) durability
  • read-only asynchronous replicas
  • database cloning

I’ve been reading all posts about the announcement looking for the answer to the most obvious question: why would you use Heroku’s Postgres service from outside the Heroku platform?

As far as I can tell:

  • the network latency will be significant
  • network partitions will occur (more often than when both your application and data live in the same DC)
  • transfer costs will be significant

So what is the answer?

Media coverage:

Original title and link: Standalone Heroku Postgres’ Unanswered Question (NoSQL database©myNoSQL)

PostgreSQL Hstore: The Key Value Store Everyone Ignored

A post rehashing PostgreSQL hstore capabilities:

I will be focusing on a key value store that is ACID compliant for real! Postgres takes advantage of its storage engine and has an extension on top for key value storage. So the plan is to have a table with a column of datatype hstore, which in turn offers structure-free storage. Thinking of this model, multiple analogies suggest themselves. It can be a Column Family Store just like Cassandra, where the row key is the PK of the table, each column of hstore type can be imagined as a super column, and each key in the hstore entry can be a column name. Similarly you can imagine it somewhat like Hash structures in Redis (HSET, HDEL), or a 2 or 3 level MongoDB store (a few modifications required). Despite being similar (when little tricks are applied) to your NoSQL store structures, this gives me an opportunity to demonstrate some really trivial examples.

A couple of comments:

  • you can store key-value pairs in any relational database
  • there are quite a few ACID key-value stores available
  • hstore is more like a document store. Values are not opaque and it supports queries against them.
  • not everyone needs a document database when a key-value store is enough. The most common example is storing web sessions.
  • not everyone needs an ACID compliant database. Not in a distributed system requiring high availability.
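To give a flavor of what hstore's "structure-free storage" actually looks like, here is a minimal, illustrative Python sketch that parses hstore's text literal format (`"key"=>"value"` pairs, `NULL` for missing values) into a dict and back. It is a toy for understanding the representation, not a replacement for a real driver's hstore support:

```python
import re

# hstore's text representation is a comma-separated list of
# "key"=>"value" pairs; a value may also be the unquoted NULL.
HSTORE_PAIR = re.compile(
    r'"((?:[^"\\]|\\.)*)"\s*=>\s*(?:"((?:[^"\\]|\\.)*)"|(NULL))'
)

def parse_hstore(literal):
    """Parse an hstore text literal into a plain dict."""
    result = {}
    for key, value, null in HSTORE_PAIR.findall(literal):
        result[key] = None if null else value
    return result

def to_hstore(mapping):
    """Serialize a dict back into hstore literal syntax."""
    def quote(s):
        return '"%s"' % s.replace('\\', '\\\\').replace('"', '\\"')
    return ', '.join(
        '%s=>%s' % (quote(k), 'NULL' if v is None else quote(v))
        for k, v in mapping.items()
    )

doc = parse_hstore('"name"=>"mug", "price"=>"4.99", "color"=>NULL')
```

Note how the parsed result is exactly the hash/dictionary shape the Redis and Cassandra analogies in the quote rely on.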

Anyway, the conclusion remains the same.

Update: there’s a long thread discussing this post on Hacker News.

Original title and link: PostgreSQL Hstore: The Key Value Store Everyone Ignored (NoSQL database©myNoSQL)


The Durable Document Store You Didn't Know You Had, but Did

As it turns out, PostgreSQL has a number of ways of storing loosely structured data/documents in a column on a table.

  • hstore is a data type available as a contrib package that allows you to store key/value structures just like a dictionary or hash.
  • You can store data in JSON format on a text field, and then use PLV8 to JSON.parse() it right in the database.
  • There is a native xml data type, along with a few interesting query functions that allow you to extract and operate on data that sits deep in an XML structure.
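The JSON-in-a-text-field pattern from the second bullet is easy to sketch. With PLV8 the `JSON.parse()` would run inside Postgres itself; the hedged Python example below uses the standard library's in-memory SQLite as a stand-in store and decodes on the client side instead, with table and column names invented for the illustration:

```python
import json
import sqlite3

# In-memory SQLite stands in for Postgres; the pattern is the same:
# documents go into a plain text column as serialized JSON.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)')

docs = [
    {'title': 'hello', 'tags': ['pg', 'json']},
    {'title': 'world', 'tags': ['nosql']},
]
db.executemany(
    'INSERT INTO docs (body) VALUES (?)',
    [(json.dumps(d),) for d in docs],
)

# With PLV8 this filtering would happen in-database; here we pull
# the text back out and JSON-decode it on the client side.
tagged_pg = [
    json.loads(body)['title']
    for (body,) in db.execute('SELECT body FROM docs')
    if 'pg' in json.loads(body)['tags']
]
```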

I concur. Not knowing your database *must not* be the reason for adopting a NoSQL database.

Original title and link: The Durable Document Store You Didn’t Know You Had, but Did (NoSQL database©myNoSQL)


Postgres Plus Connector for Hadoop in Private Beta

Not much information is available yet on the project page, but it looks like a bidirectional integration of PostgreSQL and Hadoop.

The Postgres Plus Connector for Hadoop provides developers easy access to massive amounts of SQL data for integration with or analysis in Hadoop processing clusters.  Now large amounts of data managed by PostgreSQL or Postgres Plus Advanced Server can be accessed by Hadoop for analysis and manipulation using Map-Reduce constructs.

Postgres Plus Hadoop

When speaking about PostgreSQL and Hadoop, the first thing that comes to my mind is Daniel Abadi’s HadoopDB, which not long ago became the technology behind his startup, a startup that has already raised $9.5 million.

Original title and link: Postgres Plus Connector for Hadoop in Private Beta (NoSQL database©myNoSQL)

The Story of Etsy's Architecture

Ars Technica’s Sean Gallagher summarizes a presentation given at the Surge conference covering the evolution of Etsy’s architecture from a centralized, stored-procedure-based PostgreSQL solution to sharded MySQL, by way of a failed attempt at a service-oriented-style architecture:

And the team started to shift feature by feature away from a semi-monolithic Postgres back-end to sharded MySQL databases. “It’s a battle-tested approach,” Snyder said. “Flickr is using it on an enormous scale. It scales horizontally, basically, to near infinity, and there’s no single point of failure—it’s all master to master replication.”
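At its core, the sharding scheme Snyder describes is a deterministic mapping from an entity to one of N master-master pairs. Here is a hypothetical Python sketch of such a lookup; the shard names and hashing scheme are invented for illustration, not Etsy's or Flickr's actual implementation:

```python
import hashlib

# Hypothetical shard map: each logical shard is a master-master pair,
# so either host can take writes and there is no single point of failure.
SHARDS = [
    ('db-shard-0a', 'db-shard-0b'),
    ('db-shard-1a', 'db-shard-1b'),
    ('db-shard-2a', 'db-shard-2b'),
]

def shard_for(user_id):
    """Deterministically map a user id to one shard (a host pair)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def host_for(user_id):
    # With master-master replication either side of the pair is
    # writable, so alternate by id to spread the load.
    pair = shard_for(user_id)
    return pair[user_id % 2]
```

Horizontal scaling then amounts to growing `SHARDS` (in practice via a lookup table rather than pure hashing, so existing users need not move).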

Original title and link: The Story of Etsy’s Architecture (NoSQL database©myNoSQL)


Tutorial: Building Interactive Maps With Polymaps, TileStache, and MongoDB

A three-part tutorial by Hans Kuder on using MongoDB, PostgreSQL/PostGIS, and JavaScript libraries to build interactive maps:

  • part 1: goals and building blocks
  • part 2: geo data, PostGIS, and TileStache
  • part 3: client side and MongoDB

Original title and link: Tutorial: Building Interactive Maps With Polymaps, TileStache, and MongoDB (NoSQL database©myNoSQL)

Beyond NoSQL: Using RRD to Store Temporal Data

Patrick Schless describes the pros of using RRDTool to collect write-once data over time and graph the results.

The projects collect very different data, but this task was painful enough in postgres that I ended up switching to a temporal database for the second go, and it made the data collection & querying much easier. What follows is a brief discussion of the problems I faced with postgres, and how moving to RRD solved them.

Check also the Hacker News thread for a couple of other RRDTool tricks.
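The core idea behind RRD (round-robin database) storage can be sketched in a few lines of Python: a fixed-size ring of time buckets, allocated up front, so disk usage never grows and old data ages out automatically. This is a conceptual toy, not RRDTool's actual file format or consolidation machinery:

```python
class RoundRobinArchive:
    """Toy round-robin archive: a fixed number of fixed-width buckets.

    Like RRD, storage is allocated up front and old samples are
    overwritten in place, so size stays constant no matter how
    long the series runs.
    """

    def __init__(self, step, slots):
        self.step = step              # seconds covered by one bucket
        self.slots = slots            # total buckets in the ring
        self.values = [None] * slots  # preallocated, never grows

    def update(self, timestamp, value):
        self.values[(timestamp // self.step) % self.slots] = value

    def fetch(self, timestamp):
        return self.values[(timestamp // self.step) % self.slots]

# 60-second buckets, one hour of history
rra = RoundRobinArchive(step=60, slots=60)
rra.update(0, 1.5)
rra.update(3600, 2.5)  # one hour later: ring wraps, old value is overwritten
```

The wrap-around is exactly why "write-once data over time" fits RRD so well, and why it would need explicit pruning logic in postgres.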

In the NoSQL space, this sort of quick analytics use case has usually been associated with MongoDB.

Other, larger platforms have developed their own solutions.

But using a specialized solution has its own benefits… where did we hear that before?

Original title and link: Beyond NoSQL: Using RRD to Store Temporal Data (NoSQL database©myNoSQL)


Building an Ad Network Ready for Failure

The architecture of a fault-tolerant ad network built on top of HAProxy, Apache with mod_wsgi and Python, Redis, a bit of PostgreSQL and ActiveMQ deployed on AWS:

The real workhorse of our ad targeting platform was Redis. Each box slaved from a master Redis, and on failure of the master (which happened once), a couple “slaveof” calls got us back on track after the creation of a new master. A combination of set unions/intersections with algorithmically updated targeting parameters (this is where experimentation in our setup was useful) gave us a 1 round-trip ad targeting call for arbitrary targeting parameters. The 1 round-trip thing may not seem important, but our internal latency was dominated by network round-trips in EC2. The targeting was similar in concept to the search engine example I described last year, but had quite a bit more thought regarding ad targeting. It relied on the fact that you can write to Redis slaves without affecting the master or other slaves. Cute and effective. On the Python side of things, I optimized the redis-py client we were using for a 2-3x speedup in network IO for the ad targeting results.
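The one-round-trip trick described above boils down to running the set algebra server-side in Redis (`SUNIONSTORE`/`SINTERSTORE`) rather than shuttling members back and forth. A hedged Python sketch of the idea, with plain sets standing in for Redis keys; the index names and data are invented for illustration:

```python
# Plain Python sets stand in for Redis sets; in production each of
# these would be a Redis key, and the union/intersection would run
# server-side via SUNIONSTORE/SINTERSTORE in a single round-trip.
index = {
    'geo:us':   {'ad1', 'ad2', 'ad3'},
    'geo:uk':   {'ad2', 'ad4'},
    'kw:shoes': {'ad1', 'ad4'},
    'kw:hats':  {'ad3', 'ad4'},
}

def target_ads(any_of_geo, all_of_keywords):
    """Union the geo sets, then intersect with every keyword set."""
    candidates = set().union(*(index[g] for g in any_of_geo))
    for kw in all_of_keywords:
        candidates &= index[kw]
    return candidates

ads = target_ads(['geo:us', 'geo:uk'], ['kw:shoes'])
```

Because all of this happens inside one server call in the real setup, arbitrary targeting parameters still cost only a single EC2 network round-trip, which the quote identifies as the dominant latency.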

Original title and link: Building an Ad Network Ready for Failure (NoSQL database©myNoSQL)