


sponsor: All content tagged as sponsor in NoSQL databases and polyglot persistence


My thanks to Actian/Pervasive for sponsoring this week of myNoSQL to promote Actian RushLoader, their “pull data from pretty much anywhere and load it into Hadoop” tool.

It looks like Actian wants to play an important role in the Big Data market: they have recently announced the acquisition of the Amazon-funded ParAccel, whose main product powers the Amazon Redshift data warehouse service.

Original title and link: Actian/Pervasive (NoSQL database©myNoSQL)


3 Steps for a Fast Relational Database to Hadoop Data Load [Sponsor]

Words from this week’s sponsor, Pervasive/Actian:

So, you want to pull a buttload (That’s a technical term.) of data out of a relational database and slam it into HDFS or HBase for processing. Well, maybe you’ve got a nice, powerful Hadoop cluster, but that old school database isn’t designed for parallel data exports. How do you get the data moved into Hadoop before you’re eligible for retirement?

Here’s how:

  1. Use the new Actian RushLoader. It’s a nice, simple, free tool that allows you to pull data from any database that has a JDBC driver, as well as log files, delimited files, HBase and ARFF files. RushLoader functions on any operating system with a JVM and with any file system, including Amazon S3, UNIX and HDFS.

    The nice thing about RushLoader is that on the surface, it’s a quick and easy, point-and-click workflow tool, a cut-down version of the KNIME open source data mining platform. Under the covers, it uses the DataRush engine, which divides and optimizes workloads at runtime, so it takes full advantage of as much parallel hardware power as you give it, without you having to do any coding work to make it happen.

  2. Configure the data query in the RushLoader database reader like this (t = a table name, c = a column name):

     Select * from t where c = ?

  3. Set up a parameter query for ? like this:

     Select distinct c from t

These three steps will give you all the distinct values in the column, and send a separate query for each value to the database. Having a separate query per value allows the DataRush engine to automatically spread the work across the available machines and threads, giving you a high speed parallel data pull. There’s more info on parameter queries in the DataRush docs, and the new Actian big data community provides a DataRush toolset discussion forum if you run into trouble.
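To make the idea concrete, here is a minimal sketch of the same pattern outside RushLoader: run the parameter query once, then issue one data query per distinct value in parallel. It is not RushLoader or DataRush code. It assumes Python's built-in sqlite3 as a stand-in for the JDBC source, and the table t, column c, and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(path):
    """Create a tiny stand-in for the source RDBMS (hypothetical table t, column c)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c TEXT, payload TEXT)")
    rows = [(i, "region-%d" % (i % 4), "row-%d" % i) for i in range(20)]
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def fetch_partition(path, value):
    """Run the data query for one parameter value on its own connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT * FROM t WHERE c = ?", (value,)).fetchall()
    finally:
        conn.close()


def parallel_pull(path, workers=4):
    """Run the parameter query, then one data query per distinct value in parallel."""
    conn = sqlite3.connect(path)
    values = [r[0] for r in conn.execute("SELECT DISTINCT c FROM t")]
    conn.close()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda v: fetch_partition(path, v), values)
        return [row for part in parts for row in part]


def run_demo():
    """Build the demo database in a temporary directory and pull every row from it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path)
        return parallel_pull(path)
```

Calling run_demo() returns all rows of the demo table, gathered partition by partition. Two things to keep in mind with this pattern: a column with only a few distinct values yields only a few parallel queries, and rows where c is NULL are not matched by `c = ?`, so they would need a separate query.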

The free RushLoader includes simple row and column filtering. If you want to get any more sophisticated about the load (add data quality checks, do aggregations, sorting, source joins, lookups, that sort of thing), you have to move up to the commercial version, RushAnalytics. If all you need is a lot of data pulled from an RDBMS and slammed into Hadoop, RushLoader can do the job faster by far than anything else on the market.

Original title and link: 3 Steps for a Fast Relational Database to Hadoop Data Load [Sponsor] (NoSQL database©myNoSQL)

NoSQL Search Roadshow [Sponsor]

This week’s sponsor doesn’t have a specific message. But I do have one for them.

The people behind these roadshow events are the fine folks from Trifork. They’ve been organizing JAOO, nowadays GOTO, for quite a while. They’ve also been part of the QCon conferences. If you’ve ever been to any of these events, you’ll know immediately what I mean. I haven’t yet been to a NoSQL roadshow, but besides Berlin, Copenhagen, Zurich and Amsterdam, I’ve heard they’ll pass by San Francisco too. Most probably I’ll be there.

While the conference roster changes from event to event, I’m pretty sure you’ll get some of the best. Looking at Berlin, I can see Michael Hunger, Chris Molozian and Pavlo Baron.

The next event is in Berlin on April 16th. You need to hurry up for a dose of NoSQL, German cars and beers [1].

  [1] If you know me and you really, really want to go to the event, drop me a line and I might be able to do something for you.

Original title and link: NoSQL Search Roadshow [Sponsor] (NoSQL database©myNoSQL)


For the third time, my thanks to Aerospike for sponsoring myNoSQL to promote the key-value store of the same name, which specializes in performance.

The case studies Aerospike has published during the last couple of months focus on scenarios that require speed. If you take a look at their customers, you’ll notice quite a few coming from the ad serving business. The scenario for these is: have many precomputed values and then serve them as fast as possible. It’s very simplified, but that’s basically it.

Original title and link: Aerospike (NoSQL database©myNoSQL)


A new benchmark study evaluates Aerospike, Cassandra, Couchbase and MongoDB

Words from this week’s sponsor, Aerospike:

A new benchmark study evaluates Aerospike, Cassandra, Couchbase and MongoDB and examines the benefits of using a NoSQL database with the ability to process transactions in the face of hardware or other node failures.

Original title and link: A new benchmark study evaluates Aerospike, Cassandra, Couchbase and MongoDB (NoSQL database©myNoSQL)


My thanks again to Aerospike for sponsoring myNoSQL for a second week to promote their super-fast key-value database.

Performance is not a feature in itself. But if you think about it, there are so many scenarios that require a super-fast solution. Think of memcached for a second. Having a tool around whose main goal is to be super-fast is a good thing, and Aerospike seems to be the one willing to address this need.

Original title and link: Aerospike (NoSQL database©myNoSQL)



My thanks to Aerospike for sponsoring the last week to promote their in-memory or flash-optimized key-value database.

Since rebranding to Aerospike, the team there has been talking a lot about speed. Lately they’ve published a couple of case studies showcasing Aerospike’s speed. Thumbtack Technology also published the results of a YCSB benchmark comparing Aerospike with Cassandra, Couchbase and MongoDB. All I can say is that the results are in their favor.

Original title and link: Aerospike (NoSQL database©myNoSQL)


YCSB Benchmark Shows Aerospike Nearly 10x Faster Than the Competition [Sponsor]

Words from this week’s sponsor, Aerospike:

Thumbtack Technology’s YCSB Benchmark shows Aerospike nearly 10x faster than Cassandra, Couchbase and MongoDB for consumer-facing applications that require extremely high throughput and low latency, and whose information can be represented using a Key-Value schema. Read it now!

Original title and link: YCSB Benchmark Shows Aerospike Nearly 10x Faster Than the Competition [Sponsor] (NoSQL database©myNoSQL)


My thanks to Instaclustr for sponsoring the last week to promote their AWS-hosted, managed, low-cost Apache Cassandra hosting service.

Most of the time, managing services and databases is not our main competency. Plus, with the frequent updates of NoSQL databases, staying up to date is a challenge. Why not delegate these tasks to specialized services and move the responsibility into their yard? It would buy us the time and resources to work on our applications and also to train ourselves in managing these services.

Give Instaclustr a try and let me know how it worked for you.

Original title and link: Instaclustr (NoSQL database©myNoSQL)


Instaclustr - Cost Effective, High Performance Managed NoSQL Hosting [Sponsor]

Words from this week’s sponsor, Instaclustr:

On the 27th of February, Instaclustr, one of the first dedicated Apache Cassandra hosting platforms, left beta. Running on Amazon EC2 infrastructure, Instaclustr dramatically reduces the deployment and management pains associated with running a Cassandra cluster.

Here’s what you’d get with Instaclustr:

  1. Totally managed: Instaclustr reduces the headaches associated with deploying and running a highly available Cassandra cluster. Deploy Cassandra in minutes, knowing that backups, monitoring, maintenance and tuning are all taken care of.

  2. Fast: Cassandra clusters managed by Instaclustr will provide consistently lower latency operations, with greater throughput per dollar than DynamoDB, MongoDB and other managed NoSQL offerings.

  3. Highly Available: Instaclustr deploys Cassandra on Amazon infrastructure, leveraging geographically distinct availability zones and on-demand instances to ensure your cluster is always available.

  4. Low Cost: Instaclustr has an incredibly low total cost of ownership when compared to other managed NoSQL offerings and includes email support and proactive monitoring.

For more details check how Instaclustr works and sign up for an account.

Original title and link: Instaclustr - Cost Effective, High Performance Managed NoSQL Hosting [Sponsor] (NoSQL database©myNoSQL)

Pivotal HD

My thanks to EMC Greenplum for sponsoring the last 2 weeks to promote Pivotal HD, an enterprise-hardened, revamped version of Hadoop that can run SQL queries using a new parallel query engine named HAWQ.

Since the announcement two weeks ago, Pivotal HD has gotten a lot of coverage in the media. I would encourage you to take a look at the HAWQ whitepaper (PDF) or watch the webcast featuring Harper Reed, the former CTO of the Obama 2012 campaign.

If your areas of interest include Hadoop (and they do, since you are reading this blog), learning about Pivotal HD will prove useful. And it will help me show that myNoSQL’s readers are the smartest, most educated and informed data people.

Original title and link: Pivotal HD (NoSQL database©myNoSQL)


Introducing Pivotal HD - the World’s Most Powerful Apache Hadoop Distribution [Sponsor]

Words from this week’s sponsor:

On Monday, February 25, Greenplum, a Division of EMC, introduced Pivotal HD: the world’s most powerful Hadoop distribution. Greenplum has spent the last two years building a new Hadoop platform that will leave the traditional database behind. Pivotal HD can store the massive amounts of information Hadoop was created to store, but it’s designed to ask questions of this data significantly faster than you can with the existing open source platform.

Greenplum is revamping Hadoop to operate more like a relational database, letting you rapidly ask questions of data using SQL, which has been a staple of the database world for decades. A team led by former Microsoft database designer Florian Waas has designed a new “query engine”, HAWQ, that can more quickly run SQL queries on data stored across a massive cluster of systems using the Hadoop File System. Compared to batch-oriented queries running against a Hadoop cluster, the combination of HDFS and HAWQ shows anywhere from 10x to 600x performance improvement. Here’s a link to a whitepaper about HAWQ, Pivotal HD’s parallel SQL engine for Hadoop (pdf).

Check out the webcast on Pivotal HD by Harper Reed, former CTO of the Obama 2012 campaign, here, and request your early access to it.

Original title and link: Introducing Pivotal HD - the World’s Most Powerful Apache Hadoop Distribution [Sponsor] (NoSQL database©myNoSQL)