


nosql event: All content tagged as nosql event in NoSQL databases and polyglot persistence

The Continuing Story of Hadoop: Summarizing the Strata Conference

Jeff Kelly summarizing the Strata conference:

You know a technology is headed to the mainstream when the two “Elite” sponsors of the premier event designed to showcase that technology are Microsoft and EMC. Neither company is known for adopting and promoting emerging open source technologies, to put it mildly. But there they both were at Strata Conference, the event dedicated to open source Big Data approaches like Hadoop and NoSQL, topping the list of event sponsors. They were followed not far behind by fellow IT giants and Strata “Impact” sponsors IBM and Oracle.

Filing this under great events I’m missing while being 10,000 miles away.

Original title and link: The Continuing Story of Hadoop: Summarizing the Strata Conference (NoSQL database©myNoSQL)


Hadoop Summit 2011 in Review

For those of us who weren’t at the Hadoop Summit 2011:

Ryan Rosario

The main takeaway from Hadoop Summit 2010 was Cascalog. I predict the main takeaway from Hadoop Summit 2011 is Spark.

Anant Jhingran

My essential points are that the “birthers” (where hadoop has been born) and “adopters” (where hadoop will be used in enterprises) have a strong intersection today, modulo some extras on both sides…

However, at t = 3 years from now, we can either go separate ways because of different demands… or come together […]

Dave Cahill

[Hadoop] No longer a West Coast early adopter phenomenon. Hadoop isn’t quite mainstream, but almost, not quite at enterprise level purchasing but getting close.

Barton George interviewing Eric Baldeschwieler

A four-minute interview with Eric Baldeschwieler, CEO of Hortonworks, the Yahoo! Hadoop spin-off:


Last but not least, you can read Derrick Harris’ overview post.


Data Scientist Summit Videos

After seeing the excerpt from Jonathan Harris’ talk at Data Scientist Summit, I really wanted to post a link to some of the videos. But they are all behind a registration gateway. Just in case you want to watch them (there are indeed some interesting titles), you’ll find them here.


Turning BigData into Stories

Ryan Rosario summarizing a panel from Data Scientist Summit, featuring Pete Skomoroch (LinkedIn), Sharon Franks Chiarella (Amazon Mechanical Turk), Gil Elbaz (Factual) and Toby Segaran (Google):

you can’t turn data into a story without joining the data with, well, other data.


The Many Faces Of MapReduce - Hadoop and Beyond

The best panel from Structure Big Data 2011. Featuring Amr Awadallah[1], Mike Hoskins[2], Dwight Merriman[3], Todd Papaioannou[4], Ben Werther[5], the DataStax Brisk official announcement, and a cool parallel between Hadoop processing and cooking approaches from Amr. A must see.

Videos from MongoUK Event Thanks to SkillsMatter

10gen continued its MongoDB popularization tour around the world with three events in Europe: London, Paris, and Berlin. SkillsMatter, the organizers of MongoUK, have recorded all the sessions and made them available here.

Here is the list of the talks:

  • Welcome by Eliot Horowitz
  • Nosh Petigara: Building your 1st MongoDB application
  • Richard Kreuter: Mastering the MongoDB shell
  • Meghan Gill: MongoDB community resources
  • Richard Kreuter: Schema design: data as documents
  • Mathias Stearn: MongoDB Internals: Storage Engine
  • Graham Tackley: MongoDB at the Guardian
  • Russell Smith: Geo & Capped collections with MongoDB
  • Richard Kreuter: Indexing and Query Optimizer
  • Geoff Watts: BSON and ZMQ
  • Mathias Stearn: Administration
  • Eliot Horowitz: Open Q&A with Eliot Horowitz
  • Ashok Subramanian & Stephen Rose: Project Phoenix
  • Philipp Krenn: Morphia: MongoDB for Java Developers
  • Eliot Horowitz: Scaling with MongoDB
  • Neil Bartlett: MongoDB as a backing store of Eclipse MF
  • Nosh Petigara: Deployment strategies
  • David Mytton: Monitoring MongoDB
  • Eliot Horowitz: MongoDB Project Roadmap


Does Big Data Need Big Budgets?

If you asked me this question, my initial answer would be: “absolutely”. And I guess I would not be alone. But is that the right answer?

While watching GigaOm’s Structure Big Data event, there were two talks that gave me a different perspective on this question.

The first was the interview with Kevin Krim, the Global Head of Bloomberg Digital, who told the story of adopting, mining, and materializing Big Data inside a corporation that neither believed in it nor allocated large budgets to it. The result: collecting more than a terabyte of data every day from 100 data points for every pageview and running 15 different parallel algorithms to make recommendations that sometimes led to 10x clickthrough rates. The interview is embedded at the end of this post.

The second story, coming from Pete Warden, founder of OpenHeatMap, is even more exciting. Pete used a combination of the right tools deployed on the cloud to mine Facebook data: 500 million pages for $100; that was the cost before being sued by Facebook.

Pete Warden distilled his experience with these tools and has made available a collection of data tools and open APIs, both as an Amazon AMI to be run on the cloud and as a VMware image to run locally. I highly recommend watching Pete’s talk, which I’ve embedded below.

While it depends on which definition of BigData we use, both talks lead to a simple conclusion:

  • you need imagination to get started with Big Data
  • you need to use the right tools for getting good results

Is this going to work at the scale of Twitter, LinkedIn, Facebook, Google? Probably not. But before getting to that size, you need to start somewhere. And both these talks suggest a clear answer to the question “does big data need big budgets?”: not always.

Hadoop and NoSQL Databases at Twitter

Three presentations covering the various NoSQL usages at Twitter:

  1. Kevin Weil talking about data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB on InfoQ

  2. Ryan King’s presentation from last year’s QCon SF NoSQL track on Gizzard, Cassandra, Hadoop, and Redis on InfoQ

  3. Dmitriy Ryaboy on Hadoop from Devoxx 2010:

Judging by the powered by NoSQL page and my records, Twitter seems to be the largest adopter of NoSQL solutions. Here is an updated version of who is using Cassandra and HBase:

  • Twitter: Cassandra, HBase, Hadoop, Scribe, FlockDB, Redis
  • Facebook: Cassandra, HBase, Hadoop, Scribe, Hive
  • Netflix: Amazon SimpleDB, Cassandra
  • Digg: Cassandra
  • SimpleGeo: Cassandra
  • StumbleUpon: HBase, OpenTSDB
  • Yahoo!: Hadoop, HBase, PNUTS
  • Rackspace: Cassandra

And probably many more missing from the list. But that could change if you leave a comment.


Facebook Messages: FOSDEM NoSQL Event

From this year’s FOSDEM, Facebook talking about the technology behind the messaging platform:


Reconstructing Linked Data and Graph Databases

ReadWriteWeb has published a very interesting story about a project presented at last week’s Strata conference that aims to reconstruct linked data from public data sources like Flickr and OpenStreetMap using a somewhat classical “fuzzy matching” approach.

build a detailed database of information about places in Afghanistan, using only public sources on the Web. The goal is to describe in detail the towns and cities including everything from names, locations and populations, as well as lists and coordinates for schools, mosques, banks and hotels.

My gut feeling is that mixing in a graph database would not necessarily make this problem easier to address, but it would bring in a different angle to tackle it. Fuzzy matching is a search-based approach with an inductive flavor, while using a graph database could bring in a deductive approach.
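To make the “fuzzy matching” part a bit more concrete, here is a minimal sketch in Python of matching place names coming from two sources. This is not the project’s actual method, which the story doesn’t detail; the place names, the `similarity` and `best_match` helpers, and the 0.7 threshold are all assumptions made up for illustration. It relies only on `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or None if below threshold."""
    scored = [(c, similarity(name, c)) for c in candidates]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


# Hypothetical spellings of the same places as two public sources might list them.
flickr_tags = ["Kandahar", "Mazar-i-Sharif", "Jalalabad"]
osm_names = ["Qandahar", "Mazar-e Sharif", "Herat"]

for name in osm_names:
    # Names with no sufficiently close counterpart come back as None.
    print(name, "->", best_match(name, flickr_tags, threshold=0.7))
```

The threshold is the inductive knob: lower it and you link more records but accept more false matches. A graph database would instead let you confirm a candidate match through explicit relationships (same province, same coordinates), which is the deductive angle mentioned above.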


Happy New Year SQLers and NoSQLers

Just want to wish all readers and friends I’ve made over here a great and exciting 2011. Happy New Year!

Now let’s get the party started!


To SQL or not to SQL Panel at CODEBITS IV

A panel discussion on NoSQL, NoSQL databases, and relational databases, featuring Salvatore Sanfilippo[1], Lenz Grimmer[2], Filipe David Borba Manana[3], and a fourth person from SAPO whose name I couldn’t spell:

  1. Salvatore Sanfilippo: creator and main developer of Redis
  2. Lenz Grimmer: MySQL community relations team
  3. Filipe David Borba Manana: CouchDB committer
