NoSQL events: All content tagged as NoSQL events in NoSQL databases and polyglot persistence

A Hadoop Week in Review

With Hadoop Summit taking place earlier this week, the volume of news and announcements related to the Hadoop ecosystem was impressive, and after a busy week I had a hard time picking out the most interesting bits:

  1. Hortonworks announced Hortonworks Data Platform 1.0 with an interesting approach to high availability.
  2. VMware announced Project Serengeti for virtualization-friendly Hadoop. It’s open source, and VMware is collaborating with all major Hadoop players (Cloudera, Hortonworks, MapR) to make it work.
  3. Some information about pricing for Hadoop support from Cloudera, Hortonworks, and MapR.
  4. Amazon announced support for the MapR Hadoop distribution on Amazon Elastic MapReduce.
  5. The Hive creators, previously at Facebook, announced Qubole: a new on-demand Hadoop service.

If some other news or announcements caught your attention this week, leave a comment or drop me a line.

Original title and link: A Hadoop Week in Review (NoSQL database©myNoSQL)

MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation

Kenneth Falck shares on his blog what he learned at the recent MongoDB event in Stockholm, covering:

  • MongoDB indexing
  • MongoDB replica sets
  • MongoDB sharding and performance

The one bit I want to emphasize before you read his post:

10gen has a shortlist of features they would like to develop soon. Full text search is at the top. Other things included are at least data compression and possibly schema validation as a related feature.

From a developer-friendliness perspective, MongoDB is the most attractive NoSQL database, and these features will make it even more so.

Original title and link: MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation (NoSQL database©myNoSQL)


Schedule Your Agenda for 2012 NoSQL Events

Reminded by Stefan Edlich’s post, I’ve updated the page of NoSQL conferences and events for 2012. There are already 7 NoSQL events scheduled for 2012, and I bet the calendar will get busier later this year.

Original title and link: Schedule Your Agenda for 2012 NoSQL Events (NoSQL database©myNoSQL)

Big and Small Data at Twitter: MySQL CE 2011

A talk about MySQL at Twitter by Jeremy Cole, DBA Lead at Twitter, from MySQL CE 2011:

Roland Bouman had some interesting notes (nb: actually tweets) from the talk:

  • 115 mln tweets a day, 1 bln tweets a week, about 50.000 new accounts / day

  • random server uptime 212d, 127 bln questions (6943/s) rows read: 1.36 mln/s

  • Use MySQL when it works, something else when not - fortunately MySQL often does work

  • MySQL is used by twitter because it’s robust, replication works and it’s easy to use and run

  • MySQL doesn’t work good for graphs, auto_increment, replication lag is a problem

  • MySQL replication improvements like crash safe multi-threaded slave exactly what they need

  • Twitter open sourced snowflake (id generation system) and Gizzard distributed data storage

  • Use soft launches: new code is launched in a disabled state, turn up slowly, back down if needed

  • Gizzard builds in MySQL/InnoDB handles sharding, replication, job scheduling

  • Twitter uses Cassandra too for some projects. high velocity writes, schemaless design

  • Twitter uses Hadoop for analyzing extremely large datasets: 10 to 100 blns rows (http logs)

  • Twitter also uses vertica for analysis, 100M - 10Blns of rows. Runs 100x faster than MySQL

  • MySQL’s happy place: <= 1.5 TB datasets, archive store for larger sets.
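The soft-launch note above (ship new code disabled, turn it up slowly, back it down if needed) can be sketched as a percentage-based gate. This is only a minimal illustration, not Twitter's implementation: the `SoftLaunch` class, the `new_timeline` feature name, and the hash-based bucketing are all assumptions made for the example.

```python
import hashlib


class SoftLaunch:
    """Hypothetical percentage gate for a feature that ships disabled."""

    def __init__(self, name: str, percent: int = 0) -> None:
        self.name = name
        self.percent = 0  # new code starts in a disabled state
        self.set_percent(percent)

    def set_percent(self, percent: int) -> None:
        # Turn the feature up or back down; values are clamped to 0..100.
        self.percent = max(0, min(100, percent))

    def enabled_for(self, user_id: int) -> bool:
        # Hash the feature name and user id into a stable bucket in 0..99,
        # so a given user keeps the same bucket as the percentage changes.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < self.percent


if __name__ == "__main__":
    gate = SoftLaunch("new_timeline")   # launched disabled
    gate.set_percent(5)                 # turn up slowly
    exposed = sum(gate.enabled_for(u) for u in range(10_000))
    print(f"{exposed} of 10000 sample users see the feature")
    gate.set_percent(0)                 # back down if needed
```

Because each user's bucket is fixed, raising the percentage only adds users, and lowering it removes the most recently added ones first.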

Original title and link: Big and Small Data at Twitter: MySQL CE 2011 (NoSQL databases © myNoSQL)

Scaling with Cassandra

Peter Schuller’s Scaling with Apache Cassandra, recorded at Øredev:

I watched only the first couple of minutes, so comments and feedback are welcome.

Original title and link: Scaling with Cassandra (NoSQL databases © myNoSQL)