NoSQL event: All content tagged as NoSQL event in NoSQL databases and polyglot persistence

Overview of the Amazon Dynamo Paper

This is becoming a habit of the NOSQL summer meeting in ☞ Tokyo:

They did it before for the Google BigTable paper. If you speak Japanese and happen to be around in Tokyo I’d say you shouldn’t miss such an event.

Note: In case you missed it, you can get all NoSQL papers using this little “hack”

Google BigTable Paper Summarized

The slides below summarizing the Google BigTable paper are the result of a NOSQLSummer meeting in Tokyo. Nice!

Update: I just realized that the company that hosted this meeting, Gemini Mobile Technologies, is the same one that yesterday announced the new key-value store Hibari.

MongoFR Videos and Slides

This week Paris hosted MongoFR, a one-day event focused on the MongoDB ecosystem. Videos and slides from the event are now available ☞ here. Enjoy!

Just for your quick reference, here’s a list of the videos:

  • MongoDB Introduction
  • Ruby et MongoDB dans la pratique
  • Schema design
  • MongoDB for timeline storage at Fotopedia
  • MongoDB Administration
  • Comment passer de MySQL à MongoDB par un cas pratique, Oupsnow
  • MongoDB Indexing and Query Optimizer
  • Enjoy your development with MongoKit
  • MongoDB Sharding
  • One year with MongoDB at Silentale
  • MongoDB Map/Reduce, geospatial indexing, and other cool features ☞ part 1 and ☞ part 2

Berlin Buzzwords Presentations

The organizers of the Berlin Buzzwords NoSQL event have set up a ☞ wiki page with links to all presentations.

My 5 top favorites:

What are yours?

NOSQL Summer - Share the Love

Tim Anglade (@timanglade) came up with the brilliant idea of having a worldwide reading club for distributed systems & NoSQL-related scientific papers. After just one day, the site already lists 14 cities where NOSQL summer will take place (NB: Bucharest is up too, so ☞ go in and express your interest), and it’s up to you to ☞ share the love of NOSQL!

While Tim’s idea is purely fantastic, I’d like to propose a (probably really bad) slogan: Make NoSQL, not love war.


NoSQL Brazil Recap

Last weekend, Sao Paulo, Brazil ☞ hosted the first NoSQL event. While I’d love to be part of all these events and report live from them, this is not really possible right now. But myNoSQL is extending its coverage with the help of the official ambassadors, and here is Gleicon Moraes’ report from no:sql(br).

This is the NoSQL Brazil (nosqlbr) recap. The conference was great: the room was full most of the time, and the technical level was consistent between the presenters and the audience’s questions. Permeating the event was the feeling of using the right tool for the job, and that was great. There seemed to be a consensus that noSQL means “not only SQL”, and everyone was very committed to that idea. It’s a bold statement for Brazil to have such a successful conference about something that is new ground even outside the country. Better yet is knowing that speakers and audience alike were very well informed and experienced.

First there was the Opening, with the organizer - Alexandre Porcelli, lead developer for ☞ OpenSpotLight, which ranged from the event motivation and history (it jumped from a small meeting to a full day event at a hotel room), the motivation behind NoSQL and his involvement with it, from the beginning to OpenSpotLight.

Then I presented my noSQL and SQL anti-patterns talk - which was about my motivation to go noSQL, some SQL anti patterns I found (and was responsible for). The talk went well and I had a great time. The audience was very technical and I think some of my feelings hit home with them.

Luis Fernando Teston from OpenSpotLight brought a video from Salvatore Sanfilippo and gave a workshop on Redis, from compilation to data types and patterns. He discussed key/value datastores and gave some tips on good key names.

Guilherme Silveira, of ☞ Restfulie fame, talked about REST and its meaning in HTTP-enabled noSQL. He explained the hows and whys of being really HTTP compliant, from headers to caching policies. He also showed how he implemented it in a thin layer over CouchDB using Sinatra. His presentation style is energetic, and I think he made it clear why self-discovery and proper HTTP handling are healthy for this kind of database.

Straight from 10gen, Alberto Lerner gave a solid MongoDB introduction and explained how things work at 10gen and in MongoDB’s community. He talked about data management as a mission that must guide the choice of the right technology, and stressed that Brazilian developers must take part in the community: ask questions, send code and take part in the projects they use. It was really gracious of the 10gen folks to send Alberto down here.

After lunch, Rodrigo Strauss presented Tio, the first Brazilian noSQL database. He explained his motivations and the architecture from the ground up. Its core is a pub/sub engine and there are a lot of use cases. It’s based on the proven Boost library. Besides Tio, there are Primo and Tia, and clients written in C++ and Python. He seems to be developing it alone, so any community help is welcome. The URL is ☞

RedHat guys Edgar Silva and Samuel Tauil presented their enterprise products and solutions for virtualization and JBoss based RestEasy.

Vinicius Carvalho from ☞ Sambatech brought a Hadoop showdown based on his experience using it in a proof-of-concept distributed log processing system, which he is using to extract useful info from webserver logs. He explained the big picture on Hadoop and Pig, and the best route for a first contact, which is using a hosted Hadoop service. Along with that he gave some hints on the size of the data he is managing from Sambatech services. That relates to noSQL because one of the ways to keep a constant flow of useful information coming into a database, of any kind, is using Map/Reduce. Also, most of the databases implement some kind of internal Map/Reduce querying system.
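To make the Map/Reduce idea from this talk a bit more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Pig code and not what Sambatech runs; the log layout, the sample lines and the function names (`map_line`, `shuffle`, `reduce_counts`, `count_status_codes`) are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical access-log lines in a Common Log Format-like layout.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jun/2010:10:00:00 -0300] "GET /video/1 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jun/2010:10:00:01 -0300] "GET /video/2 HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Jun/2010:10:00:02 -0300] "GET /video/1 HTTP/1.1" 200 512',
]


def map_line(line):
    """Map step: emit (status_code, 1) for one log line."""
    # The status code is the first field after the quoted request string.
    after_request = line.rsplit('"', 1)[1].split()
    yield after_request[0], 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_status_codes(lines):
    """Run map, shuffle and reduce over all lines in one process."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_status_codes(SAMPLE_LOG))
```

In a real cluster the map and reduce steps would run on many machines and the grouping would be done by the framework; the sketch only shows the shape of the computation.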

Closing the talks, Julio Viegas from Globalcode/SPC (local consumer protection service) showcased a very creative way to introduce noSQL in organizations and a very careful study of different types of database servers. He chose Cassandra for a 3TB/5TB cache system, which runs alongside a big Oracle installation. His setup has 10 servers, and there was a rundown of important configuration tips along with a practical explanation of the write/read constraints.

After that there was an informal chat between presenters and audience about how to introduce these new tools to organizations and how to show them that the community makes non-relational databases more trustworthy.

Even in the breaks there were groups talking about very technical issues and real-life systems that use or could benefit from other data storage and management paradigms. Kudos to Alexandre Porcelli, who in an ultra-short time span organized the conference and made it happen. Thanks to all who donated money and time (there are still some bills to pay), but I wonder how much he could do with proper funding.

Pictures, more info and presentations will be available later this week at ☞

Gleicon Moraes is a developer, hopefully specialized in distributed and large data systems. He has worked at major Brazilian internet providers over the last 14 years and wrote a book about Linux and programming in 2005, after a lot of articles for local magazines and one for Sysadmin magazine. Nowadays he keeps a blog on his tech stuff ☞, projects on github: ☞ His interest in noSQL is having the right tool for the job, which is to handle a lot of data generated by his startup.

nosql:eu - Second day

The 2nd day at nosql:eu is over. It is time to review the great twits from the 1st day, the slides from the 1st day presentations and the great twits below.

For those of us who haven’t made it to the ☞ nosql:eu conference, I’ve extracted below some (hopefully most) of the most interesting twits from the conference. I’ve also posted slides of the presentations as they come in.

nosql:eu quotes

Check also the best twits from 1st day @ nosql:eu

  • kevinweil: Modifying my talk in realtime for #nosqleu. Adding Cassandra, HBase, FlockDB to already existing discussion of Scribe, Hadoop, Pig.
  • emileifrem: Cassandra is simply the best in its category. Check out @spyced’s latest deck: #nosql #nosqleu
  • maslett: RT @natishalom: @maslett with the planned support for memcache - gigaspaces turns memcache to a real NoSQL alternative IMO #nosqleu

    Note: personally I’d find that quite confusing. If Gigaspaces is no longer an elastic cache, then what is it?

  • danharvey: #nosqleu question for today: how do you backup casandra / HBase for user/dev errors? The failure back up is built in.
  • Werner: Arrived at #nosqleu for the first presentation of the day.
  • awhitehouse: @werner We should challenge assumptions that DB partitioning papers make; sometimes smallest possibilities are treated as reality. #nosqleu
  • AndySeaborn: #nosqleu ☞
  • awhitehouse: @werner: We should all read “The 1995 SQL Reunion: People, Project, and Politics” ☞ #nosqleu
  • AndySeaborne: #nosqleu ☞
  • tlossen: “in real systems, there are no corners to cut” — werner vogels about the importance of occam’s razor in systems design #nosqleu
  • hungryblank: RT @tlossen: “in real systems, there are no corners to cut” — werner vogels about the importance of occam’s razor in systems design #nosqleu
  • matwall: @Werner #nosqleu nosql is about choice, not a fight between SQL and new tech.
  • tlossen: “you should all read the multics book” — werner vogels #nosqleu
  • martinbtt: #nosqleu @Werner “on the birth of dynamo”
  • tlossen: “real systems are pretty nasty things” — werner vogels #nosqleu
  • tlossen: “scaling amazon was all about the database, every year scaling out, scaling out ….” — werner vogels #nosqleu
  • tlossen: “scalability, availability, performance, cost-effectiveness are all in the end dominated by data management” — werner vogels #nosqleu
  • martinbtt: “The Amazon homepage is constructed by 200-300 different web services”. #Werner #soa #nosqleu
  • maslett: Amazon CTO @werner: “It all comes down to data management… that’s where the scalability is… that’s where most of the costs are” #nosqleu
  • tlossen: “i HATE eventual consistenty” — werner vogels #nosqleu
  • maslett: .@werner: “What we all want is strongly consistent systems - this eventual consistency stuff is a compromise.” #nosqleu
  • tlossen: “your customers will ALWAYS use your system in a way you did not expect” — werner vogels #nosqleu #dynamo
  • mfiguiere: #nosqleu Werner Vogels: “Customer put something in the shopping cart, they are about to give you money, that should ALWAYS works !”
  • monkchips: “In 2004 we felt we could no longer rely on commercial [relational] systems to operate at Amazon scale”. @werner vogels Amazon CTO, #nosqleu
  • buzzkills: “there were no comercial systems that could support amaon’s scale” [for many of their use cases] @Werner #nosqleu
  • tlossen: “at scale, ALL of this shit happens” — werner vogels on datacenter SNAFUs like flooding from the roof down etc. #nosqleu
  • tlossen: “scaling amazon = upgrading cessna to 747 in mid-flight” — werner vogels #nosqleu
  • tlossen: “object storage is FOREVER” — werner vogels on data outliving software #nosqleu
  • tlossen: “don’t forget, hardward LIES to you!” — werner vogels #nosqleu
  • awhitehouse: @werner: “Economies of scale are mostly about people” (and the knowledge they need to run your system) #nosqleu
  • tlossen: “we really have to dive deep and understand all the problems from top to bottom” - werner vogels on INTELLECTUAL economies of scale #nosqleu
  • beobal: “economies of scale are not just about technologies, it has a lot to do with people” @werner #nosqleu
  • tlossen: “transparency is EVIL” — werner vogels about NFS etc. #nosqleu
  • tlossen: “remember that storage is a very long-lasting relationship” — werner vogels #nosqleu
  • maslett: .@werner: “We shouldn’t all be doing this.” #nosqleu Companies should be focused on their business, not their databases.
  • simonw: Werner Vogels: “S3 is a better key/value store than Dynamo” (due to list/prefix operators) #nosqleu
  • tlossen: “if you keep your system simple, it drives simplicity at the customer side as well” — werner vogels on importance of occam’s razor #nosqleu
  • awhitehouse: “Simplicity needs to happen at the interface” … the API to your system drives the architecture @werner at #nosqleu
  • seanparsons : @Werner’s talk at #nosqleu was illuminating about the focus on managing interaction between systems.
  • CooperDino : WernerVogels at #NoSQLeu: When u do trillions of ops per day even the slightest probability becomes reality
  • martinbtt : Fantastic talk by @Werner at #nosqleu - loads of useful tech nuggets to take away. Great start to the day so far.
  • CooperDino : WernerVogels at #NoSQLeu: Bruce Lindsay & Jim Gray are our heroes, we should all read about their data sys work in the 70s
  • CooperDino : @Werner at #NoSQLeu: Last time Amazon was down was 2004 & it was related to an RDB crashing
  • CooperDino : @Werner at #NoSQLeu: 70% of storage operations in Amazon are key/value
  • CooperDino : WernerVogels at #NoSQLeu: If u have2 jump thru lots of hoops 2use any DB then it prob wrong choice. #JOOB is fresh choice4 #dotNet
  • CooperDino : @Werner at #NoSQLeu: Customers will not look at a DB in isolation, they will always look at where it sits in big picture
  • matwall : Head buzzing from inspiring talk from @werner at #nosqleu
  • monkchips : now @kevinweil (twitter’s analytics lead) presents via skype video… just showed us some very dark twitter offices ;-) #nosqleu
  • awhitehouse : Big hand to @kevinweil for giving his talk from Twitter HQ at 3am local time. #nosqleu
  • matwall : @kevinweil say twitter increase userbase by 300K per day, generate 7Tb of data *per day* #nosqleu
  • buzzkills : Twitter gave up on syslog because it didn’t scale #nosqleu
  • thobe : This is me contributing to the 300GB of twitter data generated while @kevinweil talk about it on #nosqleu
  • tlossen : “you write log lines — scribe does the rest” — kevin weill about logging at scale #nosqleu
  • buzzkills : @buzzkills apparently faceyb wrote scribe, Twitter are big contris #nosqleu (thx to @ianmeyers for correction)
  • matwall : @kevinweil from Twitter describing their Scribe -> Hadoop -> Pig pipeline for data alanysis at #nosqleu Very interesting, I want one.
  • tlossen : “want less java in your life? use pig!” — kevin weill, giving advice on hadoop #nosqleu
  • matwall : @kevinweill on datamining user data: It’s easy to answer questions, it’s hard to ask the right questions. #nosqleu
  • wwwicked : Loving the simplicity of a Pig script versus the equivalent Hadoop/Java code #nosqleu
  • beobal : “value the system that promotes innovation, iteration” @kevinweil #nosqleu
  • monkchips : facebook’s scribe at master - GitHub ☞ a logging system for client performance data, also used by twitter. #nosqleu
  • awhitehouse : @kevinweil: Twitter does most of its data analysis in Pig - scripts can call user-defined functions coded in Java (v. powerful) #nosqleu
  • matwall : Twitter using Apache Mahout coupled with Pig for machine learning when examining user behaviour #nosqleu
  • andrewgarner : Totally sold on Pig #nosqleu
  • wwwicked : A friend of mine said “NoSQL is retarded”. The more I’ve heard over the past 2 days, more more I realise he’s wildly wrong #nosqleu
  • emileifrem : @wwwicked Term is retarded. Notion all RDBMSes will be replaced is retarded. That we’re heading to a polyglot persistence era isnt. #nosqleu
  • monkchips : “we’re trying to move all tweets to Cassandra”. @kevinweil Twitter #nosqleu

    Note: You can read the whole story in the myNoSQL exclusive Cassandra @ Twitter: An interview with Ryan King

  • tlossen : “better eventual consistency than POTENTIAL consistency” — kevin weil on reasons to use cassandra at twitter #nosqleu
  • maslett : Twitter is working with Digg to create real-time analytics for Cassandra. Plans to open source. #nosqleu
  • msk_y : RT @buzzkills: Twitter store their log files in Lzo compressed, protocol buffers format on hdfs #nosqleu
  • kingsleydavies : #nosqleu CouchDB used at BBC - typically used as a KVS and is used in iPlayer and parts of the homepage…
  • tlossen : “you can throw rocks and stones at it, and it just keeps going” — enda farrell (bbc) about robustness of couchdb #nosqleu
  • matwall : @endafarrell CouchDb restarts in < 1sec. Occasionally restart in production as restarts are far less than TCP timeout! #nosqleu
  • tlossen : enda farrell shared a neat idea: “pre-sharding” — running 4 instances of couchdb on every node [couchdb @ bbc talk] #nosqleu
  • matwall : @endafarrell “Having things that just work and are simple from the users perspective is brilliant” #nosqleu
  • CooperDino : #NoSQLeu: BBC web site handles 200m requests per day on 1.5TB of data using 8 servers & #CouchDB
  • monkchips : exciting! presentation at #nosqleu from Comcast chief engineer @jon_moore : Why Big Enterprises Are Interested in NoSQL
  • matwall : Agree with @jon_moore at #nosqleu : storage is a means to a business end, nosql contains intrinsic risk
  • benoitc : idealized api of comcast looks like the #couchdb one get,post, get _views #nosqleu
  • matwall : @jon_moore at #nosqleu Can I add more capacity without adding too many more sysadmins? Can my admins work 9-5?
  • matwall : @jon_moore at #nosqleu Is there a company behind product to provide operational support? Important for commoditization
  • monkchips : surprising requirement of the #nosqleu conference. NoSQL providers take note: Enterprises expect JMX support. java ain’t dead. devops?
  • wwwicked : #nosqleu @jon_moore made a fair point re: my comment about analytics on KV stores; may not be best idea but “they” will want to do it anyway
  • timanglade : Totally awesome break-down of the CAP theorem (in the context of Multiple Datacenters) by the amazing @jon_moore. Refreshingly enlightening.
  • kingsleydavies : loving the name *Tokyo Tyrant* and a great, upbeat start to @makoto_inoue preso… #nosqleu
  • matwall : @makato_inoue Says that @al3xandu’s site myNoSL is like “Hello magazine for nosql” :) #nosqleu
  • matwall : Can we have a 3 hour workshop with @makato_inoue please? He’s great! #nosqleu
  • kingsleydavies : +1 yeah… I fear we wont have enough time :-( RT @matwall: Can we have a 3 hour workshop with @makato_inoue please? He’s great! #nosqleu
  • maslett : great presentation on the highly random world of Tokyo Cabinet/Tyrant by @makoto_inoue #nosqleu
  • maslett : Quote of the day: “myNoSQL is the Hello magazine of NoSQL" #nosqleu
  • michaeltiberg : #nosqleu conference is to an end - attendees seems to be satisfied and that makes my day
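The “pre-sharding” idea quoted above (several CouchDB instances per node) can be read roughly as follows: fix the number of shards up front, route keys to shards by a stable hash, and later move whole instances to new machines without rehashing any keys. The sketch below is only one interpretation of that tweet, not the BBC’s actual setup; the host names and the helper names (`build_shard_map`, `shard_for`, `locate`, `move_shard`) are hypothetical, and 5984 is used as the base port simply because it is CouchDB’s default.

```python
import hashlib

INSTANCES_PER_NODE = 4  # figure taken from the tweet above


def build_shard_map(nodes, instances_per_node=INSTANCES_PER_NODE, base_port=5984):
    """Return a list of (host, port) pairs, one per database instance."""
    return [
        (host, base_port + i)
        for host in nodes
        for i in range(instances_per_node)
    ]


def shard_for(key, shard_count):
    """Pick a shard index from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def locate(key, shard_map):
    """Return the (host, port) of the instance that owns the key."""
    return shard_map[shard_for(key, len(shard_map))]


def move_shard(shard_map, index, new_host):
    """Relocate one instance to another machine.

    The number of shards does not change, so the key-to-shard mapping
    stays exactly the same; only the shard's address is updated.
    """
    _, port = shard_map[index]
    updated = list(shard_map)
    updated[index] = (new_host, port)
    return updated


if __name__ == "__main__":
    shards = build_shard_map(["node-a", "node-b"])  # hypothetical hosts
    print(len(shards), "instances")
    print(locate("programme:1234", shards))
```

The point of the design is that growing from two machines to, say, eight means moving existing instances rather than splitting data, because the shard count was chosen larger than the machine count from the start.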

nosql:eu presentations

Check also the nosql:eu presentations from 1st day

On the Birth of Dynamo - Werner Vogels

Nothing here yet :-(.

Twitter’s use of Cassandra, Pig and HBase - Kevin Weil

Slides from Kevin Weil (@kevinweil) presentation on Twitter’s use of Cassandra, Pig and HBase

CouchDB at the BBC - Enda Farrell

Nothing here yet :(

Why Big Enterprises are Interested in NoSQL - Jon Moore

Slides from Jon Moore (@jon_moore) presentation: Why Big Enterprises are Interested in NoSQL

Memory as the New Disk: Why Redis Rocks - Tim Lossen

Slides from Tim Lossen (@tlossen): Memory as the New Disk: Why Redis Rocks

Tokyo Cabinet, Tokyo Tyrant and Kyoto Cabinet - Makoto Inoue

Nothing here yet :(

Notes from the field: NoSQL tools in Production - Matthew Ford

Slides from Matthew Ford (@matthewcford) Notes from the field: NoSQL tools in Production presentation

nosql:eu - First day

In case you’ve missed the ☞ nosql:eu conference, you can find below the most interesting nosql:eu twits and some of the nosql:eu presentations. And make sure to check nosql:eu second day.

nosql:eu quotes

Check also the best twits from nosql:eu 2nd day

  • emileifrem: Very envious of the crowd that’s just about to roll into the @nosqleu venue. VolcaNoSQL has been rough, but it will still be a kickass show.
  • al3xandru: Make your twits through the ash cloud #nosqleu so those stopped by the #VolcaNoSQL can hear something too
  • nosqleu: All the equipment is set, the remote presentation setup has been tested and attendees are already pouring in. Let’s go — wish us luck!
  • jystewart: wow! @werner managed to get on an iceland-glasgow flight and so will make it to #nosqleu against all the odds
  • wwwicked: used Redis for analysis of leaked BNP membership list - mapping postcodes to constituencies #nosqleu
  • wwwicked: Guardian used a traditional RDBMS for first MPs expenses crowd-sourced review app; “possibly the worst possible implementation” #nosqleu

    Note: we’ve published a post on this topic Redis Usecase: replacing MySQL order by rand()

  • monkchips : Guardian Zeitgeist. “We use Big Table as a dumping ground for data you can sort by 1 or 2 columns when you need to” @simonw @nosqleu #nosql
  • zacksm: “spreadsheets are NoSQL too…” #nosqleu
  • kingsleydavies: use of #AppEngine and #BigTable w/ scatter/gather and shards at the guardian + GOOG sprdshts for rapid prototyping and releases.. #nosqleu
  • monkchips : “there is a form of NoSQL we have been using for years - spreadsheets”. says @simonw. #pragmatism #NoSQL #NoSQLeu
  • klbostee :The Guardian’s web architecture as shown at #nosqleu
  • matshenricson :Total total kick-ass presentation by two guys at ! The future of journalism, from a developer point of view. #nosqleu
  • monkchips : one of the best tech talks i have ever seen. @simonw and @matwal explained exactly how NoSQL supports the Guardian’s *business*. #nosqleu
  • philb0: You need to understand your DB query patterns before choosing a technology #nosqleu

    Note: last time I wrote about it was yesterday in considering data stores post.

  • awhitehouse: Use the right tool for the job » @timanglade at #nosqleu: “Let’s not hack a RDBMS to do a graph, let’s use a real graph database”
  • wwwicked: NoSQL sucks! It’s true… if you use it badly; e.g. don’t try analytics on a key-value store. #nosqleu
  • kingsleydavies: flagrant promotion ;-) #nosqleu 1 stop shop for NoSQL links and info…
  • ianmeyers : #nosqleu Wondering how many people have approached NOSQL from the perspective of NOT being anti-SQL?
  • mfiguiere: #nosqleu Le NoSQL n’est pas une question de volume de données, mais de représentation de données

    (trans): #nosqleu NoSQL is not a matter of volume of data, but data representation

  • wwwicked: “I am terrified of the uber database that is yet to come” - @matwall urges us to think Unix not Windows re: task/use-specific DBs #nosqleu
  • coderholic: Great morning so far at #nosqleu including spreadsheet backed websites from the guardian!
  • kingsleydavies: #nosqleu actually, some of the benefits of a choppy preso transmission, is I heard *enough* to want to chase up independantly
  • kingsleydavies: Key featr’s of #Riak: links, map-reduce, vctr clocks directly impl; dynmo conflict res; distrbtd, auto-scales (on the fly), durable #nosqleu
  • awhitehouse: Distributed key stores would benefit from better management tools (e.g. SNMP-based) - @hobbyist’s team looking at this. #nosqleu
  • benoitc: @mongodb definition by comparing itself to @CouchDB … #nosqleu
  • kingsleydavies: #nosqleu document oriented datastores sounds pretty much like ‘the new’ object DB to me ?
  • benoitc: for those in #nosqleu who have questions about #couchdb dont hesitate to ask me or to @endafarrell
  • matwall: MongoDB looking fantastic. I want one now! How best to pick from couch or mongo? #nosqleu
  • ck1125: wishing i booked a place to #nosqleu
  • buzzkills: Mongodb supports geolocation queries #nosqleu
  • kjlloydie: MongoDB looks really useful. Would love to see performance metrics for large datasets. #nosqleu
  • matwall: Wow. MongoDb features geo-location also. Beautiful query syntax in the python examples. #nosqleu
  • coderholic: The geo features of mongodb look very cool #nosqleu
  • wwwicked: MongoDB has quite a sexy update syntax. Geolocation features are nice, too. #nosqleu
  • buzzkills: Mongodb makes it easy to send deltas to your documents. Increase values, push values into arrays etc #nosqleu

    Note: it’s quite interesting to see that from tons of features, people got excited about geo support which was added only in the last MongoDB 1.4 release

  • wwwicked: Quite a hard sell by the MongoDB guy though. Far more so than the Riak guy #nosqleu
  • buzzkills: Mongodb prez is much more about what it does rather than how it does it, I think I would have prefered the latter. #nosqleu
  • jystewart: liking the look of mongodb’s geo features. not surprised @foursquare are using them #nosqleu
  • PaulDJohnston: #nosqleu I’m getting confused about when to use specific types of database… don’t just say “use mine” but “use mine for *this*”
  • mfiguiere:#nosqleu La plus large base MongoDB installée en production : 12 To sur une seule instance !

    Note: never heard of this MongoDB size before, so I’m wondering if the project is secret!

  • awhitehouse: Jonathan Ellis aka @spyced has just started a company called Riptano based around Cassandra #nosqleu

    Breaking: Riptano - First company focused on Cassandra started by Cassandra project chair, Jonathan Ellis

  • matwall: I wonder how cassandra knows which machines are in each rack? #nosqleu
  • mfiguiere: #nosqleu if your software wakes people up at 4 am to fix it, then you’re probably doing things wrong…
  • monkchips: Cassandra uses JMX? blimey. didn’t expect to hear that acronym today. “like most things in Java its quite clunky”. #cassandra #nosqleu
  • matwall: Feel like we’re watching the battle of the low end data structures at the mo. How does it help my business? #nosqleu
  • buzzkills: Good to hear digg are contributing a vector clock implementation to cassandra for the next version #nosqleu

    Note: Digg announced quite a few more goodies to be added to Cassandra. Some of them have already been included in the Cassandra 0.6.0 release

  • awhitehouse: @monkchips asked “what use cases does Cassandra cover?” - @spyced ref’d to this talk [“RDBMS’s don’t scale”] #nosqleu

    Note: Link to What every developer should know about database scalability presentation.

  • monkchips: one of the most useful insights about “why NoSQL?” so far today. “data is more and more semi-structured” #nosqleu #neo4j
  • wwwicked: Oh! Graph databases are nothing like what I pictured. No pun intended. #nosqleu
  • matwall: Good examples of possible usages of Neo4j explained well. #nosqleu
  • kingsleydavies: #nosqleu watching @thobe present on #neo4j graph DB. can already think og at least 1 use case to trial this.. great to see use cases in pres
  • kevinweil: Looking forward to giving my #nosqleu talk remotely from TwitterHQ at 3am (11am London time) this morning
  • maslett: #nosqleu phrase of the day: choose the best solution/tool/storage model for the job. There might be something in “Not Only SQL” after all
  • NeilRobbins: For me the best talks of the day were the Cassandra & Riak talks, though thanks to the better quality audio the Cassandra talk wins #nosqleu
  • tom_wilkie: #nosqleu good day. Guardian talk the best IMHO
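Vector clocks come up twice in the quotes above (as a Riak feature, and as a Digg contribution planned for Cassandra). For readers who have not met them, here is a minimal sketch of the general technique in plain Python. It is not Riak’s or Cassandra’s implementation; the function names and the use of a plain dict mapping node name to counter are assumptions for illustration.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two clocks as 'equal', 'after', 'before' or 'concurrent'."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # neither version supersedes the other: a conflict


def merge(a, b):
    """Element-wise maximum, used once a conflict has been resolved."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


if __name__ == "__main__":
    v1 = increment({}, "x")   # first write, coordinated by node x
    v2 = increment(v1, "y")   # update of v1 handled by node y
    v3 = increment(v1, "z")   # independent update of v1 handled by node z
    print(compare(v2, v1))    # v2 supersedes v1
    print(compare(v2, v3))    # siblings that need conflict resolution
```

The useful property is the "concurrent" case: the store can tell that two versions were written independently and hand both back to the application instead of silently dropping one.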

nosql:eu presentations

Check also the nosql:eu presentations from 2nd day

NoSql at - Matthew Wall & Simon Willison

Slides from Matthew Wall (@matwall) & Simon Willison (@simonw) on NoSQL usage at

An Overview of NoSQL - Tim Anglade

Slides from Tim Anglade (@timanglade) An Overview of NoSQL

Key-value stores and Riak - Bryan Fink

Nothing here yet :(.

Document-oriented databases and MongoDB - Mathias Stearn

Slides from Mathias Stearn (remote) presentation on document-oriented databases and MongoDB

Column-oriented databases and Cassandra - Jonathan Ellis

Slides from Jonathan Ellis (@spyced) presentation on column-oriented databases and Cassandra:

Graph databases and Neo4j - Tobias Ivarsson

Slides from Tobias Ivarsson (@thobe) presentation on graph databases and Neo4j

NoSQL News & Links 2010-04-02

  1. Just added a video from the NoSQL smackdown at SXSW
  2. nnewton: ☞ MongoDB Cacti Graphs The graphics are definitely looking interesting.
  3. ☞ CouchDBX 0.11.0. Mac goodies from CouchDB.

Recap of NoSQL Live in Boston

While these are not the original recordings from the NoSQL Live in Boston event, they are still the best ☞ we will get.



Mathias Stearn: What’s New in MongoDB 1.4

Sandro Hawke: Toward Standards for NoSQL ☞ pdf

Tim Anglade: Crossroads, Inroads, Pitfalls & Bylaws

Breakout Sessions

Boris Iordanov: HyperGraphDB Breakout Session

Peter Neubauer: Neo4j Breakout Session

Rusty Klophaus: Riak Breakout Session

Alex Feinberg: Project Voldemort Breakout Session ☞ ppt

Lightning talks

Flinn Mueller: Versatile Storage Options with Tokyo Cabinet

Jim Wilson: Full-stack JavaScript ☞ pdf

James Williams: Using MongoDB with Groovy

Reports from NoSQL Live in Boston

In case you haven’t been able to make it to the NoSQL Live in Boston event and you don’t have the patience for the videos to come out, I have found a couple of reports from the event.

From the ☞ End Point’s Blog:

I went in feeling convinced of the desirability of non-relational datastores for specific modeling situations (graphs) and for scalability/availability/volume concerns (Dynamo and BigTable derivatives), while feeling relatively skeptical of “document datastores”. I left feeling basically the same way, though decidedly less skeptical of CouchDB than I previously was.

And then in a ☞ follow-up post:

The simplicity of the pure key/value store (Voldemort and Riak are more like this) brings flexibility in what you represent; having a somewhat more structured data model with which to work (as in Cassandra) can add some complexity to how you design your data, but brings improved flexibility in how you can navigate that data. (my note: very interesting remark)


[…] one might get the impression that Cassandra has the broadest range of interesting deployments, Voldemort has fewer but is still interesting (Linkedin is certainly no slouch), and Riak has nothing to point to outside Basho Technologies’ non-free Enterprise variant.

Last, but not least, judging by what happened in the last couple of weeks, it looks like the myNoSQL post on Cassandra @ Twitter has made quite a few waves:

Of the three projects mentioned, Cassandra clearly has the “momentum” (a highly accurate indicator of future dominance).

Adam Marcus posted ☞ a long blog post that summarizes most of the talks and panels. As you’d expect, the most interesting discussions seem to have happened on the panels: “Scaling with NoSQL” (between memcached, Voldemort, Hypertable, Cassandra, HBase), “Schema design and document-oriented DBs” (CouchDB, MongoDB, Riak), and “Evolution of a Graph Data structure from research to production” (HypergraphDB, Neo4j, W3C RDF).

Some cool things covered on the Scaling with NoSQL panel:

  • what’s life for operations folks?
    • Voldemort: little babysitting
    • Cassandra: the engineering team is the operations
    • Hypertable: easy to deploy, but harder to get HDFS right
    • HBase: config changes require rsyncing configs to all machines, which doesn’t scale well. Twitter’s Ryan King suggests Capistrano
  • use cases/deployments in the wild
  • random bits:
    • HDFS not designed for lots of random reads
    • Hypertable vs. HBase: Judd says C++ makes for a more efficient memory and CPU footprint. (note: this sounds like quite an old argument)
    • Voldemort is persistent key-value store, whereas memcache is not persistent
    • BigTable folks point out that range scans suck in all other systems. Automatic partitioning (at least in Cassandra) needs some love as well
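One common reason range scans get expensive, sketched below, is hash partitioning: adjacent keys land on unrelated shards, so a scan has to ask every shard, whereas an ordered (BigTable-style) layout only touches the shards whose key ranges overlap the query. This is a general illustration in plain Python and not a claim about how any specific system on the panel behaves; the split points and function names are assumptions.

```python
import bisect
import hashlib


def hash_shard(key, shard_count):
    """Hash partitioning: the shard is unrelated to the key's sort order."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def shards_for_range_hashed(start, end, shard_count):
    """With hash partitioning, any shard may hold keys in [start, end)."""
    return set(range(shard_count))


def shards_for_range_ordered(start, end, split_points):
    """With ordered partitioning, only overlapping shards are needed.

    split_points are the sorted boundaries between shards: shard i holds
    keys k with split_points[i-1] <= k < split_points[i], so there are
    len(split_points) + 1 shards in total.
    """
    first = bisect.bisect_right(split_points, start)
    last = bisect.bisect_left(split_points, end)
    return set(range(first, last + 1))


if __name__ == "__main__":
    splits = ["g", "n", "t"]  # hypothetical boundaries, giving 4 shards
    print(shards_for_range_hashed("h", "m", 4))
    print(shards_for_range_ordered("h", "m", splits))
```

The trade-off runs the other way for write hot spots: ordered layouts concentrate sequential keys on one shard, which is part of why automatic partitioning is hard to get right.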

Topics covered on the Schema design and document-oriented DBs panel:

  • indexing
  • foreign keys and relationships
  • schemas/migrations (?)
  • horizontal partitioning (note: it is interesting that neither MongoDB nor CouchDB has anything working out of the box)
  • consistency

I had the chance to watch the Evolution of a Graph Data structure from research to production panel myself. It was very interesting and covered subjects like:

  • query model
  • implementation details
  • support for schemas (for transfer of knowledge inside your team)
  • use cases/live deployments

For a personal, per-project overview of the event, you could check Brian R. Jackson’s ☞ post, covering Cassandra, Memcached, Tokyo Cabinet, Hypertable, HBase.

I hope the videos will get out pretty soon so you’ll have a chance to watch them yourself.