presentation: All content tagged as presentation in NoSQL databases and polyglot persistence
Thursday, 13 May 2010
Video: Will Leinweber: Relaxing with CouchDB
If we never have enough intro presentations to MongoDB, why would we have enough CouchDB videos?
Embedded below is a video of Will Leinweber presenting Relaxing with CouchDB (38 minutes)
Update: Below you can see the slides from Will Leinweber’s presentation at Red Dirt Ruby Conference, CouchDB, Ruby and You
Tuesday, 13 April 2010
Presentation: Gary Dusbabek (Rackspace) on Cassandra
A presentation about Cassandra given by Rackspace’s Gary Dusbabek (@gdusbabek):
My notes:
What problems does it solve?
- Reliability at scale
- No single point of failure (all nodes are identical)
- Simple scaling
- linear
- High write throughput
- Large data sets
What problems can’t it solve?
- No flexible indices
- No querying on non PK values
- Not good for binary data (>64 MB) unless you chunk
- Row contents must fit in available memory
Concepts: CAP
- Cassandra chooses A and P but allows them to be tunable to have more C
Data Model
- Keyspace contains column families
- ColumnFamily:
- Standard or Super
- Two levels of indexes (key and column names)


Data Model
- Column and subcolumn sorting
- Specify your own comparator:
- TimeUUID
- Lexical UUID
- UTF8
- Bytes
- CreateYourOwn
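A minimal sketch of why the choice of comparator matters: the same two column names sort differently under a plain string order and under a time-based order. This is not Cassandra’s code. The function names are made up for illustration, plain string comparison merely stands in for a lexical comparator, and the timestamp extraction assumes standard version 1 UUIDs (time_low, time_mid, time_hi_and_version fields).

```javascript
// Extract the 60-bit timestamp from a version 1 UUID string.
// Layout assumed: time_low-time_mid-time_hi_and_version-clock_seq-node.
function timeOfV1Uuid(uuid) {
  const [timeLow, timeMid, timeHiAndVersion] = uuid.split("-");
  const timeHi = BigInt("0x" + timeHiAndVersion) & 0x0fffn; // drop the version nibble
  return (timeHi << 48n) | (BigInt("0x" + timeMid) << 32n) | BigInt("0x" + timeLow);
}

// Hypothetical comparators: plain string order vs. embedded-timestamp order.
const byLexical = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
const byTime = (a, b) => {
  const ta = timeOfV1Uuid(a);
  const tb = timeOfV1Uuid(b);
  return ta < tb ? -1 : ta > tb ? 1 : 0;
};

// Two hand-built UUIDs: `earlier` has the smaller timestamp but the larger string.
const earlier = "ffffffff-0000-1000-8000-000000000001";
const later = "00000000-0001-1000-8000-000000000001";

console.log([earlier, later].sort(byLexical)[0] === later);  // true
console.log([earlier, later].sort(byTime)[0] === earlier);   // true
```

If column names are time-based UUIDs and you want columns back in chronological order, the string order is the wrong one to pick.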
Inserting: Writes
- Commit log for durability
- Memtable - no disk access (no reads or seeks)
- Sstables are final (become read only)
- Index
- Bloom filter
- Raw data
- Atomic within a ColumnFamily
- Bottom line: FAST!!
Note: make sure to check the slide for a nice visual description of the Cassandra write operation. You should also check Cassandra Write operation performance explained for more details.
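The write path in the notes above can be sketched as a toy in-memory model: append to a commit log, update a memtable, and flush the memtable to a read-only sstable once it grows past a threshold. Everything here (class name, threshold, data layout) is an assumption made for illustration and not how Cassandra is implemented.

```javascript
// Toy write path; all names are hypothetical.
class ToyStore {
  constructor(flushThreshold) {
    this.flushThreshold = flushThreshold;
    this.commitLog = [];        // append-only, stands in for the on-disk commit log
    this.memtable = new Map();  // in memory: a write needs no disk reads or seeks
    this.sstables = [];         // flushed tables, frozen so they are read only
  }

  write(key, value) {
    this.commitLog.push({ key, value }); // durability first
    this.memtable.set(key, value);
    if (this.memtable.size >= this.flushThreshold) this.flush();
  }

  flush() {
    // Sort rows by key, freeze them, and start a fresh memtable.
    const rows = [...this.memtable.entries()]
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, value]) => Object.freeze({ key, value }));
    this.sstables.push(Object.freeze(rows));
    this.memtable = new Map();
  }
}

const store = new ToyStore(2);
store.write("b", 1);
store.write("a", 2); // second key reaches the threshold and triggers a flush
store.write("c", 3);
console.log(store.sstables.length, store.memtable.size); // 1 1
```

The point of the sketch is that a write never has to read existing data: it only appends and updates memory.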
Querying: Overview
- But secondary indices are being worked on (see ☞ CASSANDRA-749)
Querying: Reads
- Not as fast as writes
- Read repair when out of sync
- New in 0.6:
- Row cache (avoid sstable lookup)
- Key cache (avoid index scan)
Note: make sure you check the slide for a visual description of the Cassandra read operation. You can also read Cassandra Reads performance explained for more details.
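The write notes list a Bloom filter as one part of an sstable. Its usual job on the read side is to answer "definitely not here" cheaply, so a read can skip a table without touching it. Below is a toy version, assuming a simple FNV-1a style hash chosen only for brevity. The class name and sizes are made up and this is not Cassandra’s implementation.

```javascript
// Toy Bloom filter: may report false positives, never false negatives.
class ToyBloomFilter {
  constructor(bits, hashes) {
    this.bits = new Uint8Array(bits); // one byte per bit, for readability
    this.hashes = hashes;
  }

  // FNV-1a style hash, varied by seed to simulate several hash functions.
  position(key, seed) {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h % this.bits.length;
  }

  add(key) {
    for (let s = 0; s < this.hashes; s++) this.bits[this.position(key, s)] = 1;
  }

  mightContain(key) {
    for (let s = 0; s < this.hashes; s++) {
      if (this.bits[this.position(key, s)] === 0) return false; // definitely absent
    }
    return true; // present, or a false positive
  }
}

const filter = new ToyBloomFilter(64, 3);
["row1", "row2"].forEach((key) => filter.add(key));
console.log(filter.mightContain("row1")); // true: added keys are always reported
```

A key that was never added usually comes back false, but it can occasionally come back true, which only costs an unnecessary lookup.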
Future Direction
- Range delete (delete these cols from those keys)
- Vector clocks (including server-side conflict resolution)
- Altering keyspace/column family definitions on a live cluster
- Byte[] keys
- Compression
- Multi-tenant support
- Fewer memory restrictions
Thursday, 8 April 2010
Presentation: Introducing Riak
This is the longest NoSQL presentation I’ve ever posted here: 209 slides! If you’re planning to beat Kevin Smith’s (@kevsmith) record please do let me know in advance so I can reserve enough time to go through it.
My notes below:
What is Riak?
- A flexible storage engine…
- … with a REST API …
- … and map/reduce capability …
- … designed to be fault-tolerant …
- … distributed …
- … and ops friendly
The Riak Way for CAP
- Pick Two
- For each operation
Riak Improvements on Amazon Dynamo N, R, W[1]
- N can vary per bucket
- R and W can vary per operation
- Choose your own fault tolerance/performance tradeoff
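A small sketch of the arithmetic behind the N, R, W tradeoff, using the definitions from the footnote: with N replicas, a read that contacts R of them and a write that was acknowledged by W of them are guaranteed to share at least one replica when R + W > N. The function name is made up for illustration.

```javascript
// True when every read set of size r must overlap every write set of size w
// among n replicas (pigeonhole: r + w > n).
function quorumsOverlap(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("expected 1 <= r <= n and 1 <= w <= n");
  }
  return r + w > n;
}

console.log(quorumsOverlap(3, 2, 2)); // true: reads see the latest acknowledged write
console.log(quorumsOverlap(3, 1, 1)); // false: faster, but a read may miss the write
```

Because R and W can vary per operation, the same bucket can serve both kinds of request.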
Conflict resolution: Client Resolution[2]
- Can be set per-bucket or server-wide
- Conflicting data is “bubbled up” to the client
- Client picks the winner
Conflict resolution: Server Resolution
- “Last write wins”
- Enabled by default
- What most apps need 80% of the time
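The two resolution styles above can be sketched side by side. The sibling shape (`value`, `writtenAt`) and the cart-merging rule are invented for this example. Riak’s actual client APIs and metadata look different.

```javascript
// Conflicting versions ("siblings") of one object; field names are hypothetical.
const siblings = [
  { value: { cart: ["book"] }, writtenAt: 100 },
  { value: { cart: ["pen"] }, writtenAt: 200 },
];

// Server-style resolution: keep only the most recent write.
function lastWriteWins(versions) {
  return versions.reduce((latest, v) => (v.writtenAt > latest.writtenAt ? v : latest)).value;
}

// Client-style resolution: the application sees every sibling and decides.
// Here the (made-up) rule is to keep the union of all cart items.
function mergeCarts(versions) {
  const items = new Set(versions.flatMap((v) => v.value.cart));
  return { cart: [...items].sort() };
}

console.log(JSON.stringify(lastWriteWins(siblings))); // {"cart":["pen"]}
console.log(JSON.stringify(mergeCarts(siblings)));    // {"cart":["book","pen"]}
```

Last write wins silently drops the book. Whether that matters is exactly the per-application call the slides describe.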
The presentation also covers:
- Linking objects (slide 78)
- Map/Reduce (slide 99)
References
- [1] N= number of replicas, R=number of replicas needed for a successful read, W=number of replicas needed for a successful write. (↩)
- [2] Jeff Darcy has an interesting article on ☞ conflict resolution (↩)
Tuesday, 6 April 2010
Hadoop User Group March Meeting Recap
The meeting hosted lots of discussions and 3 presentations:
Owen O’Malley: Upcoming Hadoop Security release
Owen O’Malley from the Yahoo! Hadoop Team provided an overview of the upcoming Hadoop Security release. Owen described the features and capabilities included as well as operational benefits. Yahoo! is very excited about adding security capabilities to Hadoop and views this as a major milestone in continuing to make Hadoop an enterprise-grade platform.
Tyson Condie: Hadoop Online
Tyson Condie, a Ph.D. student at the University of California, Berkeley, presented the innovative research around the Hadoop Online effort led by Prof. Joseph M. Hellerstein. Tyson described a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, can reduce completion times and improve system utilization. Tyson included examples from the HOP - Hadoop Online Prototype project.
Bradford Cross: Flightcaster
Bradford Cross from FlightCaster provided an exciting overview of the FlightCaster flight delay prediction service and some cool insights into the airline industry. Bradford described how they built a scalable machine learning and data analysis platform using the Clojure dynamic programming language wrapping Cascading and Hadoop. Bradford demonstrated how the use of Hadoop makes building scalable systems much simpler.
Friday, 26 March 2010
Presentation: Tokyo Cabinet / Tyrant @ Nosql Paris
Embedded below are Florent Solt’s (@florentsolt) Tokyo Cabinet / Tyrant slides, presented at Nosql Paris.
Florent seems to be working at Netvibes, and his slides briefly present how Tokyo Cabinet is set up and used there.

I also liked the Tokyo Cabinet / Tyrant strengths and weaknesses slides:
Tokyo Cabinet / Tyrant Weaknesses
- No bug tracker, no public code repository
Note: not so long ago, I posted about these concerns in the Tokyo Cabinet community
- The documentation is not good enough
Note: I might have some good news here. Stay tuned!
- Under heavy load, master-master replication can fail
- Databases can be corrupted
Note: you might find Tokyo Cabinet database recovery useful for such unwanted situations
- With big tables, queries need a lot of RAM and time
- Tables seem slow & their configuration is not so clear
Note: you can learn more about the different Tokyo Cabinet database types and their configuration options
- No live backup, the copy function locks the database
Tokyo Cabinet / Tyrant Strengths
- Easy to deploy and setup
- Easy to use
- It’s not a black box
- Good to very good performance most of the time
- Small memory footprint
- A single Tokyo Tyrant process can handle thousands of connections
- Many command line tools
- Lua extensions
I’d definitely be interested to hear much more about how Netvibes is using Tokyo Cabinet / Tyrant, so ping me if you are ready to share more with the Tokyo Cabinet community.
Tuesday, 23 March 2010
Presentation: NoSQL databases by Harry Kauhanen
Quite a few interesting slides in Harry Kauhanen’s presentation:
Slide 5: Key-value stores
- The value is a binary object aka “blob” — the DB does not understand it and does not want to understand it
Slide 7: Document databases
- Key-value store, but the value is (usually) structured and “understood” by the DB
- Querying data is possible (by other means than just a key)
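The difference between slides 5 and 7 can be shown with two in-memory maps. This is only an illustration: the store layout, keys, and the `findByField` helper are made up, and no real database is involved.

```javascript
// Key-value store: the value is an opaque blob, so lookup by key is all you get.
const kv = new Map();
kv.set("user:1", Buffer.from(JSON.stringify({ name: "Ann", city: "Paris" })));

// Document store: the value is structured, so the store can filter on its fields.
const docs = new Map();
docs.set("user:1", { name: "Ann", city: "Paris" });
docs.set("user:2", { name: "Bob", city: "Oslo" });

// Hypothetical query helper: return the keys of documents whose field matches.
function findByField(store, field, expected) {
  return [...store.entries()]
    .filter(([, doc]) => doc[field] === expected)
    .map(([key]) => key);
}

console.log(findByField(docs, "city", "Oslo").join(",")); // user:2
```

Running the same helper against `kv` would find nothing, because a blob has no `city` field the store can see.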
Slide 10: Wide column stores
- “a sparse, distributed multi-dimensional sorted map”
Slide 12: Graph databases
- “Relational database is a collection of loosely connected tables” whereas “Graph database is a multi-relational graph”
Slide 14:
- Relationships in RDBMS are “weak”
- Relationships in graph databases are first class citizens
Slide 23: Why NoSQL?
- Schema-free
- Massive data stores
- Scalability
- Some services simpler to implement than using RDBMS
- Great fit for many “Web 2.0” services
Slide 24: Why NOT NoSQL?
- RDBMS and tools are mature
- NoSQL implementations often “alpha”
- Data consistency, transactions
- “Don’t scale until you need it”
Monday, 22 March 2010
Presentation: Mathematics of Batch Processing
Do you remember the article on applying Amdahl’s Law to Hadoop Provisioning? Now you also have it in the form of a set of slides.
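As a quick reminder of the law itself: if a fraction p of a job can be parallelized across n workers, the speedup is 1 / ((1 - p) + p / n). A minimal sketch, with a made-up function name and example numbers that are not taken from the slides:

```javascript
// Amdahl's Law: speedup of a job whose parallelizable fraction is
// `parallelFraction` when run on `workers` workers.
function amdahlSpeedup(parallelFraction, workers) {
  if (parallelFraction < 0 || parallelFraction > 1) {
    throw new RangeError("parallelFraction must be between 0 and 1");
  }
  if (workers < 1) throw new RangeError("workers must be at least 1");
  return 1 / ((1 - parallelFraction) + parallelFraction / workers);
}

console.log(amdahlSpeedup(0.9, 10).toFixed(2));   // 5.26
console.log(amdahlSpeedup(0.9, 1000).toFixed(2)); // 9.91
```

With 10% of the work serial, the speedup can never exceed 10 no matter how many nodes are added.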
Thursday, 18 March 2010
Learn MongoDB in 104… slides
You can pretty much say that you know a lot about MongoDB if you go through Kyle Banker’s (@hwaet) slides below:
But before saying that you know everything you need, I’d strongly encourage you to review the following notes from running MongoDB in production.
Wednesday, 17 March 2010
Presentation: Redis Overview
In light of the news about Redis, more people will start looking at it, so here is another slide deck from Ryan Findley. Once you are done with the slides you should probably check this other awesome Redis presentation and take a look at the great list of Redis use cases.
Monday, 8 March 2010
Presentation: Overview of HBase at Meetup
Slides for the Overview of HBase at Meetup presentation.
My notes:
- the options slide:

- “scaling is built in, but extra indexing is DIY”. We had a post on this subject: HBase secondary indexes
- open source library for Java beans mapping to HBase tables ☞ meetup.beeno
Friday, 5 March 2010
Presentation: Intro to MongoDB by Alex Sharp
We never get enough introductions to NoSQL systems. Embedded below are the slides from Alex Sharp’s (@ajsharp) Intro to MongoDB presentation. For a quick overview, you can also find the text-only version below.
Text-only version of Intro to MongoDB
-
Slide: 1
Intro to MongoDB
Alex Sharp
twitter: @ajsharp
-
Slide: 2
So what is MongoDB?
-
Slide: 3
First and foremost…
-
Slide: 4
IT’S THE NEW HOTNESS!!!
-
Slide: 5
omgomgomg
SHINY OBJECTS
omgomgomg
-
Slide: 6
MongoDB (from “humongous”) is a scalable, high-performance, open source, schema-free, document-oriented database.
- mongodb.org
-
Slide: 7
Philosophy
-
Slide: 8
Philosophy
“One size fits all” approach no longer applies
-
Slide: 9
Philosophy
Non-relational DBs scale more easily, especially horizontally
-
Slide: 10
Philosophy
Focus on speed, performance, flexibility and scalability
-
Slide: 11
Philosophy
Not concerned with transactional stuff and relational semantics
-
Slide: 12
Philosophy
DBs should be an on-demand commodity, in a cloud-like fashion
-
Slide: 13
Philosophy
Mongo tries to achieve the performance of traditional key-value stores while maintaining functionality of traditional RDBMS
-
Slide: 14
Features
-
Slide: 15
Features
Standard database stuff
-
Slide: 16
Features
Standard database stuff
Indexing
-
Slide: 17
Features
Standard database stuff
Indexing
replication/failover support
-
Slide: 18
Features: Document Storage
Documents are stored in BSON (binary JSON)
-
Slide: 19
Features: Document Storage
BSON is a binary serialization of JSON-like objects
-
Slide: 20
Features: Document Storage
This is extremely powerful, b/c it means mongo understands JSON natively
-
Slide: 21
Features: Document Storage
Any valid JSON can be easily imported and queried
-
Slide: 22
Features
Schema-less; very flexible
-
Slide: 23
Features
Schema-less; very flexible
no more blocking ALTER TABLE
-
Slide: 24
Features
Auto-sharding (alpha)
-
Slide: 25
Features
Makes for easy horizontal scaling
-
Slide: 26
Features
Map/Reduce
-
Slide: 27
Features
Very, very fast
-
Slide: 28
Features
Super easy to install
-
Slide: 29
Features
Strong with major languages
-
Slide: 30
Features
Document-oriented = flexible
-
Slide: 31
Features: Querying
Rich, javascript-based query syntax
-
Slide: 32
Features: Querying
Rich, javascript-based query syntax
Allows us to do deep, nested queries
-
Slide: 33
Features: Querying
Rich, javascript-based query syntax
Allows us to do deep, nested queries
db.order.find( { shipping: { carrier: "usps" } } );
-
Slide: 34
Features: Querying
Rich, javascript-based query syntax
Allows us to do deep, nested queries
db.order.find( { shipping: { carrier: "usps" } } );
shipping is an embedded document (object)
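One detail worth knowing about the query on this slide: passing a whole object for `shipping` asks MongoDB for embedded documents that equal that object exactly, while dot notation such as `"shipping.carrier"` matches on a single nested field. The sketch below imitates both behaviors over a plain array. The matcher functions and sample orders are made up, and this is not MongoDB’s implementation.

```javascript
// In-memory stand-ins for the two query shapes.
const orders = [
  { _id: 1, shipping: { carrier: "usps" } },
  { _id: 2, shipping: { carrier: "usps", speed: "overnight" } },
];

// Like find({ shipping: { carrier: "usps" } }): the whole embedded document must be equal.
function matchWholeSubdocument(docs, field, expected) {
  return docs.filter((d) => JSON.stringify(d[field]) === JSON.stringify(expected));
}

// Like find({ "shipping.carrier": "usps" }): only the named nested field is compared.
function matchDottedPath(docs, path, expected) {
  const read = (doc) =>
    path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
  return docs.filter((d) => read(d) === expected);
}

console.log(matchWholeSubdocument(orders, "shipping", { carrier: "usps" }).map((d) => d._id).join(",")); // 1
console.log(matchDottedPath(orders, "shipping.carrier", "usps").map((d) => d._id).join(","));           // 1,2
```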
-
Slide: 35
Features: Binary Object Store
Efficient binary large object store via GridFS
-
Slide: 36
Features: Binary Object Store
Efficient binary large object store via GridFS
i.e. store images, videos, anything
-
Slide: 37
Concepts
-
Slide: 38
Concepts: Document-oriented
Think of “documents” as database records
-
Slide: 39
Concepts: Document-oriented
Think of “documents” as database records
Documents are basically just JSON objects that Mongo stores in binary
-
Slide: 40
Concepts: Document-oriented
Think of “collections” as database tables
-
Slide: 44
Concept Mapping
RDBMS (mysql, postgres)
Tables
Records/rows
Queries return record(s)
MongoDB
Collections
Documents/objects
Queries return a cursor
???
-
Slide: 45
Concepts: Cursors
Queries return “cursors” instead of collections
-
Slide: 46
Concepts: Cursors
Queries return “cursors” instead of collections
A cursor allows you to iterate through the result set
-
Slide: 47
Concepts: Cursors
Queries return “cursors” instead of collections
A cursor allows you to iterate through the result set
A big reason for this is performance
-
Slide: 48
Concepts: Cursors
Queries return “cursors” instead of collections
A cursor allows you to iterate through the result set
A big reason for this is performance
Much more efficient than loading all objects into memory
-
Slide: 49
Concepts: Cursors
The find() function returns a cursor object
-
Slide: 50
Concepts: Cursors
The find() function returns a cursor object
var cursor = db.logged_requests.find({ 'status_code' : 200 })
cursor.hasNext() // "true"
cursor.forEach( function (item) { print(tojson(item)) });
cursor.hasNext() // "false"
-
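To make the performance argument concrete, here is a toy cursor that pulls results in small batches instead of loading the whole result set into memory. The class, the `fetchBatch` callback, and the batch size of 2 are assumptions for illustration. The mongo shell’s real cursor talks to the server instead of slicing an array.

```javascript
// Toy cursor: fetches the next batch only when the current one is used up.
class ToyCursor {
  constructor(fetchBatch) {
    this.fetchBatch = fetchBatch; // hypothetical callback: (offset) => array of items
    this.buffer = [];
    this.offset = 0;
  }

  hasNext() {
    if (this.buffer.length === 0) {
      this.buffer = this.fetchBatch(this.offset);
      this.offset += this.buffer.length;
    }
    return this.buffer.length > 0;
  }

  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.buffer.shift();
  }

  forEach(fn) {
    while (this.hasNext()) fn(this.next());
  }
}

const rows = [1, 2, 3, 4, 5];
const cursor = new ToyCursor((offset) => rows.slice(offset, offset + 2));
const seen = [];
cursor.forEach((item) => seen.push(item));
console.log(seen.join(","));   // 1,2,3,4,5
console.log(cursor.hasNext()); // false
```

At no point does the cursor hold more than one batch, which is the efficiency the slides are pointing at.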
Slide: 51
Cool Features
-
Slide: 52
Cool Features
Capped collections
-
Slide: 53
Cool Features
Capped collections
Fixed-sized, limited operation, auto-LRU age-out collections
-
Slide: 54
Cool Features
Capped collections
Fixed-sized, limited operation, auto-LRU age-out collections
Fixed insertion order
-
Slide: 55
Cool Features
Capped collections
Fixed-sized, limited operation, auto-LRU age-out collections
Fixed insertion order
Super fast
-
Slide: 56
Cool Features
Capped collections
Fixed-sized, limited operation, auto-LRU age-out collections
Fixed insertion order
Super fast
Ideal for logging and caching
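The behavior described on these slides (fixed size, insertion order, oldest entries aging out) can be sketched with a small bounded buffer. This toy caps the number of documents for simplicity, which is an assumption of the example. The class name is made up and it does not reflect how MongoDB stores capped collections.

```javascript
// Toy capped collection: keeps at most `maxDocs` documents in insertion order.
class ToyCappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // the oldest document ages out
  }

  find() {
    return [...this.docs]; // always returned in insertion order
  }
}

const recentRequests = new ToyCappedCollection(3);
for (let i = 1; i <= 5; i++) recentRequests.insert({ n: i });
console.log(recentRequests.find().map((d) => d.n).join(",")); // 3,4,5
```

This is why the slides call it ideal for logging: the collection never grows, and the newest entries are always there.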
-
Slide: 57
Cool Uses
Data Warehouse
Mongo understands JSON natively
-
Slide: 58
Cool Uses
Data Warehouse
Mongo understands JSON natively
Very powerful for analysis
-
Slide: 59
Cool Uses
Data Warehouse
Mongo understands JSON natively
Very powerful for analysis
Query a bunch of data from some web service
-
Slide: 60
Cool Uses
Data Warehouse
Mongo understands JSON natively
Very powerful for analysis
Query a bunch of data from some web service
Import into mongo (mongoimport -f filename.json)
-
Slide: 61
Cool Uses
Data Warehouse
Mongo understands JSON natively
Very powerful for analysis
Query a bunch of data from some web service
Import into mongo (mongoimport -f filename.json)
Analyze to your heart’s content
-
Slide: 62
Cool Uses
Harmonyapp.com
Large rails app for building websites (kind of a CMS)
-
Slide: 63
Cool Uses
Hardcore debugging
Spit out large amounts of data
-
Slide: 64
Limitations
Transaction support
-
Slide: 65
Limitations
Transaction support
Relational integrity
-
Slide: 66
Resources