presentation: All content tagged as presentation in NoSQL databases and polyglot persistence

Video: Will Leinweber: Relaxing with CouchDB

If we never have enough intro presentations to MongoDB, why would we have enough CouchDB videos?

Embedded below is a video of Will Leinweber presenting Relaxing with CouchDB (38 minutes).


Update: below you can see the slides from Will Leinweber's presentation at the Red Dirt Ruby Conference: CouchDB, Ruby and You.

Presentation: Gary Dusbabek (Rackspace) on Cassandra

A presentation about Cassandra given by Rackspace’s Gary Dusbabek (@gdusbabek):

My notes:

What problems does it solve?

  • Reliability at scale
    • No single point of failure (all nodes are identical)
  • Simple scaling
    • linear
  • High write throughput
  • Large data sets

What problems can’t it solve?

  • No flexible indices
  • No querying on non PK values
  • Not good for binary data (>64 MB) unless you chunk it
  • Row contents must fit in available memory

Concepts: CAP

  • Cassandra chooses A and P, but is tunable to provide more C

Data Model

  • Keyspace contains column families
  • ColumnFamily:
    • Standard or Super
    • Two levels of indexes (key and column names)

Data Model

  • Column and subcolumn sorting
  • Specify your own comparator:
    • TimeUUID
    • Lexical UUID
    • UTF8
    • Bytes
    • CreateYourOwn
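To make the two levels of the data model concrete (row key first, then column names kept sorted by a pluggable comparator), here is a minimal in-memory sketch. It assumes plain JavaScript Maps stand in for Cassandra's storage; the `ColumnFamily` class, both comparator functions and the column family names are hypothetical, and Super column families (one more nesting level) are left out:

```javascript
// Hypothetical sketch of the data model above, not Cassandra's API:
// keyspace -> column family -> row key -> columns sorted by a comparator.

// Plain JavaScript string order, standing in for the built-in comparators.
const lexicalComparator = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
// "CreateYourOwn": any function with the same (a, b) => number contract.
const reversedComparator = (a, b) => lexicalComparator(b, a);

class ColumnFamily {
  constructor(comparator) {
    this.comparator = comparator;
    this.rows = new Map(); // first level: row key -> Map(column name -> value)
  }

  insert(rowKey, columnName, value) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    this.rows.get(rowKey).set(columnName, value);
  }

  // Second level: column names, returned in comparator order.
  columnNames(rowKey) {
    const row = this.rows.get(rowKey) ?? new Map();
    return [...row.keys()].sort(this.comparator);
  }
}

// A keyspace is simply a named container of column families.
const keyspace = {
  Users: new ColumnFamily(lexicalComparator),
  Timeline: new ColumnFamily(reversedComparator), // reverse ordering
};
```

For example, inserting the columns `email`, `age` and `name` into a `Users` row returns them as `age, email, name`, while the same columns in `Timeline` would come back as `name, email, age`.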

Inserting: Writes

  • Commit log for durability
  • Memtable - no disk access (no reads or seeks)
  • SSTables are final (they become read-only)
    • Index
    • Bloom filter
    • Raw data
  • Atomic within a ColumnFamily
  • Bottom line: FAST!!

Note: make sure to check the slide for a nice visual description of the Cassandra write operation. You should also check Cassandra Write operation performance explained for more details.
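The write path in the list above can be sketched in a few dozen lines. This is a hypothetical, heavily simplified model: no compaction, no on-disk format, no separate index file, a tiny memtable limit so flushes are easy to see, and FNV-1a hashing assumed for the Bloom filter. All class and function names are made up for illustration:

```javascript
// Hypothetical, simplified sketch of a Cassandra-style write path.
// Not Cassandra's code: no compaction, no on-disk format, no index file.

// Seeded FNV-1a hash, used to derive several Bloom filter positions.
function fnv1a(str, seed) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class BloomFilter {
  constructor(bits = 256, hashes = 3) {
    this.bits = new Uint8Array(bits); // one byte per bit, for readability
    this.hashes = hashes;
  }
  add(key) {
    for (let i = 0; i < this.hashes; i++) this.bits[fnv1a(key, i) % this.bits.length] = 1;
  }
  // false means "definitely absent"; true means "possibly present"
  mightContain(key) {
    for (let i = 0; i < this.hashes; i++) {
      if (!this.bits[fnv1a(key, i) % this.bits.length]) return false;
    }
    return true;
  }
}

class WritePathSketch {
  constructor(memtableLimit = 2) {
    this.commitLog = [];       // append-only log, replayed after a crash
    this.memtable = new Map(); // in-memory writes: no disk reads or seeks
    this.sstables = [];        // immutable once flushed
    this.memtableLimit = memtableLimit;
  }

  write(key, value) {
    this.commitLog.push([key, value]); // 1. durability first
    this.memtable.set(key, value);     // 2. then the in-memory table
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Turn the memtable into a sorted, frozen SSTable with its own Bloom filter.
  flush() {
    const rows = [...this.memtable.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const bloom = new BloomFilter();
    rows.forEach(([k]) => bloom.add(k));
    this.sstables.push(Object.freeze({ rows: Object.freeze(rows), bloom }));
    this.memtable = new Map();
    this.commitLog = []; // simplification: flushed data no longer needs the log
  }

  read(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (let i = this.sstables.length - 1; i >= 0; i--) { // newest SSTable first
      const table = this.sstables[i];
      if (!table.bloom.mightContain(key)) continue;       // skip without scanning
      const row = table.rows.find(([k]) => k === key);
      if (row) return row[1];
    }
    return undefined;
  }
}
```

The design point the sketch illustrates: a write only appends to a log and updates an in-memory map, while a read may have to consult several immutable SSTables, with the Bloom filter letting it skip tables that definitely lack the key.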

Querying: Overview

Querying: Reads

  • Not as fast as writes
  • Read repair when out of sync
  • New in 0.6:
    • Row cache (avoid sstable lookup)
    • Key cache (avoid index scan)

Note: make sure you check the slide for a visual description of the Cassandra read operation. You can also read Cassandra Reads performance explained for more details.
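Read repair, in its simplest form: compare the versions returned by the replicas, answer with the newest one (by timestamp) and write it back to any replica that is stale or missing it. A minimal sketch under those assumptions; it ignores digest reads, consistency levels and per-column reconciliation, and `readWithRepair` is a hypothetical name:

```javascript
// Hypothetical sketch of read repair, not Cassandra's implementation.
// Each replica is a Map: key -> { value, timestamp }.
function readWithRepair(replicas, key) {
  // Find the newest version any replica holds.
  let newest;
  for (const replica of replicas) {
    const cell = replica.get(key);
    if (cell && (!newest || cell.timestamp > newest.timestamp)) newest = cell;
  }
  if (!newest) return undefined;

  // Repair: push the newest version to replicas that are missing it or stale.
  for (const replica of replicas) {
    const cell = replica.get(key);
    if (!cell || cell.timestamp < newest.timestamp) replica.set(key, { ...newest });
  }
  return newest.value;
}
```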

Future Direction

  • Range delete (delete these cols from those keys)
  • Vector clocks (including server-side conflict resolution)
  • Altering keyspace/column family definitions on a live cluster
  • Byte[] keys
  • Compression
  • Multi-tenant support
  • Fewer memory restrictions

Presentation: Introducing Riak

This is the longest NoSQL presentation I’ve ever posted here: 209 slides! If you’re planning to beat Kevin Smith’s (@kevsmith) record, please let me know in advance so I can reserve enough time to go through it.

My notes below:

What is Riak?

  • A flexible storage engine…
  • … with a REST API …
  • … and map/reduce capability …
  • … designed to be fault-tolerant …
  • … distributed …
  • … and ops friendly

The Riak Way for CAP

  • Pick Two
  • For each operation

Riak Improvements on Amazon Dynamo’s N, R, W[1]

  • N can vary per bucket
  • R and W can vary per operation
  • Choose your own fault tolerance/performance tradeoff
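Why letting N, R and W vary matters: in the usual Dynamo-style reasoning, a read is guaranteed to overlap the latest successful write whenever R + W > N, because an R-replica set and a W-replica set cannot be disjoint among only N replicas. A small brute-force sketch (helper names are hypothetical) that checks this rule for small N:

```javascript
// All k-element subsets of the replica ids {0, ..., n-1}.
function combinations(n, k, start = 0, prefix = []) {
  if (prefix.length === k) return [prefix];
  const out = [];
  for (let i = start; i < n; i++) out.push(...combinations(n, k, i + 1, [...prefix, i]));
  return out;
}

// Brute force: does every R-replica read set share a replica with every
// W-replica write set?
function everyReadSeesEveryWrite(n, r, w) {
  const reads = combinations(n, r);
  const writes = combinations(n, w);
  return reads.every((rs) => writes.every((ws) => rs.some((id) => ws.includes(id))));
}

// The rule of thumb the brute force confirms: overlap is guaranteed
// exactly when R + W > N.
const guaranteedOverlap = (n, r, w) => r + w > n;
```

With N = 3, for instance, R = 2 and W = 2 guarantee overlap, while R = 1 and W = 1 give up that guarantee in exchange for waiting on fewer replicas.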

Conflict resolution: Client Resolution[2]

  • Can be set per-bucket or server-wide
  • Conflicting data is “bubbled up” to the client
  • Client picks the winner

Conflict resolution: Server Resolution

  • “Last write wins”
  • Enabled by default
  • What most apps need 80% of the time
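The two conflict resolution strategies can be contrasted in a short sketch. It assumes a "sibling" is one concurrently written version carrying a value and a timestamp; the function names and the shopping-cart merge rule are hypothetical illustrations, not Riak's client API:

```javascript
// Hypothetical sketch, not Riak's client API.
// A sibling is one concurrently written version: { value, timestamp }.

// Server resolution, "last write wins": keep the newest sibling.
function lastWriteWins(siblings) {
  return siblings.reduce((newest, s) => (s.timestamp > newest.timestamp ? s : newest));
}

// Client resolution: all siblings are "bubbled up" and the application decides.
function resolveOnClient(siblings, pickWinner) {
  return pickWinner(siblings);
}

// Example application rule: merge cart contents instead of dropping a version.
function mergeCarts(siblings) {
  return {
    value: [...new Set(siblings.flatMap((s) => s.value))].sort(),
    timestamp: Math.max(...siblings.map((s) => s.timestamp)),
  };
}
```

With siblings `["milk"]` (timestamp 1) and `["eggs"]` (timestamp 2), last write wins keeps only `["eggs"]`, while the client-side merge keeps both items. That lost `milk` is the trade-off behind offering client resolution per bucket.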

The presentation also covers:

  • Linking objects (slide 78)
  • Map/Reduce (slide 99)

References

  • [1] N = number of replicas, R = number of replicas needed for a successful read, W = number of replicas needed for a successful write.
  • [2] Jeff Darcy has an interesting article on ☞ conflict resolution.

Hadoop User Group March Meeting Recap

The meeting hosted lots of discussions and 3 presentations:

Owen O’Malley: Upcoming Hadoop Security release

Owen O’Malley from the Yahoo! Hadoop Team provided an overview of the upcoming Hadoop Security release, describing its features and capabilities as well as its operational benefits. Yahoo! is very excited about adding security capabilities to Hadoop and views this as a major milestone in continuing to make Hadoop an enterprise-grade platform.

Tyson Condie: Hadoop Online

Tyson Condie, a Ph.D. student at the University of California, Berkeley, presented the innovative research around the Hadoop Online effort led by Prof. Joseph M. Hellerstein. Tyson described a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing and can reduce completion times and improve system utilization. Tyson included examples from HOP, the Hadoop Online Prototype project.

Bradford Cross: Flightcaster

Bradford Cross from FlightCaster provided an exciting overview of the FlightCaster flight delay prediction service and some cool insights into the airline industry. Bradford described how they built a scalable machine learning and data analysis platform using the Clojure dynamic programming language, wrapping Cascading and Hadoop. Bradford demonstrated how the use of Hadoop makes building scalable systems much simpler.

via: http://developer.yahoo.net/blog/archives/2010/03/hadoop_summit_2010_june_29_santa_clara_registration_is_now_open.html


Presentation: Tokyo Cabinet / Tyrant @ Nosql Paris

Embedded below are the slides from Florent Solt’s (@florentsolt) Tokyo Cabinet / Tyrant presentation at NoSQL Paris.

Florent appears to work at Netvibes, and his slides briefly present how Tokyo Cabinet is set up and used there.

I also liked the Tokyo Cabinet / Tyrant strengths and weaknesses slides:

Tokyo Cabinet / Tyrant Weaknesses

Tokyo Cabinet / Tyrant Strengths

  • Easy to deploy and setup
  • Easy to use
  • It’s not a black box
  • Good to very good performance for most of the time
  • Small memory footprint
  • A single Tokyo Tyrant process can handle thousands of connections
  • Many command line tools
  • Lua extensions

I’d definitely be interested to hear much more about how Netvibes is using Tokyo Cabinet / Tyrant, so ping me if you are ready to share more with the Tokyo Cabinet community.


Presentation: NoSQL databases by Harry Kauhanen

Quite a few interesting slides in Harry Kauhanen’s presentation:

Slide 5: Key-value stores

  • The value is a binary object aka “blob” — the DB does not understand it and does not want to understand it

Slide 7: Document databases

  • Key-value store, but the value is (usually) structured and “understood” by the DB
  • Querying data is possible (by other means than just a key)
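The difference between the two models fits in a few lines. A minimal sketch, assuming in-memory Maps stand in for the two kinds of store; the keys, fields and the `findBy` helper are hypothetical:

```javascript
// Key-value store: the value is an opaque blob (here, raw bytes); the store
// can only hand it back for a given key.
const kvStore = new Map();
kvStore.set("user:1", new TextEncoder().encode('{"name":"Ann","city":"Paris"}'));

// Document store: values are structured objects the store "understands",
// so it can answer queries on fields other than the key.
const docStore = new Map([
  ["user:1", { name: "Ann", city: "Paris" }],
  ["user:2", { name: "Bob", city: "Oslo" }],
]);

// Return the keys of all documents whose `field` equals `value`.
function findBy(store, field, value) {
  return [...store.entries()]
    .filter(([, doc]) => doc[field] === value)
    .map(([key]) => key);
}
```

With the key-value store, finding every user in Paris would instead mean fetching and decoding every blob in application code.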

Slide 10: Wide column stores

  • “a sparse, distributed multi-dimensional sorted map”
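The quoted phrase comes from Google's Bigtable paper: a value is addressed by (row key, column, timestamp), cells that were never written take no space, and rows are kept in key order. A minimal single-machine sketch of that map (distribution is left out; the class name and sample keys are hypothetical):

```javascript
// Hypothetical, single-machine sketch: (row, column, timestamp) -> value.
// "Sparse": only cells that were written exist. "Sorted": rows by key.
class SparseSortedMap {
  constructor() {
    this.rows = new Map(); // row -> Map(column -> [{ ts, value }], newest first)
  }

  put(row, column, ts, value) {
    if (!this.rows.has(row)) this.rows.set(row, new Map());
    const columns = this.rows.get(row);
    if (!columns.has(column)) columns.set(column, []);
    const versions = columns.get(column);
    versions.push({ ts, value });
    versions.sort((a, b) => b.ts - a.ts); // keep the newest version first
  }

  // Newest value for a cell, or undefined if the cell was never written.
  get(row, column) {
    return this.rows.get(row)?.get(column)?.[0]?.value;
  }

  // Row keys in sorted order, as a range scan would return them.
  rowKeys() {
    return [...this.rows.keys()].sort();
  }
}
```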

Slide 12: Graph databases

  • “Relational database is a collection of loosely connected tables” whereas “Graph database is a multi-relational graph”

Slide 14:

  • Relationships in RDBMS are “weak”
  • Relationships in graph databases are first class citizens
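One way to picture "first class" relationships: instead of being reconstructed at query time through join tables, each relationship is stored as its own object (with a type and optional properties) attached directly to its start node, so a traversal just follows pointers. A minimal property-graph sketch, assuming this adjacency-list layout; the `Graph` class, node ids and relationship types are hypothetical:

```javascript
// Hypothetical property-graph sketch: relationships are stored objects
// with a type and properties, hanging directly off their start node.
class Graph {
  constructor() {
    this.nodes = new Map(); // id -> { id, props, out: [relationship] }
  }

  addNode(id, props = {}) {
    this.nodes.set(id, { id, props, out: [] });
  }

  relate(from, type, to, props = {}) {
    this.nodes.get(from).out.push({ type, to, props });
  }

  // Breadth-first walk over relationships of one type, up to `depth` hops.
  traverse(start, type, depth) {
    const seen = new Set([start]);
    let frontier = [start];
    for (let d = 0; d < depth; d++) {
      const next = [];
      for (const id of frontier) {
        for (const rel of this.nodes.get(id).out) {
          if (rel.type === type && !seen.has(rel.to)) {
            seen.add(rel.to);
            next.push(rel.to);
          }
        }
      }
      frontier = next;
    }
    seen.delete(start);
    return [...seen]; // reachable nodes, in discovery order
  }
}
```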

Slide 23: Why NoSQL?

  • Schema-free
  • Massive data stores
  • Scalability
  • Some services simpler to implement than using RDBMS
  • Great fit for many “Web 2.0” services

Slide 24: Why NOT NoSQL?

  • RDBMS and tools are mature
  • NoSQL implementations often “alpha”
  • Data consistency, transactions
  • “Don’t scale until you need it”

Presentation: Mathematics of Batch Processing

Do you remember the article on applying Amdahl’s Law to Hadoop provisioning? Now you can also have it as a set of slides:
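For reference, Amdahl's law bounds the speedup of a job whose parallelizable fraction is p when it runs on n workers: S(n) = 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). A minimal sketch of the formula (the function name is hypothetical):

```javascript
// Amdahl's law: speedup of a job with parallelizable fraction p on n workers.
function amdahlSpeedup(p, n) {
  if (p < 0 || p > 1 || n < 1) throw new RangeError("need 0 <= p <= 1 and n >= 1");
  // The serial part (1 - p) takes the same time however many workers run.
  return 1 / ((1 - p) + p / n);
}
```

For example, a job that is 95% parallelizable can speed up by at most 20x no matter how many nodes are added, which is why adding machines eventually stops paying off.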


Learn MongoDB in 104… slides

You can pretty much say that you know a lot about MongoDB if you go through Kyle Banker’s (@hwaet) slides below:

But before saying that you know everything you need, I’d strongly encourage you to review the following notes from running MongoDB in production.


Presentation: Redis Overview

In light of the news about Redis, more people will start looking at it, so here is another slide deck, from Ryan Findley. Once you are done with the slides, you should check this other awesome Redis presentation and take a look at the great list of Redis use cases.


Presentation: Overview of HBase at Meetup

Slides from the Overview of HBase at Meetup presentation.

My notes:

  • The options slide: “scaling is built in, but extra indexing is DIY”. We had a post on this subject: HBase secondary indexes
  • An open source library for mapping Java beans to HBase tables: ☞ meetup.beeno
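“Extra indexing is DIY” means the application keeps its own index table next to the data table and updates both on every write. A minimal in-memory sketch of that pattern; the `IndexedTable` class and its methods are hypothetical, and in HBase the two updates would touch different rows or tables, so they would not be atomic together:

```javascript
// Hypothetical sketch of an application-maintained secondary index.
class IndexedTable {
  constructor(indexedColumn) {
    this.indexedColumn = indexedColumn;
    this.main = new Map();  // row key -> row object (the data table)
    this.index = new Map(); // indexed value -> Set of row keys (the index table)
  }

  put(rowKey, row) {
    // Remove the stale index entry if the row already existed.
    const old = this.main.get(rowKey);
    if (old) this.index.get(old[this.indexedColumn])?.delete(rowKey);

    this.main.set(rowKey, row);
    const value = row[this.indexedColumn];
    if (!this.index.has(value)) this.index.set(value, new Set());
    this.index.get(value).add(rowKey);
  }

  // Look up rows by the indexed column instead of scanning the data table.
  findBy(value) {
    return [...(this.index.get(value) ?? [])].map((key) => this.main.get(key));
  }
}
```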

Presentation: Intro to MongoDB by Alex Sharp

We never have enough introductions to NoSQL systems. Embedded below are the slides from Alex Sharp’s (@ajsharp) Intro to MongoDB presentation. For a quick overview, you can also find a text-only version below.

Text-only version of Intro to MongoDB

  • Slide: 1

    Intro to MongoDB

    Alex Sharp

    twitter: @ajsharp

  • Slide: 2

    So what is MongoDB?

  • Slide: 3

    First and foremost…

  • Slide: 4

    IT’S THE NEW HOTNESS!!!

  • Slide: 5

    omgomgomg

    SHINY OBJECTS

    omgomgomg

  • Slide: 6

    MongoDB (from “humongous”) is a scalable, high-performance, open source, schema-free, document-oriented database.

    - mongodb.org

  • Slide: 7

    Philosophy

  • Slide: 8

    Philosophy

    “One size fits all” approach no longer applies

  • Slide: 9

    Philosophy

    Non-relational DBs scale more easily, especially horizontally

  • Slide: 10

    Philosophy

    Focus on speed, performance, flexibility and scalability

  • Slide: 11

    Philosophy

    Not concerned with transactional stuff and relational semantics

  • Slide: 12

    Philosophy

    DBs should be an on-demand commodity, in a cloud-like fashion

  • Slide: 13

    Philosophy

    Mongo tries to achieve the performance of traditional key-value stores while maintaining functionality of traditional RDBMS

  • Slide: 14

    Features

  • Slide: 15

    Features

    Standard database stuff

  • Slide: 16

    Features

    Standard database stuff

    Indexing

  • Slide: 17

    Features

    Standard database stuff

    Indexing

    replication/failover support

  • Slide: 18

    Features: Document Storage

    Documents are stored in BSON (binary JSON)

  • Slide: 19

    Features: Document Storage

    BSON is a binary serialization of JSON-like objects

  • Slide: 20

    Features: Document Storage

    This is extremely powerful, b/c it means mongo understands JSON natively

  • Slide: 21

    Features: Document Storage

    Any valid JSON can be easily imported and queried

  • Slide: 22

    Features

    Schema-less; very flexible

  • Slide: 23

    Features

    Schema-less; very flexible

    no more blocking ALTER TABLE

  • Slide: 24

    Features

    Auto-sharding (alpha)

  • Slide: 25

    Features

    Makes for easy horizontal scaling

  • Slide: 26

    Features

    Map/Reduce

  • Slide: 27

    Features

    Very, very fast

  • Slide: 28

    Features

    Super easy to install

  • Slide: 29

    Features

    Strong with major languages

  • Slide: 30

    Features

    Document-oriented = flexible

  • Slide: 31

    Features: Querying

    Rich, javascript-based query syntax

  • Slide: 32

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to do deep, nested queries

  • Slide: 33

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to do deep, nested queries

    db.order.find( { "shipping.carrier": "usps" } );

  • Slide: 34

    Features: Querying

    Rich, javascript-based query syntax

    Allows us to do deep, nested queries

    db.order.find( { "shipping.carrier": "usps" } );

    shipping is an embedded document (object)

  • Slide: 35

    Features: Binary Object Store

    Efficient binary large object store via GridFS

  • Slide: 36

    Features: Binary Object Store

    Efficient binary large object store via GridFS

    i.e. store images, videos, anything

  • Slide: 37

    Concepts

  • Slide: 38

    Concepts: Document-oriented

    Think of “documents” as database records

  • Slide: 39

    Concepts: Document-oriented

    Think of “documents” as database records

    Documents are basically just JSON objects that Mongo stores in binary

  • Slide: 40

    Concepts: Document-oriented

    Think of “collections” as database tables

  • Slide: 44

    Concept Mapping

    RDBMS (mysql, postgres)

    Tables

    Records/rows

    Queries return record(s)

    MongoDB

    Collections

    Documents/objects

    Queries return a cursor

     ???

  • Slide: 45

    Concepts: Cursors

    Queries return “cursors” instead of collections

  • Slide: 46

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

  • Slide: 47

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

    A big reason for this is performance

  • Slide: 48

    Concepts: Cursors

    Queries return “cursors” instead of collections

    A cursor allows you to iterate through the result set

    A big reason for this is performance

    Much more efficient than loading all objects into memory

  • Slide: 49

    Concepts: Cursors

    The find() function returns a cursor object

  • Slide: 50

    Concepts: Cursors

    The find() function returns a cursor object

    var cursor = db.logged_requests.find({ 'status_code' : 200 })

    cursor.hasNext() // "true"

    cursor.forEach( function (item) {

    print(tojson(item))

    });

    cursor.hasNext() // "false"

  • Slide: 51

    Cool Features

  • Slide: 52

    Cool Features

    Capped collections

  • Slide: 53

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

  • Slide: 54

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

  • Slide: 55

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

    Super fast

  • Slide: 56

    Cool Features

    Capped collections

    Fixed-sized, limited operation, auto-LRU age-out collections

    Fixed insertion order

    Super fast

    Ideal for logging and caching

  • Slide: 57

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

  • Slide: 58

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

  • Slide: 59

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

  • Slide: 60

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

    Import into mongo (mongoimport --file filename.json)

  • Slide: 61

    Cool Uses

    Data Warehouse

    Mongo understands JSON natively

    Very powerful for analysis

    Query a bunch of data from some web service

    Import into mongo (mongoimport --file filename.json)

    Analyze to your heart’s content

  • Slide: 62

    Cool Uses

    Harmonyapp.com

    Large Rails app for building websites (kind of a CMS)

  • Slide: 63

    Cool Uses

    Hardcore debugging

    Spit out large amounts of data

  • Slide: 64

    Limitations

    Transaction support

  • Slide: 65

    Limitations

    Transaction support

    Relational integrity

  • Slide: 66

    Resources
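A capped collection behaves much like a ring buffer: its space is fixed up front, documents stay in insertion order, and once it is full each new insert ages out the oldest document. A minimal in-memory sketch of that behaviour; the `CappedCollection` class is hypothetical, and in MongoDB itself you would create one with something like db.createCollection("log", { capped: true, size: 100000 }), where size is in bytes and "log" is only an example name:

```javascript
// Hypothetical in-memory model of a capped collection (ring buffer).
class CappedCollection {
  constructor(maxDocs) {
    if (!(maxDocs >= 1)) throw new RangeError("maxDocs must be >= 1");
    this.slots = new Array(maxDocs); // preallocated, never grows
    this.next = 0;  // slot the next insert will use
    this.count = 0; // number of documents currently stored
  }

  insert(doc) {
    this.slots[this.next] = doc; // overwrites the oldest doc once full
    this.next = (this.next + 1) % this.slots.length;
    this.count = Math.min(this.count + 1, this.slots.length);
  }

  // Documents in insertion order, oldest first.
  find() {
    const size = this.slots.length;
    const start = (this.next - this.count + size) % size;
    return Array.from({ length: this.count }, (_, i) => this.slots[(start + i) % size]);
  }
}
```

Because inserts only ever overwrite the next slot, the structure never needs to grow or reorder, which is the property the slides point to when calling capped collections fast and ideal for logging.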