voltdb: All content tagged as voltdb in NoSQL databases and polyglot persistence
Thursday, 5 January 2012
VoltDB for Real-Time Network Monitoring
From the announcement of VoltDB being used by the Japanese ISP, Sakura Internet, for their real-time Internet traffic monitoring and analysis platform for detecting and mitigating large-scale distributed denial of service (DDoS) attacks:
Tamihiro Yuzawa[1]: Our system needs to be capable of sifting through massive amounts of traffic flow data in real-time. VoltDB was our choice from the beginning because it’s a super-fast datastore that supports SQL.
Scott Jarr[2]: Sakura’s security infrastructure requires a datastore that can scale massively and on demand, without sacrificing data accuracy.
Mark these VoltDB keywords:
- fast (read in-memory)
- data consistency
- SQL
Original title and link: VoltDB for Real-Time Network Monitoring (©myNoSQL)
Friday, 8 July 2011
Comments on Urban Myths About NoSQL
Dan Weinreb comments on Michael Stonebraker’s Urban Myths about NoSQL (PDF):
Dr. Michael Stonebraker recently posted a presentation entitled “Urban Myths about NoSQL”. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new “NoSQL” data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.
In fact, Michael Stonebraker bashes everything that is not his current product—this GigaOm interview is the latest example.
For now, I’m filing this away until VoltDB is sold.
Monday, 20 June 2011
Multi-Document Transactions in RavenDB vs Other NoSQL Databases
“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”
This is how Oren Eini starts his post about RavenDB’s support for multi-document transactions and the lack of it in MongoDB:
- For a single server, we support atomic multi document writes natively. (note that this isn’t the case for Mongo even for a single server).
- For multiple servers, we strongly recommend that your sharding strategy will localize documents, meaning that the actual update is only happening on a single server.
- For multi server, multi document atomic updates, we rely on distributed transactions.
In the NoSQL space, there are a couple of other solutions that support transactions:
- Google Megastore
- Redis has two mechanisms that come close to transactions: MULTI/EXEC/DISCARD and pipelining —this one is exemplified in this Redis based triplestore database implementation
- many of the graph databases (Neo4j, HyperGraphDB, InfoGrid)
If you look at these from the perspective of distributed systems, the only distributed ones that support transactions are Megastore and RavenDB. There’s also VoltDB which is all transactions. Are there any I’ve left out?
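To make "atomic multi-document write" concrete, here is a toy sketch of the all-or-nothing behavior on a single server. The `ToyDocumentStore` class and the `reject_negative_balance` check are made up for illustration; this is not how RavenDB, or any of the databases listed above, implements transactions.

```python
import copy


class ToyDocumentStore:
    """In-memory stand-in for a single-server document store (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def write_batch(self, updates, validate=lambda doc_id, doc: None):
        """Apply every update in the batch, or none of them."""
        staged = copy.deepcopy(self.docs)   # work on a private copy
        for doc_id, doc in updates.items():
            validate(doc_id, doc)           # may raise and abort the whole batch
            staged[doc_id] = doc
        self.docs = staged                  # one swap makes the batch visible


def reject_negative_balance(doc_id, doc):
    if doc.get("balance", 0) < 0:
        raise ValueError(f"{doc_id}: negative balance")


store = ToyDocumentStore()
store.write_batch({"acct/1": {"balance": 100}, "acct/2": {"balance": 0}})

try:
    # The second document is invalid, so the first must not be written either.
    store.write_batch(
        {"acct/1": {"balance": 150}, "acct/2": {"balance": -50}},
        validate=reject_negative_balance,
    )
except ValueError:
    pass

print(store.docs["acct/1"]["balance"])  # still 100
```

Without the staging step, the first document would have been updated before the second one failed, which is exactly the partial write a multi-document transaction is supposed to rule out.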
Tuesday, 17 May 2011
Short Notes about VoltDB
Here are the notes I’ve made while watching a webinar about building applications with VoltDB.
What I like:
- it forces you to think upfront about data partitioning by specifying partitioned or replicated tables
- it forces you to think about data access patterns by asking to define the Java-based stored procedures
- it provides both a synchronous and asynchronous API
- there’s an option to run any queries in development mode
What I don’t like:
- you need to compile and deploy the schema, queries, etc.
- you have to define the cluster topology in an XML file
- everything is transactional. Let’s say you have k-factor 2 and a materialized view: an insert will put your data on the 3 servers and the materialized view within a single transaction.
- it’s not clear how you could evolve your schema
- the API doesn’t use timeouts
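The partitioned vs. replicated distinction and the k-factor example from these notes can be sketched roughly as follows. The six-server cluster, the CRC32-based hashing, and the "next servers in the ring" placement are my assumptions for illustration only; VoltDB's actual placement logic is not described in the webinar notes.

```python
import zlib

K_FACTOR = 2                                    # each row is kept on K_FACTOR + 1 servers
SERVERS = ["s0", "s1", "s2", "s3", "s4", "s5"]  # hypothetical six-node cluster


def owners(partition_key, servers=SERVERS, k_factor=K_FACTOR):
    """Return the servers that hold the row for this partition key."""
    start = zlib.crc32(str(partition_key).encode()) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(k_factor + 1)]


def placement(table_kind, partition_key=None):
    """Partitioned tables land on k+1 servers, replicated tables on all of them."""
    if table_kind == "replicated":
        return list(SERVERS)
    return owners(partition_key)


print(placement("partitioned", partition_key=42))  # three distinct servers
print(placement("replicated"))                     # every server in the cluster
```

With k-factor 2, every insert into a partitioned table touches three servers, which is why the "everything is transactional" point above matters for write cost.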
Wednesday, 23 March 2011
How Scalable is VoltDB?
The Percona guys[1] have tested VoltDB, analyzed the results, and drawn a conclusion about its scalability:
VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores
Considering the definition: “A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.”, the conclusion should be slightly updated:
VoltDB can scale up to 120 partitions on 39 servers with 300 CPU cores and 1.6 million TPS.
Bottom line:
- if you can fit your data into 40 servers’ memory
- you need ACID and SQL
- you are OK with precompiled Java-based stored procedures
- you don’t need multi data center deployments
now you can estimate how far you can go with VoltDB.
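As a back-of-the-envelope version of that estimate, the sketch below derives per-server, per-partition, and per-core throughput from the quoted figures. The `servers_needed` helper is hypothetical and assumes throughput scales linearly with server count, which is what the scalability definition above implies but not something the numbers guarantee for your workload.

```python
import math

# Figures taken from the quoted conclusion above.
TOTAL_TPS = 1_600_000
SERVERS = 39
PARTITIONS = 120
CPU_CORES = 300

tps_per_server = TOTAL_TPS / SERVERS
tps_per_partition = TOTAL_TPS / PARTITIONS
tps_per_core = TOTAL_TPS / CPU_CORES


def servers_needed(target_tps):
    """Estimate the server count for a target load, assuming linear scaling."""
    needed = math.ceil(target_tps * SERVERS / TOTAL_TPS)
    if needed > SERVERS:
        raise ValueError("target is beyond the range covered by the quoted numbers")
    return needed


print(round(tps_per_server), round(tps_per_partition), round(tps_per_core))
# prints: 41026 13333 5333
print(servers_needed(500_000))
# prints: 13
```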
[1] The company specializing in MySQL services, which is also behind the MySQL Performance Blog.
via: http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/
Thursday, 17 March 2011
MySQL Fork Drizzle Released
Drizzle aims to be different from MySQL, stripping out “unnecessary” features loved by enterprise and OEMs in the name of greater speed and simplicity and for reduced management overhead.
Drizzle has no stored procedures, triggers, or views […]
Aiming to provide a database for the cloud with support for massive concurrency and optimized for increased performance, the Drizzle team started by removing “non-essential” code and features. Michael Stonebraker’s VoltDB is focusing on a different set of optimizations for achieving performance: removing logging, locking, latching, and buffer management[1].
Anyway, it is not about whose approach is better, but about which scenarios are covered by a simplified MySQL-compatible database and which by an in-memory database with predefined queries.
[1] The “NoSQL” Discussion has Nothing to Do With SQL:
If one eliminates any one of the above overhead components, one speeds up a DBMS by 25%. Eliminate three and your speedup is limited by a factor of two. You must get rid of all four to run a lot faster.
via: http://www.channelregister.co.uk/2011/03/16/drizzle_released/
Tuesday, 18 January 2011
VoltDB: 3 Concepts that Makes it Fast
John Hugg lists the 3 concepts that make VoltDB fast:
- Exploit repeatable workloads: VoltDB exclusively uses a stored procedure interface.
- Partition data to horizontally scale: VoltDB divides data among a set of machines (or nodes) in a cluster to achieve parallelization of work and near linear scale-out.
- Build a SQL executor that’s specialized for the problem you’re trying to solve: If stored procedures take microseconds, why interleave their execution with a complex system of row and table locks and thread synchronization? It’s much faster and simpler just to execute work serially.
Let’s take a quick look at these.
Using stored procedures, instead of allowing free-form queries, would allow the system:
- to completely skip query parsing and the creation and optimization of execution plans at runtime
- to analyze the set of stored procedures at deploy time and possibly generate the appropriate indexes
The benefits of horizontally partitioned data are well understood: parallelization and also easier and more cost-effective hardware usage.
Single threaded execution can also help by removing the need for locking and reducing data access contention.
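A rough sketch of how these three ideas fit together: procedures are registered once up front, each call is routed to the partition that owns its key, and every partition drains its own queue serially, so no locks are needed. The `Partition` class, the `deposit` procedure, and the modulo routing on integer keys are all hypothetical names and choices for illustration, not VoltDB's API.

```python
from collections import deque


class Partition:
    """One slice of the data with its own work queue and no locks."""

    def __init__(self):
        self.rows = {}
        self.queue = deque()

    def run_pending(self):
        # Work items run one at a time, to completion, so they never interleave.
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            results.append(procedure(self.rows, *args))
        return results


# Procedures are registered up front; nothing is parsed or planned per call.
def deposit(rows, account, amount):
    rows[account] = rows.get(account, 0) + amount
    return rows[account]


PROCEDURES = {"deposit": deposit}
partitions = [Partition() for _ in range(4)]


def call(name, partition_key, *args):
    """Queue a procedure call on the partition that owns the (integer) key."""
    target = partitions[partition_key % len(partitions)]
    target.queue.append((PROCEDURES[name], (partition_key, *args)))


call("deposit", 7, 100)
call("deposit", 7, 50)
call("deposit", 2, 10)

# In a real system each partition would run on its own thread or core;
# here they are simply drained one after another.
results = [p.run_pending() for p in partitions]
print(results)  # [[], [], [10], [100, 150]]
```

Because both deposits for account 7 land on the same partition and run back to back, the read-modify-write in `deposit` is safe without any locking.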
While these 3 solutions make a lot of sense and can definitely make a system faster, there’s one major aspect of VoltDB that’s missing from the above list and which I think is critical to explaining its speed: VoltDB is an in-memory storage solution.
Here are a couple of examples of other NoSQL databases that benefit from being in memory (or as close as possible to it). MongoDB, while being a lot more liberal with the queries it accepts, can deliver very fast results by keeping as much data in memory as possible — remember what happened when it had to hit the disk more often? — and using appropriate indexes where needed. Redis and Memcached can deliver amazingly fast results because they keep all data in-memory. And Redis is single threaded while Memcached is not.
Friday, 19 November 2010
Integrating VoltDB and Hadoop
A paper on integrating VoltDB and Hadoop. From what I read, for now it works in a single direction (exporting data from VoltDB to Hadoop):
It is possible to design and develop a complete business solution utilizing both VoltDB and Hadoop from scratch. But you do not need to. VoltDB simplifies the process by providing an export facility that lets you automatically archive selected data from the VoltDB database. And you can use this export functionality with Hadoop.
Tuesday, 9 November 2010
VoltDB Release: Version 1.2 Featuring Data Availability Enhancements
VoltDB 1.2 released earlier this month:
New data availability features. Version 1.2 introduces two important data availability enhancements. The first is network partition tolerance, which allows VoltDB to automatically detect, isolate and manage network failures. This is a critical feature for distributed database infrastructures including those deployed into public clouds such as Amazon’s EC2. The second availability feature, node rejoin, allows VoltDB database nodes that have been taken offline (e.g., for maintenance or repair) to “rejoin” the cluster while the database is live. Node rejoin dynamically resynchronizes all node data.
I’d love to read more about the mechanisms used for automatically detecting, isolating and managing network failures. If I remember correctly, the topic of reliably determining partitions in a distributed system is a central part of Seth Gilbert and Nancy Lynch’s paper on the CAP theorem. It would also be interesting to understand how VoltDB deals with its strong consistency promise in these situations.
And some management tools (nb: by the announcement text I cannot tell if they are available only in the Enterprise version):
New consoles for provisioning, management and monitoring. New in the Enterprise Edition of version 1.2, the VoltDB Enterprise Manager (VEM) provides database and systems administrators with browser-based tools for managing production VoltDB databases. VEM offers a flexible suite of consoles for performing many common administrative and diagnostic activities.
via: https://voltdb.com/content/voltdb-releases-version-12-high-performance-oltp-database
Tuesday, 2 November 2010
Using MySQL as NoSQL: A Story for exceeding 750k qps
How many times do you need to run PK lookups per second? […] These are “SQL” overhead. It’s obvious that performance drops were caused by mostly SQL layer, not by “InnoDB(storage)” layer. MySQL has to do a lot of things like below while memcached/NoSQL do not need to do.
- Parsing SQL statements
- Opening, locking tables
- Making SQL execution plans
- Unlocking, closing tables
MySQL also has to do lots of concurrency controls.
The story has been out for a couple of weeks already, so I’ll not get into the details. But I felt like adding a couple of comments to the subject:
- existing RDBMS storage engines are, most of the time, very well thought out and tested over a long time
- some NoSQL databases have realized that and allow plugging such storage engines into their systems:
  - Project Voldemort supports Berkeley DB (and MySQL, but I’m not sure it goes around the SQL engine)
  - Riak comes with Innostore, an InnoDB-based storage
- many of the findings in this article sound very close to the rationale behind VoltDB, including the pre-compiled, cluster deployed stored procedures
via: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
Saturday, 9 October 2010
VoltDB: An SQL Developer’s Perspective
Two hours of VoltDB. Planning to watch it over the weekend:
Friday, 25 June 2010
NoSQL benchmarks and performance evaluations
Some say it is the right time to start having these around. Others are saying it’s way too early to start the “battle”. Users do want to see them, and when they’re lacking they create their own, most of the time using incomplete or wrong approaches.
But what am I talking about? As some of you might have guessed already:
NoSQL benchmarks and performance evaluations!
With their recent release of Riak 0.11.0, Basho guys have also published their internal ☞ benchmarking code. Similar internal benchmark code is ☞ available for MongoDB.
But users are more interested in seeing cross product benchmarks, even if most of the time constructing these is extremely complicated and they end up comparing apples with oranges.
All this being said, and accepting that most of the time someone will figure out a way to invalidate the results, let’s see what cross-product benchmarks we have in the NoSQL space.
Yahoo! Cloud Serving Benchmark
The Yahoo! Cloud Serving Benchmark’s goal is to facilitate performance comparisons of the new generation of cloud data serving systems. The source code is available on ☞ GitHub and Yahoo! has also published ☞ the results of running this benchmark against Cassandra, HBase, Yahoo!’s PNUTS, and a simple sharded MySQL implementation.
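To give an idea of what a workload-driven benchmark of this kind measures, here is a minimal harness that runs a read/update mix against any dict-like store and reports throughput and a latency percentile. This is my own toy sketch, not YCSB code; `run_workload` and its parameters are hypothetical, and a real benchmark would also need warm-up, multiple clients, and realistic key distributions.

```python
import random
import time


def run_workload(store, operations, read_ratio=0.5, key_space=1000, seed=1):
    """Run a read/update mix (operations > 0) and time each operation."""
    rng = random.Random(seed)   # fixed seed keeps the operation mix repeatable
    latencies = []
    counts = {"read": 0, "update": 0}
    started = time.perf_counter()
    for _ in range(operations):
        key = rng.randrange(key_space)
        op = "read" if rng.random() < read_ratio else "update"
        t0 = time.perf_counter()
        if op == "read":
            store.get(key)
        else:
            store[key] = t0
        latencies.append(time.perf_counter() - t0)
        counts[op] += 1
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "ops_per_sec": operations / elapsed if elapsed else float("inf"),
        "p95_latency": latencies[int(0.95 * (len(latencies) - 1))],
        "counts": counts,
    }


# A plain dict stands in for the system under test.
report = run_workload({}, operations=10_000, read_ratio=0.95)
print(report["counts"], report["ops_per_sec"], report["p95_latency"])
```

Running the same seeded workload against two different stores is the easy part; making sure both are configured for comparable durability and consistency is where the apples-to-oranges problem comes from.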
VoltDB Benchmark
VoltDB, a new storage solution that calls itself the next-generation SQL RDBMS with ACID for fast-scaling OLTP applications, has recently ☞ published the results of their benchmark comparing VoltDB and Cassandra.
It is worth noting that while this is one of those apples-to-oranges comparisons (nb: the authors are well aware of it), there are still a couple of interesting and useful things to be learned from it (e.g. benchmarking procedure, tested scenarios, etc.)
Unfortunately at this time the source code is not yet available, but hopefully we will see it soon:
Going forward, we’re planning to release the code we used to do these benchmarks. We’d also like to try a few other storage layers
Hypertable and HBase Performance Evaluation
The guys behind Hypertable ☞ have published their results of comparing Hypertable with HBase using a benchmark based on the Google BigTable paper[1], from which both HBase and Hypertable inherit their architecture. Unfortunately, the benchmark code is not available at this moment.
Update: Thanks to Stu Hood, now I know the code for this benchmark is available in the Hypertable distribution available ☞ here (tar.gz) and the configuration files are also available ☞ here (tar.gz)
So, as far as I could gather we have:
- ☞ Riak internal benchmark
- ☞ MongoDB internal benchmark
- ☞ Yahoo! Cloud Serving Benchmark
- results only of VoltDB Benchmark comparing VoltDB and Cassandra
- BigTable-inspired benchmark comparing Hypertable and HBase
Did I miss any?