column store: All content tagged as column store in NoSQL databases and polyglot persistence
Wednesday, 5 October 2011
An Infobright Column Store Use Case
Alex Pinkin describes the difference a column store, Infobright, made to solving their problems implementing dashboards, reports, and alerts:
What is the secret sauce in Infobright? First, its column oriented storage model which leads to smaller disk I/O. Second, its “knowledge grid” which is aggregate data Infobright calculates during data loading. Data is stored in 65K Data Packs. Data Pack nodes in the knowledge grid contain a set of statistics about the data that is stored in each of the Data Packs. For instance, Infobright can pre-calculate min, max, and avg value for each column in the pack during the load, as well as keep track of distinct values for columns with low cardinality. Such metadata can really help when executing a query since it’s possible to ignore data packs which have no data matching filter criteria. If a data pack can be ignored, there is no penalty associated with decompressing the data pack.
Compared to our MySQL implementation, Infobright eliminated the need to create and manage indexes, as well as to partition tables.
Original title and link: An Infobright Column Store Use Case (©myNoSQL)
Tuesday, 24 May 2011
Dynamic Columns in MariaDB: a SQL to NoSQL Bridge
Dynamic columns allows you to store a different set of columns for every row in the table. […]
Dynamic columns works by storing the extra columns in a blob and having a small set of functions to manipulate it. The functions exist both in SQL and in the MariaDB client library to allow you to manipulate the data where it suits you best.
Nice addition to MariaDB that will hopefully make it to MySQL in the future. You can select, add, update and filter on a dynamic column (nb: though I couldn’t find if you can define indexes on dynamic columns or filtering will require full table scans). The only drawback I see at the current implementation is that dynamic columns are positional not named.
Original title and link: Dynamic Columns in MariaDB: a SQL to NoSQL Bridge (NoSQL databases © myNoSQL)
via: http://monty-says.blogspot.com/2011/05/dynamic-columns-as-bridge-between-sql.html
Wednesday, 2 March 2011
Storing RDF in Wide-Column Databases (Cassandra, HBase)
You basically have two options in how to store RDF data in wide-column databases like HBase and Cassandra: the resource-centric approach and the statement-centric approach.
In the statement-oriented approach, each RDF statement corresponds to a row key (for instance, a UUID) and contains subject, predicate and object columns. In Cassandra, each of these would be supercolumns that would then contain subcolumns such as type and value, to differentiate between RDF literals, blank nodes and URIs. If you needed to support named graphs, each row could also have a context column that would contain a list of the named graphs that the statement was part of.
[…]
In view of the previous considerations, the resource-oriented approach is generally a better natural fit for storing RDF data in wide-column databases. In this approach, each RDF subject/resource corresponds to a row key, and each RDF predicate/property corresponds to a column or supercolumn. Keyspaces can be used to represent RDF repositories, and column families can be used to represent named graphs.
Leaving aside Cassandra or HBase or Riak or the over a dozen existing solutions, you can always build a triple store on MongoDB.
Original title and link: Storing RDF in Wide-Column Databases (Cassandra, HBase) (NoSQL databases © myNoSQL)
via: http://www.semanticoverflow.com/questions/716/storing-rdf-data-into-hbase/813#813
Friday, 14 January 2011
6 Criteria for Real Column Stores
Michael Stonebraker has published on Vertica blog an article presenting 6 criteria for characterizing the completeness of a column store implementation:
I/O Characteristics
- IO-1 (basic column store): Every storage block contains data from only ONE column.
- IO-2: Aggressive compression
- IO-3: No record-ids
CPU Characteristics
- CPU-4: A column executor
- CPU-5: Executor runs on compressed data
- CPU-6: Executor can process columns that are key sequence or entry sequence
Michael’s post is going after big fishes in the ocean (SybaseIQ, EMC Greenplum, Aster Data, Oracle) and in case this is the area that interests you, you should also check Curt Monash’s follow up.
But getting back to these 6 criteria for column stores, I confess that this time these seem to make a lot of sense. So, I’m wondering how NoSQL column-stores — Cassandra, HBase, and Hypertable — are doing from this perspective. I’d really appreciate some expert comments so we have a follow up with the status of NoSQL column-stores according to these criteria.
Update: Alex Feinberg pointed me to Daniel Abadi’s article that clarifies the distinction between solutions Michael’s post is mentioning and the new NoSQL column stores.
While not remembering exactly this article, I’ve continued to maintain this separation and my post’s intention is to make sure the separation is kept, but also to get experts feedback on the following questions:
- do any of these criteria apply to NoSQL column stores?
- if a criterion applies than how NoSQL column stores score at it?
- if a criterion doesn’t apply, why doesn’t it apply?
Original title and link: 6 Criteria for Real Column Stores (NoSQL databases © myNoSQL)
Friday, 17 September 2010
Hypertable 0.9.4.0 Released, Over 40 Improvements and Bug Fixes
Many improvements to garbage collection, a Hypertable monitoring web interface, upgraded Thrift and many more. The complete list of changes for Hypertable 0.9.4.0 can be found ☞ here.
I’ve embedded also a presentation by Doug Judd on Hypertable (nb: if you prefer videos you should check this great presentation: Hypertable: The Ultimate Scaling Machine
Original title and link: Hypertable 0.9.4.0 Released, Over 40 Improvements and Bug Fixes (NoSQL databases © myNoSQL)
Saturday, 14 August 2010
Why Your Startup Should Use A Schema-less Database
SQL is phenomenal for enforcing rigidity onto tightly defined problems. It’s fast, mature, stable, and even a mediocre developer can JOIN their way out of a paper bag. Save it for your next government defense contract. Build your startup’s tech on the assumption that your business premise will change, and that you need to be ready for it. Your data schema is a direct corollary with how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breath. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.
Avoiding schema changes and data migration are good reasons.
Why Your Startup Should Use A Schema-less Database originally posted on the NoSQL blog: myNoSQL
via: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
Thursday, 24 June 2010
VoltDB Don’ts Validating NoSQL Assumptions
Interesting to note that some VoltDB don’ts from the paper ☞ Do’s and Don’ts (pdf) are validating some major assumptions in the NoSQL space:
Don’t create tables with very large rows (that is, lots of columns or large VARCHAR columns). Several smaller tables with a common partitioning key are better.
Basically both wide-column stores (i.e. Cassandra, HBase, Hypertable) with their column-families and document databases (i.e. CouchDB, MongoDB, RavenDB, Terrastore) with their schema-less approach are addressing this issue.
- Don’t use ad hoc SQL queries as part of a production application.
Firstly this points to the mindset change required by the NoSQL space when doing data modeling: think about data access patterns.
Secondly, it pretty much validates CouchDB and RavenDB approaches of having queries defined upfront making their reads extremely fast.
Tuesday, 13 April 2010
HBase @ Adobe: An Interview with Adobe SaaS Infrastructure Team
About one month ago, the Adobe SaaS Infrastructure Team (ASIT) has published two excellent articles[1] on their experience and work with HBase. I had a chance to get into some more details with the team driving this effort — thanks a lot guys! — and here is the final result of our conversation:
myNoSQL: It looks like during your evaluation phase[2], you’ve only considered column-based solutions (Cassandra, HBase, Hypertable). Would you mind describing some of your requirements that lead you to not consider others?
ASIT: There weren’t many solutions that could respond to our needs in 2008 (approximately 1 year before the NoSQL movement started).
We considered other solutions as well (e.g. memcachedb, Berkeley DB), but we only evaluated the three you mentioned.
We had two divergent requirements: dynamic schema and huge number of records.
We had to provide a data storage service that could handle 40 Million entities, each with many different 1-N relations. These required real-time read/write access. Due to performance reasons, we used to denormalize data to avoid joins. At the same time we had to offer schema flexibility (clients could add their custom metadata), and this was relatively weird to do in MySQL. In the columnar model you could dynamically and easily add new columns.
Every few hours data had to be aggregated in order to compute some sort of billboards and other statistics (client facing, not internal usage information). We were familiar with Hadoop / Map-Reduce, and we were looking for something that could distribute the processing jobs.
myNoSQL: You also seemed to be concerned by data corruption scenarios — this being mentioned in both articles[3] — and that sounds like a sign of HBase not being stable enough. Is that the case? Or was it something else to lead you to dig deeper in this direction?
ASIT: Although HBase stability was initially unknown to us (we didn’t have experience running the system in production), we would have went through the same draconic testing with any other system. We promised our clients that we won’t lose or corrupt their data so this criteria is paramount.
In a relational database you have referential integrity, databases rarely get corrupted and there are multiple ways to backup your data and restore it. Moreover, you understand where the data is, and know what the filesystem guarantees are. Many times there are teams that deal with backups and database operations for you (not our case however).
When you go with a different model, and you have huge volumes of data, it becomes very hard to back it up frequently and restore it fast. With HBase and Hadoop under active development we need to make sure all the building blocks are “safe”. Our clients have different requirements, but there is a very important common one: data safety.
myNoSQL: The whole setup for being able to move from HBase to MySQL and back sounds really interesting. Could you expand a bit on what tools have you used?
ASIT: We built most of the system in-house. Everything was running on a bunch of cron jobs. The system would:
- Export the actual HBase tables to HDFS, using map-reduce. This also performed data integrity checks.
- We had two backup “end-points”. Once the data was exported to HDFS, we compressed it and pushed it into another distributed file system.
- Another step was to create a MySQL backup out of it and import it on a stand-by MySQL cluster that could act as failover storage. The MySQL failover didn’t have any processing capabilities, so it was running in “degraded mode”, but it was good enough for a temporary solution.
We had the same thing running “in reverse”. The MySQL backup was kept in more places, and it could be imported back in HBase.
On top of this we had monitoring so whenever something went wrong we got alerted. We also had variable backup frequency (keep 1 for each of last 6 months, one for each last 4 weeks, and the last 7 days, also 2 hourly level) to help with disk utilization.
This was supposed to be a temporarily solution, because beyond a certain threshold the data would have gotten to big to fit in MySQL.
Today we look at multi-site data replication as a way of backing up large amounts of data in a scalable manner and soon HBase will support this. We cannot do failover to a MySQL system with our current data volume :).
myNoSQL: Based on the two articles I have had a hard time deciding if the system is currently in production or not. Is it? In case it is, how many ops people are responsible for managing your HBase cluster?
ASIT: There is actually more than one cluster, one is currently in production and the larger one we’re mentioning in the article will enter production soon. This is going to sound controversial, but we - the team - are solely responsible with managing our clusters. Of course, someone else is dealing with the basics: rack&stack, replacing defective hardware, prepping new servers and making sure we are allocated the right IP addresses.
We believe in the dev-ops movement. We automate everything and do monitoring so the operations work for the production system is really low. Last time we had to intervene was due to a bug in the backup frequency algorithm (we exceeded our quota, and we had alerts pouring in).
For full disclosure: These are not business critical systems, today. At some point we will need a dedicated operations team but we’ll still continue to work on the entire vertical, from the feature development part down to actual service deployment and monitoring.
myNoSQL: You mention that you are developing against the HBase trunk. That could mean that you are updating your cluster quite frequently. How do you achieve this without compromising uptime?
ASIT: This is the area we’re starting to focus more in the same manner we did with data integrity.
The current production cluster runs on HBase and Hadoop 0.18.X. Since the first time it failed in December 2008 we didn’t have any downtime, but we didn’t upgrade it either as we envisioned that the new system would replace it completely at some point.
Currently it’s possible to upgrade a running cluster one machine at a time, without downtime if RPC compatibility is maintained between the builds. Major upgrades that impact the RPC interface or data-model could be done without downtime using multi-site replication, but we haven’t done it yet. However in order to have something solid we would need rolling upgrades capabilities from both HDFS and HBase. It theory, this will be possible once they migrate the current RPC to Avro and enable RPC versioning. In the meantime we make sure our clients understand that we might have downtime when upgrading. We’ll probably contribute to the rolling upgrades features as well.
myNoSQL: Based on the following quote from your articles, I’d be tempted to say that some of the requirements were more about ACID guarantees. How would you comment?
So, for us, juggling with Consistency, Availability and Partition tolerance is not as important as making sure that data is 100% safe.
ASIT: It boils down to the requirements of the applications that run on top of your system. If you provide APIs that need to be fully consistent you have to be able to ensure that in every layer of the system. Some of our clients do need full consistency.
Another reason why we need full consistency is to be able to correctly build higher level features such as distributed transactions and indexing. These go hand in hand and are a lot easier to implement when the underlaying system can provide certain guarantees. It’s much like the way Zookeeper relies on TCP for data transmission guarantees, rather than implementing it from scratch.
We’re not consistency “zealots” and understand the benefits of eventual consistency like increased performance and availability in some scenarios. A good example is how we’ll deal with multi-site replication. It’s just that most of our use-cases fit better over a fully consistent system.
myNoSQL: There are a couple more related to the Full consistency section, but I don’t want to flood you with too many questions.
ASIT: This is indeed an interesting topic and we’ll probably post more about it on ☞ http://hstack.org.
myNoSQL: Thanks a lot guys!
References
- [1] The two articles referenced from this interview are ☞ Why we’re using HBase part 1 and ☞ Why we’re using HBase part 2. (↩)
-
[2]
According to the ☞ first part: (↩)
There were no benchmarking reports then, no “NoSQL” moniker, therefore no hype :). We had to do our own objective assessment. The list was (luckily) short: HBase, Hypertable and Cassandra were on the table. MySQL was there as a last resort.
-
[3]
Just some fragments mentioning data integrity/corruption (↩)
We had to be able to detect corruption and fix it. As we had an encryption expert in the team (who authored a watermarking attack), he designed a model that would check consistency on-the-fly with CRC checksums and allow versioning. The thrift serialized data was wrapped in another layer that contained both the inner data types and the HBase location (row, family and qualifier). (He’s a bit paranoid sometimes, but that tends to come in handy when disaster strikes :). Pretty much what Avro does.
Most of our development efforts go towards data integrity. We have a draconian set of failover scenarios. We try to guarantee every byte regardless of the failure and we’re committed to fixing any HBase or HDFS bug that would imply data corruption or data loss before letting any of our production clients write a single byte.
Thursday, 25 February 2010
HBase Secondary Indexes
I have spent some time to understand the complex solution for HBase secondary indexes suggested ☞ over here. As I pointed out in that post comment thread, I do see a few major drawbacks to this approach. Anyway, now that the code seem to have been made ☞ available, I expect more experienced HBase users will take a look at it and agree or disagree with its approach.
Meanwhile, I get the feeling that this ☞ other solution might be better as it is built on HBase API and not trying to trick HBase behavior.
Expert opinions?
Update: Bruno Dumon is pointing out in the comments below that the two solutions are in fact pretty similar and that “my indexing package basically goes more into detail on that aspect: generating appropriate index row keys, while ignoring how updates should be pushed to the index (I’m thinking of some scalable queue solution for this)”.
Tuesday, 23 February 2010
Cassandra @ Twitter: An Interview with Ryan King
There have been confirmed rumors[1] about Twitter planning to use Cassandra for a long time. But except the mentioned post, I couldn’t find any other references.
Twitter is fun by itself and we all know that NoSQL projects love Twitter. So, imagine how excited I was when after posting about Cassandra 0.5.0 release, I received a short email from Ryan King, the lead of Cassandra efforts at Twitter simply saying that he would be glad to talk about these efforts.
So without further ado, here is the conversation I had with Ryan King (@rk) about Cassandra usage at Twitter:
MyNoSQL: Can you please start by stating the problem that lead you to look into NoSQL?
Ryan King: We have a lot of data, the growth factor in that data is huge and the rate of growth is accelerating.
We have a system in place based on shared mysql + memcache but its quickly becoming prohibitively costly (in terms of manpower) to operate. We need a system that can grow in a more automated fashion and be highly available.
MyNoSQL: I imagine you’ve investigated many possible approaches, so what are the major solutions that you have considered?
Ryan King:
- A more automated sharded mysql setup
- Various databases: HBase, Voldemort, MongoDB, MemcacheDB, Redis, Cassandra, HyperTable and probably some others I’m forgetting.
MyNoSQL: What kind of tests have you run to evaluate these systems?
Ryan King: We first evaluated them on their architectures by asking many questions along the lines of:
- How will we add new machines?
- Are their any single points of failure?
- Do the writes scale as well?
- How much administration will the system require?
- If its open source, is there a healthy community?
- How much time and effort would we have to expend to deploy and integrate it?
- Does it use technology which we know we can work with? *… and so on.
Asking these questions narrowed down our choices dramatically. Everything but Cassandra was ruled out by those questions. Given that it seemed to be our best choice, we went about testing its functionality (“can we reasonably model our data in this system?”) and load testing.
The load testing mostly focused on the write-path. In the medium/long term we’d like to be able to run without a cache in front of Cassandra, but for now we have plenty of memcache capacity and experience with scaling traffic that way.
MyNoSQL: If you draw a line, what were the top reasons for going with Cassandra?
Ryan King:
- No single points of failure
- Highly scalable writes (we have highly variable write traffic)
- A healthy and productive open source community
MyNoSQL: Will Cassandra completely replace the current solution?
Ryan King: Over time, yes. We’re currently moving our largest (and most painful to maintain) table — the statuses table, which contains all tweets and retweets. After this we’ll start putting some new projects on Cassandra and migrating other tables.
MyNoSQL: How do you plan to migrate existing data?
Ryan King: We have a nice system for dynamically controlling features on our site. We commonly use this to roll out new features incrementally across our user base. We use the same system for rolling out new infrastructure.
So to roll out the new data store we do this:
- Write code that can write to Cassandra in parallel to Mysql, but keep it disabled by the tool I mentioned above
- Slowly turn up the writes to Cassandra (we can do this by user groups “turn this feature on for employees only” or by percentages “turn this feature on for 1.2% of users”)
- Find a bug :)
- Turn the feature off
- Fix the bug and deploy
- GOTO #2
Eventually we get to a point where we’re doing 100% doubling of our writes and comfortable that we’re going to stay there. Then we:
- Take a backup from the mysql databases
Run an importer that imports the data to cassandra
Some side notes here about importing. We were originally trying to use the
BinaryMemtable[2] interface, but we actually found it to be too fast — it would saturate the backplane of our network. We’ve switched back to using the Thrift interface for bulk loading (and we still have to throttle it). The whole process takes about a week now. With infinite network bandwidth we could do it in about 7 hours on our current cluster.Once the data is imported we start turning on real read traffic to Cassandra (in parallel to the mysql traffic), again by user groups and percentages.
- Once we’re satisfied with the new system (we’re using the real production traffic with instrumentation in our application to QA the new datastore) we can start turning down traffic to the mysql databases.
A philosophical note here — our process for rolling out new major infrastructure can be summed up as “integrate first, then iterate”. We try to get new systems integrated into the application code base as early in their development as possible (but likely only activated for a small number of people). This allows us to iterate on many fronts in parallel: design, engineering, operations, etc.
MyNoSQL: Please include anything I’ve missed.
Ryan King: I can’t really think of anything else.
MyNoSQL: Thank you very much!
References
- [1] The ☞ Up and running with Cassandra is probably one of the most detailed posts on the internet about Cassandra. MyNoSQL has also a post listing the best resources available for Cassandra documentation. It contains a fair amount of details about using Cassandra for storing Twitter data. (↩)
-
[2]
If you remember our post Cassandra in Production @ Digg, Digg is using Hadoop and Cassandra’s
BinaryMemtableto load preserialized data. (↩)
Thursday, 11 February 2010
Why doesn't disk usage immediately decrease when I remove data in Cassandra?
Jonathan Ellis (@spyced) explains the complexity of performing a delete operation in a distributed, eventually consistent system and how Cassandra deals with this operation.
Thus, a delete operation can’t just wipe out all traces of the data being removed immediately […] So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request. […] Cassandra does what distributed systems designers frequently do when confronted with a problem we don’t know how to solve: define some additional constraints that turn it into one that we do. Here, we defined a constant,
GCGraceSeconds, and had each node track tombstone age locally.
The post also includes some details about how Cassandra is dealing with eventual consistency by supporting hinted handoff ☞, read repair ☞ and anti entropy ☞ for reducing the inconsistency window.
via: http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
Wednesday, 27 January 2010
A great week for the column stores: Cassandra and HBase New Releases
I think this is a great week for the NoSQL column stores, marking new releases from both of the most important column-based solutions we are tracking here on MyNoSQL: Cassandra and HBase.
While their releases went pretty quiet, both Cassandra 0.5.0 and HBase 0.20.3 contain a ton of goodies that MyNoSQL tried to track:
I really hope that both Cassandra and HBase communities will start being more “public” about their work and will give us a chance to read about them more often here on MyNoSQL.
Most Popular Articles
- Translate SQL to MongoDB MapReduce
- Tutorial: Getting Started With Cassandra
- CouchDB vs MongoDB: An attempt for a More Informed Comparison
- Cassandra @ Twitter: An Interview with Ryan King
- A Couple of Nice GUI Tools for MongoDB
- NoSQL benchmarks and performance evaluations
- Ehcache: Distributed Cache or NoSQL Store?
- Document Databases Compared: CouchDB, MongoDB, RavenDB
- Quick Review of Existing Graph Databases
- NoSQL Data Modeling