About one month ago, the Adobe SaaS Infrastructure Team (ASIT) has published two excellent articles on their experience and work with HBase. I had a chance to get into some more details with the team driving this effort — thanks a lot guys! — and here is the final result of our conversation:
myNoSQL: It looks like during your evaluation phase, you’ve only considered column-based solutions (Cassandra, HBase, Hypertable). Would you mind describing some of your requirements that lead you to not consider others?
ASIT: There weren’t many solutions that could respond to our needs in 2008 (approximately 1 year before the NoSQL movement started).
We considered other solutions as well (e.g. memcachedb, Berkeley DB), but we only evaluated the three you mentioned.
We had two divergent requirements: dynamic schema and huge number of records.
We had to provide a data storage service that could handle 40 Million entities, each with many different 1-N relations. These required real-time read/write access. Due to performance reasons, we used to denormalize data to avoid joins. At the same time we had to offer schema flexibility (clients could add their custom metadata), and this was relatively weird to do in MySQL. In the columnar model you could dynamically and easily add new columns.
Every few hours data had to be aggregated in order to compute some sort of billboards and other statistics (client facing, not internal usage information). We were familiar with Hadoop / Map-Reduce, and we were looking for something that could distribute the processing jobs.
myNoSQL: You also seemed to be concerned by data corruption scenarios — this being mentioned in both articles — and that sounds like a sign of HBase not being stable enough. Is that the case? Or was it something else to lead you to dig deeper in this direction?
ASIT: Although HBase stability was initially unknown to us (we didn’t have experience running the system in production), we would have went through the same draconic testing with any other system. We promised our clients that we won’t lose or corrupt their data so this criteria is paramount.
In a relational database you have referential integrity, databases rarely get corrupted and there are multiple ways to backup your data and restore it. Moreover, you understand where the data is, and know what the filesystem guarantees are. Many times there are teams that deal with backups and database operations for you (not our case however).
When you go with a different model, and you have huge volumes of data, it becomes very hard to back it up frequently and restore it fast. With HBase and Hadoop under active development we need to make sure all the building blocks are “safe”. Our clients have different requirements, but there is a very important common one: data safety.
myNoSQL: The whole setup for being able to move from HBase to MySQL and back sounds really interesting. Could you expand a bit on what tools have you used?
ASIT: We built most of the system in-house. Everything was running on a bunch of cron jobs. The system would:
- Export the actual HBase tables to HDFS, using map-reduce. This also performed data integrity checks.
- We had two backup “end-points”. Once the data was exported to HDFS, we compressed it and pushed it into another distributed file system.
- Another step was to create a MySQL backup out of it and import it on a stand-by MySQL cluster that could act as failover storage. The MySQL failover didn’t have any processing capabilities, so it was running in “degraded mode”, but it was good enough for a temporary solution.
We had the same thing running “in reverse”. The MySQL backup was kept in more places, and it could be imported back in HBase.
On top of this we had monitoring so whenever something went wrong we got alerted. We also had variable backup frequency (keep 1 for each of last 6 months, one for each last 4 weeks, and the last 7 days, also 2 hourly level) to help with disk utilization.
This was supposed to be a temporarily solution, because beyond a certain threshold the data would have gotten to big to fit in MySQL.
Today we look at multi-site data replication as a way of backing up large amounts of data in a scalable manner and soon HBase will support this. We cannot do failover to a MySQL system with our current data volume :).
myNoSQL: Based on the two articles I have had a hard time deciding if the system is currently in production or not. Is it? In case it is, how many ops people are responsible for managing your HBase cluster?
ASIT: There is actually more than one cluster, one is currently in production and the larger one we’re mentioning in the article will enter production soon. This is going to sound controversial, but we - the team - are solely responsible with managing our clusters. Of course, someone else is dealing with the basics: rack&stack, replacing defective hardware, prepping new servers and making sure we are allocated the right IP addresses.
We believe in the dev-ops movement. We automate everything and do monitoring so the operations work for the production system is really low. Last time we had to intervene was due to a bug in the backup frequency algorithm (we exceeded our quota, and we had alerts pouring in).
For full disclosure: These are not business critical systems, today. At some point we will need a dedicated operations team but we’ll still continue to work on the entire vertical, from the feature development part down to actual service deployment and monitoring.
myNoSQL: You mention that you are developing against the HBase trunk. That could mean that you are updating your cluster quite frequently. How do you achieve this without compromising uptime?
ASIT: This is the area we’re starting to focus more in the same manner we did with data integrity.
The current production cluster runs on HBase and Hadoop 0.18.X. Since the first time it failed in December 2008 we didn’t have any downtime, but we didn’t upgrade it either as we envisioned that the new system would replace it completely at some point.
Currently it’s possible to upgrade a running cluster one machine at a time, without downtime if RPC compatibility is maintained between the builds. Major upgrades that impact the RPC interface or data-model could be done without downtime using multi-site replication, but we haven’t done it yet. However in order to have something solid we would need rolling upgrades capabilities from both HDFS and HBase. It theory, this will be possible once they migrate the current RPC to Avro and enable RPC versioning. In the meantime we make sure our clients understand that we might have downtime when upgrading. We’ll probably contribute to the rolling upgrades features as well.
myNoSQL: Based on the following quote from your articles, I’d be tempted to say that some of the requirements were more about ACID guarantees. How would you comment?
So, for us, juggling with Consistency, Availability and Partition tolerance is not as important as making sure that data is 100% safe.
ASIT: It boils down to the requirements of the applications that run on top of your system. If you provide APIs that need to be fully consistent you have to be able to ensure that in every layer of the system. Some of our clients do need full consistency.
Another reason why we need full consistency is to be able to correctly build higher level features such as distributed transactions and indexing. These go hand in hand and are a lot easier to implement when the underlaying system can provide certain guarantees. It’s much like the way Zookeeper relies on TCP for data transmission guarantees, rather than implementing it from scratch.
We’re not consistency “zealots” and understand the benefits of eventual consistency like increased performance and availability in some scenarios. A good example is how we’ll deal with multi-site replication. It’s just that most of our use-cases fit better over a fully consistent system.
myNoSQL: There are a couple more related to the Full consistency section, but I don’t want to flood you with too many questions.
myNoSQL: Thanks a lot guys!
-  The two articles referenced from this interview are ☞ Why we’re using HBase part 1 and ☞ Why we’re using HBase part 2. (↩)
According to the ☞ first part: (↩)
There were no benchmarking reports then, no “NoSQL” moniker, therefore no hype :). We had to do our own objective assessment. The list was (luckily) short: HBase, Hypertable and Cassandra were on the table. MySQL was there as a last resort.
Just some fragments mentioning data integrity/corruption (↩)
We had to be able to detect corruption and fix it. As we had an encryption expert in the team (who authored a watermarking attack), he designed a model that would check consistency on-the-fly with CRC checksums and allow versioning. The thrift serialized data was wrapped in another layer that contained both the inner data types and the HBase location (row, family and qualifier). (He’s a bit paranoid sometimes, but that tends to come in handy when disaster strikes :). Pretty much what Avro does.
Most of our development efforts go towards data integrity. We have a draconian set of failover scenarios. We try to guarantee every byte regardless of the failure and we’re committed to fixing any HBase or HDFS bug that would imply data corruption or data loss before letting any of our production clients write a single byte.