NoSQL theory: All content tagged as NoSQL theory in NoSQL databases and polyglot persistence
The 5 key elements for a firehose data system, as per a presentation by Josh Berkus, CEO of PostgreSQL Experts Inc., summarized by Brian Proffitt on ITworld:
- Queuing software to manage out-of-sequence data
- Buffering techniques to deal with component outages
- Materialized views that update data into aggregate tables
- Configuration management for all the systems in the solution
- Comprehensive monitoring to look for failures
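To make the first and third elements a bit more concrete, here is a minimal sketch, not anything taken from the presentation itself. It assumes events carry a numeric timestamp, that waiting `max_delay` seconds is enough for stragglers to show up, and that a plain dictionary can stand in for an aggregate table. The names `ReorderBuffer` and `update_aggregates` are invented for illustration.

```python
import heapq
from collections import defaultdict


class ReorderBuffer:
    """Hold events briefly so late arrivals can be released in timestamp order."""

    def __init__(self, max_delay):
        self.max_delay = max_delay    # assumed bound (seconds) on how late an event can be
        self._heap = []               # min-heap ordered by event timestamp
        self._seq = 0                 # tie-breaker so payloads are never compared
        self._latest = float("-inf")  # highest timestamp seen so far

    def push(self, timestamp, payload):
        """Add one event and return every event that is now safe to release."""
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1
        self._latest = max(self._latest, timestamp)
        watermark = self._latest - self.max_delay
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        ready = []
        while self._heap:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready


def update_aggregates(table, events, bucket_seconds=60):
    """Fold released events into per-bucket counts (a stand-in for an aggregate table)."""
    for ts, _ in events:
        table[ts - ts % bucket_seconds] += 1


counts = defaultdict(int)
buf = ReorderBuffer(max_delay=5)
released = []
# The event at 101 arrives after the one at 103, i.e. out of sequence.
for ts, msg in [(100, "a"), (103, "c"), (101, "b"), (110, "d"), (161, "e")]:
    batch = buf.push(ts, msg)
    released.extend(batch)
    update_aggregates(counts, batch)
tail = buf.flush()
released.extend(tail)
update_aggregates(counts, tail)
```

An event that arrives later than `max_delay` allows would still be released, just out of order; a real system would have to decide whether to drop it, correct the aggregate, or route it elsewhere.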
Basically, firehose data systems are the perfect showcase of the 4 V’s in Big Data. To get an idea of the complexity such systems involve, check the DataSift architecture, which relies on MySQL, HBase, Memcached, Redis, and Kafka to deal with just[1] the Twitter firehose.
[1] While Twitter is a high-volume service, the Internet of Things and sensor networks are producing much, much more data.
Original title and link: 5 Key Elements for a Firehose Data System ( ©myNoSQL)
Werner Vogels in the post about Amazon DynamoDB:
We had been pushing the scalability of commercially available technologies to their limits and finally reached a point where these third party technologies could no longer be used without significant risk. This was not our technology vendors’ fault; Amazon’s scaling needs were beyond the specs for their technologies and we were using them in ways that most of their customers were not. A number of outages at the height of the 2004 holiday shopping season can be traced back to scaling commercial technologies beyond their boundaries.
Here is what I wrote about the history behind NoSQL databases:
Providing decent solutions, up to a point, to a wide range of problems and covering more scenarios than alternative storage solutions existing at that time, made relational databases the de facto storage for the last 30 years. But during the last years, more and more problems crossed the boundaries of what could have been considered decent solutions leading to the need for specialized, better than good enough alternative solutions. And thus NoSQL databases.
It feels rewarding to get such confirmation from people who are at the forefront of NoSQL.
Original title and link: The History of NoSQL: This Was Not Our Technology Vendors’ Fault ( ©myNoSQL)
After reading about MarkLogic’s Packaging feature, I wondered whether managing configurations would be better done with tools like Puppet or Chef than with a custom-built solution, even one that comes packaged with your NoSQL database.
- You’ve been working on an application on your development machine. Now it’s time to move your application to the staging or testing servers. What follows is a tedious process of reviewing the settings on your development machine and applying them to the staging machine. How sure are you that you got all the indexes just right?
- You’ve got a certified configuration that you want to deploy onto a new cluster. Getting the hardware setup and installing the server itself isn’t too hard, but now you have to make sure that all the application servers and databases are setup. Can you see another tedious process coming?
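Whatever tool ends up doing the work, the first scenario boils down to comparing two sets of settings. A rough sketch of that check follows, assuming each environment's configuration can be exported as a flat dictionary. The setting names and the `diff_settings` helper are hypothetical and do not come from MarkLogic, Puppet, or Chef.

```python
def diff_settings(source, target):
    """Return {key: (source_value, target_value)} for every setting that differs."""
    missing = object()  # sentinel so a key set to None is not confused with an absent key
    drift = {}
    for key in sorted(set(source) | set(target)):
        a = source.get(key, missing)
        b = target.get(key, missing)
        if a != b:
            drift[key] = (
                "<absent>" if a is missing else a,
                "<absent>" if b is missing else b,
            )
    return drift


# Hypothetical settings; the key names are invented for illustration.
development = {"index.title": "range", "index.author": "word", "cache_mb": 512}
staging = {"index.title": "range", "cache_mb": 256}

drift = diff_settings(development, staging)
for key, (dev_value, staging_value) in drift.items():
    print(f"{key}: development={dev_value!r} staging={staging_value!r}")
```

A check like this only reports drift; applying the certified configuration to a new cluster is the part where a configuration management tool would earn its keep.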
If you’ve been involved in or responsible for managing the configuration of a NoSQL database deployment, I’d really love to learn which solutions and tools you used.
Original title and link: NoSQL Databases Configuration Management ( ©myNoSQL)
How many times have you gotten an answer that applies to your specific scenario after providing only a short list of performance and scalability requirements? MySQL/InnoDB can do 750k qps, Cassandra scales linearly, MongoDB can do 8 million ops/s. Are any of these the answer for your application?
How many times did you get all the requirements right at spec time?
How many times did requirements remain the same during the development cycle?
How many times did production reality confirm your bullet-list requirements?
Original title and link: Asking for Performance and Scalability Advice on StackOverflow ( ©myNoSQL)