Thursday, 2 May 2013
Banks and the Eternal Consistency Example or What trumps consistency
Todd Hoff extracts and expands on some thoughts about BASE vs ACID from Eric Brewer’s NoSQL: Past, Present, Future published on InfoQ:
Consistency it turns out is not the Holy Grail. What trumps consistency is:
- Auditing
- Risk Management
- Availability
But the cornerstone of the availability vs consistency conversation is:
Availability correlates with revenue and consistency generally does not.
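To make the three items concrete, here is a minimal sketch of one common reading of the bank example: an ATM keeps serving customers while cut off from the central ledger (availability), caps what it hands out (risk management), and logs every operation for later replay (auditing). The `Atm` class, the `reconcile` function, and the per-withdrawal limit are all hypothetical names and simplifications of mine, not something taken from Brewer's talk or Todd Hoff's post.

```python
from dataclasses import dataclass, field


@dataclass
class Atm:
    """Hypothetical ATM that stays available while cut off from the bank."""
    offline_limit: int                       # risk management: cap per withdrawal
    audit_log: list = field(default_factory=list)

    def withdraw(self, account: str, amount: int) -> bool:
        # Availability: serve the customer without asking the central ledger,
        # but only up to the offline limit.
        if amount > self.offline_limit:
            return False
        # Auditing: record every operation so it can be replayed later.
        self.audit_log.append((account, amount))
        return True


def reconcile(balances: dict, audit_log: list) -> dict:
    """Replay the audit log against the central ledger once it is reachable.

    Returns the overdrawn accounts and the amount each one is overdrawn by.
    """
    for account, amount in audit_log:
        balances[account] = balances.get(account, 0) - amount
    return {acct: -bal for acct, bal in balances.items() if bal < 0}


atm = Atm(offline_limit=200)
atm.withdraw("alice", 150)   # accepted while partitioned
atm.withdraw("alice", 500)   # refused: above the offline limit
overdrafts = reconcile({"alice": 100}, atm.audit_log)
print(overdrafts)
```

The inconsistency is not prevented; it is bounded by the limit and resolved after the fact from the audit log, which is the trade the post describes.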
✚ Over time Michael Stonebraker has been the most prominent supporter of exactly the opposite argument.
✚ Remember Emin Gün Sirer’s The NoSQL Partition Tolerance Myth? He used the bank example too.
Original title and link: Banks and the Eternal Consistency Example or What trumps consistency (©myNoSQL)
Wikipedia Adopts MariaDB
The technical details of Wikipedia’s migration from MySQL to MariaDB:
As a read-heavy site, Wikipedia aggressively uses edge caching. Approximately 90% of pageviews are served entirely from the edge while at the application layer, we utilize both memcached and redis in addition to MySQL. Despite that, the MySQL databases serving English Wikipedia alone reach a daily peak of ~50k queries/second. Most are read queries served by load-balanced slaves, depending on consistency requirements. 80% of the English Wikipedia query load (up to 40k qps) are typically handled by just two database servers at any given time. Our most common query type (40% of all) has a median execution time of ~0.2ms and a 95th percentile time of ~50ms. To successfully use MariaDB in production, we need it to keep up with the level of performance obtained from Facebook’s MySQL fork, and to behave consistently as traffic patterns change.
As you can see in this post, the only “political” point is tucked in among the real technical reasons:
Equally important, as supporters of the free culture movement, the Wikimedia Foundation strongly prefers free software projects; that includes a preference for projects without bifurcated code bases between differently licensed free and enterprise editions. We welcome and support the MariaDB Foundation as a not-for-profit steward of the free and open MySQL related database community.
This is slightly different from the story told in Wikipedia Migrates to MariaDB.
Original title and link: Wikipedia Adopts MariaDB (©myNoSQL)
via: https://blog.wikimedia.org/2013/04/22/wikipedia-adopts-mariadb/
MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service
Cloud and MySQL related:
We are deeply sorry to announce that Xeround’s public cloud offering will be discontinued soon. All Xeround FREE database instances will be terminated on May 8th, and the paid plans terminated on May 15th.
This was announced on May 1st.
✚ This only means more business for Amazon RDS.
Original title and link: MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service (©myNoSQL)
via: http://xeround.com/blog/2013/05/discontinuing-of-xeround-cloud-database-public-service
Microsoft Azure Sales Top $1 Billion Challenging Amazon
Last week I saw some guesstimates of Amazon Web Services’ revenue. Bloomberg has now posted an article putting the revenue of Microsoft Azure and related programs (?) at $1 billion.
Interesting numbers:
- market share: Amazon Web Services 71%, Microsoft Azure 20%
- Azure grew 48% in the last 6 months
- Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
- Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy services, imo).
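To put the two Gartner estimates on the same footing, here is a quick sketch that turns them into compound annual growth rates. The `cagr` helper is my own; the inputs are just the figures quoted above, in billions of USD.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Gartner estimates quoted above, in billions of USD (2012 -> 2017).
infrastructure = cagr(6.17, 30.6, 5)
total_cloud = cagr(108.9, 237.2, 5)

print(f"infrastructure segment: {infrastructure:.1%} per year")
print(f"total cloud market:     {total_cloud:.1%} per year")
```

By this arithmetic the infrastructure segment is expected to grow at roughly 38% per year, more than twice the roughly 17% implied for the total cloud market.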
Amazon hasn’t given many details about the AWS platform, except 3 numbers:
- number of objects stored in S3, which has been roughly doubling every year for the last 4 years:
  - Q4 2012: 1.3 trillion
  - Q3 2011: 566 billion
  - Q4 2010: 262 billion
  - Q4 2009: 102 billion
  - Q4 2008: 40 billion
  - Q4 2007: 14 billion
  - Q4 2006: 2.9 billion
- number of requests per second handled by AWS
- number of EMR clusters (?) spun up
According to some slides from last October/November:
- S3 stored over 1.3 trillion objects
- AWS handles over 830k requests/s
- 3.7 million EMR clusters spun up since 2010
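The "doubling every year" claim about S3 can be checked against the object counts listed above. This is a small sketch of mine, not anything Amazon published: since one data point is for Q3 rather than Q4, it annualizes the growth factor between consecutive points by the number of quarters separating them.

```python
# (year, quarter, objects stored in billions), as listed above.
s3_objects = [
    (2006, 4, 2.9),
    (2007, 4, 14),
    (2008, 4, 40),
    (2009, 4, 102),
    (2010, 4, 262),
    (2011, 3, 566),
    (2012, 4, 1300),
]


def annualized_growth(points):
    """Yield (period label, growth factor per year) for consecutive points."""
    for (y0, q0, n0), (y1, q1, n1) in zip(points, points[1:]):
        quarters = (y1 - y0) * 4 + (q1 - q0)
        factor = (n1 / n0) ** (4 / quarters)
        yield f"Q{q0} {y0} -> Q{q1} {y1}", factor


growth = list(annualized_growth(s3_objects))
for label, factor in growth:
    print(f"{label}: x{factor:.2f} per year")
```

Every period comes out close to or above a doubling per year, with the earliest years growing much faster than the most recent ones.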
While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure.
✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.
Original title and link: Microsoft Azure Sales Top $1 Billion Challenging Amazon (©myNoSQL)
Wednesday, 1 May 2013
Wikipedia Migrates to MariaDB... but facts are facts
Jon Buys:
There was, and continues to be, concern over Oracle’s treatment of the open source competitor to their own Oracle database. I personally have wondered what motivation, if any, Oracle has to maintain MySQL. They may simply be milking the revenue stream created by MySQL AB until the well goes dry. Since MariaDB is surpassing MySQL in performance and community goodwill, that day may come sooner rather than later.
A couple of little-known facts:
- Oracle has been home to InnoDB since 2005. InnoDB was and continues to be the default, recommended engine for MySQL, both before and after Oracle acquired MySQL through Sun Microsystems.
- Oracle has been home to Sleepycat’s Berkeley DB since 2006. Those products are definitely not dead, though community-wise Oracle may not have put much effort into extending them.
Facts are facts.
Original title and link: Wikipedia Migrates to MariaDB… but facts are facts (©myNoSQL)
US patent office embraces MarkLogic
It looks like the US Patent and Trademark Office will, at least, get some better search functionality across its database:
Now, MarkLogic, a NoSQL database provider specializing in XML data services, is working with the US Patent and Trademark Office (USPTO) in an effort to make applications easier to complete, and make the review process easier to do. XML is one way to render different types of information like word documents, PDFs, and images into a central database and present the information back to users. Previously, if an individual or company wanted to apply for a patent or trademark, they had to look through paper manuals and applications to understand how to present their unique inventions for review.
I just hope MarkLogic can implement some sort of triggers or rules that would reject completely unreasonable patents.
Original title and link: US patent office embraces MarkLogic (©myNoSQL)
via: http://civsourceonline.com/2013/04/25/us-patent-office-embraces-big-data/
Tuesday, 30 April 2013
Hadoop Drives Down Costs
Darryl K. Taft reports on the experience of using Hadoop at UC Irvine Medical Center:
Because they were bleeding money, the team wanted a cost-effective solution. “Our target was $500 per terabyte. We were at $100,000 per terabyte with the old system,” Peterson said. “With our Hadoop cluster, we’re now at $900 per terabyte.”
How are these costs calculated?
- Fixed costs: hardware, any one time licenses
- Recurring costs: hardware replacement, energy, HR
Is this all?
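For reference, here is how I would expect such a figure to be put together from the two buckets above. This is only a sketch of my assumption: the article does not say how the numbers were derived, the `cost_per_terabyte` helper is mine, and every input below is made up for illustration.

```python
def cost_per_terabyte(fixed: float, recurring_per_year: float,
                      years: float, usable_terabytes: float) -> float:
    """Total cost over `years` divided by usable storage capacity."""
    total = fixed + recurring_per_year * years
    return total / usable_terabytes


# Hypothetical inputs, for illustration only (not from the article).
example = cost_per_terabyte(
    fixed=200_000,               # hardware plus any one-time licenses
    recurring_per_year=50_000,   # hardware replacement, energy, HR
    years=3,
    usable_terabytes=250,
)
print(f"${example:,.0f} per terabyte over 3 years")
```

The result swings widely with the time horizon and with whether capacity means raw or usable (replicated) terabytes, which is why the question matters.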
Original title and link: Hadoop Drives Down Costs (©myNoSQL)
Cloudera Impala 1.0 Release Notes and A Couple of Questions
This is what I’ve been looking for since posting about Impala 1.0: the release notes. From the new features list:
- support for ALTER TABLE
- REFRESH for a single table
- Hints for specifying particular join strategies
- Dynamic resource management, allowing high concurrency for Impala queries
Question: if I remember correctly Impala uses a single process on each machine to execute queries.
- is it multi-threaded?
- does it do any memory/CPU management so that a single query cannot completely exhaust either of these resources?
- what happens with the queries executing when this process fails?
Original title and link: Cloudera Impala 1.0 Release Notes and A Couple of Questions (©myNoSQL)
Cloudera Impala Brings SQL Querying To Hadoop
InformationWeek on today’s Impala 1.0 release:
Impala supports direct querying of data in the Hadoop Distributed File System (HDFS) and HBase (NoSQL database) indexes, and Cloudera claims it’s 3X to 30X faster than Hive. Beta customers report results that are falling into that range. Six3 Systems, for example, a systems integrator serving federal agencies, has seen at least 14X faster querying than Hive, according to analytics developer Wayne Wheeles.
Original title and link: Cloudera Impala Brings SQL Querying To Hadoop (©myNoSQL)
Impala 1.0 - That was fast
Cloudera announces Impala 1.0 GA release.
That was fast. I guess this is one of the (little) advantages of having Hortonworks working on Stinger, Pivotal on HAWQ, and Qubole offering Hive, Pig and Sqoop as a service.
Original title and link: Impala 1.0 - That was fast (©myNoSQL)
Redis on Windows Stress Tests
Claudio Caldato [1] reports on the progress the Microsoft team is making towards releasing a stable, (very well) tested version of Redis for Windows:
In phase I of our stress testing, we put Redis on Windows through various tests with execution times ranging from 1 to 16 days, and configurations ranging from a simple single-master setup to more complex configurations such as the one shown below, with one master and four replicas.
The team also published the details of their stress tests here
[1] Claudio Caldato is Principal Program Manager Lead on the Microsoft Open Technologies team.
Original title and link: Redis on Windows Stress Tests (©myNoSQL)
Data Science of the Facebook World
This long post from Stephen Wolfram is a true display of the fascination of data. Even if you get no real data out of it, read it as a lesson in how to play with, display, and interpret data.
Original title and link: Data Science of the Facebook World (©myNoSQL)
via: http://blog.stephenwolfram.com/2013/04/data-science-of-the-facebook-world/