Monday, 20 May 2013
MySQL 5.6, InnoDB and fast storage: 240k QPS
Mark Callaghan runs some benchmarks against MySQL 5.6.11:
Using MySQL 5.6.11 and InnoDB with a few hacks the peak throughput was about 240,000 QPS and 210,000 block reads/second. The test server has 32 cores (16 physical cores, 32 logical cores with HT enabled). This is a great result that can probably be even better. Contention on fil_system->mutex was the bottleneck and I think that can be improved (see feature request #69276). I wonder if 400,000 block reads/second is possible?
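A quick back-of-the-envelope reading of the numbers quoted above, as a minimal sketch. The benchmark does not say that the ratio of block reads to queries would stay constant at higher throughput; that is an assumption made here purely for illustration, and the variable names are hypothetical.

```python
# Figures taken from the quoted benchmark summary.
peak_qps = 240_000
block_reads_per_sec = 210_000
physical_cores = 16

# Average number of block reads issued per query at peak throughput.
reads_per_query = block_reads_per_sec / peak_qps

# Throughput per physical core at peak.
qps_per_physical_core = peak_qps / physical_cores

# Hypothetical: the QPS that 400,000 block reads/second would correspond to
# IF the reads-per-query ratio stayed the same (an assumption, not a result).
target_block_reads = 400_000
projected_qps = target_block_reads / reads_per_query

print(f"{reads_per_query:.3f} block reads per query")
print(f"{qps_per_physical_core:,.0f} QPS per physical core")
print(f"{projected_qps:,.0f} QPS at {target_block_reads:,} block reads/second")
```

Under that assumption, most queries in the test needed a block read from storage, which is why contention on the file-system mutex mattered so much.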
Original title and link: MySQL 5.6, InnoDB and fast storage: 240k QPS (©myNoSQL)
via: http://mysqlha.blogspot.com/2013/05/mysql-56-innodb-and-fast-storage.html
Wednesday, 15 May 2013
MongoLab offers MongoDB on Google Cloud Platform
This was fast:
This week at Google I/O we are launching support for MongoLab’s fifth cloud provider – Google Cloud Platform. You can now use MongoLab to provision and manage MongoDB deployments on Google Compute Engine (GCE)!
A good move for MongoLab and a good win for MongoDB users. I’ve read a lot of good things about Google’s Cloud Platform.
Original title and link: MongoLab offers MongoDB on Google Cloud Platform (©myNoSQL)
via: http://blog.mongolab.com/2013/05/mongolab-now-supports-google-cloud-platform/
Introducing Google Cloud Datastore
Urs Hölzle in a post summarizing some of the announcements at Google I/O:
Google Cloud Datastore is a fully managed and schemaless solution for storing non-relational data. Based on the popular App Engine High Replication Datastore, Cloud Datastore is a standalone service that features automatic scalability and high availability while still providing powerful capabilities such as ACID transactions, SQL-like queries, indexes and more.
I’m heading over to the project’s site to read more.
Original title and link: Introducing Google Cloud Datastore (©myNoSQL)
via: http://googlecloudplatform.blogspot.com/2013/05/ushering-in-next-generation-of.html
Hadoop for graphs - GraphLab picks up $6.75m from Madrona and NEA
Robin Wauters for TNW:
Seattle startup GraphLab claims it is building the “fastest machine-learning analytics engine for graph datasets”, based on the popular open-source distributed graph computation framework with the same name, and it has just raised capital to come through on its promise.
Good luck to GraphLab’s team.
✚ Here’s a short list of MapReduce implementations for graphs.
Original title and link: Hadoop for graphs - GraphLab picks up $6.75m from Madrona and NEA (©myNoSQL)
via: http://thenextweb.com/insider/2013/05/14/graphlab-funding/
Tuesday, 14 May 2013
Hadoop, Moh's Law and Corollaries
Robert Novak proposes Moh’s law and Rob’s corollaries for Hadoop and Big Data:
- Hadoop is hard.
- Make sure you’re measuring what you think you’re measuring.
- Make sure you’re measuring what you need to be measuring.
For the first, I’m fairly confident that Cloudera, Hortonworks, and others will eventually solve it. But for the latter two, you alone are responsible. Not even a SaaS can save you.
Original title and link: Hadoop, Moh’s Law and Corollaries (©myNoSQL)
via: https://rsts11.wordpress.com/2013/05/14/mohs-law-and-big-data-rsts11/
This is why big data is the sweet spot for SaaS … and here are 5 reasons why it is Not
Derrick Harris in an article about some SEO-as-a-Service company I haven’t heard about:
People often ask me where the smart money is in big data. I often tell them that’s a foolish question, because I’m not an investor — but if I were, I’d look to software as a service.
There are two primary reasons why, the first of which is obvious: Companies are tired of managing applications and infrastructure, so something that optimizes a common task using techniques they don’t know on servers they don’t have to manage is probably compelling. It’s called cloud computing.
The other reason is that the big part of big data really is important if you want to get a really clear picture of what’s happening in any given space. While no single end-user company can (or likely would) address search-engine optimization, for example, by building a massive store comprised of data from hundreds or thousands of companies as well as the entire web, a cloud service dedicated to that specific task can.
These are obvious advantages of moving the responsibility to a third-party service. But I don’t believe SaaS is the future of big data, and here’s why big data is not the sweet spot of SaaS:
- a SaaS solution is good at a particular job, but it’s rarely the case that this particular job is answering all your company’s questions and revealing the insights in your data. SaaS solutions will tell you what they, not you, think is important about your data.
- the promise of a SaaS solution to give you access to more aggregate data sounds wrong. Big data is mostly about your data and each customer will have access to their own slices. Indeed a SaaS solution could augment your data with open data or extra data you’d need to pay for.
- transporting your data to each SaaS to answer every question your company has is extremely expensive, if it is possible at all.
- the nature and form of the questions big data tries to answer are changing. SaaS services will not adapt as fast as you want to the range and depth you need.
- having your data in different SaaS solutions is just equivalent to having it in different internal silos. Except you’d pay someone else to protect the silo. The costs of breaking these silos will be much, much higher, so long term you might actually find a real reason why you cannot analyze your data.
Big Data is about agility. It’s about experiments. It’s trial and error. SaaS is about none of these when you are dealing with years and years of data.
Original title and link: This is why big data is the sweet spot for SaaS … and here are 5 reasons why it is Not (©myNoSQL)
Monday, 13 May 2013
Even web giants like Facebook and Yahoo generally aren’t dealing with big data
Even web giants like Facebook and Yahoo generally aren’t dealing with big data, and the application of Google-style tools is inappropriate.
Facebook and Yahoo run their own giant, in-house “clusters”—collections of powerful servers—for crunching data. The necessity of these clusters is one of the hallmarks of big data. After all, data isn’t all that “big” if you could chew through it on your PC at home. The necessity of breaking problems into many small parts, and processing each on a large array of computers, characterizes classic big data problems like Google’s need to compute the rank of every single web page on the planet.
But it appears that for both Facebook and Yahoo, those same clusters are unnecessary for many of the tasks which they’re handed.
I guess we need some sort of “big journalism” sooner rather than later.
Original title and link: Even web giants like Facebook and Yahoo generally aren’t dealing with big data (©myNoSQL)
via: http://qz.com/81661/most-data-isnt-big-and-businesses-are-wasting-money-pretending-it-is/
What Open Source Hadoop Coming to Windows Means to IT
This will open up Hadoop to a large number of organizations that have no in-house Linux skills. Shaun Connolly, vice president of Corporate Strategy at Hortonworks, explains the thinking behind moving HDP to Windows in this way: “Essentially it’s a market-driven decision,” he says. “Hadoop is built for the scaleout commodity hardware market, and the commodity hardware market is 70% Windows by install base and expertise.”
Employees in Windows-only companies will be able to make use of Hadoop easily because Excel can be used as a business intelligence tool to view the results of Hadoop Big Data analysis (whether Hadoop is running on Windows or Linux). “Ideally we want Microsoft users to be oblivious to the fact that everything is coming from Hadoop,” says Connolly. “If end users can consume data without any learning curve, thanks to tools like Excel, then they get more value.”
Either the data or the logic above is not sound:
- those Windows machines that make up the 70% of the market are probably running Excel
- those 70% of the market Windows machines are not going to run Hadoop
Based on this sort of market-share reasoning, tomorrow we should see Hadoop for iOS, Android, and Nokia. Sometime soon Microsoft will release Excel for iOS and maybe Android.
Original title and link: What Open Source Hadoop Coming to Windows Means to IT (©myNoSQL)
via: http://www.cio.com/article/733260/What_Open_Source_Hadoop_Coming_to_Windows_Means_to_IT
MetLife uses MongoDB
InformationWeek, in an article about MetLife migrating to MongoDB:
“We had 60 different teams working together as one group, and they were working nights and weekends not because they had to but because they were excited and wanted to,” says Gary Hoberman, MetLife’s senior VP and CIO of regional application development.
Just imagine how many nights and weekends and holidays these guys would put in if allowed to use an IDE. Like vim or emacs.
Original title and link: MetLife uses MongoDB (©myNoSQL)
Bootstrapping Neo4j With Spring-Data...without XML
The emphasis is on without XML:
With the maturing of Spring-Data I started porting all my personal projects to use Spring Data for bootstrapping.
It needs quite a few annotations, but I’d take that over XML.
Original title and link: Bootstrapping Neo4j With Spring-Data…without XML (©myNoSQL)
via: http://codepitbull.wordpress.com/2013/05/12/bootstrapping-neo4j-with-spring-data-without-xml/
Cloudera Announces Cloudera Developer Kit, Enabling Developers to Build Hadoop Apps Faster
I didn’t know what to think of this announcement after reading the WSJ title. After checking the project’s GitHub page, I still don’t know what to make of it.
Original title and link: Cloudera Announces Cloudera Developer Kit, Enabling Developers to Build Hadoop Apps Faster (©myNoSQL)
Monday, 6 May 2013
Advantages of developing NoSQL applications on .NET platforms using FatDB [sponsor]
Words from this week’s sponsor, FatCloud:
FatDB is a full implementation of NoSQL databases for Windows .Net development, extending database functionality by integrating a Map Reduce work queue, file management system, a high speed cache, and application services. Therefore, FatDB is uniquely suited as a platform to construct applications that are scalable, reliable, responsive to market changes, and cost effective. FatDB enables powerful, scalable applications providing the agility and performance required through:
- Reduces complexity. Applications are developed faster.
- Increases elasticity. Applications can quickly respond to shifts in demands.
- Portability. Applications can move to the cloud and back.
From these operating factors, FatDB is ideally suited for:
- Mobile. Great when trying to accommodate unpredictable usage, requiring applications to be elastic to cope with changes in demand.
- Financial Services. Financial applications requiring real-time data access with extremely high availability.
- E-Commerce. Provides flexible data structures to capitalize on new market opportunities.
- Manufacturing. Systems must respond against peak production, providing insight into trends and feedback mechanics.
Simply, FatDB can help you develop NoSQL applications in .Net with less effort and significantly less cost, higher quality and performance, for demanding cloud-based applications. Download a free Developer’s edition at FatCloud.
Original title and link: Advantages of developing NoSQL applications on .NET platforms using FatDB [sponsor] (©myNoSQL)