Wednesday, 22 May 2013
What is Apache Bigtop?
The project founder, Roman Shaposhnik, defining what Apache Bigtop is:
The elevator pitch for Bigtop has always been: Bigtop is to Hadoop what Debian is to Linux. The most surprising development to me was how well that message resonates with the commercial vendors in the Big Data space. I’m still amazed at how quickly the “Powered by Bigtop” list is growing.
Original title and link: What is Apache Bigtop? (©myNoSQL)
via: http://blog.cloudera.com/blog/2013/05/meet-the-project-founder-roman-shaposhnik/
Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL
Nokia’s big data ecosystem consists of a centralized, petabyte-scale Hadoop cluster that is interconnected with a 100-TB Teradata enterprise data warehouse (EDW), numerous Oracle and MySQL data marts, and visualization technologies that allow Nokia’s 60,000+ users around the world tap into the massive data store. Multi-structured data is constantly being streamed into Hadoop from the relational systems, and hundreds of thousands of Scribe processes run every day to move data from, for example, servers in Singapore to a Hadoop cluster in the UK. Nokia is also a big user of Apache Sqoop and Apache HBase.
In the coming years you’ll hear more and more stories (sales pitches, really) about single unified platforms solving all these problems at once. But the platforms that will survive and thrive are those that accomplish two things:
- keep the data gates open: in and out.
- work with other platforms to make this efficient for users
Big Data Industry Atlas
Forbes published this chart based on Wikibon data:
It’s an $18 billion industry heading to $50 billion in five years, according to tech researchers at Wikibon. Make note of the names in the inner circle. They’re the pure plays with the newest science—and are likely to get gobbled up by the growth-hungry incumbents on the outside.
To save your eyes, in the inner circle:
- LucidWorks
- Datameer
- Kognitio
- Couchbase
- Basho
- DataStax
- Hortonworks
- Fractal Analytics
- MapR
- ParAccel (nb: ParAccel has already been acquired by Actian)
- Guavus
- Alteryx
- 10gen
- 1010data
- Actian
- Cloudera
- Palantir
- Mu Sigma
- Opera Solutions
- Splunk
- Sisense
- RainStor
- Calpont
- Think Big Analytics
- Aerospike
- Digital Reasoning
The big data market is still taking shape. But soon (not very soon, though), we’ll see some clear segments with leaders and challengers. And then… then we will see a lot of acquisitions and mergers.
via: http://www.forbes.com/special-report/2013/industry-atlas.html
I expect ParAccel to fail too - Curt Monash
Curt Monash on his prediction that ParAccel will fail:
Reasons include:
- ParAccel’s small market share and traction.
- The disruption of any acquisition like this one.
- My general view of Actian as a company.
One thing to keep in mind: ParAccel also has Amazon as an investor, and its technology is in production behind the Amazon Redshift service. That’s a lot of visibility.
via: http://www.dbms2.com/2013/04/25/goodbye-vectorwise-farewell-paraccel/
Tuesday, 21 May 2013
memcached turns 10 years old
Ars Technica:
This week, memcached, a piece of software that prevents much of the Internet from melting down, turns 10 years old. Despite its age, memcached is still the go-to solution for many programmers and sysadmins managing heavy workloads. Without memcached, Ars Technica would likely be unable to serve this article to you at all.
According to a commenter on HN, memcached’s 10th birthday would actually be tomorrow (May 22nd). I’m pretty sure it will outlive today.
✚ I’ve left a comment myself on HN in which I express my admiration for obvious tools that have changed the face of software development (e.g. memcached, JUnit, etc.)
The MEAN Stack: MongoDB, ExpressJS, AngularJS and Node.js
MongoDB, ExpressJS, AngularJS and Node.js: the MEAN stack or, as the first commenter on the post called it, “the hipster stack”:
A few weeks ago, a friend of mine asked me for help with PostgreSQL. As someone who’s been blissfully SQL-free for a year, I was quite curious to find out why he wasn’t just using MongoDB instead.
It’s all roses on the way to MongoDB.
via: http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and
Apache Hive 0.11: Stinger Phase 1 Delivered
Owen O’Malley on Hortonworks’ blog:
As representatives of this open, community led effort we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. This substantial release embodies the work of a wide group of people from Microsoft, Facebook, Yahoo, SAP and others. Together we have addressed 386 JIRA tickets, of which there were 28 new features and 276 bug fixes. There were FIFTY-FIVE developers involved in this and I would like to thank every one of them.
This is indeed the power of open. But don’t forget that too much bragging might diminish it: keep repeating a word and its value will slowly vanish.
via: http://hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/
6 Key Hardware Considerations for Deploying Hadoop in Your Environment
To deploy, configure, manage and scale Hadoop clusters in a way that optimizes performance and resource utilization there is a lot to consider.
The 6 aspects presented in the post are OS, MapReduce slots available across nodes, memory, storage, capacity, and network. It would be a lot more useful to rank these by the kinds of workloads the Hadoop cluster will have to run.
10 questions to ask when hosting your database on AWS
Dharshan Rangegowda, founder of Scalegrid, posted a list of 10 questions that should be answered before hosting your MongoDB on AWS. These are generic enough to apply to any database-on-AWS setup. They cover aspects like HA, backup and restore, monitoring, and basic security. If you haven’t done this before, save them as a quick checklist.
✚ Just because you set up HA and backups, it doesn’t mean they’ll actually work when you need them. Test them over and over again. Make it part of your regular procedures.
via: http://blog.mongodirector.com/10-questions-to-ask-and-answer-when-hosting-mongodb-on-aws/
Monday, 20 May 2013
Hadoop, Security, and DataStax Enterprise
But the eWeek article demonstrates that the same concerns [nb: about security] exist where Hadoop implementations are concerned. The article says: “It [Hadoop] was not written to support hardened security, compliance, encryption, policy enablement and risk management.”
The story goes like this: in the early days of NoSQL, when no NoSQL database had any sort of security features, people behind the projects answered: “it’s too early. we’re focusing on more important features. and you can still get around security by placing your database behind firewalls”. Today, when more and more NoSQL databases are adding security features, the story these same people are telling is quite different: “ohhh, security is critical. we don’t really see how you could run a database without these features”.
Security is always critical. And exactly the same can be said about maintaining a solid, coherent story of what you are telling your users.
via: http://www.datastax.com/2013/04/hadoop-security-and-the-enterprise
The Master-Slave Architecture of HBase
Fantastic post by Matteo Bertozzi looking at HBase’s master-slave architecture:
At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.
via: https://blogs.apache.org/hbase/entry/hbase_who_needs_a_master
Neo4j Blog: Reloading my Beergraph - using an in-graph-alcohol-percentage-index
Rik Van Bruggen on data modeling in Neo4j:
One of the things that spurred the discussion was - probably not coincidentally - the AlcoholPercentage. Many people were expecting that to be a property of the Beerbrand - but instead in my beergraph, I had “pulled it out”. The main reason at the time was more coincidence than anything else, but when you think of it - it’s actually a fantastic thing to “pull things out” and normalise the data model much further than you probably would in a relational model. By making the alcoholpercentage a node of its own, it allowed me to do more interesting queries and pathfinding operations - which led to interesting beer recommendations. Which is what this is all about, right?
I can see where this is going, but I’m not sure I agree it’s the right approach. Basically, it works in this case because the domain of the field is both discrete and small. Ideally, though, what you’d actually want is an index that could give you nodes that are “close to some value” (e.g. “give me the beers in the 6.9-7.1 range”).
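To make the “close to some value” idea concrete, here is a minimal sketch in plain Python. It is not Neo4j code and not how Neo4j implements indexes; the RangeIndex class, its between method, and the beer names and percentages are all made up for illustration. It only shows the general technique: keep the values sorted and answer a range lookup with two binary searches.

```python
from bisect import bisect_left, bisect_right


class RangeIndex:
    """Toy sorted index over a numeric property (hypothetical, not a Neo4j API)."""

    def __init__(self, items):
        # items: iterable of (value, name) pairs, sorted once by value
        self._entries = sorted(items)
        self._keys = [value for value, _ in self._entries]

    def between(self, low, high):
        """Return the names whose value lies in the closed range [low, high]."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return [name for _, name in self._entries[start:end]]


# Made-up sample data: (alcohol percentage, beer name)
beers = [
    (5.2, "Beer A"),
    (6.9, "Beer B"),
    (7.0, "Beer C"),
    (7.1, "Beer D"),
    (8.5, "Beer E"),
]

index = RangeIndex(beers)
print(index.between(6.9, 7.1))
```

With an index like this, the percentage can stay a plain property of the beer and still support range lookups. Pulling each distinct percentage out into its own node only gives you exact-match buckets.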
via: http://blog.neo4j.org/2013/05/reloading-my-beergraph-using-in-graph.html
