Words from this week’s sponsor, Pervasive/Actian:
So, you want to pull a buttload (That’s a technical term.) of data out of a relational database and slam it into HDFS or HBase for processing. Well, maybe you’ve got a nice, powerful Hadoop cluster, but that old-school database isn’t designed for parallel data exports. How do you get the data moved into Hadoop before you’re eligible for retirement?
Use the new Actian RushLoader. It’s a nice, simple, free tool that allows you to pull data from any database that has a JDBC driver, as well as from log files, delimited files, HBase and ARFF files. RushLoader runs on any operating system with a JVM and with any file system, including Amazon S3, UNIX and HDFS.
The nice thing about RushLoader is that on the surface, it’s a quick and easy, point and click workflow tool, a cut down version of the KNIME open source data mining platform. Under the covers, it uses the DataRush engine that divides and optimizes workloads at runtime, so it takes full advantage of as much parallel hardware power as you give it, without you having to do any coding work to make it happen.
Configure the data query in the RushLoader database reader like this (t = a table name, c = a column name):

SELECT * FROM t WHERE c = ?

Then set up a parameter query for the ? placeholder like this:

SELECT DISTINCT c FROM t
This setup gives you all the distinct values in the column and sends a separate query to the database for each value. Splitting the work into per-value queries lets the DataRush engine automatically spread it across the available machines and threads, giving you a high-speed parallel data pull. There’s more info on parameter queries in the DataRush docs, and the new Actian big data community provides a DataRush toolset discussion forum if you run into trouble.
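The mechanics behind this pattern can be sketched in plain Python. This is only an illustration of what the DataRush engine does for you automatically: sqlite3 stands in for the JDBC source, the table and column names are made up, and a thread pool stands in for DataRush’s runtime partitioning.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB = "source.db"  # hypothetical stand-in for the relational source

def setup():
    # Build a toy table standing in for the relational source.
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS t (c TEXT, payload INTEGER)")
    conn.execute("DELETE FROM t")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [("a", 1), ("a", 2), ("b", 3), ("c", 4)])
    conn.commit()
    conn.close()

def pull_partition(value):
    # The "data query": one SELECT per distinct value, run in parallel.
    conn = sqlite3.connect(DB)
    rows = conn.execute("SELECT * FROM t WHERE c = ?", (value,)).fetchall()
    conn.close()
    return rows

setup()

# The "parameter query": one partition key per distinct value of the column.
conn = sqlite3.connect(DB)
values = [v for (v,) in conn.execute("SELECT DISTINCT c FROM t")]
conn.close()

# Fan the per-value queries out across threads, the way DataRush
# spreads them across threads and machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    partitions = list(pool.map(pull_partition, values))

total = sum(len(p) for p in partitions)
print(total)  # 4 rows pulled across 3 parallel partitions
```

The point of the split is that each partition is an independent query, so nothing stops them from running concurrently against a source that can serve multiple connections, which is exactly why the single big export stops being the bottleneck.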
The free RushLoader includes simple row and column filtering. If you want to get any more sophisticated about the load (data quality checks, aggregations, sorting, source joins, lookups, that sort of thing), you have to move up to the commercial version, RushAnalytics. But if all you need is a lot of data pulled from an RDBMS and slammed into Hadoop, RushLoader can do the job faster by far than anything else on the market.
Original title and link: 3 Steps for a Fast Relational Database to Hadoop Data Load [Sponsor] ( ©myNoSQL)
This week’s sponsor doesn’t have a specific message. But I do have one for them.
The people behind this roadshow are the fine folks from Trifork. They’ve been organizing JAOO, nowadays GOTO, for quite a while, and they’ve also been part of the QCon conferences. If you’ve ever been to any of these events, you’ll know immediately what I mean. I haven’t yet been to a NoSQL roadshow, but besides Berlin, Copenhagen, Zurich and Amsterdam, I’ve heard they’ll pass by San Francisco too. Most probably I’ll be there.
While the conference roster changes from event to event, I’m pretty sure you’ll get some of the best. Looking at Berlin, I can see Michael Hunger, Chris Molozian and Pavlo Baron.
If you know me and you really, really want to go to the event, drop me a line and I might be able to do something for you.
Original title and link: NoSQL Search Roadshow [Sponsor] ( ©myNoSQL)
Words from this week’s sponsor, Aerospike:
A new benchmark study evaluates Aerospike, Cassandra, Couchbase and MongoDB, and examines the benefits of using a NoSQL database with the ability to process transactions in the face of hardware or other node failures.
Original title and link: A new benchmark study evaluates Aerospike, Cassandra, Couchbase and MongoDB ( ©myNoSQL)
Words from this week’s sponsor, Aerospike:
Thumbtack Technology’s YCSB Benchmark shows Aerospike nearly 20x faster than Cassandra, Couchbase and MongoDB for consumer-facing applications that require extremely high throughput and low latency, and whose information can be represented using a key-value schema. Read it now!
Original title and link: YCSB Benchmark Shows Aerospike Nearly 10x Faster Than the Competition [Sponsor] ( ©myNoSQL)
Words from this week’s sponsor, Instaclustr:
On the 27th of February, Instaclustr, one of the first dedicated Apache Cassandra hosting platforms, left beta. Running on Amazon EC2 infrastructure, Instaclustr dramatically reduces the deployment and management pains associated with running a Cassandra cluster.
Here’s what you’d get with Instaclustr:
Totally managed: Instaclustr reduces the headaches associated with deploying and running a highly available Cassandra cluster. Deploy Cassandra in minutes, knowing that backups, monitoring, maintenance and tuning are all taken care of.
Fast: Cassandra clusters managed by Instaclustr will provide consistently lower latency operations, with greater throughput per dollar than DynamoDB, MongoDB and other managed NoSQL offerings.
Highly Available: Instaclustr deploys Cassandra on Amazon infrastructure, leveraging geographically distinct availability zones and on-demand instances to ensure your cluster is always available.
Low Cost: Instaclustr has an incredibly low total cost of ownership when compared to other managed NoSQL offerings and includes email support and proactive monitoring.
For more details check how Instaclustr works and sign up for an account.
Original title and link: Instaclustr - Cost Effective, High Performance Managed NoSQL Hosting [Sponsor] ( ©myNoSQL)
Words from this week’s sponsor:
On Monday, February 25, Greenplum, a division of EMC, introduced Pivotal HD: the world’s most powerful Hadoop distribution. Greenplum has spent the last two years building a new Hadoop platform that will leave the traditional database behind. Pivotal HD can store the massive amounts of information Hadoop was created to store, but it’s designed to ask questions of this data significantly faster than you can with the existing open source platform.
Greenplum is revamping Hadoop to operate more like a relational database, letting you rapidly ask questions of data using SQL, which has been a staple of the database world for decades. A team led by former Microsoft database designer Florian Waas has designed a new “query engine”, HAWQ, that can more quickly run SQL queries on data stored across a massive cluster of systems using the Hadoop File System. Compared to batch-oriented queries running against a Hadoop cluster, the combination of HDFS and HAWQ shows anywhere from 10x to 600x performance improvement. Here’s a link to a whitepaper about HAWQ, Pivotal HD’s parallel SQL engine for Hadoop (pdf).
Original title and link: Introducing Pivotal HD - the World’s Most Powerful Apache Hadoop Distribution [Sponsor] ( ©myNoSQL)