
Data processing command line-style

Very often I jump straight to Python for any sort of data processing, totally forgetting about the powerful tools available on pretty much every Linux/Mac box1.

Jeroen Janssens’s 7 command-line tools for data science presents command-line tools for fetching, filtering, and transforming data: jq, json2csv, csvkit, scrape, xml2json, and sample.
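
To give a flavor of the first of these: fetching JSON from an API and reshaping it takes a couple of lines with jq. A minimal sketch; the URL and the name/city fields here are hypothetical:

```sh
# Fetch a JSON array from a hypothetical API and print one field per record.
# jq's '.[]' iterates over the array; '.name' extracts a field from each object.
curl -s 'https://api.example.com/users' | jq -r '.[].name'

# Reshape the same objects into CSV rows, still with jq alone.
curl -s 'https://api.example.com/users' | jq -r '.[] | [.name, .city] | @csv'
```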

Then Leonardo Trabuco’s Working with data on the command line gives a quick roundup of the standard Linux tools: head, tail, less, awk, cut, sort, uniq, wc, grep, shuf.
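
These compose well in pipelines. A quick sketch, assuming a hypothetical tab-separated file data.tsv, that counts the most frequent values in its second column:

```sh
# cut pulls out column 2 (tab is cut's default delimiter); sort groups identical
# values so that uniq -c can count them; sort -rn | head shows the top ten.
cut -f2 data.tsv | sort | uniq -c | sort -rn | head -n 10

# Handy sanity checks along the way:
head -n 5 data.tsv   # peek at the first rows
wc -l data.tsv       # how many records in total?
shuf -n 3 data.tsv   # look at a few random rows
```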

If you understand the philosophy of Linux tools and get familiar with some of the tools listed above (I’ve never gotten too deep into awk, and sed almost always tricks me), you’ll be able to do some nice data processing experimentation directly from the command line.
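
That said, even shallow awk and sed go a long way. Two one-liners as a sketch, assuming a whitespace-delimited data.txt with a numeric third column:

```sh
# awk: average a numeric column without leaving the shell.
awk '{ sum += $3; n++ } END { if (n) print sum / n }' data.txt

# sed: stream editing, e.g. stripping trailing whitespace from every line.
# [[:space:]] is POSIX, so this works with both GNU and BSD sed.
sed 's/[[:space:]]*$//' data.txt > cleaned.txt
```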


  1. The one excuse I usually find for myself when doing this is that debugging the behavior of command-line tools is not as pleasant as debugging a Python script. _Sort of an OK argument, but still an excuse._

Original title and link: Data processing command line-style (NoSQL database © myNoSQL)