DevOps

Tuesday, September 20, 2016

Zeppelin and Spark: Merge Multiple CSVs into Parquet

Introduction: The purpose of this article is to demonstrate how to load multiple CSV files on an HDFS filesystem into a single DataFrame an...
12 comments:
Saturday, July 16, 2016

AWS: Syncing a Local Directory to an S3 Storage Bucket

The S3 PUT operation only supports uploading one object per HTTP request. This can be problematic when thousands (or even millions) of fi...
3 comments:
Friday, July 15, 2016

OS X Terminal Recipes

Move the first n files (here, n = 1000) in a directory to a specified destination directory: $ find . -maxdepth 1 -type f | head -1000 | xargs -I {} mv {}...
2 comments:
Wednesday, July 13, 2016

Exposing a Python App via Django using Vagrant

Introduction: I have a Python application. Its inner workings are complex, but the I/O is simple. Text input (application/text) com...
2 comments:
Tuesday, July 12, 2016

Zeppelin and Spark: Transforming a CSV to Parquet

Transform a CSV file to Parquet format: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Parq...
2 comments:
Friday, June 10, 2016

Zeppelin and Spark: Finding Associated Hashtags

Introduction: I'm finding that eBay-related spam accounts for nearly 5% of all the tweets I'm analyzing. The @eBay username is a g...

Zeppelin and Spark: Ad Hoc Twitter Feedback Analysis

Introduction: A convenient screen in Zeppelin for performing ad hoc analysis on Twitter data for entity (brand or show) mentions. Paragr...
7 comments:

About Me

Craig Trim