DevOps

Tuesday, September 20, 2016

Zeppelin and Spark: Merge Multiple CSVs into Parquet

Introduction The purpose of this article is to demonstrate how to load multiple CSV files on an HDFS filesystem into a single Dataframe an...

Saturday, July 16, 2016

AWS: Syncing a Local Directory to an S3 Storage Bucket

The S3 PUT operation only supports uploading one object per HTTP request. This can be problematic when thousands (or even millions) of fi...
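Because each PUT carries only one object, a sync is cheaper when it skips files that have not changed. The sketch below shows one way to plan that. It walks a local directory and keeps only files that are missing from the bucket, differ in size, or are newer locally. These checks are similar in spirit to what `aws s3 sync` does. The `remote` mapping (key to size and last-modified time) is a hypothetical stand-in for a bucket listing, and `plan_uploads` is an illustrative name. Neither is part of any AWS API.

```python
from pathlib import Path

def plan_uploads(local_dir, remote, prefix=""):
    """Return (local_path, key) pairs that still need an upload.

    `remote` is a hypothetical bucket listing:
    object key -> (size_in_bytes, last_modified_epoch_seconds).
    """
    root = Path(local_dir)
    plan = []
    # Sort for a deterministic upload order; skip directories.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # S3 keys use forward slashes regardless of the local OS.
        key = prefix + path.relative_to(root).as_posix()
        st = path.stat()
        entry = remote.get(key)
        if entry is None:
            plan.append((path, key))      # not in the bucket yet
            continue
        size, mtime = entry
        if st.st_size != size or st.st_mtime > mtime:
            plan.append((path, key))      # changed locally since last upload
    return plan
```

Every pair left in the plan still needs its own upload request. Filtering first simply keeps the number of requests proportional to what changed, not to the size of the whole directory.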

Friday, July 15, 2016

OS X Terminal Recipes

Copy the first n files in a directory to a specified destination directory: $ find . -maxdepth 1 -type f | head -1000 | xargs -I {} mv {}...
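The recipe's heading says "copy", but the command shown uses `mv`, which moves files. The following is a minimal Python sketch of the copy variant, assuming a hypothetical helper named `copy_first_n`. Unlike `find`, which returns files in directory order, it sorts file names so that "first n" is repeatable.

```python
import shutil
from pathlib import Path

def copy_first_n(src, dest, n):
    """Copy the first n files (sorted by name) from src into dest."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Roughly `find . -maxdepth 1 -type f`: top-level files only, no recursion.
    files = sorted(p for p in Path(src).iterdir() if p.is_file())
    copied = []
    for path in files[:n]:                          # like `head -n`
        shutil.copy2(path, dest_dir / path.name)    # shutil.move would mirror `mv`
        copied.append(path.name)
    return copied
```

Swapping `shutil.copy2` for `shutil.move` gives the behavior of the `mv` command in the recipe.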

Wednesday, July 13, 2016

Exposing a Python App via Django using Vagrant

Introduction I have a Python application. Its inner workings are complex, but the I/O is simple. Text input (application/text) com...

Tuesday, July 12, 2016

Zeppelin and Spark: Transforming a CSV to Parquet

Transform a CSV file to Parquet Format Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Parq...

Friday, June 10, 2016

Zeppelin and Spark: Finding Associated Hashtags

Introduction I'm finding that eBay-related spam accounts for nearly 5% of all the tweets I'm analyzing. The @eBay username is a g...
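The sketch below shows one way to find hashtags associated with a mention. Outside Spark, it counts the hashtags that appear in tweets containing a given username. It assumes the tweets are plain strings and uses a naive, case-insensitive substring match for the mention, so `@ebay` would also match `@ebayseller`. The function name `associated_hashtags` and the regex are illustrative. They are not the post's actual Zeppelin code.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def associated_hashtags(tweets, mention="@ebay", top=5):
    """Most common hashtags in tweets that contain `mention` (case-insensitive)."""
    counts = Counter()
    needle = mention.lower()
    for text in tweets:
        lowered = text.lower()
        if needle in lowered:
            # set() so a tag repeated within one tweet is counted once
            counts.update(set(HASHTAG.findall(lowered)))
    return counts.most_common(top)
```

In a Spark notebook, the same per-tweet logic would typically run inside a `flatMap` followed by a count per hashtag.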

Zeppelin and Spark: Ad Hoc Twitter Feedback Analysis

Introduction A convenient screen in Zeppelin for performing ad hoc analysis on Twitter data for entity (brand or show) mentions. Paragr...

About Me

Craig Trim
