DevOps
Tuesday, September 20, 2016
Zeppelin and Spark: Merge Multiple CSVs into Parquet
Introduction: The purpose of this article is to demonstrate how to load multiple CSV files stored in HDFS into a single DataFrame an...
12 comments:
Saturday, July 16, 2016
AWS: Syncing a Local Directory to an S3 Storage Bucket
The S3 PUT operation only supports uploading one object per HTTP request. This can be problematic when thousands (or even millions) of fi...
3 comments:
Friday, July 15, 2016
OS X Terminal Recipes
Copy the first n files in a directory to a specified destination directory: $ find . -maxdepth 1 -type f | head -1000 | xargs -I {} mv {}...
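The excerpt's command is truncated, and the visible part uses mv, which moves files rather than copying them. As a minimal sketch of a copying variant, assuming a POSIX shell and filenames without embedded newlines, the recipe could be wrapped in a small function. The name copy_first_n and its argument order are hypothetical, and the sort step is an added assumption to make "first" deterministic, since find does not guarantee any order.

```shell
#!/bin/sh
# Hypothetical helper: copy the first N regular files (sorted by path)
# from the top level of SRC into DEST. Source files are left in place.
copy_first_n() {
    src=$1
    dest=$2
    n=$3

    # Create the destination directory if it does not exist yet.
    mkdir -p "$dest"

    # -maxdepth 1 restricts the search to SRC itself (no subdirectories).
    # Reading line by line keeps filenames containing spaces intact.
    find "$src" -maxdepth 1 -type f | sort | head -n "$n" |
    while IFS= read -r f; do
        cp "$f" "$dest"/
    done
}
```

Calling copy_first_n . /path/to/dest 1000 (with /path/to/dest as a placeholder) would copy up to 1000 files from the current directory. Replacing cp with mv gives the moving behavior of the command shown in the excerpt.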
2 comments:
Wednesday, July 13, 2016
Exposing a Python App via Django using Vagrant
Introduction: I have a Python application. Its inner workings are complex, but the I/O is simple. Text input (application/text) com...
2 comments:
Tuesday, July 12, 2016
Zeppelin and Spark: Transforming a CSV to Parquet
Transform a CSV file to Parquet format: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem. Parq...
2 comments:
Friday, June 10, 2016
Zeppelin and Spark: Finding Associated Hashtags
Introduction: I'm finding that eBay-related spam accounts for nearly 5% of all the tweets I'm analyzing. The @eBay username is a g...
Zeppelin and Spark: Ad Hoc Twitter Feedback Analysis
Introduction: A convenient screen in Zeppelin for performing ad hoc analysis of Twitter data for entity (brand or show) mentions. Paragr...
7 comments: