Friday, June 10, 2016

Zeppelin and Spark: Ad Hoc Twitter Feedback Analysis

Introduction

This post walks through a convenient Zeppelin notebook for ad-hoc analysis of Twitter data for entity (brand or show) mentions. It contains the following paragraphs:
  1. Load data in Parquet and view schema
  2. View breakdown by language
  3. View tweet distribution by Twitter handle
  4. View tweet distribution by search keywords

Ad-Hoc Analysis in Zeppelin



Loading the data

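%pyspark

import locale
# Set the process locale; 'en_US' must be installed on the host, otherwise setlocale raises locale.Error
locale.setlocale(locale.LC_ALL, 'en_US')

# Location of the tweets stored as Parquet
path = "/data/output/parquet/tweets"

# Load the Parquet data and register it as a temp table so the %sql paragraphs can query it
df_fb = sqlContext.read.parquet(path)
df_fb.registerTempTable("df_fb")

# Preview a single row; the header shows the column names
df_fb.limit(1).show()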


Output:
+----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+
|    userid|         posted_time|category|       parent|       entity|           tweetid|             content|lang|search_object|       thandle|
+----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+
|2217980506|2014-01-27 22:27:...|    show|grammy awards|grammy awards|427930900505329665|@DionneJames are ...|  en|      grammys|louisemorse123|
+----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+



Breakdown by Language

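This paragraph breaks the tweets for an entity down by language. Zeppelin's built-in pie chart gives a quick sense of how the tweets are distributed across the languages involved. The ${entity} placeholder is a Zeppelin dynamic form, so the entity name is typed into a text box above the paragraph.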

%sql

select
    lang, count(lang)
from
    df_fb
where
    entity=lower('${entity}')
group by
    lang
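
To see what this query returns without a Spark cluster, the sketch below runs the same aggregation with Python's built-in sqlite3 module against a handful of made-up rows. The sample rows, the in-memory table, and the lang_breakdown helper are assumptions for illustration only; the ${entity} form value is passed as a bound parameter, and an ORDER BY is added so the result order is predictable.

```python
import sqlite3

# Hypothetical sample rows (entity, lang); not taken from the real dataset.
SAMPLE_ROWS = [
    ("grammy awards", "en"),
    ("grammy awards", "en"),
    ("grammy awards", "es"),
    ("friends", "en"),
]

def lang_breakdown(entity):
    """Count tweets per language for one entity, mirroring the %sql paragraph."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table df_fb (entity text, lang text)")
    conn.executemany("insert into df_fb values (?, ?)", SAMPLE_ROWS)
    rows = conn.execute(
        # lower(?) plays the role of lower('${entity}') in the Zeppelin paragraph
        "select lang, count(lang) as total from df_fb "
        "where entity = lower(?) group by lang order by total desc, lang",
        (entity,),
    ).fetchall()
    conn.close()
    return rows

print(lang_breakdown("Grammy Awards"))  # [('en', 2), ('es', 1)]
```

Because the query lowercases the form value, the entity can be typed in any case as long as the stored entity names are lowercase, as they are in the sample row shown earlier.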





Distribution by Twitter Handle

This paragraph shows how tweets are distributed across Twitter handles. In use cases where only buzz from individuals is wanted, it helps to spot spam, official, and fan accounts quickly.

%sql

select
    thandle, count(thandle) as total
from
    df_fb
where
    entity=lower('${entity}')
and
    lang=lower('${lang}')
group by
    thandle
order by
    total desc




This paragraph can be further refined by adding one more condition to the WHERE clause:
and
    lower(thandle) like lower('%${entity}%')


This outputs the distribution across all Twitter handles that contain the entity name.

It's not unlikely that a hard-core fan of a show or brand would make that name part of their own username. Even so, if the use case requires separating individual buzz from corporate or spam buzz, this is an easy way to identify the most active of those handles (e.g. the top 10).
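
The same split can be sketched in plain Python once the handle counts have been collected. This is a minimal sketch under stated assumptions: the split_by_entity_name helper and the sample counts are hypothetical, and the entity name has its whitespace removed before matching, because a multi-word entity such as "grammy awards" contains a space and Twitter handles cannot. The LIKE condition above does not do this stripping.

```python
def split_by_entity_name(handle_counts, entity):
    """Split (handle, total) rows by whether the handle contains the entity name."""
    # Assumption: drop whitespace from the entity, since handles cannot contain spaces.
    needle = "".join(entity.lower().split())
    matching, others = [], []
    for handle, total in handle_counts:
        target = matching if needle in handle.lower() else others
        target.append((handle, total))

    def by_total(row):
        return row[1]

    # Largest totals first, like "order by total desc" in the query.
    return (sorted(matching, key=by_total, reverse=True),
            sorted(others, key=by_total, reverse=True))

# Hypothetical counts for illustration only.
counts = [("GrammyAwardsFan", 40), ("louisemorse123", 3), ("grammyawards_news", 120)]
matching, others = split_by_entity_name(counts, "grammy awards")
print(matching)  # [('grammyawards_news', 120), ('GrammyAwardsFan', 40)]
print(others)    # [('louisemorse123', 3)]
```

Handles in the first list still need a manual look, since the name alone cannot tell a fan account from an official or spam one.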


Distribution by Search Objects

The search patterns used to find the tweets can be analyzed with a similar distribution. If a given search object is wrong or ambiguous (like using "friends" to find mentions of the sitcom Friends), this distribution shows the impact. It's worth quickly investigating the top of the distribution for evidence of ambiguity or of anything that was used incorrectly.

%sql

select
    search_object, count(search_object) as total
from
    df_fb
where
    entity=lower('${entity}')
and
    lang=lower('${lang}')
group by
    search_object
order by
    total desc
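
One way to read this distribution is to turn the totals into shares, so that a single search object dominating the results stands out. The following is a minimal sketch, assuming the query results have been collected as (search_object, total) pairs; the with_share helper and the sample totals are hypothetical.

```python
def with_share(search_counts):
    """Attach each search object's percentage share of all tweets, largest first."""
    grand_total = sum(total for _, total in search_counts)
    if grand_total == 0:
        return []  # nothing to rank
    ranked = sorted(search_counts, key=lambda row: row[1], reverse=True)
    return [(name, total, round(100.0 * total / grand_total, 1))
            for name, total in ranked]

# Hypothetical totals for illustration only.
sample = [("grammy awards", 50), ("grammys", 700), ("#grammys", 250)]
for name, total, share in with_share(sample):
    print(name, total, share)
# grammys 700 70.0
# #grammys 250 25.0
# grammy awards 50 5.0
```

A search object with an unexpectedly large share is a good candidate for a manual check of a few matching tweets.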


