Introduction
A convenient screen in Zeppelin for performing ad-hoc analysis on Twitter data for entity (brand or show) mentions. Paragraphs:
- Load data in parquet and view schema
- View breakdown by language
- View tweet distribution by Twitter handle
- View tweet distribution by search keywords
Ad-Hoc Analysis in Zeppelin
Loading the data
    %pyspark
    import locale
    locale.setlocale(locale.LC_ALL, 'en_US')
    path = "/data/output/parquet/tweets"
    df_fb = sqlContext.read.parquet(path)
    df_fb.registerTempTable("df_fb")
    df_fb.limit(1).show()
Output:
    +----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+
    |    userid|         posted_time|category|       parent|       entity|           tweetid|             content|lang|search_object|       thandle|
    +----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+
    |2217980506|2014-01-27 22:27:...|    show|grammy awards|grammy awards|427930900505329665|@DionneJames are ...|  en|      grammys|louisemorse123|
    +----------+--------------------+--------+-------------+-------------+------------------+--------------------+----+-------------+--------------+
Breakdown by Language
This paragraph gives a breakdown by language. Zeppelin provides a useful pie chart to quickly get a sense of the distribution across the languages involved. The ${entity} placeholder is a Zeppelin dynamic form, so the entity name is typed into a text box above the paragraph.

    %sql
    select lang, count(lang)
    from df_fb
    where entity=lower('${entity}')
    group by lang
Output: a pie chart of tweet counts per language.
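To make the logic of the query explicit outside Zeppelin, here is a minimal plain-Python sketch of the same breakdown. The function name `language_breakdown` and the sample rows are assumptions made up for illustration; only the column names `entity` and `lang` come from the table above.

```python
from collections import Counter


def language_breakdown(rows, entity):
    """Count tweets per language for one entity, mirroring the %sql paragraph."""
    # The query lower-cases the user input and compares it to the stored entity.
    wanted = entity.lower()
    # count(lang) in SQL skips NULLs, so rows without a language add nothing.
    return Counter(
        row["lang"]
        for row in rows
        if row["entity"] == wanted and row["lang"] is not None
    )


# Hypothetical sample rows, not taken from the real data set.
sample_rows = [
    {"entity": "grammy awards", "lang": "en"},
    {"entity": "grammy awards", "lang": "es"},
    {"entity": "grammy awards", "lang": "en"},
    {"entity": "grammy awards", "lang": None},
    {"entity": "friends", "lang": "en"},
]

breakdown = language_breakdown(sample_rows, "Grammy Awards")
for lang, total in breakdown.most_common():
    print(lang, total)
```

The counts per language are what the pie chart visualizes; a language with an unexpectedly large slice is usually the first thing to look into.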
Distribution by Twitter Handle
This paragraph gives a sense of the distribution across Twitter handles. When the use case calls for buzz from individuals, spam, official, and fan accounts can be spotted quickly.

    %sql
    select thandle, count(thandle) as total
    from df_fb
    where entity=lower('${entity}') and lang=lower('${lang}')
    group by thandle
    order by total desc
Output: tweet counts per Twitter handle, in descending order.
This paragraph can be further refined by adding this condition to the WHERE clause:

    and lower(thandle) like lower('%${entity}%')

This outputs the distribution across all Twitter handles that contain the entity name.
A hard-core fan of a show or brand may well make that name part of their own username. Even so, if the use case requires differentiating individual buzz from corporate or spam buzz, this is an easy way to identify the most active matching handles (e.g. the top 10).
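The refinement and the top-10 check can be sketched in plain Python as follows. This is an illustration under assumptions: the helper names `handle_matches_entity` and `top_handles` and the sample handles are invented here, and the substring test only matches the LIKE condition when the entity contains no LIKE wildcards.

```python
from collections import Counter


def handle_matches_entity(handle, entity):
    """Case-insensitive containment, like lower(thandle) like lower('%<entity>%')."""
    # Assumes the entity contains no LIKE wildcards ('%' or '_').
    return entity.lower() in handle.lower()


def top_handles(handles, n=10):
    """Return the n most active handles and their share of all tweets given."""
    counts = Counter(handles)
    total = sum(counts.values())
    top = counts.most_common(n)
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# Hypothetical handles, one entry per tweet.
tweet_handles = [
    "GrammysFan01",
    "louisemorse123",
    "GrammysFan01",
    "TheGRAMMYs",
    "GrammysFan01",
    "musicnews",
]

lookalikes = [h for h in tweet_handles if handle_matches_entity(h, "grammys")]
top, share = top_handles(lookalikes, n=1)
print(top, share)
```

A high share concentrated in a few look-alike handles suggests official, fan, or spam accounts rather than individual buzz.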
Distribution by Search Objects
The search patterns used to find the tweets can be analyzed with a similar distribution. If a given search object is wrong or ambiguous (like using "friends" to find mentions of the sitcom Friends), this distribution can show the impact. It's worth quickly investigating the largest counts for evidence of ambiguity or of anything that might have been used incorrectly.

    %sql
    select search_object, count(search_object) as total
    from df_fb
    where entity=lower('${entity}') and lang=lower('${lang}')
    group by search_object
    order by total desc
Output: tweet counts per search object, in descending order.
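One way to investigate the largest counts is to read a few example tweets for each of the most frequent search objects. The sketch below does this in plain Python; the function name `inspect_search_objects` and the sample rows are hypothetical, and only the column names `search_object` and `content` come from the table shown earlier.

```python
from collections import Counter, defaultdict


def inspect_search_objects(rows, top_n=3, examples=2):
    """List the most frequent search objects with a few example tweets each."""
    counts = Counter(row["search_object"] for row in rows)
    samples = defaultdict(list)
    for row in rows:
        bucket = samples[row["search_object"]]
        # Keep only the first few tweets per search object for a quick read.
        if len(bucket) < examples:
            bucket.append(row["content"])
    return [
        (term, count, samples[term])
        for term, count in counts.most_common(top_n)
    ]


# Hypothetical rows showing how an ambiguous term can dominate the counts.
sample_tweets = [
    {"search_object": "friends", "content": "out with friends tonight"},
    {"search_object": "friends", "content": "rewatching Friends season 3"},
    {"search_object": "friends", "content": "my friends are the best"},
    {"search_object": "#friendstv", "content": "Ross and Rachel forever #friendstv"},
]

report = inspect_search_objects(sample_tweets, top_n=1, examples=2)
for term, count, tweets in report:
    print(term, count, tweets)
```

In this made-up sample, the bare term "friends" brings in the most tweets, and reading the examples shows that some of them are not about the show at all.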