Michael W. Kearney’s rtweet package makes analyzing tweets from Twitter quite easy. The code I use below is only a slight modification of the code presented on his GitHub page for the package. My comments begin with a single #, and comments I copied from Michael’s code begin with two (##).
What we can all do on this Christmas Day in short: Install R, install the rtweet package, and then edit his example code to fly away into analytical glory.
“Glory” is too strong a word for this particular small-n analysis of #goofyreligion activity over the last 9 completed days (December 16 – December 24, 2017).
Here is my imperfect recollection of the #goofyreligion origins, as shared by developers, conference coordinators, and speakers Dave Rael and Reid Evans during the December 11, 2017 Developer on Fire podcast (hosted by Dave): they were talking with Jose Gonzalez at the Kansas City Developer’s Conference in early 2017 about the value of remaining physically active despite responsibilities and intensely cerebral work. I have a vague sense of other details, but rather than risk being wrong or spend Christmas Day listening again, I will point you to that fine podcast to learn more.
Pablo Rivera may have immediately joined while still in Puerto Rico and remains a participant now that he lives in Atlanta. I joined them at some point, perhaps around when Pavneet Singh Saund joined. And this week, it looks like someone named Jon Hider, who loves Iowa State and swimming and running long distances, joined as well.
Jon did not use #goofyreligion, so he will not appear in this analysis.
You need not have Jon’s icy daredevil spirit to join us. Just tweet something expressing that you engaged in physical activity of some kind and use #goofyreligion. We are an inoffensive group, please believe us. Please do not be bothered by our hashtag. Please.
Let’s see what rtweet can tell us about the latest #goofyreligion activity. First, load the data after following the rtweet installation instructions.
library(rtweet)  # assumes the package is installed and Twitter API access is set up

rt <- search_tweets(
  "#goofyreligion", n = 200, include_rts = FALSE
)
head(rt)
Sixty-eight columns, wow.
A quick search of the Interwebs did not reveal the time zone of the created_at field, so let’s look at my tweets:
The field is in UTC (Coordinated Universal Time), which Bing shows me is 5 hours ahead of my current Eastern Standard Time. Looking at the output, the latest time of 23:28:58 is 6:28 PM Eastern and the earliest of 12:14:50 is 7:14 AM Eastern. Both of these are reasonable times for me to be tweeting.
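The by-hand conversion can be double-checked in base R. This is only a minimal sketch: the two times of day are the ones quoted above, but the date is made up for illustration, and "America/New_York" is my assumption for the Eastern zone name.

```r
# Times of day come from the post; the date is a made-up placeholder.
utc_times <- as.POSIXct(
  c("2017-12-20 23:28:58", "2017-12-20 12:14:50"),
  tz = "UTC"
)

# Render the same instants in Eastern time (EST in December, UTC - 5).
eastern <- format(utc_times, "%H:%M:%S", tz = "America/New_York")
print(eastern)
```

Unlike subtracting a fixed 5 hours, a named zone would also handle daylight saving time for tweets from other parts of the year.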
Next, a time series plot of tweets for all users of #goofyreligion:
# Count the rows, one row per tweet
tweetct <- nrow(rt)

# twitter API only captures between 6 and 9 days of Tweets
## plot time series of tweets
ts_plot(rt, by = "6 hours") +
  ggplot2::theme_minimal() +
  ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #goofyreligion Twitter statuses from past 9 days",
    subtitle = paste0("Twitter status (tweet) counts (n = ", tweetct, ") aggregated using 6-hour intervals"),
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
Working with only 27 tweets makes this a bit silly, but I observe that one tweet often triggers a second. It seems the #goofyreligion support group members inspire each other’s activity, or they simply happen to tweet at similar times. To examine this pattern further, let’s look at the times associated with more than one tweet.
# Adapted from Michael Kearney's GitHub page for ts_plot.R
library(dplyr)  # needed for %>%, group_by(), summarise(), n(), arrange(), filter()

## reformat time var
dtvar <- rt$created_at

# 6 hour interval
interval <- 3600 * 6
roundedtm <- floor(as.numeric(dtvar) / interval) * interval

## center so value is interval mid-point
roundedtm <- roundedtm + round(interval * .5, 0)

## Subtract 5 hours so we are in Eastern Standard Time, familiar to me
roundedtm <- roundedtm - (5 * 3600)

## return to date-time
roundedtm <- as.POSIXct(roundedtm, tz = "UTC", origin = "1970-01-01")

# sanity check - remembering that I subtracted 5 hours
head(dtvar)
head(roundedtm)

# Looks good so we add the column to the data frame and focus on
# times associated with more than one tweet
rt$roundedtm <- roundedtm
rt %>%
  group_by(roundedtm) %>%
  summarise(ct = n()) %>%
  arrange(desc(ct)) %>%
  filter(ct > 1)
These times are centered at the 6-hour midpoint, so we see #goofyreligion was most active from 7 AM Eastern to 7 PM Eastern, and particularly from 7 AM to 1 PM. I am typing this post shortly before my own tweet, and it is about 9:30 PM Eastern on Christmas night. The pattern should hold (i.e., the 6-hour block that includes my upcoming tweet will include only my tweet).
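To make the midpoint arithmetic concrete, here is a small base R sketch of the same binning idea. The helper name bin_midpoint is hypothetical (it is not part of rtweet), and the timestamps reuse the two times of day mentioned earlier with a made-up date.

```r
# Hypothetical helper: snap each timestamp to the start of its
# fixed-width block (counted from the Unix epoch), then move to the
# block's midpoint.
bin_midpoint <- function(x, interval_secs) {
  secs <- as.numeric(x)
  block_start <- floor(secs / interval_secs) * interval_secs
  as.POSIXct(block_start + interval_secs / 2,
             origin = "1970-01-01", tz = "UTC")
}

# Made-up date; times of day taken from the post.
tweets_utc <- as.POSIXct(
  c("2017-12-20 12:14:50", "2017-12-20 23:28:58"),
  tz = "UTC"
)

mid_utc <- bin_midpoint(tweets_utc, 6 * 3600)
mid_utc_label <- format(mid_utc, "%H:%M", tz = "UTC")
mid_est_label <- format(mid_utc, "%H:%M", tz = "America/New_York")
print(mid_utc_label)
print(mid_est_label)
```

A midpoint of 10:00 Eastern stands for the block from 7 AM to 1 PM, and 16:00 stands for 1 PM to 7 PM, which is how the block labels in the table should be read.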
Finally, let’s examine activity by user.
rt %>%
  dplyr::group_by(screen_name) %>%
  ts_plot("days", trim = 1L) +
  ggplot2::geom_point() +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.title = ggplot2::element_blank(),
    legend.position = "bottom",
    plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #goofyreligion tweets by user",
    subtitle = "Counts aggregated by day from December 16 - 24, 2017",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
Jose stars in this drab, unimodal distribution with a measly but modal two tweets on December 20th. This depiction belies the encouragement we often provide via tweeted replies. Our counts also overlap to the point of causing data loss in the plot. For example, we cannot tell whether Pavneet tweeted on December 22nd: neither a yellow dot nor a yellow line appears on that day.
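One way around the overplotting would be to tabulate the counts instead of drawing them. The sketch below uses base R on made-up toy data (the user names and timestamps are placeholders, not the real tweets); only the column names screen_name and created_at mirror the rtweet data used above.

```r
# Made-up toy data standing in for the real rt data frame.
toy <- data.frame(
  screen_name = c("user_a", "user_b", "user_a", "user_c", "user_b"),
  created_at = as.POSIXct(
    c("2017-12-20 12:00:00", "2017-12-20 13:30:00", "2017-12-20 22:15:00",
      "2017-12-22 08:05:00", "2017-12-22 09:40:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Reduce each timestamp to its UTC calendar day, then cross-tabulate
# users against days. Zero cells show who did not tweet on a given day.
toy$day <- as.Date(toy$created_at, tz = "UTC")
counts <- table(toy$screen_name, toy$day)
print(counts)
```

With the real data, the same two lines applied to rt would show every user's count for every day, with no points hidden behind each other.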
I would enjoy doing this again with more tweets. Have some fun and join us today. Listen to a post-run Jose thanking Reid for his encouragement: