Analyzing #goofyreligion with #rstats and rtweet (Twitter API R package)

Michael W. Kearney’s rtweet package makes analyzing tweets from Twitter quite easy. The code I use below is only a slight modification of the code presented on his GitHub page for the package. My comments appear with a single # and comments I copied from Michael’s code have two ##.

What we can all do on this Christmas Day in short: Install R, install the rtweet package, and then edit his example code to fly away into analytical glory.

“Glory” is too strong a word for this particular small n analysis.

Let’s analyze #goofyreligion activity over the last 9 completed days (December 16 – December 24, 2017).

My imperfect recollection of the #goofyreligion origins, as shared by developers, conference coordinators, and speakers Dave Rael and Reid Evans during the December 11, 2017 Developer on Fire podcast (hosted by Dave): They were talking with Jose Gonzalez at the Kansas City Developer’s Conference in early 2017 about the value of remaining physically active despite responsibilities and intensely cerebral work. I have a vague sense of other details, but rather than risk being wrong or invest the time on Christmas Day to listen again, I will point you to that fine podcast to learn more.

Pablo Rivera may have immediately joined while still in Puerto Rico and remains a participant now that he lives in Atlanta. I joined them at some point, perhaps around when Pavneet Singh Saund joined. And this week, it looks like someone named Jon Hider, who loves Iowa State and swimming and running long distances, joined as well.

[Image: jonhider]

Jon did not use #goofyreligion so he will not appear in this analysis.

You need not have Jon’s icy daredevil spirit to join us. Just Tweet something expressing that you engaged in physical activity of some kind and use #goofyreligion. We are an inoffensive group, please believe us. Please do not be bothered by our hashtag. Please.

Let’s see what rtweet can tell us about the latest #goofyreligion activity. After following the rtweet installation instructions, we load the data.

library(rtweet)
rt <- search_tweets(
  "#goofyreligion", n = 200, include_rts = FALSE
)
head(rt)

Sixty-eight columns, wow.

A quick web search did not reveal the time zone of the created_at field, so let’s look at my tweets:

rt[rt$screen_name=='rick_pack2',]$created_at

[Image: timezone]

The times are in UTC (Coordinated Universal Time), which Bing shows me is 5 hours ahead of my current Eastern Standard Time. Looking at the output, the latest time of 23:28:58 is 6:28 PM and the earliest of 12:14:50 is 7:14 AM. Both of these are reasonable times for me to be tweeting.
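A quick way to double-check that arithmetic is to let R do the conversion. This is a minimal sketch in base R, not part of Michael’s code. The two clock times are the ones quoted above, but the date attached to them (December 24, 2017) is my assumption for illustration, and "America/New_York" is the time zone name I am assuming for Eastern time.

```r
# The two UTC clock times quoted above; the date is a made-up placeholder
utc_times <- as.POSIXct(
  c("2017-12-24 23:28:58", "2017-12-24 12:14:50"),
  tz = "UTC"
)

# Display the same instants in Eastern time
eastern <- format(utc_times, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
print(eastern)
```

On the real data, the same format() call could presumably be applied to rt$created_at directly, which avoids subtracting hours by hand.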

Next, we plot a time series of tweets for all users of #goofyreligion.

# Count the rows, one row per tweet
tweetct <- nrow(rt)
# The Twitter API only captures between 6 and 9 days of tweets
## plot time series of tweets
ts_plot(rt, by = "6 hours") +
  ggplot2::theme_minimal() +
  ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #goofyreligion Twitter statuses from past 9 days",
    subtitle = paste0("Twitter status (tweet) counts (n = ", tweetct, ") aggregated using 6-hour intervals"),
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )

[Image: rtweet_timeseries1]

Working with only 27 tweets makes this a bit silly but I observe that one tweet often triggers a second. It seems the #goofyreligion support group members inspire each other’s activity or they happen to often tweet at similar times. To examine this pattern further, let’s look at the times associated with more than one tweet.

# Adapted from Michael Kearney's GitHub page for ts_plot.R
## reformat time var
dtvar <- rt$created_at
# 6 hour interval
interval  <- 3600 * 6
roundedtm <- floor(as.numeric(dtvar) / interval) * interval
## center so value is interval mid-point
roundedtm <- roundedtm + round(interval * .5, 0)
## Subtract 5 hours so we are in Eastern Standard Time, familiar to me
roundedtm <- roundedtm - (5*3600)

## return to date-time
roundedtm <- as.POSIXct(roundedtm, tz = "UTC", origin = "1970-01-01")
# sanity check - remembering that I subtracted 5 hours
head(dtvar)
head(roundedtm)
# Looks good so we add the column to the data frame and focus on
# times associated with more than one tweet
rt$roundedtm <- roundedtm
library(dplyr)  # provides %>%, group_by(), summarise(), arrange(), filter()
rt %>%
    group_by(roundedtm) %>%
    summarise(ct = n()) %>%
    arrange(desc(ct)) %>%
    filter(ct > 1)

[Image: rtweet_timeseries2]

These are centered at the 6-hour midpoint, so we see #goofyreligion was most active from 7 AM Eastern to 7 PM Eastern, and particularly from 7 AM to 1 PM. I am typing this blog post shortly before my tweet, and it is about 9:30 PM Eastern on Christmas night. The pattern should hold (i.e., the 6-hour block inclusive of my upcoming tweet will include only my tweet).
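To make the rounding arithmetic concrete, here is a small worked example in base R on a single tweet time. It is only a sketch: the timestamp is hypothetical (not from the real data), and instead of subtracting 5 hours by hand it displays the midpoint in the "America/New_York" time zone, which is my substitution rather than what the code above does.

```r
# Worked example of the 6-hour binning, using one hypothetical tweet time
interval <- 3600 * 6

# Hypothetical timestamp (not from the real data)
tweet_time <- as.POSIXct("2017-12-20 13:45:00", tz = "UTC")

# Start of the 6-hour block containing the tweet
# (blocks start at 00:00, 06:00, 12:00, and 18:00 UTC)
bin_start <- floor(as.numeric(tweet_time) / interval) * interval
# Midpoint of that block
bin_mid <- bin_start + interval / 2

# Convert the numeric seconds back to date-times
bin_start_utc <- as.POSIXct(bin_start, origin = "1970-01-01", tz = "UTC")
bin_mid_utc <- as.POSIXct(bin_mid, origin = "1970-01-01", tz = "UTC")

# Display the midpoint in Eastern time
bin_mid_eastern <- format(bin_mid_utc, "%Y-%m-%d %H:%M", tz = "America/New_York")

print(bin_start_utc)
print(bin_mid_utc)
print(bin_mid_eastern)
```

A tweet at 13:45 UTC lands in the 12:00 to 18:00 UTC block, whose midpoint is 15:00 UTC, or 10 AM Eastern. That midpoint stands for the 7 AM to 1 PM Eastern block.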

Finally, let’s examine activity by user.

rt %>%
  dplyr::group_by(screen_name) %>%
  ts_plot("days", trim = 1L) +
  ggplot2::geom_point() +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.title = ggplot2::element_blank(),
    legend.position = "bottom",
    plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #goofyreligion tweets by user",
    subtitle = "Counts aggregated by day from December 16 - 24, 2017",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
 

[Image: rtweet_user_day]

Jose stars in this drab, unimodal distribution with a measly but modal two tweets on December 20th. This depiction belies the encouragement we often provide via tweeted replies. Our counts overlap to the point of causing data loss in this depiction. For example, we cannot tell if Pavneet tweeted on December 22nd. Neither a yellow dot nor a yellow line appears on that day.
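One way around the overlap might be a plain table of counts by user and day, where nothing can hide behind anything else. The sketch below uses base R only and a made-up data frame standing in for rt: the screen names and timestamps are invented for illustration, and counting days in Eastern time is my assumption.

```r
# Hypothetical stand-in for rt; screen names and times are invented
tweets <- data.frame(
  screen_name = c("user_a", "user_b", "user_a", "user_c"),
  created_at = as.POSIXct(
    c("2017-12-20 13:00:00", "2017-12-20 15:30:00",
      "2017-12-22 12:10:00", "2017-12-22 23:50:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Calendar day of each tweet, measured in Eastern time
tweets$day <- as.Date(tweets$created_at, tz = "America/New_York")

# Rows are users, columns are days, cells are tweet counts
counts <- table(tweets$screen_name, tweets$day)
print(counts)
```

A zero in a cell then says plainly that a user did not tweet with the hashtag that day, which the overlapping lines cannot show.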

I would enjoy doing this again with more tweets. Have some fun and join us today. Listen to a post-run Jose thanking Reid for his encouragement:

 
