Econometrics & Data Analysis

I’m using this section as a notebook compiling all data-oriented posts, at least until I figure out how to code sections filtered by tag.

Visualising UK Breathalyzer Data (29/09/19)

I had some fun today exploring the gov.uk Breathalyser data with some animated plots 👇

There seems to be a fairly robust pattern in the time & weekday at which failed breathalyser tests occur, with late at night on weekends unsurprisingly being the peak.

Beyond this it’s difficult to discern any trends over time within such a short period (t=5). That said, this is just one of the datasets in the collection so I’ll probably follow up with some proper analysis later.

One question I’m interested in is how police behaviour and procedure might affect the data. Essentially, we can’t take the data as a representative sample of the level of drunk driving at any given time, since this is only one side of the story. The other side is the process by which police initiate a test, which might have its own dynamics. For example, police might be less likely to expect a driver to be drunk at midday, and so initiate fewer tests, leading to the data exhibiting a pattern that reinforces that expectation. Shift patterns and the number of police on duty might also affect the number of tests conducted. If we examine the volume and relative success (fails:passes) of tests over time we might be able to assess these possibilities.
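As a rough sketch of that comparison, the fail rate and the fails:passes ratio could be computed per time slot. This assumes the data also record the total number of tests, which I haven't checked. The column names (`Hour`, `Tests`, `Fails`) and all the counts below are made up purely for illustration.

```r
# Toy data for illustration only: these counts are invented,
# not taken from the gov.uk dataset. Column names are hypothetical.
tests <- data.frame(
  Hour  = c("00:00", "12:00", "23:00"),
  Tests = c(400, 100, 500),
  Fails = c(80, 5, 90)
)

# Share of all tests that were failed
tests$FailRate <- tests$Fails / tests$Tests

# Fails relative to passes (the fails:passes ratio)
tests$FailToPass <- tests$Fails / (tests$Tests - tests$Fails)

print(tests)
```

If the fail rate stays roughly flat while the volume of tests swings, that would point towards police testing behaviour driving the raw counts rather than the underlying level of drunk driving.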

Constructing the gganimate plots

First off, since the time and weekday data was spread across different Excel sheets for each year, I compiled and reformatted them into a new sheet in Excel.
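In hindsight the same stacking could probably be done in R instead. Here is a minimal sketch, assuming each year's sheet has already been read into its own data frame (for example via the `sheet` argument of `read_excel`). The years, column names and values below are placeholders, not the real data.

```r
# Hypothetical per-year tables, one per original Excel sheet.
# All values are invented for illustration.
by_year <- list(
  "2014" = data.frame(Time = c("00:00", "01:00"), Fails = c(10, 8)),
  "2015" = data.frame(Time = c("00:00", "01:00"), Fails = c(12, 9))
)

# Tag each table with its year (taken from the list names)
tagged <- Map(function(df, yr) {
  df$Year <- as.numeric(yr)
  df
}, by_year, names(by_year))

# Stack the tagged tables into one long data frame
time_f <- do.call(rbind, tagged)
rownames(time_f) <- NULL

print(time_f)
```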

I then imported the data into R using the readxl package.

    library(readxl)
    time_f<-read_excel("ras51003 WITH REFORMATTED AGGREGATED TIME DAY DATA.xls", sheet = "T2")

For some reason the 00:00 time values were imported as missing values, so I manually changed these values (one per year, 24 rows apart):

    time_f[1,1]<-"00:00"
    time_f[25,1]<-"00:00"
    time_f[49,1]<-"00:00"
    time_f[73,1]<-"00:00"
    time_f[97,1]<-"00:00"
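A less brittle alternative would be to fill every missing time in one go rather than hard-coding row numbers. This is only a sketch on a toy stand-in for `time_f`, and it assumes the only missing values in the `Time` column are the midnight rows.

```r
# Toy stand-in for time_f; values invented for illustration.
time_f <- data.frame(
  Time  = c(NA, "01:00", "02:00", NA, "01:00"),
  Fails = c(10, 8, 5, 12, 9),
  stringsAsFactors = FALSE
)

# Replace every missing Time with midnight,
# wherever it occurs in the table
time_f$Time[is.na(time_f$Time)] <- "00:00"

print(time_f$Time)
```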

I then used the stringr package to reformat all time values into a more readable length.

    library(stringr)
    time_f$Time<-str_sub(time_f$Time, 1, str_length(time_f$Time)-3)
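To make the effect of that trim concrete, here is a small base R sketch that drops the last three characters in the same way. The sample values assume the times look like the "00:00" strings set above.

```r
# Sample values, assuming times are stored as "HH:MM" strings
times <- c("00:00", "01:00", "23:00")

# Keep everything except the last three characters (the ":00")
trimmed <- substr(times, 1, nchar(times) - 3)

# With stringr, a negative end index does the same job:
# str_sub(times, 1, -4)

print(trimmed)
```

This leaves just the hour, which keeps the x-axis labels short.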

Next I installed gganimate and the other packages required to output animated plots. In gganimate you first construct a static plot with ggplot2, and then add layers for the variable you want to animate (Year). These include ease_aes() to smooth movement and ggtitle() to change the title based on the frame. The colour animation (added in the ggplot aesthetic argument) was used to make the Year value clearer and make it more visible when the loop restarts.

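    # Install once; package names must be quoted strings
    install.packages(c("ggplot2", "gganimate", "gifski", "png"))

    library(ggplot2)
    library(gganimate)

    # Build the static plot, then add the gganimate layers
    p <- ggplot(time_f, aes(x = Time, y = Fails, color = Year)) +
      geom_point() +
      scale_color_gradient(low = "blue", high = "red") +
      xlab("Time of Day") +
      ylab("Failures") +
      theme(legend.title = element_blank()) +
      transition_states(Year) +
      ease_aes('cubic-in-out') +
      ggtitle('Annual Failed Breathalyzer Tests by Hour of Day, Year = {closest_state}',
              subtitle = 'Frame {frame} of {nframes}')

    # Render, then save. anim_save() is a separate call, not a layer added with `+`.
    # The filename here is just a placeholder.
    anim <- animate(p)
    anim_save("breathalyser_fails.gif", animation = anim)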