# Scraping AFLM Match Chains and Kick-in Play-on Analysis

library(data.table)
library(devtools)
library(dplyr)
library(httr)
library(jsonlite)
library(lubridate)
library(plotly)
library(stringr)
library(tidyr)

If you’ve used the AFL app recently then you have likely noticed the arrival of a new and convoluted feature: the Augmented Reality (AR) Tracker, which - if you can satisfy its demands for a flat surface to project onto - will display some pretty nifty data showing the field location and outcomes of kicks, handballs, and shots at goal.

It’s a bit like the 3D-camera that came attached to my first smartphone. It sounds sleek and sexy in a pitch, and piques the interest enough to get a couple of uses when it first arrives - but within a week it’s forgotten forever. Cool-factor aside, it’s just not that insightful to see the data like this.

Perhaps you, like me, have glanced at the AR Tracker and idly daydreamed about getting hold of the raw data that powers it. That dream is now a reality. Here you’ll find a guide on how to access it - and a little analysis on kicking in and playing on, just for fun.

## The Data

To access the data, I have two options for you.

The first is a script of R functions for scraping the data. The script is stored on GitHub. If you’ve installed and loaded the devtools package in R, you can load the script with a line of code.

source_url("https://raw.githubusercontent.com/DataByJosh/AFL-Data/main/AFLM_Match_Chains/Scraper.R")

This will load nine functions into your R session. The only one intended for direct use is get_match_chains(season, round).

Season will default to your system year if left blank. There isn’t currently any data available for past seasons, so 2021 is the only worthwhile value you can write in here, for now.

Round will default to all rounds if left blank. So, if you want to scrape a whole season at once, blank is the way to go. If you do want to specify a round, note that finals rounds are still referred to by number - for example, the first week of finals is 24.

data <- get_match_chains(2021)

A pleasant surprise about this dataset is that it contains far more information than is currently visible on the AR Tracker. It records more than 70 different types of statistical event in a time series broken up into chains of possession, with x and y coordinates for each event describing where it happened on the field. There are venue dimensions, too.

If you’re not keen on using R to get the data, or you find something broken in the scraper, the second option is a bit more universal and foolproof - all the data is available in round-by-round CSVs on GitHub.
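
If you go the CSV route and still want everything in one table, here is a minimal sketch of stacking the files back together in base R. It assumes you have already downloaded the round files into a local folder and that they all share the same columns; the helper name `read_round_csvs`, the folder and the file names are hypothetical, and the two tiny files written here are made up purely so the example runs on its own.

```r
# Hypothetical helper: read every CSV in a local folder and stack them.
# Assumes all files have identical column names.
read_round_csvs <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in ", dir)
  rounds <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, rounds)
}

# Demo with two made-up files in a fresh temporary folder
demo_dir <- tempfile("match_chains_demo")
dir.create(demo_dir)
write.csv(data.frame(roundNumber = 1, description = "Kickin play on"),
          file.path(demo_dir, "round_01.csv"), row.names = FALSE)
write.csv(data.frame(roundNumber = 2, description = "Kickin short"),
          file.path(demo_dir, "round_02.csv"), row.names = FALSE)

demo_data <- read_round_csvs(demo_dir)
demo_data
```

With the real files you would point `read_round_csvs()` at wherever you saved them instead of the temporary folder.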

## Kicking in and playing on

Such richly detailed data opens up many possibilities for analysis. As a first baby step, I’m going to take a brief and simple look at the first topic that popped into my head. Kick-ins - should you play on or not?

In 2019 the AFL loosened up the rules around kick-ins, providing an incentive for players taking the kick to play on. Defenders have taken up that incentive with gusto, and in doing so sparked some debate over players padding their stats.

Some high-minded part of me wants to believe that, professionals that they are, footy players are not making the decision to play on or not from a kick-in solely to add to their disposal count.

However, they’re only human. Given an easy way to make ourselves look good at our jobs on paper, I suspect most of us would fail to resist temptation - I know I would.

But is this selfish stat-padding that hurts the team? Or are kick-in play-on chains more likely to end in a score, thereby entirely justifying the decision to stuff one’s stats?

To start answering that question, I’ll filter the dataset down to just the kick-ins. In the data, kicking in after a behind is referred to as a “Kickin”, providing a convenient distinction from similarly labelled events like “Kick into F50” or “OOF Kick in”. I’ll also filter out any data from the finals matches that have occurred so far this year.

kick_in_data <- data %>% filter(description %like% "Kickin")
kick_in_data <- kick_in_data %>% filter(roundNumber <= 23)
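
To see why the one-word label keeps the filter clean, here is a small base R sketch of the same match. As far as I know, `%like%` from data.table is a thin wrapper around `grepl()`, so I use `grepl()` directly here to keep the example dependency-free. The vector holds only the event labels mentioned above, not real rows from the dataset.

```r
# Toy labels, taken from the event names mentioned in the post
description <- c("Kickin play on", "Kickin short", "Kickin long",
                 "Kick into F50", "OOF Kick in")

# fixed = TRUE matches the literal, case-sensitive text "Kickin";
# labels with a space ("Kick in...") do not contain it
is_kick_in <- grepl("Kickin", description, fixed = TRUE)

description[is_kick_in]
```

Only the three “Kickin” labels survive; “Kick into F50” and “OOF Kick in” are dropped because of the space after “Kick”.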

We are left with more than 4000 kick-ins from 2021, a very robust dataset. Let’s start with the basics - how often do players kick-in and play-on?

prop.table(table(kick_in_data$description))

    ##
    ##    Kickin long Kickin play on   Kickin short
    ##    0.008351756    0.828543355    0.163104888

It’s clear that playing on has become the vastly preferred option, used 82.9% of the time. We may not have any pre-2019 data to compare this to, but there’s no doubt this represents a significant shift in player behaviour at kick-ins. In particular, the rule changes have just about killed the long kick-in from the goalsquare, which now represents less than one percent of all kick-ins.

We know how kick-in chains start, so let’s look at how they end. The finalState variable in the dataset records, for each event, how the chain it belongs to ended. Given the very low number of long kick-ins, I’m going to roll them in with the short ones.

kick_in_data$description <- kick_in_data$description %>% str_replace("Kickin long", "Does not play-on")
kick_in_data$description <- kick_in_data$description %>% str_replace("Kickin play on", "Play-on")
kick_in_data$description <- kick_in_data$description %>% str_replace("Kickin short", "Does not play-on")

prop.table(table(kick_in_data$description,kick_in_data$finalState), 1)

    ##
    ##                     ballUpCall      behind  endQuarter        goal outOfBounds
    ##   Does not play-on 0.058739255 0.044412607 0.011461318 0.050143266 0.164756447
    ##   Play-on          0.071449748 0.025793063 0.010969463 0.046842573 0.164838423
    ##
    ##                         rushed    turnover
    ##   Does not play-on 0.007163324 0.663323782
    ##   Play-on          0.008301216 0.671805514

Before anything else, we must acknowledge that kick-ins simply don’t generate a lot of scores. Whether playing on or not, most kick-in chains end with the ball being turned over, out of bounds, or balled up.

When it comes to kick-in chains that do hit the scoreboard, the news for fans of playing on is not great. 4.7% of kick-in chains end in goals if the kicker plays on, 5% if not. The gap between the two is hardly a gorge, but ‘more goals’ simply isn’t available as an argument in favour of playing on.
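
A quick note on reading the second table: the `1` passed to `prop.table()` asks for row proportions, so each kick-in type is compared against its own total rather than against all kick-ins. The sketch below shows the same call on six made-up kick-ins (toy data, not taken from the real dataset).

```r
# Made-up toy data: six kick-ins and how their chains ended
toy <- data.frame(
  description = c("Play-on", "Play-on", "Play-on", "Play-on",
                  "Does not play-on", "Does not play-on"),
  finalState  = c("goal", "turnover", "turnover", "behind",
                  "goal", "turnover")
)

counts <- table(toy$description, toy$finalState)

# margin = 1 divides each cell by its row total,
# so every row (kick-in type) sums to 1
row_props <- prop.table(counts, 1)

row_props
rowSums(row_props)
```

This is what makes the two rows comparable even though playing on is so much more common than the alternative.
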

Staring at the raw numbers above is enough to make anyone’s eyes glaze over - let’s add some visuals. We’re going to cut it down to just goals and behinds and put these in a bar chart to get a better look at them. Move your cursor over the bars to see the percentage values.

plot_data <- as.data.frame(prop.table(table(kick_in_data$description,kick_in_data$finalState), 1))
plot_data <- plot_data %>% spread(Var2, Freq)

plot <- plot_ly(plot_data,
x = ~Var1,
y = ~goal,
type = "bar",
name = "Goal",
hovertemplate = "%{y:.1%}") %>%
add_trace(y = ~behind,
name = "Behind",
hovertemplate = "%{y:.1%}") %>%
layout(barmode = "group",
title = "<b>Percentage of kick-in chains that ended in scoring shots - 2021 Home & Away</b>",
xaxis = list(title = "<b>Type of kick-in</b>", categoryorder = "array", categoryarray = c("Play-on","Does not play-on")),
yaxis = list(title = "",tickformat = ".1%"),
legend = list(title=list(text="<b>Type of score</b>"))) %>%
config(displayModeBar = FALSE)

plot

Kick-in play-on aficionados can hang their hats on one thing. These chains may be less likely to hit the scoreboard, but if they do, that score is more likely to be a goal.
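
To put rough numbers on that, here is a small sketch that works only from the proportions printed in the finalState table earlier. It adds goals and behinds to get a scoring rate for each kick-in type, then asks what share of those scoring chains were goals. I’ve left rushed behinds out, since it isn’t obvious from the table alone how they should be counted; that choice is my assumption.

```r
# Proportions copied from the finalState table earlier in the post
goal   <- c("Play-on" = 0.046842573, "Does not play-on" = 0.050143266)
behind <- c("Play-on" = 0.025793063, "Does not play-on" = 0.044412607)

# Share of chains ending in a goal or a behind (rushed behinds left out)
scoring_rate <- goal + behind

# Of those scoring chains, the share that were goals
goal_share <- goal / scoring_rate

round(scoring_rate, 3)
round(goal_share, 3)
```

On these figures, roughly 7.3% of play-on chains end in a goal or behind against about 9.5% otherwise, but around 64% of the play-on scores are goals compared with about 53% when the kicker does not play on.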

This feels like a logical result. The faster a team moves the ball, the more likely they’ll get ahead of the opposition defence and create a favourable shot at goal - provided they make it there.

This ‘taking the game on’ that is so often spruiked by the commentary box undoubtedly adds some pizazz to the experience of watching football. And when it works, it works well.

But, boring though it may be, the numbers suggest - when it comes to kick-ins at least - short, slow and steady wins the race.

Of course, this analysis is just the tip of the iceberg when it comes not just to kick-ins, but the match chains dataset itself. The answers to dozens of fascinating footy questions are buried in there somewhere.

If you are so inclined, please avail yourself of the data. I can’t wait to see what you produce.

Feedback, corrections, coding tips, questions and suggestions are always welcome.