## 11 June 2017

### Triathlon math part 2: What are realistic event speeds?

In my last post, I looked at triathlon events from a purely mathematical point of view, and asked the question: "Given how long each event is, which event helps your time the most if you decide to push the pace?" If you assume that you can swim as fast as you can bike, then our initial guess that the bike leg is the most important holds up, because the bike leg is so much longer.

That conclusion breaks down once you drop that assumption! There are regions on the correlation matrices I plotted last time where the gain in time is very similar for different events. Which regions are physically realistic? Can I really improve my swimming speed from 1 km/hr to 30 km/hr? Is it worth it to sacrifice 3 km/hr on the bike in order to gain 2 km/hr on the run? The answer might really depend on which speed you're starting from and which speed you're going to for each event.

To get a more grounded idea of the relevant speeds, I downloaded the data from all 429 competitors in the Overall category for our race so we can see how actual athletes perform (data available here - my team was in the relay category so it doesn't include us).

When you look at how the overall triathlon finishing times are distributed, the first thing that pops out is that the top finishers are closer to the pack than the long tail of slower athletes:

But if you look at everyone's average speed, it's much more evenly distributed:

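These two plots are consistent with each other, since the separation between two competitors increases the longer they are out on the course. If one athlete has speed $$v$$, and the other has a speed that is a fraction $$f$$ of $$v$$ (i.e. $$f \times v$$), then the difference in time between them at the end of the course (distance $$d$$) is

$$\Delta T = \frac{d}{v} \left(1 - \frac{1}{f}\right)$$

so the gap depends on $$1/f$$ and is not a linear function of $$f$$: each further drop in speed costs more time than the last. (With this sign convention, $$\Delta T$$ is the first athlete's time minus the second's, so it is negative when $$f < 1$$.)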
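To see the nonlinearity in numbers, here is a minimal sketch that just evaluates the formula above. The 5 km distance matches the run leg, but the 12 km/hr reference speed and the values of `f` are made-up illustration values, not numbers from the race data.

```python
def time_gap_hours(distance_km, speed_kmh, f):
    """Time of the athlete at speed v minus the time of the athlete at speed f*v."""
    return (distance_km / speed_kmh) * (1 - 1 / f)


# Illustrative numbers only: a 5 km run with the reference athlete at 12 km/hr.
gaps_min = {f: 60 * time_gap_hours(5.0, 12.0, f) for f in (0.9, 0.8, 0.7)}

for f, gap in gaps_min.items():
    # Negative gap: the reference athlete finishes earlier than the slower one.
    print(f"f = {f:.1f}: gap = {gap:.2f} min")
```

Each step down in `f` is the same size (0.1), but the gap grows by more each time, which is the same effect that stretches out the slow tail of the finishing times.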

When you look at how the times for each event are distributed, there's a significant overlap between events in how long it takes - the fastest runners finish their 5k faster than a significant chunk of the swimmers (myself included)!

Considering the amount of overlap above, I was surprised by how neatly the events separate themselves out in terms of speed. Each event occupies a pretty well-defined space all by itself:

I think that comparing these two plots can clarify the answers we're seeking. The Speed histograms are all fairly symmetric and almost Normal - suggesting that the athletes are pulled randomly out of a population with some average value. On the other hand, the Time histograms all have right tails - especially the run and the bike. What this tells me is that if you're out in the tail of the run or bike time histograms, there is some other athlete very close to you in terms of fitness (read: there's hope!) who is running or biking just a little bit faster and getting a much bigger time benefit.
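One way to convince yourself that symmetric speeds naturally give right-tailed times is a quick simulation. This is only a sketch with simulated athletes, not the race data: the mean speed (10 km/hr), the spread (1.5 km/hr), and the sample size are assumptions picked for illustration, and `skewness` is a small helper defined here.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean cubed z-score (about 0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in values])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
RUN_KM = 5.0

# Assumed, not measured: run speeds drawn from a Normal distribution.
# The v > 0 filter only guards against a (very unlikely) non-physical draw.
speeds = [v for v in (rng.gauss(10.0, 1.5) for _ in range(10_000)) if v > 0]
times_min = [60 * RUN_KM / v for v in speeds]

speed_skew = skewness(speeds)
time_skew = skewness(times_min)
print(f"speed skewness: {speed_skew:+.2f}")
print(f"time skewness:  {time_skew:+.2f}")
```

The speeds come out close to symmetric while the times pick up a clear positive skew, purely because time goes as one over speed.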

Next up (probably): focusing in on the space these speeds occupy on the correlation matrices from last time.

Edit: I updated the speed histograms with a fit to a Gaussian. The agreement looks pretty good!
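For anyone who wants to try something similar, here is a rough sketch of one way to compare a speed histogram against a Gaussian. It is not necessarily how the fit in my plots was done: it simply matches the sample mean and standard deviation using Python's `statistics.NormalDist`. The speeds below are simulated placeholders (a 30 km/hr mean and 4 km/hr spread are assumptions), so swap in real speeds to make it meaningful.

```python
import random
from statistics import NormalDist

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Placeholder data, NOT the race results: 429 simulated bike speeds in km/hr.
speeds = [rng.gauss(30.0, 4.0) for _ in range(429)]

# "Fit" by matching the sample mean and standard deviation.
fit = NormalDist.from_samples(speeds)
print(f"fitted mean = {fit.mean:.2f} km/hr, stdev = {fit.stdev:.2f} km/hr")

# Compare the observed fraction in each bin with what the fitted Gaussian predicts.
bin_edges = [18 + 4 * i for i in range(7)]  # 18, 22, ..., 42 km/hr
comparison = []
for lo, hi in zip(bin_edges, bin_edges[1:]):
    observed = sum(lo <= s < hi for s in speeds) / len(speeds)
    expected = fit.cdf(hi) - fit.cdf(lo)
    comparison.append((lo, hi, observed, expected))
    print(f"{lo}-{hi} km/hr: observed {observed:.3f}, Gaussian {expected:.3f}")
```

If the observed and predicted fractions track each other bin by bin, the Gaussian is a reasonable description of the speeds.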