11 June 2017

Triathlon math part 2: What are realistic event speeds?

In my last post, I looked at triathlon events from purely a mathematical point of view, and asked the question: "Given how long each event is, which event helps your time the most if you decide to push the pace?" If you assume that you can swim as fast as you can bike, then our initial guess that the bike leg is the most important holds up, because the bike leg is so much longer.

That conclusion breaks down if you break that assumption! There are regions on the correlation matrices I plotted last time where the gain in time is very similar for different events. Which regions are physically realistic? Can I really improve my swimming speed from 1 km/hr to 30 km/hr? Is it worth it to sacrifice 3 km/hr on the bike in order to gain 2 km/hr on the run? The answer might really depend on which speed you're starting from and which speed you're going to for each event.

To get a more grounded idea of the relevant speeds, I downloaded the data from all 429 competitors in the Overall category for our race so we can see how actual athletes perform (data available here - my team was in the relay category so it doesn't include us).

When you look at how the overall triathlon finishing times are distributed, the first thing that pops out is that the top finishers are closer to the pack than the long tail of slower athletes:
 but if you look at everyone's average speed, it's much more evenly distributed:

These two plots are consistent with each other, since the separation between two competitors increases the longer they are out on the course. If one athlete has speed \( v \), and the other has a speed that is a fraction \( f \) of \( v \) (i.e. \( f \times v \), then the difference in time between them at the end of the course (distance \( d \)) is
\Delta T = \frac{d}{v} \left(1 - \frac{1}{f}\right)

- so it's not a linear relationship.

When you look at how the times for each event are distributed, there's a significant overlap between events in how long it takes - the fastest runners finish their 5k faster than a significant chunk of the swimmers (myself included)!

Considering the amount of overlap above, I was surprised by how neatly the events separate themselves out in terms of speed. Each event occupies a pretty well-defined space all by itself:
I think that comparing these two plots can clarify the answers we're seeking. The Speed histograms are all fairly symmetric and almost Normal - suggesting that the athletes are pulled randomly out of a population with some average value. On the other hand, the Time histograms all have right tails - especially the run and the bike. What this tells me is that if you're out in the tail of the run or bike time histograms, there is some other athlete very close to you in terms of fitness (read: there's hope!) who is running or biking just a little bit faster and getting a much bigger time benefit.

Next up (probably): focusing in on the space these speeds occupy on the correlation matrices from last time.

Edit: I updated the speed histograms with a fit to a Gaussian. The agreement looks pretty good!

05 June 2017

Triathlon math - Which event affects your time the most?

This weekend two friends and I had a blast doing the relay version of the Rock Hall Sprint Triathlon (we so cleverly named ourselves Team U+1F4A9 -- you can find our results here if you can figure out the name). I was really excited for my first open-water event since I started swimming consistently a few years ago, and my first ever swim race! I finished the 750 m-long swim leg in 17:45 -- a little slower than my pool times, but at least I finished, and I know that for next time I have to practice swimming in a straight line. Devin turned out a solid time for the bike course, and Carolyn finished the run 1:30 under her prediction, so all in all we had a really fun and successful triathlon.

Biting my medal to see if it's really made from real gold

During the long drive home, Devin and I got into a debate over whether the swim, the bike, or the run improve your time the most when you increase your pace. Our guess was that it's the bike, since it just takes up so much more time. Once I got home, I decided to model it and see if our guess holds up. Heads up - I use some basic calculus.

The total time it takes to complete a triathlon is just the sum of the times it takes to complete the three stages. Since the time it takes to cover a distance is distance / speed, this equation gave me my starting point.
\[ \mathrm{Total~time} = \frac{d_{\mathrm{swim}}}{v_{\mathrm{swim}}} + \frac{d_{\mathrm{bike}}}{v_{\mathrm{bike}}} + \frac{d_{\mathrm{run}}}{v_{\mathrm{run}}} \] For a given race, the different distances are fixed. For example, for our race, the swim was 750 m (0.75 km), the bike was 15 miles (24.14 km), and the run was 5k. The variables here are the three velocities - how fast you choose to take each leg. For our purposes, the average speed is perfectly fine (velocity and speed are interchangeable). The first question is: How does the total time change, if you take the different legs at different speeds? For these we want the partial derivatives: \[ \frac{\partial T}{\partial v_i} = - \frac{d_i}{v_i^2} \] (replace i with swim, bike, or run - it's the same for all of them. There are two takeaways here: First, the time you save depends on the distance you're covering. This makes intuitive sense, and is the reasoning behind our initial guess that the bike is the most important. However, there's a twist! The second takeaway is that the rate of change depends on the inverse of the velocity squared. This means that at slow speeds, a small change makes a big difference, but at high speeds, a small change makes very little difference. Going from 4 km/hr to 5 km/hr - reasonable swimming speeds - is a much bigger overall difference than going from 20 km/hr to 21 km/hr. In the plot here, look at how much steeper 1/x^2 is than 1/x at 5, and how much flatter it is around 20.
Here's what the derivatives look like for each event:

As we expected, the bike has the biggest derivate, meaning that it affects the time most. This is driven by the fact that d_bike is much larger than d_swim and d_run.
Here is the experiment I set up: Let's say you do the bike at velocity v1 one time, and velocity v2 the next. How much time did you gain? This is given by integrating \( \frac{\partial T}{\partial v_i} \) over the range of velocities, v1 to v2:
dT = \int_{v_1}^{v_2} \frac{\partial T}{\partial v} dv

We're all set up now! I picked a velocity range that covers reasonable values for the run, bike, and swim - 1 km/hr to 40 km/hr - and calculated the total time you would gain if you did a leg at one velocity and then again at another velocity, for all pairs of velocities. This is what that looks like:

The darkness of each pixel tells you how many minutes you would save for the Rock Hall Sprint Triathlon if you did it at Speed 1 (x-axis) versus if you did it at Speed 2 (y axis). Darker = saving more time, as you can see in the scale bars. The white diagonal shows that, intuitively, if you do an event twice at the same speed, you will get 0 difference in time. The pixels below the white diagonal technically correspond to Speed 1 being faster than Speed 2, so the pixel values should be negative (i.e. you took longer the second time than the first time). I just multiplied them by -1 to make it easier to plot; in any case, the graphs are symmetric about the diagonal.

There are a few things that stand out. All the graphs are on the same scale, so the fact that the bike is the darkest of the three tells you that, all things being equal (like, Speed 1 and Speed 2), the bike nets you the best decrease in time. This is just because \(d_{\mathrm{bike}}\) is the longest.

We also said that improving at the slow end of the range nets you more improvement than on the fast end of the range, and we see that here with left and bottom sides being the darkest. That's important! Realistic swimming speeds are going to be concentrated down in the lower left, whereas cycling speeds will be further up the diagonal into the relatively brighter regions.

It's hard to tell by eye which event has the darkest regions over the relevant speeds. In the next post, I'll take a closer look on how our predictions hold up once you actually start to compare realistic speeds and speed changes for each event.