Cyclist performance modeling


@stevemz had some criticisms of the CP-based models vs the FTP-based models. I think it’s an interesting topic, so here we are.

FTP is a useful metric because it’s simple to communicate (a single number), simple as a concept (the power I can hold for an hour), and easy to test (go hard for an hour). It falls short in that it isn’t an accurate predictor of performance in anything other than a 40k TT, and it is only useful for designing workouts insofar as most people are mostly the same and the workouts can be made conservative enough to be useful to most people.

Adding additional parameters, i.e. W’ and peak power, allows a much more predictive model to be created. Given the amount of data collected on riders and the computing power generally available, it is much more reasonable to use the more complex model than it was 20 years ago.
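For anyone who hasn't seen it written down, the two-parameter version of the model is usually given as P(t) = CP + W’/t. Here is a minimal sketch assuming that hyperbolic form. The function names and the example rider (CP 250 W, W’ 20 kJ) are made up for illustration, peak power is not modeled, and the form is generally only trusted for efforts of a few minutes up to around half an hour.

```python
def power_for_duration(cp_watts, w_prime_joules, seconds):
    """Highest constant power sustainable for `seconds` under P = CP + W'/t."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return cp_watts + w_prime_joules / seconds


def time_to_exhaustion(cp_watts, w_prime_joules, power_watts):
    """Seconds until W' is spent when riding at a constant power."""
    if power_watts <= cp_watts:
        # The model predicts no exhaustion at or below CP.
        return float("inf")
    return w_prime_joules / (power_watts - cp_watts)


# Hypothetical rider, numbers chosen only to make the arithmetic easy.
cp, w_prime = 250.0, 20_000.0
print(power_for_duration(cp, w_prime, 300))  # predicted 5-minute power
print(time_to_exhaustion(cp, w_prime, 350))  # seconds sustainable at 350 W
```

The point is that two riders with the same CP but different W’ get different predictions at every duration, which a single FTP number can't express.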

CP + W’ also folds nicely into the active research into polarized training, as CP can be a reasonable predictor of LT1.

I’d be interested to hear others’ opinions on the topic, especially @chad and @nate.


I disagree. Skills and tactics held equal, FTP is a large determinant of performance across a wide variety of different disciplines (excluding short duration track riding). Within a category, yes, there is variation and there are outliers, but on average, a Cat 1 rider will have a higher w/kg FTP than a Cat 3 rider, and I think it would be pretty easy to pick out who would win a variety of different events if we had absolute and relative FTP. Hell, you could build an Elo simulator and train it on this data to prove or disprove this point :wink:
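For what it's worth, the rating part of an Elo simulator is only a few lines. This is a rough sketch of the standard Elo expected-score and update formulas. The K-factor of 32 and the starting rating of 1500 are conventional chess defaults used here as assumptions, and treating a race as a set of head-to-head results between riders is also an assumption.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, score_a, k=32):
    """Return new (A, B) ratings; score_a is 1.0 if A finished ahead, else 0.0."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


# Two hypothetical riders start level; rider A finishes ahead.
print(update_ratings(1500, 1500, 1.0))
```

Testing the FTP claim would then mean checking how well the fitted ratings track w/kg, which needs real results data and is left out here.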

The Critical Power/W’ model predicts anaerobic work capacity and critical power, which is different than FTP and has no corresponding physiological breakpoint [critical power] when you measure it in a lab.

I’m not aware of a Critical Power model that predicts LT1, especially since W’ is anaerobic work capacity and LT1 is an aerobic marker. Xert predicts LTP which is your threshold after you’ve depleted W’, which is fundamentally different than LT1. If there is a study around LT1/VT1 and CP, please link it so I can read it :slight_smile:


Okay, with that out of the way, in order to get a useful Critical Power plot, you need a short (5-45 second), medium (90s - 5min), and long (technically 12min, but it’s now recommended to be 20-30min) maximal effort, done far enough apart that each can be maximal, but close enough together to fall within a reasonable window. Let’s just say 2 weeks.
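To make the test protocol concrete, here is a rough sketch of how CP and W’ fall out of those efforts, assuming the linear work-time form of the two-parameter model (work = CP × t + W’). The function name and the example numbers are hypothetical. The short sprint effort is left out of this fit because the two-parameter form doesn't describe it; that is a simplification of this sketch, not a claim about how any product does it.

```python
def fit_cp_w_prime(efforts):
    """Least-squares fit of work = CP * t + W' over (seconds, avg_watts) pairs."""
    if len(efforts) < 2:
        raise ValueError("need at least two maximal efforts")
    times = [t for t, _ in efforts]
    work = [t * p for t, p in efforts]  # joules done in each effort
    n = len(efforts)
    mean_t = sum(times) / n
    mean_w = sum(work) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("efforts must have different durations")
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(times, work))
    cp = sxy / sxx                  # slope: critical power in watts
    w_prime = mean_w - cp * mean_t  # intercept: W' in joules
    return cp, w_prime


# Hypothetical 3MMP of 360 W and 12MMP of 285 W.
cp, w_prime = fit_cp_w_prime([(180, 360.0), (720, 285.0)])
print(round(cp), round(w_prime))
```

With only two points the fit passes through both exactly, so any pacing error in either test goes straight into CP and W’. That is one reason to add a third effort in the medium-to-long range.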

If you’ve never gone out and done a 3MMP and 12MMP, they are soul destroying. And you won’t get training zones out of it. You’ll get W’ and critical power, which is useful if you are a rider focused on the anaerobic side of the equation, but that is only one side.

Cycling is a predominantly aerobic sport (even with a 1MMP effort, you are still working aerobically); by and large the reason why training plans look similar is that below threshold, most people function pretty similarly. When you get to the pointy ends of the histogram, there are people who respond somewhat differently, but for the most part, people respond to sub-threshold work pretty similarly.

The difficulty with almost all of the performance modeling that exists is that it’s very difficult to pull out the subjective factors before fitting the model to data points that might or might not be maximal. WKO4 is the only exception here and actually does a pretty good job, even though it may be the least user friendly product ever built. WKO4/Coggan iLevels/Optimized gets closer to high fidelity zones than any other product, but we are talking about a pretty small difference comparatively. And you have to feed WKO4 really good data for it to get close. Which usually involves regular testing :slight_smile:


Reserved followup post on consistency, plan execution, and performance measurement, which are inherently linked to this conversation.


And so says the Classic vs iLevel chart:

I’m beginning to think that, yes, for the vast majority of us, the classic levels will work just fine. But I also agree that semi-pro and pro riders might require more precision in their power training.


You can have three riders with very similar FTP: one that has snap, another that can crank hard for 20 min, and a third that can go all day. Their FTP is the same, but their ability to deploy their ‘above aerobic work’ is dramatically different.

The only accurate way to find LT1 is to do a lactate test. That’s why it’s largely useless in my opinion. Even if CP/LTP don’t directly line up with LT1, a margin of error could be established based on the available metrics to get a good range for where LT1 is.

I think filtering maximal/submaximal efforts is a non-issue. If you find a harder effort, that one is maximal. If an effort isn’t harder than any other recent effort, it obviously wasn’t maximal. In my mind, part of the motivation for FTP vNext, be that CP or something else, should be eliminating dedicated testing.
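A rough sketch of that rule, assuming power is recorded as one sample per second. The function names and the default durations are hypothetical. It only flags new bests per duration; whether the standing best was itself truly maximal is a separate question the code doesn't answer.

```python
def best_average_power(samples, window):
    """Best mean power over any `window` consecutive 1 Hz samples, or None."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return None  # ride too short to contain this duration
    running = sum(samples[:window])
    best = running
    for i in range(window, len(samples)):
        # Slide the window one second: add the new sample, drop the oldest.
        running += samples[i] - samples[i - window]
        best = max(best, running)
    return best / window


def update_bests(bests, samples, durations=(5, 60, 300, 1200)):
    """Return a copy of `bests` with any duration this ride improved on."""
    updated = dict(bests)
    for d in durations:
        p = best_average_power(samples, d)
        if p is not None and p > updated.get(d, 0.0):
            updated[d] = p
    return updated


# Hypothetical ride: easy spinning with a 5-second surge in the middle.
ride = [100] * 10 + [400] * 5 + [100] * 10
print(update_bests({5: 450.0}, ride, durations=(5, 10)))
```

Run over every ride in a recent window, this builds the mean-maximal curve without any dedicated test day.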


We agree on this. My point was that FTP is the bigger factor, and that the other factors are secondary. A Cat 3 rider with a great pVO2max and repeatability relative to their FTP will be more likely to beat another Cat 3 in a punchy road race, but will likely be unceremoniously spat out the back halfway through the P/1/2 because their aerobic capability is lacking relative to the mean in that field.

LT1 is more likely to be related to FTP than it is to CP. You can also reasonably estimate LT1 by feel.

Why? Why is testing bad?


Just for the record in case it doesn’t come across from my posts:

  • I’m in favor of modeling, but the current models aren’t very good (with WKO4 excluded and falling into the solid category)
  • You have to feed a model good data, which usually means some type of testing
  • Increased precision and specificity is icing on the cake, not the cake itself


Why is testing good? Testing doesn’t make us faster; it just tells us how fast we are in a more objective way. If you can get to that objective answer from the mess of day to day efforts, then you can spend more time training to get faster rather than training to test.


Exactly as you said, objectivity.

I’m an advocate for testing, because you can control the variables and the timing to a much greater degree than race/ride data.

If I’m going to train in a structured way, I’m a strong proponent of gathering the anchor points in a structured way as well.


If a sprinter is doing structured training to push his max power or increase his 10s power, doing an FTP test is close to useless and his progress isn’t captured anywhere.


I’m not saying to only do an FTP test. Obviously sprinters should be testing and evaluating their sprint numbers to track progress for their chosen race discipline.

If you are talking about track sprinters, then yes, they shouldn’t waste time doing an FTP test (they have other structured methods of evaluation). But if we are talking about a road racer who considers themselves a sprinter, they should absolutely be evaluating their FTP in the context of whether they can get to the end of the race with enough gas left to let it rip.


But we can build a model that captures the road sprinter and the track sprinter and place their capabilities on a comparable continuum.


Yes, sort of, and no, depending on several factors:

  1. The model has to be good
  2. The model has to have high quality, recent, and relevant data
  3. The individual can’t vary significantly from the data set that the model was trained on
  4. The quality of the prescription will be directly related to the quality of the data points near the specific energy system that you are looking to target

For example, here is what different power durations look like from a variability perspective when compared to FTP:

You’ll notice a massive variation in the short duration part of the graph that shrinks steadily out towards 3600s. This variation can also change within the course of a training cycle (e.g. during one cycle my 20min:FTP ratio may fluctuate within the range of 92-97% depending on the type of work I’m doing).

It’s difficult to predict longer power durations from short power durations (and vice versa) and I’m sure you’ve found that a 3-5% variation in power depending on the zone can mean the difference between getting through a workout or failing it.
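To put numbers on that, here is a small worked example. It reads the 92-97% figure as FTP expressed as a fraction of 20-minute power, which is an assumption about what the ratio means. The 300 W 20-minute best and the 105% interval target are made up for illustration.

```python
def ftp_range_from_20min(p20_watts, low_ratio=0.92, high_ratio=0.97):
    """FTP estimates implied by a 20-minute power and a ratio band."""
    if not 0 < low_ratio <= high_ratio:
        raise ValueError("ratios must satisfy 0 < low_ratio <= high_ratio")
    return p20_watts * low_ratio, p20_watts * high_ratio


def interval_target_range(ftp_low, ftp_high, fraction_of_ftp):
    """Interval targets at the two ends of the FTP band."""
    return ftp_low * fraction_of_ftp, ftp_high * fraction_of_ftp


p20 = 300.0  # hypothetical 20-minute best, in watts
ftp_low, ftp_high = ftp_range_from_20min(p20)
tgt_low, tgt_high = interval_target_range(ftp_low, ftp_high, 1.05)
spread_pct = 100 * (ftp_high - ftp_low) / ftp_low

print(round(ftp_low), round(ftp_high))  # FTP band in watts
print(round(tgt_low), round(tgt_high))  # 105% interval target band in watts
print(round(spread_pct, 1))             # width of the band as a percentage
```

From a single 300 W test the FTP estimate spans roughly 276 to 291 W, a band of about 5%, which is the same size as the variation that decides whether a workout gets finished.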

I’m not anti-models; quite the opposite. I’ve spent significant time working with Xert, Golden Cheetah, WKO4, TrainerRoad, and the Coggan/Friel/CTS tests in my own personal trial and error to see how they all match up. In my own experience, whenever I’ve relied on the models over a consistent set of benchmark testing, my training suffered and I wasn’t able to complete workouts where I should have and I got slower. When I use consistent testing and use that to set zones, I get faster.
