Thank you for taking the time to explain, @Peyresourde.
I understand that some degree of subjectivity is unavoidable, and I am not trying to challenge your judgement (or model). I would, however, like to know more about the methodological approach behind it.
Importantly, how did you develop the model and the categories themselves? Was there an exploratory and confirmatory process, for example by splitting the data?
Did you test alternative model specifications with different weightings on a subset of climbs, and then evaluate how well they predicted performance in another dataset with the same riders? In other words, did you arrive at the current structure because it provided the best fit across samples?
I ask because I want to be cautious about the risk of models being shaped, even unintentionally, by prior expectations or narratives about particular riders or races, and I am interested in understanding how you worked to guard against that.
My starting point was that I was unhappy with the existing metrics.
LR uses ASLP, which only incorporates altitude, but is IMO too extreme and also undervalues short climbs.
W2W adjusts for stage hardness and altitude. But it devalues TTs extremely. Especially in mixed TTs like PDBF 2020, it is even harder to perform on the climb than after a 180k road stage. W2W just takes the overall kilojoules and slams a fat minus adjustment on the effort.
In contrast, it seems to overvalue long stages. E.g. on the Lombardia climbs almost everyone does a PB, which is unlikely to be true.
My first premise is that I would rather adjust too little than too much. There are studies on the effects of altitude and heat, but I also take these with a grain of salt because some of the effects seem very high to me.
My next premise was that the approach to a climb is very important (even more important than how hard the overall stage has been). As for how to incorporate this: I did not use any scientific method. I basically just winged it and did what seemed plausible to me.
Certain prior expectations play a role in my process. E.g. Carapaz did a really high-level performance in the Giro last year on the stage he won (on Pietra di Bismantova). It was a small climb and not very steep, and no one else analyzed it. Then you catch yourself thinking: Carapaz normally never has high watts, this is a complete outlier --> There must have been a tailwind, or maybe the segment was wrong, etc.
And for a famously good climber, it may be the other way around: he won and gapped everyone by 30 seconds, yet the performance seems low --> Maybe the tarmac was worse than I thought, or there was a headwind?
I do this as a hobby, and I intentionally do a lot of the adjustments manually and by feel. Otherwise I would have no fun doing it. I also think this way may be more accurate than some automated system using a unified method (where would you even get accurate data for all parameters? Strava is an option, but it did not exist back in the day and many riders don't post power).
So certainly, for some performances I input the data once, it seems good and I never look at it again, while for others I put a lot of effort in to get it right/plausible.