Tour de France stage length statistics

Data mined from Memoire du Cyclisme:
[Figure: TdF_stagelengthlognormal_400px.png]

[Figure: TDF_avg_km_vs_decade_400px.png]
 
ingsve said:
Can you explain the y axis on the first graph? What is the rank and how is it determined?
Suppose you have a data set, for example observations of the length of TdF stages:
187 km, 153 km, 230 km, 202 km, 219 km
You find the rank of each observation by ordering the data (from smallest to largest; in principle you could pick any ordering on the set, but let's not go there):
153 km, 187 km, 202 km, 219 km, 230 km
and then taking each observation's place in this list, so that 187 km has rank 2 and 230 km has rank 5.
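
For anyone who wants to check this by hand, here is a minimal Python sketch of that plain ranking step. The function name `ranks` is made up for illustration, and it assumes there are no tied stage lengths.

```python
def ranks(observations):
    """Return the 1-based rank of each observation, smallest first.

    Assumes no two observations are equal (ties are not handled).
    """
    ordered = sorted(observations)
    return [ordered.index(x) + 1 for x in observations]


stages_km = [187, 153, 230, 202, 219]
print(ranks(stages_km))  # [2, 1, 5, 3, 4]
```

So 187 km gets rank 2 and 230 km gets rank 5, as in the example.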

Edit: Given that it says the rank is between 0 and 1, I guess the rank as I described it has been modified in some way, like: mod rank = rank/(maximum rank). I have no idea why you would plot the logarithm of the absolute value of rank/(1-rank) against the observation.

@OP: are you trying to say something with this or just throwing arbitrary graphs at us?
 
Ya, I assumed it had something to do with ranking them from shortest to longest and it was the normalisation to get it between 0-1 that was a bit confusing.
 
If I have a set of stage distances for the decade, for example

100 km, 200 km, 150 km, 120 km, 175 km

I sort them as follows:
1. 100 km
2. 120 km
3. 150 km
4. 175 km
5. 200 km

But each decade has a different number of stages, so I normalize the rank to fall between zero and one, as follows:

0.1, 100 km
0.3, 120 km
0.5, 150 km
0.7, 175 km
0.9, 200 km

You can see here that each stage sits at the midpoint of a bin which is 1/5 wide: with 5 stages, rank i maps to (i - 0.5)/5.
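
A rough Python sketch of this normalization, assuming the bin-midpoint rule (i - 0.5)/n that the numbers above imply. The function name `normalized_ranks` is only illustrative.

```python
def normalized_ranks(distances_km):
    """Pair each sorted distance with a rank in (0, 1).

    Uses the bin-midpoint rule (i - 0.5) / n for 1-based rank i,
    which is inferred from the example values, not a stated formula.
    """
    ordered = sorted(distances_km)
    n = len(ordered)
    # enumerate() is 0-based, so (i + 0.5) / n equals (rank - 0.5) / n
    return [((i + 0.5) / n, d) for i, d in enumerate(ordered)]


for r, d in normalized_ranks([100, 200, 150, 120, 175]):
    print(f"{r:.1f}, {d} km")
```

With this rule the ranks never reach exactly 0 or 1, whatever the number of stages in the decade.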

But obviously the difference, statistically, between 0.001 and 0.011 is a lot bigger than the difference between 0.500 and 0.510, even though both differ by 0.01. So then I do the logarithmic transformation ln | r / (1 - r) |. On the above numbers, this becomes:

-2.2, 100 km
-0.8, 120 km
0, 150 km
0.8, 175 km
2.2, 200 km
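
The transformed values can be reproduced with a few lines of Python. This is just a sketch of the ln | r / (1 - r) | step; the name `logit` is my label for it, and it assumes 0 < r < 1.

```python
import math


def logit(r):
    """Return ln|r / (1 - r)| for a normalized rank r with 0 < r < 1."""
    return math.log(abs(r / (1 - r)))


ranks = [0.1, 0.3, 0.5, 0.7, 0.9]
distances_km = [100, 120, 150, 175, 200]
for r, d in zip(ranks, distances_km):
    print(f"{logit(r):.1f}, {d} km")
```

On this scale the step from 0.001 to 0.011 is about 2.4 units, while the step from 0.500 to 0.510 is only about 0.04.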

Basically it's just a scheme to represent graphically the distribution of stage lengths for decades with differing numbers of stages, sort of like a histogram but with better detail.