# Tour de France stage length statistics

#### djconnel

Data mined from Memoire du Cyclisme:

#### ingsve

Can you explain the y axis on the first graph. What is the rank and how is it determined?

#### Magnus

If you have a data set, for example observations of the length of TdF stages such as
187 km, 153 km, 230 km, 202 km, 219 km
you find the rank of each observation by ordering the data (from smallest to largest; you could in principle pick any ordering on the set, but let's not go there):
153 km, 187 km, 202 km, 219 km, 230 km
and then taking each observation's place in this list, so that 187 km has rank 2 and 230 km has rank 5.
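
As a rough illustration of the ranking described above, here is a minimal Python sketch. The function name `ranks` is made up for this example, and ties are not treated specially (equal values simply get consecutive ranks in input order).

```python
def ranks(values):
    """Return the 1-based rank of each value, smallest value first."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


stages_km = [187, 153, 230, 202, 219]
stage_ranks = ranks(stages_km)
print(stage_ranks)  # [2, 1, 5, 3, 4]
```

So 187 km gets rank 2 and 230 km gets rank 5, matching the example.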

Edit: Given that it says the rank is between 0 and 1, I guess the rank as I described it has been modified in some way, for example: modified rank = rank/(maximum rank). I have no idea why you would plot the logarithm of the absolute value of rank/(1-rank) against the observation.

@OP: are you trying to say something with this or just throwing arbitrary graphs at us?

#### ingsve

Yeah, I assumed it had something to do with ranking them from shortest to longest; it was the normalisation to get it between 0 and 1 that was a bit confusing.

#### djconnel

If I have a set of stage distances for the decade, for example

100 km, 200 km, 150 km, 120 km, 175 km

I sort them as follows:
1. 100 km
2. 120 km
3. 150 km
4. 175 km
5. 200 km

But each decade has a different number of stages, so I normalize the rank to fall between zero and one, as follows:

0.1, 100 km
0.3, 120 km
0.5, 150 km
0.7, 175 km
0.9, 200 km

You can see here that each stage falls within a bin which is 1/5 wide.
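
The normalized values above are consistent with placing each stage at the center of its bin, i.e. (rank - 0.5) / N for N stages. A small Python sketch under that assumption (the function name `normalized_ranks` is just for illustration):

```python
def normalized_ranks(distances_km):
    """Pair each distance with its bin-centered rank in (0, 1)."""
    ordered = sorted(distances_km)
    n = len(ordered)
    # enumerate() is 0-based, so i + 0.5 equals (1-based rank) - 0.5.
    return [((i + 0.5) / n, d) for i, d in enumerate(ordered)]


decade_km = [100, 200, 150, 120, 175]
for r, d in normalized_ranks(decade_km):
    print(f"{r:.1f}, {d} km")
```

With this choice the values never reach exactly 0 or 1, which matters for the transformation that follows.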

But obviously the difference, statistically, between 0.001 and 0.011 is a lot bigger than the difference between 0.500 and 0.510, even though both differ by 0.01. So then I apply the logarithmic transformation ln | r / (1 - r) |. On the above numbers, this becomes:

-2.2, 100 km
-0.8, 120 km
0, 150 km
0.8, 175 km
2.2, 200 km
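
For anyone who wants to reproduce these numbers, here is a short Python sketch of the transformation. The name `log_odds` is only illustrative; since r lies strictly between 0 and 1 here, r / (1 - r) is already positive and the absolute value changes nothing.

```python
import math


def log_odds(r):
    """Return ln(r / (1 - r)) for a normalized rank r with 0 < r < 1."""
    return math.log(r / (1.0 - r))


normalized = [(0.1, 100), (0.3, 120), (0.5, 150), (0.7, 175), (0.9, 200)]
transformed = [(round(log_odds(r), 1), d) for r, d in normalized]
for y, d in transformed:
    print(f"{y}, {d} km")
```

The transformation is symmetric about r = 0.5 and stretches the two ends of the scale, so the shortest and longest stages are spread out instead of being squeezed near 0 and 1.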

Basically it's just a scheme to represent graphically the distribution of stage lengths for decades with differing numbers of stages, sort of like a histogram but with better detail.