I’ve looked over the full article now. Here's a key passage from the discussion:
Results of other studies [17–19] found remarkable increases in reported submaximal tests, namely constant-load time-to-exhaustion tests, of 22–70%. These trials used short (between 3 min and 20 min) tests that, similar to the maximal exercise test, lead to exhaustion and therefore are less representative of real-life cycling. Our submaximal test was designed to closely mimic a road time trial of 45 min and in line with that was not intended to lead to exhaustion.
The goal of using rHuEPO in professional sports is to improve performance during road races, not in maximal exercise tests. Participants therefore took part in a race designed to mimic a professional road race at Mont Ventoux about 12 days after the last dose of the treatment period, which also tested the validity of our laboratory exercise tests as biomarkers of real cycling performance. The two treatment groups did not differ in race time or mean power output, thereby raising doubt about the predictive value of the increase in maximal exercise test parameters by rHuEPO for performance in a road race. This outcome is further supported by the fact that rHuEPO treatment did not show an appreciable effect on a submaximal exercise test in the laboratory.
In claiming that their tests more closely mimic actual race conditions, the authors are assuming the riders pace themselves up climbs, aiming to ride at the maximum possible power short of exhaustion. As the authors themselves put it, the submaximal test (carried out in the laboratory prior to the climb, and used to support the latter results) was designed to mimic a time trial: “Participants were instructed to produce the highest mean power output during a 45-min period, attempting to mimic competitive cycling time trials.”
But of course climbs usually are not ridden in this manner. Often the leaders ride at a level below max until the last few km, when they go all out. Other times there will be attacks, when riders put out power that isn’t sustainable for more than a few seconds. Not to mention escapes earlier in the stage. So I don’t think that one can dismiss the effects on maximum power as irrelevant to a race. At best, one might argue—based just on this study, and not on others—that EPO should not improve time trialing.
I'm also wondering: since apparently all the subjects climbed Mont Ventoux at the same time, could there have been a pacing effect? Suppose the EPO group was slightly stronger, and the controls just drafted behind them. At climbing speed there wouldn't be much air resistance, so the draft itself wouldn't help much, but one might speculate that the controls pushed themselves slightly harder to stay with the EPO group, who had no one to pace them. Since there was apparently a considerable spread in times, this couldn't have affected every rider all the way up, but could it have had some effect?
Some other points.
1) There appears to be an error in Table 1. It lists the EPO group’s average baseline (before treatment) maximum power as 4.19 watts/kg, significantly lower than the 4.36 value for the controls. But in Table 2, the EPO value is 4.37. Other values in Table 1 are the same as the corresponding values in Table 2.
2) Table 2 lists the values of various parameters measured during the study. EPO was given weekly for eight weeks, and measurements of parameters were made at 2, 4, 6 and 8 weeks (days 11, 25, 39 and 53 in Table 2). The values at those various time periods were averaged to give a mean value at the end of the study, and this is the value used to determine if there was a significant difference between the two groups.
This is valid if the baseline values are the same, but sometimes they aren’t. E.g., the LT and LT power values are lower at baseline for the EPO group than for the controls. The baseline difference is presumably not significant, but it means that the change from baseline is larger than the end-of-study comparison alone suggests. Thus the LT for the EPO group increases by about 6%, compared to less than 2% for the controls. As carton says, that's not nothing. Moreover, while the EPO group starts out lower, it’s higher at every time period measured. So I would be cautious in concluding that EPO had no effect on the LT values, as the authors do (the p-values in Table 2 are not significant).
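To make the baseline point concrete, here is a rough sketch in Python of the arithmetic I have in mind. The numbers are invented placeholders in arbitrary units, not values from Table 2; I picked them only so that the group starting lower gains about 6% and the other gains less than 2%. The function name and the data layout are my own, not anything from the paper.

```python
from statistics import mean


def percent_change_from_baseline(baseline, followups):
    """Percent change of the mean follow-up value relative to baseline."""
    return 100.0 * (mean(followups) - baseline) / baseline


# Invented placeholder values (NOT from Table 2), arbitrary units.
# Four follow-ups stand in for the four measurement days.
epo_baseline, epo_followups = 101.0, [106.0, 107.0, 108.0, 107.0]
ctl_baseline, ctl_followups = 104.0, [105.0, 106.0, 107.0, 106.0]

epo_change = percent_change_from_baseline(epo_baseline, epo_followups)
ctl_change = percent_change_from_baseline(ctl_baseline, ctl_followups)

# Comparing only the on-treatment means ignores where each group started.
raw_gap = mean(epo_followups) - mean(ctl_followups)

# Comparing the changes from baseline accounts for the starting difference.
adjusted_gap = (mean(epo_followups) - epo_baseline) - (
    mean(ctl_followups) - ctl_baseline
)

print(f"EPO change from baseline:     {epo_change:.1f}%")
print(f"Control change from baseline: {ctl_change:.1f}%")
print(f"Gap in on-treatment means:    {raw_gap:.1f}")
print(f"Gap in change from baseline:  {adjusted_gap:.1f}")
```

With these made-up numbers the two groups look nearly identical if you only compare on-treatment means, while the gap in change from baseline is several times larger. Whether that holds for the real data depends on the actual values and their variance, which this sketch doesn't address.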
3) A curious thing about Table 2 is that in virtually every case, the highest value is at day 39, not at day 53. If this were just for the EPO group, one might conclude that EPO’s maximum effects were achieved at six weeks, and after that there was actually a slight decline, perhaps due to endogenous feedback mechanisms. But the control group also exhibits peak values at day 39. So it looks as though something systematic affected the measurements at that time point in both groups.
fmk: The authors do cite the Audran study. But based on what you posted, that just determined physiological effects, and is not inconsistent with anything reported by the current study.