Merckx index said:
They may be included in an Appendix referred to by the authors--I'm sure they have the data somewhere--but unfortunately, the Appendix is not included in the free access to the article following registration.

Just a quick reply, the appendix is available for download here. In most journals, the appendix and other supplemental materials are distributed separately from the paper. (If the link doesn't work, go to the paper's landing page and click "Supplemental Materials".) The appendix doesn't list the full data set.
One thing to note is that training effort was measured, and no significant differences between the groups were found in training hours, training power, or training distance. While training may indeed be a source of bias, the randomized design of the study, together with the checks they did, makes it unlikely that the bias systematically favors one group. So the group averages should not be skewed by training.
However, while the between-group differences may not be biased by differences in training, those differences may still increase the individual differences (variance) within the groups. As far as I can tell, training effort is not included in any of the statistical models, which increases the error/unexplained variance of the model and thereby decreases the power of the statistical test. This means the test is less likely to reject the hypothesis that EPO does nothing (the null hypothesis) even if in reality EPO is beneficial. On the other hand, including training variables in the model would mean that more effects (parameters) have to be estimated, which also decreases the power of the procedures.
Still, I'm not convinced that this truly was a factor in the null finding, although it couldn't hurt to perform a study in which training was more closely monitored and/or controlled.
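To put rough numbers on the power argument above, here is a minimal sketch using a normal approximation for a two-group comparison. The effect size is the difference between the two reported average increases; the group size of 24 and the standard deviations are placeholders I made up for illustration, not values from the paper.

```python
from math import sqrt
from statistics import NormalDist


def power_two_group(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect: true difference between the group means (watts)
    sd: within-group standard deviation of the outcome (watts)
    """
    std_normal = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


EFFECT_W = 13.55 - 7.66  # difference between the reported average increases
N_PER_GROUP = 24         # assumption, for illustration only

for sd in (5, 10, 15, 20):  # hypothetical within-group SDs in watts
    print(f"sd = {sd:>2} W  ->  power ~ {power_two_group(EFFECT_W, sd, N_PER_GROUP):.2f}")
```

The only point is the direction: with the effect and group size held fixed, every bit of extra unexplained variance lowers the chance of detecting a real effect.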
On the 45-minute test
One thing that I do question is the 45-minute test, which serves as the foundation for the conclusion that EPO is not beneficial in cycling. (The maximum power tests do show a beneficial effect of EPO.) While I'm not an expert in cycling performance tests, almost every text I've read on cycling training and time trialing mentions that pacing a 45-60 minute effort is really difficult. (It's usually mentioned in relation to FTP tests.) I know it certainly took me a while to get it right when I started racing, and even then I sometimes got it wrong. Given that most of the participants either don't compete or competed at club level, I find it unlikely that most would have the skills to pace such an effort perfectly.
So, what are we actually measuring? True maximum submaximal performance? Their ability to pace 45-minute efforts? The effect of experiencing the test at time point 1, learning from it, and then doing better at day 46 (i.e., a testing effect)? Due to the difficulty of pacing and the relative inexperience of the participants, both the validity (i.e., does it truly measure submaximal performance) and the reliability (i.e., does the test provide consistent results/would the same participant get similar results if it were repeated under similar conditions) of the measure can be questioned. As far as I can tell, they did not analyse the efforts to check for pacing problems. (This is also suggested by the fact that they don't mention pacing problems; I find it highly unlikely that all participants paced both their tests appropriately.)
This means that the results may be influenced not only by the participants' true physical abilities, but also by their ability to pace the effort. This increases individual differences and thereby the error/unexplained part of the statistical model. As with the training effects mentioned above, this decreases the power of the statistical tests. As the p-value does approach significance (p = 0.086, 7.66 watts average increase for placebo, 13.55 watts average increase for EPO), problems with this test may have had a substantial influence on the results of the analysis.
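To make the reliability concern concrete, here is a small simulation sketch. Each simulated rider has a fixed "true" 45-minute power, and every test adds independent pacing noise on top of it. All numbers (mean power, spreads, number of riders) are invented for illustration and have nothing to do with the study's data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def retest_correlation(sd_true, sd_pacing, n_riders=5000, seed=1):
    """Correlation between two repeated tests of the same simulated riders."""
    rng = random.Random(seed)
    # Hypothetical true 45-minute power per rider (mean of 250 W is made up).
    true_power = [rng.gauss(250, sd_true) for _ in range(n_riders)]
    # Each test result = true ability + independent pacing error.
    test_1 = [p + rng.gauss(0, sd_pacing) for p in true_power]
    test_2 = [p + rng.gauss(0, sd_pacing) for p in true_power]
    return pearson(test_1, test_2)


for sd_pacing in (0, 15, 30):  # hypothetical pacing noise in watts
    r = retest_correlation(sd_true=30, sd_pacing=sd_pacing)
    print(f"pacing sd = {sd_pacing:>2} W  ->  test-retest r ~ {r:.2f}")
```

In this toy setup the expected test-retest correlation is sd_true² / (sd_true² + sd_pacing²), so once pacing errors are as large as the real differences between riders, only about half of the variance in test results reflects actual ability.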