Second, because the abnormality in Armstrong's profile was a series of flat results rather than spikes in blood values, the software was not engineered to detect it. In other words, the abnormality would not have been flagged by the software, but it becomes evident to an expert reviewing the raw blood results.
Yes, as I said before, and the question is why not? Software can almost certainly be designed for this purpose, but if not, or even if so, all of the data may have to be reviewed by experts. In any case, Gore's conclusion suggests that some abnormalities the current software misses may nevertheless be detected rather easily.
Gore said there was a one in a million chance that all seven of LA’s 2009 and 2010 reticulocyte values could be as low as they were. I’m not sure, but he may have reached this conclusion by determining the probability of any one value being that low, and then raising it to the seventh power. If we assume all of these values fell in a fairly narrow range, the probability of any one of those values being in that range would be one over the seventh root of one million, or about one in seven (one in 7.2, to be more exact). That is, if the probability of one low value was about 1/7, the probability of the seven TDF values all being that low, assuming they were independent of each other, would be (1/7) to the seventh power, or about one in a million.
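This arithmetic is easy to check directly. A minimal sketch of the back-of-envelope reasoning (my reconstruction, not Gore's actual method):

```python
# If each of seven independent values has probability p of being this
# low, the chance of all seven being low is p**7. Working backward from
# one in a million, p is one over the seventh root of 1,000,000.

p_single = 1 / 1_000_000 ** (1 / 7)  # seventh root of one in a million
print(round(1 / p_single, 2))        # each value: roughly a 1-in-7.2 chance

p_all_seven = p_single ** 7          # joint probability, assuming independence
print(p_all_seven)                   # back to about one in a million
```

The round trip is exact by construction; the point is just that a per-sample probability of about 1/7 is what it takes to reach one in a million over seven samples.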
There are two simple ways Gore might have come up with this probability of about one-seventh:
1) About 1/7 of all of LA’s reticulocyte values were in this range. For example, it may be that these seven values were the only ones in this range out of a total of fifty (7/50 is close to 1/7.2). So the probability of those seven all being that low would be (7/50) to the seventh power, or about one in a million.
2) A more accurate way to estimate the probability would be to take all the reticulocyte values and determine a mean and SD. In any normal distribution, about one-sixth of the values fall one SD or more below the mean. One-seventh of the values would fall below a threshold slightly more than one SD below the mean. So if all seven of these values fell below this criterion, again, the probability would be (1/7) to the seventh power, or about one in a million.
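To make the second method concrete, here is a sketch using invented reticulocyte numbers (all values and thresholds below are illustrative, not Armstrong's actual data):

```python
# Hypothetical sketch of method (2): find the threshold slightly more
# than one SD below the mean such that ~1/7 of a normal distribution
# falls below it, then count how many Tour values fall below it.
from statistics import mean, stdev

# invented values (reticulocyte %), for illustration only
all_values = [1.10, 1.25, 0.95, 1.40, 1.05, 1.30, 1.20, 1.15,
              1.35, 1.00, 1.28, 1.12, 1.22, 1.08, 1.18]
tour_values = [0.82, 0.85, 0.80, 0.84, 0.83, 0.81, 0.86]  # the seven low ones

m, sd = mean(all_values), stdev(all_values)
z = 1.085                       # P(Z < -1.085) is about 0.139, i.e. ~1/7.2
threshold = m - z * sd
n_below = sum(v < threshold for v in tour_values)

p = (1 / 7.2) ** n_below        # joint probability, assuming independence
print(n_below, f"{p:.1e}")
```

With all seven invented Tour values below the threshold, the joint probability comes out on the order of one in a million, matching the hand calculation above.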
This is a quick and dirty approach, and could be criticized on several grounds, e.g., that values taken within a three-week period are not completely independent of each other but may be somewhat correlated. There are ways one might correct for this. In any case, whether Gore used this approach or not, something like it could be used as a quick screen. In fact, the passport might at this stage be like the gel EPO test--more a matter of an expert judging a pattern than of software fitting the data. As shown in that video, it doesn't take a lot of time or any formulas to see that all the TDF HT values are relatively high and, most important, close together throughout the three weeks, and that conversely, all the retic values are low and close together for the same period.
That the software is apparently blind to this is a very serious weakness, and one wonders why it was approved for use when it is so well known that HT values fall during a three-week Tour. A GT is of course one of the most likely times a rider will blood dope, so at a minimum, any values obtained during a GT should not be analyzed by software alone, but by an expert aware of the usual drop. Once a suspicious pattern is noted, that is, one in which the values stay together and don't drop, there may be other formulas that could be applied to it, e.g., cluster-type approaches that evaluate how close a group of values are to one another. These could be evaluated relative to the rider's own baseline, that is, several samples taken during a three-week period in the offseason or another official baseline period, so that one could control for the fact that there might be some correlation simply because of proximity in time.
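One way such a cluster-type screen might look, as a hypothetical sketch (not an approved passport algorithm; the thresholds and all data are invented): compare the spread of within-Tour values against the rider's own baseline spread, and flag when the Tour values stay unusually close together instead of showing the usual decline.

```python
# Hypothetical "tightness" screen: Tour values that are both much
# tighter than the rider's baseline spread AND fail to decline over the
# three weeks are flagged for expert review.
from statistics import stdev

def tightness_flag(tour, baseline, ratio_cutoff=0.5, drop_cutoff=0.05):
    """Flag if Tour values are both tighter than baseline and fail to drop.

    ratio_cutoff and drop_cutoff are illustrative, not validated numbers.
    """
    spread_ratio = stdev(tour) / stdev(baseline)  # how tight vs. baseline
    observed_drop = tour[0] - tour[-1]            # first minus last sample
    too_tight = spread_ratio < ratio_cutoff
    no_decline = observed_drop < drop_cutoff
    return too_tight and no_decline

# invented HT values (%): baseline varies, Tour stays flat and high
baseline = [44.1, 46.3, 43.2, 45.8, 44.9, 46.0, 43.5]
tour = [45.2, 45.0, 45.3, 45.1, 45.2]
print(tightness_flag(tour, baseline))  # True: suspiciously flat pattern
```

Using the rider's own baseline spread as the denominator is what lets this control for the within-period correlation mentioned above, since the baseline samples are drawn over a comparable three-week window.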