Willy_Voet said:
Your points are bolded, I have moved them around to combine my answers.
1. I am not looking at 1 data point, I am looking at the overall trend, from start of dataset (offseason = lowest Hgb) to end of dataset (mid-GT = highest Hgb).
DM's data goes up and down. The last 2 data points go up. That is not a trend. You have no trend line.
Yes, trend lines. And correlation coefficients. I did stats at 3rd year uni, I know what you are talking about. R^2, nice and safe.
Notice I did not say trend line, I said trend.
Willy_Voet said:
2. If you are a statistician or similar, why use such a craptastic graph type, clumping 3 riders' values one on top of the other, seriously? Are you deliberately trying to obfuscate?
It's called an "overlay" and I didn't label the axes. That was the only crappy part. If you have never seen two (or more) sets of data overlaid on a single graph (they were color coded, one rider per color), then I'm not sure what to say. Actually, you posted one yourself, the CV and DM data.
But my data was 2 dimensional - not 3 dimensional compressed into 2 like yours. And timing of data is critical in this analysis. Maybe you can tell me why? Your tone of superiority at graph experience is noted, although you're tending towards the Krebs Cycle, "I am a PhD how dare you question me" line of reasoning. I'll acknowledge I used "craptastic" but I believe it fits, as I will reiterate - timing of the values is critical. Your graph pruned that info from the dataset entirely.
Willy_Voet said:
However, there is a difference between connecting data points and drawing conclusions about a trend.
Now we're back at trend. What happened to trend line? I think you mean line of best fit, and it requires you to "explain", mathematically, the pattern of values seen. The higher the correlation coefficient, the more likely you are describing the data.
I am talking about a trend - namely, a general direction in which something is developing or changing.
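For anyone who wants to see the difference between a "trend" and a "line of best fit" in numbers, here is a minimal sketch of an ordinary least-squares fit with its correlation coefficient. The function name `best_fit` and the sample values are my own assumptions for illustration; they are not DM's or anyone else's real blood data.

```python
import math

def best_fit(xs, ys):
    """Least-squares line through (xs, ys); returns slope, intercept, Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical values only, NOT real rider data: 8 made-up samples
months = [1, 2, 3, 4, 5, 6, 7, 8]
values = [14.1, 14.6, 13.9, 14.4, 14.0, 14.3, 14.7, 15.0]

slope, intercept, r = best_fit(months, values)
dof = len(months) - 2  # a straight-line fit uses up 2 degrees of freedom
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} R^2={r * r:.3f} dof={dof}")
```

With only 8 points a straight-line fit leaves 6 degrees of freedom, so a weak coefficient on its own says little either way.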
Willy_Voet said:
Let's try this:
(time is in seconds, I forgot to include units)
There are 2 trends here:
I) Mary is getting slower.
II) Mark was improving, but plateaued.
I cannot say anything about John.
You cannot say anything mathematically about John, with any sort of confidence, whereas I can. He's getting slower. The trend line or line of best fit does not correlate well, no, but if I was coaching him, I'd be looking at his training and trying to work out why he is getting slower. This is the difference between someone in the lab all day and someone out in the real world, and I acknowledge this is entirely insulting, dismissive and maybe even arrogant.
ETA: one of us is reading the graph wrong.
Month 1, Mary does 100m in 15 seconds
Month 4, Mary does 100m in 10 seconds.
Mary is definitely getting quicker...
Change the vertical axis to weight (kg) above lean race weight, and tell me he's not getting fatter, just because the coefficient is only 0.38.
My main concern with your example (and I am happy to discuss another one, should you wish to fine-tune the analogy and can be bothered continuing) is this:
John is entirely in control of how fast he runs at each moment. His speed is dependent on a lot of variables but you are measuring John consciously controlling his body.
Hgb, on the other hand, is incredibly difficult for a human being to modify naturally, given the WADA protocols in place for its collection (at least 2 hours post-exercise, seated for 10 minutes, training and altitude load and sickness taken into consideration).
There are also many studies where Hgb variation over time for elite athletes has been studied and published, and they set up expected trends and variations, etc.
Willy_Voet said:
That is the same case for DM's data. It is scattered about and the last 2 data points just happen to go up. No trend.
And I will repeat. You could draw a line of best fit and not see anything. But it's a physiological process, that is known to follow a specific pattern, and in this instance, it is not. Did you look up plasma expansion yet?
Willy_Voet said:
5. You confessed above not being a hematologist - so why are you getting so uppity about a very obvious analysis?
An "obvious analysis" is your opinion. You have not analyzed the data by any statistical method. Plotting a "connect the dots" graph and claiming a "trend" is not an "analysis".
That is correct. Given there are only 8 data points, it would seem remiss to do so - the degrees of freedom are way down.
You see a limited data set and say this is inconclusive.
I see a limited data set and add one more data point you can't graph on this chart: David Millar is known as the most vocal anti-doping advocate in the professional cycling peloton, is part owner of "Team Clean", Garmin, and won a stage in this year's Tour. The Tour he claims was won by "Mr Clean" Bradley Wiggins.
So why doesn't Millar release all his blood values, and remove the chance for hacks like me to poke and prod them without any statistical certainty?
Willy_Voet said:
I reject crappy graphs and analyses where authors try and tweak the presentation of the data.
I do the same - which is why I rejected your graph previously. Timing is more important than Hgb and Retic correlations, given you can deliberately mess with both.
Willy_Voet said:
Have you looked at the paper on the BioPassport? You can generate a scatter plot of HgB versus %retics. The "clean" rider clusters nicely, the "doped" rider has one value out of whack. The "false positive" rider has 2 data points slightly out of his cluster, but he was presented as an example of how a threshold must be carefully calibrated.
Now, the formula they use for Off score {Hb x 10 - (60 x sqrt(%ret))}, must weight Hb and %ret appropriately to set the threshold.
Yes I have - and I have a post to discuss how lax the ABP is. But I digress. Your cluster graph is yet again missing the most vital pieces of information - timing.
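For reference, the Off-score formula quoted above can be sketched in a few lines. I am assuming Hb is entered in g/dL (which is why it gets multiplied by 10) and %ret as a plain percentage; the function name `off_score` and the sample inputs are hypothetical, not taken from any rider's profile.

```python
import math

def off_score(hb_g_per_dl, ret_percent):
    """Off-score as quoted in the thread: Hb x 10 - 60 x sqrt(%ret).

    Assumes Hb in g/dL and reticulocytes as a percentage.
    """
    return hb_g_per_dl * 10 - 60 * math.sqrt(ret_percent)

# Hypothetical inputs for illustration only
print(off_score(15.0, 1.0))  # 150 - 60 = 90.0
```

Note that the function takes a single sample and has no notion of when it was drawn, which is exactly the timing information I keep going on about.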
This says to me off-score is useless: