New study shows leg flexion less efficient than extension.

Re: New study shows leg flexion less efficient than extension

FrankDay said:
PhitBoy said:
FrankDay said:
technique can play a role in efficiency.



What technique results in the best efficiency? A serious researcher should be looking for the variables that can be changed that affect this important parameter.

A technique in which each leg uses 180 deg. of the circle, uses only the most powerful muscles, and applies maximal tangential force around a 90 deg. arc of that semicircle.
 
Re: New study shows leg flexion less efficient than extension

FrankDay said:
Dr. Coggan, I did a little search to see if I could find some of these studies and I ran across your "slideshare". Perhaps you could tell me why for slide 22 you say:

"Evidence that increasing mechanical effectiveness does not improve cycling efficiency
• Longitudinal (interventional) observations
– Removing toe-clips and cleats does not reduce efficiency (Coyle et al. J Appl Physiol 1988; 64:2622-2630; Ostler et al. J Sports Sci 2008; 26:47-55)
– Training using uncoupled cranks does not improve efficiency (Bohm et al. Eur J Appl Physiol 2008; 103:225-232; Williams et al. Int J Sports Physiol Perform 2009; 4:18-28)
– Acutely altering pedal stroke to be “rounder” reduces efficiency (Korff et al. Med Sci Sports Exerc 2007; 39:991-995)"

Is there a reason you mentioned Bohm and not Luttrell?

I simply listed the studies supporting the conclusion that is the slide's title. TBH, I don't recall whether I mentioned Luttrell et al. when giving the talk, but if I did, I'm sure I would have also pointed out that the study represents an undeclared conflict of interest on the part of the senior author.
 
So, here is a new study trying to replicate Luttrell. http://www.researchgate.net/publica..._using_uncoupled_cranks?citationList=outgoing
Objectives: Uncoupled cycling cranks are designed to remove the ability of one leg to assist the other during the cycling action. It has been suggested that training with this type of crank can increase mechanical efficiency. However, whether these improvements can confer performance enhancement in already well-trained cyclists has not been reported. Method: Fourteen well-trained cyclists (13 males, 1 female; 32.4 ± 8.8 y; 74.5 ± 10.3 kg; VO2max 60.6 ± 5.5 mL·kg−1·min−1; mean ± SD) participated in this study. Participants were randomized to training on a stationary bicycle using either an uncoupled (n = 7) or traditional crank (n = 7) system. Training involved 1-h sessions, 3 days per week for 6 weeks, at a heart rate equivalent to 70% of peak power output (PPO), substituted into the training schedule in place of other training. VO2max, lactate threshold, gross efficiency, and cycling performance were measured before and following the training intervention. Pre- and post-testing was conducted using traditional cranks. Results: No differences were observed between the groups for changes in VO2max, lactate threshold, gross efficiency, or average power maintained during a 30-minute time trial. Conclusion: Our results indicate that 6 weeks (18 sessions) of training using an uncoupled crank system does not result in changes in any physiological or performance measures in well-trained cyclists.
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.
Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.
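
For anyone who wants to see how that kind of between-group comparison is typically computed, here is a minimal sketch. The per-rider change scores are made up to match the reported group means only; the study's raw data aren't given here, so treat the numbers as illustrative.

# Minimal sketch (illustrative data, not the study's raw numbers) of a
# between-group comparison of pre-to-post changes in gross efficiency.
from scipy import stats

pc_change      = [1.5, 0.8, 1.9, 0.7, 1.4, 1.0, 1.1]  # hypothetical GE changes, uncoupled-crank group (mean 1.2)
control_change = [0.6, 0.2, 0.9, 0.4, 0.5, 0.3, 0.6]  # hypothetical GE changes, control group (mean 0.5)

t, p = stats.ttest_ind(pc_change, control_change, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3f}")  # the published decision compares p to the pre-chosen alpha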

Most of the uncoupled-crank studies that have "failed" to show a difference look about the same: there is a trend toward a difference, but it doesn't reach the scientific standard of 0.05 (a 1 in 20 chance the result is due to chance), so the author is obliged by convention to say he found no difference.

As the authors of this study wrote in their discussion: "The lack of effect of training using uncoupled cranks on GE is in contrast to that of Luttrell and Potteiger. A potential reason for these disparate results may be related to the participant recruitment criteria."

So, here is the problem with interpreting these studies. Here we have a study in which the PC group shows a greater improvement in efficiency than the control group, which I read as a 75% chance of being a real change, and a greater increase in power, which I read as an 87% chance of being real. Yet, because the results don't reach the 95% level that scientists have generally agreed is required before someone can claim their study showed a difference, Fergie and others feel free to state that PCs have been "proven" to be a failure and a fraud when, in fact, the data suggest the exact opposite. This is pretty much the case with every study that has looked at PCs: the trend is there, but it doesn't reach the p<0.05 level, so the author concludes, rightly for a scientific paper, that the data show no difference. The problem, it seems, is that it is difficult to show this difference, especially in experienced cyclists, in only 6 weeks.

My guess is that if someone were to do a meta-analysis of all the studies out there, there is enough data now to demonstrate a difference in both power improvement and efficiency improvement to the scientific standard. Until then, the world will just have to accept that pretty much all of the studies have shown PCs to be effective, just not to the 95% confidence level. This seems more an issue with study design (not enough subjects, not lasting long enough, etc.) than with the concept itself. So, if one is satisfied with evidence at the 70-90% level, it is there. If one needs the 95% level, then you may have to wait a bit.
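
For what a meta-analysis like that would actually involve, here is a rough inverse-variance (fixed-effect) pooling sketch. The (effect, standard error) pairs are placeholders, NOT values from the published uncoupled-crank studies; the only point is that pooling several underpowered studies can reach a conventional significance level even when no single study does.

# Fixed-effect meta-analysis sketch with made-up study summaries.
import math
from scipy import stats

studies = [(0.7, 0.45), (0.5, 0.40), (0.9, 0.55), (0.4, 0.50)]  # hypothetical (mean difference, SE) per study

weights   = [1 / se**2 for _, se in studies]          # inverse-variance weights
pooled    = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
z = pooled / pooled_se
p = 2 * stats.norm.sf(abs(z))                          # two-sided p for the pooled effect
print(f"pooled effect = {pooled:.2f} +/- {pooled_se:.2f}, z = {z:.2f}, p = {p:.3f}")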
 
Re: New study shows leg flexion less efficient than extension

acoggan said:
FrankDay said:
Dr. Coggan, I did a little search to see if I could find some of these studies and I ran across your "slideshare". Perhaps you could tell me why for slide 22 you say:

"Evidence that increasing mechanical effectiveness does not improve cycling efficiency
• Longitudinal (interventional) observations
– Removing toe-clips and cleats does not reduce efficiency (Coyle et al. J Appl Physiol 1988; 64:2622-2630; Ostler et al. J Sports Sci 2008; 26:47-55)
– Training using uncoupled cranks does not improve efficiency (Bohm et al. Eur J Appl Physiol 2008; 103:225-232; Williams et al. Int J Sports Physiol Perform 2009; 4:18-28)
– Acutely altering pedal stroke to be “rounder” reduces efficiency (Korff et al. Med Sci Sports Exerc 2007; 39:991-995)"

Is there a reason you mentioned Bohm and not Luttrell?

I simply listed the studies supporting the conclusion that is the slide's title. TBH, I don't recall whether I mentioned Luttrell et al. when giving the talk, but if I did, I'm sure I would have also pointed out that the study represents an undeclared conflict of interest on the part of the senior author.
LOL. Yes, the author asked for and accepted our loaning him a pair of cranks for the purposes of his doing this study and didn't mention that. Such a conflict. LOL.
 
Re:

FrankDay said:
So, here is a new study trying to replicate Luttrell. http://www.researchgate.net/publica..._using_uncoupled_cranks?citationList=outgoing
...
My guess is that if someone were to do a meta-analysis of all the studies out there, there is enough data now to demonstrate a difference in both power improvement and efficiency improvement to the scientific standard. Until then, the world will just have to accept that pretty much all of the studies have shown PCs to be effective, just not to the 95% confidence level. This seems more an issue with study design (not enough subjects, not lasting long enough, etc.) than with the concept itself. So, if one is satisfied with evidence at the 70-90% level, it is there. If one needs the 95% level, then you may have to wait a bit.
Is a 'bit' another 10-15 years, or maybe never? Quite a lengthy discussion about why yet another study shows no benefit for uncoupled cranks.
 
Re:

FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

More statistical comedy.
 
Re: Re:

Alex Simmons/RST said:
FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

More statistical comedy.

I LOL'ed. Must not be any stats required to do a Med or Engineering Degree.
 
Re: Re:

FrankDay said:
sciguy said:
FrankDay said:

You consider something published in 2009 a new study? That in and of itself is pretty darn interesting.
It was new to me and no one had mentioned it here before so it is new to this discussion. I think that counts as a kind of new.
Hahahahahaha...you really are a total joke.
 
Re: Re:

JamesCun said:
FrankDay said:
So, here is a new study trying to replicate Luttrell. http://www.researchgate.net/publica..._using_uncoupled_cranks?citationList=outgoing
...
My guess is that if someone were to do a meta-analysis of all the studies out there, there is enough data now to demonstrate a difference in both power improvement and efficiency improvement to the scientific standard. Until then, the world will just have to accept that pretty much all of the studies have shown PCs to be effective, just not to the 95% confidence level. This seems more an issue with study design (not enough subjects, not lasting long enough, etc.) than with the concept itself. So, if one is satisfied with evidence at the 70-90% level, it is there. If one needs the 95% level, then you may have to wait a bit.
Is a 'bit' another 10-15 years, or maybe never? Quite a lengthy discussion about why yet another study shows no benefit for uncoupled cranks.
Well, you won't have to wait forever if you are simply looking for the occasional positive study, as they have already occurred. It does seem, though, that given the way most researchers design their studies, 6 weeks is an inadequate amount of time to uncover a positive result, especially when dealing with experienced cyclists. As shown, this (and other) studies have come close to reaching p<0.05, but close doesn't make the cut in scientific publications. But "failure" seems a black-or-white issue to you (and others here), with no color in between, including grey, allowed. I was only pointing out that uncoupled cranks only "failed" in a very specific way: the positive change that was demonstrated didn't reach a specific level, even though the data suggested there is a real effect.
 
Re: Re:

FrankDay said:
Well, you won't have to wait forever if you are simply looking for the occasional positive study, as they have already occurred. It does seem, though, that given the way most researchers design their studies, 6 weeks is an inadequate amount of time to uncover a positive result, especially when dealing with experienced cyclists. As shown, this (and other) studies have come close to reaching p<0.05, but close doesn't make the cut in scientific publications. But "failure" seems a black-or-white issue to you (and others here), with no color in between, including grey, allowed. I was only pointing out that uncoupled cranks only "failed" in a very specific way: the positive change that was demonstrated didn't reach a specific level, even though the data suggested there is a real effect.

Do you mean ones like Dixon and Luttrell? The ones that are suspect at best, outright bogus at worst? Spin it any way you want, there is no data to support your product.
 
Re: New study shows leg flexion less efficient than extension

PhitBoy said:
FrankDay said:
people see a result that doesn't conform with their bias and just discount it.

Oh the irony.
I am not aware of a single PC study that doesn't conform to my bias. It is my belief (bias) that, in general, it takes about 6 weeks (usually involving more than 3 hours per week of training) for most to BEGIN seeing improvement resulting from the training. The failure of any study lasting only 6 weeks and involving part-time use to reach the p<0.05 arbitrary standard doesn't surprise me at all. In fact, I was surprised by the magnitude of the Luttrell results. I believe this is a result of his participants not being particularly good cyclists. The better the cyclist, the harder it is to get improvement and the slower it will come, making it very difficult to achieve that p<0.05 arbitrary standard. Dixon also surprises me a bit, not because he saw an improvement (an immersion intervention at >8 hrs per week is substantial, and most - but not all - would be seeing improvement in 6 weeks with this degree of intervention) but, again, because of the size of the improvement in such a short period.

Pretty much every study done so far has shown improvement. The only "problem" (which isn't a surprise to me because of these studies' designs) is they fail to reach the p<0.05 arbitrary standard.
 
Re: Re:

FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

This is absolute and total rubbish. A p-value does not in any way support probabilistic reasoning about hypotheses. A p-value is a frequentist statistic, it gives you a yes or no answer. You would need to use a Bayesian statistic to assess a range of hypotheses.

There is no such thing as a "level" of significance when assessing p-values, it is either significant or it isn't.
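
For readers wondering what the Bayesian-style statement people usually want ("the probability that the true between-group difference is greater than zero") would look like, here is a minimal sketch. It assumes a flat prior and a normal likelihood, and the observed difference and its standard error are illustrative numbers only, not values from the study.

# Minimal sketch of a posterior probability statement that a p-value does not give.
from scipy import stats

obs_diff = 0.7   # hypothetical PC-minus-control difference in efficiency change
se_diff  = 0.6   # hypothetical standard error of that difference

posterior = stats.norm(loc=obs_diff, scale=se_diff)   # posterior for the true difference under a flat prior
print(f"P(true difference > 0 | data) = {1 - posterior.cdf(0):.2f}")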
 
Re:

FrankDay said:
So, here is a new study trying to replicate Luttrell. Results: No differences were observed between the groups for changes in VO2max, lactate threshold, gross efficiency, or average power maintained during a 30-minute time trial. Conclusion: Our results indicate that 6 weeks (18 sessions) of training using an uncoupled crank system does not result in changes in any physiological or performance measures in well-trained cyclists.

Your problem is you believe, or are trying to get other people to believe, that after prolonged training with PCs a masher can add additional torque at BDC, the UPSTROKE and TDC to his maximal mashing downstroke torque. During steady TT riding it does not work like that: his downstroke torque will be reduced by the action of his other leg, leading to an overall loss of power.
 
Re: Re:

King Boonen said:
FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

This is absolute and total rubbish. A p-value does not in any way support probabilistic reasoning about hypotheses. A p-value is a frequentist statistic, it gives you a yes or no answer. You would need to use a Bayesian statistic to assess a range of hypotheses.

There is no such thing as a "level" of significance when assessing p-values, it is either significant or it isn't.
No one is trying to look at a range of hypotheses but only at the hypothesis of each study. And, what is considered "significant" is an arbitrary choice. Scientific journals have generally considered p<0.05 the cut-off to sustain a claim that the intervention had an effect, but I have seen studies where 0.1 is the acceptable cut-off and others where 0.01 is the cut-off. The only thing this cut-off tells someone is how sure they are that the result represents a valid finding. It is arbitrary and must be taken in context. If a study has a p of 0.05001 it does not reach the cut-off, but would a small change in design have allowed it to make the cut-off (more participants, longer duration, etc.)? Even if it does make the cut-off, there is still a chance the results reflect random variation.
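
As a tiny illustration of the threshold point: the same p-value leads to different "significant"/"not significant" calls depending on which alpha was chosen in advance. The 0.06 below is just an example value, not a result from any study.

# Same p-value, different verdicts under different pre-chosen alphas.
p = 0.06  # example p-value
for alpha in (0.01, 0.05, 0.10):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {verdict}")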
 
Re: Re:

backdoor said:
FrankDay said:
So, here is a new study trying to replicate Luttrell. Results: No differences were observed between the groups for changes in VO2max, lactate threshold, gross efficiency, or average power maintained during a 30-minute time trial. Conclusion: Our results indicate that 6 weeks (18 sessions) of training using an uncoupled crank system does not result in changes in any physiological or performance measures in well-trained cyclists.

Your problem is you believe, or are trying to get other people to believe, that after prolonged training with PCs a masher can add additional torque at BDC, the UPSTROKE and TDC to his maximal mashing downstroke torque. During steady TT riding it does not work like that: his downstroke torque will be reduced by the action of his other leg, leading to an overall loss of power.
That is not what the studies show. Essentially every one has shown an increase in power over control. The only issue is they haven't reached the 95% confidence level.
 
Re: Re:

FrankDay said:
King Boonen said:
FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

This is absolute and total rubbish. A p-value does not in any way support probabilistic reasoning about hypotheses. A p-value is a frequentist statistic, it gives you a yes or no answer. You would need to use a Bayesian statistic to assess a range of hypotheses.

There is no such thing as a "level" of significance when assessing p-values, it is either significant or it isn't.
No one is trying to look at a range of hypotheses but only at the hypothesis of each study. And, what is considered "significant" is an arbitrary choice. Scientific journals have generally considered p<0.05 the cut-off to sustain a claim that the intervention had an effect, but I have seen studies where 0.1 is the acceptable cut-off and others where 0.01 is the cut-off. The only thing this cut-off tells someone is how sure they are that the result represents a valid finding. It is arbitrary and must be taken in context. If a study has a p of 0.05001 it does not reach the cut-off, but would a small change in design have allowed it to make the cut-off (more participants, longer duration, etc.)? Even if it does make the cut-off, there is still a chance the results reflect random variation.

1. Actually Frank that is exactly what you have done by attempting to apply a continuous probability to a p-value and it is complete and total rubbish. A p-value is a yes/no statistic.

2. No, it does not. Again you are attempting to apply a linear scale of significance to a p-value and that cannot be done. All it allows you to do is reject the null hypothesis if it is less than the chosen level of significance. It is not in any way related to "how sure" they can be of the results, as it is purely a statistical calculation and may not relate to a real difference anyway. This is a version of the prosecutor's fallacy.

3. Standard practice when the results do not allow you to reject the null hypothesis is to consider increasing the sample size or redesigning your experiment, but a p-value will give you no indication of whether this will be successful, as IT IS NOT A CONTINUOUS STATISTIC. Repeating the exact same experiment is extremely unlikely to give the same p-value.
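
Point 3 is easy to see in simulation: re-running an identical small two-group experiment gives a wide spread of p-values. The effect size, group size, and noise below are arbitrary choices for illustration, not parameters taken from any of the crank studies.

# Repeat an identical two-group experiment many times and look at the p-value spread.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = []
for _ in range(1000):
    treated = rng.normal(1.2, 1.0, size=7)   # "treatment" changes: true mean 1.2, SD 1.0
    control = rng.normal(0.5, 1.0, size=7)   # "control" changes: true mean 0.5, SD 1.0
    pvals.append(stats.ttest_ind(treated, control, equal_var=False).pvalue)

pvals = np.array(pvals)
print(f"median p = {np.median(pvals):.3f}")
print(f"p ranges from {pvals.min():.4f} to {pvals.max():.2f}")
print(f"fraction of replicates with p < 0.05: {(pvals < 0.05).mean():.2f}")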
 
Re: Re:

FrankDay said:
backdoor said:
FrankDay said:
So, here is a new study trying to replicate Luttrell. Results: No differences were observed between the groups for changes in VO2max, lactate threshold, gross efficiency, or average power maintained during a 30-minute time trial. Conclusion: Our results indicate that 6 weeks (18 sessions) of training using an uncoupled crank system does not result in changes in any physiological or performance measures in well-trained cyclists.

Your problem is you believe, or are trying to get other people to believe, that after prolonged training with PCs a masher can add additional torque at BDC, the UPSTROKE and TDC to his maximal mashing downstroke torque. During steady TT riding it does not work like that: his downstroke torque will be reduced by the action of his other leg, leading to an overall loss of power.

That is not what the studies show. Essentially every one has shown an increase in power over control. The only issue is they haven't reached the 95% confidence level.

That is because the brains of the riders reverted back to what worked best for them on their return to standard cranks. Did the PC'ers use PCs while doing the TT test? If 6 months of training gives a 40% power increase, 6 weeks should give a 10% improvement.
 
Re: Re:

King Boonen said:
FrankDay said:
King Boonen said:
FrankDay said:
Many of you will note that they show "no difference". Of course, there are differences but they just don't reach the P<.05 level of significance.
For instance: Gross efficiency in the PC group improved from 19.7 to 20.9 (a 6% improvement) while the control group improved from 19.8 to 20.3 (a 2.5% improvement). This difference only reached the 0.25 level of significance. So, there is a 1 in 4 chance this difference is due to chance or a 3 in 4 chance (75%) the differences are real.

Then, time-trial power. The PC group improved from 284 to 298 watts (5%) while the control group improved from 274 to 281 watts (2.5%). This difference only reached the 0.125 level of significance. So, there is a 1 in 8 chance this difference is due to chance or a 7 in 8 chance (87.5%) the differences are real.

This is absolute and total rubbish. A p-value does not in any way support probabilistic reasoning about hypotheses. A p-value is a frequentist statistic, it gives you a yes or no answer. You would need to use a Bayesian statistic to assess a range of hypotheses.

There is no such thing as a "level" of significance when assessing p-values, it is either significant or it isn't.
No one is trying to look at a range of hypotheses but only at the hypothesis of each study. And, what is considered "significant" is an arbitrary choice. Scientific journals have generally considered p<0.05 the cut-off to sustain a claim that the intervention had an effect, but I have seen studies where 0.1 is the acceptable cut-off and others where 0.01 is the cut-off. The only thing this cut-off tells someone is how sure they are that the result represents a valid finding. It is arbitrary and must be taken in context. If a study has a p of 0.05001 it does not reach the cut-off, but would a small change in design have allowed it to make the cut-off (more participants, longer duration, etc.)? Even if it does make the cut-off, there is still a chance the results reflect random variation.

1. Actually Frank that is exactly what you have done by attempting to apply a continuous probability to a p-value and it is complete and total rubbish. A p-value is a yes/no statistic.
Phooey! https://en.wikipedia.org/wiki/P-value
Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1%...An equivalent interpretation is that p-value is the probability of obtaining the observed sample results,
2. No, it does not. Again you are attempting to apply a linear scale of significance to a p-value and that cannot be done. All it allows you to do is reject the null hypothesis if it is less than the chosen level of significance. It is not in any way related to "how sure" they can be of the results, as it is purely a statistical calculation and may not relate to a real difference anyway. This is a version of the prosecutor's fallacy.
Again, from the article.
An equivalent interpretation is that p-value is the probability of obtaining the observed sample results,
3. Standard practice when the results do not allow you to reject the null hypothesis is to consider increasing the sample size or redesigning your experiment, but a p-value will give you no indication of whether this will be successful, as IT IS NOT A CONTINUOUS STATISTIC. Repeating the exact same experiment is extremely unlikely to give the same p-value.
I agree that repeating the exact same study is unlikely to give the exact same result, and that the p-value gives no indication, in and of itself, as to whether changing the study design might affect the outcome. But if one can, from education and experience and from the hypothesis being tested, discern a reason why the study didn't reach the arbitrary significance level, then one might be able to redesign the study to see if one is correct or not. Failure of a study to reach the arbitrary significance level is not evidence that the hypothesis is incorrect per se, only that the study as completed did not demonstrate the difference required by the arbitrary choices of the study design. But if a trend is seen in the data, then it is reasonable to look at whether a different design (more subjects, more time, etc.) might uncover the "truth". That is why published studies also include the methods and the raw data, so others might see errors in the design or interpretation that could lead to better follow-on studies. If all we got was "I studied this and found no difference," what would that mean?

The most difficult part of doing a study is the interpretation of the data. Simply looking at whether a study reaches the arbitrary statistical-significance cut-off level as the only indicator of the study's worth is the lazy way out.
 
Those of you who aren't up on your statistics but still want to follow this argument really need to read this Wikipedia article on p-values: https://en.wikipedia.org/wiki/P-value

Relevant excerpts include:

In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1% [1] and denoted as α.

If the p-value is equal to or smaller than the significance level (α), it suggests that the observed data are inconsistent with the assumption that the null hypothesis is true and thus that hypothesis must be rejected (but this does not automatically mean the alternative hypothesis can be accepted as true). When the p-value is calculated correctly, such a test is guaranteed to control the Type I error rate to be no greater than α.

An equivalent interpretation is that p-value is the probability of obtaining the observed sample results, or "more extreme" results, when the null hypothesis is actually true (here, "more extreme" is dependent on the way the hypothesis is tested).[2]
The smaller the p-value, the larger the significance because it tells the investigator that the hypothesis under consideration may not adequately explain the observation. The hypothesis H is rejected if any of these probabilities is less than or equal to a small, fixed but arbitrarily pre-defined threshold value α, which is referred to as the level of significance. Unlike the p-value, the α level is not derived from any observational data and does not depend on the underlying hypothesis; the value of α is instead determined by the consensus of the research community that the investigator is working in.

Examples

A few simple examples follow, each illustrating a potential pitfall.

One roll of a pair of dice

Suppose a researcher rolls a pair of dice once and assumes a null hypothesis that the dice are fair, not loaded or weighted toward any specific number/roll/result (i.e., uniform). The test statistic is "the sum of the rolled numbers" and is one-tailed. The researcher rolls the dice and observes that both dice show 6, yielding a test statistic of 12. The p-value of this outcome is 1/36, or about 0.028 (under the null hypothesis, a sum of 12 is the most extreme possible result and occurs in only one of the 6×6 = 36 equally likely outcomes). If the researcher assumed a significance level of 0.05, this result would be deemed significant and the hypothesis that the dice are fair would be rejected.

In this case, a single roll provides a very weak basis (that is, insufficient data) to draw a meaningful conclusion about the dice. This illustrates the danger with blindly applying p-value without considering the experiment design.
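
The dice p-value can be checked by direct enumeration, for example:

# Direct enumeration of the dice example: one-tailed p-value for a sum of 12
# on a single roll of two fair dice.
from itertools import product

rolls = list(product(range(1, 7), repeat=2))               # 36 equally likely outcomes
p_value = sum(1 for a, b in rolls if a + b >= 12) / len(rolls)
print(p_value)  # 1/36, about 0.028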

Sample size dependence

Suppose a researcher flips a coin some arbitrary number of times (n) and assumes a null hypothesis that the coin is fair. The test statistic is the total number of heads and the test is two-tailed. Suppose the researcher observes heads for each flip, yielding a test statistic of n and a p-value of 2/2^n. If the coin was flipped only 5 times, the p-value would be 2/32 = 0.0625, which is not significant at the 0.05 level. But if the coin was flipped 10 times, the p-value would be 2/1024 ≈ 0.002, which is significant at the 0.05 level.

In both cases the data suggest that the null hypothesis is false (that is, the coin is not fair somehow), but changing the sample size changes the p-value and significance level. In the first case, the sample size is not large enough to allow the null hypothesis to be rejected at the 0.05 level (in fact, the p-value can never be below 0.05 for the coin example).

This demonstrates that in interpreting p-values, one must also know the sample size, which complicates the analysis.
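
The coin-flip numbers above can be reproduced directly:

# Two-tailed p-value for observing all heads in n flips of a fair coin.
for n in (5, 10):
    p_value = 2 / 2**n
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{n} flips, all heads: p = {p_value:.4f} ({verdict} at the 0.05 level)")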

History

While the modern use of p-values was popularized by Fisher in the 1920s, computations of p-values date back to the 1770s, when they were calculated by Pierre-Simon Laplace:[6]

In the 1770s Laplace considered the statistics of almost half a million births. The statistics showed an excess of boys compared to girls. He concluded by calculation of a p-value that the excess was a real, but unexplained, effect.

The p-value was first formally introduced by Karl Pearson, in his Pearson's chi-squared test,[7] using the chi-squared distribution and notated as capital P.[7] The p-values for the chi-squared distribution (for various values of χ² and degrees of freedom), now notated as P, were calculated in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII). The use of the p-value in statistics was popularized by Ronald Fisher,[8] and it plays a central role in Fisher's approach to statistics.[9]

In his influential book Statistical Methods for Research Workers (1925), Fisher proposes the level p = 0.05, or a 1 in 20 chance of being exceeded by chance, as a limit for statistical significance, and applies this to a normal distribution (as a two-tailed test), thus yielding the rule of two standard deviations (on a normal distribution) for statistical significance (see 68–95–99.7 rule).[10][d][11]

He then computes a table of values, similar to Elderton but, importantly, reverses the roles of χ² and p. That is, rather than computing p for different values of χ² (and degrees of freedom n), he computes values of χ² that yield specified p-values, specifically 0.99, 0.98, 0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.02, and 0.01.[12] That allowed computed values of χ² to be compared against cutoffs and encouraged the use of p-values (especially 0.05, 0.02, and 0.01) as cutoffs, instead of computing and reporting p-values themselves. The same type of tables was then compiled in (Fisher & Yates 1938), which cemented the approach.[11]

As an illustration of the application of p-values to the design and interpretation of experiments, in his following book The Design of Experiments (1935), Fisher presented the lady tasting tea experiment,[13] which is the archetypal example of the p-value.

To evaluate a lady's claim that she (Muriel Bristol) could distinguish by taste how tea is prepared (first adding the milk to the cup, then the tea, or first tea, then milk), she was sequentially presented with 8 cups: 4 prepared one way, 4 prepared the other, and asked to determine the preparation of each cup (knowing that there were 4 of each). In that case, the null hypothesis was that she had no special ability, the test was Fisher's exact test, and the p-value was 1/(8 choose 4) = 1/70 ≈ 0.014, so Fisher was willing to reject the null hypothesis (consider the outcome highly unlikely to be due to chance) if all were classified correctly. (In the actual experiment, Bristol correctly classified all 8 cups.)
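
The lady-tasting-tea p-value can be reproduced by counting arrangements or with a Fisher's exact test on the perfect-classification 2x2 table, for example:

# Lady-tasting-tea p-value, computed two ways.
from math import comb
from scipy import stats

print(1 / comb(8, 4))  # counting arrangements: 1/70, about 0.014

table = [[4, 0], [0, 4]]  # all 8 cups classified correctly
odds_ratio, p = stats.fisher_exact(table, alternative="greater")
print(p)  # about 0.014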

Fisher reiterated the p = 0.05 threshold and explained its rationale, stating:[14]

It is usual and convenient for experimenters to take 5 per cent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard, and, by this means, to eliminate from further discussion the greater part of the fluctuations which chance causes have introduced into their experimental results.

He also applies this threshold to the design of experiments, noting that had only 6 cups been presented (3 of each), a perfect classification would have only yielded a p-value of 1/(6 choose 3) = 1/20 = 0.05, which would not have met this level of significance.[14] Fisher also underlined the frequentist interpretation of p, as the long-run proportion of values at least as extreme as the data, assuming the null hypothesis is true.

In later editions, Fisher explicitly contrasted the use of the p-value for statistical inference in science with the Neyman–Pearson method, which he terms "Acceptance Procedures".[15] Fisher emphasizes that while fixed levels such as 5%, 2%, and 1% are convenient, the exact p-value can be used, and the strength of evidence can and will be revised with further experimentation. In contrast, decision procedures require a clear-cut decision, yielding an irreversible action, and the procedure is based on costs of error, which, he argues, are inapplicable to scientific research.

Misunderstandings

Despite the ubiquity of p-value tests, this particular test for statistical significance has been criticized for its inherent shortcomings and the potential for misinterpretation.

The data obtained by comparing the p-value to a significance level will yield one of two results: either the null hypothesis is rejected, or the null hypothesis cannot be rejected at that significance level (which however does not imply that the null hypothesis is true). In Fisher's formulation, there is a disjunction: a low p-value means either that the null hypothesis is true and a highly improbable event has occurred or that the null hypothesis is false.

However, people interpret the p-value in many incorrect ways and try to draw other conclusions from p-values, which do not follow.

The p-value does not in itself allow reasoning about the probabilities of hypotheses, which requires multiple hypotheses or a range of hypotheses, with a prior distribution of likelihoods between them, as in Bayesian statistics. There, one uses a likelihood function for all possible values of the prior instead of the p-value for a single null hypothesis.

The p-value refers only to a single hypothesis, called the null hypothesis, and does not make reference to or allow conclusions about any other hypotheses, such as the alternative hypothesis in Neyman–Pearson statistical hypothesis testing. In that approach, one instead has a decision function between two alternatives, often based on a test statistic, and computes the rate of Type I and Type II errors as α and β. However, the p-value of a test statistic cannot be directly compared to these error rates α and β. Instead, it is fed into a decision function.

There are several common misunderstandings about p-values.[16][17]

The p-value is not the probability that the null hypothesis is true or the probability that the alternative hypothesis is false. It is not connected to either. In fact, frequentist statistics does not and cannot attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity (if there is no alternative hypothesis with a large enough a priori probability that would explain the results more easily); this is Lindley's paradox. There are also a priori probability distributions in which the posterior probability and the p-value have similar or equal values.[18]
The p-value is not the probability that a finding is "merely a fluke." Calculating the p-value is based on the assumption that every finding is a fluke, the product of chance alone. Thus, the probability that the result is due to chance is in fact unity. The phrase "the results are due to chance" is used to mean that the null hypothesis is probably correct. However, that is merely a restatement of the inverse probability fallacy since the p-value cannot be used to figure out the probability of a hypothesis being true.
The p-value is not the probability of falsely rejecting the null hypothesis. That error is a version of the so-called prosecutor's fallacy.
The p-value is not the probability that replicating the experiment would yield the same conclusion. Quantifying the replicability of an experiment was attempted through the concept of p-rep.
The significance level, such as 0.05, is not determined by the p-value. Rather, the significance level is decided by the person conducting the experiment (with the value 0.05 widely used by the scientific community) before the data are viewed, and it is compared against the calculated p-value after the test has been performed. (However, reporting a p-value is more useful than simply saying that the results were or were not significant at a given level and allows readers to decide for themselves whether to consider the results significant.)
The p-value does not indicate the size or importance of the observed effect. The two vary together, however, and the larger the effect, the smaller the sample size that will be required to get a significant p-value (see effect size).

Criticisms

Critics of p-values point out that the criterion used to decide "statistical significance" is based on an arbitrary choice of level (often set at 0.05).[19] If significance testing is applied to hypotheses that are known to be false in advance, a non-significant result will simply reflect an insufficient sample size; a p-value depends only on the information obtained from a given experiment.

The p-value is incompatible with the likelihood principle and depends on the experimental design, i.e., on the test statistic in question. That is, the definition of "more extreme" data depends on the sampling methodology adopted by the investigator;[20] for example, the situation in which the investigator flips the coin 100 times, yielding 50 heads, has a set of extreme data that is different from the situation in which the investigator continues to flip the coin until 50 heads are achieved, yielding 100 flips.[21] That is to be expected, as the experiments are different experiments, and the sample spaces and the probability distributions for the outcomes are different, even though the observed data (50 heads out of 100 flips) are the same for the two experiments.
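
The dependence on the sampling plan is easier to see numerically with a standard textbook variant (9 heads in 12 flips) rather than the 50-in-100 case above: the same observed flips give different p-values under the two designs. A quick sketch:

# Same observed data (9 heads, 3 tails, 12 flips), two sampling plans,
# two different one-sided p-values against a fair coin.
from scipy import stats

# Plan A: flip exactly 12 times.
p_fixed_n = stats.binom.sf(8, 12, 0.5)    # P(9 or more heads in 12 flips)
# Plan B: flip until the 3rd tail appears (it happened to take 12 flips).
p_stop_rule = stats.nbinom.sf(8, 3, 0.5)  # P(9 or more heads before the 3rd tail)

print(f"fixed-n design:       p = {p_fixed_n:.3f}")    # about 0.073
print(f"flip-until-3-tails:   p = {p_stop_rule:.3f}")  # about 0.033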

Fisher proposed p as an informal measure of evidence against the null hypothesis. He called on researchers to combine p in the mind with other types of evidence for and against that hypothesis such as the a priori plausibility of the hypothesis and the relative strengths of results from previous studies.[22]

In very rare cases, the use of p-values has been banned by certain journals.[23]
 
Just out of curiosity Frank, do you even Research? Or just take what you think you see on wikipedia for granted :rolleyes:
 
Like I said before, it's just more statistical comedy.

Frank, you can't (legitimately) use p values in the manner you have. It's no more complex than that.

Doing so demonstrates that you either don't understand this, or if you do, that you are deliberately attempting to mislead others.
 
Re:

Alex Simmons/RST said:
Like I said before, it's just more statistical comedy.

Frank, you can't (legitimately) use p values in the manner you have. It's no more complex than that.

Doing so demonstrates that you either don't understand this, or if you do, that you are deliberately attempting to mislead others.
Well, according to the article I linked, it is preferable to tell people what the p-value is and let them draw their own conclusions as to the worth of the data than to draw some arbitrary boundary that is either crossed or not to define significance. You are the folks who are misusing this data, trying to imply these studies mean more than they do.