Oh wow, anything can get published these days...
My training isn't in sports science but, based on what I've read in the literature, this study seems seriously flawed.
1/ The protocol, twice weekly and 8x4min intervals @80% peak-power, is a very low work-rate. Intervals this short are typically done at 90+% of peak power, and often 3x/week. LT intervals are performed at about 80% of peak-power, but are typically done as 2x20min, or maybe 3x20min, so 8x4min would probably be poor at improving FTP too.
If these are well-trained cyclists, as the authors claim, this low work-rate would be expected to lead to undertraining. That there were any performance gains at all implies one of three things: the protocol -- including the recovery rides -- wasn't the participants' only training during the four-week period, they were overtrained/fatigued going into the study, or they weren't actually well-trained cyclists. (Whichever it is, the results are then of little value.)
2/ Since HR lags power by a considerable amount (a commonly used, but approximate and crude, figure is 30 seconds), any attempt to do a short interval at a fixed HR, without using specific techniques to account for this lag, would be expected to result in more work being performed than in the PM interval protocol, in the first few intervals anyway. This is because the "area under the curve" on the power-vs-time graph (i.e. the work done) will be greater for HR-regulated intervals if the participants try to hold the desired HR for as much of the interval as possible: they have to ride above the target power early on to drag their HR up. (This is a maths thing. If you don't believe me, go look up "integration", and then think about what the HR-vs-time and power-vs-time plots will look like. There are non-linear effects too; they're significant, but I'm not going to cover them here.) Conversely, if the participants only try to reach the desired HR by the end of the interval, this could lead to a lower workload than expected. And any attempt to regulate power based on HR during the interval will be extremely difficult due to the HR latency, and will require considerable experience. So it was no surprise to see that the std. dev. of the average work-rate was far higher for the HR group than for the PM group.
This is why many coaches consider HR either unimportant or worthless for short intervals. Pacing by HR for these intervals effectively means very little pacing at all, i.e. the participants were mostly training by PE for the early part of each interval, and hence were less likely to undertrain. This shows up in the larger std. dev. on the average power of their intervals: the HR group actually managed to accumulate some intensity during the first 1.5 mins of each interval, while their HR was ramping up, and then tailed off for the rest of the interval.
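To make the lag argument in point 2 concrete, here's a rough Python sketch. It is a toy model, not anything from the paper: I'm assuming HR follows power as a simple first-order lag with a 30 second time constant, a linear HR-vs-power relationship, and a rider who pushes hard until HR hits the target and then backs off. Every constant and name in it (`TAU_S`, `P_HARD`, `simulate`, etc.) is made up for illustration, and it ignores the non-linear effects mentioned above.

```python
# Toy model only: every constant below is an assumption chosen for
# illustration, not a value taken from the study.
TAU_S = 30.0          # assumed HR time constant (the rough "30 second" lag)
HR_START = 100.0      # assumed HR at the start of the interval, bpm
BPM_PER_WATT = 0.25   # assumed steady-state HR rise per watt
P_TARGET = 240.0      # assumed target power for the PM group, watts
P_HARD = 320.0        # assumed power a rider holds while chasing the HR target
DURATION_S = 240      # one 4-minute interval
DT = 1.0              # simulation time step, seconds


def steady_state_hr(power_w):
    """HR the rider would settle at if power were held constant (linear toy model)."""
    return HR_START + BPM_PER_WATT * power_w


def simulate(pacing):
    """Return (work_kj, final_hr) for one interval under 'power' or 'hr' pacing."""
    hr = HR_START
    hr_target = steady_state_hr(P_TARGET)
    work_j = 0.0
    for _ in range(int(DURATION_S / DT)):
        if pacing == "power":
            # PM rider: hold the target power regardless of HR.
            power = P_TARGET
        else:
            # HR rider: push hard while HR is below target, then back off.
            power = P_HARD if hr < hr_target else P_TARGET
        work_j += power * DT  # work is the area under the power-vs-time curve
        # First-order lag: HR drifts towards its steady-state value for this power.
        hr += (steady_state_hr(power) - hr) * DT / TAU_S
    return work_j / 1000.0, hr


power_work, _ = simulate("power")
hr_work, _ = simulate("hr")
print(f"PM-paced work: {power_work:.1f} kJ")
print(f"HR-paced work: {hr_work:.1f} kJ")
```

In this toy setup the HR-paced rider does more work, and all of the extra comes from the opening stretch where HR is still catching up. How much extra depends entirely on the assumed `P_HARD`, which is exactly the part that will vary from rider to rider.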
3/ Workload between intervals was not controlled or measured. Between their measly 80% "warmup" intervals, did a bored participant throw in some real efforts, for example? Did they stop pedaling completely? This is poor experimental design, since many HIT protocols set target workloads for both the "on" and "off" periods to control for this, e.g. 8x [4mins@90% + 1.5mins@50%].
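A quick sketch of why the "off" periods matter, using a plain time-weighted mean of %peak-power over the session. The `Protocol` class and its field names are mine, purely for illustration, and the 1.5 minute recovery used for the study-style session is an assumption (it's borrowed from the example protocol in point 3, not from the paper).

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    reps: int
    on_min: float
    on_frac: float    # "on" intensity as a fraction of peak power
    off_min: float
    off_frac: float   # "off" intensity as a fraction of peak power

    def total_minutes(self):
        """Total session length, counting both on and off periods."""
        return self.reps * (self.on_min + self.off_min)

    def mean_intensity(self):
        """Time-weighted mean intensity over the whole session."""
        weighted = self.on_min * self.on_frac + self.off_min * self.off_frac
        return weighted / (self.on_min + self.off_min)


# The controlled example from point 3: 8x [4mins@90% + 1.5mins@50%].
controlled = Protocol(reps=8, on_min=4, on_frac=0.90, off_min=1.5, off_frac=0.50)
print(controlled.total_minutes(), round(controlled.mean_intensity(), 3))

# 8x4min@80% with an ASSUMED 1.5 min recovery and an uncontrolled "off" intensity.
for off_frac in (0.0, 0.5, 0.8):
    session = Protocol(reps=8, on_min=4, on_frac=0.80, off_min=1.5, off_frac=off_frac)
    print(off_frac, round(session.mean_intensity(), 3))
```

The same nominal 8x4min@80% session can land anywhere over a wide range of mean intensity depending on what the rider does between efforts, which is the part that wasn't measured.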
4/ If a cyclist can complete two sessions of 8x4min@80% intervals in the 1st week, which is extremely likely with such a light workload, then keeping the workload fixed would lead to a plateau. Workload needs to increase to force the organism to continue to adapt. And because the PM users had a tool that allowed them to regulate their effort accurately, and so actually execute the training plan, they were more likely to achieve the plan's real outcome: a plateau (or, more likely, undertraining if they were actually well-trained cyclists).
5/ The statistical techniques used by the authors of the paper were obsolete, at best. This is more a problem with the low level of statistics knowledge and understanding in the social sciences, I suspect. (My background in frequentist statistics is weak, so I don't know how to fix it, but the statistical analyses used in this paper are laughable these days.)
My take:
All this study showed was that a PM allows the user to execute their desired training plan more effectively. Since the training plan would be expected to lead to undertraining, the outcome was still unexpected, as power increased slightly. But there seem to be so many design flaws that I'm not going to lose any sleep over this.
The HR group, though, was completely hopeless at pacing, and therefore at executing the desired workload, and demonstrated what coaches already know about HRMs: they're rubbish for short intervals.