You mentioned the other day that you analyzed a correlation between peloton size and incidence of crashes (number of crashed riders and impact of crashes)? I only briefly looked at the Tour de France to get a rough idea (Google AI search). Since I missed that, could you share or link to it? That would be helpful.
Yes, I have spent more than two years collecting and analysing data on this. I have not forgotten your important questions, and I do intend to answer them properly. But because this topic matters a lot to me, I want that answer to be clear, rigorous, and comprehensive.
I cannot provide that yet, because even the most basic questions involve at least 20 relevant factors that need to be considered and adjusted for, and the arguments for and against must be laid out. Cycling race data is extremely noisy and very time-consuming to work through. On top of that, crashes that bring down multiple riders are actually quite rare relative to the number of kilometres ridden by any given bunch.
To give just the simplest example, when I review Grand Tour stages in replay, I manually pause the footage and record things like the number of riders in each bunch when a crash happens (I literally count up to 160), how many riders go down, how many abandon, how many never make it back to the same group, where in the bunch the crash occurs, the approximate speed, when in the race it happens, the weather conditions, the type of finale, the severity of injuries based on medical reports, and many other variables. I originally tried to do this for all WorldTour races, but quickly realised that was not realistic.
As we all know, cycling races are long, so this takes a great deal of time.
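As an illustration only, one manually logged crash could be stored as a record like the one sketched below. The class name, field names, and category labels are hypothetical and are not the actual format used for this dataset; the only things taken from the description above are the kinds of variables recorded and the "five or more riders" threshold for a mass crash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashRecord:
    """One crash observed in replay (hypothetical schema, for illustration)."""
    race: str
    stage: int
    bunch_size: int                    # riders counted in the group when the crash happens
    riders_down: int                   # riders who hit the ground
    abandons: int                      # riders who leave the race because of the crash
    riders_not_rejoining: int          # riders who never make it back to the same group
    position_in_bunch: str             # assumed labels: "front", "middle", "back"
    approx_speed_kmh: Optional[float]  # None when it cannot be estimated from footage
    km_to_finish: Optional[float]      # when in the race the crash happens
    weather: str
    finale_type: str
    injury_severity: Optional[str]     # taken from medical reports when available

    @property
    def share_of_bunch_down(self) -> float:
        """Fraction of the group that crashed."""
        return self.riders_down / self.bunch_size

    @property
    def is_mass_crash(self) -> bool:
        """Mass crash taken here as five or more riders down."""
        return self.riders_down >= 5


# Made-up example values, not a real observation.
record = CrashRecord(
    race="Example Tour", stage=1, bunch_size=160, riders_down=8,
    abandons=1, riders_not_rejoining=3, position_in_bunch="middle",
    approx_speed_kmh=None, km_to_finish=12.0, weather="dry",
    finale_type="bunch sprint", injury_severity=None,
)
print(record.share_of_bunch_down, record.is_mass_crash)
```

Keeping the bunch size next to the number of riders down makes it possible to look at both the absolute and the relative impact of each crash later.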
I have made it a deliberate goal to present the case in two separate ways: one report built around the arguments, and one built around the data. I think that is the cleanest approach.
What I can say for now is that the raw correlation appears fairly weak (because crashing is not that common per 20 km ridden in a bunch), but it can look much stronger depending on how the data is broken down. As I often point out, when large breakaways, even groups of 20 to 40 riders, are riding for the win, they almost never crash, and they essentially never have mass crashes involving five or more riders. If I compare those stages with big bunch stages, the effect is very significant. To me, the natural experiments created by different stage types are a more informative indicator. But that, too, can certainly be debated.
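To make the idea of "breaking the data down" concrete, here is a minimal sketch of one possible breakdown: crashes per 100 km of group riding, computed separately for each group type. This is not the method behind the analysis described here. The function name, the group labels, and every number in the example are placeholders invented for illustration, not real race data.

```python
from collections import defaultdict


def crash_rate_per_100km(observations):
    """Return crashes per 100 km ridden for each group type.

    observations: iterable of (group_type, km_ridden, crash_count) tuples.
    """
    km = defaultdict(float)
    crashes = defaultdict(int)
    for group_type, km_ridden, crash_count in observations:
        km[group_type] += km_ridden
        crashes[group_type] += crash_count
    # Skip group types with no distance logged to avoid dividing by zero.
    return {g: 100.0 * crashes[g] / km[g] for g in km if km[g] > 0}


# Placeholder numbers only, NOT real observations.
example = [
    ("full_peloton", 150.0, 2),
    ("full_peloton", 180.0, 1),
    ("breakaway_20_40", 120.0, 0),
    ("breakaway_20_40", 90.0, 0),
]

rates = crash_rate_per_100km(example)
for group_type, rate in rates.items():
    print(f"{group_type}: {rate:.2f} crashes per 100 km")
```

Normalising by distance ridden matters because a full peloton and a breakaway cover very different numbers of kilometres, so raw crash counts alone would not be comparable.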
Last word on this for now: some of you mainly seem to be talking about crashes in sprints. Those few kilometres, depending on how you break it down, are BY FAR the most dangerous in terms of frequency and, to some degree, also severity. Bunch sprinting in general needs to be looked at with a different set of variables in mind, in addition to peloton size. I completely agree with that.