Motocross analysis and insight

Why?

The main part of the "Why?" is about providing context so that you can compare two very different situations -- different riders, different years, different classes, different race locations.
Race %-ile: A rider finishes 4th. That sounds good, although it just misses the podium. But without knowing how many riders were in the race, we can't really evaluate how good that 4th place is. If it's 4th out of 5, then it doesn't sound so good anymore. 4th out of 10 -- now that looks decent again. 4th out of 25, on the other hand, would be a really good result. Without %-ile, the rank in the race can be misleading.
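
To put numbers on the 4th-place example, here is a minimal sketch in Python. It assumes that %-ile means "the share of the other riders in the race that this rider beat"; that formula and the function name race_percentile are illustrative assumptions, not necessarily the exact calculation used for the site's numbers.

```python
def race_percentile(rank: int, field_size: int) -> float:
    """Percent of the OTHER riders in the race that this rider beat.

    Assumed formula (for illustration only): (field_size - rank) / (field_size - 1).
    """
    if field_size < 2:
        raise ValueError("need at least 2 riders to rank against")
    if not 1 <= rank <= field_size:
        raise ValueError("rank must be between 1 and field_size")
    return 100.0 * (field_size - rank) / (field_size - 1)


# The same 4th place, in three different field sizes.
for field_size in (5, 10, 25):
    print(f"4th of {field_size}: {race_percentile(4, field_size):.1f} %-ile")
```

Under that assumption, the same 4th place lands near the bottom of a 5-rider field and near the top of a 25-rider field, which is the point of using %-ile instead of raw rank.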

Modified Advancement Points (mAP): As with the rank in race, traditional Advancement Points may do a good job at their purpose, but they are not suited for evaluating the actual performance of the population as a whole. Since our Modified Advancement Points number gives credit for the number of riders in the race, from 2 up to as many as possible, and it factors in the quality of competition, it is a better metric for evaluating performance and comparing across different races.

Overall Finish %: Attempts to put a racer's full-year performance into context with all of the other racers (Pro & Amateur) as a percentage. One thing this enables is an Aging Curve. Our curve tracks the average yearly performance for riders aged 4 to 40+; it is always something of a work in progress, mainly because of the difficulty of avoiding Survivor Bias. It allows for tracking the performance of riders (if we have their age) compared to others at their age level, past and present. With this comparison, you can identify outliers who are performing much better than average, and you can estimate the improvement (or decline) from year to year based on the age of the rider.

Time vs Heat: Finishing 1st is better than finishing 2nd, obviously, but what the 1st/2nd ranking doesn't tell us is HOW MUCH better 1st was than 2nd. If 1st wins by a tenth of a second, then 1st and 2nd were nearly equal; if 1st wins by several seconds, then that is a more impressive win. Comparing the racers' times to the heat/race average allows us to describe the performance in more detail than just the rank.
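
As a rough sketch of the idea, the Python below subtracts the heat average from each rider's time. The function name time_vs_heat, the rider labels, and the times are all made up for illustration, and the sketch assumes a simple difference in seconds (negative means faster than the heat average); the site's actual calculation may differ.

```python
from statistics import mean


def time_vs_heat(times: dict[str, float]) -> dict[str, float]:
    """Each rider's total time minus the heat average, in seconds.

    Negative = faster than the average rider in the heat.
    """
    heat_avg = mean(list(times.values()))
    return {rider: t - heat_avg for rider, t in times.items()}


# Hypothetical heats: same finishing order, very different margins.
close_heat = {"Rider A": 100.0, "Rider B": 100.1, "Rider C": 104.0}
runaway_heat = {"Rider A": 100.0, "Rider B": 106.0, "Rider C": 108.0}

for name, heat in (("close", close_heat), ("runaway", runaway_heat)):
    for rider, diff in time_vs_heat(heat).items():
        print(f"{name:8s} {rider}: {diff:+.2f} s vs heat average")
```

Both heats have the same 1st/2nd/3rd order, but the differences from the heat average show how tight or lopsided each one was.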

Z-Score: Winning a short race by 3 seconds might be more impressive than winning a longer race by 5 seconds. Converting to Z-Score/Standard Deviation allows us to use the same terms to compare performance in one race to performance in other races, regardless of the length or number of laps.
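
Here is a small Python sketch of that conversion, using the standard formula z = (time - race average) / standard deviation. The function name z_scores and the race times are invented for illustration, and the choice of the population standard deviation (statistics.pstdev) is an assumption; the site's numbers may be computed differently.

```python
from statistics import mean, pstdev


def z_scores(times: list[float]) -> list[float]:
    """Standardize finishing times within one race.

    Negative = faster than the race average, measured in standard deviations.
    Uses the population standard deviation (an assumption for this sketch).
    """
    avg = mean(times)
    sd = pstdev(times)
    if sd == 0:
        # Every rider posted the same time, so nobody stands out.
        return [0.0 for _ in times]
    return [(t - avg) / sd for t in times]


# Hypothetical races: the short one is won by 3 s, the long one by 5 s.
short_race = [60.0, 63.0, 64.0, 65.0]
long_race = [300.0, 305.0, 320.0, 335.0]

print(f"short race winner: z = {z_scores(short_race)[0]:.2f}")
print(f"long race winner:  z = {z_scores(long_race)[0]:.2f}")
```

With these made-up times, the short-race winner ends up further below the race average in standard-deviation terms, even though the winning margin in seconds is smaller.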

"True" Time vs Heat: Though there are issues with this method since it pretends that DNFs don't happen, it gives us additional information about what the riders were doing before they exited the race. If you believe that DNFs are largely out of control of the rider (for instance: crashes due to other riders, equipment failure), then the "True" time corrects for this (or at least provides an alternative view than the official result).

Some of the data has issues, be it a typo/error in the original source, a miscoding by me, a misspelled name, or perhaps a female rider who was accidentally not excluded. Please let me know about any problems you see with the data, or any questions about what/who/why (or even when/where/how). Feedback welcomed.

Next: Who?