Motocross analysis and insight

What?

MotoXGraphs is an attempt to better visualize and quantify motocross racing. From amateurs to pros, we focus on data to challenge conventional wisdom about high performance in motocross. Data WON'T always have all the answers -- it's just a tool, and the more tools you have, the better -- but we will strive to make motocross followers better equipped to find the truth in the race results.

1. What You See Is All You Know -- data is only part of the motocross story. The numbers can't see what the track conditions were like, can't see if someone was riding injured, can't see if a rider didn't finish because of his own mistake or because of equipment failure. But at the same time, data can see things that the naked eye can't -- and that's the purpose of MotoXGraphs: to take what we can't easily see and make it clear.

2. What's in the data? Thanks to AMA, eScore, RacerX Vault, and other sources, much of the underlying data is available for anyone to see online. Pro results go back into the '70s, as do the Loretta Lynn's Finals results. Amateur race results only go back to 2006, so we are dealing with a limited number of years there. The lap time data from eScore starts around 2008; it does not exist for every race and is less comprehensive the farther back you go. Right now, data is limited to Men/Boys and Motocross (women/girls results and Supercross analysis coming soon). It is also limited to A, B, and Open classes (more in "Who?").

3. What's behind the data categories you may not have heard of before?
Race %-ile: The rider's race result in terms of percentile. Lower is better. For instance, finishing 4th out of 10 would be 40th %-ile while 4th out of 25 would be 16th %-ile.
Modified Advancement Points (mAP): Describes how well the racer fared compared to the average rider in the race. As with conventional Advancement Points, finishing at the top of the race earns more points, as does competing in a race with more riders. mAP, though, treats 0 as the average result, so above-average results are positive and below-average results are negative. mAP also adjusts for the quality of competition in the race. For instance, finishing 4th out of 25 in a normal race is good, but doing that in a Regional Qualifier for Loretta Lynn's is better, and doing it at the Loretta Lynn's Finals is better still.
Overall Finish %: Attempts to put a racer's full-year performance into context with all of the other racers (Pro & Amateur) on a single percentage scale. The scale is anchored to the Pro "Upper" class (i.e. 450cc), where the average racer is at an "Overall Finish %" of 50% for the year. If a rider in the Amateur 125A class finishes near the top of the class for the year, he might have an Overall Finish % of 90%, which would mean (very hypothetically) that he would have finished in the 90th percentile had he ridden in the Pro Upper class for the year. Referring to these as "%" breaks down a bit mathematically, but please ignore that and focus on the idea of putting the yearly results of riders from all classes into a similar context so they can be compared. (Or don't ignore it -- we're happy to revise our methods if there's something better.)
Time vs Heat: The number of seconds it took the rider to finish the race compared to the average time in the heat. 5 seconds faster than average is -5, while 5 seconds slower than average is +5.
Z-Score: Sounds fancy but is a simple way to compare "Time vs Heat" across races of varying lengths, by measuring it in standard deviations from the race/heat average instead of in seconds. Positive is better (note that the sign is flipped relative to "Time vs Heat"): a Z-Score of 1 would mean the rider finished 1 standard deviation faster than average in the race/heat, while a Z-Score of -1 would mean 1 standard deviation slower.
"True" Time vs Heat: Looks at only the laps finished by each rider. For instance, if a rider was in 3rd after several laps but then crashed, this stat does not treat the result as a DNF; it evaluates the performance as if the rider had finished the race at the pace he had ridden so far. The "True" time reports that the rider finished 3rd rather than last (i.e. DNF).

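For anyone who wants to check the arithmetic, here is a rough sketch of a few of the categories above in Python. It is an illustration of the definitions, not the actual code behind MotoXGraphs. The function names are made up for this example, the Z-Score assumes a population standard deviation over all the finishing times in the heat (the definition doesn't specify which kind), and the "True" time projection assumes "the pace he had ridden so far" means the rider's average lap time.

```python
from statistics import mean, pstdev


def race_percentile(finish_position, num_riders):
    """Race %-ile: finishing position as a percentage of the field (lower is better)."""
    return 100.0 * finish_position / num_riders


def time_vs_heat(rider_seconds, heat_seconds):
    """Seconds relative to the heat average: negative is faster, positive is slower."""
    return rider_seconds - mean(heat_seconds)


def z_score(rider_seconds, heat_seconds):
    """Standard deviations faster than the heat average, so positive is better.

    Assumption: population standard deviation (pstdev) over every time in the heat.
    """
    spread = pstdev(heat_seconds)
    if spread == 0:
        return 0.0  # every rider posted the same time, so nobody is above or below average
    return (mean(heat_seconds) - rider_seconds) / spread


def projected_time(completed_lap_seconds, total_laps):
    """Hypothetical "True" time: extend the rider's average completed-lap pace to the full race."""
    return mean(completed_lap_seconds) * total_laps
```

For example, race_percentile(4, 25) gives 16.0, matching the 4th-out-of-25 example above, and a rider who finishes in 1815 seconds in a heat that averages 1820 seconds has a Time vs Heat of -5.
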
Some of the data has issues, be it a typo/error in the original source, a miscoding by me, or a misspelled name (or perhaps a women's/girls' result that was accidentally not excluded). Please let me know about any problems you see with the data or any questions about what/who/why (or even when/where/how). Feedback is welcome.

Next: Why?