Every time a cyclist trains with a power meter, their effort is recorded — every pedal stroke, every interval, every hour in the saddle. Most of that data goes nowhere useful. Not because it lacks information, but because the tools we use to analyse training weren’t built to see it.
This is the starting problem for LightBox.
**A dataset that hasn’t been looked at properly**
The GoldenCheetah Open Data corpus contains more than 4,500 athlete-years of complete cycling power files, donated to open science by athletes and coaches who wanted their data to matter. It is one of the largest open datasets in sports science. The rides are complete — not summarised, not aggregated — raw power at every second.
The standard approach to a dataset like this is to compute training load metrics: formulas that collapse each ride to a single number representing how hard the athlete worked. Or to extract maximal power profiles: the highest power a rider sustained for five seconds, for a minute, for twenty minutes. These are useful. They are also a narrow window. A training load score tells you almost nothing about how the effort was structured. A maximal power profile tells you what a rider's ceiling is, not how they got there.
**The gap this program occupies**
Cycling science has approached performance from two directions that don’t quite meet. Traditional sports physiology produces interpretable results — but it works on small, often elite samples and was built around the laboratory, not the power file. The data-driven turn in sports science produces powerful pattern recognition — but the outputs are often physiologically opaque: a prediction without a mechanism, a cluster without a name.
LightBox sits in the unoccupied space between them. Every tool in the program produces outputs in physiological units: the kind a coach can act on, the kind an athlete can understand. The commitment to interpretability is not an aesthetic preference. It's a research constraint: a result that can't be explained can't be applied, and if it can't be applied, it's hard to know whether it's right.
**A cascade, not a collection of studies**
The program is structured as a research cascade. Each study introduces one analytical tool, validates it on the corpus, and hands it forward as infrastructure for the next step.
The first study asks how maximal efforts are paced across the power-duration domain — not for one athlete, but at population scale. The second introduces a model that segments the full structure of a ride into physiologically labelled phases, and derives a fitness tracker from how that structure drifts over time. The third addresses durability — how performance degrades under accumulated fatigue — stratified by athlete type across the full corpus.
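Durability analyses of this kind are commonly operationalised as a before/after comparison: the best power a rider can produce fresh versus after some amount of accumulated work. The sketch below is one plausible shape for that comparison, not the program's actual protocol; the work threshold, window length, and simulated ride are all illustrative, and it again assumes 1 Hz sampling:

```python
def best_avg(power, window):
    """Best average power over any contiguous window (1 Hz samples)."""
    if window > len(power):
        return None
    prefix = [0.0]
    for p in power:
        prefix.append(prefix[-1] + p)
    return max(prefix[i + window] - prefix[i]
               for i in range(len(power) - window + 1)) / window

def durability(power, work_kj, window=60):
    """Split the ride where cumulative work first exceeds `work_kj`
    kilojoules, then compare the best `window`-second power on either
    side. Returns (fresh_watts, fatigued_watts), or None if the ride
    never reaches the work threshold."""
    cum = 0.0
    split = None
    for i, p in enumerate(power):
        cum += p / 1000.0  # each 1 s sample at p watts adds p joules
        if cum > work_kj:
            split = i
            break
    if split is None:
        return None
    return best_avg(power[:split], window), best_avg(power[split:], window)

# Simulated ride: 10 min at 250 W (150 kJ of work), then 5 min at 230 W.
power = [250.0] * 600 + [230.0] * 300
print(durability(power, 150))  # → (250.0, 230.0)
```

A rider whose fatigued number barely drops is durable; a rider whose number collapses is not, and stratifying that gap by athlete type across the corpus is what the third study describes.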
Each step builds on the last. The cascade converges on the Puchowicz Model of Exercise Segment Analysis: a unified framework for characterising effort at the ride level and across a season, in units that mean something.
**What we’re building toward**
This is a program introduction, not a findings report. The papers are in progress; the tools are being built and validated.
What’s already clear is the scope of what becomes possible when the full power file is treated as signal rather than noise — when the question isn’t just “how hard did this rider work?” but “how did they work, and what does that tell us about how they perform?”
That’s the question LightBox is built to answer.