How TrapStats Predicts Greyhound Winners

Most "AI tipster" sites are black boxes that promise winners and quietly tally only their wins. TrapStats works differently: every model output, every calibration step, every honest limitation is documented so you can decide for yourself whether to act on it. This article walks through what we actually do.

All performance figures here are simulated/paper results; past performance does not guarantee future results. This is research and education, not betting advice.

The model

The core predictor is a LightGBM gradient-boosting classifier trained on roughly 75,000 historical UK and Irish race entries spanning 2018 to today. The model produces a calibrated win probability for each dog in each race. We retrain the model nightly so the latest form, track conditions and trainer trends feed into tomorrow's predictions.

The most useful features fall into four families:

Recent form: average finish position over the last 3 and 5 runs, win rate, weighted form, days since last race and last win.
Pace and bend position: average bend-1 position from the dog's history, finishing-bend split, and bend-1 − finish position deltas that separate closers from front-runners.
Track × trap bias: rolling 180-day win rate of each trap at each track, with sample size as a confidence signal.
Market signal: starting price (when available), market rank within the race, and the gap between the dog's form-implied probability and the market-implied one.

Plus race context: distance, field size, going, grade and trainer 90-day strike rates at both the overall level and the specific track.
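As an illustration of the track × trap bias feature, here is a minimal sketch of a rolling 180-day trap win rate that returns the sample size alongside the rate. The tuple layout, the function name trap_win_rate and the example rows are assumptions made for this sketch, not the production schema or real results.

```python
from datetime import date, timedelta


def trap_win_rate(history, track, trap, as_of, window_days=180):
    """Return (win_rate, sample_size) for one trap at one track.

    history: iterable of (race_date, track, trap, won) tuples.
    This layout is hypothetical. Only races strictly before `as_of`
    and inside the window are counted, so the feature never sees
    the race it is meant to help predict.
    """
    start = as_of - timedelta(days=window_days)
    runs = [
        won
        for race_date, race_track, race_trap, won in history
        if race_track == track
        and race_trap == trap
        and start <= race_date < as_of
    ]
    n = len(runs)
    # No runs in the window: report "unknown" rather than a fake 0% rate.
    rate = sum(runs) / n if n else None
    return rate, n


# Made-up rows for illustration only.
history = [
    (date(2024, 1, 10), "Romford", 1, True),
    (date(2024, 2, 1), "Romford", 1, False),
    (date(2024, 3, 5), "Romford", 1, False),
    (date(2024, 3, 5), "Romford", 2, True),
    (date(2023, 6, 1), "Romford", 1, True),  # older than 180 days, ignored
]

rate, n = trap_win_rate(history, "Romford", 1, as_of=date(2024, 4, 1))
```

Returning the sample size with the rate lets a downstream model treat a rate built on 3 runs differently from one built on 300.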

Honest calibration, not magic

After training, the raw model outputs are passed through a Beta calibrator (a parametric, smooth, monotonic transformation) that aligns the predicted probabilities with the empirical win rates on a held-out slice. We deliberately moved away from isotonic regression after observing that on narrow input ranges (Denis ML's selection arm sits in the 30–50% probability range) isotonic regression collapses to a near-constant output and produces phantom expected value (EV). Beta calibration handles that input range cleanly.
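For readers unfamiliar with Beta calibration, the sketch below shows the standard three-parameter map from the calibration literature, in which the calibrated log-odds are a·ln(s) − b·ln(1 − s) + c for a raw score s. The parameters a, b and c are normally fitted on held-out data; the values used here are made up for illustration, and this is not necessarily how the production calibrator is implemented.

```python
import math


def beta_calibrate(s, a, b, c, eps=1e-12):
    """Map a raw score s in (0, 1) to a calibrated probability.

    Calibrated log-odds = a*ln(s) - b*ln(1 - s) + c.
    With a >= 0 and b >= 0 the map is smooth and non-decreasing in s.
    With a = b = 1 and c = 0 it is the identity.
    """
    s = min(max(s, eps), 1.0 - eps)  # keep the logarithms finite
    z = a * math.log(s) - b * math.log(1.0 - s) + c
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, not fitted values from any real model.
A, B, C = 0.8, 1.2, -0.3

grid = [i / 100 for i in range(30, 51)]  # the 30-50% input range
calibrated = [beta_calibrate(s, A, B, C) for s in grid]
```

Because the map is a smooth curve rather than a step function, distinct inputs in a narrow range still receive distinct outputs, which is the property the paragraph above relies on.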

The output is what we display: a calibrated prob_win for each entry that, on the validation set, lines up close to the empirical win rate at every confidence bucket.
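One way to check that claim is a reliability table: group entries by predicted probability and compare the mean prediction with the observed win rate in each bucket. The sketch below assumes parallel lists of predictions and 0/1 outcomes; the bucket edges, the function name reliability_table and the sample data are illustrative choices, not TrapStats results.

```python
def reliability_table(probs, outcomes,
                      edges=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0)):
    """Return rows of (low, high, count, mean_predicted, win_rate).

    Buckets are half-open [low, high), except the last one, which
    also includes its upper edge. Empty buckets are skipped.
    """
    rows = []
    last_high = edges[-1]
    for low, high in zip(edges[:-1], edges[1:]):
        idx = [
            i for i, p in enumerate(probs)
            if low <= p < high or (high == last_high and p == high)
        ]
        if not idx:
            continue
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        win_rate = sum(outcomes[i] for i in idx) / len(idx)
        rows.append((low, high, len(idx), mean_pred, win_rate))
    return rows


# Made-up predictions and outcomes (1 = won, 0 = lost).
probs = [0.12, 0.18, 0.15, 0.11, 0.35, 0.32, 0.38, 0.31]
outcomes = [0, 0, 1, 0, 1, 0, 0, 1]

rows = reliability_table(probs, outcomes)
```

A well-calibrated model shows mean_predicted close to win_rate in every row; a large gap in any bucket is the signal to look for.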

The hard truth: live performance differs

This is the part most prediction sites hide. On the live selection arm (the picks the system actually records into TrapStats' Denis tracker), the empirical win rate has been roughly flat at ~28% across model confidence buckets from 0.30 to 0.65. In other words, on live races a pick the model rates at 50% wins at almost the same rate as one it rates at 30%. Calibration fixes the level of the probability, but it cannot create discrimination that isn't there in the live selection slice.

This is a known issue with selection-arm bias: the model is most confident on dogs that look like prototypical winners, and the market also prices those dogs the shortest. The price discrepancy we measure is concentrated in picks with a forecast price ≥ 3.5: at a flat 28% win rate, only the longer-priced picks clear breakeven.
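The breakeven arithmetic behind that statement can be sketched as follows. It assumes prices are decimal odds (the return per unit staked, stake included) and a flat one-unit stake; both are assumptions of this sketch, and the function names are illustrative.

```python
def breakeven_price(win_rate):
    """Decimal price at which a flat-stake bet has zero expected value."""
    return 1.0 / win_rate


def ev_per_unit(win_rate, decimal_price):
    """Expected profit per unit staked: win_rate * price - 1."""
    return win_rate * decimal_price - 1.0


LIVE_WIN_RATE = 0.28  # the flat live win rate quoted in the article

threshold = breakeven_price(LIVE_WIN_RATE)
ev_short = ev_per_unit(LIVE_WIN_RATE, 2.5)  # negative: a short-priced pick
ev_long = ev_per_unit(LIVE_WIN_RATE, 4.5)   # positive: a longer-priced pick
```

Under these assumptions the breakeven price is 1 / 0.28, a little above 3.5, so a pick with a flat 28% chance of winning only has positive expected value once its price clears that level.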

We publish this finding because hiding it would be dishonest, and because it informs how TrapStats operates today: the denis_safety_min_price=3.5 floor exists precisely because the data showed that picks with price < 3.5 lost about 94 simulated units across 4 weeks, while picks with price >= 3.5 gained 52.

Layered safety

The selection pipeline runs every candidate through six layers of guards (L0 to L5) before recording a selection:

L0 — Test suite: 120+ pytest cases lock in the integrity contract.
L1 — DB CHECK constraints: invalid status transitions and impossible field values are rejected at the database.
L2 — Runtime invariants: the selection code refuses to write a selection when temporal, composition or form-data invariants don't hold.
L3 — Contract tests: end-to-end paths verify the price floor, EV cap, SP cap and confidence-gap guards.
L4 — Reconcilers: nightly cross-checks that catch any settled selection whose result-pipeline output disagrees with the stored outcome.
L5 — Observability: a structured-log skip counter (denis_safety_skipped) breaks down every rejection so operators see why candidates aren't being selected.
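To make the runtime guards and the skip counter concrete, here is a minimal sketch of a guard chain that records a reason for every rejected candidate. Only the 3.5 price floor comes from the article; the EV cap value, the dictionary keys, the reason strings and the function names are hypothetical.

```python
from collections import Counter

MIN_PRICE = 3.5  # mirrors denis_safety_min_price from the article
MAX_EV = 0.5     # hypothetical EV cap; the real value is not published here


def skip_reason(candidate, min_price=MIN_PRICE, max_ev=MAX_EV):
    """Return a reason string if the candidate must be skipped, else None."""
    if candidate.get("form_runs", 0) == 0:
        return "no_form_data"
    if candidate["price"] < min_price:
        return "price_below_floor"
    if candidate["ev"] > max_ev:
        return "ev_above_cap"
    return None


def run_guards(candidates):
    """Split candidates into selections and a per-reason skip count."""
    selected = []
    skipped = Counter()  # stands in for the denis_safety_skipped breakdown
    for candidate in candidates:
        reason = skip_reason(candidate)
        if reason is None:
            selected.append(candidate)
        else:
            skipped[reason] += 1
    return selected, skipped


# Made-up candidates for illustration.
candidates = [
    {"dog": "A", "price": 2.8, "ev": 0.10, "form_runs": 5},
    {"dog": "B", "price": 4.0, "ev": 0.15, "form_runs": 6},
    {"dog": "C", "price": 5.0, "ev": 0.90, "form_runs": 4},
    {"dog": "D", "price": 4.5, "ev": 0.20, "form_runs": 0},
]

selected, skipped = run_guards(candidates)
```

Counting skips by reason, instead of silently dropping candidates, is what lets an operator see at a glance which guard is doing most of the rejecting.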

Above that, an observe-only shadow recalibrator continuously refits a calibrator on live outcomes and logs prob_win_shadow/ev_shadow next to the production numbers, so any future cutover is grounded in real validation rather than backtest hope.
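The observe-only idea can be sketched like this: the production numbers alone drive the decision, while the shadow numbers are computed and logged next to them. The EV formula (probability × decimal price − 1), the decision rule, the function name and the flat shadow calibrator are assumptions made for this sketch; only the field names prob_win_shadow and ev_shadow come from the article.

```python
def score_with_shadow(prob_win, price, shadow_calibrator):
    """Build a log record with production and shadow estimates.

    The shadow values never influence `selected`; they are recorded
    purely so the two calibrators can be compared later.
    """
    ev = prob_win * price - 1.0  # assumed EV definition, decimal odds
    record = {
        "prob_win": prob_win,
        "ev": ev,
        "selected": ev > 0.0,  # hypothetical production decision rule
    }
    prob_shadow = shadow_calibrator(prob_win)
    record["prob_win_shadow"] = prob_shadow
    record["ev_shadow"] = prob_shadow * price - 1.0
    return record


def flat_shadow(prob_win):
    """Hypothetical shadow calibrator: always returns the flat live rate."""
    return 0.28


record = score_with_shadow(prob_win=0.5, price=3.0,
                           shadow_calibrator=flat_shadow)
```

In this made-up case the production estimate selects the pick while the shadow estimate would not, which is exactly the kind of disagreement the side-by-side log is meant to surface before any cutover.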

What this means in practice

If you use TrapStats predictions, keep three things in mind:

The displayed prob_win is a probability calibrated on held-out validation data, not a guarantee. The empirical live win rate is closer to 28% across the selection zone.
The signal lives in price: longer forecast prices (≥3.5) historically pay off in simulation; shorter favourites do not.
Read the "Why this pick" panel on the /denis page. It shows the decision chain, top SHAP factors, field comparison, and shadow estimate side-by-side. That's the honest reasoning.

Predictions are tools, not promises. We document the tools fully so you can use them with eyes open.