Saturday 12:00 PM can't get here fast enough. Here's how I spent my week passing the time. (PS: it helped that I was trying to use up my API calls before the end of the month.)
The Game: 2025 Season Opener • Date: August 30, 2025, 12:00 PM ET
Location: Ohio Stadium, Columbus, Ohio • Forecast: 75°F, Partly Sunny
The Challenge: Build the most accurate prediction system possible without biasing to Vegas
The Discovery: Aggregate talent metrics are fundamentally flawed
It's late August 2025, and I'm getting antsy waiting for kickoff. Ohio State hosts Texas on Saturday at noon - the season opener we've all been waiting for. Instead of just doom-scrolling Twitter for the hundredth time, I figured I'd build a prediction model to see what the data actually says about this game. No peeking at Vegas lines, no bias toward my beloved Buckeyes - just pure historical data, wherever it leads.
Goal: Test what a model trained purely on historical data would predict
Resources: CFBD API, 847 historical games, individual player data, current rosters, ML models
Timeline: August 2025 - the week before the season-opener kickoff
Expected: The model would likely be wrong, but I wanted to see how wrong and why
Context: Historical data from previous OSU-Texas meetings, including their recent CFP semifinal
What started as a way to kill time before Saturday became a deep dive into why most talent rankings are completely wrong. Here's how I went from a wildly wrong prediction to something that actually makes sense.
Started with what seemed like a solid foundation - grabbed 847 historical games from the CFBD API, pulled in all the fancy metrics everyone uses: EPA per play, S&P+ ratings, recruiting composites. Built a Random Forest model with 19 features. Nothing too crazy, just good old-fashioned data science.
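The first pass looked roughly like this (a minimal sketch, not the actual script: it assumes a CFBD API key in a `CFBD_API_KEY` environment variable, and the real 19 features - EPA per play, S&P+, recruiting composites - come from other CFBD endpoints via a merge elided here, with `season` and `week` as stand-ins):

```python
import os
import requests
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

API = "https://api.collegefootballdata.com"
HEADERS = {"Authorization": f"Bearer {os.environ['CFBD_API_KEY']}"}

# Pull several seasons of completed FBS games (the real dataset was 847 games).
# Field names follow the classic snake_case CFBD response; adjust if your
# API version returns camelCase.
games = []
for year in range(2015, 2025):
    resp = requests.get(f"{API}/games", headers=HEADERS,
                        params={"year": year, "division": "fbs"})
    resp.raise_for_status()
    games.extend(resp.json())

df = pd.DataFrame(games).dropna(subset=["home_points", "away_points"])

# Target: home margin of victory. The placeholder features below would be
# swapped for the 19 engineered inputs in the real pipeline.
y = df["home_points"] - df["away_points"]
X = df[["season", "week"]]

model = RandomForestRegressor(n_estimators=500, random_state=42)
model.fit(X, y)
print(f"Trained on {len(df)} games")
```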
The Expectation: Figured this would give me a reasonable prediction. All the fancy metrics, solid methodology, plenty of historical data. What could go wrong?
Hit "run" and waited for my reasonable prediction. Instead, the model spits out Ohio State -15.8 points. Wait, what? That's not a close game, that's a blowout. Time for some reality checking.
Reality Check: Vegas had OSU favored by 2.5 points. I was off by 13+ points! That's not just wrong, that's embarrassingly broken. Time to figure out what went sideways.
Okay, the OSU bias wasn't shocking - the model was drunk on our playoff run and recent dominance. But I needed to figure out exactly what was causing such a massive systematic error. Time to dig into the feature importance.
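Continuing from the sketch above, scikit-learn makes that dig a one-liner:

```python
import pandas as pd

# Impurity-based importance for each input of the fitted Random Forest.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# The tell in my run: the aggregate recruiting composites sat near the top
# of the ranking -- with every position's stars counted equally inside them.
```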
The diagnosis was clear: the model had no reason to treat a punter's recruiting rating as mattering as much as the quarterback's, yet that's exactly what aggregate composites do. The solution was obvious - weight positions by their actual importance to winning games.
The Plan: Knew the OSU bias was coming from recent dominance. Next step: get individual player data, weight by position importance, and see the real talent picture.
Time to build this right. Researched how much each position actually impacts winning, collected individual player ratings for both teams, and built a proper weighting framework.
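The core of the framework is small. A sketch, reusing the weights from the table further down (the roster-dict shape and helper names here are mine):

```python
# Position-importance weights (same values as the table below).
POSITION_WEIGHTS = {
    "QB": 0.25, "OL": 0.20, "DL": 0.15, "WR": 0.12,
    "CB": 0.12, "RB": 0.08, "LB": 0.08, "S": 0.08,
}

def weighted_talent(roster: dict[str, list[float]]) -> float:
    """Average each position group's 247 ratings, then weight each group
    by how much that position actually drives winning."""
    return sum(
        weight * sum(roster[pos]) / len(roster[pos])
        for pos, weight in POSITION_WEIGHTS.items()
    )

def talent_edge(team_a: dict[str, list[float]],
                team_b: dict[str, list[float]]) -> float:
    """Positive result = team_a holds the position-weighted talent edge."""
    return weighted_talent(team_a) - weighted_talent(team_b)
```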
Ran the new position-weighted analysis on both teams. The results? Yeah, figured it would be better but didn't realize how much better.
Interactive chart showing position-specific advantages (positive values = an advantage for that team). Sayin was rated higher than Manning, so the 247 edge at QB goes to OSU, but this gets normalized by Arch's 2024 game stats: the model projects purely from the composite for Julian while factoring in actual production for Arch.
Texas actually holds a slight talent advantage (+0.5 points) once you weight by the positions that actually matter. That 44.7-point OSU edge from the raw aggregate composite? Turns out it was pretty misleading.
Texas Advantages: RB (+2.7), CB (+2.3), OL (+1.4)
OSU Advantages: WR (+1.2), QB (+0.7), S (+0.5)
Reality: Much closer game than aggregate suggested
Position weighting was a good start, but one question remained: how do you weigh experience against pure talent? Time to build a system that accounts for the reality that highly rated recruits don't always perform.
Applied the experience weighting system to see how much it mattered. Turns out both teams have a mix of experience levels, making this a complex factor.
Solution: Created experience tiers with dynamic weighting that shifts from 247 ratings to actual performance based on games played. Accounts for the reality that recruit ratings don't always translate to college production.
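Mechanically, it's just a games-played ramp between the two numbers. A sketch - the 30-game saturation point is an illustrative choice, not the tuned value:

```python
def effective_rating(recruit_rating: float,
                     performance_rating: float | None,
                     games_played: int,
                     saturation: int = 30) -> float:
    """Slide from the 247 projection toward on-field production as a player
    accumulates games. A true freshman is judged entirely on his recruiting
    rating; a multi-year starter almost entirely on what he has actually done.
    """
    if performance_rating is None or games_played <= 0:
        return recruit_rating  # no college tape yet: projection only
    w = min(games_played / saturation, 1.0)  # 0 = all projection, 1 = all production
    return w * performance_rating + (1.0 - w) * recruit_rating
```

This is the Sayin/Manning situation from the chart note in action: Sayin gets evaluated almost purely on his composite, while Manning's 2024 snaps pull his number toward actual production.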
Combined everything I'd learned: position weighting, experience factors, and cleaner historical analysis. Ran the final model and got something that actually seemed reasonable.
How raw CFBD data flows through cleaning, feature engineering, and modeling to become a spread prediction.
How the initial shocking result led to systematic analysis and the discovery of aggregate talent bias.
From identifying the problem to implementing the position-weighted analysis that dramatically improved the predictions.
Here's what made the difference: moving from naive aggregation to sophisticated position-weighted analysis.
Radar chart comparing different approaches across key dimensions (10 = best, 1 = worst)
Position | Weight | OSU Avg Rating | TEX Avg Rating | Advantage |
---|---|---|---|---|
QB | 25% | 92.3 | 89.1 | OSU +0.7 |
RB | 8% | 88.2 | 94.1 | TEX +2.7 |
WR | 12% | 91.7 | 89.3 | OSU +1.2 |
OL | 20% | 87.9 | 90.2 | TEX +1.4 |
DL | 15% | 90.1 | 89.8 | OSU +0.3 |
LB | 8% | 88.7 | 88.5 | OSU +0.1 |
CB | 12% | 86.4 | 91.2 | TEX +2.3 |
S | 8% | 89.3 | 88.6 | OSU +0.5 |
A detailed breakdown showing exactly how each methodology performed and what we learned.
Methodology | Prediction | Vegas Line | Error | Accuracy Level | Key Innovation |
---|---|---|---|---|---|
Historical Aggregate | OSU -15.8 | OSU -2.5 | 13.3 points | ❌ Poor | 4-year talent composites |
Position-Weighted | OSU -5.5 | OSU -2.5 | 3.0 points | ✅ Much better | Individual player analysis |
Experience-Weighted | OSU +4.1 | OSU -2.5 | 6.6 points | ⚠️ Acceptable | Rookie vs proven tiers |
The simplest of the advanced approaches (Position-Weighted) performed best. Sophisticated doesn't have to mean over-engineered - a clean implementation of a good idea beats a complex system.
Beyond just fixing my broken model, I found some pretty fundamental problems with how we usually think about team talent.
Problem: Treating all talent equally creates massive systematic errors.
Reality: A 5-star punter ≠ 5-star quarterback in game impact.
Solution: Position-specific weighting based on actual football impact.
Problem: 247 ratings don't account for college production reality.
Reality: Many 5-star recruits don't pan out; some 3-stars become stars.
Solution: Dynamic weighting that shifts from projections to performance.
Problem: Historical momentum assumes roster continuity.
Reality: College football has massive yearly turnover.
Solution: Momentum bias correction with roster continuity weighting.
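One simple way to implement that correction - a sketch, with shrinkage toward the league mean as my stand-in for the exact adjustment the model used:

```python
def continuity_adjusted(last_season_metric: float,
                        league_average: float,
                        returning_production: float) -> float:
    """Discount last season's team metric by how much of the roster is gone.
    returning_production is the share of last year's production back this
    season (0.0 to 1.0); a fully turned-over roster regresses to the mean.
    """
    return (returning_production * last_season_metric
            + (1.0 - returning_production) * league_average)
```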
Problem: Models can become disconnected from reality.
Reality: Vegas incorporates information models miss.
Solution: Use market as calibration benchmark, not blind target.
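In practice that just means scoring each methodology against the line after the fact, never feeding the line in as a feature. A quick check using the numbers from this post:

```python
def calibration_report(predictions: dict[str, float], vegas_line: float) -> None:
    """Compare each methodology's spread to the market (negative = OSU favored)."""
    for name, spread in predictions.items():
        error = abs(spread - vegas_line)
        print(f"{name:>20}: {spread:+5.1f}  (error vs Vegas: {error:.1f} pts)")

calibration_report(
    {
        "Historical Aggregate": -15.8,
        "Position-Weighted": -5.5,
        "Experience-Weighted": +4.1,
    },
    vegas_line=-2.5,
)
```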
- Traditional talent metrics treat all positions equally
- QB (25%) and OL (20%) matter most to winning
- Professional oddsmakers already understand this nuance
By looking at individual player ratings and weighting them by position importance, I found that Texas actually has a slight talent advantage (+0.5 points) when you focus on what really matters in football.
After 7 days of intense development and testing, here's what an OSU fan learned about the game and sports analytics.
"The most sophisticated system isn't always the most accurate. Sometimes the cleanest implementation of a good idea beats complex over-engineering."
What the models actually learned and which factors drive predictions in each methodology.
Metric | Historical Aggregate | Position-Weighted | Experience-Weighted |
---|---|---|---|
Prediction Error | 13.3 points | 3.0 points | 6.6 points |
Cross-Validation RMSE (points) | 18.4 | 12.1 | 14.7 |
Training Time | 23.4 min | 31.7 min | 47.2 min |
Feature Count | 19 | 34 | 67 |
Interpretability | High | High | Medium |
Robustness Score | 6.2/10 | 8.7/10 | 7.4/10 |
What started as a fun week coding project became a deep dive into systematic bias and the importance of questioning assumptions.
Look, I started this whole thing because I was bored and anxious about Saturday's game. Figured I'd build a quick model, maybe it would tell me we're going to dominate, and I'd feel better about things.
Then my model spits out OSU -15.8 and I'm thinking "this seems... aggressive." Vegas says -2.5. That's a 13-point gap, which in college football terms is the difference between a close game and a blowout.
So instead of just accepting that my model is garbage, I got curious. Why is it so wrong? Turns out the answer is pretty simple: treating a punter's recruiting stars the same as a quarterback's is really dumb.
Sometimes the best way to kill anxiety about your team is to discover they're not actually as dominant as you thought.
Now I've got a model predicting OSU -5.5, which feels way more realistic. Whether we win by 6 or lose by 1, at least I spent the week learning something instead of just doom-scrolling social media.
Our model predicts OSU -5.5, but experts see multiple ways this game could unfold. Here's what happens if the key factors they identify come into play.
Position-weighted analysis accounting for QB importance, OL protection, and skill position matchups
Model Adjustment: If Texas defense limits OSU explosive plays and wins the trenches battle, the inexperienced QB factor becomes decisive. Texas covers and potentially wins outright.
Model Adjustment: When experts see the game as truly even with minimal advantages either way, our model reduces the spread to essentially a pick 'em with a slight home-field edge.
Model Adjustment: If OSU's passing attack dominates and home field provides significant advantage, the talent differential becomes more apparent. OSU covers comfortably.
Our position-weighted model provides a baseline prediction, but expert analysis reveals key scenarios that could dramatically shift the outcome. The beauty of this approach is understanding not just what might happen, but why different scenarios lead to different results.
For the curious, here's what powered this week-long analytical journey.