New-DUPR’s Data-Driven Approach
Pickleball, especially the professional pickleball landscape, has gone through a tumultuous period of late. A continuing surge in the growth of the sport, paired with the ever-present “will they/won’t they” relationship between the MLP and the PPA, saw DUPR spun out from its former MLP umbrella and into greener pastures: new ownership and an opportunity to fully take the reins of our own destiny.
With the changing of the guard, DUPR can refocus on what it can do best—provide universal, ageless, and genderless ratings for recreational players and professionals alike to foster competitive matches, ensure honest tournament entries, and find partners to play with or against no matter where in the world you are.
You’ve heard that story before, and maybe you’ve bought in and are along for the ride, or maybe you’ve been burned before and are feeling a little skeptical. This series of blog posts I’ll be writing in the coming weeks is for both of you as we build up to the release of our first algo change since THE algo change back in June: the reintroduction of the points effect. For those of you with your DUPR-branded bucket hats already atop your head, this is a chance to get a peek behind the scenes at what a real research and sports analytics team looks like now that we can be, first and foremost, a research and sports analytics team. For those of you remaining dubious, I hope this provides insight into how we learn from the past and how we approach the problems we need to solve, and demonstrates that, while we are certainly ever-imperfect (as all science is), we take this responsibility seriously and have the process in place to deliver results.
A History and What We’ve Learned
For a relatively young company, DUPR has certainly seen its fair share of iterations. I’m sure there are more nuanced stratifications to be drawn, but from the algorithmic side, we’ve basically lived two lives:
- Once-Per-Week Ratings Update
- Instant and Transparent Results
Each of these algorithm styles has its pros and cons that need to be balanced across all of the company’s objectives–namely, in our case, user experience and accuracy. Let’s dive in.
Once-Per-Week Ratings Update
The original algo, characterized by its update every week, moved slowly. For many of our enthusiastic players eagerly participating in matches throughout the week, the anticipation of seeing their DUPR rating update every Tuesday was a unique experience but lacked the benefits of instant gratification. Further, by combining the effects of an entire week’s performance into one single ratings move, it was hard to understand the effects and build faith in its quality. More often than not, the algo was working “as intended”, but there was absolutely no way for you to go back and verify, leading to skepticism and distrust. Clearly, the user experience needed an overhaul.
But “working as intended” for this algo was also quite raw; in places where the model couldn’t completely solve a problem, heuristics were used to hold the ship together. This is not altogether uncommon. In many algorithms across industries, heuristics are used to optimize a secondary constraint: labor-hours. If you can put something out into the world that moves the needle in a positive direction, the next biggest improvement may be had by going out and finding a second or third needle to positively move rather than spending far more time on needle number one. As long as you’re not looking for these needles in a quantitative haystack, you’re probably better off going out and finding one. When DUPR started, the problem of accurate pickleball ratings was less of a quantitative haystack and more of a quantitative needle-stack.
One of the needles we uncovered ended up being crucially important: connectivity.
How do you accurately compare Player A to Player B if they’ve never met, and may not even be in the same state or even country?
What happens as those connections grow or change? The full deep-dive into this thought process is worthy of its own blog post (and it will absolutely be one down the line when we also talk about ratings reliability!), but the important visualization here is that pickleball is a web of relationships, and information flows through these relationships in a way that can be defined and studied. Unlike most other sports with highly connected infrastructures that ship their players around the country to play each other in person (football, basketball, hockey, soccer, etc.) or those with online presences that can generate artificial proximity (online chess and other forms of e-gaming), pickleball is still relatively disjointed and communal.
A graphical representation of every doubles pickleball match on DUPR. Each node is a player and each line is a match, clustered by strength of connection.
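To make the idea of connectivity a bit more concrete, here is a small sketch of how you might represent players and matches as a graph and count how many disconnected communities exist. This is purely illustrative and not DUPR’s actual implementation; the match records, player names, and the use of the networkx library are all assumptions for the example.

```python
# Illustrative sketch only -- not DUPR's production code.
# Treat players as nodes and matches as edges, then measure how
# connected (or fragmented) the resulting network is.
import networkx as nx

# Hypothetical match records: (team_a, team_b) as tuples of player ids.
matches = [
    (("alice", "bob"), ("carol", "dan")),
    (("alice", "erin"), ("bob", "frank")),
    (("gina", "hank"), ("ivy", "jack")),  # a separate local community
]

G = nx.Graph()
for team_a, team_b in matches:
    # Every player who shared a court is linked, teammate or opponent,
    # because both relationships carry rating information.
    players = list(team_a) + list(team_b)
    for i, p in enumerate(players):
        for q in players[i + 1:]:
            G.add_edge(p, q)

components = list(nx.connected_components(G))
print(f"{G.number_of_nodes()} players, {len(components)} disconnected communities")
# Ratings can only be compared directly within a component; linking
# components (e.g., via travelling players or tournaments) is what
# lets information flow across the whole web.
```

Ratings information can only travel along those edges, which is exactly why connectivity matters so much for comparing players who have never shared a court.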
The initial connectivity solution provided a significant boost in the accuracy of the system, but without the tools in place at the time to explain it, it was responsible for effectively all of the confusion among the users of that system.
We understood this confusion was a significant issue in the original approach and demanded a change, but the complexities of this transition, both mathematically and technically, were immense.
Instant and Transparent Results
Faced with tight deadlines and a focus on immediate solutions, the team worked diligently to address emerging issues. This rapid pace naturally placed significant demands on the tech stack, particularly when managing the data infrastructure requirements of large-scale, instantaneous updates for the entire population. The initial simplifications of an instantaneous algo and the prerequisites around things like unrated-player initialization had to be solved in real time after going live. Eventually, after having whacked all of the moles, we got ourselves back to a steady state and were able to take a breath and take the pulse of the ecosystem at large.
One thing was incredibly clear.
Despite the bugs, engagement was up with DUPR. Like, UP up. Getting immediate and transparent results was so gratifying to the user base, compared to the previous version, that we saw faster growth in match entries per user than we had ever seen, and new users were flooding the platform at the highest rate in company history.
Knowing that “winning meant going up” gave players a concrete DUPR goal that aligned with their pickleball goal for a given match. While the population of users (mostly) understands and believes that accounting for every single point would be more accurate, there was no denying that having no idea how many points you’d need to score in a given match to go up could lead to frustration, especially if it was never explained why a rating had changed after the fact!
Immediacy was fun and transparency was key.
But the transition was bumpier than we had hoped, and the algorithm needed a few rounds of iteration until we got back to the platform-wide accuracy we expect from DUPR.
Going Forward
We knew we couldn’t move away from immediacy and transparency, and obviously we needed a way to reintroduce that crucial element of connectivity. So now, with the ability to forge our own research and development path as we see fit, the question becomes: How do we generate the most accurate possible system within the constraints of that user experience? Well, for that, we first need to understand what we mean by accuracy.
How We Measure Accuracy
Suppose I ask you to predict the outcome of a coin toss and the chance that it ends up on heads. (I also ask that you bear with me for some very light math… I promise.) Common sense has likely given you the answer already: 50%. Now imagine you are asked to evaluate two different models that are in the business of predicting coin tosses (if you have a good one, please let me know and we can make some Super Bowl prop bets together this year…).
- Model 1 says there is a 55% chance of landing on heads
- Model 2 says there is a 100% chance of landing on heads
When comparing these two models, both of them guess “heads” every time and get the answer right 50% of the time. Yet, clearly, both of these models are “wrong”. The right answer to guess is, of course, 50% (although a recent paper that looked at 351k real coin flips says there’s actually a ~51% chance a coin tends to land on the same side it started on!), but there are gradients to wrong-ness that clearly don’t come out by just seeing how many times our guess was right.
For instance, the moment we ever get a coin flip that lands on tails, we can definitively say that Model 2 is instantly and completely discredited. It thought tails was literally impossible and failed to account for the fundamental randomness and noise in the process of a coin flip. Model 1 more correctly aligns with that fundamental randomness, but maybe not enough.
We need to measure both how often we’re correct and whether we are well calibrated to that fundamental randomness. Being correct as often as you think you ought to be correct is just as important as getting the actual heads/tails guess right. If you think a coin has a 51% chance of landing on heads, you’d want to observe yourself being correct guessing heads about 51% of the time. If you were correct in guessing heads 100% of the time, then your model wasn’t as confident as it could’ve been, despite it guessing correctly 100% of the time.
There are numerous statistical ways to evaluate this, but the important takeaway (while staying underneath my mathy-ness budget) is that writing an algorithm meant to accurately predict which team wins a match of pickleball is about both constructing a guess that, on average, is more right than wrong and, at the same time, being honest about how confident that guess should be in the first place.
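To put a number on those gradients of wrongness, here is one common scoring rule, the Brier score: the mean squared error between a predicted probability and the 0/1 outcome. This is just an illustrative Python sketch with simulated flips, not the exact metric or code we use internally.

```python
# Illustrative only: scoring the two hypothetical coin-flip models with the
# Brier score (mean squared error between predicted probability and outcome).
import random

random.seed(0)
flips = [random.random() < 0.5 for _ in range(10_000)]  # True means heads

def brier(p_heads, outcomes):
    """Mean squared error between the predicted P(heads) and what happened."""
    return sum((p_heads - (1.0 if heads else 0.0)) ** 2 for heads in outcomes) / len(outcomes)

for name, p in [("Model 1 (55% heads)", 0.55),
                ("Model 2 (100% heads)", 1.00),
                ("Fair-coin model (50% heads)", 0.50)]:
    print(f"{name:>28}: Brier score = {brier(p, flips):.3f}")

# All three models call heads (or are indifferent) and are right about half
# the time, but the fair-coin model scores best (~0.25), Model 1 slightly
# worse (~0.2525), and Model 2 worst (~0.50) -- the gradient of wrongness
# that win-rate alone can't see.
```

A related metric, log loss, makes the point about Model 2 even more dramatically: the first tails it ever sees sends its penalty to infinity, which is the mathematical version of being instantly and completely discredited.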
How We Advance from Here
Ok, so we have an algorithmic starting place and we have a barometer of success, but how do we actually take the former and achieve the latter? Over the past few weeks, the analytics team has rewritten and modernized our research pipeline from scratch, under the sound principles required of a project built for the scale and duration we envision for DUPR.
We created a full-fledged simulation engine with an ever-increasing list of model hyperparameters that we can now strategically and confidently tune based on how they improve our success measurements, and we have modularized our algorithm so that each individual component can be studied and improved on its own.
The hard work has been done to make the analytics workflow easy. Now the process is relatively straightforward:
- Define a research project
- Generate a model that fits into our engine and expose its hyperparameters to the simulator
- Run the simulator over the grid of possible values
- Pick the result that improves the predictive ability the most
Rinse. Repeat. Refine.
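For the curious, here is a minimal sketch of what that loop can look like in practice. Every name in it (simulate_matches, the two hyperparameters, the synthetic data inside) is a hypothetical stand-in rather than DUPR’s actual engine; the point is the shape of the workflow: define a parameter grid, run the simulator over it, and keep the combination that predicts best.

```python
# A minimal sketch of the tune-and-evaluate loop described above. Every name
# here (simulate_matches, the hyperparameters, the synthetic data) is a
# hypothetical stand-in, not DUPR's actual engine.
import math
import random
from itertools import product

def simulate_matches(k_factor, connectivity_weight, n=5_000, seed=7):
    """Stand-in for the simulation engine: 'replay' matches under the given
    hyperparameters and return (predicted P(win), actually_won) pairs.
    Synthetic data only, so the loop runs end to end."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        true_p = rng.uniform(0.2, 0.8)  # the unknown real win probability
        # Pretend better settings shrink the prediction error.
        noise_sd = 0.05 + (32 - k_factor) * 0.003 + (1 - connectivity_weight) * 0.05
        pred = min(0.99, max(0.01, true_p + rng.gauss(0, noise_sd)))
        results.append((pred, rng.random() < true_p))
    return results

def log_loss(pairs):
    """Average negative log-likelihood of the observed outcomes."""
    return -sum(math.log(p if won else 1 - p) for p, won in pairs) / len(pairs)

param_grid = {"k_factor": [16, 24, 32], "connectivity_weight": [0.0, 0.5, 1.0]}

best_score, best_params = float("inf"), None
for combo in product(*param_grid.values()):
    params = dict(zip(param_grid, combo))
    score = log_loss(simulate_matches(**params))
    if score < best_score:
        best_score, best_params = score, params

print(f"Best hyperparameters: {best_params} (log loss {best_score:.4f})")
```

In practice the stand-in simulator would be replaced by replays of real historical matches, and the single metric by the fuller suite of accuracy and calibration measurements described above, but the skeleton of the loop stays the same.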
In the coming weeks, I’ll be continuing this series, diving into how this has been applied at DUPR so far as well as poking around in some other fun curiosities. I hope you stay tuned and enjoy the behind-the-scenes look at DUPR’s analytical process!
Written By: Scott Mendelssohn - Head of Analytics, DUPR