Modelling tennis

Method

Basic ideas

Ttogger and Ttogger Pro use a purely mathematical method for modelling tennis matches. The model is not based on historical data, and it uses no parameters fitted to any kind of reference data; all historical information and all expectations about the players’ performances have to be supplied implicitly through the user’s input.

The model is based on the assumption that points are independent and identically distributed, i.e., that each player’s probability of winning a point does not depend on the current score. This is equivalent to using constant point winning probabilities for each player; by using hierarchical Markov chains, these constant point winning probabilities lead to constant game winning probabilities, constant set winning probabilities and finally to a single value for the match winning probability. Over the last decade or so, several authors have shown the validity of this ansatz in scientific research papers.
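The hierarchy described above can be illustrated with a short Python sketch. It is not the code used in Ttogger or Ttogger Pro; the function names are hypothetical, and it makes several simplifying assumptions that are not taken from the programs: Player A serves first, a tiebreak at 6-6 is represented by a fixed probability, the match is best of three sets, every set has the same winning probability, and no frontrunner bias is applied.

```python
from functools import lru_cache


def game_win_prob(p):
    """Probability that the server wins a game, given a constant
    probability p of winning each point on serve."""
    q = 1.0 - p
    # From deuce, the server wins by taking two points in a row
    # before the receiver does (geometric series over returns to deuce).
    deuce = p * p / (1.0 - 2.0 * p * q)

    @lru_cache(maxsize=None)
    def win(a, b):
        # a, b = points won so far by server and receiver
        if a == 3 and b == 3:
            return deuce
        if a == 4:
            return 1.0
        if b == 4:
            return 0.0
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(0, 0)


def set_win_prob(hold_a, hold_b, tiebreak_a=0.5):
    """Probability that Player A wins a set, given each player's
    probability of holding serve. A serves first (assumption) and the
    tiebreak at 6-6 is a fixed probability (assumption)."""

    @lru_cache(maxsize=None)
    def win(a, b):
        # a, b = games won so far by Player A and Player B
        if a == 6 and b == 6:
            return tiebreak_a
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        # A serves when the number of games played so far is even.
        a_wins_game = hold_a if (a + b) % 2 == 0 else 1.0 - hold_b
        return a_wins_game * win(a + 1, b) + (1.0 - a_wins_game) * win(a, b + 1)

    return win(0, 0)


def match_win_prob(s):
    """Best-of-three match with a constant set winning probability s:
    win 2-0 (s^2) or 2-1 (2 * s^2 * (1 - s))."""
    return s * s * (3.0 - 2.0 * s)


if __name__ == "__main__":
    hold_a = game_win_prob(0.64)  # hypothetical point probabilities
    hold_b = game_win_prob(0.61)
    set_a = set_win_prob(hold_a, hold_b)
    print(hold_a, hold_b, set_a, match_win_prob(set_a))
```

Each level takes only the constant probabilities from the level below as input, which is what makes the chain hierarchical. For example, a server who wins 60% of the points wins roughly 74% of the service games under these assumptions.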

We refine this method by introducing a frontrunner bias, which allows an automatic adjustment of the probabilities after the first set. It has been shown that adjusting the percentages after the first set leads to a better description of the match, as this adjustment can be understood as a partial inclusion of the players’ form on the day. Since the adjustment is done automatically in our program, its effects can be included in the pre-match calculation of all probabilities and odds. For details on the frontrunner bias, click the Bias? button in one of our programs.

Technical Details

The user has to provide input values for each player’s game winning probabilities in service and return games, as well as a value for the frontrunner bias. The program then renormalizes Player A’s service game winning probability and Player B’s return game winning probability, so that the renormalized values add up to exactly 100% (one of the players has to win Player A’s service game); the same is done for the other pair, Player B’s service game winning probability and Player A’s return game winning probability.

Both programs ask for an input of game winning probabilities to minimise the effect of rounding errors. The tennis scoring system is difference-enhancing at every level: small differences in point winning probabilities lead to larger differences in game winning probabilities, even larger differences in set winning probabilities and finally to the largest differences in match winning probabilities. For this reason, using point winning probabilities rounded to full percentages (as can be found, e.g., on the ATP website) would lead to a large uncertainty in the calculated probabilities and odds, caused by the rounding errors of the input values. This uncertainty is strongly reduced by starting from an input of game winning probabilities.
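As a rough illustration of the two points above, the following Python sketch renormalizes one pair of game winning probabilities and shows how a rounding interval in a point winning probability widens at the game level. The function names and the input numbers are hypothetical, and proportional scaling is only an assumption; the exact renormalization rule used by the programs is not stated here.

```python
def renormalize(p_serve_a: float, p_return_b: float) -> tuple[float, float]:
    """Scale two complementary game winning probabilities so that they
    sum to 1. Proportional scaling is an assumption of this sketch."""
    total = p_serve_a + p_return_b
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return p_serve_a / total, p_return_b / total


def game_from_point(p: float) -> float:
    """Closed-form probability of winning a game from a constant
    point winning probability p (games won to 0, 15, 30, or via deuce)."""
    q = 1.0 - p
    before_deuce = p**4 * (1 + 4 * q + 10 * q * q)
    via_deuce = 20 * p**3 * q**3 * (p * p / (1 - 2 * p * q))
    return before_deuce + via_deuce


# Hypothetical user input: Player A wins 85% of service games,
# Player B wins 18% of return games. The two values sum to 103%.
hold_a, break_b = renormalize(0.85, 0.18)

# A point winning probability published as "64%" may be anything
# between 63.5% and 64.5% before rounding.
spread_points = 0.645 - 0.635
spread_games = game_from_point(0.645) - game_from_point(0.635)

print(f"renormalized: {hold_a:.4f} + {break_b:.4f} = {hold_a + break_b:.4f}")
print(f"spread at point level: {spread_points:.4f}")
print(f"spread at game level:  {spread_games:.4f}")
```

In this sketch the one-percentage-point interval at the point level already becomes a wider interval at the game level, and the same widening happens again at the set and match levels. Starting from game winning probabilities skips the first of these amplification steps.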

If you want to know more…

For information on the model of independent and identically distributed points and its performance, as well as on its revised variants, please search the internet; most of the information is readily available from different authors.
