How lucky were India on their last tour of Australia? A statistical model tells us

Kartikeya Date

Oct 24, 2024, 02:34 AM

Rishabh Pant made 146 in 111 balls against England in India's first innings of the Test at Edgbaston in 2022. It remains the second-quickest three-figure innings by an Indian batter in Test cricket since records for deliveries faced became available. The quickest was Mohammed Azharuddin's 109 in 77 balls against South Africa in November 1996.

How good was Pant's innings? This is a difficult question to answer well, mostly because it is under-determined. For instance, it is not difficult to imagine that if Cheteshwar Pujara played the same 111 balls that Pant faced, he would make different choices and score a different (and almost certainly smaller) number of runs. Would Pujara be more or less likely than Pant to survive for those 111 deliveries? More generally, if the average batter made the choices that Pant made on those 111 deliveries, how many runs would result? And how many times would the average batter be dismissed, having made those choices? What are the expected runs and expected dismissals for the choices Pant made?

The answer, according to the model described below, is as follows.

Of the 111 balls Pant faced, 73 were from right-arm quick bowlers and 38 from finger spinners. A summary of the shots Pant attempted is in the table below. xR is the expected runs for the average left-hand batter against the given bowling style when playing a specific shot.

For example, Pant attempted to pull the right-arm fast-medium bowler nine times (eight of those deliveries were legal) and scored 16 runs. In the record, a left-hand batter has been dismissed 54 times in 485 false shots on the pull against the right-arm pacer. A false shot on the pull for this match-up produces 358 runs off 473 deliveries, while a successful pull shot produces 2464 (off 996). Pant played two false shots in those nine attempts. The expected wickets (xW) for those nine attempts is 0.23 (2*54/485). The expected runs (xR) figure is 18.2.

Summing this up for all the shots that Pant attempted against each bowling style, and applying those to the average left-hand batter, it turns out that the average left-hand batter would have scored 97.5 runs off 111 balls and been dismissed 2.4 times, against bowlers of the type Pant faced in that innings.

The model used in this article relies on the ball-by-ball record collected by ESPNcricinfo, which lists what shot was attempted off which delivery and whether or not the batter was in control. The model also considers what style the bowler was bowling, and whether the batter is a right-hander or a left-hander. It is illustrated using examples in the table below.

When the left-hand batter sweeps the offspinner and is in control, 2.09 runs are scored per shot. When the left-hand batter successfully sweeps the slow left-arm orthodox bowler, 1.81 runs result per shot. When the left-hand batter is not in control of the sweep shot against the offbreak bowler, 0.118 dismissals occur per false shot; the corresponding figure against the slow left-arm orthodox bowler is 0.097 dismissals.

To round out the information in the table, the average left-hand batter fails to control the sweep against the offspin bowler 33.5% of the time (1686 out of 5025 attempts fail), while 30.8% of sweeps against the slow left-arm orthodox bowler (1244 out of 4039) fail. For comparison, when the left-hand batter attempts to drive the offbreak bowler, the expected-runs figure is 0.89, the expected dismissals 0.141, and 9% of attempts (2354 out of 26109) fail. The sweep involves greater risk, greater reward, and is more difficult to pull off than the drive. This is also why, typically, the field is set to defend the drive more often than it is to defend the sweep.
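The per-shot figures quoted above are ratios taken over a grouped ball-by-ball record. A minimal sketch of that grouping follows. The record layout (the field names `hand`, `bowler`, `shot`, `in_control`, `runs`, `dismissed`) is a hypothetical one chosen for illustration, not ESPNcricinfo's schema, and the six deliveries are made up. The sketch also assumes, in line with the description of the model, that dismissals are counted only against false shots.

```python
from collections import defaultdict

# (in_control, runs, dismissed) for six made-up sweeps by a left-hander against offspin.
toy = [(True, 4, False), (True, 1, False), (True, 1, False),
       (False, 0, False), (False, 1, False), (False, 0, True)]
balls = [
    {"hand": "LHB", "bowler": "offbreak", "shot": "sweep",
     "in_control": c, "runs": r, "dismissed": d}
    for c, r, d in toy
]

def build_rates(balls):
    """Group deliveries by (hand, bowler style, shot) and derive per-shot rates."""
    totals = defaultdict(lambda: {"ctrl_n": 0, "ctrl_runs": 0,
                                  "false_n": 0, "false_runs": 0, "false_wkts": 0})
    for b in balls:
        t = totals[(b["hand"], b["bowler"], b["shot"])]
        if b["in_control"]:
            t["ctrl_n"] += 1
            t["ctrl_runs"] += b["runs"]
        else:
            t["false_n"] += 1
            t["false_runs"] += b["runs"]
            t["false_wkts"] += int(b["dismissed"])

    rates = {}
    for key, t in totals.items():
        attempts = t["ctrl_n"] + t["false_n"]
        rates[key] = {
            # Runs per shot when the batter is in control.
            "runs_per_controlled": t["ctrl_runs"] / t["ctrl_n"] if t["ctrl_n"] else 0.0,
            # Runs and dismissals per false shot.
            "runs_per_false": t["false_runs"] / t["false_n"] if t["false_n"] else 0.0,
            "wkts_per_false": t["false_wkts"] / t["false_n"] if t["false_n"] else 0.0,
            # Share of attempts that the batter fails to control.
            "false_share": t["false_n"] / attempts,
        }
    return rates

rates = build_rates(balls)
sweep = rates[("LHB", "offbreak", "sweep")]
print(sweep)
```

On the real record, the same ratios would be computed over thousands of deliveries per category rather than six.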

Readers will note that when the ball is turning more, the drive and the sweep both carry greater risk than usual. It would be reasonable to think that the expected-dismissals figure for the drive or the sweep on a turning pitch should be higher than it would be on a flat pitch.

The way the model used in this article accounts for the conditions is through the false shots record. On a turning pitch, the batter is likely to play false shots more often. For example, suppose that a left-hand batter attempts the sweep ten times against an offspin bowler on a flat pitch, and plays two false shots, instead of the expected three or four. The expected runs for these ten attempts would be 17.7. The eight successful attempts would generate 16.72, and the two failed attempts 0.94. The expected dismissals would be 0.24.

On a turning pitch, the batter is likely to miss more sweeps. Let's say the batter misses five sweeps. In this case, xR would be 12.8 runs, and xW would be 0.59. In this way, the xR and xW for every ball, and therefore for every batter and every bowler in every innings, can be estimated. The essential intuition here is that it is the false shot that makes a dismissal possible. When false shots from a particular shot type are more frequent, dismissal from that shot category is more likely too. The conditions only matter to the extent that they modify the likelihood of the occurrence of the false shot. In other words, conditions are easy or difficult depending on how often false shots occur in them.
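The arithmetic behind the flat-pitch and turning-pitch examples can be sketched in a few lines. The in-control runs (2.09) and dismissals per false shot (0.118) are the figures quoted for the left-hander sweeping the offspinner. The runs per false shot (0.47) is not quoted directly; it is an assumption inferred from the 0.94 runs attributed to the two failed attempts. The function and constant names are illustrative.

```python
# Per-shot rates for a left-hand batter sweeping an offspinner.
RUNS_IN_CONTROL = 2.09   # quoted in the article
WKTS_PER_FALSE = 0.118   # quoted in the article
RUNS_PER_FALSE = 0.47    # assumption: inferred from "two failed attempts 0.94"

def expected(attempts, false_shots):
    """Return (xR, xW) for a number of attempted sweeps and the false shots among them."""
    controlled = attempts - false_shots
    x_runs = controlled * RUNS_IN_CONTROL + false_shots * RUNS_PER_FALSE
    x_wkts = false_shots * WKTS_PER_FALSE
    return x_runs, x_wkts

flat = expected(10, 2)      # flat pitch: two false shots in ten sweeps
turning = expected(10, 5)   # turning pitch: five false shots in ten sweeps

print(f"flat:    xR={flat[0]:.1f}, xW={flat[1]:.2f}")
print(f"turning: xR={turning[0]:.1f}, xW={turning[1]:.2f}")
```

With these inputs the sketch reproduces the figures in the text: 17.7 runs and 0.24 wickets on the flat pitch, 12.8 runs and 0.59 wickets on the turning one. The only thing that changes between the two cases is the count of false shots, which is how the conditions enter the estimate.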

The same can also be said for bowlers. Facing James Anderson (right-arm fast-medium, under ESPNcricinfo's classification) is a more daunting proposition than facing the average right-arm fast-medium bowler in a Test match. Anderson is more daunting because he challenges the middle of the bat more often than the average right-arm fast-medium bowler does. By evaluating expected dismissals based on the occurrence of false shots, the model accounts for this distinction. For instance, in England, Anderson induces a false shot every 4.9 balls, while the average right-arm fast-medium bowler does so every 5.1 balls. In Australia, Anderson induces a false shot every 6.4 balls, while the average right-arm fast-medium bowler does so every 6.2 balls. The model will return a higher expected-wickets figure than average against Anderson in England, and a lower xW than average against Anderson in Australia.

R Ashwin induces a false shot every 5.4 balls in India, while the average offspinner does so every 6.2 balls. Outside India the gap is narrower (6.9 balls per false shot against Ashwin, 7.2 balls per false shot against the average offspinner). The model is able to accommodate these distinctions.
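One way to read the balls-per-false-shot figures for Anderson and Ashwin is as a ratio against the average bowler of the same style. The sketch below does only that conversion. Treating the ratio as a guide to expected wickets is an assumption: it holds only if the shots played and the rate at which false shots become dismissals match those faced by the average bowler.

```python
# Balls per false shot as quoted in the article: (bowler, average for the style).
BALLS_PER_FALSE_SHOT = {
    ("Anderson", "in England"): (4.9, 5.1),
    ("Anderson", "in Australia"): (6.4, 6.2),
    ("Ashwin", "in India"): (5.4, 6.2),
    ("Ashwin", "outside India"): (6.9, 7.2),
}

def relative_false_shot_rate(bowler_bpfs, average_bpfs):
    """Ratio of the bowler's per-ball false-shot rate to the style average."""
    return (1 / bowler_bpfs) / (1 / average_bpfs)

ratios = {
    key: relative_false_shot_rate(bowler, average)
    for key, (bowler, average) in BALLS_PER_FALSE_SHOT.items()
}

for (name, where), ratio in ratios.items():
    print(f"{name} {where}: {ratio:.2f}x the average false-shot rate")
```

A ratio above 1 means the bowler draws false shots more often than the average bowler of that style, and a ratio below 1 means less often.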

The table below lists the 15 Test innings since 2014 with the highest xW. These could be considered the 15 most unlikely Test innings, in terms of their size and length, in the last ten years.

The table below lists the 15 unluckiest match bowling efforts in Test cricket since 2014. Mohammed Shami collected 2 for 182 at The Oval in 2018. He induced 107 false shots in the match. Over the 10,770 deliveries Shami has bowled in his Test career since the start of 2014, his xW/xR is 228.5/5775.0. His actual figures are 212/5896. His expected career bowling average since the start of 2014 is 25.7; his actual bowling average since then is 27.8.

Jasprit Bumrah's 0 for 92 in the 2021 World Test Championship final also features in the list below. He induced 55 false shots in that match without getting a wicket. This was one of only four instances of a bowler going wicketless in a Test since 2014 while producing an expected wickets total in excess of five wickets. Across the 339 instances since 2014 when a bowler has bowled at least 15 overs in a match and gone wicketless, the average expected-wickets figure has been 1.77. Over his 37-Test career so far, Bumrah's xW/xR is 168.3/3498.0. His actual career haul is 164/3365. His expected career average (20.8) closely matches his actual career average (20.51).
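The expected average is simply expected runs divided by expected wickets, in the same way that the actual average is runs conceded divided by wickets taken. A short sketch using Bumrah's career figures as quoted above (the function name is illustrative):

```python
def average(runs, wickets):
    """Bowling average: runs conceded per wicket."""
    return runs / wickets

# Bumrah's career figures as quoted in the article.
expected_avg = average(3498.0, 168.3)   # xR / xW
actual_avg = average(3365, 164)         # runs conceded / wickets taken

# Wickets taken relative to expectation; negative means fewer than expected.
wickets_vs_expected = 164 - 168.3

print(f"expected {expected_avg:.1f}, actual {actual_avg:.1f}")
print(f"wickets vs expected: {wickets_vs_expected:+.1f}")
```

The gap between the two averages, and between actual and expected wickets, is the model's measure of a bowler's fortune over the period.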

Only 1.6% of individual Test innings involve an xW of 3.5 or more. About 5% of Test innings involve 2.5 xW or more (see the graph below). The average individual three-figure score in a Test match involves 2.72 xW. The average innings where the xW is 1.0 (that is, between 0.50 and 1.49) produces 31.2 runs. The distribution of all innings and centuries in the graph below shows how much luckier a batter has to be than average to reach a century.

Of the 792 Test hundreds scored since the start of 2014, only 41 have come in innings where the expected average (xR divided by xW) of the rest of the batters in the innings is less than 20 runs per wicket. Only eight have come in innings where xAve for the rest of the batters is less than 15 runs per wicket. These are:

1. Aiden Markram's 106 (103 balls) against India in Cape Town, 2024
2. KL Rahul's 101 (137) against South Africa in Centurion, 2023
3. Steven Smith's 109 (202) against India in Pune, 2017
4. Dimuth Karunaratne's 107 (174) against India in Bangalore, 2022
5. Ajinkya Rahane's 103 (154) against England at Lord's, 2014
6. Dimuth Karunaratne's 158 not out (222) against South Africa in Galle, 2018
7. Dean Elgar's 136 (228) against England at The Oval, 2017
8. Dinesh Chandimal's 119 (186) against West Indies in St Lucia, 2018
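The filter behind this list can be sketched as follows. It assumes that "the rest of the batters" means pooling xR and xW over everyone in the innings except the century-maker and then dividing. The innings below is made up purely for illustration and is not taken from any scorecard, and the field names are hypothetical.

```python
def rest_of_innings_xave(innings, centurion):
    """xR / xW pooled over every batter in the innings except the centurion."""
    others = [b for b in innings if b["name"] != centurion]
    x_runs = sum(b["xR"] for b in others)
    x_wkts = sum(b["xW"] for b in others)
    return x_runs / x_wkts

# Made-up innings for illustration only.
innings = [
    {"name": "A", "xR": 95.0, "xW": 2.8},   # the century-maker
    {"name": "B", "xR": 12.0, "xW": 1.1},
    {"name": "C", "xR": 20.0, "xW": 1.4},
    {"name": "D", "xR": 8.0, "xW": 0.7},
]

xave = rest_of_innings_xave(innings, "A")   # 40.0 xR over 3.2 xW
qualifies = xave < 15                        # the threshold used for the list above
print(f"rest-of-innings xAve: {xave:.1f}, below 15: {qualifies}")
```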

The xR and xW models extend the intuition underlying the control measurement to specify risks. For instance, India's infamous 36 all out lasted 128 balls, from which the expected wickets/runs were 3.2/47.1. India's fourth innings in Sydney on that tour lasted 786 balls, from which they scored 334 for 5. The expected wickets/runs from those deliveries were 13.5/376.1.

England made 420 all out in 613 balls in the third innings in Hyderabad in January this year. The expected wickets/runs from those 613 balls were 15.7/391.3. Over the course of the series, the Indian batting produced an expected average of 42.2 (their actual average in the series was 39.7), while England's expected average was 26.0 (actual, 25.6). The figures belie the idea that the series was closely contested and that England came near to winning it. India were only 28 runs away from a 5-0 result.

In Australia in 2020-21, India were decidedly the luckier of the two sides. Their expected average with the bat was 29.3 (actual 30.4). Australia's expected average was 37.0 (actual 29.3). Essentially, enough Australian batters fell to early mistakes to nullify the difference in quality between the Australian and Indian attacks. The gap between the two attacks was narrower in the first two Tests (Australian batting: 32.7 xAve, Indian batting: 27.4 xAve) in 2020, than it was in the last two Tests, played in 2021 (Australian batting: 40.0 xAve, Indian batting: 30.0 xAve) after India had lost several players to injuries.

The model could be modified, for instance, to consider the innings of the match in which the shot is attempted, to add greater texture. For a right-hand batter sweeping the slow left-arm orthodox bowler, the expected-wickets figure from innings one through innings four is 0.110, 0.111, 0.136, 0.123. In other words, the chance of a dismissal for a false shot on the sweep is between 11% and 14%. The conversion rate of false shot to dismissal is only marginally affected by the innings in the match.

The temptation to build ever more elaborate sets of categories should be resisted. The larger the number of categories, the smaller the number of deliveries in each category, and consequently, the less stable the average expectation from each category. With more categories, it also becomes more difficult to keep them apart and ensure that they do not describe overlapping features. For example, ESPNcricinfo's classification includes four categories of right-arm seam bowlers - right-arm medium, right-arm medium-fast, right-arm fast-medium, and right-arm fast. It becomes difficult to distinguish between the middle two. But it is also, on the other hand, easy to see why these categories might be useful. Consider, for instance, Colin de Grandhomme (medium), Chaminda Vaas, especially after his injury (medium-fast), Glenn McGrath (fast-medium), and Brett Lee (fast). The speed gun readings suggest that fast-medium bowlers fall back into the medium-fast category at times during Test matches, especially in flat batting conditions, when there's a lot of bowling to be done. If anything, having a two-pronged classification of seam bowlers - fast and medium - would be sufficient. Ideally, an expected runs/wickets model would include the trajectory of the delivery and the batter's control as its inputs. Absent this, the categories provided by ESPNcricinfo offer a usable proxy.
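The cost of splitting categories can be illustrated with the standard error of a proportion. The sketch below uses the 54 dismissals in 485 false pulls by left-handers against the right-arm pacer quoted earlier. It assumes each false shot is an independent trial with the same chance of dismissal (a simplification), and the even four-way split is hypothetical.

```python
import math

def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n trials."""
    return math.sqrt(p * (1 - p) / n)

# From the article: 54 dismissals in 485 false pulls.
dismissals, false_shots = 54, 485
p = dismissals / false_shots

se_pooled = standard_error(p, false_shots)
# Hypothetical: the same sample split evenly across four sub-categories.
se_split = standard_error(p, false_shots / 4)

print(f"dismissals per false shot: {p:.3f}")
print(f"standard error, one category:    {se_pooled:.4f}")
print(f"standard error, four categories: {se_split:.4f}")
```

Under these assumptions, quartering the sample doubles the uncertainty around each category's conversion rate, which is the instability the paragraph above warns about.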

This expected runs/wickets model is relatively easy to implement. It provides a baseline expectation and makes it possible to measure both the relative quality of the teams involved in a match and the relative good (or bad) fortune enjoyed by each. A model along the lines described in this article should be available in the coverage of every Test match.

The figures used in this article include Tests completed on or before September 25, 2024.