Is the ICC's pitch-rating system fit for purpose?

Scott Oliver

Mar 30, 2023, 06:07 AM

No other sport obsesses quite as much as cricket over the surfaces on which it is played. Pitches are not only a perennial object of fascination but also the subject of controversy. Take the preliminaries for the Border-Gavaskar Trophy series, with the usual dance of pre-emptive suspicion and defensiveness. A bullish Ravi Shastri called for pitches that turned from the outset, and Ian Healy talked up Australia's chances thus: "I think if they produce fair Indian wickets that are good batting wickets to begin with… we win. If they're unfair wickets … then I think India play those conditions better than us."

Then the covers came off in Nagpur and it was apparent that the pitch had been selectively watered, mowed and rolled, and that this "differential preparation" - which left bare patches outside the left-handers' off stump on a spinner's length at both ends - had ostensibly been tailored to suit the home team, who had one leftie in the top seven to the visitors' four, and two left-arm spinners to the visitors' none. Australia's players maintained a strategic silence, but was this pushing home advantage too far?

The match referee, Andy Pycroft, ultimately decided that the pitch was not worthy of sanction, yet questions around pitch preparation were nevertheless again brought into sharp focus. In the age of bilateral series, with World Test Championship points on the line, will pitch-doctoring become an ever greater temptation, as Rahul Dravid observed recently? And, more broadly, what is a "good" or "fair" pitch, and how is it determined?

How the ICC's pitch-rating system works now

The ICC's Pitch and Outfield Monitoring Process was introduced in 2006 and updated in January 2018 in an effort, the ICC says, to reflect the variety of conditions worldwide and make member boards more accountable for the pitches they produce, as well as to introduce greater transparency in the rating of pitches.

One of six potential ratings applies to both pitch and outfield for each game: very good, good, average, below average, poor and unfit, with the bottom three incurring demerit points (1, 3 and 5 respectively for the pitch, 0, 2 and 5 for the outfield). Pick up five demerit points in a rolling five-year period and your ICC ground accreditation is suspended for 12 months. Pick up ten and it is two years without international cricket. Hugely consequential for the local association, perhaps less so for the national board. In situations where a pitch underperforms, match referees must consult umpires and captains before assigning a rating.
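
The arithmetic of the sanctions is simple enough to sketch in a few lines of Python. This is only an illustration of the rules as described above, not the ICC's own implementation: the function names and the sample venue history are hypothetical, and treating "pick up five" as "five or more" is an assumption.

```python
from datetime import date

# Demerit points per rating, as described in the article.
PITCH_POINTS = {"below average": 1, "poor": 3, "unfit": 5}
OUTFIELD_POINTS = {"below average": 0, "poor": 2, "unfit": 5}
WINDOW_YEARS = 5  # rolling five-year period


def active_points(ratings, today):
    """Sum demerit points from ratings still inside the rolling window.

    `ratings` is a list of (date, surface, rating) tuples, where surface
    is "pitch" or "outfield". Ratings of average or better score zero.
    """
    today_key = (today.year, today.month, today.day)
    total = 0
    for when, surface, rating in ratings:
        # A rating counts until its fifth anniversary has been reached.
        expiry_key = (when.year + WINDOW_YEARS, when.month, when.day)
        if expiry_key <= today_key:
            continue
        table = PITCH_POINTS if surface == "pitch" else OUTFIELD_POINTS
        total += table.get(rating, 0)
    return total


def suspension_months(points):
    """Length of the accreditation suspension (assumes 'at least' thresholds)."""
    if points >= 10:
        return 24
    if points >= 5:
        return 12
    return 0


# Hypothetical venue history, invented for illustration.
history = [
    (date(2019, 1, 1), "pitch", "poor"),
    (date(2022, 6, 1), "pitch", "below average"),
    (date(2022, 12, 1), "outfield", "poor"),
]
points = active_points(history, date(2023, 3, 30))
print(points, suspension_months(points))  # 6 points, 12-month suspension
```

Note how the same venue drops back to three points once the 2019 rating ages out of the window, which is why the timing of a marginal rating can matter so much.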

A pitch is deemed to be "below average" if there is "either very little carry and/or bounce and/or more than occasional seam movement, or occasional variable (but not excessive or dangerous) bounce and/or occasional variable carry". Fine, but how do you determine this?

A pitch is deemed "poor" if it "does not allow an even contest between bat and ball", whether that favours batters or bowlers. The ICC's guidance goes on to invoke "excessive seam movement", "excessive unevenness of bounce", "excessive assistance to spin bowlers, especially early in the match" and "little or no seam movement or turn at any stage in the match together with no significant bounce or carry" as well as "excessive dryness" and "excessive moistness". Fine, but how exactly do you determine all that?

The notes for "clarification" in Appendix A to the ICC's literature for the ratings tell us that "Excessive means 'too much'". Sure, but how exactly do you measure that?

Too much is left to interpretation in the pitch-marking process

The truth is that it is rare for pitches to be given any of the bottom three marks. From the men's World Cup in July 2019 to the end of 2022, only six Test pitches out of 135 (and one outfield) were given a "below average" rating, five of them in 2022. Two of 2022's "below average" marks were for Rawalpindi. The first was given by Ranjan Madugalle when Australia's visit in March produced 14 wickets across the five days for 1187 runs. The second was given by Pycroft after England's visit last December, although this was subsequently overturned on appeal. Appeals are heard by the chair of the ICC's Cricket Committee, currently Sourav Ganguly, and the ICC general manager for cricket, currently Wasim Khan, the former CEO of the Pakistan Cricket Board. How did they arrive at this judgement?

The official explanation was that, "having reviewed footage of the Test Match, the ICC appeal panel […] were unanimous in their opinion that, while the guidelines had been followed by the Match Referee […] there were several redeeming features - including the fact that a result was achieved following a compelling game, with 37 out of a possible 39 wickets being taken. As such, the appeal panel concluded that the wicket did not warrant the 'below average' rating."

This is a curious logic. Ben Stokes' team scored at a historically unprecedented rate (921 runs at 6.73 runs per over) to "put time back into the game", thus drastically increasing the chance that wickets would be lost (every 43.2 balls to Pakistan's 75.6), and they won with just ten minutes' light remaining on the fifth evening. It is almost certain that England's strategy was devised after contemplating the Australia Test match in March. Is the ICC saying that such a pitch is adequate provided the Bazball approach is adopted?

When approached, in the spirit of transparency, about exactly how much of the match footage was reviewed, the ICC would only refer to the press release.

According to the pitch-ratings guidelines, an "average" pitch "lacks carry, and/or bounce and/or occasional seam movement, but [is] consistent in carry and bounce". Fine, but consistency is a property determined by frequency, and adjudicating on this implies one would watch the whole game - that is, have the full data set, as would a match referee - to be able to assess how regularly deliveries misbehaved. Was this done by the appeal panel?
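
The point about frequency can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that ball-tracking supplies a bounce height for each delivery of comparable length and pace, and that "misbehaving" means straying from the typical (median) height by more than some tolerance. The function name, the tolerance and the figures are all hypothetical.

```python
from statistics import median


def misbehaviour_rate(bounce_heights_cm, tolerance_cm):
    """Share of deliveries whose bounce strays from the typical height.

    Assumes a non-empty list of heights for comparable deliveries.
    """
    typical = median(bounce_heights_cm)
    odd = sum(1 for h in bounce_heights_cm if abs(h - typical) > tolerance_cm)
    return odd / len(bounce_heights_cm)


# Invented figures: eight comparable deliveries, two of which misbehave.
full_session = [70, 72, 71, 69, 40, 73, 70, 110]
print(misbehaviour_rate(full_session, tolerance_cm=15))  # 0.25
```

A rate like this only means something when it is computed over every delivery. A highlights reel that over-represents the shooter and the lifter, or one that omits them, would give a very different answer.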

What emerges from all this is a sense that the process for marking pitches contains too much "interpretative latitude" in the criteria, and as such, lacks empirical robustness - borne out by how the judgement of a person who watched an entire game (and, presumably, consulted umpires and captains, as per ICC protocol) can be overturned by those who did not. This makes it likely that a match referee who has had a "below average" mark rescinded on appeal will, the next time he finds himself deciding between "average" and "below average", be inclined to play safe, not least because the criteria plausibly allow it. Why put one's neck out?

Pycroft's next two Tests after the Rawalpindi appeal verdict was returned in January were the first two of the Border-Gavaskar series. Both the "differentially prepared" Nagpur strip (on which a wicket fell every 47.1 deliveries, albeit with Australia only selecting two frontline spinners, one of whom was a debutant) and the pitch in Delhi (a wicket every 38.8 deliveries, both sides playing three frontline spinners) were marked as "average".

The pitch for the third Test, in Indore (a wicket every 38.5 deliveries, same spin-bowling line-ups) was rated "poor" by Chris Broad, initially incurring three demerit points. The strip for the bore draw in Ahmedabad (a somnolent 1970s run rate of 2.9 and a wicket winkled every 115.7 deliveries, 22 in five days on a surface that barely changed) was rated "average", entirely understandable after the Rawalpindi overrule but surely not healthy for Test cricket.

The BCCI appealed the Indore decision; Ganguly had to recuse himself from the review process, nominating a proxy, Roger Harper. It mattered little, as the outcome was again the same: Wasim Khan and Harper "reviewed the footage" of the match and despite feeling that "the guidelines had been followed" by Broad, ultimately decided "there was not enough excessive variable bounce to warrant the 'poor' rating". Not enough. Okay then.

As opaque as all this sounds, it was evidently a good outcome for the BCCI, although one can imagine circumstances in which it may not even have bothered appealing - after all, it is not really the national board that is being sanctioned but the local association, which loses both revenue and prestige. And here is where the scope for abuse lies: Crucial matches with WTC points at stake could, in theory, be assigned to a country's second-tier grounds, with instructions to produce doctored, advantage-seeking pitches in full knowledge of the risk, or even likelihood, of demerit points. The venue's potential loss of ICC accreditation - taking one for the team, as it were - would be duly compensated by the board.

Why not use ball-tracking to refine and add precision to the pitch-rating process?

Ultimately, the subjective, interpretative element, the lack of empirical rigour in the pitch-ratings criteria, does little to help match referees (none of whom are permitted to express an opinion about the system), and in some instances could place them under an onerous degree of "political" pressure. Presumably, then, they would welcome a more objective and data-driven framework for their assessments.

The solution, potentially, is staring cricket in the face: not neutral curators but the ball-tracking technology that has been a mandatory part of the infrastructure at all ICC fixtures since the DRS was introduced in November 2009.

Essentially, match referees are rating a pitch's performance properties: pace, bounce, lateral deviation, consistency, deterioration over time. The majority of these are already measured by ball-tracking technology providers for use in their broadcasts. It is not beyond the realms of technological possibility that these properties could be given precisely calibrated parameters, within which pitches must fall to attain the various ratings, beyond which they are considered extreme.

The first step would be a deep dive into those 13-plus years of ball-tracking data (565 Tests and counting), establishing the relationships between the quantified performance properties exhibited by the various pitches and the marks assigned them. Cricketing common sense would suggest that there ought to be a fairly coherent set of correspondences between referees' verdicts and the data.
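
As a rough idea of what that first step might look like, the Python sketch below groups matches by the rating they received and averages one ball-tracking metric within each group. It is an illustration only: the metric name `bounce_spread_cm` and every figure are invented placeholders, not real ball-tracking data, and a real study would look at many metrics across the full set of Tests.

```python
from collections import defaultdict
from statistics import mean


def mean_metric_by_rating(matches, metric):
    """Average one ball-tracking metric across matches sharing a rating."""
    grouped = defaultdict(list)
    for match in matches:
        grouped[match["rating"]].append(match[metric])
    return {rating: mean(values) for rating, values in grouped.items()}


# Invented placeholder figures, for illustration only.
matches = [
    {"rating": "good", "bounce_spread_cm": 6.0},
    {"rating": "good", "bounce_spread_cm": 8.0},
    {"rating": "average", "bounce_spread_cm": 11.0},
    {"rating": "below average", "bounce_spread_cm": 17.0},
    {"rating": "below average", "bounce_spread_cm": 19.0},
]
summary = mean_metric_by_rating(matches, "bounce_spread_cm")
print(summary)
```

If the referees' verdicts and the data correspond as common sense suggests, the averages should worsen steadily as the ratings do. Where they do not, that is itself a finding worth having.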

From there, you start to build the parameters. There would be some complexity here, even if some of the variables ought to be straightforwardly amenable to "parameterisation". In particular: loss of pace after pitching, consistency of pace loss (and its deterioration across the match), bounce, consistency of bounce (and its deterioration). Beyond certain thresholds, pitches would be sanctioned accordingly.
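
In code, the simplest version of such a framework is a table of thresholds checked from the best rating downwards. The Python sketch below is hypothetical throughout: the two metrics, the threshold values and the reduced set of four ratings are assumptions made for illustration, and none of the numbers come from the ICC or from any ball-tracking provider. It also makes the simplifying assumption that a lower value is always better for both metrics.

```python
# Hypothetical bands, ordered from best rating to worst. Each entry gives
# the worst acceptable value of each metric for that rating.
BANDS = [
    ("good", {"pace_loss": 0.40, "irregular_bounce_rate": 0.02}),
    ("average", {"pace_loss": 0.45, "irregular_bounce_rate": 0.05}),
    ("below average", {"pace_loss": 0.50, "irregular_bounce_rate": 0.10}),
]
FALLBACK = "poor"  # anything outside every band


def rate_pitch(metrics):
    """Return the best rating whose limits the pitch meets on every metric."""
    for rating, limits in BANDS:
        if all(metrics[name] <= limit for name, limit in limits.items()):
            return rating
    return FALLBACK


# Invented example: acceptable pace loss, but too much irregular bounce
# for an "average" mark.
print(rate_pitch({"pace_loss": 0.44, "irregular_bounce_rate": 0.08}))
# below average
```

The design choice worth noticing is that a pitch is held to the weakest of its properties: one metric outside a band is enough to drop it to the next rating down.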

Less amenable to parameterisation, and thus more difficult to use to build a regulatory framework, would be lateral deviation, for both seam and spin (even if one would expect the deep dive to yield strong correspondences between pitch ratings and the ball-tracking data for sideways movement). Deviation upon pitching is immediately visible, of course, but the bowler's skill plays a big part. For spinners, the relevant input variables producing the degree of turn are numerous: the revolutions imparted on the ball by the bowler, the axis of rotation, the pace of the delivery, the angle of incidence with the pitch, and the age of the ball.

These variables can overlap and interact in ways that offset each other and potentially resist any one-size-fits-all parameterisation. For instance, a pitch may show "excessive" turn (once this has been defined) but it might be fairly slow turn with relatively uniform bounce. One might, in this instance, use the technology to model a relationship between pace loss and degree of turn for spinners, which would be calibrated against consensus notions of bat-ball balance.

For all the complexity around lateral deviation (where do you set the parameters, and how rigidly?), three things need to be said here.

First, however difficult it is to create the framework, none of this lies beyond the scope of the existing technology. (Whether for contractual or commercial reasons, Hawk-Eye declined to comment on the viability of using its technology to assess pitch performance.)

Second, the goal is to improve the existing system, not make one that is absolutely prescriptive and infallible. The difficulties in devising an all-encompassing a priori model should not be seen as a weakness but rather a simple recognition of complexity. Seatbelts don't prevent 100% of road-accident fatalities, but having them is better than not. Thus, while it might be justified to mark down a surface on the basis of a precisely quantified pace loss after pitching, it might not be desirable to do so automatically on the basis of a fixed amount of lateral deviation. Other factors would have to be weighed up - but this would be done, precisely, by using the information provided by the ball-tracking technology.

Third, nothing is necessarily going to change. These are heuristic tools that make for a more robustly scientific way of using the criteria that are already in place and the values set out there in relation to the balance of the game. However, by supplementing the qualitative (the ICC's pitch-ratings criteria descriptions) with the quantitative (ball-tracking data), you would inevitably increase match referees' confidence in their assessments, particularly in the face of querulous and powerful national boards, and thus boost the public's confidence in the process as a whole. As such, those 565 Tests would perhaps serve as "legal precedent" of sorts: "Pitch X was marked 'poor' because it exhibited an average of n degrees of lateral deviation for seamers' full-pace deliveries on the first day, similarly to Test Y in city Z." And these verdicts would be reached independently of how the teams played on the wicket, since the latter involves facets of the game such as intent, strategy and competence that ought to be extraneous to the pitch-rating process.
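
The "legal precedent" idea amounts to looking up the most similar previously rated pitch. A minimal Python sketch, assuming a single hypothetical metric (`day1_seam_deviation_deg`) and invented figures, might look like this. A real comparison would weigh several properties at once.

```python
def closest_precedent(new_pitch, precedents, metric):
    """Return the previously rated pitch nearest to the new one on a metric."""
    return min(precedents, key=lambda p: abs(p[metric] - new_pitch[metric]))


# Invented precedents, for illustration only.
precedents = [
    {"test": "Test A", "rating": "good", "day1_seam_deviation_deg": 0.8},
    {"test": "Test B", "rating": "average", "day1_seam_deviation_deg": 1.2},
    {"test": "Test Y", "rating": "poor", "day1_seam_deviation_deg": 2.0},
]
pitch_x = {"test": "Pitch X", "day1_seam_deviation_deg": 1.9}

match = closest_precedent(pitch_x, precedents, "day1_seam_deviation_deg")
print(match["test"], match["rating"])  # Test Y poor
```

The referee would still make the call. The lookup simply gives him something firmer than memory and instinct to cite when a board queries it.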

Will developing a technology-backed framework for marking pitches mean pitches become homogeneous across the international game, bleeding it of variety? No. The ball-tracking technology would simply establish a set of rigorous performance parameters a pitch would need to reach in order to be classified as "average", "good", "very good", and so on. It then becomes a question of the optimal way of achieving those in any given environment - which would also build knowledge about pitch preparation that could be hugely beneficial to the emerging cricketing nations, where such expertise is thinner on the ground.

A technology-backed pitch-ratings method would reduce cultural tensions

Of course, if sanctions for substandard surfaces impacted national teams (through the docking of WTC points), it would immediately remove the incentive for their boards to "request" egregiously advantage-seeking pitches whenever it became expedient - be that for sporting, political or other reasons.

Less conspiratorially, developing a more precise, data-backed framework would increase the confidence of and in referees around what is often a politically charged issue. This might prove analogous to the introduction of neutral umpires (or even the DRS, which potentially obviates the need for match officials to be seen to be neutral).

And here is arguably the most important, though perhaps least tangible, benefit: The type of cultural tensions that crop up when pitch ratings are discussed - the defensiveness and suspicion, the accusations and denials - would be deprived of most of their oxygen. Sensitivities would be defused. This is not a trifling point in the age of social media, which has proven to be a state-of-the-art antagonism machine. As the not-so-old joke has it, in a poll asking whether society had grown more divided, 50% said yes and 50% no.

An example of these simmering sensitivities being stirred came with the most recent pitch before Indore to pick up a demerit point: last December's Brisbane Test between Australia and South Africa, completed inside two days. Close observers were quick to point out the game's almost identical duration (especially the distribution of overs across the four innings) to the day-night Ahmedabad Test between India and England in February 2021.

If a subcontinent Test would have finished in 2 days, the reactions would be quite different to say the least. #AUSvSA pic.twitter.com/yvcH0rWweL
— Wasim Jaffer (@WasimJaffer14) December 18, 2022

Before the Gabba pitch had even been marked, the defensiveness and pre-emptive sense of grievance kicked in. Wasim Jaffer tweeted a meme comparing likely reactions to a two-day pitch in the SENA nations (South Africa, England, New Zealand, Australia) and the subcontinent, in essence implying that if that two-day Brisbane result had come on an Indian wicket, the cricket world would be up in arms. If social media is an animosity amplifier, Jaffer was perhaps equivalent to the populist leader using a straw man to roil up a sense of victimhood among his base (1.2 million Twitter followers now) - though the idea of victimhood is a somewhat quaint notion for Indian cricket in 2023.

Of course, the irony is that Brisbane was marked "below average" by Richie Richardson, with both sets of players and even the curator agreeing it was wholly merited, whereas that Ahmedabad pitch - the shortest Test since 1935, a surface on which Joe Root took 5 for 8 - was rated "average" by Javagal Srinath, standing as match referee due to Covid travel restrictions.

This is not to suggest anything improper from Srinath. After all, a year later he assigned a "below average" rating to the Bengaluru Test pitch, a day-night match that lasted 223.2 overs. It is simply to emphasise how, given the interpretative latitude baked into the ICC's pitch-ratings criteria, any referee's assessment of a pitch teetering between "average" and "below average" ratings might ultimately be a matter of perception, unconsciously influenced or conditioned by cultural background ("This isn't a turner, mate!"), a point on which Jaffer is inadvertently correct.

A further factor here is that, although the Gabba surface was overly damp to begin with and thus became pockmarked, producing variable bounce at speed as the surface baked, in general terms, pitches with excessive seam movement early in the game are not equivalent to those with excessive spin. In theory, the former can improve as the game develops. A pitch that is excessively dry and crumbling at the outset is not going to get any better. (Nevertheless, where a pitch has been prepared in rainy conditions and the curator is fully aware that it is overly damp to begin with, and thus fearful of a demerit, yet the umpires are keen to start the game in front of a full stadium, there would have to be some latitude in the referee's pitch rating to reflect this expediency.)

A more objective pitch-rating process would help prevent abuse of the system

One would hope that the ICC has a keen interest in tightening all this up, in using the resources that are already available. Because ultimately there could be far more on the line than defusing cultural sensitivities or preventing WTC chicanery. Relieving the potential pressure on referees to reach the "correct" verdicts in certain circumstances might be about protecting the pitch-ratings process from possible abuse or even corruption.

Consider the following hypothetical scenario. A massive stadium named after a firebrand populist leader finds itself on four demerit points six months out from that country hosting an ICC tournament in which the stadium has been earmarked to host several games, including the final. Before then, however, the ground stages a marquee Test match and produces another slightly questionable surface, jeopardising its ICC accreditation. Given sport's utility as a vehicle for a regime's "soft power", the wider interest in the rating assigned to the pitch in these circumstances would be intense, the pressure on the match referee potentially overwhelming.

Or another hot-potato scenario, more economic in nature. A ground on one of the Caribbean islands sits on the precipice of suspension. It is hosting various games in the Under-19 World Cup, but in a few months' time will stage a Test match against England, with 10,000 Barmy Army members expected to visit. Should a fifth demerit point be accrued, the hit to the economy would be substantial. Again, one imagines local politicians would be unusually invested in the difference between a prospective "average" and "below average" pitch rating in one of those U-19 World Cup games.

Even if a match referee were impervious to whatever pressures might be exerted, as well as to any temptation to play safe (which surely increases every time a pitch verdict is overturned), a national board can always exercise its right of appeal and potentially bring its influence to bear. After all, if Pycroft can watch every ball of the Rawalpindi Test and have his considered judgement overruled by officials deducing the nature of the pitch from the scorecard, tail wagging dog, then why not roll the dice and appeal? If Broad can watch a game in its entirety, see a ball in the first over explode through the surface and rag square, and still have his verdict overturned by administrators watching "footage" and deciding on that basis whether the variable bounce was acceptable or "excessive", then why not see if those wholly unscientific definitions can be stretched and bent a little more favourably?

Both Rawalpindi and Indore show that the pitch-ratings system urgently needs greater empirical heft and objectivity, not least to save match referees from being regularly thrown under the bus, but also to prevent a wider loss of credibility in the system. The ICC for its part says it is comfortable with the process that's in place, but does its executive really have the clout to change things for the better, even if it wanted to?

In the end, the barrier to reform may well be precisely what the Woolf Report identified in 2012: that the ICC executive is ultimately toothless in the face of the national boards, and the latter - notionally equal, though some clearly more equal than others - might not want change, whether it helps the game or not. It simply may not be in the interests of some powerful members to close off the possibility of a little pitch-doctoring, a little advantage-seeking skulduggery, particularly those with a surplus of international venues and the potential, therefore, to game the system.

In such circumstances, the canny, careerist member of the ICC executive may reckon that the smart move is to rock the boat as little as possible, to keep the big boys sweet, to take the path of least resistance. Without any real regulatory bite over bilateral cricket, the ICC effectively becomes what Gideon Haigh described as "an events management organisation that sends out ranking emails". And so inertia reigns and, as far as marking pitches is concerned, vagueness prevails, with the result that grievance festers and cricket, ultimately, loses.