
Hockey Desperately Needs a Better Competition Metric (Part 2 of 2)

EDMONTON, AB – OCTOBER 25: Connor McDavid #97 of the Edmonton Oilers battles for the puck against Drew Doughty #8 of the Los Angeles Kings on October 25, 2015 at Rexall Place in Edmonton, Alberta, Canada. (Photo by Andy Devlin/NHLI via Getty Images)

This article is part 2 of 2.

In part 1, I noted that shot metrics, when used to evaluate individual players, are heavily influenced by teammates, coaches’ usage (zone starts), and competition*.

I believe we have decent tools for understanding the effects of teammates and zone starts – but the same is not at all true for competition metrics (dubbed QoC, or Quality of Competition).

And the reality is that understanding competition is critical to using shot metrics for player evaluation. If current QoC measures are not good, this means QoC is a huge weakness in the use of shot metrics for player evaluation.

I believe this is the case.

Let’s see if I can make a convincing case for you!

*Truthfully, there are quite a few other contextual factors, like team and score state. These shot metrics have been around for a decade plus, and they’ve been studied (and are now often adjusted) heavily. Some of the effects that have been identified can be quite subtle and counterintuitive. From the point of view of assessing *a* player on *a* team, it doesn’t hurt us to focus on these three factors.

It Just Doesn’t Matter – You’re Kidding, Right?

If you bring up Quality of Competition with many fancystats people, they’ll often look at you and flat out tell you that “quality of competition doesn’t matter.”

This response will surprise many – and frankly, it should.

We know competition matters.

We know that a player is going to have a way harder time facing Sidney Crosby than facing Tanner Glass.

We know that coaches gameplan to face Taylor Hall, not his roommate Luke Gazdic (so long, lads). And they gameplan primarily with player matchups.

Are our eyes and the coaches that far out to lunch?

Yes, say the fancystats crowd. Because, they say, when you calculate quality of competition, you just don’t see that much difference in the level of competition faced by different players. Therefore, so conventional wisdom dictates, it doesn’t matter.

The Numbers Suggest Matchups Matter

I don’t have to rely on the eye test alone to contradict this line of thought – the numbers do the work too. For example, here are the head-to-head matchup numbers (I trot these out as a textbook example of coaching matchups) for the three Montreal defense pairs against Edmonton from the game on February 7th, 2016:

vs                  Hall           McDavid
Subban-Markov       ~ 3 mins       ~ 10 mins
Petry-Emelin        ~ 8 mins       ~ 5 mins
Gilbert-Barberio    ~ 40 seconds   ~ 14 seconds

Does that look like “Quality of Competition” doesn’t matter? It sure mattered for both Hall and McDavid, not to mention all three Montreal defense pairs. Fifteen minutes vs 14 seconds is not a coincidence. That was gameplanned.

So how do we reconcile this?

Let’s dig in and see why maybe conventional wisdom is just plain wrong – maybe the problem is not with the quality of competition but the way in which we measure it.

It Would Hit You Like Peter Gabriel’s Sledgehammer

I’ll start by showing you an extremely valuable tool for assessing players in the context of zone starts and QoC, which is Rob Vollman’s Player Usage Charts, often called sledgehammer charts.

This chart is for Oiler defensemen in 2015-2016:

This shows three of the four things we’ve talked about previously:

  • The bubble colour shows the balance of good/bad shot metrics for that individual (blue is good)
  • The farther to the right the bubble, the more faceoffs a player was on the ice for in the offensive zone – favourable zone starts or coaches usage in other words
  • The higher the bubble, the tougher the Quality of Competition

Notice something about the QoC though. See how it has such a narrow range? The weakest guy on there is Clendening at -0.6. The toughest is Klefbom at a shade over 1.0.

If you’re not familiar with “CorsiRel” (I’ll explain later), take my word for it: that’s not a very meaningful range. If you told me Player A has a CorsiRel of 1.0, and another has a CorsiRel of 0.0, I wouldn’t ascribe a lot of value to that difference. Yet that range easily encompasses 8 of the 11 defenders on the chart.

So no wonder the fancystatters say QoC doesn’t matter. The entire range we see – a full season, an entire defensive corps, toughest competition to weakest – amounts to a very small difference. Clendening basically faced barely weaker competition than Klefbom did.

Or did he?  That doesn’t sound right, does it?  Yeah, the Oiler D was a tire fire and injuries played havoc – but Todd McLellan wasn’t sending Clendening out to face Joe Thornton if he could help it.

To figure out what might be wrong, let’s dig in to see how we come up with these numbers that show such a thin margin of difference.

Time Weighs On Me

The process for calculating a QoC metric starts by assigning every player in the league a value that reflects how tough they are as competition.

Then when we need the QoC level faced by a particular player:

  • we look at every opponent he faced, and weight (multiply) the time spent against each opponent by that opponent’s competition value
  • we add it all up, divide by the total time, and presto, you have a QoC measure for the given player (a minimal sketch follows below)
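
In code form, that recipe is nothing more than a TOI-weighted average. Here’s a minimal Python sketch – the names and numbers are made up, and `comp_value` stands in for whichever base metric you choose:

```python
# A minimal sketch of the standard QoC recipe: a TOI-weighted average of
# opponents' "competition values". All names and numbers are hypothetical.

def qoc(toi_vs_opponent: dict[str, float], comp_value: dict[str, float]) -> float:
    """TOI-weighted average of the competition values of opponents faced."""
    total_toi = sum(toi_vs_opponent.values())
    return sum(
        toi * comp_value[opp]   # weight each opponent by head-to-head TOI
        for opp, toi in toi_vs_opponent.items()
    ) / total_toi

toi = {"Crosby": 12.0, "Glass": 2.5}   # minutes faced head-to-head
cv  = {"Crosby": 55.0, "Glass": 44.0}  # some per-player competition value
print(round(qoc(toi, cv), 1))          # 53.1 -- dominated by the Crosby minutes
```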

Assuming that the time on ice calculations are reasonably fixed by, you know, time on ice, it should be clear that the validity of this QoC metric is almost entirely dependent on the validity of the ‘competition value’ assigned to each player.

If that competition value isn’t good, then you have a GIGO (garbage in garbage out) situation, and your QoC metric isn’t going to work either.

There are three different data values that are commonly used for calculating a QoC metric, so let’s take a look at each one and see if it meets the test of validity.

Using Corsi for QoC

Many fancystats people who feel that QoC doesn’t matter will point to this post by Eric Tulsky to justify their reasoning.

Tulsky (now employed by the Hurricanes) is very, very smart, and one of the pillars of the hockey fancystats movement. He’s as important and influential as Vic Ferrari (Tim Barnes), JLikens (Tore Purdy), Gabe Desjardins, and mc79hockey (Tyler Dellow). So when he speaks – we listen.

The money quote in his piece is this:

Everyone faces opponents with both good and bad shot differential, and the differences in time spent against various strength opponents by these metrics are minimal.

Yet all that said – I think Tulsky’s conclusions in that post on QoC are wrong. I would assert that the problem he encounters, and the reason he gets the poor results that he does, is that he uses a player’s raw Corsi (shot differential) as the sole ‘competition value’ measure.

All his metric does is tell you how a player did against other players of varying good and bad shot differential. It actually does a poor job of telling you the quality of the players faced – which is the leap of faith being made. The leap is unjustified, because players of much, much different ability can have the same raw Corsi score.

To test that, we can rank all the players last season by raw Corsi, and here’s a few of the problems we immediately see:

  • Patrice Cormier (played two games for WPG) is the toughest competition in the league
  • He’s joined in the Top 10 by E Rodrigues, Sgarbossa, J Welsh, Dowd, Poirier, Brown, Tangradi, Witkowski, and Forbort.
  • Mark Arcobello is in the top 20, approximately 25 spots ahead of Joe Thornton
  • Anze Kopitar just signed for $10MM/yr while everyone nodded their heads in agreement – meanwhile Cody Hodgson might have to look for work in Europe, and that will garner the same reaction. Yet using raw Corsi as the measure, they are the same level of competition (57.5%)
  • Chris Kunitz is about 55th on the list – approximately 40 spots ahead of Sidney Crosby
  • Don’t feel bad, Sid – at least you’re miles ahead of Kessel, Jamie Benn, and Nikita Nikitin – who is himself several spots above Brent Burns and Alex Ovechkin.

*Note: all data sourced from the outstanding site corsica.hockey. Pull up the league’s players, sort them using the factors above for the 2015-2016 season, and you should be able to recreate everything I’m describing above.

I could go on, but you get the picture, right? The busts I’ve listed are not rare. They’re all over the place.

Now, why might we be seeing these really strange results?

  • Sample size! Poor players play little, and that means their shot metrics can jump all over the place. Play two minutes, have your line get two shots and give up one, and raw Corsi will anoint you one of the toughest players in the league (see the sketch after this list). We can account for this when looking at the data, but computationally it can wreak havoc if unaccounted for.
  • Even with large sample sizes, you can get very minimal difference in shot differential between very different players because of coaches matching lines and playing “like vs like”. The best players tend to play against the best players and their Corsi is limited due to playing against the best. Similarly, mediocre players tend to play against mediocre players and their Corsi is inflated accordingly. It’s part of the problem we’re trying to solve!
  • For that same reason, raw Corsi tends to overinflate the value of 3rd pairing Dmen, because they so often are playing against stick-optional players who are Corsi black holes.
  • The raw Corsi number is heavily influenced by the quality of the team around a player.
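
Here’s a toy illustration of that first bullet, the sample size problem (all numbers invented):

```python
# CF% = CF / (CF + CA). In a tiny sample, a single shot attempt swings the
# number wildly; over a season it settles down.
def cf_pct(cf: int, ca: int) -> float:
    return 100.0 * cf / (cf + ca)

print(round(cf_pct(2, 1), 1))      # 66.7 -- two minutes of "elite" Corsi
print(round(cf_pct(820, 780), 1))  # 51.2 -- a full season, far steadier
```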

Corsi is a highly valuable statistic, particularly as a counterpoint to more traditional measures like boxcars. But as a standalone measure for gauging the value of a player, it is deeply flawed. Any statistic that uses raw Corsi as its only measure of quality is going to fail. GIGO, remember?

Knowing what we know – is it a surprise that Tulsky got the results he got?

So we should go ahead and rule out using raw Corsi as a useful basis for QoC.

Using Relative Corsi for QoC

If you aren’t familiar with RelCorsi, it’s pretty simple: instead of using a raw number, for each player we just take the number ‘relative’ to the team’s numbers.

For example, a player with a raw Corsi of 52 but on a team that is at 54 will get a -2, while a player with a raw Corsi of 48 will get a +2 if his team is at 46.

The idea here is good players on bad teams tend to get hammered on Corsi, while bad players on good teams tend to get a boost. So we cover that off by looking at how good a player is relative to their team.
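
In sketch form, using the numbers from the example above (note that real implementations vary – some compare against the team’s numbers with the player off the ice):

```python
# RelCorsi as described above: the player's CF% minus his team's CF%.
def rel_corsi(player_cf_pct: float, team_cf_pct: float) -> float:
    return player_cf_pct - team_cf_pct

print(rel_corsi(52.0, 54.0))  # -2.0 -- trails his (good) team
print(rel_corsi(48.0, 46.0))  # +2.0 -- leads his (bad) team
```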

Using RelCor as the basis for a QoC metric does in general appear to produce better results. When you look at a list of players using RelCor to sort them, the cream seems to be more likely to rise to the top.

Still, if you pull up a table of players sorted by RelCor (the Vollman sledgehammer I posted earlier uses this metric as its base for QoC), again you very quickly start to see the issues:

  • Our top 10 is once again a murderers’ row of Vitale, Sgarbossa, Corey “Power Play” Potter, Rodrigues, Brown, Tangradi, Poirier, Cormier, Welsh, and Strachan.
  • Of all the players with regular ice time, officially your toughest competition is Nino Niederreiter.  Nino?  No no!
  • Top defenders Karlsson and Hedman are right up there, but they are followed closely by R Pulock and D Pouliot, well ahead of say OEL and Doughty.
  • Poor Sid, he can’t even crack the Top 100 this time.

Again, if we try and deconstruct why we get these wonky results, it suggests two significant flaws:

  • Coach’s deployment. Who a player plays against, and when, is a major driver of RelCor. You can see this once again with 3rd pairing D men, whose RelCor, like their raw Corsi, is often inflated.
  • The depth of the team. Good players on deep teams tend to have weaker RelCors than those on bad teams (the opposite of the raw Corsi effect). This is why Nicklas Backstrom (+1.97) and Sam Gagner (+1.95) can have very similar RelCor numbers while being vastly different to play against.

RelCor is a very valuable metric in the right context, but suffers terribly as a standalone metric for gauging the value of a player.

Like raw Corsi, despite its widespread use we should rule out relative Corsi as a useful standalone basis for QoC.

Using 5v5 TOI for QoC

This is probably the most widely used (and arguably best) tool for delineating QoC. This was also pioneered by the venerable Eric Tulsky.

When we sort a list of players using the aggregated TOI per game of their “average” opponent, we see the cream tend to rise to the top even moreso than with RelCor.
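
Mechanically it’s the same weighted average as before, just with each opponent’s 5v5 TOI per game plugged in as the competition value. A self-contained sketch, with hypothetical numbers:

```python
# TOI-based QoC: weight each opponent's 5v5 TOI per game by the minutes
# our player spent head-to-head against them. All numbers are made up.
toi_faced    = {"Doughty": 9.0, "Polak": 3.0}    # minutes head-to-head
toi_per_game = {"Doughty": 22.1, "Polak": 16.5}  # opponents' 5v5 TOI/GP

total = sum(toi_faced.values())
toi_qoc = sum(t * toi_per_game[o] for o, t in toi_faced.items()) / total
print(round(toi_qoc, 1))  # 20.7 -- our player's "average opponent" TOI
```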

And looking under the hood at the data used to generate this QoC, our top three “toughest competition” players are now Ryan Suter, Erik Karlsson, and Drew Doughty. Sounding good, right?

But like with the two Corsi measures, if you look at the ratings using this measure, you can still see problematic results all over, with clearly poor players ranked ahead of good players quite often. For example:

  • The top of the list is all defensemen.
  • Our best forward is Evander Kane, at #105. Next up are Patrick Kane (123rd), John Tavares (134th), and Taylor Hall (144th). All top notch players, but the ranking is problematic to say the least. Especially when you see Roman Polak at 124th.
  • Even among defensemen, is Subban really on par with Michael Del Zotto? Is Jordan Oesterle the same as OEL? Is Kris Russell so much better than Giordano, Vlasic, and Muzzin?
  • Poor old Crosby is still not in the Top 100, although he finally is when you look at just forwards.
  • Nuge is finally living up to his potential, though, ahead of Duchene and Stamkos!

OK, I’ll stop there. You get my point. This isn’t the occasional cherry picked bust, you can see odd results like this all over.

Looking at these busts, you can see at least two clear reasons:

  • Poor defensemen generally get as much or more time on ice than do very good forwards. Putting all players regardless of position on the same TOI scale simply doesn’t work. (Just imagine if we included goaltenders in this list – even the worst goalies would of course skyrocket to the top of the list).
  • Depth of roster has a significant effect as well. Poor players on bad teams get lots of ice time – it’s a big part of what makes them bad teams after all. Coaches also have favourites, or hand out minutes for reasons other than hockeying (e.g. Justin Schultz and the Oilers is arguably a good example of both weak depth of roster and coach’s favoritism).

So once again, we find ourselves concluding that the measure underlying this QoC – TOI – tells you a lot about a player, but carries very real concerns when used as a standalone measure.

Another problem shows up when we actually try to use this measure in the context of QoC: competition blending.

As a player moves up and down the roster (due to injuries or the coach’s preference), their QoC changes. At the end of the year we are left with one number to evaluate their QoC – but if this roster shuttling has happened, that one number doesn’t represent very well who they actually played.

A good example of the blending problem is Mark Fayne this past year. When you look at his overall TOIQoC, he ranks 1st or 2nd on the Oilers, denoting that he had the toughest matchups.

His overall CF% was also 49.4%, so a reasonable conclusion was that “he held his own against the best”.  Turns out – it wasn’t really true.  He got shredded like coleslaw against the tough matchups.

Down the road, Woodguy (@Woodguy55) and I will show you why it isn’t true, and how that reflects a failing of TOIQoC as a metric. It tells us how much TOI a player’s average opponent had, but nothing more. We’re left to guess, with the information often pointing us in the wrong direction.

A Malfunction in the Metric

Let’s review what we’ve discussed and found so far:

  • QoC measures as currently used do not show a large differentiation in the competition faced by NHL players. This is often at odds with observed head to head matchups.
  • Even when they do show a difference, they give us no context on how to use that to adjust the varying shot metrics results that we see. Does an increase of 0.5 QoC make up for a 3% Corsi differential between players?  Remember from Part 1 that understanding the context of competition is critical to assessing the performance of the player.  Now we have a number – but it doesn’t really help.
  • The three metrics most commonly used as the basis for QoC are demonstrably poor when used as a standalone measure of ‘quality’ of player.
  • So it should be no surprise that assessments using these QoC measures produce results at odds with observation.
  • Do those odd results reflect reality on the ice, or a malfunction in the metric? Looking in depth at the underlying measures, the principle of GIGO suggests it may very well be the metric that is at fault.

Which leaves us … where?

We know competition is a critical contextual aspect of using shot metrics to evaluate players.

But our current QoC metrics appear to be built on a foundation of sand.

Hockey desperately needs a better competition metric.

Now lest this article seem like one long shrill complaint, or cry for help … it’s not. It’s setting the background for a QoC project that Woodguy and I have been working on for quite some time.

Hopefully we’ll convince you there is an answer to this problem, but it requires approaching QoC in an entirely different way.

Stay tuned!

P.S.

And the next time someone tells you “quality of competition doesn’t matter”, you tell them that “common QoC metrics are built on poor foundational metrics that cannot be used in isolation for measuring the quality of players. Ever hear of GIGO?”

Then drop the mic and walk.


Fancystats Fundamentals and Why Hockey Desperately Needs a Better Competition Metric (Part 1 of 2)

Hockey needs a better competition metric – without it, the value of fancystats for the evaluation of individual players is significantly weakened.

Let me tell you why!

A Background Tutorial

I’m assuming most of the people reading an article with this title are probably familiar with fancystats. But I’m hoping there are a few readers who are a little nervous around fancystats – and I’m hoping I can capture your interest too.

On that note, I’m going to do a bit of a ground-up tutorial on fancystats to set the stage here in part 1, then get into the meat of the discussion on competition in part 2. I hope those of you with lots of knowledge in that arena will bear with me – or feel free to skip ahead!

Fancystats – Counting the Good and the Bad

Personally, I’m always a bit baffled by the hatred and contempt for the most common fancystats – the shot metrics with the odd names, like Corsi and Fenwick.

Here’s the thing: it’s not like these metrics are measuring anything unrelated to hockey. In fact, they’re measuring something fundamental to hockey, which is shots. No shots, no goals. No goals, no wins. No wins … sucks.

I like to think of it this way. If my team has the puck and is shooting the puck at the other guy’s net, this is almost without exception a good thing. Sometimes it’s a tiny good thing, sometimes it’s a major good thing, but it’s pretty much always a good thing.

Conversely, if the bad guys have the puck and are shooting it at my net, it is almost without exception a bad thing. Sometimes it’s a tiny bad thing, sometimes it’s a major bad thing, but it’s pretty much always a bad thing.

In the end, these ‘fancy’ stats are not fancy, and they’re not really even stats!  We’re just counting up good things and bad things and seeing whether our team had more or less of those things. The only wrinkle to note is that we focus these counts on even strength (5v5) time. Not that these shot attempt good and bad things don’t matter on the PK or the PP, it’s just that there are other arguably better ways to measure effectiveness on special teams.

We usually express our resulting good/bad count as a percent – 50% means the two were even, 45% means our team is at a 5% deficit in good things, and 55% means our team has a 5% advantage in the good things we counted (the percentages are always expressed from the viewpoint of a specific team).
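
The bookkeeping really is that simple – a toy example:

```python
# Express the good/bad count as a percent from our team's point of view.
def corsi_for_pct(attempts_for: int, attempts_against: int) -> float:
    return 100.0 * attempts_for / (attempts_for + attempts_against)

print(corsi_for_pct(50, 50))  # 50.0 -- dead even
print(corsi_for_pct(45, 55))  # 45.0 -- a 5% deficit in good things
print(corsi_for_pct(55, 45))  # 55.0 -- a 5% advantage
```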

How do we know this good thing/bad thing has value? Well, there has been a ton of work done by the math guys to show that these things we’re counting have a ton of value in terms of measuring repeatable skill, and in terms of predicting the rest of the season, or the playoffs, or next season. Better than almost anything else we can count, even goals! If you want to get into the details, there is a ton of information out there on Google.

But you can ignore that if you want – at the core, just remember that we’re counting good and bad hockey-related things, and unsurprisingly, good teams tend to have way better counts, because they’re simply better at that whole hockeying thing than are bad teams.

That’s it! You are now conversant with the big bad Corsi! Welcome to the dark side.

Breaking It Down to a Whole Nuvver Level

It’s a slightly more complicated picture when we try and apply this concept to individuals, however.

We’re still counting good and bad things as they happen, but now we’re counting them in the context of the five players on the ice. At the end of the [game, series, season], we count up the good and bad things that happened while each player was on the ice, and Bob’s your uncle: player Corsi.
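
As a toy illustration of that bookkeeping (the events and rosters below are invented – real versions work from full play-by-play data):

```python
# Player Corsi: credit every 5v5 shot attempt, for or against, to all of
# our skaters on the ice when it happened.
from collections import defaultdict

events = [
    # (our skaters on the ice, +1 = attempt for, -1 = attempt against)
    ({"Hall", "Nuge", "Eberle", "Klefbom", "Sekera"}, +1),
    ({"Hall", "Nuge", "Eberle", "Klefbom", "Sekera"}, -1),
    ({"Letestu", "Kassian", "Hendricks", "Gryba", "Davidson"}, -1),
]

cf, ca = defaultdict(int), defaultdict(int)
for on_ice, sign in events:
    for player in on_ice:
        (cf if sign > 0 else ca)[player] += 1

for p in sorted(cf.keys() | ca.keys()):
    print(p, f"{100 * cf[p] / (cf[p] + ca[p]):.1f}%")
```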

It’s one thing to calculate that number, though. This ‘five players on the ice’ business makes it a little tricky when it comes to using the numbers to evaluate an individual player.

By tricky, I do mean tricky, not impossible. I emphasize this point because there are some folks out there, some even with a massive media platform, who dismiss these stats as exclusively team-level stats, not applicable to individual players.

Unfortunately, this is wrong, and all it demonstrates is a (sometimes profound) lack of knowledge about basic statistics. Modern statistical methods are all about looking at a large mixed dataset and teasing out individual effects.

As it happens, once you get a large volume of these good/bad counts for an individual player, that player will have played with such a large number of teammates rotating through that his number does indeed start to reflect his individual contribution.

It’s why this ‘team level’ stat is almost always different, in many cases radically so, between teammates.

And yet it’s still tricky.  Why?

The Unbearable Difficulty of Context

What makes it tricky is that in order for a player’s Corsi number to make sense, to have it give us a believable gauge of how that individual is doing, we need to understand the context in which that number was generated.

At the individual level in hockey, the most important context is provided by:

  • teammates
  • zone starts
  • competition

They’re important, because each of them drastically affect the count of good/bad things. Again, this is hockey, not stats. Who you play with, where you start, and who you face makes a huge difference to your success.

WOWY, Lookit Those Teammates

Wait a second, sez you, didn’t you just tell me that the teammate issue sorts itself out when you have lots of data?

Well, mostly it does. Those teammates do rotate a lot, and do allow us to get a better picture of the individual. What confounds us is not the other four skaters out on the ice, though, it’s usually just one, maybe two.

A player’s ice time tends to occur in tandem with one other player far more heavily than with anyone else.

For defenders, this should be obvious – that other player is the D partner.

Less obvious is that when you look at time on ice breakdowns, forwards often show much the same pattern. Save for a few rare and exceptional lines (e.g. the Kane line in CHI), the third player on a line usually changes more often than the other two. Defense pairs rotate as well – maybe injury, maybe the coach’s blender – but forward pairs do tend to stand out.

Luckily, we have a tool that helps deal with this scenario. It’s called WOWY – with or without you.

The idea is simple – given two players, call them Frank and Pete, take a look at how things (our count of good and bad things) went with Frank and Pete on the ice together, then with Frank on the ice without Pete, and then with Pete on the ice without Frank.

Sometimes you have to dig a bit deeper, such as if Frank and Peter play with radically different levels of skill when apart, like Frank gets Taylor Hall and Peter gets Taylor’s New Jersey roommate, Luke Gazdic.

But usually the two players tend to separate, and you see quality differences quite quickly.
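
A minimal sketch of that split, with made-up shift data (real WOWYs are compiled from full play-by-play):

```python
# WOWY: bucket Frank's ice time into "with Pete" and "without Pete"
# and compare the good/bad count in each bucket.
from dataclasses import dataclass

@dataclass
class Stint:
    on_ice: frozenset  # our skaters on the ice
    cf: int            # shot attempts for during the stint
    ca: int            # shot attempts against

def wowy(stints: list, a: str, b: str):
    together = [s for s in stints if a in s.on_ice and b in s.on_ice]
    apart = [s for s in stints if a in s.on_ice and b not in s.on_ice]
    def pct(bucket):
        cf = sum(s.cf for s in bucket)
        ca = sum(s.ca for s in bucket)
        return 100.0 * cf / (cf + ca) if (cf + ca) else None
    return pct(together), pct(apart)

stints = [
    Stint(frozenset({"Frank", "Pete"}), cf=6, ca=4),  # together: 60%
    Stint(frozenset({"Frank"}), cf=4, ca=6),          # Frank alone: 40%
    Stint(frozenset({"Pete"}), cf=5, ca=5),           # Pete alone (unused here)
]
print(wowy(stints, "Frank", "Pete"))  # (60.0, 40.0)
```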

WOWY analysis is always useful when looking at players, and in my opinion, is mandatory when trying to assess defensemen.

Corsi, or the tale of good and bad hockey things. WOWY, the tale of Frank and Peter.

Remember those, and you are well on your way to being a fancystats expert. Well done!

I’m In Da Zone

OK, so we’ve got a handle on teammates.

The second aspect of context we talked about is how the coach uses a player – whether they’re starting in their own zone a lot, or gifted offensive zone time, or neither (or both).

Turns out, this doesn’t matter nearly as much as you’d think, for a few reasons:

  • Most players are not that buried – even “25% offensive zone starts”, which seems like a harsh number, often represents something in the order of 2 O-zone and 6 D-zone faceoffs during a game. Yes it can add up, but in the end it’s still just four more zone starts in the d-zone. Not that much in the context of 20 or more shifts per game.
  • Most shifts start on the fly, not with a faceoff. So a player’s ability drives defensive (or offensive) zone starts to a large extent, not the other way round. Put another way, good players tend to force faceoffs in the o-zone, and bad players tend to get stuck in the d-zone, and faceoffs (or goals against!) are part and parcel. So good or bad zone starts can be a symptom rather than a cause of good or poor numbers.
  • Faceoff wins are generally around 50%, give or take a few points. Think of the four extra d-zone starts from the first bullet point, and remember that the two teams are going to split those somewhere between 45 and 55%. That’s basically two faceoffs that are a problem, in the context of 20 shifts. Zone start differences diminish rapidly when you start cutting them in half. (The arithmetic is sketched after this list.)
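
Here’s that back-of-envelope arithmetic spelled out (every input is a stated assumption, not league data):

```python
zone_faceoffs = 8                 # O+D zone faceoffs a player sees per game
oz = zone_faceoffs * 0.25         # "buried" at 25% O-zone starts -> 2.0
dz = zone_faceoffs - oz           # -> 6.0
extra_dz = dz - oz                # 4 more D-zone than O-zone starts
problem = extra_dz * 0.5          # ~50% faceoff wins cut that in half
print(oz, dz, extra_dz, problem)  # 2.0 6.0 4.0 2.0 -- vs 20+ shifts a game
```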

Rather than go farther on this topic, I’ll recommend you read Matt Cane’s two-part article on zone starts.

In summary: there’s reason to believe that zone starts affect a player’s numbers less than you’d think; and when they do – we have an idea of how much, and can adjust for them.

Competition

So of our three critical contextual factors, we’ve talked about two of them: teammate effects (for which we have WOWY), and zone starts (which aren’t as strong an effect as most think, and can be adjusted for in any case).

What about competition?

Well, now things get peachy … by which I mean juicy and somewhat hairy.

Watching games, you can see coaches scrapping to get the right players on the ice against the other team’s players. Checking line to shut down the big line? Or go power vs power? What about getting easy matchups for that second line? That’s the chess game in hockey, though some coaches are clearly playing checkers.

On-ice competition is a big deal, and a critical part of measuring players. A player with 50% good/bad things is doing great if he’s always facing Sidney Crosby, and incredibly poorly if he’s facing Lauri “Korpse” Korpikoski.

How do we get a handle on that?

We’ll talk in depth about competition and how we (fail to) measure it in Part 2 of this article.
