
The Ins and Outs of Infields

Bill Conlin over in Philly says the Phils have the best infield of the modern era. Wow. 

He acknowledges that there are contenders: The Yankees in present form, and past incarnations of the Orioles and Dodgers. The problem, of course, is that Conlin is looking at offense, specifically "power" as measured in two of the most misleading power stats — home runs and RBI. Last I checked, you want your infielders to be pretty good at defense, too.

So, Ryan Howard, Chase Utley, Placido Polanco and Jimmy Rollins is a pretty impressive offensive infield. Except for Rollins. And Polanco.

OK, so that's not a good start. Rollins and Polanco had an OPS+ below 90 last year. Rollins had 21 home runs, but he slugged just .423. Polanco's value comes solely from his defense, which is great … for a shortstop, maybe even a second baseman. Not so great for a third baseman.

So rather than just do a straight comparison of the Yankees' and Phillies' infielders, let's look at baseball's top infielders by position in 2009 and see what happens.

I'm loath to use a stat like WAR because I doubt Bill Conlin would accept it. I'll use wOBA, which has a very simple explanation: It takes the run value of every single thing a hitter does and compiles it, then scales it so the number plays similarly to on-base percentage, where the high .300s is good, etc. No fancy math. It looks at the average value of every single outcome under the control of the batter through the entirety of baseball history, then takes that value and applies it to what a particular hitter did in his appearances in 2009. It is adjusted for ballpark and league, which allows us to compare without the noise of those effects. Besides, it is also an offense-only stat, which is what Conlin is focusing on.
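For readers who want to see the mechanics, here's a minimal sketch of a wOBA-style calculation in Python. The linear weights and the stat line below are illustrative stand-ins, not the exact published values, which are recomputed every season:

```python
# Sketch of a wOBA-style calculation. The linear weights below are rough
# approximations used only to show the mechanics; the real weights are
# derived from run values across baseball history and rescaled each year.
WEIGHTS = {
    "uBB": 0.72, "HBP": 0.75, "1B": 0.90,
    "2B": 1.24, "3B": 1.56, "HR": 2.08,
}

def woba(stats):
    """stats: counts of each weighted event plus AB, BB, IBB, SF, HBP."""
    numerator = sum(WEIGHTS[ev] * stats.get(ev, 0) for ev in WEIGHTS)
    # Denominator is roughly plate appearances, minus intentional walks.
    denominator = (stats["AB"] + stats["BB"] - stats.get("IBB", 0)
                   + stats.get("SF", 0) + stats.get("HBP", 0))
    return numerator / denominator

# A made-up season line, just to exercise the function.
line = {"AB": 550, "BB": 70, "IBB": 10, "SF": 5, "HBP": 5,
        "uBB": 60, "1B": 100, "2B": 35, "3B": 3, "HR": 30}
print(round(woba(line), 3))
```

The point is only that each event gets a fixed run-based weight and the total is divided by a plate-appearance-style denominator; no projection or subjectivity is involved.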

First base

  1. Albert Pujols, STL, .449
  2. Prince Fielder, MIL, .420
  3. Joey Votto, CIN, .418
  4. Kevin Youkilis, BOS, .413
  5. Derrek Lee, CHC, .412
  6. Adrian Gonzalez, SDP, .402
  7. Mark Teixeira, NYY, .402
  8. Miguel Cabrera, DET, .402
  9. Adam Dunn, WAS, .394
  10. Ryan Howard, PHI, .393

For the record, the Rays' Carlos Pena is 14th or 15th, depending on how you treat Victor Martinez. Since Victor is the Sox' starting catcher for 2010, I'll say Pena is 14th.

Second base

  1. Ben Zobrist, TAM, .408
  2. Chase Utley, PHI, .402
  3. Robinson Cano, NYY, .370
  4. Dustin Pedroia, BOS, .360
  5. Ian Kinsler, TEX, .358

Shortstop

  1. Hanley Ramirez, FLA, .410
  2. Troy Tulowitzki, COL, .393
  3. Derek Jeter, NYY, .390
  4. Jason Bartlett, TAM, .389
  5. Yunel Escobar, ATL, .357
  6. Marco Scutaro, TOR (now BOS), .354
  7. Asdrubal Cabrera, CLE, .354
  8. Miguel Tejada, HOU, .344
  9. Erick Aybar, LAA, .339
  10. Elvis Andrus, TEX, .322 …

We could go on for a while. Jimmy Rollins is 15th, just behind Rafael Furcal and just ahead of Orlando Cabrera.

Third base

  1. Alex Rodriguez, NYY, .413
  2. Pablo Sandoval, SFG, .396
  3. Michael Young, TEX, .385
  4. Mark Reynolds, ARI, .381
  5. Evan Longoria, TAM, .380

I'm going to stop here. Placido Polanco had a .321 wOBA as a second baseman, which would place him 18th on this list. Neither Adrian Beltre nor Mike Lowell had enough plate appearances to qualify (Youkilis actually qualifies at both 1B and 3B and would rank first here). Beltre had a .305 wOBA, which is awful; Lowell had a .346 wOBA, which would rank him 13th.

So what does this tell us?

Well, from a purely offensive standpoint, based solely on 2009 the Yankees have a better first baseman, shortstop and third baseman, and are very close at second. The Red Sox have a better first baseman, shortstop and third baseman, and are very close at second. (I say this because if both Beltre and Lowell repeat their 2009 performances, then the Sox will be getting Lowell's production into the lineup, and that still exceeds Polanco's.) The Rays have a better first baseman, second baseman, shortstop and third baseman. The Rangers have a better shortstop and third baseman, and are in the ballpark at second (but much worse at first).

So just looking at offense alone, the Phillies had the fifth-best infield in 2009. Their true competition is with the Rangers for fourth, given that they split the four positions. I'll grant that the Yankees, Sox and Rays probably have acquired/developed an historic amount of infield talent that will all be on display in one year (can you imagine if Beltre finds that he really likes hitting in Fenway?), never mind it all being in the same division. But that doesn't change the fact that if you can't even field one of the top three hitting infields in 2010, there's no way it's the best of the modern era.

Now, Conlin does throw a bone to defense by noting Gold Gloves, which of course tell us nothing. Defensive stats are notoriously flaky, especially in one-year samples, but they're better than anecdotal evidence, which is essentially what a Gold Glove is. So let's look at UZR/150, the number of runs per 150 games a fielder has saved with his glove.

First base

  1. Kevin Youkilis, BOS, 15.2
  2. Kendry Morales, LAA, 5.0
  3. Derrek Lee, CHC, 4.7
  4. Adrian Gonzalez, SDP, 3.4
  5. Paul Konerko, CWS, 3.2
  6. Miguel Cabrera, DET, 3.1
  7. Justin Morneau, MIN, 3.1
  8. Russell Branyan, SEA, 2.4
  9. James Loney, LAD, 1.2
  10. Ryan Howard, PHI, 1.2

Teixeira ranks 16th at -4.1. I'm skeptical of Youkilis' number, given the small sample at first (he split time almost evenly with third, where he was -5), but suffice it to say Youkilis is obviously one of the best-fielding first basemen in baseball, and was so again last season.

Second base

  1. Ben Zobrist, TAM, 30.8
  2. Freddy Sanchez, PIT/SFG, 11.3
  3. Chase Utley, PHI, 11.3
  4. Placido Polanco, DET (now PHI), 11.0
  5. Dustin Pedroia, BOS, 10.6
  6. Ian Kinsler, TEX, 9.6

Cano ranks 15th at -5.2. Same caveat applies to Zobrist as applied to Youkilis. In any case, we all know Zobrist is a very good defensive second baseman and is expected to play there full-time in 2010.

Shortstop

  1. Jack Wilson, PIT/SEA, 20.4
  2. Cesar Izturis, BAL, 14.1
  3. Adam Everett, DET, 13.6
  4. Elvis Andrus, TEX, 11.7
  5. Alex Gonzalez, CIN/BOS, 10.5
  6. J.J. Hardy, MIL, 8.8
  7. Rafael Furcal, LAD, 8.5
  8. Derek Jeter, NYY, 8.4
  9. Ryan Theriot, CHC, 8.3
  10. Erick Aybar, LAA, 7.6

Rollins ranks 12th at 2.9, Scutaro comes in 14th at 1.0, and Bartlett is 20th.

Third base

  1. Ryan Zimmerman, WAS, 21.2
  2. Adrian Beltre, SEA (now BOS), 21.0
  3. Evan Longoria, TAM, 19.2
  4. Chone Figgins, LAA, 18.8
  5. Kevin Kouzmanoff, SDP, 10.7

Polanco would rank just above Kouzmanoff here, if you could translate second base numbers directly to third, and I don't think you can. The Phillies' third baseman last season, Pedro Feliz, ranks 10th, at 5.0. A-Rod ranks 17th at -11.7.

So put it all together on defense, and you find the Phillies in much better shape.

They're better defensively than the Yankees at first, second and third. They're better defensively than the Rays at first and short, are close at second, and the jury's out at third. But the Red Sox are better at first, short and third, and are close at second.

So the Phillies are better, but they're still not the best defensive infield of 2010 (on paper), never mind of the entire modern era of baseball.

Aw heck, let's put it all together in WAR, comparing the five teams:

First base

  1. (4th overall) Kevin Youkilis, 5.7
  2. (7th) Mark Teixeira, 5.2
  3. (8th) Ryan Howard, 4.8
  4. (15th) Carlos Pena, 2.7
  5. (–) Hank Blalock, 0.0

Second base

  1. (1st) Ben Zobrist, 8.6
  2. (2nd) Chase Utley, 7.6
  3. (3rd) Dustin Pedroia, 5.2
  4. (4th) Ian Kinsler, 4.6
  5. (6th) Robinson Cano, 4.4

Shortstop

  1. (1st) Derek Jeter, 7.4
  2. (4th) Jason Bartlett, 4.8
  3. (5th) Marco Scutaro, 4.5
  4. (10th) Elvis Andrus, 3.0
  5. (13th) Jimmy Rollins, 2.4

Third base

  1. (1st) Evan Longoria, 7.2
  2. (5th) Alex Rodriguez, 4.4
  3. (8th) Michael Young, 3.9
  4. (11th) Placido Polanco, 2.9*
  5. (17th) Adrian Beltre, 2.4

* I took the +2.0 position adjustment for third base and applied it to Polanco's total (subtracting the 2.2 adjustment he gets for second base, so it's not much of a difference) to get this number.

The Phillies rank third, second, fifth and fourth among the five teams. The Yankees rank second, fifth, first and second. The Red Sox first, third, third and fifth. The Rays fourth, first, second and first. The Rangers fifth, fourth, fourth and third.

If we gave them five points for first down to one point for fifth, the clubs would rank this way:

  1. Tampa: 16
  2. New York: 14
  3. Boston: 12
  4. Philly: 10
  5. Texas: 8
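For anyone who wants to check the arithmetic, the 5-4-3-2-1 scoring can be recomputed from the per-position ranks listed above (a quick sketch):

```python
# Recompute the point totals: 1st at a position among the five clubs
# is worth 5 points, 5th is worth 1 point.
ranks = {  # (1B, 2B, SS, 3B) rank among the five teams
    "Tampa": (4, 1, 2, 1),
    "New York": (2, 5, 1, 2),
    "Boston": (1, 3, 3, 5),
    "Philly": (3, 2, 5, 4),
    "Texas": (5, 4, 4, 3),
}
points = {team: sum(6 - r for r in rs) for team, rs in ranks.items()}
for team, pts in sorted(points.items(), key=lambda kv: -kv[1]):
    print(team, pts)
```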

98 replies on “The Ins and Outs of Infields”

I knew you were the right man for the job! Great post Paul, as usual. The only thing that bothers me is that WE always have to bend for these old timers. Why can't they recognize that there are formulas and data out there that summarize a player's ability (for the most part) and are usually pretty darn accurate? Instead we have to just accept that these ideas are too far-fetched.
Is Conlin the guy who used to be on The Sports Reporters years ago?

See…this is why I just don’t buy all the statistical stuff in baseball.
Ex.
So Mark Teixeira didn't save a single run at first base last year? He actually cost us 4.1 runs (if I understand the statistic correctly)? This is asinine. I am no expert, but I watched every single Yankee game last year and there is no f-ing way that the statistic holds up. No way.
(not an indictment of you Paul, just the watered down, crazy statistical nonsense that has become pervasive in these types of comparisons/arguments)

There’s a lot of single-year variation in UZR, krueg, and most people recommend taking three-year samples. But your argument that watching every game is somehow more credible than the data produced by those games is simply wrong.
Eyes lie. This is why sports writers write ridiculous things like “the Phillies have the best infield of baseball’s modern era” and praise Jimmy Rollins’ bat despite the fact that he’s actually quite terrible with it. David Eckstein is the product of people who watched every game he played and decided there was “no way” he couldn’t be a terrific player.
Defense is even worse about this than offense. If a player dives and makes a great play, that sticks with you. If a player with better range gets to the ball with no problem, it doesn’t. The second player is the better defender, but the first player gets all the glory. An even worse defender might never even get to the ball in question, and since he’s so far away, it looks like a clean hit, instead of a ball the best defender would have gotten to with ease. In which case, watching every game may give you the impression that Player A is the best defender, when in fact he is poor, and that Players B and C are similar defenders, when in fact Player B is terrific and Player C is terrible.
The fact is, “crazy statistical nonsense” is simply data produced by the games you watch everyday. If the data disagree with your assumptions, then you should test your assumptions and test the data and determine which is correct. For example, there are multiple defensive stats out there. Do they agree with you, or do they agree with the UZR numbers posted here? Do the numbers here vary widely from previous data for the same players? If you disbelieve the data, then check it out. Simply falling back on your deceiving eyes and imperfect memory is not a particularly effective means of analysis.

UZR for some reason seemed to undervalue Tex and Cano last season. I actually mentioned this at some point. As someone who watches almost every game, I find it basically implausible that these guys can be considered defensive liabilities. Perhaps not the very best at their positions, but certainly not below average.
I'm not sure about all the ins and outs of the defensive metrics, but the one that Baseball-Reference uses says that Tex's defense was worth 7.2 runs over 1,200 innings last year and Cano was pegged at 8.1. (Pedroia was 8.6, for reference purposes.) So one can decide what to take and what not to. I would be interested to see how the systems the teams use to measure defense rate these various players.

“If you disbelieve the data, then check it out. Simply falling back on your deceiving eyes and imperfect memory is not a particularly effective means of analysis.”
Ouch dude…deceiving eyes and imperfect memory? At least you aren't smug much, Paul. OK, Tex is the 15th-best first baseman. Laughable.

Paul, while I understand what you say about one's eyes, I don't think it's fair to completely discount them in a discussion of defense. Even as a scientist, I don't believe it's fair to say statistics can tell the whole story and watching the game means nothing. Defensive metrics simply don't take into account a player's ability to position properly, read developing plays properly (such as cutoffs, etc.), and make in-game decisions. Defensive statistics are a valuable tool in evaluating a player, but they are not the ONLY tool one can and should use.

Further to that point, the fact that the systems disagree on the value of various players should tell us that simply basing one's conclusions on a particular statistic is not a particularly effective means of analysis either.

At the very least if you plug average values in for the guys that you think are better than the stats say, Conlin is still OFF HIS ROCKER. Which is the gist of this post, that our lying eyes lead us to make statements that don’t necessarily measure anything but our own subjectivity. Paul’s last comment explains this reasonably.
Forget who you root for: Paul wasn’t writing this to insult the Yankees, and if that is what you take away from it then that’s a shame. He does a magnificent job of exposing Conlin, and he has clearly qualified the stats cited while doing so.

I think we all need to realize that UZR is very, very far from perfect, and needs to be taken with a salt mine’s worth of salt.
From a purely offensive standpoint, the 2009 Yankees most likely had the best group of players in the modern era of baseball. The 2009 Phillies don’t even come close to that designation. That gap widens even more if you decide to include catcher.
Defense-wise…we just don’t know. And since we don’t have the ‘F/X’ data from historical seasons…we likely never will. It does no one any favors to speak in any way definitively on this topic. Defensive statistics are a very, very far way from being definitive. Please, for the sake of argument, let’s treat them as such.

Just because you measure something doesn’t make the measurement accurate. If the thermometer is off it doesn’t matter how many measurements you take, most especially if the error isn’t regular. Variability also has another name: Unreliable.
Hitting is a refined enough skill, measured over a very big sample, that it’s hard to argue what the numbers show. You either hit the ball or you don’t. The same can not be said of fielding. Defense has too many degrees of freedom to be truly quantifiable at this time. The new cameras will help with questions of range, but that’s only one component of defense.
Another reason I doubt the defensive numbers? Jeter. If a player can go from well-below average to well-above average based on some off-season exercises, then how are we measuring a quantifiable skill? That level of variability makes me very skeptical of the whole exercise.
For an example closer to a SF heart: Do you really think Ellsbury was one of the worst centerfielders last year? Or are you going to make a gerrymandered distinction between outfield and infield defense? If anything outfield defense is easier to measure. It’s almost all about range. I guess that’s why the Sox chose to move the younger player to the much easier LF position.
Between Jeter’s “excellence” and Ellsbury’s “horrid” range, that variability is sufficient for me to deem the defensive stats, so far, unreliable.

“Conlin is still OFF HIS ROCKER.”
That’s obvious. NoMaas has the results of emails a fan sent him. Needless to say, the guy doesn’t respect his readers.
“There’s a lot of single-year variation in UZR, krueg, and most people recommend taking three-year samples.”
And yet, three-year samples are so degraded that everyone will regress to the mean. Which three year sample do you take for Jeter? Or for Ellsbury which one year do you believe, because averaging across his three years masks the finding that he’s horrid (2009). Or is he actually awesome (2008) or average (2007)? Oh, he’s somewhere in between? How helpful…

I want to make something really clear, if it wasn't from my earlier comments. This post is why this blog is interesting. It's why the blogosphere is relevant, why people like Paul (and other bloggers around the nets) are unique talents, and why old media is getting lapped. Whether or not you like the way Paul uses the data (and Paul himself qualifies his usage), it is inarguable that there is tremendous effort at looking at the numbers in as objective a fashion as is possible. And it is, even with limitations, illuminating.
These are the kinds of posts that YF and I probably never dreamed would be part of this site when we started it, and for that we are grateful to Paul (and others like him), who elevate the discussion, in moments, beyond that which we read on the broadsheets and what we talk about in bars. For that, we (we all, as far as I am concerned) owe him thanks.

Let me say it again: GREAT POST PAUL. Fantastic perspective on Conlin’s assertions. I’m surprised at the level of disappointment some here are expressing when a stat associated to a player “don’t look right.” Well, if you don’t like the numbers, don’t look at them. There’s no requirement that you do to enjoy the game.
> Just because you measure something doesn’t make the measurement accurate … thermometer
That's an oversimplification of a dynamic, evolving, robust, open system with constant critical evaluation from people diligently working to improve the tools available and from those chipping/chirping at it from the periphery.
> Jeter
*Eyes Glaze Over* *furble mumble something intangible something*
Bill James didn’t stick a meat thermometer up Derek Jeter’s ass on a Thursday day-night and decide he didn’t have range. Good thing the world doesn’t own just one thermometer or one set of eyes to read it.
If you get pissed off because some resultant number doesn’t please you, don’t be pissed off at the numbers. Either get educated and involved in the process to help understand it, or Just Watch the game and forget about the stats. Or both.
Great post.

For the record, I totally agree with Paul's conclusions also and do appreciate these in-depth looks. I do think it's important in any discussion to talk about what the statistics are telling us and how they are derived. I think the points brought up in relation to the defensive metrics were valid and not meant as a criticism of Paul, but of the limitations of quantifying something like defense. I hope that SF and YF feel these types of comments and discussion are as germane to the topic as the analysis that precedes them.

“If you get pissed off because some resultant number doesn’t please you, don’t be pissed off at the numbers. ”
I certainly wasn't just pissed off at the numbers; I showed how they aren't always to be trusted, given their volatility.

“Bill James didn’t stick a meat thermometer up Derek Jeter’s ass”
Why the hostility? If Jeter goes from well-below average to well-above average in one or two off-seasons, then I simply don’t trust the numbers. Ellsbury makes the point much more cleanly. He went from being one of the best CF to one of the worst CF. So what do we call him? Average? Wait another three years? What do we do with those numbers?
“If you get pissed off”
You read what I wrote and think I’m “pissed off”? Nothing could be further from the truth.
“Either get educated and involved in the process to help understand it”
It’s not simply a matter of getting educated if the tools for measurement are faulty. The tools themselves have to improve. The new fielding cameras will help with quantifying range but that’s just one element. I’m afraid that there are just too few data points with too many variables (fields, pitchers, other fielders) to make a consistent case. That’s why a three-year sample is recommended…except it just adds to the problem if there’s already variability – like with Jeter and Ellsbury.
Thanks for a fun post.

“They are all just numbers. They only get volatile when someone hangs an adjective on them, and someone else takes it personally.”
Are you being facetious? Measurements can certainly be volatile, variable, unreliable, etc. Pick your own adjective. Just because something is measured doesn’t mean there’s truth there if we just look a little closer.
Go look at Ellsbury’s CF numbers. They’re so variable I just don’t know how they could be describing the same player. But they are. So what am I supposed to take away? That we won’t know Ellsbury’s true defensive contribution until 2013? When he’s in his mid-30s?

I was thinking something kind of basic: isn’t it possible for players to have good and bad years, both offensively and defensively? Wouldn’t it be possible that a player could perform differently from year-to-year, sometimes significantly so, at either the plate or in the field? Aren’t there countless factors that might make this happen, legitimately? That numbers vary from year to year seems like nothing but reality to me. I guess the swing doesn’t bother me as much as it might some other people.
That’s not to say that defensive measurements are exact, they clearly are not according to many of the number-crunchers who actually use and cite them, even knowing their volatility. But a swing doesn’t necessarily indicate a flawed method, it might actually indicate a swing in performance, right?

I dont believe its fair to say Statistics can tell the whole story and watching the game means nothing.
If you can find where I said that, I’ll retract that sentiment. What I said, and certainly meant, was that if data don’t agree with what you think you’ve observed — and what is data but the numerical (or otherwise) recording of observable events? — then test both the data and the observations.
I think UZR is now suffering at the hands of some who don’t like what it has to say. Those saying that larger sample sizes are “degraded” are not interested in having an intelligent discussion. Those picking out isolated examples of wide variation as a way to discount the use of a statistic are also not interested in having an intelligent discussion. Oh wait, that’s all the same person.
OK, Tex is the 15th best 1st baseman. Laughable.
Almost as laughable as dismissing contradictory data without actually researching the case first. I know, I know, it snowed in NYC this year, so global warming is bunk, right?

Here is a different fielding metric; it illustrates how unclear these metrics can be, and also why I have long held a skepticism of UZR.
From baseball-reference.com (I only had a chance to look up the top few you listed), here is the statistic they give for total fielding runs above average for 2009:
1. Adrian Gonzalez 11.7
2. Mark Teixeira 7.2
3. Kevin Youkilis 6.4
4. Derrek Lee 2.2
5. Paul Konerko 2.2
6. Kendry Morales 1.5
7. Miguel Cabrera -1.7
8. Carlos Pena -11.0
This list is pretty different from the one that UZR generated. Youk's number would be higher if he had more games at 1B, for sure. Sorry there hasn't been time to be more comprehensive, but I do believe this makes my point.

"Why the hostility?" – No hostility. I just make crude jokes (mostly for my own ego-maniacal benefit) to illustrate that equating a borked (or off-scale; it's probably accurate within its frame of reference, but that's why we use multiple samples) thermometer with the body of work that is the result of the hundreds/thousands of individuals who have taken and continue to take measurements and apply them is ridiculously shallow. I am probably only funny to me, and the hostility is just a side effect. But, with the math, there is rigor and many eyes. Question it? Yes, absolutely. Compare it to one person passing judgment with a broken thermometer? No.
“It’s not simply a matter of getting educated if the tools for measurement are faulty.”
You are right. It's not a simple matter of getting educated. But that admonition is bunk: the "if" statement insinuates that the very process is broken. The tools people are working on are not simple, and tossing off "if" statements to derail the conversation to your own design doesn't advance anything; derailment, from what I can gather, is the design.

“That numbers vary from year to year seems like nothing but reality to me.”
That’s exactly the point. And that’s why they play the games. For as good as the projections and statistics might be, they don’t tell the whole story. The games do. Aren’t the best projections looking at an R-squared of .6? That’s a lot of variability. And hitting is easiest to know – lots of data in a restricted, controlled setting. Pitching is much tougher. Defense is still pretty bad.
” Those saying that larger sample sizes are “degraded” are not interested in having an intelligent discussion. ”
Where's the backup for that assertion? An average of two extremes isn't an average at all. Yes, the "sample" is bigger, but the underlying cause of the variability hasn't been explained any better. The average makes something seem more certain when it is anything but. In statistical parlance, it's called the standard error of the mean.
“Those picking out isolated examples of wide variation as a way to discount the use of a statistic are also not interested in having an intelligent discussion.”
Again, where's your backup? If the measurement were accurate, the wide variability wouldn't be there. We can't simply choose when to accept certain numbers as valid and ignore others.
“Almost as laughable as dismissing contradictory data without actually researching the case first.”
How is this different than what you’re doing for Ellsbury and Jeter? Either the data is flawed or the players are wildly inconsistent year-to-year. Either way, you’re building a case on faulty assumptions.
I’m all aboard for the offensive analysis. But the defensive metrics just aren’t there yet, if they will ever be. I’m dubious based on complexity theory. There are just too many degrees of freedom to really nail it. But the new cameras will help, especially in the outfield. Still, for Ellsbury to swing almost 40 runs in one year highlights the huge problem.
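For what it's worth, the standard-error point above is easy to illustrate with a quick sketch using made-up single-season UZR-style numbers (not any real player's line):

```python
import math

# Three hypothetical single-season UZR values for one player,
# chosen only to mimic a wildly variable three-year run.
seasons = [12.0, 1.0, -18.0]

n = len(seasons)
mean = sum(seasons) / n
# Sample standard deviation (n - 1 in the denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in seasons) / (n - 1))
sem = sd / math.sqrt(n)  # standard error of the mean

print(f"mean {mean:.1f}, sd {sd:.1f}, sem {sem:.1f}")
```

With a mean of about -1.7 runs and a standard error near 9, the three-year average by itself says very little about the player's true level, which is the commenter's point.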

“Compare it to one person passing judgment with a broken thermometer? No.”
That wasn't my intent. My analogy of the thermometer was to the UZR stat. It measures something; what, exactly, is the question in search of an answer. The problem matching it to FRAA is another nice example.
“The tools that people are working on are not simple.”
That doesn’t make them any less faulty. You can have a complex calculation easily go awry if the initial data is still noisy. This data is.
“of which derailment is the design from what I can gather.”
No, not my intent. My point is we can’t rely on defensive stats and assume they’re gospel. They’re pretty unreliable now and I think they’ll be in the future. The new cameras, like pitch f/x, will help. But I remain skeptical.

UZR is suffering, Paul, because people are using it in statistics like WAR, which they then use to make definitive statements about a player's value. UZR should never, ever be stated in a definitive manner, because it's not definitively correct. There is too much wrong with it. Which isn't to say it's not better than other defensive statistics; it likely is. But just because it's the best we have doesn't mean it's in any way correct. We should all wait for the 'Field F/X' (or whatever they're going to call it) data to come through before we start taking defensive statistics altogether seriously.
Don’t try and paint us ‘non-UZR-believers’ in the same stroke as those who don’t want to believe in, say, OBP as being valuable, in saying that the only reason we don’t like UZR is because we don’t like what it says. I’ll take a page right out of your book and say that anyone who does that is not interested in having an intelligent discussion.

Here’s the Plus/Minus top 10 for 2009. This also is runs saved above average.
1. Albert Pujols STL +12
2. Daniel Murphy NYM +11
3. Adrian Gonzalez SDP +11
4. Kevin Youkilis BOS +10
5. Travis Ishikawa SFG +10
6. Casey Kotchman ATL/BOS +8
7. Kendry Morales LAA +6
8. Ryan Garko CLE/SFG +5
9. James Loney LAD +3
10. Derrek Lee CHC +3
They all measure defense differently, and so all are going to have different totals and different players who excel, depending on their individual strengths and weaknesses. Also, like SF noted, players can and do slump on defense like they do on offense.
UZR looks at a play and determines whether an average fielder would have made the play and credits run values to that. +/- does the same thing, but breaks the field down into vectors. I don’t know what B-R’s runs-based stat uses. I’d actually forgotten they had a statistic like that (which is especially dumb because I copy edited the mouse-over descriptions for nearly all the stats on the site).
Again, this is a little into the weeds from the main discussion. The B-R stat and UZR both have Teixeira being very good in 2008, then sliding back quite a bit in 2009. Taking the three-year sample, UZR says he’s slightly above average, B-R says he’s solidly above average, +/- would probably agree that he’s solidly above average. Why is UZR so down on Teixeira? I don’t know, but that it is doesn’t mean we throw out the whole stat. It means we keep looking for ways to improve it while we check it against other systems to see if it’s consistently off. Frankly, a system that says Pujols, Youkilis and Gonzalez are among the top defensive first basemen — or that Beltre and Figgins are among the best third basemen, or that Everett and Wilson are among the best shortstops, or that Utley, Zobrist and Pedroia are among the best second basemen — isn’t really setting off any alarm bells for me.

> In statistical parlance it’s called the standard error of the mean
In COLLEGE, I failed stats well enough to know that for deviations, I could take it or leave it, as it were.
No, for me, I liked drama, especially stories in which the Charlatan held court.

“isn’t really setting off any alarm bells for me.”
I don't mean to be pedantic, but what are your alarms based on, Paul? Your eyes? Stats? UZR?

The problem isn’t with best or worst cases. The problem is with the huge group of people in-between – over 80% if you assume a normal distribution.
And again, a three-year average with wildly divergent numbers just obscures the problem. That’s simple statistics.
The problem with WAR is indeed in the defense. Answer this for us: What’s the difference in WAR for Teixeira assuming his best defensive season and his worst?
That all said, there’s an assumption here that we just need better statistics. But what are the variables that could cause a defender to go from really good to really bad in one year? If those are things out of his control, and I think they are, then I don’t see how the stats are likely to get any better.

Don’t try and paint us ‘non-UZR-believers’ in the same stroke as those who don’t want to believe in, say, OBP as being valuable, in saying that the only reason we don’t like UZR is because we don’t like what it says.
Good thing I didn’t do that:
I think UZR is now suffering at the hands of some who don’t like what it has to say.
Let's not forget something here: UZR varies widely. Some here are saying that's because it is not reliably measuring what it intends to measure. In reality, variation is mostly a problem of sample size. How many balls does a defender get to over the course of a season? Certainly not nearly as many as the number of plate appearances a hitter receives. Which is why three-year samples are recommended — because they equal about 600 PAs worth of data. If a player is awesome in 50 games, terrible in 50 games, then decent in 50 games, his overall offensive line is X. Do we throw out OPS because what it measured in the first 50 games differed wildly from the next 50? (Looking at you, Jason Bay and David Ortiz.) Simply put, variation — even extreme variation — is not a reason for dismissal.
Now, I’m not saying UZR is the same as OPS; UZR has a human-judgment component that OPS does not. Yes, the Hit-f/x data will make things even more accurate, but given we don’t have it yet, we can only use what we have. Should we be clear about the limitations? To the extent that it’s practical, yes. But I’m not going to write a disclaimer every time I use WAR or UZR because this audience seems pretty knowledgeable about the strengths and weaknesses of those statistics.
All that to say: I’m not going to dismiss UZR’s data if it tells me Mark Teixeira sucks on defense just because I saw every game and think he’s awesome. Right now it’s the best we’ve got.
Here’s the creator of UZR discussing this very question back in 2003:
What would the point of an objective “system” be if it only reinforced what it is you think you already know? I think it was Tango who said that a good system (offense, defense, whatever) should coincide with what you think you know 80% of the time and you should be surprised 20% of the time. I don’t know if I am doing justice to his statement and obviously no one knows what the numbers are (80/20, 85/15, etc.), but I agree with the general concept.
“On the flip side, just because a system follows that patter[n], doesn’t make it a good one of course. That is just a quick and dirty “check” on the system right off the bat. Implicit in that 20% (or whatever percentage), is that some smaller percentage will be REALLY surprising (like Snow, T. Hunter, maybe N. Perez).
“Whether that means that the system is “wrong” with regards to those players, I have no idea. I doubt it. I think it either means that these players’ true defensive abilities are somewhere in between (the objective rating and the subjective consensus), or that these players are the ones who, for whatever reasons, LOOK good but really aren’t. I suspect it is a little of both, but I lean towards the latter theory (since it’s my objective rating!). Seriously, I lean towards the latter, since after all, that is the whole point (or one of them) of these objective ratings – to identify those players whose defensive abilities we CAN’T, again, for whatever reason, nail down by observation.

It’s 4 in the morning here. I’m blown away by this post. Thanks, Paul, for being a mad man! This is awesome. I’ll read the comments tomorrow. Night all!

what are your alarms based on Paul?
As I said earlier, if the data don’t jibe with what your eyes tell you, check them both out and see which makes more sense. The converse is true, as well. If the data do jibe, then bully, your eyes have not been deceiving you. But I think you already knew that.

And again, a three-year average with wildly divergent numbers just obscures the problem. That’s simple statistics.
And again, mischaracterizing larger sample sizes as obscuring a “problem” is ludicrous. The problem in the first place is small sample sizes. So, yes, larger sample sizes do obscure wild swings in smaller sample sizes. Time to throw out every statistic and start over with ones that … what? Smooth out performance variation in real time by accurately predicting what a player will do over the long term? The name has changed, but we’re obviously still being Robbed blind.

“In reality, variation is mostly a problem of sample size. How many balls does a defender get to over the course of a season?”
I agree with this thought completely. The problem is what you call a sample. Across seasons strains the definition.
“Do we throw out OPS because what it measured in the first 50 games differed wildly from the next 50?”
The problem is that a season is an encapsulated unit – same field, same pitchers, same teammates, etc. Subsequent years are not. So much changes from year to year with defense, to say nothing of over three years, that you can’t just assume an average is the best approximation. It’s clearly not. You’re averaging apples and oranges and calling it watermelon.
” But I’m not going to write a disclaimer every time I use WAR or UZR because this audience seems pretty knowledgeable about the strengths and weaknesses of those statistics.”
No one here that I’ve read is asking you to. The common thread seems to be about making definitive judgments, or even rankings, based on these faulty numbers. Go nuts on offense if you’d like. There, most are confident in the results – they’re as good as we’re likely to ever have. But WAR and UZR are so variable past the top few players that when you get to rankings, #5 could just as easily be #15 the following year. That’s the very definition of unreliable.
“As I said earlier, if the data don’t jibe with what your eyes tell you, check them both out and see which makes more sense.”
Seriously, how does that approach help with Ellsbury or Jeter? I was shocked Ellsbury was so bad last year and amazed that Jeter was so good. So I should simply trust the numbers because I’m surprised? That seems off.
Honest question: What’s the R-squared for UZR from year to year?
I’m going to guess .3, ignoring how a small number of players could be carrying that moderate correlation.
Now what’s the year-to-year correlation of the number of errors? That seems much more consistent to me. And yet I know errors are a very faulty stat.

Instead of the bickering, people have already done the work:
How reliable is UZR?
I can’t believe on a blind guess I was right!
An r² of .36 for UZR is not great. That means it leaves 64% of the variance to be explained by other factors. Not surprisingly, outfield UZR is even worse.
And if you believe this analysis, wOBA isn’t that great either, even given usually over 600 data points. Basically, a player’s offense is fairly consistent year to year, but far from perfectly so.

That’s weird. My link didn’t post. Here’s another try.
How reliable is UZR?
I can’t believe I nailed it! An r² of .36 for infielders isn’t horrible, but it also leaves 64% of the variance to be explained by other factors. Outfield defense is close to unreliable, since that correlation is probably carried by a small number of players.
As for wOBA – .53 isn’t that great considering how much data there is to work with.

Waltham – the severe amount of regression in offensive stats is just the inherent variability in the game. As for defensive statistics – the difference in regression is due either to defensive performance being that much more unpredictable, or to the shortcomings of UZR, or to both.
Your point about averaging defensive stats over several seasons is a good one. I think it’s common knowledge that there are so many more factors that can affect defensive performance – positioning, types of pitchers you’re playing behind, range of fellow defensemen (and these can all have an effect on each other) – that change from season to season, that defensive stats from one season are already halfway meaningless when compared to another season. It’s what stats like UZR can’t hope to capture, but what things like ‘Field F/X’ will hopefully give us a greater understanding of.

positioning, types of pitchers you’re playing behind, range of fellow defensemen (and these can all have an effect on each other)
UZR actually does account for positioning and maybe I’m misunderstanding you, but the type of pitcher might affect the number and type of balls that come at you, but ultimately you’re judged against your ability to make each individual play.
As for UZR’s reliability from year to year:
more than 50 chances in both years = r^2 of .15
more than 100 chances in both years = r^2 of .19
more than 150 chances in both years = r^2 of .24
more than 200 chances in both years = r^2 of .28
Quite simply, the more of a fielding sample we have for a particular player, the greater the correlation from year to year.
For comparison’s sake, if we look at wOBA from 2008 to 2009 you get this:
more than 300 PA in both years = r^2 of .24
more than 500 PA in both years = r^2 of .30
So the lesson is: when there’s not a lot of UZR data on a player, there will be a lot of noise, but as the sample size increases, the data (at least from 2008 to 2009) actually become almost as highly correlated year to year as the stats considered to be the most reliable.
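The cutoff-and-correlate procedure behind those numbers can be sketched in a few lines of Python. The data below are simulated (a fixed true talent plus year-to-year noise that shrinks with more chances) purely to show the mechanics of why r² climbs as the minimum-chances cutoff rises; none of the figures correspond to real players, and the noise model is an arbitrary assumption.

```python
import random

random.seed(1)

# Simulated player-seasons: (chances, "2008" UZR, "2009" UZR).
# True talent is constant; each season adds independent noise
# whose size shrinks as the number of chances grows.
players = []
for _ in range(2000):
    talent = random.gauss(0, 8)              # hypothetical runs above average
    chances = random.randint(50, 400)
    noise_sd = 100 / chances ** 0.5          # arbitrary noise model
    y1 = talent + random.gauss(0, noise_sd)
    y2 = talent + random.gauss(0, noise_sd)
    players.append((chances, y1, y2))

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

def year_to_year(data, min_chances):
    """r^2 of year-1 vs year-2 values for players above a chances cutoff."""
    kept = [(y1, y2) for c, y1, y2 in data if c > min_chances]
    xs, ys = zip(*kept)
    return r_squared(xs, ys)

# Higher cutoffs keep only large-sample players, so the year-to-year
# correlation rises -- the same pattern as the thresholds quoted above.
for cutoff in (50, 100, 150, 200, 300):
    print(cutoff, round(year_to_year(players, cutoff), 2))
```

With real FanGraphs data, the tuples would simply be each player's actual chances and UZR figures for the two seasons; the thresholding logic is identical.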

In the comments, Colin Wyers, who did the THT study to which Rob Paul links, says they are on the same page, and that his study actually found greater year-to-year correlation than Appelman’s did.
The comments are also instructive: Tom Tango agrees that UZR is no less variable than wOBA given a large enough sample size (he says 100 games is large enough to provide equal weight between performance and mean regression). Another commenter said he did a similar analysis and found UZR to be a better predictor of itself in the subsequent season than OPS. Lee Panas notes that UZR is actually less variable than ERA, and concludes: “I think if there is concern that UZR is not reliable, then there should be similar or greater concern about common pitching stats.”
Bottom line: UZR varies, sometimes widely. So does every other statistic. There is nothing in that variation to cause us to doubt UZR’s usefulness as a statistical tool. Its No. 1 flaw is that it relies on subjective interpretation in obtaining its data. But that still makes it more reliable than me — or anyone else — using my/their subjective interpretation of what I/they see, trying to remember it all at the end of the year and saying, “I/We think he’s a good fielder.”

I also recommend this link from Tom Tango’s Inside the Book blog, in which he answers a question that specifically mentions Teixeira as a reason to disregard UZR.
yes, we have bias issues to contend with. There is an uncertainty level.
And yes, you also need more data with fielding than you do with hitting. Roughly speaking, 200 PA as a batter (say 50 games) tells you as much as 400 balls in play (BIP) as a fielder (say 100 games). So, if you can make a judgement on a hitter’s batting stats after one year, you can have that same level of uncertainty using UZR after 2 years. …
The important point is this: the 4 fielders with highest UZR were also in the top 10 among the Fans’ picks.
Not only that, but Giambi, Jacobs, Fielder, and Sexson were 4 of the 8 worst fielding 1B last year according to the Fans. Talk about two completely independent systems matching up.
Maybe UZR isn’t as high on Teixeira as it should be or as the Fans have him, but it’s one of those misses that we simply have to deal with. It doesn’t invalidate the entire UZR methodology. It is hard to find those guys that are highly or lowly ranked in UZR who don’t deserve to be there. …
So, every now and then, UZR misses one. We all see Teixeira. He’s played on several teams already (Angels, Braves, Rangers, Yankees), and the Fans of each think he’s a well above-average fielder. Unless he’s happening to hoodwink everyone, I’m fairly comfortable calling him an above-average fielder, and saying that UZR is missing something on him. As with everything, a metric might miss every now and then. Such is life when dealing with samples and uncertainties, and less-than-ideal data recording systems. But overall, UZR is a net plus. It adds value.
The alternative is what? To rely on scouts. But who are these scouts, and who’s recording their thoughts in a systematic manner? All we get is some reporter cherry-picking some scout’s observation to fit whatever the reporter wants to say. If a reporter really wanted to, he would find the one scout who thinks Teixeira is an average fielder.

An r² of .30, at best by your numbers, leaves 70% of the variance in defense to be explained. Good luck with that.
Tango citing the best and worst hurts his case. They’re not the problem. It’s the other 80% that need explaining if you wish to do accurate rankings. You do, don’t you?
I asked it earlier: What’s the variation in WAR based on Teixeira’s best and worst defense?

That’s a pretty tepid response, Rob, to the fact that UZR is as volatile a metric as FIP and wOBA.
Dave Cameron responds in the Tango thread I linked to with the following list of hitters and their 2006-09 wRAA:
Joe Mauer: +33, +9, +26, +55
Alex Rodriguez: +34, +70, +42, +34
Derek Jeter: +42, +22, +8, +37
Michael Young: +13, +9, +1, +27
Aubrey Huff: +6, +3, +32, -16
That’s a lot of variation. And how much do you want to bet that a different run-based metric will have different results? Because that’s what happens when you look at numbers in different ways, placing different emphases on different events.
Regardless of the micro discussion going on here, I’ve found this research fascinating, and the two links I’ve posted have pretty well convinced me that there’s really no reason to have a great deal of confidence in UZR/WAR, particularly on two-year samples or larger.

“UZR is as volatile a metric as FIP and wOBA.”
I’m not sure I buy that. Pitching maybe, but I don’t understand wOBA enough to think it through. Care to explain what goes into wOBA? Does it use BABIP?
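For what it’s worth, wOBA doesn’t use BABIP; it’s a linear-weights blend of plate-appearance outcomes, scaled to look like OBP. A rough sketch follows — the weights are approximate published linear-weight values for the late-2000s era, and the season line is invented, so treat both as illustrative rather than official:

```python
# Approximate late-2000s wOBA weights (illustrative, not official values).
WEIGHTS = {"bb": 0.72, "hbp": 0.75, "1b": 0.90,
           "2b": 1.24, "3b": 1.56, "hr": 1.95}

def woba(events, denom):
    """events: counts of each weighted outcome; denom: roughly
    AB + unintentional BB + HBP + SF. No BABIP anywhere."""
    return sum(WEIGHTS[k] * events.get(k, 0) for k in WEIGHTS) / denom

# A made-up season line for a good hitter:
line = {"bb": 70, "hbp": 5, "1b": 110, "2b": 35, "3b": 3, "hr": 25}
print(round(woba(line, 650), 3))
```

The key design choice is that each outcome is credited at its average run value, so a walk counts for less than a single and a homer for about double a double, unlike OPS, which weights the components crudely.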
“there’s really no reason to have a great deal of confidence in UZR/WAR, particularly on two-year samples or larger.”
Sweet, then we agree.
Who’s Rob?

I just don’t see any reason to state defensive statistics in any kind of definitive manner. Why should we ONLY use UZR in our WAR calculations? The inventor of UZR himself has said that it can be off by a very large margin from time to time. Should we average different methodologies when coming up with definitive statements on player value? Should we separate offensive performance and defensive performance, knowing that the offensive performance is much more definitive, while the defensive performance is not?
There is a backlash against UZR for two reasons: 1) It has been proven to be unreliable for some cases, and its inventor admits as such. 2) Despite UZR’s admitted flaws, its proponents continue to use it to try and definitively deduce a player’s performance during a given season.
No one is saying UZR is mostly wrong. It’s just not a great statistic if it’s only ‘mostly’ right in telling us what happened during a season. It’s a gauge that can be faulty from time to time, only we don’t really know when it’s faulty or exactly how faulty it is. No one should use such a gauge in order to speak definitively on a player’s performance during a given season.

That’s an excellent point, Andrew. It’s varying within the season too. I would gladly support a UZR or FRAA or +/- leaderboard. But really, all you’re doing is posting a number like any other. At the end of the year, that number just isn’t going to say much about the following year. Or, more accurately, it will say about 30% of it. Can we improve that a bit with the cameras? Hopefully.
Of course I wish Jeter were a fantastic shortstop. But I can see with my own eyes that he’s solid but lacks range. He’s always been that way too.

“…But that still makes it more reliable than me — or anyone else — using my/their subjective interpretation of what I/they see, trying to remember it all at the end of the year and saying, “I/We think he’s a good fielder.”…”
good point paul…i’ve said it before, you’re making a believer out of me that stats beyond the ones relied on traditionally have value in assessing performance…as long as we understand and respect the inherent flaws in such data and systems used to churn that data [and you seem to]…i’m still a little old school in that i prefer to temper my reliance on pure stats with anecdotal evidence, and stuff that i see with my own two eyes, bad memory and all…i have to admit a little favoritism creeps in to distort things…jump throws, the face dive, the weird jeremy giambi play, and being a yankee, that all makes jeter better than average in my mind…and i’d take tex over any other 1b despite his #15 ranking…ask the yankee infielders how many times tex saved them a throwing error…and good post…my impression was that you were merely trying to counter conlin’s over the top proclamation by offering up another way of looking at it…nothing wrong with that, but i knew he was wrong without looking at the stats ;) …

I don’t agree with the statement that defensive stats are notoriously flaky but better than anecdotal evidence, and I think that statement sums up well the difference in perspectives between, for instance, Paul and Krueg – or at least much of the difference.
At a certain point of “flakiness” a stat becomes much more misleading, less worthwhile, and certainly less interesting to debate (to me) than anecdotal evidence because at least the latter is explicitly subjective.
If you tell me Nomar was better than Jeter because of a whole range of things you saw Nomar do on the field that you thought Jeter didn’t do as well when you watched him, we can have a fun debate about the performance of players we’ve followed for years – recalling games we watched and reliving the emotion we felt while being fans at a game or sitting someplace and watching on TV. This is much of the fun of fan debates when those discussions can be kept civil or even playfully argumentative. Beyond the fun, are these methods of gauging a player’s value or performance all that accurate? No, but I think most fans would acknowledge that they pale in comparison to RELIABLE stats when such stats exist for the particular area of performance that is being debated. And I certainly don’t think that eyewitness accounting is completely useless in gauging a player’s performance.
On the other hand, a stat that is so flaky as to posit, for instance, that Kevin Youkilis was more than three times better at defensing first base than any other player in the league (or frankly that any Yankee is that much better at any aspect of the sport than all other players in the league at his position) is both obviously misleading and much more annoying to argue, because it is masked as not only objective but somewhat accurate because it is – after all – a number. Come to think of it, is there any other stat which is viewed as even moderately reliable that would rate any player’s performance at more than triple the value of the next-best performance in that category? That’s not meant as a rhetorical question to prove a point – I am actually curious if there is one, because I can’t think of one. If there isn’t, then it further casts doubt for me on the value of single-season UZR, on which I didn’t really need any more doubt to be cast, frankly.
It seems to me at the very least to be arguable which measure (the extremely flaky stat or the unreliable eyewitness subjectivity) is more accurate as a measure of a player’s performance. And it’s very clear to me which I’d rather debate if those are the only two measures available to me for certain kinds of performance on the field.

i agree with ih. i’d also like to suggest that some of the advanced metrics we use here, while i think they’re great in general, can be a bit misleading in specific. the various numerical gauges that purport to give some kind of total objective value to a players performance are in fact in many ways subjective. they pick a little of this (ops) and a bit more of that (obp), weight these with a score of other factors, and at every turn they’re subject to error, and also leaving something out. in aggregate, they might be the best measures we have, and fairly accurate. but they’re not perfect either. we’ve just come off of a long re-examination of the jeter-vs-nomar debate. was nomar better in some of those early years? i would guess. but even still, there are factors (statistical factors) that are unquantifiable. what is the cumulative value of p/pa over the course of a year? i don’t think this, for instance, is included in any of the weighted stats. should it be? and how much? does it matter? i don’t know.

I greatly appreciate the last two comments here since, although I’m a huge fan of baseball statistics and respect their use in talking about baseball, I feel that sometimes they are indeed relied on a bit too heavily. I do hope these well-articulated arguments by IH and YF will be remembered when people disagree over statistical arguments. Thanks, guys…

These last couple of comments bother me slightly not because they are unreasonable (they are reasonable) or disagreeable (I actually agree with them), but rather because they seem, perhaps unintentionally and impersonally, to mischaracterize or ignore the hard work that Paul has done and the disclaimers that he has offered. Paul himself states:
“What I said, and certainly meant, was that if data don’t agree with what you think you’ve observed — and what is data but the numerical (or otherwise) recording of observable events? — then test both the data and the observations.”
Seems really reasonable to me, and in no way at odds with anyone’s desire to use anecdotal evidence to explain their subjective position.

Sigh.
If you’re unwilling to believe in the veracity of the statistic — or any statistic? — I can’t do much about that. When the greatest statistical minds in the genre run studies showing that UZR correlates better year to year than ERA, I’d be curious as to what statistics you’re willing to accept.
Batting average is, on the face of it, a ridiculous stat: The denominator is all plate appearances — except a lot of those that end in a positive result (BB or HBP) as well as a handful that result in a negative result with some positive side effects (SF). Say what?
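That denominator quirk can be made concrete with a couple of toy functions; the season line below is invented for illustration, while the at-bat and OBP definitions are the standard ones:

```python
def batting_average(h, pa, bb, hbp, sf, sh=0):
    """AVG = H / AB, where at-bats strip walks, HBP,
    sac flies, and sac bunts out of plate appearances."""
    ab = pa - bb - hbp - sf - sh
    return h / ab

def on_base_pct(h, bb, hbp, pa, sh=0):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF), which works out
    to PA minus sac bunts in the denominator -- so sac flies still
    count against you, but walks and HBP count for you."""
    return (h + bb + hbp) / (pa - sh)

# Invented line: 600 PA, 150 H, 60 BB, 5 HBP, 5 SF, 0 SH
avg = batting_average(150, 600, 60, 5, 5)   # 150 / 530
obp = on_base_pct(150, 60, 5, 600)          # 215 / 600
print(round(avg, 3), round(obp, 3))
```

The walks and HBP simply vanish from the batting-average calculation, which is exactly the "say what?" above: a stat that ignores 65 times this hypothetical player reached base.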
OPS/OPS+ is a widely used stat that has a lot of value, but it doesn’t weight on-base percentage properly, yet we all use it to determine the value of players.
ERA/ERA+ fails entirely to account for bad luck/defense, something over which pitchers have absolutely no control.
Wins are practically worthless, yet many people still look at them as the ultimate arbiter of pitching value.
So if you’re unwilling to accept a statistic that has flaws — as basically all of them do — but still gives a mostly accurate view of defensive prowess, again, I can’t do anything about that. But then I would expect you would treat all statistics with such skepticism, and then I would ask you: How do you evaluate players without being able to see all of them play every game?
a stat that is so flaky as to posit, for instance, that Kevin Youkilis was more than three times better at defensing first base than any other player in the league,
Well, in 1919, Babe Ruth hit more home runs than most other teams. I guess we need to throw out home runs as a stat? Ok, that’s not fair to your point, but you’re not being fair to mine, either. I explicitly said that Youkilis’ number is being affected by small sample size, something that afflicts all statistics. It certainly affects OPS (which I assume you’re OK with). Get a 50-60-game sample, and see what that does to the leaderboard of any statistic.
Meanwhile, it feels like you didn’t bother taking the time to read any of the defenses of UZR that I posted if your principal concern with the stat is something I already acknowledged and addressed in the original post.
they pick a little of this (ops) and a bit more of that (obp), weight these with a score of other factors, and at every turn they’re subject to error, and also leaving something out. in aggregate, they might be the best measures we have, and fairly accurate.
So they’re the best we have, and they’re fairly accurate, and that’s a reason for … what, exactly? I guess I don’t understand the point, notwithstanding there’s no overall-value stat that actually includes either OPS or OBP, as far as I know. (But I know you were just throwing stuff out as an example, not looking at one particular statistic).
I get very little opportunity to watch any baseball during the regular season. I listen to a bit more via XM. That’s it. If all I have to determine which players are the best are “the best measures we have, and fairly accurate,” then I’m going to use them. If others have anecdotal evidence that something is askew, by all means, it can be presented — but no one has yet explained how anecdotal evidence, which has led as a quick example to HOF arguments for Jim Rice (“the fear”) and Jack Morris (“pitching to the score”), is superior to these metrics.

” When the greatest statistical minds in the genre run studies that show that UZR correlates better year-to-year than ERA, then I’d be curious as to what statistics you’re willing to accept.”
Strawman alert. Who expects ERA to be consistent from year to year? Of course straight ERA isn’t a great statistic. What’s the comparison to FIP? And when you say “correlates better,” what’s the comparison? .30 to .28? This is a good example of where you have to be careful making definitive statements.
“Wins are practically worthless”
Except it’s the only stat that determines anything at the end of the year. It’s all about wins.
“a mostly accurate view of defensive prowess”
You have to define what you mean here. I assume you mean they’re fairly accurate in summarizing what happens in-season. But across seasons, an r² of .30 isn’t that reliable (or, to use a poorer term, “accurate”). That leaves 70% of the variance to be explained. If that’s the best the greatest statistical minds can come up with, we have a long way to go. I can understand how a pitcher or hitter may have bad years, given the nature of balls getting hit. But why should fielding vary so widely?
The problem isn’t with UZR, or wOBA, or FIP. The problem is with using them incorrectly – like in fantasy-land rankings – and expecting people not to blanch. Of the three, batting is the only one I’d expect any consistency from year to year, and exactly for the reason you state – large sample sizes across diverse contexts.
I don’t read anyone here saying only anecdotal evidence should be considered. Since this discussion began with UZR and fielding, I suggest we stick to that. And there, I don’t expect the stat to tell me something I didn’t already know, and when it goes haywire I tend to trust my eyes. Ellsbury isn’t as bad as he was made out to be. And Jeter certainly isn’t as good as he is “shown” to be.

The only problem with fielding metrics is they lack charisma. Otherwise, I’d guess they do a better job than the eyes in evaluating a player’s contributions on the field. It’s a little odd to me that people form opinions, even strong ones, about players’ fielding abilities based on the games they watch. I think most of us watch the games on TV. Television doesn’t allow us to see fielding, especially the most important part of fielding: the range of the player. We are hardly ever privy to where the fielder began and finished during a fielding sequence. He is hidden from view. Even instant replays fail to show us much without the context of when the ball was hit. There are, of course, outliers: players like Ozzie Smith, who are so brilliant that your eyes cannot deceive you. There are players who are so bad, you don’t need the fielding metrics to know. But the majority of the time, I think it’s very difficult to come up with an accurate view of a player as a fielder based on watching the games on TV.

I think it’s very difficult to come up with an accurate view of a player as a fielder based on watching the games on tv.
What about on an iPhone?

It doesn’t take a genius or an experienced eye to see that Mark Teixeira is one of the best, if not the best, defensive first basemen in the AL. The stat that Paul provided that I questioned had him as the 15th best…sooooooooooo anyone else agree with that statistic being true or reliable?

sooooooooooo anyone else agree with that statistic being true or reliable?
It may be as reliable as, or more reliable than, the first assertion in your comment above. How do you know that Teixeira is the best defensive first baseman in the AL, krueg? Serious question – is it because he looks good on TV? Is it because he looks “smooth”? Is it because he is a great receiver and you remember the errant throws he saves, but don’t compare that to the gross number of errant throws that other players are faced with and save? Is it because other people have said he’s the best? What is your measure? This is an honest question, because it gets to the crux of the issue, which is what people see and think vs. what developing metrics tell us, and why the two are sometimes quite disparate.
Nick’s point is a really good one, for that matter.

The two phrases “…if you’re unwilling to accept a statistic that has flaws…” and “If you’re unwilling to believe in the veracity of the statistic — or any statistic?” and the 6 paragraphs of your response that those two phrases frame make up a large red herring in my humble opinion.
I am not asking that you throw out all those stats or even that you throw out UZR. I am asking that you not discount anecdotal evidence (or comments based on common sense) with what Krueg read early on in the thread as smugness – a perception I shared then and which I perceive again frankly with the patronizing “sigh” that begins your response to me (what is that anyway — am I supposed to apologize for forcing you to deign to respond to me?)
All I am asking for – which I understood at least a couple of other regulars here to support – is for you to be willing to engage with (or at least not to dismiss out of hand) discussions of a player’s performance that reference anecdotal evidence, especially when they are offered in response to the flakiest of statistics.
SF and you have both noted in your comments here that you did indeed welcome such commentary, though in the post you wrote “Defensive stats are notoriously flaky, especially on one-year samples, but it’s better than anecdotal evidence” which is the phrase with which I took issue in my comment. In that comment I did not argue that anecdotal evidence is superior to statistical measures in gauging a player’s value (and certainly not in ranking players against each other which would, as you and Nick say, require watching a ton of games with a great deal of insight, perceptivity, and knowledge of the game). But I did argue that – both as a source of fun debate and as a gauge for talent that is certainly more unreliable than many stats but arguably not for the flakiest of them – anecdotal evidence should be valued as well.
Einstein said “not everything that can be counted counts and not everything that counts can be counted” – and that’s ultimately my point both on this thread and for the argumentation we have on this site generally. It is not meant to discount statistical arguments – not at all. It is simply meant to put a stake in the ground for the value of also gauging players’ worth and value by observing the game. (Aside: it always helps to situate yourself with a Nobel Prize winning physicist when arguing with someone who is better with numbers than you are)
As for the impression that SF picked up that I somehow discounted all the hard work you did with this post, nothing could be further from my intent. Seriously. I find your posts consistently fascinating, thought-provoking, and challenging. They make me wish you were a Yankee fan (to which end I pray for your soul). I don’t like the dismissal of non-stat based analysis and opinion that I sometimes feel accompanies them, but that is a rather minor point relative to the value I get out of reading them and if it is the case that I am the only one who reads some of them that way then it is my issue to deal with, not yours.
By the way, in 1919 when Ruth hit his 29 HRs, there were 5 other players who hit more than one-third of that total, so quite apart from that example being unfair to my argument as you graciously note, it is actually not a true counterpoint to it. So yeah, we can keep that stat…

With all due respect to Paul and SF, I really do believe that any time a statistical analysis that is published here gets questioned for any number of a reasons, we get shouted down or browbeat with numbers and made to feel as if our own way of looking at stats or the game is invalid. On this thread, I included stats from another defensive system that were largely ignored for reasons that arent clear to me. I realize that a certain commentator can often derail things here but the viewpoints of some of the regulars shouldnt be simply discounted if they differ from the authors’ conclusions or views.
I also take offense at the fact that discussing or questioning an analysis always gets viewed as an attack on Paul or a lack of appreciation for the great work he has done. I have often shown my appreciation for the interesting stuff he consistently posts here, but if I say “I don’t like UZR and don’t believe that strong conclusions should be drawn from it,” it doesn’t mean I don’t appreciate, or for that matter that I disagree with, the primary conclusions of the main post.
Finally, every person who comments here has some level of bias toward the team that he supports. That bias plays out in the way we look at statistics, watch games, and read news about our teams, and in the analysis that results from these sources. This bias is part of what makes this site interesting. Questioning the analysis using the same variety of tools should be fair game and part of the discussion here. I’m sure some of you will disagree with some (or all) of my points, but this is something I have felt for a while. I hope that I am appreciated as a regular, and I know it’s your sandbox, SF and YF, but these are my thoughts about how the discussion on these topics has gone over the last number of months.

“It doesn’t take a genius nor an experienced eye to see that Mark Teixeira is one of the best, if not the best, defensive first baseman in the AL. The stat that Paul provided that I questioned had him as the 15th best…sooooooooooo anyone else agree with that statistic being true or reliable?”
It’s funny. The aesthetics of his fielding don’t appeal to me at all and don’t suggest an elite fielder. That doesn’t mean he isn’t one. But since Donnie Baseball and Keith Hernandez are my fielding heroes at that position, I associate thin and smaller body types with the ideal there. It’s not a particularly rational way of judging fielding.
Anyway, Paul, in this thread, posted a Tom Tango response to the Mark Teixeira issue above. It’s an admission on Tango’s part that UZR sometimes misses on individual players. Overall, however, he believes that UZR adds a good deal of value to our understanding of the game. I’m not sure if that satisfies the stat’s detractors, but there it is. Having no background in these things and feeling more and more alienated from advanced metrics this past year, it doesn’t bother me either way. I’m going to be more about embracing the drama, the frustrating waiting, the anger, the joy of the coming baseball season. I’m going to be more about the narrative, and sometimes that narrative will include statistics, but it probably won’t include fielding stats.

At this point, it feels like we’re talking past each other, and I think we’re all a lot closer than we think, so I’ll leave it there. Everyone’s more than had their say, and I doubt any good can come of continuing this further.

What I’m excited by is the notion that the 2010 Red Sox will be a test – a $175 million test – of how reliably fielding metrics translate to team success. If they score 800 runs but give up only 650, then they’ll have an outstanding season…

“sooooooooooo anyone else agree with that statistic being true or reliable?
It may be as reliable or more reliable than the first assertion of your above comment. How do you know that Teixeira is the best defensive fielder in the AL, krueg? Serious question – is it because he looks good on TV? Is it because he looks “smooth”? Is it because he is a great receiver and you remember the errant throws he saves, but don’t compare that to the gross number of errant throws that other players are faced with and save? Is it because other people have said he’s the best? What is your measure? This is an honest question, because it gets to the crux of the issue which is what do people see and think vs. what do developing metrics tell us, why are they sometimes quite disparate?”
Well, seeing as I watch every single Yankee game, I saw his entire body of work in 2009. How many “bad plays” did the guy make? i.e., the ball went through him, he booted a grounder, missed a relay, etc.
4 errors
.997 fielding percentage
So… based on my pea-sized brain and simplistic fandom, I would say that Teixeira is pretty fucking good. So which 14 first basemen are better than him again???
And for the record, I really don’t care if he is considered the “best” first baseman, but he certainly isn’t 15th. Half-assed stats be damned. But of course, Paul said we are done talking about it, and he is always right, so…

per Nick’s comments above, did you watch in person or on TV? Did you see where he started every play? Did you see the lines he took to the ball? Your eyes are valuable, without any doubt, there is no slight meant in talking about the limitations of them (or my eyes, or anyone else’s). But as has been beaten to death above the eyes sometimes lie, and they are best used as a supplement/complement to data (or rather, the data is better used to supplement/complement the eyes, since we all care more about watching than analyzing I think), even if the data doesn’t agree with the eyes. I don’t think anyone here advocates an all-or-nothing stance on anecdotal evidence or statistical data, that has never been asserted by anyone and that conclusion seems to be what we have reached through the plethora of comments.
I have no interest in picking on Teixeira – this is not about him (or Ellsbury, or anyone else on our teams) but rather it is about how data complements, defies, supplements, reinforces what we think we have seen.

” it is about how data complements, defies, supplements, reinforces what we think we have seen. ”
How do you separate out each of those outcomes?
Specific examples are helpful because they frame the debate in concrete terms. For the statistical minds, how many chances are we talking about for Teixeira. My one criticism of him is that he doesn’t play the line particularly well. If a few doubles got by him is that all we’re talking about for his “ranking”?
Ellsbury, and Jeter, get more chances, but the samples are so small and noisy that the numbers are only slightly helpful – as a .30 correlation suggests – not particularly insightful.
The reason I brought up these examples is that they’re so illustrative of the larger debate. Moreover, if the Sox have the advanced fielding metrics, and the cameras to support them, that they’re assumed to have, what have they done to correct Ellsbury’s poor fielding? That’s an interesting story to me. They obviously moved him because of what the metrics said, even though the public stats had him as excellent in 2008. How do they decide when to trust the numbers? The same problem is true of the Yankees and Jeter. If the stats had come back differently the last two years, would they be trying to move him to LF?
The real problem is that those who uphold the defensive stats seem to assume there’s truth there if we just look harder. But the truth they end up with is so far from any reliable standard that I’m left to conclude the game is too “random” to truly summarize in that way. Funny enough, the statistics bear out my conclusion more than they do the stats crowd’s. We’ll see how the field cameras change things, if they do.
The points about other stats were indeed red herrings. The complaint here is about the unreliability of UZR and other fielding metrics. Pitching stats are unreliable too, but then saying pitchers are unreliable is not really news. We expect fielding to be consistent year-to-year. Maybe that’s the problem? But if they’re not, what are the variables that need to be defined? How can something like “range” vary because of the player, assuming they’re healthy, rather than the contexts?

I will say this: I like UZR. I think it’s useful and gives us a pretty good idea of what a fielder is doing. There are certain areas where it’s fuzzy (Catcher and First Base Defense), but I prefer it to the other systems we have.
As most fielding statisticians will tell you, you can’t really look at one year of metrics and make a definitive statement about a player – there’s too much variation.
Re: Jeter. This is the second straight year he’s been above average according to fielding metrics, after several years of being bad, indicating his improvement might be real and not a fluke. In this particular case, I noticed his uptick in production coincided with the beginning of Girardi’s tenure and his repositioning in the infield. I think his improvement is attributable largely to that, and maybe a little to the agility/lateral-movement conditioning he’s spoken about doing in the offseason.

If Jeter’s UZR can go up so dramatically because of positioning, then I’m afraid the stat is close to worthless for the purpose it’s often used in comparing players.
Still, that leaves an interesting use case for teams.

“per Nick’s comments above, did you watch in person or on TV? Did you see where he started every play? Did you see the lines he took to the ball? Your eyes are valuable, without any doubt, there is no slight meant in talking about the limitations of them (or my eyes, or anyone else’s). But as has been beaten to death above the eyes sometimes lie, and they are best used as a supplement/complement to data (or rather, the data is better used to supplement/complement the eyes, since we all care more about watching than analyzing I think), even if the data doesn’t agree with the eyes. I don’t think anyone here advocates an all-or-nothing stance on anecdotal evidence or statistical data, that has never been asserted by anyone and that conclusion seems to be what we have reached through the plethora of comments.
I have no interest in picking on Teixeira – this is not about him (or Ellsbury, or anyone else on our teams) but rather it is about how data complements, defies, supplements, reinforces what we think we have seen.”
I think it’s all psychobabble. I don’t need some crazy made up stat to tell me that Tex is a great 1st baseman. Especially in light of all the scrubs we’ve thrown out there since Donnie Baseball hung up his cleats.
No, I didn’t see where he lined up every at-bat, nor did I time his reactions to ground balls, etc., etc. What I did see is that the guy made plays on balls I regularly saw get through our infield in the past, not to mention the crazy number of throws he picked out of the dirt. I’ve never heard a SINGLE expert/talking head/columnist say the guy was anything other than one of the best defensive first basemen in MLB. Period. So because some stupid statistic that some number nerd came up with in his basement says otherwise, I’m supposed to ignore what I see with my eyes, and what other fans of the game as well as the “experts” agree upon, just because Paul pulled some stat that said the complete opposite of conventional wisdom?

“If Jeter’s UZR can go up so dramatically because of positioning, then I’m afraid the stat is close to worthless for the purpose it’s often used in comparing players.”
This is really, really wrong. Correct positioning is a huge part of fielding. Even great athletes can have their abilities negated by poor positioning.

Above comment, meant to quote this from Paul in Waltham’s post: “If Jeter’s UZR can go up so dramatically because of positioning, then I’m afraid the stat is close to worthless for the purpose it’s often used in comparing players
Still, that leaves an interesting use case for teams.”
As to the team argument, that’s not really how the stats work, because the way it’s measured is to divide the field up into segments that each fielder is responsible for, regardless of positioning. So if the 3B is making plays in the SS zone, it will show up as a plus for the 3B and a minus for the SS.
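The zone bookkeeping described in the comment above can be sketched in a few lines of purely illustrative Python – the zone labels, responsibilities, and plays here are invented for the example, not actual UZR data:

```python
# Illustrative sketch of zone-based crediting: each zone on the field is
# assigned to one position, and outs/misses are charged against those fixed
# zones no matter where the fielders were actually standing.
# The zone labels and responsibilities here are hypothetical.

ZONE_RESPONSIBILITY = {
    "56-hole": "3B",   # between third and short
    "6": "SS",         # straight-up shortstop
    "4": "2B",         # straight-up second base
}

def credit_play(zone, fielder, made_out, ledger):
    """Record one ball in play against a ledger of plus/minus plays."""
    responsible = ZONE_RESPONSIBILITY[zone]
    if made_out:
        # whoever actually made the play gets the credit,
        # even if he ranged out of his own zone to do it
        ledger[fielder] = ledger.get(fielder, 0) + 1
    else:
        # a ball that gets through is a demerit for the zone's owner,
        # regardless of where he was positioned on the play
        ledger[responsible] = ledger.get(responsible, 0) - 1
    return ledger

ledger = {}
credit_play("6", "3B", True, ledger)    # 3B makes a play in the SS zone: plus for 3B
credit_play("6", "SS", False, ledger)   # ball through the SS zone: minus for SS
print(ledger)  # {'3B': 1, 'SS': -1}
```

That is the cross-charging effect in miniature: the 3B banks a plus for a ball in the shortstop’s zone, while the SS eats the minus when one gets through.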

But positioning is not part of a fielder’s inherent skill. It’s dependent on so many things – other defenders, the team’s defensive alignment to certain batters – and if UZR can’t completely account for that, it further diminishes its accuracy.

Thanks Andrew. I’d only add: If Jeter went from an obvious negative to an obvious positive, based on positioning, then what does that tell us about him as a player? That he doesn’t position himself properly?
What I meant about team use is that a team could see a low UZR and re-position the player. In effect, they can do an experiment and use the stat to judge its effect. As fans, we’re left to arguing over correlations, and comparatively weak ones. Say the Yankees see Teixeira’s low UZR and tell him to play closer to the line. If his UZR goes up, is that really attributable to Teixeira?
The same is true of Ellsbury. There’s no question the Sox think they see something in the numbers. But by moving him to a new position, they’ve changed the context so drastically, especially with the Monster, that they’ve basically started over on evaluating his defense.

“Why wouldn’t positioning be considered a skill?”
How do you separate that variable from something like range? When a player moves to a new stadium and a new team, what effects do you parcel out where?
This is why I think defensive metrics will always be lacking. But I am curious enough to see how the new cameras work out. As soon as the season-to-season correlations top .50 I’ll start to take notice. But we still have a long way to go.
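The season-to-season correlation being discussed is just a Pearson r computed across players’ year-over-year values. A minimal sketch, with made-up UZR numbers standing in for real ones:

```python
# Minimal sketch of the year-to-year reliability check: correlate a set of
# players' fielding metric in one season against the same players' metric
# the next season. The UZR values below are invented for illustration.
from statistics import mean

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

uzr_year1 = [12.0, -5.0, 3.0, 8.0, -10.0, 0.0]   # hypothetical player values
uzr_year2 = [-2.0, 4.0, 6.0, 1.0, -6.0, 9.0]     # same players, next season

r = pearson_r(uzr_year1, uzr_year2)
# r**2 is the share of next season's variance "explained" by this season:
# at the r = .30 mentioned earlier that's only 9%, and even at the .50
# threshold above it would still be just 25%
```

That r-squared arithmetic is why a .30 correlation reads as only “slightly helpful” – knowing this year’s number tells you little about next year’s.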

Correct me if I’m wrong, but UZR also doesn’t take into account many of the throws a player makes (or avoids making, if he’s not confident in his arm) over the course of a season. Clearly this should go into the analysis of the quality of a player’s defense, and it’s yet another reason why what one sees should at least be taken into account as well. Tex’s arm is one of the best facets of his defensive game (at least according to my lying eyes).
The main point here is that when talking about a player’s value to a team, it’s not always fair to use the defensive stats available to us in a purely quantitative way, especially in combination with the much more quantifiable offensive stats. I think it’s best to leave those types of numbers out of the analysis of value, or at least list them separately (as Paul did here).

“How do you separate that variable from something like range?”
You don’t have to. All that matters is whether you make the plays in the area designated to your position. Whether that’s down to your range, arm strength, familiarity with scouting reports/positioning, agility, or whatever your “toolset” is doesn’t matter in the end.

“All that matters is whether you make the plays in the area designated to your position.”
That seems overly simplistic, for the reason you cite. A fantastic SS will take plays from the 3B. But to give the 3B a demerit based on his teammate is exactly the problem. The skill becomes very, very context-based (so even more prone to yearly fluctuations). We heard for years how deficient Jeter was. Really, it was his positioning? That doesn’t seem to be the mark of a good statistic if it is so easily changed while the player is still the player he’s always been. For another example, look at A-Rod. He’s bounced from 10 runs up to 10 runs down in his time at 3B. So what’s the bottom line? The average says he’s about average, but with those extremes, what can we really conclude?
The one surprising bit to me is that outfield defense, which is mostly range, is even less reliable than infield defense because of the fewer chances. But given the vagaries of how batted balls travel, that’s really not so surprising. I hope the cameras will improve things, but that correlation seems so low I don’t think we’ll ever get there. Hopefully I’m wrong.

“whatever your “toolset” is doesn’t matter in the end.”
Doesn’t it matter though when making determinations about what players should be playing where? We’re seeing both the Sox and Yankees making decisions about their LF/CF. Don’t they at least want a better understanding before making a decision?

If you’re talking about where teams should decide to play guys, then yes – of course it does. But that goes for any sport’s scouting/player development. You always have to ask yourself, “Does this guy have the tools to play position X effectively?” Sometimes you have to decide whether that 6’4″ 270 pounder is a better outside linebacker or defensive end.
But when you’re talking about measuring how effective a player has been at a certain position, then not really, as long as a guy is making all the plays he needs to at his position.

But you said it: UZR doesn’t care about positioning. It only cares about predetermined vectors that are assigned to a STATIC location on the field, no matter where the fielder actually is placed by the team’s defensive coordinator.
If this is true, it’s a horrible, horrible way to measure true defensive performance, and it further undermines UZR’s ultimate validity as a viable defensive statistic.
Like PFW said, hopefully the cameras will shed some light on this, and we can develop a stat that does take positioning into account, so we can get a much truer representation of a fielder’s ability. This could also greatly help defensive coordinators fine-tune their plan to maximize the team defense. So we could have a positioning-insensitive stat, which would be more useful on a team level, and a positioning-sensitive stat, which would be more useful on an individual player level.
Hell, teams probably already have this at some rudimentary level.

Just to clarify – when someone says ‘makes all the plays he’s supposed to’, this has to take into account where exactly the team has placed him. If you tell your shortstop to play in, and a ball gets by him that he would have fielded had he been further back, then who’s at fault – the player or the defensive coordinator? Alternatively, if the coordinator tells Jeter to play more up the middle – and a ball gets hit to the first-base side of the field, and Jeter fields it – does he really deserve those extra points? And does Cano, who would be playing further towards first base than normal, really deserve those demerits? In other words, did that play truly tell us about the ability of the players in any real sense, or more about their positioning?
This is why we need something that takes into account where the fielder actually starts out when the ball is in play. Then all the calculations about ball trajectory/velocity and what an average player would accomplish on each can be used, only this time relative to where the player was placed.
And I’m pretty sure this is exactly what those ‘cameras’ are set out to accomplish.

Just by searching “UZR positioning” on Google, I found this (bolds mine):
http://www.baseballdigestdaily.com/bullpen/index.php?option=com_content&task=view&id=79&Itemid=26
“JH: Describe the role of fielder positioning in your model and what effect it has on results.
MGL: Good question. Like most of the models and metrics out there (PMR, ZR, Dewan’s Plus/Minus System, UZR, etc.), unique (where a particular fielder likes to play or is directed to play by his coaches, or even is forced to play due to the park) fielder positioning is inherent in the results. We don’t track (no data that I am aware of does) fielder positioning before a play evolves. So really the results of UZR and all of the similar methodologies, as far as I am aware, really measure range and positioning. If one fielder is better at positioning than another, then he will likely have a better UZR even if he has the same or worse range.
For example, they say that positioning is what made Ripken so good. I don’t know, although I have seen him play quite a bit (I don’t trust my eyes very much at all when it comes to evaluating baseball player talent – at least not fielding talent).
The metric cannot separate the two (range and positioning). There is no way for us to know where a fielder is playing based on the data. Actually, I take that back. It is theoretically possible to estimate a fielder’s average position from the scatter plot of the balls fielded and not fielded, but it would be a fairly vague inference and it wouldn’t change a fielder’s UZR anyway.
Most people, myself included, feel that a fielder’s positioning, good or bad, should be part of his skill set. Of course, what if a coach misdirects where a fielder should play, or he only plays in a sub-optimal location because some other adjacent fielder is exceptionally good or bad? That fielder would unfairly get shortchanged in his UZR results. The converse is also true to some extent. If a fielder were correctly playing in an unusual location because his pitching staff has an unusual distribution of BIP’s, his fielding skill would actually be overrated. Some people have suggested that is why I have Swisher rated so good in RF.”
So you see, it’s really pretty suspect to rely on any defensive metric to know a player’s true inherent ‘range’ ability, since range cannot be separated from positioning. There can be an argument over whether or not a player is responsible for his own positioning before a play (I really don’t think so – don’t all teams coordinate their defenses to a team plan? I don’t know of any player that decides his own positioning plan, or any team for that matter that simply lets its players do whatever. Well, maybe the Mets.), but what cannot be argued is that no defensive statistic today has the ability to judge a player’s actual range.

“Hell, teams probably already have this at some rudimentary level.”
The Sox absolutely have this in Fenway. I would love to know how much UZR was off on Ellsbury in each of 2008 and 2009. That they moved him for a player they couldn’t know as much about (since he didn’t play 81 games there) speaks pretty loudly. But they could just be searching under the lamppost and valuing their own stats too highly. I wonder if they’d swap Cameron and Ellsbury mid-season if their metrics aren’t high on Cameron either. Or do they wait until the end of the year? This for me is when it gets really interesting. We all tend to know now what hitters and pitchers will tend to do. But fielding is still black magic.

Correct me if I’m wrong, but UZR also doesn’t take into account many of the throws a player makes (or avoids making if not confident in his arm) in the course of a season.
To just discuss this factually, UZR does take into account a player’s arm, at least UZR as calculated on Fangraphs does.
Further, here is a list from the link I posted above of what UZR does factor in its calculations:
– the location of the batted ball,
– the trajectory of the batted ball,
– the speed of the batted ball,
– whether the batter was LH or RH,
– whether the pitcher has a tendency to give up FB or not,
– what park the player is in,
– the base/out situation
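A hedged sketch of how adjustments like the ones listed above might enter the calculation: a baseline out-probability for a batted ball gets shifted by context multipliers, and the gap between actual and expected outcomes is converted to runs. Every constant below is invented for illustration – these are not actual UZR parameters:

```python
# Illustrative sketch (NOT actual UZR math): start from a league-average
# out-probability for a given kind of batted ball, adjust it for context
# factors like the ones listed above, then compare to what the fielder did.
# All numbers here are made up for the example.

BASELINE_OUT_PROB = {"hard_grounder_ss_zone": 0.40}  # hypothetical league rate

# hypothetical context multipliers, standing in for the real adjustments
ADJUSTMENTS = {
    "batter_hand": {"LH": 1.05, "RH": 0.95},
    "park": {"Fenway": 0.97, "Yankee Stadium": 1.00},
}

def expected_out_prob(ball_type, batter_hand, park):
    """Chance an average fielder converts this ball, given the context."""
    p = BASELINE_OUT_PROB[ball_type]
    p *= ADJUSTMENTS["batter_hand"][batter_hand]
    p *= ADJUSTMENTS["park"][park]
    return min(p, 1.0)

def play_credit(made_out, expected_prob, runs_per_play=0.8):
    """Runs above/below average for one play: (actual - expected) * run value."""
    return ((1.0 if made_out else 0.0) - expected_prob) * runs_per_play

p = expected_out_prob("hard_grounder_ss_zone", "RH", "Fenway")
credit = play_credit(True, p)   # positive: the fielder beat expectation
miss = play_credit(False, p)    # negative: an average fielder makes it sometimes
```

Summing those per-play credits over a season is, roughly, what produces a runs-above-average figure – and every multiplier in the list is a place where measurement error can creep in.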

It’s odd that we’re discussing Jeter, given that, until this conversation, he was a case in which everything I’d heard and seen anecdotally — range is much better than the years when he was winning those much-derided Gold Gloves — was confirmed by UZR and the other defensive stats, which all show a stark improvement from well below average to average or slightly above. I’ve heard much less grousing about Jeter’s recent GG because everyone — both anecdotally AND statistically — seems to agree that he actually has transformed himself into a competent defender. Meanwhile, YFSF has linked to articles in which the Yankee front office spoke to Jeter about his defense, and in which Jeter has said he worked on it hard for basically the first time in his life (big-time paraphrase there).

What’s odder – a 35-year-old shortstop finally learning to field at above-average levels, or a statistic showing a positive contribution for the first time since the stat has been collected?
“which all show a stark improvement from well below average to average or slightly above”
It depends on the year. According to UZR, some years he was average (2002-04, 2008), some years he was well-below (2005-2007). The year that’s different is 2009 which was above average. In statistical terms, that’s called an outlier.
“I’ve heard much less grousing about Jeter’s recent GG because everyone — both anecdotally AND statistically — seems to agree that he actually has transformed himself into a competent defender.”
You only heard less because the people pushing the anti-Jeter case were relying exclusively on the stats to do so. As soon as those flipped, what were they going to say or do? They certainly weren’t going to throw out the stats.
“Jeter has said he worked on it hard for basically the first time in his life”
Exactly why that explanation is phony. Of any player in the game today I’m supposed to believe Derek Jeter wasn’t working hard his whole career? Mark had it right above. They moved him a step closer to 2B once Torre left. He’s got average range and doesn’t make mistakes. That’s who he’s always been.
Now that we’ve covered Jeter, what’s your explanation for the 40-run swing in Ellsbury’s defense in one year? I guess he just got lazy?

The elephant in the room with UZR is the fact that the cameras used to record each play are not located in the same place in each ballpark. That introduces a significant and unacceptable error and makes UZR borderline useless. It would be like recording the speed of an object with radar guns at different angles (another thing ballpark data sources do). Obviously, the collected data will be skewed.
