N8 Asks:
London's initial ranking of 600 seems a bit low.  I'm curious what your policy on beginning rankings is.  I know that some other leagues (Gotham, Oly, etc.) started at 700 or even 800 for their initial bouts.  It seems that it would affect all of the ratings changes in this weekend's UK bouts if London began at 700.  What do you think?


David Answers:
You are absolutely correct.  It does make a big difference for whoever gets to play an "unrated" team first.  The starting values were optimized to minimize this impact on a year-by-year basis (which is why you see the earliest teams starting at 800, then teams starting at 700, and now teams starting at 600).  The fact that the optimal starting value is consistently near the bottom of the ratings range reflects a general maturing of the league and the relative lack of experience of most incoming teams.

We did contemplate London’s impact when we were first coming out with FTS 2.0, and I'll admit we were all sitting around waiting to see how bad it would be this weekend.  [I had actually predicted that all teams would be down by almost 30 points, so the impact was less than I had feared.]  The thing to keep in mind, however, is how many Londons are out there: teams that have been playing for a while and are pretty good, but aren't WFTDA?  It was really important to us that we didn't start a habit of guessing a team's skills.  So instead, we accept the distortion that will occur in the system while London gets itself sorted out, in order to preserve the greater integrity of the automated system.  The cool thing about this algorithm is that, while the individual teams that played London will be down maybe 15 points from where they should be, once they play outside teams, they will get normalized to the external system almost immediately...so it really is a pretty quick ripple, it just feels bad for a couple of weeks.


Comments

Wouldn't it make more sense for the initial value to be determined based on the first game? In other words, London's initial rating would be based on Montreal's incoming rating and the resulting score. Doing it this way would mean zero impact to the team who plays a new team first, and a theoretically more accurate initial score because it's not set arbitrarily.
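
A minimal sketch of this suggestion, for illustration only. The thread does not give the actual FTS formula, so this assumes an Elo-style logistic curve on each team's share of the total points; `SCALE`, `expected_share`, `seed_rating`, and the clamp limits are all hypothetical names and values, and the ratings and scores in the example are made up.

```python
import math

# Assumption: Elo-style scale, where a 400-point gap means 10:1 expected odds.
# The real FTS parameters are not stated in this thread.
SCALE = 400.0


def expected_share(rating_a, rating_b):
    """Expected share of total points for team A under the assumed logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / SCALE))


def seed_rating(veteran_rating, new_points, veteran_points):
    """Pick the new team's rating so the observed score equals the expected one.

    This inverts expected_share: the veteran team's rating is left untouched,
    and the new team gets whatever rating would have predicted the result.
    """
    share = new_points / (new_points + veteran_points)
    # Clamp so a shutout does not produce an infinite rating (arbitrary limits).
    share = min(max(share, 0.01), 0.99)
    return veteran_rating + SCALE * math.log10(share / (1.0 - share))


# Made-up example: a veteran team rated 800 beats a new team 150 to 50.
new_rating = seed_rating(800.0, new_points=50, veteran_points=150)
print(round(new_rating, 1))
```

By construction, feeding the seeded rating back into `expected_share` reproduces the observed share, which is exactly why the veteran team's rating would not move after that first bout.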

That's a really interesting idea. I don't think I could generalize that all the way back to the beginning, but it might be workable starting with 2010 or 2011.

I'll have to think about this some more. My first reaction is that doing this would eliminate any repercussions for the mature team. In other words, they would not suffer from a poor performance, nor be rewarded for a good performance. Something about that seems bad to me. However, the underlying assumption for this method would be that the variation of the "mature" team would be less than the error on a fixed starting rank. It seems reasonable, but I'm not sure if it's true. I'll go see if I can't look at some distributions in the next couple of days and let you know what I find.

Thanks for the great idea!

Oh, but the epistemology is the best part. You actually don't know if the veteran team played well or poorly because their opponent hasn't played anyone else before. It could have been the best performance that team has ever brought to the game of flat track, but you wouldn't be able to tell because they gave that performance to an incomparable team. For that first game, the veteran team is both the best and worst opponent the newbie team has ever played. And at the end of the game they can only be judged based on their single performance with that veteran team, because that's the only relevant information available.

Yeah, I get that...but I tend to think about the distribution of a team's performance more in an absolute sense...I'll probably expand on that more at a later time as part of another topic. But back to the question at hand, I was able to look at some interesting distributions last night. It turns out (probably not really a surprise to anyone) that the new teams really are all over the map and not clustered close to their initial rating. So inverting the expectation calculation would definitely be a better way to go. Unfortunately, when I attempted a quick test, the overall quality of the new algorithm was worse than the one we're using now (measured by the difference between expected and actual results for all bouts). What that means to me is that I've optimized all of these parameters in a way that they are supporting each other in the aggregate, even if there may be momentary anomalies. So for now, I think it's best to leave the system as it is. But I really like the idea and I'll keep playing around with it to see if I can figure out how best to make the whole thing work together. Hopefully I can have this ready to roll out for the 2012 season. That will also give me time to figure out what to do when two new teams play each other first.
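
For readers wondering what "measured by the difference between expected and actual results for all bouts" could look like, here is a rough sketch. It assumes the difference is summarized as a mean absolute error over (expected, actual) pairs; the thread does not say which summary is actually used, the function name is hypothetical, and the numbers are invented purely to show the comparison.

```python
def mean_abs_error(bouts):
    """Average absolute gap between expected and actual results.

    bouts: list of (expected, actual) pairs on the same scale,
    for example each team's expected and actual share of points.
    """
    errors = [abs(expected - actual) for expected, actual in bouts]
    return sum(errors) / len(errors)


# Invented results for the same three bouts under two candidate algorithms.
current_algorithm = [(0.60, 0.55), (0.70, 0.80), (0.50, 0.45)]
seeded_algorithm = [(0.60, 0.40), (0.70, 0.85), (0.50, 0.65)]

# The algorithm with the lower error predicts the full set of bouts better.
if mean_abs_error(current_algorithm) < mean_abs_error(seeded_algorithm):
    print("current algorithm fits these bouts better")
else:
    print("seeded algorithm fits these bouts better")
```

A single aggregate like this can get worse even when one piece of the system improves, which is the situation described above: the other parameters were tuned around the fixed starting ratings.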

One thing I remember disliking with the old model was that it seemed to assume newer/unranked teams were as bad as the worst active team on the list. Then came Oly Rollers, who threw a far bigger wrench into that system than London did to this one.

Each and every team they beat in 2009 took what I recall being a precipitous fall in the rankings. And it took the system ages to start taking the team seriously enough.

Teams that suddenly become powerhouses (RMRG in 2009-2010) and teams that suddenly lose the wind from their sails (like all the teams RMRG's new stars left to join their 5280 Fight Club) are going to do a number on any ranking system.

As do unpredictable teams. I'll decline to name names, but there are certain teams that really seem to outdo themselves one weekend, and fare terribly two weeks later. There are lots of reasons, like lineup changes, budgetary issues, injuries, etc.

Teams that make unexpected changes or are unpredictable not only have their own rankings jerking all over the place, they add misleading results to their opponents' track records. It all tends to average out in the end.

Even this new and improved system seems to think that Gotham plays better than Rocky Mountain Rollergirls and Oly Rollers. Which history suggests is probably not the case.

I agree. The previous system had some definite weaknesses. As you say, any team that's out of place will cause trouble for any ranking system, but this new one is much more robust. Take your example of Oly in 2009. Their first 3 opponents suffered the worst damage, and yet Slaughter County and Denver were back to their normal ratings after the very next bout and Rose returned to its previous rating after two bouts. So this system keeps the fluctuations to a minimum.

Regarding Gotham, Rocky and Oly, I would only comment that your sense of history is limited to two games (the 2009 championship between Gotham and Oly and the 2010 semifinal between Gotham and Rocky). Gotham has been playing at a consistently high level since 2008, while both Oly and Rocky have demonstrated exceptional growth over the last year and a half. Wouldn't it be great to see these teams play each other 3 or 4 times during the season so that we could really see who's consistently better?