Reinventing the Multivariate Hypergeometric Wheel

On Monday morning, instead of packing (something that I’ve been saying a lot lately), I was researching strategies for Magic: The Gathering sealed deck tournaments, since I’m going to one on Saturday, and I generally suck at Magic, at least when it comes to the whole winning part. I ended up fascinated by the certainty with which experts break down how many lands you should have in a 40-card sealed deck, and got to thinking about the probabilities here.

In particular, I was interested in figuring out how many lands I should have for my third “splash” color, which is the color for which you only put a couple of cards into your deck to round it out, because you don’t quite have enough good material in your two main colors. The danger of having only a couple of cards of a color is that, while you won’t draw them frequently, you need the right lands in your deck so that you can play those cards when you do happen to draw them, and those same lands can be frustrating to draw when you really need lands for your main colors. So you need to find a balance. But where is the balance?

I broke it down for myself to these questions:

  1. What is the probability that I will draw a splash color card in the first ten cards of my deck?
  2. Given that I have drawn a splash color card in the first ten cards, what is the probability that I will also have drawn a land for that color?

Of course there are two very important quantities that are left unstated in these questions: how many splash color cards do you have, and how many splash color lands, in your deck? (It should be assumed that the deck size itself is only 40 cards, which is the minimum and optimal number to have in a sealed deck tournament.)
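
For anyone who wants to play along, the first question can be sketched in a few lines of Python. This is only a minimal illustration, not the function I actually wrote: the name `p_at_least_one` and the example of 3 splash color cards are assumptions made up for the sketch. It counts the ten-card draws that contain no splash color cards at all and takes the complement.

```python
from math import comb

def p_at_least_one(deck_size, hits, drawn):
    """Chance that at least one of `hits` marked cards is among the first `drawn` cards."""
    # Count the draws that contain none of the marked cards, then take the complement.
    p_none = comb(deck_size - hits, drawn) / comb(deck_size, drawn)
    return 1 - p_none

# Hypothetical example: 3 splash color cards in a 40-card deck, first 10 cards seen.
print(round(p_at_least_one(40, 3, 10), 3))  # 0.589
```

With those made-up numbers, a splash color card shows up in the first ten cards a bit less than 60% of the time.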

Trying not to look up the answer online (though I was fairly certain it was there somewhere if I sifted through enough garbage, and there is a lot of garbage probability written by overzealous gamers who think they know a little math), I set about creating my own probability model for the first question. I came up with a good function that would answer my question for any number of splash color cards in a deck of any size, given any number of cards drawn.

I then set upon the second question, but got stuck, and asked the internet what it thought. Someone on the internet said that they used the hypergeometric distribution to calculate the probabilities. I slapped my forehead, because I had just reviewed the hypergeometric distribution a few days earlier, but was pretty pleased with myself that my function seemed to be the hypergeometric distribution function. In other words, I had independently re-invented the hypergeometric distribution function. Or so I thought.

As I continued struggling to expand my function to work with two kinds of “successes” (in the language of probability; here it just means that I’m looking for both splash color spells and splash color lands in my first ten cards), I realized that my function was not going to accept a second kind of success gracefully, at least not while I stayed confident that its construction of desired outcomes over possible outcomes was correct.

So I created a second function that equivalently modeled the first problem, and even proved to myself that my two functions set equal to each other formed an algebraic identity. I then looked back at the Wikipedia article on hypergeometric distributions and realized that my first function was not the hypergeometric distribution; my second function however was. It seems that I re-invented the hypergeometric distribution function twice.
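
To give a flavor of what “two functions that turn out to be the same thing” can look like, here is a hedged Python sketch. It is not a reconstruction of my actual two functions; both function names and the 3-cards-in-40 example are assumptions for illustration. One version multiplies the chances of missing on each successive draw, the other counts combinations the way the hypergeometric distribution does, and exact fractions show that they agree.

```python
from fractions import Fraction
from math import comb

def p_none_sequential(deck_size, hits, drawn):
    """Chance of drawing no marked cards, built up one draw at a time."""
    p = Fraction(1)
    for i in range(drawn):
        # After i misses, deck_size - i cards remain, of which deck_size - hits - i are unmarked.
        p *= Fraction(deck_size - hits - i, deck_size - i)
    return p

def p_none_counting(deck_size, hits, drawn):
    """Same chance, as (draws with no marked cards) / (all possible draws)."""
    return Fraction(comb(deck_size - hits, drawn), comb(deck_size, drawn))

# Hypothetical example: 3 marked cards in a 40-card deck, 10 cards drawn.
print(p_none_sequential(40, 3, 10) == p_none_counting(40, 3, 10))  # True
```

Using `Fraction` instead of floats keeps the comparison exact, so the equality check is a real identity check rather than a rounding coincidence.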

My second formulation (the hypergeometric distribution function) ended up being much easier to extend to two types of successes, and I had fun plugging in values for the number of splash color spells, splash color lands, and draws, and seeing the probability that a splash color card would come up along with at least one splash color land. And I thought to myself that this problem wasn’t so difficult to tackle once a hypergeometric distribution function was established, and there are lots of situations when you’re looking for more than one type of success; surely someone else has done this before.

I scrolled down a few sections in the Wikipedia article, and, sure enough, there’s the multivariate hypergeometric distribution function, stated exactly the way I had created it to answer my second question.
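
Here is a rough Python sketch of how the second question can be answered with the multivariate hypergeometric distribution. The function names, the three-way split of the deck (splash spells, splash lands, everything else), and the example counts are all assumptions for illustration, not a transcript of what I actually typed. It sums the probability of every draw with at least one splash spell and at least one splash land, then divides by the chance of seeing a splash spell at all.

```python
from math import comb

def multivariate_hypergeom_pmf(counts, draws):
    """P(drawing exactly draws[i] cards of each type), given counts[i] of each type in the deck."""
    ways = 1
    for in_deck, wanted in zip(counts, draws):
        ways *= comb(in_deck, wanted)  # comb returns 0 when wanted > in_deck
    return ways / comb(sum(counts), sum(draws))

def p_spell_and_land(deck_size, spells, lands, drawn):
    """Chance that the first `drawn` cards hold at least one splash spell AND one splash land."""
    other = deck_size - spells - lands
    total = 0.0
    for x in range(1, spells + 1):
        for y in range(1, lands + 1):
            z = drawn - x - y  # cards that are neither splash spells nor splash lands
            if z < 0:
                continue
            total += multivariate_hypergeom_pmf((spells, lands, other), (x, y, z))
    return total

def p_land_given_spell(deck_size, spells, lands, drawn):
    """Chance of having a splash land, given at least one splash spell was drawn."""
    p_spell = 1 - comb(deck_size - spells, drawn) / comb(deck_size, drawn)
    return p_spell_and_land(deck_size, spells, lands, drawn) / p_spell

# Hypothetical example: 3 splash spells and 3 splash lands in a 40-card deck, 10 cards seen.
print(p_land_given_spell(40, 3, 3, 10))
```

Changing the number of splash lands in the last line is a quick way to see how much each extra land buys you.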

One might say that I wasted two hours, because these functions were (literally) right in front of me on my computer screen the whole time if I just took the time to read the Wikipedia page more thoroughly. But I take some satisfaction and pride in knowing that I didn’t need to refer to someone else’s work. I set a challenge for myself, and I answered it on my own.

And this is why I’m bad at competitive Magic: I don’t like stealing other people’s strategy ideas, and in a constantly evolving, knowledge-intensive game like Magic, that’s not an ethic that is compatible with success. Maybe someday I’ll get over that.
