Talk:Birthday problem/Archive 2

This is an archive of past discussions about Birthday problem. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Solution for "near matches"

Hi,

Can anyone point me to a solution for the "near matches" version (i.e. birthdays within one week of each other)?

Thanks, nyenyec ☎ 17:28, 10 November 2007 (UTC)

Ok. It's actually rather trivial if you just say "born in the same week" and make the number 52. If you want "within one week", the problem becomes "born in the same fortnight" (26). —Preceding unsigned comment added by 99.33.92.113 (talk) 10:55, 17 November 2009 (UTC)

Understanding the problem, not

The section #Understanding the problem is rather confusing to me, and I'm not a total math idiot, although I am a math idiot. In particular, I can't make much sense of the following paragraphs:

The actual birthday problem asks whether any of the 23 people have a matching birthday with any of the others — not one in particular. (See "#Same birthday as you" below for an analysis of this much less surprising alternative problem.)

The section #Same birthday as you appears to contradict this, and even if not, it doesn't help understand the problem.

Since in every group of 23 people there are 23*22/2=253 pairs, which is more than half of the number of days in the year, the chance that one of these pairs has a matching birthday is not small. For 28 people, the number of pairs exceeds the number of days, and the probability of matching is considerably greater.

"not small"? How great exactly?

20*19/2=190 pairs is also more than half of the number of days in the year. Why then is exactly 23 used? Should be explained.

28? Where does that number come into play?

"considerably greater"? How great exactly? How is this relevant to understanding the problem?

I dorftrottel I talk I 03:22, December 3, 2007

Both sentences you quote are attempts to make the computational result, which follows from a rather complicated analysis, less counter-intuitive. In a group of 23, the chance that someone (anyone in the group) has the same birthday as a previously identified person is low, but the chance that someone (anyone in the group) has the same birthday as someone else (anyone else in the group) is higher. The second sentence makes this somewhat quantitative, though still less quantitative (and less complicated) than the exact calculation. If, in a list of 23 persons, you compare the birthday of the first person on the list to the others, you have 22 chances of success, but if you compare each person to every other, you have 253 chances. (Why 253? Well, you can compare each of the 23 to each of the other 22, yielding 23*22 comparisons, but then you'll have made every comparison twice (comparing A to B and comparing B to A), so we divide by two: 23*22/2=253.) Each of these many chances is small, because there are 365 days in a year. In a group of 28, there are 28*27/2=378 pairs, which exceeds 365. It is thus not so surprising that you have a fair chance of at least one success.

I think this is a good point to make in the article; possibly it could and should be made more clear. If you understand it now, you may be the right one to improve on it!--Niels Ø (noe) 08:27, 3 December 2007 (UTC)

Thanks a lot for the explanations! I'm not confident enough (yet) to edit the article, but I'll do some more reading and get back to it later. I dorftrottel I talk I 07:45, December 4, 2007

This explanation can also cause misunderstandings.

In a list of 23 persons, if you compare the birthday of the first person on the list to the others, you have 22 chances of success, but if you compare each to the others, you have 253 chances. This is because in a group of 23 people there are 23*22/2=253 pairs, which is more than half of the number of days in the year. So the chance that one of these pairs has a matching birthday is not small.

The numbers are correct, but arguing that the number of pairs is more than half of the number of days in the year would also fit for just 22 people! Furthermore some readers might be inclined to conclude that if the number of pairs exceeds the number of days we would have a sure hit. Maybe it should be pointed out in the main article that this isn't so. --Stevemiller (talk) 17:55, 11 March 2008 (UTC)

If you have 253 pairs of dates (people), it doesn't intuitively make sense that the odds are better than 50% that one of those pairs matches considering there are 132,860 possible pairs of dates which do not match, and only 365 that do match. 71.108.195.111 (talk) 07:12, 20 July 2008 (UTC)

Reordering and clarification

It seems to me the Halmos argument section should come before the approximations, since it follows from the discussion in the first paragraph of "Understanding the problem".

It's also not clear how the Poisson approximation is derived. What does the Binomial distribution have to do with the problem at hand? —Preceding unsigned comment added by Justin.mauger (talk • contribs) 19:53, 4 January 2008 (UTC)

Making use of the Poisson approximation with the information given is impossible. This section needs to be improved, and it needs to be explained how we get to Pois(C(n,2)/365) being a meaningful thing. This could be extraordinarily useful for computing with large values, but as given there is not enough information to make use of it. — Preceding unsigned comment added by 192.80.55.241 (talk) 21:30, 19 June 2012 (UTC)
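One common reading of the Poisson approximation, offered here as a sketch and not as the article's own derivation: each of the C(n,2) pairs matches with probability 1/365, so the expected number of matching pairs is λ = C(n,2)/365. If the number of matching pairs is treated as approximately Poisson with that mean, then P(no match) ≈ e^(−λ). The function name below is made up for illustration.

```python
from math import comb, exp

def poisson_match_estimate(n, days=365):
    """Approximate P(at least one shared birthday) among n people.

    Assumption: the number of matching pairs is treated as Poisson
    with mean lam = C(n, 2) / days, so P(no match) ~ exp(-lam).
    """
    lam = comb(n, 2) / days
    return 1 - exp(-lam)

print(round(poisson_match_estimate(23), 4))
```

For n = 23 the estimate comes out at about 0.500, against an exact value of about 0.507, so it is an approximation and not a replacement for the exact product formula.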

Extremely confusing, text butchered?

Something must've been removed, since you go straight on to the approximations without actually stating the problem mathematically. Am I missing something? —Preceding unsigned comment added by 85.228.97.83 (talk) 21:54, 7 January 2008 (UTC)

It did indeed get butchered, by this anonymous edit and failure to properly revert afterwards. I believe I've just repaired it. —Keenan Pepper 03:33, 24 January 2008 (UTC)

At one point in the section you restored, the text says "It is easier to first calculate the probability p(n) that all n birthdays are different." Why are we bothering with an assumption that all birthdays are different? As unlikely as it is, they could all be the same day. They each could be any day. This section should be deleted if it isn't seen to be useful to the reader. At the very least, it should be made more plain that this "easier" method is just a step toward understanding the actual problem. Binksternet (talk) 04:18, 24 January 2008 (UTC)

Um, I don't think you understand what it says. It's not assuming that all birthdays are different. If you assumed all birthdays are different, then no two birthdays could be the same, so the probability of any two birthdays being the same is zero, and there's nothing to calculate. It's calculating the probability that all the birthdays are different, and calling this probability $\bar{p}(n)$. Then the probability that any two birthdays are the same is $p(n)=1-\bar{p}(n)$, because "all birthdays are different" is the logical negation of "at least two birthdays are the same". This is not an "easier" method and does not make any simplifying assumptions. It calculates the exact answer. —Keenan Pepper 05:20, 24 January 2008 (UTC)
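The complement argument above can be checked numerically. This is a minimal sketch assuming 365 equally likely birthdays and ignoring leap years; the function names are illustrative only.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays."""
    prob = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        prob *= (days - i) / days
    return prob

def prob_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    return 1.0 - prob_all_different(n, days)

for n in (22, 23, 57, 366):
    print(n, prob_shared(n))
```

Nothing is assumed about the birthdays being different: the first function computes how likely that event is, and the second subtracts it from 1.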

It seems this issue has been solved, and the text is no longer very confusing, so I'm removing the tag. If someone still finds it hard to read, please add the tag back. --Storkk (talk) 11:58, 1 February 2008 (UTC)

Pool of possible birthdays stays 365

Some of the article text and prose references assumed that if you document one person's birthday then for the next person you document you could expect their birthday to come from a pool of birthdays reduced by one. However, the pool of possible birthdays is always the same (in the simplified problem format presented here.) Every single new person you see will always have 365 (actually 365.24219) possible birthdays. Binksternet (talk) 23:26, 21 January 2008 (UTC)

The pool of birthdays that you need to hit in order not to have a match is reduced by 1. --Storkk (talk) 13:34, 3 February 2008 (UTC)

Why? Binksternet (talk) 17:28, 3 February 2008 (UTC)

Because you're trying to not have a match, meaning that if you were to hit the same birthday as someone else, you would have a match, so there's one less possibility each time. For example if the first person was born on Jan 1, in order to not have a match you'd need to be born on a day other than Jan 1, which, assuming we're not talking about leap years, is 364 possible days. 131.247.152.4 (talk) 14:57, 5 May 2008 (UTC)

Seasonal peaks and troughs

This whole thing works only in theory, which the article does not address. There are seasonal peaks and troughs that vary geographically, which alter the real-life probabilities discussed in this article. For example, there is a peak of birthdays in April/May in Canada, and a trough in December/January. So the probability of any given birthday falling on a particular date is not 1/365. Am I wrong here? 76.71.33.251 (talk) 02:33, 20 February 2008 (UTC)

Never mind... I see that the article indeed addresses this. Thanks! 76.71.33.251 (talk) 02:39, 20 February 2008 (UTC)

Partition Problem problem

The text for this section says "...If there are only two or three weights, the answer is very clearly no..." But surely weights of 1,3 & 4 grams will balance - 1,3 vs 4 ? --83.105.33.91 (talk) 12:16, 17 March 2008 (UTC)

You don't understand. It says each weight is an integer number of grams randomly chosen between one gram and one million grams. The question is whether it's possible for a majority of the cases. I'll try to clarify the article. —Keenan Pepper 02:00, 18 March 2008 (UTC)

Birthday Pair Problem

I am comparing this article to page 322 in Applied Cryptography by Bruce Schneier, and either the book and my math are wrong or the following line is incorrect.

"...23 people there are 23*22/2=253 pairs"

It should read "...23 people there are 23*22=506 pairs." Perhaps the division by 2 is present due to pair AB being equivalent to pair BA. If that is the case the article should be more specific because mathematically there are 506 and not 253 pairs. —Preceding unsigned comment added by Rabunbike (talk • contribs) 18:27, 26 March 2008 (UTC)

You have 506 ordered pairs and 253 pairs without a need to order. You are not doing the test twice on each pair, but once on each pair of people. That is why 253 pairs (no order) are correct for this problem. -- 87.162.75.104 (talk) 10:26, 19 April 2008 (UTC)
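The distinction between ordered and unordered pairs is easy to see by enumeration. A small sketch, with people represented simply by the numbers 0 to 22:

```python
from itertools import combinations, permutations

people = range(23)
ordered = list(permutations(people, 2))    # (A, B) and (B, A) both counted
unordered = list(combinations(people, 2))  # each pair of people counted once

print(len(ordered), len(unordered))
```

Since "A shares a birthday with B" is the same test as "B shares a birthday with A", the unordered count is the one that matters for this problem.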

I agree to the fact that the division by 2 should be explained briefly in the article to help understand the calculation. Florentnicoulaud (talk) 15:11, 17 July 2008 (UTC)

Presidents of the United States

We have thus far had 42 Presidents of the United States and have yet to have a matching birthday pair. The odds that two of our Presidents will share a birthday are rapidly (relatively speaking) approaching 1.0, are they not? —Preceding unsigned comment added by NCJRB (talk • contribs) 01:34, 5 May 2008 (UTC)

That's simply not true. James Polk and Warren G. Harding share the birthday November 2. It is probably fairly unlikely that there would be only one such match (other than the obvious fact that Grover Cleveland shares a birthday with himself), but still there's one. There are also considerably more matching pairs in their death days. Eebster the Great (talk) 03:04, 7 May 2008 (UTC)

Same birthday as you

In the graph next to the same birthday as you section, shouldn't q(n) = 1 when n = 365 (assuming n is the number of people other than you) or n = 366 (assuming n is the number of people including you) ? —Preceding unsigned comment added by 131.247.152.4 (talk) 15:00, 5 May 2008 (UTC)

No, it seemed that way to me at first, too, but that would only be the case if nobody else shared a birthday. You can imagine a group of you and 200 million other people, but all 200 million of the other people were born January 1 and you were born June 30. However, it's rather surprising how slowly that probability q(n) grows. Eebster the Great (talk) 03:04, 7 May 2008 (UTC)
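How slowly q(n) grows can be seen from a short calculation. This sketch assumes n counts the people other than you, with 365 equally likely birthdays, so that each person independently misses your birthday with probability 364/365; the function names are illustrative.

```python
def prob_shares_yours(n, days=365):
    """P(at least one of n other people has your birthday)."""
    return 1 - ((days - 1) / days) ** n

def people_needed(target=0.5, days=365):
    """Smallest n with prob_shares_yours(n) above target."""
    n = 0
    while prob_shares_yours(n, days) <= target:
        n += 1
    return n

print(people_needed())
print(prob_shares_yours(365))
```

Under these assumptions q(n) never reaches 1 for any finite n; it only approaches it, and even with 365 other people it is still below two thirds.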

Thought I had a grasp on this but I don't.

Alright, I read the section that said 23 different people means 253 different pairs, which made sense to me. However, I was doing the calculations upwards, and you end up with more than 365 different pairs with only 28 or so people. So why isn't the probability a hundred percent at that point? —Preceding unsigned comment added by 69.62.140.50 (talk • contribs) 03:17, 8 May 2008 (UTC)

You're comparing apples and oranges — number of pairs with number of days. It's possible to have a group of 365 people, one having each possible birthday (not counting 29 February); in that case you'd have 66,430 pairs of people, but each pair having two distinct birthdays. The "253 different pairs" thing is a bit of an intuitive hint that even with only 23 people, there are more possibilities than it might seem at first, but it's not a valid way to determine what the actual probability is of a match. (The problem is that the pairs aren't independent; if A and B share a birthday, and B and C share a birthday, then A and C must share a birthday, and conversely, if A and B don't share a birthday, and B and C don't share a birthday, then there's a slightly elevated chance that A and C do share a birthday.) I think this part of the article needs to be tweaked a bit, as you're not the first person to come out of it with that same question. —Ruakh_TALK 04:08, 8 May 2008 (UTC)
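To illustrate that the pair count and the probability are different quantities, here is a sketch that prints both side by side. It assumes 365 equally likely birthdays, and the function name is made up for this example.

```python
from math import comb, perm

def pairs_and_probability(n, days=365):
    """Return (number of pairs, exact P(at least one shared birthday))."""
    pairs = comb(n, 2)
    # perm(days, n) counts the ways for n people to have distinct birthdays.
    p_match = 1 - perm(days, n) / days ** n
    return pairs, p_match

for n in (23, 28, 40):
    print(n, *pairs_and_probability(n))
```

At n = 28 the number of pairs (378) already exceeds 365, yet the probability of a match is only about 65%, which is consistent with the point that the pairs are not independent trials that "use up" the days.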

same birthday as you

Regarding this edit [1]. I think the old explanation was better. "Note that this number is significantly higher than 365/2 = 182.5: the reason is that it is likely that there are some birthday matches among the other people in the room." That makes sense because the reason you need so many people is that their birthdays overlap, and take up fewer days than people. Cretog8 (talk) 19:43, 28 June 2008 (UTC)

Notability?

Perhaps I'm being an idiot but in what way is the topic of this article notable? It seems like just a randomly chosen application of elementary probability theory. 196.41.124.8 (talk) 18:42, 10 July 2008 (UTC)

It might not be important, but it is notable. It's very popular as an illustration of probability concepts. Do a web search, and you'll probably be surprised by how much it comes up. Cretog8 (talk) 19:00, 10 July 2008 (UTC)

Clearer definition of "birthday"?

I scanned through the comments, but didn't see my question addressed (sorry if I missed it), but here it is, in two parts:

1. Is it likely that people unfamiliar with this particular "problem"/concept will read this article? And if the answer is "yes," then

2. Do you think that such a person will wonder whether "birthday" means just the day and the month, or the whole day/month/year set?

The reason I ask this question is because it was the very first question I asked when I began to read the article, and I found that, since that information wasn't specified right at the beginning of the article, I wasn't able to get really absorbed in the potential fantasticness of the rest of the narrative related to the phenomenon.

I'm just saying. So can someone just pop the answer in the beginning of the article somewhere? Real quick? XO -- Sugarbat (talk) 02:21, 13 July 2008 (UTC)

If you click the link for birthday, you get, rather unambiguously, the appropriate definition. But it might be worth clarifying. The simplest I could think of would be to add one word, "annual", before "birthday" in the first sentence. --pfunk42 (talk) 02:20, 14 July 2008 (UTC)

Actually, I have exactly the same problem with the opening definition of "birthday" in its own article -- and you reiterate that problem in your last sentence. :) And as a matter of fact, only part of that definition (of "birthday") is correct - since "anniversary," by definition, implies that there has been a period of time since a particular event -- a period marked by the recurrence of a point in time (subsequent to the event) that matches a similarly named point in time at which the event occurred (i.e., one-week anniversary [Tuesday/Tuesday], two-minute anniversary [120 seconds/120 seconds], 100th-century anniversary [C/C or M/M], etc.). However, "birthday" can mean either the event *or* the "anniversary" of that event (someone's birth). Ergo a birthday can be literally the day, month, and year you were born (the actual date of your birth), or just the day and the month (the "anniversary" of your birth). That's why this article, here, is confusing, and it's also why the "birthday" definition in that article is confusing. I don't think anyone's complained, there, just because, there, I don't think differentiating matters as much as it does in this article/with regard to this concept. Right? Sugarbat (talk) 04:51, 19 July 2008 (UTC)

"Two-week anniversary" is a misnomer. An anniversary, by definition, recurs annually. Notice the letters "ann-" at the beginnings of both words: "anniversary", "annual". That's not a coincidence. Michael Hardy (talk) 11:41, 19 July 2008 (UTC)

That's interesting, but doesn't quite apply to the topic. Also, although you're (partly) right about the literal etymology, you're a little off w/r/t the conventional use/definition of the word "anniversary" in English. That's one reason it's called "English" and not "Latin." ;) Sugarbat (talk) 01:47, 31 July 2008 (UTC)

Another Birthday Paradox

In the operetta The Pirates of Penzance the birthday on February 29th of Frederic the pirate apprentice leads to this exchange about a resulting birthday paradox:

FRED. How quaint the ways of Paradox!

At common sense she gaily mocks!

Though counting in the usual way,

Years twenty-one I've been alive,

Yet, reckoning by my natal day,

I am a little boy of five!

RUTH and KING. He is a little boy of five! Ha! ha! ha!

ALL. A paradox, a paradox,

A most ingenious paradox!

This fictional birthday paradox is different from the "birthday paradox" named in the article and is certainly notable. Cuddlyable3 (talk) 21:59, 11 August 2008 (UTC)

Partition Problem Error

The section on the partition problem seems to be clearly incorrect, or at least to use a confusing metaphor. It states that the problem involves moving weights on a scale between the pans in order to balance the scale, and claims that "if there are two or three weights, the answer is clearly no" but this is not true; if a 5 and 10 weight are on the left pan, and a 5 weight is on the right pan, you can move the 5 from the left to the right and balance the scale with three weights. Is there some detail missing that would exclude this case? —Preceding unsigned comment added by Hyphz (talk • contribs) 11:19, 8 September 2008 (UTC)

Paragraph removed

I removed this paragraph:

It is easier to figure the probability that the birthdays will be different, such as: with one person they have 365 opportunities to have a different birthday. The second person only has 364 possibilities to have a different birthday than the first person. The third person has 363 days, and so on. Thus when the group reaches 366 a clash is inevitable — all the days will have been used up (except for leap years of course).

First off, how can the author know what's easier for a specific reader? Secondly, the number 366 amply takes care of leap years. More important, though, is the way that this is presented: it's not explained up front that the goal of this 'easier' mental exercise is to have a 'hit' between any two persons out of a collection of people in a group rather than the question of whether somebody in the room shares, say, your birthday. The exercise isn't set up adequately. Binksternet (talk) 12:57, 6 December 2008 (UTC)

Extra parentheses?

Why do we need the extra parentheses around the scientific numeral? Is not the order of operations sufficient to just put (100 − 3×10^−129)%, for example?— trlkly 20:16, 22 December 2008 (UTC)

Suggestion for addition to the "Generalizations" section

Hi -

I have a suggestion for anyone who is particularly probability-inclined - if anyone would like to add a section on how to calculate the probability of having not just a pair of birthdays, but a triplet of birthdays, a quadruplet of birthdays, and so on, on the same day, I would be very interested in reading about it. I could not figure out how to do this on my own as I realized that the probabilities for each unique set of birthdays are unfortunately not all independent.

The function would look something like this:

P(n,k) = probability that out of n people, at least one set of k people share a birthday

This whole article is simply about P(n,2), and I would be interested to see a generalization of that if possible. —Preceding unsigned comment added by Face Kicker (talk • contribs) 08:02, 25 December 2008 (UTC)
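One possible way to compute P(n,k) exactly, sketched here under the assumption of equally likely days: count the assignments of n people to days in which no day receives k or more people, using a truncated exponential generating function, and take the complement. The function name and the `days` parameter are illustrative, and this is not taken from the article.

```python
from fractions import Fraction
from math import factorial

def prob_k_share(n, k, days=365):
    """Probability that at least k of n people share a birthday (uniform days)."""
    # Truncated exponential series: a single day may hold 0..k-1 people.
    base = [Fraction(1, factorial(j)) for j in range(k)]
    poly = [Fraction(1)] + [Fraction(0)] * n  # the polynomial "1", kept to degree n
    for _ in range(days):
        new = [Fraction(0)] * (n + 1)
        for i, a in enumerate(poly):
            if a:
                for j, b in enumerate(base):
                    if i + j <= n:
                        new[i + j] += a * b
        poly = new
    # n! * [x^n] counts assignments in which no day holds k or more people.
    no_k_match = poly[n] * factorial(n) / Fraction(days) ** n
    return 1 - no_k_match

print(float(prob_k_share(23, 2)))
```

With k = 2 this reduces to the ordinary birthday problem, which gives a convenient sanity check before trying triplets or quadruplets.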

Error in the calculations

I can't believe no one saw this by now. There's a slight mistake in one of the equations. —Preceding unsigned comment added by 89.123.197.192 (talk) 00:15, 13 February 2009 (UTC)

Do tell us about it. Cuddlyable3 (talk) 22:30, 12 June 2009 (UTC)

I believe that he scribbled it in the margin. ( Proof left to the reader ). —Preceding unsigned comment added by 99.33.92.113 (talk) 10:58, 17 November 2009 (UTC)

The expression given in the section "Generalization to multiple types" differs slightly from the cited paper. In Wendl (2003), the expression is not subtracted from 1, but in this Wikipedia entry it is. Both equations are defined as the probability that no members of the first set share a birthday with any member of the second set, although they use different language ("probability of no (i.e. zero) shared birthdays" in this article, "probability of no collisions" in Wendl (2003)). It seems like a quick change to fix it, but I don't want to modify the expression without putting it up for discussion first. — Preceding unsigned comment added by 87.198.26.201 (talk) 13:32, 25 April 2012 (UTC)

Calculating the Probability

This section should explain the reasoning behind the calculation more fully and more clearly. It looks as if either it has been reproduced without it being understood or that it has come from someone who is so familiar with it that he forgets that most of us really do need an explanation........There is a better treatment here http://www.curiousmath.com/index.php?name=News&file=article&sid=78 ....... .mikeL assisted by sinebot, bless it —Preceding unsigned comment added by 92.238.234.172 (talk) 08:36, 1 June 2009 (UTC)

Software for Birthday Paradox Cases

First off, the following statement in the Wiki article is flawed:

"It is easier to first calculate the probability p(n) that all n birthdays are different. If n > 365, by the pigeonhole principle this probability is 0."

The Birthday Paradox is a tiny element of the probability of duplication or probability of collisions. If the set has N elements (e.g. N = 365 birthdays), then for number of elements > 365 the probability is 1 (or 100%) that there is duplication.

Let's take the case of dice rolling. Number of elements N = 6 (6 numbered faces from 1 to 6).

Throw 2 dice at a time. The probability of duplication (e.g. 1-1 or 6-6) is 16.66% (6 / 36).

Total number of sets: 6 ^ 2 = 36

Number of sets WITHOUT duplicates: 30

Number of sets with duplicates: 6

Throw 6 dice at a time. The probability of duplication (e.g. 1-1-x-x-x-x or x-6-x-x-6-x) is 98.46% (45936 / 46656).

Total number of sets: 6 ^ 6 = 46656

Number of sets WITHOUT duplicates: 720

Number of sets with duplicates: 45936

Throw 7 dice at a time. The probability of duplication (e.g. 1-x-x-x-x-1-x or 6-x-6-x-x-x-x) is 100% (279936 / 279936).

Total number of sets: 6 ^ 7 = 279936

Number of sets WITHOUT duplicates: 0

Number of sets with duplicates: 279936.

There is software for this type of calculations. The results are instantaneous for numerical values up to 18 digits wide. Two programs are available now:

Collisions.exe

BirthdayParadox.exe

They are presented on the page of the most thorough analysis of the Birthday Paradox and probability of collisions: The Birthday Paradox: Combinatorics, Probability of Duplication, Coincidences, Collisions, Roulette, Social Security Numbers, Genetic Code, DNA Sequences

Second point: The reversed birthday paradox. The same software performs the most accurate calculations for such situations as well. The reverse collisions probability: Calculate the number of persons (elements) when the probability is known. In the Wiki article, such calculations are only approximations.

Parpaluck (talk) 17:33, 4 June 2009 (UTC)

There is software for this type of calculations, especially the most inclusive cases of probability of collisions. The results are instantaneous for numerical values up to 18 digits wide. Two programs are available now:

Collisions.exe

BirthdayParadox.exe

They are presented on the page of the most thorough analysis of the Birthday Paradox and probability of collisions:

The Birthday Paradox: Combinatorics, Probability of Duplication, Coincidences, Collisions, Roulette, Social Security Numbers, Genetic Code, DNA Sequences.

There is also the complementary probability: The Reversed Birthday Paradox. The same software performs the most accurate calculations for such situations as well. The reverse collisions probability: Calculate the number of persons (elements) when the probability is known. In the Wiki article, such calculations are only approximations.

Parpaluck (talk) 17:58, 4 June 2009 (UTC)

If I join whatever your society is (I assume it has something to do with bringing back web design from the mid-nineties) for life, which is required to obtain your trivial software, do I get the source code too? Because if you do that with HTML I'm not sure I trust anything you've written in a compiled language. Also despite posting it twice you forgot to mention whether it cost anything? Actually that domain rings a bell - aren't you a famous nutter? Oh sorry, my mistake - you're actually a spammer. Can't wait to get hold of those binaries! ----

Probability of duplication

The following statement in the Wiki article is flawed:

"It is easier to first calculate the probability p(n) that all n birthdays are different. If n > 365, by the pigeonhole principle this probability is 0."

The Birthday Paradox is a tiny element of the probability of duplication or probability of collisions. If the set has N elements (e.g. N = 365 birthdays), then for number of elements > 365 the probability is 1 (or 100%) that there is duplication.

Let's take the case of dice rolling. Number of elements N = 6 (6 numbered faces from 1 to 6).

Throw 2 dice at a time. The probability of duplication (e.g. 1-1 or 6-6) is 16.66% (6 / 36).

Total number of sets: 6 ^ 2 = 36

Number of sets WITHOUT duplicates: 30

Number of sets with duplicates: 6

Throw 6 dice at a time. The probability of duplication (e.g. 1-1-?-?-?-? or ?-6-?-?-?-6) is 98.46% (45936 / 46656).

Total number of sets: 6 ^ 6 = 46656

Number of sets WITHOUT duplicates: 720

Number of sets with duplicates: 45936

Throw 7 dice at a time. The probability of duplication (e.g. ?-?-?-?-1-?-1 or 6-?-?-?-?-6-?) is 100% (279936 / 279936).

Total number of sets: 6 ^ 7 = 279936

Number of sets WITHOUT duplicates: 0

Number of sets with duplicates: 279936.

Parpaluck (talk) 18:05, 4 June 2009 (UTC)

Obviously, but how does that make the statement "the probability p(n) that all n birthdays are different ... if n > 365 ... is 0" flawed? Note, that's the probability of there not being duplication. —JAO • T • C 19:58, 12 June 2009 (UTC)

If n>365 then the probability that all n birthdays are different is 0 (zero means impossible) BECAUSE the probability of duplication is 1 (or 100% means certain). Parpaluck has not shown any flaw, only that the same thing can be stated in two ways. Cuddlyable3 (talk) 22:08, 12 June 2009 (UTC)
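The with-duplicates and without-duplicates counts for the dice can be reproduced directly, which also shows the two statements agreeing. A minimal sketch, with an illustrative function name:

```python
from math import perm

def duplicate_counts(dice, faces=6):
    """Return (total outcomes, outcomes with all faces distinct, outcomes with a repeat)."""
    total = faces ** dice
    distinct = perm(faces, dice)  # 0 when dice > faces (pigeonhole principle)
    return total, distinct, total - distinct

for dice in (2, 6, 7):
    total, distinct, repeats = duplicate_counts(dice)
    print(dice, total, distinct, repeats, repeats / total)
```

With 7 dice the count of all-distinct outcomes is 0, so the probability of "all different" is 0 and the probability of a duplicate is 1, which are two ways of stating the same fact.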

Horrendous Mathematics on Top of Plagiarism

The Birthday Paradox article at Wikipedia must be rewritten immediately. It is one of those materials that make Wikipedia the laughing stock in the common sense world. Forget about intelligentsia or academia or the virtual world of intelligent and cultured humans! It is things like this article or materials as those dedicated to factorials or combinatorics that make all normally intelligent and knowledgeable persons run away from Wikipedia. Some might only laugh. Most of them, however, have the worst of feelings regarding this laudable attempt to make all knowledge available to the commoners. Each and every one of us is a commoner in most fields of knowledge — nobody should have hard feelings in that regard.

The opening of the Birthday paradox article is blatant plagiarism:

"In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366…"

The sentences are taken, without quotation, from Warren Weaver's famous book Lady Luck (page 132).

Then, the formulae that follow represent an exercise in idiotic mathematics. Alas, if the author(s) would have plagiarized correctly (!) the formula in Warren Weaver's book!!

I urge the Wikipedia editors make immediate changes to this article. They might want also to contact the author of this benchmark article:

The Birthday Paradox: Combinatorics, Probability of Duplication, Coincidences, Collisions, Roulette, Social Security Numbers, Genetic Code, DNA Sequences.

It's for the sake of intelligence, as in human reasoning!

Parpaluck (talk) 19:47, 4 June 2009 (UTC)

While pages 132–135 of Weaver's book deal with this problem, neither of these sentences is anywhere to be found, not even partially, on these pages, at least not in the version available here. —JAO • T • C 20:13, 12 June 2009 (UTC)

I've just looked at page 132 of Warren Weaver's book Lady Luck and that sentence is not there. It's not surprising that the same ideas are expressed, since the problem has been well known for a long time. Weaver's book was published in 1963. I don't actually know whether the problem was well-known before Weaver's book came out but I'd be a bit surprised if it was not. As a mathematician I'm failing to find the "idiotic mathematics" that "Parpaluck" says is here. As for the articles on factorials and combinatorics, could Parpaluck please be specific and post to Wikipedia talk:WikiProject Mathematics, saying what things are objectionable? Michael Hardy (talk) 21:52, 12 June 2009 (UTC)

The sentences were reworded! Looked like more to disguise the "source of inspiration". The keyword is 23 — as is in 23 persons for a 50-50 chance that at least two share the same birthday.

But the worst part is, really, the MATHEMATICS. You still don't get it!

1) Let's take … Warren Weaver's favorite example: n = 23. Your formula is clumsy:

[365 x 364 x … x (365 – 23 + 1)] / (365 ^ 23) then leads to a formula of combinations! The best, in words:

The Probability_Of_Collisions (Coincidences, Birthday Paradox) = Number_Of_Duplicate_Sets (M, N) / Exponents (M, N) where Number_of_Duplicate_Sets (M, N) = Exponents (M, N) – Arrangements (M, N)

Simply, the formula is:

Probability_Of_Collisions (Coincidences, Birthday Paradox) = 1 – {[365 x 364 x … x (365 – 23 + 1)] / (365 ^ 23)}

2) "It is easier to first calculate the probability p(n) that all n birthdays are different. If n > 365, by the pigeonhole principle this probability is 0."

The Birthday Paradox is a tiny element of the probability of duplication or probability of collisions. If the set has N elements (e.g. N = 365 birthdays), then for number of elements > 365 the probability is 1 (or 100%) that there is duplication. The birthday paradox is less a problem of probability. In fact, it is better analyzed by the mathematics of sets. The die has 6 numbered faces. It is very easily provable that more than 6 dice can be rolled. Read the case above regarding a number of 7 dice. You can roll 100 dice, if you want to. Is that impossible? Look at the results. Can you find a series of throws that consists only of UNIQUE face values? NOT! Just look at 7-die series. Everything shows at least one duplicate. Matter of fact, most throws show a larger number of duplicates (more than just two dice). Then, how can be the probability of duplication 0? Have you ever thrown 7 dice and all seven shown only unique numbered faces???

Parpaluck (talk) 19:36, 17 June 2009 (UTC)

OK. Mea culpa … perhaps! "This probability…." refers to the probability for sets with unique elements only. Other readers probably also take "This probability…." to refer to the probability of the birthday paradox. Again, that's why I say that the birthday paradox is better understood via the mathematics of sets. More deeply, there are no formulae, but algorithms. Thus, the cases of absurdity for N > M can be avoided. The formula of arrangements (to calculate the number of sets with unique elements only) leads to an absurdity. You can't arrange 6 dice taken 7 at a time!

Parpaluck (talk) 19:46, 17 June 2009 (UTC)
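
For what it's worth, the dice case raised above can be checked directly. The following is only a sketch in Python (the function names are made up for illustration, and it assumes Python 3.8+ for `math.perm`). The arrangement count is simply 0 when more dice are rolled than there are faces, so the "all faces different" probability for 7 dice comes out as 0 and the duplication probability as 1. A brute-force pass over all 6^7 throws is included as a sanity check.

```python
import math
from itertools import product

def p_all_different(sides: int, n: int) -> float:
    """Probability that n rolls of a fair die with `sides` faces are all different."""
    # math.perm(sides, n) counts ordered selections without repetition;
    # it is 0 whenever n > sides (the pigeonhole case).
    return math.perm(sides, n) / sides ** n

def p_duplicate(sides: int, n: int) -> float:
    """Probability that at least two of the n rolls show the same face."""
    return 1.0 - p_all_different(sides, n)

def count_duplicate_free_throws(sides: int, n: int) -> int:
    """Brute force: count throws of n dice in which every face value is unique."""
    return sum(1 for throw in product(range(sides), repeat=n)
               if len(set(throw)) == n)

if __name__ == "__main__":
    print(p_duplicate(6, 7))                  # 7 six-sided dice
    print(count_duplicate_free_throws(6, 7))  # throws of 7 dice with no repeat
    print(p_duplicate(365, 23))               # the birthday case, n = 23
```

The same two functions cover both readings of "this probability": `p_all_different` is the quantity that becomes 0 for n > 365, and `p_duplicate` is the one that becomes 1.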

Mistake in partition problem

The section on the partition problem mentions N weights with integer gram masses randomly chosen from [1, 1000000]. The last paragraph states that "the distribution of the sum of weights is approximately Gaussian, with a peak at 1,000,000 N and width $\scriptstyle 1,000,000{\sqrt {N}}$ ...". However, the peak (and mean) should actually be at $\scriptstyle {{{1000001N} \over 2}=500000.5N}$ , and the width (standard deviation) should be $\scriptstyle {{{3{\sqrt {37037037037N}}} \over 2}\approx 288675.1{\sqrt {N}}}$ . Here's my work:
The distribution of one weight's mass (in grams) is discrete uniform over [1, n], n = 1000000, so has
${\begin{aligned}\mu _{\text{one}}&={{n+1} \over 2}\\&={{\left(1000000\right)+1} \over 2}\\&={1000001 \over 2}\\\end{aligned}}$
and
${\begin{aligned}\sigma _{\text{one}}^{2}&={{n^{2}-1} \over 12}\\&={{\left(1000000\right)^{2}-1} \over 12}\\&={{1000000000000-1} \over 12}\\&={999999999999 \over 12}\\&={333333333333 \over 4}\\\end{aligned}}$
By the central limit theorem, the average mass (in grams) of N such weights would approximately follow a normal distribution with
${\begin{aligned}\mu _{\text{avg}}&=\mu _{\text{one}}\\\end{aligned}}$
and
${\begin{aligned}\sigma _{\text{avg}}^{2}&={\sigma _{\text{one}}^{2} \over N}\\\end{aligned}}$
so the sum of the masses (in grams) of N weights would approximately follow a normal distribution with
${\begin{aligned}\mu _{\text{sum}}&=N\mu _{\text{avg}}\\&=N\left(\mu _{\text{one}}\right)\\&=N\left({1000001 \over 2}\right)\\&={1000001N \over 2}\\&=500000.5N\end{aligned}}$
and
${\begin{aligned}\sigma _{\text{sum}}^{2}&=N^{2}\sigma _{\text{avg}}^{2}\\&=N^{2}\left({\sigma _{\text{one}}^{2} \over N}\right)\\&=N\sigma _{\text{one}}^{2}\\&=N\left({333333333333 \over 4}\right)\\&={333333333333N \over 4}\\\sigma _{\text{sum}}&={\sqrt {333333333333N \over 4}}\\&={3{\sqrt {37037037037N}} \over 2}\\&\approx 288675.1{\sqrt {N}}\\\end{aligned}}$
If I'm mistaken, please let me know. If not, could someone please figure out what consequences this has? Is the answer to the problem still N = 23? While I could take a crack at figuring that out myself, my stats is a bit rusty, so I'd at least like to have some confirmation that my calculations here are correct before going forward. Klparrot (talk) 00:11, 8 August 2009 (UTC)
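
A quick numerical cross-check of the figures above, as a sketch only: it assumes, as in the derivation, that the N masses are independent and uniform on the integers 1 to 1000000, and the function name is hypothetical. It uses exact fractions for the mean and variance and converts to floating point only at the end.

```python
import math
from fractions import Fraction

def uniform_sum_stats(n: int, count: int):
    """Exact mean and variance of the sum of `count` independent draws from {1, ..., n}."""
    mean_one = Fraction(n + 1, 2)      # mean of a discrete uniform on 1..n
    var_one = Fraction(n * n - 1, 12)  # variance of a discrete uniform on 1..n
    # Means add, and variances add for independent draws.
    return count * mean_one, count * var_one

if __name__ == "__main__":
    # With count = 1 these are the coefficients of N and of sqrt(N) respectively.
    mean_sum, var_sum = uniform_sum_stats(1_000_000, 1)
    print(float(mean_sum))
    print(math.sqrt(float(var_sum)))
```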

reads first person

This article at times speaks directly to the reader. Words such as "we" and "you" should be removed. I would do this except someone is currently editing the article. Thanks in advance ♠ B.s.n. ♥R.N.contribs 02:43, 4 October 2009 (UTC)

Why the Shirky Reference?

Why the reference to Clay Shirky? Is it just cruft from some earlier version? —Preceding unsigned comment added by 128.255.45.53 (talk) 21:29, 24 October 2009 (UTC)

Birthday solution

I checked birth dates in List of Prime Ministers of Australia. 50% probability failed. There are thousands of articles about 'Living People'. It is easy to debunk this birthday problem practically.

I recalled my school life. No birth date match among hundreds of classmates. In the office, no match with the birth dates of almost 1000 employees. I had access to a database of around 100,000 people and I found only 1 match (other than my birth date). This sounds improbable, almost impossible, but somehow it happened.

With 1000 people, there must be at least 2 matches between any two people. Think about it for a few days. --212.92.194.201 (talk) 12:20, 24 November 2009 (UTC)

The article does not mention the minimum population needed to make this theory work. In a village (or group) with a population of 365, each having a unique birth date, surely this birthday problem will not arise. So how large a population should this village have? Thanks! Rāmā (talk) 00:23, 19 November 2009 (UTC)

OK, OK, OK... I was checking for a match between the birth dates of two prime ministers in List of Prime Ministers of Australia. 50% probability failed. Now, after waking up, I checked whether my birth date matches the birth date of any PM. And I must be honest, it does. It hadn't happened in my entire life. And it happened in the first randomly chosen list. Observation causes wave function collapse. Rāmā (talk) 10:59, 19 November 2009 (UTC)

Timeline-

I was trying to sleep while thinking about the 'birthday problem'. Somehow I came to the conclusion that the 'birthday problem' means that if I toss a coin in the air wishing it to be 'head', then there is almost a 94% probability that the coin will fall on the ground showing 'head'. So the 'birthday problem' is a hoax!
I came back to Wikipedia and checked the category 'living people', but birth dates were missing from the few random articles I looked at.
How about checking 'list of prime ministers of India'? But that list repeats prime ministers who were in office twice. I need a minimum of 23 Prime Ministers.
I applied common sense: as most Wikipedians are from developed countries, lists of PMs of developed countries will be more complete. I chose Australia, i.e. List of Prime Ministers of Australia.
I took a casual look at the list. No match. Applying the theory of probability, chances are 99.99% that no one will find a match on the very first attempt.
I declared on this talk page that the birthday problem is wrong and went back to sleep.
After waking up, I thought: how about cross-checking my own birth date with List of Prime Ministers of Australia? I found a match!
I recalled a link to a 'list of US presidents' in the 'birthday' article. I searched the article and found that it is NOT a list of US presidents. It is the same list, i.e. List of Prime Ministers of Australia, which I chose randomly, and there is a birthday match between two prime ministers.

I don't know what to talk. Thanks! Rāmā (talk) 13:37, 19 November 2009 (UTC)

The problem is stated in the text. It isn't about comparing your birthday against a set, rather it's about finding matching entries within a set. The former has low probability, the latter quite high. If you preselect one of the comparators, you changed the problem being considered. As stated in the text: "The birthday problem asks whether any of the 23 people has a birthday matching any of the others — not one in particular."

See also the section Birthday problem#Same birthday as you. Mind matrix 15:40, 24 November 2009 (UTC)

Hmm

You know, I've been wondering. Has this ever been tried out? Because the logic seems to try to bend the numbers some. Lemme try picture this a different way.

If you roll a die that has 365 sides (hypothetically), and you roll it 23 times, you have a 50% chance that you roll the same number at least twice. This would be the same scenario, no? Tell me if I'm wrong, but this seems way off. 69.19.255.228 (talk) 07:12, 24 December 2009 (UTC)

Yes, it's been tried out, and yes, that's the same scenario, and yes, it seems way off. If it were obvious, it wouldn't be so interesting. :-) —Ruakh_TALK 04:27, 25 December 2009 (UTC)

My professor tried it in a class of about 30 students. He started with January 1 and it didn't take long to find a shared birthday. It was pretty cool. 66.157.22.51 (talk) 21:35, 1 February 2011 (UTC)
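
For anyone who would rather try the 365-sided die than take it on faith, here is a small Monte Carlo sketch in Python. The function name, trial count and seed are arbitrary choices for illustration, and it assumes every face is equally likely. It rolls the die 23 times per trial and counts how often some number comes up at least twice.

```python
import random

def estimate_repeat_probability(sides=365, rolls=23, trials=100_000, seed=0):
    """Estimate the chance that `rolls` rolls of a fair `sides`-sided die contain a repeat."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(rolls):
            face = rng.randrange(sides)
            if face in seen:   # this number has already come up in this trial
                hits += 1
                break
            seen.add(face)
    return hits / trials

if __name__ == "__main__":
    print(estimate_repeat_probability())
```

The estimate should land close to one half, with some run-to-run noise that shrinks as `trials` grows.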

Confusing notation using the same P(...)

In the paragraph "Calculating the probability", P(A) denotes "the probability of at least two people in the room having the same birthday". Here, the reader would suppose A is a variable and in the example, A = 23.

Then a little further, we have "P(2), that person 2 has a different birthday than person 1". Here, the reader could generalize the notation P(2) to P(A) which creates confusion.

Better use some other symbols like Q(...), or use more parameters within parentheses. 213.41.124.254 (talk) 12:12, 29 March 2011 (UTC)

2. The probabilities that a student in this class (such as you) will expend a high, medium or low amount of effort in studying are 0.50, 0.30 and 0.10, respectively. Given that the student expends a high, medium or low amount of effort, the respective conditional probabilities of getting an A in this courses are 0.90, 0.40 and 0.05. Find the probability that the student (you) will get an A in the course. —Preceding unsigned comment added by 212.14.233.164 (talk) 11:21, 14 May 2011 (UTC)

Rounding error?

From the article: Evaluating equation (2) gives P(A') = 0.49270276

My calculation shows it is 0.492702765676, which rounds to 0.49270277. Can somebody confirm and fix? — Preceding unsigned comment added by Rahulov (talk • contribs) 21:47, 2 November 2011 (UTC)

Thanks, I checked and confirm your calculation. However, on fixing, I thought it was better to simply reduce the number of significant digits displayed, so I changed 0.49270276 to 0.492703 in two places. Johnuniq (talk) 01:35, 3 November 2011 (UTC)
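
For the record, the rounding question can be settled without floating-point doubts by using exact rational arithmetic. This is a sketch in Python, with a hypothetical function name and the article's assumption of 365 equally likely days.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

def p_no_shared_birthday(n: int, days: int = 365) -> Fraction:
    """Exact probability that n people all have different birthdays."""
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(days - k, days)  # person k+1 must avoid k taken days
    return p

if __name__ == "__main__":
    getcontext().prec = 30  # plenty of digits for this comparison
    exact = p_no_shared_birthday(23)
    as_decimal = Decimal(exact.numerator) / Decimal(exact.denominator)
    print(as_decimal)            # many-digit value of P(A')
    print(round(as_decimal, 8))  # the eight-digit rounding discussed above
    print(round(as_decimal, 6))  # the six-digit value now used in the article
```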

Practical Exercise

Here is a practical exercise of the birthday problem:

This page generates random birthdays and returns the resulting percentages. (Confirms this section.)

http://tomsfreelance.com/birthday_problem.php Teynon1 (talk) 15:59, 7 November 2011 (UTC)

Random:

Don't ask this question in the nursery of the hospital on the day of birth; that isn't random. — Preceding unsigned comment added by 201.208.167.177 (talk) 16:45, 13 March 2013 (UTC)

Actual birthday distribution

The article doesn't have to state the actual known distribution of birthdays, but it should note that no matter what that distribution is, 23 remains an upper bound for the solution of the basic problem. In other words, if some birthdays are more common and others less common, then the probability of an overlap only increases rather than decreases. (At the same time, the answer to "What is the probability of someone having my birthday?" varies, depending on your birthday's likelihood.) ± Lenoxus (" *** ") 02:02, 15 June 2013 (UTC)

Wormp. I see that it was in a footnote. Never mind. ± Lenoxus (" *** ") 02:09, 15 June 2013 (UTC)
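
The claim that unequal birthday frequencies can only raise the chance of a match can also be checked numerically. The sketch below is in Python and its seasonal distribution is invented purely for illustration (it is not real birth data); the function names are likewise hypothetical. It computes the exact probability that n people all have different birthdays for arbitrary day probabilities, using the fact that this probability equals n! times the degree-n elementary symmetric polynomial of the day probabilities.

```python
import math

def p_all_distinct(day_probs, n):
    """P(n people all have different birthdays) for arbitrary day probabilities."""
    # e[k] is the elementary symmetric polynomial of degree k in the
    # probabilities processed so far (sum over unordered sets of k days).
    e = [1.0] + [0.0] * n
    for p in day_probs:
        for k in range(n, 0, -1):  # go downwards so each day is used at most once
            e[k] += p * e[k - 1]
    # Each unordered set of n distinct days can be assigned to n people in n! ways.
    return math.factorial(n) * e[n]

def skewed_distribution(days=365, amplitude=0.3):
    """Hypothetical seasonal birthday distribution (illustrative only, not real data)."""
    weights = [1.0 + amplitude * math.cos(2 * math.pi * d / days) for d in range(days)]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    uniform = [1.0 / 365] * 365
    print(1 - p_all_distinct(uniform, 23))                # equally likely days
    print(1 - p_all_distinct(skewed_distribution(), 23))  # uneven days
```

With the uneven distribution the probability of a match at n = 23 comes out higher than in the uniform case, which is why 23 still suffices.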

Understanding the Problem

It seems that the first sentence in this section is wrong. The "Same Birthday as You" section claims that that problem asks whether anyone in a group of N people has the same birthday as one particular other person in the room. The rest of the article seems to address the question of whether, in a group of N people, there is at least one pair of people who share the same birthday. The opening sentence seems to say that the main question of the article is the question addressed by the "Same Birthday as You" section, which is different. I think it would be helpful to re-word that sentence to say something like "The Birthday Problem is to find the probability that, in a group of N people, there is at least one pair of people who have the same birthday." After that sentence, something could be said about the difference between that problem and the one addressed by the "Same Birthday as You" section. — Preceding unsigned comment added by 128.29.43.1 (talk) 15:04, 27 September 2013 (UTC)

I agree. I've incorporated your changes. It could probably be a little clearer but at least it's not saying the exact opposite of the truth now.--Grimboy (talk) 18:08, 10 November 2013 (UTC)

Sample calculations

User 173.28.217.182 asks: (Question from a reader: What are the "bounds" referred to above?). Can someone clarify? Nick Levine (talk) 14:33, 10 November 2013 (UTC)

An "upper bound" means "we don't know the exact solution to this problem, but it's definitely less than or equal to this number". 23 is an upper bound to the problem: How many people must you have in a room for a 50% chance of some pair sharing a birthday, even if not all birthdays are equally likely? (This is likely to be of interest, because in the real world, they are not all equally likely.) 2.25.120.161 (talk) 20:41, 25 November 2013 (UTC)

Abstract proof: Notation seems to be confused...

In the abstract proof section, A is the statement "Everybody in the set ${\mathcal {S}}$ has a unique birthday". But then P(A') is defined to be the fraction of injective functions out of all possible functions.

If A is indeed the statement that everybody has a unique birthday then shouldn't $P(A')=1-{\dfrac {365!}{365^{N}(365-N)!}}$ ? In the article this is denoted P(A). Exzession (talk) 10:47, 29 November 2013 (UTC)

Non-mathematical explanations?

Maybe couples tend to have sex more often during certain parts of the year (New Year, for instance), and therefore birthdays tend to be more common around the fall. JDiala (talk) 05:22, 11 January 2014 (UTC)

Perhaps, but for the purposes of this article, distribution of birthdates has been ignored. Per the text, "These conclusions include the assumption that each day of the year (except February 29) is equally probable for a birthday." Mind matrix 15:05, 11 January 2014 (UTC)

Historical inaccuracy

The article says that:

"The history of the problem is obscure, but W. W. Rouse Ball indicated (without citation) that it was first discussed by an "H. Davenport", possibly Harold Davenport.[2]"

However, looking at reference [2], which is available through Project Gutenberg, there is no mention of the problem or of "H. Davenport". In addition, given Davenport's life span and the time when the book was written, it is unlikely that he proposed the problem.

Looking further, the problem was apparently stated by Richard Von Mises (http://en.wikipedia.org/wiki/Richard_von_Mises) in:

Cf. R. von Mises, Ueber Aufteilungs- und Besetzungswahrscheinlichkeiten, Revue de la Faculté des Sciences de l'Université d'Istanbul, N. S. vol. 4 (1938-1939), pp. 145-163.

This info was taken from: Feller, W. (1968). An Introduction to Probability Theory and Its Applications (Vol. 1). Wiley. Third ed., p. 33.

The German Wikipedia entry on the birthday problem indicates that the paradox is often attributed to Richard von Mises. According to Donald E. Knuth, this origin is not certain: the birthday paradox was discussed informally among mathematicians as early as the 1930s, but a more precise authorship cannot be determined.

160.228.81.206 (talk) 14:39, 7 March 2014 (UTC) Andrés Altieri

Gutenberg is serving the 1905 reprint of the 4th edition because it is the latest version that is in the public domain. This book was first published in 1892 and went through many revisions. The quote about "H. Davenport" is in the 1960 edition (which is not public domain) on page 45 – please have a look. As I said, it is not referenced, nor is the full name given. However, Ball is certainly reliable and therefore quotable. I think the history is obscure, and perhaps it is worth a new sub-section in the body (something like "Origin of the Problem") which talks about the various claims of priority. Our article on Richard von Mises does indeed assert he posed the problem, but the source is a non-WP:RS website. I certainly would welcome any additional information that could be found and reliably sourced, and I think readers would, as well. Agricola44 (talk) 16:11, 7 March 2014 (UTC).

I've checked Feller and he does indeed reference the von Mises paper for his discussion of the birthday problem, although he does not say von Mises originated the problem. What is the Knuth reference you referred to? Agricola44 (talk) 17:10, 7 March 2014 (UTC).

I am Harold Davenport's son, and family legend has it that he and Coxeter discovered it by noting that two people dining at Trinity had the same birthday. This would have been after Rouse Ball's death, though, and I do not know what the history of the 1960 printing of Rouse Ball's book is. JamesHDavenport (talk) 23:45, 8 March 2014 (UTC)

Yes, Ball died in 1925, when the book was in its 9th or 10th edition, and Coxeter continued to develop and add to the content of later editions/reprints. The assertion very likely came from him. This may be one of those occasions where there are competing claims of priority, all the more reason to have a new section on "Origin of the Problem". Do you know of any source that confirms your family legend? Thanks, Agricola44 (talk) 15:55, 10 March 2014 (UTC).

Is it weird?

Is it a coincidence, that the sum of numbers 1..22 (i.e. one less than 23 people required for 50% chance to have any birthday match) equals 253, which is the number of people required to have 50% chance of a match for any particular person?

It would follow that the first of the 23 people has 22 chances for a match, the second one has 21 chances and so forth, hence 22+21+..+1+0. — Preceding unsigned comment added by 89.69.165.161 (talk) 21:37, 29 August 2014 (UTC)
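
As far as I can tell it is not purely a coincidence, and a few lines of Python make the link visible. This is a sketch with hypothetical function names, assuming 365 equally likely days. A group of 23 contains 22+21+...+1 = 253 pairs, and if each pair is (loosely) treated as an independent 1-in-365 chance, the "no match" probability is about (364/365)^253. That is the same expression that governs 253 people being compared against one fixed birthday. The pair treatment is only an approximation, so it is compared with the exact value here.

```python
import math

def pairs(n: int) -> int:
    """Number of distinct pairs among n people."""
    return n * (n - 1) // 2

def people_needed_for_your_birthday(target: float = 0.5, days: int = 365) -> int:
    """Smallest n such that someone among n others matches one fixed birthday with probability >= target."""
    n = 0
    while 1 - ((days - 1) / days) ** n < target:
        n += 1
    return n

def p_no_match_pair_approx(n: int, days: int = 365) -> float:
    """Approximation that treats all pairs as if they were independent comparisons."""
    return ((days - 1) / days) ** pairs(n)

def p_no_match_exact(n: int, days: int = 365) -> float:
    """Exact probability that n people all have different birthdays."""
    return math.perm(days, n) / days ** n

if __name__ == "__main__":
    print(sum(range(1, 23)), pairs(23))       # the two ways of counting pairs
    print(people_needed_for_your_birthday())  # the "same birthday as you" threshold
    print(p_no_match_pair_approx(23), p_no_match_exact(23))
```

The two "no match" figures are close but not identical, because the pairs in a group of 23 are not truly independent of one another.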

Dead External Link?

It looks like the external link "A humorous article explaining the paradox" (http://www.damninteresting.com/?p=402) is dead. I don't know enough wiki-etiquette to mark it as such; the dead link template page (https://en.wikipedia.org/wiki/Template:Dead_Link) mentions using a cached version? — Preceding unsigned comment added by 144.32.136.27 (talk) 08:42, 17 June 2015 (UTC)

IT'S NOT A PARADOX

It's not a paradox, and the solution depends on how the problem is phrased: two people having the same birthday, at least two people having the same birthday... — Preceding unsigned comment added by 73.213.142.170 (talk) 23:12, 8 September 2015 (UTC)

It is a paradox, which is a term used to indicate a truth that's provable without logical contradiction but still feels surprisingly counterintuitive to most people. This is one. As for the problem's phrasing, what's wrong with the current intro that reads, "...some pair of them will have the same birthday"? I mean, sure, you're right, "exactly six will have the same August 4th birthday" would lead to a different solution, but so what? Fetald (talk) 01:40, 19 September 2015 (UTC)

fuzzy logic

It seems to me that the birthday paradox can be read as an example of fuzzy logic. Any thoughts on that? Is this true? Why or why not? 141.20.170.21 (talk) 14:09, 24 November 2014 (UTC)

I don't think so, because fuzzy logic deals with degrees of truth between 0 (false) and 1 (true). Instead, this is a probability problem that deals with degrees of likelihood between 0 and 1 that the truth of a specific condition (two people have the same birthday) is either there or it's not, i.e. not fuzzy at all, just true or false. Fuzzy logic would only be involved if two people sort of had the same birthday, but not really, but kind of, except not, yet they do. Fetald (talk) 01:57, 19 September 2015 (UTC)

23

Why is 23 in all the seemingly unrelated problems: the birthday paradox, 23 players in a squad, 23 for 1,000,000g partition problem, and the birthday on the 23rd of May? 87.102.44.18 (talk) 11:08, 27 March 2016 (UTC)

POV

The quote "While this makes for an amusing talking point, and is a very good introduction to the ways in which intuition can lead one widely astray in probability theory" is POV and I have removed it. Coin Collecting John (talk) 00:52, 15 May 2016 (UTC)

Bloom 1973 reference edits..

I just made a bunch of edits relating to the D. Bloom reference, trying to get the inline text in the footnote (about the non-uniform distribution of birthdays) to refer to another footnote. I conclude that such "embedded references" don't work, nor could I get it to display right using a Harvard-style reference. So it now just says "(see external links)". I did find DOI and JSTOR numbers for the Bloom article though. Jimw338 (talk) 06:39, 13 October 2016 (UTC)

Is 100% linkable to Almost surely

Despite its name almost, the article states that it is a case of probability one unless that too is wrong. Is there another article which describes this case that I'm missing? Ugog Nizdast (talk) 15:25, 21 November 2016 (UTC)

"Probability 1" and "guaranteed always to occur" are not the same thing, as explained in that article. For example, in choosing a real number in the interval [0, 1] uniformly at random, the probability of getting an irrational number is 1, but there are choosable numbers that are not irrational. The sentence you edited is not like this: given 367 selections from a set of size 366, it is not possible to make a set of choices with no repetitions. This is much more straightforward and less subtle than "with probability 1." --JBL (talk) 15:44, 21 November 2016 (UTC)

I think if I've fairly understood this then, as you said, "it is not possible" is different from "100% probability"...why does the lead then mention "...the probability reaches 100%..."? Shouldn't it just say something like guaranteed to always occur or other cases are simply not possible at the point onwards? Thanks for humouring me btw, Ugog Nizdast (talk) 11:48, 28 November 2016 (UTC)

Right, "the negation cannot happen" is stronger than "the probability of occurring is 100%" in general (and the latter is what is called "almost surely"). There is one major reason for the present wording, which is that the entire discussion is phrased in terms of probability, so it is natural to continue using that language. Also, in this context (that of a finite probability space), the two notions (100% probability and must happen) actually do coincide (so in finite probability we don't need the more sophisticated notion of almost sureness). --JBL (talk) 13:46, 28 November 2016 (UTC)

Partition Problem

The Partition Problem section makes a claim about 23 that is not supported directly by the one reference given in this section. I added a citation needed tag to the 23 claim, but the entire explanatory paragraph (that begins "The reason is that..") is very weak and reads like poorly written original research. Anybody interested in finding a decent reference for the main claim of this section? That would be awesome. Doctormatt (talk) 05:42, 10 December 2016 (UTC)

Error in the justification of the calculation

In the justification of the main calculation, the text describes the events as independent. But it seems to me that they are not actually independent, since the probability that person n has a birthday different from the earlier people depends on whether the earlier people have all different birthdays or not. That is, in the case there were already some common birthdays, the probability would rise slightly for person n to avoid an overlap. So the justification of the calculation in terms of independent events seems to me to be incorrect. It would be correct, instead, to be talking about conditional probability, that is, the probability that person n has a different birthday, given that the earlier people all have different birthdays. The final numbers would be the same as currently, with this correct way of justifying them. JoelDavid (talk) 01:44, 6 December 2016 (UTC)

Indeed, as written it is nonsense. --JBL (talk) 00:08, 12 December 2016 (UTC)
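
The distinction can be made concrete on a toy case small enough to enumerate. The sketch below is in Python; the choice of a 5-day "year" and 3 people is arbitrary and the function names are hypothetical. It counts outcomes directly, then compares the count with the chain-rule product of conditional probabilities and with what multiplying the unconditional probabilities as if they were independent would give.

```python
from fractions import Fraction
from itertools import product

def p_distinct_by_enumeration(days: int, n: int) -> Fraction:
    """Count outcomes directly: fraction of birthday assignments with no repeats."""
    outcomes = list(product(range(days), repeat=n))
    distinct = sum(1 for o in outcomes if len(set(o)) == n)
    return Fraction(distinct, len(outcomes))

def p_distinct_by_chain_rule(days: int, n: int) -> Fraction:
    """Product of conditional probabilities."""
    p = Fraction(1)
    for k in range(n):
        # P(person k+1 avoids the k earlier birthdays | those k are all different)
        p *= Fraction(days - k, days)
    return p

def p_last_differs_unconditional(days: int, n: int) -> Fraction:
    """P(person n differs from every earlier person), with no conditioning at all."""
    outcomes = list(product(range(days), repeat=n))
    ok = sum(1 for o in outcomes if o[-1] not in o[:-1])
    return Fraction(ok, len(outcomes))

if __name__ == "__main__":
    days, n = 5, 3
    print(p_distinct_by_enumeration(days, n))  # ground truth
    print(p_distinct_by_chain_rule(days, n))   # conditional-probability product
    # Multiplying the unconditional probabilities as if the events were independent:
    print(p_last_differs_unconditional(days, 2) * p_last_differs_unconditional(days, 3))
```

The first two lines agree; the third does not, which is the point made above: the factors in the article's product are conditional probabilities, not probabilities of independent events.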

The Birthday Problem and the Schnabel Census

After having read the original paper by Zoe Schnabel (reference 10 in the article) on what is now called the Schnabel census, I think the claim of the article, that

The theory behind the birthday problem was used by Zoe Schnabel[10] under the name of capture-recapture statistics to estimate the size of fish population in lakes.

is essentially not correct. There is an ambiguity in the phrase "theory behind the birthday problem", but, as the Wikipedia article on mark-and-recapture shows, the Schnabel census is better seen as an extension of the Lincoln-Petersen estimator than as something which drops out of the Birthday Problem in some unspecified way.

At the very least, if the claim is going to be made, the assertion should be developed as a subsection and the algebraic connection shown. Otherwise, I recommend either deleting the comment, or saying that the methods are related to the major articles on mark-and-recapture or the Lincoln index.

empirical_bayesian@ieee.org

19:44, 23 June 2017 (UTC)

Origin of the Problem

(PLEASE forgive me if I am committing a breach of etiquette by adding this note here; feel free to move it or edit it.) I saw this problem in one of the "Saint" books of Leslie Charteris. When I find it again after looking through all of my Saint novels, I shall quote it here. The date would have been around 1920 to 1940, in the bulk of the Saint novels. When I have the correct date, I would like to add the information to the visible page.

ATeacher (talk) 03:58, 29 January 2018 (UTC)

No breach. The origin of this problem, like so many others, appears to have a complicated history and, I suspect, it was discovered/rediscovered many times independently. I think this subtopic is an important part of the overall article and I think we should welcome any new, reliably-sourced information regarding it. Thanks! Agricola44 (talk) 16:13, 29 January 2018 (UTC)

Alternate Math Scenario

What if the question was "how many people need to be in a room for there to be a 50% chance that any one of them has the same B-day as YOU"?

Would the answer be 182.5 (which is half of 365, the total possible birthdays) or some number that is less than that?--Mapsfly (talk) 19:12, 28 October 2017 (UTC)

No, it's 253, as is already covered in the Same birthday as you section of the article. Joule36e5 (talk) 08:27, 2 May 2018 (UTC)

Very pretty math about numerical statistics, but does it address the question?

I really enjoy articles about the Birthday Paradox. The collection of statistical solutions are entertaining, just to see how many different ways that this question can be ripped apart and put back together as a statistical piece of art. BUT! It's fun math, not accurate math. When I saw this question represented on a game show and somebody lost 10 grand because of it, I thought I'd go online and find the websites debunking the bad math and realized there aren't any. Here's the issue. All these problems are relating the problem to be similar to one with dice, and they relate the problem to the number of comparisons made between "X" number of people and people have 1 birthday out of 365 days in a year.

So then they represent a person as "1/365" and they say each addition of a person is exponential because each person must compare with each person before him (or her). Since the math will never, ever work with (1/365)^(number of people), the most frequent iteration of the solution generally gets reversed: if a person can be represented by their birthday as "1/365", then the reverse, any day that is NOT their birthday, can be represented as "364/365". To keep this a little brief I'll just guess that if you are reading this you are familiar with the current popular equation for finding the chance that there is NOT a match:

364/365 = person's unbirthday

n = number of people

1-(364/365)^n ≥ 50%

And of course we are all familiar with n=23 as our final solution and we all get very proud of ourselves.

But what did we do and say with that math? Well, broken down, the math says that each time a person is added, that person has to compare with all the people before them, and to represent that comparison we multiplied exponentially. So that makes for 253 comparisons of birthdays from one person to another, and by multiplying exponentially we are saying that each comparison has two random birthdays, a total of 506 birthdays represented! Amazing, right?! 506 birthdays for a mere 23 people....... Wait, you say, 23 people only have 23 birthdays, right? Yes, of course, and that is why the math is fun, but wrong.

To prove to myself that I was thinking this through properly, I took up the challenge a lot of the proponents of the "Unbirthday Paradox" method put forth. I went to my son's school (It's a k through 8 school), explained myself to the director. She thought it was an interesting, math related experiment, so she allowed me to survey all the classes from 8 through 4 (we both thought it would be more of a distraction for the younger grades). The total number of classes I surveyed was 12 (actually 13, but one of the classes only had 17 kids) The class sizes ranged from 24 kids up to 33. None of the classes had any matching birthdays, including the teachers.

Here's where the problem goes awry: A PERSON IS NOT THEIR BIRTHDAY. A person is a person and a birthday is a thing a person has one of. A Birthday is 1 day and one day only out of 365 days for one person. A person, alone in a room has nobody to compare with so the chance that they have the same birthday as somebody else in the room is 0. Therefore, for every person after that the odds are reduced by 1. The true equation looks like this:

N = Number of people

1/365 = Birthday

1/365(N-1) ≥ 50% [SOLVE FOR THIS]

(N-1)/365 ≥ .5 [SIMPLIFY TERMS]

N-1 ≥ 182.5 [MULTIPLY BOTH SIDES BY 365]

N ≥ 183.5 [ADD 1 TO EACH SIDE]

N ≥ 184 [BECAUSE THE QUESTION IS ASKING FOR PEOPLE AND YOU CAN'T HAVE PARTIAL PEOPLE, WE ROUND UP TO THE NEAREST WHOLE PERSON]

With 184 people in the room, the odds are truly 50/50 that two people in the room share the same birthday.

Zebnoesser (talk) 20:17, 20 September 2019 (UTC)Zeb Noesser

No, you are comprehensively wrong. Probably there has never in the history of the world been a group of 184 people assembled with all different birthdays (excepting groups assembled with that express purpose). To choose a random example, in the current US senate (100 members), the following birthdays are repeated: Oct 20, Oct 24 (3x), Nov 17, Nov 21, May 3, Mar 4, Mar 12, Mar 31, Jun 22, Jan 7, Jan 10, Dec 10, Dec 1, Aug 24. --JBL (talk) 20:36, 20 September 2019 (UTC)

Cool! Zebnoesser (talk) 02:25, 21 September 2019 (UTC)
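
To put rough numbers on this exchange, here is a short Python sketch. The function names are made up, and it assumes 365 equally likely birthdays, independent people, and Python 3.8+ for `math.perm`. It evaluates the exact chance of a shared birthday for a few group sizes, and also the chance that 12 separate classes all turn out to have no shared birthday. Every surveyed class was reported to have at least 24 pupils, so using 24 for each gives an upper limit on that second figure.

```python
import math

def p_shared(n: int, days: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    return 1.0 - math.perm(days, n) / days ** n

def p_no_match_in_any_class(class_sizes, days: int = 365) -> float:
    """Probability that none of several independent classes contains a shared birthday."""
    p = 1.0
    for size in class_sizes:
        p *= 1.0 - p_shared(size, days)
    return p

if __name__ == "__main__":
    for n in (23, 100, 184):
        print(n, p_shared(n))
    # Upper limit for the school survey: 12 classes of the minimum reported size.
    print(p_no_match_in_any_class([24] * 12))
```

Under these assumptions a match among 184 people is, for practical purposes, certain, and a run of 12 classes of that size with no match at all would be expected only about once in ten thousand such surveys.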

Rouse Ball - Davenport NOT LIKELY

Rouse Ball died in 1925. Davenport was 18 that year. Not likely Davenport wrote something that would have been noted by Rouse Ball at that age. Also, the cite says it was in 1960. How so since Rouse Ball had been dead for 35 years? Seven Pandas (talk) 22:34, 17 April 2020 (UTC)

Indeed. Project Gutenberg has the 4th edition of the cited work, published 1905, here; it doesn't mention anyone named Davenport, nor the birthday problem (at least not by that name, nor phrased in terms of birthdays or days of the year). --JBL (talk) 22:46, 17 April 2020 (UTC)

Ok, I see: the book has been repeatedly updated after Rouse Ball's death, by Coxeter. Here is a snippet from the 13th edition, published 1987 (with both authors listed) that does include the attribution on p. 45. So, the reference should be improved and the text updated, but no time-travel was involved. --JBL (talk) 23:07, 17 April 2020 (UTC)

@Joel B. Lewis: I'm open to suggestions. Text and ref need improving. Seven Pandas (talk) 23:16, 17 April 2020 (UTC)

@Seven Pandas: I just tried something, let me know what you think. --JBL (talk) 01:01, 18 April 2020 (UTC)

@Joel B. Lewis: Very good. Thank you much for the help. Seven Pandas (talk) 01:47, 18 April 2020 (UTC)

Paradox

About this edit: the most common definition of "paradox" is for something self-contradictory. The point being made in that sentence is that this is not the relevant sense of paradox. Adding a dictionary definition does not help make this point at all, indeed it obscures it. In addition, the reference is not a reference for the actual statements being made (it does not validate that this usage of "paradox" matches that particular definition) and so is misleading. --JBL (talk) 16:13, 2 July 2020 (UTC)

"the most common definition of "paradox" is for something self-contradictory." Is there a citation for this?

I recognize that self-contradictory things named `paradoxes` do exist, the most famous being the Liar's paradox and several variants thereof (see https://en.wikipedia.org/wiki/Category:Self-referential_paradoxes ). That some people may be more familiar with those doesn't mean that it is "the most common definition", especially in the context of statistics, where paradoxes are usually about human misperceptions of probabilities, like the Birthday paradox or the Monty Hall paradox.

I didn't read them all, but JBL, are any of the paradoxes in the dozen and a half statistical paradoxes listed on https://en.wikipedia.org/wiki/Category:Statistical_paradoxes , or even any of the two dozen or so listed on https://en.wikipedia.org/wiki/Category:Paradoxes_in_economics actually self-contradictory? It seems all of them use the 'apparent but not actual contradiction' definition.

It seems there is demand for a sentence that clarifies a common misperception about statistical paradoxes that isn't what I wrote. Because the text currently provides no citation giving evidence that paradoxes are typically self-contradictory, and we do have a dictionary on hand to cite to, one resolution may be to make the citation and immediately clarify it; something like "What is meant by "paradox" is the first definition in [citation to OED], not the second, which some readers may expect." B k (talk) 22:42, 2 July 2020 (UTC)

I've updated the lead to describe it as a veridical paradox, which is the type of paradox it is (highly unintuitive but demonstrably true, as opposed to logically self-contradictory).--Trystan (talk) 19:22, 6 March 2021 (UTC)

This seems like an improvement, thanks. --JBL (talk) 20:36, 6 March 2021 (UTC)

pigeonhole principle?

This "pigeonhole principle" is mentioned, but I don't think it applies here. At least strictly... Maybe if we take it as a metaphor, it is intuitive, but this is a formal issue. — Preceding unsigned comment added by 2001:8A0:74C1:3000:4D9:85BB:9787:6A6C (talk) 10:41, 10 March 2021 (UTC)

The pigeonhole principle states that if n items are put into m containers, with n>m, then at least one container must contain more than one item. Where there are n=367 people and m=366 distinct birthdays, at least one birthday must be shared.--Trystan (talk) 15:07, 10 March 2021 (UTC)

@Trystan: What you write is correct, but I think it only weakly applies to the paradox. The reason is that the birthday paradox implicitly requires a calculation of how many people you would have to ask before you are likely to have two people with the same birthday, not before you are certain to have at least two people with the same birthday. If each of the 366 days were equally probable as a birthday, we'd calculate that the odds would go over 50% with 23 people. However, if some birthdays are far more probable than others, the number of people required to cross the 50% threshold drops. CessnaMan1989 (talk) 02:36, 5 November 2021 (UTC)
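
The distinction between "likely" and "certain" above can be illustrated with a rough sketch. The names `prob_shared` and `threshold` are hypothetical, and the skewed distribution is invented purely for illustration (it is not real birth data). The function computes the exact probability of a shared birthday for arbitrary per-day probabilities, using the fact that P(all distinct) equals n! times the n-th elementary symmetric polynomial of the day probabilities.

```python
from math import factorial

def prob_shared(n, probs):
    """Probability that at least two of n people share a birthday,
    where probs[i] is the probability of being born on day i."""
    # e[k] accumulates the k-th elementary symmetric polynomial of probs.
    e = [1.0] + [0.0] * n
    for p in probs:
        for k in range(n, 0, -1):
            e[k] += p * e[k - 1]
    # P(all n birthdays distinct) = n! * e_n(probs)
    return 1.0 - factorial(n) * e[n]

def threshold(probs, target=0.5):
    """Smallest group size whose shared-birthday probability reaches target."""
    n = 1
    while prob_shared(n, probs) < target:
        n += 1
    return n

# Uniform case with 366 equally likely days, as in the comment above.
uniform = [1 / 366] * 366

# Invented skew: the first 183 days are twice as likely as the rest.
weights = [2.0] * 183 + [1.0] * 183
total = sum(weights)
skewed = [w / total for w in weights]

if __name__ == "__main__":
    print(threshold(uniform), threshold(skewed))
```

Under these assumptions the uniform threshold comes out at 23, and the skewed threshold cannot be larger, since the uniform distribution minimizes the chance of a match.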

Clarifying the Introduction

Below is the copied text of a brief conversation I had with JayBeeEll (talk · contribs) about the introduction and my edit that was reverted:

"The Birthday Paradox is that the probability of at least two people in a group sharing a birthday often surpasses 50% when the group size is less than 23 people. If each day of the year were equally likely to be a birthday, the probability would surpass 50% when the group size were equal to 23 people. That's the apparent paradox, and that is why I'd like to change the introduction. CessnaMan1989 (talk) 14:48, 5 November 2021 (UTC)"[This was what I, CessnaMan1989, wrote on his user page in response to my edit being reverted.]

"Three comments: (1) this should be on the article talk-page, not on my personal talk-page. (2) The straightforward facts in the first 2.5 sentences of your message bear no obvious relationship to the important phrase ("that is why I'd like to change the introduction"), nor to the edits in question. To explain why a change would be good, one should identify a problem in what exists or do a comparison between an existing and potentially-existing version. (3) Have you read the second paragraph of the lead section? (But please, per (1), don't answer this here -- bring discussion to the article talk-page.) --JBL (talk) 15:36, 5 November 2021 (UTC)"[This paragraph is JayBeeEll's response.]

In response to this question, I think a problem exists with the first paragraph in the introduction as it stands now because it doesn't explain why it is called a "paradox." While it is true that the second paragraph explains this in better detail, I think it should be immediately clarified in the first paragraph, and ideally even the first sentence of the article. @JayBeeEll: I am sorry for writing on your talk page. CessnaMan1989 (talk) 18:45, 6 November 2021 (UTC)

Lead should state the paradox

The current lead reads:

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.

This is saying what the birthday paradox "concerns". It's like saying that the Pythagorean theorem "concerns" the lengths of the sides of right triangles. It should instead say what the birthday paradox "is", something like this:

In probability theory, the birthday paradox is that it is surprisingly likely that in a group of people, two will have the same birthday. In particular, there is a 50% probability that in a set of 23 randomly chosen people, two will have the same birthday.

I am not at all attached to the particular wording. But the lead sentence should state the paradox, not just say what it's about. --Macrakis (talk) 15:07, 5 November 2021 (UTC)

The first sentence of the article Pythagorean theorem is "[T]he Pythagorean theorem ... is a fundamental relation in Euclidean geometry among the three sides of a right triangle." The issue with your proposal is that there is no good way to complete the sentence "the birthday paradox is a [noun]" -- it is a puzzle, but also the answer to that puzzle, but also the fact that the answer to the puzzle is surprising to some people, but also a generalization of the puzzle. --JBL (talk) 15:42, 5 November 2021 (UTC)
You have a point about Pythagorean theorem... I knew I should have checked it first!

The completion of "the birthday paradox is..." is what I proposed above. One difficulty with it is that it doesn't work for the birthday problem.

Conversely, the n in the current lead sentence is gratuitous for the birthday paradox, but meaningful for the birthday problem, which presumably has as a solution a function of n giving the probability of birthday twins. For the problem, we could say:
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday.

I think "asks for" or "calculates" or "determines" or whatever is a lot better than the content-free "concerns". --Macrakis (talk) 17:42, 5 November 2021 (UTC)

Perhaps we shouldn't try to define them together in one sentence since they're really two different things. How about:
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. The birthday paradox is that, counterintuitively, there is a 50% probability of a shared birthday in a group of only 23 people.

Again, I'm not completely happy with the wording, but I think this is better than the vacuous "concerns". --Macrakis (talk) 18:28, 5 November 2021 (UTC)
It might be good to await more input, but at first glance I think splitting the problem and the paradox is not a bad idea. --JBL (talk) 21:07, 5 November 2021 (UTC)
I like this approach as well. It frames the broader problem, as well as explaining why it is often referred to as a paradox. I've gone ahead and made the change, using a slight variation of the wording above. Please feel free to punch it up.--Trystan (talk) 19:52, 6 November 2021 (UTC)

This is not a paradox

It's a surprising result. Brentford F.C. beat Manchester United F.C. the other day 4 - 0. That's a surprising result but it isn't a paradox. On first glance especially to those for whom maths is a foreign country, the statistical result - it is more than likely that in a room containing 23 randomly selected people, two of them will share a birthday - is surprising: but a few minutes of thought and the application of some very elementary statistical principles will confirm the truth of it. I have no doubt that there are those who call this a paradox but that does not make it one. There are those who call Man U a terrible football (aka soccer) team but it does not make it so. Cross Reference (talk) 10:28, 4 October 2022 (UTC)

I agree. (See Paradox article for further information). Meridiana solare (talk) 12:01, 4 October 2022 (UTC)

This is discussed clearly in the article; the name is the name, regardless of whether it's a "good" name. JBL (talk) 17:03, 4 October 2022 (UTC)

Actually the article name is "Birthday problem", not "paradox". Meridiana solare (talk) 17:20, 4 October 2022 (UTC)

The article is titled "birthday problem", but it discusses two closely related things, one of which is named "birthday paradox". JBL (talk) 20:14, 4 October 2022 (UTC)

Lead

Several things in the lead are very interesting for the body of the article, but I don't see what they add in the lead:

That 70 people suffice for 99.9%.
That the number gets smaller because birthdays aren't uniformly distributed.
Mentioning the pigeonhole principle to justify that with n=367, there must be at least one duplicate.

All of these things are true and interesting, but they don't contribute to the central point. --Macrakis (talk) 22:23, 7 November 2021 (UTC)
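
The three figures listed above can be checked with a short sketch. It assumes equally likely birthdays, and the function name `prob_shared_birthday` is made up for illustration. The pigeonhole case uses 366 possible days, matching the n=367 figure.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming each of `days` days is equally likely (an assumption)."""
    if n > days:
        # Pigeonhole principle: more people than days forces a repeat.
        return 1.0
    q = 1.0  # running probability that all birthdays so far are distinct
    for k in range(n):
        q *= (days - k) / days
    return 1.0 - q

if __name__ == "__main__":
    print(prob_shared_birthday(23))             # just over one half
    print(prob_shared_birthday(70))             # above 0.999
    print(prob_shared_birthday(367, days=366))  # certain, by pigeonhole
```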

I agree. The lead would flow much better if the second paragraph were moved to the body of the article.--Trystan (talk) 22:45, 7 November 2021 (UTC)

I think the pigeonhole principle is central to the related article "Birthday attack" which, itself, in its section "Understanding the problem" refers to this very article and thus I think is worth mentioning. I'd even say it's what makes the probability surprisingly high. *Just another layman talking here* =) 80.215.65.208 (talk) 10:35, 20 July 2022 (UTC)

Does it bother no one that the comment "there are (23 × 22) / 2 = 253 pairs to consider, much more than half the number of days in a year" is irrelevant to the probability in question? — Preceding unsigned comment added by 93.70.101.40 (talk) 13:41, 12 October 2022 (UTC)
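
One common heuristic links the number of pairs to the probability in question. As a sketch, assume 365 equally likely birthdays and use the approximation $1 - x \approx e^{-x}$ for small $x$ (both are assumptions, not something stated in the comment above):

```latex
\begin{align*}
P(\text{no match among } n \text{ people})
  &= \prod_{k=1}^{n-1}\left(1 - \frac{k}{365}\right)
   \approx \exp\!\left(-\sum_{k=1}^{n-1}\frac{k}{365}\right)
   = \exp\!\left(-\frac{n(n-1)/2}{365}\right), \\
P(\text{match among } 23 \text{ people})
  &\approx 1 - \exp\!\left(-\frac{253}{365}\right) \approx 0.50 .
\end{align*}
```

Under this approximation the pair count $n(n-1)/2$ appears directly in the exponent. The exact value for $n = 23$ is about 0.507, so the heuristic is close but not exact.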

Incorrect citation

The text says, “The problem of a non-uniform number of births occurring during each day of the year was first addressed by Murray Klamkin in 1967.” which points to a paper from 1967 by Klamkin and Newman. This paper does not actually address the problem of non-uniform numbers of births. So I think this sentence should be cut, or the proper citation should be tracked down. 100.36.247.162 (talk) 22:52, 5 December 2022 (UTC)

Indeed. Klamkin does not have any other relevant papers on MathSciNet, either. For the moment, I have tagged the statement as failing verification. JBL (talk) 18:50, 6 December 2022 (UTC)

Well, this goes way, way back. The attribution to Klamkin was added by AxelBoldt in April 2005, without citation. The footnote was added by Mikeblas in May 2020. @Mikeblas and AxelBoldt: Can you take a look at the discussion above? Thanks. --JBL (talk) 19:49, 10 December 2022 (UTC)