Talk:Bayesian inference/Archive 1
This is an archive of past discussions about Bayesian inference. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
About legal applications
I believe that the example is a wrong application of Bayes' theorem. The prior probability of "defendant is guilty" can only be 0 or 1: a person is, or is not, guilty, and should have the benefit of the doubt in the other cases. As a consequence, applying Bayes' theorem does not bring any value. So, I agree with the Court of Appeal.
This does not mean that there would be no other legal application. Could you find an example where another prior probability is used? Pcarbonn 19:03, 23 May 2004 (UTC)
- It sounds like you may be a frequentist, instead of a Bayesian. A frequentist believes that probabilities should only be assigned to truly random events, while a Bayesian uses probabilities to quantify his or her own limited knowledge about the events. Thus, a frequentist would recoil in horror at the thought of assigning a probability to "defendant is guilty" (that being a deterministic concept), while a Bayesian would happily assign a probability between 0 and 1. A frequentist would thus agree with the Court of Appeal.
- Since this article is about Bayesian inference, we should adopt the Bayesian view of probabilities for this article. The appropriateness of the legal application can be debated (I didn't like it so much, myself), but it should not be deleted because it is Bayesian. -- hike395 05:15, 26 May 2004 (UTC)
- To insist that the prior probability of guilt or innocence be either 1 or 0 (as Pcarbonn suggests) would be to dispense with the need for a trial. One has a trial in order to make the best judgment possible about the probability of guilt or innocence and, given that probability estimate, make the decision that the law requires. Furthermore, the presumption of innocence in criminal cases does not entail that the prior probability of guilt be set at zero; it only means that the prior probability of guilt be no more than, perhaps, the prior probability that any other person in the country or, possibly, the prior probability that anyone else in the world is guilty. Such a prior probability would be small, but it would not be 0 (zero). (anonymous law teacher)
- Courts generally do not use Bayesian logic in an "explicit" way. But this may just mean that they don't use symbolic and mathematical notation when they reason about evidence. It is possible, for example, that the following rule about the relevance requirement for the admissibility of evidence is the equivalent of, or assumes, Bayesian logic:
(anonymous law teacher)Federal Rule of Evidence 401: "'Relevant evidence' means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence."
I understand the difference between Bayesian and frequentist, and I accept the Bayesian view in general. However, I do not accept it in this legal application (am I POV?): I would not want to be wrongly put in jail because, statistically, I'm "80% guilty". A way out may be for the article to better explain this point, and to be more NPOV about the decision of the Court of Appeal than the initial article was. An article should never take only one position in a controversy (e.g. Bayesian only), but state both positions honestly. Pcarbonn 05:45, 26 May 2004 (UTC)
- Well, decision theory states that you should be put in jail if you are >= 80% guilty if putting an innocent person in jail costs (the jury? society?) 4 times more than letting a guilty person go free. You probably don't like it as an innocent person, because you would bear the costs and none of the benefit :-).
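A minimal sketch of that decision rule, with the 4:1 cost ratio assumed above (the costs and function name are illustrative, not from any source):

def convict(p_guilty, c_jail_innocent=4.0, c_free_guilty=1.0):
    # Convict iff the expected loss of convicting is less than the
    # expected loss of acquitting: (1 - p) * c_jail_innocent < p * c_free_guilty
    return (1 - p_guilty) * c_jail_innocent < p_guilty * c_free_guilty

print(convict(0.79), convict(0.81))   # False True -- the threshold sits at 0.8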
- I personally didn't like the legal example, and thought that the last sentence about the Court of Appeal was POV (I was too lazy to fix it, though.) So, as far as I am concerned, it can stay deleted. But, I think the article would benefit from some worked-through example. The example at Naive Bayesian classification is much too complicated for an introduction to the method. I wouldn't mind bringing back the example and make the decision be about something else, less controversial. Do you have any suggestions?
- As for the Bayesian vs frequentist, I think this controversy is well-explained at Bayesian and frequentist probability, and does not have to be fully repeated here. I would suggest that a simple link there can suffice. -- hike395 14:06, 26 May 2004 (UTC)
Aren't there some good worked-through examples in the Bayes' theorem article already, in particular the medical one? I can't think of any legal example. OK for the link to the controversy article. Pcarbonn 17:18, 26 May 2004 (UTC)
- Deleting the legal section was over the top. How do you think criminal courts work? The prosecution makes an allegation. It then produces evidence to show that what it has said is almost certainly true (i.e. it can be believed beyond a reasonable doubt). The defence may either seek to show that there are gaps in the prosecution evidence or that some previous not considered possibility is credible (in either case increasing the doubt about the prosecution case). The jury then assesses its belief about the prosecution case and how it has changed given the evidence presented by both sides. That is where Bayesian inference can be used in terms of degrees of belief about what has already happened but is not known. --Henrygb 13:37, 28 May 2004 (UTC)
- If I'm not mistaken, the courts are hesitant to admit actual computations of probability in testimony. I've read of a couple of cases in which numerical computations were made by the prosecution but the conviction was later overturned on appeal, on the grounds that the computations were inadmissible. Now I am certainly leaving out many important details and mixing things up, so I'll try to find a reference so that we can have a proper discussion about it. Regards, Wile E. Heresiarch 04:16, 2 Jun 2004 (UTC)
- About legal applications, I see the present discussion as somewhat problematic. Essentially what we have right now is a calculation as it could be carried out, but in fact not usually, and maybe never. The few court cases that I've heard about which involved probability are cautionary tales -- the calculation was thrown out for one reason or another. One that I've seen referenced a couple of times is People vs Collins. This paper [1] gives a summary of that case and some others. See also Peter Tillers's web site [2] -- he is a law professor with a longstanding interest in Bayesian inference and he usually has something interesting to say. Incidentally I think Tillers states the editorial remark that was recently struck out -- that in the Regina versus Denis Adams case, the court allowed any irrational inference but disallowed a rational one -- in another context; I'll try to find it. -- Bayesian inference in law is something of a mess, I'm afraid; I don't know where to go from here, maybe someone has some ideas. Happy editing, Wile E. Heresiarch 21:49, 8 Jun 2004 (UTC)
- There are quite rational methods to supply priors in legal cases. For example, if one can reasonably assume that a crime was committed by someone living within the city where the crime was committed (and statistics on convictions ought to give good information about that), then P(suspect committed the crime|this background information) is approximately equal to 1/N where N is the population of the city. In other words, if the police were simply to arrest someone at random, that's the probability that they chose the guilty person, i.e., the appropriate prior.
- Of course, the police don't arrest people at random, but regardless of their reasons for arresting a particular individual, these reasons have to be examined by the jury and cannot enter into the prior probability. Thus, all the evidence that resulted in the arrest has to be presented at trial. In particular, the jury should not take the prior probability of guilt equal to or greater than 1/2 (the test for probable cause) simply because the police have to have probable cause to arrest. For them to do that and then to consider the evidence that led to arrest would be to (illegitimately) use the same data twice, a no-no. Rather, the jury (if they are Bayesian and such calculations are allowed in the privacy of the jury room) should choose a prior that reflects the probability of guilt in the absence of any evidence that would likely be presented at the trial.
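A toy version of such a calculation (all numbers hypothetical) shows why the prior matters: even evidence with a one-in-a-million random-match probability leaves the posterior far from certainty when the prior is 1/N.

# Hypothetical city of a million people; evidence assumed to match an
# innocent person with probability one in a million.
N = 1_000_000
p_match = 1e-6                                  # P(evidence | innocent)

prior_odds = (1 / N) / (1 - 1 / N)              # odds of guilt before any evidence
posterior_odds = prior_odds * (1 / p_match)     # multiply by the likelihood ratio
print(posterior_odds / (1 + posterior_odds))    # posterior P(guilty) ~ 0.5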
- This is quite apart from Wile E. Heresiarch's comment about the admissibility of probability calculations in courts of law. I don't know what the answer to that is. I'm only dealing with a hypothetical court that acts in a Bayesian manner.
- But, it most certainly is appropriate to provide priors on guilt, and indeed if one did not do this one would be in the unenviable position of "Verdict first, trial afterwards." Bill Jefferys 20:10, 13 June 2006 (UTC)
- Here we're doing a test of hypothesis. In doing such a test, we are evaluating the *status quo*, our null hypothesis, seeking to establish whether there is sufficient evidence to reject it in favor of our alternative.
- Since, in our system of justice, the accused is presumed innocent until proven guilty, the null hypothesis should be formulated to reflect this. So, instead of evaluating P(Guilty|Evidence), we should be evaluating P(Innocent|Evidence) instead. There are those, I suppose, that will say since U = I + G, P(I|E) = 1 - P(G|E) and, as such, there is no difference between the two. I submit that in the minds of jurors, there is a subtle (or not so subtle) difference between evaluating evidence from the point of view that the defendant is innocent as opposed to guilty. Dave T 23:29, 5 June 2008 (UTC)
Focus on Algorithm ?
As it stands, this is an odd article. Perhaps it is misnamed and should be moved? Bayesian probability already describes the notion of subjective probability (degrees of belief). Bayes' theorem already talks about the theorem and a simple worked out example. This article seems to be about Applications of Bayes' theorem.
I propose that this article should be about algorithms for performing Bayesian inference, in a Bayesian network. For example, the message passing of Judea Pearl (max product and sum product belief propagation), loopy belief propagation, and perhaps a link to Expectation maximization. --- hike395 15:04, 28 May 2004 (UTC)
- Sounds like a good idea to me ! (But I do not know enough to expand it that way.) Pcarbonn 18:21, 28 May 2004 (UTC)
- I would disagree. Bayesian inference should be about how evidence affects degrees of belief. Have a page Bayesian algorithm if you wish, or add to Bayesian network. But most Bayesian inference is not about the specialist subject of networks. --Henrygb 19:05, 28 May 2004 (UTC)
- Henry: what can you say about Bayesian inference that is not already described in the Bayes' theorem article (specifically the Examples section) and that is not about Bayes net algorithms? And is not a list of applications? This is not a rhetorical question --- your answer should become the content of this article. If there is nothing more to add, then this article should be a redirect to Examples of Bayes' theorem, and we can spawn two new articles --- Applications of Bayes' theorem, and Inference in Bayes nets, perhaps. -- hike395 02:34, 29 May 2004 (UTC)
- This page is designed to be the main introduction to Bayesian statistics and Bayesian logic which both redirect here. Nobody who understands probability theory would reject Bayes' theorem, but many refuse to apply Bayesian methods. So Bayes' theorem is not the place for an introduction to the methods. I have added a section explaining what I think this is about in terms of inference and the scientific method. The legal section (especially R v. Adams comments) is really about inference rather than Bayes' theorem. --Henrygb 00:51, 31 May 2004 (UTC)
- I agree. Bayes' theorem is a simple mathematical proposition that is not controversial. Bayesianism, on the other hand, is the philosophical tenet that the same rules of probability should apply to degrees of belief that apply to relative frequencies and proportions. Not at all the same thing. Michael Hardy 01:08, 31 May 2004 (UTC)
- Well, I wouldn't mind putting examples of Bayesian inference here --- I just object to the redundancy with the material in the Bayes' theorem article. If you look over there, much of the article is about Bayesian methods. If the general consensus is that Bayesianism should be described here, we should probably move a lot of the material from the other article over here. -- hike395 01:50, 31 May 2004 (UTC)
- Hello. I agree that there is some redundancy among the various articles about Bayesian stuff. I think the basic examples in Bayes' theorem should stay there, and more specialized stuff should be developed in the other articles. It's true that Bayesian inference is pretty sketchy at present, but I think it should be filled out with new material, not by moving material over from Bayes' theorem. Happy editing, Wile E. Heresiarch 04:12, 2 Jun 2004 (UTC)
- It seems that we have an impasse. Henrygb and Michael Hardy want to distinguish Bayes' theorem from Bayesian philosophy. I don't want to have 2 duplicate articles. Wile Heresiarch doesn't want me to move material over. I don't know how to make edits that fulfill all of these goals. --- hike395 06:04, 3 Jun 2004 (UTC)
- I guess I'm not seeing what there is to be moved over from the Bayes' theorem article. Presumably you don't mean the historical remarks, statements of the theorem, or the references. Among the examples, the cookies seems like a trivial example that's appropriate for an introductory article. The medical test example is more compelling but really no more complex than the cookies (and therefore still appropriate). I guess that leaves the binomial parameter example. Maybe we can bring that one over. In the same vein, Bayesian inference could talk about other conventional statistical models reinterpreted in a Bayesian fashion. I guess I should get off my lazy behind and get to it. Happy editing, Wile E. Heresiarch 06:47, 4 Jun 2004 (UTC)
Move material from Bayes' theorem
If we take it as given that this article should be the introductory article about Bayesian reasoning, and that Bayes' theorem should be devoid of "Bayesianism" (because that belongs here), then the following sections reflect "Bayesianism" and belong here:
- Historical remarks --- treats a parameter as a random variable,
- Examples --- All of the examples treat unknowns as random variables.
I propose moving these over as examples of inference here, which would then flesh the article out and leave Bayes' theorem agnostic to Bayesian inference.
--- hike395 16:00, 4 Jun 2004 (UTC)
- It seems pointless to obscure the fact that the major uses of Bayes' theorem are Bayesian. Henrygb and Michael Hardy have made some general statements, but they didn't mention moving anything, so it looks to me like you're putting words in their mouths in defense of your personal program. In any event Historical remarks is a report of Bayes' original essay; one can't accurately report the essay without reporting its motivations. I've restored the historical remarks to Bayes' theorem; cutting them out is exactly analogous to striking out the discussion of Martin Luther's theology on the grounds that the Catholics will find it objectionable. -- As for the examples, I'm not opposed to moving them so long as they're replaced by something else. One trivial example and one more interesting example seems a good formula. Regards, Wile E. Heresiarch 16:43, 4 Jun 2004 (UTC)
- Wile: "you're putting words in their mouths in defense of your personal program" is completely incorrect. I am actually a Bayesian (although I dislike that term) and have written programs that use Bayesian inference. I was responding to what I thought Henry and Michael wanted. Please do not attribute sinister motives to me. --- hike395 16:59, 4 Jun 2004 (UTC)
- Well, I guess I couldn't tell what you're getting at. Anyway I am happy to let it go. Let's make some new edits and talk about that instead. Wile E. Heresiarch 18:26, 4 Jun 2004 (UTC)
- I agree that keeping the Historical Remarks section over there is probably better. I would still like to avoid redundancy between articles: so, to try and make everyone happy (including myself), over in Bayes' theorem, I added a one-sentence mild warning that examples of Bayes' theorem typically involve assuming Bayesian probability ("Bayesianism"), and then a link that points to this article's examples section. I believe that this can make everyone happy: Bayes' theorem can remain relatively pure (except for historical commentary, which is fair enough); readers of Bayes' theorem can very easily look at examples; and we don't need to have multiple pages about Bayesian inference. Comments? --- hike395 07:08, 6 Jun 2004 (UTC)
- Looks OK to me. Wile E. Heresiarch 16:16, 9 Jun 2004 (UTC)
Further to this discussion, I've made a proposal on how to improve the relationship between the Bayesian inference article and the article on Bayes' theorem. Please see the Bayes' theorem talk page for details. Cheers, Ben Cairns 07:57, 23 Jan 2005 (UTC).
Evidence and the Scientific Method
I strongly object to this piece
Supporters of Bayesian methods argue that, even with very different assignments of prior probabilities, sufficient observations are likely to bring their posterior probabilities closer together. This assumes that they do not completely reject each other's initial hypotheses; and that they assign similar conditional probabilities. Thus Bayesian methods are useful only in situations in which there is already a high level of subjective agreement.
I am puzzled by the line
This assumes that they do not completely reject each other's initial hypotheses; and that they assign similar conditional probabilities.
To completely reject someone else's hypothesis one would have to assign it a probability of zero. This is a big no-no in Bayesian circles. It is called Cromwell's rule: never assign prior probabilities of zero or 1, as you then lose the ability to learn from any subsequent information. As Dennis Lindley puts it, "You may be firmly convinced that the Moon is not made of green cheese, but if you assign a probability of zero, whole armies of astronauts coming back with their arms full of cheese will not be able to convince you."
If two people with different initial priors see the same data, use the same likelihood and both use Bayes' theorem to coherently incorporate the data, then as the data accumulates their posteriors will converge. (If they are not using the same likelihood then, of course, no agreement is to be expected).
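A minimal simulation of this convergence, assuming a Beta-Bernoulli model (the priors, seed, and true bias are chosen arbitrarily for illustration):

import numpy as np

rng = np.random.default_rng(0)
tosses = rng.random(1000) < 0.7            # coin with true P(heads) = 0.7
heads, n = int(tosses.sum()), tosses.size

for a, b in [(1, 1), (50, 5)]:             # uniform prior vs. strongly biased prior
    print((a + heads) / (a + b + n))       # posterior means; both end up near 0.7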
The next line does not follow at all...
Thus Bayesian methods are useful only in situations in which there is already a high level of subjective agreement.
Sir David Cox (who is not a Bayesian) put it nicely: "Why should I be interested in your prior?". To answer this, there are 3 cases to consider
a) My prior is very like your prior.
In this situation it is reassuring to find that others have independently come to the same conclusions as oneself.
b) My prior is unlike your prior, but the volume of data available has swamped the prior information, so that our posteriors are similar.
This tells me that my results are robust to major changes of prior, and this tends to make it easier to convince others.
c) My prior is different to your prior and my posterior is different to your posterior. In this situation a Bayesian would say that reasonable men and women can come to different conclusions, given the state of the evidence. The onus is then on them to get more evidence to resolve the matter. In this situation a frequentist analysis which gives a single, supposedly 'objective' answer is a travesty of the scientific method. (Indeed, Bayesians sometimes reverse-engineer frequentist analyses, assuming them to be equivalent to some Bayesian analysis with some sort of uninformative prior. The quest is then to find out exactly what that uninformative prior is and to consider whether it is a reasonable prior.)Blaise 17:42, 27 Feb 2005 (UTC)
Clear statements are welcome
Please give your reasoning or build an argument when adding a notice to the article. A bald notice is not helpful to the editors. Ancheta Wis 19:06, 26 August 2005 (UTC)
Diagrams for formulas
I put a diagram I frequently use on here, all about Bayes' theorem... Image talk:Bays Rules.jpg. Other novices like me might also find it useful. --Marxh 09:04, 13 September 2005 (UTC)
Compliments
User Ralph sent the following e-mail to the help desk.
very nice writeup on bayesian inference, I'm impressed........Ralph
Well done to everyone who has worked on it.
Capitalistroadster 23:43, 29 November 2005 (UTC)
Anonymous Observer's Comments
Anonymous Observer made the following comment in the main page:
ANONYMOUS OBSERVER: The above exposition is lovely -- as far as it goes. But the above discussion ignores possible important uncertainties. For example, did a police officer "plant" the DNA "found" at the crime scene? This possibility may be more than de minimis. Furthermore, the posterior probability that a defendant was the source of the DNA found at a crime scene is not necessarily equivalent to the posterior probability of the defendant's guilt -- because, for example, the defendant may have deposited his or her DNA before or after the crime was committed; and because, alternatively, someone in addition to the defendant may have left his or her DNA at the crime scene; and because, alternatively, the defendant alone may have deposited his or her DNA at the time of the crime but yet not have committed the crime charged (because, e.g., no one committed the crime -- there was no crime -- and the defendant was an innocent passerby who touched the person who suffered harm). Moral of the story: If one uses numbers to express uncertainties, (s)he must try to identify all important uncertainties and then represent them with an appropriate probability statement, expression, or symbol.
While I agree with the sentiments, this is the wrong way and place to include them in the article. Work needs to be done to seamlessly incorporate this information into the main article. The article is not a talk page, and these comments look like "Comment, response", not like an encyclopedia entry.
So let's discuss how to include this information! Bill Jefferys 23:39, 6 December 2005 (UTC)
I neglected to mention that this had been in the In the courtroom section. Bill Jefferys 23:41, 6 December 2005 (UTC)
See Edwin Thompson Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, (2003). ISBN 0521592712. This book has a beautiful quantitative treatment of this problem of competing hypotheses. The hypothesis that the defendant is guilty is not supported if the hypothesis that the evidence has been tampered with is likely to be true. Bo Jacoby 10:31, 7 December 2005 (UTC)
I have the book and I agree. I'm not quarrelling with the thrust of the comment. The point is, what is the right way to put this information into the article? Anonymous Observer's method wasn't the right way because the main article is supposed to be an encyclopedia entry, not a talk page. So again, how do we include this information in the article in a seamless way that makes it read like a professionally written encyclopedia entry?
Perhaps A. O. or someone else can have a crack at doing this the right way. Bill Jefferys 13:58, 7 December 2005 (UTC)
Another anonymous observer's comments:
Why is this stuff presented as if it were a matter of controversy, or as if Bayesian statisticians were members of a sect? (Bayesians believe, etc.) This stuff is just math. Sure, there are cases where you start with a prior probability which is controversial, and that casts doubt on one's conclusions, but that does not make the deduction process itself controversial. OK now, excellent rewrite!
Correction required to "in the courtroom"
Looking at the case Regina v Dennis John Adams mentioned in the article, it would appear from the report at http://www.bailii.org/ew/cases/EWCA/Crim/1996/222.html, cited as DENNIS JOHN ADAMS, R v. [1996] EWCA Crim 222 (26th April, 1996), that Adams' conviction was quashed and a re-trial was ordered, precisely because the use of Bayesian logic was deemed inappropriate for use in a criminal trial.
- I agree; it is ironic that this was the outcome, since the use of Bayesian logic clearly undermines the prosecution's case. Were the justices suggesting that eliminating Bayesian inference would be an advantage to the defendant? I don't know. Bill Jefferys 03:22, 12 December 2005 (UTC)
David Hume
I am pretty sure that Hume has made pretty good criticism of (or criticism that can be used against) Bayesian inference... his works on causality most of all. — Preceding unsigned comment added by 172.172.67.230 (talk) 09:19, 15 May 2006 (UTC)
External links - applications of Bayesian inference?
Hi, I'm not a mathematician at all, but I've written an article on how to easily implement "Bayesian Rating and Ranking" on web pages here: http://www.thebroth.com/blog/118/bayesian-rating. From my own experience, I wished that there'd been such an article before - so I simply wrote one myself. It's a practical guide how to use it in a real world application. And we all know, most rating systems currently in use on the internet could do with a good dose of Bayesian! So - any suggestions what to do? Is the article good enough to be listed on Bayesian_inference or another Bayesian related article, possibly in the external links section? Wyxel 10:47, 13 June 2006 (UTC)
Hmm, nobody objected or commented, so I added the link. I hope it is considered a useful resource.--Wyxel 05:47, 23 June 2006 (UTC)
Is there a Bayesian fallacy?
If there is, then I think it should be included in the article. I'm thinking about where people confuse p(a|b) with p(b|a), assuming that they are roughly equal in value. —The preceding unsigned comment was added by 81.104.12.8 (talk • contribs) .
For example, because there are more young men in prison than in proportion to the population (a high p(young_man|criminal)), people can fallaciously infer that young men are likely to be criminals (a high p(criminal|young_man)), leading to Demonization and prejudice. —The preceding unsigned comment was added by 62.253.48.86 (talk • contribs) .
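A quick numeric illustration of the gap between the two conditionals (all rates invented for the purpose):

p_young_given_criminal = 0.5    # assumed: half of criminals are young men
p_criminal = 0.01               # assumed base rate of criminals
p_young = 0.2                   # assumed base rate of young men

# Bayes' theorem: P(criminal | young) = P(young | criminal) * P(criminal) / P(young)
print(p_young_given_criminal * p_criminal / p_young)   # 0.025 -- nothing like 0.5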
- Whoops! I meant Conditional probability rather than Bayesian, but the above comments still apply. —The preceding unsigned comment was added by 62.253.48.86 (talk • contribs) .
- See our article on the Prosecutor's fallacy. I agree our Conditional probability article should link to this, but I don't see it as being a particularly Bayesian issue. -- Avenue 23:41, 22 July 2006 (UTC)
- Confusing p(a|b) with p(b|a) is essentially affirming the consequent. --AceMyth 22:33, 4 October 2006 (UTC)
Likelihood
The 'probability' that some hypothesis is true is a confusing concept, because either it is true or it is false. Ronald Fisher's concept of Likelihood may ease the confusion somewhat, even if that word is not well chosen either. 'Credibility' might be better, but that is nonstandard. Some hypothesis may be incredible in the light of some experimental facts, while some experimental outcome may be improbable assuming some hypothesis. Calling the two concepts by the same name makes the subject difficult to understand. Bo Jacoby 01:14, 6 March 2007 (UTC).
- I would suggest we not challenge the reader by including both 'probability' and 'likelihood' in the opening sentence. Expressing Bayesian concepts correctly is not a simple matter. In case of doubt, relying on textbook wording would be a reasonable precaution. EdJohnston 02:36, 6 March 2007 (UTC)
Thank you. You are probably right, assuming the incredible hypothesis that the textbooks are not confusing too. This discussion should perhaps not be made in this article on Bayesian inference, but rather in the article on Bayesian probability, which says: "When beliefs have degrees, theorems of probability calculus measure the rationality of beliefs". The Latin-origin term for "degree of belief" is "credibility" (from CREDO, 'I believe'). So one can talk about the conditional degree of probability, (E|H), of an experimental outcome E assuming the hypothesis H; about the posterior degree of credibility, (H|E), of the hypothesis H after observing the experimental outcome E; about the unconditional or marginal degree of probability, (E|1), of the experimental outcome E; and about the prior degree of credibility, (H|1), of the hypothesis H, where "1" means "true". Bayes' theorem then reads: (H|E)·(E|1) = (E|H)·(H|1). Bo Jacoby 12:51, 6 March 2007 (UTC).
- Some of us reserve "probability" for theoretical models, and "likelihood" for knowledge-based estimates. For example, a coin toss model might be P(H)=p (a priori unknown constant parameter p); we observe 3 tosses of a coin assumed to follow that model (heads with probability p) and see 2 heads; what is the likelihood that p > 0.5? — DAGwyn (talk) 20:22, 22 February 2008 (UTC)
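Under a Bayesian reading of that question (which would call the answer a posterior probability rather than a likelihood), and assuming a uniform prior on p, the posterior after 2 heads in 3 tosses is Beta(3, 2), and the quantity asked for works out as follows (a sketch, with scipy assumed available):

from scipy.stats import beta

posterior = beta(1 + 2, 1 + 1)     # Beta(prior_a + heads, prior_b + tails)
print(1 - posterior.cdf(0.5))      # P(p > 0.5 | 2 heads in 3 tosses) = 0.6875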
- Are you the Doug Gwyn of Unix philosophy#Quotes? EdJohnston (talk) 22:23, 22 February 2008 (UTC)
- Yes. — DAGwyn (talk) 19:25, 26 February 2008 (UTC)
Prosecutor's Fallacy
I'm thinking that the author of this is suffering from *Prosecutor's Fallacy* with respect to the *In the courtroom* section ... It's ironic since Bayes is what folks use to clear those issues up —The preceding unsigned comment was added by 4.245.6.208 (talk) 02:23, 24 April 2007 (UTC).
Problem with medical diagnostic example
It seems to me that there is a fallacy in the medical diagnostic test example that claims 98% of those who tested positive would not actually have the disease. It assumes that the population that is tested is the same as the general population. This is rarely the case in practice. Often one receives the test because he or she had symptoms that led a doctor to order the test. This fact alters the prior.
It does interestingly suggest that it is unwise to go around having yourself administered every test you can think of, because doing so lowers the prior and so increases the proportion of positive results that are false. Test manufacturers are supposed to report false positive rates to the FDA as P(test|not disease), not as P(not disease|test), as the latter depends on an uncontrolled sample.
--Selket 07:34, 25 January 2007 (UTC)
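Selket's point in numbers -- a sketch with illustrative values chosen so that the general-population case roughly reproduces the 98%-of-positives-are-false figure:

def p_disease_given_positive(prior, sens=0.99, spec=0.95):
    tp = sens * prior                  # diseased and testing positive
    fp = (1 - spec) * (1 - prior)      # healthy but testing positive
    return tp / (tp + fp)

print(p_disease_given_positive(0.001))   # general population: ~0.019
print(p_disease_given_positive(0.3))     # symptomatic patient: ~0.89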
Question about the medical diagnostic example
I wonder whether the calculation of false negatives is correct. I thought false negatives were those people who have the disease but test negative. This frequency is given as 1% in the problem statement. What is calculated toward the end of the section is, for the population as a whole, the probability that a person has the disease given a negative result. That's not the way I think of it. Telliott (talk) 18:01, 14 April 2008 (UTC)
- The usual medical usage of "false negative" is to identify a case where a test reports a negative result (tested condition not present) when actually the subject has the condition.
- The article correctly states "the probability that a negative result is a false negative is …" Note that it is conditioned upon knowing [only] that there was a negative test result. There is no sense in talking about just the unconditional "probability of a false negative," since for that case we don't even know the probability that a test was conducted, and without a test there is no chance of a false negative, a false positive, a true negative, or a true positive. — DAGwyn (talk) 01:16, 15 April 2008 (UTC)
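A small sketch of the conditional quantity DAGwyn describes, reusing the illustrative numbers from the sketch in the previous section:

prior, sens, spec = 0.001, 0.99, 0.95
fn = (1 - sens) * prior        # has the disease, tests negative
tn = spec * (1 - prior)        # healthy, tests negative
print(fn / (fn + tn))          # P(disease | negative result) ~ 1.05e-05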
An obvious mistake in the article?
I fixed this, and I'd appreciate if anyone could take a look and second me. This error seems to be lying there for quite some time... http://en.wikipedia.org/w/index.php?title=Bayesian_inference&diff=prev&oldid=216737323 --Farzaneh (talk) 00:09, 3 June 2008 (UTC)
Measures of Information
"As evidence accumulates, the degree of belief in a hypothesis ought to change. With enough evidence, it should become very high or very low."
Accumulated evidence is not necessarily in agreement, and so does not necessarily push towards high or low belief in the hypothesis. New evidence might just reinforce the prior. Perhaps a link to "measures of information" is appropriate ... entropy etc. Alas, not something I know much about.
—Preceding unsigned comment added by Blendermenders (talk • contribs) 06:05, 9 April 2009 (UTC)
Hypotheses and Decisions
"proponents of Bayesian inference say that it can be used to discriminate between conflicting hypotheses:"
I would say that proponents of Bayesian inference measure uncertainty in the hypothesis as a distribution. They do not make statements about the "best" or "most likely". That seems more of a frequentist notion. It is only for practical reasons that approximations sometimes rest on point estimates. Decisions in a Bayesian approach require consideration of a utility function.
—Preceding unsigned comment added by Blendermenders (talk • contribs) 06:17, 9 April 2009 (UTC)
Bayesian Inference = Belief Updating
"Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability"
Bayesian inference ALWAYS updates beliefs (and ALWAYS has priors). This can start at a point of ignorance ... but still requires the use of a valid function expressing ignorance, i.e. a belief. —Preceding unsigned comment added by Blendermenders (talk • contribs) 06:56, 9 April 2009 (UTC)
"From which bowl is the cookie?" Example
I've cut the following text from the article:
- The example is not a good one for explaining Bayesian inference and at least may lead to misunderstandings for beginners. The key point is about P(H1) = P(H2) = 0.5; it is true based on "there is no reason to believe Fred treats one bowl differently from another", but in most situations P(H1) and P(H2) depend on the total numbers of the cookies inside, more precisely:
- P(H1) = N1/(N1 + N2); P(H2) = N2/(N1 + N2), where N1 and N2 are the total numbers of cookies in bowls 1 and 2.
Criticism of an example because it "may lead to misunderstandings for beginners" belongs in the talk page, not in the main article. This is my justification for moving the text.
Personally I think the example is fine for an introductory example since it explicitly states the assumption that Fred will not treat the bowls differently - but that's just my 2c worth. I wouldn't object to a comment at the end of the example along the lines "This is a simple example so we're making the assumption of P(H1) = P(H2) = 0.5. However in most situations P(H1) and P(H2) depend on the total numbers of the cookies inside each jar....." Cje (talk) 11:07, 12 March 2010 (UTC)
- It is not very convincing in the same situation first NOT to know the contents of the bowls, and in the same formula to employ full knowledge of the contents. Either we know the contents, then H1 ≠ H2, and we do not need the whole procedure, OR we do NOT know the contents, then we would have to make some guess. 195.4.77.191 (talk) 09:01, 22 November 2010 (UTC)
The article says: "After observing the cookie, we must revise the probability to P(H1 | E), which is 0.6." Given the context I read this as: we must revise the probability that Fred chooses bowl 1 to 0.6. This seems to indicate that there is some assumption that Fred learns something from the event E and revises his beliefs ... but this is not clear from the setup (e.g. it is never said whether Fred prefers plain or non-plain cookies and whether this would have any impact on this example). Can someone explain this? — Preceding unsigned comment added by 143.210.72.208 (talk) 16:36, 3 June 2011 (UTC)
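For reference, the 0.6 figure follows from Bayes' theorem with the usual bowl contents assumed (bowl 1: 30 plain + 10 chocolate; bowl 2: 20 plain + 20 chocolate); it is the observer, not Fred, who updates:

p_h1 = p_h2 = 0.5                       # Fred picks either bowl equally often
p_plain_h1, p_plain_h2 = 30/40, 20/40   # P(plain cookie | bowl)

posterior_h1 = p_plain_h1 * p_h1 / (p_plain_h1 * p_h1 + p_plain_h2 * p_h2)
print(posterior_h1)                     # 0.6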
Better article
The scholarpedia article: http://www.scholarpedia.org/article/Bayesian_statistics
is far better than this one. Somebody should contact the authors and see if they are willing to donate the text. I suggest we completely erase the current version if so. Andrew Gelman has been pushing this article on his blog as the best introduction to Bayesian statistics. (added 17:06, 2 September 2009 by 140.247.244.135)
- I agree! Do we need to ask permission? Isn't that allowed under the GFDL? —Preceding unsigned comment added by 130.102.0.171 (talk) 01:36, 19 November 2009 (UTC)
- This suggestion misses the point that this article is titled "Bayesian inference", which is distinct from "Bayesian statistics", although "Bayesian statistics" does redirect here at present. A separate article as an overview of Bayesian statistics would be good, and the contents of the suggested web page would match this. Then "Bayesian inference" can be about Bayesian inference. Melcombe (talk) 10:38, 19 November 2009 (UTC)
- Also, the ownership of the article is not clear ... the home page http://www.scholarpedia.org/article/Main_Page indicates that authors can retain copyright and I don't see an easy way to find out the status of the one being discussed. Melcombe (talk) 10:51, 19 November 2009 (UTC)
I agree the Scholarpedia article is better. The emphasis here on point hypotheses seems restrictive - learning about continuous-valued parameters is often seen as more intuitive (though both have a place). McPastry (talk) 00:52, 2 March 2010 (UTC)
More discussion of the pros and cons of Bayes (coherence, coming up with a prior, model-sensitivity) would also be good. McPastry (talk) 00:54, 2 March 2010 (UTC)
Different Scholarpedia articles are licensed under different copyrights. See http://www.scholarpedia.org/wiki/index.php?title=Special:Copyright&id=5230. 'Bayesian Statistics' has a CC BY-NC-ND license, in which the ND (No Derivative Works) clause means we can't take it and change it. Purple Post-its (talk) 17:18, 13 July 2011 (UTC)
Merge discussion
The present contents of Bayes' theorem is more about Bayesian inference than about Bayes' theorem itself. Discussion of Bayesian inference should ideally be in the article Bayesian inference. Bayes' theorem is about a rule for manipulating probabilities and does not primarily deal with manipulating "beliefs" and "evidence". Melcombe (talk) 10:04, 4 March 2011 (UTC)
- Agree that it seems like a good idea to merge any Bayesian inference material in article Bayes' theorem to Bayesian inference. Then there is not much left but Bayes' theorem is not the same thing as its application for inference, so perhaps whatever is not about inference could be left in Bayes' theorem. Even if Bayes' theorem then becomes a stub, it probably should still exist as an article on its own. Mathstat (talk) 22:45, 4 March 2011 (UTC)
- Absolutely. Bayes' theorem is a theorem of probability theory. Bayes' rule is even more useful than Bayes' theorem! (Bayes' rule = Bayes' theorem in odds form). Bayesian inference is a paradigm for doing statistics. There should be cross-linked pages on both. Richard Gill (talk) 15:48, 5 March 2011 (UTC)
- Interjection 2011-05-08. Even Melcombe, who inserted the merge banner, does not clearly propose a traditional merge that would replace Bayes Theorem/Law/Rule with a redirect to this article. Only because that is true do the first two respondents (above) express agreement.
- — continued below —
- Oppose: the article focuses on various applications of the theorem, inference being one of the applications. The example of the problem described by Bayes that is cited in the article is in fact what is nowadays called an inferential problem! KKoolstra (talk) 08:34, 29 March 2011 (UTC)
- Inference is one of many applications. And certainly it was the motivating application both for Bayes (who probably learnt the theorem from de Moivre) and for Laplace (who was the first to set up Bayesian inference in a general setting). But the article on Bayes' theorem shouldn't be *dominated* by just one of its applications, especially when there is a whole article on that whole huge topic. So I think the Bayesian inference application in the Bayes theorem article should be kept to a modest size and should send the reader to the Bayesian inference article for full information. Richard Gill (talk) 09:07, 29 March 2011 (UTC)
I don't think a real merging of both articles into one is a good idea, since as mentioned above they are different things, and we want to provide readers fast access to barebones Bayes' theorem, its alternative formulations and 1 or 2 examples, independent of Bayesian inference in general. Reorganizing the articles a bit and moving some content to reduce redundancy is of course OK.--Kmhkmh (talk) 11:59, 8 April 2011 (UTC)
I don't think the articles should be merged. Bayes' theorem is just an application of conditional probability and it's equally valid in Bayesian or frequentist contexts. The Bayesian inference article could defer to the Bayes' theorem article for the basic examples (or vice versa) if it seems redundant having them in both places. Maghnus (talk) 04:31, 20 April 2011 (UTC)
- Oppose merger, as all others have explained.
- By the way, the lead paragraph of this article says Bayesian inference is Bayesian "from its use of the Bayes' theorem in the calculation process." I disagree. The article Bayes theorem says that Bayes's proposition nine is Bayesian in its application of probability to a parameter. The article Bayesian probability seems (at a glance) to focus on the yet wider application of probability to any statement; there are objectivist and subjectivist interpretations thereof. --P64 (talk) 23:35, 25 April 2011 (UTC)
The idea of merging these two articles is incredibly stupid. You should be ashamed of yourself. 165.155.196.88 (talk) 15:41, 5 May 2011 (UTC)
- This is not a very good idea. They are separate things. Although each article could be refined independently. —Preceding unsigned comment added by 134.83.216.20 (talk) 14:36, 8 May 2011 (UTC)
- — continued from above — I see no support for a traditional merge and consensus that Bayes Theorem/Law/Rule spends too much space on Bayesian inference.--P64 (talk) 18:35, 8 May 2011 (UTC)
Close I have removed the merge template, as the proposed move of material has now been done. Melcombe (talk) 12:53, 26 July 2011 (UTC)
the edit by Gnathan87
Well done Gnathan87. But
- The word boolean is computer jargon and could either link to boolean data type, or be omitted.
- The conditional probability notation P(H|E) used here is shorthand for P(H=h|E=e) which is shorthand for P(Hypothesis=h|Evidence=e).
- The posterior probability was just called the degree of confidence. Why not stick to the terminology that the unobserved evidence has a degree of probability in light of the hypothesis, while the hypothesis has a degree of confidence in light of the evidence?
- Why not write (E|H) and (H|E) rather than P(E|H) and P(H|E) ?
- You link to the article on Bayesian probability, so there is no need to copy it here.
Bo Jacoby (talk) 06:37, 18 July 2011 (UTC).
- Thanks Bo. Yes, to avoid treading on toes I did not edit this section quite as extensively as I otherwise would have, and I agree with some of your points.
- I am not sure that the word "boolean" is in fact specific to computer science. I actually feel this is worth mentioning, as it is a characteristic of Bayesian inference that H may be only true or false. For example, it may be that a variable V may take several values, but the hypothesis each time tests "V=1": true or false? "V=2": true or false? "V=3": true or false? etc. Pointing out this distinction (even if indirectly) avoids potential confusion.
- boolean: The article boolean does not explain the intended meaning of the word even if the article boolean data type does. That is why I think that "boolean", meaning true or false, is specific to computer programming. The reader who does not know that meaning of "boolean" is left helpless. That the hypothesis is true or false can be written H=true or H=false, without using the word "boolean". But why is it essential that there are only two possibilities? The degree of confidence of V=1, V=2 and V=3 makes perfect sense. (See Bayesian inference#Distribution of a parameter of the hypergeometric distribution). Bo Jacoby (talk) 09:45, 20 July 2011 (UTC).
- I am not particularly concerned over the use of a single word. However, I still feel it is a very well known term, particularly for the likely audience of this article. Maybe I am wrong about that. The reason that H is a boolean is because H ≡ "V=1" etc. The notation in Bayesian inference#Distribution of a parameter of the hypergeometric distribution is slightly misleading in that the probability actually being calculated is P(H|k), where H ≡ "K=1" etc. As I understand it, that is the Bayesian interpretation of probability - confidence in the truth or falsehood of a hypothesis. Gnathan87 (talk) 02:44, 21 July 2011 (UTC)
- I agree that the shorthand P(H) etc should be used. However, while it is fairly obvious that H and E stand for Hypothesis and Evidence, I still think these should be defined.
- I'm not sure that abbreviating to just (E|H) is a good idea. It's not a big deal to spell it out, and avoids potential confusion.
- In the full formula P(H=h|E=e) = P(E=e|H=h) · P(H=h) / P(E=e), the values are often omitted, giving P(H|E) = P(E|H) · P(H) / P(E). The P 's are actually superfluous. Omitting both gives the very compact expression (H|E) = (E|H) · (H|1) / (E|1).
- Bo Jacoby (talk) 09:45, 20 July 2011 (UTC).
- I agree that the notation P(H) should be used instead of P(H=h) etc. However, I am not sure about omitting the P. Is that a common notation? Also, is it used in other articles? I have a feeling it might be confusing to some readers, and in any case, I do not think leaving it in is too intrusive. Gnathan87 (talk) 03:00, 21 July 2011 (UTC)
- With respect to the above merge discussion, I think it is in fact probably better to move the definitions from Bayesian probability to this article. This is something Melcombe already suggests actually on Talk:Bayesian_probability, and I agree. I was thinking that probably most of Bayesian probability#Calculations with Bayesian Probabilities could be removed from that article and merged into this one. Considering the above merge discussion I actually removed the same material from Bayes' theorem earlier today. I was not sure whether that was such a good idea, but on balance I think it is if this separation is to be encouraged. Gnathan87 (talk) 05:07, 19 July 2011 (UTC)
- confidence: Bayesian_probability#Calculations_with_Bayesian_probabilities explains that "some of the probabilities are interpreted as representing beliefs". Bayesian inference is difficult to grasp when everything is called probability. The probabilities representing beliefs should be called something else, such as degree of confidence. That is what we were trying to do in bayesian inference. We should not call P(H|E) the posterior probability, but rather the degree of confidence in the hypothesis, H, after taking the evidence, E, into account. (If a gambler throwing dice obtains 'six' 24 times in a row, would you consider the probability that he is not cheating, or the confidence that he is not cheating?) Bo Jacoby (talk) 08:51, 20 July 2011 (UTC).
- I think it is important that both terms are mentioned. I would use both in this context. I certainly don't think it's a good idea to use "prior probability" without mentioning "posterior probability". Gnathan87 (talk) 03:10, 21 July 2011 (UTC)
- When you split the above discussion, please sign each coherent part of your own contribution, making it transparent who wrote what.
- If X is a variable, and x is a value, then X=x is an event, and the probability of that event should be written P(X=x) rather than P(X), which is shorter to write but harder to read. So the Probability that the Hypothesis is true should be written P(H=true) the first time. Shorthand notations may be explained and used later. But it is not nice to use P as an abbreviation for degree of confidence.
- I agree that it is not a good idea to use "prior probability" without mentioning "posterior probability". But it is a good idea to use "prior degree of confidence" together with "posterior degree of confidence".
- Omitting the P and abbreviating to just (E|H) is to my knowledge not a common notation. (Albeit nice and clear).
Bo Jacoby (talk) 06:40, 21 July 2011 (UTC).
The last several edits gutted the article
Why was the definition of the terms in the formula removed?? This renders the article unintelligible to anyone who does not already know what the formula is.drh (talk) 02:57, 20 July 2011 (UTC)
- We are working on it, sir! Bo Jacoby (talk) 09:06, 20 July 2011 (UTC).
Melcombe's request
Melcombe has requested proof or references for the claims in the subsection Bayesian_inference#Distribution_of_a_parameter_of_the_hypergeometric_distribution.
The proof that the sample count k, given the population count K, has mean nK/N and standard deviation √(n · (K/N) · (1 − K/N) · (N − n)/(N − 1)) is the standard computation of the moments of the hypergeometric distribution.
The proof that the posterior mean of K, given k, is (k + 1)(N + 2)/(n + 2) − 1 follows from summing K · P(K|k) over all K, using the lemma 7b from binomial coefficient#Series involving binomial coefficients.
The proof that the posterior standard deviation of K, given k, is √((k + 1)(n − k + 1)(N + 2)(N − n)/((n + 2)²(n + 3))) follows from summing K(K + 1) · P(K|k) over all K, using the same lemma twice.
These results were found (by my late friend Jan Teuber) in an article by Karl Pearson from around 1927. I haven't got the exact reference. I developed them independently a few years ago because I needed them and didn't find them in the books. They seem to have been forgotten by the statisticians.
The formulas are implemented by the J - programs
deduc =. *`%`:3"2@(,: (%:@* -.))@((,: , 1:) % +/@])
T =. -@(+ #)
induc =. (T&}: , }.)@(T~ deduc T)
The deduc program computes the sample mean and standard deviation from the population distribution like this
   6 deduc 20 40
2 4 1.104 1.104
The induc program computes the population mean and standard deviation from the sample distribution like this
   2 4 induc 60
22.25 37.75 9.337 9.337
These programs are slightly more general than the formulas in that they also work for multivariate distributions.
   0 0 0 induc 3
1 1 1 1 1 1
Bo Jacoby (talk) 14:17, 5 August 2011 (UTC).
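For readers without J, a Python cross-check of the two examples above (a sketch, with scipy assumed available):

import numpy as np
from scipy.stats import hypergeom

# "6 deduc 20 40": sampling distribution of a sample of 6 from 20 + 40 = 60.
d = hypergeom(M=60, n=20, N=6)
print(d.mean(), d.std())          # 2.0 and ~1.104, as above

# "2 4 induc 60": posterior over the number K of red balls, given 2 red
# in a sample of 6 from 60, with a uniform prior on K.
K = np.arange(61)
post = hypergeom.pmf(2, 60, K, 6)
post = post / post.sum()
mean = (post * K).sum()
print(mean, np.sqrt((post * (K - mean) ** 2).sum()))   # ~22.25 and ~9.337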
Andrew Gelman's edits
Apparently this article sucks. Shame I don't know enough to help! jorgenev 18:08, 14 September 2011 (UTC)
- Apparently he tried to change it... but the edits were reverted. Douglas Whitaker (talk) 21:37, 25 November 2011 (UTC)
- The page is pretty horrible. The descriptions of inference versus prediction, the stuff about events being 'freshly observed' and the unremitting emphasis on point hypotheses instead of continuous-valued parameters all provide a poor summary of what Bayesian inference is. Gelman recommends the scholarpedia page instead. --50.135.4.90 (talk) 23:25, 25 November 2011 (UTC)
- The scholarpedia page however assumes quite a bit of prior knowledge, though. Why were Gelman's edits reverted? They seemed to improve the article while still maintaining the same target audience. 205.175.113.129 (talk) 12:45, 26 November 2011 (UTC)
- I would emphasise that there is nothing inherently discrete about the description before the revert. In the continuous case, you just have uncountable models or outcomes as appropriate. In my view it is much easier to visualise things in this way, and it is better pedagogically. As for emphasis, although one of the examples was unwritten, there were meant to be two substantial examples demonstrating the continuous case. The continuous case was also explained in the text. The thing about inference vs. prediction was first mentioned by Gelman - a distinction he makes in his book. I personally am not sure it is necessary, but kept it in anyway.Gnathan87 (talk) 20:15, 28 November 2011 (UTC)
- Gelman's edits were a major improvement over the current state. If you want anyone writing about Bayesian inference, he would be the guy. I would suggest to revert back to his edits. whyking_thc 09:17, 27 November 2011 (UTC)
- As Gelman said himself, his edits were relatively minor; the article needed overall work. I did not intend to "revert" his edits so much as rewrite the content. In my view the old approach was not understandable for readers unfamiliar with the subject. In particular, it is not clear how P(E) fits in to the probability space. What do P(E) and P(E|H) mean if the probability space is not defined?
- I would like to suggest a merge of the old and new content. There were also many general improvements that the revert has undone. Gnathan87 (talk) 20:03, 28 November 2011 (UTC)
Hypergeometric?
The presence of this section just puzzles me. Being of limited utility; technically involved (although elementary); and of little general pedagogic value, it seems totally inappropriate here. It adds little or nothing to the beta-binomial section afterward, which is both easier and more historically relevant, although I think even that should be revised. At any rate, the hypergeometric section should be moved to the end or (preferably) removed. — Preceding unsigned comment added by 209.2.221.99 (talk) 20:36, 8 November 2011 (UTC)
- The formula for the mean and standard deviation of K, given k, n, N, is very useful. From that you can easily compute the mean and standard deviation of P = K/N. The beta-binomial section afterwards is about the same problem, but only for the special case where N is infinite.
- Consider for example 20 balls which may be red or not red. You take a sample of 5 balls and they are all red. Estimate the total number of red balls. You need the formula to find that the mean number of red balls = 17.857 and that the standard deviation = 2.247.
   0 5 induc 20
2.14286 17.8571 2.24745 2.24745
Bo Jacoby (talk) 22:10, 8 November 2011 (UTC).
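The same numbers fall out of a direct computation -- a sketch assuming scipy, with a uniform prior on the number K of red balls among the 20:

import numpy as np
from scipy.stats import hypergeom

K = np.arange(21)                    # candidate counts of red balls among 20
post = hypergeom.pmf(5, 20, K, 5)    # P(5 red in a sample of 5 | K red of 20)
post = post / post.sum()             # uniform prior, so posterior is proportional to likelihood
mean = (post * K).sum()
print(mean, np.sqrt((post * (K - mean) ** 2).sum()))   # ~17.857 and ~2.247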
- So what does this general case add to the article? Why don't we just do the special case, and replace this section with some motivation for the beta-binomial? Note: the article is called "bayesian inference," not "sampling theory." — Preceding unsigned comment added by 160.39.140.189 (talk) 04:15, 11 November 2011 (UTC)
- The general case solves the above example which cannot be solved by the beta-binomial special case. And it is bayesian inference. Bo Jacoby (talk) 12:54, 11 November 2011 (UTC).
- I agree that these sections do not tell you any more about Bayesian inference itself, only about results derived using it. Maybe if these sections belong in the article at all it is more as examples. I was actually previously considering turning one of the sections into the (currently empty) parameter example, although maybe something much simpler would be better, for example finding the mean of a normal distribution... Gnathan87 (talk) 20:18, 18 November 2011 (UTC)
- The mean, μ, of a normal distribution is tricky to estimate because the prior distribution is controversial. All you know is that −∞< μ<∞, and that does not define a probability distribution. The hypergeometric case is easier, because in a finite population (N balls) there is but a finite number of possibilities for how many of them are white. The principle of insufficient reason defines the prior distribution for 0≤K≤N, integer K. The beta-binomial case is tricky too because 0≤P≤1, real P, does not uniquely specify a prior distribution for P because there is an infinite number of possible values for P. Unlike 0≤K/N≤1, integer K, where there is a finite number of possible values K/N. So the limiting case (N→∞) of the hypergeometric is a better approach to the beta-binomial case. Bo Jacoby (talk) 11:30, 21 November 2011 (UTC).
Mathematical foundation
Quote:
- Basically, if the prior distribution has low variance (i.e. there is little uncertainty regarding the parameter), then the posterior distribution will be highly affected (compared with the conditional distribution of the data) and will be quite similar to the prior distribution. On the other hand, if the prior distribution has high variance, the choice of prior distribution will have little effect on the posterior distribution, which will instead be largely determined by the data.
This is true for normal distributions, but not for beta distributions, where the standard deviation is maximized (σ=1/2) for the limiting case B(0,0). Nor is it true for discrete distributions. Bo Jacoby (talk) 13:22, 28 November 2011 (UTC).
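For the normal case, where the quoted claim does hold, a minimal conjugate-updating sketch (all numbers assumed):

import numpy as np

def posterior(mu0, var0, data, var_data=1.0):
    # Normal prior, normal likelihood with known data variance: precisions add.
    prec = 1 / var0 + len(data) / var_data
    mean = (mu0 / var0 + data.sum() / var_data) / prec
    return mean, 1 / prec

data = np.array([4.8, 5.2, 5.0])
print(posterior(0.0, 0.01, data))    # tight prior: posterior mean ~0.15, near the prior
print(posterior(0.0, 100.0, data))   # vague prior: posterior mean ~4.98, near the data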
Wording
Most occurrences of "uncertainty" should read "probability". Just one example:
- Instead of "the uncertainty of one model tends to 1 while that of the rest tend to 0."
- I suggest "the probability of one hypothesis tends to 1 while the probabilities of the other hypotheses tend to 0."
– Rainald62 (talk) 15:44, 25 November 2011 (UTC)
- I agree that "uncertainty" is misleading. I would prefer "credibility", like this: "the credibility of one hypothesis tends to 1 while the credibilities of the other hypotheses tend to 0." The word "probability" is often already used to signify a parameter, and it is confusing to reuse it. Consider this case. A fair die has probability one sixth of showing a six. You may suspect that the die is unfair when (say) four throws gave four sixes. Then you talk about "the probability that the probability of throwing a six is greater than one sixth". That is hard to understand. It is a little easier to say "the credibility that the probability of throwing a six is greater than one sixth". Bo Jacoby (talk) 08:05, 26 November 2011 (UTC).
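Bo's die example can be made concrete. A small sketch, assuming (as an illustration, not part of the comment above) a uniform Beta(1,1) prior on the probability p of throwing a six:
<pre>
# After 4 sixes in 4 throws, the Beta(1,1) prior updates to Beta(5,1),
# whose cdf is F(x) = x**5.  The "credibility" that p exceeds 1/6:
print(1 - (1/6) ** 5)   # ~0.99987
</pre>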
- I also do not like the word "uncertainty". I think the word "uncertainty" implies the opposite of "certainty", and "certainty" could be taken in the sense of "degree of certainty" or "probability" (i.e. uncertainty = 1 - probability). I prefer the word "confidence" when referring to probability specifically under a Bayesian interpretation. To link to the above discussion, the reason I retained "uncertainty" was that it had been added by an expert. I am not sure about "credibility", but only because I have not seen it used in this sense very often. Gnathan87 (talk) 18:05, 29 November 2011 (UTC)
- "Confidence" is a terrible term to use, as it leads directly to confusion with the non-Bayesian idea of "confidence intervals". "Degree of belief" or "degree of plausibility" or just "support" are standard terms. --50.135.4.90 (talk) 05:29, 2 December 2011 (UTC)
- Good point. I am happy with "degree of belief", which I prefer to "degree of plausibility" because it is shorter. Just to add, the shortened form "belief" should probably be used sparingly, if at all; "belief" sounds to me more like a yes/no thing. Gnathan87 (talk) 10:36, 2 December 2011 (UTC)
- Glad you like "degree of belief". "State of belief" is not standard however, and its use in the introduction suggests that something importantly different is being defined, which is not the case. The introduction currently uses technical ideas ("degree of belief" being one) before they are defined; this is confusing, particularly to non-experts.--McPastry (talk) 19:13, 2 December 2011 (UTC)
- Updated the lead to reflect these points. Gnathan87 (talk) 15:10, 11 December 2011 (UTC)
Goal of Bayesian evaluation
I think we need a "why do we care" functional statement, e.g. Bayesian inference "can be used to evaluate how reliable an estimate of probability is" or "is used to evaluate how likely a statistical correlation (or anything?) is to be true". Or something--if it can be stated non-mathematically. Help me out, here. I had a statement in mind but I lost it while I was composing the introductory sentence. —Monado (talk) 04:52, 8 December 2011 (UTC)
- I've added some new material to the first paragraph. Does that convey the importance better? Gnathan87 (talk) 15:33, 11 December 2011 (UTC)
Revert of lead
[I would just like to note: since I have been particularly active in editing this article recently, I hope the revert does not appear oppressive (not at all my intention).] A few comments on my reasoning:
- The material on philosophy is fundamental to this topic, as it justifies Bayesian inference. Even if philosophy is not discussed at length, IMHO it should at least be mentioned. Otherwise, for example, you leave hanging: why should the update provided by Bayes' theorem be used? Why should some other method not be equally valid, or better? Mention of the Ramsey-de Finetti theorem was in fact added partly in response to the comments of Monado (above).
- On second thoughts, maybe reference to "Bayesian philosophy" is better than picking one justification. Hopefully it is now better?
- The removed material explained the ideas to somebody new to the topic (a lay reader) without going into mathematical details.
- The structure of the lead was, I think, previously pretty good, and the edits removed the sense of developing the ideas. The first paragraph conveyed the basic idea and background. The second paragraph gave a more in-depth explanation of how Bayesian inference applies more generally to distributions over exclusive/exhaustive possibilities. The lead then went on to discuss, as per the above discussion re Andrew Gelman, how it is practically applied.
Gnathan87 (talk) 14:21, 14 December 2011 (UTC)
Since the lead section has been reverted again, I would add some further comments, as, in my view, the new version is a significant step backwards. Particularly, I will compare http://en.wikipedia.org/w/index.php?title=Bayesian_inference&oldid=465836189 to http://en.wikipedia.org/w/index.php?title=Bayesian_inference&oldid=465830221. (I agree that the distributions example was not ideal, although I would still like to find a really good but terse example to put in there.)
To begin with something more superficial, the lead should be written as prose, not a list of bullet points. (Admittedly, the new version is effectively an expanded version of the old first paragraph. But it was pretty much prose, and the intention had ultimately been to try to make it flow better.) More importantly, the aim of this paragraph was to concisely present the very most important points - with the proviso that further details followed. The new version is missing any sense that Bayesian inference, in full generality, acts on a distribution. It is not explained that beliefs cannot exist in isolation (unless they have probability 1). This is bad for a number of reasons. Firstly, from WP:LEAD, "The lead serves as an introduction to the article and a summary of its most important aspects." Secondly, there is now nothing to give those who do not wish to study the mathematics a flavour of the details. (Also bearing in mind that many e.g. engineers/scientists are more used to starting from intuitive ideas than from formulae.) Thirdly, it may convey a misleading impression of what Bayesian inference achieves.
To address a few more specific differences:
- The edit labelled "a detailed repetition of model selection method removed from the abstract". This is not technically "model selection", in which one would typically use ratios rather than the direct probabilities (see the sketch after this comment). Furthermore, this is a useful introduction to the ideas pushed for by Gelman; I would agree that it is an important generic method warranting its own mention in the lead.
- The edits concerning "removal of unclear adjectives". One presumably wants to emphasise to the lay reader that Bayesian inference is very general, hence the use of phrases such as "many diverse applications" and the specification that the applications only "include" these fields. For a reader who is not minded to always be logically pedantic, the phrase now reads as if these are the only four fields in which Bayesian inference applies.
- The edit concerning the use of Bayesian inference by the brain. "It has been suggested that the brain uses Bayesian inference to update beliefs" does not in my view go far enough. This is admittedly not an area I am very familiar with, so please correct me if I am wrong. But my understanding is that there is certainly evidence for this, beyond the mere suggestion that this is how the brain works.
- The introduction of Bayesian probability as "A degree of belief is represented as a Bayesian probability". What should the uninformed reader understand by a "Bayesian probability"? Is it a normal probability? Is it different to a normal probability? Is this some complicated prerequisite? Much better to briefly explain, e.g. "the Bayesian interpretation of probability, in which degrees of belief are represented by probabilities."
Gnathan87 (talk) 16:46, 14 December 2011 (UTC)
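On the ratios point above: a minimal sketch of comparison by Bayes factor, with made-up data (6 heads in 10 flips) and two models, a fair coin versus a coin with unknown bias under a uniform prior:
<pre>
from math import comb

n, h = 10, 6                      # made-up data: 6 heads in 10 flips
ml_fair = comb(n, h) * 0.5 ** n   # marginal likelihood of the fair-coin model
ml_unknown = 1 / (n + 1)          # a uniform prior on p makes each count 0..n equally likely
print(ml_fair / ml_unknown)       # Bayes factor ~2.26, mildly favouring the fair coin
</pre>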
- Thank you for the above comment.
- My edit was not a revert. It was a step in the (IMHO) right direction.
- My objection to the lead is that it was confusing and messy. What is the meaning of "many diverse applications" as opposed to "many applications"? The word "diverse" is confusing and basically nonsensical in the context. What is the meaning of "many applications" as opposed to "applications"? It is not easy to count applications. The present formulation, "Bayesian inference has applications in science, engineering, medicine and law", is straightforward and implies neither that the number of applications within any of the four fields is low, nor that the applications are restricted to the four fields only. I also find it difficult to imagine an application which is not somehow included in the field of 'science'. The application of Bayesian inference to 'law' is an application of 'science'.
- Re "It has been suggested that the brain uses Bayesian inference to update beliefs". It is better if it does not go far enough than if it goes too far. This article on Bayesian inference does not elaborate on the subject, so why not just link to an article that does?
- If any aspect needs to be addressed in the lead, it should have a subsection of its own in the article.
- My objection to a link to the Ramsey-de Finetti theorem is that no such article is found in Wikipedia. Write it first, link to it later.
- Bo - my concern is really over how the phrases read.
- 2. The difference is in conveying a qualitative generality. Just "applications" is likely to be understood (rightly or wrongly) as in some sense restricted. Presumably, Bayesian inference should be understood as having a proliferation of applications spanning virtually any subject. I think it is quite correct to go on to list some fields in which Bayesian inference has well known specific uses. (What is implied is a more restricted view of "science" as "what a scientist does", "law" as "what a lawyer does" etc.) However, a reader will almost certainly take that it is limited (at least mainly) to these areas unless this is qualified with "including".
- 3. I agree, but I think the wording is wrong. "It has been suggested" suggests a paucity of evidence, whereas "there is evidence" suggests more validity (and hence justifies this information going in the lead). I don't think mentioning that evidence exists necessarily means you then have to go into details - diverting to Bayesian brain is fine. How about the version that's there now?
- 4. The deleted material does have its own subsection - Method. The deleted material was basically interpreting the mathematics.
- Gnathan87 (talk) 21:28, 14 December 2011 (UTC)
- 2. You cannot stop people misunderstanding, but you can stop writing bullshit. "many diverse applications" is bullshit. Abandoning logic is abandoning communication. The purpose of an encyclopedia is not 'conveying a sense' but stating facts.
- 3. "There is evidence that the brain uses Bayesian inference to update beliefs" is a very strong formulation calling for proof and documentation and references. Personally I do not believe it to be true. The operation of the brain is an area of current research.
- 4. The nonmathematical summary of 'Method' is contained in the sentence: "Bayes' theorem is used to calculate how belief in a proposition changes due to evidence."
- Bo Jacoby (talk) 22:40, 14 December 2011 (UTC).
- 2. With respect, good communication is about being specific. Good understanding is about being logical. "There are applications" leaves open non-factual options, such as that the applications are limited. Furthermore, this is the understanding to which readers will probably gravitate. This is poor communication. I think there would be overwhelming, if not complete, consensus that the applications of Bayesian inference are both "many" and "diverse". Why not qualify with "many diverse applications"?
- 3. Yes. That is why I originally wrote "there is evidence to suggest" :) Maybe then we need a more neutral phrasing. It is certainly clear that the brain does not always use Bayesian inference. Perhaps "Research has suggested"?
- 4. That is a summary, but the summary surely warrants more detail? For example, where in that summary does it explain that degrees of belief must always sum to 1 over the exhaustive/exclusive possibilities? What is there to explain that you cannot just take two arbitrary degrees of belief for heads and tails and then just update them independently of each other?
- Gnathan87 (talk) 22:58, 14 December 2011 (UTC)
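To illustrate the point about updating degrees of belief together rather than independently, a minimal sketch with two exclusive hypotheses about a coin (all numbers assumed for illustration):
<pre>
prior = {"fair": 0.5, "biased": 0.5}       # degrees of belief, summing to 1
lik_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | hypothesis)

# Observe heads: multiply by the likelihoods, then renormalize jointly.
unnorm = {h: prior[h] * lik_heads[h] for h in prior}
z = sum(unnorm.values())
posterior = {h: v / z for h, v in unnorm.items()}
print(posterior)   # {'fair': 0.357..., 'biased': 0.643...} -- still sums to 1
</pre>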
- 2. I do not know the limitations of the application of Bayesian inference. Do you? It may turn out to be more (or less) limited than what we thought before, or more (or less) limited than what we think the reader thinks. This kind of information is not encyclopedic and should simply be omitted. When I read "many diverse applications" my feeling is: please cut the crap and give me the facts. Provide rock-solid knowledge, not hints or feelings.
- 3. Are there hard encyclopedic results from this brain research?
- 4. The fact that any probability distribution sums to 1 is general and not specific for bayesian inference. The details belong to the subsection and not to the summary.
- Bo Jacoby (talk) 09:36, 15 December 2011 (UTC).
- 2. I agree that the descriptive version is not "rock solid fact", due to the inherent subjectivity in the meaning of adjectives. However, I somewhat object to the idea that anything that is not a "rock solid fact" is inappropriate. Many people will only read the lead, in order to come away with a feel for the topic. You must ask: if the reader had all of the details, would they apply this characterisation? If the answer is virtually certain to be yes, it is then helpful and appropriate. (Besides, what is a "fact" if not just something that everybody agrees with?) On the other hand, I do agree that it is good to strive for objectivity. Maybe there is some other way of putting it that would achieve consensus.
- 3. As I say, I'm not familiar with this area and I hope somebody more knowledgeable can step in. However, in the meantime I've done some research and the answer seems to be yes. According to http://www.bayesiancognition.org/readings/, "There are no comprehensive treatments of the relevance of Bayesian methods to cognitive science." However, by following references it is not hard to find examples. e.g. http://cogsci.uwaterloo.ca/courses/COGSCI600.2009/KorWol_TICS_06.pdf, "It thus seems that people are able to continuously update their estimates based on information coming in from the sensors in a way predicted by Bayesian statistics". From http://www.mrc-cbu.cam.ac.uk/people/dennis.norris/personal/BayesianReader.pdf, "The Bayesian Reader successfully simulates some of the most significant data on human reading.". From http://web.mit.edu/cocosci/Papers/f881-XuTenenbaum.pdf, "We report two experiments with 3- and 4-year-old children, providing evidence that the basic principles of Bayesian inference are employed when children acquire new words at different hierarchical level"
- 4. Yes, but the fact that a belief distribution must sum to 1 is a constraint imposed by Bayesian inference. A valid probability space implies valid degrees of belief, but valid degrees of belief do not imply a valid probability space. Because degrees of belief need not be coherent, it is not necessarily intuitive to view an individual belief as part of a set, or to think of degrees of belief as being dependent on one another. Particularly for those who do not study the mathematics, there is nothing to emphasise the important idea that this is the constraint imposed. I still think this should be explicitly explained in the lead.
- Gnathan87 (talk) 19:57, 15 December 2011 (UTC)
- 2. "if the reader had all of the details, would they apply this characterisation?" No they would not. The characterization "many diverse applications" is void of meaning, and the reader having all the details will recognize that fact.
- 3. Don't write in wikipedia until you really know what you are talking about.
- 4. The lead should be specific to the article. Other articles tell the properties of probability in general. I don't understand what you mean by "degrees of belief need not be coherent".
- Bo Jacoby (talk) 22:52, 15 December 2011 (UTC).
- 2. I disagree, but this is really not worth arguing any more... 3. Thank you, but I was not being reckless. I am quite familiar with the topic, I have just not worked in that particular sub-discipline. As it happens, I have studied under one of the authors I linked to above and knew full well that this research existed. 4. Coherence is the property defined by Ramsey and de Finetti from which the axioms of probability follow. Subjective degrees of belief need not be coherent, but only if they are coherent are they seen to be "rational". Gnathan87 (talk) 23:07, 15 December 2011 (UTC)
- 3. If you provide references to prove your claim as an encyclopedic fact, then I have no objection. Otherwise it does not belong here.
- 4. The reader has no chance to follow your distinction between rational and irrational beliefs.
- Bo Jacoby (talk) 00:08, 16 December 2011 (UTC).
- I'm not suggesting that coherence should be explained. What I am saying is this: just because you say that "degrees of belief are represented by probabilities", the reader does not necessarily immediately conceptualize degrees of belief as part of a set that must sum to 1 and should be updated together. Why? Because that is not how degrees of belief are intuitively represented. Understanding the philosophy of Bayesian inference is about constructing in your mind the analogy between degrees of belief and probability spaces. This is a fundamental paradigm shift that I think should be explicitly walked through in the lead, even though it is technically implied - particularly for those who are not so mathematically experienced, or do not study the mathematics.
- I would still assert that the old version (which I am by no means suggesting was perfect, but had been under continuous development and was the result of much thought and balancing) was beginning to present a clear and understandable development of these ideas. I would also re-emphasise the reasoning behind the structure of the old version: The first paragraph, as has been retained in the current version, was deliberately "dumbed down", containing just the essential ideas for the casual reader. The second and third paragraphs then went on to build on these ideas, acting as 1. a more detailed summary for those who do not study the mathematics or 2. a preparatory introduction to the mathematics. Finally, there was a pointer to Bayesian model selection, a use for Bayesian inference that warrants mention in this article, but is covered elsewhere. Gnathan87 (talk) 17:15, 16 December 2011 (UTC)
- Hmm, I happened to be using this book today: http://www.amazon.co.uk/Bayesian-Networks-Introduction-Probability-Statistics/dp/0470743042/ref=sr_1_1?ie=UTF8&qid=1324409881&sr=8-1#reader_0470743042. Of course it is not an encyclopaedia, but I would point out what (quite coincidentally) is written in the first paragraph on page 1: "The topic provides a natural tool for dealing with a large class of problems containing uncertainty and complexity. These features occur throughout applied mathematics and engineering and therefore the material has diverse applications in the engineering sciences." Gnathan87 (talk) 19:46, 20 December 2011 (UTC)
- Nice! Without loss of meaning the author could simply have written: "The topic provides a tool for dealing with uncertainty and complexity, which occur throughout applied mathematics and engineering, and therefore the material has applications in the engineering sciences." Some authors get paid per page. We don't. Bo Jacoby (talk) 22:34, 20 December 2011 (UTC).
- 2. "throughout" could be the hint preventing the wrong interpretation of the current wording. Note the title of Jaynes' book, "... - the logic of science".
- 3. Although I'm pretty sure that my brain works the Bayesian way, I would prefer the cautious wording.
- Besides, the very beginning, "In statistics ..." does not make sense to me. At least the current content of Statistics may be taken as proof that Bayesian inference is not "in" statistics. -- Rainald62 (talk) 00:17, 18 February 2012 (UTC)
Formula in Philosophical section
P(P|E) and others appear in the formula and in the text. I think these should be written like P(A|B), as in the Bayes' theorem article. --Pasixxxx (talk) 19:04, 6 January 2012 (UTC)
- Disagree. This notation here makes clearer the epistemological interpretation of Bayes' theorem (dealing with a "Proposition" and "Evidence"). However, Bayes' theorem is fundamentally a mathematical relation on a probability space with no particular interpretation, which is why in my view it should be presented in its own article with more neutral symbols such as <math>A</math> and <math>B</math>. Admittedly, focus in Bayes' theorem has recently shifted towards a Bayesian interpretation; I have opposed that on the talk page because I do not think it is sufficiently NPOV. Gnathan87 (talk) 23:21, 7 January 2012 (UTC)
- Letter P should not mean both probability and proposition in the same formula. So P(P|E) cannot be accepted. Bo Jacoby (talk) 20:20, 8 January 2012 (UTC).
- Hmm. Technically, the syntactic context resolves any ambiguity (i.e. the probability function is defined to take propositions as arguments, and evaluates to a real, not a proposition). I think that the benefits of labelling "proposition" as "P" outweigh the (quite small) potential for confusion here. Gnathan87 (talk) 06:44, 12 January 2012 (UTC)
- Thinking about it, one solution would be to use the function C() for "Credence" instead of P() for "Probability". C() is used in philosophy to emphasise that the interpretation of probability being used is subjective probability, aka credence. I'm not overly keen on this, though, because it adds extra complexity and is inconsistent with the notation in the rest of the article. Also, I'm not sure readers would be familiar with the term credence. Gnathan87 (talk) 12:21, 12 January 2012 (UTC)
- This problem can be overcome by use of the Pr notation. After all, <math>\Pr</math> produces "Pr" automatically. It is one of the notations in List of mathematical symbols (under P), and is widely used. There is no need to invent new notation (and anyway that would not be allowable on Wikipedia). Melcombe (talk) 16:05, 12 January 2012 (UTC)
- Although it may be worth bearing in mind that, again, in the philosophical literature, Pr() is used to distinguish "Propensity" (i.e. objective probability) from P() for "Probability" (see e.g. http://www.joelvelasco.net/teaching/3865/humphreys%2085%20-%20why%20propensities%20cannot%20be%20probabilities.pdf) Gnathan87 (talk) 19:17, 12 January 2012 (UTC)
"Inference over a distribution"
(@Melcombe) A query on your revert: the intended "distribution" was the probability distribution over some set of exclusive/exhaustive possibilities. Is there some reason that is technically not a "distribution"? I think "Inference over a distribution" is a better name for that section, because it then clearly distinguishes it as an extension of inference on a single hypothesis from the previous section. Gnathan87 (talk) 23:11, 24 February 2012 (UTC)
- What was in the section at the time contained no mention of "distribution", nor gave any indication of what was distributed over what. You are relying far too much on telepathy on the part of readers, with unexplained equations using unexplained and newly invented notations. What WP:Reliable sources are you using? Melcombe (talk) 07:56, 26 February 2012 (UTC)
- OK, I see. On second thoughts I do agree. I've tried a new name; might still be able to do better though.
- I do admit to not having added sufficient sources recently. My view has been that, given the frequent changes to the article, it has been in the interests of the article to concentrate on the basic structure and text, particularly since the content can be found in, or inferred from, any text covering Bayesian inference. Once it begins to stabilize (as it is, I think, now doing), it will be much clearer which sources are appropriate and where. As for "unexplained equations using unexplained and newly invented notations", I must say that I am unsure what you refer to. There does not appear to me to be anything left unexplained inappropriately for the level of the subject, or newly invented. Gnathan87 (talk) 15:07, 26 February 2012 (UTC)
Deletion of unnecessary maths portions
Despite the warnings that stuff that does not meet Wikipedia standards will be deleted, someone feels an explanation is needed. Thus this appeared on my talk page following the re-insertion of clearly inappropriate material. " Melcombe. While still not using the discussion page you remove useful subsections. That is plain vandalism on your part. Bo Jacoby (talk) 23:44, 20 February 2012 (UTC)."
I responded as follows and repeat it here for info. Melcombe (talk) 22:47, 22 February 2012 (UTC)
- This is of course nonsense. The material is plainly WP:OR as no WP:Reliable sources have been included. It also fails the test of being important to the description of Bayesian inference in general terms, and serves only to get in the way of expansion in useful directions. Moreover, it fails any reasonable excuse for being included as a "proof" of a result such as outlined as possibilities in WP:MOSMATH. Applying the rules of Wikipedia standards is clearly not vandalism. If this stuff were "useful" someone could provide a source for it and it might then be sensible to construct an article specific to Bayesian inference for that specific distribution, in which this stuff could be included. Melcombe (talk) 22:47, 22 February 2012 (UTC)
The matter was discussed above and on the archived talk page here. Melcombe removed two sections. The one on Bayesian estimation of the parameter of a binomial distribution has Bayes himself as a source. The other, on Bayesian estimation of the parameter of a hypergeometric distribution, having Karl Pearson as its source, generalizes the first and is conceptually simpler, because the estimated parameter takes only a finite number of possible values, and so the prior distribution is defined by the principle of insufficient reason (which was not applicable to the continuous parameter of the binomial distribution). My question to Melcombe regarding his need for further explanation or proof was repeated on his talk page and remains unanswered. Melcombe's contribution is destructive, and he is seeking neither consensus nor compromise. Bo Jacoby (talk) 04:27, 23 February 2012 (UTC).
- The material deleted has neither person as an explicit source, and no explicit source at all. The second sentence of WP:Verifiability is "The threshold for inclusion in Wikipedia is verifiability, not truth — whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think it is true." ... and this amounts to providing WP:Reliable sources in the article. Then there is the question of whether the amount of mathematical detail that was included should appear in an encyclopedia article. It is clear that it should not, as it fails for reasons described at WP:NOTTEXTBOOK and in WP:MOSMATH. No amount of discussion can reasonably override these established policies. What was left is a summary of the result at a suitable level of detail. Bayes' contribution is contained in a much shortened section, as there is now a separate article on this source publication. Melcombe (talk) 18:09, 24 February 2012 (UTC)
I suppose that Melcombe is right in assuming that he can learn nothing from other Wikipedia editors. The deleted sections were referenced from here, showing that they are useful in answering elementary questions. The missing sources were mentioned on the talk page, which Melcombe didn't care to read before removing the material. An improvement, rather than a deletion, could have been made. Bo Jacoby (talk) 18:43, 27 February 2012 (UTC).
- Bo,
- Trotting out your first sentence harms your case. The usual wicked pleasure of provoking another editor won't be available here, because Melcombe will just ignore the bait anyhow. You'd have better luck with me!
- WP is not a textbook, so the helpfulness of the material in answering questions is a good but irrelevant fact.
- That said, updating a distribution for the Bernoulli proportion is the simplest and most conventional example, so I should hope that you and Melcombe could agree on a version. Why not look at Donald Berry's book (or the earlier and rare book by David Blackwell), or the famous article for psychologists by Edwards, Lindman, and Savage?
- Maybe you both could take a break from beating on each other, and beat on me for a while? ;D *LOL* I made a lot of edits in the last day, and some of them must strike you as terrible!
- Cheers, Kiefer.Wolfowitz 19:03, 27 February 2012 (UTC)
Thanks to Kiefer.Wolfowitz for the reaction. I have no case to harm. I included the formulas for the mean value and standard deviation of the number K of hits in a population of N items, knowing the number k of hits in a sample of n items. I did not myself consider these formulas to be original research on my part, because they are found in the works of Karl Pearson; but, not knowing these formulas himself, Melcombe considers them to be original research on my part, and so he removed them from Wikipedia. To me this is a win-win situation: either my contribution is retained in Wikipedia, or I get undeserved credit for inventing the formulas. Cheers! Bo Jacoby (talk) 16:07, 28 February 2012 (UTC).
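For reference, the formulas in question can be reconstructed from the posterior used in the enumeration sketch near the top of this page (a reconstruction, not a quotation from Pearson):
<math>\mu = k + (N-n)\frac{k+1}{n+2}, \qquad \sigma^2 = \frac{(N-n)(k+1)(n+1-k)(N+2)}{(n+2)^2(n+3)}.</math>
With N = 20, n = 5, k = 5 these give μ = 17.857 and σ = 2.247, matching the ball example above.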