User talk:Michael Hardy/A remark on epistemic probabilities
A certain "unfair" die shows a "1" with relative frequency 2/7, and each of the other five numbers with relative frequency 1/7, i.e. the "1" appears twice as often as any of the other five numbers.
An ambiguous question asks: "What is the probability that a '1' appeared the first time the die was cast?"
All probabilities are conditional. (If one follows Kolmogorov's conventions, whereby a probability space consists of an underlying set Ω and a countably additive probability measure P on a sigma-algebra of subsets of Ω, then one could say that the probability of a measurable subset A of Ω is the conditional probability P(A | Ω).)
- If one asks for the conditional probability that a '1' appeared on the first toss, given the whole record of outcomes, then the answer is either 0 or 1.
- If one asks for the conditional probability that a '1' appeared on the first toss, given only the information about relative frequencies above, then the answer is 2/7.
- But what if one asks for the conditional probability that a '1' appeared on the first toss given even less information: one knows there are six possible outcomes but one is wholly ignorant of the relative frequencies above?
One possible answer simply rejects this last question: the question makes sense only if one has assigned some probability distribution to the set of possible frequency vectors, saying, for example, that the probability is 1/9 that the relative frequency with which a 1 appears is between 0.1 and 0.2, etc. From this position, one might reasonably decline to assign any probability to the proposition that there was life on Mars a billion years ago.
The mathematics of probability theory as developed by Kolmogorov and others enables us to deduce consequences from assignments of probability to such things as these, but it does not tell us what "prior probabilities" to assign in the first place. If we were to say that the probability that the die is "fair" is 1/2, and the probability that it has the biases assigned above is 1/2, then Kolmogorov can answer our third question: the probability that a "1" appeared on the first toss is (1/2)(1/6) + (1/2)(2/7) = 19/84.
Likewise if one were to say that the vector (p1, p2, p3, p4, p5, p6) of probabilities of the six outcomes of the throw of the die is uniformly distributed on the simplex defined by p1 + p2 + p3 + p4 + p5 + p6 = 1, then the probability that a "1" was observed on the first toss would be 1/6, but the conditional probability that a "1" was observed on the first toss, given that a "1" appeared on each of the next eight tosses, would be much higher than 1/6, since the sequence of eight consecutive "1"s would be evidence of bias.
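The two numbers in the preceding paragraph can be computed exactly. The following is a minimal sketch, assuming the uniform distribution on the simplex (a Dirichlet(1, ..., 1) distribution), under which p1 has a Beta(1, 5) distribution and E[p1^n] = n! 5! / (n + 5)!. The conditional probability of a "1" on the first toss, given a "1" on each of the next m tosses, is then E[p1^(m+1)] / E[p1^m]. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def moment(n, k=6):
    """E[p1**n] when (p1, ..., pk) is uniform on the simplex, i.e. p1 ~ Beta(1, k-1)."""
    return Fraction(factorial(n) * factorial(k - 1), factorial(n + k - 1))


def prob_one_first_given_ones_after(m, k=6):
    """P('1' on first toss | '1' on each of the next m tosses) = E[p1**(m+1)] / E[p1**m]."""
    return moment(m + 1, k) / moment(m, k)


# With no further tosses observed, and with eight subsequent "1"s observed.
no_evidence = prob_one_first_given_ones_after(0)
eight_ones = prob_one_first_given_ones_after(8)
print(no_evidence, eight_ones)
```

Under this particular prior the value rises from 1/6 to 9/14; a different prior would give a different second number.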
The reasonable layperson unschooled in the mathematics of probability theory, observing a long string of consecutive "1"s, or even a less extreme case of unusually high frequency of "1"s, will begin to suspect bias. If one were to attempt to model such a layperson's behavior as some probability distribution assigned to the aforementioned vector (p1, p2, p3, p4, p5, p6), it is not immediately clear how, or even whether, one could do so. However, one aspect of the answer that appeals to common sense is that, given symmetry of information that treats the six outcomes equally, the probability distribution one assigns to the vector (p1, p2, p3, p4, p5, p6) should be symmetric under permutations of the six indices. Among the consequences of this observation are:
- (A) The probability of a "1" on the first toss is 1/6; and
- (B) The probability of a "1" on the next toss, given the record of outcomes of a long sequence of tosses, will approach the observed relative frequency with which a "1" has been observed. [Later note: It has been pointed out that this doesn't follow from the assumptions above without further assumptions. It is true if one uses any of the prior probability distributions that would be sensible under most circumstances. (And using the word "observed" twice in that sentence is probably redundant. Michael Hardy (talk) 00:52, 4 August 2011 (UTC))]
But how fast will it approach that observed relative frequency?
From Kolmogorov's point of view, that depends on what probability distribution has been assigned to the vector (p1, p2, p3, p4, p5, p6). But we don't approach the question of how to model the reasonable layperson's inferences with such a distribution already assigned. We might nonetheless reasonably approach it with the constraint of symmetry in the six indices. That is enough to draw the conclusion stated in the bullet point above that says the probability of a "1" on the first toss is 1/6. It is not enough to answer the question: how fast?
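To illustrate that symmetry fixes the first-toss probability but not the speed, here is a small sketch that assumes, purely for illustration, the family of symmetric Dirichlet(alpha, ..., alpha) priors on the vector (p1, ..., p6). For that family the probability of a "1" on the next toss, given n1 ones in n tosses, is (alpha + n1) / (6 alpha + n). The choice of family and the values of alpha are assumptions, not something the symmetry argument supplies.

```python
from fractions import Fraction


def predictive_prob_of_one(alpha, ones_seen, tosses_seen, k=6):
    """P(next toss is '1' | counts) under a symmetric Dirichlet(alpha, ..., alpha) prior."""
    alpha = Fraction(alpha)
    return (alpha + ones_seen) / (k * alpha + tosses_seen)


# Three symmetric priors: all agree before any data, and disagree after eight "1"s in eight tosses.
for alpha in (Fraction(1, 10), 1, 10):
    before = predictive_prob_of_one(alpha, 0, 0)
    after = predictive_prob_of_one(alpha, 8, 8)
    print(f"alpha={alpha}: before data {before}, after eight ones {after}")
```

Every member of the family gives 1/6 before any tosses are seen, while the value after the same eight tosses ranges from close to 1 (small alpha) to only slightly above 1/6 (large alpha).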
Kolmogorov's viewpoint allows us to say:
- For any probability distribution on the vector (p1, p2, p3, p4, p5, p6) that is symmetric in the indices, the probability of a "1" on the first toss is 1/6.
But Kolmogorov's point of view does not allow us to say
- The conditional probability of a "1" on the first toss, given a state of prior ignorance of the relative frequencies, that state of knowledge being symmetric under permutations of the six indices, is 1/6.
The latter statement is an eminently reasonable conclusion about how sure it is logical to be if one has only that information. But it is not a mathematical theorem. It is a counterexample to the belief that probability theory consists only of mathematics, as opposed to being a science that, by its nature, must rely heavily upon mathematics. Not all questions in probability theory are mathematical problems.
The stupidest comment I ever saw on the so-called principle of indifference was by Richard von Mises, saying that the assignment of equal probabilities in such circumstances is no more reasonable than assuming that all members of a basketball team have equal heights because one doesn't know their heights. Whoever reads what I wrote above should see how completely that misses the point.
From both Kolmogorov's POV and mine it makes sense to speak of the probability of ["1" on the first trial], and it does not make sense to speak of the die's probability of landing on "1" divorced from any particular trial and thought of as a relative frequency or propensity, since that is neither a well-defined event in a suitably defined probability space for this experiment (consisting of throwing the die infinitely many times) nor a well-defined proposition about whose truth one is uncertain. But from von Mises' POV and that of frequentist statisticians (in particular from Kolmogorov's other point of view), the "true probability of getting a '1'" is the limiting relative frequency, posited above to be 2/7. Such persons can easily misunderstand the principle of indifference as saying that in a position of ignorance one should estimate the six relative frequencies as 1/6. Then they refute that straw man. They take the principle of indifference to prescribe a way of estimating, rather than as stating the values of certain conditional probabilities. Michael Hardy (talk) 21:33, 9 May 2009 (UTC)
Opinion: people are not a simple object
I do not think I can say something clever on this matter, since I guess that all reasonable points of view are formulated long ago (repeatedly), and it does not help much, to say "but I prefer that one". However, Michael invites me to participate, and I do.
Some people try to understand inanimate nature (say "physicists" for short), others try to understand people's behavior ("psychologists" for short). (Well, encyclopaedists probably try both, but let me simplify facts.)
Michael is clearly a "psychologist" (I mean, within this discussion). I am a "physicist". And I agree to disagree; I do not think I could or should make him change his mind. (Similarly, I do not try to convert a poet into a mathematician, nor to convert myself into a poet.)
Still, sometimes I try to understand why people buy lottery tickets. I have a number of explanations (stupid or clever, I do not know). They are quite specific, taking into account rather special features of people's life.
As far as I understand, Michael (and many others) want to describe people's behavior out of some simple and elegant universal principles. I am skeptical. I believe that only (some properties of) inanimate nature admits such principles. People are strange and complicated ("of high complexity", that is, admitting no simple description).
I recall an engineer told me: an engineer must decide, even if he/she has not enough data. Well, I am not an engineer; a mathematician having not enough data must say: I do not know. If someone asks me "what is the area of a disk whose radius is somewhere between 2 and 3" I answer: I do not know; I only know that it is somewhere between 4π and 9π. If someone asks me "what is the probability that (...something...) given that the coin asymmetry is at most 0.25" I answer: I do not know; I only know that it is somewhere between (something) and (something). For me, there is no difference here between geometry and probability; the output results from the input; garbage in – garbage out. No miracles.
Still, if Michael finds miracles, I'll be happy to see them.
Boris Tsirelson (talk) 06:49, 10 May 2009 (UTC)
- I'm not a "psychologist" in the sense of trying to model or understand how people behave in such matters. Rather I have in mind how sure of an uncertain proposition it is logical to be. Not how sure people actually feel, nor how sure they act as if they are—just how sure it is logical to be.
- (But of course the question of whether logic can or should enter such questions will not be generally agreed on.) Michael Hardy (talk) 14:53, 10 May 2009 (UTC)
- Then I do not understand why you think that questions with incomplete information are meaningful. A man seeing five "heads" from an otherwise unknown coin gets some opinion, some belief, some subjective probability, especially if he is forced to decide (rather than just theorize). In contrast, for a "physicist" this question is just like the area of a disk of unknown radius.
- In other words: I can understand (more or less) what is "subjective" probability, but only in relation to psychology. I do understand (I hope so) what is "physical" probability (objective, but silent toward the questions you like). What I do not understand (for now?) is something intermediate (probably you call it logical probability?) which is objective like the "physical" probability and nevertheless well-defined under incomplete information. I agree that it would be very nice to find such a notion; this is what I called "a miracle". However, what makes you think that it is possible? Boris Tsirelson (talk) 15:17, 10 May 2009 (UTC)
One reason for hope that it is possible is that we do see this result, that the probabilities mentioned above are equal to 1/6. That's not a mathematical theorem, but it is compelling. My actual suspicion is that uncertainty can be modeled by something intermediate between probability as conceived by Kolmogorov and his ilk (non-negative measures assigning measure 1 to the whole space) and propositional logic, and that those two extreme cases (conventionally understood mathematical probability theory, and propositional logic) would be included. Michael Hardy (talk) 15:41, 10 May 2009 (UTC)
- Then, let me try it. A disk is known to lie inside a 2×2 square. What is the chance that its area exceeds π/4? Boris Tsirelson (talk) 16:52, 10 May 2009 (UTC)
I don't know. It reminds me of an exercise proposed in a paper by the physicist Edwin Jaynes: a very limp piece of string of length L is thrown very unskillfully onto the floor. Find the probability distribution of the distance between the ends. Michael Hardy (talk) 17:19, 10 May 2009 (UTC)
- Oh, really? I expected you to use the principle of indifference. You are a proponent of it, aren't you? Please explain how you treat it, when you use it, and why you did not use it here. Boris Tsirelson (talk) 17:55, 10 May 2009 (UTC)
- "But Kolmogorov's point of view does not allow us to say: The conditional probability of a "1" on the first toss, given a state of prior ignorance of the relative frequencies, that state of knowledge being symmetric under permutations of the six indices, is 1/6." Sorry, I want to understand it, but I fail to. Could you please explain it in different words? I did not understand, which condition is given. Boris Tsirelson (talk) 18:01, 10 May 2009 (UTC)
The condition was the state of knowledge of one who knows there is a cubical die with the six outcomes, and although aware that there could be biases, is not aware of any biases. This condition is not an event in any probability space that we've posited; hence Kolmogorov's approach cannot entail the conclusion. And clearly it does not justify any conclusion that a "1" appears 1/6 of the time. But it does justify the conclusion that the conditional probability of "1" on the first toss, given that state of knowledge, is 1/6. Michael Hardy (talk) 21:59, 10 May 2009 (UTC)
- "Kolmogorov's viewpoint allows us to say: For any probability distribution on the vector (p1, p2, p3, p4, p5, p6) that is symmetric in the indices, the probability of a "1" on the first toss is 1/6." Is it really different from what you say? Is there any situation where the difference becomes important when deciding, what to do? Boris Tsirelson (talk) 04:38, 11 May 2009 (UTC)
- It seems, I start to understand what you say. You say that symmetry of the available knowledge is sufficient for knowing some (not all, well, but some) probabilities. Especially, if all we know is symmetric under permutations of "1", "2", "3", "4", "5", "6", then "1" is of probability 1/6. (In the context of subjective probability it is not bad, but you do not want it, do you?)
- I do not agree. As far as I know this was in the fashion some centuries ago, but then criticized, since it is too good: the less you know the more you conclude! Now it is in the fashion to believe that existence of probabilities is a very special feature of some experiments (and situations), such as coin tossing. Frequencies must be stable enough.
- In other words, it seems, your approach to the very idea of probability does not stipulate falsifiability. If probability theory is a part of mathematics then it need not (and cannot) be falsifiable. But if it is a part of the sciences then it needs to be! Boris Tsirelson (talk) 05:31, 11 May 2009 (UTC)
Your statement that "the less you know, the more you conclude" is based only on a misunderstanding that should be dispelled by what I wrote above. You don't conclude that the limiting relative frequencies are 1/6. I think those who object are clumsily assuming that that is what is concluded. Michael Hardy (talk) 15:53, 11 May 2009 (UTC)
- Not at all. Starting with the phrase "I start to understand" above, I really do. I understand that you understand that the frequency in the long run need not be 1/6. But still: if you know something asymmetric then it is not easy to conclude what is the probability (yes, I know, we speak about the probability of the first toss only). And if you do not then you are happy to conclude that the probability (yes, of the first toss only) is exactly 1/6. Boris Tsirelson (talk) 16:04, 11 May 2009 (UTC)
What is symmetric is your knowledge of the situation, and the probabilities are taken to be epistemic, i.e. they are features of your knowledge of the situation. I think things like this may make a practical difference in how statistical inference is done. Michael Hardy (talk) 16:25, 11 May 2009 (UTC)
- I understand (more or less) what is an objective probability and what is a subjective probability. But what is epistemic probability? If it really is something meaningful then the corresponding theory should be falsifiable. Is it? (Otherwise I suspect that we say a word but we have no notion behind it, which happened a number of times in the history of sciences.) Boris Tsirelson (talk) 16:34, 11 May 2009 (UTC)
- But I am afraid, all that is already written in Statistical regularity, Frequency probability, Probability interpretations etc. Boris Tsirelson (talk) 05:41, 11 May 2009 (UTC)
- Off-topic, but maybe interesting: there exists a mathematical theory that describes such a situation: symmetry holds, but probabilities do not exist (and frequencies are extremely erratic). To this end, in the space of infinite sequences, use "quasi-sure" instead of "almost sure". It means, use meager sets instead of null sets. Boris Tsirelson (talk) 06:59, 11 May 2009 (UTC)
- All that reminds me of a derivation of the expectation of the geometric distribution via memorylessness: the expectation a satisfies a = p·0 + (1-p)(a+1), whence a = (1-p)/p. Nice, but a doubt remains: is a a number? What if it is infinity? Similarly, epistemic probability must be 1/6 by symmetry. Nice, but a doubt remains: is it a well-defined number? Maybe it is rather an interval of numbers? A distribution over numbers? Something else? Nothing at all? How to know, how to verify? Boris Tsirelson (talk) 05:39, 12 May 2009 (UTC)
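For the geometric example, the doubt can at least be checked numerically. The sketch below assumes the convention in which the variable counts failures before the first success, so that P(k) = (1-p)^k p, and compares a partial sum of the defining series with the value (1-p)/p obtained from memorylessness. The value p = 0.25 and the number of terms are arbitrary illustrative choices.

```python
def geometric_mean_partial_sum(p, terms):
    """Partial sum of sum_k k * (1-p)**k * p: mean number of failures before the first success."""
    q = 1.0 - p
    return sum(k * q**k * p for k in range(terms))


p = 0.25
approx = geometric_mean_partial_sum(p, 200)  # series truncated after 200 terms
closed_form = (1 - p) / p                    # value from the memorylessness argument
print(approx, closed_form)
```

For p > 0 the partial sums settle near the closed form, which is the finiteness that the memorylessness argument takes for granted. No comparable check is evident for the epistemic case.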
Subjective probabilities could possibly be considered an instance of epistemic probabilities. Logical probabilities (i.e. how sure of something is one logically justified in being, given what one knows) differ from subjective probabilities, and are definitely included among epistemic probabilities. Michael Hardy (talk) 21:13, 12 May 2009 (UTC)
- Does it answer my questions? Boris Tsirelson (talk) 06:00, 13 May 2009 (UTC)
A new interesting book
This could be interesting to both of us: [http://www.worldscibooks.com/mathematics/7312.html THE SEARCH FOR CERTAINTY (On the Clash of Science and Philosophy of Probability), by Krzysztof Burdzy]
The author is very clever (I think so, knowing him personally) and surely does not write rubbish.
For now I do not know what exactly he writes. However, after reading his Introduction (available for free) I feel that he agrees with you on a very important point: there is a probability theory as a part of the sciences (not only of mathematics).
Also I see that he is very critical towards many views (probably, many of my views and probably also some of yours).