Mathematics desk

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

September 28

Estimating the variance-covariance matrix of a Multinomial function

I wish to estimate the variance-covariance matrix of a multinomial likelihood function containing two parameters.

I think the steps are:

1. Log-transform the likelihood function.

2. Take the second partial derivatives of the log-likelihood with respect to each pair of parameters (including the mixed partial derivative).

3. Insert the parameter point estimates (obtained via maximum likelihood estimation) into the second partial derivatives.

4. Construct a matrix of the resulting 4 values. I believe this would be the Hessian Matrix.

5. Take the negative of that matrix, giving the Information Matrix.

6. Invert the Information Matrix.

Could someone please verify that the above steps are correct, or point out which steps are missing or incorrect?
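As a rough illustration of steps 1 to 6, here is a minimal sketch for one assumed model: a trinomial with free parameters p1 and p2 (so p3 = 1 - p1 - p2) and made-up counts. The model, the counts, and all function names are assumptions chosen for illustration; they are not the poster's actual likelihood. The negative Hessian evaluated at the estimates is usually called the observed information matrix.

```python
def log_likelihood_hessian(counts, p1, p2):
    """Second partial derivatives of sum(n_i * log(p_i)), with p3 = 1 - p1 - p2."""
    n1, n2, n3 = counts
    p3 = 1.0 - p1 - p2
    h11 = -n1 / p1**2 - n3 / p3**2   # d^2 / dp1^2
    h22 = -n2 / p2**2 - n3 / p3**2   # d^2 / dp2^2
    h12 = -n3 / p3**2                # mixed partial d^2 / (dp1 dp2)
    return [[h11, h12], [h12, h22]]


def invert_2x2(matrix):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def covariance_at_mle(counts):
    """Steps 3 to 6: evaluate the Hessian at the MLE, negate it, invert it."""
    total = sum(counts)
    p1_hat, p2_hat = counts[0] / total, counts[1] / total  # MLE for this model
    hessian = log_likelihood_hessian(counts, p1_hat, p2_hat)
    information = [[-h for h in row] for row in hessian]   # observed information
    return invert_2x2(information)


example_counts = (50, 30, 20)  # hypothetical data
for row in covariance_at_mle(example_counts):
    print(row)
```

For this particular model the result can be checked against the textbook values Var(p1) = p1(1 - p1)/N and Cov(p1, p2) = -p1 p2/N, which is a convenient sanity test before moving on to a messier likelihood.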

Once I know that I have the steps correct, I want to post an example. Thank you for any thoughts on this matter. Mark W. Miller 00:34, 28 September 2007 (UTC)

I believe the steps above are correct after having worked entirely through one example. I did intend to post that example here in the hope others could check my partial derivatives. However, those equations are very long and very messy. In the last day I learned how to use the program MAPLE, with which I was able to check my partial derivatives for myself... in addition to evaluating them numerically on a computer with a different program. I'm not 100% certain I am using the terms "Hessian Matrix" and "Information Matrix" exactly as intended. Nevertheless, my computations do appear to be correct.

Mark W. Miller 04:25, 29 September 2007 (UTC)

So simple yet so hard

Radioactive particles decay. The unobtainium particles have a half-life of 1 second. That is, in each second a given unobtainium particle has a 50% chance of disintegrating.

At t==0 seconds, there are 3 unobtainium particles.

.

At t==1 second, there are X unobtainium particles.

.

At t==2 seconds, there is 1 unobtainium particle.

How many unobtainium particles are there at t==1 second?

It's obvious to me that the number of particles at t==1 has a discrete probability distribution, whereby

Pr(X==0) = 0

Pr(X==1) = x1

Pr(X==2) = x2

Pr(X==3) = x3

Pr(X>3) = 0

1 = x1 + x2 + x3

But how to find x1, x2 and x3? The simple answer of x3 = comb(3,0) * p^0 * (1-p)^3 is wrong (p = 0.5).

220.239.107.201 13:33, 28 September 2007 (UTC)

You know the half-life, so you could try calculating P(t1=X, t2=1), for X=1,2,3 (hint: what is the distribution of the number of decays in the first second? In the second second, given that there are now X at t=1?). Then note that the events described by those three probabilities are your entire conditional sample space, and do the appropriate thing (hint: the three probabilities calculated will not sum to 1). Baccyak4H (Yak!) 14:46, 28 September 2007 (UTC)

You know that one of the particles survived to 2 seconds, and the other two did not, so the only question is whether each of the other two particles decayed during the 1st second or the 2nd second. There are four possibilities for the 1st second: that both decayed, that only the 1st particle decayed, that only the 2nd particle decayed, or that neither decayed. Do you know how to figure each of those probabilities? StuRat 19:00, 28 September 2007 (UTC)

I think one of those possibilities should, of course, be that "none decayed in the 1st second". Richard B 23:41, 28 September 2007 (UTC)

Yes, of course, good catch. Correction made. StuRat 01:43, 29 September 2007 (UTC)

A way of tackling this is given by Bayes' theorem. Calling Y the number of particles after two time units (without the observation fixing Y at 1), you can determine P(X = i), 1 ≤ i ≤ 3, P(Y = 1), and P(Y = 1|X = i), 1 ≤ i ≤ 3. Bayes' theorem then allows you to compute P(X = i|Y = 1), 1 ≤ i ≤ 3. --Lambiam 22:27, 28 September 2007 (UTC)
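The Bayes' theorem approach can be sketched in a few lines. This is only an illustration, assuming each particle survives each second independently with probability 1/2; the function names and default arguments are made up for the example. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def posterior_at_t1(start=3, observed_at_t2=1, survive=Fraction(1, 2)):
    """P(X = x | Y = observed_at_t2) for the particle count X at t = 1."""
    joint = {}
    for x in range(observed_at_t2, start + 1):
        prior = binomial_pmf(x, start, survive)                 # P(X = x)
        likelihood = binomial_pmf(observed_at_t2, x, survive)   # P(Y = 1 | X = x)
        joint[x] = prior * likelihood
    evidence = sum(joint.values())                              # P(Y = 1)
    return {x: weight / evidence for x, weight in joint.items()}


for x, probability in posterior_at_t1().items():
    print(x, probability)
# prints:
# 1 4/9
# 2 4/9
# 3 1/9
```

The three joint probabilities are 3/16, 3/16 and 3/64, which sum to 27/64 rather than 1; dividing by that total is the renormalisation step hinted at earlier in the thread.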

Most functions are not lines

The article about differential calculus states that "Most functions, however, are not lines." I would like to know:

1. Is this true?
2. If it is true, what does it mean?
3. What's the purpose of the article stating it? A.Z. 21:16, 28 September 2007 (UTC)

Function (mathematics) should help. Friday (talk) 21:21, 28 September 2007 (UTC)

I should explain myself better. I understand "most" to mean more than 50%, and "most functions" to mean "more than 50% of all functions". It seems to me that there's an infinite number of functions that are lines, and an infinite number of functions that are not. I have a hard time figuring what the sentence means. A.Z. 21:42, 28 September 2007 (UTC)

The article meant 'most functions are not straight lines' in the context of the example given - I've changed it to add that additional clarity.

Does that solve the problem or...? 87.102.83.163 22:10, 28 September 2007 (UTC)

If you re-read that part - you will see that it is referring to the slope of straight lines. -- Preceding unsigned comment added by 87.102.83.163 (talk) 22:13, 28 September 2007 (UTC)

Yes, I took "lines" to mean "straight lines". A.Z. 04:25, 29 September 2007 (UTC)

To compare the sizes of infinite sets, mathematics uses cardinal numbers. The cardinality (size) of the set of functions whose graphs are lines is $\mathfrak{c}$, the cardinality of the continuum. The cardinality of the set of all functions (at least in classical mathematics) is $2^{\mathfrak{c}}$, which, by Cantor's theorem, is genuinely more than $\mathfrak{c}$. The cardinal number $2^{\mathfrak{c}}$ is also known as $\beth_2$, or Beth two. See also Cardinality of the continuum#Sets with cardinality greater than c. -- Preceding unsigned comment added by Lambiam (talk • contribs) 22:05, 28 September 2007 (UTC)
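For readers who want the cardinal arithmetic behind these two claims, here is a short sketch. It relies on the standard facts that the continuum has cardinality $2^{\aleph_0}$, that $\mathfrak{c}\cdot\mathfrak{c} = \mathfrak{c}$, and that $\aleph_0\cdot\mathfrak{c} = \mathfrak{c}$. The symbol $L$ is notation introduced here for the set of lines $y = mx + b$, identified with the pairs $(m, b)$.

```latex
\begin{align*}
|L| &= |\mathbb{R}^{2}| = \mathfrak{c}\cdot\mathfrak{c} = \mathfrak{c},\\
|\mathbb{R}^{\mathbb{R}}| &= \mathfrak{c}^{\mathfrak{c}}
  = \left(2^{\aleph_0}\right)^{\mathfrak{c}}
  = 2^{\aleph_0\cdot\mathfrak{c}}
  = 2^{\mathfrak{c}} = \beth_{2} > \mathfrak{c}.
\end{align*}
```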

Thank you a lot for those links! I had never read about classical and non-classical mathematics. I read your observation about classical mathematics to mean that there's controversy about the topic. Your post obviously doesn't answer my question, but it's very helpful. I would like to be able to understand one day why, in classical mathematics, they say that the set of all functions is bigger than the set of functions that are lines. This looks like something hard to prove. A.Z. 04:23, 29 September 2007 (UTC)

Think about picking a random function. This means, for every real number, assigning it a random value. If you were to do this 100 times, how often would the values lie perfectly on a line? What if you did it a million times? It would seem very, very unlikely (practically impossible). This is the intuition behind what the author means when they say that most functions are not lines. J Elliot 22:33, 28 September 2007 (UTC)

Thank you for your reply (really), but we are talking about all functions, not just a hundred of them or one million of them. A.Z. 04:23, 29 September 2007 (UTC)

Common sense works here. A real-valued function takes a real number x and produces a real number y. How likely is it that those xy pairs lie exactly on a line, any line? Is it not intuitively obvious that the likelihood is small? Even if we restrict the functions to be continuous (so that small changes in x produce small changes in y), lines are unlikely. In fact, we could restrict attention to polynomial functions of degree at most three,

$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3.$$

For this to be a line, both c₂ and c₃ must be exactly zero. So in real life we're extraordinarily lucky if a function, say in physics or chemistry or ecology, looks like a line.

Why is this worth mentioning? Because we need to deal with such functions, to answer questions that only calculus can handle. --KSmrq^T 22:41, 28 September 2007 (UTC)

c₂ and c₃ will be exactly zero an infinite number of times, and they won't be exactly zero also an infinite number of times. Thank you a lot for your post, but that an infinite set can be bigger than another infinite set seems rather counter-intuitive to me. A.Z. 04:23, 29 September 2007 (UTC)

As an example, consider the set of polynomials in x. The polynomials of order 3 or less form a subset of it, and those with c2 = c3 = 0 (see above) form a subset of that in turn. So despite having infinitely many members, this last set is smaller than the sets in which it is nested, in the sense of being a proper subset of each. 83.100.183.116 06:02, 29 September 2007 (UTC)

It is certainly not new that there are different sizes of infinite sets. Lambiam has already mentioned the relevant concepts; I suggest you take a look at cardinality, and perhaps set for some more basic background. -- Meni Rosenfeld (talk) 08:27, 29 September 2007 (UTC)

Defining this in any strict sense becomes very difficult. First, what is the set of all functions? Are we talking about just polynomial functions, or, following Elliot, about assigning a random value to every real number, which gives a much larger set including the Cantor set? Then you would need to talk about the distribution of functions in a statistical sense. Here pure maths falls down: yes, we can talk about the codimension or measure of our set. However, it becomes a physical problem; the physics of a situation determines the set of functions and their distribution. If we are working with two orbiting bodies, then the class of all functions we consider becomes just the conic sections. It's possible to construct a physical situation where most functions in that situation will be straight lines. You can make this a bit more formal by considering a generic property of the set you are considering. --Salix alba (talk) 08:31, 29 September 2007 (UTC)

If your intuition isn't helping, the solution is to train a better intuition. Consider a stone-age culture whose numbers are "one, two, three, many". That's really how you're approaching infinity: just throwing up your hands and saying "really big". Fair enough, but we can also define and study infinity more carefully, as mathematicians.

Long ago people kept tallies of sheep or bushels of grain or whatever using physical tokens, one token for each item. Our formalism for "counting numbers" reflects this primitive idea, that we can match five pebbles with five sheep or five bushels of grain, so we have an abstract number "five". If one of our sheep goes missing, when we do the match-up we'll have a pebble left over. This leads to a formalism for comparing numbers. In fact, we can define and compare "cardinal numbers", including infinities, in just this way.

Suppose we are given two sets, A and B. A function ƒ: A→B assigns to each element a in set A an element b in set B. For our purposes, we insist that ƒ be an injection, so that if a₁ ≠ a₂, then ƒ(a₁) ≠ ƒ(a₂). That is, we're matching each different element in A with a different element in B. If such a function exists then we say, by definition, that the size ("cardinality") of the set A is less than or equal to the size of the set B, and write |A| ≤ |B|. If we also have an injection from B into A, then A and B are the same size.

From such a humble definition we can draw some fascinating conclusions. We can recognize (or define) an infinite set by the property that it has a proper subset (omitting one or more elements) of the same size. For example, the counting numbers {1,2,3,…} are an infinite set because ƒ(n) = 2n is an injection into the even numbers {2,4,6,…}, a proper subset, while the even numbers in turn inject into the counting numbers by inclusion. Any set of this size or less is called "countable". It turns out that fractions (the rational numbers) are also countable, but that real numbers are not. The set of real numbers is a "larger infinity" than the set of counting numbers.
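The injection idea can be tried out on a finite window of the counting numbers. This is only a sketch: a finite check cannot prove anything about the infinite set, and the helper name `is_injection` and the window size are made up for illustration.

```python
def is_injection(f, domain):
    """True if f sends distinct elements of the finite iterable `domain` to distinct values."""
    seen = set()
    for element in domain:
        value = f(element)
        if value in seen:      # two different inputs collided on the same output
            return False
        seen.add(value)
    return True


counting = range(1, 1001)      # a finite window onto {1, 2, 3, ...}


def double(n):
    return 2 * n


print(is_injection(double, counting))               # True
print(all(double(n) % 2 == 0 for n in counting))    # True: every image is even
print(is_injection(lambda n: n % 10, counting))     # False: 1 and 11 collide
```

The last line shows what failure looks like: taking the last digit sends many different numbers to the same value, so it does not match each element with a different partner.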

Our untrained intuition, accustomed only to finite sets, says that a proper subset must always be smaller. Our expanded intuition says, not always. But we can prove that the set of all subsets of a set A, sometimes denoted by 2^A, is always strictly bigger than A itself. Note that we include the empty set as one of the subsets, and that the theorem applies to infinite sets as well as finite sets. Now, as Lambiam has said, we can use this to show that the set of all real-valued functions is strictly larger than the set of functions which are lines.

We can argue that this should be plausible for the following reason. To specify a line function we need precisely two numbers, say the slope m and the y intercept b, defining y = mx+b. To specify a single arbitrary function we need an infinite number of (x,y) pairs, as much data as all the lines put together. Still, plausibility depends on intuition, so we'd prefer proof. Let L = R² be the set of all line functions, 2^L the set of all subsets of L, and let F = R^R be the set of all functions. The theorem assures us that |L| < |2^L|; so if we can prove that there is an injection from 2^L into F (implying |2^L| ≤ |F|), we will have proved that |L| < |F|, which is our goal. I hesitate to propose an exercise I have not attempted myself, but I'm going to leave this one to the reader.

The thing is, while we can discuss some interesting mathematics concerning sizes of infinities, it's a diversion from the intent of the remark in the article. It would suffice to observe that many functions of practical interest are not line functions. --KSmrq^T 08:56, 29 September 2007 (UTC)
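One possible way to do the exercise left to the reader is sketched below; it is not necessarily the route the poster had in mind. It assumes a fixed bijection $\varphi$ between $L = \mathbb{R}^2$ and $\mathbb{R}$ (one exists because the two sets have the same cardinality), and the names $\varphi$ and $\chi_S$ are introduced only for this sketch.

```latex
\begin{align*}
&\text{Fix a bijection } \varphi : L \to \mathbb{R}. \text{ For each } S \subseteq L \text{ define } \chi_S : \mathbb{R} \to \mathbb{R} \text{ by}\\
&\qquad \chi_S(x) =
  \begin{cases}
    1 & \text{if } \varphi^{-1}(x) \in S,\\
    0 & \text{otherwise.}
  \end{cases}\\
&\text{If } S \neq T, \text{ pick } \ell \text{ in exactly one of them; then }
  \chi_S(\varphi(\ell)) \neq \chi_T(\varphi(\ell)), \text{ so } \chi_S \neq \chi_T.\\
&\text{Hence } S \mapsto \chi_S \text{ is an injection } 2^{L} \to F,
  \text{ and } |L| < |2^{L}| \le |F|.
\end{align*}
```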

I changed the article according to your last sentence. I am not yet able to understand what you wrote because you used too many words and symbols that I don't understand, and probably because it is complicated. I'm currently unable to have an opinion about whether the following is true:

Suppose we are given two sets, A and B. A function ƒ: A→B assigns to each element a in set A an element b in set B. For our purposes, we insist that ƒ be an injection, so that if a₁ ≠ a₂, then ƒ(a₁) ≠ ƒ(a₂). That is, we're matching each different element in A with a different element in B. If such a function exists then we say, by definition, that the size ("cardinality") of the set A is less than or equal to the size of the set B, and write |A| ≤ |B|. If we also have an injection from B into A, then A and B are the same size.

I should study cardinality and sets, I know, but Wikipedia's article is terrible at teaching me that. Did you prove (or try to prove) above that "the set of all subsets of a set A, sometimes denoted by 2A, is always strictly bigger than A itself"? I don't see why, "by definition", we say that "the size of the set A is less than or equal to the size of the set B". A.Z. 06:52, 30 September 2007 (UTC)

You can't "see" why something is so by definition. Once it has been agreed to define "positive" as meaning "greater than 0", then henceforth "positive" means "greater than 0", not for any particular deep reason that you can see, but simply by definition. The definition of one set's size being less than or equal to that of another set, or more precisely the mirror relation of being greater than or equal in size, can be found in our article on the cardinal numbers; search for "greater than or equal". If the size of B is greater than or equal to the size of A, then (by definition) the size of A is less than or equal to the size of B. That the set 2^A (not 2A) is larger than the set A is precisely the conclusion of Cantor's theorem. The proof is not particularly complicated. --Lambiam 18:39, 30 September 2007 (UTC)
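The core of the proof of Cantor's theorem is the "diagonal" set D = {a in A : a is not in f(a)}, which no map f from A to its subsets can hit. The sketch below checks this by brute force for a small finite set. It illustrates the argument rather than proving it, and the function names are made up for the example.

```python
from itertools import combinations, product


def power_set(elements):
    """All subsets of a finite collection, as frozensets."""
    items = list(elements)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


def diagonal_set(f, elements):
    """Cantor's set D = {a : a not in f(a)} for a map f from elements to subsets."""
    return frozenset(a for a in elements if a not in f[a])


def every_map_misses_its_diagonal(elements):
    """Check every map f: A -> 2^A and confirm D is never one of its values."""
    items = list(elements)
    subsets = power_set(items)
    for images in product(subsets, repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(f, items) in images:
            return False
    return True


print(every_map_misses_its_diagonal({0, 1, 2}))   # True
```

The reason the check always succeeds is the same as in the general proof: if f(a) were equal to D for some a, then a would belong to D exactly when it does not belong to f(a) = D, which is impossible. So no f is onto 2^A, for finite and infinite A alike.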

Generally when mathematicians say "most" they're referring to measure, not cardinality. It's perfectly correct in most contexts to say that most continuous functions from R to R are not linear, even though the linear functions and the continuous functions have the same cardinality (namely the cardinality of the continuum). Measure theory formalizes the intuitive idea that a "randomly selected" continuous function is unlikely to be linear. -- BenRG 01:01, 30 September 2007 (UTC)

What does this section mean? I can barely understand it. Does it mean that there are mathematicians that don't believe in cardinality? A.Z. 06:52, 30 September 2007 (UTC)

In classical mathematics, one can make unrestricted use of the law of excluded middle, or the equivalent rule of double negative elimination. In particular, to show that some proposition P is true, classical mathematicians can use a proof by contradiction: assume that P is false, derive a contradiction from that assumption, and conclude that therefore P is true. Not so in constructive (or in intuitionistic) mathematics. The proofs for Cantor's results crucially depend on non-constructive proof methods, like showing that A < B and A = B lead to a contradiction, and concluding that therefore A > B. In general, there are several different equivalent ways in which you can define a mathematical concept. Whether "P implies Q" or "(not P) or Q" is used in the definition makes no difference, classically. But constructively, these two are not the same, and two different definitions that classically define the same concept may define different, non-equivalent concepts when interpreted constructively. It is possible to give a constructive definition of cardinality, but in fact it is possible to give different definitions that are not mutually equivalent, and whatever the definition selected, most classical results (such as Cantor's theorem) are no longer valid. Thus the whole edifice of cardinal numbers collapses. This is what Hilbert referred to when, in expressing his resolve to fight intuitionism, he proclaimed: Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall be able to expel us from the paradise that Cantor has created for us"). --Lambiam 19:42, 30 September 2007 (UTC)