
Wikipedia:Reference desk/Archives/Mathematics/2015 February 26

From Wikipedia, the free encyclopedia
Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


February 26

Sample size and confidence levels

I know that logically I should have the same confidence in a statistical test that uses n=50 and gives a 95% confidence level as one that has n=1000, also with a 95% confidence level. But my gut feeling is to trust the one with the bigger sample size more. Is there any basis for this feeling? Bubba73 You talkin' to me? 09:31, 26 February 2015 (UTC)[reply]

No, there's no good reason for that, though someone might spot a problem with the prior hypothesis with the larger sample. For the test with n=50 and 95% confidence, if you looked at the figures you'd probably think the difference should be blindingly obvious, whereas for n=1000 your intuition would say it was still iffy. So you'd probably have the exact opposite gut feeling if you actually looked at the raw data. Dmcq (talk) 11:33, 26 February 2015 (UTC)[reply]
One little thing might matter - an error in one data point out of 50 is more likely to change the conclusion than an error in one data point out of 1000. Bubba73 You talkin' to me? 23:03, 26 February 2015 (UTC)[reply]
Not necessarily at all. The 95% confidence interval will be much wider with only 50 samples rather than 1000. Dmcq (talk) 16:32, 28 February 2015 (UTC)[reply]
You do know that a statistic from a sample size of n=50 is different from one from a sample size of n=1000 EVEN IF THE CONFIDENCE LEVEL IS EXACTLY THE SAME. The statistic from n=1000 has a smaller uncertainty, or error interval, than the one from n=50.
Just because two statistics have the same confidence level DOES NOT MEAN they have the same error interval. Naturally you want the result from the statistic with the smallest error interval. You would be a fool to choose n=50 over n=1000 unless you do not care about the error interval, or unless the cost of gathering a sampling point is very expensive. 175.45.116.60 (talk) 00:55, 27 February 2015 (UTC)[reply]
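The point about the error interval shrinking with sample size can be sketched numerically. This is a hypothetical illustration (not from the discussion above), assuming the simplest case of estimating a proportion with the normal approximation, where the 95% interval half-width scales as 1/√n:

```python
import math

def ci_half_width(p_hat, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    proportion p_hat estimated from n samples (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same observed proportion and same 95% confidence level,
# but very different interval widths:
print(round(ci_half_width(0.5, 50), 3))    # n=50   -> 0.139
print(round(ci_half_width(0.5, 1000), 3))  # n=1000 -> 0.031
```

So both tests "say 95%", but the n=1000 interval pins the estimate down about √20 ≈ 4.5 times more tightly.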
I guess that is what I was getting at - the error interval. Is there an article that talks about the error interval? Bubba73 You talkin' to me? 00:14, 28 February 2015 (UTC)[reply]
If you want to read articles about the error, see below:
Checking_whether_a_coin_is_fair#Estimator_of_true_probability
Standard_error
175.45.116.65 (talk) 05:22, 2 March 2015 (UTC)[reply]
I would say that one of the underlying issues is the following. Whenever you are applying a statistical test, you are generally going to be making some assumption about the underlying distribution of the data. For example, you might assume the expected values should reflect a constant plus random noise drawn from a normal distribution. When you calculate a 95% threshold, you are essentially asking: given the model I expect, how much confidence do I have that my observations conform to that model? However, in the real world, statistical models often prove to be inexact. You might assume random variations that follow a normal distribution, but the truth is a Laplace distribution or something else. If the statistics show that your data doesn't fit the model, is that because you have discovered a physically important signal, or because your understanding of the background noise wasn't very good? With small numbers of data points, one often has to implicitly assume that the underlying model is reasonable (e.g. normally distributed errors), but when you have lots of data you can often test those assumptions and justify more rigorous conclusions. Dragons flight (talk) 00:42, 28 February 2015 (UTC)[reply]
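The point that a large sample lets you test the noise model itself can be sketched with stdlib Python. This is an illustrative example (my own, not from the thread): excess kurtosis is roughly 0 for normal noise and roughly 3 for Laplace noise, and the estimate is far too noisy at n=50 to tell them apart but separates cleanly at n=100,000:

```python
import random
import statistics

random.seed(0)

def excess_kurtosis(xs):
    """Sample excess kurtosis: ~0 for normal data, ~3 for Laplace data."""
    m = statistics.fmean(xs)
    s2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / s2 ** 2 - 3

def laplace_draw():
    """Laplace(0, 1) sample: the difference of two independent Exp(1) draws."""
    return random.expovariate(1.0) - random.expovariate(1.0)

for n in (50, 100_000):
    normal = [random.gauss(0, 1) for _ in range(n)]
    heavy = [laplace_draw() for _ in range(n)]
    # At n=50 both estimates bounce around; at n=100,000 the
    # normal sample sits near 0 and the Laplace sample near 3.
    print(n, round(excess_kurtosis(normal), 2), round(excess_kurtosis(heavy), 2))
```

With n=50 you are forced to take the normality assumption on faith; with n=100,000 the data itself tells you which noise model you actually have.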

Calculation method help

Need help with correct method for calculating this:

I have a membership ID (which might have multiple members in it) and a member ID, which represents an individual member of an account. I am trying to calculate the average deposit per deposit date for memberships as well as for individual members.

example table below:

Membership ID | Member ID | Deposit Date | Deposit Amount
------------- | --------- | ------------ | --------------
121           | 1         | 23-04-2013   | 500
121           | 2         | 07-04-2013   | 500
131           | 46        | 23-04-2013   | 100
121           | 1         | 01-06-2013   | 900
131           | 46        | 01-06-2013   | 340
541           | 91        | 23-04-2013   | 500
679           | 51        | 23-04-2013   | 500
679           | 1         | 23-04-2013   | 500
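Reading the question as asking for the mean deposit amount per membership and per individual member (an assumption on my part; "average deposit / deposit date" could also mean dividing by the number of distinct dates), a minimal sketch in plain Python with the table above hard-coded:

```python
from collections import defaultdict

# (membership_id, member_id, deposit_date, amount) rows from the table above
rows = [
    (121, 1, "23-04-2013", 500),
    (121, 2, "07-04-2013", 500),
    (131, 46, "23-04-2013", 100),
    (121, 1, "01-06-2013", 900),
    (131, 46, "01-06-2013", 340),
    (541, 91, "23-04-2013", 500),
    (679, 51, "23-04-2013", 500),
    (679, 1, "23-04-2013", 500),
]

def average_by(rows, key):
    """Mean deposit amount, grouped by an arbitrary key function."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for row in rows:
        t = totals[key(row)]
        t[0] += row[3]
        t[1] += 1
    return {k: s / c for k, (s, c) in totals.items()}

per_membership = average_by(rows, lambda r: r[0])
per_member = average_by(rows, lambda r: (r[0], r[1]))

print(per_membership[121])     # (500 + 500 + 900) / 3 -> 633.33...
print(per_member[(121, 1)])    # (500 + 900) / 2 -> 700.0
```

Note that member ID 1 appears under both memberships 121 and 679, so individual members are keyed here by the (membership, member) pair; if member IDs were globally unique you could key on the member ID alone.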

— Preceding unsigned comment added by 203.63.22.226 (talk) 14:11, 26 February 2015

I've answered at the same question on the miscellaneous desk. Please don't post the same question on more than one desk. Dbfirs 23:23, 26 February 2015 (UTC)[reply]