Wikipedia:Reference desk/Archives/Mathematics/2014 March 10

From Wikipedia, the free encyclopedia
Mathematics desk


March 10

Calculating average ranking

I have a number of lists that rank Chinese characters by frequency of occurrence in various types of text. I only have the ranking order, not the actual frequencies. The lists are all different lengths. The most common characters appear in all lists, but many others appear only in a subset of lists. Characters that do not appear in a list are assumed to be less common (by some unknown amount) than the lowest-ranked character that does appear. I want to calculate an average ranking for each character, but I don't know what to do when a character does not appear in one or more lists. If I just ignore those lists, and take an average among the lists in which the character does appear, then the calculated average could be higher than if the character actually did appear, with a low ranking, in an ignored list. This is illogical. What is the sensible way to handle this, to make maximum use of the information that I do have, while not creating illogical results? 86.160.86.139 (talk) 14:13, 10 March 2014 (UTC)

There is no general-purpose way to do this: you are essentially asking for a voting procedure that aggregates a set of rankings into a single average ranking. There are serious mathematical obstacles to this; see voting system and specifically voting paradox for some idea of the problems that can arise. The specific issue of some characters not appearing on some lists is yet another major problem which, as far as I can see, doesn't have a clear resolution. You'll have to think about what sort of "average" your ranking is meant to represent, and choose a method based on that. For example, if you only want an average of the most common or well-known characters, it might make sense to disallow any characters that fail to appear on one of your lists. This is a bit arbitrary, but whatever you choose to do will be similarly arbitrary. Staecker (talk) 15:02, 10 March 2014 (UTC)
There is no generally consistent way to perform rank aggregation (see Arrow's impossibility theorem), but folks do use heuristic methods that work well enough for their purposes. This paper discusses some methods, based on minimizing distance from the aggregate to the individual lists, and has an associated R package for computations. --Mark viking (talk) 15:39, 10 March 2014 (UTC)
Above answers are good and correct, but don't get too discouraged. Just because there is no mathematically perfect system doesn't mean there isn't one that will work well for your purposes. Since there are a very large number of characters, I suspect Borda count would be a decent place to start. SemanticMantis (talk) 15:45, 10 March 2014 (UTC)
Thanks for the great answers everyone. 86.160.86.139 (talk) 18:35, 10 March 2014 (UTC)
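
The Borda count suggestion above can be sketched in Python. This is only one possible heuristic, not a method any of the repliers spelled out. It assumes that every character missing from a list is tied for the ranks just below that list's last entry, which keeps a missing character from scoring better than one that actually appears. The function names and the pinyin stand-ins for characters are hypothetical.

```python
from collections import defaultdict


def average_ranks(lists):
    """Return {item: mean rank} over all lists (rank 1 = most frequent).

    Assumption: an item absent from a list shares the mean of the unused
    ranks len(list)+1 .. total, where total is the number of distinct items
    seen across all lists. Each list is assumed to contain no duplicates.
    """
    universe = set()
    for ranking in lists:
        universe.update(ranking)
    total = len(universe)

    sums = defaultdict(float)
    for ranking in lists:
        position = {item: i + 1 for i, item in enumerate(ranking)}
        # Mean of the ranks that this list leaves unassigned.
        tied_rank = (len(ranking) + 1 + total) / 2
        for item in universe:
            sums[item] += position.get(item, tied_rank)

    return {item: s / len(lists) for item, s in sums.items()}


def aggregate(lists):
    """Return all items ordered by mean rank, ties broken alphabetically."""
    avg = average_ranks(lists)
    return sorted(avg, key=lambda item: (avg[item], item))


# Hypothetical frequency lists of different lengths.
example_lists = [
    ["de", "yi", "shi", "bu"],
    ["de", "shi", "yi"],
    ["yi", "de"],
]
print(aggregate(example_lists))
```

With this convention, "bu" is charged rank 4 in the second list and rank 3.5 in the third, so leaving it out of a list never helps it. Other tie conventions (for example, charging the worst possible rank) are equally defensible, and the choice remains arbitrary in the sense discussed above.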

Why is arcsinh(1) so close to √π/2 ?

Is there some deeper reason, or meaning, or explanation, for the fact that arcsinh(1) ≈ √π/2? I ask this in the light of and . — 86.125.196.90 (talk) 18:25, 10 March 2014 (UTC)

It must have something to do with the fact that . It follows that this is a question of showing that . I suppose there is a complex analysis method to evaluate the integral. I will revisit this later.--Jasper Deng (talk) 19:46, 10 March 2014 (UTC)[reply]
Speaking of integrals, plotting and simultaneously on the positive semiaxis reveals some very interesting results. There's a symmetry there with regards to the vertical line x = 1, and one can easily see how the area of in between 0 and 1 that lies above that of the Gaussian function nicely mirrors that of from 1 to Also, I was perhaps thinking of expanding into Taylor series, as well as writing the binomial series of , and then integrating it in between 0 and 1. — 86.125.196.90 (talk) 20:18, 10 March 2014 (UTC)[reply]
My personal opinion is that expanding it into a series is not going to be particularly helpful. You're right that I want to reduce it to a Gaussian integral or something.
It actually seems to be simpler than going into the complications of the natural logarithm, since the derivative of arcsine is . Then we're considering . The inverse function of the integrand is of course . But I'm not sure about this integral's relation with the Gaussian integral. More later.--Jasper Deng (talk) 23:02, 10 March 2014 (UTC)[reply]
Actually, it's a completely moot point: the two quantities are not equal, i.e. the hyperbolic arcsine of 1 is not equal to the square root of pi divided by two. The former is about 0.881, the latter about 0.886.--Jasper Deng (talk) 00:15, 11 March 2014 (UTC)
I never said they were... I was just surprised by their being so close to one another... (Maybe that approximation sign is to blame, it looks too close to an equality sign). — 86.125.196.90 (talk) 01:23, 11 March 2014 (UTC)
It's probably a coincidence, then. Here's where Taylor series get into the picture, I'd think. But even that is probably a pure coincidence. Basically, we're not going to be able to answer the question using the integral methods I really like to use.--Jasper Deng (talk) 02:00, 11 March 2014 (UTC)
We have an article on Mathematical coincidences, but this one doesn't seem to be mentioned (and it's not nearly as close an approximation as some that are). AndrewWTaylor (talk) 11:12, 11 March 2014 (UTC)
I think it's a coincidence, unless there is some clear motivation for thinking it is not. I'm sure that if you work hard enough, you'll find some a posteriori justification for it though. One thing you could try to do is to realize arcsinh(1) as an approximation of the Gaussian integral (e.g., by Riemann sums), but I don't see an obvious way to do this. Another thing you could try is to make an arctangent change of variables in the integral of . If you do this (with appropriate scaling), the graph of the integrand will nearly overlap with the graph of . Sławomir Biały (talk) 11:38, 11 March 2014 (UTC)[reply]
Perhaps eq 43 and writing the factorial of imaginary argument as a Gaussian integral... Count Iblis (talk) 14:01, 11 March 2014 (UTC)


OK, I think I've "solved" it. Sort of. I've noticed some time last year that Now, my (dumb) question is equivalent to proving that But, very roughly speaking, So Since is strictly increasing, and since is only slightly greater than 1, we have to find an argument slightly smaller than 1. Indeed, fits the bill. — 79.113.202.204 (talk) 21:18, 11 March 2014 (UTC)[reply]
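
The two values compared in this thread can be checked numerically. The sketch below is only an illustration using Python's standard `math` module. It also confirms the closed form arcsinh(1) = ln(1 + √2), a standard identity rather than something stated in the thread. The variable names are arbitrary.

```python
import math

asinh_1 = math.asinh(1.0)              # hyperbolic arcsine of 1
half_sqrt_pi = math.sqrt(math.pi) / 2  # square root of pi, divided by two

# Standard closed form: arcsinh(x) = ln(x + sqrt(x^2 + 1)), evaluated at x = 1.
closed_form = math.log(1.0 + math.sqrt(2.0))

abs_gap = half_sqrt_pi - asinh_1
rel_gap = abs_gap / half_sqrt_pi

print(f"arcsinh(1)   = {asinh_1:.6f}")
print(f"sqrt(pi)/2   = {half_sqrt_pi:.6f}")
print(f"absolute gap = {abs_gap:.6f}")
print(f"relative gap = {rel_gap:.4%}")
```

The gap is a little under 0.005 in absolute terms, roughly half a percent, which matches the remark above that the approximation is not especially tight.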