Wikipedia talk:WikiProject Statistics/Archive 6
This is an archive of past discussions about Wikipedia:WikiProject Statistics. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Infobox problem
Could somebody take a look at Generalized gamma distribution for me? I've tried to add an infobox using {{Infobox probability distribution}}, but for some reason it's giving "Unknown type" for the pdf and the variance fields. It's probably some really simple thing that I've missed, but I'm just not seeing it; so if anybody could spot what I've done wrong, I'd be very grateful. Cheers, Jheald (talk) 20:34, 27 October 2012 (UTC)
estimating quantiles does not cite SPSS
The section discussing estimating quantiles does not mention SPSS. SPSS is older than the other packages mentioned. It is available in many languages and used in many disciplines. 71.191.254.119 (talk) 12:53, 5 December 2012 (UTC) Art Kendall Art@DrKendall.org [1]
Rule of three (medicine)
FYI, Rule of three (medicine) has been requested to be renamed, see talk:Rule of three (medicine) -- 70.24.248.246 (talk) 06:14, 31 December 2012 (UTC)
Glossary of probability and statistics needs quite a bit of attention. There seem to be no inclusion criteria for the terms themselves and some of the definitions are weak. Illia Connell (talk) 04:05, 2 January 2013 (UTC)
A tag has been placed on Wikipedia:Pages needing attention/Statistics, requesting that it be speedily deleted from Wikipedia. This has been done under the criteria for speedy deletion, because it is a redirect to a nonexistent page.
If you can fix this redirect to point to an existing Wikipedia page, please do so and remove the speedy deletion tag. However, please do not remove the speedy deletion tag unless you also fix the redirect. Feel free to leave a note on my talk page if you have any questions about this. Illia Connell (talk) 07:02, 20 January 2013 (UTC)
- It is not a redirect to a nonexistent page. It is/was a redirect to a nonexistent section on a page that does exist. 81.98.35.149 (talk) 22:21, 21 January 2013 (UTC)
I have listed this unused redirect for deletion: Wikipedia:Redirects for discussion/Log/2013 January 20 Illia Connell (talk) 16:02, 20 January 2013 (UTC)
- The page is now used, having replaced the corresponding address on Wikipedia:Pages needing attention. 81.98.35.149 (talk) 22:58, 21 January 2013 (UTC)
UK census 2011
FYI the big data release for the United Kingdom Census 2011 happened this week - we're trying to work out how to incorporate it into Wikipedia over at Wikipedia talk:WikiProject UK geography. FlagSteward (talk) 15:06, 31 January 2013 (UTC)
Get More Info
This actually answered my problem, thank you! — Preceding unsigned comment added by 94.153.11.162 (talk) 23:39, 2 February 2013 (UTC)
A discussion about tone and undue weight
At Template:Did_you_know_nominations/Heuristics_in_judgment_and_decision_making there is a discussion about whether an article in this WikiProject's scope should have been tagged for unencyclopedic tone. Additional perspectives on this would be welcome. MartinPoulter (talk) 16:06, 9 February 2013 (UTC)
What, if anything, should be done with Wikipedia:WikiProject Statistics/Manual of Style. It is tagged as "proposed", but there have been no substantive edits in over two years. Most of it addresses mathematical notation; presumably most of that is in Wikipedia:Manual of Style/Mathematics. There is no reference to the Stats MoS on WikiProject Statistics main page. Regards, Illia Connell (talk) 00:40, 6 March 2013 (UTC)
- I've marked the page as historical. Illia Connell (talk) 18:55, 18 March 2013 (UTC)
I have proposed merging Category:Statistics qualifications into Category:Statistics education. Please see this discussion: Wikipedia:Categories for discussion/Log/2013 March 28. Regards, Illia Connell (talk) 04:43, 28 March 2013 (UTC)
I propose removing Category:Machine learning from Category:Computational statistics. I think that most of the relevant sub cats of Category:Machine learning are already in Category:Statistics (or in another subcat thereof) and that most subcats of Category:Machine learning are not statistical in nature. Regards, Illia Connell (talk) 04:58, 28 March 2013 (UTC)
- That looks a good idea. 81.98.35.149 (talk) 14:16, 28 March 2013 (UTC)
I have updated Wikipedia:WikiProject Statistics/List of statistics categories, adding cats from Category:Category-Class Statistics articles. What was the original and current purpose of Wikipedia:WikiProject Statistics/List of statistics categories? Regards, Illia Connell (talk) 05:15, 5 April 2013 (UTC)
- I didn't create that page and haven't really used it, although I have updated it occasionally. The page is linked from the project main page. It might be useful for some purposes given that, if you bring up the list, and click on "related changes" on the left wikipedia toolbar, you get an indication of the recent changes to text on the front-pages of these categories .... there is sometimes useful text about the scope of the category. Doing the same starting from the project categories just gives changes to the talk pages of the categories. Note also that the list is a somewhat reduced version of "all categories" since it omits many that are (or were originally) thought to be outside the scope of immediate interest to the project -- statistical methodology -- omitting much of probability, and statistics books, journals, organisations, software. Melcombe (talk) 09:11, 5 April 2013 (UTC)
- Thanks for pointing out the "related changes" link - I had not known about that trick. I like the idea of limiting the list to categories related to methodology. Regards, Illia Connell (talk) 23:17, 5 April 2013 (UTC)
Any objections if I get rid of this: Wikipedia:WikiProject Statistics/Cleanup listing? It has been replaced with a new bot generated listing, and, other than the talk page of a long-gone editor, no current pages link to it. Regards, Illia Connell (talk) 20:44, 6 April 2013 (UTC)
I have listed circular analysis for AfD: Wikipedia:Articles for deletion/Circular analysis following a contested PROD. Regards, Illia Connell (talk) 19:51, 31 March 2013 (UTC)
- Note that this was relisted on 9 April, so there's still time to contribute. Melcombe (talk) 17:01, 12 April 2013 (UTC)
WPStatistics welcome template
I've nominated {{WPStatistics welcome}} for deletion because it is apparently unused since its creation. The discussion is at Wikipedia:Templates_for_discussion/Log/2013_April_18#Template:WPStatistics_welcome. Regards, Illia Connell (talk) 19:13, 18 April 2013 (UTC)
Journal templates
{{Statistics journals}} redirects to {{Open access statistics journals}}. As the latter includes both open access and subscription journals, I propose renaming these so that {{Open access statistics journals}} redirects to {{statistics journals}} (i.e., swap the names of the two templates). There was some discussion of these templates on the talk page: Template talk:Open access statistics journals. Unless anyone objects, I'll request this double move. Regards, Illia Connell (talk) 01:30, 5 April 2013 (UTC)
- I think you should go ahead. It will then give others the scope to use these different names for usefully different purposes. Melcombe (talk) 09:16, 5 April 2013 (UTC)
- I've made the request at Template talk:Open access statistics journals#Requested move. Regards, Illia Connell (talk) 04:33, 7 April 2013 (UTC)
Done: see {{Statistics journals}}. Illia Connell (talk) 05:02, 23 April 2013 (UTC)
I've just created Minimum chi-square estimation. It needs further work, including references and links from other articles to it. Michael Hardy (talk) 01:21, 22 May 2013 (UTC)
"Hot articles" not updating?
The Hot articles section of the Statistics Project page does not seem to be updated recently. Some of the articles listed there have not been changed in the last three weeks. Mathstat (talk) 12:58, 3 June 2013 (UTC)
Help with WP:RSN?
One editor has voiced an opinion, but this may be an area that needs more expert opinion. Help welcome. Wikipedia:Reliable_sources/Noticeboard#Two_questions_about_if_economic_chart_RS. CarolMooreDC - talkie talkie🗽 22:41, 12 June 2013 (UTC)
History of Median test
The German version of Median test, de:Median-Test, mentions three different names: Mood’s Median-Test, Westenberg-Mood-Median-Test and Brown-Mood-Median-Test. Hence I have added an Expand template to the page. It would be highly relevant to add something about the history of the topic. The Legend of Zorro 02:08, 23 June 2013 (UTC)
One of your project's articles has been featured
Hello,
Retiring
Thanks for your good works. Kiefer.Wolfowitz 15:11, 31 July 2013 (UTC)
New article proposal: Saddlepoint approximations
I just submitted a draft proposal for review at http://en.wikipedia.org/wiki/Wikipedia_talk:Articles_for_creation/Saddlepoint_approximation. Please have a look!
Kjetil B Halvorsen 14:15, 24 August 2013 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)
Choropleth map
There's a meaningless sentence fragment in Choropleth map#Color progression, described at Talk:Choropleth map#unintelligible fragment. Will someone who knows about this topic please fix it? --Thnidu (talk) 16:42, 10 September 2013 (UTC)
Error in the formula?
I tested the Cochran-Armitage test for trend and got a negative variance for several data sets. The formula may be wrong.
Sarah Cattan 04/10/2013
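For anyone checking the report above, here is a minimal sketch of the variance calculation. It is not taken from the article's text at the time; it assumes the commonly stated null variance of the Cochran-Armitage statistic for a 2 x k table, Var(T) = (R1 R2 / N) [ sum t_i^2 C_i (N - C_i) - 2 sum_{i<j} t_i t_j C_i C_j ], where R1 and R2 are the row totals, C_i the column totals, N the grand total and t_i the weights. The bracket equals N sum t_i^2 C_i - (sum t_i C_i)^2, which cannot be negative by the Cauchy-Schwarz inequality, so a negative value would suggest a slip in the calculation rather than in that form of the formula. The function names and the example counts are hypothetical.

```python
from itertools import combinations


def ca_variance(cases, controls, weights):
    """Null variance of the Cochran-Armitage trend statistic (assumed standard form)."""
    cols = [a + b for a, b in zip(cases, controls)]  # column totals C_i
    r1, r2 = sum(cases), sum(controls)               # row totals R1, R2
    n = r1 + r2                                      # grand total N
    diag = sum(t * t * c * (n - c) for t, c in zip(weights, cols))
    cross = sum(
        weights[i] * weights[j] * cols[i] * cols[j]
        for i, j in combinations(range(len(cols)), 2)
    )
    return r1 * r2 * (diag - 2 * cross) / n


def ca_variance_alt(cases, controls, weights):
    """Algebraically equivalent form that is visibly non-negative."""
    cols = [a + b for a, b in zip(cases, controls)]
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    s2 = sum(t * t * c for t, c in zip(weights, cols))  # sum of t_i^2 C_i
    s1 = sum(t * c for t, c in zip(weights, cols))      # sum of t_i C_i
    return r1 * r2 * (n * s2 - s1 * s1) / n


# Made-up illustrative counts for three ordered dose groups.
example_cases = [10, 15, 25]
example_controls = [40, 35, 25]
example_weights = [0, 1, 2]
```

If both functions agree and the result is non-negative for the data in question, the negative value most likely came from a sign or a missing factor of 2 in the cross term.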
One- and two-tailed tests
This topic refers to the page here: [[1]]
I would suggest that the example of using coin tossing is potentially very confusing!
The naming of the "tails" side of the coin is purely coincidental and could easily lead to all sorts of confusion in a discussion about the tails of probability distributions. This is likely to be particularly so for those for whom English is not their native language.
Therefore I would suggest that the page needs revision to replace this example.
This is a significant edit and I do not feel competent to make it myself as a newly registered user of Wikipedia.
I hope this helps?
Andrew — Preceding unsigned comment added by Andrew4win (talk • contribs) 11:13, 8 November 2013 (UTC)
- This is a good point. I think you should create a new section on the discussion page of the article, Talk:One- and two-tailed tests, and copy the text above there. Sign your post by writing ~~~~ (four tildes) at the end. If you know of a better example, please feel free to replace it. A more experienced Wikipedia editor can improve on the formatting if necessary. Isheden (talk) 11:51, 8 November 2013 (UTC)
Type I error and alpha risk
Much confusion may be avoided if one starts a discussion on statistical testing by stating whether one is interested in proving either the NULL HYPOTHESIS [call it case 1] or the ALTERNATIVE HYPOTHESIS [call it case 2].
This is to say, if I make a widget, and I want to know is it a good widget (the NULL HYPOTHESIS is that it is indeed a good widget, and the ALTERNATIVE HYPOTHESIS is that it is a bad widget), then I make a test which yields a test statistic that can indicate it is good.
But there is always a risk of error.
If the NULL HYPOTHESIS is to be proved then the Type I error or the alpha risk is that the NULL HYPOTHESIS is rejected when it is in fact true. In this case it is a False Negative--not a False Positive.
If the ALTERNATIVE HYPOTHESIS is to be proved then the Type I error or the alpha risk is that the ALTERNATIVE HYPOTHESIS is not rejected when it is in fact false. This is a False Positive.
[case 1] If NO Type I error is made, then testing a good widget = do not reject the NULL HYPOTHESIS. Hey! We have a good widget (and it is true)!
[case 2] If NO Type I error is made, then testing a good widget = reject the ALTERNATIVE HYPOTHESIS. Hey! We do not have a bad widget (and it is true)!
[case 1] If a Type I error is made, then testing a good widget = has a statistical result of rejection of the NULL HYPOTHESIS and is perceived as a bad widget--incorrectly! And you throw a good product in the trash.
[case 2] If a Type I error is made, then testing a good widget = has a statistical result of failing to reject the ALTERNATIVE HYPOTHESIS and is perceived as a bad widget--incorrectly. And you throw a good product in the trash.
In either case, a Type I error will cause you to throw good widgets in the trash. Which is not good if you want to stay in business. 24.249.171.39 (talk) 02:36, 13 November 2013 (UTC) Brett Strawbridge
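The point about good widgets ending up in the trash can be illustrated with a small simulation. This is a minimal sketch, not part of the post above, and it rests on assumptions: the null hypothesis is "the widget is good", a good widget's measurement deviations are independent standard normal with known variance, and a two-sided z-test at level alpha is applied to each widget. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha=0.05, n=25, trials=20000, seed=0):
    """Fraction of good widgets rejected by a two-sided z-test at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejected = 0
    for _ in range(trials):
        # Every simulated widget is good: deviations have mean 0 and sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejected += 1  # a Type I error: a good widget is thrown away
    return rejected / trials
```

Under these assumptions the returned fraction should come out close to alpha, which is the proportion of good widgets discarded in the long run.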
Please help with a new article
I'm an AfC reviewer; I recently accepted the article Bayesian programming into mainspace. The original author is now asking me a bunch of questions on the article talk page that I cannot competently answer. My statistics "expertise" starts and ends at calculating a simple average. The main issues with the article are about style and referencing - it has been written very much like a college paper and a large section filled with complex formulae is completely unsourced, the original author says it is "self sufficient" - but that doesn't fly on WP of course. Roger (Dodger67) (talk) 16:57, 25 November 2013 (UTC)
Edit-a-thon for International Year of Statistics
This Sunday, Dec 8, at the main Washington DC public library, there will be an edit-a-thon on articles related to statistics, censuses, and surveys. The Wikimedia DC chapter and the American Statistical Association are sponsoring it in the context of the International Year of Statistics. We welcome you to participate! To show up in person, see the event page for details. To participate virtually, sign up there and we expect to have a live Etherpad with some e-conversation during the event. To suggest what to do, or how to do it, e.g. items for us to work on, please add them to the list on the event page or write me on my talk page. -- econterms (talk) 21:45, 5 December 2013 (UTC)
AfC submission
Mind reviewing this submission? Regards, FoCuSandLeArN (talk) 23:39, 15 January 2014 (UTC)
Thank-you for your efforts
I know this isn't the place for this, but as someone who had no statistics (still possible in 1980's to get a phd in maths with no stats), but desperately needing to explain stuff to her children studying two types of engineering, I REALLY appreciate all the effort that has gone into making the statistics pages level-understandable (starting low and going up). (Having worked on the mathematics wikipedia pages, I know how difficult it is to come to a consensus.) Thank-you. Lfahlberg (talk) 13:28, 22 January 2014 (UTC)
"Free statistical software"
Free statistical software is an odd article, seemingly lifted as a whole from Citizendium and giving avuncular advice about statistics freeware (despite its title, not free software) but also statistics in general. Bits within it strike me as worthy of retention though, somewhere. Any ideas? -- Hoary (talk) 00:53, 18 February 2014 (UTC)
- Statistical software redirects to List of Statistical Packages. My feeling is that there should be a page about statistical software in general. Then the list page can state what licenses they use. Is there anything unique about Free Statistical Software other than its license?
- On a separate note, what are the guidelines regarding using content from Citizendium? Jonpatterns (talk) 10:08, 18 February 2014 (UTC)
- No, this has nothing to do with license. It's not about the libre, it's about the gratuit. Offhand I can't think of examples that are the former and not the latter (and thus wouldn't be in the article), but the article has plenty of examples of what are the latter but not the former. Is there anything special about software that's gratuit? A lot of people would say no, but I'd guess that most of these either have their software paid for by others or have a reliable and ample salary. Personally, I'm not at all unsympathetic to the idea of publicizing the gratuit. Prices could easily be added to Comparison of statistical packages. Oh no, wait: they can't. ("An article should not include product pricing or availability information unless there is a source and a justified reason for the mention.") -- Hoary (talk) 12:49, 18 February 2014 (UTC)
- I think the article should be renamed Statistical software. The advantage of libre licensing and gratuit pricing can be discussed as part of the article. The exact pricing of a solution isn't needed; it can simply be said to be no/low/med/high cost. Jonpatterns (talk) 13:26, 18 February 2014 (UTC)
- The advantage of the gratuit should be obvious. Of course, some people may think of the cliché "There's no such thing as a free lunch" and extend it to software: surely what costs no money must be a trojan or similar? Curiously, the article freeware doesn't do anything to allay such fears; but if any article should allay them, that is it. After all, there's no obvious reason to think that the pluses and imaginable minuses of statistics freeware as compared with for-money statistics software would differ from those for software with other applications. Likewise, the pluses and imaginable minuses of free software (and the related open-source software) can be discussed in those articles. Meanwhile, I'm not at all sure that software can be described as no, low, medium or high cost unless we can present sources commenting on this. Plus these terms have debatable meaning: I'm a stingy bastard, so what's "medium cost" for you can be "high cost" to me; everyone you know qualifies for the academic discount for product X, but it doesn't apply to me; north Americans are quoted prices for Y that I have to grudgingly concede are "medium", but here in Japan Y is only available via one firm, which trebles its price (citing its addition of Japanese documentation, which I anyway don't want); etc etc. -- Hoary (talk) 09:15, 19 February 2014 (UTC)
- We do have lists that categorize as freeware, free software, or commercial software, which is quite specific and generally easy to source. - MrOllie (talk) 15:44, 19 February 2014 (UTC)
Popular pages tool update
As of January, the popular pages tool has moved from the Toolserver to Wikimedia Tool Labs. The code has changed significantly from the Toolserver version, but users should notice few differences. Please take a moment to look over your project's list for any anomalies, such as pages that you expect to see that are missing or pages that seem to have more views than expected. Note that unlike other tools, this tool aggregates all views from redirects, which means it will typically have higher numbers. (For January 2014 specifically, 35 hours of data is missing from the WMF data, which was approximated from other dates. For most articles, this should yield a more accurate number. However, a few articles, like ones featured on the Main Page, may be off).
Web tools, to replace the ones at tools:~alexz/pop, will become available over the next few weeks at toollabs:popularpages. All of the historical data (back to July 2009 for some projects) has been copied over. The tool to view historical data is currently partially available (assessment data and a few projects may not be available at the moment). The tool to add new projects to the bot's list is also available now (editing the configuration of current projects coming soon). Unlike the previous tool, all changes will be effective immediately. OAuth is used to authenticate users, allowing only regular users to make changes to prevent abuse. A visible history of configuration additions and changes is coming soon. Once tools become fully available, their toolserver versions will redirect to Labs.
If you have any questions, want to report any bugs, or there are any features you would like to see that aren't currently available on the Toolserver tools, see the updated FAQ or contact me on my talk page. Mr.Z-bot (talk) (for Mr.Z-man) 05:28, 23 February 2014 (UTC)
AfC submission
Wikipedia talk:Articles for creation/Variance of Effect-size. FoCuSandLeArN (talk) 12:55, 24 February 2014 (UTC)
Another AFC submission
I just created Articles for creation/Abundance estimation, I intend to add to it over the next few weeks/months but I welcome any help or additions. I intend it to be an overview of abundance estimation methods and their applications. My main focus is currently on mark-recapture methods so I would particularly welcome input/additions about other methods. Jamesmcmahon0 (talk) 21:33, 5 March 2014 (UTC)
Dear statistics experts: I am guessing that this old Afc submission is about some kind of statistical analysis. It will be deleted shortly unless someone decides that it is a notable topic and should be kept and improved. Just saying... —Anne Delong (talk) 00:45, 4 April 2014 (UTC)
Most Influential Languages in the World Economy
Please comment at Talk:Linguistic demography#"Influential languages" chart on a chart which I believe should be removed from the article. Since the editor who created the chart disagrees with me, consensus is needed. Cnilep (talk) 00:59, 22 April 2014 (UTC)
Request: simple better graph for Lorenz curve
Please see my comment at Talk:Lorenz_curve#Better_graph_suggestion. Can anyone help create it? If so, please post there. --Piotr Konieczny aka Prokonsul Piotrus| reply here 09:09, 29 April 2014 (UTC)
Gini coefficient discussion
Project members are invited to look at Talk:Gini coefficient#Gini in Template:infobox country and to provide input. – S. Rich (talk) 04:27, 6 May 2014 (UTC)
A lot of continuous distribution pages now have a "Differential equation" section that is rather opaque: it's just a title linking to differential equations and a horribly typeset set of differential equations, with no text explaining it. I don't have time to fix it up myself, so I thought I'd pass on my observation. Lucaswilkins (talk) 20:32, 12 May 2014 (UTC)
AfC submission - 01/06
Draft:Bayesian hierarchical modeling. FoCuSandLeArN (talk) 20:27, 1 June 2014 (UTC)
AfC submission - 04/06
Draft:Multiple factor analysis. FoCuSandLeArN (talk) 22:46, 4 June 2014 (UTC)
A draft at AFC needs some specialist attention
Please see Draft:Geometric-Poisson Distribution, it needs some help from a subject specialist to get it into acceptable shape. Roger (Dodger67) (talk) 14:45, 28 June 2014 (UTC)
Is this project dead?
None of the current topics on this page has received even a single reply, does that mean there is no action here? Roger (Dodger67) (talk) 07:50, 29 June 2014 (UTC)
- Quiet, but not entirely dead. A number of topics were requests for comments at talk pages or drafts. I usually go directly to the indicated pages rather than comment here. The article you requested comments for has already been accepted, which quenches any comments at this point. I will say that the acceptance of the article was a mistake; geometric Poisson distributions (a type of compound Poisson distribution) have been around since the 70's and the present article seems a coatrack for a particular researcher's papers. --Mark viking (talk) 16:43, 29 June 2014 (UTC)
Ben Geen, Colin Norris, Lucia de Berk
https://en.wikipedia.org/wiki/Benjamin_Geen https://en.wikipedia.org/wiki/Colin_Norris https://en.wikipedia.org/wiki/Lucia_de_Berk
Are there people out there interested in the topic of unexplained clusters of cases at hospitals leading to miscarriages of justice? I recently got hired by the defence in the case of Ben Geen to take a look at some statistics in his case.
http://arxiv.org/abs/1407.2731
My report will be submitted to the CCRC in an attempt to get them to consider considering the case and who knows maybe even recommending a re-trial. So a very long, long way to go.
Since I'm working for the defence I should not be editing wikipedia pages on the topic. But maybe other people like to. There is a lot going on, see
About the connections with Poisson variation (the law of small numbers) and data-analytic fishing expeditions (trawling) and cognitive biases in statistics, see
http://bengeen.wordpress.com/2014/07/26/an-open-letter/
Here is a big connection with statistics. Prof. Jane Hutton wrote an expert report for the defence for the (failed) appeal in 2008. She was not allowed to present her arguments in court to the jury because according to the judge what she had to say was "barely more than common sense, anyway". Shades of Sally Clark, right? Similarly, an anaesthetist who had a lot of scathing things to say about the medical evidence, was not allowed to present arguments in court either, for the same reason. Of course, the anaesthetist was only a US associate professor and his evidence contradicted that of a UK full professor, very eminent man, who had previously been very useful in the Harold Shipman case. Now Shipman was a serial killer, no doubt about that. But Ben Geen?
Richard Gill (talk) 11:45, 28 July 2014 (UTC)
I've created this stub as a new parent for Category:Index numbers (which perhaps could be renamed). The category's old parent was too limited (Index (economics)). There are indexes such as the Gender Gap Index and others I list at Measures of gender equality that are clearly not limited to the science of economics. I'll leave it to you to expand this article, or redirect it (I am not sure how relevant it is to the Indexed family...). If anyone would like a clarification for "what is this article about", it's about a type of object that could be linked from a sentence such as: A Human Development Index is an index that measures human development (and clearly the index (economics) was too narrow). --Piotr Konieczny aka Prokonsul Piotrus| reply here 06:14, 14 August 2014 (UTC)
It seems we are missing a key theoretical concept. Let's take a sample sentence: "The indicator is defined as a share of private sector employment of population aged 16". Where should the indicator link point to? I don't think our current disambig page has anything helpful... PS. Found a ref, will stub it - but help from more experienced editors in stats is appreciated. --Piotr Konieczny aka Prokonsul Piotrus| reply here 06:27, 14 August 2014 (UTC)
- In statistics, an indicator variable (also called a Dummy variable (statistics)) is an auxiliary binary variable created to indicate membership in a specified set. Having created the variable, one can perform statistical analyses on it. Indicator function is a closely related concept. What you are talking about seems more like an Economic indicator, which is just a statistic of interest. Presumably other social sciences have borrowed the idea from economics. For instance, there is an article on Community indicators and a journal Social Indicators Research. We don't have an article on Social indicators, but probably should. --Mark viking (talk) 18:03, 14 August 2014 (UTC)
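To make the statistical sense of "indicator variable" concrete, here is a minimal sketch. The data and the helper name are made up for illustration and are not drawn from any source in this thread. It builds a 0/1 variable marking membership in a set and shows that the mean of that variable is the share of observations in the set, which is how the two senses of "indicator" can meet in practice.

```python
def indicator(values, members):
    """Return 1 where the value belongs to `members`, else 0."""
    return [1 if v in members else 0 for v in values]


# Hypothetical employment sector recorded for six people.
sectors = ["private", "public", "private", "none", "private", "public"]

in_private = indicator(sectors, {"private"})       # [1, 0, 1, 0, 1, 0]
share_private = sum(in_private) / len(in_private)  # mean of the indicator = share
```

Because the variable is numeric, it can also be passed to a regression or any other routine that expects numbers, which is the usual reason for creating it.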
Template: regression bar: Why OLS under models?
In the template regression bar (http://en.wikipedia.org/wiki/Template:Regression_bar), I feel like ordinary least squares (OLS) shouldn't go under "Models" as it is an estimation technique and not a model. Indeed it is listed under "Estimators" too. — Preceding unsigned comment added by 46.5.16.52 (talk) 11:41, 22 August 2014 (UTC)
Dear statistics experts: Should this old AfC submission be kept and improved instead of being deleted as a stale draft? —Anne Delong (talk) 00:41, 9 September 2014 (UTC)
- Google scholar gives 3799 citations of the source paper, so seems notable. In my personal opinion, the main problem with the submission as it stands is not that it lacks sufficient context (the most recent reason for declining the submission), which could be solved with a few wikilinks in the first sentence, but that it never explains what GLS stands for. I know just about enough econometrics to know it's generalized least squares, but the submission doesn't mention that term either, which seems a major omission. Since this is a (fairly straightforward?) modification of the Augmented Dickey–Fuller test (ADF test), one option would be merging with that article. The origin and main field of application of this is econometrics rather than mainstream statistics, so I've cross-posted to WT:WikiProject Economics in the hope of finding someone with more knowledge of time-series methods who might be able to improve or merge this. Qwfp (talk) 17:09, 9 September 2014 (UTC)
- Thanks, Qwfp, for your efforts here. I have postponed deletion of the draft so that it won't disappear before something can be done with it. —Anne Delong (talk) 17:35, 9 September 2014 (UTC)
Comment on the WikiProject X proposal
Hello there! As you may already know, most WikiProjects here on Wikipedia struggle to stay active after they've been founded. I believe there is a lot of potential for WikiProjects to facilitate collaboration across subject areas, so I have submitted a grant proposal with the Wikimedia Foundation for the "WikiProject X" project. WikiProject X will study what makes WikiProjects succeed in retaining editors and then design a prototype WikiProject system that will recruit contributors to WikiProjects and help them run effectively. Please review the proposal here and leave feedback. If you have any questions, you can ask on the proposal page or leave a message on my talk page. Thank you for your time! (Also, sorry about the posting mistake earlier. If someone already moved my message to the talk page, feel free to remove this posting.) Harej (talk) 22:48, 1 October 2014 (UTC)
Audience considerations
I've just read the articles on Wiener and Gaussian processes. I am not a mathematician, but I am a social scientist with an interest in research methodologies. I was hoping to find a clear description of cases where an assumption of normal distribution is sound. I am working on a paper where qualitative interviews indicated heterogeneity in a key behavior, so we have looked at splitting our groups using k-means cluster analysis, based on continuous behavior observation data. We found that previously-used groupings of observations, where agents had been assumed to have homogeneous behavior, had heterogeneous behavior and that individuals clustered together in multiple equilibria. We have had a lot of push-back from the statisticians and stats-trained researchers in the group, because they claimed at first to not understand the method and then said the findings were probably exaggerated.
I came to wikipedia with these concerns: what conceptual framework supports an assumption of homogeneity or heterogeneity? What tests are available to establish one or the other? What types of cause and effect relationships underlie equilibrium processes that exist in reality? Basically, I wanted to turn the argument around and ask them to question their assumptions in the same light they were questioning my work.
I searched the web for "empirical support for homogeneity and normal distributions" and saw the word "process" with wikipedia in the search results, and thought I was on the right track for finding information about the causal/conceptual framework, like an operational model, a process flow diagram or at least a textual description of what characteristics typify these sorts of processes, or something like that. But, I was completely unprepared to understand what I was reading. It was not helpful or useful to me at all.
I don't know in general about all of the articles in the math/stats project at Wikipedia, but these articles were not accessible to me. I think they would be inaccessible to any non-mathematician. The sort of 'text book talk' in proofs and formulas can be helpful. I've really appreciated the project's sensitivity and specificity articles. But, in these articles there was nothing but 'text book talk'. I had no frame of reference to understand these articles.
Maybe it is my applied research background that cripples me in the more basic research and math theory arena, but it seems like the audience for wikipedia should be somewhat like that of an encyclopedia, not a text book. And definitely not an advanced undergraduate/graduate school level textbook.
So, all I can say in response to my colleagues, for now, is "your assumption contradicts the beliefs of the real people we are claiming to study" and "I've shown that there isn't a tendency toward an equilibrium between our three core behavioral indices, but toward multiple points of equilibrium". I am guessing they will reply "we know better than the people we are studying, they don't realize their equilibrium-seeking tendencies" and "all you've shown is something so confusing that we don't understand it and that you don't know how to do things the old fashioned, tried and true way".
I thought the wikipedia articles would help explain how empirical single-equilibrium processes occur, something about the standard approach for supporting an assumption of equilibrium and if and how homogeneity relates to the discussion and... And all I found were pieces written to an audience so specific that I didn't learn a single thing, although the figures did say something to me, but I can't explain what because the article didn't say.
I don't want this to be a place to settle a dogmatic/ideological score, but I do think the audience should be considered in a more meaningful way. I wanted to find information that could help me make sense of complicated math stuff, but it was over my head. I'm sorry to see that.
Draft:Gaussian process latent variable models
Draft:Gaussian process latent variable models
Opinions? Michael Hardy (talk) 00:02, 6 December 2014 (UTC)
ERROR IN CONFUSION MATRIX
Hello, I just noticed an error in the confusion matrix: the denominators of FPR and FNR are switched. NB: just in the two cases at the bottom of the confusion matrix. In the list to its right things are ok. Regards, Ivo. Jul 11 2014.
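For reference, the standard definitions Ivo is pointing to can be sketched in a few lines. This is only an illustration: the function name and the counts are made up, and nothing here is taken from the article's table. The false positive rate divides by all actual negatives (FP + TN), and the false negative rate divides by all actual positives (FN + TP).

```python
def error_rates(tp, fp, fn, tn):
    """Return (FPR, FNR) from the four cells of a 2x2 confusion matrix.

    Hypothetical helper for illustration only.
    """
    # False positive rate: false alarms among all actual negatives.
    fpr = fp / (fp + tn)
    # False negative rate: misses among all actual positives.
    fnr = fn / (fn + tp)
    return fpr, fnr


# Made-up counts: 50 actual positives (40 TP + 10 FN), 50 actual negatives (5 FP + 45 TN).
fpr, fnr = error_rates(tp=40, fp=5, fn=10, tn=45)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```

Swapping the two denominators would give FP/(FN + TP) and FN/(FP + TN), which are not rates over a single class of actual outcomes.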
Class of Akaike information criterion
The article Akaike information criterion is currently graded Start class. I suspect that the class should be at least C.
SolidPhase (talk) 01:02, 2 January 2015 (UTC)
- I agree, at least a C class. I've changed the ratings accordingly. Thanks, --Mark viking (talk) 01:21, 2 January 2015 (UTC)
- Much glad you agree! SolidPhase (talk) 07:47, 2 January 2015 (UTC)
WikiProject X is live!
Hello everyone!
You may have received a message from me earlier asking you to comment on my WikiProject X proposal. The good news is that WikiProject X is now live! In our first phase, we are focusing on research. At this time, we are looking for people to share their experiences with WikiProjects: good, bad, or neutral. We are also looking for WikiProjects that may be interested in trying out new tools and layouts that will make participating easier and projects easier to maintain. If you or your WikiProject are interested, check us out! Note that this is an opt-in program; no WikiProject will be required to change anything against its wishes. Please let me know if you have any questions. Thank you!
Note: To receive additional notifications about WikiProject X on this talk page, please add this page to Wikipedia:WikiProject X/Newsletter. Otherwise, this will be the last notification sent about WikiProject X.
Harej (talk) 16:57, 14 January 2015 (UTC)
AfC request
Could someone please give the page at User:Inezzzzz/sandbox a look over? I've asked at two other wikiprojects as well, but I can't even begin to understand what is happening at this article. --TKK! bark with me! 23:47, 15 January 2015 (UTC)
- I can't claim to fully understand even standard Kriging, but I'm not sure we really need separate articles on Indicator kriging and Multiple-indicator kriging. Qwfp (talk) 20:40, 16 January 2015 (UTC)
General comment this and other main stats pages.
Talk:Standard deviation#General comment this and other main stats pages.. Fgnievinski (talk) 22:59, 30 January 2015 (UTC)
Proposed move
See Talk:MLSE In ictu oculi (talk) 11:58, 13 February 2015 (UTC)
I wonder if other wikipedian statisticians would like to chip in with their opinions regarding significance level? Thanks. Tayste (edits) 22:03, 9 February 2015 (UTC)
What aspects of statistical significance? Z tests T tests F tests
F tests for hypothesis testing of least squares fits and other multi-linear statistical curve fits?
Numerical analysis significance of approximations of statistical quantities?
Historically significant uses of statistics for critical decision support?
Monetizing risk exposure in statistically valid ways for decision support?
Provide more information on "significance level". — Preceding unsigned comment added by Arctific (talk • contribs) 16:35, 16 March 2015 (UTC)
Draft:Generalized Functional Linear Model
At WP:AFC there is a proposed draft Draft:Generalized Functional Linear Model. Can someone here see if it is valid? Graeme Bartlett (talk) 06:19, 17 March 2015 (UTC)
Add the WPStatistics banner to the talk page of articles in List of statistics articles
This has been done except for talk pages with the word 'redirect' in it and talk pages that had previously been deleted after 2010. AppliedStatistics (talk) 21:09, 6 April 2015 (UTC)
File/Data/Text comparison
Members of this WikiProject may be interested in the discussion at Talk:Data comparison#Article title Yaris678 (talk) 13:05, 9 May 2015 (UTC)
Actuary FAR
I have nominated Actuary for a featured article review here. Please join the discussion on whether this article meets featured article criteria. Articles are typically reviewed for two weeks. If substantial concerns are not addressed during the review period, the article will be moved to the Featured Article Removal Candidates list for a further period, where editors may declare "Keep" or "Delist" the article's featured status. The instructions for the review process are here. SandyGeorgia (Talk) 02:08, 17 May 2015 (UTC)
List of missing articles
I've started a list of missing statistics articles. Fgnievinski (talk) 13:31, 20 May 2015 (UTC)
Errors and residuals in statistics listed at Requested moves
A requested move discussion has been initiated for Errors and residuals in statistics to be moved to Errors and residuals. This page is of interest to this WikiProject and interested members may want to participate in the discussion here. —RMCD bot 22:46, 28 May 2015 (UTC)
Reporting on public opinion polls
There's a debate at Talk:Debate_on_the_monarchy_in_Canada#Polling on whether reporting the questions and results of public opinion polls is a violation of the pollster's copyright and whether media sources on polling results can be considered reliable sources. Please share your knowledge/opinions. AnonAnnu (talk) 06:07, 7 June 2015 (UTC)
Quasi-maximum likelihood
Quasi-maximum likelihood was the most visited (by some measure) economics-related article in June 2015, achieving 40,000 hits per day. However, the article could do with improvement. For example, what is quasi-maximum likelihood used for? Jonpatterns (talk) 09:24, 9 July 2015 (UTC)
BESSEL CORRECTION
I have been informed that the exact formula for Bessel's correction based on sampling without replacement has an additional factor of N/(N-1): E(S^2) = N(n-1)var(x)/((N-1)n), and the variance of sample means = (N-n)var(x)/((N-1)n). I am just concerned that you don't mention it explicitly in this article. Instead, one can implicitly deduce it from the article about variance, because it is due to the covariance terms. Isn't this the reason why the correction is biased for standard deviation? In any case, how does one derive these expressions from first principles, because I have proved them by mere induction? By Muzingu Daniel Kampala, Uganda, East Africa. 14.07.2015 --154.73.12.87 (talk) 14:45, 14 July 2015 (UTC)
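The two formulas quoted above can at least be checked by brute force. The sketch below enumerates every size-n sample drawn without replacement from a small finite population and compares the exact averages with the stated expressions. It assumes that var(x) is the population variance with divisor N and that S^2 is the sample variance with divisor n (no Bessel correction); the population values and function names are made up for illustration. It is a check, not a derivation from first principles.

```python
from fractions import Fraction
from itertools import combinations


def population_variance(values):
    """Variance with divisor N (taken here to be the var(x) in the formulas)."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((Fraction(v) - mean) ** 2 for v in values) / size


def exact_expectations(population, n):
    """Enumerate all size-n samples without replacement.

    Returns (E[S^2], Var(sample mean)), where S^2 uses divisor n.
    """
    samples = list(combinations(population, n))
    count = len(samples)
    means = [Fraction(sum(s), n) for s in samples]
    s2 = [sum((Fraction(v) - m) ** 2 for v in s) / n
          for s, m in zip(samples, means)]
    expected_s2 = sum(s2) / count
    grand_mean = sum(means) / count
    var_of_mean = sum((m - grand_mean) ** 2 for m in means) / count
    return expected_s2, var_of_mean


population = [2, 3, 5, 7, 11, 13]  # hypothetical finite population
N, n = len(population), 3
var_x = population_variance(population)
expected_s2, var_of_mean = exact_expectations(population, n)

# Compare exact enumeration with the two quoted expressions.
print(expected_s2 == Fraction(N * (n - 1), (N - 1) * n) * var_x)
print(var_of_mean == Fraction(N - n, (N - 1) * n) * var_x)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is an equality test rather than a floating-point approximation.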
Request for Comment
Comments would be appreciated on this RfC: Talk:Infinite_monkey_theorem#RfC:_Which_of_these_versions_of_the_lead_is_the_more_accurate_and_informative.3F. Thanks DaveApter (talk) 10:28, 20 July 2015 (UTC)
Fixes to the logistic regression article
I have been working on logistic regression which was tagged as being unclear. I'd like comments on whether it is now clear enough. Thanks. (I've only been editing on Wikipedia for a few days, so I am not sure of how to do things). PeterLFlomPhD (talk) 12:50, 21 July 2015 (UTC)
New articles Suggestion
I noticed that there are only 2 articles in the new articles list. Can this be right? Also, I added an article The MAGIC criteria -- PeterLFlomPhD (talk) 21:14, 1 August 2015 (UTC)
Created new article - British statistician Roger Thatcher
I've created a new article on the British statistician Roger Thatcher.
Additional input for further research collaboration and secondary sources would be appreciated at the article's talk page, at Talk:Roger Thatcher.
Thank you,
— Cirt (talk) 05:12, 30 October 2015 (UTC)
Pseudolikelihood set notation mistakes?
On the pseudolikelihood page, a set of random variables is declared, but it seems to me that the notation is wrong.
It is declared as:
But, I think, it should be declared as:
Furthermore, a tuple describing a dependency between two random variables and is defined using the set notation, which seems a bit odd. Why not use instead? Or something else to distinguish tuples from sets?
(It seems inappropriate to just change it. But I am not sure how many people are on this page. It seems rather quiet here.)
ArtificialUser (talk) 17:38, 6 November 2015 (UTC)
Neologism "moral statistics"
http://static.ozone.ru/multimedia/book_file/1011869368.pdf this work -- from 2015 -- pp. 14, 15:: "the study of moral phenomena led to the emergence of moral statistics; ??psychometrics?? arose, i.e. ??the science of measuring the human psyche??; ??statistical mechanics emerged within physics??; sociometrics [my note: sociometrics was not recognized as a scientific field; in 2002 that was definitely the case, and emphatically so -- "do not even mention the name!!"]; in history, ??historiometrics??, or ??cliometrics??. Statistical methods have found application in genetics and botany, etc." Olga V. Demir (Petchyonkina) (talk) 00:01, 30 January 2016 (UTC)
A new article, P-value fallacy, may benefit from the attention of some of the members of this project. --76.14.85.215 (talk) 05:44, 22 February 2016 (UTC)
Can't print the book
https://en.wikipedia.org/wiki/Talk:Ensemble_Kalman_filter#links_to_cudenver.edu Rendering failed
Generation of the document file has failed.
Status: ! Internal error: bad native font flag in `map_char_to_glyph'
Return to Special:Book — Preceding unsigned comment added by 93.43.75.161 (talk) 22:23, 20 March 2016 (UTC)
Accessibility of introductory paras in statistics articles
In my work life at NCBI (NIH), a key project is building an open access glossary for biomedicine and biomedical research (more about that here and here). We are basing this glossary largely on NIH resources. But when we cannot identify an accurate definition for a term that is reasonably understandable, we draw on Wikipedia. Wikipedia articles should begin with a definition that is accessible to lay people. When Wikipedia doesn't have a short accessible description, we work with WikiProject Medicine to achieve that, via the page's talk page. Around 15% of the glossary comes from Wikipedia/Wiktionary.
We have recently broadened the scope of the NIH glossary to cover research design and terminology, and we are again drawing on the Wikipedia - see for example cross-sectional study. When it comes to statistical terms, we're going to need to rely on Wikipedia a great deal: these terms are frequently missing from available glossaries, or definitions exist, but they are either inaccessible or inaccurate. I realize how difficult it is, to achieve both accessibility and accuracy in describing statistical methodologies and terminology. However, it is critically important. I'll be working with some statisticians who are keen on helping with this but haven't been involved in Wikipedia before, and we will be organizing some edit-a-thons for statisticians as well.
Wikiproject Medicine now has a translation project, which involves making the introductory paras of medicine pages very accessible, and then having these translated. This idea of having accessible introductions is a great idea, and it would be wonderful to see that happen for statistical pages too. Are others in Project Statistics interested in this challenge? I'm very much looking forward to participating in the project, and look forward to many interesting discussions. Hildabast (talk) 16:58, 16 April 2016 (UTC)
- Welcome to WP:Statistics! It sounds like a worthy project and we welcome more statisticians working on Wikipedia stats articles. I've worked on simplifying leads for math and stats articles in the past and it is a challenge to simplify some topics without compromising correctness. I am interested in the project and would like to see if I can help. I notice in the cross-sectional study example above, you all are just taking part of the first sentence of the article. Is PubMed Health mostly looking to extract sound bites like this, or more full leads? --Mark viking (talk) 18:18, 16 April 2016 (UTC)
- Thanks! That's great! Yes, I agree it's going to be tough, but it will be interesting for sure. For the PubMed Health glossary, it has to be really short. It could be longer than that example, but not by a whole lot. Hildabast (talk) 21:44, 16 April 2016 (UTC)
Help needed
Would knowledgeable Statistics Project editors please look at Wikipedia:Articles for deletion/The rich get richer (statistics) and comment. Better yet, please improve the article. Thanks. – S. Rich (talk) 20:18, 17 April 2016 (UTC)
For those interested in phylogenetics, and clade presentations
Could you have a look at this effort, here, to use clade diagrams to summarize pharma business acquisitions. My take at present is that the images created are devoid of standard quantitative meaning—nothing is captured by vertical and horizontal line lengths, as far as I can tell—and so they are a misapplication of this maths/graphic presentation method. Moreover, I argue that they are misleading (presenting a time axis, but not making spacing of events proportionate to the historical time differences), much harder to maintain (consider adding entries to a std Table versus this graphic), more likely to diminish article quality (in their ambiguity of content, again, over a std Table with clear headings), and therefore practically amenable to decay as a result. I would add to this, in this esteemed stats context, that they would make those who trained us, and other purists in methodology and meaning (and Edward Tufte more generally), turn in their graves/beds. After having a look at the User page and at a couple of pages linked on that sandbox page, leave your opinion here, regarding the overall effort? Thanks for your opinion. Cheers. Le Prof Leprof 7272 (talk) 01:37, 23 March 2016 (UTC)
- Sorry about the inappropriate posting above. I have opened a discussion in the appropriate forum, here: Wikipedia_talk:WikiProject_Companies#Diagrams, for anyone who is interested Jytdog (talk) 03:37, 23 March 2016 (UTC)
- Jytdog moved the question to a new forum, the one where it is least likely to be viewed with rigour, see last comment and link. I reply there. I stand by the fact that stats is an appropriate venue to call for experts, and that it was appropriate to call out to you at this location, to ask your input. All coming from this area, I would appreciate if you state for the record, if you have any real knowledge on this matter (have ever actually worked on a project involving cladogram-type computations/presentations), for sake of transparency, please. Le Prof Leprof 7272 (talk) 15:45, 23 April 2016 (UTC)
Need expert on Markov models
The article about Markov Models needs attention from an expert. If you are an expert on statistics or Markov Models please take a look at that article.
212.95.7.68 (talk) 08:09, 1 May 2016 (UTC) Dominik
Extremum estimators vs. M estimators
I wonder why we have articles on both M estimators and Extremum estimators, given that Amemiya (1985) writes on the latter: "What we call extremum estimators Huber called M estimator, meaning maximum-likelihood-like estimators." In other words, the person we credit for having developed extremum estimator theory states it's nothing other than M estimator theory. So why do we have two different articles for the same thing? --bender235 (talk) 22:38, 12 May 2016 (UTC)
- An ML estimator is one kind of M-estimator. An M-estimator is one kind of extremum estimator. My understanding is that the three classes of estimators are nested, not synonyms for each other. See for instance, this tutorial based on Chapter 6 of Hayashi's Extremum Estimators book. --Mark viking (talk) 23:14, 12 May 2016 (UTC)
- Hm, alright. --bender235 (talk) 23:40, 12 May 2016 (UTC)
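The nesting described in this thread can be written out explicitly. The notation below is an assumption chosen for illustration (a generic objective Q_n, data w_i, and density f); it follows the usual textbook definitions and is not quoted from Amemiya or Hayashi.

```latex
% Extremum estimator: maximizes some sample objective function Q_n.
\hat{\theta} = \arg\max_{\theta \in \Theta} Q_n(\theta)

% M-estimator: the special case where Q_n is a sample average.
Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} m(w_i; \theta)

% ML estimator: the special case of an M-estimator where m is a log-density.
m(w_i; \theta) = \log f(w_i; \theta)
```

Under these definitions an estimator whose objective is not a sample average (GMM, with its quadratic form in sample moments, is the usual example) is an extremum estimator but not an M-estimator, which is why the classes are nested rather than identical.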
Pre-RfA opinion poll results page
We need some gigantic brains to help sort things out over here:
Many thanks, Anna Frodesiak (talk) 07:15, 10 July 2016 (UTC)
Differential equation of distributions
Every distribution article has an unsourced differential equation section, e.g., Gamma_distribution#Differential_equation and Chi-squared_distribution#Differential_equation. Smells like original research material? fgnievinski (talk) 18:05, 18 June 2016 (UTC)
- It's not. See Pearson distribution. Ozob (talk) 19:49, 18 June 2016 (UTC)
- That's a rare exception. Most such sections are entirely unsourced. See for example Normal_distribution#Differential_equation, Gamma_distribution#Differential_equation, etc. fgnievinski (talk) 19:27, 26 July 2016 (UTC)
- I think Ozob's point is that some of the other distributions' differential equations are special cases of the Pearson system; see for instance [2]. These sections do need sourcing, but it doesn't look like OR. -Mark viking (talk) 21:27, 26 July 2016 (UTC)
birnbaum-saunders
? this definition skew
$$\gamma = \frac{16\alpha^{2}(11\alpha^{2}+6)}{(5\alpha^{2}+4)^{3}}$$
think denominator should be squared not cubed? — Preceding unsigned comment added by Kornbrot (talk • contribs) 09:06, 1 August 2016 (UTC)
Problematic article
List of countries and dependencies by population is an article that is relevant to this project. After having it drop off my watchlist some time ago I returned on 9 August 2016 to find that the population table is constantly being edited without any indication of sourcing, or edit summaries, usually by IPs or newly registered editors. The complete lack of sourcing for new edits means that the data in this article is dubious at best. One particular problem that I found is that data templates, which are used to automatically calculate today's population based on official sources, have been removed and replaced with unsourced manual calculations. Several times now I have had to restore these templates after yet another unsourced, unexplained change.[3][4][5][6][7] I have now tagged the article to identify issues, and requested semi-protection, but the input by responsible editors who can update the article with accurate, sourced data is needed. --AussieLegend (✉) 04:40, 22 August 2016 (UTC)
Categorising Statistical disclosure control?
I'm trying to work out what the appropriate categories are for Statistical disclosure control. It's been put into Category:Human subject research and Category:Disclosure, but I figure there's got to be a subcategory of Category:Statistics for it to go in, too. Given its use in statistical agencies, I thought it might belong somewhere in Category:Official statistics, but none of the subcategories really fit and I'm not sure it's a great fit for the main category either. Any thoughts? Confusing Manifestation(Say hi!) 04:41, 22 August 2016 (UTC)
I don't know much about Cohen's kappa, but it's been on my watchlist for a long time. In the past two days, a couple of IP editors each made an edit—one of them a fairly substantial edit—and I was hoping that somebody who understands the math better than I do could check that the edits were legitimate and not subtle vandalism. Thank you. — Malik Shabazz Talk/Stalk 02:49, 24 August 2016 (UTC)
- I took a look at it and both edits seem fine. I think the purpose of the substantial edit was to be a bit more pedagogical and show the explicit formulas in addition to the actual numbers of the example. The second edit corrected an error in the first edit. --Mark viking (talk) 03:09, 24 August 2016 (UTC)
Does 'simple linear regression' imply OLS?
Should the definition of simple linear regression include the use of ordinary least squares (OLS) as the estimation technique, or does the term embrace non-OLS methods (e.g. least absolute deviations)? Interested editors may wish to respond at Talk:Simple linear regression#Title change. Qwfp (talk) 08:42, 18 October 2016 (UTC)
2016 Community Wishlist Survey Proposal to Revive Popular Pages
Greetings WikiProject Statistics/Archive 6 Members!
This is a one-time-only message to inform you about a technical proposal to revive your Popular Pages list in the 2016 Community Wishlist Survey that I think you may be interested in reviewing and perhaps even voting for:
If the above proposal gets in the Top 10 based on the votes, there is a high likelihood of this bot being restored so your project will again see monthly updates of popular pages.
Further, there are over 260 proposals in all to review and vote for, across many aspects of wikis.
Thank you for your consideration. Please note that voting for proposals continues through December 12, 2016.
Best regards, Stevietheman — Delivered: 18:08, 7 December 2016 (UTC)
Two similar categories
Do we need Category:Probability theorists and Category:Researchers in stochastics to be kept separated? Their scope looks pretty similar but I may well be overlooking something here. Marcocapelle (talk) 09:55, 9 December 2016 (UTC)
- Further discussion, see CfD proposal. Marcocapelle (talk) 18:27, 15 December 2016 (UTC)
Should Wikipedia statistics articles be understandable by non-statisticians?
Isambard Kingdom has proposed deletion of material from several statistics project articles with the justification that the material is too detailed.
The justification raises a fundamental issue about the suitable level of presentation in Wikipedia statistics articles. I believe the question would benefit from discussion among members of the Statistics WikiProject. (For examples of Isambard Kingdom's proposed deletions see the talk pages for the Hosmer-Lemeshow test, Kaplan-Meier estimator, or Exponential distribution)
A common criticism of Wikipedia statistics articles is that they are not comprehensible to non-statisticians. Here are examples from some of the most frequently-read statistics pages.
https://en.wikipedia.org/wiki/Talk:Standard_deviation
- "Absolutely obtuse to a lay person"
https://en.wikipedia.org/wiki/Talk:Confidence_interval
- "Utterly indecipherable to the lay-reader. If the general public is your audience, this article is a complete failure. I'm a reader with an advanced degree, and a well-rounded education, and I can't penetrate even the lede."
https://en.wikipedia.org/wiki/Talk:Chi-squared_test
- "The explanation is completely in theoretical terms. I'm trying to understand an article better, and this piece is absolutely no help in doing so."
- "The first sentence is ridiculously complicated! Statistics is very poorly explained on wikipedia, and this is one of the worst examples."
https://en.wikipedia.org/wiki/Talk:Regression_analysis
- "Who is this article for? Well it isn't for me. I understood NOTHING!"
https://en.wikipedia.org/wiki/Talk:Binomial_distribution
- "Accessibility: Would it be possible to write an introductory section that gives just a conceptual description of what the binomial distribution is about, before we enter the maths?"
https://en.wikipedia.org/wiki/Talk:Monte_Carlo_method
- "This article is quite technical. It would be nice to have a simpler layman's description too."
https://en.wikipedia.org/wiki/Talk:Proportional_hazards_model
- "Please, somebody, take pity on those of us who need more fundamental understanding, and write an introduction to this subject that would be useful and graspable by anybody with the basic interest to look it up. That's how to make Wikipedia better; make it useful."
https://en.wikipedia.org/wiki/Talk:Logistic_regression
- "This page is utterly incomprehensible for the novice who just wants a basic idea of what logistic regression analysis *does*. The rigorous math is fine but before diving into it it would be nice to give a more comprehensible introduction and maybe a real world example that might illuminate the topic a bit."
- "The point above is extremely relevant. Most people do not have a firm understanding of Applied Mathematics or Statistics in general. Quite a surprise that none of the contributing authors has ventured into making their knowledge understandable for the lay person. The ability to teach or communicate concepts to others is a distinction between an expert and an apprentice."
- "Generally I've found that statistics articles not saying very much (although a few of them do) and consequently incomprehensible"
- "As a novice, most wikipedia articles on statistics are useless. An encylopedia article should present basic information, and direct users to more detailed information at other entries. Someone has written a very fine statistics textbook, in wiki-form, that is useless to either laymen or novices."
- "You MUST be joking. I don't think I am a dolt. However I am not a mathematician nor a statistician; I am a professional translator (also a linguist and also a contributor to Wikipedia but in language-related articles and such). I looked up this article today because I NEED to know, in a very basic LAYMAN's sort of way, what logistic regression is, what it is about, and ideally (for my purposes) an intelligible explanation of how it works which provides a model of the language that ought to be used when explaining this to someone."
- "recognisable (faithful might be a better word) to those familiar working with logistic regression but completely opaque to neophytes. I cannot understand it and I'm really trying."
The key problem with many of the statistics articles in Wikipedia, as indicated by the quotes, is that they are incomprehensible to lay readers. The current level of writing is suitable for a person already familiar with probability and statistics, not for the average reader.
The need is for articles that provide sufficient detail and examples that the reader can readily understand. Editors of the statistics pages have more familiarity with these concepts than most readers. What statistician-editors find tedious, many readers will find informative and necessary to understanding.
If Isambard Kingdom and other readers find the material too detailed and tedious, then one solution would be to move the introductory material to the end of the articles. The material could be in a section with the title "Introduction to x for the novice". In that way, readers who wish for a brief, mathematical, highly technical explanation can get that first, while readers who wish for a more comprehensible lay-oriented explanation can find it at the end.
Would appreciate your thoughts and suggestions. If this is not the suitable forum for this discussion, please inform me of the better forum and accept my apologies. Michaelg2015 (talk) 19:58, 14 January 2017 (UTC)
- Prodding articles for this reason is inappropriate, so they should all be rejected. However, being too technical is a problem. There are two ways to address this. First, have the introductory material at the start. The lede should be understandable to most readers. Secondly, for large articles/topics there could be an "Introduction to xxxx" article, linked right from the start of the technical article. I don't think the introductory stuff should be at the end, as the newbies to the topic won't even find it before they give up reading. Rather than a statistical specialist writing that material, we need someone like a maths teacher that knows statistics to work on it. Graeme Bartlett (talk) 01:17, 15 January 2017 (UTC)
- "Too detailed" is not a reason for deletion, but I don't see evidence of a prod in the recent logistic regression history, either. How best to summarize a highly technical topic for the widest possible audience is a problem for many math articles. WP:TECHNICAL is the basic guideline here. Among its advice is to make early sections as simple as possible and to write "one level down". Looking at the logistic regression article, the first two sections contain (1) what it is used for, (2) a concrete example, and (3) the basic concepts behind LR. I think it does a pretty good job of explaining the basics without straying into textbook territory. The prose could probably be improved. But I don't think there is any magic prose that in a few paragraphs will allow general audience folks without an understanding of basic notions like probability distribution or fitting data to a model to understand what LR is about. --Mark viking (talk) 04:55, 15 January 2017 (UTC)
- This discussion prompted me to revisit the Logistic regression article and make some revisions, although inadequate. Michaelg2015's post is useful in pointing out the big problems, and this is the right forum. I accept that the goal should be to make a broad set of statistical articles accessible to the lay person. This is to be done by writing accessible introductory explanations (what the statistical method is, a good concrete example, and a good discussion of history of applications) and relegating complications to later sections or to advanced level articles (which themselves should meet similar standards, but expecting readers to be informed on basic level topics). A common complication is that there are multiple ways most statistical methods were derived, or could be derived:
- E.g., ordinary linear regression can be explained from curve-fitting, visually, then by arbitrary adoption of the minimization of the least squared deviations as a convenient method of calculation, or it can be derived from maximum likelihood estimation assuming normally-distributed errors.
- E.g., logistic regression can be explained from curve-fitting, and by the relative ease of calculation vs. the probit approach (see J.S. Cramer, The Origins of Logistic Regression, with interesting tabulation of number of papers by decade employing probit vs. logit), and by maximum likelihood
- The principles of the above-mentioned wp:Technical are good, but we need more guidance for statistics in particular. A general solution to the problem could be to develop a good working format of a standard statistics article, and refine that with practice. Here's a draft outline:
- 1. One- or two-paragraph introduction stating what the statistical method "is". This must define the method simply, and it may mention that extensions/complications exist but not go into them at all, to avoid cognitive overload.
- 2. A good concrete example. This should be well-chosen. To be encyclopedic, it should be a real example that has historical or practical importance; it should not be a bad-textbook-like made up implausible example.
- 3. Applications history, scope, in various field areas. Comments on growth and decline of use relative to alternatives.
- 4. How the model can be derived, with historical notes, perhaps in chronological order of actual derivation of the model, but perhaps better in order of simplicity of explanation
- Derivation 1 (e.g., curve-fitting, use of graph paper perhaps lognormal graph paper as relevant)
- Derivation 2 (e.g., maximum likelihood)
- 5. Calculation of the method: brief discussion of algorithms, mention of some software
- 6. Interpretation of the estimated model.
- 7. Alternative measures of fit. There are always ad hoc alternatives, unfortunately, some of which may be useful. (What is an ROC curve, anyhow?)
- 8. Extensions
- E.g., when data is pair-matched
- E.g., multiple categories, unordered or not, rather than just binary outcomes for logistic regression
- Note the "Extensions" should usually be short discussions linking to a main article on the advanced topic.
- --doncram 11:54, 16 January 2017 (UTC)
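As a sketch of the least-squares versus maximum-likelihood point in the outline above, assume the linear model $y_i = x_i^\top \beta + \varepsilon_i$ with independent errors $\varepsilon_i \sim N(0, \sigma^2)$. The notation is an illustrative assumption and is not taken from any particular article. Under that assumption the log-likelihood can be written as:

```latex
\begin{align}
\ell(\beta, \sigma^2)
  &= \sum_{i=1}^{n} \log\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 .
\end{align}
```

For any fixed $\sigma^2 > 0$ the first term does not depend on $\beta$, so maximizing $\ell$ over $\beta$ is the same as minimizing the sum of squared deviations. Both derivations therefore lead to the same coefficient estimate, which is why an article can present the simpler curve-fitting route first and the likelihood route later.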
Main Wikipedia gamma distribution page - an error
Hi, I'm new to this, I know how to edit Wikipedia pages, but I am reluctant to do that on such a fundamental point, so I thought I would post this here where I can't do any damage.
The CDF of the Gamma Distribution is correct on this Wiki Statistics project page, or whatever this place is called, but is incorrect on the main Wikipedia page you go to for the Gamma Distribution. The summary box on the RHS of the top of that page gives 1- F(x), not F(x) (I am not going to try to format this...). It is correct in the body of the Gamma Distribution page. Anyone familiar with this distribution will understand what I am talking about and will, I hope, change it. I have been working a lot with survival analysis lately and it is a disaster to get muddled on 1 - F(x) versus F(x).
Vandawk8 (talk) 14:58, 20 January 2017 (UTC)
- Thanks for catching that. Looking at the article history, a recent editor changed the summary CDF to be in terms of the upper incomplete gamma function, whereas in the body of the text, the CDF is in terms of the lower incomplete gamma function. I think it is better to stick with the conventions in the body of the text for the summary template, so will make the change. In the future, if you see a problem, be bold in making a correction! There are others watching the page and will often discuss and revert if they think your edit is a mistake. Cheers, --Mark viking (talk) 18:49, 20 January 2017 (UTC)
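The F(x) versus 1 - F(x) distinction discussed above can be checked numerically. The following is a minimal sketch using only the Python standard library. It evaluates the gamma CDF through a power series for the regularized lower incomplete gamma function. The function names `gamma_cdf` and `gamma_sf` and the shape/scale parametrization are assumptions made for illustration, not the notation of the article's infobox.

```python
import math


def gamma_cdf(x, shape, scale=1.0, rel_tol=1e-15, max_terms=10_000):
    """CDF F(x) of a gamma distribution (lower incomplete gamma, regularized).

    Uses the series gamma(a, z) = exp(-z) * z**a * sum_n z**n / (a (a+1) ... (a+n)).
    This sketch is intended for moderate values of x / scale only.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)  # next term: multiply by z / (a + n)
        total += term
        if term < rel_tol * total:
            break
    # Work in logs to avoid overflow in z**a / Gamma(a).
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return math.exp(log_prefactor) * total


def gamma_sf(x, shape, scale=1.0):
    """Survival function 1 - F(x) (upper incomplete gamma, regularized)."""
    return 1.0 - gamma_cdf(x, shape, scale)


# The two quantities are complements, so confusing them flips every probability.
print(gamma_cdf(3.0, 2.0), gamma_sf(3.0, 2.0))
```

For large arguments a continued-fraction evaluation of the upper function is usually preferred over this series, so treat the code as a check for small cases rather than a general-purpose routine.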
WikiJournal of Science promotion
The WikiJournal of Science is a start-up academic journal which aims to provide a new mechanism for ensuring the accuracy of Wikipedia's scientific content. It is part of a WikiJournal User Group that includes the flagship WikiJournal of Medicine.[3][4] Like Wiki.J.Med, it intends to bridge the academia-Wikipedia gap by encouraging contributions by non-Wikipedians, and by putting content through peer review before integrating it into Wikipedia. Since it is just starting out, it is looking for contributors in two main areas:
- Editors
- Authors
If you're interested, please come and discuss the project on the journal's talk page, or the general discussion page for the WikiJournal User group.
T.Shafee(Evo&Evo)talk 10:29, 24 January 2017 (UTC)
ANCOVA
Does anyone have the expertise to add a figure to the ANCOVA article, so that the variables in the equation can be illustrated more clearly? See comments on the ANCOVA discussion page. Gak (talk) 13:49, 1 March 2017 (UTC)
Proposed deletion of Mumbai statistics
The article Mumbai statistics has been proposed for deletion because of the following concern:
While all constructive contributions to Wikipedia are appreciated, content or articles may be deleted for any of several reasons.
You may prevent the proposed deletion by removing the {{proposed deletion/dated}} notice, but please explain why in your edit summary or on the article's talk page.
Please consider improving the article to address the issues raised. Removing {{proposed deletion/dated}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and articles for deletion allows discussion to reach consensus for deletion. 70.51.200.162 (talk) 06:30, 11 March 2017 (UTC)
Request assistance to automatically make human population evolution numbers - population statistics - appear in the pages giving historic overviews e.g. when clicking a particular century, decade or year
This would be very helpful for bringing insights. There are already (semi-)automated tools that generate pages summarizing the notable events of a particular century, decade or year. Just try it: the next time you read an article and see a year, decade or century mentioned, put it between square brackets and enjoy being led to a page with an overview of the most notable events. The request to develop such a tool was brought up on the talk page of the "History" portal: Portal_talk:History#How_about_automatically_making_human_population_evolution_numbers_appear_in_the_pages_giving_historic_overviews_e.g._when_clicking_a_particular_century.3F
Enjoy your day, like me :) --SvenAERTS (talk) 14:09, 8 April 2017 (UTC)
Popular pages report
We – Community Tech – are happy to announce that the Popular pages bot is back up-and-running (after a one year hiatus)! You're receiving this message because your WikiProject or task force is signed up to receive the popular pages report. Every month, Community Tech bot will post at Wikipedia:WikiProject Statistics/Archive 6/Popular pages with a list of the most-viewed pages over the previous month that are within the scope of WikiProject Statistics.
We've made some enhancements to the original report. Here's what's new:
- The pageview data includes both desktop and mobile data.
- The report will include a link to the pageviews tool for each article, to dig deeper into any surprises or anomalies.
- The report will include the total pageviews for the entire project (including redirects).
We're grateful to Mr.Z-man for his original Mr.Z-bot, and we wish his bot a happy robot retirement. Just as before, we hope the popular pages reports will aid you in understanding the reach of WikiProject Statistics, and what articles may be deserving of more attention. If you have any questions or concerns please contact us at m:User talk:Community Tech bot.
Warm regards, the Community Tech Team 17:16, 17 May 2017 (UTC)
help needed at Population proportion
Please see the discussion of "common errors" and of sample size guidance at Talk:Population proportion. This is an important introductory statistical topic, but the article was created in 2016 and was not part of WikiProject Statistics until just now. There are related articles at Sample size determination and P-chart and probably elsewhere, but a simple treatment of this basic case is badly needed.
It oughta be possible to make this comprehensible for Wikipedia editors trying to reason about what sample size should suffice to determine the proportion of bad articles, out of some big list of possibly bad ones, as comes up at wp:AN or wp:ANI from time to time (as now). --doncram 17:54, 17 May 2017 (UTC)
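The sample-size reasoning mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the usual normal-approximation formula n0 = z² p(1 - p) / e² with an optional finite population correction n = n0 / (1 + (n0 - 1) / N). The function name `sample_size_for_proportion` and its parameters are hypothetical and do not come from the article or its talk page.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5, population=None):
    """Approximate sample size needed to estimate a proportion.

    margin     -- desired half-width of the confidence interval (e.g. 0.05)
    confidence -- two-sided confidence level (e.g. 0.95)
    p_guess    -- prior guess of the proportion; 0.5 is the most conservative
    population -- size of the finite list being sampled, or None if very large
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p_guess * (1 - p_guess) / margin ** 2
    if population is not None:
        # Finite population correction: fewer samples are needed from a short list.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Example: a list of 1000 possibly bad articles, +/- 5 points at 95% confidence.
print(sample_size_for_proportion(0.05, population=1000))
```

The normal approximation is rough when the true proportion is close to 0 or 1 or when the sample is small, so the result is best read as a planning figure rather than a guarantee.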
One of your project's articles has been selected for improvement!
Hello,
Third opinion on "Bayesian information criterion" vs. "Schwarz criterion"
Could someone please comment here. Thanks. --bender235 (talk) 11:07, 7 August 2017 (UTC)
Wigner semicircular distribution: Wigner parabolic distribution
The article on the Wigner semicircle distribution contains a section on related distributions that I think is in need of attention from an expert. The discussion of the Wigner parabolic distribution contains two different formulas for the PDF, both of which seem wrong to me (neither is properly normalized), but I don't want to guess what the correct one should be. The rest of the section is confusing as well. MathHisSci (talk) 10:10, 1 October 2017 (UTC)
Could an expert on the subject please fix the lead section of this article? Hitro talk 21:51, 5 October 2017 (UTC)
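Whether a candidate PDF is properly normalized can be checked numerically before deciding what the correct formula is. The sketch below is a generic check, not a statement of the correct Wigner parabolic density. It uses the semicircle density 2/(πR²)·sqrt(R² - x²) as a known normalized example, and `unnormalized_parabola` is a hypothetical density that lacks its normalizing constant.

```python
import math


def integrate_midpoint(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def semicircle_pdf(x, radius=1.0):
    """Wigner semicircle density on [-radius, radius]."""
    if abs(x) >= radius:
        return 0.0
    return 2.0 / (math.pi * radius ** 2) * math.sqrt(radius ** 2 - x ** 2)


def unnormalized_parabola(x, radius=1.0):
    """Hypothetical example: a parabolic shape missing its constant."""
    return max(radius ** 2 - x ** 2, 0.0)


total_semicircle = integrate_midpoint(semicircle_pdf, -1.0, 1.0)
total_parabola = integrate_midpoint(unnormalized_parabola, -1.0, 1.0)

# A proper density integrates to 1; anything else needs a constant factor.
print(total_semicircle, total_parabola)
```

Dividing a candidate formula by its computed integral shows what constant it would need, which can then be compared against a reliable source.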
Issue regarding statistics of religion/non-religion/no response being discussed
I am notifying this project that there is an ongoing discussion about a statistical matter pertaining to the "religion in..." article format, HERE. We need some opinions to reach a consensus. Thank you.--Wddan (talk) 09:31, 9 October 2017 (UTC)
- Also Iryna Harpy, JimRenge, Nillurcheier, and other users who watch "religion in..." articles are invited to take part in the discussion.--Wddan (talk) 09:50, 9 October 2017 (UTC)
Importance of Akaike information criterion
The article Akaike information criterion is currently rated as having Mid importance. In the 20th century, that would have been correct. Now, however, I believe that the article should be rated as having High importance. FlagrantUsername (talk) 15:12, 13 October 2017 (UTC)
Disambiguation links on pages tagged by this wikiproject
Wikipedia has many thousands of wikilinks which point to disambiguation pages. It would be useful to readers if these links directed them to the specific pages of interest, rather than making them search through a list. Members of WikiProject Disambiguation have been working on this and the total number is now below 20,000 for the first time. Some of these links require specialist knowledge of the topics concerned and therefore it would be great if you could help in your area of expertise.
A list of the relevant links on pages which fall within the remit of this wikiproject can be found at http://69.142.160.183/~dispenser/cgi-bin/topic_points.py?banner=WikiProject_Statistics
Please take a few minutes to help make these more useful to our readers.— Rod talk 18:51, 3 December 2017 (UTC)
I am writing regarding the page Info-metrics. The article has been tagged for WP:Notability. I have added several relevant journal articles and textbooks to the article's talk page. The editor who originally tagged the article contends that he does not have the expertise to evaluate the sources. I request help from this project. Arnob (talk) 18:26, 12 December 2017 (UTC)
Off by two orders of magnitude?
Your feedback is requested at Talk:Gender dysphoria#Off by two orders of magnitude?. Thanks, Mathglot (talk) 11:37, 24 December 2017 (UTC)
Distribution-free maximum likelihood for binary responses and other goodies
Could someone have a look at these four more or less recently created articles:
- Distribution-free maximum likelihood for binary responses
- Heteroskedasticity and nonnormality in the binary response model with latent variable
- Binary response model with continuous endogenous explanatory variables
- Probit model for panel data with heterogeneity and endogenous explanatory variables
To an utter layman like me, they all look somewhat related, but I'm not able to even venture understanding them, let alone thinking about a selective merger. Pinging Smmurphy who's been involved with two of them. – Uanfala (talk) 22:59, 28 December 2017 (UTC)
- I am 100% behind deleting them. The user who created them (User:Carolineneil) did so in a research project that resulted in a lot of problems. In that project, they created pages based on the subject headings in popular econometrics textbooks and the titles of recent major econometrics papers, I think. While the subjects themselves are verifiable, not OR, and non-POV, the articles do not, in my opinion, represent a genuine attempt to improve the encyclopedia and they don't really represent a net plus for the encyclopedia. Carolineneil was banned in this discussion[8], and deleting much of their work was endorsed in various discussions, including [9]. I went through the statistics/econometrics articles and proposed a few for merger, hoping that would get more eyes on the problem. I've never submitted an article for AfD, but would be happy to support deleting most/all of them. Smmurphy(Talk) 14:55, 30 December 2017 (UTC)
- Also noting Carolineneil's three other drafts:
- Leaving for this project's members to decide what to do with them. – Uanfala (talk) 18:49, 1 January 2018 (UTC)
looking for contributing speakers for JSM2018 (vancouver) -by 9 jan 2018!
Hello, I apologize for the lateness of this submission but I am going to the Joint Statistical Meetings (JSM) 2018 in Vancouver and wanted to submit a topic-contributed session that involves statistics, communication, and Wikipedia. One speaker will present on the importance of graphical literacy and outline the Wikipedia article she would like to create; I would be presenting on statistical literacy for scientists and how important this is for Wikipedia editing as well as general statistics; and we would like to have someone from this group present to encourage the JSM audience to contribute to Wikipedia - to improve their own stats literacy and communication (continuing professional development), to provide service to their community and the wider community beyond statistics and practice (service), and even to do some scholarship (even though authorship is difficult to claim in Wikipedia, and "citations" are hard to justify if others can edit your work). Sponsoring groups for the session would be the ASA Committee on Professional Ethics (COPE - which I chair) and possibly the section on teaching statistics, so it is important for each session to include discussion of how engagement with this project (editing/promoting statistics on Wikipedia) can fulfill the ethical obligation to help statisticians and others communicate with data more clearly, and/or how editing for Wikipedia can support teaching and learning. If you're interested, please let me (Rochelle Tractenberg) know *ASAP* since the proposal is due 11 Jan 2018!
rotrac 21:44, 7 January 2018 (UTC) — Preceding unsigned comment added by Rotrac (talk • contribs)
There has been some discussion about the lead of the article Likelihood function. The discussion is at Talk:Likelihood_function#Wording_of_lead. Any recommendations on this would be much welcomed and appreciated. IAmAnEditor (talk) 23:44, 16 March 2018 (UTC)
Census (and Agricultural Censuses) Template?
I'm an Agricultural Economics student looking to improve some of the representations of Agricultural Censuses on Wikipedia.
I was wondering if there is a template which should be followed for these, and for how censuses are represented more generally. While this specific topic is unlikely to merit pages for the individual census years, as is the case with larger censuses of population, I still don't have an idea as to whether or not I'm including enough information in the few edits I have done so far.
As one can see in my edits to Canadian Census of Agriculture, United States Census of Agriculture and Farm Structure Survey (which isn't quite a census technically, but is the methodological basis for the European Census of Agriculture), I'm more or less thinking that a breakdown by Introduction, Overview, History, Methodology, Data Collected, and Use / Publication sections is appropriate - with the Data Collected section being of particular interest, more so than with a census of population, as its contents may not be immediately understandable to the public (the scope of an agricultural census is often broader than commonly expected).
Does anyone have thoughts on whether this approach is correct? Suggestions of what else might be incorporated? Or are there aspects of more general census framing on Wikipedia that I am missing and should incorporate?
Thanks for your responses, Wcconey (talk) 13:31, 28 March 2018 (UTC)
Add references as task
I've noticed that a pervasive problem in Wikipedia math articles, including in statistics, is that many pages are largely unsourced and possibly original research. I think adding references should be listed among the Article-related tasks. Wqwt (talk) 05:21, 5 April 2018 (UTC)
Machine learning bar
As the WikiProject with probably the most articles transcluding this template, does anyone here have a view on the possible conversion of Template:Machine learning bar to a footer template? This sidebar has annoyed me for some time but the last comment on the template talk page (also expressing the same opinion) is two years old. Since this affects quite a lot of articles I'm hoping for at least some clue that this change wouldn't go against a contrary consensus before going ahead... Bigbluefish (talk) 10:32, 13 February 2018 (UTC)
- I don't know the difference between sidebar and footer. For example Economics has both. But it seems fine as is. Wqwt (talk) 03:30, 23 April 2018 (UTC)
- If we could measure how many people use that economics sidebar for navigation I doubt we'd find it is being used much at all. How do you get to an article on evolutionary economics and after reading the lede end up navigating next to an article on national accounting? And like the machine learning bar, it is much too prolifically transcluded for a subject-overview navbar with a thematic illustration. How can it be a helpful introduction to an article on, for example, the income–consumption curve, for the most prominent graph, appearing alongside the lede, to be a graph of supply and demand curves? Bigbluefish (talk) 10:37, 10 May 2018 (UTC)
Mean squared prediction error
I asked this on the article’s talk page but got no response. Throughout the article Mean squared prediction error, shouldn’t every summation sign and every instance of and be multiplied by 1/n? And how about in each of the two right-hand side terms of the first equation in the Estimation section? Loraof (talk) 17:34, 25 May 2018 (UTC)
- I’ve gone ahead and changed it. Loraof (talk) 23:35, 28 May 2018 (UTC)
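The effect of the 1/n factor raised above can be illustrated with a small sketch. The data are made up for illustration, and `sum_squared_error` and `mean_squared_prediction_error` are hypothetical helper names, not quantities quoted from the article.

```python
def sum_squared_error(actual, predicted):
    """Sum of squared differences, with no 1/n factor."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_squared_prediction_error(actual, predicted):
    """Average squared difference, i.e. the sum multiplied by 1/n."""
    if not actual:
        raise ValueError("at least one observation is required")
    return sum_squared_error(actual, predicted) / len(actual)


# Made-up observations and predictions.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.0]

sse = sum_squared_error(actual, predicted)
mspe = mean_squared_prediction_error(actual, predicted)

# Repeating the same data doubles the sum but leaves the mean unchanged.
sse_doubled = sum_squared_error(actual * 2, predicted * 2)
mspe_doubled = mean_squared_prediction_error(actual * 2, predicted * 2)
print(sse, mspe, sse_doubled, mspe_doubled)
```

Without the 1/n factor the quantity grows with the number of observations, so it is a total rather than a mean and cannot be compared across samples of different sizes.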