Wikipedia talk:WikiProject Statistics/Archive 7

This is an archive of past discussions about Wikipedia:WikiProject Statistics. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

WikiProject collaboration notice from the Portals WikiProject

The reason I am contacting you is because there are one or more portals that fall under this subject, and the Portals WikiProject is currently undertaking a major drive to automate portals that may affect them.

Portals are being redesigned.

The new design features are being applied to existing portals.

At present, we are gearing up for a maintenance pass of portals in which the introduction section will be upgraded to no longer need a subpage. In place of static copied and pasted excerpts will be self-updating excerpts displayed through selective transclusion, using the template {{Transclude lead excerpt}}.

The discussion about this can be found here.

Maintainers of specific portals are encouraged to sign up as project members here, noting the portals they maintain, so that those portals are skipped by the maintenance pass. Currently, we are interested in upgrading neglected and abandoned portals. There will be opportunity for maintained portals to opt-in later, or the portal maintainers can handle upgrading (the portals they maintain) personally at any time.

Background

On April 8th, 2018, an RfC ("Request for comment") proposal was made to eliminate all portals and the portal namespace. On April 17th, the Portals WikiProject was rebooted to handle the revitalization of the portal system. On May 12th, the RfC was closed with the result to keep portals, by a margin of about 2 to 1 in favor of keeping portals.

There's an article in the current edition of the Signpost interviewing project members about the RfC and the Portals WikiProject.

Since the reboot, the Portals WikiProject has been busy building tools and components to upgrade portals.

So far, 84 editors have joined.

If you would like to keep abreast of what is happening with portals, see the newsletter archive.

If you have any questions about what is happening with portals or the Portals WikiProject, please post them on the WikiProject's talk page.

Thank you. — The Transhumanist 07:57, 30 May 2018 (UTC)

Uniform probability distribution graphs

Hello everyone,

I've been looking at several probability distribution articles on Wikipedia, and I'm surprised to see that the rendering of the probability density and cumulative distribution functions is not uniform, which can be unpleasant when you are trying to compare two functions side-by-side.

There are a lot of distributions to cover; in my opinion the most common ones would be:

I'm not an expert at data visualisation, and I have no preference over what style to use, as long as it is consistent across all distributions.

Best regards,

Zatyra (talk) 09:29, 14 May 2018 (UTC)

You might be better off discussing this at Template talk:Infobox probability distribution where there was a sizeable discussion that unified them to some extent in 2005. The dataviz landscape has changed a lot since then so to properly standardise them now is a great idea. Bigbluefish (talk) 14:39, 3 July 2018 (UTC)

Wiki Education hiring an experienced Wikipedian

Wiki Education is hiring an experienced Wikipedian for a part-time (20 hours/week) position. The focus of this position is to help new editors (students and other academics) learn to edit Wikipedia. The main focus of the position is monitoring and tracking contributions by Wiki Education program participants, answering questions, and providing feedback. We're looking for a friendly, helpful editor who likes to focus on article content, but who also has a deep knowledge of policies and guidelines and the ability to explain them in simple, concise ways to new editors. They will be the third member of a team of expert Wikipedians, joining Ian (Wiki Ed) and Shalor (Wiki Ed). This is a part-time, U.S. based, remote or San Francisco based position.

We are especially interested in people with experience editing statistics-related articles. See our Careers page for more information. Ian (Wiki Ed) (talk) 20:09, 6 August 2018 (UTC)

Coarsened exact matching (CEM)

I recently used this method for weighting data in a regression model and couldn't find anything about this method on Wikipedia when I looked earlier. I was wondering if others thought this should be created as an independent article or if new content on this topic should be placed in a related article (refs for context: [1][2][3]). Based upon a few searches I did, propensity score matching appears to be the most relevant article that pertains to the application of CEM to propensity scores, whereas the most relevant article to the general topic appears to be matching (statistics).

That said, should CEM be covered in a section in matching (statistics) with Coarsened exact matching redirecting to that section, and mentioned in propensity score matching, or should I create an independent article on CEM? I'd probably add roughly the same amount of content on this topic to a Wikipedia article as there currently is in the article on structural breaks. I don't expect to add content on this topic until next week at the earliest, but I figured I'd ask for feedback on where to put it beforehand to save me some time. Seppi333 (Insert 2¢) 18:19, 16 August 2018 (UTC)

statistical stability is bad

Hello, I think the article statistical stability is a hoax. (I’m not saying the topic itself is a hoax). In any case it is a horrible article. Sorry to be so harsh! 144.35.20.45 (talk) 05:37, 9 September 2018 (UTC)

Melvin R. Novick

Hello. I've just created Melvin R. Novick. Feel free to expand it with references. Thanks! Zigzig20s (talk) 17:20, 26 October 2018 (UTC)

Statistical induction and prediction

The method and formulas described in https://www.academia.edu/3247833/Statistical_induction_and_prediction seem to be new. Do we want them included in Wikipedia? Bo Jacoby (talk) 06:51, 11 November 2018 (UTC).

Coin flipping head, tail or edge

On 28 January I changed the sample space for a coin flip to three outcomes: head, tail, edge. My source is a paper published in 1993 in the Physical Review, a scientific journal. The authors were Daniel B. Murray and Scott W. Teare. So far no one seems to have objected to my contribution. If it makes sense, I should change all related articles in the following weeks, e.g. Sample Space. I have the article as a PDF file. Any comments on this?

Da Vinci Nanjing (talk) 13:39, 31 January 2019 (UTC)

That paper doesn't show it's possible that a coin lands on its edge, as the authors don't say they observed a coin landing on its edge even once, only what "extrapolations based on the model suggest...". As a statistician, I'm naturally skeptical about using a model to extrapolate beyond the range of observed data, as all models are wrong. Sure, I believe it might be possible, but I've never observed it myself, and this paper doesn't demonstrate it is possible; it only reports the probability that their model suggests. --Qwfp (talk) 20:28, 31 January 2019 (UTC)

Cluster Analysis page edited by SPA

The Cluster Analysis page is being overedited/overhauled by a special purpose account Glokc (talk · contribs · deleted contribs · logs · filter log · block user · block log). It deserves a look and some scrutiny. Limit-theorem (talk) 18:03, 2 February 2019 (UTC)

The Cluster Analysis page has been extended by me, and all the statements made have references to state-of-the-art scientific papers published in peer-reviewed journals and conferences. I have not deleted any of the original material, just restructured it a bit, significantly extended it and modified some statements (citing trusted scientific papers). I believe that the article has been improved by these changes and would be glad to receive some feedback and refinements for the edits, but the account Limit-theorem (talk · contribs · deleted contribs · logs · filter log · block user · block log) just undid my extensions multiple times instead of refining them further. --Glokc (talk) 18:18, 2 February 2019 (UTC)

WikiProject Data Science

There's an upstart WikiProject for data science. It doesn't look very active anymore, but I suggested merging them into Statistics or CS as a task force. Qzekrom (talk) 22:33, 9 February 2019 (UTC)

Proposed merge of Scheirer–Ray–Hare test into ANOVA on ranks

I've just proposed merging new article Scheirer–Ray–Hare test into ANOVA on ranks but I could do with a reference relating the two. Please comment at Talk:ANOVA on ranks#Proposed merge of Scheirer–Ray–Hare test. Thanks, Qwfp (talk) 11:33, 17 February 2019 (UTC)

Edited Vuong's closeness test, removed Expert-subject|statistics|date=November 2008

Hi project,

I've just edited Vuong's closeness test and removed the {{Expert-subject|statistics|date=November 2008}} tag from November 2008!

The expert request was for a sentence criticising the use of the test for Zero-Inflated Poisson on the grounds that these models are nested. However, the test can be used for non-nested or nested models, so the criticism is moot.

Happy to hear from anyone on the project if they have problems with this. Newystats (talk) 10:34, 21 March 2019 (UTC)

Inter-rater reliability expert wanted to help review a WikiJournal of Medicine (WikiJMed) article submission

The article in question is an unpublished pre-print undergoing peer review organised by the WikiJournal of Medicine.

All of the article's content has been accepted by journal editors and peer reviewers except for a question remaining about inter-rater reliability. I suspect that this will not be a difficult question for someone well-versed in this area of statistics.

Note that while this is a medical article written by plastic surgeons, you do not need to be a physician to help us! Physicians and other biomedical scientists have already reviewed the medical-specific content. We need input regarding the statistical analyses only. (I'm a psychologist who reviews medical articles for non-medical aspects only, e.g., grammar, syntax, organization, etc.)

If one or more of you would be so kind as to look at the article and offer your input, we would be most grateful!

The article is: WikiJournal Preprints/Comparison between the Lund-Browder Chart and the BurnCase 3D® for Consistency in Estimating Total Body Surface Area Burned.

You might wish to start with the questions I posted about inter-rater reliability along with the authors' response. Of particular interest are Tables 3 and 5 in the Results section of the article, which report coefficients of variation.

Our question, which we hope you might answer, is: Have the authors performed and reported the appropriate statistical analyses needed to support their article's findings and conclusions?

Since this is an open peer-review process for an article to be published in our no-cost, open access journal, we prefer if you write your comments, recommendations, or questions directly on the Talk (discussion) page for the article. However, that is not required. Therefore, if you prefer, feel free to send your comments, recommendations, or questions for the article authors via email to action editor Thomas Shafee and via email to me (Mark Worthen).

If you have any questions or feedback about this request, comment here and (please) ping me (Mark Worthen) and Thomas Shafee. Also feel free to email Thomas and email me.

Thank you! - Mark D Worthen PsyD (talk) (I am a man. The traditional male pronouns are fine.) 10:49, 4 April 2019 (UTC)

p-value

I just now edited the first line of the article on p-values. I saw an ambiguity, a mistake, and an inconsistency between example and definition. I have not yet checked if my changes keep the wording in line with the reference [1], whatever that is. (It looks like a reliable source, but maybe it is wrong, and then I have to find a better reliable source! Or, dear reader and fellow Wikipedian, you do.) Richard Gill (talk) 09:22, 14 April 2019 (UTC)

Please see discussion

Template_talk:Infobox_country#Metro_area_parameter. Interstellarity (talk) 17:18, 12 May 2019 (UTC)

A possible Science/STEM User Group

There's a discussion about a possible User Group for STEM over at Meta:Talk:STEM_Wiki_User_Group. The idea would be to help coordinate, collaborate and network cross-subject, cross-wiki and cross-language to share experience and resources that may be valuable to the relevant wikiprojects. Current discussion includes preferred scope and structure. T.Shafee(Evo&Evo) (talk) 02:56, 26 May 2019 (UTC)

Wrong labels of probability density function plots

In some articles (e.g. Exponential distribution, Pareto distribution and q-Weibull distribution) the plots of the probability density function are labelled "P(X)" or "Pr(X=x)", suggesting that a density is a probability (which is wrong, especially if the "probability" is said to be 3[4]). What do you think, should we start a larger project to redo all the plots using one piece of software? Best, --Qaswed (talk) 07:48, 27 June 2019 (UTC)

I don't know if I'd say it's wrong per se. Do we have a standard practice on Wikipedia? My copy of (Durrett 2010) says "it is often useful to think of f(x) as being P(X=x) although P(X=x)=0" and the following sass suggests that previous editions straightforwardly used P(X=x) for the density: "By popular demand we have ceased our previous practice of writing P(X=x) for the density function. Instead we will use things like the lovely and informative f_X(x)." Dk657 (talk) 17:02, 29 June 2019 (UTC)

I would say wrong per se for continuous distributions, and propose labelling the y axis "probability density". Newystats (talk) 11:27, 3 July 2019 (UTC)

I've made a first edit at Exponential distribution, but I think I need to tweak the margins. I've used the following R code -

curve(dexp(x,rate = 1.5),0,5,ylab="probability density",lwd=3,col="lightblue")
curve(dexp(x,rate = 1),0,5,add=TRUE,lwd=3,col="purple")
curve(dexp(x,rate = 0.5),0,5,add=TRUE,lwd=3,col="orange")
legend("topright",lty=1,lwd=3,col=c("orange","purple","lightblue"),
       legend=c(expression(lambda==0.5),expression(lambda==1),expression(lambda==1.5)),bty="n")

thoughts? Newystats (talk) 23:18, 12 July 2019 (UTC)
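
The point raised in this thread, that a density value is not a probability, can be checked numerically. The following is a minimal Python sketch, not taken from any of the articles mentioned: it assumes an exponential distribution with rate 3 (a hypothetical choice, picked so that the density at zero equals 3), and the helper names are made up for illustration. It suggests why "probability density" is the safer axis label: the curve can rise above 1 while the area under it is still 1.

```python
import math


def exp_pdf(x, rate):
    """Density of the exponential distribution with the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_cdf(x, rate):
    """P(X <= x) for the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


rate = 3.0  # assumed rate, chosen only for illustration

# A density value: it can be larger than 1, so it cannot be a probability.
density_at_zero = exp_pdf(0.0, rate)

# The area under the density is (numerically) 1; the tail beyond 10 is negligible.
total_area = midpoint_integral(lambda x: exp_pdf(x, rate), 0.0, 10.0)

# Probabilities belong to intervals, here the narrow interval [0, 0.01].
prob_small_interval = exp_cdf(0.01, rate) - exp_cdf(0.0, rate)

print(f"f(0) = {density_at_zero}")
print(f"area under f on [0, 10] = {total_area:.6f}")
print(f"P(0 <= X <= 0.01) = {prob_small_interval:.5f}")
```

For a narrow interval the probability is roughly the density times the interval width, which is the sense in which the Durrett quotation above treats f(x) as "being" P(X=x).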

Looking for advice from editors on Probability Statistics

In March 2002 I made what was thought at the time to be an unknown mathematical discovery in Lotto Probability Draw Pattern Mathematics. Now, some 17 years later, I have been trying to get the discovery validated more thoroughly than it was then by those who were originally involved. I became an editor in 2009, unrelated to the discovery, but have been fairly inactive for the last couple of years and I know I need to get back to it.

During the process I recently learned on the web that they're saying my independent discovery had already been known about years prior to 2002. But those I spoke with didn't know where this type of mathematical discovery comes from or what it is called, other than by the name 'decades analysis.' I've been unable to find anything like it on the web to see where it might originally have come from, and I don't understand why we don't have it discussed on our topic article page for Lottery Mathematics.

I am going to leave a link here showing how Lotto Probability Draw Pattern Mathematics looks in a Probability Report, in the hope that someone can help me find more information about this type of mathematics and/or reference links to pages or papers explaining the subject matter in general. I'd appreciate any help anyone can give me at all. Tinkermen (talk)

I don't think this belongs here. Dk657 (talk) 12:27, 3 August 2019 (UTC)

Wikipedia page "Outlier"

I noted in the Outlier page on Wikipedia a section on a test called Thompson. There are several Thompsons but there is only one associated with Thompson's Tau test and that is: Thompson, W.R., 1935, 'On the criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation'. Annals of Mathematical Statistics, Vol. 6, No. 4, pp. 214-219. There is a document on the website www.mygeodesy.id.au in the folder 'Survey Computation' under the section 'Tau-distribution and testing residuals' that you may find interesting (http://www.mygeodesy.id.au/documents/Tau%20distribution.pdf).

Regards Rod Deakin randm.deakin@gmail.com — Preceding unsigned comment added by 101.175.122.53 (talk) 06:49, 13 September 2019 (UTC)

request for addition to Template:Infobox_Country

For those interested, there is a request at the Template:Infobox_Country talk page to add parameters for the judiciary (highest court, highest judge). --Goldsztajn (talk) 16:44, 15 September 2019 (UTC)

Nomination of Portal:Statistics for deletion

A discussion is taking place as to whether Portal:Statistics is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.

The page will be discussed at Wikipedia:Miscellany for deletion/Portal:Statistics until a consensus is reached, and anyone is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the page during the discussion, including to improve the page to address concerns raised in the discussion. However, do not remove the deletion notice from the top of the page. Certes (talk) 13:12, 1 October 2019 (UTC)

Mean squared prediction error

This page needs citations and checking. Also there's an old discussion on correctness on the talk page that could use the attention of someone with the right expertise. Article could then be de-tagged. All the best: Rich Farmbrough, 14:34, 2 October 2019 (UTC).

Statistical biographies

Some folks here might be interested: I have a draft article on Wei Biao Wu up at AfC. My judgement on notability may be suspect, per the draft talk page, but he has a PNAS article with 400 citations + a bunch of articles with 100+ citations, and so I think he probably meets WP:NPROF criterion 1.

I'll also mention the Michael Woodroofe article that I originated (following COI guidelines and keeping NPOV to my best ability, see my last name). I'd love it if someone with no COI and more knowledge than I would take a look. Russ Woodroofe (talk) 10:51, 13 October 2019 (UTC)

Dispute About Akaike information criterion

There is a content dispute at https://en.wikipedia.org/wiki/Wikipedia:Dispute_resolution_noticeboard#Akaike_information_criterion. Is there someone here who is willing to try to act as a volunteer mediator? In this case, no dispute resolution experience is necessary, but a little subject matter knowledge about statistical model evaluation would be helpful. Robert McClenon (talk) 15:41, 21 October 2019 (UTC)

Request for information on WP1.0 web tool

Hello and greetings from the maintainers of the WP 1.0 Bot! As you may or may not know, we are currently involved in an overhaul of the bot, in order to make it more modern and maintainable. As part of this process, we will be rewriting the web tool that is part of the project. You might have noticed this tool if you click through the links on the project assessment summary tables.

We'd like to collect information on how the current tool is used by....you! How do you yourself and the other maintainers of your project use the web tool? Which of its features do you need? How frequently do you use these features? And what features is the tool missing that would be useful to you? We have collected all of these questions at this Google form where you can leave your response. Walkerma (talk) 04:25, 27 October 2019 (UTC)

Q-Gaussian (second opinion needed)

Hello, I am an AfC reviewer and recently (some time ago) I denied Draft: Q-Gaussian due to an article on Q-Gaussian already existing. However, today, the editor who created the article posted on my talk page saying that the two deal with different subjects. So can someone who has familiarity with math/stats take a look at this and see if they do talk about different things? Thanks, Taewangkorea (talk) 17:22, 2 December 2019 (UTC)

It is hard to tell. The author should add sections about the pdf and cdf of the distribution he's proposing. This would allow us to compare it to the existing page. Could you please write that to him? — Preceding unsigned comment added by Talgalili (talk • contribs) 19:02, 2 December 2019 (UTC)

Comment on creation of Workforce in country XX

On the economics project page, there is a proposal for a new set of articles which would incorporate statistics-related issues, for those interested: see here. --Goldsztajn (talk) 14:44, 5 December 2019 (UTC)

Chi-squared test article

The article on the chi-squared test is a disaster. Of course, it is a very challenging article to write, since it must be read by all kinds of people from all kinds of disciplines (medicine, psychology, ..., applied statistics, mathematical statistics). https://en.wikipedia.org/wiki/Chi-squared_test I have made some edits but I don't think I have made the article more accessible; on the contrary. However, I think it is now closer to being correct. The original authors' notion of chi-square test seems to be any statistical test which, rightly or wrongly, uses a chi-square distribution to evaluate the statistical significance of the test. OK, one can live with that. But still one has to make clear that whether or not this is *legitimate* depends on a whole lot of things which most readers will be simply completely unaware of, and unable to grasp. After all, whether or not one should use a particular statistical analysis method in a particular case is quite an *art*. But psychology students are given flowcharts which say that they *must* use certain tests in certain situations. Often, the wrong questions are asked; the really important questions are not asked, since they depend on quite sophisticated concepts which the user has never thought about. Richard Gill (talk) 12:18, 4 January 2020 (UTC)

Nelson–Aalen estimator article

The article on the Nelson–Aalen estimator has an error. "A concave shape is an indicator for infant mortality while a convex shape indicates wear out mortality." is incorrect since both the infant mortality and wearout failure curves are concave. — Preceding unsigned comment added by 2601:8C0:0:AFF0:946A:257A:4C6A:AF4 (talk) 03:02, 27 December 2019 (UTC)

I disagree. One typically sees a rapidly decreasing hazard rate in the early stages of life, due to "infant mortality", and a rapidly increasing hazard rate in "old age" due to wear out. Though at extremely old ages one sometimes appears to see a constant hazard rate, due to the fact that those who are still around are essentially immortal, but of course are still susceptible to random accidents. Richard Gill (talk) 12:20, 4 January 2020 (UTC)
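
For readers following this exchange, a minimal Python sketch of the Nelson–Aalen estimator may help. It uses the standard formula (the cumulative hazard is the running sum, over observed failure times, of the number of failures divided by the number still at risk). The function name and the small data set are hypothetical and chosen only for illustration. Because the slope of the cumulative hazard is the hazard rate, a decreasing hazard shows up as a concave estimated curve and an increasing hazard as a convex one.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct failure time.

    times  -- observed times (failure or censoring)
    events -- 1 if a failure was observed at that time, 0 if censored
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    estimate = []
    cumulative_hazard = 0.0
    for t in sorted(failures):
        # Subjects still under observation just before t, including
        # those who fail or are censored exactly at t.
        at_risk = sum(1 for u in times if u >= t)
        cumulative_hazard += failures[t] / at_risk
        estimate.append((t, cumulative_hazard))
    return estimate


# Hypothetical data, for illustration only: six subjects, two censored.
times = [1, 2, 2, 3, 5, 8]
events = [1, 1, 0, 1, 0, 1]

curve = nelson_aalen(times, events)
for t, h in curve:
    print(f"t = {t}: estimated cumulative hazard = {h:.4f}")
```

With real data, plotting the estimate against time and looking at how the slope changes is the usual way to judge whether the hazard is falling or rising.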

Probability distributions: we need a standard for topics

The different articles for distributions include different sub-headers. For example:

Binomial_distribution: Specification, Example, Expected value and variance, ..., Estimating parameters, ...
Chi-squared distribution: Definition, Introduction, Characteristics, Relation to other distributions, ...
Normal distribution: Definition, Properties, Cumulative distribution function, ..., Estimation of parameters,...

etc.

I'm planning to carve out a standard of what an article should include, and will track what I do here. If others wish to join, please leave a comment :)

Tal Galili (talk) 14:05, 27 November 2019 (UTC)

Great idea. Limit-theorem (talk) 14:39, 27 November 2019 (UTC)

Yes, good idea. NB the chi-squared *test* page is a disaster. The many different tests sometimes called chi-square test because the user is told to use a chi-square distribution to get their p-values are all mixed up; the distinction between theoretical, sometimes approximate, sometimes asymptotic, sampling distributions needs to be better explained. Richard Gill (talk) 12:24, 4 January 2020 (UTC)

proposed structure

Based on some ideas from https://en.wikipedia.org/wiki/Wikipedia:Featured_articles#Mathematics and from some distribution articles, I propose the following archetype of a structure (some sections might be missing):

Definitions (this section will include sub-sections on introduction or maybe examples, PDF and CDF, as they are inherent to all distributions)
Properties (move discussions about parameter, moments, "central limit theorem", etc. - here)
Related distributions (this is a special case of properties, so I give it its own section)
Statistical Inference (includes Estimation of parameters, confidence intervals, statistical tests, etc.)
Occurrence and applications
Computational methods (for things like how to create random numbers, numerical approximations, and software implementations)
History (I'm putting this at the end since for most distribution this is where this section was placed, and since the main focus for readers, I assume, will likely be to understand how to use a distribution and less so about its origin)
other sections as usual: see also, notes, references, links (each with their own main section)

I'll modify a few distributions and will add them here. Please review and comment:

Opportunity for causal research on Wikipedia research missing it

The "Recent research" column in the current Signpost discusses Murić et al. (2019) "Collaboration Drives Individual Productivity" in Proceedings of the ACM on Human-Computer Interaction. This paper establishes a clear correlation for both Wikipedia articles and GitHub repositories: in the case of the former, editors who edit pages edited by greater numbers of editors tend to edit more pages than those who don't. The authors assume that causation flows in the direction stated in the title and repeated in the third paragraph of their conclusion, but they perform no causal research ruling out the alternative hypothesis, which is more likely in my opinion, that causation flows in the opposite direction: that individual productivity drives the extent of collaboration. Or to put it a different way, I think it's more likely that editing more pages leads to editing pages edited by greater numbers of editors. This is a great opportunity for critical authors at undergraduate through postdoctoral levels to explain the importance of analyzing causation in our familiar subject matter on open data sets. If anyone wants to pursue this, please ping me on my talk page so I can give you some more ideas I have on approaching it. EllenCT (talk) 00:12, 28 January 2020 (UTC)

I would be very interested to open a discussion with someone willing to work on this & potentially collaborating.

cc: EllenCT. = paul2520 (talk) 18:44, 28 January 2020 (UTC)

Statistics question at Talk:2019–20 coronavirus pandemic#Epidemic curve graphics

Some attention from statistics-inclined Wikipedians would be helpful for resolving the question at Talk:2019–20 coronavirus pandemic#Epidemic curve graphics. Thanks! Sdkb (talk) 22:49, 15 April 2020 (UTC)

REGRESSION ANALYSIS IN STATISTICS

       REGRESSION ANALYSIS IN SPSS
                   By
           Iniobong Emmanuel
              08138223438

Introduction. Variables in regression analysis. Types of regression. Assignments on regression analysis.

Introduction

Economic, medical, social, psychological, agricultural and other processes involve predicting or estimating values in order to guide our decision-making. Regression in statistical analysis is concerned with predicting or estimating the value of a dependent variable from an independent variable.

Variables in regression analysis

There are two kinds of variables used in regression analysis:

1. Dependent variables.
2. Independent variables.

What is a dependent variable?

A dependent variable is a variable whose estimated value depends on another variable. That is why it is called a dependent variable.

Example: a businessman wants to predict what his income will be when he invests a certain amount in his business. Here, income is the dependent variable: he does not know how much it will be at the end of the month, and its value depends on the amount he invests.

What is an independent variable?

An independent variable is one that stands on its own and gives a value to the dependent variable. In other words, it is a source of value for the dependent variable. In our example, the amount of money invested is the independent variable.

Types of regression

The types of regression to be treated in this course are:

Linear regression
Multiple regression
Logistic regression
Ordinal regression
Multinomial logistic regression

Assignment on regression analysis

1. I am a pilot and am interested in predicting the distance my aircraft will travel from the amount of jet fuel in my plane. What will be my dependent variable and my independent variable?
2. President Iniobong Emmanuel is interested in predicting or estimating the gross domestic product (GDP) from the amount he invests in agriculture. As an analyst, what will be the dependent and independent variables?

Solution

1. Dependent variable = distance; independent variable = fuel.
2. Dependent variable = GDP; independent variable = amount of investment in agriculture.

— Preceding unsigned comment added by Iniobong7 (talk • contribs) 18:46, 25 September 2020 (UTC)

REGRESSION ANALYSIS IN STATISTICS

       REGRESSION ANALYSIS IN SPSS
                   By
           Iniobong Emmanuel
              08138223438

Introduction. Variables in regression analysis. Types of Regression.

Assignments on regression analysis.

Introduction Economical, medical, social, phycological,agricultural processes e.t.c involves predicting or estimating values in order to guide our decision making process. Regression in statistical analysis is concerned with predicting or estimating the value of a dependent variable from an independent variable.

Variables in Regression Analysis

There are two variables used in regression analysis. They are; 1. Dependent variables. 2. Independent variables. What is a dependent variable A dependent variable is a variable that cannot stand on its own without obtaining or depending on another variable for its estimated value. That is why it is called a dependent variable. Example A businessman is concerned in predicting what will be his income when he invest a certain amount in his business. Note here Income = dependent You have no idea on how much it will be at the end of the month. And it only depends on the amount you invest to get it's value. That is why it is called a dependent variable.

What is an independent Variables

An independent variable is one that stands on its own to give a value to the dependent variable. In other words it is a source of value for the dependent variable. Note In our example, the amount of money invested is known as the independent variable. i.e investment amount = independent

Types of Regression The types of regression to be treated in this course are; Linear regression Multiple regression Logistic regression Ordinal regression Multinomial logistic

Assignment on regression analysis 1. I am a pilot, am interested in predicting the distance my aircraft will travel with amount of jet fuel my plane. What will be my dependent variable and my independent variable? 2. President Iniobong Emmanuel is intrested in predicting or estimating the gross domestic product (G.D.P) with the amount he invest in agriculture. As an analyst, what will be the dependent and independent variable.

Solution

1. Dependent variable = distance; independent variable = fuel
2. Dependent variable = GDP; independent variable = amount of investment in agriculture
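To make the pilot example concrete, here is a minimal sketch of a simple linear regression with distance as the dependent variable and fuel as the independent variable. The data values, units and the function name `fit_simple_linear_regression` are hypothetical and chosen only for illustration; the fit is ordinary least squares.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations: fuel loaded (independent) and distance flown (dependent).
fuel = [10.0, 20.0, 30.0, 40.0]
distance = [52.0, 101.0, 149.0, 202.0]

intercept, slope = fit_simple_linear_regression(fuel, distance)

# Predict the dependent variable for a new value of the independent variable.
predicted_distance = intercept + slope * 25.0
print(intercept, slope, predicted_distance)
```

The independent variable (fuel) is the input to the fitted line, and the dependent variable (distance) is what the line predicts.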

— Preceding unsigned comment added by Iniobong7 (talk • contribs) 18:48, 25 September 2020 (UTC)

Third opinion request

Hello,

Can I request a neutral editor to take a look at a recent addition to St. Petersburg paradox and the accompanying discussion on Talk:St. Petersburg paradox#Median value? There is a dispute on the proper phrasing of how exactly a median proposed in a journal article is calculated. I'd normally consider asking on WP:3O but this requires some statistical knowledge, so asking here instead. SnowFire (talk) 18:21, 10 August 2020 (UTC)

Expected mean squares

I have created an article titled Expected mean squares. So have fun. Michael Hardy (talk) 23:48, 19 August 2020 (UTC)

Opinions needed

A bit of an ongoing argument and edit war needs help, at Talk:Principal_component_analysis#Entry_paragraph. Dicklyon (talk) 04:17, 22 August 2020 (UTC)

@Michael Hardy: you have any opinions or help to offer? Know any other active project members? Dicklyon (talk) 23:14, 23 August 2020 (UTC)

Deprecating usage of the template {{radic}} to write root radicals has an RFC for possible consensus.

"Deprecating usage of the template {{radic}} to write root radicals" has an RfC for possible consensus. If you would like to participate in the discussion, you are invited to add your comments on the discussion page. Thank you. Walwal20 ^{talk ▾ contribs} 02:12, 25 September 2020 (UTC)

Double check calculation at Template talk:SensSpecPPVNPV please

See Template talk:SensSpecPPVNPV#Prevalence threshold and my edit to the template (diff). Many thanks - Mark D Worthen PsyD (talk) [he/his/him] 18:48, 10 December 2020 (UTC)

Ergograph

I found this while stub-sorting (in this shape) and am not sure it isn't built on a misunderstanding. Could someone have a look? The name suggests it's a graph showing work, not a seasonality thing, but ... PamD 05:30, 7 October 2020 (UTC)

Looking at the sources used by googling their ISBNs and seeing previews on Googlebooks it seems clear that this article is misguided and has been so from the start. It's been copied around the internet, of course.

Also the file File:Weymouth Climatic Graph.png uses it in its caption, wrongly as far as I can see, and is used in a couple of articles.

Meanwhile the more common use of "ergograph" is as a device for measuring muscles. PamD 07:37, 7 October 2020 (UTC)

I've now done some work on the article and think it makes more sense and agrees with its sources, but a more expert statistical eye (or anyone with access to the untitled "Further reading" in "Institute of British Geographers (1950). Transactions and Papers — Institute of British Geographers. G. Philip (16–19): 2, 184") could still be useful, thanks. PamD 07:50, 7 October 2020 (UTC)

That reference is unclear, PamD, not only because of the lack of a title but also because of the range of issue numbers given. I have access and did a text search for ergograph and it threw up this. Let me know if you'd like a copy. Cordless Larry (talk) 21:48, 10 December 2020 (UTC)

Last phrase of second sentence

Hi there :)

I've decided that I also want to help with Wikipedia's Statistics now. I haven't been on the editing team of Wikipedia much before. Perhaps I would have a profile, to put information about myself? I haven't looked. So, for the meantime, my website is https://www.aidanhorn.co.za . I am a microeconometrician, but I like using the internet, to find information, and I believe that the internet should be used to store important information. I am a practicing labour economist. I'm sure I won't need an introduction in future, once I start participating on Wikipedia's Talk pages, and once I edit my Wikipedia profile (if there is one).

In the last phrase of the second sentence, it reads, "and by allowing the magnitude of the variance of each measurement to be a function of its predicted value."

Can't this be made clearer by saying, "and by allowing the standard errors on the point estimates to be functions of the response variable's predicted value"?

The whole sentence is currently as follows, "The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value."

By the way, The Economist recently published two articles about Wikipedia, in the Leaders and International sections. Perhaps your university has an institutional license to access The Economist, through Gale Academic OneFile (or Westlaw). — Preceding unsigned comment added by Aidanhorn (talk • contribs) 11:51, 29 January 2021 (UTC)

This simple search made it clear you're talking about Generalized linear model. Hope that helps someone answer you about your suggestion; I can't. As for "edit my Wikipedia profile", I see you did, at User:Aidanhorn. --50.201.195.170 (talk) 21:02, 8 February 2021 (UTC)

Wow, I still have stuff to learn about how to communicate here! Aidanhorn (talk) 10:24, 10 February 2021 (UTC)

Metalog distributions

I have recently published a new Wikipedia page, https://en.wikipedia.org/wiki/Metalog_distribution. If any of you get a chance to take a look at it and have any comments or suggestions, it would be greatly appreciated! Riskanal (talk) 18:08, 16 February 2021 (UTC)

The history section places great emphasis on flexibility, but that is not the only important property of a distribution. The normal distribution, for example, is a stable distribution; the exponential distribution is memoryless. Furthermore, unlimited flexibility is already obtained by expanding in orthogonal polynomials, Fourier series, or something similar. (This was already done by Neyman in his 1937 smooth test when he expanded the log of a density function in orthogonal polynomials; see [5].) The article should focus more on what is unique to these distributions and why they are useful.

You should write \ln instead of ln. Also, \text{log}, \text{nlog}, \text{logit}, etc.

I feel like "approved by the Comprehensive R Archive Network" is perhaps overstating the difficulty of being included in CRAN.

Lastly, I really hope that your username is short for something, because if all I do is insert a space, the result is . . . not good. Ozob (talk) 03:11, 17 February 2021 (UTC)

Thanks for the feedback! I hope to address many of your thoughts in the next few days. A couple of responses in the meantime:

1) While flexibility is not the ONLY important property, it sort of subsumes all other properties. For example, since a metalog distribution can approximate an exponential arbitrarily closely, it can be made "virtually" memory-less. (However, I would need to research in more detail to look into how quickly it departs from "almost memory-less" as data begins to accumulate, to give a more precise answer.) I may try to expand on the flexibility section to clarify this point.

2) Thanks much for the link to the article by Neyman. Are you aware of any work on how quickly orthogonal polynomials converge to match a given distribution? Although not mathematically proven, empirical experience suggests that the metalog converges quickly to match most existing distributions (e.g., fewer than ten terms or so). — Preceding unsigned comment added by Riskanal (talk • contribs) 21:47, 18 February 2021 (UTC)

I disagree that flexibility subsumes all other properties. That may be true in some applications, but it is not true generally. Special distributions often have unique properties, and that is what makes them interesting. An appropriately normalized sum of IID samples from a distribution with finite second moment converges to a normal distribution, not anything else.

The convergence properties of orthogonal polynomials are well-understood. They will always converge in the $L^{2}$ (root mean square) sense, and convergence is generally fast in practice. You can find detail in textbooks on numerical analysis or treatises on orthogonal polynomials. If you're interested in density estimation using orthogonal polynomials, I think the paper on Neyman's smooth test that I linked to earlier has some information on that. Ozob (talk) 03:27, 19 February 2021 (UTC)

Thanks again for your comments. I fixed the notation (I'm new to Wikipedia), and also revised the introduction to make it less focused on shape flexibility. Feel free to pass along any other suggestions. Riskanal (talk) 03:38, 22 February 2021 (UTC)

I like it much better. If I were to make further suggestions, I guess the major one would be that you (and therefore the article too) seem to have the perspective of someone involved in fitting data in a particular way (I guess maybe decision analysis? Though I don't actually know what that is), but there may be other reasons why someone would care about these distributions. For example, if someone is interested in pure probability, devoid of any applications, would they find something interesting here? That is, is there some natural mathematical process that produces metalog distributed random variables? Or random variables that are approximately metalog distributed (in a rigorously quantifiable way)? As another example of a potentially interesting property, what happens if I take the sum of two metalog-distributed random variables? Or product, or quotient, or some kind of transformation, etc.? Is the result again metalog-distributed, or perhaps distributed like some well-known distribution? A third question is whether there are alternative parametrizations. It seems clear that you could rearrange some of the coefficients by factoring out, in the quantile function, powers of $y-1/2$. And since I mentioned orthogonal polynomials above, I guess one could also ask whether it would be helpful to replace the powers of $y-1/2$ by orthogonal polynomials of some sort. Fourthly, it seems that the standard way of fitting a metalog distribution is by least squares. What kind of properties does this estimator have? Are there other estimators that are useful in some circumstances? And if one wants to take a Bayesian approach and put priors on the coefficients, how is parameter estimation done?

Anyway, those are just some ideas that came to mind. Don't take them too seriously, it's late and I'm babbling. Ozob (talk) 05:56, 23 February 2021 (UTC)

Weighting

It's kind of ironic that the article on Weighting suffers from undue attention to very specific, very technical uses of the term -- you know, undue weight -- instead of a more-general discussion of the topic for general readers using examples that they're likely to encounter, such as polling data. --Calton | Talk 13:53, 11 March 2021 (UTC)

Name of current census in the United Kingdom

The census currently taking place in the rest of the UK has been delayed a year in Scotland. This has led to a debate about what the article's title should be, which can be found here if anybody is interested in giving their view. Llewee (talk) 22:27, 20 March 2021 (UTC)

Serious inaccuracy in page on Mixed-design

I believe there is a serious inaccuracy at the start of the page on mixed designs. The article states: "Thus, in a mixed-design ANOVA model, one factor (a fixed effects factor) is a between-subjects variable and the other (a random effects factor) is a within-subjects variable." This conflates two separate concepts: within-subjects vs. between-subjects, and fixed factor vs. random factor. The within-vs-between distinction is about whether each participant experiences all conditions of an independent variable (within) or just one condition (between). The fixed-vs-random distinction is about whether the conditions of an independent variable are set by the experimenter (fixed) or randomly sampled from a pool of possible conditions (random). The two distinctions are not the same. Between-subjects variables could be fixed or random. Within-subjects variables could be fixed or random. Someone else noticed this mistake 3 years ago. There's a slim chance I might be mistaken here, but I think it's far more likely that the article is mistaken. I apologize I am not familiar enough with Wikipedia editing and tagging to even know if I'm pointing out this mistake in the right place, and I am not quite confident enough to fix the error on my own. It would be very helpful if an expert statistician could please review and fix the page (I myself am a psychology professor). Thank you. UniAce (talk) 20:34, 19 April 2021 (UTC)

Median of a gamma distribution

I've been involved in a rather one-sided discussion (that is, with almost nobody but me) at Talk:Gamma distribution#Median of the gamma distribution for about 2 years now. I could use a second and third opinion. My contribution to the problem was to do some original research and get it peer reviewed and published. Maybe someone will say yes or no to us using it in the article now. Dicklyon (talk) 22:58, 14 May 2021 (UTC)

Draft:Mann-Kendall trend test

I'd appreciate an opinion on whether this is a plausible article. -- DGG ( talk ) 23:37, 27 October 2021 (UTC)

Article for David Cox

We are looking to clean up the article on David Cox to make sure the description of his work is accurate and cited properly. Your help would be most welcome. Joofjoof (talk) 04:45, 21 January 2022 (UTC)

User script to detect unreliable sources

I have (with the help of others) made a small user script to detect and highlight various links to unreliable sources and predatory journals. Some of you may already be familiar with it, given it is currently the 39th most imported script on Wikipedia. The idea is that it takes something like

John Smith "Article of things" Deprecated.com. Accessed 2020-02-14. (John Smith "[https://www.deprecated.com/article Article of things]" ''Deprecated.com''. Accessed 2020-02-14.)

and turns it into something like

John Smith "Article of things" Deprecated.com. Accessed 2020-02-14.

It will work on a variety of links, including those from {{cite web}}, {{cite journal}} and {{doi}}.

The script is mostly based on WP:RSPSOURCES, WP:NPPSG and WP:CITEWATCH and a good dose of common sense. I'm always expanding coverage and tweaking the script's logic, so general feedback and suggestions to expand coverage to other unreliable sources are always welcomed.

Do note that this is not a script to be mindlessly used, and several caveats apply. Details and instructions are available at User:Headbomb/unreliable. Questions, comments and requests can be made at User talk:Headbomb/unreliable.

- Headbomb {t · c · p · b}

This is a one time notice and can't be unsubscribed from. Delivered by: MediaWiki message delivery (talk) 16:02, 29 April 2022 (UTC)

Discussion at Wikipedia talk:Manual of Style/Words to watch § RfC: Relative time references - 'today' or not 'today'?

You are invited to join the discussion at Wikipedia talk:Manual of Style/Words to watch § RfC: Relative time references - 'today' or not 'today'?. Kudpung กุดผึ้ง (talk) 07:48, 6 June 2022 (UTC)

Aalen-Johansen estimator

While reading an academic paper recently, I encountered the term Aalen-Johansen estimator. Does Wikipedia cover this topic anywhere? If not, it would be great to start an article about it. —Mx. Granger (talk · contribs) 03:53, 17 July 2022 (UTC)

Ridge regression merge

A few more views on a merge between Ridge regression and Tikhonov regularization would be appreciated, the key question being the preferred direction of the merge. Please add thoughts at Talk:Tikhonov regularization#Proposed merge of Ridge regression into Tikhonov regularization. Klbrain (talk) 23:08, 31 August 2022 (UTC)

Mixed model merge

There's indecision at Talk:Mixed model#Proposed merge of Multilevel model with Mixed model. This relates to Multilevel model and Mixed model; some assistance there would be appreciated. Klbrain (talk) 07:11, 13 September 2022 (UTC)

Sample ratio mismatch draft

Hi team. I started working on an article draft about sample ratio mismatch, but I won't have much time to work on it right now. This is a pretty important topic because SRM errors are common, and even small discrepancies between the true and expected proportions of treatment groups can throw off experiments. I would appreciate help adding technical details to the article and finding reliable sources, as the only reliable source that uses the term "sample ratio mismatch" that I've been able to find is the KDD '19 paper. Qzekrom (she/her • talk) 02:05, 16 September 2022 (UTC)
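As a rough illustration of how a small discrepancy in group sizes can be flagged, here is a minimal sketch that uses a chi-squared goodness-of-fit test with one degree of freedom. The choice of test, the function name `srm_check`, the 50/50 expected split and the counts are all assumptions made for this example and are not taken from the draft or the KDD '19 paper.

```python
import math


def srm_check(n_control, n_treatment, expected_control_share=0.5):
    """Return (chi2, p_value) for observed group sizes against the expected split."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical experiment designed as a 50/50 split.
chi2, p_value = srm_check(50_000, 50_700)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

In this hypothetical case the observed shares differ from the design by well under one percentage point, yet the p-value falls below 0.05, which is the kind of discrepancy an SRM check is meant to surface.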

Stupid spacecraft statistics

Your serious feedback or crumpled smile would be appreciated at Talk:Timeline of the far future#Stupid spacecraft statistics. Thanks, Mathglot (talk) 02:31, 3 October 2022 (UTC)

Wikipedia:Reliable_sources/Noticeboard#Association_of_Religion_Data_Archives_and_World_Religion_Database has an RFC

Wikipedia:Reliable_sources/Noticeboard#Association_of_Religion_Data_Archives_and_World_Religion_Database, which is within the scope of this WikiProject, has an RFC for WP:DEPRECATION. A discussion is taking place. If you would like to participate in the discussion, you are invited to add your comments on the discussion page. Thank you. Æo (talk) 19:48, 17 November 2022 (UTC)

Dispute resolution: Monty Hall problem proof

There's a dispute at the Monty Hall problem article about how best to present a proof solving the problem using Bayesian statistics. We're hoping interested editors can help us decide between two options. Thank you. GabeTucker (talk) 18:28, 19 January 2023 (UTC)

Null Hypothesis

Going back to at least 2013, the introduction of the article on the null hypothesis has had a single line about the Strong Null Hypothesis without attribution to a source. 50.207.172.218 (talk) 22:08, 28 February 2023 (UTC)

RFC on whether citing maps and graphs is original research

Please see Wikipedia:Village pump (policy)#RFC on using maps and charts in Wikipedia articles. Rs chen 7754 19:04, 19 March 2023 (UTC)

FYI: Wikipedia:Articles for deletion/Information fuzzy networks

FYI:

Wikipedia:Articles for deletion/Information fuzzy networks

You're invited to give your opinions.

--A. B. ^{(talk • contribs • global count)} 14:10, 8 August 2023 (UTC)

COI edit request at Probability box

A COI editor has made a request for information to be added to the Applications section of this article. The requesting editor appears to be the author of the reference they are hoping to apply to the requested addition to that section. Unfortunately, Space Trajectory Optimization Asteroid Impact Monitoring State Estimation is not independently notable (nor for that matter, are many other items in the Applications section). Thus, I was hoping someone more mathematically inclined could have a look? Thank you very much for any help on this. Regards, Spintendo 19:59, 1 September 2023 (UTC)

Statistics in Serial Killer Nurse cases

The Lucy Letby trial in the UK is about a nurse who kept being present when suspicious events happened on her shifts. Or was she a bothersome person who kept complaining when she saw mistakes being made, and are the events called suspicious because she was present? This is a subject of intense controversy in the UK. There is a Wikipedia article on Lucy Letby, and there has been a lot of activity by editors on the Wikipedia article about myself, because of my own activities in the public arena calling for a retrial and arguing her innocence. On the Lucy Letby talk page I argued that editors should distinguish between "being guilty" and "being found guilty". For that reason, I will also be attacked for writing these remarks here. It will be seen as an attempt to use Wikipedia as a vehicle for campaigning. I have already been labelled by the mainstream media in the UK as a nutty conspiracy theorist and some sort of terrorist, attempting to undermine the rule of law. (Deja vu: law professors in the Netherlands wrote the same thing about me during the trial of Lucia de Berk). While the trial was going on, Dutch police came to my house in the Netherlands in the night with a letter from UK police, threatening me with arrest next time I visit the UK. I don't want to enter into any discussion here, because it is clear that I do have multiple conflicts of interest. I'm just hoping that Wikipedia editors from outside the UK with interests in statistics, law, and forensic science, will start following Wikipedia developments concerning statistics and the Lucy Letby trial. Richard Gill (talk) 06:04, 25 September 2023 (UTC)

Requested move at Talk:Relative change and difference#Requested move 24 September 2023

There is a requested move discussion at Talk:Relative change and difference#Requested move 24 September 2023 that may be of interest to members of this WikiProject. ModernDayTrilobite (talk • contribs) 14:36, 2 October 2023 (UTC)

Thread that also has statistics

Hello. There is a discussion at Wikipedia:Administrators' noticeboard#Many blocks shouldn't be indef, which has an element of statistics involved. Specifically it relates to calculation of the margin of error. If you are interested in providing input about how margin of error is calculated or the thread in general, you are welcome to join the discussion. Regards,--Thinker78 (talk) 21:13, 4 October 2023 (UTC)

Consistent notation of the quantile function

I noticed that different notations are used to denote the quantile function of probability distributions. The most common ones are $Q(p)$ (e.g. Tukey lambda distribution), $Q(u)$ (e.g. Dagum distribution), $F^{-1}(\cdot )$ (e.g. Kumaraswamy distribution), and $x(F)$ (e.g. Laplace distribution). In other cases, the quantile function is redubbed as "random variate generation", and described as a transformation of a standard uniform random variable (rv), i.e. $U\mapsto X$ (e.g. Burr Type XII distribution).

Although I realize that this is similarly the case in the literature, I believe that arbitrarily inconsistent notation like this can be very confusing to readers, especially newcomers. But unlike the published literature, here, the notation can be made consistent.

I don't have a strong preference myself, but I (usually) prefer $x(F)$ over $F^{-1}(\cdot )$ nowadays, since both $Q(\cdot )$ and $F^{-1}(\cdot )$ have no standard for the name of the parameter (I've seen $p$ , $q$ , $u$ and $y$ used in different places). I can also see that standard uniform transformation notation, e.g. $X=-\log(1-U)$ for the standard exponential distribution, could be a good choice, but only if it is described as being the "quantile function", and not only "random variate generation". jorenham (talk) 14:26, 14 December 2023 (UTC)
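To illustrate why the "quantile function" and "random variate generation" descriptions refer to the same object, here is a minimal sketch for the standard exponential distribution, where $Q(u)=-\log(1-u)$. The function names `exponential_quantile` and `exponential_cdf` are hypothetical, chosen only for this example.

```python
import math
import random


def exponential_quantile(u):
    """Quantile function of the standard exponential: Q(u) = -log(1 - u)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return -math.log1p(-u)


def exponential_cdf(x):
    """CDF of the standard exponential: F(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


# Read as a quantile function: the median is Q(0.5) = log 2.
median = exponential_quantile(0.5)

# Read as random variate generation: apply Q to standard uniform draws U.
rng = random.Random(0)
samples = [exponential_quantile(rng.random()) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(median, sample_mean)
```

The same function serves both purposes; only the argument differs (a fixed probability versus a uniform random draw).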

If we were going to try to standardise, I would prefer Q(u), particularly because it leads naturally to q(u) for the density quantile function. But note also that adopting a standard on Wikipedia would make it inconsistent with the way the quantile function is usually presented in the literature for some distributions. Newystats (talk) 02:34, 15 December 2023 (UTC)

Coupling the QF notation with that of the QDF makes sense to me.

I guess that notational inconsistency with the literature is inevitable. I even stumbled across a $G(p)$ today (C.L. Mallows, '73).

Perhaps it's a good idea to explicitly list the common QF notations on the quantile function page? jorenham (talk) 02:54, 15 December 2023 (UTC)

Should R not be listed as of top importance?

I see some statistics software (e.g. Minitab) is rated mid-importance, but there's no rating for R, which I don't think reflects the importance R does have. Drkirkby (talk) 15:43, 3 February 2024 (UTC)

Notability of John H. Wolfe

The article John H. Wolfe has gone through a PROD, but still has issues as it is based on one secondary textbook claim that his work on model-based clustering matters. It was created directly by a novice editor (Stat3472 33 edits). The article model-based clustering supports him as the inventor, but whether this is big enough for notability is unclear. Comments on the talk page please, perhaps better than AfD. Ldm1954 (talk) 09:56, 2 March 2024 (UTC)

Multilevel regression with poststratification

Can we please have some actual mathematical detail in the Multilevel regression with poststratification article as to the various models used, the techniques used to estimate their parameters, and analysis of the power of those methods in increasing accuracy? At the moment, the article's all talk and no math. — The Anome (talk) 11:53, 18 June 2024 (UTC)