Talk:Factor analysis/Archives/2012
This is an archive of past discussions about Factor analysis. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
??
Shouldn't this entry describe how to do factor analysis? It is a mathematical (statistical) procedure, after all, not just a loose idea, and there is only a hint that there is math involved at all! Why not split up the contents of the current page into a brief overview, and a (much shorter) example, with the bulk of the contents (yet to be written) in the middle? 63.145.36.242 (talk) 22:34, 3 January 2008 (UTC)
There is nothing resembling an account of what factor analysis is! It's as if the article is trying to protect the reader from reading about actual math. If I weren't very rusty on this, I'd jump right in. 131.183.81.100 21:36 26 Jun 2003 (UTC)
- I didn't realize that this encyclopedia was limited to a mathematical perspective. I admit that I am not a mathematician, but as a marketer I have had occasion to use factor analysis techniques and that is why my article is practical and applications-oriented. If you wish to add or correct some of the mathematical theory involved, then go ahead. But don't trash the article because it is not purely mathematical enough for you. user:mydogategodshat
It's not because it's not purely mathematical enough, but rather because of two things: It is written as if factor analysis is used ONLY in marketing (as if "car" were defined as "a technology for travelling from Grafton to Clarksburg", those two locations being the ONLY ones cars could travel to or from) and, more seriously, it does not even hint at what factor analysis is, or even attempt to say anything about that. Factor analysis originated in psychology and is used in biology and other fields; to say that the data are usually collected by market researchers or the like is absurd. Michael Hardy 22:20 26 Jun 2003 (UTC)
- Here is a solution. I will move the article to factor analysis (in marketing) to reflect the applied nature of the article. That way you can write a separate, purely theoretical article that won't be tainted with practical considerations.
I never contemplated any purely theoretical article not tainted with practical considerations, nor do I think that would be a good idea. I do think that some attempt should be made to say what factor analysis is. I will probably attend to that within a few days. Michael Hardy 22:29 26 Jun 2003 (UTC)
"Have a computer run the factor analysis procedure" is the black box in this article: what that procedure is should be a central point, because if that procedure is replaced by something quite different, then what is being done is NOT factor analysis, even if everything now in the article remains true. Michael Hardy 22:32 26 Jun 2003 (UTC)
- If you wish to add or correct some of the mathematical theory involved, then go ahead. Any help at compensating for my mathematical deficiencies would be appreciated. - user:mydogategodshat
- Over the next week or two I will also be writing articles on:
- Your mathematical expertise would be appreciated there as well.
- user:mydogategodshat
From Wikipedia:Redirects for deletion
- Factor analysis -> Factor analysis (in marketing)
- Was moved because it was not about factor analysis in general, but just about factor analysis in marketing. But such a move has no use if one lets the redirect stand. Andre Engels 18:25, 4 Jan 2004 (UTC)
- Don't delete, write at least a stub or disambiguation page. Onebyone 18:34, 4 Jan 2004 (UTC)
- Move it back and have the entire article under the heading ==Marketing==. The term doesn't need to be split. --Jiang 03:46, 11 Jan 2004 (UTC)
From User talk:Angela
- I see you moved factor analysis (in marketing) to factor analysis. Let us recall some history. That was this article's original name. I complained that the article makes not the least attempt to say what factor analysis is (and please see the discussion page before disputing this). The page's first author rather belligerently and falsely accused me of wanting to write some sort of ivory-towerish account devoid of any mention of practical applications. What incited that attack I do not know. What happened was a sort of compromise. But my objection stands: the article still simply does not care what factor analysis is; it makes no attempt to even hint at that topic. Michael Hardy 02:05, 13 Jan 2004 (UTC)
- I don't dispute that the article is lacking information on what FA is, but that isn't a reason to remove the information on applications of FA. I added the subheading to make it clear that most of the article is on its use in market research. This doesn't stop someone coming and writing something more about FA that is not related to market research. I'm not sure I understand what the problem is. Angela. 02:15, Jan 13, 2004 (UTC)
Suggestion: fuse advantages/disadvantages for both
There is no reason to list different advantages/disadvantages for psychology and marketing. The advantages/disadvantages of factor analysis are the same regardless of the domain of application.
At the very least, the "Applications in psychology" advantages/disadvantages should be changed. As it is, it focuses only on intelligence tests, while factor analysis can be used in all areas of psychology, especially in trait theory.
--Guillaume777 03:49, 6 November 2006 (UTC)
What factor analysis is
I'm looking at this again a long time after the discussion above. I'm amazed that someone would suggest that I'm trying to make this "purely mathematical" or not hint at applications. Obviously a technique about which all of the things said in the article are true could be something altogether different from factor analysis, until my edit today. The article didn't even hint at what factor analysis is, and the person who made those irrational accusations against me didn't appear to know or care what factor analysis is. But now I am guilty of what I was told was a heinous crime: inserting into this article something about what factor analysis is. That that would be considered criminal is one of the more remarkable instances of irrationality I've seen. Michael Hardy 23:02, 19 Nov 2004 (UTC)
- Thanks for filling in the mathematical theory that I was not familiar with. In my opinion that is a more satisfactory approach than inserting "The factual accuracy of this article is disputed" at the top of the article because you felt it lacked "actual math". mydogategodshat 18:46, 20 Nov 2004 (UTC)
Matrix notation
The matrix notation doesn't seem to be correct, since μ cannot simply be a 10x1 vector. From my experience with principal components analysis, I'd guess that it was supposed to be a 10x1000 vector, with each column containing the average vector, but I'm not certain. --Dfalcantara 07:24, 6 August 2006 (UTC)
Actually I do not think it should be a 10x1000 matrix, for two reasons: 1) in this case it is indistinguishable from the error matrix and serves no purpose, 2) in this case there is also no sense in which it is an average. Rather (relying on mathematical intuition but not actual knowledge of FA - hmm, a page on FA written entirely by people who don't know exactly what it is??), I think it should be the outer product of a 10x1 vector mu and a 1x1000 vector of 1's. This is both conformable and has the property that each row is the mean for a particular test. -J.Lewis
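A minimal sketch of the outer-product reading suggested above, assuming NumPy. The sizes (10 tests, 1000 students) come from the article's example; the mean values themselves are hypothetical and only there to make the shapes concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tests, n_students = 10, 1000           # sizes taken from the article's example
mu = rng.uniform(50, 80, size=n_tests)   # hypothetical per-test means

# Outer product of the 10x1 mean vector with a 1x1000 vector of ones.
M = np.outer(mu, np.ones(n_students))

print(M.shape)                           # shape of the resulting mean matrix
print(np.allclose(M[:, 0], M[:, 999]))   # every column is the same mean vector
```

Each row of M is constant and equal to the mean of one test, so M is conformable with the 10x1000 data matrix while still being determined by only ten numbers.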
Common Factor Analyses versus Principal Component Analyses
Perhaps include a subsection on the differences between exploratory factor analysis and principal component analysis. My experience working in Stat Lab is that students/clients frequently confuse them. Perhaps this can be added to common factor analyses. Below is my understanding of the differences. What do you think?
Exploratory factor analysis (EFA) and principal component analysis (PCA) may differ in their utility. The goal in using EFA is factor structure interpretation and also in data reduction (reducing a large set of variables to a smaller set of new variables); whereas, the goal for PCA is usually only data reduction.
EFA is used to determine the number and the nature of latent factors which may account for a large part of the correlations among a large number of measured variables. On the other hand, PCA is used to reduce scores on a large set of observed (or measured) variables to a smaller set of linear composites of the original (or observed) variables that retain as much information as possible from the original (or observed) variables. That is, the components (linear combinations of the observed items) serve as a reduced set of the observed variables.
Moreover, the core theoretical assumptions are different for both methods. EFA is based on the common factor model (FA), whereas, PCA is not.
1. Common and unique variances
- Common Factor Model (FA): Factors are latent variables that explain the covariances (or correlations) among the observed variables (items). That is, each observed item is a linear equation of the common factors (i.e., single or multiple latent factors) and one unique factor (latent construct affiliated with the observed variable). The latent factors are viewed as the causes of the observed variables.
- Note: Total variance of variable = common variance + unique variance (in which, unique variance = specific + error variance).
- Principal Components (PCA): In contrast, PCA does not distinguish between common and unique variances. The components are estimated to represent the variances of the observed variables in as economical a fashion as possible (i.e., in as small a number of dimensions as possible), and no latent (or common) variables underlying the observed variables need to be invoked. Instead, the principal components are optimally weighted sums of the observed variables (i.e., components are linear combinations of the observed items). So, in a sense, the observed variables are the causes of the composite variables.
2. Reproduction of observed variables
- FA: Underlying factor structure tries to reproduce the correlations among the items
- PCA: Composites reproduce the variances of observed variables
3. Assumption concerning communalities & the matrix type.
- FA: Assumes that a variable's variance is composed of common variance and unique variance. For this reason, we analyze the matrix of correlations among measured variables with communality estimates (i.e., proportion of variance accounted for in each variable by the rest of the variables) on the main diagonal. This matrix is called R_reduced.
- Note: Principal Axis factoring (PAF) = principal component analysis on R_reduced.
- PCA: There is no place for unique variance and all variance is common. Hence, we analyze the matrix of correlations (R_xx) among measured variables with 1.0s (representing all of the variance of the observed variables) on the main diagonal. The variance of each measured variable is entirely accounted for by the linear combination of principal components.
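The difference between the two diagonals described in point 3 can be sketched numerically. This is an illustration under stated assumptions, not a full factoring routine: the data are simulated, NumPy is assumed, squared multiple correlations are used as the initial communality estimates (one common choice), and only the first, non-iterated step of principal axis factoring is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 6 observed variables driven by 2 common factors plus unique noise.
n, p, k = 5000, 6, 2
loadings = rng.uniform(0.4, 0.8, size=(p, k))   # hypothetical loadings
X = rng.standard_normal((n, k)) @ loadings.T + 0.6 * rng.standard_normal((n, p))

R = np.corrcoef(X, rowvar=False)          # full correlation matrix, 1.0s on the diagonal

# Squared multiple correlations as initial communality estimates.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)          # communalities replace the 1.0s

pca_eigvals = np.linalg.eigvalsh(R)[::-1]          # eigenvalues used by PCA
paf_eigvals = np.linalg.eigvalsh(R_reduced)[::-1]  # eigenvalues used by (one-step) PAF

print("trace(R)         =", np.trace(R))           # equals p: total variance
print("trace(R_reduced) =", np.trace(R_reduced))   # sum of communalities, below p
print("PCA eigenvalues:", np.round(pca_eigvals, 3))
print("PAF eigenvalues:", np.round(paf_eigvals, 3))
```

The off-diagonal entries of the two matrices are identical; only the diagonal changes, which is why the eigenvalues of the reduced matrix sum to the total communality rather than to the number of variables.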
Also see Principal Component Analysis
(please bear with me, I am new to contributing to Wikipedia).
RicoStatGuy 16:15, Sept 30, 2006(UTC)
- Hei, I have not read your comments, but I agree with you. I just read an article in a journal where the people seem not to understand the difference between both methods. This book explains the difference:
- Jackson J E 1991 A User’s Guide to Principal Components. New York, John Wiley and Sons
- (i.e. the difference in application and in the criteria to be maximized) 136.159.65.84 (talk) 00:45, 15 March 2008 (UTC)sstein
Advantage?
"Allows for a satisfactory comparison between the results of intelligence test" Not sure this is true without some more assumptions (such as all intelligence tests being conducted on random samples from the same population). Or at the very least I'm unclear what is supposed to be meant here. Cydmab 06:21, 6 October 2006 (UTC)
Can someone add to the example section? Dheerajakula 04:40, 12 January 2007 (UTC)
Introduction Not Suitable for a Layman
Factor analysis is a very simple concept. You have a number of different variables which may be products of a single variable we can't measure - for instance, number of smiles, laughs and cheery whistles per day might all be products of the variable "happiness" - and factor analysis is a way of identifying whether this is the case or not. Anybody can understand that without prior knowledge about statistics, yet the introduction to this article has the potential to confuse a layman because it never outlines the basics in easy to understand language. This isn't unique to this article, it seems to plague all of the statistics articles on wikipedia and in my opinion, they should be simplified greatly so that those with no knowledge of the subject can gain a foothold. This doesn't preclude us from adding in detailed sections later on in the article, but please keep in mind that this is supposed to be intelligible to your average housewife/manual labourer, not college professors. Blankfrackis (talk) 17:03, 5 May 2008 (UTC)
- Agree completely. I found Factor Analysis baffling. Checked wikipedia. Now more confused. It's an introduction for people who already know. —Preceding unsigned comment added by Guinness4life (talk • contribs) 04:54, 3 January 2010 (UTC)
Personally, I visited this page twice ... initially as a layman and later after understanding some of the statistical concepts. I feel the "Example" section better describes the underlying concepts to a layman and should come before the "definition" part, which should be renamed to "mathematical definition". In essence, moving from a layman version to a more detailed mathematical version makes sense. Also the above description might help a layman understand things faster. Pranshu.diwan (talk) 22:00, 9 December 2009 (UTC) Pranshu
Some mention of techniques/algorithms
I found the specification of the formulation in the article to be concise and clear -- specifically this part:
Also we will impose the following assumptions on F.
- F and ε are independent.
That part clarified a lot for me. Taking this as a definition of FA, things are very general. My reading of this is that F could be any multi-variate distribution with E(F)=0 and Cov(F)=1. Presumably people would only look at products of univariate distributions in practice for F, right?
Two things that seemed entirely absent from the article, both of which I was looking for, were indications of (a) which techniques/algorithms are used most successfully in practice to identify F, L and ε from data, and (b) which parametric forms for F and ε are typically used. If someone knowledgeable in the area could add those, it would be a big improvement. —Preceding unsigned comment added by 98.207.54.162 (talk) 21:43, 8 January 2009 (UTC)
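As a rough illustration of what the quoted assumptions imply, the sketch below simulates the model x = μ + L F + ε and compares the sample covariance with L Lᵀ + Ψ. Several things here are assumptions made only for the sketch: NumPy, Gaussian distributions for F and ε, mutually uncorrelated errors with a diagonal covariance called Ψ (a name not used in the comment above), and the particular numbers chosen for μ, L and Ψ. It does not answer question (a) about which estimation algorithms work best in practice.

```python
import numpy as np

rng = np.random.default_rng(2)

p, k, n = 4, 2, 200_000
mu = np.array([1.0, -2.0, 0.5, 3.0])        # hypothetical means
L = np.array([[0.9, 0.0],
              [0.7, 0.3],
              [0.2, 0.8],
              [0.0, 0.6]])                  # hypothetical loadings
psi = np.array([0.3, 0.4, 0.2, 0.5])        # hypothetical unique (error) variances

# Assumption for this sketch only: F and eps are Gaussian.
F = rng.standard_normal((k, n))             # E(F) = 0, Cov(F) = I
eps = rng.standard_normal((p, n)) * np.sqrt(psi)[:, None]  # drawn independently of F

X = mu[:, None] + L @ F + eps               # one column per simulated individual

implied = L @ L.T + np.diag(psi)            # covariance implied by the model
sample = np.cov(X)                          # rows are variables

print(np.round(implied, 2))
print(np.round(sample, 2))
```

Nothing in the check depends on F being Gaussian beyond convenience; any distribution for F with zero mean and identity covariance, independent of ε, leads to the same implied covariance.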
The example: improvement is needed
Near its end, the example says, "The values of the loadings L, the averages μ, and the variances of the "errors" ε must be estimated given the observed data X." This statement is not clear enough. It leaves out F. In the example, F is assumed to take only some specific values, and L very clearly depends on F. Can someone say something about how F is defined?
In short, this model can be presented as . It is different from the usual panel analysis, . Here, F is independent of j, and Z are omitted, and is j dependent. Jackzhp (talk) 21:23, 9 February 2010 (UTC)
Encyclopedic scope
Reminder: this is an encyclopedia.
This example will go over the head of anyone who has not done this before, so it is pointless for the purposes of WP. In fields where I am expert, I have always been able to easily create super-simplified examples. As I am becoming a psychology expert I want to do this, especially since I am learning that many decisions are made based on factor analysis as it descends directly from Cattell's work. Because of the social importance (and the nature of an encyclopedia), I think there is a need to create a super-simplified model as an example, possibly using Cattell's intelligence work more directly, that explains all the components, especially those influenced by external variables.
The purpose of this suggestion would be to show where the various inputs are, especially "fudging input," so that they can become well-known. The reason for this is that I feel certain that other types of analysis will come over the horizon soon, based on emerging technology, that will be able to provide these inputs. In other words, the reader does not need to know how it works, just where his relevant "hooks" are if he/she happens to have a source for valid input. Lexical analysis is such a source, though not new. Still, lexical analysis may yet go through changes as the meaning of meaning itself is "rigorized" --which sounds a little like Monty Python, does it not? --John Bessa (talk) 16:23, 7 December 2010 (UTC)
The level of a factor
The term level is commonly used to describe the values of a factor, but this article makes no mention of it, other than in the sentence
(the assumption about the levels of the factors is fixed for a given F)
Tedtoal (talk) 17:26, 12 September 2012 (UTC)
Does a factor only take finite number of levels? or it can take infinite number of levels? Jackzhp (talk) 21:23, 9 February 2010 (UTC)
Intelligence Citations Bibliography for Articles Related to IQ Testing
I see a lot of mention of factor analysis when I read professional literature on IQ testing. I wish I had more competence to directly edit this article in a newbie-friendly manner, but what I can do is suggest some Intelligence Citations as examples of one discipline where discussion of factor analysis also comes up. Maybe that will help relate factor analysis to one of its applications for readers of this article. You are welcome to use these citations for your own research. Any help you can provide to other Wikipedians by suggesting new sources through comments on that page would be much appreciated. -- WeijiBaikeBianji (talk) 22:32, 17 July 2010 (UTC)
Covariance?
I'm confused by some of the notation used in this article. It uses the terminology Cov(blah) without defining it. I checked wiki for another page that might explain this notation, and found the page on covariance. However, the notation used on that page doesn't really make sense for this page. It typically has Cov(bla1, bla2), indicating the covariance between two variables, while this page uses this notation on entire expressions. Can someone clarify this notation somewhere in the article or remove it? --Dragon guy (talk) 17:06, 3 June 2011 (UTC)
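For what it is worth, the single-argument form is usually read as the covariance matrix of a random vector, whose (i, j) entry is the ordinary two-argument covariance of components i and j. A small NumPy sketch with arbitrary simulated data (the variable names are hypothetical) shows the correspondence:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three variables (rows), 500 observations (columns); values are arbitrary.
X = rng.standard_normal((3, 500))
X[1] += 0.5 * X[0]                      # make variables 0 and 1 correlated

C = np.cov(X)                           # Cov(X): 3x3 matrix for the whole vector
c01 = np.cov(X[0], X[1])[0, 1]          # Cov(x_0, x_1): the two-argument form

print(C.shape)                          # one row and one column per variable
print(np.isclose(C[0, 1], c01))         # entry (0, 1) is the pairwise covariance
print(np.allclose(np.diag(C), X.var(axis=1, ddof=1)))  # diagonal holds the variances
```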
Inconsistency: is PCA factor analysis?
There is an inconsistency in the text: "Factor analysis is related to principal component analysis (PCA), but the two are not identical." vs "Principal component analysis (PCA): The most common form of factor analysis," Which is correct? Matthiashh (talk) 15:21, 26 September 2011 (UTC)
PCA is NOT factor analysis. PCA analyzes total observed variance across the variables, while exploratory factor analysis analyzes only shared variance. Somewhat Agree (talk) —Preceding undated comment added 23:53, 9 July 2012 (UTC)
Mathematical model of the same example
Why is used in the formula? It seems that here is incorrect and should be just removed, since we have a number here, not a matrix. — Preceding unsigned comment added by 90.190.224.237 (talk) 09:36, 17 February 2012 (UTC)
possible typo / error - but i'm not sure, so hope original author can resolve
In the section called "Mathematical Model of Example", there is a list of descriptions of the variables in the matrix equation. I think the description of "mu" should say "observable" rather than "unobservable". The specific text is pasted here:
"where
N is 1000 students X is a 10 × 1,000 matrix of observable random variables, μ is a 10 × 1 column vector of unobservable constants (in this case "constants" are quantities not differing from one individual student to the next; and "random variables" are those assigned to individual students; the randomness arises from the random way in which the students are chosen),..."
Mu is described as a vector of UNOBSERVABLE constants. But in the equation listed earlier, mu is the average value of x across all the i's. So it seems like that is an OBSERVABLE constant, since it is a statistic of the observable x's.
But I am not sure I am correct, so am hoping someone else can confirm this or explain why I am mistaken.
Vonetc (talk) 17:04, 10 May 2012 (UTC)Andrew von Nordenflycht
- The text is probably correct, but might be made clearer. "Mu is described as a vector of UNOBSERVABLE constants" is correct ... in actual application it might eventually be replaced by vector of the sample means (as an estimate of μ), but I don't think the description ever gets that far. I don't see anything exactly corresponding to "in the equation listed earlier, mu is the average value of x across all the i's" but there is "a set of observable random variables, with means ." ... Here there is a notional population of x's with expected values μ, but the observed set of x's doesn't correspond to the whole population. Thus μ is unobservable (unknown). I think I would prefer changing "observable" to "observed" and "unobservable" to "unknown" throughout. Melcombe (talk) 19:37, 10 May 2012 (UTC)
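The distinction between the unknown μ and its sample estimate can be made concrete with a simulation, where μ is known only because the data are generated from it. This is a sketch assuming NumPy; the population means and the unit-variance noise are hypothetical choices, and the sizes follow the article's example.

```python
import numpy as np

rng = np.random.default_rng(4)

n_tests, n_students = 10, 1000
mu = np.linspace(60, 78, n_tests)        # population means: known here only because we simulate

# Observed scores for a random sample of students (unit-variance noise, an assumption).
X = mu[:, None] + rng.standard_normal((n_tests, n_students))

mu_hat = X.mean(axis=1)                  # sample means: computable from the observed data

print(np.round(mu_hat - mu, 3))          # small but nonzero estimation error
```

With real data only mu_hat is available; μ itself stays unknown, which is the sense in which the article calls it unobservable.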
Confusion over the meaning of eigenvectors
"The component scores in PCA represent a linear combination of the observed variables weighted by eigenvectors"
PCA does not form "linear combinations of observed variables" and weight them "by eigenvectors." Rather, PCA creates a new set of coordinate axes from linear combinations of the observed variables and these linear combinations/coordinate axes are the eigenvectors. Furthermore, the document cited in support of this statement shows a profound misunderstanding of the nature of eigenvalues and eigenvectors (viz. "An observed variable “loads” on a factors if it is highly correlated with the factor, has an eigenvector of greater magnitude on that factor." Eigenvectors are not unique. For each eigenvalue there are an infinite number of eigenvectors of differing magnitudes. This fact is known to anyone who has ever learned to calculate a set of eigenvectors. As a consequence there is no such thing as a largest eigenvector.)
The utility of this complicated process is simply this: it allows you to perform what is, essentially, a rotation of your data into a "new" coordinate system in which the covariances (i.e. the variation of one variable with another) vanish, leaving you with a set of uncorrelated variables. This representation of the data has many useful properties but I fear that the utility is lost in the confusion. 75.157.135.57 (talk) 23:02, 21 September 2012 (UTC)
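Both points above, that eigenvectors are only defined up to scale and that changing to the eigenvector basis removes the covariances, can be checked in a few lines. This is a sketch on arbitrary simulated 2-D data, assuming NumPy; the unit-length eigenvectors returned by the routine are a normalization convention, not a property of the data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated 2-D data (rows are observations); the mixing matrix is arbitrary.
X = rng.standard_normal((2000, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
Xc = X - X.mean(axis=0)                  # center each variable

S = np.cov(Xc, rowvar=False)
eigvals, V = np.linalg.eigh(S)           # columns of V are unit-length eigenvectors

# Any nonzero multiple of an eigenvector is still an eigenvector.
v = V[:, 1]
print(np.allclose(S @ (3.7 * v), eigvals[1] * (3.7 * v)))

# Expressing the data in the eigenvector basis removes the covariances.
scores = Xc @ V
S_scores = np.cov(scores, rowvar=False)
print(np.round(S_scores, 6))             # off-diagonal entries are numerically zero
```

The diagonal of the transformed covariance matrix holds the eigenvalues, i.e. the variances along the new axes.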
Mathematically this article is a mess and it's not difficult (but certainly tedious) to understand why: only two of the cited references are written by authors with any training in mathematics. Most of the others are psychologists (and one is even a political scientist). See the list below. I understand that Factor analysis has its major applications in the social sciences but it is a set of statistical methods that use mathematical techniques somewhat removed from high school algebra. If the editors who have produced this abortion had a clue maybe they would access the mathematical and statistical literature written by mathematicians and statisticians but they haven't. Instead, in an effort to demystify the difference between Principal component analysis and factor analysis, we are given this:
"In PCA, 1.00s are put in the diagonal meaning that all of the variance in the matrix is to be accounted for (including variance unique to each variable, variance common among variables, and error variance). That would, therefore, by definition, include all of the variance in the variables. In contrast, in EFA, the communalities are put in the diagonal meaning that only the variance shared with other variables is to be accounted for (excluding variance unique to each variable and error variance). That would, therefore, by definition, include only variance that is common among the variables."
Where this is not laughably wrong, it's incoherent. The author of this quotation may be a fine Russian scholar but he appears to be woefully ignorant of the mathematical underpinnings of the research methods he employs. And this is only one example from a long list.
I will not edit this article. Somebody with the patience to deal with the inevitable "idiot" backlash will have to do that. But whoever takes on that task, I implore you, consult the mathematical and statistical literature and stay away from the "methods" articles meant for social scientists, they are useless.
And, as I have said elsewhere:
Everyone who edits a Wikipedia article should ask him/herself why he/she has the expertise and writing skill to make a positive change. Most people don't know anything and fewer still can write coherently so why do you, Editor, feel that you have something to contribute? Do you have any professional or educational or practical qualifications in the subject area, have you ever been paid to teach a class in the subject area, has anybody ever paid you to write anything? If not, be conservative; your first inclination should be to do no harm, to leave the article alone. I refer you to the article by Kruger and Dunning, "Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments," Journal of Personality and Social Psychology, 1999, 77(6), 1121-1134. There's a good reason most people don't write for the Encyclopedia Britannica: they're not good enough. For most people, the most effective way to improve the quality of Wikipedia is to leave it just as you found it — errors and all.
Academic Qualifications of Cited Authors
Suhr, Diane (2009). "Principal component analysis vs. exploratory factor analysis"
Author has a PhD in Educational Psychology
Brown, J. D. "Principal components analysis and exploratory factor analysis – Definitions, differences and choices."
Author has PhD in Slavic Languages
Ledesma, R.D. and Valero-Mora, P. (2007). "Determining the Number of Factors to Retain in EFA: An easy-to-use computer program for carrying out Parallel Analysis". Practical Assessment Research & Evaluation, 12(2), 1-11.
Senior author appears to be a psychologist. Unable to determine the qualifications of junior author
Russell, D. W. (2002). In search of underlying dimensions: The use (and abuse) of factor analysis in Personality and Social Psychology Bulletin. Personality and Social Psychology Bulletin, 28, 1629-1646.
Author looks to be a chemist.
Sternberg, R.J. (1977). Metaphors of mind: Conceptions of the nature of intelligence. New York: Cambridge. pp. 85-111.
Author is a psychologist
Richard B. Darlington (2004) "Factor Analysis"
Author is psychologist
Fabrigar et al. (1999). "Evaluating the use of exploratory factor analysis in psychological research."
Authors are all psychologists
Ritter, N. (2012). A comparison of distribution-free and non-distribution free methods in factor analysis. Paper presented at Southwestern Educational Research Association (SERA) Conference 2012, New Orleans, LA.
Unable to determine the qualifications of author
Subbarao, C.; Subbarao, N.V.; Chandu, S.N. (1995) "Characterisation of groundwater contamination using factor analysis"
Second author looks to be a geologist or geological engineer
Love, D.; Hallbauer, D.K.; Amos, A.; Hranova, R.K. (2004) "Factor analysis as a tool in groundwater quality management: two southern African case studies". Physics and Chemistry of the Earth, 29, 1135-1143.
Authors are geologists, chemists, civil engineers
Barton, E.S.; Hallbauer, D.K. (1996) "Trace-element and U-Pb isotope compositions of pyrite types in the Proterozoic Black Reef, Transvaal Sequence, South Africa: Implications on genesis and age". Chemical Geology, 133, 173-199.
Authors appear to be geologists
Sepp Hochreiter, Djork-Arné Clevert, and Klaus Obermayer, 2006. A new summarization method for affymetrix probe level data. Bioinformatics, 22(8), 943-949.
Senior author appears to have some sort of degree in mathematics
Robert MacCallum (June 1983). "A comparison of factor analysis programs in SPSS, BMDP, and SAS". Psychometrika 48 (48).
Author has math degrees and psychology degrees
Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
Author is educational psychologist
Factor Analysis. Retrieved July 23, 2004, from http://www2.chass.ncsu.edu/garson/pa765/factor.htm
Author (David Garson) has degrees in Political Science and Government
Raymond Cattell. Retrieved July 22, 2004, from http://www.indiana.edu/~intell/rcattell.shtml
Author is psychologist (MSc no PhD) who died in 1998
Exploratory Factor Analysis - A Book Manuscript by Tucker, L. & MacCallum R. (1993). Retrieved June 8, 2006, from: http://www.unc.edu/~rcm/book/factornew.htm
L. (R.) Tucker: unable to determine qualifications. Junior author cited above
75.157.135.57 (talk) 00:11, 24 September 2012 (UTC)