
Lumpers and splitters

From Wikipedia, the free encyclopedia

Lumpers and splitters are opposing factions in any academic discipline that has to place individual examples into rigorously defined categories. The lumper–splitter problem arises whenever classifications are created and examples assigned to them, for example schools of literature or biological taxa. A "lumper" assigns examples broadly, judging that differences matter less than signature similarities. A "splitter" makes precise definitions and creates new categories to classify samples that differ in key ways.

Origin of the terms


The earliest known use of these terms was long attributed to Charles Darwin, who wrote in a letter to Joseph Dalton Hooker in 1857: "It is good to have hair-splitters & lumpers."[1] However, research by Glenn Branch, deputy director of the National Center for Science Education (NCSE), credits the naturalist Edward Newman, who wrote in 1845: "The time has arrived for discarding imaginary species, and the duty of doing this is as imperative as the admission of new ones when such are really discovered. The talents described under the respective names of 'hair-splitting' and 'lumping' are unquestionably yielding their power to the mightier power of Truth."[2]

The terms were later popularized by George G. Simpson in his 1945 work The Principles of Classification and a Classification of Mammals. As he put it:

... splitters make very small units – their critics say that if they can tell two animals apart, they place them in different genera ... and if they cannot tell them apart, they place them in different species. ... Lumpers make large units – their critics say that if a carnivore is neither a dog nor a bear, they call it a cat.[3]

A later use can be found in the title of a 1969 paper "On lumpers and splitters ..." by the medical geneticist Victor McKusick.[4]

References to lumpers and splitters in the humanities appeared in a 1975 debate between J. H. Hexter and Christopher Hill in the Times Literary Supplement. It followed Hexter's detailed review of Hill's book Change and Continuity in Seventeenth Century England, in which Hill developed Max Weber's argument that the rise of capitalism was facilitated by Calvinist Puritanism. Hexter objected to Hill's "mining" of sources for evidence that supported his theories, arguing that Hill plucked quotations from sources in a way that distorted their meaning. Hexter attributed this to a mental habit he called "lumping". According to him, "lumpers" rejected differences and chose to emphasize similarities, dismissing any evidence that did not fit their arguments as aberrant. Splitters, by contrast, emphasized differences and resisted simple schemes. While lumpers consistently tried to create coherent patterns, splitters preferred incoherent complexity.[5][6][7][8]

Usage in various fields


Biology


The categorization and naming of a particular species should be regarded as a hypothesis about the evolutionary relationships and distinguishability of that group of organisms. As further information becomes available, the hypothesis may be confirmed or refuted. Sometimes, especially in the past when communication was more difficult, taxonomists working in isolation gave distinct names to organisms later found to belong to the same species. When two named species are agreed to be the same species, the older name is almost always retained and the newer one dropped, following a convention known as "priority of nomenclature". This form of lumping is technically called synonymization. Dividing a taxon into multiple, often new, taxa is called splitting. Taxonomists are often called "lumpers" or "splitters" by their colleagues, depending on their personal approach to recognizing differences or commonalities between organisms.
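The priority rule described above can be sketched in Python. This is a minimal, simplified illustration under stated assumptions: the Name record and the synonymize function are hypothetical, each name is reduced to a single publication year, and the formal exceptions and tie-breaking rules of the real codes of nomenclature are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A published species name (hypothetical, simplified record)."""
    binomial: str
    year: int  # year of publication


def synonymize(names: list[Name]) -> tuple[Name, list[Name]]:
    """Lump names judged to denote one species.

    Simplified priority: the earliest published name becomes the valid
    name and all later names become junior synonyms.
    """
    if not names:
        raise ValueError("need at least one name")
    ordered = sorted(names, key=lambda n: n.year)  # oldest first
    return ordered[0], ordered[1:]


valid, synonyms = synonymize([Name("Examplea beta", 1850), Name("Examplea alpha", 1802)])
print(valid.binomial, [n.binomial for n in synonyms])  # Examplea alpha ['Examplea beta']
```

The species names used here are placeholders, not real taxa.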

For example, the number of genera recognized in the Pteridophyte Phylogeny Group classification (PPG I) has proved controversial. PPG I uses 18 lycophyte and 319 fern genera.[9] The earlier system put forward by Smith et al. (2006) had suggested a range of 274 to 312 genera for ferns alone.[10] By contrast, the system of Christenhusz & Chase (2014) used 5 lycophyte and about 212 fern genera.[11] The number of fern genera was further reduced to 207 in a subsequent publication.[12] Defending PPG I, Schuettpelz et al. (2018) argue that the larger number of genera is a result of "the gradual accumulation of new collections and new data" and hence "a greater appreciation of fern diversity and [..] an improved ability to distinguish taxa". They also argue that the number of species per genus in the PPG I system is already higher than in other groups of organisms (about 33 species per genus for ferns, compared with about 22 for angiosperms), and that reducing the number of genera as Christenhusz and Chase propose would yield an excessive average of about 50 species per genus for ferns.[13] In response, Christenhusz & Chase (2018) argue that excessive splitting of genera destabilises the usage of names and will lead to greater instability in future, and that the highly split genera have few if any characters by which they can be recognized, making identification difficult even at the generic level. They further argue that comparing numbers of species per genus across different groups is "fundamentally meaningless".[12]
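The species-per-genus figures above can be checked with simple arithmetic. The sketch below assumes the PPG I ratio is exactly 33 and that the total number of fern species stays fixed when genera are merged; the species total it derives is only an estimate implied by the quoted figures, not a published count.

```python
# Figures quoted in the text. The species total below is *derived* from them
# (an assumption for illustration), not a count taken from any publication.
PPG_FERN_GENERA = 319        # PPG I (2016)
SPECIES_PER_GENUS_PPG = 33   # "about 33", treated here as exact
LUMPED_FERN_GENERA = 207     # Christenhusz & Chase (2018)


def implied_species(genera: int, per_genus: float) -> float:
    """Total species implied by a genus count and an average ratio."""
    return genera * per_genus


def species_per_genus(species: float, genera: int) -> float:
    """Average number of species per genus."""
    return species / genera


species = implied_species(PPG_FERN_GENERA, SPECIES_PER_GENUS_PPG)
print(round(species))  # 10527
# Same species, fewer genera: the average rises.
print(round(species_per_genus(species, LUMPED_FERN_GENERA), 1))  # 50.9
```

Under these assumptions, roughly 10,500 fern species spread across 207 genera gives about 51 species per genus, consistent with the figure of about 50 cited by Schuettpelz et al.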

History


In history, lumpers tend to create broad definitions that cover long periods of time and many disciplines, whereas splitters want to assign names to tight groups of interrelationships. Lumping tends to create ever more unwieldy definitions whose members have less and less in common. This can lead to definitions that are little more than conventions, or to groups that join fundamentally different examples. Splitting often leads to "distinctions without difference", ornate and fussy categories, and failure to see underlying similarities.

For example, in the arts, "Romantic" can refer specifically to a period of German poetry roughly from 1780 to 1810, but would exclude the later work of Goethe, among other writers. In music it can mean every composer from Hummel through Rachmaninoff, plus many that came after.

Software modelling


Software engineering often proceeds by building models (sometimes known as model-driven architecture). A lumper is keen to generalize, and produces models with a small number of broadly defined objects. A splitter is reluctant to generalize, and produces models with a large number of narrowly defined objects. Conversion between the two styles is not necessarily symmetrical. For example, if error messages in two narrowly defined classes behave in the same way, the classes can be easily combined. But if some messages in a broad class behave differently, every object in the class must be examined before the class can be split. This illustrates the principle that "splits can be lumped more easily than lumps can be split".[14]
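The asymmetry can be illustrated with a minimal Python sketch. The class names (DiskFullMessage, QuotaExceededMessage, ErrorMessage) and the lump and split helpers are hypothetical, chosen only for illustration and not taken from Pugh's book. Merging two narrow classes that behave identically is mechanical, whereas splitting a broad class requires inspecting the behaviour of every instance.

```python
from dataclasses import dataclass


# Splitter style: two narrowly defined classes (hypothetical) that behave alike.
@dataclass
class DiskFullMessage:
    text: str

    def render(self) -> str:
        return f"ERROR: {self.text}"


@dataclass
class QuotaExceededMessage:
    text: str

    def render(self) -> str:
        return f"ERROR: {self.text}"


# Lumper style: one broadly defined class; some instances may behave differently.
@dataclass
class ErrorMessage:
    kind: str
    text: str
    severity: str = "error"

    def render(self) -> str:
        prefix = "ERROR" if self.severity == "error" else "WARNING"
        return f"{prefix}: {self.text}"


def lump(messages) -> list[ErrorMessage]:
    """Lumping: the narrow classes render identically, so merging is mechanical;
    the old class name is simply carried over into a field."""
    return [ErrorMessage(type(m).__name__, m.text) for m in messages]


def split(messages: list[ErrorMessage]) -> dict[str, list[ErrorMessage]]:
    """Splitting: nothing in the broad class records which instances differ,
    so every object must be examined (here, by its rendered prefix)."""
    groups: dict[str, list[ErrorMessage]] = {}
    for m in messages:
        behaviour = m.render().split(":", 1)[0]
        groups.setdefault(behaviour, []).append(m)
    return groups


lumped = lump([DiskFullMessage("disk full"), QuotaExceededMessage("quota exceeded")])
lumped.append(ErrorMessage("LowSpace", "space low", severity="warning"))
print({k: len(v) for k, v in split(lumped).items()})  # {'ERROR': 2, 'WARNING': 1}
```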

Language classification

A 2011 RSUH linguistics conference bringing together various prominent Russian "lumpers" belonging to the Moscow School of Comparative Linguistics, including Vladimir Dybo and Georgiy Starostin (standing in front)

There is no agreement among historical linguists about what amount of evidence is needed for two languages to be safely classified in the same language family. For this reason, many proposed language families have had lumper–splitter controversies, including Altaic, Pama–Nyungan, Nilo-Saharan, and most of the larger families of the Americas. At a completely different level, whether to split a mutually intelligible dialect continuum into different languages or lump it into one is also a recurring issue, though the consensus in contemporary linguistics is that there is no completely objective way to settle the question.

Splitters regard the comparative method (meaning not comparison in general, but only reconstruction of a common ancestor or protolanguage) as the only valid proof of kinship, and consider genetic relatedness to be the question of interest. American linguists of recent decades tend to be splitters.

Lumpers are more willing to admit techniques such as mass lexical comparison or lexicostatistics, and mass typological comparison, and to tolerate uncertainty about whether the relationships these methods find result from linguistic divergence (descent from a common ancestor) or language convergence (borrowing). Much long-range comparison work has come from Russian linguists of the Moscow School of Comparative Linguistics, most notably Vladislav Illich-Svitych and Sergei Starostin. In the United States, the work of Joseph Greenberg and Merritt Ruhlen has met with little acceptance among linguists. Earlier American linguists such as Morris Swadesh and Edward Sapir also pursued large-scale classifications, such as Sapir's 1929 scheme for the Americas, which attracted controversy similar to that seen today.[15]
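The core calculation of lexicostatistics, the percentage of shared cognates on a basic-vocabulary list, can be sketched as follows. The languages, meanings and cognate judgments are invented toy data; in practice a standardized list such as the Swadesh list is used, and the cognate judgments themselves are the contested step. As the comments note, the count cannot by itself tell inherited words from borrowings.

```python
from itertools import combinations

# Toy data (hypothetical): for each language, the cognate class assigned to its
# word for each meaning. Equal labels for the same meaning = judged cognate;
# None = no word recorded for that meaning.
LISTS = {
    "Lang A": {"water": 1, "fire": 1, "eye": 1, "two": 1, "stone": 1},
    "Lang B": {"water": 1, "fire": 1, "eye": 2, "two": 1, "stone": 2},
    "Lang C": {"water": 2, "fire": 3, "eye": 3, "two": 1, "stone": None},
}


def shared_cognate_percentage(a: dict, b: dict) -> float:
    """Percentage of meanings attested in both lists whose words share a class.

    This measures similarity only: a shared class may reflect common descent
    or borrowing, and the number alone cannot distinguish the two.
    """
    meanings = [m for m in a if a[m] is not None and b.get(m) is not None]
    if not meanings:
        raise ValueError("no meanings attested in both lists")
    shared = sum(1 for m in meanings if a[m] == b[m])
    return 100 * shared / len(meanings)


for x, y in combinations(LISTS, 2):
    print(x, y, round(shared_cognate_percentage(LISTS[x], LISTS[y])))
# Lang A Lang B 60
# Lang A Lang C 25
# Lang B Lang C 25
```

A lumper might read the higher figure as evidence of relationship; a splitter would not accept such a percentage as proof of kinship without a reconstruction by the comparative method.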

Religious studies


Paul F. Bradshaw suggests that the same principles of lumping and splitting apply to the study of early Christian liturgy. Lumpers, who tend to predominate in this field, try to find a single line of successive texts from the apostolic age to the fourth century (and later). Splitters see many parallel and overlapping strands that intermingle and flow apart, so that there is no single coherent path in the development of liturgical texts. Bradshaw also cautions that liturgical texts must not be taken solely at face value, since they often carry hidden agendas.[16]

The idea of a single Hindu religion is essentially a lumper's concept, sometimes also known as Smartism (on the basis of the Smārta synthesis). Hindu splitters and individual adherents, on the other hand, often[quantify] identify themselves as adherents of a religion such as Shaivism, Vaishnavism, or Shaktism, according to which deity they believe to be the supreme creator of the universe.[citation needed]

Various "holistic" approaches to religion can prioritise themes such as individual spirituality,[17] the New-Age-style essential oneness of multiple religious traditions, or religious fundamentalism.[18]

Philosophy

Freeman Dyson in 2005

Physicist and philosophy writer Freeman Dyson has suggested that one can broadly, if over-simplistically, divide "observers of the philosophical scene" into splitters and lumpers – roughly corresponding to materialists (who imagine the world as divided into atoms) and Platonists (who regard the world as made up of ideas).[19]

Psychiatry


In psychiatry, the 'splitters' and the 'lumpers' have fundamentally different approaches to psychiatric diagnosis and classification. 'Splitters' emphasise the heterogeneity within the diagnostic categories and argue that this heterogeneity drives the 'splitting' process. 'Lumpers', on the other hand, point to the similarities between the diagnostic categories, and suggest that these similarities justify the creation of broader entities.[20]

Thus, where a lumper might see "stress", a splitter might identify worry, grief, or a particular anxiety disorder.

Neuroscience


In neuroscience, "uncertainty aversion" and "uncertainty tolerance" in the formation of semantic representations appear to correspond to the tendencies of "splitters" and "lumpers", respectively.[21] As neuroscientist Marc-Lluís Vives observes:

"Our survival is possible because every day we make use of previously acquired categories to navigate the world. Every single mug we encounter is distinct, but fundamentally the same. Thanks to this powerful capacity to classify distinct stimuli under the same category, we can generalize our knowledge from the previously encountered subset of mugs to a future subset of mugs. However, this also posits a dilemma: Is a glass mug still a mug? That is, what are the defining principles that make something a "mug"? Establishing this is fundamental since it also affects its relationship with its close-neighbors. Conceptualizing a mug as very different from a glass creates a more clear-cut mapping between the input—that is, the stimulus perceived—and the output that a person needs to generate—that is, the response, such as drinking coffee. Classical work in cognitive science demonstrates that the more similar two stimuli are, the harder it is to discriminate them and respond with different behavior."[22]

Artificial intelligence and linguistics


Natural language processing techniques such as Word2Vec, which learn vector representations of words from the contexts in which they occur, provide a way to quantify how far the semantic categories of different words overlap or differ.[23] This gives a sense of how often the contexts of words overlap or diverge in general usage.
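How a choice of similarity threshold turns word vectors into broad or narrow categories can be sketched as follows. The two-dimensional vectors are hypothetical toy values standing in for trained Word2Vec embeddings, and the single-linkage grouping with a cosine-similarity cutoff is an illustrative assumption, not a method from the cited work. A lower threshold behaves like a lumper, a higher one like a splitter.

```python
import math
from itertools import combinations

# Hypothetical toy "embeddings"; real Word2Vec vectors are learned and
# typically have hundreds of dimensions.
VECTORS = {
    "mug": (1.0, 0.0),
    "cup": (0.98, 0.17),
    "glass": (0.87, 0.5),
    "spoon": (0.17, 0.98),
}


def cosine(u, v) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def group(vectors: dict, threshold: float) -> list[set[str]]:
    """Single-linkage grouping: words linked by a chain of pairs with
    similarity >= threshold share a category (union-find components)."""
    parent = {w: w for w in vectors}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for w in vectors:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


print(group(VECTORS, 0.85))  # lumper: mug, cup and glass together; spoon alone
print(group(VECTORS, 0.95))  # splitter: mug and cup together; glass; spoon
```

With these toy values, the lower threshold treats a glass as belonging with mugs and cups, echoing Vives's question of whether a glass mug is still a mug, while the higher threshold separates it.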


References

  1. ^ Darwin, Charles (1 August 1857). "Letter no. 2130". Darwin Correspondence Project. Retrieved 10 July 2017.
  2. ^ Branch, Glenn (December 2, 2014). "Whence Lumpers and Splitters?". NCSE. Archived from the original on 19 October 2023. Retrieved 19 October 2023.
  3. ^ Simpson, George G. (1945). "The Principles of Classification and a Classification of Mammals". Bulletin of the AMNH. 85. New York: American Museum of Natural History: 23.
  4. ^ McKusick, V.A. (Winter 1969). "On lumpers and splitters, or the nosology of genetic disease". Perspect. Biol. Med. 12 (2): 298–312. doi:10.1353/pbm.1969.0039. PMID 4304823. S2CID 35339751.
  5. ^ J.H. Hexter 'The Burden of Proof' TLS 3481 (October 24th, 1975) pp. 2–4.
  6. ^ C. Hill, 'The Burden of Proof' TLS 3843 (November 7th, 1975) p. 17.
  7. ^ R. Cobb and M. Heinemann 'The Burden of Proof' TLS 3844 (November 14th, 1975) p. 16.
  8. ^ J.H. Hexter and R. Hammersely 'The Burden of Proof' TLS 3846 (November 28th, 1975) pp. 19–20. See also the further articles in the TLS by R. McCaughey, P. Zagorin and F.M.L. Thompson.
  9. ^ PPG I (2016), "A community-derived classification for extant lycophytes and ferns", Journal of Systematics and Evolution, 54 (6): 563–603, doi:10.1111/jse.12229
  10. ^ Smith, Alan R.; Pryer, Kathleen M.; Schuettpelz, Eric; Korall, Petra; Schneider, Harald & Wolf, Paul G. (2006), "A Classification for Extant Ferns", Taxon, 55 (3): 705–731, doi:10.2307/25065646, JSTOR 25065646
  11. ^ Christenhusz, Maarten J. M. & Chase, Mark W. (2014), "Trends and concepts in fern classification", Annals of Botany, 113 (4): 571–594, doi:10.1093/aob/mct299, PMC 3936591, PMID 24532607
  12. ^ a b Christenhusz, Maarten J. M. & Chase, Mark W. (2018), "PPG recognises too many fern genera", Taxon, 67 (3): 481–487, doi:10.12705/673.2
  13. ^ Schuettpelz, Eric; Rouhan, Germinal; Pryer, Kathleen M.; Rothfels, Carl J.; Prado, Jefferson; Sundue, Michael A.; Windham, Michael D.; Moran, Robbin C. & Smith, Alan R. (2018), "Are there too many fern genera?", Taxon, 67 (3): 473–480, doi:10.12705/673.1
  14. ^ Pugh, Ken (2005). Prefactoring. O'Reilly Media. pp. 14–15. ISBN 9780596008741. Retrieved 2014-10-21.
  15. ^ Ruhlen, Merritt. "Is Algonquian Amerind?" (PDF). Archived from the original (PDF) on 2011-08-10. Retrieved 2009-10-25.
  16. ^ Bradshaw, Paul F., The Search for the Origins of Christian Worship, Oxford University Press, 2002, p. ix. ISBN 0-19-521732-2.
  17. ^ Enrich, Sturm (22 March 2021). Holistic Religion: God-Free, Faith-free, Worship-free; Pro-Individual, Pro-People, Pro-Earth, Pro-Ethics. Sturm Enrich (published 2021). ISBN 9780996113458. Retrieved 11 June 2021.
  18. ^ Mozaffari, Mehdi (1996). "Islamism in Algeria and Iran". In Sidahmed, Abdel Salam; Ehteshami, Anoushiravan (eds.). Islamic Fundamentalism. New York: Routledge (published 2018). p. 229. ISBN 9780429968143. Retrieved 11 June 2021. [...] the Islamic fundamentalists have a holistic concept of Islam. They believe in the absolute indivisibility of the three famous D's.
  19. ^ Freeman Dyson, Dreams of Earth and Sky, New York Review Books, 2015, p. 238.
  20. ^ Starcevic, Vladan (2015). "Classification of anxiety disorders and conceptual and diagnostic issues". In Boyce, Philip; Harris, Anthony; Drobny, Juliette; Lampe, Lisa; Starcevic, Vladan; Bryant, Richard (eds.). The Sydney Handbook of Anxiety Disorders: A Guide to the Symptoms, Causes and Treatments of Anxiety Disorders. New South Wales: The University of Sydney. p. 40. ISBN 9780994214508. Retrieved 3 August 2020. The 'splitters' and the 'lumpers' have fundamentally different approaches to psychiatric diagnosis and classification. First, 'splitters' emphasise the heterogeneity within the diagnostic categories and argue that this heterogeneity drives the 'splitting' process'. 'Lumpers', on the other hand, point to the similarities between the diagnostic categories, and suggest that these similarities justify the creation of broader entities.
  21. ^ Vives, Marc-Lluís; de Bruin, Daantje; van Baar, Jeroen M.; FeldmanHall, Oriel; Bhandari, Apoorva (2023-01-13). "Uncertainty aversion predicts the neural expansion of semantic representations". doi:10.1101/2023.01.13.523818. Retrieved 2023-07-29.
  22. ^ "Uncertainty aversion predicts the neural expansion of semantic representations". Neuroscience Community. 2023-04-19. Retrieved 2023-07-29.
  23. ^ Di Gennaro, Giovanni; Buonanno, Amedeo; Palmieri, Francesco A. N. (November 2021). "Considerations about learning Word2Vec". The Journal of Supercomputing. 77 (11): 12320–12335. doi:10.1007/s11227-021-03743-2. ISSN 0920-8542.