Talk:Dice-Sørensen coefficient
This article is rated C-class on Wikipedia's content assessment scale.
The contents of the Sørensen similarity index page were merged into Dice-Sørensen coefficient on February 25, 2013. For the contribution history and old versions of the redirected page, please see its history; for the discussion at that location, see its talk page.
Letters and diagrams
Wouldn't you count start ($) and end (^) as letters? Then there are 6 digrams each in night and nacht, and they share 3 ($n, ht, and t^), giving a coefficient of 2·3/(6+6) = 50%. Homunq (talk) 00:43, 27 February 2009 (UTC)
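The padded count above can be checked with a short Python sketch. The function names `padded_bigrams` and `dice_coefficient` are hypothetical, the `$` and `^` markers follow the comment's own convention, and bigrams are treated as a set (repeated bigrams counted once), which is an assumption rather than something the article prescribes.

```python
def padded_bigrams(word, start="$", end="^"):
    """Return the bigrams of a word after adding start and end markers."""
    padded = f"{start}{word}{end}"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def dice_coefficient(a, b):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over sets of padded bigrams."""
    x = set(padded_bigrams(a))
    y = set(padded_bigrams(b))
    return 2 * len(x & y) / (len(x) + len(y))


print(padded_bigrams("night"))
print(padded_bigrams("nacht"))
print(dice_coefficient("night", "nacht"))
```

With these assumptions each word yields 6 bigrams, 3 of which are shared, matching the 50% figure in the comment.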
Does it induce a proper metric?
Jaccard does. But is this a metric? Can someone add this (or the opposite) to the article? bungalo (talk) 20:45, 31 January 2011 (UTC)
No, it's not. I'll add something to show why not. RichardThePict (talk) 15:05, 13 November 2011 (UTC)
Proposed Merge
This is identical to the Sørensen similarity index. I think the two articles should be merged, but I don't know what would be the best name for the merged article. The formula is sometimes called the Sørensen-Dice coefficient. Maghnus (talk) 19:50, 31 August 2011 (UTC)
Counterexample for triangle inequality is wrong
I think that the counterexample for the triangle inequality is wrong. dist({a},{b})=1, dist({a},{a,b})=1/3, dist({b},{a,b})=1/3, so far everything is fine.
But then the check of the triangle inequality is:
dist({a},{b})+dist({b},{a,b}) > dist({a},{a,b})
1 + 1/3 > 1/3 there is no violation! — Preceding unsigned comment added by Ironmanlu (talk • contribs) 13:01, 6 June 2014 (UTC)
- The counterexample is correct. The triangle inequality states that "the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side". It must therefore hold for every choice of two sides, not just one. Ironmanlu's comment above tests dist({a},{b})+dist({b},{a,b}) ≥ dist({a},{a,b}), but we must also test dist({a},{b})+dist({a},{a,b}) ≥ dist({b},{a,b}) and dist({a},{a,b})+dist({b},{a,b}) ≥ dist({a},{b}). The first of these gives 1 + 1/3 ≥ 1/3, which is fine, but the second gives 1/3 + 1/3 = 2/3, which is less than the required value of 1. The triangle inequality therefore does not hold, and the counterexample is valid. Neuropsychiatry (talk) 13:20, 24 June 2014 (UTC)
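The three checks discussed in this thread can be run mechanically. The following Python sketch assumes the distance is defined as 1 minus the Dice coefficient on non-empty sets, which is consistent with the values quoted above. The names `dice_distance` and `triangle_violations` are hypothetical, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import permutations


def dice_distance(x, y):
    """1 - 2|X ∩ Y| / (|X| + |Y|) for non-empty sets, as an exact fraction."""
    return 1 - Fraction(2 * len(x & y), len(x) + len(y))


def triangle_violations(points):
    """List ordered triples (x, y, z) with d(x, y) + d(y, z) < d(x, z)."""
    violations = []
    for x, y, z in permutations(points, 3):
        if dice_distance(x, y) + dice_distance(y, z) < dice_distance(x, z):
            violations.append((x, y, z))
    return violations


# The sets from the counterexample: {a}, {b} and {a, b}.
points = [frozenset("a"), frozenset("b"), frozenset("ab")]
violations = triangle_violations(points)
for x, y, z in violations:
    print(sorted(x), sorted(y), sorted(z))
```

Only the path that goes through {a,b} fails: going from {a} to {b} via {a,b} costs 1/3 + 1/3 = 2/3, which is less than the direct distance of 1.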
Notational confusion
The article currently says:
- Sørensen's original formula was intended to be applied to presence/absence data, and is
- where A and B are the number of species in samples A and B, respectively, and C is the number of species shared by the two samples
The two parts of this use two conflicting notational systems. The indicated definitions of A, B, C fit the first definition for QS. But with A and B being numbers, the last expression, which contains the union of the two numbers A and B, makes no sense. The definitions intended in the last expression are that A and B are sets, and the vertical bars are the cardinality operator.
I'm going to revise this to use only the set notation here, because I think it fits in best with what follows. Loraof (talk) 17:34, 26 March 2016 (UTC)
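To illustrate the two notations being contrasted, here is a minimal Python sketch. It assumes the usual forms of the coefficient, QS = 2C/(A+B) with counts and QS = 2|A ∩ B|/(|A|+|B|) with sets, since the formula itself is not reproduced in the quotation above. The function names and the species lists are hypothetical.

```python
def qs_from_counts(a, b, c):
    """Count notation: a and b are species counts, c is the shared count."""
    return 2 * c / (a + b)


def qs_from_sets(set_a, set_b):
    """Set notation: the arguments are sets and len() plays the role of |.|."""
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical samples: 3 and 4 species, 2 of them shared.
sample_a = {"oak", "birch", "pine"}
sample_b = {"birch", "pine", "spruce", "elm"}

print(qs_from_counts(3, 4, 2))
print(qs_from_sets(sample_a, sample_b))
```

Both calls compute 2·2/(3+4), so the notations agree as long as each symbol is consistently read either as a number or as a set.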
Dice published first: Why the naming preference given to Sørensen?
Please add two explanations or else revise this article: 1) why Sørensen's name is added, since he wasn't the first to publish; 2) why Dice's name is second, since he was first to publish.
It appears this should be called the Dice-Sørensen coefficient or simply Dice's Coefficient. Is prejudice against Americans on display here?
(It is also curious why the former has a Wikipedia page while the latter does not.) — Preceding unsigned comment added by Newagelink (talk • contribs) 06:55, 8 June 2016 (UTC)
- According to Google search popularity, almost no one uses Sørensen here:
- https://trends.google.com/trends/explore?date=all&q=Dice%20coefficient,S%C3%B8rensen%E2%80%93Dice%20coefficient,Dice-S%C3%B8rensen%20coefficient&hl=en
- I'm an expert in this field, and I've never heard anyone say Sørensen-Dice, only ever Dice. An argument could be made for Dice-Sørensen, if the discoveries were in fact independent, but the current term is most certainly incorrect from a scholarship perspective. Qjkx (talk) 13:36, 11 May 2024 (UTC)
- I've moved the article and updated the contents to fix the incorrect author order. Qjkx (talk) 14:35, 11 May 2024 (UTC)