Talk:Haplogroup R1a (Y-DNA)/Lede

Current Version

Haplogroup R1a is the name given to a major human Y-chromosome haplogroup within R1 (R-M173). In other words, it is one of the major male-lines of humanity.

It is found at high frequencies in a wide geographic arc extending from South Asia to Central and Eastern Europe and Southern Siberia.[2]

R1a is believed to have originated somewhere within this same area in Eurasia, most likely in the area from Eastern Europe to South Asia.

Proposed Change by PB666

Haplogroup R1a is the name given to a major human Y-chromosome haplogroup within haplogroup R1. R1a is one of the most frequent Y-DNA haplogroups found in certain parts of Eurasia, such as Western India and Pakistan. R1a is found commonly across a wide geographic area extending from Central Europe to South Asia and Southern Siberia.

R1a is currently defined by a genetic marker known as M420. Previously it was defined by a marker SRY1532.2, which defines a majority of R1a; however a small subset males living close to the Fertile Crescent are have the M420 marker and lack SRY1532.2. This recent discovery has resulted in a reorganization for the 'family tree' for R1a. SRY1532.2 now defines R1a1. In additions, other new markers for R1a and R1a1 have been discovered. Recently, the R1a tree has also widened, with 4 new branches on the exterior of the R1a tree, all subclades of R1a1a. One of these, R1a1a7 has been found at elevated frequencies over broad areas of Central and Eastern Europe. Another branch, R1a1a6, has been found in Arabia and Pakistan.

R1a distribution can be divided into three hierarchical tiers and each higher tier has has a broader distribution. The basal tier splits R1a* from the R1a1, the higher tiers. R1a* is limited to regions proximal to the Middle East. The middle tier splits R1a1* from R1a1a. R1a1* is distributed more broadly than R1a*, from Northern India to the Caucasus to Scandinavia and from Iran to SW Asia to the East-Central Mediterranean. The highest tier, R1a1a has the broadest distribution. In Eurasia it is found from Iceland to Northeast Asia, from Central Asia to South India, and from Ireland to Egypt.

The origin of R1a is elusive; however, it is believe to originated within Eurasia. Because R1a1a is the most abundant form of R1a, most studies have focused on R1a1a'a origin. The more recent studies indicate that while many regions are possible sources of early spread, the most likely sources of spread are from South Asia, Western and Central Asia.

Analysis of PB666 proposal (please leave above draft alone now and add new drafts below)

First sentence, two changes:

"major human Y-chromosome haplogroup within R1" changed to "major human Y-chromosome haplogroup within haplogroup R1". Doubling of word. Not a good change. Please explain the aim.--Andrew Lancaster (talk) 08:49, 19 November 2009 (UTC)

The line of critique is not clear. PB666 ^yap

Word repetition of this type is not considered good style. On the other hand, please explain the AIM of the edit. What is wrong that is being fixed? If you can explain this, maybe we can find an improvement together.--Andrew Lancaster (talk) 23:43, 19 November 2009 (UTC)

"R1 (R-M173)" changed to simply "R1". Why? What is the aim of removing the reference to the clarifying mutation name? Here is the reason I put it in: all the phylogenetic names are changing, which has made it a challenge to keep the article from being confusing. Mutation names are the anchors because they do not change.--Andrew Lancaster (talk) 08:49, 19 November 2009 (UTC)

The problem with inserting the lingo here is that this is the top of the lede. At some point we do want to explain the lingo, but remember it has to make sense: R-M173 = R1 = M173 positive, IOW R-M### = RX = M### positive. Otherwise we end up having to explain three types of lingo all over the article. PB666 ^yap 23:24, 19 November 2009 (UTC)

And BTW I agree with you on the point, but we still have to explain it; you can't assume the reader will figure it out automatically. PB666 ^yap 23:24, 19 November 2009 (UTC)

I have now inserted such an explanation, not right here but not far below.--Andrew Lancaster (talk) 23:45, 19 November 2009 (UTC)

Second sentence completely removed

...and as far as I can see nothing similar was added back in. This was: "In other words, it is one of the major male-lines of humanity." PB666 should explain his aim. Here is the original aim of the sentence: it explains what type of thing R1a is to anyone who is not familiar with the word "haplogroup", which is an unusual word. To me it seems like this is exactly what PB666 has been demanding: reduction of jargon and improved accessibility?--Andrew Lancaster (talk) 09:02, 19 November 2009 (UTC)

I think we need a better phrase than "male-lines". I have never heard this used before in any context, and it sounds to me like a Wikipedia:NEO. Specifically, we are not talking about male-lines but about genetic lineages of males; while parallel, these are not the same. PB666 ^yap 23:30, 19 November 2009 (UTC)

77,000 hits on Google; the term is very common. The point about genes not being the people carrying them is extremely obscure. But no problem: it can also be fixed extremely easily by replacing "is" with something like "represents". Will do.--Andrew Lancaster (talk) 23:48, 19 November 2009 (UTC)

Next sentences concern geographical distribution

These give/gave a very short summary. This is partly, obviously, because there is a whole section on this below, as well as the infobox. But it was also, as discussed many times, because of the edit wars which occurred every time people tried to list the "most important" places. Old version: It is found at high frequencies in a wide geographic arc extending from South Asia to Central and Eastern Europe and Southern Siberia.

It's not an arc. The frequency distribution goes through the Caucasus and Central Asia; the Siberian population can be seen as a terminal offshoot, probably of Central Asia but potentially of North Indians. Anyway, it speculates. PB666 ^yap 23:37, 19 November 2009 (UTC)

Correct enough. The word "arc" can easily be fixed without major surgery.--Andrew Lancaster (talk) 00:01, 20 November 2009 (UTC)

New version: R1a is one of the most frequent Y-DNA haplogroups found in certain parts of Eurasia, such as Western India and Pakistan. R1a is found commonly across a wide geographic area extending from Central Europe to South Asia and Southern Siberia.

Problems with additional words and changes:

Two sentences in a row beginning with the same words.

tweeking issue.

You were saying "we" (the mere underlings) needed to have high standards (or else)? I guess this did not apply to you? Is this another "wordage" problem? By the way: "tweaking".--Andrew Lancaster (talk) 00:01, 20 November 2009 (UTC)

Broad summary is now coming after a much more detailed comment about a specific region. Surely the order would normally be the opposite?

I will review the logical order.

Why include specific details about only one region in this lede at all? See WP:UNDUE but it is also a question of avoiding redundancy and repetition. A lede should not recreate the main body.

It should be a synopsis, and it can briefly recapitulate major points of the article with reference to the credibility of the claims. PB666 ^yap 23:37, 19 November 2009 (UTC)

I am saying it does not look like a balanced synopsis at all. In between a compressed summary and a terse description is a no man's land, which inevitably causes a lack of clarity and apparent bias. So the summary in the lede needs to be very short, and certainly not at the same level of detail as the main section, which is where you are wrongly headed.--Andrew Lancaster (talk) 00:01, 20 November 2009 (UTC)

Why specifically are Western India and Pakistan chosen as if they are the best examples of areas with high R1a levels? Pakistan only has some populations with high levels, and it is not the west of India which has high levels. See List of R1a frequency by population and compare to States and territories of India.--Andrew Lancaster (talk) 09:02, 19 November 2009 (UTC)

Western India and Pakistan, specifically, have the two highest TMRCAs. See the table on the page. PB666 ^yap 23:37, 19 November 2009 (UTC)

Please really look at where those places are on a map. I would say Bihar is not western, and I would say Kerala is?--Andrew Lancaster (talk) 00:01, 20 November 2009 (UTC)

In summary, the reasons for these proposed changes are not clear. If they are being proposed, then why?

The lede needs to be fleshed out; the current lede is exceptionally abbreviated for a class B article and way too abbreviated for a good article. (unsigned comment by PB666)

What is the aim?

To draft important points of the article into a stand-alone summary. (unsigned comment by PB666)

Why are they better?

See 1 and 2. (unsigned comment by PB666)

--Andrew Lancaster (talk) 09:02, 19 November 2009 (UTC)

PD, you have already explained this position on the article talkpage, and as you have said yourself, people disagreed. Effectively it comes down to you saying that the ledes are too small, and the only standard I can see you using is your personal preference. There would be no problem if you explained and made a case for your preferences, but the conversation has gotten stuck (or is being ignored and bypassed) because you pretend this is Wikipedia policy, when it is not. The lede stands alone no better in your version; you just padded it with material from the detailed sections in order to make it longer, which is your personal aim: length. That is not correct according to any style policy. Do you not agree that the lede and the main body should not repeat each other? Please explain how this can be good style.--Andrew Lancaster (talk) 00:01, 20 November 2009 (UTC)

New paragraphs

Proposal is:

R1a is currently defined by a genetic marker known as M420. Previously it was defined by a marker SRY1532.2, which defines a majority of R1a; however a small subset males living close to the Fertile Crescent are have the M420 marker and lack SRY1532.2. This recent discovery has resulted in a reorganization for the 'family tree' for R1a. SRY1532.2 now defines R1a1. In additions, other new markers for R1a and R1a1 have been discovered. Recently, the R1a tree has also widened, with 4 new branches on the exterior of the R1a tree, all subclades of R1a1a. One of these, R1a1a7 has been found at elevated frequencies over broad areas of Central and Eastern Europe. Another branch, R1a1a6, has been found in Arabia and Pakistan.

R1a distribution can be divided into three hierarchical tiers and each higher tier has has a broader distribution. The basal tier splits R1a* from the R1a1, the higher tiers. R1a* is limited to regions proximal to the Middle East. The middle tier splits R1a1* from R1a1a. R1a1* is distributed more broadly than R1a*, from Northern India to the Caucasus to Scandinavia and from Iran to SW Asia to the East-Central Mediterranean. The highest tier, R1a1a has the broadest distribution. In Eurasia it is found from Iceland to Northeast Asia, from Central Asia to South India, and from Ireland to Egypt.

For the most part, this is only a slightly compressed recitation of material currently discussed in the sections immediately below the lede. It has been in and out of the lede. In other words, it could be in the lede or in a separate section, but not both. It ended up in its own section because compressed versions like the one above are hard to follow, given the changes that have happened in the literature on this matter. Once more, this whole discussion is pointless if PB666 cannot explain what improvements he is proposing and demonstrate that he has thought through the versions he wants to replace. It is certainly not self-evident.--Andrew Lancaster (talk) 09:11, 19 November 2009 (UTC)

There are also some basic problems:

a small subset males living close to the Fertile Crescent are have the M420 marker and lack SRY1532.2. This is clearly an attempt to introduce an unsourced theory. Personally, I have sympathy with this theory, but it cannot be included here because the authors do not use these leading words, and it is not self-evident: for example, are the UAE or Oman "close to the Fertile Crescent"?--Andrew Lancaster (talk) 09:20, 19 November 2009 (UTC)
new markers for R1a and R1a1 have been discovered. I know what this means (the markers are mutations), but will anyone else see what it signifies? Not only will they need help to understand that these are SNP mutations, but they'll also wonder whether this is unusual and meaningful. (It is not, and for that reason it is not normally mentioned in this type of article in this type of way. All clades are distinguished by thousands of SNPs.) In this case we do know PB666's intention because he explained it. As mentioned on various talkpages, PB666 has declared that he wants such wording included in order to give an impression of relative ages, which is once more a way of slipping in a personal theory between the lines.--Andrew Lancaster (talk) 09:20, 19 November 2009 (UTC)
R1a distribution can be divided into three hierarchical tiers and each higher tier has has a broader distribution. The basal tier splits R1a* from the R1a1, the higher tiers. Several problems:

Doubled word.
Metaphor is not clear. What is a "tier" here for normal readers? Secondly the language is mixed. The opposite of "higher" is "lower". "Basal" is pure jargon and adds nothing here.
Who says there are three tiers? It depends how you count, right? We'd need to explain how we count it.
But before adding explanations of explanations, we need to ask ourselves why we are trying to explain all this here, when it is explained again immediately below. Once again the lede is simply being padded with more or less full versions of the material from the detailed sections. And to the extent that things are being compressed, they are becoming impossible to understand.--Andrew Lancaster (talk) 09:29, 19 November 2009 (UTC)

R1a1* is distributed more broadly than R1a*, from Northern India to the Caucasus to Scandinavia and from Iran to SW Asia to the East-Central Mediterranean. The highest tier, R1a1a has the broadest distribution. We simply do not have any source to suggest this. The opposite might be true.--Andrew Lancaster (talk) 09:29, 19 November 2009 (UTC)
In Eurasia [R1a1a] is found from Iceland to Northeast Asia, from Central Asia to South India, and from Ireland to Egypt. Under the proposals of PB666, should the distribution of R1a1a be described in about every fifth sentence of the article as a whole? I think his versions would contain both the detailed explanation in the section devoted to it and about five "summaries". Why? What is improved by this?--Andrew Lancaster (talk) 09:29, 19 November 2009 (UTC)

Last paragraph

Old version:

R1a is believed to have originated somewhere within this same area in Eurasia, most likely in the area from Eastern Europe to South Asia.

New version:

The origin of R1a is elusive; however, it is believe to originated within Eurasia. Because R1a1a is the most abundant form of R1a, most studies have focused on R1a1a'a origin. The more recent studies indicate that while many regions are possible sources of early spread, the most likely sources of spread are from South Asia, Western and Central Asia.

Comments:

Added words at opening of paragraph, "The origin of R1a is elusive" are unencyclopedic in style and add nothing.
Removal of words "most likely in the area from Eastern Europe to South Asia" implies all of Eurasia is equally likely which is clearly not correct.
Because R1a1a is the most abundant form of R1a, most studies have focused on R1a1a'a origin. This guess about the psychological intentions of published authors is both unsourced and, I think, wrong. People have not chosen to study what is most common because it is most common. They have had to study what they CAN study. Two comments about this:

As with many of these wordings, it is possible to read them in different ways and argue that they are not mistaken, but that is not the aim, is it? PB666 says that his aim is to make "crap" writing more "accessible". Once more it needs to be pointed out that the problems he is fixing are not self-evident.
If we corrected the wording to make sure no such misunderstanding was possible, we would once again note that this inserted wording basically repeats things being said in other sections already. Once more, the lede is being padded with material being handled in the sections.

The more recent studies indicate that while many regions are possible sources of early spread, the most likely sources of spread are from South Asia, Western and Central Asia. Two remarks:

Once more, this is redundant: it would be discussed too many times if it were put into this lede.
The authors PB666 presumably has in mind actually wrote in a way which could be more clearly summarized as "Asia, most likely South Asia" instead of "South Asia, Western and Central Asia".--Andrew Lancaster (talk) 09:41, 19 November 2009 (UTC)