
Wikipedia talk:Wikidata/2017 State of affairs/Archive 4

From Wikipedia, the free encyclopedia

Quality improvements through Wikidata lists

The intention of some people is to have e.g. lists which get maintained at Wikidata, not here. Somehow this would be ensuring better quality by some magic process. Let's take an example of such a possible list, the Judith Wright Prize.

In reality, the ACT Judith Wright Prize was awarded between 2005 and 2011[1]. While we lack the 2005 and 2011 winners, our list of winners is correct otherwise.

According to Wikidata, the list of winners is a lot longer though, it has 22 entries.

  • Peter Boyle, correct
  • Adrian Caesar[2] (oops, he was only shortlisted)
  • Alan Gould[3] (oops, only shortlisted as well)
  • Barry Hill, correct
  • David Brooks (author)[4] (oops again, shortlisted)
  • Diane Fahey, correct
  • Emma Jones (poet)[5] (oops, "commended", not won)
  • Felicity Plunkett: commended, not won
  • J. S. Harry: highly commended, not won
  • Jan Owen: highly commended, not won
  • Jaya Savige: shortlisted, not won
  • Marcella Polain: shortlisted, not won
  • Martin Harrison: shortlisted, not won
  • Petra White: commended (twice), not won
  • Philip Hammial: shortlisted, not won
  • Sarah Holland-Batt: correct
  • Jordie Albiston: highly commended, not won
  • Susan Hampton: correct
  • Elizabeth Campbell: shortlisted, not won
  • Brendan Ryan: shortlisted, not won
  • Ella O'Keefe[6]: ouch, she was never involved with the ACT Judith Wright Prize in any way, she won the Overland Judith Wright Poetry Prize which is a completely different award (not even the successor of the ACT one, as both existed separately between 2007 and 2011 or thereabouts)
  • Melody Paloma[7] same problem as O'Keefe

So, GerardM, perhaps you can now write a scientific, statistically totally sound blog post about how Wikidata has a (22 entries, 5 correct) 77% error rate for these BLP data (with most of the errors going back 2 years, but some added only today by yours truly?)? And that Enwiki, in its one "item" about the Judith Wright Prize, had a 100% correct rate instead? Just imagine that we had replaced our local list with a Listeriabot shared list, in our quest to improve quality by using the bigger, more linked database? Perhaps the saddest thing is that our list already existed before you made all these errors in Wikidata. Fram (talk) 14:08, 16 January 2017 (UTC)
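The arithmetic behind the 77% figure can be checked with a short sketch. The tallies (22 entries, 5 marked correct) come from the list above; the function name is illustrative only and not part of any Wikidata or Wikipedia tooling.

```python
def error_rate(total: int, correct: int) -> float:
    """Return the share of incorrect entries as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return (total - correct) / total


# Wikidata's winner list for the ACT Judith Wright Prize, per the tally above:
# 22 entries, of which 5 are marked "correct", so 17 are wrong.
wikidata_rate = error_rate(total=22, correct=5)
print(f"{wikidata_rate:.0%}")  # prints "77%"
```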

@Fram: Can you do the same exercise by filtering the statements having at least one reference which is not Wikipedia ? Snipre (talk) 13:20, 17 January 2017 (UTC)
Well, the prize has no reference[8], and of the others, only O'Keefe and Paloma have a real reference, but like I said these two are wrong anyway, as it is about a different prize. Fram (talk) 15:33, 17 January 2017 (UTC)
Very unfortunately these sort of problems have been going on for years and years and nothing effective seems to be done. It seems to be left to people noticing errors to correct them and not up to those creating them to improve their faulty mass importing. See, rather notably, wikidata:Wikidata:Project chat/Archive/2014/08#Vandalism? and wikidata:User talk:GerardM/Archive 1#!!!!! What !!!! (where crowds of Israelis, not just Israeli politicians, were completely wrongly marked as "religion Islam"), and, ongoing, wikidata:User talk:GerardM. Thincat (talk) 16:22, 17 January 2017 (UTC)
Damnit, I will have to remove 'alter Jews to religion Islam and vice versa' off my 'how to vandalise Wikidata and have it disseminate erroneous information' list. Won't be original now... Only in death does duty end (talk) 16:32, 17 January 2017 (UTC)
I noted that people are complaining about his edits at the Wikidata project chat (section "linguists", where an error rate of 37% for some of his edits gets greeted with a resigned "Usual for GerardM"), and at his own talk page (section "it's frustrating"), where four persons are complaining about his general approach to editing. He "defends" himself with a link to a recent blog post he made[9], where he is lambasting Wikipedia for adding a wrong award to Clare Hollingworth, only to have to add in a second comment below the post that actually, enwiki was right all along anyway. But he doesn't care about making errors, since they will be solved someday anyway. Nice attitude, explains his 77% error rate I highlighted above, or his merge of [10] and [11] soon afterwards (one is a hill in a district, the other is the district...) He probably meant to merge [12] instead (two other articles about the same district), but now he has added incorrect interwikilinks (so even these get corrupted by Wikidata or GerardM, and no one notices it because they don't turn up on our watchlist). Other discussions at his talk page, like the "Anna-Kristin Ljunggren" section, indicate that he doesn't understand (or care) what he is doing, damaging Wikipedia (in this case Norwegian) in the process. Correction of errors that get mentioned doesn't seem to be something that interests him. But we shouldn't criticize him because he has over 2 million edits... Frightening! On enwiki, we could restrict or block him, but we can't control who edits Wikidata, and so e.g. known BLP violators could still easily edit enwiki through Wikidata lists, infoboxes, ... if these would become accepted. Fram (talk) 08:31, 18 January 2017 (UTC)

Some comments

First, I do make mistakes. Obviously. The big thing is that what I am after is not for Wikipedia to import any data from Wikidata; I am interested in getting links associated with wiki links and red links. The first objective is to bring Wikidata tooling to Wikipedia so that from Wikipedia we can more easily find inconsistencies and associate the links with statements at Wikidata. For all kinds of reasons there will be inconsistencies. Some of them will be editorial and some are just wrong. Once it is easier to do these things more and more effort will go into reconciling the differences between both Wikipedia and Wikidata. Again, this does not change the experience of Wikipedia at all.

Second, if and when a Wikipedia decides to use Wikidata directly is up for them. I do not care really.

Third, it is assumed that everything has to be perfect. It is not and Wikidata is much better than it was before but for it to grow there has to first be something to grow and nurture. At the start I have imported a lot from Wikipedia categories and some checks and balances were in there. The problem is that the categories are not consistent and this introduces the problems that were introduced. I am no better at finding after the fact what went wrong because there is no record and much of the tooling has changed a lot. I made errors but I think they are within the range of what can be expected of either human editing or bot editing. The crux is that I have been bold and it resulted in a lot of content, content that allows Wikidata to grow and prosper.

Fourth, it is nice to see that I am vilified. Talk about BLP. I do not care but I do care that it is used to deflect from what is proposed. Then again do your worst. It is not an argument that makes you stand strong quite the contrary, it gives me the impression that you do not know what you are talking about, that you only have an axe to grind. Thanks, GerardM (talk) 07:22, 19 January 2017 (UTC)

BLP doesn't mean that editors can't be criticized for their editing errors, sheesh. But thanks for confirming that what you are interested in with your proposal is making Wikidata better (or at least larger), not so much making enwiki better. What the purpose would be of making Wikidata better in this way remains unclear, but that "more and more effort will go into reconciling the differences between both Wikipedia and Wikidata." is a problem, not a solution.
Again, you prove that you do not understand how this works. Fine. Thanks, GerardM (talk) 10:14, 19 January 2017 (UTC)
Then make a better effort at explaining it. To use your logic and statistics, how would replacing enwiki lists with an 11% error rate with wikidata lists with a 77% error rate actually improve enwiki? Or, to go back to your proposal, how would the system even know, if you had a redlinked "Overland Judith Wright Poetry Prize for New and Emerging Poets" in a Wikipedia article (which happens in List of Australian literary awards), that it was supposed to go to this unsourced, incorrectly named item you just created? Before the system can automagically link the items, a lot of work will need to be done, since just saying "they have the exact same name" will often not help either. You have created at Wikidata multiple indistinguishable items "Helena", which no system could ever reliably choose between if you have a redlinked Helena at enwiki. You would need a lot of AI to decide, based on what page the redlink is found on, whether any item at Wikidata is the right match (if it exists at all, which is usually not the case anyway). There are two Edgar Millers at Wikidata, but neither is about Edgar Miller (psychologist). Even if you had AI, take Klaas Boot. A redlink at Dutch Sportsman of the year, winning the award in 1956 for Gymnastics. Wikidata has two persons named Klaas Boot, Klaas Boot sr. and Klaas Boot jr. Assuming the link system would have found the match between the names, it would then at best link to Sr., who is at Wikidata described as a "Dutch gymnast", and not to Klaas Jr., who is described as "Dutch television presenter" and makes no mention at all of gymnastics or sport or awards. Too bad that the correct link would have been to Jr., not Sr. Basically, what you propose is a heavy piece of programming, which will be hard pressed to give good performance and good results at the same time, but which will be unlikely to result in more or better articles.
In most cases, one will need to look for reliable sources (through Google and the like) anyway, since usually these are lacking in Wikidata. And to get to the articles in other languages usually will be much faster through Google as well, or by using the existing interface (for the Klaas Boot example, a redlink on Dutch Sportsman of the year? Go to the Dutch version of the article), and there you are certain to find the right Klaas Boot (https://nl.wikipedia.org/wiki/Klaas_Boot_jr.).
In the end, your proposal is practically unfeasible and would even when implemented not significantly improve enwiki even in the long run, nor would it help our readers in most cases. The cost would be way too high compared to the potential benefits. Fram (talk) 11:20, 19 January 2017 (UTC)
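The Klaas Boot case can be made concrete with a minimal sketch of a naive redlink matcher. This is not an existing tool: the scoring rule (counting description words that start a word in the redlink's context) and all function names are assumptions for illustration. The two descriptions are the ones quoted above.

```python
def overlap_score(description: str, context: str) -> int:
    """Count description words that begin some word of the redlink's context."""
    context_words = context.lower().split()
    return sum(
        any(cw.startswith(dw) for cw in context_words)
        for dw in description.lower().split()
    )


def best_match(candidates: dict, context: str) -> str:
    """Pick the candidate whose description overlaps most with the context."""
    return max(candidates, key=lambda name: overlap_score(candidates[name], context))


# Descriptions as quoted in the comment above; the context string is a
# hand-written summary of where the redlink appears.
candidates = {
    "Klaas Boot sr.": "Dutch gymnast",
    "Klaas Boot jr.": "Dutch television presenter",
}
context = "Dutch Sportsman of the year 1956 gymnastics"

print(best_match(candidates, context))  # prints "Klaas Boot sr."
```

Under these assumptions the matcher picks Sr., because "gymnast" matches the context and "television presenter" does not, while the correct target according to the comment above is Jr.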
As for your problematic edits listed above (and those discussed by others at Wikidata), these are not edits from the start of Wikidata, these are edits from late 2014 to just the last few days. You are quick to write a blog where you incorrectly claim that wikipedia statistically has a 20% error rate, based on one sample where it had an 11% error rate; but when it is pointed out that a similar sample based on your recent edits shows a 77% error rate, you start about BLP, vilifying, invalid "everything has to be perfect" requirements, and so on. But when everything is explained to you at length here, and you then still succeed in adding the website of the wrong Overland Judith Wright Poetry Prize to the article of the Judith Wright Prize[13] then I don't think any improvements may be expected. Oh, FYI, the correct website would be this.
"I made errors but I think they are within the range of what can be expected of either human editing or bot editing. The crux is that I have been bold and it resulted in a lot of content, content that allows Wikidata to grow and prosper." No, you have been bold again and again, resulting in way too much invalid content, often related to BLPs, which makes Wikidata (and its reputation) worse and which seriously reduces the appetite to include Wikidata data here (and which in most cases you left to others to correct even when the problems were pointed out). You are one of the most prolific Wikidata editors, and can continue largely unchecked. Someone with your track record would have been long restricted or blocked at enwiki. If Wikidata doesn't handle these kind of edits and editors any better, then it just can't be trusted enough to be used as a datasource, and your proposal (and comments) here is just a waste of time. Fram (talk) 08:08, 19 January 2017 (UTC)
You do not know my track record. It is much longer and probably complex than you expect. Never mind, you want an argument why cooperation would benefit English Wikipedia. I have one for you. When you react, do not talk about me, that is not relevant, talk about the point that I make. Thanks, GerardM (talk) 09:40, 21 January 2017 (UTC)
Read it, don't see how your lofty conclusions follow from the proposal you make. When you have a redlink, you can a) write an article about it, or b) try to match it to Wikidata based on whatever, then (usually) write a Wikidata item because none exist, or find good sources to verify and expand the Wikidata item, and then write a Wikipedia article based on this. For some reason, it looks as if A is a lot more logical and productive than B. Perhaps for small wikipedias this may be different (although even then most of the information will still need to be researched anyway, so why bother with Wikidata in the first place?), but for enwiki (and most other large wikis), going to Wikidata as the first port of call makes little or no sense. And what the gender gap has to do with all this? Yes, after you have added links to Wikidata to all redlinks on enwiki, you can probably calculate how many of these are about women and how many about men (and how many about neither). Of course, in that time you could simply have written many articles about women, if that is your main interest. Or about people from the Non-English speaking, non-Western world, because as far as I can tell we have a much larger globalization gap than a gender gap. Fram (talk) 14:45, 21 January 2017 (UTC)
I would think the "globalization gap" is precisely where wikidata can help. I've been working on organization-related data and see this all the time. For instance there are hundreds of universities in Indonesia that have an entry on the 'id' wikipedia but no other wikipedia. Wikidata entries based on those idwiki pages now at least give you basic information on name, website, maybe location, type of institution, etc. and a link to the 'id' article from which you could create at least a brief English translation. Similarly for many institutions in Brazil, or even some European countries. Surely it is better for enwiki to have at least a stub of information on a legitimate organization based on what can be gleaned from wikidata than to have no hint it even exists? ArthurPSmith (talk) 15:13, 21 January 2017 (UTC)
No. Without reliable sources for such subjects, we are better off without an article than with poor stubs based on an unreliable source. And to find such universities, it is easier to check the enwiki lists, and to follow the interwiki link to e.g. the Indonesian wikipedia to find more complete lists. Finding it on Wikidata is not really user-friendly. E.g. the first redlinked one I find has an article on the Indonesian Wikipedia, [14] but not on Wikidata. I found it through Google. So why would we link all redlinks to Wikidata and not to e.g. Google, which has much more information than Wikidata and more often points to reliable sources as well? Fram (talk) 17:52, 21 January 2017 (UTC)
@Fram: Why don't we link to Google, Britannica, or other external sources when they have better articles on a given topic than we currently do? Because they're external links, and we prefer internal links to Wikipedia articles, even if they're stubs/worse than external sites, with the hope that someone will then come across that article and help improve it so it is better than the external site. The same applies to links to Wikidata - they are internal links within the Wikimedia projects, and they can be improved by pointing visitors towards them and asking them to improve them. In the long run, hopefully we'll have article placeholders that will even present that information inside the English Wikipedia - but in the meantime, it's better to point towards the wikidata entry instead. Thanks. Mike Peel (talk) 23:14, 21 January 2017 (UTC)
I agree with Fram here. It is better to wait until reliable sources are available and/or a reasonably complete and informative article can be written, rather than having a small stub with sources of doubtful veracity and usefulness. Sometimes a redlink can prompt the writing of a proper article, in ways that a stub doesn't. It is difficult to be sure, and I wish proper studies had been done on this (maybe someone has studied this?). Carcharoth (talk) 00:56, 22 January 2017 (UTC)
@Fram: d:Q12486663 is the wikidata entry for id:Institut Teknologi Sumatera - and it has sat there as an entry with no attached data other than the idwiki link since 2013. If even one other wiki had cared to link to that wikidata item in some way we would likely have considerably more information about that institution in wikidata, and available to every other wikipedia simultaneously. Obviously all the wikipedias and wikidata are works in progress, I think the big question here is how do we most encourage that progress so that all the world benefits? Your approach of essentially "isolationism" strikes me as simply fundamentally the wrong way to go. It leads to duplication of effort, unresolved discrepancies and errors, and a far greater maintenance challenge in the long run. ArthurPSmith (talk) 18:06, 22 January 2017 (UTC)
How would having the differently named redlink on enwiki linked to that Wikidata item have made any difference? "we would likely have considerably more information about that institution in wikidata" how? This seems like more wishful thinking. At the moment, it looks to me as if Wikidata would not reduce maintenance or the error rate (just as often, it would spread errors to other wikiversions). It may be useful for smaller wikis (considering that something like the Volapuk wikiversion is happy being filled with botcreated articles to inflate their article count, I guess they would be very happy with Wikidata-generated articles and data as well), but for enwiki, it makes little sense. It seems more logical to use the big wikiversions to populate wikidata, and then use wikidata to populate the smaller ones once Wikidata has reached a sufficient quality level. But to create articles on e.g. Indonesian universities, the best way is to actually write them, based on reliable sources, and with perhaps at most the indonesian Wikipedia as a source of inspiration. Using Wikidata somewhere in this process (and let's be clear, this was the entry before I highlighted it here, a link to the Indonesian article and nothing else, since May 2013) would have been a waste of time. Now you have expanded the wikidata entry instead of creating an enwiki entry. One can wonder what would have benefited the most people and had the most impact in the long term. To me that would have been an enwiki article, not a Wikidata entry. Fram (talk) 08:51, 23 January 2017 (UTC)
I wrote my comment before I edited the wikidata entry; my editing there made some basic information from idwiki and the institution website available in over 300 languages. Doing the wikidata work took me about 10 minutes. I'm not sure why it's my responsibility to write an enwiki article for this - why haven't you written one? The reality is, wikidata editing is far easier than writing good wikipedia articles. Let me share another example from my experience in the area of organizations - the French university system. Public universities in France were significantly reorganized in 2007, then again in 2013. Enwiki articles on the French university system as of early 2015 were completely out of date, and described the pre-2013 French system as if it was current. French wikipedia was, of course, correct. The various lists on English wikipedia needed to be completely reorganized. Who was going to do that? It was as if enwiki and frwiki were describing two completely different realities. If those lists and list-like templates had derived from a common (wikidata) source across languages, there would have been far less confusion and a much easier maintenance problem. ArthurPSmith (talk) 13:40, 23 January 2017 (UTC)
It's not your responsibility, everyone is free to write where and what they like. But you have now made "some basic information from idwiki and the institution website available in over 300 languages" on a website which hardly anyone will find (unless they have first gone to the idwiki page), and in a format most people won't be interested in using to read about a subject (it's after all a database). "Wikidata editing is far easier than writing good Wikipedia articles". True, writing a useful real article is harder than adding a few loose tidbits to Wikidata. I don't believe that's really an argument pro Wikidata though. Yes, from time to time having our data Wikidata-based might be beneficial. Most of the times the opposite would have been true (see the Judith Wright Prize example above), and with Wikidata used the way some people propose that wrong information could have been shown in 300 languages, not just in one database hardly anyone reads anyway. By the way, Wikidata calls it "Sumatra Institute of Technology", Google calls it "Sumatran Institute of Technology", and enwiki and other sources call it "Sumatera Institute of Technology". "Sumatera" is what reliable sources in English also seem to use[15]. So have you now promoted a wrong English name to 300 languages? And is it correct that your item doesn't even have an "original name" property? Of course, if you leave out essential information, then creating (or expanding) an item in 10 minutes won't be too hard. Fram (talk) 14:20, 23 January 2017 (UTC)
@Fram: wikidata really does repay spending some time with it. To answer your question on translation, d:Q3492 pretty clearly demonstrates that en:Sumatra is the English translation of the Indonesian word id:Sumatera. If there is an official English name for the institution of course that would be the best label to use in English, but barring that one has to come up with some sort of reasonable translation - or go with the name in native language if that's at least readable in English (I added the Indonesian name as an alternate label for English). Labels and descriptions are one piece of wikidata (along with sitelinks) that do not have a source mechanism but in a sense they are self-sourcing, they help (along with "instance of" and "official website" relationships) to define what the wikidata item is about. Not every piece of information needs a citation! ArthurPSmith (talk) 00:46, 24 January 2017 (UTC)

Looking at Sumatra on Wikidata, I notice straightaway the claim that this large island is or was (this is not indicated) the capital of Pagaruyung Kingdom. This even has a "reference" in Wikidata, namely... the Wikidata entry on Pagaruyung Kingdom. A site which wants to act as source or basis for enwiki articles and data but which uses "reference" in such an extremely loose manner is not welcome. Furthermore, the island Sumatra obviously wasn't the capital of the Kingdom, which was only a part of Sumatra in the first place. The capital of the Kingdom was Pagaruyung, now a small village... The Wikidata Sumatra entry next states that it is located in the administrative division of Pangkalan Kuras. Strange, this should be a huge administrative division to encompass the whole of Sumatra. But in reality, this is one of the sub-districts of Riau, where Riau is one of the ten provinces of Sumatra. If even the few items on a huge topic (an island with a population of 50 million) have at least two such glaring errors (in 17 items, most of them not really about the island but about Commons or a stupid link to the Wikivoyage banner, for crying out loud; and should "elevation" really give the highest point? Not clear at all), then I doubt that "wikidata really does repay spending some time with it." except for the chuckle factor perhaps. When 2 out of 11 real items on such an important and easy to check subject are blatantly wrong (and have been since 2015), then Wikidata really isn't ready at all to be used as a source of data for enwiki, never mind as a source to carry this disinformation to 300 languages at the same time.

This isn't a one-off problem: the above two errors were introduced by two long-term, trusted Wikidata editors, and when I look at Java (the island), I see the same kind of wrong claims (Java is located in the administrative units West Java, Central Java, East Java, ...) added by yet another very active editor. If such errors are not caught on major topics, then how are you ever going to make Wikidata good enough to be used as the source for infoboxes, lists, or whole articles (placeholders or real ones)? Fram (talk) 08:19, 24 January 2017 (UTC)

Wikidata could certainly use more human eyeballs to find and fix errors like these, as could every wikipedia. What do you suppose the error rate is in the average enwiki article? As I mentioned earlier, the French university articles (including long lists) were substantially wrong in enwiki for 2 years. I've edited enwiki articles on major physics topics that had basic misunderstandings that had stood in place for 5 years or more. WP:WIP - applies to wikidata just as well. But I suppose the more fundamental question is whether wikidata is seen as part of "us" or just another "them" which seems to be what most of this page is about... ArthurPSmith (talk) 17:34, 24 January 2017 (UTC)

Example of problematic Wikidata infobox

Carltheo Zeitschel has recently had its infobox converted to show some Wikidata fields[16]. One of the fields that are now visible is his occupation: diplomat.

While technically correct (though unsourced at Wikidata), the article gives quite a different image: "a German Nazi physician, and diplomat who organized the deportation of Jews in the German Embassy in France as Judenreferent." If you look at Wikidata, you also get the innocent "diplomat" as description in English and Dutch[17]. This whitewashing may be wanted on Wikidata, but I don't think enwiki should describe nazi criminals only as "diplomats" in their infoboxes. In the German Wikipedia, his first category is "Kategorie:Täter des Holocaust" for a reason. I removed the infobox. Fram (talk) 16:06, 23 January 2017 (UTC)

I just made 4 (failed) attempts (using Preview) to incorporate the infobox without the diplomat profession, and came to the conclusion the easiest way was to convert it to a non-wikidata infobox. Of course I then came to the conclusion having an infobox that contains his name, birth and death details, (which are already in the very first sentence of the lead) was completely pointless. I am going back now to have another go. Only in death does duty end (talk) 16:20, 23 January 2017 (UTC)
Behold the wonder of the (un)infobox! So you need to manually suppress a field from wikidata to use the template which works but leads to further problems down the line. I am familiar with how templates work in general so I knew right where to look to work out what I needed to do. Not all editors will know that. Secondly now I have suppressed it, any changes at wikidata will not come through, regardless if they are sourced correctly or not, until someone removes the suppression. This has basically added an extra layer of complexity to what should be a simple editing process - click edit page, insert/remove content - click save. Only in death does duty end (talk) 16:28, 23 January 2017 (UTC)
Unfortunately Fram replaced the "Wikidata" infobox, which said he died in 1945, with an old-style infobox, which said, as did the lede, that he died on "21 April 1945". The latter date is uncited in the en.Wikipedia article. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:00, 23 January 2017 (UTC)
And the Wikidata date was also uncited, so how was my change "unfortunate"? At least here we can add a "citation needed", in a Wikidata only infobox this is impossible. Fram (talk) 17:04, 23 January 2017 (UTC)
Your logical fallacy is tu quoque (and the {{Cn}} is needed in the body, not the infobox). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:11, 23 January 2017 (UTC)
It doesn't have an inline citation but it is actually sourced in the prose. His death was listed in the release of formerly classified documents. It's a result of the patchy translation from DEWP - I think, but my German is incredibly bad, it is sourced explicitly in the German article. Only in death does duty end (talk) 17:25, 23 January 2017 (UTC)
Well, if you think "hIs[sic] fate was unclear until the files [plural] output from the Foreign Office in 2014 in the literature" is a source... Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:42, 23 January 2017 (UTC)
It certainly is a better one than "1945" from Wikidata without any source at all... Fram (talk) 17:45, 23 January 2017 (UTC)
More FUD. "1945" is acceptably cited in the article, to " Bernhard Brunner, Der Frankreichkomplex: Die nationalsozialistischen Verbrechen in Frankreich und die Justiz in der Bundesrepublik Deutschland, Frankfurt 2008, S. 43." Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:50, 23 January 2017 (UTC)
Please explain what in my statement included "fear, uncertainty or doubt"? There was (probably is) at Wikidata no source for "1945", so why would we include it from there? Fram (talk) 18:01, 23 January 2017 (UTC)
And of course it isn't a tu quoque fallacy (it is a fallacy when it "intends to discredit the validity of the opponent's logical argument", as our article on it states; since you had no valid logical argument for your revert, my reply isn't an example of the fallacy. QED). I criticized the Wikidata infobox for another reason, you then saw fit to revert to it for a completely bogus reason (you replaced an unsourced date here with an unsourced date from an unreliable other site, which is hardly an improvement), and "cn" may equally be applied to infoboxes. Anyway, my infobox correctly summarized the article. If the article is wrong, then the infobox is wrong as well. But replacing it with the Wikidata infobox didn't help (just like it doesn't help in most cases). Fram (talk) 17:34, 23 January 2017 (UTC)
It is possible to edit Wikidata. I think a {{sofixit}} attitude towards Wikidata would help us a lot more than trying to find out exactly how terrible it is and why we should not use it. On the other hand, Wikidata should lower the bar for Wikipedians to contribute so it gets the critical editor mass it needs to work well. —Kusma (t·c) 17:31, 23 January 2017 (UTC)
This is going to be a problem, as SOFIXIT applies to ENWP, not another project. As soon as you start requiring people to go off-wiki in order to apply on-wiki changes, you have drastically increased the burden on editing. Apart from the whole having to learn an entire other project's rules and policies, it's just adding unneeded steps to the editing process. The idea of editor retention is making it easier and simpler to edit. Not increase complexity. Only in death does duty end (talk) 17:35, 23 January 2017 (UTC)
I'm sympathetic to the goals of Wikidata, but this does seem a key point to me -- perhaps the key point. The number of users willing to go to Wikidata to fix an issue is surely several orders of magnitude less than the number who will make edits to articles including Wikidata. Teaching an editor like me how to fix something by going to Wikidata isn't the answer. Andy, if it should turn out that we cannot get adequate engagement from the bulk of en-wp's editors to keep up with the edits needed on Wikidata to keep it sourced and accurate, what would plan B look like? Mike Christie (talk - contribs - library) 18:32, 23 January 2017 (UTC)
We certainly should have a plan how to improve editor engagement with Wikidata. But I don't think the problem is insurmountable. Editing infobox data on Wikidata could possibly become as easy (or even easier) as editing infoboxes here and not require understanding of a template's intricate syntax. Thanks to SUL, I hardly notice when I am editing on other projects (well, I notice because I have my own custom .css files here).
About other projects: They are not alien things (and they should not feel "off-wiki"), they are part of our family. We share data with Commons quite a lot, and many people contribute to several projects. If we decide to ignore Wikidata and keep all data local, it may die (just like we killed Wikinews, as we have never been strict about WP:NOTNEWS and so have usually provided better news coverage than Wikinews). —Kusma (t·c) 20:23, 23 January 2017 (UTC)
Agreed, in principle. My question is about what we will do if it turns out that most editors won't engage with Wikidata, despite whatever efforts we may put in to encourage them. Mike Christie (talk - contribs - library) 20:27, 23 January 2017 (UTC)
Plan B I guess would be to use technical means to let through only Wikidata info which is either properly sourced or does not need a source, and to develop anti-vandalism protection.--Ymblanter (talk) 21:04, 23 January 2017 (UTC)
@Fram: I've just spotted this. You seem to have some facts wrong:
  • The Wikidata infobox was added by this edit by @Brock-brac back in December - not myself. The edit I made was to keep it working when I made a breaking change to the infobox template (changing it from opt-out to opt-in). If you'd pinged me when you criticised my edit, I could have pointed this out sooner.
  • Individual rows in the infobox can now be suppressed by adding e.g. "suppressfields=occupation" (as @Only in death discovered, thanks for making that edit!)
  • This infobox template now includes opting out of unreferenced material from Wikidata (use "onlysourced=yes") if you want - I turned that on in BLP articles using this infobox, but not others.
  • In this case, the information added from Wikidata was correct (as the article says, he was a diplomat), but lacking context. The easiest way to fix that is to simply edit Wikidata to add more context - it's not hard!
  • Please don't build straw-man arguments like "whitewashing may be wanted on Wikidata" that are not true! (Or, provide a citation to support that.)
  • I've reverted the article back to the Wikidata version now (with the suppressed occupation field), since it displays the exact same information now as the manual version does (thanks @Pppery: for changing the date of death on Wikidata, which I think was the only change made here so far).
Thanks. Mike Peel (talk) 10:30, 24 January 2017 (UTC)
And I've reverted your pointy revert. Why would you change an infobox here that works all right, to the Wikidata-filled version of it, if it a) gives the same result, but b) takes it from an unreliable site with unsourced data, with c) worse layout of the infobox (added unnecessary clutter) and d) more difficulty for most people to edit the info or add additional fields (it is not really clear how I can add "context" to the item "diplomat" in Wikidata, I can add qualifiers but these are not the same at all). Fram (talk) 10:58, 24 January 2017 (UTC)
It's not me making pointy edits - it's you. You changed the infobox from a Wikidata-filled version that was working OK (after the occupation field was suppressed) to one that shows identical content. And then you raise a fuss about it here, which is definitely POINTy. In answer to your specific points: a) because it's a test case of the use of the infobox, and we can add additional information to it through Wikidata. b) Are you referring to Wikidata or Wikipedia here? c) edit links are now "unnecessary clutter"?!, d) citation needed - please see d:Wikidata:Introduction if you need an introduction to how Wikidata works. Thanks. Mike Peel (talk) 11:25, 24 January 2017 (UTC)
No. I removed the infobox, and then I raised a fuss about it here. [18] is the removal, on 16.06, the same minute I started this section. Only in Death succeeded in getting the infobox to hide the one field on 16.24[19], after which I replaced it with a standard infobox with more information than the Wikidata infobox had at that moment[20]. So you have both the order of events wrong, and the actual result of my edits (which was not "changed the infobox[...] to one that shows identical content").
As for other points: how many tests did you plan? In any case, it is at TfD now. For b), I meant Wikidata, duh. c) Yes, edit links to Wikidata are unnecessary visual clutter in read mode. And d): this link to the introduction contains no information about my question, but a lot of highly optimistic information which seems to describe more what you want Wikidata to be than what it actually is. Fram (talk) 13:08, 24 January 2017 (UTC)

Prototype for editing Wikidata from Wikipedia in the works

As far as I read on the Wikidata weekly summary 2017-01-21 the development team is working on a clickable prototype interface for editing Wikidata from Wikipedia, based on research conducted in 2015. That would remove the barrier of going to another wikiproject that other editors find annoying. The ticket for the task is here T132790.--Micru (talk) 08:45, 24 January 2017 (UTC)

That's going to be problematic for a whole host of other reasons. The main one being if the WMF makes an easy interface to edit wikidata from the article directly, instead of just ensuring article integrity at ENWP, you now have to contend with every other project that accesses the same data. It's basically an escalation of the current drawbacks without solving the underlying problem. Only in death does duty end (talk) 09:15, 24 January 2017 (UTC)
What do you mean by "ensuring article integrity"? And btw it is not the WMF, it is WMDE that is developing this.--Micru (talk) 09:18, 24 January 2017 (UTC)
One problem, though probably minor compared to data integrity: enwiki prefers, where possible, to have English-language sources for a claim. Other wikis probably have similar wishes. How many references are you going to end up with per statement at Wikidata? And if you then want to show on enwiki the references taken from Wikidata, how are you going to ensure that you show the English ones only (if those are added), and only other language sources if no English one is provided? As far as I know, Wikidata has no "language of the source" property.
As for the prototype, I wonder how that will ever work. I discussed on this page how I tried to add a claim to Wikidata, using a book as a reference. It turned out that I had to create a new item for that book, before I could even use it as a reference. The current mockup of the prototype doesn't seem to take such things into consideration. Fram (talk) 09:34, 24 January 2017 (UTC).
I guess it would be feasible to give preference to English sources.
Regarding the language of a reference there is a property for that d:Property:P407, here it can be seen in use: Notices of the American Mathematical Society (Q24158).
As for how to create an item for a reference from Wikipedia, I think it has not been taken into consideration for the first prototype. @Lea Lacroix (WMDE): Can you please clarify?--Micru (talk) 10:08, 24 January 2017 (UTC)
The property for the language of a source only works if the source is an item (e.g. a book), not if it is a URL (which is the vast majority at Wikidata). Fram (talk) 10:15, 24 January 2017 (UTC)
You can also use that property together with the reference url, for instance for the "occupation->computer scientist" of Tim Berners-Lee (Q80), I have added the language of the website. You can add more statements, like date retrieved, title, etc.--Micru (talk) 10:23, 24 January 2017 (UTC)
Thanks. These things really aren't obvious at all (and seem to be very sparsely used, this is the first time I saw a URL with the language added on Wikidata). Fram (talk) 13:08, 24 January 2017 (UTC)
I agree that the system for adding references in Wikidata is not very clear, I think it should use the same system as in Visual Editor, but I guess for now there are other development priorities (Commons).--Micru (talk) 13:16, 24 January 2017 (UTC)
Hello, thanks for pinging me on this topic.
Yes, when there is no item for the book yet, the editor has to create it first to add a reference. That's how we improve Wikidata together: adding the book once gives all the next editors who want to source something with this book quick access to the information, and hopefully soon a lot of books will already be entered in the knowledge base.
Thanks for the suggestion about adding the possibility to directly create the new item for the book in the interface on Wikipedia. This will not be available in the first version of the prototype (because we have to work first on a system to be sure that the book doesn't exist already, so we don't create a duplicate), but we keep this idea in mind to be added during the next steps of the development. Lea Lacroix (WMDE) (talk) 12:33, 25 January 2017 (UTC)
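
For illustration, the preference for English-language sources raised in this thread can be sketched in a few lines. This is only a minimal sketch, not how any existing module or the WMDE prototype works. It assumes each reference is a dictionary shaped like the `references` entries in Wikidata's JSON output, that the language is stored under P407 as mentioned above, and that Q1860 is the item for English. The function names are hypothetical.

```python
from typing import Any

LANGUAGE_PROPERTY = "P407"  # language property mentioned in the thread above
ENGLISH_QID = "Q1860"       # assumed here to be the Wikidata item for English

Reference = dict[str, Any]


def reference_languages(reference: Reference) -> set[str]:
    """Return the item IDs of every language attached to one reference."""
    languages: set[str] = set()
    for snak in reference.get("snaks", {}).get(LANGUAGE_PROPERTY, []):
        # Skip "somevalue"/"novalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            languages.add(snak["datavalue"]["value"]["id"])
    return languages


def pick_references(references: list[Reference],
                    preferred: str = ENGLISH_QID) -> list[Reference]:
    """Show only references in the preferred language when any exist,
    otherwise fall back to all references, whatever their language."""
    matching = [r for r in references if preferred in reference_languages(r)]
    return matching or list(references)
```

A reference with no language statement at all (the common case for bare URLs, as noted above) is never treated as English, so it is only shown through the fallback.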

Visual clutter

Thanks to the use of this infobox instead of the standard one, readers of Charmian Clift get 6 pencil icons (tooltip: edit this on Wikidata), one comment in brackets "edit on wikidata", and two multicoloured icons which turn out to mean "Article is available on Wikidata, but not on Wikipedia". Indeed, we don't have a separate article on "short story writer" or "essayist": both of course don't need an article, everything is said in short story and List of short-story authors. We do have the redirects Short story writer and essayist, but thanks to the wonders of Wikidata (which doesn't connect to redirects!) and this template, these bluelinks are not shown in the infobox, and it pretends that we don't have info on the subject.

Similarly, at Abbé de Coulmier, we apparently don't have enwiki articles on psychotherapist and Catholic priest; at George Auriol, we don't have articles for type designer, painter, printer, ... Why we would divert readers to Wikidata for these, and why we would want to give them the impression that we don't have information on, say, Catholic priests, is not really clear.

That the result of the Wikidata version of the infobox is a lot of added visual clutter seems clear though. Fram (talk) 15:33, 24 January 2017 (UTC)

That is really awful visual clutter. Really bad. The sort of thing that people would go in and edit. Except here, it is difficult to learn how to do that. An example of what happens when a mature, well-documented and well-developed editing and reading environment (Wikipedia) clashes with a far-less developed (and sometimes really poorly designed) editing and display environment (Wikidata as displayed in templates). Carcharoth (talk) 16:09, 24 January 2017 (UTC)
This has been a known issue since the very beginning of Wikidata, the so-called 'Bonnie and Clyde problem' (T54564). Apparently there were some technical hurdles back then. I also don't like that there are so many pencils, it should be possible to edit WD from VE, that would remove the need of having them there.--Micru (talk) 16:14, 24 January 2017 (UTC)
Try setting "noicon=yes" in one of the current uses - that will remove the edit icons. That could be the default setting for this infobox if preferred. Thanks. Mike Peel (talk) 16:19, 24 January 2017 (UTC)
This is a result of the implementation of the module here on en.WP, not anything that is the fault of Wikidata. I would suggest that it is possible to get the existence/redirect status of the pages on en.WP and change the output in the module. --Izno (talk) 16:22, 24 January 2017 (UTC)

Addition

I just added this, giving something I have been thinking about and have discussed at WT:MED. Don't know if that entry was correct (I used "I" which seems out of sync with other entries, but I didn't want to generalize and instead own my perceptions). Happy to discuss. Jytdog (talk) 22:20, 25 January 2017 (UTC)

Wikidata and geographic coordinates

I'm just posting this here, to be taken for whatever it's worth. This is a post I made a couple of years ago at The Anome's talk page:

OK, here's the problem with Wikidata. A while back, someone on Turkish Wikipedia created a bunch of articles about villages in the Ardahan district of Ardahan Province. That editor, however, included exactly the same coordinates in each article—probably because he copied the infobox from article to article without making the necessary changes. These coordinates then got imported, without correction, to Wikidata, and now your bot is adding them to en.wp articles. (The bot was previously using GNIS data rather than Wikidata to geotag these village articles, which, while not precise, at least didn't give them all the same damn coordinates.) Now I, or other editors, have to go through Category:Ardahan District, correcting the coordinates, and go to Wikidata to correct them there in order to forestall the further spread of the incorrect coordinates. (They've apparently already been imported to the Vietnamese Wikipedia, so someone else will have to correct them there.) One person's inattention on one Wikipedia is spreading misinformation far and wide, thanks to the power of Wikidata. Not your fault, I know, but I need to vent my frustration somewhere, and I don't have a dog to kick. I hope that the Turkish editor didn't do this sort of business more extensively than I've yet discovered, but I suspect that this is unlikely to be an isolated incident of inaccuracy in Wikidata's information.

I've since noticed a number of other problems with Wikidata's geographic coordinates. Since the spread of Wikidata inaccuracies to en.wp is already mentioned in the "Perceived disadvantages of using Wikidata on enwiki" section of Wikipedia:Wikidata/2017 State of affairs, I didn't see any point in editing the main page; I just think a concrete example may help to clarify the issue. Deor (talk) 17:57, 26 January 2017 (UTC)
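
As a concrete illustration of the kind of check that would have caught the copied infobox coordinates, here is a minimal sketch that groups places by coordinate and reports any coordinate claimed by more than one place. It is not the bot's actual logic; the function name, the input shape (place name mapped to a latitude/longitude pair) and the rounding precision are all assumptions made for the example.

```python
from collections import defaultdict


def find_shared_coordinates(
    places: dict[str, tuple[float, float]],
    precision: int = 4,
) -> dict[tuple[float, float], list[str]]:
    """Group places by rounded (lat, lon) and return only the groups
    containing more than one place, which are likely copy-paste errors."""
    groups: dict[tuple[float, float], list[str]] = defaultdict(list)
    for name, (lat, lon) in places.items():
        # Round so that trivially different representations still collide.
        groups[(round(lat, precision), round(lon, precision))].append(name)
    return {
        coord: sorted(names)
        for coord, names in groups.items()
        if len(names) > 1
    }


# Hypothetical data: two distinct villages carrying identical coordinates.
suspects = find_shared_coordinates({
    "Village A": (41.1, 42.7),
    "Village B": (41.1, 42.7),
    "Village C": (41.2, 42.8),
})
```

Running such a check over a category before importing would flag the whole batch for manual review rather than letting identical values spread to other wikis.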

The problem extends further than just geographic coordinates. I've been working on List of Roman consuls, providing sources for each name. (I'm up to AD 180.) However, the German equivalent Liste der römischen Konsuln, which has very few sources, is considered there a Featured Article. Which gets preferred as the source to use: an article that is not FA, but has reliable sources, or an FA without sources? And although I haven't bothered to check the two very closely, I expect there are differences between the two: the en version is updated to the latest research I've been able to find, while the de version is ... well, I don't know where they got their information, or how thorough their research is. One could put a lot of work into making sure one set of facts is accurate, only to discover Wikidata took the corresponding set from a less reliable Wikipedia & is pushing them down on us. -- llywrch (talk) 22:12, 1 February 2017 (UTC)
These are problems that can be fixed by using Wikidata - one of the areas it excels in is cross-language collaboration. If there's one location that we store the coordinates (rather than separately in each Wikipedia), then it just takes one edit (and reference!) to fix them across all of the Wikipedias. We have the ability to directly import coordinates from Wikidata now (through either Template:Coord or Template:WikidataCoord - although I suspect the latter will be deprecated at some point). We can also do visualisations like Wikishootme to help identify data bugs. With Roman consuls, ideally we'd gain the benefit of both your work and the work of the German editor(s) by working on a common corpus that can be used to generate the lists in any language (e.g. see Template:Wikidata list / Category:Articles based on Wikidata). Remember that Wikidata is editor-driven, so it isn't 'Wikidata took' it's 'someone made a choice to import' - and the better solution is to add the information to Wikidata to start with, so it doesn't need to be imported from somewhere. Thanks. Mike Peel (talk) 22:40, 1 February 2017 (UTC)
"Wikidata - one of the areas it excels in is cross-language collaboration. " Citation needed. Or at least some good examples. Wherever I come on Wikidata, I see a lot of bot work or semi-automated work, with a few (often very dubious) manual efforts thrown in, and very little discussion or collaboration. There has so far today been 1 talk page edit on Wikidata. 4 non-welcome user talk page messages. And 2 Wikidata talk edits. The Wikidata namespace has some activity, but hardly an example of excelling in cross-language collaboration. Fram (talk) 07:16, 2 February 2017 (UTC)

Suggested structure for RfC questions

The goal of this page is to produce one or more RfCs that will determine how we want Wikidata to interact with enwiki. One way to look at it is to ask "What are the conditions under which Wikidata can be part of an article?" By "part of" I mean that there is data in Wikidata which, if changed, may change how the article displays or is categorized.

The following list of conditions is meant to include everything in the "Perceived disadvantages" section. It doesn't address the "Perceived benefits" section -- those benefits are arguments for opening access to Wikidata, but they don't map to RfC questions -- they will be cited in the RfC discussions.

I've tried to phrase these neutrally, though naturally some of the example responses are not neutral.

  1. What classes of data on WP may Wikidata affect? E.g. interwiki links, entries in list articles, categories, talk pages, user space, citation contents, infobox contents, images/image captions, text embedded in article prose.
  2. Are there any filters that must be in place? E.g. only data with a source specified; or only data with a source that is not another Wikipedia; or only data with some as-yet-undefined type of source with stringent specifications intended to pass WP:V.
  3. How visible must changes in WD be to a WP editor? E.g. must article history show WD edits? Should WD edits be shown by default in all watchlists unless disabled?
  4. How comprehensible must WD changes be, when they are seen? E.g. do we require that the current format (D Hasan al-Kharrat (Q12207879); 16:46 . . Russian Rocky (talk - contribs) (‎Changed claim: Property:P19: Q3766)) be modified in some way?
  5. How easy must it be to edit WD? E.g. do we insist that WD not be used until there is a fully integrated editor for WP that can handle all WD editing? Or is it OK to expect WP editors to go to WD to fix data there? Note that reverting is a form of editing, so do we require people who wish to revert WD vandalism that affects WP to go to WD to do the revert? Or do we insist that it must be possible to revert WD vandalism from within a WP editing environment? Is there a middle option?
  6. Are there any requirements on the WD community before WP accepts WD data? E.g. do we require the WD community to accept that a WP admin decision to block a vandal should be followed, or is it possible for a WD admin to refuse a WP request to block a vandal (or ban a user, or oversight a revision, etc.). Does it matter to WP how WD decides to handle edit wars, e.g. between a foreign wiki editor and an enwiki editor over a WD data item, if the two wikis have different policies on WP:V or WP:RS? Are there any other places where WD policies might affect the acceptability of WD data?

Could an RfC be usefully structured as six questions, one for each of the conditions listed above? Do the questions omit any of the points raised here about the use of Wikidata? Mike Christie (talk - contribs - library) 03:48, 1 February 2017 (UTC)
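
To make the first two filter options in question 2 concrete, here is a minimal sketch of how they might be expressed. It is not the behaviour of any existing infobox module. It assumes statements shaped like Wikidata's JSON output, and it assumes P248 ("stated in"), P854 ("reference URL") and P143 ("imported from Wikimedia project") are the relevant reference properties. The function names are hypothetical, and the third option (a source stringent enough to pass WP:V) is not modelled, since it has no agreed definition yet.

```python
from typing import Any

STATED_IN = "P248"      # assumed: "stated in"
REFERENCE_URL = "P854"  # assumed: "reference URL"
IMPORTED_FROM = "P143"  # assumed: "imported from Wikimedia project"

Statement = dict[str, Any]


def is_sourced(statement: Statement) -> bool:
    """Option 1: the statement has at least one non-empty reference."""
    return any(ref.get("snaks") for ref in statement.get("references", []))


def is_sourced_outside_wikimedia(statement: Statement) -> bool:
    """Option 2: at least one reference cites a work or URL and is not
    marked as imported from a Wikimedia project."""
    for ref in statement.get("references", []):
        snaks = ref.get("snaks", {})
        cites_something = STATED_IN in snaks or REFERENCE_URL in snaks
        if cites_something and IMPORTED_FROM not in snaks:
            return True
    return False


def filter_statements(statements: list[Statement],
                      require_external: bool = True) -> list[Statement]:
    """Keep only the statements that pass the chosen filter."""
    check = is_sourced_outside_wikimedia if require_external else is_sourced
    return [s for s in statements if check(s)]
```

Note that this sketch does not inspect where a reference URL points, so a URL leading to a Wikipedia page would still pass the second filter; a stricter check would have to examine the URL itself.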

A number of these are technically focused, so maybe @Lydia Pintscher (WMDE): can comment on their feasibility. For others, an RfC on Wikidata might be better, since the questions are wider than just enwp, they're about how all wikipedias interact with Wikidata (particularly as phrased). Thanks. Mike Peel (talk) 22:44, 1 February 2017 (UTC)
Has there been any case where en.Wiki wanted to ban someone and that person kept editing on Wikidata? Is the concern that the global banlist isn't enough and there should be extra measures in place that let Wikidata automatically ban people that are on the en.Wiki banlist but not the global one? ChristianKl (talk) 21:37, 10 February 2017 (UTC)
I'm not aware that it's a problem; I was just trying to assemble the comments I saw into a structure that would work for an RfC, and give examples in each one. The basic idea I was trying to convey was that if use of Wikidata is to be conditional, we need to specify the conditions; and the conditions fall naturally into types that can be considered independently of each other. Mike Christie (talk - contribs - library) 23:00, 10 February 2017 (UTC)
See this ANI. Jytdog (talk) 23:37, 10 February 2017 (UTC)
Perhaps not a problem (as in trying to influence enwiki through Wikidata), but User:Slowking4 is indef blocked here, but is one of the regulars at Wikidata (e.g. regularly participating in the Project chat). (This is not a request to block them on Wikidata, just an example). Fram (talk) 07:54, 13 February 2017 (UTC)

Beyond ridiculous

Right, I'm done here. It's quite clear that a majority - a small number nonetheless - of people editing this page wish to use it to push a PoV, with no serious attempt at balance or neutrality. Simply including the word "perceptions" in a subheading is not carte blanche for promulgating falsehoods. The misrepresentations of my and others' beliefs on the talk page are unacceptable; and particularly laughable coming from people (including one or more admins) who claim to be interested in upholding BLP standards.

I look forward to refuting the FUD if and when an RfC is eventually published. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:04, 11 September 2017 (UTC)

If you see any BLP violations on this page, please feel free to point them out here. Otherwise, perhaps don't lecture others on using straw-men... Fram (talk) 12:18, 11 September 2017 (UTC)
Already done, in my comment on this page timestamped "11:27, 11 September 2017 (UTC)". I'm now unwatching this page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:29, 11 September 2017 (UTC)
A difference in interpretation of what one wants or advocates or not is not a BLP violation (and that post doesn't indicate anything problematic "coming from one or more admins" either). Fram (talk) 12:41, 11 September 2017 (UTC)
I assume he is referring to his response to myself above. And as soon as I see any evidence that Andy is actually interested in having sourcing policies (on Wikidata) that meet Wikipedia's requirements I will be happy to change my opinion of him and his views - an opinion which has been generated from plenty of discussions in which he has been involved regarding Wikidata. If Andy is actually willing to have sourcing policies that suit Wikipedia's requirements, then he should say so. As it stands his attitude and publicly stated responses give the impression he is just not interested. Only in death does duty end (talk) 13:30, 11 September 2017 (UTC)

Wikipedia:Village pump (policy)

I have posted about the problem with the Wikidata descriptions at Wikipedia:Village pump (policy)#Wikidata descriptions still used on enwiki. Fram (talk) 14:37, 11 September 2017 (UTC)