
Wikipedia talk:Wikidata/2017 State of affairs/Archive 2

From Wikipedia, the free encyclopedia

Point of this page?

I thought the point of this page was "as a preparation for a sitewide RfC about the role of Wikidata on enwiki". If scaremongering and untruths such as "It means that Wikidata edits violate WP:V and WP:BLP" are allowed to stand unchallenged, we'll never get progress. What on earth are "Wikidata edits" supposed to be, anyway? If it means incorporating information imported from Wikidata, then {{Infobox telescope}} is the prime example of its use. I defy anybody to look at South Pole Telescope and tell me where the violations of WP:V and WP:BLP are. I had thought this was going to be a useful exercise where rational debate could prepare the ground for a sensible sitewide RfC. It appears to have become merely a vehicle for some editors to regurgitate half-baked slogans and ridiculous inventions as if they were Gospel. You can count me out of this discussion. --RexxS (talk) 21:29, 12 January 2017 (UTC)

Here's a Wikidata entry that's a BLP, yet apparently lacks sourcing and there is a corresponding article on en-wp. Took only a few moments to find it. In my view it is an issue that needs to be addressed. Victoriaearle (tk) 22:03, 12 January 2017 (UTC)
There are a lot of apparently unsourced statements on that Wikipedia article. None of them come from Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:14, 12 January 2017 (UTC)
Hello @Victoriaearle:. Let's take a look at the given page on d:Q18214287 and the English Wikipedia article on Véronique Carrot. The English Wikipedia article has exactly two references for statements in the text: one for the fact that she took over the choir of the Geneva Conservatory in 2007, and one for her passing her passion to young students at the Lausanne conservatory. The Wikidata page has no references at all -- in particular, her country of citizenship, her date and place of birth, etc. -- none of these have references in either. Why is this perceived as a bigger problem in Wikidata than it was in the English Wikipedia? --denny vrandečić (talk) 01:32, 13 January 2017 (UTC)
@Denny: both the English Wikipedia article on Véronique Carrot and the article from the French Wikipedia (which is really where the information comes from, as the article history shows it was translated from there) have more than just the two sources you claim. You are looking at the footnoted references, which are provided as inline citations. One of those references is a new one (a typical attempt to source a stray fact by referring to a Google Books snippet - a starting point, but one that needs tidying up at some point). The other one is an interview by a writer named Matthey in a publication called Schweizer Musikzeitung (that is the German name for what is also known as Revue musicale Suisse in French). So this reference (as can be seen from the bibliographic information being the same) is the same source as the one mentioned in the 'Sources' section of the article. And it is this sources section that contains the sources for the article (just as the French version did). So this article is in fact not unsourced. It is just not clear which bits of information come from which source (and hence also unclear whether any bits are truly unsourced). This is not ideal, but it is perfectly acceptable on en-Wikipedia; i.e., this is not an unsourced article, it just lacks inline citations. It does, however, make it difficult to assimilate the information in the form of attribute-value data points, so Wikidata and any other data-based system would choke at this point. I see Andy has added a tag that may help. My point is that only part of the editing culture on Wikipedia has been enculturated to provide references for every data point. Many people are quite happy to write paragraphs and sentences with multiple data points in them (facts, opinions, statements, attributions and so forth), and to put a single citation at the end covering everything in that paragraph or sentence.
Wikidata, to take referencing seriously, as it needs to, needs to get its users into the habit of referencing everything. Solidly and carefully. Carcharoth (talk) 06:50, 13 January 2017 (UTC)
The Wikidata item d:Q18214287 contains the following pieces of data: the subject is a French speaker, is human, is female, is a citizen of Switzerland, date of birth, given name, occupation, place of birth (in France, yet a Swiss citizen), type of instrument played and image. The French article was created by a bot, and subsequently underwent some revisions. Apparently the bot, [1], takes entries from the fr:Bibliothèque cantonale et universitaire - Lausanne to create articles in Wikipedia space. The source for the information (all of it) is this entry from the Bibliothèque, at the bottom of which is a list of sources used to compile the entry. As best I can tell, the French article is a verbatim copying from that entry (which they seem to allow), with a link to it, and the corresponding list of sources. The article on en-wp Véronique Carrot appears to be a straight translation of the French article. There's nothing wrong with any of this. But - the source for the Wikidata records on item d:Q18214287 is in fact this entry from the Bibliothèque cantonale et universitaire - Lausanne, which confirms the date of birth, the place of birth, name, occupation, instrument played. It doesn't confirm whether she's a Swiss or French citizen (and Alsace is tricky, maybe German?). The first sentence of our article states: "Véronique Carrot, (born 8 March 1958 Forbach) is a French musician, harpsichordist and choirmaster, from Vaud."

The issue, as I see it, is that finger-pointing is counterproductive (because, let's face it, everyone posting or reading here is invested); the aim should rather be to find a point of agreement and to allow dissent, so that we can move forward. In this case the thing to do is to add a record from the external secondary source for the data points we can verify. Furthermore, I'd suggest that it shouldn't be too difficult to add a binary field for every single record, indicating whether or not the record has been verified from an external secondary source. If not, then set a flag and throw up an error message. This is a small and fairly benign example, and frankly fixable (though I wonder about her citizenship), but if we've learned anything in the past few months, it's that "fake news" is just that, and can easily be disseminated. As people who are heavily invested in a crowdsourced project, doesn't it make more sense to listen to each other's points, fix problems, and move on, rather than attack each other? Victoriaearle (tk) 16:34, 13 January 2017 (UTC)

I love it that Wikidata proponents only see "untruths", "half-baked slogans" and "ridiculous inventions" on the "disadvantages" side. Why can't you (plural) reciprocate the courtesy extended to the "benefits" section, i.e. leave standing dubious statements like "Wikidata is easier to train new users" (what do you call them then, wikidata puppies?) because "No knowledge of wiki-syntax is necessary" (true, but you need to learn Wikidata structure and quirks instead, which isn't easier at all in many cases) or "A newly created article could use a Wikidata-aware infobox to help provide an overview of the key facts available on that subject in other Wikipedias at a glance." (true, but in many cases not a benefit at all; that's why I gave the example above of Willy Vandersteen, where the "notable work" is one of his least-known works)?
I knew very well before starting this page that people from both sides wouldn't agree on perceived benefits and disadvantages: I tried to prevent the bickering, snide remarks, edit warring (and current protection) by making it clear that entries didn't have to be objective, as long as they were "real"; some people really believe Wikidata would result in a massive increase of BLP violations, others believe that Wikidata really is much easier and better. But apparently, for some people, knowing the opinions of others and accepting that these opinions are out there is unacceptable. You speak of Gospel, but some proponents of Wikidata really show cult-like behaviour on this page, which is frightening. Love Wikidata as much as you like, but don't impose it on enwiki when there is so much opposition, and don't act as if all criticism is ridiculous. Fram (talk) 05:46, 13 January 2017 (UTC)
The claim that "Wikidata edits violate WP:V and WP:BLP." is not real, nor even "real", whatever those scare quotes mean. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:16, 13 January 2017 (UTC)
Edits which automatically include wikidata data into enwiki (by template, or by a bot like Listeriabot) fail WP:V and (where applicable) WP:BLP. Wikidata is an unreliable source, and we shouldn't include data from other unreliable sources. See e.g. WP:CIRC: "Do not use articles from Wikipedia (whether this English Wikipedia or Wikipedias in other languages) as sources. Also, do not use websites that mirror Wikipedia content or publications that rely on material from Wikipedia as sources. Content from a Wikipedia article is not considered reliable unless it is backed up by citing reliable sources. Confirm that these sources support the content, then use them directly." Using Wikidata as a data tool clearly violates this policy. (Oh, and the scare quotes indicate that while some benefits or disadvantages may not be objectively, absolutely real, they are subjectively real to whoever added them: some people really believe Wikidata is easier to learn and use, some people don't, but I don't remove the entry from the benefits section or even add a comment to it.) Fram (talk) 09:37, 13 January 2017 (UTC)
Your opening sentence is false. While you labour under such a misapprehension, it is not going to be possible to have a meaningful discussion on this matter, and you should step back from your attempts to moderate the page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:46, 13 January 2017 (UTC)
Let's see: I have explained, with a policy quote to make it easier for you, why I made that opening sentence. You simply dismiss it as false without any explanation. I agree with you that it will not be possible to have a meaningful discussion, but the cause of this is not persons using well-reasoned and -supported arguments, but the ones dismissing everything they don't want to hear out of hand. Fram (talk) 10:07, 13 January 2017 (UTC)
You have indeed cited a policy; but it's one that is not applicable to all cases of data being fetched from Wikidata. The largest such cases, at present, are interwiki links, which do not need to be sourced; and - as I have already referred to on this talk page - identifiers in {{Authority control}} (with well over half a million instances); URLs in {{Official website}}; and category links using {{Commons category}}. Do you really mean to say that they are all breaches of WP:V and ("where applicable") WP:BLP? Or were you, in fact, posting a falsehood? Like User:RexxS, I'm beginning to think this discussion isn't worth the candle. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:36, 13 January 2017 (UTC)
No, this is about your opposition to "The lack of reliable sourcing means that imported Wikidata text violates WP:V and WP:BLP." and your claim that "The claim that "Wikidata edits violate WP:V and WP:BLP." is not real, nor even "real", whatever those scare quotes mean." My comments are about text entries (things that may need sourcing), not interwikilinks and the like. I didn't think I needed to spell out every such detail here, but apparently I'm wrong. Feel free to leave the discussion though, no one is forcing you to stay. Fram (talk) 12:37, 13 January 2017 (UTC)
Your claim was expressly, and I quote, about "Edits which automatically include wikidata data into enwiki...". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:17, 13 January 2017 (UTC)
Yes, and your claim was expressly "The claim that "Wikidata edits violate WP:V and WP:BLP." is not real". I think most people here are capable of reading. I have since clarified (since you needed it) that I meant specifically "text entries (things that may need sourcing), not interwikilinks and the like.". If I am no longer allowed to clarify things when my initial statement was too broad or vague for your liking then it is indeed utterly useless to continue this discussion with you. I have now, ad nauseam, explained which type of Wikidata edits on enwiki violate WP:V and why. Do you still claim that this is "not real", and that no wikidata edits on enwiki can violate WP:V? Fram (talk) 15:31, 13 January 2017 (UTC)
You use the word "still". Where did I make the very specific claim that you now attribute to me, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:18, 13 January 2017 (UTC)
Wikidata isn't a source, it's a Wikimedia project. Same as Wikimedia Commons isn't a source. Thanks. Mike Peel (talk) 09:49, 13 January 2017 (UTC)
"Wikidata isn't a source, it's a Wikimedia project." Those two aren't mutually incompatible. Wikidata is a Wikimedia project and a source. The only difference with other sources is that instead of copying / rephrasing the information with a link to Wikidata, we import the information directly. A source is a place we get information from. That a source is based on other sources (which is clear for some bits of Wikidata, and not for others) doesn't change this. Fram (talk) 10:07, 13 January 2017 (UTC)
  • Listeriabot created populated articles that included biographical data (such as DOB) on living people imported from Wikidata and failed to provide a reliable source for the information within the article. Even if you excluded the bios from the lists that actually had ENWP articles (there may have been confirming sources on the article, and it's generally accepted that on a list a blue-linked item can provide the sources), the people without articles violated WP:V in that no source or reference was provided for the information on ENWP. For living subjects, depending on the type of data imported from Wikidata, it also may have been a BLP violation. An argument 'a reference is available at Wikidata' is irrelevant. WP:V requires that articles on ENWP are referenced on ENWP, not on another project. When said other project itself does not have the same standards as ENWP, as well as gathering questionable data from *other* projects that don't have the same standards, trying to argue it is not a violation of WP:V is laughable. And this was a bot doing this, which is easily trackable. This is not taking into account individual editors who just import data from wikidata without understanding that it cannot be used without it being verified first, which just is not happening at the level that ENWP requires. Only in death does duty end (talk) 13:13, 13 January 2017 (UTC)
    • Would you please provide examples of articles created by Listeriabot on the English Wikipedia?--Ymblanter (talk) 15:06, 13 January 2017 (UTC)
      • As far as I know, no articles were technically created by Listeriabot: editors create an article, and Listeriabot then populates it. See e.g. [2] a sequence of article creation (with only a short introduction), and then the bot populating the list. The result of the bot is that e.g. "citation needed" tags get overwritten on the next bot run, no matter if anything has been done to the entries or not[3]. Fram (talk) 15:31, 13 January 2017 (UTC)
        • Thanks. @Emijrp:, I do not think such lists should be kept in the article space, at least not before we find a way to fight Wikidata vandalism and to make sure the lists are properly sourced. It is perfectly fine to keep them in the user space though.--Ymblanter (talk) 15:36, 13 January 2017 (UTC)
        • Yes, perhaps I should have used 'populated' there - I have amended. Essentially the list was created by a user as a wikidata list and the next edit was the bot filling it with data imported from wikidata. I am assuming there is a specific reason the bot itself could not do the creation. But when it auto-fills the content the same minute the list is created, it's clear there was no intent to actually manually vet the information. Only in death does duty end (talk) 15:41, 13 January 2017 (UTC)

Uncontroversial issues

I made this point repeatedly elsewhere, but people somehow tend to ignore it and proceed with their agenda. There are many things which can be imported from Wikidata in a completely uncontroversial manner. Those are (i) things we have on Wikimedia projects (such as Commons category, which is already imported in many cases, or, for example, main topic for a category, which, to the best of my knowledge, is not imported); (ii) things which do not need to be sourced - for example, a Facebook account of an individual: it might be a good idea to require sourcing, but in 100% of cases on Wikipedia such data are not sourced, and nobody considers it to be a problem; (iii) facts which are directly derived from external databases, such as e.g. results of sports competitions or population of localities or facts that an individual has a biographical article in a certain encyclopedia - these are sourced in Wikidata (provided they are imported from external sources) and can be imported from Wikidata by bot together with a proper source. In all these cases, WP:V is not an issue at all. What is an issue is occasional vandalism, but vandalism is an issue on Wikipedia as well; if we can build up a workflow such that this vandalism can be dealt with, I do not see any problem with importing these things directly from Wikidata.--Ymblanter (talk) 16:58, 13 January 2017 (UTC)

Are you suggesting that I'm proceeding with an agenda (which is emotive and not a constructive way of bridging misunderstandings)? If so, I'll go ahead and unwatch this page. I do have one question though: if we can import biographical articles from external encyclopedias, which I did not know, and which is a useful piece of information to have explained, what's the downside to adding the source to comply fully with WP:V? Victoriaearle (tk) 17:14, 13 January 2017 (UTC)
Not importing the articles themselves, but making the connection better documented. For example, see this edit where I added the OBIN (Oxford Biography Index Number) to the entry for Margaret Thatcher. This flags up that the ODNB now has an article on Margaret Thatcher. The OBIN is an external identifier used to identify the article. This is slightly different to adding information sourced to the ODNB article itself. Carcharoth (talk) 17:22, 13 January 2017 (UTC)
(ec) No, I actually do not think I made this point earlier on this page; I said explicitly that I made it elsewhere. Concerning your question, I am not suggesting the articles should not fully comply with WP:V. It is pretty easy either to ask the bot to import sources to Wikipedia, or to import info via template in such a way that the reference is shown on Wikipedia. Both usages would be fully compliant with WP:V.--Ymblanter (talk) 17:23, 13 January 2017 (UTC)
Agreed. I think much of this comes under the heading 'Identifiers'. d:Wikidata:Glossary is useful, but 'identifier' there seems to be talking about something else. In fact, 'identifier' seems to be used for many different things across Wikidata, all used to identify things, but in slightly different ways. d:Wikidata:Identifier migration might shed some light as well. Carcharoth (talk) 17:18, 13 January 2017 (UTC)
See also d:Wikidata:WikiProject Biographical Identifiers and Wikipedia:Biographical metadata, which hasn't been edited for four years and (ironically) has no mention of Wikidata. Carcharoth (talk) 17:25, 13 January 2017 (UTC)
An example of a Wikidata identifier (this weird thing which in Wikidata appears below properties) is indeed the OBIN, which you mention above. This covers some of my examples but not all of them. (Though the notion of an identifier can indeed be stretched.)--Ymblanter (talk) 17:26, 13 January 2017 (UTC)
That (for the sake of clarity) is not a Wikidata identifier, it is an OBIN identifier, used in Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:27, 13 January 2017 (UTC)
Indeed, I was talking about external identifiers used by Wikidata as such.--Ymblanter (talk) 17:29, 13 January 2017 (UTC)
[ec] d:Wikidata:Glossary is referring to Wikidata identifiers; for example Q42, as in Douglas Adams (Q42). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:27, 13 January 2017 (UTC)

Wikidata has more data

This is easy and obvious. Wikidata has more items than Wikipedia has articles. Wikidata knows about all the articles of all the Wikipedias and the other projects too. The result is that increasingly new articles in English Wikipedia get linked to items that are not connected to Wikipedia articles. These items exist to create links, they exist because they complete a collection of data. Given that the percentage on the amount of statements on items is going up, it means that one absolute measure of quality is improving.

Data from other Wikipedias

When data from Wikipedias is included in Wikidata, it is often said that the quality is not good enough. The problem is that when this data is not good enough in Wikidata, it is not good enough in Wikipedia either. What you may notice is that many errors are corrected in Wikidata EXACTLY because there is this perspective from all the Wikipedias. This is also why categories based on Wikidata are often more complete than existing categories. The underlying queries show any and all items / articles. They may even show the red links and thereby provide a tool for article creation.

When data is added to Wikidata based on wiki links, it regularly turns out that the link is wrong: it does not refer to the right article. Fixing this is easy and obvious except where there is no article in Wikipedia. The practice of redirect pages and disambiguation pages makes this complicated. It means that time is to be spend on Wikipedia even when this is not the concern. It would therefore be much better to associate both wiki links and red links with Wikidata items. It will make Wikipedia more consistent because this is what is easy and obvious to do.

One additional benefit would be that it then becomes easy and obvious to query for obvious mistakes. This will improve Wikipedia quality even further. It will also improve Wikidata quality making this a win-win solution.

Using data to inform our readers

When Wikidata has more information than Wikipedia, it is possible to share this information. Obviously from a Wikidata point of view, bot articles are not to be desired. It is however feasible to generate information from available data. Particularly for English this has been done to good effect. The point is that our mission is to share in available knowledge and this will allow us to do more at that. (Preceding unsigned comment added by GerardM (talk • contribs))

"Wikidata has more data"? Wikidata has more items, enwiki has a lot more information. Dividing your data in mini-packages doesn't give you more data, just more packages.
"This is easy and obvious. Wikidata has more items than Wikipedia has articles. Wikidata knows about all the articles of all the Wikipedias and the other projects too. The result is that increasingly new articles in English Wikipedia get linked to items that are not connected to Wikipedia articles. These items exist to create links, they exist because they complete a collection of data. Given that the percentage on the amount of statements on items is going up, it means that one absolute measure of quality is improving." There is nothing "easy and obvious" about this: there are some huge leaps to get from some basic facts to an unwarranted conclusion. To just point out the most obvious and rather huge flaw in this argument: since when is "more items" an "absolute measure of quality"? I note that Wikidata will import 1.2 million items from Quora. This means an absolute increase in quantity, and an absolute decrease in quality. Your "result", "increasingly new articles in English Wikipedia get linked to items that are not connected to Wikipedia articles.", is often not a benefit at all, but a problem (at least if these links got into enwiki as well; that you want to pollute Wikidata with items indiscriminately is a problem for Wikidata).
"It means that time is to be spend on Wikipedia even when this is not the concern." Can you give some examples of what you mean here? I don't get this at the moment.
"When Wikidata has more information than Wikipedia": first make sure that it has more information; usually it doesn't (again, more items is not more information). Then make sure that the information is verifiable. "Particularly for English this has been done to good effect." Examples? The 30-something Listeria bot articles which were done against consensus and have now been disabled with consensus? Or do you mean non-mainspace things? In general, if you make statements like you do here, it is best if you include some examples, so that we know what you are actually talking about and whether these things really are benefits or just wishful thinking. Fram (talk) 15:06, 14 January 2017 (UTC)
The point is that Wikidata knows about subjects that are red links in en.wp or articles in other Wikipedias. The last time I looked it had at least 40% more items with English labels than en.wp has articles. Many of them do fit in existing categories or lists. When we talk about what the aim is of what we do, it should be about how we are to share in the sum of all knowledge. I added women who are notable in US states as items and I find that they are now being added as articles. When we are to share in the sum of all knowledge, it is more than what any Wikipedia has to offer. Fram, your way of talking gives me the impression that you do not understand how both Wikidata and Wikipedia can benefit from each other. That is a shame because your questions are framed in an en.wp only point of view and that is neither neutral nor conducive to finding ways to collaborate and improve both Wikidata and all Wikipedias. Thanks GerardM (talk) 20:16, 14 January 2017 (UTC)
"women who are notable in US states"? That's a strange expression. I perfectly understand how Wikidata can benefit from Wikipedia. I understand how Wikipedia theoretically can benefit from Wikidata, but in practice I see many more disadvantages than advantages. In much of your argument, you can replace e.g. Wikidata with IMDb: IMDb has many items which don't exist on enwiki, and enwiki could (and does) profit from IMDb to identify such gaps, and to find information. Nevertheless, it would be a very bad idea (and against policy) to generate lists or articles directly from IMDb.
"your questions are framed in an en.wp only point of view and that is neither neutral nor conducive to finding ways to collaborate and improve both Wikidata and all Wikipedias." No kidding? My questions are about the use of Wikidata on enwiki, not about the use of enwiki data on wikidata or the use of wikidata on other wikiversions. That doesn't make my questions not neutral; they just are about a different issue than the ones you present. Feel free to start a discussion about how Wikidata can profit from enwiki (I see you have done just that below), but don't complain that a discussion is "not neutral" because it starts from a different problem and looks for answers to that only.
Finally, you don't seem to have replied to any points I made. "The last time I looked it had at least 40% more items with English labels than en.wp has articles." And? An item on Wikidata is not comparable to an article on enwiki, and Wikidata has completely different notability standards (meaning that many of these "40%" should never be an article here). You seem to have a strong focus on quantity, which is nice from a PR point of view but not so much for the building of a quality encyclopedia. I could just as well argue, looking at my previous example, that Willy Vandersteen (who is hardly known in the English-speaking world) has some 10 links at Wikidata with an English label, while some 80 articles link to him on enwiki. So it seems to me that enwiki is a much more complete "web" of information, with every point on that web being much more complete, and often more and better sourced. Your "40% more Wikidata" seems to be "framed in a very non-neutral way" and to be in the end completely meaningless. Fram (talk) 10:22, 15 January 2017 (UTC)

Data used across many articles

@Waggers:: can you give some examples of "Data that are used across many articles, such as population data or current representatives / ruling parties"[4]? In most cases, such information is used by three or four articles at most; and we already use enwiki templates to reduce the maintenance of such situations anyway. Fram (talk) 11:07, 13 January 2017 (UTC)

@Fram: Template:Infobox UK place is transcluded by 22,933 articles and Template:Infobox UK constituency by 1,932, and obviously that's just the UK on the English Wikipedia. While an individual data item might only be shared across a handful of articles, updating them all after a new dataset release or general election is almost insurmountable, evidenced by the fact I'm still seeing pages in my watchlist being updated with 2011 census data (previously showing 2001 census data) as recently as last week. You're right that enwiki templates potentially offer an alternative solution, but they're not really designed for handling data and each Wikipedia would have to manage their own version of such templates. In contrast, Wikidata is designed for handling that kind of data and makes it available to all Wikimedia projects. WaggersTALK 10:25, 16 January 2017 (UTC)
Thanks. Template:Infobox German location (and a few others) automatically takes its population from Template:Population Germany, which is fed by the German Wikipedia. This seems to work without problems. The advantage is that changes to the template:population Germany are tracked onwiki; if the data was located at wikidata, subtle vandalism could only be easily spotted there. At the moment, it seems to me that vandalism spotting at Wikidata is worse than at enwiki or dewiki (not that it is perfect here, far from it). Fram (talk) 10:35, 16 January 2017 (UTC)
Everybody can switch on showing Wikidata edits on the watchlist in the English Wikipedia (or in any other project, for that matter). Except for some bugs which are currently being dealt with, this seems to work and helps to find vandalism in the articles on my watchlist. The vandalism-fighting per se on Wikidata is obviously much weaker than on the English Wikipedia, since the number of items is bigger, and the active community is way smaller than here.--Ymblanter (talk) 11:46, 16 January 2017 (UTC)
I had it turned on, and have turned it back off for a number of reasons. I find the edit summaries to be very unclear. I get e.g. a number of edits to Wikipedia talk:WikiProject Cycling which turn out to be completely unrelated edits like [5][6][7]. If I had seen the actual page those edits were to, then I would have noticed that the new labels were nearly identical and that furthermore these pages were of no interest to me, and thus I wouldn't have needed to check them. I do notice that someone added the native name for Peter Paul Rubens, referenced to ... the Russian Wikipedia, even though this value is different from the name used by the Dutch Wikipedia, who probably know better. I could now go and change this, or just shake my head and remember why I don't think using Wikidata is a good idea. I'll turn Wikidata changes on my watchlist back off now, as this is a waste of time for me. Fram (talk) 12:36, 16 January 2017 (UTC)

Please let us be considerate ...

When you look at both Wikipedia and Wikidata you will find problems in all of them. That is normal. The point is that both Wikidata and any Wikipedia serve the same purpose, which is to share in the sum of all knowledge. Yes, in any project you can find violations of the other project's policies. The point is not that they exist; the point is that we have a way to deal with such problems. This is what talk and consideration of each other's POV are there for.

When Wikipedia is to use data from Wikidata, it makes sense for us not to vilify each other. The point is that in order to achieve our goal it is best for us to work together. Magnus did some research and it showed that Wikidata is improving a lot in quality by having more and more sources; he blogged about it (recommended reading). I have often been scapegoated for using category data from Wikipedia because it proved to be wrong. The fact of the matter is that our projects are works in progress. We should not throw absolutes at each other, because the only net result is that we do not achieve our mutual goal and fail to collaborate. Thanks, GerardM (talk) 09:37, 15 January 2017 (UTC)

It would be useful if, when you propose some "recommended reading", you would link to it. Anyway, "When Wikipedia is to use data from Wikidata, it makes sense for us not to vilify each other." is again meaningless. First, replace "when" with "if". Second, it makes perfect sense to criticize Wikidata precisely because we may (or some people do) use data from it or want to push it here. That's not vilifying (thanks for the very neutral description there). We shouldn't blindly import data just because we share the same goal (somewhat), seeing how our policies and approaches differ. Wikidata can be useful for enwiki, but we should tread carefully. Fram (talk) 10:39, 15 January 2017 (UTC)
"First, replace "when" with "if"." No. You are ignoring the hundreds of thousands of articles on Wikipedia that - as I have recently explained elsewhere on this page - already import data (and here I exclude interwiki links) from Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:26, 15 January 2017 (UTC)
That would include those that worked perfectly all right before, but which for some obscure reason have been changed to get the same value (if it hasn't been vandalized meanwhile) from Wikidata? Apart from Authority Control, I see very little that actually gets taken from Wikidata which wasn't already present on enwiki. The added value of Authority Control (ugh, dreadful name) is something that can be debated as well (most of these links would be removed from External links or Further reading since they don't add anything at all), but at least it's something new. Fram (talk) 08:03, 16 January 2017 (UTC)
The first thing Wikidata brought Wikipedia is a replacement for the interwiki links. The result is a much improved maintenance process, and the work on statements also makes disambiguation for these interwiki links easier. It proved to be a major improvement. Thanks, GerardM (talk) 11:55, 16 January 2017 (UTC)
Yes, and...? I haven't really noticed much difference in disambiguations since interwikis were moved to Wikidata. But Wikidata is perfect for interwikis, which are hardly data but the connection between wikipedia versions. I would have no problem with Wikidata if that had remained its scope. Fram (talk) 12:46, 16 January 2017 (UTC)

Proposal

At this time it seems people are up in arms because of their entrenched positions. So let us seek a middle ground first. Let us have Wikipedia record under water the links to Wikidata on every Wiki link and red link.

Objective

When any link internal to the WMF knows it has a connection to Wikidata, we can more easily make the links and use advanced tools like queries to check for inconsistencies and to increase the quality in all our projects. The tools necessary will be left to user initiative as is the norm.

Impact on current use

The behaviour and the way Wikipedia works remains the same for those who do not opt in.

Only for those who opt in, a new red link will prompt to add to an existing Wikidata item or to add a new item. It will be possible to add statements and references. When a new link is created, the associated item will be added, and only the option of adding statements is offered. For existing links there will be a page that shows the existing links and their associated statements.

Arguably, this page could be visible to everyone, and it should have an easy option to opt in or out of the full functionality.

"Let us have Wikipedia record under water the links to Wikidata on every Wiki link and red link." Why would we record the link to Wikidata on every bluelink as well? We already have an article, so why clutter the pages with Wikidata indications? As for the redlinks, why would we link to an unreliable wiki and not to any other site? We could add a tool comparable to what is used at AfD, with links to Google, Google News, ... and add Wikidata there as well. Why would we see Wikidata as some superprivileged partner that needs linking on every link we have? Just because it is also a WMF project? No thanks. Feel free to change Wikidata to have a "what links here" from all wikis, with an indication of whether it is a redlink or a bluelink; no one stops Wikidata from doing this. This would make more sense than putting this on every wiki, adding "this exists on Wikidata!" to every bluelink and some redlinks on every page.
"a new red link will prompt to add to an existing Wikidata item or to add a new item." Again, why not do the reverse? Send editors from wikidata to enwiki (and the other language wikis), don't siphon editors away from enwiki to wikidata. It makes no sense to tell editors on what is normally the most visible page about a subject to edit a much less visible page instead (Google returns wikipedia pages first, and wikidata much, much lower in the rankings; and the number of pageviews is similarly different). All Wikidata does with this is duplicate effort. Which seems to be what Wikidata has largely become; a duplication of effort for little benefit. Still, anyone is free to edit Wikidata, just don't try to use enwiki as a massive recruiting ground in your "a middle ground proposal". Fram (talk) 10:39, 15 January 2017 (UTC)
You do not get "opt in". You do not need to see any link to Wikidata in Wikipedia. When you are just reading an article you will not be bothered with any of the information that is hidden. Even with red links everything will be invisible to you when you read or edit. When you opt in, there is the option to link to Wikidata. How that works is for the UI people to decide, but the point is that it is there only for the people who care about quality and understand how Wikidata can help. Thanks, GerardM (talk) 11:54, 15 January 2017 (UTC)
By the way, this is not about recruiting at Wikipedia. Far from it. I often use Wikipedia data and I often do not edit Wikipedia because of all the hassles I get. When the work I do is in line with what I do at Wikidata, it is easier, more obvious. When you consider duplicate effort, it works both ways. By making it easy and obvious how things are changed and why we both gain in quality. In the example of IMDB there is one big difference; Wikidata is linked to any and all Wikipedias and consequently when best practice is there for all of us we all win in quality. Again, without adding anything obvious for those who do not want this. Thanks, GerardM (talk) 12:19, 15 January 2017 (UTC)
You are mixing the confusing and the insulting here. "it is there only for the people who care about quality": I think most people who commented here and elsewhere and are (very) critical of Wikidata and its role on enwiki, do so just because they "care about quality". For you, on the other hand, the only indication of quality you have discussed so far is the superior quantity of Wikidata, and then only if you count articles vs. items and somehow pretend that this count is in any way meaningful.
In the past I blogged about errors in Wikipedia where I found an error rate of 20% in the links. What I want is tools to fix them. This is what my proposal is about. Just have us work on links and red links. It will improve the quality in both our projects and if this is not your thing, do not be bothered having us fix things. My proposal is to grow together, if you do not want infoboxes for now, I do not care. It is probably not the time yet. Thanks, GerardM (talk) 07:08, 16 January 2017 (UTC)
I think I can safely ignore you now. You took one article, found an error in 2 of its 19 links, and then claimed as title "#Wikipedia - a 20% error rate" and in the body "With such statistics it is obvious to make the argument that replacing links with links through Wikidata will enhance quality in the English Wikipedia.". For starters, 2 out of 19 is not 20% at all; it is just a bit more than 10%. Then, using one article is not "statistics", it is "anecdote". Finally, just like you do here all the time, you make the totally unwarranted claim that "replacing the links with links through Wikidata" would miraculously enhance quality, without any indication that this is actually true. I notice that you didn't edit the Wikipedia article, but at the time of your blog post added some items to Wikidata (like [8]), making this a self-fulfilling prophecy and an even less reliable blog (should I write a blog about items missing or wrong at Wikidata but better at enwiki, after first making sure that these "facts" are true by editing enwiki in this regard?). Basically, you are making your own truth and then presenting it to the world as if it were fact, with a rather extreme calculation error which just happens to make your point twice as strong. And your change to Wikidata has not improved enwiki at all, leaving the errors in the article. Fram (talk) 08:22, 16 January 2017 (UTC)
I am not talking about replacement. What I am after is that, at first, the links include both the item and the article. There is nothing miraculous in what I propose. It is how we can use the tooling that is possible through Wikidata.
When I write my blog, it is because I care about what we do. Our aim is to share in the sum of all knowledge and I know we can do a better job. The arguments are there; never mind whether it is 10 or 20%, it is by that number that we improve that one article. If that is not relevant, what is? Thanks, GerardM (talk) 12:00, 16 January 2017 (UTC)
"I am not talking about replacement"? My apologies then; it turns out that that blog post is even more useless than I thought, as a sentence like "it is obvious to make the argument that replacing links with links through Wikidata will enhance quality in the English Wikipedia." for some reason gave the very strong impression that you were actually talking about replacement. Perhaps first make up your mind what it is you actually want. I can't really understand what you are trying to say with the rest of your reply; all I notice is that you noticed a few problems in one Wikipedia article, then went to add some items to Wikidata, only to be able to then tell us (and this is for once not a quote but a paraphrase) "look how poorly enwiki is doing and how much more Wikidata has on the same, aren't we superior?". No, you aren't, not by a long stretch. Perhaps try to explain instead how the addition of the Wikidata link (before you actually added the Wikidata item to prove your point) would have prevented or corrected any of the problems in that one article you used to miscalculate statistics. Fram (talk) 12:46, 16 January 2017 (UTC)
"understand how Wikidata can help" That's what this discussion is trying to do. All you seem to offer is "Wikidata has an item, and if it hasn't you can create it". The question remains: how does this help enwiki? "this is not about recruiting at Wikipedia", but in the previous post you said "a new red link will prompt to add to an existing Wikidata item or to add a new item." Prompting people to edit Wikidata = recruiting at Wikipedia, no? You "often do not edit Wikipedia because of all the hassles I get. When the work I do is in line with what I do at Wikidata, it is more easy, more obvious." Fine, my experience is the opposite, but that's personal preference I presume. "When you consider duplicate effort, it works both ways." But this isn't true of course. You have to edit enwiki anyway, as there is a lot that is needed and wanted here but which can't be put in Wikidata (and Wikidata isn't intended to be "read" as an article anyway). E.g., as far as I can see, Wikidata only wants entries for the children of someone if these are also notable (otherwise they shouldn't get an item). But in a Wikipedia article, you often mention all the children someone had, whether they were notable or not. Enwiki simply has much more detail. So editing enwiki without editing Wikidata makes perfect sense; but the other way around is a waste of time: adding something there when you notice it missing here is a bit pointless. "By making it easy and obvious how things are changed and why we both gain in quality." After four years of Wikidata, how much quality gain has enwiki had from it? There is authority control, for what it's worth, and very little apart from that. Why would we believe that somehow Wikidata will make enwiki qualitatively so much better in the future when it hasn't delivered on that promise until now? Please present some actual examples, not just some vague hopes.
"In the example of IMDB there is one big difference; Wikidata is linked to any and all Wikipedias and consequently when best practice is there for all of us we all win in quality." How? Best practice at enwiki and other wikis is not the same as best practice at Wikidata. Wikidata accepts Find a Grave, Quora items, ... so it doesn't look to me as if we share "best practices" at all, or that Wikidata sets an example of quality we should import or follow. That Wikidata is linked to all Wikipedias is not a measure of quality; it is just a result of its first purpose, being a container for interwiki links. But that container has spawned a monster which seeks a purpose to give any value to its size and effort (and cost? No idea how much Wikidata has cost so far).
We are working for the readers, not for ourselves; and the readers are at enwiki, not at Wikidata. If Wikidata can truly improve enwiki, then feel free to give us actual examples of this. Just indicating that "Wikidata has more items and is linked to everything, so it is better and produces quality" is to me at least far from convincing, and doesn't match what I see when I go to Wikidata. Fram (talk) 21:07, 15 January 2017 (UTC)
"The behaviour and the way Wikipedia works remains the same for those who do not opt in." - this is 100%, absolutely, demonstrably not true. To take even a small example: any article that has an infobox that is now pulling data from Wikidata has the (strong) potential to pull in unwanted data, not to mention incorrect data. The article writer didn't have to "opt in"; infoboxes are "opt out" to begin with, and you also have to be editing the articles in question after the infobox got changed under you. Just as an example: I rewrote Hugo Award and got it to GA back in 2011. I did not put a country parameter in the infobox, and why would I? The award is a multinational, purportedly worldwide award that happens to have its trademark-holding body based in the US, so listing a "country" would be misleading at best. Not every field needs to be filled. Oh, but look! Apparently someone, at some point, added country=US into Wikidata (source: Wikipedia), so... now it shows up in the infobox. Not on my watchlist, of course, or in the page history; Wikidata edits get special privileges in that regard. The only way to get rid of it is to make a new edit to opt out of that one Wikidata item, or to delete the incorrect Wikidata item, though, of course, Wikidata tries to shame you into not doing so and gives no indication at all that trying to delete it twice will make it work the second time.
I'm all for turning Wikipedia's structured infobox data into a searchable database. But if Wikipedia pages are going to be based on that database (in part), then edits to that database need to be tied in completely to Wikipedia, not just halfheartedly imported. This is looking like just one more example of the Wikimedia project's knack for taking really good ideas and failing them with poor technical execution. --PresN 03:16, 16 January 2017 (UTC)
When you restrict yourself to my proposal, i.e. having items associated with wiki links and red links only, you will not see any results if you do not opt in. Thanks, GerardM (talk) 07:02, 16 January 2017 (UTC)
But you still haven't explained why we would (opt-in only) add a pointer to Wikidata (which certainly for bluelinks seems completely superfluous anyway; who on enwiki cares whether Wikidata has an item when we have an article here), and not e.g. a pointer to Google Books, Britannica, IMDb, whatever, since each of them may be more useful and/or more reliable than Wikidata. On Wikidata, yesterday it took more than an hour before anyone noticed that their page on Superman (not really an obscure page on enwiki) had been "moved" (renamed) to "UGLY". Why would we want to point to (never mind import data from) a site which is still so easily vandalized and clearly much less well patrolled? Fram (talk) 07:52, 16 January 2017 (UTC)
When you suggest that I do not care about quality for both our projects, you are a fool. When you suggest that Wikidata is no different than IMDB among others, you are foolish. As I showed before, with my proposal you do not need to experience how we will make a difference, the only question is will you let us do a better job for both our projects. Thanks, GerardM (talk) 12:03, 16 January 2017 (UTC)
This seems to be a reply to another post? I didn't say that you didn't care about quality here (although your actions show that when you identify problems on enwiki, your only concern seems to be to improve Wikidata to prove the point that it is somehow better); I said that you seem to consider quantity an absolute indicator of quality, which is nonsense. I also didn't say that Wikidata isn't different from IMDb, just like I didn't say that IMDb, Britannica and Google Books are the same (which would have been a foolish claim indeed). What I said was that there are a lot of sources we could link to, and you single out Wikidata for no clear reason, even though it is an unreliable site where most of the information comes from Wikipedia to start with. "As I showed before, with my proposal you do not need to experience how we will make a difference, the only question is will you let us do a better job for both our projects." You haven't shown anything so far, actually; you have just claimed that your proposal will somehow make things better for enwiki (it may make things better for Wikidata, by indicating which bluelinks don't have a Wikidata item already, but that is not what this discussion is about). "will you let us do a better job for both our projects." If you can convince me that you will do a better job for enwiki, yes, why not? So far nothing you said has given me the idea that you actually will do a better job for enwiki, though; you haven't given any examples of actual improvements which would happen through your proposal. Just give a few concrete examples of how enwiki would benefit from your proposal. Fram (talk) 12:56, 16 January 2017 (UTC)

What is the goal of this RfC ?

What is the goal of this RfC? Is it a real discussion where we list problems and find solutions, or a list of complaints from people who don't want to change their habits? We have already had two big RfCs in the German and the French WPs, and both communities were able to adopt WD with some rules. Why is WP:en so special that it cannot follow a similar process?

Some main problems with WD and possible solutions:

  1. Problem: WD has a large amount of unsourced data or data sourced from other WPs. Solution: Only use data from WD that has a source other than a WP. You can be even more selective and require that only references such as books, databases, newspapers or scientific articles be used if you don't trust data from websites. Data from WD vary in quality, but you can filter the data to take only what you want. So bad-quality data is not an obstacle, because you can analyze the data quality before displaying the data and discard what does not qualify.
  2. Problem: Data can differ between the values written in the wikitext and the values imported from WD. Solution: Display the source of the data so that readers can understand why the data differ (different sources, hence different values).
  3. Problem: Changes to WD items are not integrated into the watchlists of WP contributors, allowing changes to WP articles without any notice. Discussion: This is the same situation as for pictures from Commons or for templates in WP:en, and this was never a problem until now. If it really is a problem, then the same conclusion should apply to Wikidata, Commons and WP:en templates alike. Solutions: Limit the use of WD data to particular contexts such as templates, infoboxes, tables and graphics. Then use the watchlist option that lets you follow the items linked to articles on your watchlist.
  4. Problem: WD is an external environment with a dedicated interface. WP contributors don't want to use WD or find the WD interface difficult. Solution: Use only templates which give priority to local data. Templates have to display data present in the wikitext of the WP article first, and only if no local value is found can data from WD be used. WP contributors add data in the wikitext using the old system, and this data then takes priority over WD data.
  5. Problem: New templates using WD data are ugly. Solution: Not a WD problem; this is an internal WP process. You have to define a process to discuss and authorize the use of new templates, but this should already exist. If not, don't blame WD.

In summary, I just want to add that WD is not forcing you to display any data. WP:en can choose through data selection what is displayed (you can even specify that only data from predefined books or databases can be displayed). The above solutions will solve most of the current problems and will allow WP:en contributors to use WD. WD has to improve its protection against vandalism and its data quality, but this is ongoing. WP articles don't need to reach FA-Class a few days after their creation, so if you allow contributors to start with stub articles, why do you have a problem with the teething problems of WD? Snipre (talk) 23:53, 21 January 2017 (UTC)

This isn't an RfC. The intention of this discussion is to inform a big RfC that will determine consensus on how enwp uses and doesn't use WD - in other words, the "rules". Nikkimaria (talk) 00:18, 22 January 2017 (UTC)