
Wikipedia talk:Wikidata/2017 State of affairs/Archive 1


T152743

Might want to mention phab:T152743 somewhere as it has the ability of loading us with significant responsibility for Wikidata content if done improperly. Jo-Jo Eumerus (talk, contributions) 15:13, 11 January 2017 (UTC)

Wikidata is used when specifically requested on an article (opt-in)

@Fram: {{Video game reviews}} has an opt-in style, in that the parameters mc and gr can specify "wikidata" to pull from there. -- ferret (talk) 15:14, 11 January 2017 (UTC)

WP:VG guide to Wikidata

I'm unsure if this would be useful for the goals of this page, but I wrote a guide for WD at WP:VG/WD. Much of my Wikidata knowledge came from template/module work with a lot of help from Izno and RexxS. -- ferret (talk) 15:31, 11 January 2017 (UTC)

Wikidata community notified

I've posted a neutral notification about this page, on Wikidata: d:Wikidata:Project chat#English Wikipedia RfC on Wikidata planned. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:54, 12 January 2017 (UTC)

Identifiers and other databases

IMO, one of the major advantages of Wikidata is the ability to interface with (or more precisely to reference) other databases. Good examples of this are found in the 'Identifiers'. A lot of the data can be imported, but some gets added manually. An example is the identifier I added to the Wikidata entry for Margaret Thatcher here. This allows Wikidata to function as a central repository for such data, making it easier to handle it cleanly independent of, and outside of, Wikipedia. Having said that, some of the larger Wikidata pages are quite large now, such as Winston Churchill. Another example is the Commonwealth War Graves Commission IDs, an example of which is here. Wikidata may also make it easier to track use of external links to such databases that are used in references. One example (an idea for a project that will take a long time) is here, but you could do that for almost any set of external links referencing a database. Carcharoth (talk) 17:21, 12 January 2017 (UTC)

Request to add category

I can't edit this page. Can someone add Category:Wikidata? ---Another Believer (Talk) 21:57, 12 January 2017 (UTC)

 Done--Ymblanter (talk) 22:00, 12 January 2017 (UTC)

Integrity and quality of the summary

Fram Thanks for compiling this. I have seen many of these issues here and there but it is interesting to me to see a compilation like this. It is cleverly presented. I am impressed with the choices of subsection headings, because although the headings seem obvious after I read them I was never thinking of these features in this way before.

You said here and at WP:PUMP that this information could be used as a guide for future policy discussions. I appreciate what you did here, and so far as I can see, everything seems correct and in order, but may I ask if you can think of a way to arrange additional review of this compilation from knowledgeable people? I would not know offhand who should be asked, but what I am hoping for is for a few people to each say, "I know something about this because I have been involved in these projects... I looked over this page and it seems like a good summary to me... These parts are not controversial... This part is somewhat controversial... use it as a summary to help people join conversations." I am shy about saying too much because some of these conversations have been deep and I am not so aware of who the most involved participants have been in these many issues which each have their own story and history.

I wish to ask for additional opinions because this is a lot of information, past RfCs related to Wikidata have been confused, and if possible I would like to increase the amount of trust that people have in this list. Thanks. Blue Rasberry (talk) 15:28, 11 January 2017 (UTC)

I am happy to go over it with someone if you want me to. --Lydia Pintscher (WMDE) (talk) 01:55, 13 January 2017 (UTC)

No discussion on main page please

The intention was that both sides (and everyone in between) could list their real or perceived benefits and disadvantages, not that every entry would turn into a yes-no shouting match. The aim is an easily readable list of what people think about the use of wikidata on enwiki, no matter if they are right or wrong.

One can endlessly argue whether "Wikidata is easier to train new users" or not. Most enwiki users never were "trained", they just "did". I went to Richard Burton (an example given by another editor), noticed that the spouses were listed in a seemingly random order (neither alphabetical, chronological, nor by length of marriage) and tried to reorder them. I can't find a way to do that. On enwiki, this would be easy. If such things are not in a logical order, it's much harder to check them for completeness (with five entries, it's still OK, but with 20 entries it would become very hard). I then looked for Willy Vandersteen, and noticed as the second entry in the search box this, which is to me, as a new Wikidata user, a completely useless entry which I would like to nominate for deletion (I'm also unable to find it at the supposed location anyway[1]). Does this option even exist there? Anyway, going then to my actual target, Willy Vandersteen, I notice that his notable work is "Q15874400". Really? How useful. This links to [2] which is one comic from a less notable series, created by his studio, but far from his most notable work. As a new user, I can edit/add other works, but have no idea whether I now have to add such an esoteric Q number as well, or whether I may simply add "Suske en Wiske" (or should I add "Spike and Suzy"? No idea). Listing all his comics would create a list of 500+ Qnumbers... His "languages" are wrong (only listing his second language), but here at least I presume I could correct it easily if I wanted to, although presumably his first language (Dutch) would be listed second then.

As an experiment, I tried to add his spouse. When I just give a name, I can click "save". However, I wanted to add a reference, a book. All I get, no matter what I try, is "no match found", and I can no longer save the property. So either I will add an unsourced property, or I won't add the property at all, if my only source is a book? Congratulations, you have trained this new user to avoid Wikidata at all costs. This also contradicts the "Citations are more robustly added" claim, as they are actually harder to add.

Should I add all this to the claimed benefits "easier to train new users" and "Citations are more robustly added"? It would make that section harder to read and to use afterwards.

Please, either start a discussion section below the list of benefits/disadvantages, or bring the discussion here. Fram (talk) 08:24, 12 January 2017 (UTC)

  • @Fram: Did you ever ask for help? --Izno (talk) 12:30, 12 January 2017 (UTC)
    For my enwiki editing? Not for such simple tasks, no (as far as I remember; I can't find cries for help among my first edits, which include creating new pages and so on). For more complicated template tasks I probably have. Fram (talk) 12:39, 12 January 2017 (UTC)
    For your Wikidata editing. Perhaps that was not implicit in my question. --Izno (talk) 12:42, 12 January 2017 (UTC)
    That's my point... I didn't need to ask for help for my enwiki editing, but can't see how to do some pretty simple things at wikidata. On enwiki, I would look at the wikisource, or at a diff (ah, "this" change in the page is caused by "that" diff). On Wikidata, I can see very little that is useful in a diff. In the help pages, I can find that to use anything as a source, you first have to add it as an item. However, the help page on items then tells me that I may only add notable things (things with a page on another wikimedia site) as items. So a book that doesn't have a page anywhere else yet (or a newspaper article or a paper in a journal) may not be added to wikidata, and thus can not be used as a source. Luckily that page (which should be one of the central help pages of Wikidata!) is "outdated"... Luckily we have the Wikidata Notability policy? Reading this, it appears that I still am not allowed to make an item for my source (if I was so inclined, I have long since given up in reality), as it doesn't have a "valid sitelink". Fram (talk) 14:16, 12 January 2017 (UTC)
    Items for sources are always provided for since that is a "structural need" (review N#3), though it seems you have identified at least one inconsistency. Have you asked about those inconsistencies prior? It sounds like you are an experienced wiki-text user, and so coming to Wikidata is contrary to your wiki-expectation. What about completely new users? Do you think Wikidata or wiki-text would be easier to figure out? Is there, say, a question of age, or gender, which may impact that answer? --Izno (talk) 14:32, 12 January 2017 (UTC)
    Certainly a woman my age can't possibly know how to use Wikidata!! Victoriaearle (tk) 14:45, 12 January 2017 (UTC)
    @Victoriaearle: Certainly, indeed. ;^) --Izno (talk) 14:52, 12 January 2017 (UTC)
    Apologies for being sarcastic. I think Wikidata is opaque and counterintuitive but the suggestion about age & gender shouldn't be part of this discussion imo. Victoriaearle (tk) 15:37, 12 January 2017 (UTC)
    For me at least, wikitext was much easier to figure out than Wikidata (and wikitext can even be edited by an IP!). Some things (like correcting a typo) are probably easy and perhaps even easier on Wikidata. But the vast majority of things seem to me easier on enwiki. No idea what age or gender have to do with this though. Fram (talk) 15:15, 12 January 2017 (UTC)
    @Fram: The demographics of (en?-)Wikipedia are well-researched; editors are predominantly young affluent white men from first-world countries. My suggestion is that editing Wikidata may have similar demographics, or different demographics, for whom Wikidata is easier to edit. Toss in the delta between wiki-editing and Wikidata-editing and that's another divide of users potentially. --Izno (talk) 15:33, 12 January 2017 (UTC)
    Nothing in this pre-RFC is an attempt to stop anybody from editing wikidata. I don't see the relevance of this for this discussion at all. Will enwiki attract more non-white, non-first world, and/or women editors because they now have to know and edit two environments when they want to create or significantly edit an enwiki article? I seriously doubt this. Will wikidata have more editors from these groups? No idea, no interest, and hardly relevant for here. Or are there editors who want to add content to enwiki without actually editing enwiki? I don't think that will be seen as a benefit of wikidata by many... Fram (talk) 15:40, 12 January 2017 (UTC)
    All I'm suggesting is that you (and you, Victoria) may be part of some demographic who finds Wikidata "hard" to intuit (whether you're in that demographic by choice or not). I personally find it "easy". The current main page in both the "(dis)advantages" sections (and the user story above) cannot categorically be said to be the experience of all, much less many, though perhaps some, users. I think right now it's very much needlessly subjective statements being made in those sections which may be particular to certain users or demographics of users, and that we should change how we're approaching those sections. --Izno (talk) 16:27, 12 January 2017 (UTC)
  • Well, this is rather the problem. Proponents of wikidata basically deny that the disadvantages exist, or say they are the fault of everyone else, the fault of the use of wikidata, that the disadvantage doesn't actually exist, etc. So far they can't even admit the basic fact that when wikidata is being imported into articles directly it's de facto being used to source content. So literally anything negative that is put on the front is going to be haggled over. Only in death does duty end (talk) 12:43, 12 January 2017 (UTC)
    @Only in death: Jo-Jo Eumerus's contributions list on Wikidata is bereft of anything but page deletions and moves on en.WP. Are you suggesting he is a Wikidata proponent because he also contested your definition? Am I solely a WD proponent? Is RexxS? Categorizing people seems entirely at cross-purposes to the stated goals (and certainly advertised purpose!) of this page. --Izno (talk) 13:28, 12 January 2017 (UTC)
    @Only in death: If you substitute "Wikimedia Commons" for "Wikidata" in this so-called problem you talk about, you will find that this reflects the situation back in the days when Commons was young. It's early days, and until you get your feet wet in Wikidata, you should cut it some slack. Jane (talk) 13:30, 12 January 2017 (UTC)
    How long will Wikidata be considered "young" and "early days"? It's more than 4 years old now, and has been in use here for nearly as long. Until it is mature (if ever), it should be used here hardly if ever (just as valid an argument as "you should cut it some slack" just because it is "young") Fram (talk) 14:16, 12 January 2017 (UTC)
    Just as long as the anti-Wikipedians, or the anti-Commonists, so at least twice as old as it is today. As in all things on Commons, some parts of it are more useful for Wikipedia than others (and the other way around). Jane (talk) 14:39, 12 January 2017 (UTC)
    I don't understand. What does "as long as the anti-Wikipedians" mean? Which "anti-Wikipedians"? Fram (talk) 14:52, 12 January 2017 (UTC)
    I meant the anti-Wikipedians on Commons. There is a lot of wikihate between these projects, and a lot of it has to do with the problems of multi-lingual communication. Because of the multi-lingual interface on Wikidata, I find there is a lot less animosity in general there. Jane (talk) 16:14, 12 January 2017 (UTC)
    Your facts are a bit incorrect regarding English Wikipedia. Wikidata (phase 1--inter-language links) was first adopted here by this bot task in February 2013 shortly after the announcement of phase 1. Phase 2 (data use) was first adopted here in April 2013. Non-substantial use began almost immediately with the various and sundry external links templates (if not earlier--I'm sure we can dig). However, I'm not aware of any Wikidata use in e.g. infoboxes or lists until Mike Peel started working on {{infobox telescope}} in June 2015 (and even then it was predominantly testing on isolated pages). Perhaps there was an infobox or list page using Wikidata earlier, but the claim "It's more than 4 years old now, and has been in use here for nearly as long" does not tell the whole truth. --Izno (talk) 14:43, 12 January 2017 (UTC)
    Which parts are incorrect? February 2013 = nearly 4 years, which is what I said. I didn't claim that all uses of Wikidata here date to that period of course. You could have added your timeline without the rather insulting "incorrect" and "does not tell the whole truth" bits. If tomorrow a new use of wikidata is started here, it doesn't magically make Wikidata on enwiki brand new again. Fram (talk) 14:52, 12 January 2017 (UTC)
    The comment to which I was responding seemed intent to mislead the innocent reader into thinking that all of the current functionality was implemented 4 years ago. It seems fairly evident to me that the interwiki linking portion is solidly mature (does anyone dispute that management of interwiki links is better off centralized?), and that the reason some other portion is not mature is because users here have not worked here to adopt Wikidata. --Izno (talk) 15:24, 12 January 2017 (UTC)
    "Seemed intent to mislead". Great AGF there. The immaturity of wikidata is now apparently the fault of the users here (at least, that seems the intent of your "fairly evident" post)? You really are not going to convert anyone to wikidata with such statements. Speaking of "intention to mislead", I've again removed the "official support" on phabricator for wikidata lists on enwiki bit on the pre-RfC page, it strongly suggested that somehow the use of such lists was officially supported by phabricator, which is a) not true and b) not authorized to support or oppose such things anyway. I'm just trying to protect the innocent readers here of course. Fram (talk) 15:44, 12 January 2017 (UTC)
    I'm not trying to say that Wikidata is immature and that's Wikipedia's fault (there's a separate discussion) but instead that "and has been in use here for nearly as long." is a statement about the maturity of Wikipedia's use of Wikidata's data, and that it's important to understand the context of that statement. Indeed, perhaps I should not have been so aggressive. --Izno (talk) 16:16, 12 January 2017 (UTC)
    @Izno: look at when Module:Wikidata was created, that will tell you when @RexxS: and others started looking at wikidata-driven infoboxes. I came along a bit later and was focused more on trying to have 'complete' infoboxes from Wikidata. Thanks. Mike Peel (talk) 16:03, 12 January 2017 (UTC)
    @Mike Peel: I'm aware that it was created at an early date, but I doubt that it was used to any great extent outside of the external-links and identifier templates before you set to making infobox telescope work. --Izno (talk) 16:16, 12 January 2017 (UTC)
    That's perfectly true. I have always tried simply to create the code needed for others to use in upgrading infoboxes. Just for info on the timeline, I created a demo of a self-populating infobox at Template:Infobox video game series/Wikidata following the extended discussion at Wikipedia talk:Wikidata/Archive 2 #Auto-population of wikidata into infoboxes in August 2013. --RexxS (talk) 17:18, 12 January 2017 (UTC)
  • @Fram: First, for people who are principally used to using word processors (most of us), I would expect editing Wikitext to be easier than editing Wikidata. For those who are used to dealing with databases, I would expect editing Wikidata to be at least as easy, if not easier. If you spend time teaching new editors how to edit, you'll see how steep the learning curve is for Wikipedia. There is a similar set of things to learn about Wikidata which may not be obvious to anyone not used to working with databases. The fact that Richard Burton's spouses are stored in a random order in Wikidata's underlying database is not surprising; I do find it a pity that the interface you see when viewing Wikidata does not sort them into some sort of order, but it's likely that the perceived benefits of doing that were outweighed by the effort involved when the interface was designed. A phabricator ticket might get that addressed at some point.
  • Secondly, the philosophy of Wikidata is that items in the database should link to other items in the database wherever possible (cf. Wikilinks). That means you may have to add a book as an item before you can use it as a reference. At least having done that, the next person who wants to use it as a reference will be able to use it immediately. Optionally, you can use ISBN as the property for the reference and the book's ISBN number as its value.
  • Thirdly, Willy Vandersteen (Q543908) has a notable work, a strip comic called De koekjesclan ("The Cookie Clan") in Dutch. If you view that Wikidata entry, d:Q22259605, you'll see that whoever added it did not supply an English label, although you can see the Dutch label by clicking on [All entered languages]. An article exists on the comic, but only in Dutch: nl:De koekjesclan. So without an English article or an English label, what do you expect to see when referring to the entry? You get Q22259605, as that's all there is to give you. The onus is on English-speaking Wikipedians to give it an English label. Why don't you try doing that, rather than the knee-jerk reaction of deletion? It's obviously not going to be deleted because it's notable and has an entry in Dutch. The rest of the world doesn't owe English any special treatment.
  • Finally, please feel free to add to Willy Vandersteen (Q543908) - you'll only improve it. Add "Suske en Wiske" or "Spike and Suzy" by all means; they are just different language labels for the same Wikidata item, Spike and Suzy (Q1240821). Add all 500+ comics if you're so inclined. A query on that could easily sort them into whatever order is wanted for use outside of Wikidata. Sorry for being so verbose, but you raised so many points. I just hope what I've written helps. --RexxS (talk) 16:49, 12 January 2017 (UTC)
    • About your first point, I'm well used to working with databases (mainly SQL ones). How things get stored there (e.g. the order of things) should have no impact on the user-facing side of things, like the order of spouses or the order of the properties on a page: on long pages it is hard to see which properties are there and which are missing (and which are near-duplicates).
    • I'm not really interested in turning this into a Wikidata tutorial, I just wanted to show some very basic aspects of it which are really not all that simple and straightforward. Forcing people to create another item (without any obvious indication for new editors that this is required) just to be able to source something is more than enough to turn away many would-be contributors (most of whom will come from a wikipedia anyway). Wikidata as it stands is very newbie-unfriendly, much more so than Wikipedia (which has its fair share of problems in that regard as well). This extends to things like the search function, where you look for a word and get results like Q123456.
    • Third: you miss my point: that one is not a "notable work", it is a work first published in 2012 (nearly twenty years after his death), created by his studio (not by him). It is close to being the least notable work one can find which is related to him. But even so, seeing the Dutch label would have been a lot more helpful than getting a Q number. "The onus is on English-speaking Wikipedians to give it an English label. Why don't you try doing that, rather than the knee-jerk reaction of deletion? It's obviously not going to be deleted because it's notable and has an entry in Dutch. The rest of the world doesn't owe English any special treatment." Speaking of knee-jerk reactions, perhaps reread my post first? I never suggested deleting this entry (although "it's notable" is very debatable, "it has been published and is listed at comic shops" is about the extent of its notability). I suggested deleting [3] as an unverifiable item (with no entry on any wiki as far as I could see). But my comment about the Wikidata entry, and your reply, really show one of the main problems with wikidata for me. I am not interested in maintaining the information on Willy Vandersteen on two sites, with two different cultures and interfaces. I would define his "notable works" as his four or five main series, not his 1000+ individual comics (and I would certainly not start with one of his least notable, which just happened to be recently published: spam?). If we would populate the infobox field "notable works" on enwiki with the Wikidata values, this would make a serious difference. But one of the problems with the Wikidata concept is that in this case, his most notable works in Germany are completely different from his most notable works in Belgium.
    • About your final point: thanks, but why would I? People who are interested in Willy Vandersteen read his entry on Wikipedia, not on Wikidata. Adding 500 or 1000 comics to the Wikidata page would be a lot of work for very little or no benefit at all. Adding his five main series but leaving that one anomalous recent comic would create a very lopsided effect. Removing that one comic would probably be frowned upon (I have no idea of the Wikidata culture and little interest in getting to know it if some of the zealots here are representative of it). So I just leave it alone and hope that as little as possible of it will ever impact the articles here. Fram (talk) 08:14, 13 January 2017 (UTC)
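
Illustrative aside: the point raised in this thread, that the order in which statements are stored need not dictate the order in which they are shown, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not Wikidata's actual interface code: the record layout, the field names `value` and `start_year`, the function `sort_for_display`, and the spouse entries themselves are all hypothetical placeholders rather than real data.

```python
# Hypothetical statements as they might come back in arbitrary storage order.
# Names and years are placeholders for illustration, not real data.
statements = [
    {"value": "Spouse C", "start_year": 1983},
    {"value": "Spouse A", "start_year": 1949},
    {"value": "Spouse D", "start_year": None},  # no start date recorded
    {"value": "Spouse B", "start_year": 1964},
]


def sort_for_display(items):
    """Return items in chronological order; undated entries go last."""
    return sorted(
        items,
        key=lambda s: (s["start_year"] is None, s["start_year"] or 0),
    )


# Display order is derived at read time; the stored order is left untouched.
ordered = [s["value"] for s in sort_for_display(statements)]
print(ordered)
```

The design choice here is that sorting happens in the presentation step, so the same stored data could equally be shown alphabetically or by any other key without being rewritten.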

Citation needed!

It would be really useful for there to be citations, evidence, or pointers to other discussions rather than lines of accusation. There is an amount of commentary that is unsupported, and as such should not be given weight. We can all make claims without evidence, and I would think that our practice would be to not do that. Much of the accusation thrown at Wikidata is the same accusation that has been thrown at Wikipedia about accuracy somewhere in the continuum of each's life cycle. So where such accusations should be fairly given, the repatriation should also be noted, how it was undertaken and any lessons learnt. — billinghurst sDrewth 12:09, 13 January 2017 (UTC)

Feel free to list here all commentary you perceive as unsupported. I presume much of what you refer to are the benefits and disadvantages sections, which are intended to contain opinions, which shouldn't be ignored even when they don't have a source attached or haven't been previously discussed. Ignoring the popular opinion about such things is what has led to the failure of multiple WMF projects here; whether e.g. Visual Editor is better or worse for newbies than wikitext is an opinion which is very hard to provide evidence for, but if enough people believe it to be worse, it remains a niche product or (like Flow and Gather) gets disabled completely (I know that Wikidata is not so much a WMF-pushed project here, it's only given as a comparison). So, discussion, evidence, ... is all very welcome, but not giving weight to unsupported commentary would be a rather shortsighted thing to do. Fram (talk) 12:49, 13 January 2017 (UTC)

Claim that Wikidata overrides enwiki

The heading "Wikidata overrides enwiki" and the material currently below it are misleading if not false; for example a local value in {{Authority control}} overrides any value in Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:57, 12 January 2017 (UTC)

Then correct it! Pages that used Listeriabot certainly had that problem, and I have seen plenty of suggestions to do the same for other templates and pages (but perhaps none have been implemented so far). Fram (talk) 14:00, 12 January 2017 (UTC)
I was bold when Andy made the comment. --Izno (talk) 14:13, 12 January 2017 (UTC)
@Fram: Listeriabot does overwrite the tables it generates, but the wikitext once you start editing very clearly states NOT to edit the table but to edit the underlying data if you want to make changes, at least in the case I have used it (for a test on wikidata, not on enwiki, so maybe the template works differently here?). It would really be helpful if more enwiki and other wiki people felt free to make those edits (with their sources!) at wikidata. Is there something that could be done to make this easier? Doing it at wikidata makes the information available to wikipedias in every language, not just enwiki, so you are greatly multiplying the impact of your edits whenever you do something at wikidata. ArthurPSmith (talk) 15:23, 12 January 2017 (UTC)
The fundamental problem with that implementation of ListeriaBot (and who-knows regarding the WMDE implementation of the same) is that you can't add sometimes-necessary parenthetical statements to understand inclusion of a list item, or other clarification about some data regarding a particular list item (ref List of multiplayer online battle arena games). Basically, I can't "qualify" on Wikipedia with information that Wikidata can't (or mostly, won't) support from a community standard there. --Izno (talk) 15:30, 12 January 2017 (UTC)
(ec)"It would really be helpful if more enwiki and other wiki people felt free to make those edits (with their sources!) at wikidata." This is about how wikidata is used "on enwiki". It may be really helpful "for wikidata" if these edits are being made there, but at the deletion discussions for these pages, people have actually done this and still concluded that they much preferred the lists to be purely enwiki (as they are now). Everyone is free to reuse enwiki data on wikidata and on other wikipedia versions of course, but I (and apparently many others) am not inclined to edit both enwiki and wikidata whenever I add or expand an article, to check which elements belong here and which belong there in the Wikidata philosophy. If I have the choice between editing two different environments or one, then I choose one. And since I can add complete articles in enwiki, and only some elements in Wikidata, I choose enwiki. If I then come across articles with errors, where I am instructed to edit these at wikidata instead of at enwiki, then Wikidata doesn't seem like something beneficial I want to edit, but a hindrance.
Creating such a list on the talk page, or on a project page, to see which entries we (enwiki) are missing, is of course something completely different. In that case, wikidata is an editing aid, not an editing hindrance. Fram (talk) 15:34, 12 January 2017 (UTC)
It's true that it would be helpful if editors here felt some association with the Wikidata entry that corresponded to the articles they are interested in. That's because they would be the best people to steward those Wikidata entries. But it's not compulsory, and in a volunteer project, editors must feel free to prioritise how they use their time. Nevertheless, there will be other editors who think that spending time improving Wikidata is a useful exercise purely because it potentially benefits every Wikipedia, and en-wp gets the benefit as well. I wouldn't want to discourage them from doing that, just as I wouldn't want to give Fram the impression that he has to fix all the problems with Wikidata himself. Eventually somebody will do whatever job is needed. Incidentally, I do strongly disagree with using a bot to overwrite existing entries on English Wikipedia with Wikidata-generated data. If we want list articles based on Wikidata, we need to wait for the development of the Wikibase API to allow us to create those lists directly and dynamically from Wikidata, with local values taking precedence, in the same way as we do for infoboxes. --RexxS (talk) 17:03, 12 January 2017 (UTC)
I tried this and it was an interesting, though not very enjoyable, experience. After seeing Wikipedia:Articles for deletion/List of women linguists I went through about the first half of the list considering each person having a linked article on enwp. Where it seemed to me (frankly subjectively) that the subject was not a linguist by profession I removed the occupation from the Wikidata page. See my 28 December 2016 edits at https://www.wikidata.org/wiki/Special:Contributions/Thincat and my comment at Talk:List of women linguists#What's a linguist?. I really wish I had been able to go to a relevant talk page on Wikidata to explain my actions. I think the underlying problem was that someone had categorised translators as linguists.[4] This may be due to the editor not being a native English speaker but the word is admittedly ambiguous even to a native speaker (unless tied closely to "occupation"). A large number of people were wrongly described. However, where the people did not have articles on enwp (or were not linked as such) I could not face delving into foreign wps to see what was going on there. In a very few cases I removed an inappropriate linguist category from enwp. Meanwhile the enwp list article had seriously, seriously inappropriate contents with a bot continuing to overwrite people's well-meaning edits to the wp article. This was the effect of my changes. No Wikidata enthusiast seemed willing to do anything other than alter the bot's detailed working such as removing from the list people who were classified as both translators and linguists. This was also inadequate because some linguists are indeed translators. No one edited Wikidata to reflect back the changes being made to the list which were being reverted by the bot.
I agree that dynamically generated lists would be an excellent facility.
Some unsolicited suggestions for Wikidata enthusiasts: I find that Wikidata editors are too tightly focussed on acquiring data by automated and semi-automated processes and are much less interested in checking individual details. The referencing and commenting on edits is very weak by wp standards so other folk cannot adequately review what has been done. Also, I suspect that once data has been "automatically" acquired it is not being automatically reviewed. See wikidata:Wikidata:Project chat#Item without sitelink with imported from (P143)= wikisomething... where error corrections were being reverted. It is good to see people so enthusiastic about Wikidata but this seems to be leading to strident rather than thoughtful advocacy on Wikipedia and this is doing more harm than good. Thincat (talk) 23:23, 12 January 2017 (UTC)
"I really wish I had been able to go to a relevant talk page on Wikidata to explain my actions" Given that every item on Wikidata has a talk page, just like every article on Wikipedia; and that Wikidata has "Project Chat", an equivalent to our "Village Pumps", what prevented you? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:34, 13 January 2017 (UTC)
Thank you. I have asked at wikidata:wikidata:Project chat#Linguists and I hope you will be able to contribute to a substantive reply. There are too many item talk pages to post in those locations. Thincat (talk) 13:22, 13 January 2017 (UTC)
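
The precedence rule mentioned earlier in this thread (a dynamically generated list in which local values take precedence over Wikidata values, as with infoboxes) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `resolve_field` and `resolve_row` are hypothetical names, the dictionaries stand in for a locally written row and a row fetched from Wikidata, and none of this is part of the Wikibase API or of any existing template or module.

```python
from typing import Optional


def resolve_field(local_value: Optional[str], wikidata_value: Optional[str]) -> Optional[str]:
    """Return the local value if one was supplied, otherwise the Wikidata value."""
    # A non-empty local value always wins, so a fetch never overwrites an editor's entry.
    if local_value is not None and local_value.strip():
        return local_value
    return wikidata_value


def resolve_row(local: dict, fetched: dict) -> dict:
    """Merge one list row field by field, preferring locally supplied values."""
    fields = sorted(set(local) | set(fetched))
    return {name: resolve_field(local.get(name), fetched.get(name)) for name in fields}
```

With a rule like this, a field left empty locally is filled from Wikidata, while a field an editor has filled in locally is never replaced by the fetched value.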

Data quality comments

My Wikidata activity has tended to be occasional adding of interwiki links. After reading this discussion, I examined the Wikidata entries for a couple of political philosophers. There were errors on both datasets: (1) Adam Ferguson's Place of Birth was asserted as Perth rather than Logierait (the drive between these being around half an hour on the modern main road). This error had been distilled from "Logierait bei Perth" in the de article text, presumably because it lacks an "Atholl" article to provide context for Logierait (and Atholl according to the en article was not even in Perthshire at Ferguson's birth). So an incorrect positive assertion had been introduced from imprecise text in another Wikipedia. (2) The data on Max Stirner was asserting "Stirner" as Family Name, whereas it was a nom de plume behind which Johann Kaspar Schmidt sought shelter. So it appears a data upload did not cope with someone primarily known through a pseudonym and turned it into an incorrect positive assertion. Both individuals are of course long gone, so there are no WP:BLP issues, but such errors may be indicative of wider Wikidata capture issues which could affect use of the datasets. (Incidentally, the lack of a proper edit summary on Wikidata feels weird after more than a decade of providing rationale for edits on Wikipedia.) AllyD (talk) 12:10, 13 January 2017 (UTC)

How fortunate that Wikidata is a wiki, and you were able to fix Ferguson's birthplace (in Adam Ferguson (Q183094) - please give links when referring to pages other than this one). The second of your examples (Max Stirner (Q76725)) would appear to be semantically correct; the name "Max Stirner" has two components, one is a given name ("Max") and the other a family name ("Stirner"), even if they are pseudonymous. That said, it might be better to split the item, and have one for the real person and another for the pseudonym. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:23, 13 January 2017 (UTC)
Wouldn't diffs be even better than links? You can see that a reference was added here. Then we could actually talk about referencing standards at Wikidata. Similarly, this diff removed a data point. Not even en-Wikipedia has worked out yet how to reference a removal, though maybe it might be an idea to reference negatives. Carcharoth (talk) 14:43, 13 January 2017 (UTC)

Disadvantages of using Wikidata on enwiki

  • Wikidata is not as inviting to untrained new users as Wikipedia
  • Fear of something new
  • Patrolling on enwiki is not integrated with patrolling on Wikidata
  • Fear of corruption
  • Wikidata content is of variable reliability. By its nature (scooping data from multiple wiki-projects, each with its own varying standards) it cannot be treated as a reliable source. There are still problems with circular sourcing (Wikidata taking data from Wikipedia which is then being used to fill in content on Wikipedia). Only in death does duty end (talk) 15:42, 11 January 2017 (UTC)
    • "It can't be treated as a reliable source" <- where is it being used as such? --Izno (talk) 16:09, 11 January 2017 (UTC)
      • Whenever it is being used to insert content on ENWP articles that is unsourced in the ENWP article (this is not a hypothetical, this is actually happening). Wikidata is not a reliable source. Data *in* wikidata may be reliable depending on where it is from and how it was verified. Because Wikidata rips data from a wide selection of locations, those projects have wildly different sourcing requirements (compared to our WP:V, WP:RS etc requirements) so data cannot be trusted unless it is manually checked against our requirements. If it is incorrect, it may still be overwritten on Wikidata from elsewhere, Wikidata itself does not have the same policies we do etc etc. The 'list' above Fram was referring to was where some bright spark thought it was a great idea to have a bot create list articles in article space that had zero references, contained biographical data about living people (some non-notable) entirely ripped from Wikidata data. Only in death does duty end (talk) 16:24, 11 January 2017 (UTC)
        • I understand a reliable source to mean a work I am using to source information from, not the information itself (ref WP:V#Reliable sources for other definitions, though the one I am using suffices for all three in this case). We are adding the information itself to Wikipedia (through Wikidata), regardless of the source of that data. Do we share a definition of the meaning of reliable source? If we do, then it is not being used as a reliable source, and especially not in the sense that most people use the term "source" or "reliable" (as in relation to a work cited for some interesting content). Does that make sense to you? (I do not comment on whether the data itself is coming from some root reliable source.) --Izno (talk) 16:38, 11 January 2017 (UTC)
          • The 'source' of information is whatever is being used as a source to add content. As a matter of practice, the information being added to wikipedia is coming from wikidata. It is irrelevant what the ultimate source is, much like we (rarely) question reliable secondary sources on their information, because they are WP:RS. Wikidata is not. Only in death does duty end (talk) 17:01, 11 January 2017 (UTC)
            • Perhaps worth noting that Wikidata is technically content and not source for Wikipedia when used here. To make an analogy, we are allowed to include templates and images in Wikipedia articles even though they are not usually reliable sources, but they need to be themselves reliably sourced. Meaning that the lacking sourcing culture on Wikidata is a problem. Jo-Jo Eumerus (talk, contributions) 17:20, 11 January 2017 (UTC)
                • No, it's not; that's the main problem. 'Content' is what is on the wikipedia article at ENWP. 'Source' is where that content came from. Wikidata is a repository of content, but once that is used on Wikipedia, it becomes the source for Wikipedia. Wikidata's lack of a vigorous sourcing culture is because it sees itself (correctly) as a content depository and other projects as the source of its data. That data may or may not be reliably sourced, depending on where it came from. Only in death does duty end (talk) 17:28, 11 January 2017 (UTC)
                  • No, that's quite wrong. The statement that Richard Burton's occupation is an actor may be fetched from Wikidata, but the source is the NY Times, a reliable source for that fact. Wikidata is never the source of the facts. Now it's true that there is no reference for Burton being a stage actor, i.e. nobody has added where that fact came from, but that doesn't make Wikidata the source – it's just the medium where the information is stored. Also, you may find it alarming that Burton is also described as a film actor with the reference being "imported from Dutch Wikipedia", but that doesn't make either Wikidata or the Dutch Wikipedia the source for that any more than the paper that the NY Times is printed on is the source for him being an actor. --RexxS (talk) 18:05, 11 January 2017 (UTC)
                  • The source is wherever the information was sourced from. For content that is inserted on wikipedia, if the content comes from Wikidata, wikidata is the source. Unless an explicit reference is provided for where the information is actually sourced from. Only in death does duty end (talk) 18:17, 11 January 2017 (UTC)
                  • No, the source is where the information originated. For content that is inserted on Wikipedia, if the content comes from Wikidata, the source is wherever the editor who added it to Wikidata sourced it from. Whether or not they bothered to explicitly make a note of where that was or not, Wikidata is never the origin of the information. --RexxS (talk) 18:29, 11 January 2017 (UTC)
                    • Maybe Wikidata is properly considered a republisher or content deliverer (not quite the same thing)? See the 'via' parameter used in the {{citation}} templates:

                      via: Name of the content deliverer (if different from publisher). via is not a replacement for publisher, but provides additional detail. It may be used when the content deliverer presents the source in a format other than the original (e.g. NewsBank), when the URL provided does not make clear the identity of the deliverer, where no URL or DOI is available (EBSCO), if the deliverer requests attribution, or as requested in WP:The Wikipedia Library (e.g. Credo, HighBeam). See also registration and subscription parameters.

                      Carcharoth (talk) 14:31, 13 January 2017 (UTC)
                      • Not a republisher at any rate. Wikidata allows free input of data, some of which is taken directly from other sources, some of which is crowdsourced. They don't simply republish pages from other sources, but collect, enrich (does "enpoorer" exist?), and add new information. Highbeam etc. are comparable to a library; a library doesn't alter the books (the content, they often alter the binding and so on). Fram (talk) 14:43, 13 January 2017 (UTC)
  • Let me reformulate it: I do not think we have a good workflow on Wikipedia to fight vandalism on Wikidata. Much of the vandalism gets reverted within a day, but it still hangs out for a day.--Ymblanter (talk) 19:45, 11 January 2017 (UTC)
  • There are obstacles to verifying a lot of the information on Wikidata because much of it is unsourced or the reference is "Imported from Xyz Wikipedia". In either case, it requires effort to track down the original sourcing if it needs to be verified. --RexxS (talk) 18:11, 11 January 2017 (UTC)
    • And that is a major issue with one of the ways data has been imported into Wikidata: machine-readable fields from infoboxes etc. have been lifted from a wiki into Wikidata without the underlying reference (if any) also being taken over, the deprecation of {{Persondata}} being a great example. Much information in Persondata, at least here on en-wp, was sourced, but within the article text rather than the template. Because it is easy to get a bot to lift information from a template, only the facts and not the sources were taken. As far as I can tell, any verifiable source is only being added to Wikidata as a manual addition. I don't think this is a drawback of Wikidata, just of the way that some have chosen to populate it with data. Nthep (talk) 18:52, 11 January 2017 (UTC)
      • To the best of my knowledge, no data from {{Persondata}} was ever imported into Wikidata by bot, because Persondata was too unreliable. The data there was invisible to the reader so was hardly ever checked for typos, transcription errors, etc. against the main text. --RexxS (talk) 19:08, 11 January 2017 (UTC)
        • WP:Persondata#FAQ provides information regarding that migration process. --Izno (talk) 19:20, 11 January 2017 (UTC)
          • And that is the problem, the Persondata information was migrated to Wikidata but not the sources and is, in some cases, being presented back to Wikipedia without any check that it is accurate. Like I said this isn't an issue with Wikidata but with a way it has been populated which, IMO, reduces the credibility of Wikidata content. Nthep (talk) 19:29, 11 January 2017 (UTC)
            • I'm pretty certain anybody who reads the WP:Persondata#FAQ (thank you, Izno) will come to understand that any of the Persondata that found its way into Wikidata was via an editor who assumed responsibility for checking the accuracy of the data. "The Wikidata community had concerns about the quality of the Persondata dataset. Therefore it was decided to do the migration semi-automatically. For every piece of information in the [KasparBot] database a user needs to manually decide if adding this data would increase the quality of Wikidata." Personally, I have far more concerns about unreferenced data that came via other routes. YMMV. --RexxS (talk) 19:51, 11 January 2017 (UTC)
            • A lot of persondata was not only unsourced, but in disagreement with the sourced data in the article body & infobox. Unlike with Wikidata, there were no categories to highlight such discrepancies. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 12 January 2017 (UTC)
  • As others have said, the lack of reliable sources is the major issue. It means that Wikidata edits violate WP:V and WP:BLP. It makes no sense to import unsourced data, in particular unsourced data from smaller wikis (fewer eyes) to larger ones. We have worked long and hard to introduce and maintain good sourcing on enwiki. The Wikidata edits are turning back the clock. SarahSV (talk) 20:31, 11 January 2017 (UTC)
    • Following your prompting about the problems with {{Infobox book}}, I've created a means where only data sourced to something better than a Wikipedia is returned from a Wikidata call. I imagine that a filtering system will be needed for importing Wikidata into any of the large Wikis. --RexxS (talk) 20:49, 11 January 2017 (UTC)
      • The lists of ... articles were imported without sourcing. That was stopped just a day or so ago by Fram. And I've seen unsourced dates of birth added to other wikis via Wikidata. The key thing is Wikidata's spread of unsourced material. There's increasing concern about nonsense on the Internet, fake news, etc. Wikidata is out of sync with the Zeitgeist. SarahSV (talk) 21:15, 11 January 2017 (UTC)
        • I've seen lots of "lists of ..." articles created with poor or non-existent sourcing, and I've seen many unsourced dates of birth (and dates of death and places of birth, etc., etc.) added to articles before Wikidata was even dreamed of. Of course, it's fashionable to blame Wikidata for sourcing problems, but the key thing is actually the misuse of Wikidata that's the problem, not the technology itself. Editors spread unsourced material, not a website. --RexxS (talk) 01:12, 12 January 2017 (UTC)
          • Seeing a rebuke by the person who started this page about not putting "discussion" on the main page, so I make a comment here. WP's own article on Wikidata says "According to Wikimedia statistics, half of the information in Wikidata is unsourced. Another 30% is labeled as having come from Wikipedia, but with no indication as to which article." WP articles are not reliable sources for other WP articles, so doesn't that make about 80% of information from wikidata unreliable according to WP:RS and should not be used on WP articles? That information in the WP article is referenced to a piece called "Unsourced, unreliable, and in your face forever: Wikidata, the future of online nonsense".[5] I agree with the other comments that this is a very serious issue of unreliable information being added to the English WP. Looking for the first time at something on wikidata, a subject I know quite a lot about, the opera composer Verdi [6], it starts off by saying that in "traditional Chinese" his name is "vive verdi" (what?), goes on to identify him as an "instance of a human", lists awards he received (quite obscure ones, most unreferenced, none of which are mentioned in the WP article on the composer, but fails to mention that he was inducted into the French Legion of Honour, which is in the WP article). It then lists ten of his twenty-nine operas as his "notable work." Who decided that the other nineteen are not notable? It is absurd that it does not include his constantly performed "Requiem" as "notable". It says he spoke Italian, the only language given, but I am sure he spoke French also. It says he was a member of the Royal Swedish Academy of Music and says there are zero references for that information; I have never read that before, highly dubious. Then there are a lot of "identifiers" and gobbledygook, and I have no idea what any of it is. I would say that page is absolute dross. Smeat75 (talk) 20:05, 12 January 2017 (UTC)
            • "half of the information in Wikidata is unsourced" That includes the same things that we have on Wikipedia, without requiring sources, like subject's website URLs, VIAF & ISNI IDs for people, ISBN numbers for books, images, and even gender (when did you last see the string he{{Cn}} in Wikipedia?). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:23, 12 January 2017 (UTC)
        • "increasing concern about nonsense on the Internet, fake news, etc." Does that include the nonsense and fake news on this page? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:23, 12 January 2017 (UTC)
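
The filtering idea raised in the thread above (returning only data that is sourced to something better than a Wikipedia) could look roughly like the Python sketch below. It is not the implementation referred to in the discussion. It assumes claims shaped like Wikidata's entity JSON, where each claim may carry a `references` list and each reference has a `snaks` mapping keyed by property ID. P143 is "imported from"; treating it as the only marker of a Wikipedia-only reference is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import Iterable

# Reference properties that only say "copied from a Wikimedia project".
# Limiting this set to P143 is an assumption made for illustration.
WIKIMEDIA_IMPORT_PROPS = frozenset({"P143"})


def has_external_reference(claim: dict,
                           import_props: Iterable[str] = WIKIMEDIA_IMPORT_PROPS) -> bool:
    """True if at least one reference on the claim is not a Wikimedia import."""
    import_props = set(import_props)
    for reference in claim.get("references", []):
        props = set(reference.get("snaks", {}))
        # Count a reference only if it is non-empty and uses no import property.
        if props and props.isdisjoint(import_props):
            return True
    return False


def sourced_claims(claims: list) -> list:
    """Keep only the claims that carry at least one non-import reference."""
    return [claim for claim in claims if has_external_reference(claim)]
```

A check like this says only that some reference other than a Wikimedia import is attached to a statement. Whether that reference meets WP:RS would still have to be judged by an editor.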