Wikipedia:Village pump (idea lab)/Archive 57

From Wikipedia, the free encyclopedia

Search bars within the article

This is just an idea I had just now: there should be little search bars within different kinds of pages, such as articles, category pages, lists etc. I feel it would make Wikipedia more effective and easier to use, since you could search for exactly what you want instead of having to scroll for ages (depending on length). Thoughts?

。 🎀 𝒫𝓊𝓇𝓅𝓁𝑒𝓁𝒶𝓋𝑒𝓃𝒹𝑒𝓇𝓂𝒾𝒹𝓃𝒾𝑔𝒽𝓉𝓈 𝟣𝟩 🎀 。 (talk) 01:00, 19 April 2024 (UTC)

It's already (kind of) a thing with Ctrl+F! Chaotıċ Enby (talk · contribs) 01:10, 19 April 2024 (UTC)
oh okay 。 🎀 𝒫𝓊𝓇𝓅𝓁𝑒𝓁𝒶𝓋𝑒𝓃𝒹𝑒𝓇𝓂𝒾𝒹𝓃𝒾𝑔𝒽𝓉𝓈 𝟣𝟩 🎀 。 (talk) 01:52, 20 April 2024 (UTC)
I think we should just leave that to the browser. Aaron Liu (talk) 02:31, 19 April 2024 (UTC)
This discussion is a bit old but I will note that the wikipedia app has a search function, if that's any help. ¿VØ!D?  19:49, 26 April 2024 (UTC)
As others have said, this can be done with the browser's inbuilt search function. However, very few users seem to know about that feature, which is a problem especially for longer web pages (as many of our articles are). A second, Wikipedia-specific problem is that (by default) it does not work on mobile web because the sections after the lead are collapsed. Regards, HaeB (talk) 21:10, 26 April 2024 (UTC)
@HaeB A belated reply, but on Chromium-based browsers searching actually does work in collapsed sections since the mobile site uses hidden=until-found. Unfortunately this isn't implemented in other browsers yet [1]. the wub "?!" 13:51, 3 May 2024 (UTC)
Oh interesting, thanks. (I do in fact use Firefox for Android and had double-checked my pre-2022 recollections there before posting my above remark; but good to know that this is no longer an issue for a large share of mobile pageviews.) Regards, HaeB (talk) 14:52, 3 May 2024 (UTC)

Rapid archiving of talk pages

I find the general push to rapidly move talk page discussions out of sight into an archive sub-page as being one of the more unfortunate developments on Wikipedia. "Rapid" to me is anything less than about 10 years old. To clarify:

  • Some small number of pages actually need rapid archiving due to sheer volume of posts combined with rate of posts.
  • Most pages do not need rapid archiving. It's fine to keep old discussions from 5 or 10 years ago so long as the page does not exceed a certain length. And even then, only the minimum number of the oldest posts should be archived to keep the page below a certain size.

This situation arose without design or intent. Some programmers made some archive tools and designed some algorithms. These algorithms have in turn significantly impacted the social discourse on Wikipedia. Because talk pages are effectively "blanked", nobody posts follow-ups or adds new information. It squelches conversation and ruins one of Wikipedia's most interesting features. Sure, it's possible to re-enable a discussion but hardly anyone ever bothers to do that. How often I have seen an archived thread I want to reply to, but don't bother copying it back to the main talk page, where it will just get sent back out of sight soon enough anyway.

I have no idea what the solution is, and I'm sure there are a million contrary opinions to mine. Most talk pages don't need these automated tools at all, or if so, a page-size based algorithm not date based. -- GreenC 14:55, 1 May 2024 (UTC)
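As a rough illustration of the page-size based approach described above, here is a minimal Python sketch. It archives only as many of the oldest threads as are needed to bring the page under a size limit. The `Thread` type, the `threads_to_archive` function, and the `max_page_bytes` limit are all hypothetical names made up for this example; this is not how any existing archive bot is configured or implemented.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    title: str
    last_activity: str  # ISO date such as "2014-03-02"; sorts chronologically as text
    size_bytes: int


def threads_to_archive(threads, max_page_bytes):
    """Return the smallest run of oldest threads whose removal fits the page under the limit."""
    total = sum(t.size_bytes for t in threads)
    archived = []
    # Walk from oldest to newest and stop as soon as the page is small enough.
    for thread in sorted(threads, key=lambda t: t.last_activity):
        if total <= max_page_bytes:
            break
        archived.append(thread)
        total -= thread.size_bytes
    return archived


# Hypothetical talk page: three threads, 90,000 bytes in total.
page = [
    Thread("Old naming dispute", "2005-01-01", 40000),
    Thread("Sourcing question", "2012-06-01", 30000),
    Thread("Recent typo report", "2024-05-01", 20000),
]
to_move = threads_to_archive(page, max_page_bytes=60000)
```

Under this rule a page that never exceeds the limit is never archived at all, regardless of how old its threads are.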

"Rapid" to me is anything less than about 10 years old.
You obviously speak English. Are you using the word "rapid" correctly? Are we talking about geology here or what? Levivich (talk) 15:17, 1 May 2024 (UTC)
I would say that "rapid" in [article (added for clarity)] talk page terms means anything less than the age of Wikipedia. Why archive if the talk page is not too long? Phil Bridger (talk) 17:57, 1 May 2024 (UTC)
"Why archive if the talk page is not too long", is an excellent way to put it. Or In a Nutshell: archive when the talk page is too long. -- GreenC 15:52, 4 May 2024 (UTC)
I wonder whether WP:TALKCOND would be the more appropriate place to have this discussion. FWIW, based on that guideline, the age of a thread is less relevant than the size of the Talk page in question. I'm curious as to which Talk pages are of particular concern here. DonIago (talk) 15:53, 1 May 2024 (UTC)
TALKCOND is new to me, but appears to offer limited guidance. I don't see anything about size vs date? In terms of a particular page, I came across one (I no longer remember) that had maybe 50 threads from the earliest days of Wikipedia to the near present. The entire thing was archived, and now the talk page has a single recent post. Twenty some years of interesting and relevant discussion removed from general view. This is not anomalous. -- GreenC 15:50, 4 May 2024 (UTC)
I made an addition to TALKCOND. -- GreenC 16:01, 4 May 2024 (UTC)
It's very dependent on the talk page, obviously editors can archive their talk pages however they wish. For other talk pages it should be down to size, some pages would quickly become difficult to load without archiving while others may never need it. -- LCU ActivelyDisinterested «@» °∆t° 19:24, 1 May 2024 (UTC)
I do agree that page size based archiving makes more sense than date based. Maybe the documentation and examples for the bots could be updated to discourage aggressively expiring talk page sections. Barnards.tar.gz (talk) 19:46, 1 May 2024 (UTC)
Thank you for the feedback. -- GreenC 15:44, 4 May 2024 (UTC)
Just to clarify your train of thought here, GreenC, are you thinking of article talk pages or user talk pages? Or do you think that this hesitation to archive should apply to both?
My attitude in regards to archiving my talk page has changed over the years – I used to be really proud of this ongoing 100,000 byte conversation I had on mine that went on for more than a year – but then I realized it caused some loading issues and distracted from the newcomer links I was trying to emphasize on my talk page (this became really apparent in person when I was literally sitting right next to a newbie IRL and I wanted to show them my talk page). Lately I've been trying to minimize the amount of old discussions I keep just for the sake of it for this reason. I can stroll down memory lane whenever I want so it's not a big deal. However, I've always done this manually. I'm curious in regards to the perspective of editors who use bots or blank their talk pages.
My attitude in regards to article talk pages is receptive to the idea of slowing the archiving. I think it depends on a lot of factors but I agree with the premise that the average talk page doesn't have to be archived constantly because sometimes issues identified years previously are still of current relevance to the article. Sheer size definitely plays a role. It might be worth considering whether our increased reliance on automation to make these sorts of calls might not be nuanced enough in all situations? I'm under the impression that bot-managed archiving is strongly preferred to manual archives but that factors such as the rate of archiving can be decided upon by editors to tell the bot what to do. Clovermoss🍀 (talk) 15:34, 4 May 2024 (UTC)
Mainspace pages. From what I see, the bot archiving tools mostly operate based on date; it's possible some operate based on page size, I'm not sure. -- GreenC 15:44, 4 May 2024 (UTC)
My approach to talk page archiving and cleanup is topical. Some talk pages posts are ephemeral and time-gated such as notifications for votes or backlog drives. I usually clear these quite quickly. Other common posts are monthly projects newsletters and I usually just keep the latest one.
Another big class of posts are the egoboo -- barnstars and other accolades such as DYK awards. I like to keep these in specific archives but it's quite a chore to maintain these.
Actual talk is comparatively unusual and I curate this carefully and slowly as it's interesting to revisit such conversations and see how the issue is progressing.
It would be good if there were archive tools that helped in maintaining topical archives but I haven't found one yet.
Andrew🐉(talk) 19:45, 4 May 2024 (UTC)
I get where you're coming from, but I also find it super annoying to scroll past 10+ old posts to get to something that actually might need a reply. Most article talk pages consist of a small core of regular editors responding to posts by a shifting cast of people who drop by to complain or suggest something. The former don't need to see the same stale discussions again and again, and they're the ones that set up archiving. The latter are slightly more likely to be interested in old posts, but mostly just want to say their piece. Probably the situation would have been much better if we'd gone for top-posting instead of bottom-posting, but here we are. – Joe (talk) 20:17, 4 May 2024 (UTC)
And then we have the editors who drop in on the talk page and respond to a 10 or 15 year old post. It is rare that such a response will help improve the article. Donald Albury 20:26, 4 May 2024 (UTC)

ITN reform

I am opening this discussion to invite suggestions and proposals to reform WP:ITN in anticipation of an RfC. Such reforms are long overdue: ITN has effectively shut itself off from the rest of the site as a walled garden and has developed its own system of rules that conflict with sitewide expectations, creating multiple problems:

  • The inclusion criteria are entirely subjective and based on the personal whims and opinions of participants. Editors at ITN routinely use original research to determine "significance", applying their own analysis of each situation. Weight in reliable sources is not a major factor in whether ITN considers something to be "significant".
  • ITN flouts community norms around consensus. Discussions are typically closed as head counts, without weighing arguments in regard to the application of policy. "I like it" votes are given equal weight in discussions. The discussions are also closed before consensus has time to develop: no other part of the project would dream of letting a discussion without a clear consensus be closed in under a week, but ITN's nature requires that they be closed in a few days at most. Many discussions are closed after just a few !votes one way or the other.
  • ITN requires a fast turnaround, sometimes as short as a few hours. In addition to contradicting WP:DEADLINE, this creates rushed work and prevents in depth review. Nominal quality checks are done to make sure citations are present, but the window is short and most participants only evaluate "significance". This results in articles that are not only not ready for the main page, but ones that are unstable as they are oftentimes recently created, subject to early reporting errors, and undergoing significant changes.
  • Since arguments about personal beliefs and opinions are built into ITN's processes, it is inherently less civil than other parts of the project. Over the years, drama at ITN has rivaled most CTOPs, to the point that applying general sanctions to ITN itself has been discussed in the past.
  • The arbitrary selection of news stories (with a major systemic bias toward elections, sports, and mass-casualty events) misrepresents the overall state of what's actually in the news. Pushing this bias onto our readers does them a disservice.

These problems are intractable with ITN in its current form. I am asking for suggestions on how it can be reformed, or if that fails and there is consensus to abolish it entirely, how it can be replaced.

Examples of reforms:

  • Remove the significance requirement entirely and include any article that is the subject of a recent news event.
  • Require that a story receive significant coverage in a certain number of countries or a certain number of newspapers of record
  • Promote articles based on trending topics (with oversight to prevent gaming the system)

Examples of replacements:

  • Use the space to display several short blurbs for good articles each day, like smaller versions of WP:Today's Featured Article
  • Use the space to provide links on how to edit and to help users find where to start in order to recruit more editors
  • Move the "Other areas of Wikipedia" section of the main page into the space for easier navigation

These are ideas that have been suggested in the past. This is not an exhaustive list of examples, nor do I personally consider all of them viable. I'm requesting input from the Village Pump so we can develop additional suggestions and get a general idea of what the community thinks about them. Thebiguglyalien (talk) 18:48, 30 March 2024 (UTC)

Two issues with points raised:
  • On the fourth point related to arguments: the only time general sanctions have been applied is when the topic itself falls into areas where general sanctions have already been applied (like AP2, etc.) There have not been any sanctions applied for topics outside these areas. So that's just how general sanctions should be applied, and not an ITN issue.
  • On the fifth point, WP is not a newspaper and we absolutely should not share topics to match what is big in the news. Not every news story that gets a short term burst of coverage deserves a WP article (per NOTNEWS, WP:N and NEVENT) and ITN reflects that.

Remember that the main page as a whole is meant to showcase articles that represent some of the best of WP. Replacing it with a list of top stories in newspapers or with popular topics won't work at all, because not all those topics meet the main page requirement. — Masem (t) 19:17, 30 March 2024 (UTC)
I would also offer a suggestion that the ITN box include a permalink to the Wikinews sister project, which may be more attractive to those potential editors who get confused or put off in their first reverts that we are WP:NOTNEWS. SamuelRiv (talk) 15:09, 5 April 2024 (UTC)
Back when Wikinews was a news site, we had such a link. Wikinews is far too slow to be useful these days, possibly because of too much focus on quality given the size of the userbase. Overall, Wikipedia is a much better news site than Wikinews ever was, whatever WP:NOTNEWS says. —Kusma (talk) 17:22, 26 April 2024 (UTC)
I wonder if there's even a consensus that ITN needs to be changed, before we even start talking about the specifics of how to reform it. But I do not see any indication on the above thread that such a consensus exists. If there is, please feel free to share the diff, otherwise I feel like this will just go around in circles again as such discussions always do. Duly signed, WaltClipper -(talk) 13:13, 8 April 2024 (UTC)
I think that (much like changing the MP) consensus could be obtained for the general idea of changing ITN but it would break down when specific proposals are offered. I now believe that it should function more like RD with a reduced or nonexistent "super notability" significance requirement, but I don't think that would gain a consensus. (I've discussed it here previously) 331dot (talk) 15:12, 18 April 2024 (UTC)
Just a thought on "article quality" as a criterion: maybe it shouldn't be one. Perhaps a purpose of ITN should be to direct readers to articles that are currently hot topics in the hope that the extra attention results in improvements. There's no better opportunity to improve an article than when lots of people are interested in it. Barnards.tar.gz (talk) 14:18, 18 April 2024 (UTC)
Content on the Main Page is supposed to demonstrate high quality articles and some of WP's best work. We can't remove the quality aspect without changing that aspect of Main Page. — Masem (t) 18:59, 18 April 2024 (UTC)
I get that ITN has its problems, but I don't get why it gets singled out so much when the other sections of the main page all have their quirky ways of working. ~~ Jessintime (talk) 17:19, 18 April 2024 (UTC)
ITN has actually been working quite fine recently, in my opinion. I am not sure this is needed anymore. Curbon7 (talk) 18:34, 18 April 2024 (UTC)
I'm still in favor of just removing it (make TFA 2 columns wide on desktop), and then if people want to propose adding something to the main page where it used to be, discussing that separately. I've written enough about why in the previous rounds of this discussion so I won't repeat myself here (tldr: because it's bad at its job, irredeemably and structurally). Levivich (talk) 11:03, 26 April 2024 (UTC)
The OP is correct in pointing out that ITN violates numerous policies including WP:NOTNEWS, WP:NOTHERE, WP:NEWSSTYLE, &c. This might be excusable if it worked but it clearly doesn't. Right now, the lead blurb is about the 2024 Solomon Islands general election. Those happened over two weeks ago and so the topic is quite stale as news. And, as the Solomons is a minor nation with less than a million population, it wasn't really in the news in a big way to start with. Meanwhile, we have ongoing elections in major English-speaking countries like India, the UK and US but ITN is ignoring those.
The next blurb which will replace this non-event is likely to be a routine horse race while a trailblazing moonshot is also ignored.
These hiatuses and bizarre blurb choices are normal at ITN. Other main page sections are not nearly so dysfunctional as they manage to post fresh content every day with better quality and variety.
There are numerous ways that ITN could be improved, reformed or replaced but structural change is quite gridlocked too. Tsk.
Andrew🐉(talk) 11:11, 5 May 2024 (UTC)

In-page attribution of authors/contributors

Something I've been thinking about for a while is how Wikipedia could provide better attribution for the contributors of its articles. After all, attribution is a key part of our license, but the authorship of an article is very much hidden away out of sight. In order to see the authorship of an article, one first needs to go to the "View history" tab, then click on the "Page statistics" external link, which redirects to an entirely different website, and even then you need to scroll down in the page before seeing authorship details. It also appears to only be visible in the web browser version, while it seems completely absent from the mobile app version. This presents a pretty unintuitive set of hoops for readers to jump through in order to discover (and attribute) the authorship of various articles. Even as a regular contributor, I didn't know this tool existed until a year or so ago.

This means that, to the lay reader or content re-user, all of our articles might as well be written by a monolithic "Wikipedia", or maybe a vague gesture at "Wikipedia contributors". Contrast this with other prominent encyclopedias like Britannica, which display the primary contributors to an article quite prominently beneath the article title and list secondary contributors in an easy-to-find section in the article history.

I was thinking that, either in the main page or at the top of the "View history" tab, it may be worth including such a list of contributors. It could be as simple as listing the primary author (with the most percentage/characters contributed), followed by any secondary contributors (with >10% contributed), followed by an "et al", which could itself link to further information about the article's authorship.
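The byline rule described above (primary author first, then anyone with more than 10% of the text, then "et al") can be sketched in a few lines of Python. This is only an illustration under assumptions: the `build_byline` function, the input format (a mapping of username to share of the article), and the example usernames are all hypothetical, and how the shares would actually be measured is left open.

```python
def build_byline(shares, threshold=0.10):
    """Build a byline from a dict mapping username -> fraction of the text (0..1)."""
    if not shares:
        return ""
    # Rank contributors from largest to smallest share.
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    primary = ranked[0][0]
    # Secondary contributors must strictly exceed the threshold.
    secondary = [name for name, share in ranked[1:] if share > threshold]
    named = [primary] + secondary
    # "et al." stands in for everyone who is not named individually.
    suffix = ", et al." if len(named) < len(ranked) else ""
    return "Written by " + ", ".join(named) + suffix


byline = build_byline({"Alice": 0.60, "Bob": 0.25, "Carol": 0.10, "Dan": 0.05})
```

In this made-up example Carol sits exactly at 10% and is therefore folded into "et al."; whether the cut-off should be inclusive is a design choice the proposal does not settle.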

I think this would be a very important fulfilment of our own license's terms of attribution, both for in-wiki use and for anybody that might be reading or thinking about reusing the article's content. It would be a step away from the monolithic conception of "Wikipedia contributors". It could also provide a greater sense of impact for the articles' authors themselves, who would be able to easily see the fruits of their labour on screen. Of course, I understand this may come with its own drawbacks. I know some editors prefer the anonymity, while others may be worried that this could encourage low-effort edits in order to farm credit. But I personally think that the potential benefits of such a credit feature would outweigh the potential costs.

Having looked through the village pump archives, I'm confident this isn't a perennial proposal. I only managed to find one discussion of such a feature, which was opened by @Doc James almost a decade ago, but it didn't seem to gather a clear consensus and I'm not sure if anything ended up resulting from it. If anyone has any comments on this embryonic proposal, I would be happy to hear from you. --Grnrchst (talk) 14:27, 2 April 2024 (UTC)

@Grnrchst as an example, could you produce what you would expect the output of such to look like for this page? — xaosflux Talk 14:32, 2 April 2024 (UTC)
It's an odd example, because this is a discussion page, but it would currently be something like "Written by WhatamIdoing, Xaosflux, et al." --Grnrchst (talk) 14:43, 2 April 2024 (UTC)
So you'd be fine ignoring the thousands of other authors on such a byline? — xaosflux Talk 14:58, 2 April 2024 (UTC)
I don't think it's feasible nor particularly useful to include every single contributor in a byline, but I don't think they should be entirely ignored either, that's why I've included the "et al." (could also say "and others", or something). The reason I set the byline inclusion at >10% is because that's the rule of thumb used by the good articles project to determine major contributions. I think named inclusion in the byline should be limited to such major contributors, but linking to total authorship in the "et al." would also be useful for showing the full scope of contributions. --Grnrchst (talk) 15:05, 2 April 2024 (UTC)
I don't think there is going to be any good way to programmatically determine those values. In your example above, what calculation did you use to determine that WhatamIdoing and I were >10% contributors for example? This is primarily why the cite this page tool example just says "Wikipedia contributors". (see more below) — xaosflux Talk 15:17, 2 April 2024 (UTC)
I was using the "Top editors" section in the Xtools page I linked to in the "et al." As this is a discussion page, rather than a mainspace page, Xtools doesn't show an "Authorship" section in this case (hence why I didn't think it was a good example). Whereas in the one for Morgan Bulkeley, there is an authorship section that shows Wehwalt at 79% of authorship, Real4jyy at 6.2% of authorship, etc. The authorship tool on Xtools is apparently powered by WikiWho, which we may be able to use for generating such a byline. --Grnrchst (talk) 15:25, 2 April 2024 (UTC)
As for the "Cite this page" tool, I think this is an example of how just vaguely citing "Wikipedia contributors" is unhelpful and even redundant. Of course it's authored by Wikipedia contributors, it's a Wikipedia article! --Grnrchst (talk) 15:51, 2 April 2024 (UTC)
Another example, using today's featured article Morgan Bulkeley, would read: "Written by Wehwalt, et al." Grnrchst (talk) 14:47, 2 April 2024 (UTC)
I was thinking this "Written by [...]" could go next to the bit where it says "From Wikipedia, the free encyclopedia". --Grnrchst (talk) 14:51, 2 April 2024 (UTC)
In the mobile view, we do advertise the "last" author (e.g. see the bottom of this page) - that could possibly be ported to desktop. — xaosflux Talk 15:00, 2 April 2024 (UTC)
I think the "latest contribution" is a good measure of recent activity, what I'm aiming at with this proposal is trying to demonstrate a broader scope of total activity. --Grnrchst (talk) 15:07, 2 April 2024 (UTC)
  • Note, a feature request that may address this idea is open at phab:T120738. — xaosflux Talk 15:18, 2 April 2024 (UTC)
  • Note that we currently have https://xtools.wmcloud.org/articleinfo/en.wikipedia.org/C418_discography and Who Wrote That which calculate the percentage, the latter using an API that does it Aaron Liu (talk) 15:20, 2 April 2024 (UTC)
    I think you'd want to use the WhoColor API (which is what mw:Who Wrote That? uses). The other methods tend to overemphasize people who don't actually write any words. For example, if the article has 50 edits total at the time of calculation, and five of them are me blanking the whole article, or changing the whitespace on a template, then I've made 10% of the edits, but I haven't written a single word on the page.
    @Grnrchst, the last time I remember seeing a discussion about highlighting the names of contributors, a jerk who normally edited under his real name created an account with a vulgar username and made one edit, just for the purpose of asking whether we really wanted to have vulgarities displayed in the mainspace. WhatamIdoing (talk) 16:41, 2 April 2024 (UTC)
    And his alt was banned for WP:DISRUPTNAME, right? Aaron Liu (talk) 20:40, 2 April 2024 (UTC)
    Yes, but DISRUPTNAME was declared to be an insufficient reason to revdel or oversight/suppress the change. WhatamIdoing (talk) 02:27, 20 April 2024 (UTC)
    I was having the same thoughts. It should be based on who contributed to the article as currently displayed. It would be wrong for instance to list the top contributor as someone who hasn't edited the article since it was completely rewritten.
    It might also encourage more competitive editors to try and find ways to boost their standings, without having to do any actually helpful work. -- LCU ActivelyDisinterested «@» °∆t° 15:54, 3 April 2024 (UTC)
    For the reasons mentioned above, I'm not a big fan of crediting whoever ran the link archiver most often as being the "author" of the page. Nor am I fan of being assigned as the author of a page, even if I am indeed the #1 author. The mw:Who Wrote That? extension is excellent and should be promoted, because it allows seeing authorship of words and sentences currently live, which is an excellent (though not infallible) way of tracking down who has added nonsense to an otherwise decent page (caveat lector: sometimes someone who copyedits nonsense will be shown rather than the original nonsense-adder). One thing that may not be widely known is that you can run Who Wrote That? on old versions of pages, making it an arguably more efficient tool than WikiBlame.-- SashiRolls 🌿 · 🍥 18:37, 8 April 2024 (UTC)
    Although XTools is influenced by use of link archiver tools, the underlying WikiWho service provides token-by-token attribution of who added what. This can be used to determine authorship without considering anything between ref tags, as well as other markup that's seen to unfairly influence authorship stats. I have implemented this in SDZeroBot 6 task which makes the bot somewhat smarter about whom to notify about AfDs. – SD0001 (talk) 21:15, 8 April 2024 (UTC)
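    To make the idea of token-level attribution that ignores reference markup concrete, here is a minimal Python sketch. It assumes the live revision has already been split into (token, editor) pairs and that each opening or closing ref tag arrives as a single token. That input format and the `authorship_shares` function are assumptions made for illustration; this is not the WikiWho response format and not the SDZeroBot implementation.

```python
from collections import Counter


def authorship_shares(attributed_tokens):
    """Share of live prose tokens per editor, skipping everything inside ref tags.

    attributed_tokens: list of (token, editor) pairs for the current revision.
    """
    counts = Counter()
    inside_ref = False
    for token, editor in attributed_tokens:
        if token.startswith("<ref"):
            # A self-closing ref (ending in "/>") opens nothing.
            inside_ref = not token.endswith("/>")
            continue
        if token == "</ref>":
            inside_ref = False
            continue
        if not inside_ref:
            counts[editor] += 1
    total = sum(counts.values())
    return {editor: n / total for editor, n in counts.items()} if total else {}


# Hypothetical revision: editor "B" only touched the contents of a reference.
tokens = [
    ("Bulkeley", "A"), ("was", "A"),
    ("<ref>", "B"), ("archive-url", "B"), ("2024", "B"), ("</ref>", "B"),
    ("governor", "A"), ("briefly", "C"),
]
shares = authorship_shares(tokens)
```

    In this toy input, editor "B" contributed four of the eight tokens but none of the prose, so "B" receives no share at all, which is the effect described above for link-archiver runs.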
  • I am concerned that this would encourage WP:OWNERSHIP of articles. The entire point of the Wikipedia model is that articles are “authored” by the community, not by individuals. Blueboar (talk) 20:48, 2 April 2024 (UTC)
    This is a completely fair and valid concern. --Grnrchst (talk) 21:54, 2 April 2024 (UTC)
I don't see the need for this. I do get a certain pleasure from seeing how much of the content in an article I have contributed (which I can see at Page Statistics), but I am well aware that no one else really cares, and the future of such content is out of my hands. I am not editing Wikipedia to build my curriculum vitae. Donald Albury 21:41, 2 April 2024 (UTC)
Personally I strongly dislike the idea of articles where I have primary authorship saying "written by Levivich, et al." or anything like that. That is very much not the kind of attention I want. Also, xtools authorship is not really reliable. For example, I am listed as the #2 author of Alexandria Ocasio-Cortez [2] but that's only because I once ran the archive bot on that page [3], I am not actually anywhere near a top author of the actual prose on that article. Levivich (talk) 17:15, 3 April 2024 (UTC)
If you have a divisive article and then add a note that User X was its most prolific contributor, readers will immediately assume User X holds those divisive views. And all the better if User X is an IP and offended readers can immediately find their location. (Which is already possible, of course, but why place it front and center?)
Speaking of which, how would this even work with dynamic IP addresses? 2603:8001:4542:28FB:25EE:12B6:DCFA:E43E (talk) 18:43, 3 April 2024 (UTC) (Send talk messages here)
I agree with the concerns voiced by Levivich and others. And even if the authorship statistics were 100% accurate I still don't like the idea of omitting certain users; as sappy as it sounds I think every contribution matters. Potentially, we could do something like Based on the contributions of 328 users but even then I think a more appropriate place for this kind of thing would be the footer alongside the last edited date. ― novov (t c) 08:21, 4 April 2024 (UTC)
  • I can totally see the issues with this: we'll never have a 100% reliable measure of authorship, you can't include everyone, and we'll probably see a slight uptick in WP:OWNERSHIP and authorship gaming. But overall, I think it would be a nice way to acknowledge our volunteer editors and to communicate to readers "look, this was written by real people – you can join us". And twenty years into the project, with declining editor numbers, increasing restrictions and expectations of those editors that persevere, and donations to "Wikipedia" siphoned off by an organisation that has increasingly little to do with it, I think we really do need to start prioritising looking after our volunteer base over other concerns. Relying on the ideal of the selfless, perfectly self-motivated contributor, happy to work in complete anonymity, was fine in the early days of the project when the internet was the playground of affluent nerds with utopian ideals; those days are long gone. – Joe (talk) 08:56, 4 April 2024 (UTC)
I can see how algorithmically a shortened authorship list would be generated automatically, fine-tuned for a relatively accurate representation. And while anything like that could be gamed by users, that's why we have lots of human eyes to review abuse. I can also see how such a list would be useful to researchers and those making citations. However, were such an authorship list to be implemented, I'd suggest it be hidden out of the way a bit, at least certainly from the front page of the article, and perhaps even completely hidden from UI except as metadata.
I'll give some contrasting examples: Scholarpedia places its curator-authors (respected subject-matter experts) prominently at the top of its articles: non-random example is BCM Theory (SP). By contrast, Internet Encyclopedia of Philosophy has its authors' names and affiliation mentioned simply at the bottom, under "Author information" following the bibliography; while Stanford Encyclopedia of Philosophy has its author even more nondescript, being in a footer at the bottom under a copyright notice, and not implied to be an actual "author" until you click on the "Author and citation info" link on the sidebar. (Again, respected subject-matter experts; random ex.: Gender in Chinese Philosophy (IEP), Plato on utopia (SEP).) Wolfram MathWorld also has authorship given relatively subtly at the bottom of the page -- if it's contributed by someone other than the editor, the contribution note precedes the bibliography; otherwise authorship is indicated only in citation information (ex. Chen-Gackstatter Surfaces (MW)).
Given all this, I don't know what example editors here would want to find themselves compared to, especially since an algorithm listing authors would not distinguish one editor writing 95% of an article immediately preceding GA, from one editor writing 55% of the prose in a B-class, for which others had to find new citations (unless we'd want it to do so, but this would epitomize wp:ownership). SamuelRiv (talk) 15:37, 5 April 2024 (UTC)
It’s really hard to measure the significance of contributions to an article. It’s not just a count of who added the most words, or even of who added the most words that survive into the current version. How should we weigh a user who adds some high word count nonsense to an article, against a user who painstakingly sifts through the garbage, deletes most of it, and copyedits down any remaining kernel of valid content? Or the user who contributes great source analysis to a talk page discussion on a matter which results in a single word changing? Perhaps the article on Antoine de Saint-Exupéry is finished not when there is nothing left to add, but when there is nothing left to take away? Barnards.tar.gz (talk) 18:05, 5 April 2024 (UTC)
This is an excellent point. We don't have any accurate automated way to assess contribution levels, and xtools authorship isn't it (neither is bytes added or edit count). Levivich (talk) 18:56, 5 April 2024 (UTC)
Just because an algorithm isn't currently implemented, does not mean an algorithm doesn't exist. As a rough starting point: authorship+curation can be measured by taking the diffs made by an editor to bring text in line with the current stable state, weighted by time. (For the simplest measure, you can just use edit longevity with hysteresis.) Now, absent some new API properties, this is an expensive calculation to maintain for every article, but it's perfectly technically doable. (Another, more sophisticated method is analogous to a co-authorship network.) Of course this has been done before: Arazy et al 2010; Lanio & Tasso 2011 (citing Biuk-Aghai 2006). SamuelRiv (talk) 19:28, 5 April 2024 (UTC)
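For illustration only, the "edit longevity" idea can be sketched as counting how many tokens of the current text trace back to each editor. This is a simplified assumption-laden sketch, not the method of any of the cited papers: the function name `surviving_tokens` and the `(editor, text)` input format are hypothetical, tokens are whitespace-separated words, and there is no time weighting or hysteresis.

```python
from collections import Counter
from difflib import SequenceMatcher


def surviving_tokens(revisions):
    """Attribute each token of the latest revision to the editor who introduced it.

    revisions: list of (editor, text) pairs, oldest first (hypothetical format).
    Returns a Counter mapping editor -> number of surviving tokens.
    """
    prev_tokens = []   # tokens of the previous revision
    authors = []       # authors[i] is the editor credited with prev_tokens[i]
    for editor, text in revisions:
        tokens = text.split()
        # Tokens default to the current editor unless matched to older text.
        new_authors = [editor] * len(tokens)
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                # Unchanged tokens keep their original author.
                new_authors[block.b + k] = authors[block.a + k]
        prev_tokens, authors = tokens, new_authors
    return Counter(authors)


history = [
    ("A", "the cat sat"),
    ("B", "the black cat sat down"),
    ("C", "the black cat sat"),
]
credit = surviving_tokens(history)
```

Note that editor C, who only removed a word, receives no credit under this measure, which is exactly the curation problem raised earlier in the thread.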
No algorithm will be perfect though, and the exact value of things other than clear-cut addition/removal is to some extent subjective. ― novov (t c) 01:57, 7 April 2024 (UTC)
  • Fundamentally, I think this is a nice idea and seems like it would be easy to implement since we already track editor contribution metrics, so it would just be a matter of making this visible on the page itself somewhere, maybe in the footer (though, yes, XTools is imperfect and an alternate system would probably be better). That said, I would hope there might be some opt-out system for those of us who don't want any sort of public credit, which includes me. (I never let anyone IRL know what WP articles I've worked on because the layperson assumes that the current status of any given WP article I "created" represents my writing. And, in many cases, I want nothing to do with how an article I "created" has evolved.) Chetsford (talk) 02:07, 7 April 2024 (UTC)
  • Much as with the hall of fame suggestion (people really seem to be concerned with credit and recognition, lately), this could go so very badly. Any algorithm that we set up is going to be full of holes. It is simply impossible to represent, with any sort of non-LLM algorithm, the extent to which an edit impacts an article's development and structure for a multitude of reasons. The first reason is the fact that anyone can edit Wikipedia, and edits are happening on a constant, minute-by-minute, second-by-second basis. An article could look one way today and then look totally different the following day, if it gets a massive rewrite. How would one recognize contributions in that scenario? If Editor X wrote the previous version of the article before its replacement, do they lose attribution now that it's been rewritten? Or do they receive attribution for something that no longer resembles their work?
The second reason, for Wikipedia articles, especially larger ones, the whole is greater than the sum of its parts. One can do a massive amount of copy-editing in a large edit, mainly to make stylistic or grammatical corrections, and as a result it would have very little impact on the article's growth but they would be considered an outsized contributor. On the other hand, the addition of a few vital facts or details could contribute heavily to the article's development, ensuring that it's meeting WP:GNG or perhaps even putting it on the road to becoming a WP:GA.
The third reason is that someone will always disagree with the algorithm that determines who is an author. Attitudes on Wikipedia among editors regarding WP:OWNERSHIP are already very fierce. This would exacerbate it to a fever pitch. Or it would simply not be taken seriously, if enough people take umbrage with it. This metric would be divisive at worst, and superfluous at best. If one wanted to track the metrics of the article, the "Statistics" are available for this very reason. They do not attempt to ask the question "who wrote this article?" They simply provide the data. In fact, it's a good rule of thumb - Anytime you ask any sort of question regarding creative or scholarly human impact, the question should always be answered by a human and not by a machine. Duly signed, WaltClipper -(talk) 13:20, 15 April 2024 (UTC)
100% agree with all your comments WaltClipper! Alexcalamaro (talk) 08:08, 4 May 2024 (UTC)
  • I've got a slightly different idea that I think gets around Ownership issues and the algorithm not being able to make a perfect summary of "main editors". What if it just counted the contributors? That wouldn't be nearly as expensive a calculation, and the text could read:
This article was created by 3,428 volunteer editors. (and you could be one of them).
The number of bot editors would need to be subtracted, of course. It would serve the purpose of directing anyone who wanted to know all the editors to the right page, and it would be both more accurate and more precise than saying "This article was written by a monolithic 'Wikipedia'". And just for fun, there could be a text variant depending on whether the editors of that article include the logged-in user or not, saying:
This article was created by 3,428 volunteer editors. (including you)
I think one line like this could potentially be a lot more informative to a lay reader than the "about" pages that I don't know if anyone actually reads. -- D'n'B-t -- 18:25, 25 April 2024 (UTC)
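A rough sketch of the counting idea, assuming the list of usernames per revision and the set of bot accounts are already available; the function name `credit_line` and its parameters are hypothetical and do not correspond to any existing MediaWiki feature.

```python
def credit_line(revision_users, bots, viewer=None):
    """Build the one-line credit from a page's revision authors.

    revision_users: iterable of usernames, one per revision (hypothetical input).
    bots: set of usernames known to be bots, which are excluded from the count.
    viewer: username of the logged-in reader, or None for logged-out readers.
    """
    humans = {user for user in revision_users if user not in bots}
    if viewer in humans:
        suffix = "(including you)"
    else:
        suffix = "(and you could be one of them)"
    return f"This article was created by {len(humans):,} volunteer editors. {suffix}"


users = ["Ann", "Bob", "Ann", "FixBot"]
line_anonymous = credit_line(users, bots={"FixBot"})
line_for_bob = credit_line(users, bots={"FixBot"}, viewer="Bob")
```

Counting distinct names is a single pass over the history, which is why it would be much cheaper than any per-token attribution.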
👍 Like Levivich (talk) 19:25, 25 April 2024 (UTC)
I also like it (but don't know how to make the nifty icon)! jengod (talk) 06:37, 2 May 2024 (UTC)
Wikipedia is source wikitext, you can always view the source of a section to check it out! Aaron Liu (talk) 11:05, 2 May 2024 (UTC)
Aaron Halfaker did excellent research on this issue of attribution and gave a good presentation about it at Wikimania: WikiCredit - Calculating & presenting value contributed to Wikipedia. But they have now left the WMF and it's a great shame that more has not been done with this idea. The main metric which we still use to measure work is edit count and that is awful. Edit count encourages low value editing -- busy work, chatter, churning and conflict -- rather than the creation of high quality content. Andrew🐉(talk) 09:22, 6 May 2024 (UTC)
So basically, sum up WikiWho? What about contributions that are later improved and overwritten/copyedited upon by others? Aaron Liu (talk) 16:11, 6 May 2024 (UTC)
Halfaker cited the WikiWho paper in his presentation and so was building on that work.
Such superior measures of value-added exist and so we have the technology. And crediting authorship is something that is legally required by the CC licence but not enough is done to enforce this. More visible credit, as suggested by the OP is a good idea.
Andrew🐉(talk) 21:23, 6 May 2024 (UTC)

Currently we have an embarrassingly large backlog named Category:Articles lacking sources with 94500 articles in it. The WikiProject Unreferenced articles, even with its fairly active membership, can only clear 500 articles every week, which amounts to around 3.5 years to clear the backlog. I'm curious, do you guys have any ideas to accelerate this progress? CactiStaccingCrane (talk) 06:41, 26 April 2024 (UTC)
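The estimate above can be checked with a few lines of arithmetic. This sketch assumes a constant clearance rate; the `added_per_week` parameter is a hypothetical addition for modelling newly tagged articles, and the value 250 used below is made up purely for illustration.

```python
def weeks_to_clear(backlog, cleared_per_week, added_per_week=0):
    """Weeks needed to empty a backlog at a constant net clearance rate."""
    net = cleared_per_week - added_per_week
    if net <= 0:
        return float("inf")  # the backlog never shrinks
    return backlog / net


weeks = weeks_to_clear(94_500, 500)            # figures quoted in the post
years = weeks / 52.18                          # average weeks per year
weeks_with_inflow = weeks_to_clear(94_500, 500, added_per_week=250)
```

With no new articles being tagged this comes to 189 weeks, or about 3.6 years; any steady inflow of newly tagged articles lengthens that.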

I looked at a couple dozen articles in Category:Articles lacking sources from March 2024 and I found that more than half of them were incorrectly tagged. Specifically, they didn't have little blue clicky numbers, but they did contain external links that verified some of the content of the article, and one or two had a list of books.
I suspect that the fastest way to reduce the number would be to send a bot through all the articles to replace {{unref}} with {{No footnotes}} if there are any URLs anywhere on the page. It wouldn't catch everything, but it might clear thousands of articles. WhatamIdoing (talk) 08:20, 26 April 2024 (UTC)
That would just push the problem to another backlog, which is bad practice. But you give me an idea... if a lot of these articles already have external links, why not use that to create an inline citation? CactiStaccingCrane (talk) 14:03, 26 April 2024 (UTC)
Triage is not bad practice. Levivich (talk) 14:09, 26 April 2024 (UTC)
I have to agree. CactiStaccingCrane (talk) 14:11, 26 April 2024 (UTC)
I think you're being hard on the Wikiproject if they're clearing 500 articles a week. That's admirable. CMD (talk) 08:29, 26 April 2024 (UTC)
One solution is the “one step back, two steps forward” approach: Delete the existing unsourced articles, and encourage people to create NEW articles on the same topics - this time with proper sourcing. Blueboar (talk) 12:29, 26 April 2024 (UTC)
This is literally Wikipedia:Village_pump_(proposals)#Deprecating_new_unsourced_articles, but the community has rejected this proposal. CactiStaccingCrane (talk) 14:01, 26 April 2024 (UTC)
Yup… but… consensus can change. Not saying it has changed, just that it can. Blueboar (talk) 14:29, 26 April 2024 (UTC)
How about making a proposal to every WikiProjects so that they cite all articles belonging to the Wikiproject with at least one source? We already have the Bambot cleanup listings (click "by cat" then "Cites no sources"). We just need to put it to work. CactiStaccingCrane (talk) 14:43, 26 April 2024 (UTC)
A WikiProject is a group of people who want to work together to improve Wikipedia. They don't own articles, and they aren't responsible for them. WhatamIdoing (talk) 16:16, 26 April 2024 (UTC)
@Blueboar: Highly unlikely that it changed since 10 days ago. Even the month between that and Wikipedia:Requests for comment/Deletion of uncited articles was probably too little time for consensus to have changed. Anomie 16:29, 26 April 2024 (UTC)
Essentially, the reason the proposal from a few days ago was rejected was that it created a "grandfather clause" where older unsourced articles would be allowed to fester while new ones would be stopped at the gate. I would suggest then that any drastic action on unsourced articles should start at the other end of the backlog with articles that have been unsourced for over 15 years. -- D'n'B-t -- 17:52, 27 April 2024 (UTC)
Fun fact: Non-BLP articles are not technically required to name a single source unless there is some specific material that falls into one of the categories listed by WP:MINREF. If you can write an article that avoids those categories (e.g., the content is something like "The capital of France is Paris"), then you are not required to cite any sources. Only the material that fits one of those categories is required to have a WP:Inline citation.
Consequently, if you want to be able to delete non-BLP unsourced articles the way that we currently delete BLP unsourced articles, that requires a change in policy. WhatamIdoing (talk) 16:21, 26 April 2024 (UTC)

Triaging

User:Levivich and User:WhatamIdoing, so how exactly can we sort these articles in the backlog? Is there a way to manually sort articles with a reliable citation using AWB? CactiStaccingCrane (talk) 16:17, 30 April 2024 (UTC)

Well, that depends on what you mean.
I believe that you (i.e., people with regex skills) can use AWB to find articles that contain {{unref}}, do not contain any ref tags, but do contain a URL, and change {{unref}} into {{no footnotes}}. For extra points, if there is only one URL on the page, and it is either {{official website}} or inside an infobox, then AWB could add {{third-party}} as well.
A bot can be (and might already have been) sent around to remove the {{unref}} template from articles that contain any ref tags. In the past, that bot has substituted {{refimprove}}.
There are no good ways to make lists of articles with reliable vs unreliable sources. WhatamIdoing (talk) 16:26, 30 April 2024 (UTC)
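As a hedged illustration of the first rule above (tagged {{unref}}, no ref tags, at least one URL), here is a minimal sketch in Python rather than AWB's own settings. The patterns are assumptions: they only recognise the `unref` and `unreferenced` template names, not every redirect to that template, and they only see URLs written out in the wikitext, so a bare {{official website}} with no URL parameter would be missed.

```python
import re

# Assumed template names; the real template has more redirects than these.
UNREF = re.compile(r"\{\{\s*(?:unreferenced|unref)\s*(\|[^{}]*)?\}\}", re.IGNORECASE)
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)
URL = re.compile(r"https?://[^\s\]|}<]+", re.IGNORECASE)


def retag(wikitext):
    """Swap {{unref}} for {{No footnotes}} when the page has a URL but no ref tags."""
    if not UNREF.search(wikitext):
        return wikitext          # not tagged, nothing to do
    if REF_TAG.search(wikitext):
        return wikitext          # has footnotes: a different cleanup case
    if not URL.search(wikitext):
        return wikitext          # no URL found: leave the tag in place
    # Keep any parameters such as |date=... from the original tag.
    return UNREF.sub(lambda m: "{{No footnotes" + (m.group(1) or "") + "}}", wikitext, count=1)


page = (
    "{{unref|date=March 2024}}\n"
    "Foo is a town.\n"
    "== External links ==\n"
    "* [https://example.org Official site]"
)
retagged = retag(page)
```

Any real run would still need manual spot checks, since a URL on the page is not necessarily a source for its content.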
My thoughts:
  1. Separate no sources of any kind ("unsourced", marked with {{unreferenced}} and thereby placed in Category:All articles lacking sources) from some kind of source of some kind but maybe not good enough ("undersourced", marked with {{more sources needed}} and thereby placed in Category:All articles needing additional references).
  2. Separate BLP from non-BLP.
  3. Backlog drive to source unsourced BLPs, then either unsourced non-BLPs or undersourced BLPs (not sure which I think are more urgent), and lastly, tackle undersourced non-BLPs.
One can separate BLPs from non-BLPs by searching for Category:Living people (assuming the category has been applied). There are other methods to catch uncategorized BLPs (e.g. look for {{infobox person}} or {{authority control}}), and I think there is already some automated process that does this for all new creations (plus I think NPP does this?).
Finding "unsourced" is a bit harder...
  • You can search for articles that have an external link (any external link), and put them in the "undersourced" category (an external link being "some kind of source but maybe not good enough").
  • However, even if an article has zero external links, they may still have a source, just one without a hyperlink. The most obvious example being a citation to a book or other offline source. So you can also search for all the various citation templates ({{citation}}, {{cite book}}, {{cite news}} etc. etc.), and move anything with a citation template to "undersourced".
  • Still, plaintext citations to offline sources won't be caught by searching for external links or citation templates. So a third method is to search for a heading like "Works cited," "Sources," or even "External links." If an article has one of those sections (and it's not empty), then it can be moved to "undersourced."
  • What you'll be left with in "unsourced" are articles that have no: (1) URLs, (2) citation templates, (3) obvious citation-style headings. You'll still find stubs that have no headings, URLs, or templates, but have sources. Those will be false positives in the "unsourced" pile, but that's probably OK, because presumably the "unsourced" pile will be much smaller now.
This has been done before (how I know these methods off the top of my head), see e.g. WP:Unreferenced BLP Rescue and the bottom of that talk page, which has links to some Quarry queries and other pages from when some folks were messing around with this last year. I think we found at that point, with various methods, that there were less than 1,000 unsourced BLPs (the highest-priority category). I'm not sure what the current state is. I'm pretty sure that the vast majority of the 94,500 "articles lacking sources" probably do have some sources of some kind, just not in an easy-to-find format.
Final thought: in doing these sorts of searches, one cannot rely on the existing {{unsourced}} maintenance tag or Category:All articles lacking sources, because there may be false positives in that category, and there may be unsourced articles that aren't in that category. So just searching within that category won't find everything. Levivich (talk) 16:44, 30 April 2024 (UTC)
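The sorting steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested tool: the function names, the heading list, and the regular expressions are all hypothetical simplifications, and stubs with plaintext sources but no headings will still land in "unsourced", as noted above.

```python
import re

URL = re.compile(r"https?://", re.IGNORECASE)
CITE_TEMPLATE = re.compile(r"\{\{\s*(?:citation|cite\s+\w+)\b", re.IGNORECASE)
SOURCE_HEADING = re.compile(r"^==+\s*(?:works cited|sources|external links)\s*==+$", re.IGNORECASE)
LIVING = re.compile(r"\[\[\s*Category\s*:\s*Living people\s*[\]|]", re.IGNORECASE)


def has_nonempty_source_section(wikitext):
    """True if a source-style heading is followed by content before the next heading."""
    lines = wikitext.splitlines()
    for i, line in enumerate(lines):
        if SOURCE_HEADING.match(line.strip()):
            for following in lines[i + 1:]:
                if not following.strip():
                    continue              # skip blank lines
                if following.lstrip().startswith("=="):
                    break                 # next heading reached: section is empty
                return True
    return False


def triage(wikitext):
    """Return (person_type, bucket) using the three checks described above."""
    person_type = "BLP" if LIVING.search(wikitext) else "non-BLP"
    sourced = (
        URL.search(wikitext) is not None
        or CITE_TEMPLATE.search(wikitext) is not None
        or has_nonempty_source_section(wikitext)
    )
    return person_type, "undersourced" if sourced else "unsourced"


examples = {
    "blp_stub": "Jane Doe is a singer.\n[[Category:Living people]]",
    "book_list": "Foo is a village.\n== Sources ==\n* Smith, J. (1990). History of Foo.",
    "empty_section": "Bar is a river.\n== External links ==\n",
    "template": "Baz is a hill.{{cite book|title=X}}",
}
results = {name: triage(text) for name, text in examples.items()}
```

Writing the results to lists first, rather than retagging pages directly, leaves room for spot-checking before any live categories are changed.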
@Levivich, thank you for your amazing write up. I have a question though, would it be much more helpful to generate a list of these articles rather than sorting the category outright? Like you said, there will be a lot of false negatives that will not be detected using this method. Also, my regex skill is very terrible but I can try to hack up one to plug into JWB as a test. CactiStaccingCrane (talk) 17:10, 30 April 2024 (UTC)
Yes. At least for initial exploration purposes, I think the best approach would be to generate lists and post them into subpages and then spot-check them manually to see what it all looks like (as in: do the lists pass the spot-check, and how many are in each list). Another option/layer is to create new maintenance categories or even templates (templates can be hidden, as can categories, so no one will even see it) just for this purpose. At some point, when there is sufficient confidence that the sorting is being done correctly, then change the "live" templates/categories on the affected articles. As a bonus, if you dump the list of articles into a subpage on Wikipedia, you can easily import all the (linked) articles on the subpage into AWB/JWB for further batch editing.
BTW I also am terrible at regex, but you know who's good at it? ChatGPT, the free version. You can ask it to "write a regex that will find [string] [string] or [string]" and it'll usually do a pretty good job. It still hallucinates sometimes or writes a regex that doesn't exactly get what you're looking for, so I'd still check any ChatGPT regex against a regex checker like https://regex101.com/ or https://rextester.com/tester or any of a million others you can find in a Google search for "regex tester." But that's how I do regex now: with ChatGPT, then verify/tweak it.
You don't actually need regex (although it would probably make things easier and more accurate if it were used), you can use WP:PETSCAN or (with SQL knowledge) WP:QUARRY, or you can ask someone at WP:SQLREQ, where the folks who write SQL probably also are good at regex and various search tools.
Another alternative is to just use the regular Wikipedia search; you can limit to searching only for mainspace articles, and the advanced search options let you search by category, template, and text string. So for example, this search for articles in the unreferenced category with "Works cited" found List of Christian universalists, a false positive that actually cites sources and shouldn't be in that category.
But WP:PETSCAN will output a PagePile, which I found makes it very easy to generate lists of pages meeting certain criteria. If you're already doing AWB/JWB I'd encourage learning PetScan and PagePile, and then later if you're so inclined, SQL and Quarry to do more complicated things that PetScan won't do or won't do easily. Levivich (talk) 17:54, 30 April 2024 (UTC)
As well as the false negatives highlighted above - I think it's important to consider false positives, in that the presence of a URL doesn't necessarily imply a source. It could be the homepage of the website of the subject, for instance. Or a slightly off attempt at internal/interwiki linking. Could be spam links. Or, I have seen, linking to the homepages of various words mentioned in the body of the article for some reason. -- D'n'B-t -- 06:18, 1 May 2024 (UTC)
@DandelionAndBurdock, the homepage of the website of the subject would be a reliable source for information about that subject. WhatamIdoing (talk) 19:32, 1 May 2024 (UTC)
I'm not saying it's not reliable - I didn't mention reliability. A homepage is often not a source full stop. For example nationaltheatre.org.uk doesn't really tell you anything about the National Theatre, so it's not a source for information. If you wanted an ABOUTSELF type source then you'd link to nationaltheatre.org.uk/about-us/our-history/, not the homepage. So when you see an organisation's homepage in eg. an infobox or external links section, it'd be incorrect to think "oh, that's one source". -- D'n'B-t -- 20:38, 1 May 2024 (UTC)
I could use nationaltheatre.org.uk to WP:Directly support these facts:
Ergo, nationaltheatre.org.uk is an actual source. WhatamIdoing (talk) 00:30, 2 May 2024 (UTC)

Another idea

I have long thought that the main page should include an “Articles highlighted for improvement” section… where once a week we choose (say) five needy articles and encourage the community to work on them (looking for sources, improving language and grammar, and generally raising their quality). This would complement the “Featured Article” section (which showcases our best), by highlighting the fact that we still have lots of ways for newcomers to contribute. Blueboar (talk) 16:54, 30 April 2024 (UTC)

I think that your idea is a stroke of genius. Genuinely. Make this proposal a little bit more detailed and boom, there you have it. CactiStaccingCrane (talk) 17:11, 30 April 2024 (UTC)
We can also use this as a way for new editors to join in Wikipedia, as a kind of mentorship of sorts. CactiStaccingCrane (talk) 17:12, 30 April 2024 (UTC)
Definitely. Other projects have similar things (enWS has "proofread of the month", for example) Cremastra (talk) 20:36, 7 May 2024 (UTC)
Wikipedia:Articles for improvement used to have a section on the main page, but it was removed after its trial was considered unsuccessful, as there were few new editors making edits to the highlighted articles. The project still exists, with articles being nominated and accepted into a queue for listing on the project page. I suggest working with that WikiProject on the feasibility and potential cost/benefit ratio of having a corresponding section on the main page. isaacl (talk) 17:36, 30 April 2024 (UTC)
Interesting to see that it was attempted and didn’t work. Oh well. Blueboar (talk) 20:00, 30 April 2024 (UTC)
It was over ten years ago, and I suspect the queue-filling process has been honed by now, so it might be worth a discussion at the WikiProject talk page. It could also be something to consider for user home pages, which has a specific intent of suggesting tasks for new users. isaacl (talk) 21:30, 30 April 2024 (UTC)
This is why ITN should feature the worst articles to the front page instead of the best! Ha ha, only serious. A subject being in the news is almost a guarantee that new sources are becoming available and people are interested in it - the perfect conditions for article improvements and recruiting new editors. Barnards.tar.gz (talk) 21:20, 30 April 2024 (UTC)

Semi all template documentation

Regarding Wikipedia:Template documentation, would it not make sense to automatically semi-protect all "/doc" template pages? I don't see any reason why IPs, etc., should edit these pages and since they are practically unwatched they are ripe for BEANS. A bot could do it. Thoughts? Rgrds. --BX (talk) 05:12, 2 May 2024 (UTC)

"I don't see any reason for them to edit" is not a valid argument for protection. IPs, just like everyone else, can constructively add things and fix errors in template documentations. Also, some templates aren't even semi-protected due to low use, IPs improving these templates should be able to update their templates. Chaotıċ Enby (talk · contribs) 12:43, 2 May 2024 (UTC)
A related, alternative suggestion that would, I believe, require MediaWiki changes, would be to make it so in the Template space watching the template itself would also show the /doc subpage in your watchlist, in the same manner that watching either Page or Talk:Page (in all namespaces I know of) watches both. That would at least somewhat increase visibility of documentation pages. And I think many new(er) editors probably already assume the doc page is watched along with the template itself (I think I did earlier in my Wikipedia time). Skynxnex (talk) 16:54, 2 May 2024 (UTC)
As this is the idea lab, how about a related idea: introduce an option to watch a page and all of its subpages, with new pages automatically watched as they are created. Like, say, my user page and all of its subpages and talk archives. Or a portal including all its subpages. (Probably not a good idea for Wikipedia:Articles for deletion, though.) —Kusma (talk) 20:18, 4 May 2024 (UTC)
I'd love that - I've got a quick-and-dirty script that watches each daily subpage of P:CE and WP:DRV that I run every year, but usually only after I spend the first week or so trying to figure out why my watchlist is suddenly so quiet. —Cryptic 06:56, 5 May 2024 (UTC)
Good generalization of my idea that would also be generally useful. I assume there are some performance issues, like with AFD subpages: some users have dozens of subpages and many WP-space pages have hundreds, or thousands, of archived talk pages. But it'd be interesting to know if there's any fundamental reason to not do something like filing a phab to get the software side started at some point. Skynxnex (talk) 12:35, 5 May 2024 (UTC)
Yes, that would be a better solution, even if it is harder to achieve. We did get a watchlist change for temporary watching implemented, so enhancing this area of code is not impossible. Certes (talk) 12:53, 5 May 2024 (UTC)
Some data (for the original request) is at quarry:query/82554. —Cryptic 06:56, 5 May 2024 (UTC)

Add watchlist option to include subpages

Reformulating the idea based on the above: When choosing to watchlist a page, currently a drop-down menu appears with a time option to choose from (permanent, 1 week, 1 month, etc.). A new option would appear to watchlist all subpages. An advanced option would allow unwatching or watching of specific subpages (you would probably have to type in these pages manually I imagine). Rgrds. --BX (talk) 03:54, 10 May 2024 (UTC)
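To make the proposed behaviour concrete, here is a small sketch of the matching rule it implies: a page counts as watched if it is watched directly, or if it sits under a root that was watched "with subpages", unless it was specifically unwatched. The function and parameter names are hypothetical; this says nothing about how MediaWiki stores or queries watchlists.

```python
def is_watched(title, watched, watched_with_subpages, unwatched_subpages=frozenset()):
    """Decide whether a page title should appear in the watchlist.

    watched: titles watched individually (today's behaviour).
    watched_with_subpages: root titles whose subpages are watched too (proposed).
    unwatched_subpages: specific subpages the user has opted out of (proposed).
    """
    if title in unwatched_subpages:
        return False
    if title in watched:
        return True
    # A subpage is the root title followed by "/" and anything else, so new
    # subpages are covered automatically as soon as they are created.
    return any(
        title == root or title.startswith(root + "/")
        for root in watched_with_subpages
    )


roots = {"Template:Foo", "User:Example"}
doc_watched = is_watched("Template:Foo/doc", set(), roots)
lookalike_watched = is_watched("Template:Foobar", set(), roots)
opted_out = is_watched("User:Example/sandbox", set(), roots, {"User:Example/sandbox"})
```

Matching on the root plus a slash, rather than on a bare prefix, keeps unrelated pages such as "Template:Foobar" out of the watchlist.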

area codes

Once in a while (most recently in 234 and 661), I edit a page with a 3-digit title to disambiguate an area code. How desirable is this, and how easily could someone set up a bot or template to add links to area codes on all pages whose titles are 3-digit numbers matching an area code?

On a related subject (which might belong on a different help page), what does the {{Year dab}} template do? or what is it supposed to do? Pages 234 and 661 both have that template, but only one shows a hatnote (This article is about the year 234. For the number, see 234 (number).)

--173.67.42.107 (talk) 19:03, 9 May 2024 (UTC)

See Template:About year for information about that template. — xaosflux Talk 19:24, 9 May 2024 (UTC)
Thanks, but that seems to be a different template than these pages use. --173.67.42.107 (talk) 19:37, 9 May 2024 (UTC)
Although, in theory, using Template:About year instead of {{Year dab}} could automatically add 234 (area code), etc., if somebody/some bot/something made such redirects for all the area codes... --173.67.42.107 (talk) 19:45, 9 May 2024 (UTC)
Template:Year dab redirects to Template:About year. When in doubt about a template, just check its page. Aaron Liu (talk) 20:00, 9 May 2024 (UTC)
The reason why only one of them shows the hatnote, as far as I know, is because 661 (number) does not have a standalone page. Chaotıċ Enby (talk · contribs) 01:09, 10 May 2024 (UTC)
That makes sense. Thanks all for answering my technical question. Anyone have any thoughts about my disambiguation suggestion? --173.67.42.107 (talk) 16:42, 10 May 2024 (UTC)

Almost-notable or seemingly-notable-but-not topics create duplicated effort

Anyone who plays any rhythm video games at all has probably heard of Camellia, a music artist who creates fast-paced EDM songs. I was quite surprised to find that we do not have an article on him, and 30 minutes later, I was even more surprised with my conclusion that he isn't notable.

I'm sure I'm not the first person to investigate writing an article on him, and I won't be the last either. The thing is, unless someone has tried before (creating a deletion log entry), a non-notable topic leaves no evidence that someone else has tried to write the article but deemed the topic to be non-notable.

I think that evidence should be somewhere. Perhaps an index of topic titles where each topic is a section on a page, that contains a list of sources if any are found, a couple of possible redirects for searchability, and a log of editors who have determined the topic to be non-notable and when they did so. This would make my search much faster—I would check this page, see that 2 editors have already looked into making the article less than 6 months ago, and be on my way. Snowmanonahoe (talk · contribs · typos) 19:28, 21 February 2024 (UTC)

I feel like, having sort of memorized the notability guideline, such a person didn't get significant press coverage or do something really big (which can be a summary of most notability guidelines), so they're not notable. I'm not very convinced that there are a lot of subjects that are nearly notable. WP:BFDI seems to be the only one I can think of. I'm sure there are more, but not by much.
Personally I like core-y songs like those from LeaF better Aaron Liu (talk) 19:48, 21 February 2024 (UTC)
Sometimes a topic that is not notable becomes notable. Years ago I repeatedly reverted attempts to add an up-and-coming rapper named Flo rida to various lists of notable people. And then he made the grade. Donald Albury 23:43, 21 February 2024 (UTC)
See also Wikipedia:Before they were notable. WhatamIdoing (talk) 07:30, 25 February 2024 (UTC)
Camellia's one person? I guess that shows how much I know. jp×g🗯️ 21:29, 22 February 2024 (UTC)
You could add URLs about Camellia to his Wikidata item at Camellia (Q40857248) using the Property P973 "described at URL".
As an example, you can see the reviews I added to the novel A Fire So Wild (Q124606008) in case someone were to eventually create a Wikipedia article about it. Besides helping editors, the URLs there help to establish Wikidata notability. Lovelano (talk) 01:04, 25 February 2024 (UTC)
I think it’s a valid point, but I would be concerned that any centralised location for recording information about non-notable subjects could become a garbage magnet.
If you think there’s a chance that the subject is not notable yet but stands a chance of becoming notable eventually, you could maintain a stub in Draftspace, with your rationale on the talk page. But without continuous effort it would eventually get G13 speedy-deleted.
As a minimum you could maintain a “research log” on your user page detailing your efforts, which might possibly be found by a future editor. Maybe. Possibly. Barnards.tar.gz (talk) 16:37, 28 February 2024 (UTC)
IMO it'd be better to create that Wikipedia:Too soon article in your userspace, so that it won't require constant vigilance against deletion. WhatamIdoing (talk) 20:08, 2 March 2024 (UTC)
Well, what if the "note" left behind is a statement like "not notable as of <date>" in a G13 deletion log entry for a draft? That would solve the garbage magnet problem. Snowmanonahoe (talk · contribs · typos) 15:39, 4 March 2024 (UTC)
Probably the best idea I've seen at the pump – in fact, I wouldn't be surprised if it already exists somewhere or at least has been proposed in the past. Would be huge for avoiding unnecessary doubling-up on research. To answer Barnards's concern about it becoming a garbage magnet, the easiest thing to do would be to establish some minimum standard for inclusion as a listing. To have a stab at it: "entries must list at least a passing mention or greater in an independent reliable source and make an A7-style credible claim of significance." (Just a thought off the top of my head). – Teratix 15:29, 3 March 2024 (UTC)

@Snowmanonahoe I've found Wikipedia:Source assessment, which is sort of what you've described. It really needs more promotion... any ideas for what I might include in some WP:WPADS? Aaron Liu (talk) 00:45, 11 May 2024 (UTC)

Interesting thought, Snowmanonahoe! I think the most straightforward way to achieve it would be to modify our rules to allow talk pages for non-notable topics, so that one could leave a note along the lines of "I did a search and found X and Y sources, which I don't think is enough for notability because Z."
However, then there's the problem of articles that might be located at multiple different locations, so then we need to have redirects as well. And given that part of why we impose a notability standard is to reduce the maintenance burden, some editors might feel that it's not worth it for maintaining the talk page/redirects.
The other problem is datedness. If I come across a note saying "topic not notable as of [a year ago]", that doesn't help me out all that much, since (a) there may be more recent sources, so I still have to do a search, and (b) I may not trust that the initial editor did a deep enough dive to find everything that existed at the time.
The other approach here is drafts, which is what I currently use. For instance, at one point I started writing an article on the video journalist Cleo Abram, but ultimately concluded she wasn't yet notable. However, I strongly suspect she will be at some point, so I've just kept it at Draft:Cleo Abram and set up a Google Alert so I'll be notified when a source comes around. The main maintenance burden there comes from the 6-month rule we impose on ourselves, which means I have to tweak it every so often or ask for it to be undeleted if I forget. However, I know for a fact that the draft's existence has saved others from duplicated effort (and the community from the effort of a probable AfD) due to this exchange.
The structural form related to the draft approach would be to adjust the 6-month rule somehow to make it so that waiting-for-notability drafts like the one on Abram don't quasi-automatically get deleted.
Cheers, Sdkbtalk 18:03, 11 May 2024 (UTC)
What do you think about the approach at Wikipedia:Source assessment and its subpages, which are collections of SA tables along with some information on the subject? Aaron Liu (talk) 20:01, 11 May 2024 (UTC)
If there's no way to point users trying to create the article toward the corresponding source assessment page, then it's basically useless. Sdkbtalk 23:33, 11 May 2024 (UTC)
There's a consensus against keeping WP:BFDI drafts for some reason, so I doubt that drafting would prevail. Aaron Liu (talk) 00:15, 12 May 2024 (UTC)
@Aaron Liu: Good find. As for an advertisement... hm...
  • "Want to create an article on something, but can't just yet?"
Not very proud of that, but it's something. Do people actually click those? Snowmanonahoe (talk · contribs · typos) 19:46, 13 May 2024 (UTC)
I would promote this at WT:N, WT:NPP, and WT:AFC. voorts (talk/contributions) 02:14, 14 May 2024 (UTC)

Coming up with principles for future icon redesigns

A few years ago I botched an RfC to change to flat icons. Of course I still feel strongly that we need new icons, for many reasons from accessibility to load times to new features like dark mode (in which the oceans of white in the middle of some of today's icons are a bit too much).

I want to see if we could figure out ways to propose icon redesigns in a manner that is most likely to pass. I kind of want to discuss some of my principles when choosing icons (and combining User:Awesome_Aasim/Flat_design_idea, where I just added a section for viewing on a dark background; and User:Arsonxists/Flat_Icons).

In general I believe icons must be:

  1. Clear: Easily understood by many, including new users who may be unfamiliar;
  2. Accessible: Icons must use multiple properties to distinguish themselves from one another when used in the same context. Uw1 and uw2 fail at this, because they only alter the color of the icon but not the shape or the contents inside of it.
  3. Fast: Icons must load within seconds even on the slowest of connections. I notice that some flat icons (particularly the OOUI/Codex icons) use a fraction of the space and bandwidth of equivalent skeuomorphic icons.
  4. Contrast: Icons must be adaptable based on different themes, skins, etc. Icons must be visible on both light and dark themes, and if not then should be easily adaptable from light to dark and vice versa. MediaWiki's default skins do not yet respond to the @media (prefers-color-scheme: dark) or @media (prefers-color-scheme: light) media queries, but when they do there will certainly be icon and color clashes.
  5. Helpful: If text is able to more clearly communicate an idea than an icon, then we should reconsider whether it is actually necessary. Stop hands do a great job at communicating "stop what you are doing" or "you have been stopped", but the i's in some messages just seem purely decorative rather than actually helpful.

Is there anything else I am missing? I probably want to add more icons that should be switched for better accessibility, etc. Awesome Aasim 03:39, 5 May 2024 (UTC)
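One way to make the "Contrast" principle checkable is to compute a contrast ratio for an icon's main colour against both a light and a dark background. The following is only a rough sketch: it uses the WCAG 2.x relative-luminance formula, and the dark background colour, the 3:1 threshold, and all function names are assumptions chosen for illustration, not anything taken from MediaWiki or Codex.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(colour_a: str, colour_b: str) -> float:
    """WCAG contrast ratio between two colours (ranges from 1 to 21)."""
    la, lb = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def visible_on_both(icon_colour: str,
                    light_bg: str = "#ffffff",
                    dark_bg: str = "#101418",   # assumed dark-theme background
                    minimum: float = 3.0) -> bool:  # assumed threshold (3:1)
    """True if the icon colour reaches the minimum ratio on both backgrounds."""
    return (contrast_ratio(icon_colour, light_bg) >= minimum
            and contrast_ratio(icon_colour, dark_bg) >= minimum)
```

Under these assumptions a pure black or pure white icon fails on one of the two backgrounds, while a mid-grey such as `#808080` passes on both, which is roughly the "ocean of white" problem described above.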

The design principles for icons in the Wikimedia Design Style Guide may be of interest, as well as the guidance for icons in the Wikimedia Codex. Regarding effect on page loading: note images will be cached by the browser, so loading time will be amortized across many page loads. Additionally, for images the size of an icon, the number of requests being made is a more significant bottleneck than the byte size of an image. This is why sites will use the CSS sprite technique to bundle many icons together onto one image. However, this adds more steps for changing icons. isaacl (talk) 05:44, 5 May 2024 (UTC)
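To illustrate the sprite technique mentioned above: all icons are packed side by side into one image, so the browser makes a single request, and each icon is shown by shifting the background position. Below is a minimal sketch that generates such rules for a horizontal strip; the class names, the sheet file name, and the 20px icon size are made up for the example.

```python
def sprite_css(icon_names, size_px=20, sheet_url="icons-sprite.png"):
    """Build CSS rules for icons laid out left to right in one sprite sheet.

    Every name, URL, and size here is hypothetical.
    """
    rules = [
        f".icon {{ width: {size_px}px; height: {size_px}px; "
        f"background-image: url({sheet_url}); }}"
    ]
    for index, name in enumerate(icon_names):
        # Shift the sheet left by one icon width per position in the strip.
        offset = -index * size_px
        rules.append(f".icon-{name} {{ background-position: {offset}px 0; }}")
    return "\n".join(rules)


if __name__ == "__main__":
    print(sprite_css(["stop", "info", "warning"]))
```

The extra step for changing icons is visible here: replacing one icon means regenerating the whole sheet and keeping the offsets in sync.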
@Isaacl If I recall there is a way to get Codex icons to work in wikitext. Can you maybe show? Awesome Aasim 15:56, 5 May 2024 (UTC)
Not directly that I know of. If I understand correctly, they are available for use in a gadget/user script or extension. An extension could be written to provide a wikitext interface. (Maybe one exists already? Perhaps someone who knows more about the Codex can weigh in.) Regarding design guidance, here is a more direct link to the principles and guidance for designing icons in the Wikimedia Codex. isaacl (talk) 17:50, 5 May 2024 (UTC)
I wish there was a parser function like this:
 {{#codex:name_of_icon|size=size|color=color|type=type|...}}
Then it would allow Codex icons to be used inline. Awesome Aasim 19:52, 5 May 2024 (UTC)
The WMF's previous, similar icon set OOUI is on Commons which made it easy to use with standard wikitext syntax. Curiously, Codex doesn't seem to be – but surely it has compatible license? – Joe (talk) 21:17, 5 May 2024 (UTC)
I am also going to link my past two attempts attempting to gain input on using the OOUI icons on Wikipedia. Wikipedia:Village_pump_(idea_lab)/Archive_37#Changing_to_flat_icons Wikipedia:Village_pump_(proposals)/Archive_168#Flat_Design (the latter is cringe because I was new to how RfCs worked at the time, so I did not really understand the best way to format RfCs at the time, now I do). Awesome Aasim 02:04, 6 May 2024 (UTC)
I don't think RfCs are the way to go here. It touches too many buttons: Wikipedians are reflexively sceptical about new UI, and about anything that implies changes to many pages, and about anything that implies the WMF might be better at some things than volunteer editors are... What I'd do is write an essay that outlines what guidelines you think people should follow when selecting icons in templates etc., and then try to build consensus around those guidelines by arguing for their application in specific discussions. It's the longer road, but I think it's more likely to build a broad and stable consensus. – Joe (talk) 07:23, 6 May 2024 (UTC)
I think the argument by many will be "if it ain't broke, don't fix it", and yes some of the icons ain't broke, but this does not mean there wouldn't be benefits to switching. Awesome Aasim 13:02, 6 May 2024 (UTC)
The key challenge is that interface design decisions are often difficult to make using English Wikipedia's consensus-based decision making traditions, because many users respond on a like it/don't like it level, and aren't fussed about compliance with guidelines. I agree with the idea underlying your original post (which matches Joe Roe's suggestion) of building up support for basic principles, and I think that offers the best path towards UI changes. But for better or worse, results from A-B testing are likely the only hard data that will get some users to overlook their own personal initial reaction, and that generally needs funding. isaacl (talk) 16:18, 6 May 2024 (UTC)
Let me take a look at another icon redesign RfC: Wikipedia:Village_pump_(proposals)/Archive_155#Proposal/RFC:_Redesigning_page-protection_padlock_icons_to_be_more_accessible. This one focused exclusively on accessibility. I think that might be the key: showing that accessibility is poor for some of the images, especially in dark mode, or that the icons would have terrible contrast if Wikipedia had a properly implemented dark mode (which it doesn't yet, but one is in the works). Awesome Aasim 16:56, 6 May 2024 (UTC)
The in-development night mode is a good reason to revisit the use of colour (for instance, some sports team pages use team colours for the text and background in table and infobox headers, which already is a readability problem now). But rather than jumping ahead to discussing new icons, perhaps we can continue discussing the basic principles? isaacl (talk) 17:16, 6 May 2024 (UTC)
This might be good for a multistaged RfC. One might be asking about design principles, the next icons are found and submitted that meet those design principles, and then after the icons are voted on. This would be a good three phase discussion. Awesome Aasim 18:54, 6 May 2024 (UTC)
As I mentioned, I'm more interested in getting some discussion going on the original post and my original reply than discussing process matters...
Again, I think it's looking too far ahead to be planning a multi-stage RfC. As alluded to by Joe and evidenced by the page protection icon discussion, I think it will be more effective to look at a specific icon or set of related icons, gain consensus that there is a specific problem (or problems) with them, and then work on replacements. I think a generic "here's the next stage: let's discuss this bunch of icons as replacements based on design principles" discussion isn't going to build up consensus for a change. isaacl (talk) 22:46, 6 May 2024 (UTC)
On that note, as well as dark mode, I think the switch to Vector 2022 as the default skin is a good reason to revisit icons and broader template design choices that now look out of place with the rest of the site as most readers see it. – Joe (talk) 07:38, 7 May 2024 (UTC)
You think so? I actually think it's really important that the icons are as not-flat as they are, so that I don't feel the awful life-sucking I often do from other "flat" mobile-ready web design. Perhaps the C-class etc. icons could be given a refresh, but I would see a bottom-up redesign as searching for a clear problem. Remsense 07:46, 7 May 2024 (UTC)
I actually think that the circle (aka Norro) icons are the ones that don't need to be replaced. They're accessible and consistent across their little bubble.
The unblock icons definitely need replacing. They're inaccessible, and there's no reason I can see for making them all clocks. Aaron Liu (talk) 11:30, 7 May 2024 (UTC)
Tackling the icons group by group is another path that may be more productive than a whole-site icon RfC. I don't know the history of every icon in use, but I doubt they emerged all at once. CMD (talk) 12:07, 7 May 2024 (UTC)
Whether or not you personally like the new default theme and its flat design, it's here and it's here to stay. With that in mind, I would say the use of two very different design systems (Vector 2022/Codex for Mediawiki UI; an eclectic mix of mid-2000s elements for templates) is a clear problem from both a usability and aesthetic point of view. – Joe (talk) 12:25, 7 May 2024 (UTC)
Oh, I want to make clear that I quite love Vector 2022, and part of the reason is what I feel in its balance between flat and non-flat. Remsense 13:07, 7 May 2024 (UTC)
I suspect many of the editors who like to weigh in on these matters focus separately on icons that are part of the surrounding general page framework versus the icons that are within the main content area. Thus I'm not sure that differences in the style between these is enough to generate a consensus for change. isaacl (talk) 15:42, 7 May 2024 (UTC)
Yeah, I think you might be right there. – Joe (talk) 15:47, 7 May 2024 (UTC)
Is there a technical way to make icons depend on skin and on whether or not "dark mode" is on, so dinosaurs like me can use icons that work well in Monobook? That way, we could have icons that look good in each of the skins. —Kusma (talk) 13:00, 7 May 2024 (UTC)
There certainly is to some degree: SVGs are capable of being context-aware using pure CSS. Many will swap the foreground color from black to white based on what mode the browser tells it is being used. Remsense 13:01, 7 May 2024 (UTC)
More straightforwardly, template style sheets can be used to select different icon files based on theme, night mode, or the browser configuration specifying that a dark theme is preferred. I believe SVGs are rendered server-side into bitmap images, so right now they won't be able to adapt based on CSS differences. isaacl (talk) 15:48, 7 May 2024 (UTC)
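The selection logic described above (skin first, then night mode, then the browser preference) can be modelled roughly as follows. In practice this would be expressed as CSS selectors in a template style sheet rather than in Python; the sketch only shows the fallback order. The file names, the table, and the setting values "day", "night", and "os" are assumptions for illustration.

```python
# Hypothetical icon files keyed by (skin, colour scheme).
ICON_VARIANTS = {
    ("monobook", "light"): "Stop-monobook.svg",
    ("vector-2022", "light"): "Stop-flat.svg",
    ("vector-2022", "dark"): "Stop-flat-inverted.svg",
}
DEFAULT_ICON = "Stop-flat.svg"  # assumed fallback for unlisted skins


def resolve_scheme(night_mode_setting: str, browser_prefers_dark: bool) -> str:
    """Map a user setting ('day', 'night', or 'os') to 'light' or 'dark'."""
    if night_mode_setting == "night":
        return "dark"
    if night_mode_setting == "os" and browser_prefers_dark:
        return "dark"
    return "light"


def pick_icon(skin: str, night_mode_setting: str, browser_prefers_dark: bool) -> str:
    """Choose an icon file, falling back to the skin's light variant, then the default."""
    scheme = resolve_scheme(night_mode_setting, browser_prefers_dark)
    for key in ((skin, scheme), (skin, "light")):
        if key in ICON_VARIANTS:
            return ICON_VARIANTS[key]
    return DEFAULT_ICON
```

With this table, a Monobook user in night mode still gets the Monobook icon because no dark Monobook variant is listed, which is the kind of gap a per-skin icon set would have to fill by hand.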
We could still do CSS hackery to switch the bitmap icons. It is just that the bitmap icons themselves cannot adapt unless we do some external CSS. Awesome Aasim 17:24, 7 May 2024 (UTC)
Yes, that's what I said. isaacl (talk) 17:31, 7 May 2024 (UTC)
Bumping because I got a comment that idea lab might not be a good idea. I am seeing that icon by icon RfCs are going to be more productive. We can use principles in this idea lab to help develop icon sets. Awesome Aasim 16:59, 14 May 2024 (UTC)
That's a bit oversimplified... I was saying that since no one, including you, has responded to my comments on the design principles, and no one else has said anything about them, that it doesn't seem there is enough interest on the page to reach a consensus viewpoint on the design principles. isaacl (talk) 21:13, 14 May 2024 (UTC)
Maybe we can then focus on the icons themselves? The last time I tried workshopping in VPIL, I came to the conclusion that finding multiple icon sets and then giving people options to choose would be better. Awesome Aasim 18:29, 17 May 2024 (UTC)
Maybe the miscellaneous village pump would find more takers to discuss base principles. However it's true enough that usually more editors are attracted to comment on specific examples of icons, rather than discussing abstract concepts. I think it would be helpful for these proposals to have an explanation of how they are improvements with respect to the base design principles.
I was hoping there would be more discussion on load time considerations and use of colours. Personally I think client-side caching is likely good enough to make loading time a small factor. Colour is a tricky issue, as Wikipedia editors are accustomed to using any colours that strike their fancy, but best practice for supporting themes (which can have light and dark modes) is to have a defined palette that each variation can customize. In accordance with mw:Recommendations for night mode compatibility on Wikimedia wikis, for HTML, CSS variables can be used, and for gadgets/extensions making use of Codex, design tokens can be used. But with pre-rendered icons, any alignment with customized colour palettes would have to be done manually. isaacl (talk) 23:03, 17 May 2024 (UTC)

Proposal: "job aids" for Wiki editors

To help editors moving from the Absolute Beginner stage into the Getting Comfortable But Still Overwhelmed stage, I have a proposal: that some of you experienced luminaries create some job aids based on flow charting to guide decisions and actions. By acting on this proposal, I think you’ll be amazed at the benefits — for not only editors like me who are slogging our way up toward your realms of enlightenment but also you yourselves due to increasingly reduced gnashing of teeth and silent screams at our work.

Here's why.

Job aids are a staple in the world of training today for their proven ability to simplify complex steps and decisions that workers have to make. We might think of them as glorified “cheat sheets.” They break tasks down to such an extent in flow chart format that workers can quickly see what to do when. Just a few of the types of jobs that rely on the use of job aids include pre-flight inspection, tax auditing, employment interviewing, and customer service. The result:

— Greatly increased accuracy, quality, and consistency of work

— Greatly decreased need for time in training and memorizing rules, not to mention frustration for not only workers but also supervisors

This is an example of a job aid that could easily be adapted for Wiki editors, filling in sequences of steps to follow in various tasks and then, when required to make a decision before proceeding, the alternative next steps.


Augnablik (talk) 01:04, 7 May 2024 (UTC)

1) This doesn't really have anything to do with the WMF, so I'm not sure this is the right message board? 2) For pre-flight procedures, the term is "checklist". No flowcharts required. And indeed, we already have tons of checklists on Wikipedia whenever there's a routinized task that requires doing Step A, then Step B, then Step C. Take a look at, say, WP:CFD#HOWTO which will give you a checklist of tasks to do in order to nominate a category or categories for discussion. Or Wikipedia:ReFill#Usage for how to use a specific tool. No flashy graphics, just the info a volunteer needs. 3) But there's a limit. We can use checklists for tasks where we know exactly what to do. We can't use them for tougher matters like "what sources should I use to build this article" or "what to do when two guidelines offer different editorial suggestions" or "how to resolve a dispute between two editors." And attempting to create a flowchart for these situations is potentially risky if there's a chance some editors will simply use the flowchart unconditionally even when things are misty or shouldn't apply. About the only decent flowchart I can think of on behavioral-type actions is File:MRV Flowchart.png, which happens to be explaining the expectations of a very specific type of consensus discussion that 99.9% of editors shouldn't worry about. SnowFire (talk) 02:30, 7 May 2024 (UTC)
We'll always have the File:NPP flowchart.svg. CMD (talk) 02:58, 7 May 2024 (UTC)
Just what I was thinking of, @Chipmunkdavis … and really helpful. I sure wish I’d known about that file before I posted.
At times while involved in Wikipedia, I feel like an explorer down in a cave filled with treasures invisible to the “naked eye,” which I either stumble on serendipitously or learn about unexpectedly.
The sense of adventure is fun, but I keep finding myself repeating exactly what I said to you above: “I sure wish I’d known about that!” Augnablik (talk) 18:24, 7 May 2024 (UTC)
Behold, strange and ancient treasures: c:Category:Flow charts for Wikimedia projects. Levivich (talk) 05:38, 8 May 2024 (UTC)
XKCD shenanigans
This would make a great cartoon for Wikidom. Which reminds me that I once saw a hilarious cartoon showing a protest with a Wiki editor holding up a sign saying, "Citation needed!" Might there be a collection of such cartoons somewhere, perhaps down among the "strange and ancient treasures"? Augnablik (talk) 06:23, 9 May 2024 (UTC)
xkcd produced such a cartoon; we have a version here. Similar (but non-free) cartoons can be found in issues 214, 446, 545, 739, 903, 906, 978, 1167, 1665 and 2467. I don't know of a list other than the usual search engines. Certes (talk) 09:35, 9 May 2024 (UTC)
@Certes 2782 is one of my favorites. Gråbergs Gråa Sång (talk) 09:57, 9 May 2024 (UTC)
😂, @Gråbergs Gråa Sång. Augnablik (talk) 10:15, 9 May 2024 (UTC)
@Augnablik Compare 2022 United Kingdom government crisis, which got some media coverage: [4] Gråbergs Gråa Sång (talk) 10:29, 9 May 2024 (UTC)
Was this reply meant for this thread? Augnablik (talk) 10:34, 9 May 2024 (UTC)
Off-topic, (sorry) but yes. It was my continuation on "Meryl Streep seagull incident (disambiguation)", bit obscure, perhaps. Gråbergs Gråa Sång (talk) 10:35, 9 May 2024 (UTC)
...and by googling for that I just discovered the explain xkcd wiki. Gråbergs Gråa Sång (talk) 10:01, 9 May 2024 (UTC)
...which uses MediaWiki 1.30, EOL in 2019, and the people who actually run the site are inaccessible Aaron Liu (talk) 19:58, 9 May 2024 (UTC)
You are right, @SnowFire, I posted in the wrong section of the Village Pump. At the time I posted, I didn’t understand there were different sections of “the Pump” instead of just one “pump.” (Do pumps have sections?) I think I posted here in WMF because that’s where I first came to the VP, and I saw another proposal (though it turned out to be directly related to WMF). I wonder if I can move both my original post and the replies I’ve received so far over to the Proposals section.
In any event, thanks for such a comprehensive reply to what I wrote. Augnablik (talk) 18:11, 7 May 2024 (UTC)
If you want other editors to work on these kind of diagrams, it would help to list which specific tasks or processes you think they would be helpful for. – Joe (talk) 12:23, 9 May 2024 (UTC)
I hadn’t gotten quite that far, Joe, but every so often, as I noticed multiple things we’re supposed to do in various editing situations, I began to wish I had a job aid of some sort.
Confession: because I have an EdD in instructional design, as I get personally tangled up as a learner in various areas … like learning how to do Wiki editing … it’s natural for me to see how particular instructional tools could help me (and others) understand and perform the yet-unknown. Kind of an intriguing situation to be in! Augnablik (talk) 13:11, 9 May 2024 (UTC)
Augnablik, would a possible job aid be the following workflow?
  1. an editor adds a new paragraph to an article
  2. they click publish
  3. Mediawiki highlights the paragraph, asking if adding a citation would be a good thing
If so, I encourage you to use this link, add a new paragraph on the article (more than 50 words, with no citation), click publish, and let me know what you think. :)
Trizek_(WMF) (talk) 12:12, 21 May 2024 (UTC)
@Trizek_(WMF), are you saying you’d like me to add a paragraph on an article about Brittany — though I know rather little about it — then publish it, and I’ll see what you described in your third point happen? Augnablik (talk) 12:37, 21 May 2024 (UTC)
Yes, if you add more than 50 words, with no citation. It is safe: clicking « publish » will NOT publish it:
  1. if the conditions I gave you are fulfilled, the visual editor will trigger the surprise I try to share with you.
  2. if the conditions aren't met, then you will have the edit summary step to stop publishing.
I just tested it by pasting your reply there.
Trizek_(WMF) (talk) 13:57, 21 May 2024 (UTC)
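The trigger condition described above (more than 50 words, no citation) can be sketched as a simple check. This is only an illustration of the stated rule, not the visual editor's actual implementation; the function name and the use of a `<ref` tag as the sign of a citation are assumptions.

```python
import re

MIN_WORDS = 50  # threshold taken from the description above
# Assumption: a citation shows up as a <ref> tag in the paragraph's wikitext.
CITATION_PATTERN = re.compile(r"<ref\b", re.IGNORECASE)


def needs_citation_prompt(paragraph: str) -> bool:
    """True if a new paragraph has more than MIN_WORDS words and no citation."""
    has_citation = bool(CITATION_PATTERN.search(paragraph))
    word_count = len(paragraph.split())
    return word_count > MIN_WORDS and not has_citation
```

A paragraph that fails either condition would go straight to the usual edit summary step instead.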

Another job aid proposal, this time with AI

Perhaps this idea has already been thought of, but I'll propose it anyway and see what happens.

The other day I proposed job aids to help guide editors about what to do in different situations, and I was going to add to it with this one, but that post seems to have been archived. As I thought about some of the feedback it received, with the obvious one about how sometimes there's no clear path to the next step because of some of the legitimate alternative paths, it occurred to me that perhaps some day when AI gets more accurate (not long, it seems!), it could be possible to use that instead of the flow chart format I had in mind.

Has this been discussed by senior editors? If so, is there some sort of a team involved with it? And also if so, anything to share so far? Augnablik (talk) 04:31, 10 May 2024 (UTC)

@Augnablik: Your original suggestion wasn't archived, it's here. – Joe (talk) 07:33, 10 May 2024 (UTC)
Thanks, Joe. Augnablik (talk) 09:30, 10 May 2024 (UTC)
Since no one else has replied yet, I'll go on with my question about whether AI has yet been discussed by senior editors in terms of eventual possibilities to aid editors (especially ones not too far along in their Wiki careers) trying to remember all the Wiki documentation they've pored through, or heard about, when faced with a situation in which they need to do something but aren't quite sure what.
Example: a moment ago, while editing Houseboat, I wanted to create a box in which the term scow would be defined. For a moment, I froze, trying to recall. Although I finally remembered, I'd have loved a Wiki feature similar to Siri that I could query about how to do what I'd wanted, and get a reply. If it weren't what I needed, I'd get a chance to ask again.
This example is just one of many where I see AI could one day be of great help for us, and thus my question on May 10. Augnablik (talk) 10:31, 11 May 2024 (UTC)
Wikipedia:Large language models and [5] may have something of interest. Also related to the use of AI, see Wikipedia:Conflict_of_interest/Noticeboard/Archive_206#Madame_Tussauds_COI and the thread above. Gråbergs Gråa Sång (talk) 11:16, 11 May 2024 (UTC)
A discussion on a similar topic can be seen at Wikipedia:Village pump (proposals)/Archive 211 § AI for WP guidelines/ policies. As I suggested in that thread, I think the best way to follow up is to contact the Wikimedia machine learning team, or any suitable university research group (your academic network should be helpful!), who could potentially partner with Wikimedia on developing something. isaacl (talk) 16:45, 11 May 2024 (UTC)

Martial Artist Name / Rank Style Guide?

Hiya! If you can't tell by my talk page, I'm very very much a newbie to Wikipedia and cannot find a style guide / consistent way that martial artists of notable rank are referred to. Here are three different entries with differing ways of referencing it in the infobox. Richard Norton's is the closest to what I would consider a good example; however, referring to someone of dan rank as a "nth dan black belt" is not always technically correct (e.g. Hanshi Tino Ceberano currently wears a red embroidered belt (source)). My suggestion is:

nth dan title in style

with the appropriate links providing context for title & style. E.g:

9th dan hanshi in Gōjū-ryū

This makes sense to me, but I am still very much a fledgling editor who is quite passionate about martial arts. Open to feedback, and the inevitable it's covered here reply :D TrixieWasAnEgg (talk) 14:58, 22 May 2024 (UTC)

Welcome, @TrixieWasAnEgg. I think Wikipedia:WikiProject Martial arts might be helpful for you. If you don't find an answer on that page, ask on the project's talk page. Schazjmd (talk) 15:09, 22 May 2024 (UTC)
Thank you kindly @Schazjmd - I figured there'd probably be a place like this that I just didn't know existed. I appreciate the warm welcome :D TrixieWasAnEgg (talk) 15:42, 22 May 2024 (UTC)

Listing Topics on an article

An idea I've been considering for a while is to include an icon in the top right, linking to an article's good or featured topic on the article itself, akin to how we use the Good Article icon and Featured Article star in articles already. Given how much work goes into topics, and given how beneficial these can be to reader navigation, it feels bizarre that they're relegated almost exclusively to the talk page. I think it'd be a good way to encourage work on topics as well, given that an editor's work will now actually be viewable by the average reader. Obviously, if this can't be done for technical reasons, then disregard it, but I feel this would be beneficial and wouldn't negatively influence the current state of the articles. This is my first time using the pump, so forgive me if this is in the wrong section, but I wanted to throw this idea out there due to the fact it would be beneficial for both readers and editors. Has one ever considered Magneton? Pokelego999 (talk) 02:11, 23 May 2024 (UTC)

@Pokelego999:, could you clarify what you mean by "topic" in this context? How about giving a specific example (a certain article, and what links your feature would add). DMacks (talk) 04:27, 23 May 2024 (UTC)
I assume good topics and featured topics. Sdkbtalk 04:32, 23 May 2024 (UTC)
This is an interesting thought! And no technical reason I'm aware of that we couldn't do it. However, I'm not sure I'd support it from an editorial standpoint. My main hesitation is that I feel our editorial decisions should always be informed solely by our editorial process, never by article quality or other non-editorial factors. It's one thing to use a topicon to denote the quality of the article the reader is at — it's relevant information to them that it has undergone a review and is therefore more trustworthy. But a good topic or featured topic topicon would veer into "see also"-type territory, and it just feels wrong to me to have anything like that be based on article quality. I can't quite articulate fully why, but one reason is that our quality content suffers from a lot of systemic bias, and promoting that content more because of its quality would reinforce that bias. Sdkbtalk 04:31, 23 May 2024 (UTC)

Ideas for promotion of the mentorship program

Hello all,

As you are probably aware, the growth team introduced the Mentorship program on the English Wikipedia a little while ago. Currently, only 50% of new accounts on Wikipedia are assigned a mentor due to the number of mentors currently available. I would like to know if anyone has ideas about potential ways to promote the mentorship program to experienced editors who might not know about it, or might not have considered it?

This could include:

  • Mass-messaging users involved in help forums who might be interested in mentoring
  • Adding mentorship on the Task Centre for experienced editors


Any other ideas welcome!

Cheers, Cocobb8 (💬 talk • ✏️ contribs) 18:20, 11 May 2024 (UTC)

Have you asked @Trizek (WMF) about what's worked at other Wikipedias? WhatamIdoing (talk) 23:11, 17 May 2024 (UTC)
Let's see what he says! Cocobb8 (💬 talk • ✏️ contribs) 21:09, 18 May 2024 (UTC)
Well, you are the only wiki where it is difficult to find enough mentors!
At most other wikis, we got the right amount of mentors to signup; in a few cases, we had too many mentors for the size of the wiki (Ukrainian Wikipedia has so many mentors that some mentors never got a question after 6 months).
My advice: try what you suggested. :) Trizek_(WMF) (talk) 15:09, 20 May 2024 (UTC)
Thanks for the info! @WhatamIdoing, what are your thoughts on the options I proposed for promoting mentorship here on English Wiki? Should I propose adding mentorship on the task centre on the centre's talk page, and formally proposing mass-messaging at proposals in village pump? Cocobb8 (💬 talk • ✏️ contribs) 16:55, 20 May 2024 (UTC)
How about an article in the next Signpost, explaining the program and recruiting experienced editors to sign up? Schazjmd (talk) 16:57, 20 May 2024 (UTC)
@Schazjmd Neat idea as well, I would support that! I have no experience in writing/proposing anything for the signpost though. Cocobb8 (💬 talk • ✏️ contribs) 17:01, 20 May 2024 (UTC)
@Cocobb8, me neither, but they do have a quick-start guide. Schazjmd (talk) 17:05, 20 May 2024 (UTC)
Nice idea! Let me know how I can assist you there @Cocobb8. I can find facts and data, but also stories to share. Trizek_(WMF) (talk) 17:05, 20 May 2024 (UTC)
@Trizek (WMF) and @Schazjmd, I have started a Signpost draft. Please, feel free to edit it and add to it (anyone, really)! The more people who have some things to add, the better chances are we can have a successful Signpost news story to share :). Cocobb8 (💬 talk • ✏️ contribs) 18:54, 20 May 2024 (UTC)
There is also opportunity for graphs and images, so if you have any of value @Trizek (WMF) that would be amazing. Feel free also to comment directly on the draft's talk page. Cocobb8 (💬 talk • ✏️ contribs) 18:55, 20 May 2024 (UTC)
I have added the mentorship on the Task Center. Cocobb8 (💬 talk • ✏️ contribs) 19:10, 20 May 2024 (UTC)
@Cocobb8, rather than a mass message, it might be more effective to try Snowball sampling. Imagine a template that's easy to post on someone's talk page that says something like this:
Seeking mentors for new editors
The homepage needs new mentors. Mentors are important because... If you're interested in helping new editors learn how to become productive contributors, please take action...

If you know someone that would make a great mentor, please invite them by adding {{subst:Snowball}} to their talk page.

The overall idea is that a semi-personal invitation from someone you know might be more effective than impersonal spam. WhatamIdoing (talk) 19:29, 20 May 2024 (UTC)
Great point, that might be more effective and there would be no need for consensus before posting on talk pages. I'm not really good at creating templates though, but what you proposed looks pretty good! Cocobb8 (💬 talk • ✏️ contribs) 19:31, 20 May 2024 (UTC)
The important part is writing good text. You probably need:
  • a brief explanation of what it is,
  • a reason why this is important,
  • a specific, straightforward action that they should take, and
  • a talk page where they can ask questions.
The format above is just a table. I copied it from one of the Wikipedia:Barnstars. WhatamIdoing (talk) 05:50, 25 May 2024 (UTC)

Quantifying current consensus on LLM usage

Would User:Sohom Datta/LLM be a good description of current consensus in the area? Sohom (talk) 16:20, 23 May 2024 (UTC)

I don't think there is anything close to a consensus yet, although that description is something I would personally agree with. Chaotıċ Enby (talk · contribs) 16:27, 23 May 2024 (UTC)
Your statement generically covers any content: all submitted content must be checked for accuracy, including references, and all submitted content that conflicts with policy can be removed. isaacl (talk) 17:19, 23 May 2024 (UTC)
Yes, definitely. I wanted to use it as a starting point to having a baseline policy/document on the usage of LLMs that we could point editors to. If there is a majority opinion/consensus that we need to do something stronger, I see that as a good outcome. Sohom (talk) 17:40, 23 May 2024 (UTC)
My suggestion for a base line policy on the use of LLMs on Wikipedia: “Don’t”. Blueboar (talk) 12:48, 24 May 2024 (UTC)
+1. Although Sohom's text is the closest formulation I've seen to what I think might be consensus. Levivich (talk) 14:09, 24 May 2024 (UTC)
Agreeing too. And yes, in terms of consensus Sohom's proposal is a baseline we should be able to start with. Chaotıċ Enby (talk · contribs) 14:13, 24 May 2024 (UTC)
Well, I agree with your comment in terms of creating new articles/new content. I'm less convinced that indirectly using LLMs to copyedit a paragraph is really that bad but I'm open to being proven wrong. Sohom (talk) 15:13, 24 May 2024 (UTC)
Yes, I mostly had content generation in mind, although LLMs can add a lot of puffery when "copyediting" so that's also a risk. Chaotıċ Enby (talk · contribs) 15:15, 24 May 2024 (UTC)
Yeah that's fair, I just saw the link you posted on the Discord (relevant link for peeps not on Discord) and I'm now less confident of that statement :) There are ways (anecdotally) of prompt injecting LLMs to remove puffery, but I absolutely don't trust new editors to be using them. Sohom (talk) 15:29, 24 May 2024 (UTC)
It's a starting point for anything, though: Any X must be checked for accuracy, including references, and all X that conflicts with policy can be removed. isaacl (talk) 17:17, 24 May 2024 (UTC)
The text you've linked is identical to that of WP:LLMP, which I wrote over a year ago.
I started an RfC, in which a quite wide majority of commenters supported adopting it, and it was closed as "no consensus" for reasons that remain unclear; essentially, the closer arbitrarily chose to interpret a large amount of the support votes as being opposed to each other, in which case it would no longer have a majority. This has been extremely disheartening to me, and I have been intending to initiate a review of the close, but it's been several months, and I think the whole landscape has changed in the meantime. jp×g🗯️ 12:39, 24 May 2024 (UTC)
It's not identical, it's not even similar. Sohom's text takes out the two major provisions of LLMP that most editors in that RFC objected to. From the LLMP RFC closure:

The primary objections from those opposing included concerns about mandatory disclosure and about the proposed ability to summarily remove suspect LLM content...There does seem to be an implied consensus for "Large language model output, if used on Wikipedia, must be manually checked for accuracy (including references it generates)" among those both favoring and opposing this wording but this was not stated explicitly enough by enough editors for me to formally find a consensus for it. Nothing in this close should be construed to suggest that current policies and guidelines do not apply to Large Language models, with a number of editors explicitly noting (especially among those opposing) that current policies and guidelines do apply.

Sohom's text omits the mandatory disclosure and summary removal provisions of the LLMP draft (that's how it differs). The first sentence of Sohom's text is the manual checking provision, and the second sentence of Sohom's text is the existing-PAGs-apply provision, both of which were suggested by the LLMP close as having implied consensus or at least significant support.
@Sohom: I wouldn't go so far as to say that is the "current consensus" as that's never been confirmed, but it certainly seems to match the prevailing views of editors, based on prior discussions (such as the LLMP RFC). Levivich (talk) 12:58, 24 May 2024 (UTC)
You are incorrect.
The proposal was supported by 44 people and opposed by 23, so almost a factor of two -- it was closed as "no consensus" under the completely arbitrary claim that half of the 44 supporters were actually opposed to it.

However, even if we play along with the idea that everyone who said e.g. "I support this as a half-measure for the thing I really want" was lying (?) and disregard their comments, the proposal was still favored by 52% of commenters, both a plurality and an absolute majority.

Furthermore:
Large language model output, if used on Wikipedia, must be manually checked for accuracy (including references it generates), and its use (including which model) must be disclosed by the editor; text added in violation of this policy may be summarily removed.
+
If used on Wikipedia, large language model output must be manually checked for accuracy (including references it generates). If potentially LLM-generated text conflicts with existing policies, it should be removed.
jp×g🗯️ 13:05, 24 May 2024 (UTC)
This is not the place to argue with the close of that RFC, jp. You're derailing this discussion, which is not about the close of that RFC, so I'm going to hat this now. Levivich (talk) 14:05, 24 May 2024 (UTC)
I reverted the hat, the discussion of the consensus established at the previous RfC is absolutely relevant to the topic of Quantifying current consensus on LLM usage Chaotıċ Enby (talk · contribs) 14:09, 24 May 2024 (UTC)
You were the one who brought up that close in the first place -- and to say something that was flatly incorrect (that "most editors objected to" it). Most editors did not object to it. If you're going to be clerking this section, how about you strike "most editors" from your comment, and I'll strike "identical" from mine? jp×g🗯️ 21:38, 24 May 2024 (UTC)
I'm not going to pull out the "if it's disruptive they can be blocked for disruptive editing" line, but due to the lack of a firm consensus that is what I did recently when blocking for, largely, disruptive LLM use. There are tools in the toolbox now, but firmer consensus would be great.
In that recent situation I reverted multiple responses at WP:AN to close reviews that were clearly generated by chatGPT. Editor time is our most valuable resource, so making people argue with robots is disruptive on its face. ScottishFinnishRadish (talk) 14:14, 24 May 2024 (UTC)

In terms of article content, I don't understand why we need anything more than what we already have: "all submitted content must be checked for accuracy, including references, and all submitted content that conflicts with policy can be removed." If content is accurate, referenced, notable, neutral and compliant with copyright, etc. then we should be including it regardless of whether it was written by human or machine. Likewise if it isn't those things we should be removing it regardless of whether it was written by human or machine. Thryduulf (talk) 21:28, 24 May 2024 (UTC)

The issue that arises is that enormous amounts of shit can be created with little or no time. Investigating and addressing that requires a significant investment of editor time to remedy. Time we're already lacking. ScottishFinnishRadish (talk) 21:33, 24 May 2024 (UTC)
That's true of both humans and machines. If it's shit it doesn't matter whether it's human shit or machine shit, get rid of it. Thryduulf (talk) 21:36, 24 May 2024 (UTC)
The difference is that the average human typing speed is fifty words per minute, versus a thousand. jp×g🗯️ 21:45, 24 May 2024 (UTC)
You can generate shit that's hard to easily detect infinitely faster training an LLM. The speed and scale at which large language models (LLMs) can produce content present unique challenges that surpass those associated with human-created content. While human editors certainly can and do produce inaccurate or low-quality material, the volume is naturally limited by human capacity. An LLM, however, can churn out vast quantities of text in a fraction of the time it would take a person. This means that even if only a small fraction of machine-generated content is problematic, the absolute number can quickly become overwhelming.
One major issue is the subtlety of errors in LLM-generated text. Machines can produce content that appears superficially accurate and well-referenced but may contain nuanced inaccuracies or fabrications that are not immediately obvious. These errors can be particularly insidious because they might be less apparent to casual readers or even experienced editors at a glance. The result is that detecting and correcting such content requires a disproportionate amount of editorial effort.
Moreover, the nature of machine-generated text often involves creating convincing but ultimately false narratives, sometimes interwoven with factual information. This can mislead readers and complicate the editorial process as it blurs the lines between fact and fiction. The effort required to meticulously verify every claim, cross-check references, and ensure neutrality becomes exponentially greater when dealing with a high volume of content.
In addition, there is the issue of copyright compliance. While human writers are generally aware of the need to respect intellectual property, LLMs may inadvertently generate content that closely mirrors existing copyrighted material, leading to potential legal issues. This requires an additional layer of scrutiny to ensure that machine-generated content does not violate copyright laws.
The editorial community is already stretched thin, and adding the burden of sifting through vast amounts of machine-generated content exacerbates the problem. Each piece of content, whether human or machine-generated, demands a thorough review to ensure accuracy, neutrality, and compliance with policies. However, the sheer speed and volume at which LLMs can produce text mean that editors could find themselves perpetually behind, struggling to keep up with the influx.
Additionally, the motivation behind content creation differs between humans and machines. While humans generally write with specific purposes, interests, or biases, machines generate content based on prompts and training data without intrinsic motivation or understanding. This can lead to the production of content that is contextually irrelevant, off-topic, or otherwise misaligned with editorial standards and community guidelines.
To address these challenges, it might be necessary to implement stricter pre-publication checks for machine-generated content, including more rigorous fact-checking protocols and automated tools to assist in the detection of subtle inaccuracies. Furthermore, encouraging transparency about the origins of content can help readers and editors apply the appropriate level of scrutiny.
In summary, while the core principles of content inclusion should remain consistent—accuracy, neutrality, notability, and compliance with copyright—the unique challenges posed by machine-generated content necessitate additional considerations. The editorial community must adapt its strategies and tools to effectively manage the influx and maintain the quality and integrity of published content. ScottishFinnishRadish (talk) 21:46, 24 May 2024 (UTC)
More shortly: humans and elephants poop the same thing, but you won't get rid of elephant poop by flushing it down the toilet. Chaotıċ Enby (talk · contribs) 21:54, 24 May 2024 (UTC)
You haven't seen my toilet! Levivich (talk) 22:36, 24 May 2024 (UTC)
And how is a policy against using LLM going to stop bad actors from flooding the encyclopedia with LLM-generated content? In the end, we will have to sort the bad contributions out, and we can deal with editors who make disruptive edits without having to figure out whether the edits are LLM-generated or human-generated. Donald Albury 00:09, 25 May 2024 (UTC)
"And how is a policy against X going to prevent bad actors from doing X" isn't a good argument.
The benefit of having a policy is that administrators could take action immediately upon recognizing LLM usage with significantly less opportunity cost than there is currently. ScottishFinnishRadish (talk) 00:18, 25 May 2024 (UTC)
The main issue is that there is (or at least was when I last looked a few months ago) no reliable way to detect LLM content - everything has significant rates of both false positives and false negatives. And anyway what we want is not to detect LLM content, but to detect bad content.
"how is a policy against X going to prevent bad actors from doing X isn't a good argument" Why is it not a good argument? A policy that cannot achieve its aim is a bad policy. Thryduulf (talk) 00:26, 25 May 2024 (UTC)
WP:VANDALISM cannot achieve its aim, so let's get rid of it? That's bonkers. We establish and enforce acceptable and unacceptable behavior all the time. ScottishFinnishRadish (talk) 00:37, 25 May 2024 (UTC)
And how do admins recognize that an editor is adding LLM-generated content? If the edits violate any of our policies or guidelines, we can deal with it. And if edits adding LLM-generated content do not violate any of our policies or guidelines, why should we care? Donald Albury 00:27, 25 May 2024 (UTC)
That depends, do you want to read 18.8 tomats long arguments between LLMs? Do you want to close those discussions, or are you fine with people just feeding discussions into chatgpt and asking it for a close? That's why we need to care.
As for recognizing it, it's pretty easy with a human brain. User talk:FailedMusician#LLM usage has an example of an admin (me) easily recognizing LLM written text. ScottishFinnishRadish (talk) 00:35, 25 May 2024 (UTC)
Looking at the linked thread, this feels like we should have a "LLM-generated text is not exempt from 'don't make up fake sources'" reminder somewhere... Chaotıċ Enby (talk · contribs) 00:44, 25 May 2024 (UTC)
That concerns me far less than using LLMs in internal project discussions. ScottishFinnishRadish (talk) 00:48, 25 May 2024 (UTC)
One doesn't prevent the other, but you're right that the latter is much more worrying indeed. Chaotıċ Enby (talk · contribs) 00:56, 25 May 2024 (UTC)
While some edits are definitely harder to detect and fall into a grey area, having a policy could at least help weed out the most egregious cases. LLM-generated content is often unsourced and way more editorializing than our standards, and this kind of policy could help deal with these mass additions more easily. Chaotıċ Enby (talk · contribs) 00:37, 25 May 2024 (UTC)
"having a policy could at least help weed out the most egregious cases" How?
"LLM-generated content is often unsourced and way more editorializing than our standards" Often != always, but when it is unsourced and/or more editorialising than our standards we can source it, rewrite it or remove it under existing policy.
"this kind of policy could help deal with these mass additions more easily" How? Thryduulf (talk) 10:39, 25 May 2024 (UTC)
If users repeatedly add many paragraphs of unsourced, editorializing text that is clearly consistent with the use of keywords used by LLMs (see Wikipedia talk:WikiProject AI Cleanup#Some common AI-generated phrases), it could be good to have a policy allowing us to remove it rather than having to nitpick every single sentence. Chaotıċ Enby (talk · contribs) 11:04, 25 May 2024 (UTC)
If any user repeatedly adds many paragraphs of unsourced editorialising text that's a user problem that needs to be dealt with regardless of whether they're using an LLM or not. Thryduulf (talk) 11:26, 25 May 2024 (UTC)
Except that's what LLMs typically write. That's their default text output. Yes, some users might do it on their own, but LLMs make it so much easier and more frequent that we shouldn't ignore the correlation between the two. Chaotıċ Enby (talk · contribs) 11:48, 25 May 2024 (UTC)
The problem is the output, not the method.
If our current policies are adequate to resolve the problems with the output then adding new ones just for LLMs is a waste of time and effort better spent actually resolving the problems.
If our current policies are not adequate to resolve the problem, then we need to strengthen our existing policies for all methods not just LLMs. Doing otherwise would mean we can't fix the problem when the method isn't LLMs, which would obviously be harmful to the encyclopaedia. Thryduulf (talk) 11:56, 25 May 2024 (UTC)
The issue is that the method (LLMs) completely changes the scale of the problem, which means policies adequate at a smaller scale might not translate effectively at a larger scale, even if the output is superficially similar. While LLM ability to generate text is unlimited, volunteer time isn't, meaning we simply do not have the time to verify every single line of massive LLM outputs like our current policies will have us do. Chaotıċ Enby (talk · contribs) 12:04, 25 May 2024 (UTC)
Your argument only makes sense if you are claiming that large amounts of nonsense in the encyclopaedia is only a problem when it is written by an LLM. If that is not what you are claiming then the final paragraph of my previous comment applies. Thryduulf (talk) 12:08, 25 May 2024 (UTC)
No. I'm claiming that LLMs can add a much larger amount of nonsense than users can write organically. While you claim that "Doing otherwise would mean we can't fix the problem when the method isn't LLMs", the fundamental difference is that the scale is completely different. LLMs can write more nonsense in a minute than users can in an hour. Chaotıċ Enby (talk · contribs) 12:11, 25 May 2024 (UTC)
None of that addresses my points. Either our current policies are adequate to handle it or they are not. In neither case do we need a policy specific to LLMs. Thryduulf (talk) 12:33, 25 May 2024 (UTC)
Our current policies are not adequate to handle it, so we need stronger policies, like, being allowed to remove paragraphs of unsourced, editorializing text on sight rather than having to argue every sentence of it. Whether these policies are LLM-specific or not, we still need them because of the disruption potential caused by LLMs. Chaotıċ Enby (talk · contribs) 12:57, 25 May 2024 (UTC)
Also, regarding the "often unsourced" part, the cases where the generated text isn't unsourced are often more concerning, as LLMs are known for creating spurious, plausible-sounding sources, or even citing unrelated sources that don't support the claims. Chaotıċ Enby (talk · contribs) 11:06, 25 May 2024 (UTC)
Again, adding sources that don't support the claim is a problem regardless of whether it's LLM-generated or not. Verifying whether sources actually support what they claim to is something we should be doing for all text added. Thryduulf (talk) 11:28, 25 May 2024 (UTC)
The issue is that users have already (more or less successfully) argued that they were exempt from the blame of adding fake sources if ChatGPT wrote them. This shouldn't be the case, and users should not get off the hook just because they used a LLM to generate fake sources instead of making them up themselves. Chaotıċ Enby (talk · contribs) 12:00, 25 May 2024 (UTC)
"users have already (more or less successfully) argued that they were exempt from the blame of adding fake sources if ChatGPT wrote them" [citation needed]. Policy is already clear that everybody is responsible for the content of the edits they make. If what you claim is true then all we need is to point editors claiming that to the existing policy and noting that there is no exception for LLM-generated text. Once again the problem is the fake citations, not the method used to generate the fake citations. Thryduulf (talk) 12:11, 25 May 2024 (UTC)
See User talk:FailedMusician#LLM usage for an example, where some users argued that ChatGPT-hallucinated sources were not fully the responsibility of the user adding them, or less than made-up sources they wrote themselves. Chaotıċ Enby (talk · contribs) 12:27, 25 May 2024 (UTC)
No they are not claiming that at all - they are claiming that ChatGPT was not used to generate content. Regardless of whether that is true or not, nobody is arguing that they are not responsible for the content of the edits. Thryduulf (talk) 12:37, 25 May 2024 (UTC)
I linked to the highest talk page section for context, but other users later down the line claimed that "if you truly believe that this wasn't the work of an LLM, this [...] means the block was even more justified" and that "adding fake citations is just a simple misunderstanding regarding an LLM" (if I misinterpreted these comments, please tell me and I will retract this). Chaotıċ Enby (talk · contribs) 13:01, 25 May 2024 (UTC)
What we have there is a user that didn't know that ChatGPT can hallucinate references and a lot of argument about whether that is a suitable justification for an indefinite block. I'm still not seeing anybody claim that the editor inserting such sources is not responsible for them, just disputing whether someone should be indefinitely blocked for doing something they didn't know was wrong. Thryduulf (talk) 14:00, 25 May 2024 (UTC)
That's the issue. Whether a user knows or doesn't know that ChatGPT can hallucinate references, they are still responsible for checking that whatever they add is not made up, and not knowing about what they added should not exempt them from this responsibility. Chaotıċ Enby (talk · contribs) 14:18, 25 May 2024 (UTC)
And nobody is arguing anything different. A few years ago I saw something that claimed child pornography wasn't banned in one of the countries in the Middle East. The justification for this claim was that there was no law specifically banning child pornography. While technically true, the reason for this was because that country bans all pornography, so no specific law was needed. It's the same issue here - we don't need a specific rule against adding fake references using LLMs because we already have a rule against adding fake references. Thryduulf (talk) 14:27, 25 May 2024 (UTC)
Yes, that makes sense and I agree. I just feel like it would be helpful to have it spelled out that "ChatGPT wrote this" is not an exemption, but I agree that it does already fall under our current policies. Chaotıċ Enby (talk · contribs) 15:14, 25 May 2024 (UTC)

Inflation template

The template is a terrific idea, but can give ludicrously precise quotients. An example I recently came across in the Miles Davis article:

". . . leaving Davis to pay over $25,000 (equivalent to $251,800 in 2023)"

It's a basic principle that the solution of a conversion should not have a greater degree of precision than that of the supplied data. That applies to measurements where the conversion factor is known with a high degree of accuracy, but of course depreciation of the value of currency depends on the medium of exchange, be it gold or a tradesman's wages. In an improved template, the above should, by default, read:

". . . leaving Davis to pay over $25,000 (equivalent to roughly $250,000 in 2023)" (my bolding)

with the option for the editor to increase or decrease the level of precision for special purposes. Doug butler (talk) 14:50, 14 May 2024 (UTC)
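As a rough illustration of the rounding proposed above, here is a minimal Python sketch. It is not the template's actual implementation, which is not shown in this discussion. The function name round_to_sig_figs is hypothetical, and treating "$25,000" as having two significant figures is an assumption the caller makes, not something the code can work out for itself.

```python
import math


def round_to_sig_figs(value: float, sig_figs: int) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0
    # Position of the leading digit: 5 for 251800, 0 for 7, -2 for 0.05.
    magnitude = math.floor(math.log10(abs(value)))
    # A negative ndigits rounds to tens, hundreds, thousands and so on.
    return round(value, sig_figs - 1 - magnitude)


# The inflated figure from the Miles Davis example, at two and three figures.
print(round_to_sig_figs(251800, 2))
print(round_to_sig_figs(251800, 3))
```

With two significant figures, 251,800 becomes 250,000, which matches the "roughly $250,000" wording suggested above; with three it becomes 252,000.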

IMO {{inflation}} should be deleted and Wikipedia shouldn't be in the business of calculating inflation. {{inflation/US}} uses the Consumer price index (CPI). The problems with using CPI to measure inflation are extremely well-reported, so much so that even the Bureau of Labor Statistics, which publishes CPI in the US, has said "The CPI cannot claim to be a completely precise measure of inflation" and publishes the variance of its estimates. {{inflation/US}}, however, doesn't present the figure as a range; it provides a calculation, leading to false precision (on top of the problems with using CPI in the first place). I have no idea what {{inflation}} uses for countries other than the US or how accurate it is. And I don't know how often the numbers are updated or with what precision or what the qualifications are of the editors who are doing the updating. If you ask me, the templates should be deleted; we are basically misinforming the reader when we say that $X in year Y is equal to $Z today. Such statements should be cited to reliable sources, not to CPI calculations by editors. It's WP:OR. Levivich (talk) 15:21, 14 May 2024 (UTC)
Well, if you don't know (and cannot be bothered to look up) anything about the template, how it works, where the information comes from, or how it's computed, why would your opinion about it be useful or relevant? jp×g🗯️ 04:03, 15 May 2024 (UTC)
FWIW, I fully agree with you on this. Compassionate727 (T·C) 16:25, 26 May 2024 (UTC)
I would agree. If I recall correctly, the inflation template does some amount of rounding (either by default or as an option, I can't remember) so as to avoid false precision of this sort. jp×g🗯️ 04:05, 15 May 2024 (UTC)
The template already has a parameter to round the output - r=, see Template:Inflation#Rounding so what the OP is really asking is to change the default from unit precision to the same precision as the source data. Given that it is not possible for the template to know the precision of an input (e.g. is £2000 accurate to 1, 2, 3 or 4 figures?) the option to specify a non-default level of precision is always going to be required.
All pages that currently correctly use the default precision would need to be adjusted to explicitly specify that before changing the default so as to avoid introducing inaccuracies. Given that there is no automatic way to know which articles are using the default correctly and which are using it incorrectly, every such usage would need to be examined by a human. That would be a very large job, for not really much benefit. Thryduulf (talk) 08:23, 15 May 2024 (UTC)
I agree that inflation figures usually need rounding, but it is difficult to get the default right. Whenever you see this type of false precision, WP:SOFIXIT by using the parameter as suggested by Thryduulf. —Kusma (talk) 09:18, 15 May 2024 (UTC)
Yep.  — SMcCandlish ¢ 😼  18:20, 15 May 2024 (UTC)
I've marked it as such in the templatedata. Aaron Liu (talk) 18:27, 15 May 2024 (UTC)
It would surely be possible to automatically calculate the number of significant figures in the input and by default round to the same. Compassionate727 (T·C) 00:23, 26 May 2024 (UTC)
It isn't possible to reliably calculate the number of significant figures in the input without the context from the source material. For example "It cost £2000" could be accurate to 1, 2, 3 or 4 significant figures. Thryduulf (talk) 00:34, 26 May 2024 (UTC)
Nobody on Wikipedia uses scientific notation or "$4000. blabla were lost." with that dot. Aaron Liu (talk) 01:17, 26 May 2024 (UTC)
Good points. However, I do think that for numbers ending in some quantity of zeroes, it would probably be okay to round by default to the corresponding number of preceding digits, especially given the vagaries inherent in inflation calculation. Compassionate727 (T·C) 16:23, 26 May 2024 (UTC)
Hmm, good point. I guess the user could always specify if it's different. Aaron Liu (talk) 19:34, 26 May 2024 (UTC)
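The trailing-zeros default discussed above could be sketched as follows in Python. This is an illustration under stated assumptions, not how {{inflation}} works: the function names are hypothetical, only whole-number amounts are handled, and every trailing zero is treated as not significant, which is exactly the guess that an editor would sometimes need to override.

```python
from typing import Optional


def infer_sig_figs(amount_text: str) -> int:
    """Guess significant figures by ignoring trailing zeros (heuristic only)."""
    # Keep digits only, so "£2000" and "25,000" are both accepted.
    digits = "".join(ch for ch in amount_text if ch.isdigit())
    significant = digits.lstrip("0").rstrip("0")
    # An amount such as "0" still counts as one figure.
    return max(len(significant), 1)


def default_sig_figs(amount_text: str, override: Optional[int] = None) -> int:
    """Use the editor's explicit precision if given, else the heuristic."""
    return override if override is not None else infer_sig_figs(amount_text)


print(infer_sig_figs("25,000"))
print(infer_sig_figs("£2000"))
print(default_sig_figs("£2000", override=4))
```

The heuristic gives two figures for "25,000" and one for "£2000". The override covers the case raised above where "£2000" really is accurate to four figures, since the text alone cannot settle that.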

Wikipedia:Naming conventions (royalty and nobility) - RfC drafting for reversion of the November 2023 change

In November 2023, NCROY was altered by consensus to instruct editors to not disambiguate royalty and nobility with their geography unless disambiguation is required.

This has proven controversial with some editors arguing that the result does not reflect the consensus of the broader community; there has been considerable disruption as a result of this.

To resolve this I believe a second, broadly advertised, RfC would be beneficial; this would be held at the Village Pump, be listed at WP:CENT, and ping all the editors involved in the recent RfC as well as any relevant RMs. One way or the other, this should provide a path to resolving this dispute; I am opening this discussion to help draft it with the intent of opening it once the current ArbCom case request closes.

My initial proposed question is:

Should our naming convention on royalty and nobility instruct editors to generally disambiguate royalty and nobility with their geography, unless there is an "overwhelming commonname"?

The context for this discussion includes:

  1. A November 2023 RfC consensus instructing editors to disambiguate only if disambiguation is required.
  2. A May 2024 ArbCom case request that raised concerns about disruption in the topic area. This case lists a number of recent requested moves and move reviews.
  3. A village pump discussion drafting this RfC.

BilledMammal (talk) 05:11, 13 May 2024 (UTC)

It's a good question, in my opinion. Deb (talk) 10:18, 13 May 2024 (UTC)
First thought: I'm not convinced that the RFC should lump royalty and nobility together. The changes made following the November RFC related only to monarchs. Non-ruling nobility are a different kettle of fish with their own issues, but if we want to avoid getting sidetracked it would be best to keep the focus on monarchs only at this point IMO. Rosbif73 (talk) 07:38, 13 May 2024 (UTC)
I'm not sure that the proposal quite captures the nature of the opposition. In at least some of the RMs the dispute is whether NCROY is more, less or equally important than other guidelines (e.g. WP:PRECISE, WP:RECOGNISABLE, WP:PRIMARY, WP:COMMONNAME). I think a better question would be something like "Should articles about monarchs include a geographical element in the title when it would be unambiguous without it (e.g. "Oliver III" or "Oliver III of Montenegro")?" The possible answers to that should be "(almost) always", "sometimes" and "(almost) never". Including the "sometimes" option is important; further discussion would probably be needed (if that gets consensus) to determine whether there should be guidance (and if so what) about when to include and when not to include. Thryduulf (talk) 09:08, 13 May 2024 (UTC)
  • @User:BilledMammal Thank you for the initiative.
  • For pragmatic reasons, I'd also like to consider the possibility of a carve-out for British monarchs and/or 20th Century monarchs (or some other arbitrary date). We're searching for a norm that is workable for thousands of articles, across two thousand years of history, covering hundreds of countries, serving readers from all across the globe. The search for a workable norm should not be held hostage to nationalist squabbles of recent or local interest.
  • It should also consider the matter of variant spellings of names. Monarch names are usually translated in reliable sources, often in a variety of ways (Louis, Luis, Ludwig, Ludovico, Luigi, Lajos, Lodewijk etc. are all the same name). Wikipedia readers come from all sorts of backgrounds, using different sources with different spellings, and they should not have to guess which language or spelling Wikipedia editors happened to choose. The aforementioned "Oliver III" is the same name as "Olivier III".
  • Expanding the norm with full designation "king" should be considered. That is, to obtain "Oliver III, King of Montenegro", or "King Oliver III of Montenegro" or "King Oliver III" (which is how he is usually referred to in most RSs, e.g. Britannica, indexes of books, etc.). This would be consistent with how we treat non-numeral monarchs (e.g. "John, King of England"), and practically all nobility and peers articles ("Geoffrey II, Count of Anjou", "William Cavendish-Bentinck, 3rd Duke of Portland" etc.). I expect this would not find much support among minimalists. But maybe it should be an option to consider. It would be particularly useful for such unrecognizables as Nicholas II (Who? Pope? King? Duke? Rocket? Ship? Movie? Hedge fund?), who would be instantly recognizable as "Tsar Nicholas II".
  • On Sovereign vs. Nobility. Clarification needs to be made for non-sovereign German & Italian nobles, who some editors oddly think or treat as sovereigns, and try to apply WP:NCROY rather than WP:NCPEER (weirdly arguing that NCPEER only applies to British). And if they are treated as sovereigns, then does this apply also to great French nobles, Polish nobles, Danish nobles, etc.? If the norms are going to be different, then it needs to be clear who is sovereign and who is nobility.
  • On Big vs. Small Countries. This came up in RMs. If shortening is permitted, what, if any, safeguards will there be for small countries? Or shall big famous countries (Great Britain, France, etc.) be allowed to dominate the titling? In the previous norm, "Oliver III of France" is on equal footing with "Oliver III of Montenegro" - article titles are distinct, neither primary over the other. But with shortening, which is the WP:PRIMARYTOPIC for "Oliver III"? The current post-RFC says to keep country "when disambiguation needed", but nonetheless that was not respected. Once shortening was allowed there was an immediate spate of RMs to move the kings of large countries as primary at the expense of small countries (or other units - e.g. Tsars are apparently primary over Popes for some reason). France is a big country, with a large population, and a lot more history books written about it, whereas Montenegro is a small country with fewer works on it. So if allowed to shorten, then it is very easy for RMs to insist that the French king is "primary topic" for "Oliver III", and relegate the Montenegrin Oliver III. I don't think Wikipedia should be setting up a norm that reinforces the dominance or superiority of big countries over small countries. The insinuation is disturbing: "my king is more important than your king", "my country is large & important, yours is small & irrelevant" etc. This is not something Wikipedia guidelines should endorse, in an international encyclopedia, written for a global audience. It is not only distasteful in itself, it is also setting up a pig's breakfast that will feed a lot of horrific nationalist squabbles (France over Sweden, Great Britain over Georgia, Serbia over Montenegro, Spain over Portugal, Russia over Ukraine, etc.). So I'd like the proposal to contain CLEAR safeguards that prevent large country monarchs from dominating titling, and protect monarchs of small countries from being relegated to second tier status.
Walrasiad (talk) 10:32, 13 May 2024 (UTC)
If one of the contenders for a primary topic features much more prominently in the history books (or other reliable sources) than all the others, then it is normal for them to be considered primary. Sure, in many cases that will appear to favour "big countries", but there are also cases where the sources show a prominent monarch from a "small country" to be of greater long-term significance than their namesakes from larger countries. In all cases, primacy on Wikipedia is simply a reflection of primacy in sources, not any form of judgement on the relative importance of the countries. The only safeguards that are needed are already laid down in WP:PRIMARYTOPIC. Rosbif73 (talk) 13:21, 13 May 2024 (UTC)
Regarding translation / variant spellings of names, Wikipedia should simply reflect the preferred spelling in reliable English-language sources. It is perfectly normal, per WP:SMALLDIFFERENCES, for Maria I to be a different person than Mary I. It is also normal to find Philip V but Felipe VI, because the norms in English-language sources have changed over time. Any potential confusion can and should be cleared up via standard wiki mechanisms such as hatnotes and short descriptions. Rosbif73 (talk) 13:49, 13 May 2024 (UTC)
Strongly disagree with both your points.
(1) Deliberately introducing large country bias should not be acceptable in Wikipedia guidelines, for both moral and practical reasons. (WP:SYSTEMICBIAS). There are ways of avoiding it. I would hope for a better answer than that.
(2) You can title the article as per RSs. But you can't rely on "small differences" when the sources Wikipedia readers are coming from use many variations. Readers of Wikipedia can be expected to read & understand English. They should not be also expected to know Romanian, Danish, Portuguese, etc. and certainly not be demanded to guess the idiosyncratic tastes of Wikipedia editors. Walrasiad (talk) 14:16, 13 May 2024 (UTC)
You raise some points that will be worthwhile to consider - but we also need to keep in mind that the primary goal is to conclusively resolve whether editors who oppose the recent moves are correct that the November 2023 RfC did not reflect community consensus.
Because of that I want to keep the primary question of the RfC simple, so that this question can be clearly answered - if the primary question starts to deviate too far from the pre-RFC status quo then it will become unclear whether the community opposes recommending geographical disambiguation even when otherwise not required, or whether the community merely opposes the additional changes proposed in that RfC.
However, if we are going to run a widely advertised RfC then it may be appropriate to make the RfC a multi-part one, to take full advantage of that attention - are any of your points part of long-running disputes that it would be helpful to bring the broader community's attention to? BilledMammal (talk) 17:41, 13 May 2024 (UTC)
I think we should almost certainly phrase this in terms of an example. Using Louis XVI, the question would be:

In the absence of a need to disambiguate, how should we title the articles of monarchs?

  1. Louis XVI
  2. King Louis XVI
  3. Louis XVI of France
  4. King Louis XVI of France

Loki (talk) 13:34, 13 May 2024 (UTC)

I like this formulation because I find it very easy to understand. But whether the rule (whatever it may be) should apply to monarchs or nobles or both, and whether it should apply to all monarchs/nobles or just some, seem to be live issues? I also think the polling should be set up as ranked voting so people can express a preference for, e.g., 3/4 over 1/2 (include geography regardless of title), or 1/3 over 2/4 (exclude title regardless of geography), or 1>2/3>4 (less is better), or 4>2/3>1 (more is better), etc. Levivich (talk) 14:57, 13 May 2024 (UTC)
Having read WP:NCROY more closely, I would like to also add 5. Louis XVI, King of France as a possibility, to match what it currently recommends for monarchs with a title lower than king. Loki (talk) 21:19, 13 May 2024 (UTC)
I like this phrasing. I would adjust it slightly, though, to include both the general description and the example:

In the absence of a need to disambiguate, how should we title the articles of monarchs?

  1. Regnal name and nominals; eg Louis XVI
  2. Title, regnal name, and nominals; eg King Louis XVI
  3. Regnal name, nominals, and realm; eg Louis XVI of France
  4. Title, regnal name, nominals, and realm; eg King Louis XVI of France

If you support multiple, please rank your preferences. If the closer finds it necessary to resolve preferences, they will resolve them through the single transferable vote method.

This also includes Levivich's suggestion of ranked voting; I've also added a proposed method for resolving the preferences, as previous ranked !votes have resulted in disputes over the method used - by specifying it at the start we should be able to avoid that. BilledMammal (talk) 17:52, 13 May 2024 (UTC)
Should we really be envisaging a poll (single transferable vote or otherwise) rather than the usual assessment by the closer of policy-based arguments? WP:VOTE reminds us that the use of polls is often controversial and never binding. Rosbif73 (talk) 19:05, 13 May 2024 (UTC)
I agree that the only allusion to voting we should have is an instruction to rank choices. Saying the closer will resolve "votes" in any particular way mistakes how RFCs work. While they can feel similar to votes from the perspective of the participants, from the perspective of the closer they're very clearly not votes. Loki (talk) 02:29, 14 May 2024 (UTC)
While I understand the impulse here, in my view the actual effect of including the description is to add a bunch of extra jargon that's completely redundant with the examples. Loki (talk) 02:32, 14 May 2024 (UTC)
How about
  1. Louis XVI (regnal name and nominals)
  2. King Louis XVI (title, regnal name, and nominals)
  3. Louis XVI of France (regnal name, nominals, and realm)
  4. King Louis XVI of France (title, regnal name, nominals, and realm)
  5. Louis XVI, King of France (regnal name, nominals, title, and realm)
Levivich (talk) 02:47, 14 May 2024 (UTC)
I like that a lot better. I'm still not convinced the description is necessary, but I'd accept it. Loki (talk) 03:37, 14 May 2024 (UTC)
What about Louis XVI (king of France) (following standard practice, no special rules for monarchs) or Louis XVI (France) (how German WP does it)? Not all options would be possible for every king, since some names are ambiguous. We have to distinguish between the question of how to handle kings like Henry IV (which one?) and Louis XVI (no question of primary topic). Srnec (talk) 15:45, 14 May 2024 (UTC)
At this time, I have no comment on which of Loki's proposals may be the "best". However, after seeing the comment on German Wikipedia article titling, I do want to note that according to WP:CONSISTENT (emphasis mine): "The English Wikipedia is ... under no obligation to use consistent titles with other language versions of Wikipedia." AndrewPeterT (talk) (contribs) 15:54, 14 May 2024 (UTC)
The English Wikipedia is indeed under no obligation to be consistent with other language Wikipedias, however that is irrelevant to Srnec's comment. If another language Wikipedia solves a problem in a certain way, it is entirely reasonable to suggest including that way in a list of options for how to solve that same problem on the English Wikipedia. Thryduulf (talk) 21:28, 14 May 2024 (UTC)
  • One other point worth clarifying is the definition of "Europe" (assuming that the November 2023 change is preserved). There were some recent RMs over Georgian monarchs for which it was disputed whether Georgia was a European country. -- King of ♥ 17:41, 13 May 2024 (UTC)
    If we are going to include multiple questions I like the idea of focusing on ones that will apply regardless of the result of the primary question, such as the question you raise here. BilledMammal (talk) 17:53, 13 May 2024 (UTC)
    The introduction to NCROY sets out its scope as being European monarchs that share a set of given names, and tells us that elsewhere, territorial designations are usually unnecessary in article titles. Georgia is something of a special case: some of its monarchs share given names (Stephen, David, George, Michael, Alexander, Constantine, Simon) with western European monarchs, but most do not. The important point is the namestock, not the geographical location. Rosbif73 (talk) 19:34, 14 May 2024 (UTC)
    Indeed, it may be worth amending the guideline to explicitly apply to all names following the (name)(ordinal)[of territory] pattern, because Georgia is not the only non-European example of this. (E.g., Musa III [of Mali]). Compassionate727 (T·C) 16:16, 26 May 2024 (UTC)
    Yep. If the proposed RfC takes place and confirms the current status (i.e. use a territorial designation only if disambiguation is necessary) then we really ought to review the entire guideline. As it stands, the November RfC was implemented by making a minimal change to the wording, and some of the provisions in the introduction could do with being revised for consistency. Rosbif73 (talk) 06:30, 27 May 2024 (UTC)
BilledMammal, thank you for taking the time to start this conversation. Because my previous comments on this topic (namely in a previous November 2023 request for comment (RfC) on this same matter) have received negative feedback, I have no comment on the scope of a potential new RfC at this time. My only hope is that the community will finally be at a mutually agreeable place with WP:NCROY after this discussion concludes, however long it takes. AndrewPeterT (talk) (contribs) 18:43, 13 May 2024 (UTC)
That being said, should a notice at WP:AN to invite an uninvolved administrator to monitor and close this possible RfC be posted? This way, any problematic conduct can be immediately addressed. AndrewPeterT (talk) (contribs) 18:43, 13 May 2024 (UTC)
  • I would also note that European monarchs whose rank is below that of emperor or king (i.e. those of the tiny German states in the Holy Roman Empire) follow the different notational standard established in WP:NCROY#5 ([Name] [Ordinal if applicable], [Title] of [Primary holding]; ex. Maximilian I, Elector of Bavaria or Casimir, Margrave of Brandenburg-Kulmbach). So the latest proposed RfC question, "In the absence of a need to disambiguate, how should we title the articles of monarchs?", may more accurately be titled something along the lines of "[...] how should we title the articles of European imperial and royal monarchs?". Curbon7 (talk) 20:53, 13 May 2024 (UTC)
  • I think all of the above is skirting the real issue. The scope here is only those articles where the COMMONNAME for the subject does not include regional information ("of country") and is sufficiently PRECISE to distinguish from other uses (i.e., the COMMONNAME is either unique or this use is the PRIMARYTOPIC for the COMMONNAME). That is, if usage in RS demonstrates that including the regional information is the COMMONNAME, or the regional information is necessary for disambiguation because the COMMONNAME is not unique and this use is not primary for this COMMONNAME, then there is no issue about including the regional information. So the RfC question should reflect exactly that:

When the COMMONNAME for a sovereign or royalty subject does not include "of country" regional information, and the COMMONNAME is unique, or the subject is primary for its COMMONNAME, should we ever include "of country" information in the title? If so, when and why?

Of course, people can also disagree about what the COMMONNAME is based on usage in RS, but in those cases CONCISION is an excellent and convenient tie-breaker, favoring leaving off the regional information. Similarly, there can be disagreement about whether a given use is primary for the COMMONNAME in question, but that's no different than for any other article with an ambiguous COMMONNAME and there is debate about whether the topic is primary, and is not a problem unique to NCROY. So we don't need to be concerned with addressing those cases in this guideline.
--В²C 03:41, 17 May 2024 (UTC)
As I asked elsewhere, can you provide any actual examples of sovereigns whose regnal name would be unambiguous without a territorial designation but whose COMMONNAME unequivocally includes one? If not, then this issue is moot and should be kept out of the guideline to avoid future contention. Rosbif73 (talk) 06:41, 17 May 2024 (UTC)
I wanted to account for the possibility for such cases. But if you want to assume there are none, fine:

When the COMMONNAME for a sovereign or royalty subject is unique, or the subject is primary for its COMMONNAME, should we ever include "of country" information in the title? If so, when and why?

В²C 23:05, 17 May 2024 (UTC)
  • I think that the missing piece of context is: A lot (most? nearly all?) of the individual RMs since that November RFC have come to the opposite conclusion. It's bad form to have a rule saying the opposite of what the community wants to do. WhatamIdoing (talk) 23:16, 17 May 2024 (UTC)
    The majority of the decisions have been consistent with NCROY since it was revised. —В²C 12:49, 24 May 2024 (UTC)
    That's obscuring that many of the decisions have been highly controversial and there have been comments that support moves to match NCROY with the sole purpose of matching NCROY rather than any opinion about whether that is best - indeed I've seen at least one comment that supported a move to match NCROY despite the editor thinking that was an inferior title. Thryduulf (talk) 16:28, 24 May 2024 (UTC)
    That may be, but presumably most editors who supported "per NCROY" like the guideline's prescription. Those who dislike it have been vocal enough. Compassionate727 (T·C) 22:24, 25 May 2024 (UTC)
    Maybe, maybe not. We have insufficient evidence to say one way or the other. Thryduulf (talk) 00:23, 26 May 2024 (UTC)

idea for a new resource; "Town Hall"

I would like to suggest a new type of resource here for discussions. It would be called a "Town Hall." The purpose would be a place where wikipedians for various communities could have group discussions, centered on a particular field, i.e. based on topic.

  • One main idea is that there would be various town halls centered around groups of active WikiProjects, based on their general topical category.
    • Would also include any other groups of wikipedians who edit various topics, as a group.
    • E.g. one town hall would be for all wikipedians who work on wikiprojects focused on science, another for all those based around history, and another for all wikiprojects centered on specific places such as cities, countries, etc.
  • a wikiproject would add the relevant Town Hall to their own page, by adding a new tab to their own tab header, where that Town Hall would be transcluded.

Ok so what do you think of that? Open to feedback. Thanks! Sm8900 (talk) 15:17, 23 May 2024 (UTC)

Comments on "Town Hall" idea

I like the idea. Would it be for Wikipedia-related discussion or topic-focused discussion? Cocobb8 (💬 talk • ✏️ contribs) 15:29, 23 May 2024 (UTC)
it would be mainly topic-focused discussion, but it would be as it relates to new efforts to add new entries to wikipedia, or refine existing entries, but the discussion itself would have a very broad scope and latitude, within the topic itself of course. and thanks for your reply! Sm8900 (talk) 15:31, 23 May 2024 (UTC)
I think this is a great proposition because right now most discussion places have to be Wikipedia-related. Cocobb8 (💬 talk • ✏️ contribs) 15:37, 23 May 2024 (UTC)
We already have larger and more encompassing wikiprojects like Science, and many projects have task forces. I don't see the need, which will also require some infrastructure in guidelines. Aaron Liu (talk) 15:39, 23 May 2024 (UTC)
I think the point that Sm8900 is making is that it would allow for discussion on the topic to happen, such that it wouldn't need to be related to specific articles to improve. Cocobb8 (💬 talk • ✏️ contribs) 15:41, 23 May 2024 (UTC)
I don't think any stretch beyond reference desks would be accepted, due to WP:NOTFORUM. Aaron Liu (talk) 15:43, 23 May 2024 (UTC)
Hmm good point. Cocobb8 (💬 talk • ✏️ contribs) 15:44, 23 May 2024 (UTC)

ok. any other comments? Sm8900 (talk) 16:30, 30 May 2024 (UTC)

Guestbook/New Editors' Corner?

My proposal is to create a simple page in which new editors can say hi and other editors can greet them. It will introduce newcomers to our community and the concept of editing and communicating. I think being greeted and welcomed would create a warmer atmosphere and aid in editor retention. Korean Wikipedia has a space identical to my proposal. Do you see any potential issues? What should the pagename be? From where should it be linked? Ca talk to me! 14:28, 29 May 2024 (UTC)

Are you familiar with the Wikipedia:Tea Room? I'm not active there but my first impression is that covers much of what you propose. Thryduulf (talk) 15:20, 29 May 2024 (UTC)
Regarding the place where to put a link to any greeting space, each newcomer has access to Special:Homepage as their first step while onboarding. Trizek_(WMF) (talk) 15:24, 29 May 2024 (UTC)
WP:TEAHOUSE is more for answering questions from new editors. What I am thinking of is a space for new editors to introduce themselves or just say hi. Ca talk to me! 15:25, 29 May 2024 (UTC)
I totally support this idea. Sm8900 (talk) 16:29, 30 May 2024 (UTC)
I would also support this. Cocobb8 (💬 talk • ✏️ contribs) 20:01, 31 May 2024 (UTC)
The best way for new editors to introduce themselves is to start editing. They can introduce themselves on their user pages, in case anyone cares. EEng 20:49, 31 May 2024 (UTC)

Better pipeline for anonymous AFD nominations

Currently, if an anonymous editor wants to nominate an article for deletion, they have to make a request at WT:AFD. Sometimes it gets done promptly, sometimes slowly, and sometimes not at all. This also tends to flood this page with AFD requests instead of general talk about the main AFD page or specific AFD-related issues.

It might be good to have a more streamlined approach for this situation. Maybe a template that could be placed on the article's talk page that would place open requests onto a tracking page, like how edit requests are handled? Maybe just a dedicated subpage with a clear "yes", "no", or "more info needed" response to requests?

Thoughts? Suggestions? 35.139.154.158 (talk) 17:02, 2 June 2024 (UTC)

It sounds like you're asking for another page that would still work the same way: users post their nominations somewhere, and logged-in editors use that as a backlog. I don't see how either of your suggestions would make that more efficient. Toughpigs (talk) 17:24, 2 June 2024 (UTC)
I'm not sure "efficiency" is the full goal; if you've watchlisted the page interested in policy changes in the text of the page or development of new procedures but have no interest in helping with anonymous deletion requests (and I have zero idea how many people that describes), you're getting a lot of unhelpful notifications. Meanwhile, discussion topics are archived quicker than they might be otherwise because there is so much more traffic (for example, this attempt to suggest a new notification related to AfDs was on the talk page for less than a month). So the current situation is not ideal, but if we shuffled it off to its own tracking board, I'm not sure how many people would move over there to actually answer requests (he says, looking at the months of wait for an AfC). -- Nat Gertler (talk) 18:53, 2 June 2024 (UTC)
I agree with Toughpigs. Plus, if you can't be bothered to read to find out how to do it in a way that even if you mess up someone'll notice it, I doubt that you have read deletion policies. Aaron Liu (talk) 17:41, 2 June 2024 (UTC)


Talk page archives and RfCs

Hey, don't do that, you should know there was an important RfC 6 years ago on talk page archive page #7 of #14. The only one who remembers has retired from Wikipedia. There should be a way for perennially important consensus results to be permanently and easily visible on the main talk page. Easily accessible not just for those with a long memory or time to doom scroll talk archives. -- GreenC 18:08, 27 May 2024 (UTC)

There was very recently a discussion (archived at Wikipedia:Administrators' noticeboard/Archive362) about these "/Consensus" subpages listing key points, often decided in previous RfCs. It seems to be the closest to what you are looking for, if you wish to take a look at that discussion. Chaotıċ Enby (talk · contribs) 18:14, 27 May 2024 (UTC)
Apparently there are only 7 of these, and they are controversial. I was thinking like the mechanism to notify of prior AfDs. This could include prior RfCs and prior RMs. That's all, nothing proscribing consensus, only notifying the location of formal consensus discussions, and maybe a brief summary of the result. -- GreenC 18:27, 27 May 2024 (UTC)
That could be a great improvement on that old "Current consensus" system indeed! Chaotıċ Enby (talk · contribs) 18:37, 27 May 2024 (UTC)
Templates already existing:
  1. {{Old XfD multi}} for AfD and related deletion discussions.
  2. {{Old moves}} for RMs.
Templates that might be useful:
  1. {{Discussion interlink}} for RfC notification on top of talk page, using the |T=yes option. See also Category:Wikipedia requests for comment templates
I added a discussion interlink to the top of Talk:Elizabeth_Holmes, it's not very good or clear. This is not what the template was designed for. -- GreenC 19:21, 27 May 2024 (UTC)
FAQ, a similar system, has 574 results. Aaron Liu (talk) 16:24, 28 May 2024 (UTC)
User:Aaron Liu, FAQ could work. Although it looks easy to abuse: gaslighting non-existent or weak consensus. Looking at Talk:Shin_Bet/FAQ, consensus is described, but there is no link to a consensus discussion(!) Checking the talk archives, there was never an RM for this question. Only a few talk page posts, but no clear consensus discussion. Recently I had 4 users tell me, forcibly, that there was established consensus, end of story. I disagreed, started an extremely neutral RfC (black and white question this or that), they got so angry, they almost took me to ANI for "abusing the consensus process". After 30 days the RfC closed and my position prevailed, they lost. I guarantee you they would have used FAQ and "current consensus" to continue gaslighting their non-existent consensus. -- GreenC 18:10, 28 May 2024 (UTC)
The same thing can happen to any way of "pinning" consensus. Such gaslighting is very disruptive editing, and I believe the benefits outweigh the risks here. Aaron Liu (talk) 01:04, 29 May 2024 (UTC)
Also, there was, in fact, a requested move. I'll edit that FAQ entry to include relevant information. Aaron Liu (talk) 01:05, 29 May 2024 (UTC)
I don't see how notifying users of previous RfCs is the same. There is no interpretation of results, like you see in Talk:Shin_Bet/FAQ. They are qualitatively different things. -- GreenC 21:57, 29 May 2024 (UTC)
I don't see why one must include the entire RfC to inform on prior consensus. If something's misleading, overwrite it. Aaron Liu (talk) 22:28, 29 May 2024 (UTC)
Hmm.. check Talk:Jack_Schlossberg. Notice the list of previous AfD's listed at the top. It is generated by {{Old XfD multi}}. This is all, except for RfCs. It's a way for readers to find old consensus discussions in the archives. It's completely neutral and informative and in-line with other similar systems for AfDs and RMs. -- GreenC 00:18, 30 May 2024 (UTC)
I don't see how omitting a summary of rationale makes things more neutral or helpful. Aaron Liu (talk) 00:40, 30 May 2024 (UTC)
It's helpful because it removes the ability to gaslight users with fake consensus results like we see with the broken FAQ system at Talk:Shin Bet/FAQ. The proposed template {{Old RfCs}} would display the question and a link to the actual RfC result, so users can explore it directly, for themselves. Nobody should be describing the consensus, it's unnecessary and easily prejudiced. -- GreenC 05:23, 30 May 2024 (UTC)
As if you can’t just link and claim consensus or change the bolded AfD/RfC result to something you’d like. Disrupters gotta disrupt, and only blocks and sanctions may stop them. Aaron Liu (talk) 10:58, 30 May 2024 (UTC)
Also, results like userfy, keep and delete have clear meanings across all AfD contexts while these and Option 4 don’t. Aaron Liu (talk) 11:00, 30 May 2024 (UTC)
Now you're at the level of vandals, and obviously that is true for the entire site on every level with everything so we might as well shut Wikipedia down because it's hopeless. But there is a qualitative difference between a system like FAQ and {{Old RfCs}}, if you can't see it, I don't know what else to say. -- GreenC 14:41, 30 May 2024 (UTC)
I am indeed saying that an {{old AfD}} analogue does not provide any more protection. Your premise is that anyone can edit the FAQ to be absolutely wrong and mine is that anyone can edit the results to be absolutely wrong. Aaron Liu (talk) 15:02, 30 May 2024 (UTC)
Talk:Elizabeth_Holmes#RfC_list_(pinned) works fine, until we get a template like {{old RfC}}. You will notice it does one thing, and one thing only: link to previous RfCs. Thus, when someone uses the word "fraudster" in the article, one can simply revert with the comment "see talk page", and the person will know where to find the discussion. All this stuff about rephrasing the consensus results, indeed even creating consensus results, is nuts; it will be a continuous source of trouble. -- GreenC 02:52, 3 June 2024 (UTC)
I think the /Consensus pages interpreted as binding policy exist on somewhat questionable procedural grounds, but the /FAQ pages are a very good idea, and ought to be used wherever appropriate, jp×g🗯️ 21:37, 30 May 2024 (UTC)
What I've done is create a frequently asked questions page, and use {{FAQ}} to include it on the talk page. isaacl (talk) 21:29, 27 May 2024 (UTC)
"there are only 7 of these" - So far. Best way to prevent wide adoption of something: Avoid it because it isn't yet widely adopted. "and they are controversial" - Show me something powerful and highly visible that isn't controversial. No good deed goes unpunished. "that old 'Current consensus' system" - read: time-tested.
Create something that functionally overlaps consensus lists (the lists cover RfCs and other consensuses), solve a nonexistent problem, I don't care, but I'm not letting go of my consensus list. And don't ask me to help maintain your new thing in parallel with it. ―Mandruss  08:45, 28 May 2024 (UTC)
Well... actually I do care, per every new thing should survive a rigorous application of the question: Is it really worth the added complexity? A principle that has been ignored far too long, resulting in the massive over-complexity that we have to live with every day (never mind the barrier to entry and the several-years-long learning curve). We have a strong tendency to focus on benefit and fail to fairly weigh it against cost. Lacking a concerted community effort to simplify the existing environment (good luck with that), this can only continue to get worse with time. ―Mandruss  02:04, 29 May 2024 (UTC)
By the by, RMs are nothing but a special kind of consensus, so there's no reason they couldn't be included in consensus lists.
9. Article title is: Reproductive habits of Wikipedia admins. (RM July 2024)
Fits quite nicely, actually. ―Mandruss  00:42, 30 May 2024 (UTC)
Aaron Liu (talk) 00:44, 30 May 2024 (UTC)

Why do Wikipedia articles have no DOIs?

I was recently asked and I am not sure. Why don't we have them? Wikipedia:Digital Object Identifier does not answer this. (WikiJournals articles have them, but also many articles in academic encyclopedias do have them, ex. https://onlinelibrary.wiley.com/doi/full/10.1002/9781405165518.wbeos0736.pub2). Should we have DOIs for our articles? Piotr Konieczny aka Prokonsul Piotrus| reply here 06:04, 31 May 2024 (UTC)

Huh, I had a vague impression that a DOI was something higher and mightier. But is DOI stuff supposed to be as changeable as a WP-article? Gråbergs Gråa Sång (talk) 11:59, 31 May 2024 (UTC)
@Gråbergs Gråa Sång Good question. Need more research, but this forum post suggests it should be fine. Better citation needed, sure :P Piotr Konieczny aka Prokonsul Piotrus| reply here 12:09, 31 May 2024 (UTC)
It costs money and I'm not sure it provides a great deal of benefit. Barnards.tar.gz (talk) 12:30, 31 May 2024 (UTC)
It would provide a magnitude more credibility to middle school papers. Main Author et al. (2024). "Village pump (idea lab)". A Wikimedia Project. doi:10.1234/enwiki.26740553. CMD (talk) 12:57, 31 May 2024 (UTC)
In that case, Veto. Gråbergs Gråa Sång (talk) 13:06, 31 May 2024 (UTC)
We have RevIDs. Our articles get merged, deleted, revised, etc... which goes against the purpose of DOIs: to link to stable versions. Headbomb {t · c · p · b} 03:04, 2 June 2024 (UTC)

I think it would make sense to have DOIs that resolve to Special:Permalink/rev_id, so if someone has reason to reference a Wikipedia article, they would at least perforce be providing a link to the version they're quoting, but I'm not sure how DOIs interact with CC-BY-SA 4.x licensing. Folly Mox (talk) 12:26, 31 May 2024 (UTC)

The goal of a DOI is that you can always find the link, even if the periodical gets sold or the website gets rearranged. We don't really have that problem. A link to a revid will always find the page.
(I could imagine someone doing this for the specific revid at FA promotions. It would cost about US$2,000 in fees, and much more [the time needed to find/check each relevant revision].) WhatamIdoing (talk) 19:27, 31 May 2024 (UTC)
As in US$2,000 per article/instance? Gråbergs Gråa Sång (talk) 19:33, 31 May 2024 (UTC)
No. As in US$2,000 to get all of the existing FAs and FFAs listed, at an average cost of about twenty-five cents each. WhatamIdoing (talk) 21:59, 1 June 2024 (UTC)
Just noting that a, WMF has a ton of surplus funds (enough to give six digits to other NGOs doing stuff of no relevance to the community, see User:BilledMammal/2023 Wikimedia RfC) and b, according to this, "it is possible to get DOI without... paying a fee". I think getting DOIs for FAs, and possibly GA+ (i.e. articles that have passed semi-reasonable review process) would be a good idea. Piotr Konieczny aka Prokonsul Piotrus| reply here 02:50, 2 June 2024 (UTC)
Zenodo’s scope is “research outputs” so it’s debatable whether encyclopaedia articles would qualify. But even if it were a completely free and open system, I’m not seeing a problem that it would solve. Barnards.tar.gz (talk) 08:05, 2 June 2024 (UTC)
DOIs make a source look more reliable/respectable. Making Wikipedia more reliable/respectable is good, no? Piotr Konieczny aka Prokonsul Piotrus| reply here 11:52, 2 June 2024 (UTC)
Making Wikipedia more reliable/respectable is definitely a good thing. Making it look more reliable/respectable... not so much. Barnards.tar.gz (talk) 15:27, 2 June 2024 (UTC)
Yeah, that. WP:General disclaimer etc. Though pretty good in parts, we're a user generated wiki, to a significant extent open to any article-change whatsoever. Though I might enjoy it if a WP-text I mostly wrote has a DOI. What do you think, @Jenhawk777? Gråbergs Gråa Sång (talk) 17:34, 2 June 2024 (UTC)
I have no real opinion on this one way or the other. Jenhawk777 (talk) 23:00, 2 June 2024 (UTC)
Imagine the co-author list. I’ll change my mind if this proposal allows me to lower my Erdős number. Barnards.tar.gz (talk) 21:27, 3 June 2024 (UTC)
Exactly. Our links are stable. There's no need for DOIs. Case closed. Jason Quinn (talk) 20:36, 31 May 2024 (UTC)
The DOI database isn't freely available, is it?©Geni (talk) 23:45, 31 May 2024 (UTC)
Yes, it is. However, getting new DOI numbers is rather expensive, especially for a free project like us. NW1223<Howl at meMy hunts> 20:54, 1 June 2024 (UTC)
The expensive part is getting someone to do the work. WhatamIdoing (talk) 22:00, 1 June 2024 (UTC)
Specifically, someone would need to spend hours figuring out which version of the 8,000 FAs and FFAs is the correct link. For example, Schizophrenia is an FA. Do we use the original 2003 promotion, or the 2011 FAR keep?
Once we had the list of versions in hand, someone would need to spend hours figuring out which metadata to report for that version.
Some of this is easy (titles, posted_date, acceptance_date [the same], institution, license, funding [i.e., none]). I don't know whether citation_list is desirable. I'm not sure what component_list is (Table of Contents? Or is that for abstract/tables/figures/appendices?).
The entry for contributors is possible, but it would take some doing. Clicking around with Wikipedia:Who Wrote That? tells me that Vaughan wrote 84.4% of the 2004 version, 213.253.39.xxx 4.1%, and Ram-Man 3.4%. For the 2011 version, Doc James wrote 19.7%, Vaughan 13%, Casliber 6%, EverSince 5.3%, SandyGeorgia 4.3%, and a whole lot of people wrote tiny amounts (this is not systematic, so I could have missed important contributors).
The fees themselves are trivial. The costs are all in human time. WhatamIdoing (talk) 19:31, 2 June 2024 (UTC)
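As a quick sanity check on the figures quoted in this thread (about 8,000 FAs and FFAs at roughly twenty-five cents each), the arithmetic can be sketched as follows. The names are illustrative, and the per-DOI fee is the approximate average mentioned above, not a quoted price.

```python
from decimal import Decimal

# Figures taken from the discussion; both are rough estimates.
ARTICLE_COUNT = 8000               # FAs plus FFAs
AVERAGE_FEE_USD = Decimal("0.25")  # "about twenty-five cents each"

def registration_fees(articles=ARTICLE_COUNT, fee=AVERAGE_FEE_USD):
    """Total registration fees in US dollars, excluding any human time."""
    return articles * fee
```

This reproduces the US$2,000 total and leaves out the volunteer hours, which the comment above identifies as the real cost.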
Where can I get a complete database DOI dump from?©Geni (talk) 00:36, 2 June 2024 (UTC)

Executive summary: There are zillions of instances of "He then ate a Pork chop, and sang..." when it should be "He then ate a pork chop, and sang..." and it's slurvy and let's think if it's possible to do anything about it. I got an idea for a bot.

Detailed exposition:

So, there are very many instances where links are capitalized, or the first word of a multi-word link is capitalized, when they shouldn't be. For example, a passage that should read "...he worked in legitimate theatre after that..." instead reads "...he worked in Legitimate theatre after that..." (emphasis added). So wrong. I'm sure it jars other readers than me. Very very common.

It's little better than if our material was peppered with constructions like "...there were twenty People on deck...". It looks slurvy and it is slurvy.

I'm confident that this mostly happens because the editor is copying from the title of a page and pasting it into running text and moving on. There's no way to stop that I guess.

I can imagine a fix for this -- a program, a bot. Writing code for that would be way above my pay grade, but the process I imagine would be something like this (if this is laughably wrong or simplistic, OK, but you can see where I'm going):

  • Go thru an article checking each link
    • If it's the beginning of a sentence (preceded by a period and space(s) or a line break or whatever), skip it. Else
    • If it's piped, skip it. Else
    • If it's just one word, skip it (for now). Else
    • If the first word is capitalized and the others not, you've got a potential positive.

So then

  • Go to the article that the link points to.
    • If it's a redirect, go home (for now). Else
    • Check for each instance of the article-title string in the text. And
    • If it's not found (somewhat uncommon), go home. Else
    • If it's at the beginning of a sentence, skip it. Else
    • If the first word (perforce midsentence) is capitalized, go home. Else
    • Check the next instance (if any). And when you reach the end
    • Since you've gotten to the end without being sent home (non-zero instances of midsentence use, and they all have the first word uncapitalized), you've got a likely positive.

So uncapitalize the first word in the link in the original article.
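The steps above can be sketched roughly as follows, in read-only form (it only proposes changes). This is a minimal sketch under loud assumptions: the function names are made up, sentence starts are detected very naively, and `get_target_text` is a hypothetical callback that returns the plain text of the linked article, or None for redirects and missing pages.

```python
import re

# [[Title]] or [[Title|label]]; group 2 is set only for piped links.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")

def is_sentence_start(text, pos):
    """Naive check: nothing before pos, or the previous non-space
    character is terminating punctuation or a line break."""
    before = text[:pos].rstrip(" ")
    return before == "" or before[-1] in ".!?\n"

def candidate_links(wikitext):
    """Yield titles of unpiped, multi-word, mid-sentence links whose
    first letter is the only capital letter in the title."""
    for m in WIKILINK.finditer(wikitext):
        title, pipe = m.group(1), m.group(2)
        if pipe is not None or len(title.split()) < 2:
            continue
        if is_sentence_start(wikitext, m.start()):
            continue
        if title[0].isupper() and not any(c.isupper() for c in title[1:]):
            yield title

def target_uses_lowercase(title, target_text):
    """True only if the target has at least one mid-sentence use of the
    title and every such use starts with a lowercase letter."""
    first = title[0]
    pattern = re.compile("[" + first.upper() + first.lower() + "]"
                         + re.escape(title[1:]))
    midsentence_uses = 0
    for m in pattern.finditer(target_text):
        if is_sentence_start(target_text, m.start()):
            continue
        if m.group(0)[0].isupper():
            return False  # "go home": a capitalized mid-sentence use
        midsentence_uses += 1
    return midsentence_uses > 0

def suggest_fixes(wikitext, get_target_text):
    """Return (old_title, proposed_title) pairs without editing anything."""
    fixes = []
    for title in candidate_links(wikitext):
        target = get_target_text(title)
        if target is None:  # redirect or missing page: skip for now
            continue
        if target_uses_lowercase(title, target):
            fixes.append((title, title[0].lower() + title[1:]))
    return fixes
```

Passing a plain dictionary's `get` method as `get_target_text` is enough to try this on sample text. A real run would need proper wikitext parsing and far better sentence-boundary detection.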

I recognize that this would be checking a lot of links -- into the nine figures I'd guess. Not a billion I hope. I have no idea if that'd be a deal-killer. If it is, well just slap my butt and put me to bed and forget it I guess. Just an idea.

Still here? OK. Now... of course you're going to get some false positives. I can't think of any, but there must be some. In which case maybe the devs could do code magic. They're smart. If not, maybe that's a deal killer.

Altho, for every "Should be this" --> "should be this" error you're going to have hundreds of "Pork chop" --> "pork chop" corrections I would think. A net positive, at least by the numbers, altho "we're not going to have a bot that we know is going to generate some non-zero number of errors, period" would probably be the response. Anyway, I got this Off my chest. Herostratus (talk) 06:44, 28 May 2024 (UTC)

This idea sounds at least interesting enough to try it with a read-only bot that proposes a list of changes without making them. The main false positives might occur where the link is already incorrect, e.g. Ash Vale is near Ash where the second link should have been piped to Ash, Surrey but is still correctly capitalised. It may also be led astray by the occasional lead such as Cairns is named after some rock cairns. (it isn't; I made that up) but hopefully not too many. Certes (talk) 16:38, 28 May 2024 (UTC)
In the latter case, I think the title "Cairns" will likely also be found in later places in the article. From what I understand, it looks at whether there's at least one capitalized mid-sentence instance, not whether all mid-sentence instances are capitalized. Chaotic Enby (talk · contribs) 17:21, 28 May 2024 (UTC)
Fair enough. We can probably also exclude targets with {{Infobox settlement}} (or {{Infobox person}}, if k.d. lang and will.i.am aren't looking). Certes (talk) 17:31, 28 May 2024 (UTC)
Probably not a terrible idea to exclude targets with a taxobox or automatic taxobox (or the various other related templates like {{speciesbox}}), as well, since the binomial naming system requires initial-capitals on anything genus and above, as well as the generic part of a species' binomial name, subspecies' trinomial name, etc. AddWittyNameHere 08:12, 2 June 2024 (UTC)
Here are some other interesting cases that could make things easier or harder:
  • Acronyms. I would expect nearly any link beginning with two or more caps is intentionally that way, so the target probably would not need to be checked. Except "IPhone" is not correct and "iPhone" is not a unique situation.
  • Multi-word phrases. Building on the acronyms case, if there is any subsequent capital letter, maybe assume the first letter should also be capitalized? That immediately accepts "City, State" and a ton of proper nouns without having to parse any target pages.
  • {{lowercase title}}. Although this template is "only" used 24k times, checking for its presence (which should be even prior to the first sentence of content) catches 24k targets that are known with absolute certainty and need no further parsing or heuristics. It solves the iPhone case.
  • Prefixes. Lots of technical terms have tons of prefixes (non-English characters, numerals, English-spellings of Greek letters) that should be ignored, prior to the first English character that is actually the one that follows sentence-case. Therefore, they also are subject to the "copy-paste the article title" mistake. And many of those articles' pagenames make it even harder to automate. Example: the page Gamma-Hydroxybutyric acid is titled "gamma-Hydroxybutyric acid" but "gamma-hydroxybutyric acid" is how it should be used mid-sentence. But L-Glucose is "L-glucose" mid-sentence (and the "L" is set smallcaps) and 1-Bromobutane is "1-bromobutane" mid-sentence. Many of these also use the various title-formatting templates, so that could be a short-circuit to flag ones that really need human eyes.
  • All special titles. It would be useful if articles could declare their own non-obvious sentence-case styling. If (just thought about this as a seed for a different task) articles did that, we could always just obey them, and that includes forbidden special characters, italics, smallcaps, and other fun things.
I definitely don't think we need a perfect solution, but I'd want to be as conservative as possible (minimize false-positives) to make this potentially large task that affects many pages be more acceptable when watchlists start blowing up. DMacks (talk) 17:29, 28 May 2024 (UTC)
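A conservative triage along the lines of the cases listed above might look like the sketch below. The category names, the order of the checks, and the `lowercase_titles` set (standing in for pages that carry {{lowercase title}}) are all assumptions made for illustration.

```python
def triage_link(title, lowercase_titles=frozenset()):
    """Sort a link title into one of four illustrative buckets
    without fetching or parsing the target page."""
    if not title:
        raise ValueError("empty title")
    if title in lowercase_titles:
        return "known-lowercase"   # target declares {{lowercase title}}
    if len(title) >= 2 and title[0].isupper() and title[1].isupper():
        return "keep"              # probably an acronym
    first_word = title.split()[0]
    if not title.isascii() or not title[0].isalpha() or "-" in first_word:
        return "human-review"      # prefixes, numerals, special characters
    if any(ch.isupper() for ch in title[1:]):
        return "keep"              # proper noun or "City, State"
    return "check-target"          # only now look at the target article
```

With these rules "Pork chop" is the only kind of title that goes on to the expensive target-page check, while titles such as "L-Glucose" or "1-Bromobutane" are set aside for human eyes.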
I think the biggest problem is going to be figuring out whether it's the start of the sentence. Some are easy enough: Pork Chop, otherwise known as Floyd Womack, ate a pork chop but others are really complicated: My shopping list includes 1. pork chop 2. flour 3. eggs 4. mushrooms. Detecting these things programmatically is very difficult in English.
@Herostratus, I suspect that most of these are being added in the visual editor. In 2013 or 2014, the Editing team's PM (a long-time Wikipedia admin) had to decide whether to default to incorrect capitalization in Pork chop or in porkchop Cash. Getting the capitalization right for BLPs won.
It's possible that what we should do is consider an inline tag that says [capitalization?]. It could be bot-applied under limited circumstances (e.g., less experienced editor, using the visual editor, article hasn't been edited for >30 minutes [to avoid an edit conflict]). Checking and fixing them could be an easy task for the Wikipedia:Newcomer homepage. WhatamIdoing (talk) 22:43, 1 June 2024 (UTC)
I'd rather have it default that any link after terminating punctuation will be ignored. Worst case is a false negative and nothing is corrected. Aaron Liu (talk) 23:20, 1 June 2024 (UTC)
False negatives aren't a big problem. We should accept that the solution can't be 100% automated and ensure that there are a negligible number of false positives which might bring calls to cancel the whole task. Certes (talk) 10:40, 2 June 2024 (UTC)

Wow thanks for the interest! OK, so first, yes, of course, test rollouts first. Second, to address some of the false positive issues-- many I think, tho far from all I'm sure, can be avoided by being as conservative as reasonably possible; some of the objections raised would not go thru cos the links are more than one word or it gets caught in one of the other nets. Yes links that start a sentence should not be checked, and also let's say anything with a character that is not in the Latin alphabet would be skipped (links with parenthesized disambiguation would have been thrown out anyway as these are very often piped except on disambig pages), and any links with a word other than the first capitalized, as suggested.

Zimmerman telegram... you'd have soooo many potential positives like that, but is that article going to have either no midsentence uses of "Zimmerman telegram", or, if any, all of them would be lowercase "zimmerman telegram"? Not many... but you are going to have an article with a link to say "Liederman effect" capitalized like that (which is the correct capitalization), but the person(s) who edited the Liederman effect article prefer "liederman effect" on grounds of parallelism with "low-voltage effect" etc. (or whatever), and use the title term once (or more) midsentence and always in lowercase. You'd think with AI there'd be a way to figure out some solution to this, I don't know. Herostratus (talk) 22:01, 2 June 2024 (UTC)

Please don't use an AI, unless you want an external link to the Telegram account of someone called Zimmerman. Certes (talk) 22:19, 2 June 2024 (UTC)
I think that before attempting to code anything, it would be worth deciding what level of (active) errors we could tolerate from the bot/script/whatever. Does it need 95% accuracy? That might be acceptable if it's only tagging, but downcasing the wrong thing in one out of every 20 edits will not be accepted for a bot. WhatamIdoing (talk) 00:02, 3 June 2024 (UTC)
99.9+% I would say, if you're talking about corrections vs erroneous changes. Very very few changes of links from "This should be capitalized" changed to "this should be capitalized" would be acceptable I think. People watching an article where this is done will be plenty mad (also the bot will keep doing it I suppose). I think this'd be achievable by adding code to handle exceptions causing errors found during debugging? I haven't yet seen any of you come up with an example of an error (once you throw out links with non-Latin characters as suggested above).
I don't know about AI. I do know that I can open a free app and ask it for a picture of Susan Hayward in a bikini (don't crush shame) and boy howdy if it doesn't do it in a couple minutes. It's smart. Maybe it can figure out stuff like that.
Absolutely run paper tests first, I assume that that's done with all bots. Maybe instead of a fix we just send a message to users on their talkpage as suggested above: "You capitalized the text for such-and-such link, we think that might be wrong, are you sure you intended that?" might be as far as we're willing to go. This is done for links to disambiguation pages and it's almost always correct. This might well be the most that is acceptable to the community (and that'd be after we'd shown like 99.9+% accuracy). This means that existing wrongly capitalized links would not be changed, which is a big shame IMO, but that's life.
So... I'm actually seeing possible interest in this idea (by a few people, granted). Would it be reasonable to go on to a next step? Would that be to move it to the Proposals section of the Village Pump? Herostratus (talk) 21:51, 3 June 2024 (UTC)
AI uses a ton of resources for a task for which I have yet to see a false positive.
So, bots usually test themselves on test.wikipedia first and then go through a trial run of a certain amount of edits. Aaron Liu (talk) 21:57, 3 June 2024 (UTC)
This sounds like a task for human supervised editing, like many typo fixing projects or disambiguation runs. I do not think it is suitable for bots. —Kusma (talk) 22:17, 3 June 2024 (UTC)
Initially, yes. If we find that 99.9+% of its suggested edits are good then we can consider automating later. Certes (talk) 23:30, 3 June 2024 (UTC)

Category:Wikipedians against censorship

I was going to create Category:Wikipedians against censorship but saw that it was removed during a long discussion in 2007. I think the members of this project need to be placed in a separate category. This category can be connected to Template:User against censorship so that, on user pages, "Wikipedians against censorship" is shown instead of "WikiProject Wikipedians against censorship participants". Same as "Wikipedian WikiFairies" or "Disabled Wikipedians".

In my opinion, the same idea should be implemented for "Biography Wikipedians" and "Human rights Wikipedians". Claggy (talk) 04:39, 3 June 2024 (UTC)

The OP has been glocked. Liz Read! Talk! 02:56, 4 June 2024 (UTC)
That’s a great abbreviation Dronebogus (talk) 09:33, 4 June 2024 (UTC)

Having WMF pay for IRCCloud for all users who have a Wikimedia cloak

Before officially posting this to the WMF section of Village Pump, I'd first like to gather some feedback on this. What I would like to propose is that all users who have a Wikimedia cloak be given a paid subscription to IRCCloud by the WMF.

  • Why IRCCloud?
IRCCloud seems to be the best IRC client that works for all users. It does not require installation, as it works directly on any web browser. As such, it will work on MacOS, Windows, any Linux distro including ChromeOS (Chromebooks). IRCcloud also has iOS and Android apps.
  • Why a paid subscription?
A paid subscription to IRCCloud would give cloaked users the ability to stay connected to Libera chat server at all times even when they're offline, unlimited access to chat history, and priority support.

Given the WMF's stable economic state, this seems like a productive use of their money that would benefit users across Wikimedia projects.

Thoughts?

Cheers, Cocobb8 (💬 talk • ✏️ contribs) 20:10, 3 June 2024 (UTC)

Would support this. Qcne (talk) 20:13, 3 June 2024 (UTC)
Even though I wouldn't use this - I have a client I regularly use already - I would support this. —Jéské Couriano v^_^v threads critiques 20:16, 3 June 2024 (UTC)
Before proposing this, I would recommend working out a ballpark figure for how much it would cost. Pricing seems to be £5 (+20% VAT) per user per month, but I haven't the foggiest how to find out how many people have a cloak. I haven't used IRC in years, but when I did I think I had a cloak - if they don't expire (again, I have no idea) then the raw figure will be too high as it's only worth spending money on people who are going to benefit (i.e. active users). Thryduulf (talk) 20:21, 3 June 2024 (UTC)
@Thryduulf It would totally depend on the amount of cloaked users of course (I also have no clue what the current number is)... But that's a good point though, maybe have something like "the subscription cancels after 3 months of inactivity on Libera chat (i.e. connection to)"? That way money wouldn't be spent unnecessarily. The good thing is, in order to obtain a cloak, users still need to be quite active on Wikimedia projects so it shouldn't be too much of an issue. Cocobb8 (💬 talk • ✏️ contribs) 20:27, 3 June 2024 (UTC)
Speaking as a GC, there are currently 802 cloaked users. (This includes Wikimedia-cloaked bots though too) stwalkerster (talk) 21:14, 3 June 2024 (UTC)
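Putting the two figures from this thread together (£5 plus 20% VAT per user per month, and 802 cloaked accounts) gives a rough upper bound, sketched below. It assumes every cloaked account, bots included, is subscribed for a full twelve months, which overstates the real cost; the names are illustrative.

```python
from decimal import Decimal

PRICE_PER_USER_PER_MONTH = Decimal("5.00")  # GBP before VAT, as quoted above
VAT_RATE = Decimal("0.20")
CLOAKED_ACCOUNTS = 802                      # includes Wikimedia-cloaked bots

def annual_cost(accounts=CLOAKED_ACCOUNTS,
                price=PRICE_PER_USER_PER_MONTH,
                vat=VAT_RATE):
    """Upper-bound yearly cost in GBP if every account is subscribed."""
    monthly_per_user = price * (1 + vat)    # 6.00 GBP including VAT
    return accounts * monthly_per_user * 12
```

That works out to £4,812 a month, or £57,744 a year, before excluding bots and inactive users.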
Another option would be to have this be on-request so that only those who are both cloaked and active on IRCCloud for Wikimedia channels get it. Cocobb8 (💬 talk • ✏️ contribs) 14:22, 4 June 2024 (UTC)
What is a Wikipedia cloak? Phil Bridger (talk) 20:25, 3 June 2024 (UTC)
@Phil Bridger See https://meta.wikimedia.org/wiki/IRC/Cloaks Cocobb8 (💬 talk • ✏️ contribs) 20:26, 3 June 2024 (UTC)
It also auto-voices users in #wikipedia-en-help. Cocobb8 (💬 talk • ✏️ contribs) 20:30, 3 June 2024 (UTC)
While I'd love for folks to get easier access to IRC clients that are usable on mobile devices and from a browser, I'm going to have to oppose this. I love the general idea of "WMF provides web-based IRC client to Wikimedians", but this proposal has too many issues. Firstly, cloaks on Libera Chat are designed to show an affiliation to a project, and some people (myself included!) don't have a Wikimedia cloak even though they may be eligible for one. Sure, it's a minor detail, but it changes the cost calculation immensely. Secondly, this would have to be an on-request thing. Like Jeske, I've got my own preferred client setup and I suspect most people who've been around IRC for a while do too. Finally, IRCCloud is a commercial service, and free self-hosted alternatives exist that could probably be hosted somewhere for a much lower cost. stwalkerster (talk) 21:20, 3 June 2024 (UTC)
There are dozens of perfectly good IRC clients including many free and open source options. Why on earth should the WMF, an organisation that exists to promote free content, throw all this money at one particular commercial product over all the rest? – Joe (talk) 09:37, 4 June 2024 (UTC)
I used to use Matrix as an IRC bouncer before they (Libera) shut that off, so if we were wanting an official form of IRC persistence IMO a better use of resources might be looking at a way to reenable or self-host a bridge for Wikimedia users. Or just host a bouncer themselves, that would probably be easier. One good thing about bridging is the potential to have one or two channels (probably not the main ones unless people primarily using IRC want that) also bridged to Discord (I know it's not official, but we could probably make an official one) to unify the community. I know some open source projects that do that. Alpha3031 (tc) 14:07, 4 June 2024 (UTC)
Doesn't Libera Chat already provide a free web client for everyone at web.libera.chat? Why would we want to send our users to this other vendor in the middle? — xaosflux Talk 14:12, 4 June 2024 (UTC)
KiwiIRC is not robust and lacks in functionality when you compare it to IRCCloud, at least from my experience in using web-browser clients. Cocobb8 (💬 talk • ✏️ contribs) 14:13, 4 June 2024 (UTC)
@Xaosflux, general webchat tools like either of Libera Chat's web-based clients do not allow for persistent presence, and generally disconnect users whenever the user's browser decides to stop running JS on that tab to save resources. It leads to a fairly unreliable connection for anyone who wants to idle in a channel and isn't actively using IRC all the time. Bouncers (ZNC, irssi, weechat, soju, etc) and web clients which behave like bouncers (such as IRCCloud or The Lounge) persist the connection server-side, allowing persistent presence and easy session resumption.
For casual users, such as those who visit -en-help, -en-revdel, or -en-unblock for assistance with a specific issue, webchat is absolutely fine. For users who want to hang around in channels long-term, webchat isn't really feasible. stwalkerster (talk) 14:22, 4 June 2024 (UTC)
Thanks for the update. We have several channels that specifically ask for no-logging, wouldn't this completely circumvent that? Additionally, as far as privacy goes this would insert another man-in-the-middle correct? IF WMF is going to purchase and provide subscriptions to this vendor I'd like to hear what their legal, privacy, and vendor management teams have to say about it. — xaosflux Talk 14:33, 4 June 2024 (UTC)
I think you're conflating things here. Most of our channels ask for no public logging - aka publishing of channel logs in a way that others can access it. It's even one of Libera Chat's policies that public logging shouldn't happen without explicit notification. I'm not aware of any of our channels which ask for no private logging (please do enlighten me if you know of any - I'm curious), nor am I aware of any non-webchat client that doesn't log privately by default. If we do have no-logging-at-all channels, then I strongly suspect that not every member of those channels has configured their client to not log that channel.
We already have many users who use IRCCloud as their main client, either using the free version that doesn't offer persistent sessions or the paid version which does. As far as I'm concerned, WMF paying for a tool that many people use already will have zero practical impact on our privacy stance. If you look at a names list in any of our channels (or even look through joins/parts), you'll probably see that the user/ident part of a good portion of people's nick!user@host is either sid000000 or uid000000 - this is the common pattern for IRCCloud users (paid and free respectively).
That said, as I mentioned above, I don't think IRCCloud is a good solution to the underlying issue when other FOSS alternatives exist (The Lounge) that we might be able to host somewhere like WMCS. stwalkerster (talk) 16:39, 4 June 2024 (UTC)
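To estimate how many people in a channel already use IRCCloud, the sid/uid ident pattern described above could be checked with something like this sketch. The regular expressions and labels are illustrative assumptions based only on that description, not on IRCCloud documentation.

```python
import re

# nick!user@host, as shown in a names list or in join/part lines.
HOSTMASK = re.compile(r"^(?P<nick>[^!@\s]+)!(?P<user>[^@\s]+)@(?P<host>\S+)$")
# Assumed ident shape: "sid" (paid) or "uid" (free) followed by digits.
IRCCLOUD_IDENT = re.compile(r"^(?P<kind>[su])id\d+$")

def classify_hostmask(hostmask):
    """Return 'irccloud-paid', 'irccloud-free' or 'other'."""
    m = HOSTMASK.match(hostmask)
    if m is None:
        raise ValueError(f"not a nick!user@host mask: {hostmask!r}")
    ident = IRCCLOUD_IDENT.match(m.group("user"))
    if ident is None:
        return "other"
    return "irccloud-paid" if ident.group("kind") == "s" else "irccloud-free"
```

Counting the results over a channel's names list would give a ballpark for existing IRCCloud use, which bears on both the cost and the privacy questions raised here.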
As long as a good, free (and unlimited in terms of connection) alternative that works for basically any device can be found, that would satisfy what I wanted with this proposal :). Cocobb8 (💬 talk • ✏️ contribs) 17:39, 4 June 2024 (UTC)

Adding some bracket colors (or anything) to improve readability (and shorten sentences) therefore helping the reader's eye to jump from core sentence part to core sentence part (without the need to seek eternally for a lines away closing bracket) even though shorter sentences are a superior solution (but not so easy to enforce with open data).

Maybe mention drafts after redirects?

In my opinion, the {{Draft at}} template is very useful. I'm wondering if it would be useful for visitors to get a similar notice after being redirected from an article that has a draft. I will give three semi-random examples (their names start with A, B, and C). AIOS, Beatpath, and Capitol Highway. Each has a draft, but each redirects without displaying the template. Maybe a knowledgeable editor visiting one of these would have helped improve the draft, were they made aware a preliminary version is available. Maybe something as basic as "(Redirected from X; there is a draft at Y)"? --Talky Muser (talk) 18:22, 4 June 2024 (UTC)

Technically speaking this could work. By default it would only display a message if there exists a page in draftspace with a name that exactly matches the title of the redirect used. For example Capitol Highway, Oregon would not indicate the existence of Draft:Capitol Highway unless explicitly linked (I'm not sure whether matching is case sensitive). The message would also not display in a logical place if the redirect is to a section (the "redirected from" message always appears at the top of the page). Within those limitations though this seems like something useful to have. Thryduulf (talk) 19:28, 4 June 2024 (UTC)
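The exact-match behaviour described above can be sketched as follows. The function name and the `existing_drafts` set are hypothetical stand-ins for a real page-existence lookup, and the comparison is assumed to be case sensitive, which the comment above leaves open.

```python
def redirect_notice(redirect_title, existing_drafts):
    """Build the 'Redirected from' line, mentioning a draft only when a
    draftspace page exactly matches the title of the redirect used."""
    draft = "Draft:" + redirect_title
    if draft in existing_drafts:
        return f"(Redirected from {redirect_title}; there is a draft at {draft})"
    return f"(Redirected from {redirect_title})"
```

With only Draft:Capitol Highway in the set, arriving via Capitol Highway shows the draft notice, while arriving via Capitol Highway, Oregon does not, matching the limitation noted above.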

Revisiting date auto-formatting

Back in the late 2000s to early 2010s, we had a feature by which dates were auto-formatted. This ended up leading to a bunch of strife, and the feature was turned off. This happened because the feature (implemented as a MediaWiki parser function) relied upon linking, such that every date was a link, and it caused a "sea of blue" problem.

With the advent many years ago of Lua modules, it seems that we could do this better now, untied in any way to linking. We already have Lua code in various templates (as well as Javascript code in user scripts) that can parse most dates. So, it seems to me that it would be beneficial to have something like the following:

  • Parse all sane date formats, from 1852-02-08 to 8th February 1852 to Feb. 8, 1852 to 8-FEB-1852, and so on.
    • Exclude material between quotation marks or inside a quotation template.
    • Flag as errors any instances that cannot be unambiguously parsed, e.g.
    • Provide an "ignore" wrapper template that can be used around things that appear to be (or contain) dates but should not be parsed as such (e.g. a serial number that is coincidentally in the format 1852-02-08, or a book titled On Feb. 8, 1852).
  • Identify templates like {{use mdy dates}}, etc., at the top of the page and "obey" them, to normalize all dates to the prescribed format for that page.
    • If there isn't one, but there is a {{Use X English}} template, pick the date format that conventionally matches the specified country name.
    • If both of the above conditions fail, then do some statistical analysis, and normalize dates to whatever date format already dominates in the article (other than ISO's YYYY-MM-DD, which is not human-friendly for our readers).
  • Read a preferences setting, for logged in users, and override the display of dates to whatever the user set as their preference.
  • Use a bot to replace all the non-excluded dates in the code with a single canonical format (probably ISO).
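The parsing and normalizing steps in the list above can be sketched as below. This is written in Python purely for illustration (the proposal itself envisages Lua), it covers only the four example formats given, and every name in it is hypothetical. Anything it cannot parse raises an error instead of guessing.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_NUM = {name[:3].lower(): i for i, name in enumerate(MONTHS, start=1)}

ISO = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
DMY = re.compile(r"^(\d{1,2})(?:st|nd|rd|th)?[ -]([A-Za-z]+)\.?[ -](\d{4})$")
MDY = re.compile(r"^([A-Za-z]+)\.? (\d{1,2})(?:st|nd|rd|th)?, (\d{4})$")

def month_number(name):
    """Map a full or abbreviated English month name to 1..12."""
    key = name[:3].lower()
    number = MONTH_NUM.get(key)
    if number is None or not MONTHS[number - 1].lower().startswith(name.lower()):
        raise ValueError(f"unknown month: {name!r}")
    return number

def parse_date(text):
    """Parse one date string; raise ValueError if it is not recognized."""
    text = text.strip()
    m = ISO.match(text)
    if m:
        year, month, day = map(int, m.groups())
        return date(year, month, day)
    m = DMY.match(text)
    if m:
        day, name, year = m.groups()
        return date(int(year), month_number(name), int(day))
    m = MDY.match(text)
    if m:
        name, day, year = m.groups()
        return date(int(year), month_number(name), int(day))
    raise ValueError(f"cannot parse unambiguously: {text!r}")

def format_date(d, style):
    """Render in the page's prescribed style: 'mdy', 'dmy' or 'iso'."""
    if style == "mdy":
        return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
    if style == "dmy":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if style == "iso":
        return d.isoformat()
    raise ValueError(f"unknown style: {style!r}")
```

The `style` argument stands in for whatever {{use mdy dates}} or a user preference would select. Detecting dates inside running prose, skipping quotations, and handling date ranges are all left out of this sketch.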

The results of this would be:

  • An end to the need to keep re-re-re-normalizing dates (manually or by script) in an article back to the format specified in {{use xxx dates}}. (The dates always become inconsistent over time because various citation tools that people use only output a single format, or have an option to pick one that people don't bother to use, or people are writing entirely manually and use the date format they like better without regard to the rest of the article content).
  • Article source code that is better for WP:REUSE purposes, with consistent dates that can be reformatted by reusers in an automated manner as they see fit.
  • Articles with consistent date display, matching whatever is set by the article-top template.
  • Ability of readers who really, really like one particular format to impose it on their personal WP experience.

 — SMcCandlish ¢ 😼  18:17, 15 May 2024 (UTC)

WHAT HAVE YOU DONE WITH SANDY MCCANDLISH? EEng 18:25, 15 May 2024 (UTC)
That sounds better suited as a configured mw:Writing systems/LanguageConverter than basically a specific version of zhwiki's NoteTA, though both would work, and Lua does seem like the language more popular here than PHP. Aaron Liu (talk) 18:31, 15 May 2024 (UTC)
No, there is no need to revisit auto-formatting, for the reasons that were given when it was discontinued, but we should recognise that in some countries, such as the UK and India, either 8 February 1852 or February 8, 1852 is perfectly acceptable, with the addition of "th" to "8" also acceptable. Phil Bridger (talk) 20:24, 15 May 2024 (UTC)
Given that the Chinese Wikipedia manages to allow people to switch between different varieties of Chinese with a single click, I have always found it a bit embarrassing that we can't even offer date formatting choices. A date autoformatter would need to be more powerful than the old one, though, and would need to be able to deal with date ranges (the old one could only do full dates). —Kusma (talk) 20:47, 15 May 2024 (UTC)
I certainly see no harm in developing a tool that allows readers to choose a display for their own purposes. Starting with that functionality and getting it working reliably and consistently would likely be a useful first step towards implementing something broader.
Rather than/as well as inferring things that look like dates in prose and marking things that aren't, having something like "Bob Smith ({{date|1852|February|8}} – {{date|1921|August|8}}) was a British politician. He served as Chancellor of the Exchequer from {{date|1890|March|1890}} to {{date|April|1894}}. He was president of the Imperial Society {{date range|1899|-|1908|April}}" a la semantic HTML may prove useful more broadly. Thryduulf (talk) 12:59, 16 May 2024 (UTC)
I don't think this would work for "readers". See Read a preferences setting, for logged in users. The average reader does not have prefs settings.
If I were going to mess with dates, it would be to specify in WP:BADDATE that unambiguous year–month combinations (e.g., 2024-05, which never means "this year through nineteen years in the past") are acceptable, and that it is concerned solely with what readers see, and definitely does not restrict the input for templates such as the CS1 citation templates.
In other words, people shouldn't be manually replacing the ISO-approved "2024-05" to "May 2024" in citation templates, because the citation templates should detect the unambiguous dates and treat them exactly the same way they already convert the display of "2024-05-01" to "1 May 2024" or "May 1, 2024" (the choice is made automagically, based on the specified ENGVAR for the page). WhatamIdoing (talk) 23:34, 17 May 2024 (UTC)
The average reader does not have prefs settings. Easy, default to converting to mdy. I also don't see how "2024-05" is relevant here. Aaron Liu (talk) 00:43, 18 May 2024 (UTC)
The problem to be solved is: "An end to the need to keep re-re-re-normalizing dates (manually or by script) in an article back to the format specified in {{use xxx dates}}."
Most of the dates that need to be re-re-re-normalized are in the citations (not in the words of the article). See Category:CS1 maint: date format for the current list.
This problem could be solved by changing a bit of code in the citation templates. To be clear: nearly every article that appears in this category, or that has been in this category during the last few years, could have been prevented from appearing there by changing the citation template's code.
The reason this change was rejected is because the maintainers of that code believe that MOS:BADDATE disallows editors from putting unambiguous, ISO 8601-compliant numeric year–month combinations in wikitext, even when the numeric form of the date would never be shown to a single reader. That is, they believe that the MOS will allow editors to type |date=2024-05-01 in a citation template, so they can show "May 1, 2024" to readers (if the article is tagged as mdy), but that the MOS does not allow editors to type |date=2024-05 and have "May 2024" shown to readers.
If you want to end "re-re-re-normalizing dates (manually or by script) in an article back to the format specified in {{use xxx dates}}", I suggest that you start with the problem that could be solved in two edits: a single edit to the MOS page, to officially reassure the template maintainers that it's 'legal', and a single edit to the citation template's main module, to implement the code (which AIUI already exists). WhatamIdoing (talk) 06:36, 18 May 2024 (UTC)
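For illustration only, the display behaviour described above (an ISO date typed in the wikitext, shown to readers as dmy or mdy) can be sketched in a few lines of Python. This is not the citation module's actual code; the function name display_full_date and the style argument are hypothetical stand-ins for whatever the module does when it reads the page's {{use xxx dates}} tag.

```python
from datetime import date

# English month names, listed explicitly so the result does not depend on locale.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def display_full_date(iso_text: str, style: str) -> str:
    """Render an ISO date (YYYY-MM-DD) as 'dmy' or 'mdy' prose.

    `style` is a hypothetical stand-in for the page-level date setting.
    """
    parsed = date.fromisoformat(iso_text)  # raises ValueError on invalid input
    month = MONTH_NAMES[parsed.month - 1]
    if style == "dmy":
        return f"{parsed.day} {month} {parsed.year}"
    if style == "mdy":
        return f"{month} {parsed.day}, {parsed.year}"
    raise ValueError(f"unknown date style: {style!r}")
```

The point of the sketch is only that the stored form and the displayed form are independent: the same input string produces either output depending on the page setting.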
Well, that change would probably be put in tandem with this one, but only if this (automatic date conversion) is developed and passed. Aaron Liu (talk) 00:59, 19 May 2024 (UTC)
I don't think that solving the citation problem needs to wait for solving the other problem. WhatamIdoing (talk) 04:37, 19 May 2024 (UTC)
I don't think we should explicitly allow "2024-05" as a "good" date format. There could be an RfC about this small issue, though. Aaron Liu (talk) 13:04, 19 May 2024 (UTC)
Why not? It's an officially accepted international standard. It cannot be confused with a date range. So why not use it? WhatamIdoing (talk) 19:31, 20 May 2024 (UTC)
Huh, TIL ISO 8601 actually explicitly allows it. Aaron Liu (talk) 20:30, 20 May 2024 (UTC)
2024-05 cannot be confused with a date range, but 2004-05 can. That's why we disallow it. Headbomb {t · c · p · b} 09:44, 22 May 2024 (UTC)
But we could allow the format for anything that can't be confused that way (e.g., the years 1912 to 1999 and 2012 to 2099 + year–month combinations that can't be a range of years, such as 2010–09). I think the whole thing could be evaluated in a single line of regex. WhatamIdoing (talk) 05:48, 25 May 2024 (UTC)
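As a rough sketch of the check described above, assuming the rule is "a YYYY-MM string is ambiguous only if the last two digits could be read as a later year in the same century", the test fits in one regex plus one comparison. It is written in Python rather than the citation module's Lua, and the function name is_unambiguous_year_month is hypothetical.

```python
import re

# Century digits, year-in-century digits, then a valid month 01-12.
YEAR_MONTH = re.compile(r"(\d{2})(\d{2})-(0[1-9]|1[0-2])")


def is_unambiguous_year_month(text: str) -> bool:
    """Return True if `text` is a YYYY-MM value that cannot be a year range."""
    match = YEAR_MONTH.fullmatch(text)
    if match is None:
        return False  # not a year-month at all (e.g. month 23)
    year_in_century = int(match.group(2))
    month = int(match.group(3))
    # "2004-05" could mean 2004 to 2005 because 05 > 04;
    # "2010-09" cannot, since a range does not end before it starts.
    return month <= year_in_century
```

Under this assumed rule, every year ending in 12 through 99 passes for all twelve months, which matches the 1912 to 1999 and 2012 to 2099 ranges mentioned in the comment, and 2010-09 passes while 2004-05 does not.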
No, we really shouldn't, because as soon as you have a new date, you have to re-evaluate if the old date format is allowable or needs to be converted to something new, or have inconsistent date formats in an article, e.g.
Headbomb {t · c · p · b} 12:28, 2 June 2024 (UTC)
Allowing it would reduce that problem. We would go from today:
  • 2009-01 produces a red error message that will have to be fixed by hand
  • 2009-08 produces a red error message that will have to be fixed by hand
  • 2009-23 produces a red error message that will have to be fixed by hand
to
  • 2009-01 automagically gets displayed as January 2009, which means the article has consistent date formats
  • 2009-08 automagically gets displayed as August 2009, which means the article has consistent date formats
  • 2009-23 produces a red error message that will have to be fixed by hand
I'd rather have two automatically fixed than three broken. WhatamIdoing (talk) 00:24, 4 June 2024 (UTC)
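The before/after lists above can be sketched as a small Python function: valid year–month input is converted for display, and anything else is still reported as an error. This is an illustrative sketch, not the citation module's code; display_year_month is a hypothetical name, and the sketch deliberately leaves out any check for year-range ambiguity.

```python
import re

# English month names, listed explicitly so the result does not depend on locale.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def display_year_month(text: str) -> str:
    """Render 'YYYY-MM' as 'Month YYYY'; raise ValueError for anything else."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", text)
    if match is None:
        raise ValueError(f"not a year-month value: {text!r}")
    year = match.group(1)
    month = int(match.group(2))
    if not 1 <= month <= 12:
        # Corresponds to the red error message that still needs a manual fix.
        raise ValueError(f"invalid month in {text!r}")
    return f"{MONTH_NAMES[month - 1]} {year}"
```

With the three inputs from the list, 2009-01 and 2009-08 come back as "January 2009" and "August 2009", while 2009-23 raises an error.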
I'm a bit leery of this step: Use a bot to replace all the non-excluded dates in the code with a single canonical format (probably ISO). Wouldn't this entail making edits to every article on the project that contains a date of the format "11 November 1919" or "January 1, 1970"? Do we have any idea what the scope of that effort might be? I would expect at least a few hundred thousand articles, which people with large watchlists might get pretty annoyed about. Folly Mox (talk) 21:26, 18 May 2024 (UTC)
I also agree that that step is probably unnecessary. They're gonna get converted anyway. Aaron Liu (talk) 00:52, 19 May 2024 (UTC)
For something like this, we need a special way to hide these edits from watchlists (something that will not hide other edits), where editors who do want to see them need to opt in and swear an oath that they won't complain about seeing these edits. We should solve the problem of annoying people with large watchlists, but we should not let this issue prevent large-scale improvements. —Kusma (talk) 08:56, 22 May 2024 (UTC)
But why would we want to replace all the dates in the first place? Aaron Liu (talk) 11:27, 22 May 2024 (UTC)
If we don't need to, that is fine. Currently there are people with scripts who annoy me greatly by replacing my beautiful ISO dates in citation templates with mdy or dmy although they are already displayed like that via {{use dmy dates}}. So apparently some people think we need to replace all dates. —Kusma (talk) 11:34, 22 May 2024 (UTC)
I agree, that is annoying. But the better solution would probably be to filter out edits that only change date formats. Aaron Liu (talk) 11:41, 22 May 2024 (UTC)
FYI, a {{#dateformat:}} magic word already exists to apply the preference. I don't think the preference is directly accessible in Lua, but you could use frame:callParserFunction{ name = '#dateformat', args = { date } } to format dates according to the preference. Anomie 12:11, 19 May 2024 (UTC)
I'm very supportive of this entire proposal except for using a bot to update dates in articles (or any other changes to how dates should be entered in the source / editor behaviour). Today we accept any date format in citation templates and have them display in the proper format without any extra work from the editor; this works well today and it doesn't seem to cause any problems of having the article display dates in a different format (the article's "correct" format, in the future also a user preference) than what's entered in the source. This proposal would extend the functionality to any date in a "sane" format within the article body, without the need for any edits to the article content or any changes to editor behaviour. I also think the "Ignore" bit would take some work to minimize unintentional date reformatting (extending the proposed ignore logic to try to catch as many reasonable scenarios as possible where a date might be in a title or quotation). I would generally oppose any changes that require change to editor behaviour or bot edits (e.g. having dates structured in the source in a non-editor friendly format like {{date|1852|February|8}}). Consigned (talk) 08:19, 5 June 2024 (UTC)
Who are these people who either don't understand or get seriously offended by dates, with the month being non-numeric, in the "wrong" format? I certainly haven't come across any. Phil Bridger (talk) 08:39, 5 June 2024 (UTC)

Copyleft trolling -- taking the temperature in the room

Years ago, there was someone who paid low-wage workers to create a ton of stock photos. He released them all with a free license, then scoured the internet for anyone who used those images and sued them indiscriminately for even trivial violations. This practice is called "copyleft trolling", taking advantage of the "please use this!" signal that free licenses send and then exploiting it. It has been strongly condemned by Creative Commons itself (here and elsewhere) and various free culture advocates (Cory Doctorow has written about it several times, for example, referring to it as "a new breed of superpredator"). Flickr changed their community policies in response to the behavior, requiring users who opt for a CC license to give people the opportunity to fix the problem before suing them. In short, it's widely viewed as antithetical to free culture principles.

Over on Commons, that user's files were deleted. It was no great loss because the quality wasn't all that high to begin with. Sometime later it happened again, but with someone's personal concert photography. In that case, as the quality of the photo was higher, Commons settled on "forced watermarking" as an intervention that would let us keep good quality free media while doing all we can to ensure reusers know the terms and the risk involved with making a mistake. You can see an example at File:Lukas Nelson.jpg. I don't know how we settled on that precise wording of the watermark, but in hindsight it should probably be changed to be a bit clearer and less pointed. Of course, these files can still be displayed on Wikipedia without the watermark using {{CSS image crop}}, but when you go to download the image or view it at full size it's intrusive by design. The idea is to make someone either use the image with correct attribution intact or manually crop an image that ensures they see the correct attribution that will avoid them getting sued.

It's happening again, and this time with one of our most accomplished and celebrated wikiphotographers, whose images are widely used on enwiki and elsewhere on the internet (long threads here, here, and here). They have been clear that they have contracted with Pixsy (the company most closely associated with copyleft trolling) and will insist on payment from even independent/small-time/non-profit media users, even if they agree to fix the issue and there was no real damage done. It's perhaps the case with the most potential for harm for people who rely on us for free content, but also the case with the most valuable images.

There are ongoing debates on Commons over what to do, and forced watermarking is back on the table. If it happens, enwiki will have a decision to make: stop using the images altogether (not likely), replace all syntax with {{CSS image crop}} (on about 1500 pages), or host a watermark-free local version (ignoring the risk for reusers in order to avoid the hassle of the CSS image crop template). If forced watermarking doesn't happen on Commons, enwiki has a different decision to make about whether it's ok with people building a business demanding money from the wide range of people who depend on our content but make a mistake in using it (whether a major mistake or trivial mistake).

Another way to frame this discussion is: where does the English Wikipedia community stand on using our projects for "copyleft trolling"?

There are a lot of strong opinions over on Commons ranging from "delete all the files and ban the user" to "anyone who messes up attribution deserves to be sued" (though most are somewhere in between). I'm hoping this section won't turn into a splintered conversation between the same folks so we can better understand the English Wikipedia's take on what we've been discussing on Commons. — Rhododendrites talk \\ 15:07, 29 May 2024 (UTC)

Genuine question, why isn't the usual attribution on Commons considered enough for these specific authors, like it is for everyone else? I don't see why they should be able to say "nope, I don't like this attribution, I want a watermark on the file". Otherwise, no, I don't believe it's productive to host images whose main purpose is to make money by suing people who re-use them. Chaotic Enby (talk · contribs) 17:22, 29 May 2024 (UTC)
What's the usual attribution, though? Technically the CC licenses typically require (a) appropriate credit (which can include the name of the creator/attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material), (b) a link to the license itself, and (c) must indicate if any changes were made. If it's a sharealike license and you produce a derivative work with it, you must also release the resulting work with the same license.
Most of us who use the licenses just don't care that every detail is satisfied because we, you know, want people to use the work. That's why we use a free license. Now, I'm not a professional photographer (although my images are used enough that copyleft trolling would probably be pretty lucrative), but when someone says "By Rhododendrites, via Wikimedia Commons" I don't care that they omitted the license.. because meh. When they just say "By Wikimedia Commons" I might send them an email to tell them to fix it to include the "by Rhododendrites" part (and they almost always do). But I would technically be within my rights to demand money for their error, even if just missing a little piece of it, and even if it's because I chose an intentionally overcomplicated attribution statement, and even if they offer to fix it, and even if it's just some kid who threw it up on their blog with 0 visitors. The question is what we want to do about it if someone abuses CC licenses in this way (which, again, CC itself condemns). — Rhododendrites talk \\ 17:33, 29 May 2024 (UTC)
By usual attribution, I mean the fact that the link to the image points to the usual Commons file description page with the license and attribution statement, as described in c:Commons:Credit line. As far as I know, this has usually been considered enough to comply with CC BY licenses. In this case, as the creator's required credit line Photo by Larry Philpot, www.soundstagephotography.com is already present in the file's description, I don't see why this is not considered enough and an additional watermark is needed. Chaotic Enby (talk · contribs) 17:40, 29 May 2024 (UTC)
Presumably the software used to identify violations is either not taking linking into consideration or, more likely, when people reuse content they find here or on Commons, they're not providing that link. To clarify, we're not talking about Wikipedia violating the license and creating watermarks to protect this encyclopedia. The watermarks are to protect everyone else who comes to Wikipedia/Wikimedia Commons knowing that this is a great place to find free content (WP:5P#3) and don't fully understand the complexities of what's technically required by the license. Wikipedia doesn't need to display a watermark, which is why I talk about using the CSS crop template to avoid displaying the watermark here. — Rhododendrites talk \\ 17:47, 29 May 2024 (UTC)
Thanks a lot for the clarification! I still believe it's quite a sleazy move from the user doing the "copyleft trolling", and very much not in the spirit of Creative Commons. Chaotic Enby (talk · contribs) 17:52, 29 May 2024 (UTC)
I think as you explained on the Commons village pump, the ultimate issue is that the options for dealing with failures in attribution are poor. Unless some billionaire decides to fund an organization to only pursue violations in a spirit-of-the-license manner, accounting for the legal savviness of the violators, unfortunately at present it may be best to advise Commons contributors that there is no cost-effective way to pursue violations selectively. isaacl (talk) 18:29, 29 May 2024 (UTC)
We learned in subsequent discussions that there is a stage in the process whereby the copyright owner could decide not to take action on the violations Pixsy flags. Diliff still pursues compensation when the violation wasn't a big deal because he feels it is owed to him for time spent determining that the violation wasn't a big deal. It was that which pushed me over to "the other side". At that point, where you're billing people for your own time spent investigating their violation, it has become the kind of business Creative Commons objects to (and, I would argue, we should, too). — Rhododendrites talk \\ 19:43, 29 May 2024 (UTC)
I'll qualify my earlier statement: any contributor who wishes to enforce the attribution requirements needs to consider the cost-to-benefit ratio, including the opportunity cost. I think trying to single out specific methods of enforcing licensing terms is in effect a backdoor way to try to add additional license conditions, but it's not very effective in limiting problems for re-users, since we can't enact any remedies that will help them. Watermarking is probably the best way to inform re-users of potential issues, but cropping them out on Wikipedia would work against this. It would both hide the attribution information, and seem to make it legitimate to do so. If the community can't agree on displaying watermarks, I think either a new license needs to be allowed that balances the concerns of professional photographers versus unaware, casual re-users, or Commons and English Wikipedia will just have to make do without contributions from photographers concerned about enforcing attribution. isaacl (talk) 21:51, 29 May 2024 (UTC)
a new license needs to be allowed that balances the concerns of professional photographers versus unaware, casual re-users – That's actually what CC BY 4.0 already does, as it gives people one month after a notice to fix the attribution. And that's why it is (and should be) encouraged compared to previous versions of the CC licenses. Chaotic Enby (talk · contribs) 22:36, 29 May 2024 (UTC)
CC4 doesn't do that. CC4 makes it so after you cough up your fee, your license is restored and, assuming you've fixed the violation, you can continue using it. Under older licenses even if you fixed it, the license was invalidated so you could be sued again. So CC4 doesn't stop this behavior, but does make it less lucrative in some situations. — Rhododendrites talk \\ 23:33, 29 May 2024 (UTC)
Thanks for the explanation! I knew it gave an opportunity to fix the violation, but didn't know the person was still liable for prior use. Chaotic Enby (talk · contribs) 23:58, 29 May 2024 (UTC)
Re after you cough up your fee: while you may still legally have to pay a fee for the violation, the restoration of rights if you fix the violation within 30 days is automatic as soon as you do so, whether or not you pay any fees. Anomie 13:08, 1 June 2024 (UTC)
Sure, and commons:Commons:Village pump/Proposals#Feedback from Creative Commons covers the viewpoint that content licensed under version 4 is probably less likely to be a target for claims of copyright violation. But it doesn't prevent suing for damages that occurred prior to the violation being cured, so there remains an exposure for casual re-users and the potential for aggressive enforcement. There is good reason for the curing provision not to eliminate damages for prior harm, and so I can't see the Creative Commons licence getting rid of it in a future version. isaacl (talk) 23:39, 29 May 2024 (UTC)

I think Rhododendrites has misrepresented things a little here. Copyleft trolling is the deliberate creation of supposedly freely licenced images and then running a business collecting "fees" from people who don't follow the licence conditions perfectly. Diliff was an active Commoner and Wikipedian for many many years and uploaded 1500 world class images. Many English cathedrals and other historic buildings are illustrated to professional level thanks to Diliff. They are frequently lead images and featured pictures. Diliff was active on both projects reviewing images for featured picture and offered his expert knowledge to others trying to learn better techniques (myself included). Diliff uploaded his images not only to illustrate Wikipedia, but also on both Commons and Flickr with CC licences so others could use the images for free (though the Flickr licence was NC). People frequently asked him whether they could reuse the images and whether X was acceptable, and he helped them to do so. You can see such a query here just yesterday, where someone enjoyed being able to use Diliff's images for free for next to no effort and no cost.

What appears to have happened is Diliff got increasingly pissed off with companies using his images for free without attribution, including for example Apple. He enlisted the service of a controversial company, Pixsy, that locates such misuse and demands a fee for unlicenced use of copyright images, which is indeed the case and legal. Unfortunately, Pixsy's business model doesn't allow for forgiveness or for differentiating between small companies and large. I don't think Rhododendrites characterised Diliff's attitude correctly, as he has said that personally he would do so but (a) he doesn't have time/resources to do the work Pixsy do and (b) he isn't given the choice by Pixsy who need to get a return on their investigative work.

I think the general mood on Commons is that what Diliff is doing isn't acceptable and doesn't meet with CC's own recommendations that forgiveness and allowing users to fix up the attribution should be a priority and fees/fines left for egregious misuse. However, despite what Rhododendrites hinted above, about "on the table", neither deletion nor watermarking are anywhere near consensus levels of support.

I think we mostly are where we are because Wikipedia itself does not explicitly attribute in the way the CC licence demands. If you read a CC licence it requires something like "(c) David Iliff, CC BY-SA 3.0, Source" and none of that appears on Wikipedia. Users have to click on the image and then (for most) get a page where the licence reuse terms are only shown after further right clicks or clicking on download buttons etc. If the user simply left clicks on the big image to get the full size one, then they don't get any help on licence at all. Wikipedia has hidden the attribution behind a click, which legally is acceptable but practically isn't working 100% and isn't what nearly any reuser would do.

I assume the purpose of "idea lab" would be to figure out if we can come up with a good solution. Can we enhance the experience when a user clicks on one of Diliff's images so they are even more aware of the conditions attached to reuse? I wouldn't be opposed to such a page having a carefully worded warning that users of such images have been sued if the licence conditions are not met, for Diliff's works. But I don't think we should start defacing the image with a "watermark" or deleting them. -- Colin°Talk 08:01, 30 May 2024 (UTC)

You may want to read the CC licenses again; they do not require a specific method of attribution. They merely require the method be "reasonable to the medium or means You are utilizing". The 4.0 license simplifies the language and explicitly states that a URI or hyperlink to a page providing the information is reasonable. Anomie 11:38, 30 May 2024 (UTC)
I don't think he's alleged an actual violation by Wikipedia. I think he's pointing out that when we don't provide a "visible" credit line (e.g., in plain text in the caption of an image in an article), then it's not obvious to casual re-users that they might need to include a visible credit line. WhatamIdoing (talk) 19:04, 31 May 2024 (UTC)
In re Users have to click on the image and then (for most) get a page where the licence reuse terms are only shown after further right clicks or clicking on download buttons etc. If the user simply left clicks on the big image to get the full size one, then they don't get any help on licence at all.
In mw:MediaViewer, which is what most readers use to get the full size one, there is a proper credit line. For example, if you open today's Featured Picture at Commons in MediaViewer, underneath the image is the caption ("Interior of the Cathedral of Brasília.") and the credit line ("Donatas Dabravolskas - Own work"); on the other side is the "More details" button and the license ("CC BY-SA 4.0").
If by "they don't get any help", you mean that there's no tutorial about license requirements, then that's true, but all the information is there. Nobody in this discussion would need any further information to know what's required. WhatamIdoing (talk) 19:16, 31 May 2024 (UTC)
But there isn't a credit line where they see the image - in the article. There is nothing that tells them they need to give a credit when they use the image. There is nothing that tells them they need to know what the license is, nor what "CC BY-SA 4.0" means. I suspect many people just click to get the larger version, then right click to save the image locally so they can use it. Obviously as experienced Wikimedians we understand what copyright is, what licenses are, that we need to know what license an image is released under, that there are (probably) conditions we need to comply with and that the license will tell us what they are. Not everybody does know that, and they don't even know that there is something they don't know. Thryduulf (talk) 19:36, 31 May 2024 (UTC)
It might not be a bad idea to put a copyright notice above the images in the File: namespace. Putting messages like that "above the fold" increases the chance that readers will see it. The WordsmithTalk to me 20:58, 31 May 2024 (UTC)
On Wikipedia, readers mostly don't see the File: page. On Commons, it's not up to us. WhatamIdoing (talk) 22:46, 1 June 2024 (UTC)
Most readers don't see it, but if they're looking to reuse images they're going to click on the image (which brings up the File: page) so they can get a better resolution and copy/save it from there. Putting a notice there gives them a better chance of seeing it. The WordsmithTalk to me 18:38, 2 June 2024 (UTC)
When readers click on the image, it doesn't bring up the File: page. It brings up the image (including credits and license information) in MediaViewer. WhatamIdoing (talk) 00:09, 3 June 2024 (UTC)
The Wordsmith, the behaviour you describe is similar to mine and is controlled by my Settings choices. I go straight to the Commons file page, which is fairly useless at helping re-users. If you read a Wikipedia page with a different browser or logged out you'll see what 99% of our readers see. Alternatively, try this link. The MediaViewer does the attribution explicitly per CC license terms (if the appropriate templates are used on the Commons page). It is good, but it doesn't explain to the reader that this is what they need to do to reuse this image, vs some random clutter on the page. I mean "CC BY-SA 4.0" looks like some sort of code. If that text appeared in the caption of every CC-licenced thumb, the reader would be left in no doubt that this is some kind of requirement. Showing it on mouse hover might also be acceptable (I know tablets and phones don't have this, but I suspect most people reusing the images are doing it on a computer).
If you right-click on the big image you do get a pop-up saying that you need to attribute this image and guidance for doing so. But if you left-click on the big image, it goes to the full size version and you are dealing with a JPG in your browser, not an HTML page, so no help at all. Since your cursor is a little magnifying glass, the temptation to left click and zoom in is high. -- Colin°Talk 13:50, 3 June 2024 (UTC)
Got it, I have MediaViewer disabled so I forgot it exists. I definitely think there's room to change that text to better explain the copyright terms. The WordsmithTalk to me 19:11, 3 June 2024 (UTC)
I think that the root of the problem is that the English Wikipedia's piss-poor image attribution practices somehow manage to combine the worst possible aspects of every approach to create a user-hostile experience AND provide insufficient attribution. That is to say:
  • On one hand, it's far too onerous. Every image in every article is a hyperlink that takes you away from the current page -- often including icons and interface elements! -- this is horrible for usability.
  • On the other hand, it's nowhere near onerous enough, as it doesn't really give attribution to the photographer/artist/etc -- the page itself is totally verboten to give any textual credit, and something as simple as their name is hidden behind a hyperlink... the very same hyperlinks that are annoying and disruptive to click on and everyone learns to avoid doing that. This gets far worse if somebody, say, mirrors the page or prints it out or reuses the image somewhere else -- they probably don't even realize there is a photographer. Before I became an involved Wikipedia editor I just figured that they had all come from some kind of public-domain catalog, or were taken by paid employees of the WMF, or were fair use stock photos or something.
There are other Wikipedias that provide the photographer's name in the thumbnail next to a photo, and it's completely normal and unobtrusive (almost every major website does this already, and you don't see anybody saying that it makes the Washington Post look unprofessional).
I think that we should probably lead by example and stop hosing all the photographers who give their images out for free. jp×g🗯️ 11:58, 30 May 2024 (UTC)
Generally endorse all of this. There are a wide variety of ways that attribution could be better integrated into Wikipedia itself. But that would require some combination of consensus to highlight dreaded "authors" in articles or improvements to the interface that volunteers can't realistically expect to execute. But yeah if we could resolve all that, clunky kludges like forced watermarking wouldn't be necessary (and, I'd add, could easily be undone should these improvements come along). — Rhododendrites talk \\ 12:51, 30 May 2024 (UTC)
@Colin: Copyleft trolling is the deliberate creation of supposedly freely licenced images and then running a business collecting "fees" from people who don't follow the licence conditions perfectly - It's defined by the enforcement, not the creation. Someone who creates images in order to collect fees is indistinguishable in effect from someone who creates images and then collects fees.
Pixsy's business model doesn't allow for forgiveness or for differentiating between small companies and large. I don't think Rhododendrites characterised Diliff's attitude correctly, as he has said that personally he would do so but (a) he doesn't have time/resources to do the work Pixsy do and (b) he isn't given the choice by Pixsy who need to get a return on their investigative work. - I think one of us misunderstands something about Pixsy. When I was looking for information about the site, what I saw was that Pixsy doesn't take action without the rights holder's go-ahead, i.e. there is a point where one could intervene. The way I interpreted that, combined with what Diliff actually said, was that [when he reviews the violation] even if he determines there was no real harm to his photography business, he still wants money from people who messed up attribution to make up for the time he spent determining there was no real harm to his photography business (this is from multiple comments in the DR). If Pixsy doesn't actually give him that opportunity, or if Pixsy perhaps charges Diliff if he decides to pass, which would be surprising, then that changes my understanding of the situation a little without actually fixing anything as the effect on reusers is the same. — Rhododendrites talk \\ 12:42, 30 May 2024 (UTC)
I think you'll find law has a thing or two to say about intent rather than effect. Diliff's first comment on the DR was to deny being a "copyleft troll". Diliff's a bright chap. I think there is room for reasonable people to disagree on the definition, which isn't in any dictionary. All the evidence points to his body of CC work being created in good faith with the ongoing (just yesterday on his talk page) intention of allowing free use, subject to the terms of the licence. There is zero evidence of the opposite. Zero evidence of malicious intent. Zero evidence of setting a trap. Zero evidence of someone making a "minor" mistake. Zero evidence that anyone asked to pay is actually short of funds or using the image non-commercially. We know nothing about the person who complained at Commons other than that they fully admit to thinking the image was public domain and not giving licence conditions a moment's thought. Why they might think a modern professional-level photo of a London landmark was public domain I don't really know.
I think there is room to interpret Diliff's comments in different ways. You could ask him. I read it more as an explanation of Pixsy's business model. He says "I do want to make it clear that I am sympathetic to those who have been inadvertently caught up in this due to accidental misuse, and that this is not, as Nosferattus implies, the actions of a heartless copyleft troll" and "I did feel the need to correct the assumption being made by many here that all reusers involving minor breaches are being 'extorted' for the initial asking figure as I don't believe that is the case."
I think we are giving way way too much weight to some random unknown person on the internet who's annoyed they messed up and wants revenge, and not enough to a user we actually know to be a generous free-content producer for many many years. I suspect the fact Pixsy's involved has led to the automatic assumption this is copyleft trolling, but Diliff's remarks suggest otherwise. Read the descriptions here. These are people who set out to create a mass of supposedly CC images and then demand high fees for "minor attribution errors". There's nothing at all in Doctorow's articles about him having sympathy for people who just steal photos off the internet because "If it's on the internet it is free" stupidity and lack of care about content creators. Are we overreacting because someone got butthurt when they got caught stealing one of Diliff's photos for commercial use without the slightest concern for licence conditions? No evidence at present to suggest otherwise. -- Colin°Talk 13:34, 30 May 2024 (UTC)
That gets back to the ultimate problem to which I referred, regarding the available enforcement mechanisms being poor. From the re-user's perspective, the photographer's intent doesn't matter, just how enforcement of the licensing requirements is done. Changing English Wikipedia to provide attribution next to the images would at least provide a very prominent example of the attribution requirement being visibly met, thus making the need to do so more evident to re-users. (To avoid any discrepancy with user interface graphic elements, English Wikipedia could choose to only use public domain images for this purpose.) isaacl (talk) 15:18, 30 May 2024 (UTC)
On a side note, I disagree that the use of the term "trolling" should be defined solely by enforcement. Not everyone who tries to enforce their license terms should be considered a troll, whose origin as an online term derives from its meaning in fishing. isaacl (talk) 15:25, 30 May 2024 (UTC)
Of course. I mean "by [manner of] enforcement". — Rhododendrites talk \\ 16:16, 30 May 2024 (UTC)
I don't think manner of enforcement is a sole determinant, either, but as I said, it doesn't really matter to the re-users. isaacl (talk) 01:36, 31 May 2024 (UTC)

IMO both the apparent intent of putting the images up and also the nature and scope of enforcement should be relevant to Commons/Wikipedia's stance. North8000 (talk) 21:20, 31 May 2024 (UTC)

  • I don't think this has necessarily been presented neutrally, and I'm in favour of going after people who are commercially appropriating CC-licensed work without adhering to the terms of the license. At the same time, I think we're within the licensing requirements - I guess the question is whether we make it easier for users to understand what is CC-licensed. I generally like the idea of including attribution in-line in the article itself. SportingFlyer T·C 02:50, 1 June 2024 (UTC)
    • Indeed not neutral, and not intended to be (this isn't a proposal, after all). But [again] this has nothing to do with whether Wikipedia is within the licensing requirements. This is about the reality that these licenses are poorly understood by the public and our "use our media!" messaging has been so successful that people assume they can just use it. A few users have decided to build a business on that misunderstanding -- a business model which both Flickr and Creative Commons itself have condemned and taken steps to curb. What I've said thus far is that we should be thinking about two things (a) design changes to avoid these misunderstandings, but (b) thinking about what to do about users who adopt this model of enforcement, without taking for granted that we will be successful in implementing any significant design changes (which is not something we're reliable for). Colin disputes whether Diliff should be considered a copyleft troll in particular. I'll disagree and say that anyone using automated means to demand money from small-time reusers, without providing any opportunities to fix the problem and demanding money even after determining damages are little-to-none, should qualify for use of that term, but we don't need to decide about Diliff here. I opened this thread to talk about the different options available and to see how enwiki views the more ... unorthodox approaches like "forced watermarking" (whether or not it happens for Diliff, and that does very much remain to be seen, it has already happened for Philpot and many users seem to consider it on the table for this sort of scenario). Ideally we can come to better design solutions to avoid licensing errors to begin with, but until we get there we need to deal with the problems as they arise. — Rhododendrites talk \\ 04:03, 1 June 2024 (UTC)
  • I see a few people above suggesting the idea of inline image credits in articles. If some of you think you want to turn that into an actual proposal, I'd encourage you to address the points listed at WP:Perennial proposals#Add in-article credit for images, particularly the points about whether there's evidence that doing this would actually make an impact on people copying images without correct attribution (not just your hopes that it would), whether it would incentivize people to spam their images into articles to get their name in them, and whether the CC-BY 3.0 and earlier's requirement that credit be "at least as prominent as the credits for the other contributing authors" would be problematic for icons using those licenses if we make the standard credit for illustrative images more prominent. Anomie 12:50, 1 June 2024 (UTC)

So what level of enforcement is acceptable

There are many, many sites out there using my stuff without attribution. So which of these people, if any, would people say it's acceptable for me to send legal threats to:

  • conferences-uk Seems to be a commercial site using my image with no credit of any kind nor mention of CC-BY-SA
  • organrecitals.uk same image seems to be a one man fan site no credit of any kind nor mention of CC-BY-SA
  • 4coffshore.com some kind of consulting firm using this image no credit of any kind nor mention of CC-BY-SA
  • plymouth.ac.uk University on the south coast of England. same image used as a header. no credit of any kind nor mention of CC-BY-SA
  • railfreight.com Some kind of trade publication. Gives credit but no mention of CC-BY-SA.

©Geni (talk) 12:58, 1 June 2024 (UTC)

My take is that the problem isn't in contacting them, it's what you're saying to them. Are you trying to get them to fix the missing attribution and such, or are you going for big fees regardless of whether they fix it? I think https://creativecommons.org/license-enforcement/enforcement-principles/ that was linked at the start of the discussion is a good read. Note that doesn't mean you can't ask for money in any situation, just that it should be reasonable to the use.
I'd also expect little sympathy from general editors (versus other photographers) for "I need $X from every violator to compensate me for the time I spent searching for violations and sending letters" or "to compensate me for what I paid some company to do the searching and sending for me", or for "the company I hired does this, even though I sometimes try to reduce it in some cases". Anomie 13:58, 1 June 2024 (UTC)
Well, US statutory damages start at $750 a time, so let's assume that's what is being asked for. ©Geni (talk) 15:33, 1 June 2024 (UTC)
Going straight for statutory damages seems counter to the principles suggested in https://creativecommons.org/license-enforcement/enforcement-principles/ to me. Anomie 16:50, 1 June 2024 (UTC)
Many of these people probably aren't specialists about the exact specifics of what attribution is needed, so maybe first you could contact them to ask them to fix it and tell them what attribution is needed? Going for legal threats directly sounds a bit too much for just forgetting to credit an image. Chaotic Enby (talk · contribs) 22:29, 1 June 2024 (UTC)
This was Flickr's approach when they modified their community guidelines (see "give some grace"). — Rhododendrites talk \\ 22:54, 1 June 2024 (UTC)
That puts us in the "CC-BY-SA means public domain outside the unlikely event that the photographer contacts you personally" position. Remember most of these images have no attribution at all so we aren't talking "exact specifics" here.©Geni (talk) 00:35, 2 June 2024 (UTC)
Well, sure, in that "CC BY-SA means public domain outside the unlikely event the photographer sues you personally" is also true. Both require that the copyright owner notices and does something about it; the difference is what they do. But you're right that figuring out how to draw the line is very difficult and will probably defy short definitions. I don't think we'll see Commons adopt the Flickr approach of "you need to give them a chance to fix it" (maybe, but I'd be surprised). At minimum because we just host so much content imported from other sites where the copyright owner isn't involved. So any determination of "copyleft trolling" (or whatever we want to call it) is going to require a degree of evidence, pattern of behavior, and judgment of uninvolved parties like most other behavioral cases around here. From the past cases, the things which, I think, have pushed people over the line in that judgment have been: commissioning a ton of low-quality work just to enforce licenses, and using a very particular custom attribution line that includes a URL and suing people for omitting the entirety of that custom line. In the current case, for me anyway, it was learning that the copyright owner wants to charge people for the time he spends determining that their license mistake wasn't damaging. I should say, in case it's not evident by Colin's replies, that not everyone sees a problem with that (or doesn't see it as enough of a problem to intervene). — Rhododendrites talk \\ 16:39, 3 June 2024 (UTC)
I think it is unfortunate that Rhododendrites focused on the "copyleft trolling" aspect rather than a neutral discussion of how we help our content re-users comply with the licence conditions. It taints things with the idea that those enforcing the licence conditions are bad people, and has led this discussion to focus too much on whether Diliff is a copyleft troll and has "decided to build a business on that misunderstanding", which is so exaggerated I am getting BLP concerns we might have to erase this conversation. -- Colin°Talk 13:38, 3 June 2024 (UTC)
That was the framing in this section because that was the framing on Commons. Regardless of what you'd like to call it, this section (which didn't even directly mention Diliff to begin with) was to see what the enwiki community thinks about the various solutions that have been proposed, in part because you said at the DR that the enwiki community would reject the watermarking idea. It's not about how bad anyone is but how to protect reusers from people who, yes, build a business on people making licensing mistakes (that quote referred to a category of people doing so and didn't mention Diliff btw). After learning a bit more about Pixsy, I learned that they do nothing other than automate the collection of possible violations and sort them into a bunch of categories of websites for copyright owners to peruse. It's 100% up to the user to decide whom they want to initiate a case against. So that means Diliff is manually looking at these independent reusers and then, per his comments in the DR, making the affirmative decision to pursue damages even when there wasn't any real harm, in order to make money from the time he spends evaluating those cases. If that isn't explicitly contravening the enforcement principles set out by Creative Commons, I don't know what is. I don't know why you're trying to litigate the Diliff case here, though. — Rhododendrites talk \\ 16:10, 3 June 2024 (UTC)

Re

Unless some billionaire decides to fund an organization to only pursue violations in a spirit-of-the-license manner

Well, I can think of one, if not a billionaire then a multi-millionaire anyway: the Wikimedia Foundation, which takes in millions and millions of dollars more than it needs, could throw a roomful of good lawyers at anyone, and is a nonprofit public good with as much standing as anyone to sue or counter-sue.

The Foundation continues to have a huge income (for good or ill). It only costs us a fraction of our income to run the IT department, run a developer group, run a cubicle farm with people in suits doing accounting and all the other stuff a big nonprofit requires, run Wikimania, and do various other cost-effective interfacing with other organizations and whatnot. And I mean not only do we not need to turn a profit, we can't. Can't take it with us, so might as well spend it on something worthwhile. Experienced and profitable grift operation or no, a dozen white-shoe lawyers being unleashed on you is hard to beat, or at least being a slam dunk to beat, so...

(I do get that there are various ah organizational issues that might prevent the Foundation from doing this, but also some encouraging markers... never know til you try I guess...)

I'm just saying, to anyone who knows how to get thru to important people in the Foundation and the political chops to make a good case... wouldn't this be a job for the Foundation? Herostratus (talk) 19:01, 2 June 2024 (UTC)

No. WhatamIdoing (talk) 00:12, 3 June 2024 (UTC)
This seems very very unlikely. I definitely can't see an organization with the ethos of Wikimedia being at all associated with suing someone for content enforcement. In theory, it might be nice if there were some automated process that contacted websites to say "hey it looks like you have an unattributed photo that came from commons -- can you fix that. the copyright owner could take legal action if you don't fyi" with no teeth behind it, but that doesn't seem remotely realistic to automate reliably and there's still a lot more we can do on the front-end much more cheaply to better communicate what reusers have to do. — Rhododendrites talk \\ 16:14, 3 June 2024 (UTC)
Wrt "I don't know why you're trying to litigate the Diliff case here, though", your opening post brought Diliff into the discussion with "It's happening again, and this time with one of our most accomplished and celebrated wikiphotographers,...." so you turned what could have been a neutral discussion into a posting that is trashing a Wikipedian's reputation and standing in our community.
I think there are two possible Wikipedia discussions. One is a neutral one about trying to improve things so that when people reuse CC images, they do so properly per the licence. I don't see how that discussion needs to involve Diliff or trolling or moral views. And that discussion might well belong on the idea lab. The second discussion would be if the Commons DR decided they would in fact delete or watermark all of Diliff's excellent photos (which right now seems as likely as the Tory party winning the general election). Then I think you'd need to go to the more public village pump, and ping all the relevant wiki projects, and personally I think a mob would go after you with pitchforks and flaming torches. That's the aspect I was most concerned about, because I don't think Wikipedians prioritise "free content project" the same way Commoners do. And also because Diliff is a human and I'm concerned about what news coverage might do. We both are disappointed in Diliff's actions here. -- Colin°Talk 14:20, 5 June 2024 (UTC)