
User talk:The Earwig/Archive 14

From Wikipedia, the free encyclopedia

Signpost issue 4 – 29 March 2018

Presidents and Vice Presidents of Palau

In my opinion, it would be a good idea to combine the articles President of Palau and Vice President of Palau. Both are short articles. We could redirect Vice President of Palau to President of Palau. I would like your thoughts on this. Векочел (talk) 01:58, 4 April 2018 (UTC)

Векочел, I don't know enough about the politics or history of Palau to give an informed response. However, the subjects of the two articles are quite distinct and each one could be expanded to have unique content that wouldn't fit in the other article. Looking around, Category:Vice presidents by country is well populated and I don't see any other countries that have chosen to merge them. — Earwig talk 02:23, 4 April 2018 (UTC)

Bot request

Erasing copyvio detector bot is one of the best things I have seen. Can this bot be used on mrwp? --✝iѵɛɳ२२४०†ลℓк †๏ мэ 07:32, 10 April 2018 (UTC)

Tiven2240, thank you. You can use the tool anywhere, ideally, though I don't guarantee it handles other languages as well as English. However, I don't run a bot that detects copyvios and removes them automatically. There are too many incorrect identifications (the results are not reliable enough) for this to be a good idea; humans should always have the final say. — Earwig talk 02:05, 11 April 2018 (UTC)

Template:AFC_statistics hasn't been updated since yesterday. Is there something wrong with the bot? -- » Shadowowl | talk 13:33, 18 April 2018 (UTC)

Shadowowl, there is an issue with lagging databases on Wikimedia Cloud Services; the data is about a day and a half old so the chart can't update. You can check on that here (that page seems to be misbehaving as well, but you can still see the lag). Anyway, it looks to be going down and should resolve itself soon. — Earwig talk 01:57, 19 April 2018 (UTC)

The Signpost: 26 April 2018

Copyvio detector not working

Resolved

Hello Earwig, I have a problem with the copyvio detector today: It's returning an error "An error occurred while using the search engine (Google Error: HTTP Error 403: Forbidden)." Any help would be appreciated. Thanks! — Diannaa 🍁 (talk) 12:27, 5 April 2018 (UTC)

Kaldari, do you have any idea? Unfortunately, I'm not seeing anything in the logs that could help diagnose—just the 403 error. — Earwig talk 03:51, 6 April 2018 (UTC)
It looks like we hit the daily query limit (10,000 queries per day). Any idea why there was such a big spike today? Usually, we only get to about 5,000 queries a day. Kaldari (talk) 04:41, 6 April 2018 (UTC)
No idea why that would happen. It's working again today. Thanks for looking into this. — Diannaa 🍁 (talk) 10:26, 6 April 2018 (UTC)
This is likely related to SQLBot's AFC-Ores reports, which are using the tool. — JJMC89(T·C) 05:14, 7 April 2018 (UTC)
@JJMC89: Nope, explicitly didn't use the google search functionality (ever), and in the last 24 hours rewrote to cut the amount of api pulls by 85%. SQLQuery me! 05:25, 7 April 2018 (UTC)
Query levels seem to be back to normal today. Kaldari (talk) 05:36, 7 April 2018 (UTC)
@SQL: So then your bot is basically just comparing articles against the external links included in the page? How useful is this? — Earwig talk 18:14, 7 April 2018 (UTC)
Seems to be pretty helpful so far. It would probably be better with google on - for sure, but I was trying to follow the 'Etiquette' section (I also use a sleep() in between queries), and not consume more than my share of finite resources. And, looking at today's high score, Draft:Asli_Demirguc-Kunt - my query shows 90.2% confidence, while bypassing the cache and using google shows 89.6%. I've spot checked a lot of them, and most seem to have a similarly negligible difference. That mainly leaves articles with no links. I'm not 100% sure how I should proceed on those ones yet. SQLQuery me! 01:29, 8 April 2018 (UTC)

I got the same error again late yesterday (circa 22:00 UTC) and the tool is functioning normally again this morning. Posting as information. @Kaldari: Diannaa 🍁 (talk) 11:49, 2 May 2018 (UTC)

Yes, we’ve been discussing this one over at phab:T193559. — Earwig [alt] talk 15:38, 2 May 2018 (UTC)
@The Earwig and Diannaa: It looks like we're hitting the daily quota every 5 days exactly due to a regularly timed spike. On April 26, May 1, May 6, and May 11, there were huge spikes in Google Search API usage from Tool Forge resulting in hitting the quota and then being denied service for the rest of the day. I'm going to file a Phabricator task to investigate further. Ryan Kaldari (WMF) (talk) 20:11, 11 May 2018 (UTC)
Thanks Ryan. — Diannaa 🍁 (talk) 20:43, 11 May 2018 (UTC)
From looking at the proxy logs we were able to confirm that the traffic spike is coming from Earwig's Copyvio Detector. Earwig, could you look at the logs on your end and see if there's anything there that could be helpful in tracking it down. As I mentioned, the last spike was between 1 and 2am PST this morning. Ryan Kaldari (WMF) (talk) 22:34, 11 May 2018 (UTC)
Thanks for investigating. Sure, I'll see what I can find in the logs tomorrow morning (just got home, a bit tired). — Earwig talk 02:19, 12 May 2018 (UTC)
Replied at phab:T194541. — Earwig talk 21:13, 12 May 2018 (UTC)
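
Editorial note: the thread above describes two ideas, a shared daily query quota and a pause (sleep) between queries so that one client does not use more than its share. The following is a minimal sketch of how a client might combine them. The class name `QueryBudget` and its parameters are hypothetical and are not part of the Copyvio Detector or of any bot mentioned here; the limit and delay values are whatever the caller chooses.

```python
import time
from typing import Callable, Optional


class QueryBudget:
    """Track a daily query quota and pause between consecutive queries."""

    def __init__(self, daily_limit: int, delay_seconds: float,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.daily_limit = daily_limit
        self.delay_seconds = delay_seconds
        self.used = 0
        self._sleep = sleep  # injectable so the pause can be observed in tests

    def remaining(self) -> int:
        """Number of queries still allowed before the quota is spent."""
        return self.daily_limit - self.used

    def run(self, query: Callable[[], str]) -> Optional[str]:
        """Run one query if quota remains; return None once it is spent."""
        if self.used >= self.daily_limit:
            return None
        if self.used > 0:
            # Pause before every query except the first one.
            self._sleep(self.delay_seconds)
        self.used += 1
        return query()

    def reset(self) -> None:
        """Call when the provider's quota window starts over."""
        self.used = 0
```

A caller would wrap each search request in `budget.run(...)` and fall back to a check without the search engine when it returns `None`.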

The Signpost: 24 May 2018

Tool did not detect this case per here Wikipedia_talk:WikiProject_Medicine#Agenesis_of_superior_vena_cava

The page Agenesis of superior vena cava was entirely copied from here https://journals.lww.com/md-journal/Fulltext/2018/06010/The_first_reported_case_of_factor_V_Leiden.1.aspx yet the tool missed it.

Best Doc James (talk · contribs · email) 20:19, 4 June 2018 (UTC)

Doc James, I took a look. In this case, the tool searches Google for the right phrases, but Google does not return that page as a result. Sometimes it seems their API is not as accurate as the regular web search us humans have access to. My general advice is that the tool can't detect everything: while a hit is a good sign that a copyvio might be present, the absence of a hit certainly does not mean an article is copyvio-free. — Earwig talk 02:14, 5 June 2018 (UTC)
Interesting. Thanks for the follow up. Doc James (talk · contribs · email) 08:51, 5 June 2018 (UTC)

Women in Red tools and technical support

We are preparing a list of tools and technical support for Women in Red. I have tentatively added your name as you have provided general technical support, including tool developments. Please let me know whether you agree to be listed. You are of course welcome to make any additions or corrections.--Ipigott (talk) 07:29, 8 June 2018 (UTC)

Sure Ipigott, I'm happy to help and to continue maintaining things as necessary. (Though I can't promise significant new features.) — Earwig talk 02:18, 9 June 2018 (UTC)

Notifying you of the requested move on this module, because it would affect one of EarwigBot's tasks. {{3x|p}}ery (talk) 21:54, 26 June 2018 (UTC)

Thanks, I will comment there. — Earwig talk 02:42, 27 June 2018 (UTC)

The Signpost: 29 June 2018

Copyvio Bot on Punjabi Wikipedia

Hi @The Earwig: I am a Punjabi Wikipedia admin and I think the Copyvio Bot would be a great addition to Punjabi Wikipedia. Besides running it on new articles from now on, can we also run the bot on existing articles on Punjabi Wikipedia? Let me know if anything else is required. --Satdeep Gill (talk · contribs) 07:29, 30 June 2018 (UTC)

Hi Satdeep Gill. While I do have a tool to check for copyvios, I don't have a bot that does it automatically. The main reason is that checking for copyvios is slow and expensive (there is a daily limit of about 1,000 checks due to the data source we use), and there are enough false positives that I think humans should always review the results before they get shown to other people (like the article creator). See my response to a similar question here. — Earwig talk 14:36, 30 June 2018 (UTC)
I totally agree that humans should check it. What we are looking for is to have it enabled and that the tool adds a template to articles that might have copyvio. --Satdeep Gill (talk · contribs) 07:43, 1 July 2018 (UTC)

Thursday July 12, 5-8pm: Wiki Loves Pride Edit-a-thon @ Jefferson Market Library

Wikimedia NYC invites you to attend a Wiki Loves Pride Edit-a-thon on Thursday, July 12th at Jefferson Market Library! Wiki Loves Pride is a global campaign to expand and improve LGBT-related content across all Wikimedia projects, in all languages. We are holding this year's event in July in order to support folx who want to contribute a photograph they took at one of NYC's many Pride events or edit an article about something they learned this June. Not sure what to contribute? No problem! We will have a list of articles that need your help.

5:00pm - 8:00 pm at Jefferson Market Library, 425 6th Ave

--Megs (talk) 14:57, 10 July 2018 (UTC)

P.S. You are also invited to the "picnic anyone can edit", the Great American Wiknic NYC @ Prospect Park, Sunday, July 29!

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

Sunday July 29: Annual Wiki-Picnic @ Prospect Park

Sunday July 29, 2-7pm: Annual Wiki-Picnic

You are invited to join us at the "picnic anyone can edit" in Brooklyn's green Prospect Park, as part of the Great American Wiknic celebrations being held across the USA. Remember it's a wiki-picnic, which means potluck.

2–7pm - come by any time! Our reserved picnicking area is by Bartel-Pritchard Square entrance, located at Prospect Park West and 15th Street.
The picnic will be held by the park's Bartel-Pritchard Square entrance immediately on the lawn to your right as you walk through the lovely lotus columns.
Look for us by the Wikipedia / Wikimedia NYC banner!

We hope to see you there! --Pharos (talk) 08:24, 23 July 2018 (UTC)

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

The Signpost: 31 July 2018

Bots Newsletter, August 2018

Greetings!

Here is the 6th issue of the Bots Newsletter. You can subscribe/unsubscribe from future newsletters by adding/removing your name from this list.

Highlights for this newsletter include:

ARBCOM
  • Nothing particularly important happened. Those who care already know, those who don't know wouldn't care. The curious can dig through the ARBCOM archives themselves.
BAG
  • There were no changes in BAG membership since the last Bots Newsletter. Headbomb went from semi-active to active.
  • In the last 3 months, only 3 BAG members have closed requests - help is needed with the backlog.
BOTREQs and BRFAs

As of writing, we have...

Also

Discussions

These are some of the discussions that happened / are still happening since the last Bots Newsletter. Many are stale, but some are still active.

New things

Thank you! edited by: Headbomb 15:04, 18 August 2018 (UTC)


(You can subscribe or unsubscribe from future newsletters by adding or removing your name from this list.)

Wednesday August 29, 7pm: WikiWednesday Salon and Skill-Share NYC

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Babycastles gallery by 14th Street / Union Square in Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Babycastles gallery, 145 West 14th Street
(note the new address, a couple of doors down from the former Babycastles location)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Pharos (talk) 00:14, 29 August 2018 (UTC)

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

The Signpost: 30 August 2018

Earwig Bot!

Heya, thanks for all the things ya do! I noticed the AfC bot is on strike. Hopefully y'all can settle this labor dispute :D I was gonna tinker with the bot run setting thingy, but didn't wanna bork it. Anywho, thanks in advance! Drewmutt (^ᴥ^) talk 17:27, 7 September 2018 (UTC)

Thanks for letting me know, Drewmutt. I restarted him and he should be back to working now after a short delay. — Earwig talk 00:03, 8 September 2018 (UTC)
Seems it is doing something unusual at Template:AFC statistics. Curb Safe Charmer (talk) 17:01, 10 September 2018 (UTC)
@Curb Safe Charmer: what do you mean? — Earwig [alt] talk 18:09, 10 September 2018 (UTC)
Yes, quite odd indeed.. here's how it looks to me.. Drewmutt (^ᴥ^) talk 19:10, 10 September 2018 (UTC)
That is, unfortunately, expected behavior. The backlog is large enough that the status page is too long for MediaWiki to render all of it. We need more reviewers! — Earwig [alt] talk 23:15, 10 September 2018 (UTC)
Dang. Well, until backlog drive season, can we make it simply link to the draft as opposed to having a somewhat useless invoke tag? Not sure if this helps the issue, or if that's even feasible. Drewmutt (^ᴥ^) talk 00:01, 13 September 2018 (UTC)
I don't recommend that. It's not easy to tell in advance where the cutoff point is. For what it's worth, we're only losing about 15% of the page, and probably a fair bit of that is drafts that have already been declined/accepted. If you really want a list of every draft, there's always CAT:PEND. By the way, I've wanted to move the status page to Labs for a while so we don't need to deal with rendering it on-wiki, but I haven't had the time/desire to make that change yet. — Earwig talk 00:29, 13 September 2018 (UTC)

Wednesday September 26, 7pm: WikiWednesday Salon / Wikimedia NYC Annual Meeting

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Babycastles gallery by 14th Street / Union Square in Manhattan. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

This month will also feature on our agenda, upcoming editathons, the organization's Annual Meeting, and Chapter board elections - you can add yourself as a candidate.

We will include a look at the organization and planning for our chapter, and expanding volunteer roles for both regular Wikipedia editors and new participants.

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Babycastles gallery, 145 West 14th Street
(note the new address, a couple of doors down from the former Babycastles location)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Pharos (talk) 20:44, 20 September 2018 (UTC)

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

Copyvio Detector

Hi Ben; it seems that people using your Copyvio Detector are occasionally too quickly jumping to the conclusion that a Wikipedia article must have been taken from some external site when it's in fact the other way round. The text of Wikipedia articles that have been around for some time might appear on many websites, sometimes lacking appropriate attribution. So I wonder whether you might consider adding a caveat to the page of your tool - something like: "If the Wikipedia article was created some time ago, please check whether similar content on other websites might be based on the Wikipedia article before assuming a copyright violation on Wikipedia's side"? Gestumblindi (talk) 11:48, 29 September 2018 (UTC)

That's reasonable, Gestumblindi, I'll add something similar. — Earwig talk 17:37, 29 September 2018 (UTC)

The Signpost: 1 October 2018

The Signpost: 28 October 2018

Copyvio tool downtime

Hey Earwig, just wanted to let you know that Earwig's Copyvio Detector wasn't working for about half a day due to an issue with Google. It has been resolved and is working again. Sorry for the inconvenience. Kaldari (talk) 19:19, 31 October 2018 (UTC)

Got it, thanks for letting me know. — Earwig [alt] talk 21:16, 31 October 2018 (UTC)

ArbCom 2018 election voter message

Hello, The Earwig. Voting in the 2018 Arbitration Committee elections is now open until 23.59 on Sunday, 3 December. All users who registered an account before Sunday, 28 October 2018, made at least 150 mainspace edits before Thursday, 1 November 2018 and are not currently blocked are eligible to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2018 election, please review the candidates and submit your choices on the voting page. MediaWiki message delivery (talk) 18:42, 19 November 2018 (UTC)

ZackBot 12

Regarding ZackBot 12, and ZackBot, how do I go about getting the bot flag on that account? --Zackmann (Talk to me/What I been doing) 18:58, 19 November 2018 (UTC)

@Zackmann08: You should already have a bot flag on that account? It's been flagged since 2016. — Earwig talk 01:54, 20 November 2018 (UTC)
Hmm... How do I get my edits tagged with the bot flag then? --Zackmann (Talk to me/What I been doing) 01:57, 20 November 2018 (UTC)
@Zackmann08: Oh. You need to send a special parameter with each edit for the flag to be used. Your bot framework should have an option for it (if you’re using one). The raw API parameter is just “&bot=true” I think. — Earwig [alt] talk 17:06, 20 November 2018 (UTC)
I tried that a while ago and got an error message that I needed to have the param assigned to my account. I'll re-investigate. :-) Thanks! --Zackmann (Talk to me/What I been doing) 17:40, 20 November 2018 (UTC)
Also, when you get a chance, would love input on Wikipedia:Bots/Requests for approval/ZackBot 13. :-) --Zackmann (Talk to me/What I been doing) 20:09, 20 November 2018 (UTC)
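
Editorial note: the exchange above concerns marking individual edits as bot edits through the MediaWiki Action API. As a hedged illustration, the sketch below only builds the POST parameters for an `action=edit` request with the `bot` parameter included; it does not log in, fetch a token, or send anything. The function name `build_edit_params` is hypothetical, and the value "true" follows the comment above (the API treats a boolean parameter as set when it is present). The flag is honoured only if the account holds the bot right, which may be related to the error Zackmann mentions.

```python
from typing import Dict


def build_edit_params(title: str, text: str, summary: str,
                      csrf_token: str, mark_as_bot: bool = True) -> Dict[str, str]:
    """Build the POST body for a MediaWiki action=edit request."""
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "token": csrf_token,  # obtained separately from the API; not fetched here
    }
    if mark_as_bot:
        # Presence of the parameter is what marks the edit as a bot edit.
        params["bot"] = "true"
    return params
```

Most bot frameworks expose an equivalent option, so a sketch like this matters mainly for scripts that call the API directly.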

Template:Lc and Template:Lc1 merge

I'm wondering if you can provide some background on Template:Cfd2/sandbox? CfD is now bizarrely using a monospaced version at 110% size with a hyphen instead of the normal Template:Lc. The change proposed at the sandbox seems a great idea. --Bsherr (talk) 19:25, 26 September 2018 (UTC)

Hi Bsherr, unfortunately, I have no recollection of that edit! It seems the change to make the text larger was done here, so you would probably want to ask Redrose64 before undoing that, but the hardcoding of monospace instead of the normal font has been in place for a long time. I'm not sure why, nor do I have a strong preference either way. — Earwig talk 02:15, 27 September 2018 (UTC)
Thanks for the advice. I'm going to propose a change to just use Template:Lc or, in the alternative, to eliminate the monospaced font in favor of increasing the kerning. I'll let you know when I post should you like to comment. --Bsherr (talk) 21:54, 28 September 2018 (UTC)
Done. The discussion is at Wikipedia:Templates for discussion/Log/2018 November 23#Template:Lc1. --Bsherr (talk) 21:50, 23 November 2018 (UTC)

The Signpost: 1 December 2018

December 19, 7pm: WikiWednesday Salon and Skill-Share NYC

You are invited to join the Wikimedia NYC community for our monthly "WikiWednesday" evening salon (7-9pm) and knowledge-sharing workshop at Fordham University's Lincoln Center campus in Manhattan, near Columbus Circle. Is there a project you'd like to share? A question you'd like answered? A Wiki* skill you'd like to learn? Let us know by adding it to the agenda.

We will also follow up on plans for recent and upcoming edit-a-thons, museum and library projects, education initiatives, and other outreach activities.

7:00pm - 9:00 pm at Fordham University's Lincoln Center campus (South Lounge) at 113 W 60th Street, Manhattan
(note this month we will be meeting in Manhattan, near Columbus Circle, not at Babycastles)

We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Wikimedia New York City Team 03:23, 13 December 2018 (UTC)

(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)

The Signpost: 24 December 2018

An email I sent.....

Hello, The Earwig. Please check your email; you've got mail!
Message added 18:02, 2 January 2019 (UTC). It may take a few minutes from the time the email is sent for it to show up in your inbox. You can remove this notice at any time by removing the {{You've got mail}} or {{ygm}} template.

 — fr 18:02, 2 January 2019 (UTC)

Replied. — Earwig talk 06:58, 3 January 2019 (UTC)

Copyvios

Copyvios is currently down, the connection times out. Is this related to the new workers? Best regards, Luke081515 01:19, 19 January 2019 (UTC)

Copyvio tool

Hi Earwig, your api documentation for the tool mentions that there is a global limit for requests using the search engine of 1000. I want to continue the task merlbot did until 2016, checking all new articles in dewiki for copyvios. From the statistics I calculated that these are around 300 articles per day, which is quite a lot. That's why I currently implemented the function without using the search engine (I don't want to consume so much of the limit, would be bad for other users), however the tool is much more effective with the search engine. Is there a way to extend the global limit? And is there a way to include Turnitin in the api request as well? I have not found anything in the api documentation about it. P.S.: Please ping me when you reply, I mostly do not look at enwiki. Best regards, Luke081515 02:03, 13 January 2019 (UTC)

@Luke081515: Unfortunately I do not control the global limit, that's set by Google. However, I think it's fine if you enable the search engine for a while as a test. We can see whether it ends up making too many requests and disable it later if so. I planned to add Turnitin to the API, but haven't gotten around to it. You can access it separately, though; the URL should look like https://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=PAGE_TITLE&lang=de&report=1 I think. — Earwig talk 06:06, 13 January 2019 (UTC)
Ok, thank you. I've now set &use_engine= to true. The bot will check any new articles that are not disambig pages or redirects, and runs every 30 minutes. If it's too much, please ping me and I will disable it again. Best regards, Luke081515 14:49, 13 January 2019 (UTC)
Is there a way to extend the limit? I'm also planning to check big insertions into dewiki, not only page creations. I know that the limit is on Google's side, and I guess making it bigger would cost a bit of money. I can imagine that wmf or wmde would support this; can you tell me who your current contact concerning the google api at wmf is? Best regards, Luke081515 00:04, 20 January 2019 (UTC)
That would be User:Kaldari. I’m fairly certain that there is no way to raise the limit, based on previous attempts to do so. You should try to tune down the request rate if we’re hitting it too frequently. Maybe there are some simple heuristics you can apply to ignore certain pages? — Earwig [alt] talk 00:17, 20 January 2019 (UTC)
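
Editorial note: as an illustration of the URL format quoted above and of the "simple heuristics" suggestion, here is a minimal sketch. The base URL and query parameters are copied from the comment in this thread and are not verified against the service; the function names `suspected_diffs_url` and `should_check` are hypothetical. The filter mirrors the rule Luke081515 describes (skip disambiguation pages and redirects).

```python
from urllib.parse import urlencode

# Base URL as quoted in the thread; assumed, not verified.
PLAGIABOT_API = "https://tools.wmflabs.org/eranbot/plagiabot/api.py"


def suspected_diffs_url(page_title: str, lang: str = "de") -> str:
    """Build the report URL for one page title, percent-encoding as needed."""
    query = urlencode({
        "action": "suspected_diffs",
        "page_title": page_title,
        "lang": lang,
        "report": 1,
    })
    return f"{PLAGIABOT_API}?{query}"


def should_check(is_redirect: bool, is_disambiguation: bool) -> bool:
    """Skip pages that are unlikely to contain copied prose."""
    return not (is_redirect or is_disambiguation)
```

Filtering before any request is made is one way to spend fewer of the shared daily searches.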

Possible copyvio tool bug report

I tried to check the page Dorothy Misener Jurney using Earwig's Copyvio Detector with its default settings, to examine URLs listed in the article. It reported about 2.0% violations. HOWEVER it didn't actually check one of the sources cited, https://shsmo.org/manuscripts/descriptions/womenmedia/essays/names/j/jurney/ If I tell it explicitly to do a URL comparison to that citation, I get a > 64% violation rate. I'm working on cleaning up the article, but I'm concerned that the URL didn't get checked initially. Mary Mark Ockerbloom (talk) 02:08, 21 January 2019 (UTC)

Thanks for the bug report, Mary Mark Ockerbloom. It looks like that URL is causing the tool some trouble. The first time you ran the check, that page timed out before it could return any data, which gets shown as "0%". But when you did the direct comparison, it loaded fine, showing the potential match. Unfortunately there's not much we can do about this kind of situation, though I suppose the tool could indicate that error more clearly. — Earwig talk 03:23, 21 January 2019 (UTC)
  • I would strongly encourage a clear and visible distinction between "0%" meaning "No copyvios found" and some other marker to indicate the page could not be examined... Thanks for your work on this tool. I've found it really useful & keep it bookmarked :-) Mary Mark Ockerbloom (talk) 03:38, 21 January 2019 (UTC)
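
Editorial note: the request above is to keep "0%" (checked, nothing found) visibly distinct from "could not be examined". One way to model that, sketched here under the assumption that each source yields either a confidence value or an error, is to make the confidence optional. The `SourceResult` type and its fields are hypothetical and do not describe the tool's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceResult:
    """Outcome of comparing an article against one source URL."""
    url: str
    confidence: Optional[float]  # None means the source could not be examined
    error: Optional[str] = None

    def label(self) -> str:
        """Text to display; never renders a failed fetch as a percentage."""
        if self.confidence is None:
            return f"not checked ({self.error or 'unknown error'})"
        return f"{self.confidence:.1%}"
```

With this shape, a timed-out source renders as "not checked (timeout)" and cannot be mistaken for a clean result.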

Problem in City of Stonnington#Buses and probably other locations

@The Earwig: You appear to have past connection with template MetlinkBus which appears to have been renamed PTVBus in 2015 by yourself - not a problem in itself. In City of Stonnington#Buses there are six bus routes which use this template, with route 734 still working OK but the other 5 routes 624, 612, 623, 767 and 822 no longer working. Earlier this week the PTV put up a new version of their website [1] where a lot of earlier links are no longer working. The reason these routes are not working may have been caused by this or possibly the data has changed earlier as I had not looked at this article before today. Can you be of any assistance in this area? Fleet Lists (talk) 07:18, 25 January 2019 (UTC)

I think I have solved the problem. I will try and fix it and let you know how I go. Fleet Lists (talk) 07:54, 25 January 2019 (UTC)
I have made some changes to Module:PTVBus/data which seem to have solved the problem. I found another article which has a large number of this type of error but that will need to wait until another day to fix those. I was surprised to find that that module had not had changes made to it since late 2015. Fleet Lists (talk) 08:15, 25 January 2019 (UTC)
OK. This was a while ago and I don’t remember the situation, so your guess is as good as mine as to what needs to be done here. Glad to hear you’ve mostly figured it out. — Earwig [alt] talk 14:18, 25 January 2019 (UTC)

Rejected AFC submissions and AFC statistics

Last year a new AFC review result, "rejected", was introduced. It is more severe, more final, than "declined" in that it doesn't give the submitter a path to improve and resubmit the draft. (A random example is User:Naveengrande/sandbox.)

Now that the reject option is being used, questions are arising about when it should be used, how much it's being used, whether it's being used properly, etc.

EarwigBot shows recently rejected submissions the same way as recently declined ones on Template:AFC statistics. It would be useful if one could distinguish the rejects on that page. Perhaps EarwigBot could display them in a different section from "declined", or with "rejected" in the notes column. Is something like that an enhancement you'd be willing to make? --Worldbruce (talk) 15:37, 28 January 2019 (UTC)

@Worldbruce: Thanks for the suggestion and for letting me know about the new status. I added 'rejected' as a note for the declined section. It will take a while for the whole table to update, but freshly declined submissions should have it starting now. — Earwig talk 03:49, 29 January 2019 (UTC)

The Signpost: 31 January 2019

definitions.net

Hey,

You might want to look into adding definitions.net onto the Wikipedia mirror list, I've been going through Category:Articles with improper non-free content and quite a few of them, after looking at various archives, appear to be copied from Wikipedia, generating false copyvio reports.

Thanks,

SITH (talk) 16:30, 8 February 2019 (UTC)

Thanks for the suggestion. Added. — Earwig talk 23:23, 10 February 2019 (UTC)

Copyvio Detector

Hi, I am unable to access Copyvio Detector. It shows "502 Bad Gateway" and "The server timed out". Please fix it. I think the main problem is the server getting slow. Xain36 (talk) 08:18, 16 February 2019 (UTC)

Please look two threads up. — Earwig talk 17:12, 16 February 2019 (UTC)

Copyvio Detector not working

Hi Ben, the copyvio detector quit working a couple hours ago, with the page failing to load but not timing out. If I let it spin long enough it shows a 502 Bad Gateway. Any assistance you can offer to get it working again would be most appreciated. Thanks, — Diannaa 🍁 (talk) 23:05, 1 February 2019 (UTC)

It's working again! In fact it's zippy and full of pep. Thank you, — Diannaa 🍁 (talk) 01:04, 2 February 2019 (UTC)
Well, I see some bizarre errors in the log that I've never seen before, like we're running out of memory. I'll see if I can defend against this for the future. — Earwig talk 01:16, 2 February 2019 (UTC)
Hi Ben, the copyvio detector is not working. I'm not sure how long it's been down; it failed to load on my first attempt to use it this morning and it's been down for at least half an hour. Any assistance would be appreciated. Thanks, — Diannaa 🍁 (talk) 13:14, 13 February 2019 (UTC)
I kicked it, think it's OK now. This looks like the same issue as before. Didn't have a chance to investigate then, but I'll try to do it later when I have some free time. — Earwig talk 13:42, 13 February 2019 (UTC)
Thanks so much Ben. I don't know how I ever got along without this tool, so helpful for copyright cleanup. Diannaa 🍁 (talk) 13:49, 13 February 2019 (UTC)
Hi Ben. The page is once again failing to load :/ Could you please take a look? Thanks, — Diannaa 🍁 (talk) 02:36, 15 February 2019 (UTC)
It looks like the bot is running - do you just mean the webpage? — xaosflux Talk 02:50, 15 February 2019 (UTC)
There are two different tools. The reason I posted here is because Earwig's copyvio detector tool is not working. It spins for a while and then produces a 502 Bad Gateway. Eran's CopyPatrol is also failing to load; the last time I was able to use the page properly was at around 03:02 UTC. — Diannaa 🍁 (talk) 03:45, 15 February 2019 (UTC)
This time it’s definitely not my fault! Toolforge has been experiencing an unlikely combination of issues that would bring down most tools using a database for anything. That’s presumably why CopyPatrol was affected too. I’m not sure when things will fully stabilize. I will kick it in a little bit, but I don’t know how long that will last. — Earwig [alt] talk 12:40, 15 February 2019 (UTC)
Thanks. I have some cases that will be impossible to solve without your tool, and not having it triples the time it takes to do the checks, so anything you can do to keep it working in the interim would be appreciated. — Diannaa 🍁 (talk) 14:39, 15 February 2019 (UTC)

Just following up. Unfortunately, things on Labs are in even worse shape now, and there doesn't seem to be anything I can do to fix it myself. Will continue to keep an eye out, but I think I just have to wait for now. — Earwig talk 03:31, 16 February 2019 (UTC)

Just a "thanks" for writing and supporting this tool. I turned to it today for a DYK check ... hope it's back soon! ☆ Bri (talk) 17:17, 16 February 2019 (UTC)
Update: The issues will likely not be resolved until Tuesday at the earliest. — Diannaa 🍁 (talk) 17:34, 16 February 2019 (UTC)
Well, I rewrote the tool to remove the dependency on the broken part of Toolforge. We seem to be OK for now. Since I'm not sure how this change will affect performance in general, I will continue to monitor things throughout the day. — Earwig talk 19:26, 16 February 2019 (UTC)

Forbidden error on earwig

Hi, I keep getting:

An error occurred while using the search engine (Google Error: HTTP Error 403: Forbidden). Try reloading the page. If the error persists, repeat the check without using the search engine.

When using Earwig's copyvio tool.

Any advice,

RhinosF1(chat)(status)(contribs) 21:48, 24 February 2019 (UTC)
There's a daily limit on the number of searches with Google that was exceeded. It will reset at midnight. — Earwig talk 22:04, 24 February 2019 (UTC)
Thanks, RhinosF1(chat)(status)(contribs) 22:12, 24 February 2019 (UTC)
RhinosF1, I think it's at midnight Pacific time, where Google's servers are located. — Diannaa 🍁 (talk) 00:55, 25 February 2019 (UTC)

Quote Box

Have only used this tool recently and it seems great. Can I comment that it does not seem to identify content within Template:Quote box in the article compare pane, giving an increased risk of false positives unless the article is checked? If it is not possible to do this, would it be advisable to indicate to users that they need to check this manually? Thank you. Djm-leighpark (talk) 18:27, 25 February 2019 (UTC)

@Djm-leighpark: That's strange, because I thought it did look inside quote boxes. Do you have an example page? I tried in my sandbox and it seems to work. — Earwig talk 02:28, 26 February 2019 (UTC)
The 18:48 version of this page ... to be absolutely clear, it matches the text of the quote in red; however, in the left compare pane the user (i.e. the person running the tool) cannot see that it is inside a quote (without looking at the article). The issue is with the quote One of my proudest moments ... America (by P. R. Brown) not being easily identifiable as a quote in the left-hand pane. Hope it makes sense what I am trying to say. Thank you. Djm-leighpark (talk) 03:21, 26 February 2019 (UTC)
Oh, I see, you're saying that the text inside the quote box is not identified as being part of a quote. That's true. I think this falls under the general disclaimer that all results from the tool need to be manually reviewed. False positives can also come from inline quotes in the article text as well as things like book titles and long proper nouns, and detecting these would be difficult. — Earwig talk 03:48, 26 February 2019 (UTC)
That's fair enough. I do wonder if the emphasis on the tool initiation page of "Be aware that other websites can copy from Wikipedia, so check the results carefully, especially for older or well-developed articles", without any mention of manually checking the results for quotes, can be misleading ... perhaps especially with articles such as Dead to the World Tour and this source (https://www.revolvermag.com/culture/marilyn-mansons-antichrist-superstar-story-behind-album-cover-art). It's just a thought from a user. One other thought would be to change the Submit button from active once the tool is launched ... I've now got used to looking for the spinning working icon from the Chrome browser, but an active-looking Submit button holds my eye and I am so tempted to press it again! Just a couple of thoughts. Thank you. Djm-leighpark (talk) 04:19, 26 February 2019 (UTC)
Those are reasonable suggestions, thank you. I'll see what I can do. — Earwig talk 02:05, 27 February 2019 (UTC)

Video tutorial regarding Wikipedia referencing with VisualEditor

Hi, I have received a grant from WMF to support production of a video tutorial regarding creating references with VisualEditor. I anticipate that the video will be published in March 2019. If this tutorial is well received then I may produce additional tutorials in the future for English Wikipedia and possibly other projects such as Commons and Spanish Wikipedia. If you would like to receive notifications on your talk page when drafts and finished products from this project are ready for review, then please sign up for the project newsletter.
Regards, --Pine 00:30, 28 February 2019 (UTC)

The Signpost: 28 February 2019

Project Tagging based on Category

Hi. I know that quite a few pages that should be tagged with the Children's Lit WikiProject banner lack them. I was wondering if articles lacking the project banner in the following two categories (inclusive) could be tagged: Category:Children's literature and Category:Young adult novels? Best, Barkeep49 (talk) 02:00, 28 December 2018 (UTC)

From a cursory look, this should be possible. I'll let you know when I start/finish the task, or if I have any questions before I start, probably within the next couple days. — Earwig talk 03:11, 28 December 2018 (UTC)
Just checking in on this. Thanks and Best, Barkeep49 (talk) 02:19, 13 January 2019 (UTC)
Apologies for the delay, I had to do some work to migrate the bot to a new backend on Toolforge. I'll try to start this when I come home from work tomorrow. — Earwig talk 07:45, 14 January 2019 (UTC)
@Barkeep49: Here's the full list of categories the bot will process (all subcategories recursively of those two you mentioned): User:The Earwig/Sandbox/Children's Lit. Can you help me look through this and remove anything that doesn't belong? It seems mostly OK, but there are some things I imagine we don't want to tag, like anything including "video game"... — Earwig talk 07:56, 15 January 2019 (UTC)
I chopped a few hundred from the list - the project has generally covered derivative properties to some extent and so when that connection felt strong I left it but when it got too faraway from the original book (or if it was not a literary property to begin with), I removed it. I also removed many of the comic/manga categories as only a smaller percentage of those would be covered in our scope - its intended audience would have to be children or young adults which is not the case for a substantial percentage of comics/manga. Let me know if you have any other questions and thank you for your ongoing help with this. Best, Barkeep49 (talk) 18:04, 15 January 2019 (UTC)
Excellent, that's exactly what I needed. The bot is running now. — Earwig talk 03:18, 16 January 2019 (UTC)
Thanks. I'm abashed to admit I already knew this because an article on my watchlist got the banner... Thanks for all your assistance. Best, Barkeep49 (talk) 05:49, 16 January 2019 (UTC)
@Barkeep49: I paused the task until I get home and can look a bit more carefully. I see we’ve been tagging films based on children’s books (see the bot’s recent contribs); I understand the consideration for derivative works, but do you think the relationships are clear enough in general to tag automatically? — Earwig [alt] talk 22:34, 16 January 2019 (UTC)

Hello! I'm curious as to why Don Paterson has been tagged with the Children's Literature project banner. I don't associate him with children's literature, and nothing in the article or its categories seems to support this. Am I missing something obvious? --Deskford (talk) 20:41, 16 January 2019 (UTC)

@Deskford: The connection comes from the Costa Book Awards; he is in the category of winners, which is in a category of children’s literary awards. This is an incorrect relationship, as the CBA does not look exclusive to children’s literature. I’ll correct this when I get home. — Earwig [alt] talk 21:41, 16 January 2019 (UTC)
Ah, that makes sense. Thanks! --Deskford (talk) 21:54, 16 January 2019 (UTC)
I've recently reverted EarwigBot's edits to Talk:Tommen Baratheon, Talk:Arya Stark, Talk:Bran Stark, and Talk:Rickon Stark, edits that added a WikiProject Children's Literature banner to the talk page. While the characters are children, A Song of Ice and Fire is definitely not children's literature, so I'm wondering why this happened. --TedEdwards 21:19, 16 January 2019 (UTC)
@TedEdwards: Thank you for pointing that out. This is coming from Category:Child characters in literature, which is in Category:Children's literature, a clearly incorrect relationship. We’ll fix this. — Earwig [alt] talk 21:41, 16 January 2019 (UTC)
Earwig anything I can do to be of assistance at this point? Best, Barkeep49 (talk) 02:06, 17 January 2019 (UTC)
@Barkeep49: See my comment above in case it got lost; I think we should be a little more careful with the categories that pertain to derivative works like films. While some of those works might be in scope, there's a high enough false-positive rate that I don't think a bot determination is safe. If we pare down the list a bit more, I'll feel more comfortable restarting the task. I can also have the bot revert its taggings for certain categories that we decide were mistakes (like a couple of the ones mentioned above)—this has happened before, so I'm somewhat used to it and it's not a problem. — Earwig talk 03:25, 17 January 2019 (UTC)
Just an update that this newsletter has been requested to go out and so hopefully I'll be able to get some help with this update soon. Best wishes, Barkeep49 (talk) 18:17, 13 February 2019 (UTC)

@Barkeep49: So, I finished going through the bot's tagging and have reverted what I consider mistagged (by category, primarily non-written works or people/books with only dubious connections to children). This leaves about 4000 of the original 5000 taggings (for the first half of the category list). While idly spot-checking afterwards, I found unreverted yet questionable examples like Rush Limbaugh and Laura Bush that came from a category I hadn't thought to re-check: American children's writers. The problem is that often cats are used for non-defining classification, which isn't necessarily unreasonable—those people have published books for children—but I think you would agree that they aren't well known enough for that to place them within the project's scope? Maybe I am wrong, but it's enough that I'm nervous to rerun the bot, even on the new doubly reduced list. Hmm... — Earwig talk 04:47, 4 March 2019 (UTC)

The Earwig I would agree we shouldn't have Rush Limbaugh and Laura Bush tagged, and the issue of people who've sometimes written for children but not always (e.g. Gaiman) certainly caused concern the first time through. Where does that leave things then? Best, Barkeep49 (talk) 04:53, 4 March 2019 (UTC)
I'm not sure. Some cats in the list should definitely be fine, if they exclusively contain in-scope works of literature, like Polish children's novels. I don't have a problem running the bot on these. In contrast, I don't feel comfortable running "Works based on"-type categories because these are often in other genres and only tenuously related (and the pages that are in-scope usually fall under another category anyway), so I'll probably remove these. Unfortunately that still leaves about 2/3 of the list. I'm not sure what to do with articles about people, which is a large number of them. I'm wondering if there is a reliable semi-automated test to decide whether a person is in-scope? I'm thinking of looking to see whether the article lead mentions "children", but I'm not sure how well this will work. — Earwig talk 05:05, 4 March 2019 (UTC)
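For illustration, the "does the lead mention children" test floated above could be sketched roughly as follows in Python. This is only a guess at one possible form of the check, not anything the bot actually runs: the function name, the keyword pattern, and the assumption that the lead is everything before the first `==` heading are all hypothetical.

```python
import re

# Hypothetical keyword pattern: "child", "children", "children's".
CHILD_PATTERN = re.compile(r"\bchild(?:ren)?\b", re.IGNORECASE)

# Assumption: a section heading is any line that starts with "==".
HEADING_PATTERN = re.compile(r"^==.*$", re.MULTILINE)


def lead_mentions_children(wikitext):
    """Return True if the text before the first heading mentions children."""
    lead = HEADING_PATTERN.split(wikitext, maxsplit=1)[0]
    return CHILD_PATTERN.search(lead) is not None


in_scope = (
    "Jane Doe is an American author of children's books.\n\n"
    "== Career ==\nShe began writing in 1990."
)
doubtful = (
    "John Roe is an American radio host.\n\n"
    "== Books ==\nHe has also written a children's book."
)
print(lead_mentions_children(in_scope))   # lead mentions children
print(lead_mentions_children(doubtful))   # mention is outside the lead
```

A check like this would only flag candidates for the manual review discussed in this thread; it says nothing about whether the person is actually known for children's literature.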
The Earwig For the categories which are troublesome are you able to just have the bot log where it would tag? I would then go through and remove the big red flags. In spot checking the first 50 A's in that category the hit rate was very high (only possible question marks would be Britt Allcroft, E.J. Altbacker, and Aubrey Ankrum, and no clear-cut nos like Limbaugh or Bush). Now that's for everyone so it includes people already tagged. Presumably the error rate for untagged people would be higher but in an essential category like American children's writers I really am wondering if it would be within a margin the project would find OK, especially as they will get rated (most of the activity that happens on the project is article assessment at the moment). Best, Barkeep49 (talk) 05:24, 4 March 2019 (UTC)
I can definitely do that. I'll follow up over the next day or so. — Earwig talk 05:26, 4 March 2019 (UTC)
Sorry that took so long, Barkeep49. I updated User:The Earwig/Sandbox/Children's Lit with the full list of untagged/unprocessed pages after running the bot through another 50 categories. — Earwig talk 07:49, 17 March 2019 (UTC)

Nomination for deletion of Template:List of crambid genera

Template:List of crambid genera has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Zackmann (Talk to me/What I been doing) 21:33, 19 March 2019 (UTC)

The Signpost: 31 March 2019

EarwigBot not working

It hasn't edited for 3 days (I noticed it wasn't working when task 3 (creating AfC categories) wasn't running). Just wanted to let you know in case you weren't already aware. Thanks, --DannyS712 (talk) 04:15, 14 April 2019 (UTC)

Thanks for letting me know. It should be back up now, and I think I've fixed the auto-restart so this should be prevented in the future. — Earwig talk 06:19, 14 April 2019 (UTC)
@The Earwig: It still hasn't edited yet... --DannyS712 (talk) 06:20, 14 April 2019 (UTC)
It's not supposed to yet. The AFC status page gets updated hourly, and the category creation runs nightly at 00:00 UTC. — Earwig talk 06:41, 14 April 2019 (UTC)
Oh, okay. --DannyS712 (talk) 06:43, 14 April 2019 (UTC)

The Signpost: 30 April 2019

ArbCom 2019 special circular

Administrators must secure their accounts

The Arbitration Committee may require a new RfA if your account is compromised.


This message was sent to all administrators following a recent motion. Thank you for your attention. For the Arbitration Committee, Cameron11598 02:49, 4 May 2019 (UTC)

Administrator account security (Correction to Arbcom 2019 special circular)

ArbCom would like to apologise and correct our previous mass message in light of the response from the community.

Since November 2018, six administrator accounts have been compromised and temporarily desysopped. In an effort to help improve account security, our intention was to remind administrators of existing policies on account security — that they are required to "have strong passwords and follow appropriate personal security practices." We have updated our procedures to ensure that we enforce these policies more strictly in the future. The policies themselves have not changed. In particular, two-factor authentication remains an optional means of adding extra security to your account. The choice not to enable 2FA will not be considered when deciding to restore sysop privileges to administrator accounts that were compromised.

We are sorry for the wording of our previous message, which did not accurately convey this, and deeply regret the tone in which it was delivered.

For the Arbitration Committee, -Cameron11598 21:04, 4 May 2019 (UTC)

Question about copyvio detector functioning

Howdy - I just happened upon some startling behaviour in the copyvio detector, and wanted to ask whether this is a known thing or a fluke. Draft:Nathaniel Bartlett comes out squeaky-clean [2], but when running the tool on the identical draft once it was moved to mainspace, it finds the full-page copyvio [3]. The difference here must be the AfC header, I guess... is that known behaviour? If the AfC header has this capacity to throw off copyvio detection, maybe it would be worth thinking about a function to strip it from an article before comparison? After all, AfC is probably one of the heaviest users of the tool - bit of a scary scenario. Cheers --Elmidae (talk · contribs) 22:01, 9 May 2019 (UTC)

Hi Elmidae. The AfC header does not make a difference here; we already strip out templates from the article text before we start looking for matches. (The exact article text you see on the results page is what we try to find copies of, and in this case, you can see that neither includes the template.) However, there is another difference: the one where we missed the violation has its categories rendered as normal wikilinks (prefixed with colons), and this makes them show up in the article text when normally they wouldn't. Because of an unlucky sequence of events, this is enough for us to fail to find the correct source. If you're interested in a more detailed explanation why, I've written one up below. The main takeaway is that this kind of outcome is always a risk because of how the tool works, but in general it should be uncommon enough that the tool remains useful.
For the full explanation, I'll need to go into a bit of detail about how the tool finds possible sources. The problem it's trying to solve is that we have a large string of text and we need to query a search engine with that text to search for exact (or very close) matches. We can't paste the entire article into Google, because Google doesn't accept strings that large and it would miss cases where sentences are added or rearranged. Instead, we divide the article into chunks of text (about sentence length, 10-20 words), and search for each chunk independently, the idea being that at least one of them should be a near-verbatim copy of the plagiarized source (if one exists) and will give a hit. But the problem is that we can't search for every single chunk in a particular article, because an article might have hundreds of sentence-sized chunks of text, and Google limits the number of searches we can make per day, so we can only make up to 8 searches per article. This means we have the task of selecting about 8 representative sentences from throughout the article in hopes that at least one of them will contain the violation, if one is present. (We do this by picking a sentence from the start of the article, then the end, then the middle, then around the 25% mark, and so on, until we run out of text or reach 8 chunks.) For articles that are heavily copied, the odds of this working out are quite good, but we sometimes get very unlucky, like we did here. Because those wikilinks added text to the end of the article, our algorithm ended up picking 8 chunks for which not a single one returned the correct match in Google. (If you're curious what it searched for, I've reproduced below.)
Extended content

Violation found:

  1. Nathaniel Bartlett (April 22, 1727 – January 11, 1810), pastor of the Congregational Church of Redding, Connecticut during the
  2. The History of Redding, Connecticut Puritan Protagonist- President Thomas Clap of Yale College University of North Carolina
  3. The Bartlett family, however, was firmly united in support of the American cause.
  4. Jonathan Bartlett (1764–1858) served as co-pastor with his father for a few years, but resigned due to ill health prior to his
  5. (Russell Bartlett was living in Cooperstown, Otsego County, New York, and Daniel Collins Bartlett was living in Amenia, Dutchess
  6. 1753-1810, was one of the numerous Colonial American clergymen who played an active role during the American Revolution.
  7. of Kraus-Thompson Organization Ltd.
  8. animosity between neighbors in so small a community, and no doubt many families experienced divided loyalties as well.

Violation missed:

  1. Nathaniel Bartlett (April 22, 1727 – January 11, 1810), pastor of the Congregational Church of Redding, Connecticut during the
  2. the American Revolution :Category:American Revolution chaplains
  3. In addition to verbal assaults on the enemy, Bartlett supported the war effort by officiating as Military Chaplain to
  4. The Rev.
  5. Upon his death in 1858, the Rev.
  6. 1753-1810, was one of the numerous Colonial American clergymen who played an active role during the American Revolution.
  7. Congregationalist ministers :Category:People from Guilford, Connecticut :Category:Yale Divinity School alumni :Category:Clergy
  8. Army General Israel Putnam's Division during their encampment in Redding the winter of 1778/79.
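The selection order described above (start, end, middle, 25% mark, and so on, capped at 8 chunks) can be sketched in Python as below. This is an illustrative reconstruction from the description only, not the tool's actual code: the function names, the exact sequence of fractions after the 25% mark, and the rounding are assumptions.

```python
def sample_positions():
    """Yield fractions of the article: 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    yield 0.0
    yield 1.0
    denom = 2
    while True:
        # Odd numerators only, so no position is repeated.
        for numer in range(1, denom, 2):
            yield numer / denom
        denom *= 2


def select_chunks(chunks, max_chunks=8):
    """Pick up to max_chunks chunks spread across the article."""
    limit = min(max_chunks, len(chunks))
    last = len(chunks) - 1
    selected, seen = [], set()
    for frac in sample_positions():
        if len(selected) >= limit:
            break
        idx = int(frac * last)  # assumption: round down to a chunk index
        if idx not in seen:
            seen.add(idx)
            selected.append(chunks[idx])
    return selected


article = [f"s{i}" for i in range(101)]
print(select_chunks(article))
# Appending a few trailing chunks (e.g. category text) shifts most picks:
print(select_chunks(article + ["c1", "c2", "c3", "c4"]))
```

Under these assumptions, adding four trailing chunks changes every pick except the first, which is the same kind of shift that separates the two lists above.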
Thinking about this further, I believe the tool should have stripped out the disabled category links as well, because despite being "article text", they should not appear in any sources. This is something I can add in the future. However, it's important to keep in mind that because of how the chunking logic works, as long as we don't search for every chunk, and we can't, there's always a chance that we could miss the violation. Something to keep in mind. Thanks. — Earwig talk 04:04, 10 May 2019 (UTC)
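As a rough illustration of the fix mentioned above, disabled category links could be removed from the wikitext before chunking with something like the following Python sketch. The function name and regular expression are hypothetical, and this is not how the tool itself parses wikitext; it also drops piped links such as `[[:Category:Foo|bar]]` entirely and collapses all whitespace, which is a simplification.

```python
import re

# Matches links of the form [[:Category:Name]] (the leading colon disables
# categorisation, so the link would otherwise render as visible text).
DISABLED_CATEGORY_LINK = re.compile(
    r"\[\[\s*:\s*Category\s*:[^\[\]]*\]\]", re.IGNORECASE
)


def strip_disabled_category_links(wikitext):
    """Remove [[:Category:...]] links and collapse the leftover whitespace."""
    without_links = DISABLED_CATEGORY_LINK.sub(" ", wikitext)
    return " ".join(without_links.split())


draft = (
    "played an active role during the American Revolution.\n"
    "[[:Category:American Revolution chaplains]]\n"
    "[[:Category:Yale Divinity School alumni]]"
)
print(strip_disabled_category_links(draft))
```

Ordinary wikilinks such as `[[Redding, Connecticut]]` are left untouched by this pattern.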
Thank you, that's both informative and interesting! So in essence, it's a bit of potluck of whether a given selection of chunks contains detectable material; and a random frame-shift mutation (e.g. by adding a few lines of category text) may result in a selection that registers entirely differently. That's heuristics for you, I guess :) Cheers --Elmidae (talk · contribs) 14:08, 10 May 2019 (UTC)