User talk:The Earwig/Archive 10
This is an archive of past discussions about User:The Earwig. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
The Signpost: 21 May 2014
- News and notes: "Crisis" over Wikimedia Germany's palace revolution
- Featured content: Staggering number of featured articles
- Traffic report: Doodles' dawn
AN
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- I am going to take the unusual step (I hope The Earwig doesn't mind) of closing a discussion thread on another user's talk page. The AN thread that this relates to has also been closed. Sven Manguard Wha? 06:02, 25 May 2014 (UTC)
This message is being sent to inform you that there is currently a discussion at Wikipedia:Administrators' noticeboard regarding an issue with which you may have been involved. Thank you.--Mark Miller (talk) 03:04, 25 May 2014 (UTC)
- Apparently you can do as you please as I guess you hosted the bot outside of Wikipedia policy and don't need a consensus to shut the bot down. Cool. I have no idea what is going on or what pissed you off, but if it's your ball you can keep it.--Mark Miller (talk) 03:32, 25 May 2014 (UTC)
- Hi Mark. Please read the message I just posted on AN; you are mistaken. — Earwig talk 03:34, 25 May 2014 (UTC)
- Different words that mean the same thing. Yes, you operated the bot outside of Wikipedia Policy and consensus. As I understand it, as the "operator" that means you host it. If you didn't host it you would be allowed to pull the plug without a consensus, but would have to give another editor a chance to take it and host it without a disturbance or "disruption" of the project. Thanks for the disruption. Admin get away with so much it makes my head spin. --Mark Miller (talk) 03:41, 25 May 2014 (UTC)
@Mark Miller: Please do not abuse volunteers. You are mistaken on every level. Johnuniq (talk) 04:59, 25 May 2014 (UTC)
- (edit conflict) I have no idea what happened but from what I am being told, you deserve a thank you for operating the bot against the very criticism of others and the need to adjust the bot to conform to the ever changing needs of new volunteers. So, thank you. Now, however, I think it is up to others to decide if it is worth even having a bot. In a way, what you did could be seen by some as a blessing. Many have been suggesting that DRN is too complicated and with this current situation I have to admit they may be right. If one person getting ticked off is enough to shut down a bot.....I am for a simpler Noticeboard. I am only pissed off that you just freaked out and stopped everything over some comment someone made. That is your right and I will certainly defend you over it, but I am also very critical of you for taking whatever was said so personally that you just stopped everything knowing that it would be a tad disruptive. Again, thank you for operating the bot for so long. I, myself, do not believe you should be asked to change your mind or come back. Your decision should stand and be respected for whatever reason you decided to stop. At least you can say you tried.--Mark Miller (talk) 05:01, 25 May 2014 (UTC)
- There seems to be a bit of confusion on your part. Stopping it was disruptive? No, keeping it running was disruptive. There were too many complaints regarding the bot and I don't have time due to other obligations to take on that maintenance burden. If a tool works some of the time but frequently causes problems that clearly affect people and the creator is unable to fix them, would you be upset if he decided to let go of it? You have come into this situation believing that I have an obligation to run this bot, but EarwigBot had a very minor role in the DRN process as a whole. If anything, the DRN volunteers had been asking for the bot to do less over the years, not more. — Earwig talk 05:10, 25 May 2014 (UTC)
- Now that I can understand. Yes, I have been with DRN long enough that I think I have even contacted you over an issue once. Can't remember what that was about. When I say this situation is disruptive, I mean it is disruptive to the project's ability to mediate disputes. But do we really need a bot to do that? The answer is a clear no. It only disrupts what Steven has put together with your help. Again...I have no idea what happened, just caught the tail end of the discussion. Every admin that has weighed in has been clear that bot operators do host the bots themselves. Your ability to stop is no longer in question. Yes, I do question your professionalism, but your responses and level-headedness in commenting back to me have been a good demonstration that you are a professional person. So I can only assume (seriously) that you have had to put up with a lot of changes and crap from editors. I doubt this will change my opinion about how bots work in a collaborative environment like Wikipedia...but that is just my opinion. It means very little. Trust me....no one is going to care what I think. Pretty sure that was just proven today. LOL! (Why am I laughing? I suppose to lighten the moment.) Have a good night.--Mark Miller (talk) 05:34, 25 May 2014 (UTC)
- Ya know...we started the movie "The Monuments Men" when I first discovered the discussion on DRN. And the credits are rolling now. I hope the movie sucked because I missed all of it. :-)--Mark Miller (talk) 05:36, 25 May 2014 (UTC)
Sorry for taking you to AN over the situation that you had a right to do. While I have trust issues, that is my problem not yours. You seem like a pretty level-headed editor, and my not knowing how bots work is for me to learn and not for others to teach.--Mark Miller (talk) 19:09, 25 May 2014 (UTC)
- It's okay. No hard feelings. — Earwig talk 19:12, 25 May 2014 (UTC)
Copyvio Tool Not Compatible with sr.wikipedia?
https://toolserver.org/~earwig/copyvios?lang=sr&project=wikipedia&title=22._%D0%BC%D0%B0%D1%98&url= --JustBerry (talk) 20:06, 25 May 2014 (UTC)
- I already got your message on IRC, thanks. — Earwig talk 20:09, 25 May 2014 (UTC)
Request for comment
Hello there, a proposal regarding pre-adminship review has been raised at Village pump by Anna Frodesiak. Your comments here are very much appreciated. Many thanks. Jim Carter through MediaWiki message delivery (talk) 06:47, 28 May 2014 (UTC)
The Signpost: 28 May 2014
- News and notes: The English Wikipedia's second featured-article centurion; wiki inventor interviewed on video
- Featured content: Zombie fight in the saloon
- Traffic report: Get fitted for flipflops and floppy hats
- Recent research: Predicting which article you will edit next
The Signpost: 04 June 2014
- News and notes: Two new affiliate-selected trustees
- Featured content: Ye stately homes of England
- In the media: Reliable or not, doctors use Wikipedia
- Traffic report: Autumn in summer
The Signpost: 11 June 2014
- News and notes: PR agencies commit to ethical interactions with Wikipedia
- Traffic report: The week the wired went weird
- Paid editing: Does Wikipedia Pay? The Moderator: William Beutler
- Special report: Questions raised over secret voting for WMF trustees
- Featured content: Politics, ships, art, and cyclones
The Signpost: 18 June 2014
- News and notes: With paid advocacy in its sights, the Wikimedia Foundation amends their terms of use
- Featured content: Worming our way to featured picture
- Special report: Wikimedia Bangladesh: a chapter's five-year journey
- Traffic report: You can't dethrone Thrones
- WikiProject report: Visiting the city
Sunday July 6: WikNYC Picnic
You are invited to join us for the "picnic anyone can edit" in Central Park, as part of the Great American Wiknic celebrations being held across the USA. Remember it's a wiki-picnic, which means potluck.
Also, before the picnic, you can join in the Wikimedia NYC chapter's annual meeting.
We hope to see you there!--Pharos (talk) 16:51, 28 June 2014 (UTC)
(You can unsubscribe from future notifications for NYC-area events by removing your name from this list.)
The Signpost: 25 June 2014
- News and notes: US National Archives enshrines Wikipedia in Open Government Plan
- Traffic report: Fake war, or real sport?
- Exclusive: "We need to be true to who we are": Foundation's new executive director speaks to the Signpost
- Discussion report: Media Viewer, old HTML tags
- Featured content: Showing our Wörth
- WikiProject report: The world where dreams come true
- Recent research: Power users and diversity in WikiProjects
The Signpost: 02 July 2014
- In the media: Wiki Education; medical content; PR firms
- Traffic report: The Cup runneth over... and over.
- News and notes: Wikimedia Israel receives Roaring Lion award
- Featured content: Ship-shape
- WikiProject report: Indigenous Peoples of North America
- Technology report: In memoriam: the Toolserver (2005–14)
The Signpost: 09 July 2014
- Special report: Wikimania 2014—what will it cost?
- Wikimedia in education: Exploring the United States and Canada with LiAnna Davis
- Featured content: Three cheers for featured pictures!
- News and notes: Echoes of the past haunt new conflict over tech initiative
- Traffic report: World Cup, Tim Howard rule the week
An update for the Tool
Hi,
Please do include HTTPS version of the sites to the exclude list of the tool.
Because right now it gives result like this:
https://fa.wikipedia.org/* is a suspected violation of fa.Wikipedia.org/*
!
OK?
Thank you, Regards, KhabarNegar Talk 09:03, 13 July 2014 (UTC)
- The same for Wikinews,
- Thank you, KhabarNegar Talk 09:04, 13 July 2014 (UTC)
- @KhabarNegar: Can you give me a specific page where this happens? Copy a URL where the tool gives a bad result. It shouldn't be happening (there's a check for HTTPS already) and I can't reproduce the issue so I'm pretty confused. — Earwig talk 04:08, 14 July 2014 (UTC)
- I think it was a mistake on my side: I thought giving a URL would help the tool, but now that I have read the text above I understand what "URL (optional)" means there, sorry :). [1]... By the way, the page I gave to the tool is somehow copy-pasted from some websites, but the tool doesn't show any confidence. When I use similar tools online, they show 18 websites from which the material is copied. Anyway, I will try this very useful tool again and again and give you my results.
- Thank you very much,KhabarNegar Talk 04:50, 14 July 2014 (UTC)
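The HTTPS check discussed in this thread can be pictured with a minimal sketch. It is not the tool's actual code: `EXCLUDED`, `normalize_host` and `is_excluded` are hypothetical names, and the list format is an assumption. The idea is to compare only the lower-cased host, so that the scheme and letter case of a URL make no difference to whether it is excluded.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list: domains whose pages should never be
# reported as sources (assumed format, not the tool's real configuration).
EXCLUDED = ["wikipedia.org", "wikinews.org"]


def normalize_host(url):
    """Return the lower-cased host of a URL, ignoring scheme and 'www.'."""
    if "://" not in url:
        url = "//" + url  # let urlsplit treat a bare host as the netloc
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_excluded(url):
    """True if the URL's host is an excluded domain or a subdomain of one."""
    host = normalize_host(url)
    return any(host == d or host.endswith("." + d) for d in EXCLUDED)
```

With this approach `https://fa.wikipedia.org/...` and `fa.Wikipedia.org/...` both resolve to the same host and are both skipped.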
The Signpost: 16 July 2014
- Special report: $10 million lawsuit against Wikipedia editors withdrawn, but plaintiff intends to refile
- Traffic report: World Cup dominates for another week
- Wikimedia in education: Serbia takes the stage with Filip Maljkovic
- Featured content: The Island with the Golden Gun
The Signpost: 23 July 2014
- Wikimedia in education: Education program gaining momentum in Israel
- Traffic report: The World Cup hangs on, though tragedies seek to replace it
- News and notes: Institutional media uploads to Commons get a bit easier
- Featured content: Why, they're plum identical!
The Signpost: 30 July 2014
- Book review: Knowledge or unreality?
- Recent research: Shifting values in the paid content debate
- News and notes: How many more hoaxes will Wikipedia find?
- Wikimedia in education: Success in Egypt and the Arab World
- Traffic report: Doom and gloom vs. the power of Reddit
- Featured content: Skeletons and Skeltons
Sunday August 17: NYC Wiki-Salon and Skill Share
You are invited to join the Wikimedia NYC community for our upcoming wiki-salon and knowledge-sharing workshop on the Upper West Side of Manhattan.
Afterwards at 5pm, we'll walk to a social wiki-dinner together at a neighborhood restaurant (to be decided). We hope to see you there!--Pharos (talk) 15:58, 4 August 2014 (UTC)
(You can unsubscribe from future notifications for NYC-area events by removing your name from this list.)
The Signpost: 06 August 2014
- Technology report: A technologist's Wikimania preview
- Traffic report: Ebola
- Featured content: Bottoms, asses, and the fairies that love them
- Wikimedia in education: Leading universities educate with Wikipedia in Mexico
The Signpost: 13 August 2014
- Special report: Twitter bots catalogue government edits to Wikipedia
- Traffic report: Disease, decimation and distraction
- Wikimedia in education: Global Education: WMF's Perspective
- Wikimania: Promised the moon, settled for the stars
- News and notes: Media Viewer controversy spreads to German Wikipedia
- In the media: Monkey selfie, net neutrality, and hoaxes
- Featured content: Cambridge got a lot of attention this week
The Signpost: 20 August 2014
- Traffic report: Carpe diem, quam minimum credula postero
- WikiProject report: Bats and gloves
- Op-ed: A new metric for Wikimedia
- Featured content: English Wikipedia departs for Japan
The Signpost: 27 August 2014
- In the media: Plagiarism and vandalism dominate Wikipedia news
- News and notes: Media Viewer—Wikimedia's emotional roller-coaster
- Traffic report: Viral
- Featured content: Cheats at Featured Pictures!
"Earwig's Copyvio Detector" Automation
A recent discussion at WT:AFC produced the idea of procedurally submitting all pending AfC submissions over a certain age to the detector and getting back the Spam/NotSpam result and percentage likelihood, to be placed on a userspace page as a burndown log. Do you see any problems with this before I propose the task at Bot Operator's noticeboard? Hasteur (talk) 22:49, 3 September 2014 (UTC)
- @Hasteur: Hmm... that would be fine, just make sure to not send requests too frequently (I'd suggest waiting at least ten seconds after the previous check has completed). It wouldn't break the tool or anything, but it would slow it down for anyone else who is trying to use it. Also, I would suggest not checking a single page more than once unless its content has changed significantly. The detector has no API yet, so it'd be hard for other bot devs to write the task, but an API is next on my list after finishing this issue. Once I'm done with it, I'll let you know, and you can request it.
- Alternatively, a bot dev can install earwigbot, get an API key for Yahoo! BOSS through Coren, and run the checks themselves, but that might be more trouble than it's worth. On the other hand, they could configure it to spend more/less time doing checks and they don't have to worry about web tool downtimes or whatever. I have a bot task that might be useful, but note that it'll need to be updated as soon as I finish the aforementioned issue. — Earwig talk 23:30, 3 September 2014 (UTC)
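The pacing advice in this thread (wait about ten seconds after each check finishes, and do not re-check a page whose content has not changed) could look roughly like the sketch below. Everything here is an assumption for illustration: `check_pending` and `run_check` are hypothetical names, since the detector had no API at the time, and "changed significantly" is simplified to "changed at all" by comparing content hashes.

```python
import hashlib
import time


def check_pending(pages, run_check, seen, delay=10.0, sleep=time.sleep):
    """Check each (title, text) pair at most once per content version.

    run_check is a hypothetical callable (title -> result); the detector
    had no API when this was discussed. seen maps title -> content hash
    from earlier runs and is updated in place.
    """
    results = {}
    first = True
    for title, text in pages:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if seen.get(title) == digest:
            continue  # unchanged since the last check: skip it
        if not first:
            sleep(delay)  # pause after the previous check has completed
        first = False
        results[title] = run_check(title)
        seen[title] = digest
    return results
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; a bot would simply use the default.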
Copyvios options
It appears that the Copyvios tool looks for similarities in 3-word strings. Lots of false positives with that setting. Is it possible to adjust this, or can you raise it to 5 words? ~KvnG 18:34, 4 September 2014 (UTC)
- @Kvng: Hm. False positives are expected with short phrases that are part of common speech, but they should be infrequent enough that the overall confidence value remains low. I'm not concerned with a few false positives being shown in the comparison view, but if you have an example where it suspected a violation was present when it shouldn't have, then I might change my mind. It would be easy to change the n-gram size, but I would prefer to keep it small since using five words is more likely to miss cases where parts of sentences are reordered, etc. — Earwig talk 23:36, 4 September 2014 (UTC)
- The tool is down at the moment. I can give you some examples once the tool is running again. ~KvnG 14:14, 5 September 2014 (UTC)
- Gah. Try now. — Earwig talk 15:11, 5 September 2014 (UTC)
- Have a look at this. I'll bring more as I find them. ~KvnG 23:19, 5 September 2014 (UTC)
- To be honest, I don't see a problem with that one. There are definitely some suspiciously similar sentences that you could argue are close paraphrasing. The tool only gives it 50% confidence, which seems reasonable. — Earwig talk 23:28, 5 September 2014 (UTC)
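The trade-off between 3-word and 5-word phrases raised in this thread can be illustrated with a small sketch. It is a toy comparison, not the tool's code: `ngrams` and `shared_fraction` are hypothetical helpers, and the two sentences are invented. The second sentence reuses the first with its two halves swapped, which is the "reordered" case mentioned above.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in text (lower-cased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(article, source, n):
    """Fraction of the article's n-grams that also occur in the source."""
    a = ngrams(article, n)
    if not a:
        return 0.0
    return len(a & ngrams(source, n)) / len(a)


source = "the bridge was built in 1890 and it was later widened for road traffic"
copied = "it was later widened for road traffic and the bridge was built in 1890"

print(shared_fraction(copied, source, 3))  # share of matching 3-word phrases
print(shared_fraction(copied, source, 5))  # share of matching 5-word phrases
```

On this pair the 3-word setting matches a larger share of the reordered text than the 5-word setting, because every phrase that straddles the point where the halves were swapped is lost, and longer phrases straddle it more often.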
The Signpost: 03 September 2014
- Arbitration report: Media viewer case is suspended
- Featured content: 1882 × 5 in gold, and thruppence more
- Traffic report: Holding Pattern
- WikiProject report: Gray's Anatomy (v. 2)
The Signpost: 10 September 2014
- Traffic report: Refuge in celebrity
- Featured content: The louse and the fish's tongue
- WikiProject report: Checking that everything's all right
The Signpost: 17 September 2014
- WikiProject report: A trip up north to Scotland
- News and notes: Wikipedia's traffic statistics are off by nearly one-third
- Traffic report: Tolstoy leads a varied pack
- Featured content: Which is not like the others?
The Signpost: 24 September 2014
- Featured content: Oil paintings galore
- Recent research: 99.25% of Wikipedia birthdates accurate; focused Wikipedians live longer; merging WordNet, Wikipedia and Wiktionary
- Traffic report: Wikipedia watches the referendum in Scotland
- WikiProject report: GAN reviewers take note: competition time
- Arbitration report: Banning Policy, Gender Gap, and Waldorf education
The Signpost: 01 October 2014
- From the editor: The Signpost needs your help
- Dispatches: Let's get serious about plagiarism
- WikiProject report: Animals, farms, forests, USDA? It must be WikiProject Agriculture
- Traffic report: Shanah Tovah
- Featured content: Brothers at War
Copyvios tool whitescreens
All today so far I get a white screen at http://tools.wmflabs.org/copyvios/ Fiddle Faddle 11:28, 8 October 2014 (UTC)
- Wikimedia Labs had an outage, so there was nothing I could do realistically. Should be fixed now, though. — Earwig talk 17:06, 8 October 2014 (UTC)
The Signpost: 08 October 2014
- In the media: Opposition research firm blocked; Australian bushfires
- Featured content: From a wordless novel to a coat of arms via New York City
- Traffic report: Panic and denial
- Technology report: HHVM is the greatest thing since sliced bread
Failure to find better copyvio
Hi. I use your copyvio detector a bunch – thanks for this great tool!
Occasionally, it seems to find a relatively low percentage-match page, when I feel there must be a better match out there. I ran one of them down today:
The violating text is at User:AlanM1/CVSample. If asked to search the tool finds a 76.5% match. However, a search for some of the text with Google yielded this much better match, which compared using the tool, yields a 99.8% match. Shouldn't the tool have found this page as well?
Thanks again. —[AlanM1(talk)]— 08:20, 13 October 2014 (UTC)
- @AlanM1: Hey, thanks for the report. I noticed two things: first, the tool wasn't always letting the user know when it was skipping possible matches (it finishes the check early when it encounters a source with ≥75% confidence, as you might have noticed). I just fixed that, so in the future you should see the "do a complete check" link whenever a search finishes early. As for the specific case you mentioned, I looked into it pretty carefully. The tool uses an algorithm to split the article text into searchable queries; for the given page, it creates the following ten:
(Extended content collapsed; the list of queries is not reproduced here.)
- Many of these return http://license.cdesk.in/internal.aspx as a result in Google (example), but not in Yahoo (example). The fact is, Yahoo (which uses Bing as its backend now, which also lacks the page in its search results) is not as good of a search engine as Google; its index isn't as large and it doesn't seem to know about that URL. Since Yahoo is the one providing the WMF with access to its search engine and not Google, I don't think there's much I can do about this. I suspect other cases where a "better match is out there" are because of this reason. Sadly, we have to accept the tradeoff of a worse search engine in exchange for automation. — Earwig talk 06:33, 14 October 2014 (UTC)
- FYI, the phrase I searched was "Special Features for location based IT managers and all location IT managers", which results in just the one hit on Google and no hits on Bing or Yahoo.
I modified the page slightly to force a new search and got the new Do a complete check link. When I click on it, though, you report the following error after about 5 seconds: "An error occurred while using the search engine (Yahoo! BOSS Error: HTTP Error 500: Internal Server Error). Try reloading the page. If the error persists, repeat the check without using the search engine." This error was apparently temporary.
Out of curiosity, has Google been asked recently whether they can be used (and would that be a simple change)? On the face of it, this would seem to be an unlikely edge case, but then I've had the hunch before, usually from the writing quality being more professional/marketing-speak than even well-written Wikipedian :) —[AlanM1(talk)]— 02:25, 15 October 2014 (UTC) (edited) —[AlanM1(talk)]— 02:29, 15 October 2014 (UTC)
- (edit conflict)
I just tried it and it seems to work fine (Yahoo's 500 errors like that seem to be intermittent and not related to the actual query), but of course it's not returning the result we want. Supporting Google wouldn't be too hard if they allowed us to (I'd have to figure out their API); no one's spoken to them recently as far as I know, but I have little hope they'll change their mind based on previous attempts. — Earwig talk 02:34, 15 October 2014 (UTC)
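The early-finish behaviour described in this thread (stop as soon as a source reaches 75% confidence, and tell the user that other candidates were skipped) might be sketched as follows. This is only an illustration under assumptions: `find_best_source` and `compare` are hypothetical names, and it does not model the separate issue of the search engine never returning the better page.

```python
def find_best_source(candidates, compare, threshold=0.75):
    """Compare candidate URLs in order, stopping at the first strong match.

    compare is a hypothetical callable (url -> confidence in [0, 1]).
    Returns (best_url, best_confidence, skipped), where skipped lists
    the candidates that were never compared because the search ended early.
    """
    best_url, best_conf = None, 0.0
    for i, url in enumerate(candidates):
        conf = compare(url)
        if conf > best_conf:
            best_url, best_conf = url, conf
        if conf >= threshold:
            return best_url, best_conf, candidates[i + 1:]
    return best_url, best_conf, []
```

A non-empty `skipped` list is the situation in which a "do a complete check" link would be worth showing, since a later candidate could still score higher.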
The Signpost: 15 October 2014
- Op-ed: Ships—sexist or sexy?
- Arbitration report: One case closed and two opened
- Featured content: Bells ring out at the Temple of the Dragon at Peace
- Technology report: Attempting to parse wikitext
- Traffic report: Now introducing ... mobile data
- WikiProject report: Signpost reaches the Midwest
Copyvios testing against itself and dupdet
I was going through my CSD log and found a couple pages I marked as CVs that hadn't been deleted, so I decided to check on them and see if they were still violations. I came across the Copyvios report for Draft:Brian Kennedy (Businessman) and noticed that the tool checks against itself and against a comparison created by dupdet. This probably shouldn't happen because at a quick glance without trying to read the looooong URL provided as a match it looks like a 99.8% match/violation. Thought I'd bring it to your attention and let you deal with it as you see fit. :) — {{U|Technical 13}} (e • t • c) 11:45, 17 October 2014 (UTC)
- Haha, holy crap, never thought I would see something like this! Seems Yahoo indexed that search result and it got confused. I'll add the toolserver (I mean Tool Labs... whoops) to the URL exclusion list. — Earwig talk 15:32, 17 October 2014 (UTC)
The Signpost: 22 October 2014
- Featured content: Admiral on deck: a modern Ada Lovelace
- Traffic report: Death, War, Pestilence... Movies and TV
- WikiProject report: De-orphanning articles—a huge task but with a huge team of volunteers to help
Suggestion for improvement of the Copyvio Detector
I noticed that the Copyvio Detector search results include websites such as Wikia and Mashpedia. Mashpedia is a Wikipedia mirror, while much of Wikia's content is freely licensed (CC-by-SA). Would it be suitable to include (or exclude) these websites in the search results? Jarble (talk) 18:14, 27 October 2014 (UTC)
- Okay, I added them to User:EarwigBot/Copyvios/Exclusions for you. Thanks! — Earwig talk 19:08, 27 October 2014 (UTC)
The Signpost: 29 October 2014
- Featured content: Go West, young man
- In the media: Wikipedia a trusted source on Ebola; Wikipedia study labeled government waste; football biography goes viral
- Maps tagathon: Find 10,000 digitised maps this weekend
- Traffic report: Ebola, Ultron, and Creepy Articles
The Signpost: 05 November 2014
- In the media: Predicting the flu, MH17 conspiracy theories
- Traffic report: Sweet dreams on Halloween
Copyvio detection
Looking at your code [2] it seems like you are building Markov chains with very little data, and for the complete article, is that right? The problem as I see it is that describing the same entity will lead to the same language constructs, and using the whole article will imply using old text which may have propagated to external sites. That leads to a very high error rate, or low confidence. Do you have any estimates for the confidence intervals? As I see it a trustworthy copyvio detector can only use the most recent edits, it seems to be more in the range of minutes than hours, and definitely not days, and it must check if the same language construct is used by only a specific site. If the construct is used by several sites it is not usable as a hint for copyvio detection. That is, the more confidence you get out of the Markov chain the less likely it is that something that looks like a copyright violation is in fact just that. Or is it something I misunderstood in your code? Jeblad (talk) 04:17, 12 November 2014 (UTC)
- I'm not clear if your understanding of the code is correct, to be honest. Yes, Markov chains are formed for the article and each suspected source, and then they are compared. However, there is a difference between describing an entity similarly and outright copying text from elsewhere. In the latter instance, there will be a high frequency of duplicated phrases that you would not expect in the former. The only reason I can think of why two unrelated descriptions would appear similar is due to error when an entity might have a long name (or some particular short phrases, etc) that tends to be replicated in many places, and I admit the algorithm is not able to handle this well (there is room for improvement here), but I still think in most cases it is not going to affect confidence that much. Even if it does, one can recognize this when reviewing the comparison, and it should make one question why the article includes so many stock phrases/quotes in the first place (even if they are not technically plagiarized). Regarding recency: plenty of copyvios go undetected for a long period of time. While mirrors do need some time to catch up, merely being old does not make suspected sources mirrors. Regarding multiple sources: sometimes a single website will have multiple pages with the same copied content, or a (non-public domain) PDF will be widely disseminated and hosted from many sites. I don't think the mere existence of multiple webpages with the same content means that they are all mirrors. Instead, mirrors should be tracked (added to User:EarwigBot/Copyvios/Exclusions or Wikipedia:Mirrors and forks), which eliminates this concern if done correctly. — Earwig talk 08:33, 12 November 2014 (UTC)
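The distinction drawn in this reply, between copying text and independently describing the same entity, can be shown with a simplified stand-in for the chain comparison. This is not the tool's implementation: `build_chain` and `overlap_confidence` are hypothetical names, the "chain" here is just a count of consecutive word runs, the scoring formula is an assumption, and the sentences are invented.

```python
from collections import Counter


def build_chain(text, degree=3):
    """Count every run of `degree` consecutive words in text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + degree])
                   for i in range(len(words) - degree + 1))


def overlap_confidence(article, source, degree=3):
    """Share of the article's chain that also appears in the source's chain.

    Assumed scoring for illustration only; the real tool's formula may differ.
    """
    a, s = build_chain(article, degree), build_chain(source, degree)
    total = sum(a.values())
    if total == 0:
        return 0.0
    shared = sum((a & s).values())  # Counter & keeps the minimum count per key
    return shared / total


article = "the bridge was built in 1890 by the city council"
copy_src = "the bridge was built in 1890 by the city council of springfield"
indep_src = "springfield city council built a bridge in 1890"

print(overlap_confidence(article, copy_src))   # copied wording
print(overlap_confidence(article, indep_src))  # same facts, different wording
```

In this toy case the copied source shares every word run with the article, while the independent description of the same facts shares none, which is the difference in duplicated-phrase frequency the reply relies on.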
The Signpost: 12 November 2014
- In the media: Amazon Echo; EU freedom of panorama; Bluebeard's Castle
- Traffic report: Holidays, anyone?
- Featured content: Wikipedia goes to church in Lithuania
- WikiProject report: Talking hospitals
Copyright checks when performing AfC reviews
Hello The Earwig. This message is part of a mass mailing to people who appear active in reviewing articles for creation submissions. First of all, thank you for taking part in this important work! I'm sorry this message is a form letter – it really was the only way I could think of to convey the issue economically. Of course, this also means that I have not looked to see whether the matter is applicable to you in particular.
The issue is the rather large number of copyright violations ("copyvios") making their way through AfC reviews without being detected (even when easy to check, and even when hallmarks of copyvios in the text, which should have invited a check, were glaring). A second issue is the correct method of dealing with them when discovered.
If you don't do so already, I'd like to ask for your help with this problem by taking on the practice of performing a copyvio check as the first step in any AfC review. The most basic method is to simply copy a unique but small portion of text from the draft body and run it through a search engine in quotation marks. Trying this from two different paragraphs is recommended. (If you have any question about whether the text was copied from the draft, rather than the other way around (a "backwards copyvio"), the Wayback Machine is very useful for sussing that out.)
If you do find a copyright violation, please do not decline the draft on that basis. Copyright violations need to be dealt with immediately as they may harm those whose content is being used and expose Wikipedia to potential legal liability. If the draft is substantially a copyvio, and there's no non-infringing version to revert to, please mark the page for speedy deletion right away using {{db-g12|url=URL of source}}. If there is an assertion of permission, please replace the draft article's content with {{subst:copyvio|url=URL of source}}.
Some of the more obvious indicia of a copyvio are use of the first person ("we/our/us..."), phrases like "this site", or apparent artifacts of content written for somewhere else ("top", "go to top", "next page", "click here", use of smartquotes, etc.); inappropriate tone of voice, such as an overly informal tone or a very slanted marketing voice with weasel words; including intellectual property symbols (™,®); and blocks of text being added all at once in a finished form with no misspellings or other errors.
I hope this message finds you well and thanks again for your efforts in this area. Best regards--Fuhghettaboutit (talk) 02:20, 18 November 2014 (UTC).
Sent via--MediaWiki message delivery (talk) 02:20, 18 November 2014 (UTC)
Thursday December 4: NYC Wiki-Salon and Skill Share
Thursday December 4: NYC Wiki-Salon and Skill Share | |
---|---|
You are invited to join the Wikimedia NYC community for our upcoming wiki-salon and knowledge-sharing workshop in Manhattan's Greenwich Village.
Afterwards at 8pm, we'll walk to a social wiki-dinner together at a neighborhood restaurant (to be decided). We hope to see you there!--Pharos (talk) 07:11, 27 November 2014 (UTC) |
(You can unsubscribe from future notifications for NYC-area events by removing your name from this list.)
The Signpost: 26 November 2014
- Featured content: Orbital Science: Now you're thinking with explosions
- WikiProject report: Back with the military historians
- Traffic report: Big in Japan
The Signpost: 03 December 2014
- In the media: Embroidery and cheese
- Featured content: ABCD: Any Body Can Dance!
- Traffic report: Turkey and a movie
- WikiProject report: Today on the island
Unintentional changes in your signature
Hello The Earwig, sorry for the unintentional changes to your signature with my latest edit here. Probably a wikEd feature/bug, or my own inability to use it properly :). GermanJoe (talk) 10:43, 12 December 2014 (UTC)
- No problem. My signature uses non-breaking spaces, but they're encoded directly as Unicode characters, not HTML entities, so they look like normal spaces in the edit window. I guess wikEd automatically replaces them? Seems like an odd feature, worth looking into. — Earwig talk 15:49, 12 December 2014 (UTC)
- Reading briefly (very briefly) through the wikEd talkpage, it seems like the editor analyses the text and re-codes it according to its own standards (and re-saves it with this standard). Usually that's OK, but rare problems occur with special characters - the talkpage contains 1-2 minor complaints about it. Just wanted to drop you a note, in case you wonder about it. GermanJoe (talk) 16:58, 12 December 2014 (UTC)
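For anyone curious about the difference described above, here is a small sketch. The sample signature string and the `normalize_whitespace` function are hypothetical, and nothing here is meant to reproduce what wikEd actually does. It only shows how a literal U+00A0 character differs from the `&nbsp;` entity, and how a generic whitespace cleanup would silently turn it into an ordinary space.

```python
import html

NBSP = "\u00a0"  # literal non-breaking space character

# Hypothetical signature text with a literal NBSP between the two words.
signature = f"Example{NBSP}talk"


def normalize_whitespace(text):
    """Hypothetical editor cleanup: collapse all whitespace to single spaces.

    str.split() with no arguments treats U+00A0 as whitespace, so the
    non-breaking space is lost here.
    """
    return " ".join(text.split())


def to_entities(text):
    """Make non-breaking spaces visible by writing them as HTML entities."""
    return text.replace(NBSP, "&nbsp;")


print(signature == "Example talk")        # False: looks the same, is not
print(to_entities(signature))             # Example&nbsp;talk
print(normalize_whitespace(signature) == "Example talk")  # True: NBSP lost
print(html.unescape("Example&nbsp;talk") == signature)    # True
```

Writing the spaces as entities keeps them visible in the edit window, at the cost of a slightly longer signature.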
The Signpost: 10 December 2014
- Op-ed: It's GLAM up North!
- Traffic report: Dead Black Men and Science Fiction
- Featured content: Honour him, love and obey? Good idea with military leaders.
Mirrors for Copyvio Detector
Hi, using your tool with itwiki I found 3 mirrors in results:
- fatti-italiani.it
- wikideep.it
- wikipedia.sapere.virgilio.it
Could you please blacklist them? Thanks! --AlessioMela (talk) 13:50, 13 December 2014 (UTC)
- Done. — Earwig talk 20:30, 13 December 2014 (UTC)
The Signpost: 17 December 2014
- Arbitration report: Arbitration Committee election results
- Featured content: Tripping hither, tripping thither, Nobody knows why or whither; We must dance and we must sing, Round about our fairy ring!
- Traffic report: A December Lull
The Signpost: 24 December 2014
- From the editor: Looking for new editors-in-chief
- In the media: Wales on GamerGate
- Featured content: Still quoting Iolanthe, apparently.
- WikiProject report: Microsoft does The Signpost
- Traffic report: North Korea is not pleased
Happy Holidays!
Merry Christmas and a Prosperous 2015!!! | |
Hello The Earwig, may you be surrounded by peace, success and happiness on this seasonal occasion. Spread the WikiLove by wishing another user a Merry Christmas and a Happy New Year, whether it be someone you have had disagreements with in the past, a good friend, or just some random person. Sending you heartfelt and warm greetings for Christmas and New Year 2015. Spread the love by adding {{subst:Seasonal Greetings}} to other user talk pages. |
Sent by MediaWiki message delivery (talk) on behalf of Technical 13 to all registered users who have commented on his talk page. To prevent receiving future messages, please follow the opt-out instructions on User:Technical 13/Holiday list.
Disambiguation link notification for December 29
Hi. Thank you for your recent edits. Wikipedia appreciates your help. We noticed though that when you edited Pals For Life, you added a link pointing to the disambiguation page English. Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.
It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 09:41, 29 December 2014 (UTC)
The Signpost: 31 December 2014
- News and notes: The next big step for Wikidata—forming a hub for researchers
- In the media: Study tour controversy; class tackles the gender gap
- Traffic report: Surfin' the Yuletide
- Featured content: A bit fruity
The Signpost: 07 January 2015
- In the media: ISIL propaganda video; AirAsia complaints
- Featured content: Kock up
- Traffic report: Auld Lang Syne
The Signpost: 14 January 2015
- WikiProject report: Articles for creation: the inside story
- News and notes: Erasmus Prize recognizes the global Wikipedia community
- Featured content: Citations are needed
- Traffic report: Wikipédia sommes Charlie
Copyvios tool down?
@The Earwig: When I try to use your copyvios tool, I just get a page that says: "No webservice. The URI you have requested, /copyvios/, is not currently serviced." --Ahecht (TALK PAGE) 18:26, 21 January 2015 (UTC)
- Fixed now, thanks. — Earwig talk 22:24, 21 January 2015 (UTC)
The Signpost: 21 January 2015
- From the editor: Introducing your new editors-in-chief
- Anniversary: A decade of the Signpost
- News and notes: Annual report released; Wikimania; steward elections
- In the media: Johann Hari; bandishes and delicate flowers
- Featured content: Yachts, marmots, boat races, and a rocket engineer who attempted to birth a goddess
- Arbitration report: As one door closes, a (Gamer)Gate opens
tool down
The Copyvio tool appears to be down again. By the way, your tool is awesome. Not only has it been useful at AfC, it's really helped speed up CCI cleanup. Chris Troutman (talk) 23:29, 25 January 2015 (UTC)
- Fixed, sigh. I should look into this more carefully. Either way, thanks. — Earwig talk 23:33, 25 January 2015 (UTC)
- Thanks! Chris Troutman (talk) 23:47, 25 January 2015 (UTC)
Saturday February 7 in NYC: Black Life Matters Editathon
Saturday February 7 in NYC: Black Life Matters Editathon | |
---|---|
You are invited to join us at New York Public Library's Schomburg Center for Research in Black Culture for our upcoming editathon, a part of the Black WikiHistory Month campaign (which also includes events in Brooklyn and Westchester!).
The Wikipedia training and editathon will take place in the Aaron Douglas Reading Room of the Jean Blackwell Hutson Research and Reference Division, with a reception following in the Langston Hughes lobby on the first floor of the building at 5:00pm. We hope to see you there!--Pharos (talk) 06:03, 27 January 2015 (UTC) |
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)
copyvios tool down
I think the copyvios tool broke. :( Deunanknute (talk) 02:41, 28 January 2015 (UTC)
- Labs has been having some issues today. — Earwig talk 04:36, 28 January 2015 (UTC)
The Signpost: 28 January 2015
- From the editor: An editorial board that includes you
- In the media: A murderous week for Wikipedia
- Traffic report: A sea of faces
The Signpost: 04 February 2015
- Op-ed: Is Wikipedia for sale?
- In the media: Gamergate and Muhammad controversies continue
- Traffic report: The American Heartland
- Featured content: It's raining men!
- Arbitration report: Slamming shut the GamerGate
- WikiProject report: Dicing with death – on Wikipedia?
- Technology report: Security issue fixed; VisualEditor changes
- Gallery: Langston Hughes
you're great
great person | |
you're very good. Jakobas (talk) 22:12, 11 February 2015 (UTC) |
A barnstar for you!
The No Spam Barnstar | |
good job! Jakobas (talk) 22:19, 11 February 2015 (UTC) |
The Signpost: 11 February 2015
- From the editors: We want to know what you think!
- In the media: Is Wikipedia eating itself?
- Featured content: A grizzly bear, Operation Mascot, Freedom Planet & Liberty Island, cosmic dust clouds, a cricket five-wicket list, more fine art, & a terrible, terrible opera...
- Traffic report: Bowled over
- WikiProject report: Brand new WikiProjects profiled
- Gallery: Feel the love
The Signpost: 18 February 2015
- In the media: Students' use and perception of Wikipedia
- Special report: Revision scoring as a service
- Gallery: Darwin Day
- Traffic report: February is for lovers
- Featured content: A load of bull-sized breakfast behind the restaurant, Koi feeding, a moray eel, Spaghetti Nebula and other fishy, fishy fish
- Arbitration report: We've built the nuclear reactor; now what colour should we paint the bikeshed?
The Signpost: 25 February 2015
- News and notes: Questions raised over WMF partnership with research firm
- In the media: WikiGnomes and Bigfoot
- Gallery: Far from home
- Traffic report: Fifty Shades of... self-denial?
- Recent research: Gender bias, SOPA blackout, and a student assignment that backfired
- WikiProject report: Be prepared... Scouts in the spotlight
The Signpost: 04 March 2015
- From the editor: A sign of the times: the Signpost revamps its internal structure to make contributing easier
- Traffic report: Attack of the movies
- Arbitration report: Bradspeaks—impact, regrets, and advice; current cases hinge on sex, religion, and ... infoboxes
- Interview: Meet a paid editor
- Featured content: Ploughing fields and trading horses with Rosa Bonheur
- Technology report: Bugs, Repairs, and Internal Operational News
Sunday March 22: Wikipedia Day NYC Celebration and Mini-Conference
Sunday March 22: Wikipedia Day NYC 2015 | |
---|---|
You are invited to join us at Barnard College for Wikipedia Day NYC 2015, a Wikipedia celebration and mini-conference for the project's 14th birthday. In addition to the party, the event will be a participatory unconference, with plenary panels, lightning talks, and of course open space sessions. We also hope for the participation of our friends from the Free Culture movement and from educational and cultural institutions interested in developing free knowledge projects.
We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Pharos (talk) 21:59, 9 March 2015 (UTC) |
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)
The Signpost: 11 March 2015
- Special report: An advance look at the WMF's fundraising survey
- In the media: Gamergate; a Wiki hoax; Kanye West
- Traffic report: Wikipedia: handing knowledge to the world, one prank at a time
- Featured content: Here they come, the couple plighted –
- Op-ed: Why the Core Contest matters
Copy vio detector not working
Hi Ben. The copy vio detector tool is not working today. All articles are showing 0.0 per cent overlap, even ones that I know for certain have material copied from elsewhere online. If you could have a look and see what's up, I would appreciate it. Thanks, -- Diannaa (talk) 21:31, 12 March 2015 (UTC)
- Hi, thanks for the report. I restarted the server and it should be working now. The underlying issue seems to be out-of-memory errors; the same thing happened yesterday and a server restart then fixed it too. I'm not sure why we're running out of memory, though. If it happens again, I'll look more carefully. — Earwig talk 22:10, 12 March 2015 (UTC)
- Thanks so much. I will get at my CCI tasks after supper! -- Diannaa (talk) 22:26, 12 March 2015 (UTC)
The Signpost: 18 March 2015
- From the editor: A salute to Pine
- Featured content: A woman who loved kings
- Traffic report: It's not cricket
The Signpost – Volume 11, Issue 12 – 25 March 2015
- News and notes: Wikimedia Foundation adopts open-access research policy
- Featured content: A carnival of animals, a river of dung, a wasteland of uncles, and some people with attitude
- Special report: Wikimedia Commons Picture of the Year 2014
- Traffic report: Oddly familiar
- Recent research: Most important people; respiratory reliability; academic attitudes
The Signpost, 1 April 2015
- In the media: Wiki-PR duo bulldoze a piñata store; Wifione arbitration case; French parliamentary plagiarism
- Featured content: Stop Press. Marie Celeste Mystery Solved. Crew Found Hiding In Wardrobe.
- Traffic report: All over the place
- Special report: Pictures of the Year 2015
Glitch in copy vio detector
Hi there, me again. The copyvio detector is working great, except when comparing with a link from the Wayback Machine. These are timing out, every time. I am using the Duplication Detector for these instances, but would prefer to use your superior tool. If you have time, could you please investigate? Thanks, -- Diannaa (talk) 22:34, 7 April 2015 (UTC)
- @Diannaa: Hmm, can you give an example? This one works fine. — Earwig talk 17:11, 8 April 2015 (UTC)
- The ones that were not working yesterday were, for example, here, here, and here (searching for matches on Francis Escudero), but they are all working fine today. So, a false alarm I guess. -- Diannaa (talk) 18:35, 8 April 2015 (UTC)
- Hmm, alright. Sometimes Labs has intermittent issues talking to certain servers and there's nothing we can do about that. It seems to go away with time. — Earwig talk 23:56, 8 April 2015 (UTC)
The Signpost: 08 April 2015
- Traffic report: Resurrection week
- Featured content: Partisan arrangements, dodgy dollars, a mysterious union of strings, and a hole that became a monument
- WikiProject report: WikiProject Christianity
- Arbitration report: New Functionary appointments
- Technology report: Bugs, Repairs, and Internal Operational News
Earwig's Copyvio Detector bug
The detector appears to fail on articles with ampersands in the name, for example Shanmugha Arts, Science, Technology & Research Academy. Almost certainly a URL encoding issue. Stuartyeates (talk) 22:37, 12 April 2015 (UTC)
- @Stuartyeates: Thanks for the report, but the detector has no problem with those pages: if you enter the title directly, it works fine. That looks to be an issue with how {{copypaste}} is encoding titles. I'm not entirely clear on how these templates are structured (looks like {{copypaste}} invokes {{CVD}} which invokes {{copyvios}}, possibly with some double-encoding going on?), so I'll @Technical 13: to take a look. — Earwig talk 23:48, 12 April 2015 (UTC)
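As a rough illustration of the encoding problem suspected above, the sketch below uses Python's standard `urllib.parse`. The `title` query parameter name is an assumption for the example, and this is not a reconstruction of what {{copypaste}}, {{CVD}}, or {{copyvios}} actually do. It shows how an unencoded ampersand truncates a title, how encoding once preserves it, and how encoding twice leaves stray percent sequences after the server decodes once.

```python
from urllib.parse import parse_qs, quote

title = "Shanmugha Arts, Science, Technology & Research Academy"

# No encoding: the "&" is read as a parameter separator, cutting the title.
raw_query = "title=" + title

# Encoded once: "&" becomes "%26", so the whole title survives.
once = quote(title, safe="")
once_query = "title=" + once

# Encoded twice: "%26" becomes "%2526"; one round of decoding on the
# receiving side yields the once-encoded string instead of the title.
twice = quote(once, safe="")
twice_query = "title=" + twice

print(parse_qs(raw_query)["title"][0])
print(parse_qs(once_query)["title"][0])
print(parse_qs(twice_query)["title"][0])
```

Only the once-encoded form comes back as the original title. The raw form stops at the ampersand, and the double-encoded form comes back still containing `%26` and `%20`, so a lookup on that string would not find the page.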
April 29: WikiWednesday Salon and Skill-Share NYC
Wednesday April 29, 7pm: WikiWednesday Salon and Skill-Share NYC | |
---|---|
You are invited to join the Wikimedia NYC community for our inaugural evening "WikiWednesday" salon and knowledge-sharing workshop by 14th Street / Union Square in Manhattan. We also hope for the participation of our friends from the Free Culture movement and from educational and cultural institutions interested in developing free knowledge projects. We will also follow up on plans for recent and upcoming editathons, and other outreach activities. After the main meeting, pizza and refreshments and video games in the gallery!
Featuring a keynote talk this month on Lady Librarians & Feminist Epistemologies! We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Pharos (talk) 18:29, 14 April 2015 (UTC) |
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)
The Signpost: 15 April 2015
- Traffic report: Furious domination
The Signpost: 22 April 2015
- In the media: UK political editing; hoaxes; net neutrality
- Featured content: Vanguard on guard
- Traffic report: A harvest of couch potatoes
- Gallery: The bitter end
The Signpost: 29 April 2015
- Featured content: Another day, another dollar
- Traffic report: Bruce, Nessie, and genocide
- Recent research: Military history, cricket, and Australia targeted in Wikipedia articles' popularity vs. quality; how copyright damages economy
- Technology report: VisualEditor and MediaWiki updates
The Signpost: 06 May 2015
- News and notes: "Inspire" grant-making campaign concludes, grantees announced
- Featured content: The amorous android and the horsebreeder; WikiCup round two concludes
- Special report: FDC candidates respond to key issues
- Traffic report: The grim ship reality
Wednesday June 10, 7pm: WikiWednesday Salon / Wikimedia NYC Annual Meeting | |
---|---|
You are invited to join the Wikimedia NYC community for our next evening "WikiWednesday" salon and knowledge-sharing workshop by 14th Street / Union Square in Manhattan. This month will also feature on our agenda: recent and upcoming editathons, the organization's Annual Meeting, and Chapter board elections. We also hope for the participation of our friends from the Free Culture movement and from educational and cultural institutions interested in developing free knowledge projects. We will also follow up on plans for recent and upcoming editathons, and other outreach activities. After the main meeting, pizza and refreshments and video games in the gallery!
Featuring a keynote talk this month to be determined! We especially encourage folks to add your 5-minute lightning talks to our roster, and otherwise join in the "open space" experience! Newcomers are very welcome! Bring your friends and colleagues! --Pharos (talk) 17:23, 12 May 2015 (UTC) |
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)