MediaWiki talk:Titleblacklist/Archive 1
This is an archive of past discussions about MediaWiki:Titleblacklist. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Initial comments
- Also, you shouldn't add "^" at the beginning of the entry and "$" at the end. They will be added automatically — VasilievVV (talk) 10:11, 1 January 2008 (UTC)
- There is a global title blacklist available at meta:Title blacklist awaiting activation. ~Kylu (u|t) 06:06, 5 March 2008 (UTC)
- The global title blacklist has been enabled. --MZMcBride (talk) 03:08, 10 April 2008 (UTC)
HAGGER
Why have you blocked all page creations containing HAGGER? עוד מישהו Od Mishehu 09:20, 1 January 2008 (UTC)
- It's a common vandal meme -- if you're familiar with "Willy on Wheels"-style pagemove vandalism, the HAGGER?!?!?!??!?! vandal(s) often take a similar approach. As I understand, the title blacklist was partially created specifically to try and deal with these sorts of things. – Luna Santin (talk) 12:01, 1 January 2008 (UTC)
- Isn't this going to cause a problem if people try and create articles about notable people with the last name "Hagger"? Such as Nicholas Hagger, David Osborne Hagger, Lloyd Hagger, and Kim Hagger? All of which turned up after a quick search. Even if it is a troll-meme, all the trolls have to do is change one letter and they can get around it. Whereas someone who has a valid reason for creating the hypothetical article ?????? Hagger can't reasonably be expected to change the subject's name just to get around the blacklist. Or is the blacklist case-sensitive?--69.118.143.107 (talk) 14:16, 1 January 2008 (UTC)
- The section above contains the instruction about making an entry case-sensitive. Some admin should add a <casesensitive> after .*HAGGER.* — Kalan ? 17:19, 1 January 2008 (UTC)
- Done EVula // talk // ☯ // 19:33, 1 January 2008 (UTC)
- Awhoops. Thanks for pointing that out! – Luna Santin (talk) 00:00, 2 January 2008 (UTC)
- Hi, I tried that code on testwiki, and the current addition .*HAGGER.* <casesensitive> did not prevent that page from being created (without "casesensitive" it worked), best regards, --birdy (:> )=| 00:08, 2 January 2008 (UTC)
- P.S. Ah, I think you forgot the |: .*HAGGER.*|<casesensitive> regards, --birdy (:> )=| 00:10, 2 January 2008 (UTC)
- Hm, no, I am wrong. But it works case-sensitive without that <casesensitive>; with the code as it is right now, it does not work at all. Best regards, --birdy (:> )=| 00:19, 2 January 2008 (UTC)
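A minimal Python sketch of how an entry like .*HAGGER.* behaves, assuming (per the first comment on this page) that the extension wraps each entry in ^ and $ itself and matches case-insensitively unless <casesensitive> is set. Python's re module stands in for PHP's PCRE here, and entry_blocks is a hypothetical helper, not part of MediaWiki.

```python
import re

def entry_blocks(entry: str, title: str, casesensitive: bool = False) -> bool:
    """Return True if a blacklist entry would block this title.

    Hypothetical helper: fullmatch plays the role of the implicit ^...$
    anchors, and the match is case-insensitive unless the entry carries
    the <casesensitive> option.
    """
    flags = 0 if casesensitive else re.IGNORECASE
    return re.fullmatch(entry, title, flags) is not None

# Without <casesensitive>, a genuine surname is caught as well.
print(entry_blocks(".*HAGGER.*", "Nicholas Hagger"))                      # True
# With <casesensitive>, only the all-caps vandal form is blocked.
print(entry_blocks(".*HAGGER.*", "Nicholas Hagger", casesensitive=True))  # False
print(entry_blocks(".*HAGGER.*", "HAGGER?!?!", casesensitive=True))       # True
# Because of the implicit anchors, a bare "HAGGER" blocks only that exact title.
print(entry_blocks("HAGGER", "HAGGER?!?!"))                               # False
```

The implicit anchoring is why the leading and trailing .* are needed to catch the word anywhere in a title.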
My Suggestion for dealing with Hagger
Obviously Hagger is having too much fun with this. That person keeps posting a URL that is meant to confuse people (given they put YouTube in the URL) *and* take them to his/her website. It looks like "nimp.org" is registered with GoDaddy. If enough people from Wikipedia complain to GoDaddy about Hagger's actions, perhaps we can get GoDaddy to revoke the domain registration based on their own Terms of Service seen here.
(Quote) "5. NO UNLAWFUL CONDUCT OR IMPROPER USE.
As a condition of Your use of Go Daddy ’s Software and Services, You agree not to use them for any purpose that is unlawful or prohibited by these terms and conditions, and You agree to comply with any applicable local, state, federal and international laws, government rules or requirements. You agree You will not be entitled to a refund of any fees paid to Go Daddy if, for any reason, Go Daddy takes corrective action with respect to Your improper or illegal use of its Services.
Go Daddy reserves the right at all times to disclose any information as Go Daddy deems necessary to satisfy any applicable law, regulation, legal process or governmental request, or to edit, refuse to post or to remove any information or materials, in whole or in part, in Go Daddy's sole discretion.
If You have purchased Services, Go Daddy has no obligation to monitor Your use of the Services. Go Daddy reserves the right to review Your use of the Services and to cancel the Services in its sole discretion. Go Daddy reserves the right to terminate Your access to the Services at any time, without notice, for any reason whatsoever.
Go Daddy reserves the right to terminate Services if Your usage of the Services results in, or is the subject of, legal action or threatened legal action, against Go Daddy or any of its affiliates or partners, without consideration for whether such legal action or threatened legal action is eventually determined to be with or without merit. Go Daddy may review every account for excessive space and bandwidth utilization and to terminate or apply additional fees to those accounts that exceed allowed levels.
Except as set forth below, Go Daddy may also cancel Your use of the Services, after thirty (30) days, if You are using the Services, as determined by Go Daddy in its sole discretion, in association with spam or morally objectionable activities. Morally objectionable activities will include, but not be limited to: activities designed to defame, embarrass, harm, abuse, threaten, slander or harass third parties; activities prohibited by the laws of the United States and/or foreign territories in which You conduct business; activities designed to encourage unlawful behavior by others, such as hate crimes, terrorism and child pornography; activities that are tortuous, vulgar, obscene, invasive of the privacy of a third party, racially, ethnically, or otherwise objectionable; activities designed to impersonate the identity of a third party; illegal access to other computers or networks (i.e., hacking); distribution of Internet viruses or similar destructive activities; and activities designed to harm or use unethically minors in any way. Notwithstanding anything to the contrary herein, in the event Go Daddy cancels Your Services during the first thirty (30) days after You purchase the Services, You will receive a refund of any fees paid to Go Daddy in connection with the Services being canceled. In the event Go Daddy deletes Your Services because they are being used in association with spam or morally objectionable activities, no refund will be issued. You agree You will not be entitled to a refund of any fees paid to Go Daddy if, for any reason, Go Daddy takes corrective action with respect to Your improper or illegal use of its Services. " (/END QUOTE)
and here.
(Quote) " GoDaddy.com, Inc. does not tolerate the transmission of spam. We monitor all traffic to and from our Web servers for indications of spamming and maintain a spam abuse compliant center to register allegations of spam abuse. Customers suspected to be using Go Daddy products and services for the purpose of sending spam are fully investigated. Once Go Daddy determines there is a problem with spam, Go Daddy will take the appropriate action to resolve the situation. Our spam abuse compliant center can be reached by email at abuse@godaddy.com.
How We Define Spam We define spam as the sending of Unsolicited Commercial Email (UCE), Unsolicited Bulk Email (UBE) or Unsolicited Facsimiles (Fax), which is email or facsimile sent to recipients as an advertisement or otherwise, without first obtaining prior confirmed consent to receive these communications from the sender. This **can include, but is not limited to**, the following:
1. Email Messages 2. Newsgroup postings 3. Windows system messages 4. Pop-up messages (aka "adware" or "spyware" messages) 5. Instant messages (using AOL, MSN, Yahoo or other instant messenger programs) 6. Online chat room advertisements 7. Guestbook or Website Forum postings 8. Facsimile Solicitations
" (/END QUOTE)
And they have a SPAM reporting tool seen here:
Feel free to jump in.
CaribDigita (talk) 15:45, 3 May 2008 (UTC)
All-uppercase entry
I think that we should only allow autoconfirmed users to create pages in which all the letters are uppercase. There are probably few cases where such pages are needed, except for abbreviations which already have articles. עוד מישהו Od Mishehu 16:38, 1 January 2008 (UTC)
- I disagree. There are still plenty of abbreviations that don't have pages. Also, most such page creations will be honest errors by well-meaning people. Such attempts should be welcomed and then corrected (or the other way around), not stopped in their tracks. Finally, autoconfirmed blocking of page creation is still impossible, if I understand correctly. - Andre Engels (talk) 19:54, 1 January 2008 (UTC)
- I agree with Andre, I think this would net us far too many false positives. Phrases are useful to block, but an entire style? Eh... EVula // talk // ☯ // 20:28, 1 January 2008 (UTC)
- Worth discussing, but as said, probably a bit of a heavy tool for a smallish problem. On the upside, ALL-CAPS draw the rapid attention of newpage patrollers. ;) – Luna Santin (talk) 23:59, 1 January 2008 (UTC)
- Blocking anonymous and new users from creating articles about abbreviations is worth discussing, as such users have WP:AFC. But blocking people from editing, say, KPMG or TIAA-CREF or AARP or MS-DOS or NEC? I don't think semi-protecting every page about an organization or product identified by a string of uppercase Latin letters is a good idea. --Damian Yerrick (talk | stalk) 00:37, 12 February 2008 (UTC)
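To make the false-positive concern concrete, here is a rough Python sketch of the proposed all-uppercase rule, tested against the titles listed above. Assumptions: the rule is modelled as "no lowercase ASCII letters anywhere in the title" and matched case-sensitively; a real entry would need a Unicode lowercase class (PCRE's \p{Ll}) rather than [a-z].

```python
import re

# Hypothetical rule: the title contains no lowercase ASCII letters.
ALL_CAPS = re.compile(r"[^a-z]*")

legit_titles = ["KPMG", "TIAA-CREF", "AARP", "MS-DOS", "NEC"]
blocked = [t for t in legit_titles if ALL_CAPS.fullmatch(t)]

# Every one of these valid article titles would be caught.
print(blocked)
```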
Edit protected request
{{editprotected}} Request that "/w/" be banned to prevent the notorious /w/wiki.php? and /w/index.php? SPAMmers. 68.39.174.238 (talk) 01:27, 2 January 2008 (UTC)
- The absence is intentional. east.718 at 02:28, January 2, 2008
- Why, so you can get SPAMmed? 68.39.174.238 (talk) 09:46, 3 January 2008 (UTC)
- There are legitimate reasons for allowing bots to create predictable vandalism. If a global title blacklist is implemented, which seems pretty likely, it will most certainly contain the /w/ and /index.php regexes. Cheers. --MZMcBride (talk) 23:13, 3 January 2008 (UTC)
- Isn't this a "global title blacklist"? 68.39.174.238 (talk) 07:59, 4 January 2008 (UTC)
- No, this blacklist is only for en.wikipedia. Identifying predictable spammers while instantly blocking them and cleaning up their spam on enwiki helps the stewards on the small wiki monitoring team shut them down on more vulnerable wikis before they do serious damage. east.718 at 08:07, January 4, 2008
- By the way, a global blacklist is already implemented. You should just file a Bugzilla request to create one on Wikimedia — VasilievVV (talk) 14:49, 4 January 2008 (UTC)
Interesting, but this is prone to host-phishing-style attacks
For instance, someone could replace the 'A' in 'HAGGER' with a Cyrillic 'A' or a Greek 'A' to go around the blacklist. And L337-speak would be another alternative (H4663R, anyone?). The blacklist might grow unwieldy if ever some vandal is dedicated enough. (But I guess, this can be used in conjunction with the user banning features, so maybe this is not too much of a problem.) --seav (talk) 02:49, 2 January 2008 (UTC)
- The beauty of regex means that we can solve this with one giant expression, rather than a huge blacklist. east.718 at 03:03, January 2, 2008
I just made MediaWiki talk:Titleblacklist/log.
I figure we should probably have something to keep track of what is added when, etc, just like the local spam link list has, but better. :P I figure that having a sortable table would be much cooler than a boring flat file log, so I did it. :P Cheers. --slakr\ talk / 06:33, 2 January 2008 (UTC)
BRIAN PEPPERS DAY???
This should be added to the list, as it is always a major disruption on February 21. Blueanode (talk) 20:09, 8 January 2008 (UTC)
- Declined. I can't seem to find any extensive history of titles like it being deleted or salted. The title blacklist is primarily for titles for which normal deletion/salting is insufficient to resolve the problem. But, lemme know if I missed something. --slakr\ talk / 21:24, 23 January 2008 (UTC)
- I found this lot recently. Is that related? • Anakin (talk) 13:59, 28 February 2008 (UTC)
Removed addition
I've reverted the addition of the anti-phone-number regex. Wikipedia's already got several articles on phone numbers (555-1212 and 867-5309 come to mind). --Carnildo (talk) 10:42, 28 January 2008 (UTC)
Namespace?
Is the title blacklist restricted to the mainspace? If not, what's preventing us from replacing the fairly crude protected-image system that stops users uploading images with titles such as Image:Picture.jpg?? Using this list would be a much more elegant solution. Happy‑melon 20:09, 26 February 2008 (UTC)
- Sure. It works for all namespaces — 213.181.10.210 (talk) 06:58, 15 March 2008 (UTC)
The "Jews did WTC" entry
It seems that this entry only prevents "JEWS DID WTC" (with quotation marks) from being created, whereas any variation without the quotation marks do not return the title blacklist message. Should the quotation marks be removed? TML (talk) 11:48, 4 March 2008 (UTC)
Hagger
People are using H.A.G.G.E.R now. Probably they'll move on to H..A..G..G..E..R.. or H-A-G-G-E-R. Should we add .*H\W*A\W*G\W*G\W*E\W*R.* now? Sceptre (talk) 01:56, 24 March 2008 (UTC)
- Depends. How many false positives will it generate? --Carnildo (talk) 05:58, 24 March 2008 (UTC)
- \W means non-"word" characters, so unless you consider H!@#A%^$#G!)(G./?E!)R)!)!!!!! to be a false positive, it shouldn't cause too many. --Random832 (contribs) 16:48, 9 April 2008 (UTC)
- I've changed it to .*H\W*(A|Α)\W*G\W*G\W*E\W*R.* after a page had been moved to HΑGGER???????????????????. עוד מישהו Od Mishehu 07:13, 15 April 2008 (UTC)
- Could someone please modify the HAGGER regexp so that the Н sign, which has recently been used, is not allowed at the beginning? I'm not sure whether .*(H|Н)\W*(A|Α)\W*G\W*G\W*E\W*R.* is the correct regexp. Thanks --Oxymoron83 09:45, 16 April 2008 (UTC)
- Done John Vandenberg (chat) 12:03, 16 April 2008 (UTC)
- I've also pre-emptively added other cyrillic and greek lookalikes --Random832 (contribs) 14:27, 18 April 2008 (UTC)
This vandal is also being disruptive at my sites Palaeos and EvoWiki. Can someone please use checkuser on the Hagger usernames on Wikipedia and tell me their IP addresses, so I can block those IP addresses at Palaeos (and get someone else to block these usernames on EvoWiki, since I don't have administrator privileges there)? --Fang 23 (talk) 14:25, 26 April 2008 (UTC)
- Declined - that would violate the privacy policy. Stifle (talk) 11:31, 15 May 2008 (UTC)
He's now starting to use the character Ң. Perhaps it's time to add that to the list. 128.2.152.135 (talk) 07:53, 4 May 2008 (UTC)
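A Python sketch of the HAGGER entry as it stood after the edits above, extended with the Ң (U+04A2, Cyrillic capital en with descender) reported here. Assumptions: the lookalikes are written as \u escapes so they survive copy-and-paste, and the pattern is matched case-sensitively (as if <casesensitive> were set) so the genuine surname stays unmatched; this is an illustration, not the live blacklist line.

```python
import re

# H or its Cyrillic lookalikes, then A or Greek alpha; \W* tolerates
# separators such as ".", "-", spaces or "?" between the letters.
HAGGER = re.compile(r".*(H|\u041d|\u04a2)\W*(A|\u0391)\W*G\W*G\W*E\W*R.*")

samples = {
    "H.A.G.G.E.R": True,          # dotted separators
    "H-A-G-G-E-R": True,          # hyphen separators
    "H\u0391GGER???": True,       # Greek capital alpha
    "\u041dAGGER": True,          # Cyrillic capital en
    "\u04a2AGGER": True,          # Cyrillic en with descender
    "Nicholas Hagger": False,     # genuine surname, case-sensitive match
}
for title, expected in samples.items():
    assert (HAGGER.fullmatch(title) is not None) == expected, title
    print(title, expected)
```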
.*[!?]{3,}.*
I've modified the ! and ? regex to apply to all users except sysops. I can't think of any legitimate use of more than three question marks or exclamation points, and we've had some page move vandalism lately that has abused this form of punctuation. --MZMcBride (talk) 00:19, 2 April 2008 (UTC)
- There's an indie band by that name. They might run into trouble if they release a new album that needs to be disambiguated. Sceptre (talk) 02:54, 2 April 2008 (UTC)
- Agreed, it ought to be {4,} which would make us safe until !!! releases an album called ????...—/Mendaliv/2¢/Δ's/ 13:24, 17 September 2008 (UTC)
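The quantifier {3,} means "three or more", so the rule as added blocks the band's own name, while {4,} matches only runs longer than three. A quick Python check (the "(band)" title is a hypothetical disambiguation):

```python
import re

THREE_PLUS = re.compile(r".*[!?]{3,}.*")  # rule as added
FOUR_PLUS = re.compile(r".*[!?]{4,}.*")   # suggested loosening

for title in ["!!!", "!!! (band)", "HAGGER????"]:
    print(title, THREE_PLUS.fullmatch(title) is not None,
          FOUR_PLUS.fullmatch(title) is not None)
# !!! True False
# !!! (band) True False
# HAGGER???? True True
```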
Two rules removed
The rule .*[\p{P}\p{Mc}\p{Z}]{4,}.* is causing problems at Portal:Indianapolis/On this day.../April 9 and other similar pages. The rule .{135,} <autoconfirmed> blocks the creation of pages with long page titles. Both of these rules are evidently causing more harm than good, so I've removed them. --- RockMFR 19:12, 9 April 2008 (UTC)
- Well, the second rule only applied to non-autoconfirmed users. Have there been any actual complaints about the regex or is this purely theoretical at the moment? --MZMcBride (talk) 20:02, 9 April 2008 (UTC)
- By the way, I wouldn't totally remove .*[\p{P}\p{Mc}\p{Z}]{4,}.* because it catches excessive punctuation that's common in that herbie-or-whatever dude's vandal creations/moves. I simply didn't account for ".../". Instead, consider simply upping the minimum count to perhaps 6 or 7. --slakr\ talk / 03:33, 17 April 2008 (UTC)
- I should also stick a link to Unicode character properties so that the \p{properties} stuff makes sense. E.g., \p{P} matches all punctuation characters, no matter how inverted or funky they are. There's a cool table on that page with the full list. --slakr\ talk / 03:43, 17 April 2008 (UTC)
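Python's re module has no \p{...} syntax, but unicodedata.category returns the same Unicode general categories, so the character class [\p{P}\p{Mc}\p{Z}] can be emulated to test slakr's suggestion. A sketch, assuming a hypothetical helper longest_run that measures the longest run of P*, Mc and Z* characters:

```python
import unicodedata

def longest_run(title: str) -> int:
    # Length of the longest run of characters in categories P* (punctuation),
    # Mc (spacing combining mark) or Z* (separators), i.e. the characters
    # that [\p{P}\p{Mc}\p{Z}] matches in PCRE.
    best = current = 0
    for ch in title:
        cat = unicodedata.category(ch)
        if cat[0] in "PZ" or cat == "Mc":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

title = "Portal:Indianapolis/On this day.../April 9"
print(longest_run(title))       # 4, from ".../"
print(longest_run(title) >= 4)  # True: blocked by {4,}
print(longest_run(title) >= 6)  # False: allowed by {6,}
```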
Removed redundant entries
There were several HAGGER entries that were redundant with (i.e. matched strict subsets of) the most recently added one. Let's try to keep the list clean. Also, the first one (with nothing in between the letters) is case-sensitive (to prevent blocking articles that have "Hagger" as a genuine surname, I expect), but these are caught by one of the other entries. Any opinions on what to do about this? --Random832 (contribs) 14:45, 18 April 2008 (UTC)
Removed entry
I've removed .*[\p{Mc}]{4,}.* from the list.
- According to my regex reference (Programming Perl), that's four consecutive combining marks, not four consecutive space-like characters, which would be perfectly valid (if rare) in a title.
- How common a problem are such titles, anyways?
--Carnildo (talk) 05:42, 29 April 2008 (UTC)
- Would two in a row ever be necessary? I think GRAWP, et al. have been abusing "special" spaces. --MZMcBride (talk) 06:11, 29 April 2008 (UTC)
- If something's actually showing up often enough to be a problem, then adding it to the blacklist is fine. --Carnildo (talk) 06:21, 29 April 2008 (UTC)
- Oops, that should have been \p{Z}, but technically multiple space combining characters in a row is equally problematic (especially for that user). --slakr\ talk / 06:22, 29 April 2008 (UTC)
- You mean special spaces like " ," " ," " ," " ," " ," " ," " ," " ," " ," " ," "," " "? Surely these can be added.—Ryūlóng (竜龙) 06:25, 29 April 2008 (UTC)
- Interestingly, MediaWiki doesn't, from what I can tell, support multiple regular spaces consecutively (i.e., _____ snaps back to _). Special spaces can be consecutive, though. --MZMcBride (talk) 06:25, 29 April 2008 (UTC)
- Ryu: that's what the \p{Z} should nab. Though, if mw doesn't support multiple regular spaces, then it might be an idea to reduce the "4" down to "3" or "2", unless someone can think of a typical instance where it's needed. *shrug* --slakr\ talk / 06:30, 29 April 2008 (UTC)
- Those are unicode spaces. One is the unicode full width ideographic space. These should be added, but I don't know how to do it myself.—Ryūlóng (竜龙) 09:25, 29 April 2008 (UTC)
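A quick way to check which class a given character falls into is Python's unicodedata, whose general categories are the ones PCRE's \p{Z} and \p{Mc} refer to. The sketch below shows that the ideographic space mentioned above is Zs (so \p{Z} catches it), while \p{Mc} covers spacing combining marks such as the Devanagari visarga, which was Carnildo's point; the character selection is illustrative only.

```python
import unicodedata

# A few "special spaces", plus one spacing combining mark for contrast.
chars = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
    "DEVANAGARI SIGN VISARGA": "\u0903",
}
for name, ch in chars.items():
    assert unicodedata.name(ch) == name
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}")
# The four spaces print Zs; the visarga prints Mc.
```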
Why is throw under the bus blocked?
Come on, it's the hottest cliché out there right now! Why is it blocked?--The lorax (talk) 06:49, 1 May 2008 (UTC)
- A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 07:27, 1 May 2008 (UTC)
Two odd inclusions
Any reason for including John Cabell Breckenridge, which should surely redirect to John C. Breckenridge, or Talk:Johann Jakob Breitinger (the talk page for Johann Jakob Breitinger)? Frickeg (talk) 07:02, 1 May 2008 (UTC)
- Those titles were being blocked by .*[\x{2100}-\x{214F}].*; that regex has been removed for the time being. --MZMcBride (talk) 07:26, 1 May 2008 (UTC)
- Okay, anyone have any idea what's wrong with it? It works in Perl. —Ilmari Karonen (talk) 07:29, 1 May 2008 (UTC)
- Aha, found it. Apparently PHP's Unicode implementation considers the character "\x{212A}" (Kelvin sign, K) equivalent to the letter "K", and applies said equivalence rule even inside regular expressions, such that the letter "K" matches the Unicode character range "[\x{2100}-\x{214F}]". Same apparently goes for "\x{212B}" (Angstrom sign, Å) vs. "Å" as well as "\x{2126}" (Ohm sign, Ω) vs. "Ω".
- Fortunately, none of those unit sign characters are usable in titles anyway: they get automatically normalized to their simpler equivalents. (Using them in wikilinks gives odd results, with apparent redlinks leading to existing pages: Ω, K, Å.) Anyway, removing those and some other non-letterlike symbols from the range, I get the following fixed regexp, which I've tested on a local MediaWiki installation:
- .*[ℂ℃℄ℇ℈℉ℊℋℌℍℎℏℐℑℒℓℕ№℗℘ℙℚℛℜℝ℞℟℣ℤℨ℩ℬℭ℮ℯℰℱℲℳℴℹ℺⅁⅂⅃⅄ⅅⅆⅇⅈⅉⅎ].* <casesensitive>
- That should take care of all the latin letter lookalikes in that range; of course, I'm sure there are more in other parts of the Unicode repertoire. And now, having spilled the beans, I'd better go add that regexp to the blacklist before our friend, who I'm sure is reading this talkpage, saves a copy of that list to cut and paste from. —Ilmari Karonen (talk) 01:21, 3 May 2008 (UTC)
- Nice work. : - ) --MZMcBride (talk) 01:32, 3 May 2008 (UTC)
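The Kelvin/Angstrom/Ohm behaviour described above can be reproduced with Python's string and unicodedata functions; this shows Python's Unicode data, not PHP's PCRE internals. The unit signs lowercase to ordinary letters (which is how a case-insensitive range can end up matching "K"), and NFC normalization, assumed here to be what MediaWiki applies to titles, replaces them outright. The last lines check the fixed letters-only class case-sensitively.

```python
import re
import unicodedata

# Case mapping of the unit signs: they fold to ordinary letters.
print("\u212a".lower() == "k")        # KELVIN SIGN -> k
print("\u212b".lower() == "\u00e5")   # ANGSTROM SIGN -> å
print("\u2126".lower() == "\u03c9")   # OHM SIGN -> ω
# NFC maps them to K, Å and Ω, so they never survive in a title.
print(unicodedata.normalize("NFC", "\u212a\u212b\u2126") == "K\u00c5\u03a9")

# The fixed, letters-only class, matched case-sensitively.
LETTERLIKE = re.compile(r".*[ℂ℃℄ℇ℈℉ℊℋℌℍℎℏℐℑℒℓℕ№℗℘ℙℚℛℜℝ℞℟℣ℤℨ℩ℬℭ℮ℯℰℱℲℳℴℹ℺⅁⅂⅃⅄ⅅⅆⅇⅈⅉⅎ].*")
print(LETTERLIKE.fullmatch("John Cabell Breckenridge"))  # None
print(LETTERLIKE.fullmatch("\u210dAGGER") is not None)   # True (ℍAGGER)
```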
What's Going On?
All of a sudden, I can't leave warnings on anon talk pages. It's coming up "Unauthorized title". See [1] for example. AnmaFinotera (talk) 02:57, 3 May 2008 (UTC)
- A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 03:06, 3 May 2008 (UTC)
- Ah...thanks, all good again :) AnmaFinotera (talk) 03:08, 3 May 2008 (UTC)
Edward Henry Lewinski Corwin
Please check Edward H. L. Corwin which was rejected as Edward Henry Lewinski Corwin.-- Matthead Discuß 02:59, 3 May 2008 (UTC)
- A regular space was accidentally included in the blacklist. That title should work again. --MZMcBride (talk) 03:06, 3 May 2008 (UTC)
- I wish people would stop trying to get cute with blocking unusual characters -- they keep blocking huge numbers of valid article titles by accident. --Carnildo (talk) 03:20, 3 May 2008 (UTC)
- I wish Grawp and Grawp socks would stop moving tens of pages to obscure Unicode titles. The blacklist is updated when there are attacks. The most recent one was by User talk:I think 2 + 2 = 22.. When Random updated the blacklist, Firefox converted a non-breaking space into a regular space -- not really anyone's fault. The issue has been fixed going forward. --MZMcBride (talk) 03:24, 3 May 2008 (UTC)
Recent error
The brief inclusion of a regular space character in the title blacklist was due to an error in Firefox's implementation of forms, and has been fixed by replacing the non-breaking space in one of the regexes with the code "\x{00A0}". The problem should not recur. I apologize for the inconvenience. --Random832 (contribs) 03:21, 3 May 2008 (UTC)
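Why one mangled character blocked so many titles: an entry meant to match a non-breaking space instead matched every ordinary space. The actual regex isn't quoted here, so the two one-character entries below are hypothetical stand-ins, checked with Python's re (which spells the escape \u00a0 where the blacklist's PCRE uses \x{00A0}).

```python
import re

title = "Edward Henry Lewinski Corwin"

# Intended: the non-breaking space written as an escape.
escaped = re.compile(r".*\u00a0.*")
# What the browser form produced: the NBSP silently became a plain space.
mangled = re.compile(r".* .*")

print(escaped.fullmatch(title) is None)       # True: title allowed
print(mangled.fullmatch(title) is not None)   # True: every multi-word title blocked
```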
More Hagger substitutes
{{editprotected}}
Please modify the hagger regexp to blacklist the title ¿¿¿H Å G G Ệ Ŕ!. Thanks. MER-C 10:05, 10 May 2008 (UTC)
- Done. It seems there are three blacklist lines for this title, where one would probably suffice. -- zzuuzz (talk) 10:36, 10 May 2008 (UTC)
Another one: ¿Я Ǝ อ อ Ά H. MER-C 11:46, 12 May 2008 (UTC)
- I think this diff did that. Woody (talk) 11:50, 12 May 2008 (UTC)
And another... HΑGGĘRʔ (ʔs repeated). Kesac (talk) 02:18, 14 May 2008 (UTC)
Reorganized
I've just made some major changes to the list:
- Reorganized the list into sections based on what each regexp is checking for, hopefully making it easier to maintain.
- Added inline comments to most of the entries, except for the most self-explanatory ones.
- Rewrote the "HAGGER" regexps based on some grepping of Unicode character tables. The new regexps should match everything the old ones did, and plenty more variants besides, but keep an eye open for additions just in case.
- Added some new regexps to catch any mixed-script (i.e. Latin/Greek, Latin/Cyrillic, etc.) titles containing letters from the "HAGGER" regexps. This should help reduce the number of possible titles each new lookalike character makes available to the vandal.
I'm sure there are further improvements that could be made; for example, several characters in the HAGGER regexps may be obscure enough that they'd be worth blacklisting individually. On the converse side, I've expanded the whitelist to include all single-character titles; most of them could be potentially valid redirects to articles about the character or symbol in question. I've also compiled a list, User:Ilmari Karonen/Funnycode, of existing article titles (as of April 28) containing characters that the current blacklist disallows entirely; some of the characters in the list might be worth allowing after all (the "Other punctuation" class matches a whole lot of them), while others are just obvious errors and may require cleanup (though most are just leftover redirects). —Ilmari Karonen (talk) 19:27, 14 May 2008 (UTC)
I excluded the miscellaneous symbols and dingbats ranges, as well as the "№" and "™" signs, from the "other punctuation" regexp; the list of matching existing titles is a lot shorter now. A big fraction of the remaining ones contain non-breaking spaces; it might be worth adding a custom error message for those, since the default one we have isn't very informative. —Ilmari Karonen (talk) 20:10, 14 May 2008 (UTC)
...as I've just done: see MediaWiki:Titleblacklist-custom-nbsp. —Ilmari Karonen (talk) 20:33, 14 May 2008 (UTC)
Last addition
The last addition (this one) seems to be preventing User:Петър Петров from creating user subpages. See this ANI thread. Perhaps it was too wide and needs to be reverted. Stifle (talk) 11:24, 15 May 2008 (UTC)
- Yeah, sorry, that regexp was just broken: I basically just forgot to consider non-article titles. I've prefixed the mixed-script regexps with (?!(User|Wikipedia|Image|MediaWiki|Template|Help|Category|Portal)( talk)?:|Talk:), which should prevent them from matching outside mainspace. —Ilmari Karonen (talk) 14:02, 15 May 2008 (UTC)
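The mixed-script regexps themselves aren't quoted on this page, so this Python sketch pairs the namespace guard quoted above with a hypothetical stand-in (Latin and Cyrillic letters in one title, in either order). The subpage name is made up for illustration.

```python
import re

NS_GUARD = r"(?!(User|Wikipedia|Image|MediaWiki|Template|Help|Category|Portal)( talk)?:|Talk:)"
# Hypothetical stand-in for a mixed-script rule.
MIXED = r".*([A-Za-z].*[\u0400-\u04ff]|[\u0400-\u04ff].*[A-Za-z]).*"

unguarded = re.compile(MIXED)
guarded = re.compile(NS_GUARD + MIXED)

user_page = "User:Петър Петров/Sandbox"   # hypothetical subpage
vandal_title = "HAGG\u0415R"              # Cyrillic capital IE in place of E

print(unguarded.fullmatch(user_page) is not None)   # True: the reported false positive
print(guarded.fullmatch(user_page) is not None)     # False: lookahead skips User:
print(guarded.fullmatch(vandal_title) is not None)  # True: mainspace still covered
```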
Soft hyphen
Could someone replace the soft hyphen with the appropriate \x{} code? Since it's a non-printing character, it could easily get lost when someone edits the page. --Carnildo (talk) 01:42, 17 May 2008 (UTC)
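One way to find and replace such characters before saving is to rewrite every invisible format character (category Cf, which includes the soft hyphen U+00AD) and every non-ASCII separator as a PCRE \x{XXXX} escape, the same form used above for the non-breaking space. A Python sketch; escape_invisible is a hypothetical helper, not a MediaWiki tool.

```python
import unicodedata

def escape_invisible(entry: str) -> str:
    """Rewrite Cf characters and non-ASCII separators as \\x{XXXX} escapes."""
    out = []
    for ch in entry:
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat[0] == "Z" and ch != " "):
            out.append("\\x{%04X}" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

print(escape_invisible(".*\u00ad.*"))  # .*\x{00AD}.*
print(escape_invisible(".*\u00a0.*"))  # .*\x{00A0}.*
```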
Existing matching titles
I ran a script that compared the blacklist against the latest database dump (from April 25), and uploaded the results at User:Ilmari Karonen/Badtitles. The list contains a lot of broken titles as well as plenty of vandal userpages, but there are also genuine false positives that may be worth looking into. For example, there seem to be quite a few titles that are listed because they contain three consecutive exclamation points; it might be worth loosening up that particular rule a little. Another similar case is titles like Talk:(-)-borneol dehydrogenase, where the colon in "Talk:" is enough to push the number of consecutive punctuation characters to five and thus hit the blacklist. —Ilmari Karonen (talk) 23:06, 17 May 2008 (UTC)
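The script itself isn't shown here; the following Python sketch only illustrates the idea of running every (anchored, case-insensitive) entry against a list of titles. The second entry is a hypothetical ASCII-only approximation of the punctuation-run rule, since Python's re lacks \p{P}.

```python
import re

def scan(titles, entries):
    """Yield (title, entry) pairs where an entry matches the whole title."""
    compiled = [(e, re.compile(e, re.IGNORECASE)) for e in entries]
    for title in titles:
        for entry, rx in compiled:
            if rx.fullmatch(title):
                yield title, entry

entries = [r".*[!?]{3,}.*", r".*[:()\-.,/!?]{5,}.*"]
titles = ["!!!", "Talk:(-)-borneol dehydrogenase", "Borneol"]
hits = list(scan(titles, entries))
for title, entry in hits:
    print(f"{title!r} matched {entry}")
```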
- First, let me say you've been doing great work lately. Truly. : - )
As for the blacklist issues, I would be more inclined to write a few custom error messages for the special cases (!!!, etc.) than allow a lot of repeated punctuation. It seems more sensible to block moving on this specific (small) subset of articles, which in reality probably won't be moved much, than allow other page move vandalism that has tens of exclamation points or tens of question marks. --MZMcBride (talk) 23:23, 17 May 2008 (UTC)
- True, but we could still make a specific exception for exactly three exclamation points in a row. Something like .*([?‽¿][!?‽¿][!?‽¿]|![?‽¿][!?‽¿]|!![?‽¿]|!!!!).* would do it, though it does look rather messy. The problem is that the blacklist doesn't just affect moves; it also, for example, prevents the creation of talk pages for the matched articles (assuming the regexp isn't explicitly made namespace-specific), as well as the archival of any existing talk pages. —Ilmari Karonen (talk) 00:39, 18 May 2008 (UTC)
- Perhaps whitelist just "!!!"? --MZMcBride (talk) 05:01, 19 May 2008 (UTC)
- The whitelist, unfortunately, is useless for this: if you were to add, say, .*!!!.* to the whitelist, that would allow every title containing "!!!", even ones that would be blacklisted for any other reason. Basically, it's only good for regexps narrow enough that one can be sure that any title matching any of them is valid. To exclude certain titles from matching only one blacklist entry, the entry itself needs to be altered. (It does occur to me, though, that it should be possible to simplify the version I suggested above using negative lookbehind: .*[!?‽¿]{3}(?<!!!!).*, with a separate regexp to match "!!!!".) —Ilmari Karonen (talk) 06:09, 19 May 2008 (UTC)
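A Python check that the two proposed forms agree: the long alternation, and the negative-lookbehind version paired with a separate "!!!!" entry. Both allow exactly "!!!" and block any other run of three. The sample titles are illustrative, and "!!! (band)" is hypothetical.

```python
import re

# Option 1: spell out every three-character run except exactly "!!!".
ALTERNATION = re.compile(r".*([?‽¿][!?‽¿][!?‽¿]|![?‽¿][!?‽¿]|!![?‽¿]|!!!!).*")
# Option 2: negative lookbehind, plus a separate entry for "!!!!".
LOOKBEHIND = [re.compile(r".*[!?‽¿]{3}(?<!!!!).*"), re.compile(r".*!!!!.*")]

def blocked_by_lookbehind(title):
    return any(rx.fullmatch(title) for rx in LOOKBEHIND)

for title in ["!!!", "!!! (band)", "Wow!!!!", "HAGGER???", "Why?!!"]:
    a = ALTERNATION.fullmatch(title) is not None
    b = blocked_by_lookbehind(title)
    assert a == b, title
    print(f"{title!r}: blocked={a}")
```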
Latest Grawp stuff
Apparently Wikipedia:Ή.A.G.G.E.R.?.. wasn't protected. Can we fix this? NawlinWiki (talk) 01:08, 19 May 2008 (UTC)
- And more of same: Wikipedia:ḤAGGER??. Looks like moves to the Wikipedia namespace aren't blacklisted. Let's fix this quick. NawlinWiki (talk) 01:11, 19 May 2008 (UTC)
- Done. Let me know if there's any more. —Ilmari Karonen (talk) 03:59, 19 May 2008 (UTC)
- Thanks! NawlinWiki (talk) 11:14, 19 May 2008 (UTC)
- And the latest: see here. He's now using IBHHFS ("HAGGER" + 1 letter for each), and there are some interesting symbol combinations that I would have thought would be blocked. NawlinWiki (talk) 11:37, 19 May 2008 (UTC)
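IBHHFS is HAGGER with every letter shifted one place along the alphabet (a Caesar shift). A hypothetical Python helper for generating such variants, should anyone want to pre-empt the next shift; whether to blacklist them is a separate judgment call.

```python
def shift_word(word: str, k: int) -> str:
    """Shift each ASCII capital letter k places, wrapping Z -> A."""
    return "".join(
        chr((ord(c) - ord("A") + k) % 26 + ord("A")) if "A" <= c <= "Z" else c
        for c in word
    )

print(shift_word("HAGGER", 1))                        # IBHHFS
print([shift_word("HAGGER", k) for k in range(1, 4)])
```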
(unindent) Hm, I think it would be best if we banned any character from being repeated more than 5 times and if we added a custom error message for this particular case. Also, it seems that some type of upside-down question mark was able to be used. That should be added to the question mark line. --MZMcBride (talk) 17:01, 19 May 2008 (UTC)
- Added the ¿ to the question mark line. I'll let Ilmari take care of the rest -- I'm afraid I'd break something if I tried. NawlinWiki (talk) 18:20, 19 May 2008 (UTC)
- I've added some regexps to catch these. In particular, it turned out that PCRE's definition of "punctuation" ([\p{P}]) was rather narrow; I replaced it with "not a letter, 0-9 or space" ([^\p{L}\d ]), which should match for example "^" too. I also implemented MZMcBride's suggestion of blacklisting any character repeated five or more times (except "0", since too many numbers contain lots of zeros). Oh, and I added "IBHHFS" as well as "IFSNZ", though I'm sure these will not slow him much. (I think it's pretty safe to add simple rules like ".*IBHHFS.*" even if you're not familiar with regexps; it's only when you involve trickier regexp syntax or odd Unicode characters that things can break.) —Ilmari Karonen (talk) 22:08, 19 May 2008 (UTC)
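Why "^" slipped past [\p{P}] can be seen by looking at Unicode general categories. Python's standard re module has no \p{...} classes, so this rough sketch classifies characters with unicodedata.category instead; the function names are made up for illustration, and "0-9" is taken literally as the ASCII digits, as described above.

```python
import unicodedata


def in_p_class(ch: str) -> bool:
    # Approximates [\p{P}]: Unicode general category P* (punctuation) only.
    return unicodedata.category(ch).startswith("P")


def in_replacement_class(ch: str) -> bool:
    # Approximates [^\p{L}\d ]: not a letter, not an ASCII digit 0-9, not a plain space.
    return not (unicodedata.category(ch).startswith("L") or ch in "0123456789 ")


for ch in "^!?¿∑a5":
    print(repr(ch), unicodedata.category(ch), in_p_class(ch), in_replacement_class(ch))
```

"^" is a modifier symbol (Sk) and "∑" a math symbol (Sm), so neither counts as punctuation, but both fall under the broader replacement class.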
- Added ∑ to the e's section of the Haggers based on vandalism from last night. Please let me know if I didn't do this right. NawlinWiki (talk) 14:14, 28 May 2008 (UTC)
- Looks fine to me. —Ilmari Karonen (talk) 15:02, 28 May 2008 (UTC)
- Oops, looks like your edit lost the closing bracket ("]") from the character class. Didn't spot that the first time. Never mind, I've put it back. —Ilmari Karonen (talk) 01:06, 29 May 2008 (UTC)
- Thanks. Latest from last night is moves like this -- any ideas? NawlinWiki (talk) 11:56, 29 May 2008 (UTC)
- I made some changes to the regexp that should catch those. —Ilmari Karonen (talk) 15:33, 29 May 2008 (UTC)
- Report from last night -- now using repeating letters such as HAGGGER, HAAGGER, etc., and one HWAGGER. Your next challenge... Thanks for all you do, NawlinWiki (talk) 11:40, 30 May 2008 (UTC)
- How did this get through the blacklisting? NawlinWiki (talk) 03:16, 31 May 2008 (UTC)
- He's using Greek/Cyrillic characters to get around the ".*Grawp.*" blacklist entry. Instead of expanding it to include lookalikes, as has been done for the HAGGER entries, I decided to add a blanket prohibition against pagemoves to mixed-script titles. This will probably cause some false positives, but hopefully not too many; page moves aren't all that common anyway. On the other hand, if it works it ought to make it much harder to get around other blacklist rules. Anyway, if it causes too much trouble for legitimate editors, please revert it. —Ilmari Karonen (talk) 03:40, 31 May 2008 (UTC)
- I don't think this will be a problem -- thanks! NawlinWiki (talk) 03:44, 31 May 2008 (UTC)
Incorrect blacklisting
Which blacklist rule was preventing Ιερουσαλήμ (Greek: Jerusalem) from being created? --Carnildo (talk) 23:00, 19 May 2008 (UTC)
- This one: ".*[ΉḤĤĦɧ⒣Ⓗⓗ].*". The first character in the brackets is an uppercase eta with tonos, and since the entry is case-insensitive, it matches "ή". That regexp is a bit problematic in other ways too: the "Ḥ", for example, matches quite a few Arabic names. It might help if it were made case-sensitive; that would reduce the false positives a bit. Of course, that would make it somewhat less effective, too. —Ilmari Karonen (talk) 01:24, 20 May 2008 (UTC)
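The case-folding effect can be reproduced with Python's re module, which is only an approximation of the PCRE engine the blacklist uses. The sketch assumes that MediaWiki's automatic ^...$ anchoring behaves like re.fullmatch, that entries are case-insensitive unless <casesensitive> is set, and that titles are NFC-normalized; the function name is hypothetical.

```python
import re
import unicodedata

# The entry quoted above.
ETA_ENTRY = r".*[ΉḤĤĦɧ⒣Ⓗⓗ].*"


def blocked(title: str, casesensitive: bool = False) -> bool:
    title = unicodedata.normalize("NFC", title)  # assume NFC-normalized titles
    flags = 0 if casesensitive else re.IGNORECASE
    return re.fullmatch(ETA_ENTRY, title, flags) is not None


print(blocked("Ιερουσαλήμ"))                      # case-insensitive: "ή" folds onto "Ή"
print(blocked("Ιερουσαλήμ", casesensitive=True))  # case-sensitive: no match
```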
And another incorrect blacklisting: User_talk:Nooooob. I assume it's ".*([^0])\1{4}.*" that's the problem; would it make sense to make that rule an article-only rule? --Carnildo (talk) 23:31, 28 May 2008 (UTC)
- Maybe. Or simply exclude the User and User_talk namespaces. Vandalism in Wikipedia:, while not as "bad," is still annoying. --MZMcBride (talk) 23:58, 28 May 2008 (UTC)
- I agree, these should only be restricted from the article space, or at least permitted in User and User talk (for user pages/subpages and talk pages/archives), and Wikipedia and Wikipedia talk (for SSPs, RFAs, RFCUs and MFDs). --Snigbrook (talk) 23:59, 28 May 2008 (UTC)
- See also, the "moveonly" section below: repeated letters could be allowed in User and User talk, and allowed with the exception of page move targets in Wikipedia and Wikipedia talk, if there is a problem with vandalism. --Snigbrook (talk) 00:04, 29 May 2008 (UTC)
- Added <moveonly> to the ".*([^0])\1{4}.*" entry. Might be worth considering it for some other entries in that section too. —Ilmari Karonen (talk) 15:31, 29 May 2008 (UTC)
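The repeated-character entry uses a backreference (\1) to require four more copies of whatever non-"0" character was captured. Python's re supports the same syntax, so it can serve as a quick check; re.fullmatch is an assumed stand-in for MediaWiki's automatic anchoring, and the <moveonly> restriction is not modelled here.

```python
import re

# Any character other than "0", followed by four more copies of itself.
REPEAT_ENTRY = r".*([^0])\1{4}.*"


def matches(title: str) -> bool:
    # Anchored at both ends; case-insensitive, like an entry without <casesensitive>.
    return re.fullmatch(REPEAT_ENTRY, title, re.IGNORECASE) is not None


for title in ["User talk:Nooooob", "User:Snigbrook/Sandbox 3aaaaa", "1000000", "Nooob"]:
    print(title, matches(title))
```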
<moveonly>
I recently committed rev:35163, which adds support for a <moveonly> option that makes the blacklist entry apply only to page move targets. That means we can now more easily add regexps to catch pagemove vandalism without having to worry about also hitting things like legitimate redirects that would never be targets for a move (such as Ιερουσαλήμ above). I've added the option to the "HAGGER" regexps, but it occurs to me that some further simplification of the regexps might be possible. —Ilmari Karonen (talk) 08:28, 23 May 2008 (UTC)
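To make the <moveonly> semantics concrete, here is a hypothetical Python model of a blacklist entry; the Entry class and its blocks method are illustrative names, not the extension's own code. It reuses the eta entry from the "Incorrect blacklisting" thread and assumes anchored, case-insensitive matching by default.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical model of one blacklist line.
    regex: str
    casesensitive: bool = False
    moveonly: bool = False

    def blocks(self, title: str, action: str) -> bool:
        # A <moveonly> entry is only consulted when the title is a page-move target.
        if self.moveonly and action != "move":
            return False
        flags = 0 if self.casesensitive else re.IGNORECASE
        title = unicodedata.normalize("NFC", title)
        return re.fullmatch(self.regex, title, flags) is not None


eta = Entry(r".*[ΉḤĤĦɧ⒣Ⓗⓗ].*", moveonly=True)
print(eta.blocks("Ιερουσαλήμ", "create"))  # creating the redirect is allowed
print(eta.blocks("Ιερουσαλήμ", "move"))    # moving a page there is still blocked
```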
The "log"
The log isn't being kept up to date, and it duplicates the function of the page history. I'd like to propose discontinuing use of the log page. --Random832 (contribs) 16:37, 23 May 2008 (UTC)
.*skater.* <moveonly>
Is there really enough of a problem with pagemoves to "skater" that we need to make it harder to disambiguate articles for sports figures? --Carnildo (talk) 23:22, 7 June 2008 (UTC)
- We had someone moving hundreds of user talk pages to titles including "skater girl". I'll change it to that rather than just "skater". NawlinWiki (talk) 03:07, 9 June 2008 (UTC)
"On wheels"
"On wheels" is contained in the title of several articles, and the entry here has been changed as it was previously in capitals but is now in lowercase. It would cause problems for deletion nominations (and possible featured article nominations), and in at least one case, prevent the article from being discussed as the talk page cannot be created. It has been used in page move vandalism but has not, as far as I know, been used often in the titles of vandalism articles. Could the entry be changed to have a <moveonly> next to it? --Snigbrook (talk) 15:26, 11 June 2008 (UTC)
- Done. —Ilmari Karonen (talk) 16:46, 11 June 2008 (UTC)
Disable userpage moves
I propose disabling userpage moves, allowing non-admin users to move user subpages only. The code is currently live on testwiki, see testwiki:MediaWiki:Titleblacklist and the message testwiki:MediaWiki:Titleblacklist-custom-userpagemove. Technically, user_talk moves could be disabled as well, but unfortunately some users archive their discussions this way. —AlexSm 18:07, 11 June 2008 (UTC)
- Would it be possible to allow non-admin users to move their own user pages but not the pages of other users? --Snigbrook (talk) 20:17, 11 June 2008 (UTC)
- I filed a bug about this type of thing a while ago, bugzilla:13883. While it's trivial to block non-sysops from moving user pages and user talk pages, the issue is that the user themselves wouldn't be able to move their own user page, for better or for worse. I've seen quite a few users moving their userpage to the main namespace, however, so perhaps blocking the root user page from being moved would be beneficial. And, we can always include a custom error message that links to WP:RM or some such. --MZMcBride (talk) 23:10, 11 June 2008 (UTC)
- Correction: looks like Titleblacklist only checks move target. This means that vandals would still be able to move user pages into other namespaces. The suggested regex is only useful to stop newbies from self-renaming (and the custom error message should link to WP:Changing username). I have no idea if this happens often or not. One drawback: this regex would also prevent non-admins from reverting vandal userpage moves. —AlexSm 16:27, 12 June 2008 (UTC)
I can't see a lot of reason to move pages in wikipedia or template space. Some talk pages are move-archived, but can we have all pages outside namespace 0 and maybe 1 automatically move-restricted to admins? Gimmetrow 08:39, 13 June 2008 (UTC)
- That would mean a regular editor couldn't so much as rename an essay without filing a move request? I've moved pages in Wikipedia space and around in someone else's user space, and there's lots of legitimate reasons to do so. --JayHenry (talk) 04:25, 30 June 2008 (UTC)
- There may be some legit reasons, but there aren't a lot of them. This problem has been going on for months and titleblacklist apparently does little. How about restricting page moves to rollbackers, at least? Gimmetrow 00:23, 22 July 2008 (UTC)
- I don't think this would add much of a work requirement to admins or rollbackers. Stifle (talk) 13:16, 22 September 2008 (UTC)
New Grawp stuff
He's using Η.Α.Ϝ.ϵ.Ρ? , Η.Ε.Ρ.Μ.ϓ?? , Η--Α--Γ--Γ--Ε--Ρ?? as well as very similar titles now. Can someone please blacklist stuff like this? --Boss Big (talk) 11:23, 14 June 2008 (UTC)
- We might as well just blacklist everything, then. People maintaining this list should probably read up on why enumerating badness is bad security. --Carnildo (talk) 21:47, 14 June 2008 (UTC)
- Do we really want to blacklist every title Grawp is using in his page move vandalism? He can always use some title that's not blacklisted yet, after all. Which we will in turn blacklist, which makes Grawp use some title that's not blacklisted yet, which we will in turn blacklist, which makes Grawp use some title that's not blacklisted yet, which.. you get the idea. We end up with a vandal who continues to do what he's doing, and a huge blacklist that makes it harder for legitimate editors to do their work. I'm not sure what else we could do, but adding more and more stuff to the blacklist doesn't sound like the best idea to me. --Conti|✉ 22:02, 14 June 2008 (UTC)
- Radical normalization would help significantly. --MZMcBride (talk) 22:04, 14 June 2008 (UTC)
- Obviously, unless we are to moveprotect all articles, a determined vandal can always move pages to *some* title that's not on the blacklist. But serial vandals want to be recognized. That's why these pagemoves are variations on a theme. If they were pagemoves to random titles, no one would know who the perpetrator was, and he wouldn't get his ego stroked. The above entries ("Η.Ε.Ρ.Μ.ϓ" et al) show that this vandal's moves are already starting to get fairly far removed from the original "Hagger" (kind of like those spam V!A&ra emails that no longer look anything like real words).
- As for "enumerating badness", the point of the cited excerpt was to allow only the 30 "good" programs on your computer rather than trying to block 75,000 "bad" viruses. But here, we have 2.4 million "good" titles. Not sure how you would switch things here to "enumerating goodness" -- unless, again, you mean protecting all pages against moves. That's a thought, but I doubt it will ever get consensus. NawlinWiki (talk) 22:13, 14 June 2008 (UTC)
- Some vandals want to be recognized, yes, and I think we provide that recognition by creating countless entries in this blacklist just for them. I don't want to argue for the removal of all Grawp entries, but we shouldn't add every possible vandal target, either, especially not if we might encounter false positives. --Conti|✉ 22:33, 14 June 2008 (UTC)
- Those "V!A&ra" emails may look nothing like real words anymore, but they're still being sent out.
- As for enumerating badness, since pagemove vandals are infinitely creative, it's much easier to have humans spot bad pagemoves and block the offending user than it is to try to prevent the pagemoves in the first place by banning all possible bad page names. --Carnildo (talk) 22:41, 14 June 2008 (UTC)
Bürgerliches Brauhaus Budweis
I'd like to have Bürgerliches Brauhaus Budweis redirect to Budweiser Bier Bürgerbräu. -- Matthead Discuß 17:02, 12 July 2008 (UTC)
- Done. Cheers. --MZMcBride (talk) 20:30, 12 July 2008 (UTC)
- Strangely, as far as I can tell that title doesn't seem to match the blacklist. I'm not sure why you weren't able to create it. —Ilmari Karonen (talk) 17:27, 13 July 2008 (UTC)
- Before I created the redirect, I confirmed that the title was blocked. Currently, we have the parameter that shows which blacklist entry was hit removed (to avoid easier workarounds by vandals). Perhaps we should put that parameter in a display:none or something. It would make debugging simpler. --MZMcBride (talk) 18:22, 13 July 2008 (UTC)
- Oddly, the article Budweiser Bier Bürgerbräu still shows "Bürgerliches Brauhaus Budweis" as red. What is the difference to Bürgerliches Brauhaus Budweis, maybe differently coded ü-Umlaut or spaces? I recall a similar case recently, see MediaWiki_talk:Titleblacklist#Edward_Henry_Lewinski_Corwin -- Matthead Discuß 23:13, 13 July 2008 (UTC)
- No difference, also coded as "B.C3.BCrgerliches_Brauhaus_Budweis".
- Your redlink has the Unicode character "U+0094" (a non-printing control character) at the end, which is exactly the sort of thing the blacklist is supposed to prevent. --Carnildo (talk) 00:08, 14 July 2008 (UTC)
- The Cancel character? Using copy&paste to avoid misspelling is no longer a good idea, it seems. -- Matthead Discuß 00:20, 14 July 2008 (UTC)
- Ah. We should probably add some custom error messages for those rules. There's already one for non-breaking spaces (titleblacklist-custom-nbsp), all it needs is some tweaking to cover other invisible, non-printable or otherwise inappropriate characters. —Ilmari Karonen (talk) 12:49, 14 July 2008 (UTC)
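A custom error message along these lines could name the offending character. The helper below is hypothetical (not part of the extension): it lists code points in Unicode's "Other" categories (Cc control, Cf format and so on) plus any separator other than the ordinary space, using Python's unicodedata module.

```python
import unicodedata


def invisible_characters(title: str) -> list:
    # Code points a copy and paste can silently carry into a title.
    found = []
    for ch in title:
        cat = unicodedata.category(ch)
        if cat.startswith("C") or (cat.startswith("Z") and ch != " "):
            found.append(f"U+{ord(ch):04X}")
    return found


print(invisible_characters("Bürgerliches Brauhaus Budweis\u0094"))
```

U+0094 is in category Cc, so it is reported, while the "ü" (a letter) and the ordinary spaces are not.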
Removed entry (2)
I've removed the following from the blacklist:
.*p\??.* <moveonly>
Unless I'm seriously mistaken, this blocks any move to a title containing the letter "p" optionally followed by a question mark. --Carnildo (talk) 10:57, 5 August 2008 (UTC)
- I was trying to block moves to pages with a double question mark. I thought you had to use the p because ? was a control character. NawlinWiki (talk) 19:14, 5 August 2008 (UTC)
- Of current page titles, only about 15–20 have "??" in their name. --MZMcBride (talk) 19:18, 5 August 2008 (UTC)
- The regex in question parses as follows:
- . -- Anything
- * -- zero or more of it
- p -- followed by the letter "p"
- \? -- followed by a question mark: the backslash cancels its special meaning in the regex
- ? -- zero or one times: since this question mark isn't escaped, it retains its special meaning
- . -- followed by anything
- * -- zero or more of it.
- --Carnildo (talk) 10:13, 6 August 2008 (UTC)
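The breakdown above can be confirmed with Python's re, whose quantifier and escaping rules agree with PCRE for this pattern. The INTENDED pattern is only one possible way to express "two question marks in a row" and is an illustration, not a proposed entry; anchoring and case-insensitivity are approximated with re.fullmatch and re.IGNORECASE.

```python
import re

ENTRY = r".*p\??.*"       # the removed entry
INTENDED = r".*\?\?.*"    # illustration: two literal question marks in a row


def matches(pattern: str, title: str) -> bool:
    return re.fullmatch(pattern, title, re.IGNORECASE) is not None


for title in ["Help", "Pop music", "Why??"]:
    print(title, matches(ENTRY, title), matches(INTENDED, title))
```

The removed entry catches ordinary titles containing a "p" and misses "Why??", which has no "p" at all.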
Usernames
So now that the Usernameblacklist got merged into this one, I've got a question: Are usernames blocked only when the "<newaccountonly>" parameter is added, or do all entries on this list (including the "<moveonly>" ones) act like a blacklist for usernames, too? If it's the latter, I really hope there's a <!newaccountsonly>, too. :) --Conti|✉ 11:51, 10 August 2008 (UTC)
- "moveonly" ones don't blacklist usernames, or it wouldn't have allowed this username: User3aaaaa (talk) 13:02, 10 August 2008 (UTC)
- note: "User3aaaaa" is an account I created to answer the question; an attempt to move my sandbox to "User:Snigbrook/Sandbox 3aaaaa" was not allowed because of the repeated letters. --Snigbrook (talk) 13:09, 10 August 2008 (UTC)
- I tested around a bit, too, and it seems that every entry that contains <moveonly> is excluded from blacklisting usernames. Seems reasonable to me. --Conti|✉ 13:23, 10 August 2008 (UTC)
Talk:Shekhawati
Hi, I was trying to archive Talk: Shekhawati, but it was blocked by the following entry: ".*HA.* <moveonly>". Is there a fix? (I also wanted to rename the existing archive to Talk: Shekhawati/Archive 1; currently it is called Archive01.) Joshua Issac (talk) 14:17, 10 August 2008 (UTC)
- I've made that entry case sensitive, so you should be able to move the page now. --Conti|✉ 14:23, 10 August 2008 (UTC)
Thanks. Joshua Issac (talk) 14:24, 10 August 2008 (UTC)
- The page history indicates that some of these were / are supposed to be temporary. Perhaps it's time for them to be removed? --MZMcBride (talk) 23:10, 10 August 2008 (UTC)
- Sounds like a splendid idea to me. Especially now that most of the entries also blacklist usernames. The error messages need to be updated, too. It would also be helpful if some people who know more about regular expressions than me would add comments to some of the more cryptic regexes, describing what the heck they are doing. What does "*[\p{Z}]{2}.*" do, for example? Or ".*[^\p{L}\d ]{5}.*"? --Conti|✉ 23:17, 10 August 2008 (UTC)
- "*[\p{Z}]{2}.*" matches two adjacent "Separator" characters, which mostly means funny spaces. Most of those are already completely disallowed by the ".*[\x{00A0}\x{2002}-\x{200B}\x{3000}].*" regexp above, and the rest might be worth adding to it. ".*[^\p{L}\d ]{5}.*" matches five consecutive characters that are not letters (in any script), numbers or (normal) spaces. —Ilmari Karonen (talk) 10:01, 12 August 2008 (UTC)
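Both regexps can be approximated in Python with unicodedata.category, since the standard re module lacks \p{...} classes. The function names are made up for this sketch, and "numbers" is taken as the ASCII digits 0-9, as in the earlier description of [^\p{L}\d ].

```python
import unicodedata


def two_adjacent_separators(title: str) -> bool:
    # Approximates .*[\p{Z}]{2}.*: two neighbouring characters in category Z*
    # (Zs space separators, Zl line separator, Zp paragraph separator).
    seps = [unicodedata.category(ch).startswith("Z") for ch in title]
    return any(a and b for a, b in zip(seps, seps[1:]))


def five_symbol_run(title: str) -> bool:
    # Approximates .*[^\p{L}\d ]{5}.*: five consecutive characters that are
    # not letters, not ASCII digits and not ordinary spaces.
    run = 0
    for ch in title:
        if unicodedata.category(ch).startswith("L") or ch in "0123456789 ":
            run = 0
        else:
            run += 1
            if run >= 5:
                return True
    return False


print(two_adjacent_separators("Foo\u00a0 Bar"), five_symbol_run("HAGGER?!?!?"))
```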
Adding ␦
Why isn't ␦ in the list? It's a very odd symbol. --frogger3140 (talk) 19:01, 15 August 2008 (UTC)
- Might as well blacklist the whole "Control Pictures" block (U+2400 to U+2426). None of the characters in it seem particularly reasonable for use in titles, except for whitelisted single-character ones like . The only existing longer title containing any of them seems to be Apple II␝, and that's really an abuse of the character anyway. —Ilmari Karonen (talk) 20:14, 15 August 2008 (UTC)
- Added. My search did turn up another existing match, ␍␊, but I doubt any more are going to appear any time soon. —Ilmari Karonen (talk) 20:20, 15 August 2008 (UTC)
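Blacklisting the block amounts to a single character range. A minimal Python equivalent is shown below (re accepts \uXXXX escapes in patterns); the function name is illustrative, and the single-character whitelist is not modelled.

```python
import re

# U+2400 to U+2426, the "Control Pictures" characters discussed above.
CONTROL_PICTURES = re.compile(r"[\u2400-\u2426]")


def has_control_picture(title: str) -> bool:
    return CONTROL_PICTURES.search(title) is not None


for title in ["Apple II\u241d", "\u240d\u240a", "Apple II"]:
    print(title, has_control_picture(title))
```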
add ⃝
I think we should add ⃝ to the blacklist. It's very weird. --frogger3140 (talk) 22:34, 16 August 2008 (UTC)
Valid syntax?
.*RMY.* <casesensitive> <moveonly>
Is that valid syntax? I thought it was supposed to be <casesensitive|moveonly>, but I'm honestly not sure. --MZMcBride (talk) 04:59, 17 August 2008 (UTC)
- It isn't. Or, rather, it is, but the "<casesensitive>" gets parsed as part of the regexp. —Ilmari Karonen (talk) 23:30, 18 August 2008 (UTC)
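The answer above implies that only the final <...> group on a line is read as options. The following is a minimal sketch of a parser with that behaviour. The pattern and function name are assumptions for illustration, not the extension's actual code. Under it, "<casesensitive> <moveonly>" leaves "<casesensitive>" inside the regex, while "<casesensitive|moveonly>" yields both options.

```python
import re

# Hypothetical: everything before an optional trailing <...> group is the regex;
# only that last group is split on "|" into options.
LINE = re.compile(r"^(.*?)(\s*<([^<>]*)>)?$")


def parse_entry(line: str):
    m = LINE.match(line.strip())
    regex = m.group(1).strip()
    raw_options = m.group(3) or ""
    options = {opt.strip().lower() for opt in raw_options.split("|") if opt.strip()}
    return regex, options


print(parse_entry(".*RMY.* <casesensitive> <moveonly>"))
print(parse_entry(".*RMY.* <casesensitive|moveonly>"))
```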
add VÎÅG®Å
add VÎÅG®Å cuz it's bizarre --frogger3140 (talk) 23:17, 18 August 2008 (UTC)
User page move blocking
It seems the regex that was added will stop pages from being moved into the User: namespace, but not from being moved out of the user namespace, which is the exact opposite of the intended effect, I think. :-) To keep the regex active leaves a vulnerability where vandals could move user pages into other namespaces and then only sysops would be able to revert and clean up the mess. --MZMcBride (talk) 07:12, 25 August 2008 (UTC)
- The intended effect was to stop people from "renaming" themselves by moving their userpage. --Carnildo (talk) 09:13, 25 August 2008 (UTC)
- In such a case, a sysop is needed anyway to delete the redirect that remains after the revert. I think it is not an issue in English Wikipedia, where vandal renames are rolled back quickly. Also, it prevents them from performing renames such as User:XXX → User:XXX_on_wheels. So, I think that this expression is more useful than harmful. — Kalan ? 13:30, 25 August 2008 (UTC)
- True, though it also stops bots from being able to revert the page moves.... --MZMcBride (talk) 15:36, 25 August 2008 (UTC)
- Although I'm one of the authors of this idea (I added this on testwiki), I have to say I do not support this. The first reason is already mentioned above. The second reason is that in theory a user should be able to do this: 1) request userpage deletion, 2) move "user:Example/prepared_new_userpage" into "user:Example". The problem with users self-renames should be solved by developers in a proper way. —AlexSm 16:23, 25 August 2008 (UTC)
Grawp Usernames
For several months now, someone (ostensibly Grawp) has been making usernames in the form "<username>'s anus is stretched by Grawp's massive cock" (sometimes using mixed script characters). Can we put in something to stop such accounts from being created? TML (talk) 02:08, 26 August 2008 (UTC)
- Not really. There are too many ways to alter it to get around anything in the blacklist. --Carnildo (talk) 09:46, 26 August 2008 (UTC)
R and P lookalikes
.*\b[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ].*\?.* <moveonly>
.*\b[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ]..*\?.* <moveonly>
Latter is redundant. 217.132.98.67 (talk) 09:37, 26 August 2008 (UTC)
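The redundancy can be checked mechanically. In the second entry, "..*" is just ".*" with at least one character, so any title it matches is already matched by the first. This Python sketch approximates the entries' anchoring and default case-insensitivity with re.fullmatch and re.IGNORECASE; the sample titles are arbitrary.

```python
import re

CLASS = "[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ]"
FIRST = r".*\b" + CLASS + r".*\?.*"
SECOND = r".*\b" + CLASS + r"..*\?.*"


def matches(pattern: str, title: str) -> bool:
    return re.fullmatch(pattern, title, re.IGNORECASE) is not None


samples = ["R?", "Rob?", "Is it real?", "Where?"]
for title in samples:
    # SECOND never matches where FIRST does not; FIRST additionally matches "R?".
    print(title, matches(FIRST, title), matches(SECOND, title))
```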
unblacklisted hagger title
Why isn't ƦǮƓƓẶĦ? blacklisted? --frogger3140 (talk) 20:53, 30 August 2008 (UTC)
False positives
I've removed the following two entries from the blacklist: they're generating false positives. See Wikipedia:Village pump (technical)#Move problem.
.*[GĜĢĞĠԍƓǤǦǴḠԌეอÇ&ΓϜ][\w\d\s][GĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ][\w\d\s][EƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ].* <moveonly>
.*[GĜĢĞĠԍƓǤǦǴḠԌეอÇ&ΓϜ][\w\d\s][\w\d\s][GĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ][\w\d\s][\w\d\s][EƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ].* <moveonly>
--Carnildo (talk) 03:37, 3 September 2008 (UTC)
Discovered new hagger thing
Please protect: HAGGϡR? It is a grawp title. --frogger3140 (talk) 01:00, 5 September 2008 (UTC)
Overeager usage of this list
Can we stop adding every title any Grawp vandal ever used to this list, pretty please? It's not going to stop a creative person from finding new ways of writing "hagger", and, actually, we are pretty much encouraging the vandals by making a cool game out of this. On the other hand, we are making the lives of the average Wikipedian harder by allowing them less and less when it comes to moving and creating articles. I'm fine with blacklisting obscure combinations of symbols and all that, since no one ever needs to use those, but when it comes to blocking things that are used in the everyday life of a Wikipedian (like this), we are going too far in our attempts to fight vandalism, IMHO. --Conti|✉ 13:03, 5 September 2008 (UTC)
- ClueBot seems to be doing a perfectly good job of keeping the pagemove vandalism under control. In the past 24 hours, I count one vandalistic move that the blacklist theoretically could have stopped that the bot failed to revert. --Carnildo (talk) 19:49, 5 September 2008 (UTC)
Cleaning up the blacklist
I've removed the following entries from the blacklist because they are either likely to cause false positives or because they were reactions to specific incidents:
.*Template.*arab.*world.*unity
.*Seth.*Patinkin.*
.*Jan.*Szatkowski.*
.*(Bill|William).*Beggs.*
.*massive.* <moveonly>
.*di[ċćĉčċ].* <moveonly>
.*d[íìîïĩǐīĭıįί]c.* <moveonly>
.*giant d.* <moveonly>
.*ck make.* <moveonly>
.*n[^a-hk-z]gger.* <moveonly>
.*skater girl.* <moveonly>
.*\bAvri[lI].* <moveonly>
.*p\? [ԍGGĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ].* <moveonly>
.*p [ԍGGĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ].* <moveonly>
.*Haggis.* <moveonly>
.*[HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ][ǼAÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆΑἈἉἊἋἌἍἎἏᾺᾸᾹA].* <casesensitive | moveonly>
.*[HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ][\w\d\s][ǼAÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆΑἈἉἊἋἌἍἎἏᾺᾸᾹA].* <casesensitive | moveonly>
.*[HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ][\w\d\s][\w\d\s][ǼAÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆΑἈἉἊἋἌἍἎἏᾺᾸᾹA].* <casesensitive | moveonly>
.*[ԍGĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ][ԍGĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ].* <casesensitive | moveonly>
.*[EƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ][RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®].* <casesensitive | moveonly>
.*[EƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ][\d\w\s][[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ][\d\w\s]]MƜḾṀṂМӍΜ₥М][\d\w\s].* <casesensitive | moveonly>
.*\b[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ]*\?.* <moveonly>
.*\b[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ].*\?.* <moveonly>
.*\b[RŔŖṜŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷ]..*\?.* <moveonly>
.*Realist2.* <moveonly>
.*[ΉḤĤĦ₳Ⱨ¥ϓ].* <moveonly>
--Carnildo (talk) 20:08, 5 September 2008 (UTC)
- With all due respect, #2, #3, and #4 relate to User:Daniel/SP, which is very many incidents over very many months relating to OTRS tickets and private emails from the subjects. Could I please ask that you consider reversing your removal of those three? Regards, Daniel (talk) 13:41, 9 September 2008 (UTC)
- I have restored 2-4 under a "BLP TARGETS" heading. I am worried that many of the others that were removed were actually necessary as well. Carnildo, which ones do you think are going to give false positives? John Vandenberg (chat) 13:49, 9 September 2008 (UTC)
- Many of them have strong potential for false positives.
- 5, 8, 9, 10, 12, and 15 have obvious false-positive matches
- 6 and 7 may have case-folding issues. I know MediaWiki does funny things with "İ", and I expect it extends to other accent marks.
- 13 matches any title containing "p? g", while 14 matches any title containing a word ending in "p" where the next word begins with "g". Both may also have other false positives depending on how "gamma" and that funny "F" case-fold.
- 16, 17, and 18 match any title containing an "H" and an "A" separated by zero, one, or two spaces.
- 19, 20, and 21 match letter pairs. I'm not comfortable with any regex that uses so little context when deciding what to match, particularly since the pairs "GG" and "ER" are quite common in English.
- 22, 23, and 24 match any title containing a word beginning with the letters "r" or "p", and also containing a question mark.
- 26 may have case-folding issues that cause it to block any move to a title containing the letter "h". I suspect it's what's causing the problem with Sugar Hill Historic District (Detroit, Michigan) mentioned at Wikipedia:Village pump (technical)/Archive 46#Move problem.
- The others were removed because they seemed to be outdated -- if there's still a problem, feel free to put them back. --Carnildo (talk) 18:32, 9 September 2008 (UTC)
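The false-positive claim for entry 14 is easy to reproduce. This Python sketch approximates MediaWiki's anchoring and default case-insensitivity with re.fullmatch and re.IGNORECASE; the sample titles are only illustrations.

```python
import re

# Entry 14 from the removed list (a <moveonly> entry, so case-insensitive).
ENTRY_14 = r".*p [ԍGGĜĢĞĠƓǤǦǴḠԌეอÇ&ΓϜ].*"


def matches(title: str) -> bool:
    return re.fullmatch(ENTRY_14, title, re.IGNORECASE) is not None


for title in ["Pop Goes the Weasel", "Hip hop group", "Popgun"]:
    print(title, matches(title))
```

Any word ending in "p" followed by a word starting with "g" triggers it; only titles without that space, such as "Popgun", escape.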
Tilde
Do we really need the tilde in our blacklist? It is by no means an "obscure ascii character", and apparently causes false positives (unsurprisingly, I'd say). If the tilde is used as a character lookalike, what character is it supposedly representing? I'm sure a more specific blacklist entry could be created that wouldn't outright forbid the usage of the tilde in any case. It's a pity we don't have a log that shows how effective our blacklist really is. --Conti|✉ 13:48, 17 September 2008 (UTC)
- We do have such a log: Special:Log/move. Based on that, I'd say the blacklist is completely ineffective at preventing non-casual pagemove vandalism. --Carnildo (talk) 18:54, 17 September 2008 (UTC)
- But we don't have a log that shows article creation/moving attempts that were stopped because of the blacklist. That would show how many false positives there really are. Or, alternatively, it would show how many vandals this list has actually stopped. Either way, I think it would be pretty darn useful. --Conti|✉ 19:02, 17 September 2008 (UTC)
- My impression is that serious (rather than casual) pagemove vandals try titles until they get one that isn't covered by the blacklist, then do as many moves to that pattern as possible. As for the false-positive rate, based on the number of complaints, I'd estimate that the blacklist is generating two or three false positives a day. --Carnildo (talk) 20:26, 17 September 2008 (UTC)
Special-use IP ranges
Shouldn't we add to the list of blocked User talk namespace pages, the IPs listed in RFC 3330? They are plainly nonsense if created. This, that and the other [talk] 07:16, 6 October 2008 (UTC)
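Here is a sketch of how one might test whether a User talk title names a special-use address, using Python's standard ipaddress module rather than a blacklist regex. Treating is_global as the test is an assumption. The module follows IANA's special-purpose address registry, which superseded RFC 3330, so the two lists differ in places. The function name is hypothetical.

```python
import ipaddress

PREFIX = "User talk:"


def is_special_use_ip_talk(title: str) -> bool:
    if not title.startswith(PREFIX):
        return False
    try:
        ip = ipaddress.ip_address(title[len(PREFIX):])
    except ValueError:
        return False  # an ordinary username, not an IP address
    # is_global is False for private, loopback, link-local, documentation
    # and other special-purpose ranges in Python's built-in registry.
    return not ip.is_global


for title in ["User talk:10.0.0.1", "User talk:192.0.2.1", "User talk:8.8.8.8"]:
    print(title, is_special_use_ip_talk(title))
```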
Mixed-script titles
# POTENTIALLY CONFUSING MIXED-SCRIPT TITLES
# Cyrillic/Greek + Latin intentionally skipped due to false positives
(?!(User|Wikipedia|Image)( talk)?:|Talk:)[\P{Latin}A-Z]*[^\P{Latin}A-Z].*\p{Cyrillic}.* # Cyrillic + Non-ASCII Latin
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{Cyrillic}*\p{Cyrillic}.*[^\P{Latin}A-Z].* # Cyrillic + Non-ASCII Latin
(?!(User|Wikipedia|Image)( talk)?:|Talk:)[\P{Latin}A-Z]*[^\P{Latin}A-Z].*\p{Greek}.* # Greek + Non-ASCII Latin
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{Greek}*\p{Greek}.*[^\P{Latin}A-Z].* # Greek + Non-ASCII Latin
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{Cyrillic}*\p{Cyrillic}.*\p{Greek}.* # Cyrillic + Greek
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{Greek}*\p{Greek}.*\p{Cyrillic}.* # Cyrillic + Greek
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Armenian}.*[^\p{Armenian}\P{L}].* # Armenian + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Armenian}\P{L}].*\p{Armenian}.* # Armenian + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Bengali}.*[^\p{Bengali}\P{L}].* # Bengali + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Bengali}\P{L}].*\p{Bengali}.* # Bengali + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Cherokee}.*[^\p{Cherokee}\P{L}].* # Cherokee + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Cherokee}\P{L}].*\p{Cherokee}.* # Cherokee + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Ethiopic}.*[^\p{Ethiopic}\P{L}].* # Ethiopic + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Ethiopic}\P{L}].*\p{Ethiopic}.* # Ethiopic + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Georgian}.*[^\p{Georgian}\P{L}].* # Georgian + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Georgian}\P{L}].*\p{Georgian}.* # Georgian + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Gujarati}.*[^\p{Gujarati}\P{L}].* # Gujarati + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Gujarati}\P{L}].*\p{Gujarati}.* # Gujarati + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Gurmukhi}.*[^\p{Gurmukhi}\P{L}].* # Gurmukhi + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Gurmukhi}\P{L}].*\p{Gurmukhi}.* # Gurmukhi + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Kannada}.*[^\p{Kannada}\P{L}].* # Kannada + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Kannada}\P{L}].*\p{Kannada}.* # Kannada + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Khmer}.*[^\p{Khmer}\P{L}].* # Khmer + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Khmer}\P{L}].*\p{Khmer}.* # Khmer + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Lao}.*[^\p{Lao}\P{L}].* # Lao + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Lao}\P{L}].*\p{Lao}.* # Lao + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Malayalam}.*[^\p{Malayalam}\P{L}].* # Malayalam + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Malayalam}\P{L}].*\p{Malayalam}.* # Malayalam + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Myanmar}.*[^\p{Myanmar}\P{L}].* # Myanmar + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Myanmar}\P{L}].*\p{Myanmar}.* # Myanmar + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Oriya}.*[^\p{Oriya}\P{L}].* # Oriya + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Oriya}\P{L}].*\p{Oriya}.* # Oriya + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Runic}.*[^\p{Runic}\P{L}].* # Runic + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Runic}\P{L}].*\p{Runic}.* # Runic + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Sinhala}.*[^\p{Sinhala}\P{L}].* # Sinhala + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Sinhala}\P{L}].*\p{Sinhala}.* # Sinhala + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Syriac}.*[^\p{Syriac}\P{L}].* # Syriac + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Syriac}\P{L}].*\p{Syriac}.* # Syriac + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Tamil}.*[^\p{Tamil}\P{L}].* # Tamil + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Tamil}\P{L}].*\p{Tamil}.* # Tamil + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Telugu}.*[^\p{Telugu}\P{L}].* # Telugu + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Telugu}\P{L}].*\p{Telugu}.* # Telugu + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Thaana}.*[^\p{Thaana}\P{L}].* # Thaana + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Thaana}\P{L}].*\p{Thaana}.* # Thaana + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Thai}.*[^\p{Thai}\P{L}].* # Thai + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Thai}\P{L}].*\p{Thai}.* # Thai + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*\p{Tibetan}.*[^\p{Tibetan}\P{L}].* # Tibetan + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:)\P{L}*[^\p{Tibetan}\P{L}].*\p{Tibetan}.* # Tibetan + anything else
(?!(User|Wikipedia|Image)( talk)?:|Talk:).*[\p{Buhid}\p{Deseret}\p{Gothic}\p{Hanunoo}\p{Mongolian}\p{Ogham}\p{Tagalog}\p{Tagbanwa}\p{Yi}].* # Unused obscure scripts
# DISALLOW PAGE MOVES TO MIXED-SCRIPT TITLES
# Intentionally move-only due to false positives
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Latin}.*[^\p{Latin}\P{L}].* <moveonly> # Latin + non-Latin
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*[^\p{Latin}\P{L}].*\p{Latin}.* <moveonly> # Latin + non-Latin
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Greek}.*[^\p{Greek}\P{L}].* <moveonly> # Greek + non-Greek
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*[^\p{Greek}\P{L}].*\p{Greek}.* <moveonly> # Greek + non-Greek
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Cyrillic}.*[^\p{Cyrillic}\P{L}].* <moveonly> # Cyrillic + non-Cyrillic
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*[^\p{Cyrillic}\P{L}].*\p{Cyrillic}.* <moveonly> # Cyrillic + non-Cyrillic
# Slightly different regexp for user/project/talk pages, to allow e.g. Latin subpages of Cyrillic usernames:
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Latin}[^\/]*[^\p{Latin}\P{L}].* <moveonly> # Latin + non-Latin
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*[^\p{Latin}\P{L}][^\/]*\p{Latin}.* <moveonly> # Latin + non-Latin
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Greek}[^\/]*[^\p{Greek}\P{L}].* <moveonly> # Greek + non-Greek
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*[^\p{Greek}\P{L}][^\/]*\p{Greek}.* <moveonly> # Greek + non-Greek
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Cyrillic}[^\/]*[^\p{Cyrillic}\P{L}].* <moveonly> # Cyrillic + non-Cyrillic
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*[^\p{Cyrillic}\P{L}][^\/]*\p{Cyrillic}.* <moveonly> # Cyrillic + non-Cyrillic
.*([^\P{Lu}\p{Latin}]\P{L}*){4}.* <casesensitive | moveonly> # Non-Latin all caps
I've removed these for the time being because they're not working as intended (specifically, the "Greek + non-ASCII Latin" regex and four of the move-only regexes match "Radοžda"). I'm not familiar with the funny modifiers used in these regexes, so somebody other than me needs to fix them. --Carnildo (talk) 05:57, 8 October 2008 (UTC)
- That's because the "ο" in your "Radοžda" is not a Latin "o" but a "Greek small letter omicron" (U+03BF). The regexps would appear to be working as intended. I've moved Vevčani-Radοžda dialect to the correct Vevčani-Radožda dialect and restored the regexps. —Ilmari Karonen (talk) 11:24, 8 October 2008 (UTC)
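Look-alike characters such as this omicron are hard to spot by eye, and the move-only rules above are essentially a test for "letters from more than one script, where one of them is Latin, Greek or Cyrillic", applied per subpage segment on user, project and talk pages. The Python sketch below approximates both the rule and the diagnosis. It is an illustration under stated assumptions, not the extension's code: Python's built-in re has no \p{Script} classes, so it assumes a letter's script is the first word of its Unicode name (LATIN, GREEK, CYRILLIC, ...), which is close to but not identical with PCRE's script properties, and it ignores the creation-time rules and the separate "Non-Latin all caps" entry.

```python
import unicodedata

# Prefixes given the per-subpage treatment in the move-only rules above.
EXEMPT_PREFIXES = ("User talk:", "Wikipedia talk:", "User:", "Wikipedia:", "Talk:")
# Scripts the move-only rules check for.
WATCHED_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC"}


def letter_scripts(text: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if unicodedata.category(ch).startswith("L")  # letters only, like \p{L}
    }


def mixes_watched_script(text: str) -> bool:
    """True if the letters come from 2+ scripts and one of them is watched."""
    scripts = letter_scripts(text)
    return len(scripts) > 1 and bool(scripts & WATCHED_SCRIPTS)


def move_target_blocked(title: str) -> bool:
    """Rough approximation of the move-only mixed-script rules."""
    for prefix in EXEMPT_PREFIXES:
        if title.startswith(prefix):
            # User/project/talk pages: judge each subpage segment separately.
            rest = title[len(prefix):]
            return any(mixes_watched_script(seg) for seg in rest.split("/"))
    return mixes_watched_script(title)


def describe_non_ascii(title: str) -> list:
    """List each non-ASCII character as 'U+XXXX NAME' to expose look-alikes."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in title
        if ord(ch) > 0x7F
    ]


bad = "Vevčani-Rad\u03bfžda dialect"  # contains GREEK SMALL LETTER OMICRON
print(move_target_blocked(bad))                        # True
print(move_target_blocked("Vevčani-Radožda dialect"))  # False
print(move_target_blocked("User:Иван/Sandbox"))        # False
print(describe_non_ascii(bad))
```

Listing code points this way shows U+03BF GREEK SMALL LETTER OMICRON sitting between two Latin letters, which is exactly the kind of title the rules are meant to stop.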
More false positives
I've finally gotten around to running a test of the blacklist against existing articles. I've removed two of the anti-HAGGER regexes because the sheer number of existing pages that they match means they're blocking valid pagemoves.
(.*\W)?([HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ-−ŧſⱧԋ]|[Il1ÌÍÎÏĨļǏĪĬİḷŀΙЇɨ!łľıĮįḹtţťṭτтŧjĵјſ\]\[]\W[Il1ÌÍÎÏĨļǏĪĬİḷŀΙЇɨ!łľıĮįḹtţťṭτтŧjĵјſ\]\[])+(\W|\W.*\W)?(([ǼΑÆǢAÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆἈἉἊἋἌἍἎἏᾺᾸᾹ4@?ΑƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩIÍIl1ÌÎÏĨļǏĪĬİḷŀΙЇɨ!ł\ľıĮįḹtţťṭτтŧjĵјſO0ÓÒÔÖÕǑŌŎǪŐŒØƏΌΟΩОФфЮUÙÚÛÜŨŪŬŮŰŲǓǕǗǙǛΫΥΫΎYÝŸŶƳȲʸẎỲỴỶỸƱὙὛὝὟῪῨῩ]|\/\W?\\)(\W|\W.*\W)?)+([GĜĞĠĢԍƓǤǦǴḠԌეอÇ&ΓϜ96MҨ](\W|\W.*\W)?)+(([ǼAÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆΑἈἉἊἋἌἍἎἏᾺᾸᾹ4@ƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ?O0ÓÒÔÖÕǑŌŎǪŐŒØƏΌΟΩОФфЮUÙÚÛÜŨŪŬŮŰŲǓǕǗǙǛΫΥΫΎYÝŸŶƳȲʸẎỲỴỶỸƱὙὛὝὟῪῨῩΑΕϵ]|\/\W?\\)(\W|\W.*\W)?)*[RŔŖŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷґЃ]+(\W.*)? <moveonly> # HAGG[AE]R
This matches approximately 107,000 article titles, and is responsible for the puzzling block of a move to Sugar Hill Historic District (Detroit, Michigan).
(.*\W)?[RŔŖŘȐȒƦʳʴʵʶṘṚṜṞЯ®Ρ₧ÞþΡρРрƤṔṖǷґЃ]+(\W|\W.*\W)?(([ǼAΑÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆἈἉἊἋἌἍἎἏᾺᾸᾹ4@ƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩ?O0ÓÒÔÖÕǑŌŎǪŐŒØƏΌΟΩОФфЮUÙÚÛÜŨŪŬŮŰŲǓǕǗǙǛΫΥΫΎYÝŸŶƳȲʸẎỲỴỶỸƱὙὛὝὟῪῨῩΑΕϵ]|\/\W?\\)(\W|\W.*\W)?)*([GĜĞĠĢԍƓǤǦǴḠԌეอÇ&ΓϜ96MҨ](\W|\W.*\W)?)+(([AÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺАӐӒḀẠẢẤẦẨẪẬẮẰẲẴẶₐΆΑἈἉἊἋἌἍἎἏᾺᾸᾹ4@?ΑƸEÈÉÊËĒĔĖĘĚƎƐȄȆȨЭѤӬḔḖḘḚḜẸẺẼẾỀỂỄỆₑΈΕἘἙἚἛἜἝῈЀЄЕӖ3ΣƩIÍIl1ÌÎÏĨļǏĪĬİḷŀΙЇɨ!ł\ľıĮįḹtţťṭτтŧjĵјſO0ÓÒÔÖÕǑŌŎǪŐŒØƏΌΟΩОФфЮUÙÚÛÜŨŪŬŮŰŲǓǕǗǙǛΫΥΫΎYÝŸŶƳȲʸẎỲỴỶỸƱὙὛὝὟῪῨῩ]|\/\W?\\)(\W|\W.*\W)?)+([HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ-−ŧſⱧԋ]|[Il1ÌÍÎÏĨļǏĪĬİḷŀΙЇɨ!!łľıĮįḹtţťṭτтŧjĵјſ\]\[]\W[Il1ÌÍÎÏĨļǏĪĬİḷŀΙЇɨ!łľıĮįḹtţťṭτтŧjĵјſ\]\[])+(\W.*)? <moveonly> # R[AE]GGAH
This matches approximately 76,000 article titles.
None of the other anti-HAGGER regexes matches more than a hundred or so existing titles. Of the non-HAGGER anti-pagemove regexes, a few might be worth re-considering:
.*[!?‽¿]{2}(?<!!!!).*
Matches over 600 titles
.*\p{Lu}(\P{L}*\p{Lu}){9}.*
Matches over 10000 titles
--Carnildo (talk) 00:01, 9 October 2008 (UTC)
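Counting matches like these comes down to reading each blacklist line as a pattern plus options, anchoring the pattern (entries get "^" and "$" added automatically) and testing it against every title. The Python sketch below is hypothetical: the line format it parses (regex, optional "<option | option=value>", optional "# comment") is inferred from the entries quoted on this page rather than from the extension's source, and Python's re lacks PCRE syntax such as \p{Latin}, so it only handles entries that re can compile.

```python
import re
from typing import List, Optional, Set, Tuple


def parse_entry(line: str) -> Optional[Tuple[str, Set[str]]]:
    """Split a blacklist line into (regex, options); None for blank or comment lines.

    Hypothetical format, inferred from the entries quoted on this page:
        regex <option1 | option2=value> # comment
    """
    body = line.split(" #", 1)[0].strip()  # drop a trailing " # comment"
    if not body or body.startswith("#"):
        return None
    options: Set[str] = set()
    m = re.search(r"\s*<([^<>]*)>$", body)
    if m:
        options = {opt.strip() for opt in m.group(1).split("|")}
        body = body[: m.start()]
    return body, options


def count_matches(line: str, titles: List[str]) -> int:
    """Count titles an entry matches; case-insensitive unless <casesensitive>."""
    parsed = parse_entry(line)
    if parsed is None:
        return 0
    pattern, options = parsed
    rx = re.compile(pattern, 0 if "casesensitive" in options else re.IGNORECASE)
    # fullmatch mirrors the implicit ^...$ anchoring of blacklist entries.
    return sum(1 for title in titles if rx.fullmatch(title))


titles = ["Nicholas Hagger", "HAGGER?", "Seth Everett", "Hermes"]
print(count_matches(".*HAGGER.*", titles))                  # 2
print(count_matches(".*HAGGER.* <casesensitive>", titles))  # 1
```

Defaulting to case-insensitive matching follows the behaviour described at the top of this archive, where "<casesensitive>" had to be added explicitly to stop ".*HAGGER.*" from matching surnames like Hagger.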
- Well, removing the main HAGGER regex led to about a zillion page moves tonight -- check the log. I've restored it for now -- surely there's a way of limiting the false positives without jettisoning it completely. NawlinWiki (talk) 04:04, 9 October 2008 (UTC)
- How is that any worse than this batch of vandalism from yesterday? If you can't stop the vandalism with this, don't inconvenience honest users. --Carnildo (talk) 04:32, 9 October 2008 (UTC)
- If you can check how many titles a regex will affect could you please check
.*(H|Н)\W*(A|Α)\W*G\W*G\W*E.*. <casesensitive|moveonly>
for me? --Chris 04:43, 9 October 2008 (UTC)
- One title: LEIGH HAGGERWOOD. --Carnildo (talk) 05:16, 9 October 2008 (UTC)
The major problem with the first regexp you removed seems to be the [HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ-−ŧſⱧԋ]
part, which is matching characters it shouldn't (including "S"). Changing it to [HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗŧⱧԋ]
should fix most of the problems. I think the real fix, if we want to keep those regexps, would be to make them case-sensitive. That has its own difficulties, though. —Ilmari Karonen (talk) 05:59, 9 October 2008 (UTC)
- Your suggested modification reduces it to about 59,000 matches; additionally making it case-sensitive reduces the number of matches to 66. --Carnildo (talk) 06:26, 9 October 2008 (UTC)
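Two things in the original class can make it match an ordinary "S". A hyphen between two characters inside [...] forms a range, so "Η-−" covers every code point from the first character to the minus sign; and since entries are case-insensitive by default, the long s "ſ" also matches "s" and "S". The Python sketch below assumes the character before the hyphen is Greek capital Eta (U+0397), which cannot be confirmed from the archived text (if it is a Latin H, the range alone already includes S). It uses Python's re, whose case-insensitive matching treats "ſ" and "s" as equivalent; PCRE's behaviour may differ in detail.

```python
import re

# Reconstruction (assumption): the end of the posted class, with the hyphen
# forming a range from GREEK CAPITAL LETTER ETA (U+0397) to MINUS SIGN (U+2212).
ORIGINAL = "[H\u0397-\u2212\u0167\u017f]"  # H, Η-−, ŧ, ſ
# The suggested fix drops the range and the long s.
FIXED = "[H\u0397\u0167]"                  # H, Η, ŧ


def class_matches(char_class: str, ch: str) -> bool:
    """Entries are case-insensitive unless marked <casesensitive>."""
    return re.fullmatch(char_class, ch, re.IGNORECASE) is not None


for ch in ["S", "s", "\u0416", "h"]:  # S, s, CYRILLIC CAPITAL LETTER ZHE, h
    print(repr(ch), class_matches(ORIGINAL, ch), class_matches(FIXED, ch))
```

In this sketch the original class accepts S, s and an arbitrary Cyrillic letter, while the fixed class accepts only the intended H-like characters, which is consistent with the drop from about 107,000 to about 59,000 matches reported above.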
Not sure who has Toolserver / dump / API access, etc. so a full list of pages in NS:0 on en.wiki as of 06:43, 9 October 2008 (UTC) is available at tools:~mzmcbride/ns_0.txt.gz (30 MB) if anyone is interested. Let me know if you'd like anything else. --MZMcBride (talk) 06:43, 9 October 2008 (UTC)
Removed regex
I've removed the regex
.*(\W{2,}.){4,}.* <moveonly|errmsg=titleblacklist-custom-pagemove> #Antigrawp, works by blocking titles with overused punctuation (eg H..A..G..G..E..R)
from the blacklist. It doesn't match any current page titles, but while running it against the title list, it was occasionally extremely slow. I suspect that a pagemove to a carefully-crafted title could leave the blacklist tester in a near-infinite loop. --Carnildo (talk) 10:56, 30 October 2008 (UTC)
- What titles was it slow on? --Chris 12:06, 30 October 2008 (UTC)
- I don't know, and I don't think I can find out. What I do know is that my regex tester will drop from testing ~250,000 titles per second of CPU time to testing under 10000 per second. --Carnildo (talk) 21:28, 30 October 2008 (UTC)
- I see two problems with this regexp. One is that it's using "\W", which in PCRE syntax (unlike modern versions of Perl!) matches all non-ASCII characters. It should probably use "\pP" or "[^\pL\pN]" instead. The other is that "." can match any character, including punctuation, so for titles with long strings of punctuation (or, due to the first bug, non-ASCII characters) the regexp engine will spend a lot of time trying all possible different ways of splitting the string into groups of two or more "\W" characters with one character in between. (Oh, and a third problem is the unnecessary comma in "{4,}" — it makes no difference to the result but probably slows the matching down even more.)
- A simple alternative without the performance problem would be ".*(\pP{2,}\PP){4}.*". The difference is that this will match "++X++X++X++X" but not "++++++++++++". If you want to retain the original behavior exactly, you'd need something like ".*(\pP{2}(\pP{0,2}\PP|\pP)){4}.*".
- In general, a good rule of regexp design (for backtracking matchers like PCRE) is to keep your patterns as "rigid" as possible: that is, there should not be many different ways in which the pattern can match, since, if the pattern looks like it might match, but doesn't, the regexp engine has to exhaust all the possibilities before it can give up and say there's no match. A classical example of what not to do is matching "(A+)+B" against "AAAAAAAAAAAAAAAAC", which (assuming the regexp engine isn't clever enough to recognize the trap) has to try 65536 possibilities before concluding that there is no solution. —Ilmari Karonen (talk) 23:05, 30 October 2008 (UTC)
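The blow-up described above can be reproduced with a deliberately naive matcher. The Python sketch below is not PCRE: it is a hand-written backtracking matcher for the single pattern ^(A+)+B that explores alternatives in roughly the greedy order a backtracking engine would, and counts how many times it reaches the point of testing for "B".

```python
def match_nested_plus(s: str) -> tuple:
    """Naively match ^(A+)+B against s; return (matched, times 'B' was tried)."""
    attempts = 0

    def group(pos: int) -> bool:
        # One iteration of (A+): find the run of A's starting at pos.
        nonlocal attempts
        end = pos
        while end < len(s) and s[end] == "A":
            end += 1
        # Greedy: take the longest run first, then give back one A at a time.
        for stop in range(end, pos, -1):
            if group(stop):  # the outer + is greedy: try another iteration first
                return True
            attempts += 1    # then try to finish with B
            if stop < len(s) and s[stop] == "B":
                return True
        return False

    return group(0), attempts


for n in (4, 8, 16):
    print(n, match_nested_plus("A" * n + "C"))
# 4 (False, 15)
# 8 (False, 255)
# 16 (False, 65535)
```

Every extra "A" doubles the work (2^n - 1 attempts for n A's), so sixteen A's cost 65,535 attempts, in line with the figure quoted above, whereas a rigid pattern such as ^A+B gives up after a single attempt.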
.*(\pP{2}\PP){4}.*
seems to work; unless anyone has any objections, I'm going to add it. --Chris 10:12, 31 October 2008 (UTC)
- Hmm, are there any objections to changing it to
.*(\pP{1,}\PP){4,}.*
? --Chris 10:27, 7 November 2008 (UTC)
- 2000+ false positives say it's a bad idea. --Carnildo (talk) 20:45, 7 November 2008 (UTC)
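The candidate patterns in this thread differ mainly in which punctuation runs they accept. The Python sketch below compares them on a few illustrative strings (made up for the comparison, not taken from the title list). Python's re has no \pP, so as an assumption it substitutes ASCII punctuation for both \pP and the original \W.

```python
import re

# Assumption: ASCII punctuation stands in for \pP (and for \W in the original entry).
P = r"[!-/:-@\[-`{-~]"   # one punctuation character
N = r"[^!-/:-@\[-`{-~]"  # one non-punctuation character

CANDIDATES = {
    r".*(\W{2,}.){4,}.*": rf".*({P}{{2,}}.){{4,}}.*",    # removed entry
    r".*(\pP{2,}\PP){4}.*": rf".*({P}{{2,}}{N}){{4}}.*",  # first suggested fix
    r".*(\pP{2}\PP){4}.*": rf".*({P}{{2}}{N}){{4}}.*",    # version proposed for adding
    r".*(\pP{1,}\PP){4,}.*": rf".*({P}+{N}){{4,}}.*",     # proposed loosening
}

SAMPLES = ["H..A..G..G..E..R", "++X++X++X++X", "++++++++++++", "S.H.I.E.L.D."]

for posted, pattern in CANDIDATES.items():
    hits = [s for s in SAMPLES if re.fullmatch(pattern, s)]
    print(posted, hits)
```

All four catch "H..A..G..G..E..R"; only the removed entry also matches a bare run of punctuation, and only the loosened version matches an abbreviation-style string with single dots, which suggests why it produced so many false positives.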
I've removed two regexes from the list because of the number of false positives they're generating:
.*\b[HН]\W*(U\W*)?[AΑĄĂÃÀ]\W*([GĠ]\W*)+(W\W*)?[ÉÈËĔĚĖEĘUO0].* <moveonly> #HAGGER
.*H\W*[ÉÈËEĘĚĔ]\W*R\W*M\W*[ÉÈËEĘĚĔ]+.* <moveonly> #HERMEE
The first matches over a thousand articles, including titles like The Hague. The second matches about 750 articles, such as Hermes. Please, when you're updating the blacklist, make sure your additions do what you think they're doing. --Carnildo (talk) 20:40, 7 November 2008 (UTC)
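One way to act on that request is to run each candidate against both the vandal titles it is meant to stop and a list of legitimate titles before saving it. The Python sketch below is a hypothetical helper, not an existing tool; it uses the two entries quoted above, and Python's Unicode-aware \b and \W may treat some non-ASCII titles differently from PCRE.

```python
import re

# The two entries removed above, as posted.
HAGGER = r".*\b[HН]\W*(U\W*)?[AΑĄĂÃÀ]\W*([GĠ]\W*)+(W\W*)?[ÉÈËĔĚĖEĘUO0].*"
HERMEE = r".*H\W*[ÉÈËEĘĚĔ]\W*R\W*M\W*[ÉÈËEĘĚĔ]+.*"


def check_entry(pattern, should_block, must_allow, case_sensitive=False):
    """Return (targets the entry misses, legitimate titles it would block)."""
    rx = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    missed = [t for t in should_block if not rx.fullmatch(t)]
    false_positives = [t for t in must_allow if rx.fullmatch(t)]
    return missed, false_positives


print(check_entry(HAGGER, ["HAGGER?"], ["The Hague"]))
print(check_entry(HAGGER, ["HAGGER?"], ["The Hague"], case_sensitive=True))
print(check_entry(HERMEE, ["HERMEE"], ["Hermes"]))
```

In this sketch, the case-insensitive HAGGER entry blocks "The Hague" (H, a, g, then u from the final class), while the case-sensitive version still catches the all-caps form and lets "The Hague" through.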
I've removed another regex from the list:
.*\b[\,\;\'\*][HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ-−ŧſⱧԋ].* <moveonly|errmsg=titleblacklist-custom-pagemove>
It matches about 84,000 existing titles. NawlinWiki, please don't update the blacklist any more -- the regex
.*\b[^\p{L}][HΉĤĦȞʰʱḢḤḦḨḪНҢӇӉΗἨἩἪἫἬἭἮἯῊЋΗ-−ŧſⱧԋ].* <moveonly|errmsg=titleblacklist-custom-pagemove>
matched about 1.1 million titles, essentially disabling pagemoves for half an hour. --Carnildo (talk) 21:14, 8 December 2008 (UTC)
Can I remove this?
.*Everett.*
Can this be removed? It's preventing people from creating articles. --Carnildo (talk) 12:03, 14 December 2008 (UTC)
- I agree with removal - no matter what the reason, it's overly broad. It blocks, for example, Seth Everett, Everett and Monte Cristo Railway, and Seattle-Everett Traction Company (obviously, since it blocks anything with "Everett"; just giving a few examples). --NE2 11:22, 16 December 2008 (UTC)
RegExp testing tool?
Can someone tell me the best program or tool for testing regular expressions against current Wikipedia page titles? It would be great if it were open source and used PCRE syntax (which is what's used here, right?). I'm using Regulator, but it can't handle a file with 2,600,000 page titles. It can handle half of that, but it is very slow. Mosca (talk) 17:16, 15 December 2008 (UTC)
- I use the Perl script at User:Carnildo/wiki-regex-tester.pl for testing blacklist changes. There are some differences between Perl regexes and PCRE regexes, but it's mostly not a problem. --Carnildo (talk) 22:33, 15 December 2008 (UTC)
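For a list of a few million titles, the main thing is to stream the file line by line instead of loading it into a GUI tester. The Python sketch below is an alternative under stated assumptions, not a port of the Perl script mentioned above: it expects one title per line (plain text or gzip-compressed, like the ns_0.txt.gz list mentioned earlier), treats underscores as spaces, and uses Python's re, which lacks some PCRE syntax such as \p{Latin}.

```python
import gzip
import re
from typing import Iterable


def open_title_file(path: str):
    """Open a plain or gzip-compressed title list as text (one title per line)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def scan(lines: Iterable[str], pattern: str, case_sensitive: bool = False, limit: int = 20):
    """Stream titles and return (number of matches, first `limit` matching titles)."""
    rx = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    count, sample = 0, []
    for line in lines:
        title = line.rstrip("\n").replace("_", " ")  # assumed dump-style underscores
        if title and rx.fullmatch(title):  # entries are implicitly ^...$ anchored
            count += 1
            if len(sample) < limit:
                sample.append(title)
    return count, sample


# Usage with a real file (the file name is a placeholder):
# with open_title_file("ns_0.txt.gz") as f:
#     print(scan(f, r".*Everett.*"))
print(scan(["Seth_Everett\n", "The_Hague\n", "HAGGER?\n"], r".*everett.*"))
```

Because the file is read lazily and only a small sample of matches is kept, memory use stays flat no matter how many titles the list contains.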
Huddersfield Town F.C. season 1948–49
When trying to move Huddersfield Town F.C. season 1948-49 to the endash version of that title (Huddersfield Town F.C. season 1948–49), the MW software says that the target is protected from being created. I would imagine that this is due to one of the 'HAGGER' regexen, but my bot was able to move lots of similarly named pages with no problem. Could there be a bug with one of the blacklist entries perhaps? Richard0612 18:44, 19 January 2009 (UTC)
- I've removed the blacklist entries that appear to be causing the problem. --Carnildo (talk) 23:28, 19 January 2009 (UTC)
- Thanks, Carnildo. Richard0612 16:01, 20 January 2009 (UTC)
John Hagadorn
Hello, a community of concerned students notified me of the deletion of the article first entitled "Haggy", later renamed "John Hagadorn". Why was the article deleted? Our community deemed it necessary to honour the above man with an article outlining his outstanding career, during which he educated well over 10,000 students. If possible, we would like to have our article restored in honour of John Hagadorn. —Preceding unsigned comment added by 71.206.162.68 (talk) 00:02, 21 January 2009 (UTC)
- The article was deleted because there was nothing in the article to distinguish him from any other career high-school teacher. Wikipedia is neither a collection of indiscriminate information nor a memorial, so we don't host biographies of ordinary people. The relevant guideline would be Wikipedia:Notability (academics). --Carnildo (talk) 00:11, 21 January 2009 (UTC)
Could you perhaps allow us some time to gather information to make it a true article? For one, over 50 years of teaching is quite an outstanding achievement. As he retires, he is writing a memoir for the only Holocaust survivor of a certain town, his home town; the survivor's name is Alex Lebenstein. If Mr. Hagadorn's efforts cannot be known to the world, it would be a downright injustice to his past students as well as to Mr. John Hagadorn himself. On behalf of the community of students in favor of creating Mr. Hagadorn's page, I kindly ask that we be allowed to do so. Thanks once again.
HAGGER?
It is still possible to create HAƏƏER? and HAƓƓER?. Please blacklist these, thanks. -- IRP ☎ 20:20, 22 December 2008 (UTC)
Two more: HAGGE®?, HAGGEŔ? -- IRP ☎ 01:43, 23 December 2008 (UTC)
A long list. I'm surprised none of these are already blacklisted:
*HAɢɢER?
*HAǴǴER?
*HAḠḠER?
*HAGGɛR?
*HAGGɜR?
*HAGGЭR?
*HAGGEṜ?
*HAGGἐR?
*HAGGἑR?
*HAGGἒR?
*HAGGἓR?
*HAGGἔR?
*HAGGἕR?
*HAGGἘR?
*HAGGἙR?
*HAGGἚR?
*HAGGἛR?
*HAGGἜR?
*HAGGἝR?
(List by IRP ☎ 02:15, 23 December 2008 (UTC))
Three points:
- Blacklisting doesn't do anything to stop grawp; it just means he has to be a bit more creative
- Most of the antigrawp stuff has <moveonly> on it, which means you can create the title but not move to it, so some of that stuff is already blacklisted
- To correctly blacklist the move we need to see the move in context (e.g. was it a plain move to HAGGἝR? or was it ;;;HAGGἝR? or H AG.G ἝR??) if we can't see that, then the regex that we make is even more ineffective --Chris 02:25, 23 December 2008 (UTC)
- How about setting it to ignore all spaces and any characters coming before any variation of the H. How would that work out? -- IRP ☎ 02:29, 23 December 2008 (UTC)
- We tried that. All that we ended up with was a huge amount of false positives and grawp just started using "|-|" instead (see point 1) --Chris 02:33, 23 December 2008 (UTC)
- So that means you can blacklist "|-|" and "/-\". -- IRP ☎ 02:38, 23 December 2008 (UTC)
- And the cycle continues: grawp will move on to something even more obscure, until we have disabled every single possible pagemove. --Chris 02:41, 23 December 2008 (UTC)
- How about this method? -- IRP ☎ 02:45, 23 December 2008 (UTC)
- And are you sure grawp used "|-|AGGER?"? That shows up as "Bad title", because it contains unsupported characters. -- IRP ☎ 02:55, 23 December 2008 (UTC)
- My bad, the title was L-la44ger (see [2]) --Chris 03:02, 23 December 2008 (UTC)
- See WP:Abuse Filter and this --Chris 02:54, 23 December 2008 (UTC)
- I think łł should be blacklisted because it's part of grawp. --macbookair3140 (talk) 02:23, 10 January 2009 (UTC)
The last grawp blocking regex caused server issues. Don't do it! :) Prodego talk 02:24, 10 January 2009 (UTC)
- Trust me, blacklisting doesn't help, I've tried --Chris 02:25, 10 January 2009 (UTC)
- Just had a move-vandalism to HẦ6 6 ER? Hadrian89 (talk) 14:49, 7 February 2009 (UTC)
Blacklist and other sources for black list names
I was looking at the source for the BlackList extension.
- TBLSRC_MSG
- TBLSRC_LOCALPAGE
- TBLSRC_URL
- TBLSRC_FILE
From what I can see, localpage and URL are variables. Does anybody plan to add a user interface in the User area to allow a custom blacklist? It could either be standard subpages "/blacklist" and "/whitelist", similar to customizing the CSS, or an option to add a URL or list of URLs ("/blacklist_urls" and "/whitelist_urls").
The reason I am asking is that I am trying to solve the issue of self-filtering on Wikipedia, especially Commons. This code can accomplish that, and it is stable enough to be part of Wikipedia. The open question is how the end user adds to the list. Having each user add their own might take up too much disk space, especially if many users repeat the same names. Plus, allowing external URLs to be specified would let third-party companies maintain lists.
I know I am going to get flamed by critics citing censorship, but if it is done by individuals for their own use (or their family's use), it is not censorship -- it is filtering. And if a person uses a third-party list, any blacklist entry can be overridden by a whitelist.
Thanks for any help. Zzmonty (talk) 19:15, 26 January 2009 (UTC)
- Are you creating something similar to the Abuse Filter? --Chris 07:33, 27 January 2009 (UTC)
Trademark symbols
{{editprotected}} Since they are pretty much only used in spam page titles, and WP:MOS/TM tells us not to use them, could we add trademark symbols such as ™ to the blacklist so that pages cannot be created with them in the title? ViperSnake151 01:08, 12 February 2009 (UTC)
- Do you have a complete list of such symbols? --Carnildo (talk) 02:25, 12 February 2009 (UTC)
- Only problem is if someone were to (incorrectly) link to that form, a newbie would click the red link and not be able to make the article. --NE2 05:24, 12 February 2009 (UTC)
- If we create a custom error message linking to the relevant pages it should be fine. --Chris 08:03, 12 February 2009 (UTC)
- Does anyone know how many articles currently use these two symbols? --Conti|✉ 13:23, 12 February 2009 (UTC)
Well, there are only two: the trademark symbol (™) and the registered trademark symbol (®). Maybe the error could be like this:
The page title you have tried to create has been protected from creation. Page titles containing trademark symbols are not accepted due to their use in spam pages, and the trademarks page of the Wikipedia Manual of Style also recommends that these symbols not be used "unless unavoidably necessary for context." If the redlink that led you here contained one of these symbols and you wish to create an encyclopedic article here, it is recommended that you go back to the other page, remove the symbol from the link, and then try again.
So? ViperSnake151 12:55, 12 February 2009 (UTC)
- I'd go with a simpler wording:
The page title you have tried to create has been protected from creation. Wikipedia's Manual of Style guidelines discourage the use of trademark symbols unless unavoidably necessary for context. If you feel the symbol is essential to the title of the article, post a request on the Administrators' Noticeboard asking that the page be created; otherwise, remove the symbol from the title.
Not actionable. Please specify which exact changes are to be made to which pages before making an {{editprotected}} request. Sandstein 21:55, 14 February 2009 (UTC)
Account creation blocked due to blacklist entry
Over at the ACC tool, we have a request for the username Sillvanus, but creation is being blocked by the blacklist. I would imagine this is because the name contains the string 'anus', but this is obviously a false positive. I've glanced through the blacklist entries, but none of the regexes seem to match. Could someone experienced with the list have a look? Thanks, Richard0612 14:43, 16 February 2009 (UTC)
- Until about five hours ago, it was on the global blacklist. --Carnildo (talk) 02:45, 17 February 2009 (UTC)
Another false positive?
I believe that the blacklist is preventing me moving G.G. Allin back to GG Allin. The error message says
"GG Allin" cannot be moved to "G.G. Allin", because the title "G.G. Allin" is protected from being created. If you feel that this move is valid, please consider requesting the move first.
And yes, I am trying to move "G.G. Allin" to "GG Allin". I don't understand why it says that I am trying to move "GG Allin" to "G.G. Allin". Thanks, Mike R (talk) 15:57, 1 March 2009 (UTC)
- Offending blacklist entry removed. --Carnildo (talk) 22:46, 1 March 2009 (UTC)
- And error message fixed --Chris 02:29, 2 March 2009 (UTC)
- Actually, I think you un-fixed it. I should have mentioned that I had changed things. --Carnildo (talk) 03:17, 2 March 2009 (UTC)
Better error message
Would it be possible to add a link back to this page in the error message which shows up when someone tries to move to or create a title on the blacklist? The current message just says the title "is protected from being created", which is confusing for those not aware of this list (see here and here, for example). — jwillbur 21:00, 4 March 2009 (UTC)
- I've changed the wording of MediaWiki:Titleblacklist-custom-pagemove, MediaWiki:Titleblacklist-forbidden-move, and MediaWiki:Titleblacklist-forbidden-upload, and added a link to this talkpage. --Carnildo (talk) 22:04, 4 March 2009 (UTC)
- That is great, thank you. — jwillbur 00:07, 5 March 2009 (UTC)