User talk:DumZiBoT/reflinks.py

http://en.wikipedia.org/w/index.php?title=User:DumZiBoT/reflinks.py&curid=15117767&diff=202109411&oldid=195635129

I see a lot of problems with that change.

Why did you remove support for named references? With your current version, a named bare reference will be transformed into an unnamed ref.
Removing the "finally" part is wrong. Read this: finally lets us execute code whatever happens in the try block.
The hack for de:Humane_Papillomviren is nonsensical. The whole point of the code is to detect a proper encoding from the meta tags, so that UnicodeDammit is able to decode the text properly (u = UnicodeDammit(linkedpagetext, overrideEncodings = enc)). I don't really see how converting the HTML source to some potentially wrong charset could help find the right charset to decode the page properly...
re.sub(r"(\[\w+://[^][<>\"\s]*?)''", r"\1 ''", new_text) seems very strange to me. Are you sure this is not re.sub(r"(\[\w+://[^\]\[<>\"\s]*?)''", r"\1 ''", new_text)?
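
The points about finally and about encoding detection can be sketched together. This is only an illustration under stated assumptions, not the bot's code: sniff_charset, fetch_text, META_CHARSET and the fallback parameter are hypothetical names, and a plain bytes.decode stands in for the UnicodeDammit call so the example needs nothing beyond the standard library.

```python
import re

# Hypothetical pattern for a charset declared in a <meta> tag (bytes regex).
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def sniff_charset(raw_html):
    """Return the charset named in a <meta> tag, or None if there is none."""
    match = META_CHARSET.search(raw_html)
    return match.group(1).decode("ascii").lower() if match else None


def fetch_text(open_stream, fallback="utf-8"):
    """Read a page, always close the stream, then decode with the sniffed charset."""
    stream = open_stream()
    try:
        raw = stream.read()
    finally:
        # Runs whether read() succeeded or raised, so the stream never leaks.
        stream.close()
    charset = sniff_charset(raw) or fallback
    try:
        # Stand-in for: UnicodeDammit(linkedpagetext, overrideEncodings = enc)
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The page declared a charset Python does not know about.
        return raw.decode(fallback, errors="replace")
```

The raw bytes are left untouched until a charset has been found; only then is the page decoded, which is the order the comment above argues for.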

Cheers, NicDumZ ~ 19:40, 30 March 2008 (UTC)

I'll have to admit that this was rather rushed and probably not tested well enough. But it's what I got after merging the toolserver changes with your last changes, and because I needed to copy part of the code to AWB.

I wanted to change it so that .group() excludes <ref> from the match. But it doesn't work... I'm going to have to look at that code again.
As the documentation says, that form is for Python >= 2.5; the toolserver is running Python 2.4.5.
A Unicode error is raised when the regex is performed. I've since revised the toolserver copy to perform the search on the string from unicode(linkedpagetext, 'ascii', errors='ignore'), as we're just dealing with ASCII-based regular HTML code.
Odd as it may seem, it works the same: [^] on its own is an invalid (empty) class, so a ] placed right after [^ is taken as a literal character. Tested using the regex engines in C# and Python.
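
The last point can be checked directly. A minimal sketch, assuming Python's re module; the sample strings and the helper names fix_unescaped and fix_escaped are made up for illustration.

```python
import re

# Pattern as committed: the "]" directly after "[^" is a literal, so the class
# excludes "]", "[", "<", ">", '"' and whitespace.
UNESCAPED = re.compile(r"(\[\w+://[^][<>\"\s]*?)''")
# Same class with both brackets escaped explicitly.
ESCAPED = re.compile(r"(\[\w+://[^\]\[<>\"\s]*?)''")

# Made-up wikitext fragments: an external link followed by italic quotes,
# links containing a stray bracket, and text with no link at all.
SAMPLES = [
    "[http://example.org/page''Title'']",
    "[http://example.org/a]b''c'']",
    "[http://example.org/a[b''c'']",
    "plain text with ''italics'' only",
]


def fix_unescaped(text):
    """Insert a space between a bare URL and a following '' (committed pattern)."""
    return UNESCAPED.sub(r"\1 ''", text)


def fix_escaped(text):
    """Same substitution using the explicitly escaped pattern."""
    return ESCAPED.sub(r"\1 ''", text)


if __name__ == "__main__":
    for sample in SAMPLES:
        same = fix_unescaped(sample) == fix_escaped(sample)
        print(sample, "->", fix_unescaped(sample), "| same result:", same)
```

The escaped form is arguably easier to read, which may be a reason to prefer it even though both behave alike on these inputs.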

-- Dispenser 05:14, 31 March 2008 (UTC)