User talk:H3llBot/Archive 1
This is an archive of past discussions about User:H3llBot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1
Wayback archiving of links
Hi, your bot does not check for a Wayback archive if a link is already marked as a dead link; see the history of British anti-invasion preparations of World War II. Regards --palmiped | Talk 15:47, 15 September 2010 (UTC)
- Hi! This is an interesting issue. The bot checks the links itself, and only pages with an HTTP 404 response code are regarded as dead links. [1] is a redirect to a live page and so was ignored. I think it is reasonable to assume that the link is actually dead if it is marked so, but the bot is currently not approved for this functionality. It also seems like something that would need broader discussion. I will keep this in mind and bring it up somewhere, like VP. Thanks! — HELLKNOWZ ▎TALK 23:10, 15 September 2010 (UTC)
Bot at the TB article
Hi there! Replied at Talk:Thoroughbred but wanted you to know that the link to the polo pony site that you discussed as being flagged by the bot as a 404 is live now (I clicked the link you posted on the talk page, to boot) and was also live yesterday when I took off the "dead link" button (I took a screen shot to prove it, if needed). Don't know what's up with the bot, but DEFINITELY the page has come up every time I've checked it. So just an FYI that there's something wonky, may be a browser issue, I don't know. But it does work for me! I promise! (Of course, I have a Mac and run Safari, so of course, I may therefore have some esoteric skill unavailable to the mortal bot! LOL) Montanabw(talk) 21:18, 17 September 2010 (UTC)
- I do realize that the page actually displays its contents in the browser. If you look at the headers of the website's response (for example, with the "Header Spy" add-on for Firefox) you will see that the site returns a 404 HTTP code. There is no way for the bot to detect that this is a mistake by the website. All I can do is add the site to the exception list and hope they fix it. — HELLKNOWZ ▎TALK 01:44, 18 September 2010 (UTC)
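For illustration, the policy described in these replies (only a hard HTTP 404 counts as dead, regardless of what the browser happens to render) could be sketched like this in Python. This is a rough sketch using the standard library, not H3llBot's actual code:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def fetch_status(url, timeout=30):
    """Return the HTTP status code for a url. urllib follows 3xx
    redirects automatically, so a redirect landing on a live page
    comes back as 200, matching the behaviour described above."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code  # urllib raises 4xx/5xx as exceptions

def is_dead(status):
    """Only a hard 404 is treated as a dead link; 5xx codes may be
    temporary outages and redirects are not dead at all."""
    return status == 404
```

Note that, as this thread shows, some sites return 404 in the headers while still displaying content, and a status-code check alone cannot tell the difference; that is why an exception list is needed.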
- Hmmm. Interesting. I never knew that. Interesting. So bad coding on the part of the web designer? Montanabw(talk) 02:35, 19 September 2010 (UTC)
- Yes, it is. — HELLKNOWZ ▎TALK 10:47, 19 September 2010 (UTC)
Bot redoing vandalism
The bot made an edit 4 minutes after I undid somebody's edit and basically undid my undoing at Insomnia on 22 September. The bot really ought to check whether new edits were made between when it archives a link and when it finalizes the edit, given how much lag there is... I've seen it several times in the past: vandalism slips by because a bot accidentally re-does it and nobody realizes the vandalism is no longer undone, since people usually assume bot edits are safe. —CodeHydro 19:45, 22 September 2010 (UTC)
- Hi, I've looked into this and fixed it. I'm highly aware of bots undoing vandalism and have tested previously against edit conflicts. I retrieve the timestamp and pass it for write requests, which should avoid any edit conflicts. In this case, the bot tried to read the page again due to the edit conflict, but it seems the timestamp got lost in the page object reinitialization. I'll recheck contributions to see if this has happened with any other articles recently. Thanks for the good catch! — HELLKNOWZ ▎TALK 20:01, 22 September 2010 (UTC)
- Wouldn't checking whether the oldid number is still on top be a more accurate way of doing it than by timestamp? After all, it is possible for edits within the same minute to share a timestamp, but no two edits can share an oldid. —CodeHydro 20:13, 22 September 2010 (UTC)
- The timestamp is the revision's timestamp from when I read the page. It is passed back to the server as "basetimestamp" when I want to write the page. This is the way the API recommends doing it. If something happens in the time between, the API will ignore my edit and return an edit conflict error. This is where my bot tried to read the new version of the page, but I screwed up the object data copying, so it used the old page's content with a new timestamp. — HELLKNOWZ ▎TALK 20:38, 22 September 2010 (UTC)
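The basetimestamp mechanism described here is part of the MediaWiki API's action=edit. A minimal Python sketch of building such an edit request (the function name is illustrative, not H3llBot's code):

```python
def build_edit_request(title, new_text, base_timestamp, token):
    """Parameters for a MediaWiki action=edit POST. Sending the
    revision timestamp captured when the page was READ as
    'basetimestamp' makes the API refuse the write with an
    'editconflict' error if anyone edited in the meantime."""
    return {
        "action": "edit",
        "title": title,
        "text": new_text,
        "basetimestamp": base_timestamp,  # from the earlier read, never a fresh value
        "token": token,
        "format": "json",
    }
```

On an editconflict error the page must be re-read in full, so that the content and the timestamp are refreshed together; re-using old content with a new timestamp is exactly the bug described in this thread.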
- Related Comment - Another instance of "people usually assume bot edits are safe" from 25 September 2010 that I just caught & corrected in List of basic geography topics: H3llBot was "Checking dead links; Added 1 archived Wayback link" two days after an IP vandal added garbage characters to the same link text. The archive link was properly added —garbled text intact— but, due to the bot edit being marked minor and the "trust factor" (or just plain oversight), no one reverted the previous edit's damage. — DennisDallas (talk) 21:03, 8 October 2010 (UTC)
- Hi, thanks for the concern and nice catch! This edit was legitimate (not a bug) and there were no edit conflicts; the issue is that no one noticed the vandalism. Both recent changes and watchlists can hide bot/minor edits. Any editor not specifically checking the revision history would not have reverted the vandalism; in this case, user Edgar181 only reverted vandalism made after the bot's edit. Bot edits are "trusted", but that does not mean any edits before them should be assumed to be trusted. Personally, I always check revisions back to the last logged-in non-redlink editor. — HELLKNOWZ ▎TALK 00:45, 9 October 2010 (UTC)
accessdate
I'm glad this bot exists, but:
- No accessdate update. In RealPlayer, H3llBot added archiveurl= and archivedate= , but did not update accessdate= . Reflinks fills this in with an ISO date YYYY-MM-DD, as I did prior to Reflinks' development.
- Date format. I greatly prefer ISO dates for minutiae like archivedate= and accessdate=, other editors do not. It would be great if H3llBot could update these parameters with one of the following, listed highest priority first:
- respect a comment or template at the top of the page, such as <!-- botfilldateformat=ISO, MDY, DMY --> or the like. Perhaps there's a canonical list of date formats...
- the existing format used in the accessdate= parameter
- the existing format used in the date= parameter.
- default to your preferred date format
- Keep off this citation: There should be a way to "ward off" H3llBot, to prevent it from filling in a cite with a known unrecoverable dead link using values already known to produce a page which fails verification. Example: the ellen.audible.com citation (#11) in RealPlayer. For your inspection, I've left it in place, but commented it. --Lexein (talk) 13:39, 23 September 2010 (UTC)
- Thank you for your comments!
- |accessdate= is the date the citation was accessed to support the accompanying material. This value should not be changed unless an editor has manually accessed the resource again and verified its validity or updated the material. Adding |archiveurl= and |archivedate= does not warrant a change in accessdate.
- I am in the process of implementing a check for the overall page's date format that would be used for bot-filled dates. I was unaware that <!-- botfilldateformat=ISO, MDY, DMY --> or a similar thing existed. I suppose it would make sense to use the current citation's date's or accessdate's format. I will start implementing this, thanks!
- To ward off the bot, you could either place "{{Bots|deny=H3llBot}}" for the whole page or tell me to place a url into the bot's exclusion list. That site, at the time of archival, must have returned an HTTP 200 OK message, therefore it got archived. This is the web-master's mistake and there is little I can do except include it in the exception list or check the Wayback page for known problems, such as the "You reached this page because of a time out error, most likely because your session timed out or due to a system error." text on the "ellen.audible.com" domain. But these are all manual solutions. I'll think about this, though I see no easy fix. — HELLKNOWZ ▎TALK 18:21, 23 September 2010 (UTC)
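The {{Bots|deny=...}} exclusion mentioned here follows the standard on-wiki bot-exclusion convention. A simplified Python sketch of how a bot might honour it (not H3llBot's actual implementation, and the real template allows more variations than this):

```python
import re

def bot_allowed(wikitext, botname):
    """Honour {{bots}}/{{nobots}} exclusion. {{nobots}} or
    {{bots|deny=all}} bars every bot; {{bots|deny=A,B}} bars the
    listed bots; {{bots|allow=A,B}} permits only the listed bots."""
    if re.search(r"\{\{nobots\}\}", wikitext, re.I):
        return False
    m = re.search(r"\{\{bots\s*\|\s*deny\s*=\s*([^}]*)\}\}", wikitext, re.I)
    if m:
        denied = [n.strip().lower() for n in m.group(1).split(",")]
        return "all" not in denied and botname.lower() not in denied
    m = re.search(r"\{\{bots\s*\|\s*allow\s*=\s*([^}]*)\}\}", wikitext, re.I)
    if m:
        allowed = [n.strip().lower() for n in m.group(1).split(",")]
        return "all" in allowed or botname.lower() in allowed
    return True  # no exclusion template on the page
```

A bot would run this check on the page text before saving any edit to it.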
- The bot now respects the date format of the |accessdate= field when filling in the |archivedate=. — HELLKNOWZ ▎TALK 23:39, 23 September 2010 (UTC)
- Excellent news. Thanks, and for the bots deny info. Here's the template I was talking about: Template:Use dmy dates --Lexein (talk) 14:36, 24 September 2010 (UTC)
- Oh cool, I'll implement that. — HELLKNOWZ ▎TALK 14:51, 24 September 2010 (UTC)
- I'm guessing the use xxx dates template should have highest priority. --Lexein (talk)
- Implemented checking date format templates. — HELLKNOWZ ▎TALK 16:41, 26 September 2010 (UTC)
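A simplified Python sketch of what checking {{Use dmy dates}}-style templates and formatting dates accordingly might look like (the helper names are illustrative, not H3llBot's code):

```python
import re
from datetime import date

def page_date_format(wikitext):
    """Pick a date format from the page's date-format template,
    falling back to ISO when no template is present."""
    if re.search(r"\{\{\s*use dmy dates", wikitext, re.I):
        return "dmy"
    if re.search(r"\{\{\s*use mdy dates", wikitext, re.I):
        return "mdy"
    return "iso"

def format_date(d, fmt):
    """Render a date in the chosen style, e.g. for |archivedate=."""
    if fmt == "dmy":
        return "%d %s %d" % (d.day, d.strftime("%B"), d.year)
    if fmt == "mdy":
        return "%s %d, %d" % (d.strftime("%B"), d.day, d.year)
    return d.isoformat()
```

Per the priority list suggested earlier in this thread, the page-level template would win, then the format already used in |accessdate= or |date=, then a default.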
False Positives
I think I should alert you to some false positives [2]. Reubot (talk) 03:14, 24 September 2010 (UTC)
- Thanks for the notice. However, this is the web-site's fault. The pages return a "HTTP/1.1 404 Not Found" response header even though the content is displayed. I will add the pages (and probably the site's url prefix) to the exception list. I have replaced the urls in the article with proper ones. — HELLKNOWZ ▎TALK 10:27, 24 September 2010 (UTC)
Bot chose archive version dated 2 years before accessdate of deadlink
in this diff. Accessdate is 17 May 2007, archivedate 3 April 2005, and the archived page doesn't contain the cited content. The BRFA seems to say it selects an archive page within 6 months of accessdate, so should it have done this? cheers, Struway2 (talk) 09:22, 24 September 2010 (UTC)
- Thanks for the notice! I have been changing the ranges and looking at archives to see what margin I can allow. Often Wayback did not archive a page because it stayed the same, so many 2-year-old archives were still valid. I have now implemented the bot to respect the 6-month range. Unfortunately, many recent-event articles will still contain outdated archived versions even within this range. I think I will also add <!-- bot retrieved archive --> so that editors know the archives were automatically found. — HELLKNOWZ ▎TALK 10:42, 24 September 2010 (UTC)
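The 6-month window discussed here amounts to a simple date comparison when choosing among snapshots. A Python sketch (approximating a month as 30 days, which is an assumption, not necessarily the bot's exact rule):

```python
from datetime import datetime, timedelta

def within_window(access_date, snapshot_date, months=6):
    """Accept a snapshot only if it was taken within `months`
    months of the citation's accessdate."""
    return abs(snapshot_date - access_date) <= timedelta(days=30 * months)

def best_snapshot(access_date, snapshot_dates, months=6):
    """The snapshot closest to the accessdate inside the window,
    or None when every snapshot falls outside it (in which case
    the link is only tagged, not archived)."""
    inside = [s for s in snapshot_dates if within_window(access_date, s, months)]
    return min(inside, key=lambda s: abs(s - access_date), default=None)
```

With the dates from the diff above (accessdate 17 May 2007, snapshot 3 April 2005), the stale snapshot is rejected and the function returns None.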
Thank you!
FYI, I don't have a gripe with the bot at all, although I see several minor ones above. I just wanted to let you know that I appreciate what it's doing and am glad that you've gotten it coded and working. On the whole, it's doing GREAT work. Cheers, Jclemens (talk) 20:08, 26 September 2010 (UTC)
- Thanks! — HELLKNOWZ ▎TALK 20:39, 26 September 2010 (UTC)
- I too appreciate it. I'm glad it's been running through the articles on my watch list and fixing things before I know they need to be fixed! Imzadi 1979 → 21:40, 26 September 2010 (UTC)
False Positive
Hi!
FYI, the bot returned the following URL as "dead" on the Pit bull article although the URL is alive:
The other two links flagged as "dead" were really dead, though.
Thanks! :)
Astro$01 (talk) 01:42, 27 September 2010 (UTC)
- The bot's internal browser receives 404 headers; to be exact: "<HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=ISO-8859-1"><TITLE>Not Found</TITLE></HEAD> <H1>Not Found</H1> The requested object does not exist on this server. The link you followed is either outdated, inaccurate, or the server has been instructed not to let you have it.". I'll try to find out why; probably it's because of some user-agent/referral settings. 11:53, 27 September 2010 (UTC)
- Follow-up. I fixed it by faking the user-agent and setting the referrer to the base url. The web-site seems to disallow bots, but instead of HTTP 403 Forbidden it returns 404. This is poor coding. Hopefully I will avoid further sites like this by faking the referrer. — HELLKNOWZ ▎TALK 12:04, 27 September 2010 (UTC)
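A minimal Python sketch of the workaround described (a browser-like User-Agent plus the site's own base url as referrer); the header values are illustrative:

```python
from urllib.parse import urlsplit

def browser_like_headers(url):
    """Headers that make a request look like an ordinary browser
    visit: a browser-style User-Agent and the site's base URL as
    Referer. Some sites answer unadorned bot requests with 404
    instead of the correct 403."""
    parts = urlsplit(url)
    base = "%s://%s/" % (parts.scheme, parts.netloc)
    return {
        "User-Agent": "Mozilla/5.0 (compatible; link checker)",
        "Referer": base,
    }
```

These headers would then be attached to every link-check request for such sites.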
Fishy WayBack link checker
Hello. On edits like this, the bot simply tags with {{dead link}} even though there was an archive.org version. I see from other edits and from the FAQ that the bot can check the WayBack machine, but sometimes it times out (or something). Personally, it's rather annoying that it's tagging what could be hundreds of thousands of pages as dead links when the code itself could be improved to catch more WayBack archived copies. Maybe when it times out or doesn't get a response from archive.org, it could add that link to a list of links/pages to return to? I'm sure there's a better way for this process. Killiondude (talk) 07:09, 27 September 2010 (UTC)
- Hi, thanks for the comment. That archive is more than 6 months apart from the access date. I used to archive the closest available version, but it was pointed out to me (3 threads up) that this was a concern and that I am to stick with the BRFA's specified 6-month range. I realize just how many dead links there are, but I would not want to go against the BRFA specification again unless there is consensus.
- Regarding timeouts, I triple-check Wayback with increasing delays; if that fails, then one of their servers must be down and I cannot do much more at that time. {{Dead link}} contains a |bot= field, which should be an indication that the template may have been placed because a bot was unable to retrieve an archived copy at the time. In fact, a human editor would have encountered the same issue. In any case, I believe tagging an otherwise unnoticed link as dead is better than not tagging it at all in hopes an archive copy will become available. I think I will actually implement a list of pages where one or more Wayback urls returned a response other than 404/200 and recheck those pages. — HELLKNOWZ ▎TALK 11:39, 27 September 2010 (UTC)
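The triple check with increasing delays could be sketched like this in Python (the delays and the error type caught are illustrative, not the bot's actual values):

```python
import time

def fetch_with_retries(fetch, url, tries=3, base_delay=5):
    """Try an archive lookup up to `tries` times with growing
    delays between attempts. Returns None if every attempt fails,
    so the caller can fall back to plain {{dead link}} tagging and
    put the page on a recheck list."""
    for attempt in range(tries):
        try:
            return fetch(url)
        except OSError:  # timeouts, connection resets, etc.
            if attempt < tries - 1:
                time.sleep(base_delay * (attempt + 1))  # 5 s, 10 s, ...
    return None
```

The `fetch` callable is whatever performs the actual Wayback request; passing it in keeps the retry logic testable on its own.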
Recalcitrant seed broken link
Thanks. I wouldn't have found the broken link without your bot patrol. Trilobitealive (talk) 02:27, 28 September 2010 (UTC)
"Dead" link on Malcolm Hardee page isn't dead
I've undone a "dead link" reference you put on http://en.wikipedia.org/wiki/Malcolm_Hardee because the link seems to work perfectly OK - and it's a BBC News page so I can't imagine there's any problem opening it in the US. Oddly, the previous day, a RjwilmsiBot had put a similar "dead link" thing on the same link but, again, I seemed to have no problem at all when clicking on the link. It's a mystery to me; probably my ignorance protects me! I'm a bit ignorant about how these postings are supposed to be signed... if wrong, my apologies... TALK 01:13, 01 October 2010 (UTC)
- Hi, thanks for the notice! The link contained a slash "/" appended to it. The slash was stripped by the markup parser before creating an html link, for some reason. This made the resulting html link appear without the slash, and thus a working url. I removed the slash. (You should sign by typing 4 tildes "~~~~") — HELLKNOWZ ▎TALK 00:50, 1 October 2010 (UTC)
Just want to say...
...that you kick ass muchly. Thanks a bunch! Tim1357 talk 02:16, 2 October 2010 (UTC)
- Thanks! :) — HELLKNOWZ ▎TALK 11:15, 2 October 2010 (UTC)
- I second that. Rjwilmsi 11:28, 2 October 2010 (UTC)
What I've Noticed and What I'm Suggesting
I've noticed this bot tags dead links, but right beside the tag it puts the archived page for the link that was tagged dead. Instead of doing that, wouldn't it be better if the bot just fixed the reference by replacing the dead link with the archived version of the same thing? Mr. C.C.Hey yo!I didn't do it! 12:45, 3 October 2010 (UTC)
- Hello! I'm not sure what you mean. "tags dead links, but right beside it puts the archived page" — do you mean the bot placed {{Dead link}} next to a citation where it also filled the |archiveurl= field? The bot does fix the references by adding the archived version, so I am unsure what you are suggesting should be changed? — HELLKNOWZ ▎TALK 14:04, 3 October 2010 (UTC)
- It leaves the dead link tag there while sometimes placing an archived link beside it in the references. It will show (dead link with the archived link information beside it). — Preceding unsigned comment added by Fishhead2100 (talk • contribs) 06:16, 5 October 2010 (UTC)
- Can you give an example article edit diff where you saw this? — HELLKNOWZ ▎TALK 10:29, 5 October 2010 (UTC)
- Ah, I think I found the case. The bot would not remove a {{Dead link}} incorrectly placed after the </ref> tag when it would add either {{Dead link}} or {{Wayback}} inside the <ref>. I'll scan the contributions and hopefully fix any occurrences. — HELLKNOWZ ▎TALK 11:11, 5 October 2010 (UTC)
The dead and the undead
Hello. This edit flagged a link that loads a live page for me in both Firefox and Safari. Thanks. Rivertorch (talk) 05:12, 10 October 2010 (UTC)
- Hi! Yes, it does now return as live; it must have been down temporarily. It's a pity I cannot see what the issue was; most likely the site returned 404 incorrectly instead of 503. — HELLKNOWZ ▎TALK 11:38, 10 October 2010 (UTC)
Scottish Gazetteer
By flagging up a broken link, you have identified the fact that the Scottish Gazetteer now has its own domain (www.scottish-places.info). There is redirection from the former addresses at http://www.geo.ed.ac.uk/scotgaz. A search for the old address finds 978 uses in this wiki. Would it be possible for a bot to make these changes? They are not currently dead links. Finavon (talk) 20:26, 10 October 2010 (UTC)
- I made a BRFA for this as an on request sub-task. I'll parse the urls once it gets approved. — HELLKNOWZ ▎TALK 10:54, 11 October 2010 (UTC)
- BRFA passed and the bot replaced the occurrences it could recognise/parse. Those remaining should be manually changed. — HELLKNOWZ ▎TALK 13:35, 16 November 2010 (UTC)
Speaking as the creator and prime writer of this article, your recent retrieval of an archived link was really important, as it was one of only a few on line resources on the subject, and it was a really good and comprehensive newspaper article. You tangibly benefited Wikipedia's readers. Thank you. 7&6=thirteen (talk) 14:23, 11 October 2010 (UTC)
- Thanks! — HELLKNOWZ ▎TALK 14:28, 11 October 2010 (UTC)
re changes to USCG historical pages
There is now a template {{cite uscghist}} to deal with the references to the state-by-state pages on lighthouses on the USCG site (e.g. the link tagged in this edit). Mangoe (talk) 18:53, 11 October 2010 (UTC)
- The template does not have |url= or possible |archiveurl= parameters, so how do you want me to deal with it? I suppose I could just build the url myself and use {{Wayback}}. Or I could just edit the template to include the archival parameters, since it uses {{Cite web}} anyway. Is this a web-site with a high potential number of deadlinks? — HELLKNOWZ ▎TALK 19:04, 11 October 2010 (UTC)
- Back in June 2008 the root of the set of lighthouse historical articles was moved, so that the urls are now of the form http://www.uscg.mil/history/weblighthouses/LHXX.asp, where XX is the two-letter state code. Previously they had the form http://www.uscg.mil/hq/g-cp/history/WEBLIGHTHOUSES/LHXX.html (with the XX again being the state code), so it's a straightforward fix to replace it with {{cite uscghist|XX}}. There was an attempt to catch these but obviously we missed a bunch. Mangoe (talk) 02:13, 12 October 2010 (UTC)
- Ah, you mean they changed the URL, gotcha! Then this is almost the same as two threads above, so I will parse these once my BRFA passes. Should any other fields used be disregarded? Is there a discussion about this I could read/refer to? — HELLKNOWZ ▎TALK 11:51, 12 October 2010 (UTC)
- What I've written here is pretty much all you need to know. If you can translate the old URL into the template, the two-letter state code is the only part that matters. Mangoe (talk) 17:08, 12 October 2010 (UTC)
- Having passed the BRFA, the bot replaced the old domain name urls with the new one. I will perhaps file a BRFA to actually replace the urls with the above template, because currently the bot is not authorised to do so. — HELLKNOWZ ▎TALK 14:03, 16 November 2010 (UTC)
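The straightforward fix Mangoe describes (rewriting the old-form URL to {{cite uscghist|XX}}, carrying over only the two-letter state code) is essentially one regex substitution; a Python sketch:

```python
import re

# Pre-2008 form of the USCG lighthouse-history URLs, with the
# two-letter state code captured.
OLD_USCG = re.compile(
    r"http://www\.uscg\.mil/hq/g-cp/history/WEBLIGHTHOUSES/LH([A-Z]{2})\.html"
)

def rewrite_uscg(wikitext):
    """Replace each old USCG lighthouse-history URL with the
    {{cite uscghist}} template; only the state code matters."""
    return OLD_USCG.sub(lambda m: "{{cite uscghist|%s}}" % m.group(1), wikitext)
```

As the reply above notes, such a replacement (URL into template) would itself need bot approval before being run at scale.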
Nevada historic sites and links
These links have apparently moved, using Clark County as an example, from http://nevadaculture.org/docs/shpo/entries/clark.htm to http://nvshpo.org/index.php?option=com_content&task=view&id=86&Itemid=9. Any way your bot can update these rather than flag them as dead links? Vegaswikian (talk) 19:00, 12 October 2010 (UTC)
- Hi! How would I know what the new URL is? The 404 site does not give any indication of where to go. Is there a consistent way of getting the new link? — HELLKNOWZ ▎TALK 19:05, 12 October 2010 (UTC)
- In my opinion, bots should have some kind of exception process to deal with known changes that have been reported. Better to have a bot update the link than to just flag it as a dead link. For the pages I watch, I now have to manually go in and edit the link. It would be so much better if the bot updated the link rather than just tagged it. For the pages I don't watch, it may take a while for someone to make the changes. Yes, it is better to flag the links as dead than do nothing, but the best solution is to replace them with working links. Yes, the latter is harder, but that does not mean it should not be done. Vegaswikian (talk) 19:38, 12 October 2010 (UTC)
- I can update the links I am aware of that have moved. Otherwise, it is better to flag them as dead rather than ignore. The previous thread and 3 above pointed out sites that have moved, but retained the same structure, so it is easy to change the urls. The nevadaculture.org seems to have redesigned the whole web-site. I don't know how to correctly detect where the new links are, otherwise I would replace them. That's why I asked if you knew. — HELLKNOWZ ▎TALK 19:56, 12 October 2010 (UTC)
- Looks like the old url was changed and they went from a person friendly English naming to a computer friendly code. So Clark County was the page I listed above, Carson City is http://nvshpo.org/index.php?option=com_content&task=view&id=84&Itemid=9. The main NRHP page is now at http://nvshpo.org/index.php?option=com_content&view=article&id=83&Itemid=419 if you want to try and sort this out. I'm surprised that in this day and age, they would make a change like this without pointers or redirection. Thanks. Vegaswikian (talk) 20:23, 12 October 2010 (UTC)
- Well, there's a little less than 100 links for nevadaculture.org. Some redirect, some are dead. It is easier and faster to replace the dead ones by hand for now than for me to get bot approval, code, test, and run this one-time task. Looking at the links, I cannot see an easy way to replace the links without making a list manually. — HELLKNOWZ ▎TALK 20:39, 12 October 2010 (UTC)
Hey,
An edit listed a link as dead, but it's not (see here). Just wanted to let you know. :) - Theornamentalist (talk) 11:53, 14 October 2010 (UTC)
- Hey! The link is indeed reported as live; not sure what went wrong there. Maybe the site just got back up from temporary down-time, where it incorrectly used 404. Anyway, I reverted the edit. — HELLKNOWZ ▎TALK 13:03, 14 October 2010 (UTC)
Too many dead links!
This bot is too good at its job, and is making my articles look bad :( PS it's actually really useful; great work. -M.Nelson (talk) 03:01, 16 October 2010 (UTC)
H311Bot is a champion; it's surprising just how many dead links s/he has found, even in links that have been added within the last year. Three cheers --Hughesdarren (talk) 04:08, 16 October 2010 (UTC)
- The bot could be improved if there was a way for it to suggest replacements. I only followed up on one, but there were scores of available links. Vegaswikian (talk) 05:02, 16 October 2010 (UTC)
Thanks guys! I have a pending approval for a task to replace known domain name changes. That would hopefully solve a part of the problem. — HELLKNOWZ ▎TALK 12:14, 16 October 2010 (UTC)
Missed a couple of dead links
Any idea why the bot caught this dead link but missed these? -M.Nelson (talk) 22:17, 17 October 2010 (UTC)
- Probably a timeout; also, some web-sites don't like several concurrent connections. On top of that, my router is often throwing a fit at the number of simultaneous requests I'm making. While I could implement some logic for not browsing the same domain more than once at a time, I cannot really do anything about my internet connection. The whole deal with timeouts, read errors, and other ungraceful TCP handling makes it all really messy and complicated. — HELLKNOWZ ▎TALK 22:50, 17 October 2010 (UTC)
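The "not browsing the same domain more than once at a time" idea mentioned here could be sketched with one lock per domain (illustrative Python, not the bot's code):

```python
import threading
from collections import defaultdict
from urllib.parse import urlsplit

class DomainThrottle:
    """Hand out one shared lock per domain, so requests to the
    same site are serialised while different sites can still be
    checked in parallel."""
    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dict itself

    def lock_for(self, url):
        domain = urlsplit(url).netloc
        with self._guard:
            return self._locks[domain]
```

A worker thread would then wrap each request in `with throttle.lock_for(url): fetch(url)`, which avoids hitting sites that dislike concurrent connections, though it does nothing for upstream router or connection limits.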
Wrong tagging of NYTimes dead links
Just one example (I saw more, but can't recall where). Cheers. Good bot otherwise. Materialscientist (talk) 00:06, 18 October 2010 (UTC)
- Thanks. The site returns a "HTTP/1.0 404 Not Found" header even though it displays some other content. It does not load "Bork's Nomination Is Rejected, 58–42; Reagan 'Saddened'" as the citation claims. Here is an actual working link. — HELLKNOWZ ▎TALK 00:28, 18 October 2010 (UTC)
- Sorry, my head didn't wake up yet. Some sites put a dummy in case of 404 error. This was a 404 error and was tagged appropriately. The correct link is [3]. Apologies. Materialscientist (talk) 00:34, 18 October 2010 (UTC)
Kudos
I'll admit that in my head I've slagged off your bot when it misses a dead link on WayBack – but usually it's just because it raises awareness of numerous links which keep going dead, and that is frustrating. Your bot does a good and useful job and you probably get more complaints (and less praise) than you deserve. When this diff appeared on my Watchlist I was very pleased/impressed that you've coded the bot to revisit dead links marked due to connection time-outs on Wayback or whatever (either that or you're trawling through all the dead links marked by the bot). Either way this is a valuable process of the bot and I wanted to give you some deserved praise for it. Best, Rambo's Revenge (talk) 16:05, 18 October 2010 (UTC)
- Thanks! I will eventually go through all links again and hopefully fix the ones which have an archived copy in the accepted time range. — HELLKNOWZ ▎TALK 16:12, 18 October 2010 (UTC)
Adding access dates
- Moved from User talk:H3llBot/Tasks/5
If you're going to go through and fix moved domains, why not update the access date? Your bot must be doing some form of checking to make sure the page is not a 404, so it has confirmed the page has been accessed at a newer time & date than the original time stamp on the cite.--Amckern (talk) 10:15, 19 October 2010 (UTC)
- The bot only replaces the domain url with a new one. Afterwards, regular 404 checking is done, but only on citations and referenced external links that have recognised access date information. The replaced links do not necessarily point to the same information as when they were first accessed by whoever used them as references, so I cannot add the current date as the access date. Also, I've already enquired about filling in access dates at VP (VP 1, VP 2) and there was no consensus about doing so. — HELLKNOWZ ▎TALK 10:33, 19 October 2010 (UTC)
List of St. Francis College people
The bot for the second time has marked the same reference as a dead link in List of St. Francis College people, when it is not. Please correct. Thanks in advance. --El Mayimbe (talk) 22:20, 20 October 2010 (UTC)
- Removed tag and added to ignore list; I have no idea why the link checker sees it as dead, will check it. — HELLKNOWZ ▎TALK 22:46, 20 October 2010 (UTC)
- Thank you.--El Mayimbe (talk) 03:56, 21 October 2010 (UTC)
Confused
How many websites actually keep the same URL forever? Wouldn't every web reference in every article eventually be marked as a dead link? It's just a matter of time. What is the purpose of this? Benjwong (talk) 19:30, 23 October 2010 (UTC)
- Webciting services, such as Wayback or Webcitation, keep online copies of much of the material. If a website changes its domain name, the links can be updated. If the website removes the content, the material it supported has to be sourced in some other way. In any case, the bot points out that action has to be taken. Many of the links tagged in featured articles have been promptly replaced or updated by the editors who have worked on those articles. Even if only a small number of links are repaired after tagging, it is better than ignoring this altogether. — HELLKNOWZ ▎TALK 01:04, 24 October 2010 (UTC)
<3
Your bot is awesome, H3ll! Zaixionito (talk) —Preceding undated comment added 01:30, 26 October 2010 (UTC).
- Thanks. — HELLKNOWZ ▎TALK 10:27, 26 October 2010 (UTC)
ONS website
Hi, I noticed the bot marking links to the ONS web site as dead. The site is currently down for maintenance and it would be good to have a list of links that you have marked as dead on that web site so that they can be checked out when the site is back on air. On checking the tag can be removed or the link changed if they restructure the site. Thanks. Keith D (talk) 13:05, 26 October 2010 (UTC)
- Hi! Added the domain to exception list for now. — HELLKNOWZ ▎TALK 13:54, 26 October 2010 (UTC)
- Thanks. Will have to see the extent of the work needed when it is back on air as it could affect thousands of articles if they restructure it. Keith D (talk) 17:37, 26 October 2010 (UTC)
- Just to let you know this is back on the air and they appear not to have restructured the areas we have used for references so may not be too much work. Keith D (talk) 21:48, 30 October 2010 (UTC)
If possible, please stop marking links to http://www.musikindustrie.de/gold_platin_datenbank_beta/ as dead links (e.g. this). The website changed, and I am in the process of replacing all uses with a template. The bot tags are only making my life more difficult. --Muhandes (talk) 00:10, 29 October 2010 (UTC)
- Hi! Added the whole domain to ignore list for now. When do you expect the links to be fixed? — HELLKNOWZ ▎TALK 07:34, 29 October 2010 (UTC)
- Thanks. From my 50-ish edits it doesn't seem like it can be automated, as too many formats were used to cite it. That's more than 1000 pages to do semi-manually (the search only returns 1000; there could be more). Not by tomorrow :) I'll let you know when I'm done if that's helpful. --Muhandes (talk) 08:14, 29 October 2010 (UTC)
- I'm all done. Applied in more than 900 pages, and there should not be any direct uses of the database now. --Muhandes (talk) 19:01, 9 November 2010 (UTC)
- Great job! Will remove from ignore list then. — HELLKNOWZ ▎TALK 19:15, 9 November 2010 (UTC)
Not dead
Your bot marked this link as dead [4] at The Simpsons Movie. It's not dead. Gran2 18:24, 30 October 2010 (UTC)
- Hi! Well, this is one of the weirder ones. It loads up with "HTTP/1.1 404 Not Found" headers, but a second later sends new headers with "HTTP/1.1 200 OK"; I'm not even sure how that's possible. This is beyond stupid design. I'll add the link to the exception/ignore list. Thanks for pointing out. — HELLKNOWZ ▎TALK 18:50, 30 October 2010 (UTC)
See also, these edits: [5] and [6]. They work for me, but even after I reverted, the bot tagged them as dead links again. OSX (talk • contributions) 00:39, 31 October 2010 (UTC)
- Hi! The bot retags the links as dead if you remove the tags. Both links/pages report with "HTTP/1.1 404 Not Found" header, which is web-master's mistake. Here [7] [8] are working links with HTTP OK header; they should be replaced. — HELLKNOWZ ▎TALK 00:49, 31 October 2010 (UTC)
Greetings... in Walter Adams (economist) your bot marked [9] as dead, but it works consistently for me. Thanks, Kevin Forsyth (talk) 16:14, 1 November 2010 (UTC)
- Hello! There was a stray character in the
|url=
parameter that the mark-up ignored when rendering, but that the bot read. — HELLKNOWZ ▎TALK 17:30, 1 November 2010 (UTC)
Webcitation
Will the bot be checking for the existence of webcitation.org archives of deadlinks? Using the comb function, I've preemptively archived all the ref links at pages like San Francisco, in anticipation of links failing, but have not reformulated the refs. Note that not all webcite archived links are successful - these were 404, or use a robots.txt or noarchive etc. Here's an example of a "failed" archive which is still retained. It returns a 200 status, but contains a clear error message, unfortunately in a frame. Also note that all archive attempts, successful or not, are available at the url of choice, in a drop down menu, sorted by date. --Lexein (talk) 09:02, 2 November 2010 (UTC)
- It is planned, but I have not gotten around to it yet. I know there have been bots, such as User:WebCiteBOT, that have archived many pages at Webcitation, so there are definitely pages on Webcite that the bot cannot find on Wayback. Of course, I have no way of dealing with problematic websites with funky frames, wrong headers, robot exclusions, etc. — HELLKNOWZ ▎TALK 10:59, 2 November 2010 (UTC)
Royal Navy links
You've flagged dead links on a variety of pages for RN ships, but if you look at the front page of the RN website you'll see that the site is currently undergoing "essential maintenance". I assume, therefore, that the pages you've flagged as dead may well be likely to return. David Biddulph (talk) 07:10, 8 November 2010 (UTC)
- Hi! Added the site to exclusion list for now. They did not add any maintenance notices on the pages themselves so the bot didn't find a reason to notify me of possible maintenance.
- The ones in my logs: added Wayback urls, wrong tag, fixed, added Wayback url, added Wayback url, added Wayback url, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed, wrong tag, fixed. I really need to automate this kind of reversion... — HELLKNOWZ ▎TALK 11:20, 8 November 2010 (UTC)
- Thanks for sorting that. I don't know for how long it's going to be out of action. David Biddulph (talk) 11:54, 8 November 2010 (UTC)
Hollywood Reporter site content migration in progress
H3llBot is marking www.hollywoodreporter.com links as dead. Indeed, many of that entertainment news site's articles are offline, but it's temporary. I just inquired about it and the webmaster said "We are currently migrating this content to a new platform and all of these urls will be redirected to the same content (in a new shiny home) very shortly." Can H3llBot skip those links for now? —mjb (talk) 09:20, 10 November 2010 (UTC)
- Hi! Will ignore the site for now. If the urls get changed to an easily replaceable format, I can also update them afterwards. — HELLKNOWZ ▎TALK 11:28, 10 November 2010 (UTC)
archivedate missing
this is quite old, but I just noticed it. --Muhandes (talk) 08:10, 14 November 2010 (UTC)
- Hi! Thanks for the notice. I have already fixed this; it was the way .NET treated regex captured groups when replacing. — HELLKNOWZ ▎TALK 13:03, 14 November 2010 (UTC)
- Perhaps you could run the bot over Category:Articles_with_broken_citations and fix these? At least half of the ones that I have sampled in this category have had this error. Thanks! 134.253.26.12 (talk) 22:07, 18 November 2010 (UTC)
Pining for the fjords?
This one, which I have undone. Site is up and returns Status: HTTP/1.1 200 OK
via web-sniffer.net, various UA strings. Sswonk (talk) 20:20, 14 November 2010 (UTC)
- Hey, thanks for notice! Yeah, it is 200 for me too, may have been something temporary. Will add to ignore list just in case. — HELLKNOWZ ▎TALK 23:33, 14 November 2010 (UTC)
Another not dead
hi, the bot labelled all these links from the same site http://hamptonroads.com [10] as dead but none of them were, kind regards Tom B (talk) 23:21, 15 November 2010 (UTC)
- Hi! The domain must have been down at the time of checking. This does seem to be the only page where "http://hamptonroads.com*" links were marked as dead. — HELLKNOWZ ▎TALK 12:44, 16 November 2010 (UTC)
False positives
For some reason it's tagged a lot of URLs from countryuniverse.net as dead when they aren't; see this edit for instance. Ten Pound Hammer, his otters and a clue-bat • (Otters want attention) 22:38, 16 November 2010 (UTC)
- Will investigate this, probably was prolonged temporary downtime. My logs are a bit messy from October. — HELLKNOWZ ▎TALK 23:37, 16 November 2010 (UTC)
Please be careful
Please be careful - On Cyclone Monica your bot moved the primary link from the archived version to archive fields and added an original link, which was deliberately not used since the archived version of the page does not equal the current revision of the page.Jason Rees (talk) 17:33, 26 November 2010 (UTC)
- That is the correct and intended behaviour. The |url= parameter indicates the source of the reference. The archived copy is not this source: Wayback is an online library, it can only store copies, and therefore is not used as the source. |url= should indicate the original version, while |archiveurl= is for the archived url. Even if the page is still online and the content is different, it was the original source url, and that is why |archivedate= is used to record when this source was accessed. The primary clickable link is the archived url, so this shouldn't present any problems for the reader. In fact, it further informs the reader that the version is a snapshot of past material. — HELLKNOWZ ▎TALK 01:07, 27 November 2010 (UTC)
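The parameter layout described in this reply (original source kept in |url=, snapshot in |archiveurl=/|archivedate=) amounts to a small template rewrite. A minimal sketch in Python; the function name and the example urls are illustrative assumptions, not the bot's actual .NET code:

```python
def add_archive(cite, archive_url, archive_date):
    """Keep |url= pointing at the original source and append the
    Wayback copy as |archiveurl=/|archivedate=, as described above.
    Simplified: real citations can contain comments, nested
    templates and whitespace quirks that need careful parsing."""
    if "|archiveurl=" in cite:
        return cite  # already archived; leave alone
    extra = " |archiveurl=%s |archivedate=%s}}" % (archive_url, archive_date)
    # Drop the closing "}}" and re-append it after the new parameters.
    return cite.rstrip()[:-2].rstrip() + extra
```

For example, a bare {{cite web |url=… |title=…}} gains the two archive parameters while its |url= stays untouched, which is exactly the behaviour discussed above.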
Bot edit flag not set
I notice the 'BOT:' prefix in edit summaries, but the edits are not marked with the b flag, which may interfere with those who've set their watchlists to hide bot edits. Just a suggestion. – Regregex (talk) 11:31, 1 December 2010 (UTC)
- Are you sure? recent contributions show all H3llBot's edits as bot flagged. — HELLKNOWZ ▎TALK 12:36, 1 December 2010 (UTC)
- Hm. With no b's in BBC Micro's history or H3llbot's contributions it was a reasonable conclusion. Then WP is dumb to print those two lists differently. My apologies. – Regregex (talk) 22:11, 1 December 2010 (UTC)
- No worries, page history and Contribs never did show bot flags. Take care! — HELLKNOWZ ▎TALK 22:15, 1 December 2010 (UTC)
Bot bug ?
It sometimes automatically turns ".jpg" to ".bmp" (without any reason)... Jon Ascton (talk) 15:11, 17 December 2010 (UTC)
- Hi! Could you link to a specific case, please? The bot hasn't run for a while, and there is no functionality in any of the code that could change extensions. The bot does not deal with images at all, unless they appear in citations. — HELLKNOWZ ▎TALK 17:31, 17 December 2010 (UTC)
False dead link
The bot placed a dead link notice on this website found on the List of Red Hot Chili Peppers band members. The link actually isn't dead, and I am a bit late, since this happened last month. I just thought I'd let you know. WereWolf (talk) 04:34, 22 December 2010 (UTC)
False dead link
This edit to Ernest_Leiser added a false dead link.
http://www.nytimes.com/2002/12/02/obituaries/02LEIS.html is alive and kicking.
Does the bot have a free account to log into the NY Times and other common sources? If not, it is going to report dead links that a registered reader would not encounter. --Javaweb (talk) 17:36, 26 December 2010 (UTC)Javaweb
- Thanks for the report. I would have expected the website not to 404 if a subscription is required, but to return a different, suitable error code. I guess I will have to ignore the sites that require subscriptions after a certain number of views and fail to provide a reasonable error code. — HELLKNOWZ ▎TALK 17:50, 26 December 2010 (UTC)
- Thanks for your bot. It can identify links needing repair. BTW, the NY Times is going to a pay-wall model, probably with users getting a few links for free and then being refused. I'm telling you that because your bot may have trouble with this case. Is there a standard error code that should be returned? If you describe what you want returned in the case of "good link, but you are not registered to view it", I would be happy to ask the NYTimes if they could start returning it to you. — Preceding unsigned comment added by Javaweb (talk • contribs) 18:09, 26 December 2010 (UTC)
Looking at the HTTP error codes, they are not going to be used with pages that actually return HTML text like "Please register", etc. Screen scraping is a possibility, but a fragile, messy one. Javaweb (talk) 18:22, 26 December 2010 (UTC)Javaweb
- The closest matching HTTP status codes defined are:
401 Unauthorized – Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or not yet been provided.[2] The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication.
402 Payment Required – Reserved for future use.[2] The original intention was that this code might be used as part of some form of digital cash or micropayment scheme, but that has not happened, and this code is not usually used. As an example of its use, however, Apple's MobileMe service generates a 402 error ("httpStatusCode:402" in the Mac OS X Console log) if the MobileMe account is delinquent.
I'm sure the website does not return this. Javaweb (talk) 18:34, 26 December 2010 (UTC)Javaweb
- The bot would only flag the link as dead if it got a 404. So, yes, the site did return 404 due to maintenance, subscription, some error, or something else; there is no real way I can check now. I'm aware of the other codes that should be returned, but more sites ignore them than use them; nothing I can do. I've considered screen-scraping, but the code needs a rewrite and the bot isn't running at the moment anyway. — HELLKNOWZ ▎TALK 18:38, 26 December 2010 (UTC)
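The policy described in this thread (only a hard 404 marks a link dead, while authentication and payment codes hint at a paywall rather than a dead page) could be sketched like this. The function name and category labels are my own illustration, not the bot's actual code:

```python
def classify_link(status_code):
    """Bucket an HTTP status code the way the thread above describes:
    only a hard 404 marks a link dead; auth/payment codes suggest a
    paywall or login wall instead of a genuinely dead page."""
    if status_code == 404:
        return "dead"
    if status_code in (401, 402, 403):
        return "restricted"   # login or paywall; not tagged as dead
    if 200 <= status_code < 300:
        return "live"
    if 300 <= status_code < 400:
        return "redirect"     # follow and re-check the target
    return "inconclusive"     # 5xx and odd codes: retry later
```

Sites that return 404 for paywalled content, as happened here, defeat this kind of classification, which is why such domains end up on an ignore list.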
Another false positive
Here. Ten Pound Hammer, his otters and a clue-bat • (Otters want attention) 22:17, 6 January 2011 (UTC)
- Thanks. — HELLKNOWZ ▎TALK 23:38, 6 January 2011 (UTC)
Romanian Top 100
I have come across tags that user:H3llBot added on some articles referring to the Romanian Top 100, classifying links as dead links. Please consider that the article on Romanian Top 100 was created recently and the chart constantly fails to preserve its data over time. Moreover, the external website continuously removes obsolete data, so wikipedia.org will send viewers to invalid links. Please tag articles referencing www.rt100.ro accordingly, and this message too. Paul188.25.52.21 (talk) 16:25, 23 February 2011 (UTC)
- The bot's archiving task has not been active for some time now, so it could not have edited a recent article. Webcite does not remove old data, nor has the bot been using Webcite (only the Wayback Machine). If you need a certain domain (www.rt100.ro) tagged for dead links, then you should ask at WP:BOTREQ. — HELLKNOWZ ▎TALK 16:58, 23 February 2011 (UTC)
Serial archives
At this edit the bot marked as dead the url for a news item in The Age. That led to the deletion of the entire citation. I've just restored it and linked to the publisher's archived version, but that raises the point that our articles about serials have an infobox that can identify where their archives are. It might be helpful when marking the dead link to check for and suggest the url found there, so that editors are less likely to just delete the citation. LeadSongDog come howl! 18:01, 12 March 2011 (UTC)
- Thanks for report! The editors should not be deleting dead links just because they are dead. WP:DEADREF and WP:LINKROT are rather clear on that. That said, the bot hasn't run for a while now. — HELLKNOWZ ▎TALK 18:29, 12 March 2011 (UTC)
I do have articles for my games
- Moved non-bot related discussion to User talk:H3llkn0wz#I do have articles for my games. 06:26, 15 April 2011 (UTC)
AAlertBot
The AAlertBot has recently created lots of articles in the (article) namespace instead of in Wikipedia: . Please confirm that this bug is now fixed. — RHaworth (talk · contribs) 20:23, 17 May 2011 (UTC)
- It is, yes. I asked for the pages to be deleted shortly afterwards. It was a discrepancy between updated template syntax and the bot code. Sorry for the mess. — HELLKNOWZ ▎TALK 21:16, 17 May 2011 (UTC)
"authorlink" parameter in United Kingdom
Hello, Your bot just put the authorlink parameter into 3 citations. One of them was correct, but in the other two cases the "authorlink" text was identical to the "author" text. There is no need for the "authorlink" parameter to be present at all in such cases: one simply puts the author name into double square brackets to make the Wikilink. The "authorlink" parameter is only meant for cases where the two are different, most often because the article is using the lastname firstname convention. -- Alarics (talk) 09:43, 12 June 2011 (UTC)
- I agree that this appears as if it changes nothing in visual output. But if you take a look at the source code of the page, for example the cite with "United Nations Economic and Social Council" as author used as "[[United Nations Economic and Social Council]]", you will see that the produced COinS metadata is in fact wrong, because it also includes the square braces (underlined):
- "<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.btitle=Ninth+UN+Conference+on+the+standardization+of+Geographical+Names&rft.atitle=&rft.aulast=%5B%5BUnited+Nations+Economic+and+Social+Council%5D%5D&rft.au=%5B%5BUnited+Nations+Economic+and+Social+Council%5D%5D&rft.date=August+2007&rft.pub=UN+Statistics+Division&rft_id=http%3A%2F%2Fwww.webcitation.org%2F5lhCIacyi&rfr_id=info:sid/en.wikipedia.org:United_Kingdom">"". — HELLKNOWZ ▎TALK 10:14, 12 June 2011 (UTC)
- So are you really saying that "authorlink" should be filled in in all cases in all articles if the author has a WP article? That is news to me. It seems an awful lot of work (and, cumulatively, server load, not to mention additional scope for editor confusion) just to meet a very obscure technical issue not seen by the ordinary user. Is COinS metadata really that important? Sounds to me like the geek tail is wagging the user dog. -- Alarics (talk) 10:43, 12 June 2011 (UTC)
- You're probably right anyway about editor confusion when the author name text and the link are the same. It's not a lot of work though, because a bot does it; and the server load is rather small because it is single page regeneration. I suppose, the braces could be stripped in Citation/core by string functions instead. I'll change the task to only touch
|author=
and|editor=
if the link is piped to a different target than the display portion. — HELLKNOWZ ▎TALK 11:11, 12 June 2011 (UTC)
- Thanks -- Alarics (talk) 11:13, 12 June 2011 (UTC)
bot error
[11] this edit was done incorrectly. Could you possibly be bold and set the parameters of the bot not to change that particular field when the original is cited in that way? --LauraHale (talk) 20:30, 14 June 2011 (UTC)
- Hi, sorry I don't quite understand what is wrong with the edit? — HELLKNOWZ ▎TALK 20:42, 14 June 2011 (UTC)
- I think the issue is you removed the author. However a press association is not an author, so your edit would appear to be correct. There is the agency field for cases like this. Vegaswikian (talk) 21:10, 14 June 2011 (UTC)
- As below, the bot removed piping, not the author. The actual visual display did not change. — HELLKNOWZ ▎TALK 09:43, 15 June 2011 (UTC)
- (edit conflict) The bot removed piping. It changed the field to make it a field that doesn't appear to accept piping. That was the error. Can you be bold and have the bot stop fixing the author field, when it has piping, to make it authorlink with the piping removed? It seems counter to WP:AGF that the contributor putting in the piping did so incorrectly. --LauraHale (talk) 21:22, 14 June 2011 (UTC)
- If NPA isn't the author, the edit is still incorrect because it doesn't change it to agency. Thus, the bot made an error on top of an error. It would also be really nice to have the bot leave a link to a page which explains why it isn't making the edits as in cases like this, it appears rather unnecessary. --LauraHale (talk) 21:22, 14 June 2011 (UTC)
- That was not really an error per se, rather the intended behaviour. The piped link is removed to allow metadata to be produced and the field to be parsed by automated processes and external bibliography tools. So it's not an error, merely an editor preference to not have or unawareness of possibility to have the link in
|authorlink=
.
- Additionally, whatever error was already present is due to human error, not bot's. What AGF entails is that I assume fields are used correctly. Each and every bot's contribution actually links to the page explaining the task, in this case User:H3llBot/ALA.
- Anyhow, I will clarify the correct usage of
|authorlink=
and suspend the piping removal for now.— HELLKNOWZ ▎TALK 09:43, 15 June 2011 (UTC)
GedawyBot
Hi
Hello there! I just wanted to say I appreciate the tasks your bot is doing. Keep it up! :) --Waldir talk 13:42, 24 July 2011 (UTC)
- Thanks! — HELLKNOWZ ▎TALK 14:46, 17 August 2011 (UTC)
(citation needed) instead of the template
Your existing bot request could be expanded to include bare (citation needed)s. I have left a comment there. Mark Hurd (talk) 02:56, 18 February 2012 (UTC)
Dead link
The bot keeps tagging this page as a dead link, but I've been to it each time the bot tags it and it's still active. BIGNOLE (Contact me) 12:16, 2 April 2012 (UTC)
- Thanks for notice; you are probably referring to Aquaman (TV program) [12]. I already reverted the second bot's edit, removed the link from the bot's database and fixed the issue. The site was down for a short period while I was getting the bot back online and that messed up some of the re-retrieval times. It shouldn't occur again. — HELLKNOWZ ▎TALK 12:24, 2 April 2012 (UTC)
- Thanks. I appreciate the prompt response. BIGNOLE (Contact me) 16:24, 2 April 2012 (UTC)
Link not dead (joongangdaily)
[13] claims the second link is dead, but it works fine.[14] I don't know how high this error rate is, but if it's remotely significant it should go back to the drawing board.--Crossmr (talk) 13:44, 4 April 2012 (UTC)
- Hi, thanks for the report. This is slightly more complicated than a simple detection error. (1) http://joongangdaily.joins.com/article/view.asp?aid=1913898 redirects to (2) http://koreajoongangdaily.joinsmsn.com/news/article/article.aspx?aid=1913898. Unfortunately (as per bad website design), (1) returns a 404. The problem is that (1) uses a non-standard redirect method with a unique piece of Javascript code that I couldn't really have anticipated; in fact, browsers without Javascript would fail on this. I will add http://joongangdaily.joins.com and http://joongangdaily.joinsmsn.com to a permanent ignore list. I will try to make some smart filter to look for signs of websites that try redirecting via Javascript. Sorry if this caused you much trouble; I repaired all the remaining false positives. This particular case is indeed unique to this website. The overall error rate should be pretty small though, only the above thread and this one so far out of 420k links, which makes it <0.005%. Regards. — HELLKNOWZ ▎TALK 17:51, 4 April 2012 (UTC)
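A filter for such Javascript redirects might scan the body of an otherwise-404 response for assignments to window.location. A minimal sketch; the regex, function name and sample snippet are illustrative assumptions, not the bot's actual detection code:

```python
import re

# Matches common Javascript redirect idioms such as
#   window.location = "...";  or  window.location.href = '...';
JS_REDIRECT = re.compile(
    r"""(?:window|document)\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE)

def find_js_redirect(html):
    """Return the redirect target if the page body tries to
    redirect via Javascript, else None."""
    m = JS_REDIRECT.search(html)
    return m.group(1) if m else None
```

A checker could then treat a 404 whose body yields a redirect target as "moved" rather than "dead", though, as noted above, such pages remain invisible to clients without Javascript.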
For you and your owner
Clever bot award
Thanks for adding links to archive.org to dead links in articles that I have written. It is a very useful thing to do and tedious to do by hand. Here is something that you can share with your owner. SmartSE (talk) 19:24, 4 April 2012 (UTC)
- Thanks! The bot really ought to cut on the booze though. — HELLKNOWZ ▎TALK 07:31, 5 April 2012 (UTC)
hanford.gov link not dead
Your bot tagged a link as dead with this edit. I have since fixed it and added an archive link. If the commented-out URL is a problem please feel free to (re)move it. – Allen4names 17:03, 6 April 2012 (UTC)
- Hey, thanks for report. Yup, it was the commented url, silly me. I swear I had that fixed already, probably too much of the above beer. I fixed all the (4) cases. Regards. — HELLKNOWZ ▎TALK 17:47, 6 April 2012 (UTC)
Over-enthusiastic editing of title
Hi, the bot seems to be over-enthusiastic when it edits a title field including italics.
In this case[15] the last two words of the edited version of the field are in italics. The bot incorrectly deleted the two closing single-quote characters when it correctly deleted "(via Wayback)". Peter Loader (talk) 19:45, 6 April 2012 (UTC)
- Thanks for the report. A subtle bug in complex parsing. I now make sure formatting, such as italics, is present on both sides of the content before removing it. Will keep an eye out for these; it's a bit tricky catching the many variations. Cheers. — HELLKNOWZ ▎TALK 19:53, 6 April 2012 (UTC)
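The balance check described here can be sketched as: only strip trailing text from a title if the wiki italics markers ('') remain paired afterwards. A hypothetical illustration (function name assumed; not the bot's .NET code):

```python
def safe_strip(title, suffix):
    """Remove `suffix` from the end of a citation title only if doing
    so leaves the wiki italics markers ('') balanced; otherwise keep
    the title untouched, per the fix described above."""
    if not title.endswith(suffix):
        return title
    stripped = title[: -len(suffix)].rstrip()
    if stripped.count("''") % 2 == 0:
        return stripped
    return title  # stripping would orphan an italics marker
```

When the suffix sits inside the italics span, stripping it would leave an unmatched '' pair, so the conservative choice is to leave the title alone and flag it for manual review.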
On 27 Sep 2010, H3llBot marked 17 dead links in this article. I've just tried a few and discovered that they seem to be OK, albeit behind a paywall (but the opening paragraphs are available for free). Do these links need to be rescanned by the bot, or does the bot have a problem with paywalled links? Colonies Chris (talk) 14:39, 21 June 2012 (UTC)
- Thanks for the report. That version of the code no longer runs; I had since rewritten it. I'm afraid I also cannot tell what the issue was anymore. Paywalls generally return 403 Forbidden, so the bot wouldn't mark them as dead. The new code has rechecked those links (somewhat) recently and they correctly returned as "live", so it shouldn't be a recurring thing. — HELLKNOWZ ▎TALK 21:12, 21 June 2012 (UTC)
- If the bot has subsequently found the links OK, shouldn't it have removed the {{dead link}} templates? Colonies Chris (talk) 17:49, 23 June 2012 (UTC)
- That would cause far too many false positives in the long run. Sites often first go dead (i.e. 404), but then later some parent site or domain squatter snatches them and starts reporting them as "live". — HELLKNOWZ ▎TALK 18:46, 23 June 2012 (UTC)
- So for cases like this, the templates need to be removed by hand? Colonies Chris (talk) 08:45, 24 June 2012 (UTC)
- Afraid so. I've checked and removed tags from (I hope) all the links now. This was a case specific to that domain and I suspect they did something to the links. For example, [16] or [17] are still dead and were tagged around the same time. The new code with different re-checking is much less likely to tag such pages, but there isn't much I can really do if a domain "restores" links at a later date. — HELLKNOWZ ▎TALK 11:30, 24 June 2012 (UTC)
- OK, thanks for doing that. Colonies Chris (talk) 12:03, 24 June 2012 (UTC)
H3llBot in other project and manual
Hello, I beg your pardon. Can your bot User:H3llBot work on other Wikipedias, in particular the Russian Wikipedia, and can it work on projects that do not belong to the Wikimedia Foundation? Are there instructions for installation and use? Thank you.--Ворота рая Импресариата (talk) 12:33, 26 January 2013 (UTC)
False positive for a URL requiring answering a survey or viewing an advert
I found a false positive while editing the Joseph F. Dunford, Jr. article. This involved a situation when the URL was not really dead, but required answering a survey or viewing a video before allowing the reader to view the external article. I am not sure if it is worth trying to code for this situation, but I thought I would let you know, nonetheless.
The particular link citation, & the note that I added to it, is:
- Johnson, Kimberly (24 February 2008). "3 tapped for stars". Marine Corps Times. Retrieved 18 October 2014. (Viewing article requires answering survey or viewing advertisement video)
Peaceray (talk) 02:47, 19 October 2014 (UTC)
New article
How do I add my GA nominee Oei Hui-lan to the WikiProject: Women in Red? ClaraElisaOng (talk) 10:42, 29 September 2018 (UTC)
- If you mean Wikipedia:WikiProject Women in Red/Article alerts, then it should get added automatically tomorrow. — HELLKNOWZ ▎TALK 15:08, 29 September 2018 (UTC)
Bot not working for a few days now
Just so you know. Slimy asparagus (talk) 08:49, 2 August 2021 (UTC)
- @Slimy asparagus: Thanks for letting me know! — HELLKNOWZ ▎TALK 11:50, 2 August 2021 (UTC)