User talk:MMABot/v1.0 EditRun
This is an archive of past discussions about User:MMABot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Duplicated links
Can you have the bot refrain from removing duplicated links in the record table? Because the table is sortable, I'm afraid that readers would have to scour it just to find a link; it's especially problematic in long tables like Travis Fulton's. —LOL T/C 00:42, 14 July 2011 (UTC)
- I was trying to follow the MOS on repeat links. However, you make a good argument for them. I can easily comment out the method that removes the repeated location and fighter links. I'll make a note of this on the Bot approval page as well. Thanks for the suggestion. --TreyGeek (talk) 13:42, 14 July 2011 (UTC)
- Thanks for commenting it out. I didn't notice that REPEATLINK changed recently;[1] it seems that those who favoured the change value link removal more than usability and consistency, which is very poor judgement imo. —LOL T/C 16:51, 14 July 2011 (UTC)
- I think this is a case where the MOS is allowed to not be followed if there is a good reason (other than "I don't like it" as we see in the MMA project all too often). I can understand not repeating links on short and/or unsortable tables. I had never seen Travis Fulton's article (over 300 fights?! That man deserves an MMA Iron man award). It shows a good reason (among some of the other tables that are larger than one screen height) to repeat the links, particularly since the tables can be sorted. --TreyGeek (talk) 17:44, 14 July 2011 (UTC)
Parsing error
There is a problem with the first three rows of the table here. —LOL T/C 16:54, 26 July 2011 (UTC)
Also, notice what happened to the event for the bout against Thacker here. —LOL T/C 17:19, 26 July 2011 (UTC)
- Yeah, it happens on some articles with more unusual formatting code. It's going to be nearly impossible to account for all possible variations. For instance, looking at your second example, in the line involving Thacker the event was coded as {{sort|Ultimate Fighter 01|[[The Ultimate Fighter 1 Finale]]}}. It was the only event in the article with a sort template, and very few events across all fighter articles have it. I might be able to get MMABot to catch the situation that occurred with Pedro Rizzo's article. I'm finding that, for some reason, people like adding the center formatting for event names.
- Basically, when I run MMABot for a set of articles, I manually come around behind it, skimming through each article to look for instances where the article causes the bot problems. If it's been an hour or so and you don't see me fix it, let me know, as I might have missed it. --TreyGeek (talk) 17:31, 26 July 2011 (UTC)
- You don't have to account for all possible variations. It looks like you're just taking the event line, cutting it off at the first pipe and appending "]]"; instead, you should use something like
String.replaceAll("\\[\\[([^\\|]+)\\|\\1[^\\]]+\\]\\]", "[[$1]]")
, which is much better because there are far, far fewer false positives. —LOL T/C 18:03, 26 July 2011 (UTC)
- I've never been great with regular expressions and I'm not quite able to parse that one in my head (or on paper). I do get the gist that it is going to replace instances of [[event|rename]] with [[event]]. But it won't clean up unneeded {{sort}} or align=left code, which is another thing I've been trying to do. I'll look at it more when I get home and see if there is a way to get it working better. Thanks for the suggestions. --TreyGeek (talk) 21:22, 26 July 2011 (UTC)
- The problem isn't the existence of nonstandard code, but the fact that the bot currently screws up tables that use align (which are quite common) and forces a human to manually perform damage control. I wouldn't remove {{sort}} because we need them for events that start with "The", and speaking of sorting, we also should be using {{sortname}} for each fighter's name. You may want to see User:LOL/mmarecordcolumns.js, as it offers one solution (though not a very elegant one) for the align/style attributes. —LOL T/C 22:16, 26 July 2011 (UTC)
- The use of the sort template should probably be added to the MMA Wikiproject information on record tables, since there is no mention of it there and it isn't being used very much from my scans of articles so far. I'll work on the parsing issues. --TreyGeek (talk) 16:19, 27 July 2011 (UTC)
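For readers who, like TreyGeek, find the quoted regular expression hard to parse, here is a minimal Java sketch of how it behaves. The class name `EventLinkCleaner`, the method name, and the sample rows are hypothetical and are not taken from MMABot's source. Note that the pattern only matches a piped link whose label begins with the link target, so it leaves {{sort}} wrappers and unrelated labels untouched.

```java
class EventLinkCleaner {
    // Pattern quoted in the discussion: "[[target|label]]" where the label
    // starts with the target text and continues with at least one more character.
    static final String PIPED_LINK = "\\[\\[([^\\|]+)\\|\\1[^\\]]+\\]\\]";

    // Collapses matching piped links to a plain "[[target]]" link.
    static String collapsePipedLinks(String wikitext) {
        return wikitext.replaceAll(PIPED_LINK, "[[$1]]");
    }

    public static void main(String[] args) {
        // Hypothetical event name: the label repeats the target, so it is collapsed.
        System.out.println(collapsePipedLinks(
                "| align=center | [[Example Event 5|Example Event 5: Finale]]"));
        // The label does not start with the target, so the link is left alone.
        System.out.println(collapsePipedLinks("[[Patrick Smith (fighter)|Patrick Smith]]"));
        // No piped link at all, so the {{sort}} wrapper survives unchanged.
        System.out.println(collapsePipedLinks(
                "{{sort|Ultimate Fighter 01|[[The Ultimate Fighter 1 Finale]]}}"));
    }
}
```

Because the backreference `\1` requires the label to repeat the target, this approach changes far fewer links than cutting the line at the first pipe, at the cost of ignoring links whose label is an unrelated rename.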
Now your bot has some serious issues; I took a few random samples out of this list, and so many of them had errors like in Joe Doerksen's article that I decided to roll them all back. —LOL T/C 17:58, 28 July 2011 (UTC)
- It doesn't matter if you're revising the edits later because you're creating a time frame during which readers will visit the article and see a train wreck of the table. The number of articles that your bot screwed over is so large that I just can't be bothered to look over them all. As a bot operator it's your responsibility to test your new algorithms correctly before executing them on the mainspace. —LOL T/C 18:07, 28 July 2011 (UTC)
- Define large please. That last edit run was for 30 articles, only about half of which had a problem (and a couple of those were issues with a single line of the record table). When the bot was approved by the bot approvals group it was with the understanding that there will be errors. The approval was subject to my manually verifying the edits, fixing any problems in articles, and adjusting the bot to resolve any issues with it. I am doing that. If you think I should perform edit runs of fewer articles at a time, what number would you suggest? --TreyGeek (talk) 18:37, 28 July 2011 (UTC)
- I can help you with manually fixing those edits so that you can concentrate on improving the bot. To be honest, I don't think that it is such a big problem, because most of these articles are not about popular fighters and these really are minor cosmetic problems. These kinds of mistakes are expected in the first version of a bot and I believe that they are normal. And it is not an easy issue, with so many editors using so many different and unusual formats for each record table. If it's okay with you, TreyGeek, simply put a list of the articles that are edited by the bot with the subject "MMABot edits of date of the report" either in a new section here or on my talk page, and I'll correct all that I can manually and report whatever issues I find. May I suggest that you reduce each run to 10 articles? That would make it easy to see, as a percentage, how many mistakes the bot causes with each run and how many are corrected with each improvement to the bot. When you reach 80% reliability you can move it to 20, and so on. Jfgslo (talk) 01:56, 29 July 2011 (UTC)
Clarification and/or best idea on matching MMA fighter records in articles
I was contacted at my talk page about this issue but, what is the specific situation about this? Is this about matching the record in {{MMArecordbox}} with {{Infobox martial artist}}? Jfgslo (talk) 23:28, 27 July 2011 (UTC)
- I think that was an old talkback about this discussion that I've archived. Currently the bot examines the records in each box and if they don't match, it gives me a message to look into it. While the record box is often more accurate than the infobox, I've been finding a number of articles with both incorrect. --TreyGeek (talk) 00:54, 28 July 2011 (UTC)
- The bot could check the column Record from the MMA record table and check which of the two, infobox or MMArecordbox, has the one that matches Record and then the bot could modify the one that doesn't accordingly. There is also another alternative that could be fully automated and more accurate but I do not know how easily it would be to implement. LOL wrote another script that verifies that the method used in the MMA record table matches the one reported in a fighter's record at Sherdog. The bot could use a similar function and check which of them matches with the record reported in Sherdog and use that one as the correct one. This may be more complicated, though. A simpler solution would be that the bot could merely report records that do not match to a to do list at the front page of WP:MMA so that editors can manually correct them. While this would only change the place where the message is added, with the work of several editors it could be easily fixed. Jfgslo (talk) 01:07, 28 July 2011 (UTC)
- Neither the record box nor the infobox needs the total win or loss information specified. The templates can automatically calculate the total wins and losses from the number of dec, ko, sub, and dq wins and losses. I've actually been removing the total win and loss fields from each box as I have been manually working on the articles. I've seen a lot of instances where someone will update just the total win/loss field and not the dec/sub/ko/dq fields. Web scraping the information from the Sherdog profile is a possibility; that in itself is a possibly complicated process and requires being able to find the Sherdog profile (as not every fighter article has a link to it).
- As I said, right now, if the numbers in the record box don't match the infobox it provides me a message to look into it. I then manually update the numbers to match what is on Sherdog. A bit tedious at times, but it gets the job done. --TreyGeek (talk) 01:26, 28 July 2011 (UTC)
- What I meant is that, if the total number of fights, wins, losses, draws and no contests in the Record column matches the total of win/loss info from the infobox or MMArecordbox, then the bot can assume that that one is correct. But I guess that the problem you are referring to is that records may be correct regarding the total of wins/losses and total of fights but not with the submissions/KOs total. That is harder to tackle. It would not be possible for a script to count the number of rows in an MMA record that have a specific word (i.e., how many times "submission" appears in that table), would it?
- For the moment, would it be possible to implement messages from the bot in WT:MMA? I imagine that editors there could help with these records instead of leaving all the work to you, at least until a more suitable solution is found. Jfgslo (talk) 02:00, 28 July 2011 (UTC)
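On the question of whether a script could count the rows of a record table that contain a specific word, a rough Java sketch might look like the following. It assumes each fight sits in its own table row separated by the wikitext row marker `|-`, and that the keyword only appears in the method cell; the class and method names are hypothetical and nothing here reflects how MMABot is actually written.

```java
class RecordRowCounter {
    // Counts table rows (split on the "|-" row separator at the start of a line)
    // whose text contains the keyword, ignoring case.
    static int countRowsContaining(String tableWikitext, String keyword) {
        String needle = keyword.toLowerCase();
        int count = 0;
        for (String row : tableWikitext.split("\\n\\|-")) {
            if (row.toLowerCase().contains(needle)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up three-fight table for illustration only.
        String table = String.join("\n",
                "{| class=\"wikitable sortable\"",
                "|-",
                "| Win || Fighter A || Submission (armbar)",
                "|-",
                "| Loss || Fighter B || KO (punch)",
                "|-",
                "| Win || Fighter C || submission (rear-naked choke)",
                "|}");
        System.out.println(countRowsContaining(table, "submission"));
    }
}
```

Plain substring matching is crude: it does not separate wins from losses and would also count the word if it appeared in a notes column, so any totals produced this way would still need a manual check.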
ToDo list for next version
I know I haven't finished one round of fighter articles and there are still little bugs in the current version. But I wanted to start jotting down a list of things that would be nice to implement in the future should I get that chance. Here goes:
- From the {{dts}} template remove the deprecated "link=off" parameter.
- If the {{WikiProject Mixed martial arts}} banner hasn't been added to talk pages do so. (That banner is used for a lot of reporting and notifications to the project.)
- Make sure the fighter's Sherdog "id number" is in both the "External links" section and in the infobox.
- Remove total win/loss fields from infobox (for MMA stats) and MMArecordbox. These values are calculated automatically by the template.
- If the article lacks citations (no <ref> tags) add a {{BLP unsourced}} template to the start of the article.
- In the Infobox remove any information for the style parameter. Exceptions are for fighters who are also kickboxers (in which the style should likely be "Kickboxing").
- Unlink red-linked fighters and events in record table.
- Convert variations of "US" to "United States" in location column of record table.
- Delink country names in location table as per WP:OVERLINK.
- Remove future fights from fight records as per recent consensus at WP:MMA.
--TreyGeek (talk) 23:17, 3 August 2011 (UTC)
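As an illustration of one item on the list above, converting variations of "US" to "United States" in the location column, here is a minimal Java sketch. The set of variants handled ("U.S.A.", "U.S.", "USA", "US"), the class name, and the method name are assumptions made for the example; the list does not say which spellings the bot would need to cover.

```java
class LocationNormalizer {
    // Matches "U.S.A.", "U.S.", "USA" or "US" only as a standalone token:
    // not preceded by a word character or period, not followed by a word character.
    static final String US_VARIANT =
            "(?<![\\w.])(?:U\\.S\\.A\\.|U\\.S\\.|USA|US)(?!\\w)";

    // Rewrites every standalone variant in a location cell to "United States".
    static String expandUnitedStates(String locationCell) {
        return locationCell.replaceAll(US_VARIANT, "United States");
    }

    public static void main(String[] args) {
        System.out.println(expandUnitedStates("Las Vegas, Nevada, USA"));
        System.out.println(expandUnitedStates("Honolulu, Hawaii, U.S."));
    }
}
```

The lookarounds keep the pattern from firing inside longer words, but a sketch like this would still rewrite "US" inside a link target or venue name, so it would probably need to be restricted to the last comma-separated part of the cell.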
- I question the unlinking bit. Obviously red-links are unattractive but if there's a possibility that the particular fighter might some day be notable or already is notable and just doesn't have an article for him then leaving the links in place will be useful. I'm sure some/many/most? of these fighters are/were amateurs never to be heard from ever again but that seems like the sort of thing a person should judge, not a bot.
- Along those same lines, I see that the bot is not removing flag icons from beside the fighters' names in the records tables. I completely understand why you chose to do it that way (by far the most contentious aspect of the entire "Record Tables War"). That said, like someone pointed out, for fighters who don't have articles how can we know what flag is supposed to be used? For red-linked fighters there should be no flag icon unless a reliable source is supplied indicating their nationalities (or whatever). I would think removing flags from red-linked entries would survive any potential discussion (being a WP:BLP issue).
- In any case, good job on the bot! SQGibbon (talk) 02:54, 29 August 2011 (UTC)
- I'm still a ways from starting work on the next version of the bot, so these points and any other that come up are very much up for discussion.
- For red-linked fighters/events (I'm going to toss both together though you or others may want to separate them in any discussion on this topic) I figure they can be re-linked easily. If I were creating a new article, I would link all fighter's names (for example). Then I'd follow each link to its respective destination article, locate the text referring to the article I just created, and edit the destination article adding in the [[ ]]. But that's me (and I rarely create new articles). Again, I'm in no hurry on starting this new work and it may be something to stick up at the MMA Wikiproject to see what the consensus on unlinking red-linked fighters/events is.
- As for fighter flags, I'm not going to touch them, either with the bot or manually. It's just one of those things that over the years I've decided to just keep my hands off unless there is some miraculous, clear consensus formed. ;) --TreyGeek (talk) 05:09, 29 August 2011 (UTC)
- I've found a couple times where an opponent link goes to an article for a person who is not a fighter. If I stumbled upon this twice, I can only imagine there are hundreds (thousands?) more. Would it be possible to have the bot check that the link goes to an article that contains an MMA category? In some cases the fix will be changing Patrick Smith to Patrick Smith (fighter) (or with middle names or nicknames), and in some cases de-linking. I think this is another reason not to red-link all opponents--it's likely that when an article is created one day, it won't be the same person. --Juventas (talk) 08:16, 7 September 2011 (UTC)
- I'll think about this suggestion. The challenge is going to be knowing whether the linked article is the correct one or not. There are a couple ideas rattling through my head on how to figure that out. I think if MMABot thinks the linked article is wrong it'll require a manual examination to tell for sure. There is a high possibility of false positives plus the issue of knowing what to rename the link to. --TreyGeek (talk) 13:25, 7 September 2011 (UTC)
- I thought checking for an MMA category (any of them) would be an automatic way of knowing whether it is correct or not. The links that go to people without an MMA category would probably have to be checked manually. I would be willing to do this, if that's possible somehow. --Juventas (talk) 02:22, 8 September 2011 (UTC)
Bad edit?
This edit: [2] changed "NSAC" to "nsac", presumably in an attempt to reduce unnecessary capitalization in record tables. However, I think NSAC should probably always be rendered NSAC (or, technically, NAC, but that's another point.) I changed it back, but I don't know if the bot is going to fight me on this. gnfnrf (talk) 03:04, 16 August 2011 (UTC)
- Thanks for bringing it to my attention. You're right it was doing that for cap reasons. The bot won't get around to the article again in a while. It's done the same thing a couple other times but I saw it and fixed it. Missed it this time though. I'll see if I can put a catch into the code so that it won't uncapitalize NSAC. Thanks again. --TreyGeek (talk) 03:50, 16 August 2011 (UTC)
- Fixed: When the bot looks at the text in between the parens, if it is "NSAC" it skips it and doesn't convert it to lowercase letters. --TreyGeek (talk) 06:13, 16 August 2011 (UTC)
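The fix described above, skipping "NSAC" when lowercasing the text between parentheses, could be sketched in Java roughly as follows. This is only an illustration of the idea: the class name, the exception set, and the regular expression are assumptions, and MMABot's actual code is not shown anywhere in this discussion.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodCaseNormalizer {
    // Text that must keep its capitals; only "NSAC" comes from the discussion.
    static final Set<String> KEEP_AS_IS = Set.of("NSAC");
    // A parenthesized group with no nested parentheses.
    static final Pattern PARENS = Pattern.compile("\\(([^()]*)\\)");

    // Lowercases the text inside each pair of parentheses unless it is an exception.
    static String lowercaseParenthesized(String cell) {
        Matcher m = PARENS.matcher(cell);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String inner = m.group(1);
            String fixed = KEEP_AS_IS.contains(inner) ? inner : inner.toLowerCase();
            // quoteReplacement guards against '$' or '\' in the cell text.
            m.appendReplacement(out, Matcher.quoteReplacement("(" + fixed + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseParenthesized("Decision (Unanimous)"));
        System.out.println(lowercaseParenthesized("No Contest (NSAC)"));
    }
}
```

Keeping the exceptions in a set means other acronyms could be added later without touching the matching logic. The check is an exact match, so a phrase such as "overturned by NSAC" inside the parentheses would still be lowercased by this sketch.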
MMABot edit summaries
Could you please take a look at Wikipedia:Bot owners' noticeboard#General notice to bot owners about edit summaries and see if the suggestions might apply to your bot? Feel free to add your own suggestions and comments there too. Headbomb {talk / contribs / physics / books} 21:06, 21 August 2011 (UTC)