Wikipedia:Bots/Requests for approval/ArkyBot II
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Automatic or Manually Assisted: Automatic.
Programming Language(s): PHP
Function Summary: Update US city and town infoboxes per 2007 Census estimates.
Edit period(s) (e.g. Continuous, daily, one time run): Daily until complete
Already has a bot flag (Y/N): Y
Function Details: This is a request for new functionality for a currently inactive bot. The proposed task is a one-time run over all articles on US cities and towns. It ensures that the population figure shown in each infobox matches the official 2007 Census estimate, since a number of articles are either outdated or rely on unofficial sources. The bot will also add the proper source citation for the figure. It can easily be modified to repeat the task each year when new estimates are released.
Note that the bot's existing (already approved) task was largely similar (it edited the whole infobox rather than just a few lines), so this task mostly reuses the same code. A few test runs in user space can be seen in the bot's most recent edits.
Discussion
This sounds like something that needs to be done, and if it already does something similar then I don't see any problems with it. It would be a good idea to test-run it on a database dump or something similar beforehand, and to collect all of the changes in one place for easy review. - The Prophet Wizard of the Crayon Cake 06:47, 15 July 2008 (UTC)
- I don't personally see a problem with this, but a bot editing every U.S. city and town is going to need wide community support. This should be advertised widely. BJTalk 13:15, 15 July 2008 (UTC)
- I considered raising it at Wikipedia:WikiProject Cities, but since my proposal simply brings the articles up to date per their own adopted guideline, doing so seemed redundant. However, I have no issue with waiting for "wider" approval. Shereth 14:52, 15 July 2008 (UTC)
I brought this up for discussion as suggested above. The discussion at the Village Pump raised a couple of concerns over whether Census estimates qualify as "official" under the USCITY guidelines. Unfortunately, requests for comment on the matter at both Wikipedia:CITY and the talk page of Wikipedia:USCITY itself received no responses. So while no one is saying the estimates shouldn't be used, no one is saying that they should, either. I'm not sure how to proceed, although I will note that a large majority of US city articles already use (relatively) recent Census estimates as their official figures. Shereth 17:27, 22 July 2008 (UTC)
- I'd still like to see more discussion (AN, perhaps?). In the meantime, Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I'm going to leave the discussion open here for comments. BJTalk 04:36, 23 July 2008 (UTC)
- The test run is complete. I'm trying to think of other venues for getting community input; I'm not sure AN is really appropriate. I also left a note on the talk page of the template being modified. Shereth 16:01, 23 July 2008 (UTC)
Seems good to me, but all those trial edits were for relatively unproblematic towns. What would it do to Louisville, Kentucky, which already has footnotes and separate figures from competing sources? – Quadell (talk) 23:32, 25 July 2008 (UTC)
- The bot interprets US Census data literally. Your example is one where the Census definition of the "city" does not fully agree with what the article describes, hence the dual population figures. Because the bot takes the Census entry name literally, it would look for an article titled Louisville/Jefferson County metro government (the exact name of the entry in the Census data), fail to find it, and record an error. Errors are saved in a log for human review and the entry is skipped, so these "problematic" cities will not be edited. Shereth 23:49, 25 July 2008 (UTC)
- On second thought, this probably warrants a bit more explanation of exactly how the bot matches articles to Census information. The bot reads the Census CSV files for the most current estimates. It derives each city's name by removing the "city", "town", or other modifier from the entry. It then looks for an article at Name, State. If that article does not exist, the entry is written to the error log and skipped. If the article is a redirect, the bot follows it to the target page. Next, it looks for the appropriate infobox in the article. If none is found (for example, because the article has no infobox or is a disambiguation page), the entry is written to the error log and the bot moves on. Finally, it compares the existing population figure with the Census figure. If they match, the article is skipped; otherwise, it is updated. Shereth 00:02, 26 July 2008 (UTC)
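The matching procedure described above can be sketched roughly as follows. This is a minimal Python illustration, not the bot's actual PHP code. It makes several assumptions: an in-memory dict of page titles to wikitext stands in for live wiki access, the list of Census suffixes is hypothetical, the infobox is assumed to be named `Infobox Settlement`, and each infobox field is assumed to sit on its own line. Following only one redirect and treating a missing `population_total` field as an error are also assumptions. `strip_suffix` and `process_entry` are hypothetical names.

```python
import re

# Hypothetical list of modifiers the Census appends to place names.
SUFFIXES = (" city", " town", " village", " borough", " CDP")

# Assumed template name; the real infobox name may differ.
INFOBOX_RE = re.compile(r"\{\{\s*Infobox[ _]Settlement", re.IGNORECASE)
REDIRECT_RE = re.compile(r"#REDIRECT\s*\[\[([^\]|]+)", re.IGNORECASE)
# Assumes each infobox field sits on its own line, so a value runs to end of line.
POP_RE = re.compile(r"(\|\s*population_total\s*=)([^\n]*)")


def strip_suffix(census_name):
    """Drop the trailing "city"/"town"/etc. modifier from a Census entry."""
    for suffix in SUFFIXES:
        if census_name.endswith(suffix):
            return census_name[: -len(suffix)]
    return census_name


def process_entry(pages, census_name, state, estimate, error_log):
    """Return updated wikitext, or None when the entry is skipped.

    `pages` maps article titles to wikitext and stands in for the wiki.
    """
    title = f"{strip_suffix(census_name)}, {state}"
    text = pages.get(title)
    if text is None:
        error_log.append(f"missing article: {title}")
        return None
    redirect = REDIRECT_RE.match(text)
    if redirect:  # follow a single redirect to its target (assumption)
        title = redirect.group(1).strip()
        text = pages.get(title)
        if text is None:
            error_log.append(f"broken redirect: {title}")
            return None
    if not INFOBOX_RE.search(text):  # no infobox, or a disambiguation page
        error_log.append(f"no infobox: {title}")
        return None
    field = POP_RE.search(text)
    if field is None:  # assumption: treat a missing field as an error too
        error_log.append(f"no population field: {title}")
        return None
    current = field.group(2).strip().replace(",", "")
    if current == str(estimate):
        return None  # figure already matches the Census; skip
    return text[: field.start(2)] + f" {estimate}" + text[field.end(2):]
```

In this sketch, a Census entry named "Louisville/Jefferson County metro government" yields the title "Louisville/Jefferson County metro government, Kentucky". No such article exists, so the entry lands in the error log instead of being edited.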
- You might consider, as a safeguard, noting when something other than a single number was overwritten in the "population_total" field, or when existing content in the "population_footnotes" field was overwritten. If you make a list of these, interested editors can look over those cases to make sure no useful data was lost. I'm sure it's a tiny fraction. Anyway, I'm happy with this task, and I look forward to seeing it running in the wild. – Quadell (talk) 03:09, 26 July 2008 (UTC)
- Currently the bot does not overwrite the existing population_footnotes field. It merely appends the new reference to the end of it, in case a reference already there is used elsewhere in the article. (During testing the bot did mangle one existing reference, but that was a technical issue, since resolved, and not by design.) I can certainly look into a modification that lets the bot leave a special notification, perhaps on the article's talk page, when multiple or unusual entries are found in the population field. Other than that, the bot is essentially just waiting for final approval. I'm also considering an additional task to actively patrol these articles to ensure the integrity of the data. Before tackling that, though, I will wait for the initial run of updates, to make sure the community has no issues with using the Census estimates. Shereth 16:06, 28 July 2008 (UTC)
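The following Python sketch shows how the bot might append to population_footnotes instead of overwriting it, and how it might apply the check Quadell suggested for unusual population_total values. It is not the bot's code. The citation text in `CENSUS_REF`, the function names, and the assumption that each infobox field occupies a single line are all hypothetical. The review check only returns a flag; posting a talk-page notice is not shown.

```python
import re

# Hypothetical citation text; the bot's real reference wording is not shown above.
CENSUS_REF = "<ref>2007 US Census population estimate</ref>"

# Assumes one field per line, so a value runs to the end of its line. Matching
# to end of line (not to the next "|") avoids splitting templates such as
# {{cite web|...}} that sit inside existing references.
FIELD_TEMPLATE = r"(\|\s*{name}\s*=)([^\n]*)"
PLAIN_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})*|\d+")


def find_field(text, name):
    """Locate an infobox field; group 2 holds its raw value."""
    return re.search(FIELD_TEMPLATE.format(name=re.escape(name)), text)


def append_footnote(text, ref=CENSUS_REF):
    """Append ref to population_footnotes without touching existing refs."""
    field = find_field(text, "population_footnotes")
    if field is None:
        return text  # assumption: leave infoboxes lacking the field alone
    existing = field.group(2).rstrip()
    if ref in existing:
        return text  # already cited; avoid duplicating the reference
    new_value = existing + ref if existing.strip() else " " + ref
    return text[: field.start(2)] + new_value + text[field.end(2):]


def needs_review(text):
    """True when population_total holds anything other than one plain number."""
    field = find_field(text, "population_total")
    if field is None:
        return False
    value = field.group(2).strip()
    return bool(value) and PLAIN_NUMBER.fullmatch(value) is None
```

Because the new reference is added after the existing content, any named reference that other parts of the article rely on stays in place.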
Temporarily on hold. There is some discussion of a potential alternative (and more permanent) fix for the population figures, so I'm suspending development of this bot in its current form. The bot would still be needed for a similar task, but it would update the cities' FIPS/GNIS codes rather than the population figures. The template would then be modified to use that code to display the population automatically, drawing on a table stored independently of the article. If discussion of this concept goes nowhere, I will go ahead with the bot as is; otherwise, I'll modify it to change a different field. Shereth 15:24, 30 July 2008 (UTC)
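The alternative design described above keeps only an identifier in the article and takes the displayed figure from a separately maintained table. The small Python sketch below illustrates that lookup. Everything in it is hypothetical. The table contents are placeholders rather than Census data, `gnis` is an assumed field name, and falling back to the hand-entered population_total when no table entry exists is an assumed behavior, not part of the proposal.

```python
# Hypothetical shared data table keyed by GNIS feature ID, maintained apart
# from the articles. The ID and figure below are placeholders, not real data.
POPULATION_TABLE = {
    "0000000": (2007, 12345),
}


def display_population(infobox):
    """Return the population text a template could render for one infobox.

    `infobox` is a dict of already-parsed field names to values.
    """
    gnis = infobox.get("gnis", "").strip()
    if gnis in POPULATION_TABLE:
        year, population = POPULATION_TABLE[gnis]
        return f"{population:,} ({year} estimate)"
    # Assumed fallback: show whatever figure was typed into the article.
    return infobox.get("population_total", "").strip()
```

In this arrangement, a yearly update would touch only the table, and the bot's job would shift to writing each article's identifier once.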
- Any updates? – Quadell (talk) 13:51, 6 August 2008 (UTC)
- I haven't had much response about changing the template to display populations automatically, so unless that gains more traction, I'd like to go ahead with the task as described above. I'd also like to add hard-coding the GNIS code into the infobox (it currently exists as a "blank" entry in the template). This should be straightforward and uncontroversial, and it may prove quite useful in the future. GNIS codes are static, unique identifiers for geographic locations, and they make it easy to link a city to demographic and geographic data. Shereth 13:56, 6 August 2008 (UTC)
Approved. You're good to go. – Quadell (talk) 17:59, 8 August 2008 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.