User:WP 1.0 bot/Third generation
This page describes plans for the new version of the WP 1.0 bot, which is under development under the codename "lucky". The new version will replace the current one.
Goals
Some of the goals of the rewrite are:
- Provide a new codebase in a modern language (Python) that new maintainers can easily understand and contribute to.
- Allow User:Audiodude to understand the functionality of the current bot by engaging in the rewrite project.
- Create a platform for adding features and functionality that the community wants but that aren't possible with the current bot (which is in "emergency maintenance mode").
Code location
The new bot is being developed primarily by User:Audiodude, and its code is hosted in the same place as the current bot's, on GitHub. Work was originally done in a separate "lucky" branch, and then in the "lucky" directory, but has now been merged into master:
Rewrite status
Status update: 2019-01-06
The initial rewrite was largely successful: it duplicated the functionality of the current version of the bot and added new features such as unit tests. To be clear, the initial rewrite only duplicated the bot's existing functionality and did not provide any new user-facing features.
However, the rewrite had a problem: it used SQLAlchemy to abstract database access. While this was functionally correct, it made processing the data the bot needs about 10x slower. The code needed to counteract that slowdown would have made the end product much more complicated to read and write.
Therefore, as of the new year, a rewrite of the rewrite is under way that will use raw SQL to access the database and remove all references to SQLAlchemy. This has presented its own set of problems, however, because the test infrastructure is now much more complicated. In the initial rewrite, tests could use an in-memory SQLite database because SQLAlchemy abstracted away the differences between databases. In the newest version, it will be necessary to bring up a "test" database that can be populated and destroyed on each test run.
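One concrete difference that SQLAlchemy used to hide is the placeholder syntax of raw queries: pymysql uses `%s`-style placeholders, while Python's built-in sqlite3 module uses `?`. The sketch below is only an illustration of why an in-memory SQLite database can no longer stand in for MySQL once queries are written as raw SQL; the `projects` table, its columns and the `run_against_sqlite` helper are hypothetical and are not taken from the bot's code.

```python
import sqlite3

# Hypothetical raw query written for pymysql, which uses %s placeholders.
MYSQL_STYLE = "SELECT last_updated FROM projects WHERE name = %s"
# The same hypothetical query in sqlite3's "qmark" placeholder style.
SQLITE_STYLE = "SELECT last_updated FROM projects WHERE name = ?"


def run_against_sqlite(query, params):
    """Run `query` against a throwaway in-memory SQLite database and return
    the rows, or an error string if SQLite rejects the SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE projects (name TEXT, last_updated INTEGER)")
        conn.execute("INSERT INTO projects VALUES ('Catholicism', 100)")
        return conn.execute(query, params).fetchall()
    except sqlite3.OperationalError as exc:
        # SQLite cannot parse %s, so the MySQL-style query fails here.
        return f"error: {exc}"
    finally:
        conn.close()


print(sqlite3.paramstyle)                                    # qmark
print(run_against_sqlite(SQLITE_STYLE, ("Catholicism",)))    # [(100,)]
print(run_against_sqlite(MYSQL_STYLE, ("Catholicism",)))     # an "error: ..." string
```

Placeholders are only the simplest case; MySQL-specific statements such as `INSERT ... ON DUPLICATE KEY UPDATE` are not valid SQLite either, which is why the raw-SQL tests need a real MySQL server.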
Status update: 2019-01-07
A workaround for the problem described in the previous status update has been found. Development is proceeding on local machines, with a local MySQL database for the test cases. Travis CI, which the GitHub repository uses for continuous integration, will be configured to use a similar setup once all the tests have been fixed.
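A per-test-run database of the kind described above could be managed with a small context manager like the one below. This is a sketch under stated assumptions, not the bot's actual test setup: it assumes a pymysql-style connection whose `cursor()` can be used in a `with` statement, and the `scratch_database` helper, the `wp10_test` prefix and the schema are all hypothetical. A recording stand-in connection is included so the sketch runs without a MySQL server.

```python
import contextlib
import uuid

# Hypothetical schema for illustration only; the real WP 1.0 tables differ.
SCHEMA = (
    "CREATE TABLE projects (name VARBINARY(255) PRIMARY KEY, last_updated INT)",
)


@contextlib.contextmanager
def scratch_database(conn, schema=SCHEMA, prefix="wp10_test"):
    """Create a uniquely named database, load `schema` into it, and drop it
    when the block exits, even if setup or the test body raises."""
    name = f"{prefix}_{uuid.uuid4().hex[:8]}"
    with conn.cursor() as cursor:
        cursor.execute(f"CREATE DATABASE `{name}`")
    try:
        with conn.cursor() as cursor:
            cursor.execute(f"USE `{name}`")
            for statement in schema:
                cursor.execute(statement)
        yield name
    finally:
        with conn.cursor() as cursor:
            cursor.execute(f"DROP DATABASE `{name}`")


class RecordingConnection:
    """Stand-in for a pymysql connection that only records the SQL it is
    given, so this sketch can run without a MySQL server."""

    def __init__(self):
        self.statements = []

    def cursor(self):
        return _RecordingCursor(self.statements)


class _RecordingCursor:
    def __init__(self, statements):
        self._statements = statements

    def execute(self, sql, args=None):
        self._statements.append(sql)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False
```

In a real suite, `conn` would come from something like `pymysql.connect(host="localhost", user=..., password=...)` pointed at the local (or CI) MySQL server. Giving each run a uniquely named database means an interrupted run cannot leave stale rows behind for the next one.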
Status update: 2019-02-18
All unit tests for the rewrite branch have been fixed (and have been for a while, actually). The newest version of the code uses raw SQL queries and pymysql to access its databases. So far, this has proven far more performant, with less code, than the ORM/SQLAlchemy versions.
In current tests, updating Wikipedia:WikiProject_Catholicism takes ~50 seconds with the old bot and ~70 seconds with the new bot, roughly 40% slower. This is a minor performance regression that we're willing to live with.
The 'lucky' branch, which was housing the rewrite, has been merged into the master branch of the codebase. This is somewhat uneventful, given that all of the rewrite code lives in its own directory.
More exciting is that, after completing a backup of the WP 1.0 production database, we are proceeding with a test that uses the new Lucky bot to update the rating data for Wikipedia:WikiProject_Catholicism. This means that Lucky will update Catholicism, while the old bot continues to update all other WikiProjects. We hope that this will reveal any egregious errors or anything that might have been overlooked. It is our first step in using the rewritten code.
Status update: 2019-03-10
The new bot code has been "running" over 22 test projects for the past week. I put "running" in quotes because the assessment tables for these projects were not being updated.
Because the tables for the 22 projects I had selected for the beta were still not updating, I decided to dig a little deeper. I found two bugs in the newly rewritten code:
- The database transaction was not being committed after the project metadata table was updated. This meant that the upload job (in the old bot) always considered its cached output to be valid, so the table never changed.
- The command in the update-all.sh script was incorrect, so the new bot wasn't updating the 22 projects anyway.
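The first bug can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and pymysql (neither autocommits by default), and the `projects` table, `update_project_metadata` and `cache_is_valid` are hypothetical simplifications, not the bot's real schema or code. Two separate connections play the rewritten bot and the old bot's upload job: until the bot commits, the upload job still sees the old timestamp and keeps reusing its cached output.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for the project metadata table.
SCHEMA = "CREATE TABLE projects (name TEXT PRIMARY KEY, last_updated INTEGER)"


def update_project_metadata(conn, name, timestamp, commit=True):
    """Record that project `name` was re-assessed at `timestamp`."""
    conn.execute(
        "UPDATE projects SET last_updated = ? WHERE name = ?", (timestamp, name)
    )
    if commit:
        # Without this, the update stays in an open transaction and is
        # invisible to every other connection, such as the upload job's.
        conn.commit()


def cache_is_valid(conn, name, cached_at):
    """Hypothetical upload-job check: reuse the cached table unless the
    project was updated after the cache was written."""
    row = conn.execute(
        "SELECT last_updated FROM projects WHERE name = ?", (name,)
    ).fetchone()
    return row[0] <= cached_at


def demo(commit):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wp10.db")
        setup = sqlite3.connect(path)
        setup.execute(SCHEMA)
        setup.execute("INSERT INTO projects VALUES ('Catholicism', 100)")
        setup.commit()
        setup.close()

        bot = sqlite3.connect(path)       # the rewritten bot's connection
        uploader = sqlite3.connect(path)  # the old bot's upload job
        update_project_metadata(bot, "Catholicism", 200, commit=commit)
        valid = cache_is_valid(uploader, "Catholicism", cached_at=150)
        bot.close()  # closing without commit discards the pending update
        uploader.close()
        return valid


print(demo(commit=False))  # True: the stale cache is reused (the bug)
print(demo(commit=True))   # False: the table is regenerated
```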
I've fixed these issues and kicked off a manual update of the 22 projects. Their tables should be updated correctly at 4 PM PT today. audiodude (talk) 22:05, 10 March 2019 (UTC)