User:GreenC/testcases/bigtenorg
Steps to process bigten.org

This is a real request that was recently made. The steps below are exactly what I did to process it. They assume everything goes smoothly and nothing requires changes to the core code of the bot; such problems come up frequently. This request had no URL transformations (step 5), which would normally require some additional code.

1. Request was created by a user: https://en.wikipedia.org/wiki/Wikipedia:Link_rot/URL_change_requests#bigten.org

2. Create a list of articles containing the domain, and at the same time coin a new project name ('bigtenorg'):

    wikiget -a "insource:bigten insource:/bigten[.]org/" | shuf > bigtenorg.auth

3. Create a skeleton source file that contains domain-specific changes:

    cp urlchanger_SKELETON_HARD.nim urlchanger_bigtenorg.nim

4. Edit urlchanger_bigtenorg.nim and modify basic domain information:

    # --------- CONFIG START
    Runme.urlchangerSum   = "[[WP:URLREQ#bigten.org]]"   # Edit summary

    Runme.urlchangerDRe   = "bigten[.]org"               # Old name: hostname/domain/path - regex
                                                         # Used to parse URLs from wikitext
    Runme.urlchangerDDRe  = "bigten[.]org"               # Same as ^ - hostname/domain only - no path
    Runme.urlchangerDPRe  = "bigten.org"                 # Same as ^ - no regex and no path

    Runme.urlchangerNRe   = "bigten[.]org"               # New name - hostname/domain - regex
                                                         # Used to identify when it's been switched to new URL
                                                         # If DRe and NRe have the same values use the same entry for each
    Runme.urlchangerNPRe  = "bigten.org"                 # Same as ^ - no regex
    Runme.urlchangerNPPRe = "[[Big Ten Conference]]"     # Wikitext to replace with when it finds NRe in metadata fields - [[]] OK
    Runme.urlchangerNRPRe = "Big Ten Conference"         # Plain text string to replace named refs - [[]] NOT OK
    Runme.urlchangerTCRe  = &"(?i){mypipe}[^$]*[^$]*"
    Runme.skipapicheckexception = "bigten[.]org"
    # --------- CONFIG END

5. Add code to do URL transformations. None required for this project.

6. Compile the medic binary:

    lx -n bigtenorg

7. Create project directories and files.
The project name (-p) encodes the range of articles to process, i.e. run the bot on articles 1 to 1326 as listed in bigtenorg.auth, created in step #2:

    wc -l bigtenorg.auth
    1326 bigtenorg.auth
    projectm -c -p bigtenorg.0001-1326

8. Run the bot on 1,326 articles:

    runbot -n bigtenorg.0001-1326 -v medic-bigtenorg -r 8 -f auth

9. While it is running, check the logs for known trouble areas, such as soft-404s, that the bot discovers as it runs.

10. Cancel the bot and add code to handle the discovered soft-404s, i.e. edit urlchanger_bigtenorg.nim and add the following code:

    # Soft-404 traps here:
    if newloc ~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/?$") and newurl !~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/?$"):
      sendlog(Project.syslog, CL.name, url & " ---- " & newloc & " ---- Redirect to home found ---- urlchanger7.1.3")
      return "DEADLINK"
    if newloc ~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/mbb/?$") and newurl !~ ("^https?://" & GX.hostname & Runme.urlchangerDRe & "/mbb/?$"):
      sendlog(Project.syslog, CL.name, url & " ---- " & newloc & " ---- Redirect to mbb ---- urlchanger7.1.4")
      return "DEADLINK"

The code above says that if a URL redirects to the home page or to "/mbb", and the original URL was not itself that page, the redirect indicates a soft-404 and the link is treated as dead.

11. Kill the original project, recreate it, and re-run the bot:

    projectm -x -p bigtenorg.0001-1326
    projectm -c -p bigtenorg.0001-1326
    runbot -n bigtenorg.0001-1326 -v medic-bigtenorg -r 8 -f auth

12. Repeat steps #9-11 until it is running clear, then run to completion.

13. After completion, follow a lengthy manual process of checking for known problems that show up in the logs.
Sample steps:

    (meta) if(-e logembway) cat logembway   # Check these - something went wrong
    (meta) grep fixcommentarchive syslog    # look at diffs for problems / see also the first "error" step why those didn't get fixed
    (meta) if(-e logradicalurl) cat logradicalurl | awk -F"----" '{print $3}'   # check for legit archive URLs and add to logradicalurl() in medic.nim
    (meta) grep removearchive2 cbignore     # check for embedded templates that should be added to encodeWiki()
    etc.

Modify the bot code as needed and re-run any affected articles. To re-run a single article:

    bugm -n "Feudalism" -r

14. For new archive.today links, manually verify that each one is working, per a process outlined in the docs.

15. Push 5 diffs up to Wikipedia:

    push2wiki -s5

16. Manually verify that the diffs on Wikipedia look good and there are no problems.

17. Push the remaining diffs:

    push2wiki -s0

18. Reprocess any articles that had intervening edits by other users (edit conflicts), then push them.

19. Generate statistics and copy-paste them into the request from step #1. Add a {{done}} flag to the page.

    stats bigten.org
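The soft-404 traps from step 10 can be tried outside the bot with a small shell sketch. This is not the bot's Nim code: the function names `matches` and `classify`, the `OK` result, and the `HOST_RE` pattern (a stand-in for `GX.hostname`, whose real value is not shown here) are assumptions made for illustration. It applies the same rule: a redirect that lands on a known landing page counts as a dead link unless the requested URL was that landing page to begin with.

```shell
#!/bin/sh
# Illustrative sketch only; the bot implements this check in Nim.

DOMAIN_RE='bigten[.]org'     # same value as Runme.urlchangerDRe in step 4
HOST_RE='([^/.]+[.])*'       # ASSUMPTION: stand-in for GX.hostname (optional subdomains)

# matches URL PATH_RE
# Succeeds when URL is exactly scheme + optional subdomains + domain + PATH_RE.
matches() {
    printf '%s\n' "$1" | grep -Eq "^https?://${HOST_RE}${DOMAIN_RE}$2\$"
}

# classify REQUESTED_URL REDIRECT_TARGET
# Prints DEADLINK when the redirect target is a landing page (home or /mbb)
# and the requested URL was not that same landing page; otherwise prints OK.
# (OK is a hypothetical placeholder for "no soft-404 trap matched".)
classify() {
    for landing in '/?' '/mbb/?'; do
        if matches "$2" "$landing" && ! matches "$1" "$landing"; then
            echo DEADLINK
            return 0
        fi
    done
    echo OK
}
```

With a made-up article URL, `classify 'https://bigten.org/news/story.html' 'https://bigten.org/'` prints DEADLINK, while a home-page URL that stays on the home page prints OK. Further landing pages discovered in the logs would be added to the `for` list.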