Wikipedia:Bots/Noticeboard/Attribution bot proposal
This is a proposal for an Attribution bot or other automatic or semi-automatic procedure to accompany the discussion at Wikipedia:Bots/Noticeboard#Copy attribution bot question or proposal. Its goal is to remedy missing copy or translation attribution in numerous articles, by adding the attribution to the edit summary after the fact, working off a list of article names and other metadata.
Context and background
Editors are welcome to copy or translate material from other Wikipedias (or wikis with compatible licenses) as long as they comply with our licensing requirements, which specify the wording to be added to the edit summary. (This is not optional, and is per our Terms of use.) When an editor is not aware of the requirement or forgets to do it, the required attribution can still be added after the fact, per WP:CWW#Repairing insufficient attribution.
Standard text for the edit summary when a user individually repairs missing attribution as defined at WP:RIA looks like this:
For copied material:
NOTE: The previous edit as of 22:31, October 14, 2015, copied content from the Wikipedia page at [[Exact name of page copied from]]; see its history for attribution.
For translated material:
NOTE: Content in the edit of 01:25, January 25, 2023 was translated from the existing French Wikipedia article at [[:fr:Exact name of French article]]; see its history for attribution.
The automated procedure uses a modified version of this text, adding the username of the user who made the edit, and an appended bot id, and substituting in data line and runtime parameters for parts of the attribution statement as needed. So, the second example might look like this:
NOTE: Content in the edit of 01:25, January 25, 2023 was translated by [[Special:Contributions/User1|user1]] from the existing French Wikipedia article at [[:fr:Exact name of French article]]; see its history for attribution. (by AttriBot)
Task
The bot's task would be to add the proper attribution wording to the article history. Input to the bot would be a page containing a list of articles, where each article would be accompanied by one required parameter identifying the source of the copy or translation. Additional optional parameters would be available for customizing the attribution statement. A few run-time parameters would be available to avoid the need for constant repetition in the input file. The output would be a dummy edit to each article, along with an edit summary using the wording given at WP:RIA, substituting in the correct wording per the parameters. An output log itemizing the activity would be optional.
Let's provisionally call it 'AttriBot', because it seems like a natural choice.
Input file
Format
The input file consists of multiple lines, where each line contains data identifying one article at Wikipedia requiring retroactive attribution, along with needed parameters. Each data line is in SSV (semicolon-separated variable) format (see § Choice of delimiter), containing two to five fields:
* ArticleTitle; SourceTitle; Timestamp; Type; Comment
where:
- * – leading asterisk (or colon) in column one to stop wrapping when the user views their file; optional blank(s) may follow it.
- ArticleTitle – title of the page at en-wiki containing unattributed text copied or translated from a foreign Wikipedia or compatible wiki (required).
- SourceTitle – title of the source page; may contain a prefix with optional leading colon, a WP code, and another colon, e.g. :de:Schutzstaffel (required).
- Timestamp – a string representing a timestamp as shown in the revision history, e.g., 02:45, 8 February 2021 (optional; no default).
- Type – either copy or translate; if present and valid, overrides the value of runtime param 'type' (optional; default=runtime param type; if both absent, then copy). An invalid value of 'type' is treated as a comment and echoed to the output.
- Comment – a user-given string to be appended to the bot-generated summary for this line.
Each data line must represent a single edit that is missing a required attribution. All data lines must correspond to a single user, normally the user in whose userspace the input file is located (but see § Runtime params).
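The field rules above can be sketched in Python. This is only an illustration, not the bot's actual code: the names DataLine and parse_data_line are hypothetical, and joining an invalid Type value with any following Comment using "; " is an assumption, since the text only says the invalid value is treated as a comment.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = {"copy", "translate"}


@dataclass
class DataLine:
    """One parsed data line (hypothetical structure, for illustration only)."""
    article_title: str
    source_title: str
    timestamp: Optional[str] = None
    edit_type: Optional[str] = None  # None means: fall back to the runtime param
    comment: str = ""


def parse_data_line(line: str) -> DataLine:
    """Parse one SSV data line; raise ValueError if a required field is missing."""
    text = line.strip()
    # Drop the optional leading asterisk (or colon) and any blanks after it.
    if text[:1] in ("*", ":"):
        text = text[1:].lstrip()
    # Split into at most five fields, then pad the missing optional ones.
    fields = [f.strip() for f in text.split(";", 4)]
    fields += [""] * (5 - len(fields))
    article, source, timestamp, type_field, comment = fields
    if not article or not source:
        raise ValueError(f"missing required field in line: {line!r}")
    edit_type = type_field.lower() if type_field.lower() in VALID_TYPES else None
    if type_field and edit_type is None:
        # Assumption: an invalid Type is folded into the comment text.
        comment = "; ".join(part for part in (type_field, comment) if part)
    return DataLine(article, source, timestamp or None, edit_type, comment)
```

For example, parse_data_line("* Brazilian criminal justice; :pt:Direito penal") yields a record with no timestamp, no type, and an empty comment, leaving the type to the runtime default.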
Comment lines
Within the input file, there is no formal definition of a "Comment line", as there is in some programming languages, with /* comment delimiters */ and the like. By appropriate use of inclusion control tags, the input file may contain lines that are effectively comments. Surround any material that is not part of an input file data line with paired noinclude tags.
An example:
<noinclude> My German translations:</noinclude>
Article1; German Article1; timestamp1
Article2; German Article2; timestamp2
Article3; German Article3; timestamp3
<noinclude> My French translations:</noinclude>
ArticleA; French articleA; timestampA
ArticleB; French articleB; timestampB
Lines within paired noinclude tags are skipped by the bot, and therefore do not appear in debug or log output; they are strictly for the convenience of the creator of the input file.
Note: Blank lines are seen by the bot and output to the log.
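One way the skipping of noinclude material might be done when the input page is read as raw wikitext is sketched below. The function name strip_noinclude is hypothetical, and the sketch assumes the tags are properly paired and not nested.

```python
import re

# Matches a paired <noinclude>...</noinclude> block, possibly spanning several lines.
NOINCLUDE_RE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)


def strip_noinclude(wikitext: str) -> str:
    """Return the wikitext with all paired noinclude blocks removed."""
    return NOINCLUDE_RE.sub("", wikitext)


sample = (
    "<noinclude> My German translations:</noinclude>\n"
    "Article1; German Article1; timestamp1\n"
    "Article2; German Article2; timestamp2\n"
)
data_lines = [line for line in strip_noinclude(sample).splitlines() if line.strip()]
```

In this sketch, a line that held only a noinclude block is left empty after stripping. The final list comprehension drops such lines; whether they should instead count as blank lines and be echoed to the log is not settled by the text above.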
Special considerations
- right-to-left scripts – pay attention when including a SourceTitle (param 2) that is from a language with a right-to-left script such as Hebrew or Arabic. These may require a trailing left-to-right mark character (HTML entity &lrm;) to prevent the next parameter in the input line from being garbled.
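A file creator who prepares the input with a script could add the mark automatically. The following is a minimal sketch, assuming the title is held as a Python string; protect_source_title is a hypothetical helper, and it inserts the character U+200E directly rather than the HTML entity.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, the character behind the &lrm; entity


def contains_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def protect_source_title(title: str) -> str:
    """Append a left-to-right mark to titles that contain right-to-left text."""
    if contains_rtl(title) and not title.endswith(LRM):
        return title + LRM
    return title
```

Titles in left-to-right scripts are returned unchanged, so the helper can be applied to every SourceTitle without checking the language first.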
Location
Normally, the input file should be a subpage in your user space. See § User requests and administration for details.
Runtime params
- user – specifies the username for the edit summary, e.g. 'earlier edits by USER were copied/translated...'; if missing, it is taken from the username part of the input filename, if found in Userspace; otherwise error: missing user. A run cannot proceed without a single identified user.
- type – one of copy or translate; overridden by param type in the input line.
- log – when =y, copies lines from the input file to the specified log, and appends the RIA edit summary line; optional; default=y; set to n to turn off logging.
- debug – when =y, just produces the log, but doesn't edit any files.
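The precedence rules for user and type can be sketched as follows. The function names and the dictionary of runtime params are hypothetical; the sketch assumes the input page name is available as a string such as User:Example/Attribution set 1.

```python
VALID_TYPES = ("copy", "translate")


def resolve_user(params: dict, input_page: str) -> str:
    """Pick the user from the runtime param, else from the userspace page name."""
    user = params.get("user", "").strip()
    if user:
        return user
    if input_page.startswith("User:"):
        # The root page name is the part before the first slash.
        root = input_page[len("User:"):].split("/", 1)[0].strip()
        if root:
            return root
    raise ValueError("missing user")


def resolve_type(params: dict, line_type=None) -> str:
    """A valid Type on the data line wins, then the runtime param, then 'copy'."""
    for candidate in (line_type, params.get("type")):
        if candidate in VALID_TYPES:
            return candidate
    return "copy"
```

With these rules, a run over User:Mathglot/Attribution set 1 with no user param resolves to Mathglot, and a data line whose Type is copy overrides a runtime type of translate.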
Output format
Generates a RIA-style attribution summary of the form:
NOTE: Content in the edit of $TIMESTAMP was $TYPEd by [[Special:Contributions/USER|USER]] from the existing LANGUAGE Wikipedia article at [[$SOURCETITLE]]; see its history for attribution. $COMMENT (by AttriBot)
Caps in the model RIA edit summary above show substitutable items ('$' indicates a field in the data line):
- TIMESTAMP is the value from the Input file data line parameter 3. If missing, "in the edit of $TIMESTAMP" changes to "in previous edit(s)".
- LANGUAGE is derived from the WP code prefix in the data line SourceTitle (param 2).
- TYPEd is either copied or translated, and comes either from the runtime parameter type, or the data line parameter Type (param 4).
- USER is either from the {{ROOTPAGENAME}} of the input file, or from the runtime parameter user.
- SOURCETITLE is from the Input file data line parameter 2, SourceTitle.
- COMMENT is an optional free-form user comment, from data line parameter Comment (param 5).
Note: LANGUAGE is derived from the prefix in the SourceTitle (if any), where prefix is a WP code as in the table at List of Wikipedias#Wikipedia editions
The generated edit summary is added to the article whose title is given in data line param 1, ArticleTitle.
Note: If this becomes an approved bot, the parenthetical id at the end of the generated edit summary should be altered to link to the bot page, e.g.: (by [[WP:BotName/doc|BotName]]).
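The substitution rules in this section can be sketched as a small Python function. It follows the model summary given under Output format, not the slightly different wording used in the Examples log. The names build_summary and language_of are hypothetical, the language table is a three-entry illustrative subset of the WP codes, and dropping the language word when SourceTitle has no recognized prefix is an assumption.

```python
from typing import Optional

# Illustrative subset only; a real run would need the full table of WP codes.
LANGUAGE_NAMES = {"fr": "French", "de": "German", "pt": "Portuguese"}
VERBS = {"copy": "copied", "translate": "translated"}


def language_of(source_title: str) -> Optional[str]:
    """Derive the language name from a prefix such as 'fr:' or ':fr:'."""
    parts = source_title.lstrip(":").split(":", 1)
    if len(parts) == 2:
        return LANGUAGE_NAMES.get(parts[0].strip().lower())
    return None


def build_summary(user: str, source_title: str, edit_type: str = "copy",
                  timestamp: Optional[str] = None, comment: str = "") -> str:
    """Fill in the model RIA edit summary from one data line's values."""
    when = f"in the edit of {timestamp}" if timestamp else "in previous edit(s)"
    language = language_of(source_title)
    wiki = f"{language} Wikipedia" if language else "Wikipedia"
    summary = (
        f"NOTE: Content {when} was {VERBS[edit_type]} by "
        f"[[Special:Contributions/{user}|{user}]] from the existing {wiki} "
        f"article at [[{source_title}]]; see its history for attribution."
    )
    if comment:
        summary += f" {comment}"
    return summary + " (by AttriBot)"
```

Calling build_summary("Mathglot", ":pt:Direito penal", "translate") exercises the missing-timestamp case, where the summary falls back to "in previous edit(s)".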
Logging
Each data line is echoed to the log, followed by the edit summary line, indented and preceded by an increasing integer count value, starting at 1 for the first data line. Logging is enabled by default, but may be disabled via a runtime parameter.
The log file is written to subpage /log of the input file given by the user. But at operator discretion, the log file may be created locally to the run location, with the /log page as a redirect to it, or otherwise.
Alternatively, the log may be written to subpage /log/RUNTIMESTAMP, if desired.
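A rough sketch of how the log text might be assembled is given below, following the layout shown under Examples: numbered data lines, each followed by its summary on a line starting with a colon, with other lines echoed unnumbered. The function format_log and its list-of-pairs input are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def format_log(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build log text from (echoed line, summary) pairs.

    A pair whose summary is None is a non-data line (blank or comment-like)
    and is echoed without a number.
    """
    out = []
    count = 0
    for raw, summary in entries:
        if summary is None:
            out.append(raw)
            continue
        count += 1
        out.append(f"{count}. {raw}")
        out.append(f": {summary}")  # leading colon indents the line in wikitext
    return "\n".join(out)


log_text = format_log([
    ("Caixa 2; pt:Caixa dois; 22:09, 26 November 2019", "NOTE: first summary"),
    ("/* de */", None),
    ("War guilt question; Color book; 21:13, 25 February 2021; copy", "NOTE: second summary"),
])
```

The summaries here are placeholders; in a real run each would be the full generated edit summary for that line.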
Debugging
If debug mode is enabled via a runtime parameter, the log is generated, but no articles are modified.
Issues
Choice of delimiter
The input file should be an SSV file (semicolon-separated variable). CSV (comma-separated variable) format is standard, but the comma is a very common title character (especially in place names), whereas the semicolon is not, which makes it the better field separator. Only 63 article titles contain a semicolon (and most are redirects), so a collision with an article name needing attribution is very unlikely.
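The difference is easy to see by splitting one of the sample data lines both ways. This is a small illustration in Python, not part of the proposal itself. It also shows a second problem with commas: the revision-history timestamp itself contains one.

```python
line = "Brazilian criminal justice; :pt:Prisão preventiva; 01:43, 22 April 2024"

# Semicolon split: three clean fields (ArticleTitle, SourceTitle, Timestamp).
by_semicolon = [field.strip() for field in line.split(";")]

# Comma split: the timestamp is cut in two and the titles are not separated.
by_comma = [field.strip() for field in line.split(",")]
```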
Examples
A run with runtime params |user=Mathglot |type=translate |log=1 and this sample input file:
* Liberation of France; fr:Élections constituantes françaises de 1945; 00:28, 4 February 2021
* Liberation of France; fr:Assemblée consultative provisoire; 02:45, 8 February 2021
* Liberation of France; Battle of Gabon; 02:50, 3 February 2021; copy; From rev 1003043161.
/* pt */
* Caixa 2; pt:Caixa dois; 22:09, 26 November 2019
* Brazilian criminal justice; :pt:Prisão preventiva; 01:43, 22 April 2024;
* Brazilian criminal justice; :pt:Direito penal
* Brazilian criminal justice; :pt:Justiça Militar do Brasil; 4:54, 15 April 2024
* Brazilian criminal justice; :pt:Código de Processo Penal brasileiro; 08:55, 8 July 2023;
/* de */
* Anti-gender movement; :de:Anti-Gender-Bewegung; 04:14, 29 August 2021; ; From rev. [[:de:Special:Permalink/214798358#Deutschland|214798358]]
* Weimar Republic; :de:Weimarer_Republik#Frühe Krisenjahre (1919–1923); 01:40, 21 December 2020;
* War guilt question; de:Kriegschuldfrage; 19:10, 25 February 2021; From rev. 207898075
* War guilt question; Color book; 21:13, 25 February 2021; copy;
would generate the following output log:
1. Liberation of France; fr:Élections constituantes françaises de 1945; 00:28, 4 February 2021
: Content in the previous edit of 00:28, 4 February 2021 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the French Wikipedia article [[fr:Élections constituantes françaises de 1945]]; see that article's history for attribution.
2. Liberation of France; fr:Assemblée consultative provisoire; 02:45, 8 February 2021
: Content in the previous edit of 02:45, 8 February 2021 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the French Wikipedia article [[fr:Assemblée consultative provisoire]]; see that article's history for attribution.
3. Liberation of France; Battle of Gabon; 02:50, 3 February 2021; copy; From rev 1003043161.
: Content in the previous edit of 02:50, 3 February 2021 by [[Special:Contributions/Mathglot|Mathglot]] was copied from the Wikipedia article [[Battle of Gabon]]; see that article's history for attribution. From rev 1003043161.
/* pt */
4. Caixa 2; pt:Caixa dois; 22:09, 26 November 2019
: Content in the previous edit of 22:09, 26 November 2019 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the Portuguese Wikipedia article [[pt:Caixa dois]]; see that article's history for attribution.
5. Brazilian criminal justice; :pt:Prisão preventiva; 01:43, 22 April 2024;
: Content in the previous edit of 01:43, 22 April 2024 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the Portuguese Wikipedia article [[:pt:Prisão preventiva]]; see that article's history for attribution.
6. Brazilian criminal justice; :pt:Direito penal
: Content in previous edit(s) by [[Special:Contributions/Mathglot|Mathglot]] were translated from the Portuguese Wikipedia article [[:pt:Direito penal]]; see that article's history for attribution.
7. Brazilian criminal justice; :pt:Justiça Militar do Brasil; 4:54, 15 April 2024
: Content in the previous edit of 4:54, 15 April 2024 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the Portuguese Wikipedia article [[:pt:Justiça Militar do Brasil]]; see that article's history for attribution.
8. Brazilian criminal justice; :pt:Código de Processo Penal brasileiro; 08:55, 8 July 2023;
: Content in the previous edit of 08:55, 8 July 2023 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the Portuguese Wikipedia article [[:pt:Código de Processo Penal brasileiro]]; see that article's history for attribution.
/* de */
9. Anti-gender movement; :de:Anti-Gender-Bewegung; 04:14, 29 August 2021; ; From rev. [[:de:Special:Permalink/214798358#Deutschland|214798358]]
: Content in the previous edit of 04:14, 29 August 2021 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the German Wikipedia article [[:de:Anti-Gender-Bewegung]]; see that article's history for attribution. From rev. [[:de:Special:Permalink/214798358#Deutschland|214798358]]
10. Weimar Republic; :de:Weimarer_Republik#Frühe Krisenjahre (1919–1923); 01:40, 21 December 2020;
: Content in the previous edit of 01:40, 21 December 2020 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the German Wikipedia article [[:de:Weimarer_Republik#Frühe Krisenjahre (1919–1923)]]; see that article's history for attribution.
11. War guilt question; de:Kriegschuldfrage; 19:10, 25 February 2021; From rev. 207898075
: Content in the previous edit of 19:10, 25 February 2021 by [[Special:Contributions/Mathglot|Mathglot]] was translated from the German Wikipedia article [[de:Kriegschuldfrage]]; see that article's history for attribution. From rev. 207898075
12. War guilt question; Color book; 21:13, 25 February 2021; copy
: Content in the previous edit of 21:13, 25 February 2021 by [[Special:Contributions/Mathglot|Mathglot]] was copied from the Wikipedia article [[Color book]]; see that article's history for attribution.
Example notes:
- 1: Simplest case: local article; foreign article; timestamp. Also #2, 4, 5, 7 et al.
- 3a: type=copy, so, "was copied from..."
- 3b: no foreign prefix in SourceTitle, so: "...from the Wikipedia article" (not: "the English Wikipedia article...")
- 4: the comment line above it is just echoed to log.
- 6: No timestamp: "previous edit of hh:mm, dd Month yyyy by USER was..." ⟶ "previous edit(s) by USER were"
- 9: two semicolons show an empty 'type' field in arg4, so output is still the default "translated from" language. The arg5 value is a trailing comment to be echoed to the log.
- 11: user forgot the doubled semicolon for the empty 'type' placeholder, but From rev. 207898075 is not a valid type, so just echo it to the log as a comment, and use the default 'translated from'.
- 12: same as #3b.
- All lines: the identifier (added by AttriBot) at the end of each line is not shown in the example above.
Requests
Single user
Users wishing to have a list of their articles adjusted for missing copy or translate attribution should make up a list of their articles in their own userspace. Suggestion: use a WP:User subpage, like Special:Mypage/Attribution set 1 (or ...2, etc.).
Reminder: all data lines in the file correspond to edits by a single user. The user is normally the user in whose userspace the input file is located, but this can be overridden by a runtime parameter. Do not issue a request for edits by another user without their permission, or approval by an admin.
Admins and multiple users
If you are an admin or someone requesting a run for multiple users, note that their edits cannot be placed in a single input file; there must be one file per user. If you need to file for multiple users, please create one input file for each user. The files may be subpages of each user in question, or may all be in your user space; in the latter case, hopefully with some logical file-naming structure, such as:
- User:Admin/Attribution requests/Example_User_1
- User:Admin/Attribution requests/Example_User_2
- User:Admin/Attribution requests/Example_User_2/Set_2
and so on. If your input files are in your own user space but pertain to other users, in your run request, be sure to carefully specify which file corresponds to which user, as changes to edit history cannot be undone.
Where to file
Requests for attribution runs should be made at User talk:DreamRimmer, and should include the filename(s) of your input file(s). You can request a debug run, meaning you will get back a list of all of the edit summaries that would be applied, but no files will actually be changed. If your input file is in User:Myuser/Attrib set one then your log will be copied to (or redirected from) User:Myuser/Attrib set one/log.
Operation
Due diligence
This procedure adds content to the edit summary that cannot be removed, because the edit history cannot be altered. Care must be taken that the correct summary is written, as errors may leave incorrect information in the history permanently. An incorrect entry would have to be followed up by another edit to leave a second edit summary in the history, negating or correcting the first one.
Semi-automated operation
For the time being, runs are semi-automated and run by an individual. The procedures and suggestions below are likely to change as the process matures.
Error handling
Invalid input line
- All lines are echoed to the log, unless logging is turned off. Lines in the input file must conform to the Input file format. Invalid input lines are echoed, but article processing is skipped, and an indented error message (e.g., Invalid; skipped, or similar) is added to the log after the echoed input.
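The echo-then-skip behaviour might look like the sketch below. The name process_lines is hypothetical, the only validity check made is that the two required fields are present, and treating blank lines as echoed but not invalid is an assumption drawn from the note under Comment lines.

```python
from typing import List, Tuple


def process_lines(lines: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Echo every line to the log; return the fields of valid lines only."""
    valid = []
    log = []
    for raw in lines:
        log.append(raw)  # every line is echoed, valid or not
        text = raw.strip()
        if not text:
            continue  # blank line: echoed, nothing to process
        if text[:1] in ("*", ":"):
            text = text[1:].lstrip()
        fields = [field.strip() for field in text.split(";", 4)]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            log.append(": Invalid; skipped")  # indented error message
            continue
        valid.append(fields)
    return valid, log


valid, log = process_lines(["* A; fr:B; ts", "", "* OnlyTitle", "* C; pt:D"])
```

Here the third line lacks a SourceTitle, so it is echoed, flagged, and left out of the list of lines to be processed.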
Considerations
Do a large number of articles all appear to be about the same topic, or is there some reason to believe that a run might hit many articles on individual watchlists? (How do other bots deal with this question?)
Throttling
Note that exponential backoff is mentioned and linked from the § Best practices section of Help:Creating a bot.
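For reference, a generic exponential backoff wrapper could look like the following sketch. Nothing here is specific to any bot framework: TransientError, with_backoff, and the default delays are all hypothetical, and a real run would need to decide which failures count as retryable.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. rate limiting)."""


def with_backoff(action, max_attempts=5, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep):
    """Call action(); on TransientError wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps retries from lining up exactly.
            sleep(delay + random.uniform(0, delay / 10))
```

The sleep parameter exists only so the wait can be replaced in testing; in normal use the default time.sleep applies.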