Wikipedia:WikiProject Copyright Cleanup/2023 backlog drive

Instructions

For new users

Firstly, thank you for taking the time to help clear the backlog! Your efforts are appreciated. Copyright is a complex and nuanced topic, so we recommend you start on the easier backlogs, linked below. For CCI, these are mainly pages that involve copying from non-free websites, so no offline research is required. For the category backlogs, all the suspected violations should have a source URL.

An exhaustive list of instructions for handling text-based copyright violations is available at the top of the copyright problems page. A good guide on how to start editing at CCI is User:Moneytrees/CCI guide. A brief rundown of handling CCIs follows; it is no substitute for reading the relevant pages:

Check for dead links; if there are any, use IABot to restore them.
Run the page through Earwig's copyright detector to get a cursory score. Mirrors often copy from Wikipedia, so make sure to identify and ignore them.
Check the article's sources and compare them with the existing text. WP:REX may be helpful for hard-to-access sources.
If you have identified any possibly infringing content with a source
- Check the page's licence: is it compatible per WP:COMPLIC?
- If the content is not compatible, remove or rewrite it with a link to the source material in the edit summary
- Remove the diff from the CCI page and mark it with {{y}}. Mark the article talk with {{CCI}}
If you have identified any possibly infringing content without a source
- In case of content added by repeat copyright violators at CCI, the content may be presumptively removed
  - Please note this in your edit summary, linking to the CCI page if applicable
- Otherwise, if you still suspect the content of being plagiarised from a non-free source, removing it under other policies (e.g. if it's unreferenced) may be appropriate.

Please do not hesitate to ask any experienced editors for help.

For returning users

Welcome back, and thanks for taking part. This drive focuses mainly on CCI; the rewards system is described below.

Rewards system

For articles at CCI...

Handling a diff <1k bytes - one point
Handling a diff >1k bytes - two points

For everything else...

Handling any article - two points
Reviewing all diffs of an article - four points
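As a rough illustration of how the scoring above adds up, here is a minimal Python sketch. The function names are hypothetical, and it makes three assumptions the rules do not state: that "1k bytes" means 1,000 bytes, that a diff of exactly 1,000 bytes scores one point, and that "handling an article" and "reviewing all diffs of an article" are counted separately.

```python
# Hypothetical helpers; only the point values come from the rules above.
KILOBYTE = 1000  # assumption: "1k bytes" means 1,000 bytes


def cci_diff_points(diff_size_bytes: int) -> int:
    """Points for handling a single diff at CCI."""
    # Assumption: a diff of exactly 1,000 bytes falls in the one-point tier,
    # because the rules only cover "<1k" and ">1k".
    return 2 if diff_size_bytes > KILOBYTE else 1


def other_points(articles_handled: int, articles_fully_reviewed: int) -> int:
    """Points for work outside CCI: 2 per article handled, 4 per full review."""
    return 2 * articles_handled + 4 * articles_fully_reviewed


def total_points(cci_diff_sizes, articles_handled=0, articles_fully_reviewed=0) -> int:
    """Total score for one participant."""
    cci_total = sum(cci_diff_points(size) for size in cci_diff_sizes)
    return cci_total + other_points(articles_handled, articles_fully_reviewed)


if __name__ == "__main__":
    # Three CCI diffs (500, 1,500 and 2,000 bytes), one article handled,
    # and one article with all diffs reviewed.
    print(total_points([500, 1500, 2000], articles_handled=1, articles_fully_reviewed=1))
```

If the drive coordinators settle the boundary case differently, only the comparison in `cci_diff_points` would need to change.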

Awards

Image	Minimum	Template
	5 points	The Invisible Barnstar
	10 points	The Working Wikipedian's Barnstar
	25 points	The Tireless Contributor Barnstar
	50 points	The Cleanup Barnstar
	100 points	The Copyright Cleanup Barnstar
	200 points	The Great Copyright Drive Barnstar
	500 points	The Order of the Superior Scribe of Wikipedia
	Re-reviewing 25 articles	The Teamwork Barnstar
	In addition, the person who accumulates the most points during the backlog elimination drive will receive the Copyright Review Medal of Merit
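The points thresholds in the table can be read as a simple lookup. The sketch below is illustrative only: `AWARD_TIERS` and `highest_award` are hypothetical names, and it assumes a participant receives only the highest award they qualify for, which the table does not state. The Teamwork Barnstar and the Medal of Merit are left out because they are not based on a points threshold.

```python
from typing import Optional

# Minimum points and award names, copied from the table above (ascending order).
AWARD_TIERS = [
    (5, "The Invisible Barnstar"),
    (10, "The Working Wikipedian's Barnstar"),
    (25, "The Tireless Contributor Barnstar"),
    (50, "The Cleanup Barnstar"),
    (100, "The Copyright Cleanup Barnstar"),
    (200, "The Great Copyright Drive Barnstar"),
    (500, "The Order of the Superior Scribe of Wikipedia"),
]


def highest_award(points: int) -> Optional[str]:
    """Return the top award earned for a points total, or None below 5 points."""
    earned = None
    for minimum, name in AWARD_TIERS:
        if points >= minimum:
            earned = name  # keep overwriting; the tiers are sorted ascending
    return earned


if __name__ == "__main__":
    for score in (4, 5, 120, 500):
        print(score, highest_award(score))
```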

Beginner friendly CCIs

Category backlogs to clear

Construction

Currently, there are significant backlogs in the three principal queues of copyright cleanup: CCI, CP and CopyPatrol. Other parts of the project have made significant progress in clearing their backlogs by gamifying reviews and providing rewards for a certain number of points. Whilst a backlog drive is appealing, a gamified approach may not be effective with respect to copyright.

The Backlog (August 2023)

Based on rough estimates and database counts, copyright backlogs on Wikipedia are:

CCI currently has over 100,000 remaining diffs to be reviewed
CopyPatrol currently has ~70 open reports at a time
CP is at a manageable level for now

Rough ideas

Backlog drive where we reward points for older CCIs
Focus on a large CCI that's easier for beginners to tackle (rtkat3, werldwayd, etc.)
Tackle low-risk stuff towards the end of CCIs
Clear out Category:Copied and pasted articles and sections with url provided, so it doesn't have to be listed at CP
- Not too big, so we could evaluate each one as in a CCI review
Bot to collate number of articles fixed
?

Development

Rewards system

Most backlog drives make use of a point/article system, and this would make sense here: barnstars, etc. could be given out for meeting certain criteria, in a similar manner to the GAN drive. Tallying points can be automated relatively easily: the NPP drive made use of bots to collect data such as the backlog size and user points.

The main problem is quality. Unlike in the drives above, it is much more difficult to review individual users' work, not only because of the sheer number of pages but also because there are far fewer editors with sufficient copyright experience than there are editors with GAN/NPP experience. However, we could probably still maintain a relatively high standard by re-checking a set sample, the size of which will have to be decided. One per 25 pages may be a good starting point, and we can amend this if it proves to be an issue.
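To make the sampling idea concrete, here is a minimal Python sketch of choosing pages to re-check at a rate of one per 25. The function name and parameters are hypothetical, and rounding the sample size up (so that even a small batch gets at least one re-check) is an assumption, not something decided above.

```python
import math
import random

SAMPLE_RATE = 25  # one re-check per 25 pages, the suggested starting point


def pick_recheck_sample(pages, rate=SAMPLE_RATE, seed=None):
    """Return a random subset of `pages` to re-review, roughly one per `rate` pages."""
    pages = list(pages)
    # Assumption: round up, so a reviewer with fewer than `rate` pages
    # still has one page re-checked. An empty list yields an empty sample.
    sample_size = math.ceil(len(pages) / rate)
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return rng.sample(pages, sample_size)


if __name__ == "__main__":
    reviewed = [f"Page {i}" for i in range(1, 61)]  # 60 placeholder page titles
    print(pick_recheck_sample(reviewed, seed=2023))  # three pages are selected
```

Changing `rate` is enough to tighten or loosen the check if the starting ratio turns out to be a problem.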