User:Crispy1989/Dataset

From Wikipedia, the free encyclopedia

This is the page for the training set used by the neural network in the new ClueBot engine. It has two subpages: one for edits that are known to be vandalism and one for edits that are known not to be vandalism.

People can help this effort by adding edits to one of the two subpages. The guidelines for classifying edits are as follows:

  • An edit should be specified by a URL like this:
* [http://en.wikipedia.org/w/index.php?title=User:ClueBot/Sandbox&diff=200695142&oldid=195000302 User:Cluebot/Sandbox]

The text that comes after the actual link doesn't matter - only the link will be used.

  • An edit should only be classified as vandalism if it is unequivocally vandalism and the bot should classify it as such.
  • An edit should only be classified as not vandalism if it is unequivocally a constructive edit and the bot should classify it as such.
  • If an edit is ambiguous as to whether or not it is vandalism, it should not be classified in either category.
  • Examples of all types of vandalism (as the bot should classify it) should be added to the vandalism category.
  • Examples of all types of non-vandalism edits should be added to that category, including vandalism reverts. The bot will process all edits according to these classifications, so numerous examples of all types of edits need to be added.
  • At present, only edits from the English Wikipedia should be added.
  • These lists themselves may need to be reviewed by others to make sure the information is correct.
  • The more examples there are, the better. Thousands of data pairs are needed. Please help.
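
As an illustration of the link format described above, here is a minimal sketch of how a list entry could be reduced to the three values that identify an edit. It is not the bot's actual code: the function name `parse_edit_link` and the regular expression are assumptions made for this example. It also assumes that `diff` and `oldid` are numeric revision IDs, as in the sample link, and skips any entry where they are not.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical helper, not part of ClueBot: matches the first external
# link in a wikitext line of the form "[URL optional label]".
LINK_RE = re.compile(r"\[(https?://[^\s\]]+)")


def parse_edit_link(line):
    """Return (title, diff, oldid) for a list entry, or None if unusable."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    # Only the URL is read; the label after it is ignored.
    query = parse_qs(urlparse(match.group(1)).query)
    try:
        title = query["title"][0]
        diff = int(query["diff"][0])    # assumed to be a numeric revision ID
        oldid = int(query["oldid"][0])  # assumed to be a numeric revision ID
    except (KeyError, ValueError):
        return None
    return title, diff, oldid


example = (
    "* [http://en.wikipedia.org/w/index.php?title=User:ClueBot/Sandbox"
    "&diff=200695142&oldid=195000302 User:Cluebot/Sandbox]"
)
print(parse_edit_link(example))
# prints ('User:ClueBot/Sandbox', 200695142, 195000302)
```

Because only the URL is read, the differing capitalisation in the label ("Cluebot" rather than "ClueBot") has no effect on the result.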

The respective subpages are here: Vandalism and Constructive Edits