User talk:Emijrp/AVBOT
A thought
I don't know if this is the right place for my thought.
To a large extent, we can predict from an article's subject whether a typically bad word should almost never appear in it. For example, the word "cunt" should almost never appear in articles that cover only physics subjects, whereas we are less certain when the same word appears in articles that cover entertainment subjects.
My idea is to compile 2 lists of article titles based on the portal and WikiProject templates that appear on their talk pages.
First, we should sort portals and WikiProjects into 2 categories according to the likelihood of bad words legitimately appearing in their articles: 1) low 2) mid to high.
Then we should sort the article titles into 2 categories: 1) articles that are listed in the low categories only 2) articles that are listed in the mid to high categories or that are not listed in any category yet. The lists should be regularly updated (maybe every month?).
Then we can compile 2 separate scoring systems and decide which one to apply to an edit based on the article's title. Sole Soul (talk) 19:35, 24 September 2010 (UTC)
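The steps above could be sketched roughly as follows. This is only an illustration of the idea, not how AVBOT works: the project names, the two score tables, and the function names (`classify_article`, `build_title_lists`, `score_edit`) are all hypothetical, and the input is assumed to be a mapping from article title to the WikiProjects or portals found on its talk page.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical sorting of WikiProjects/portals by how likely bad words
# are to appear legitimately in their articles.
LOW_PROJECTS: Set[str] = {"WikiProject Physics", "WikiProject Mathematics"}
MID_HIGH_PROJECTS: Set[str] = {"WikiProject Film", "WikiProject Music"}

# Hypothetical scoring systems: the same word is penalised more heavily
# in articles where it should almost never appear.
STRICT_SCORES: Dict[str, int] = {"cunt": -20}
LENIENT_SCORES: Dict[str, int] = {"cunt": -5}


def classify_article(projects: Iterable[str]) -> str:
    """Return "low" only if the article is tagged and every tag is low."""
    tagged = set(projects)
    if tagged and tagged <= LOW_PROJECTS:
        return "low"
    # Mid-to-high projects, mixed tags, unknown tags, or no tags at all.
    return "mid_high"


def build_title_lists(article_projects: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Compile the 2 lists of titles; meant to be rebuilt periodically."""
    lists: Dict[str, Set[str]] = {"low": set(), "mid_high": set()}
    for title, projects in article_projects.items():
        lists[classify_article(projects)].add(title)
    return lists


def score_edit(title: str, added_text: str, low_titles: Set[str]) -> int:
    """Pick the scoring system by title and sum the scores of added words."""
    table = STRICT_SCORES if title in low_titles else LENIENT_SCORES
    words = re.findall(r"[a-z']+", added_text.lower())
    return sum(table.get(word, 0) for word in words)
```

With this sketch, an article tagged only with a low-likelihood project gets the strict table, while untagged or mixed articles fall back to the lenient one, which matches the "not listed in any category yet" rule above.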