Wikipedia:Bots/Requests for approval/TronBot

The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

Request Expired.

TronBot

Operator: OrenBochman (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 00:11, Tuesday May 8, 2012 (UTC)

Automatic, Supervised, or Manual:

Automatic;
Supervised during development

Programming language(s):Java,Python

Source code available:Standard pywikipedia

Function overview:

TronBot provides a content analytics dashboard with metrics on the current level of policy compliance of the article content.
Its primary motivation is to assist new users in complying with Wikipedia's stringent editorial guidelines.
Its secondary motivation is to encourage kindness to newcomers during AfD by reducing communication/coordination costs.

Links to relevant discussions (where appropriate):

Edit period(s):Continuous

Estimated number of pages affected:

Pages tagged for deletion under CSD or AfD: about 200-800 pages a day
Additional pages requested via template

Exclusion compliant (Yes/No):Yes

Already has a bot flag (Yes/No):No

Function details:

This is an editorial intelligence and a user engagement tool.
Based on a template request, the bot will visit the page and process the article. It then provides a template-based dashboard. The tabular dashboard will contain estimates of problems.

Later, as data is collected, it will provide an estimated likelihood for deletion due to the editorial problems investigated. Metrics will include:

normalised unique information estimate.
notability estimate - based on external links, optional notability disclosure template and a normative gold standard.
NPOV compliance estimate - based on content analytics of POV sources based on POV disclosure template.
style compliance estimate - based on typos and grammar and style guide issues found.
an estimate as to the leading cause of non-conformance will be indicated as well.
additional normative metrics as they are developed.
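
A minimal sketch of how the listed estimates might be held together and how a "leading cause of non-conformance" could be picked, assuming every metric is a score between 0 and 1 where lower means further from compliance. The class name, field names, and the toy style formula are hypothetical; the request does not specify how any metric is computed.

```python
from dataclasses import dataclass


@dataclass
class Dashboard:
    """Hypothetical container for the per-article estimates (0.0 = worst, 1.0 = best)."""
    unique_information: float  # normalised unique information estimate
    notability: float          # notability estimate
    npov: float                # NPOV compliance estimate
    style: float               # style compliance estimate

    def leading_cause(self) -> str:
        """Return the name of the metric with the lowest score."""
        scores = vars(self)
        return min(scores, key=scores.get)


def style_estimate(word_count: int, issues_found: int) -> float:
    """Toy style score (an assumption): share of words not flagged as an issue."""
    if word_count <= 0:
        return 0.0
    return max(0.0, 1.0 - issues_found / word_count)


if __name__ == "__main__":
    board = Dashboard(
        unique_information=0.6,
        notability=0.2,
        npov=0.8,
        style=style_estimate(word_count=200, issues_found=10),
    )
    print(board.leading_cause())
```

Taking the minimum is only one possible reading of "leading cause"; a weighted comparison against a gold standard would fit the description equally well.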

Once a page is edited, the dashboard will be recalculated after a delay.
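
The delayed recalculation could be sketched as a simple debounce queue: each edit resets a per-page timer, and a page is released for recalculation only once it has been quiet for the full delay. The class name, method names, and the delay value are assumptions for illustration; the request does not state how long the delay is or how edits are detected.

```python
class RecalcQueue:
    """Hypothetical queue that releases pages once they have been quiet for `delay` seconds."""

    def __init__(self, delay: float):
        self.delay = delay
        self._last_edit = {}  # page title -> timestamp of the most recent edit

    def page_edited(self, title: str, now: float) -> None:
        """Record an edit; a later edit to the same page restarts its timer."""
        self._last_edit[title] = now

    def due(self, now: float) -> list:
        """Return (and forget) the pages whose last edit is at least `delay` seconds old."""
        ready = sorted(
            title for title, stamp in self._last_edit.items()
            if now - stamp >= self.delay
        )
        for title in ready:
            del self._last_edit[title]
        return ready


if __name__ == "__main__":
    queue = RecalcQueue(delay=600)      # 10 minutes, an arbitrary choice
    queue.page_edited("Example article", now=0)
    queue.page_edited("Example article", now=500)
    print(queue.due(now=700))           # still inside the quiet period
    print(queue.due(now=1100))          # quiet for 600 seconds
```

Waiting for a quiet period rather than recalculating on every save would keep the bot's edit count low while an article is being actively worked on.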

The initial focus on AfD is to improve coordination between (good-will) editors and other AfD process participants during AfD discussions.

Discussion

So in a nutshell, this bot is a form of AI and learns and improves itself at tagging articles for deletion and welcomes new users?—cyberpower ^Chat_Offline 01:02, 8 May 2012 (UTC)

I'm not sure where you got that impression, the RFA does not mention greeting users nor AI - perhaps you were reviewing WP:FDB?

TronBot will do bean counting and then run semantic filters to estimate distance from a gold standard. It will not greet new users - it will provide a much-needed quality estimate for articles (by users old & new). It will do this on demand and where the AfD process is centered on quality issues. BO; talk 06:43, 8 May 2012 (UTC)

"Manual:Automatic" - what? Bulwersator (talk) 07:32, 8 May 2012 (UTC)

I have corrected the RFA - During the development the Bot's output will be reviewed - Starting with around 5-25 edits a day. In the long run it will be increasingly automatic - it will provide a feedback mechanism in the dashboard.

"Source code available:Standard pywikipedia" - AFAIK this functionality is not provided by standard pywikipedia Bulwersator (talk) 07:32, 8 May 2012 (UTC)

Source for all Wikipedia interaction will be in standard pywikipedia. The content analytics will be developed in Java based on the open-source packages Solr, UIMA & Mahout. Integration will be in solrpy. BO; talk 08:11, 8 May 2012 (UTC)

Can you provide a sample of what this dashboard would look like? Where would it be placed? (On the talk page of the article, in the bot's user space?) — madman 14:07, 8 May 2012 (UTC)

I have a UI design for a SimoneBot which incorporates the TronBot UI and behaviour. But TronBot will provide a much smaller subset - would Simone's UI be ok? Tron's UI was not described here since it is subject to change based on A/B testing. SimoneBot may be proposed here once I have gained some experience with Tron and at RfA. BO; talk 15:57, 8 May 2012 (UTC)

Regarding the location, the options I consider are:

at a /QA subpage, to minimise impact on the talk page's history, plus transclusion of the QA page at the top of the talk page under the Project Importance/Quality Template if one exists. The idea behind /QA is to introduce an additional tab to the page in the future which will feature a far more detailed UI. However, in the current scope of TronBot, /QA is a hidden subpage.
If /QA is problematic, then TronBot will use the article talk page.
If the talk page is problematic, TronBot's user space may be used for storing all the dashboards.
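
The three options read as an ordered fallback, which might be sketched as below. This is an illustration only: it assumes the /QA page would be a subpage of the article's talk page (the comment does not say which namespace), and the `User:TronBot/Dashboards/` prefix and the function and parameter names are hypothetical.

```python
def dashboard_title(article: str,
                    qa_allowed: bool = True,
                    talk_allowed: bool = True,
                    bot_user: str = "TronBot") -> str:
    """Pick where a dashboard would be stored, in the order of preference listed above."""
    if qa_allowed:
        # Option 1: a /QA subpage (assumed here to hang off the talk page).
        return f"Talk:{article}/QA"
    if talk_allowed:
        # Option 2: the article talk page itself.
        return f"Talk:{article}"
    # Option 3: the bot's user space (path layout is an assumption).
    return f"User:{bot_user}/Dashboards/{article}"


if __name__ == "__main__":
    print(dashboard_title("Example article"))
    print(dashboard_title("Example article", qa_allowed=False))
    print(dashboard_title("Example article", qa_allowed=False, talk_allowed=False))
```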

I'd be grateful for any recommendations on this.

BO; talk 15:59, 8 May 2012 (UTC)

Ditto madman re providing a sample of dashboard. What is the SimoneBot's UI you refer to? I see the account was created, but there is no reference to anything dashboard-like. — HELLKNOWZ ▎TALK 11:18, 9 May 2012 (UTC)

Note: I filled in the field above for the bot flag (no) for easy reference to see whether or not it does. Just want to pick you up on some tiny little things: you said RfA a little higher up - the term is BRfA (unless you want an adminbot :)); and also the name is TronBot - per WP:BOTACC the name must reflect either the operator's name or its function. Tron does neither of these, so would you mind either changing it or having it rejected on those grounds? I know that sounds harsh, but that happened to me once so be warned. Rcsprinter (deliver) 17:09, 8 May 2012 (UTC)

Thanks for the kind assistance - I am doing my best. Regarding the name - it is not arbitrary: through allusion to its namesake Tron, it describes the bot's overall behaviour and strategy. So I think the requirements of this policy are met at the normative level. BO; talk 06:32, 9 May 2012 (UTC)

Agree with Rcsprinter. Reading the function details, I do not see how "tron" relates to any of those. — HELLKNOWZ ▎TALK 11:11, 9 May 2012 (UTC)

Regarding the task itself, what does "likelihood for deletion" have to do with any points except "2. notability estimate"? If your goal is to improve AfD discussion, then it is only notability -- i.e. mainly WP:GNG and sourcing. I don't mind the additional info, but I'm not sure how you relate that to the deletion process. But, again, it is very hard to judge without an example of a dashboard. — HELLKNOWZ ▎TALK 11:18, 9 May 2012 (UTC)

As indicated in the BRfA, the goal is to improve article quality and engage users by providing clearer quality goals. AfD is just the most significant point for failure for article quality and thus the logical place to get started.

I think you might get a broader picture at SimoneBot Proposal on meta - however SimoneBot is about improving editors and TronBot is more centered about improving articles.

Note that the UI will be based on the article status div in Simone's UI.

Quality thresholds in the dashboard will be relative to the next quality level's requirements.

Probabilities of CSD and AfD can be calculated from results published in academic papers on the subjects. BO; talk 12:24, 9 May 2012 (UTC)
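
One way to read "thresholds relative to the next quality level's requirements" is as progress from the current level toward the next one. The sketch below assumes a single overall score between 0 and 1; the level names borrow from the usual article assessment scale, but the threshold values, the `LEVELS` table, and the function name are invented for illustration and do not come from the proposal.

```python
# Hypothetical minimum scores per quality level, in ascending order.
LEVELS = [("Stub", 0.0), ("Start", 0.3), ("C", 0.5), ("B", 0.7)]


def progress_to_next_level(score: float):
    """Return (current level, next level or None, fraction of the gap already covered)."""
    score = max(0.0, min(1.0, score))  # clamp to the assumed 0..1 range
    current = 0
    for index, (_, minimum) in enumerate(LEVELS):
        if score >= minimum:
            current = index
    if current == len(LEVELS) - 1:
        # Already at the top level in this toy table: nothing further to aim for.
        return LEVELS[current][0], None, 1.0
    low = LEVELS[current][1]
    high = LEVELS[current + 1][1]
    return LEVELS[current][0], LEVELS[current + 1][0], (score - low) / (high - low)


if __name__ == "__main__":
    print(progress_to_next_level(0.4))
    print(progress_to_next_level(0.9))
```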

Trial

Approved for trial (20 dashboards to userspace). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Okay, let's see what the output actually is, otherwise we could be here for a while guessing and reclarifying. So choose some ≈20 articles and post their dashboards in the bot userspace. — HELLKNOWZ ▎TALK 13:03, 9 May 2012 (UTC)

I'm still not sure about the account name. I'm all for the task if the trial goes well, but you can't ignore the bot policy. Rcsprinter (warn) 10:31, 12 May 2012 (UTC)

Of course; as far as I'm concerned the account name issue still stands. I'm just assuming the botop will create a new account for the trial. But even if not yet, since the trial is in userspace and very brief, it shouldn't cause any major problems. — HELLKNOWZ ▎TALK 10:37, 12 May 2012 (UTC)

Thanks for pressing the issue of the account name. I had completely forgotten about this issue and I would certainly not wish to set a precedent for willy-nilly naming of robots. Accordingly the issue will be resolved shortly (24 hours) on the bot user's account page. BO; talk 12:37, 17 May 2012 (UTC)

Comment This sounds icky on editorial grounds, as editorial decisions should be made by human judgment rather than "metrics" ("you get what you measure"). I don't think it will increase "kindness" since canned and automated messages of any sort (at least in my personal experience) generally come across as bureaucratic. If somebody wants those automated assessments and requests one for a specific article, then fine, but they shouldn't be done unrequested. I don't see why a bot like this should edit articles directly. It could be on toolserver instead, with a link added to the afd/csd template similar to the existing links for google searches, and someone wanting the assessment could click the link. So, I don't think a bot is justified. 66.127.55.46 (talk) 19:18, 17 May 2012 (UTC)

Thanks for your comment: I understand your skepticism about the capability of NLP-enabled content analytics. I prefer to avoid hype on this project; however, it is neither:

GIGO - Garbage in Garbage out.
GIGO 2.0 - Garbage in Gold out.
It is not making edits - but supporting content creation.

There is a large class of actions in text processing that can also be done by a machine. Rather than automatically commit such edits - we simply estimate what we could improve. Also, a machine would be more objective than a person.

Regarding kindness - I don't think that furnishing a newcomer creating an article is the act of kindness he requires (I agree about not greeting people with a bot). The idea is based on a much deeper type of thinking. These dashboards are supposed to enable a greater level of altruistic behaviour in a wiki. It would allow experienced users in new page patrol etc. to see without diffs that the article is heading towards a Quality Goal on an objective quality scale. It also shows that the user who might not know the 500 WP:??? policy shortcuts is pursuing the top 5 policies as best as he can.

All other things being equal, such an editor/article should get more WP:GW than what is going on today. I am hoping for enough good will to allow work to proceed without immediately demolishing the house as it is built. Even if the article is never going to be notable - it would be better for the editor to figure it out by himself using ... just a dashboard.

Regarding your recommendation on attaching this to, say, the CSD template - this is precisely the behaviour I am trying to avoid. In the long run I am thinking about an Article Quality tab next to edit with

reports indicating how sources relate to the article (semantic coverage)
analytics of source quality. (commercialism, time stability, wikipedia wide rank)
a break down of social cooperation
indication of the page popularity (incoming links, page views) BO; talk 17:09, 18 May 2012 (UTC)
indication of the quality of internal links - (looking at casino stubs - they all link to US, State and their web site.)

as well as other factors that can give a better indicator of notability/quality than counting how many results we get for the title in Google. BO; talk 17:18, 18 May 2012 (UTC)

Simone and her UI are in meta at http://meta.wikimedia.org/wiki/Wikimedia_Fellowships/Project_Ideas/A_Digital_Wiki_Coach_To_Provide_Focused_Advice. — Preceding unsigned comment added by OrenBochman (talk • contribs)

Making any move on the trial yet? Rcsprinter (talk) 15:50, 29 May 2012 (UTC)

Hi - I have been busy writing a related paper for wikisym. But I plan to make a sprint on the prototype during the Hackathon in Berlin. — Preceding unsigned comment added by OrenBochman (talk • contribs)

That's OK then. When is that Hackathon? Please sign your posts. Rcsprinter (chat) 17:03, 30 May 2012 (UTC)

(brfa stalker) June 1-3. — madman 17:12, 30 May 2012 (UTC)

It's starting tomorrow!! BO; talk 04:51, 31 May 2012 (UTC)

It doesn't appear to have edited during that time, or indeed at all. Are you going to go soon? The BRFA will be marked expired if nothing happens. Rcsprinter (articulate) 22:03, 9 June 2012 (UTC)

I'm finalizing my toolserver registration - also there are many other issues that came up in Berlin that I have to take care of. This bot is still top priority but it involves lots of theoretical work being documented on my pages at Meta. I'm sure that getting quality work is more important than getting it fast. OrenBochman (talk) 14:09, 11 June 2012 (UTC)

You mean you're not registered on Toolserver yet? I think that should've been taken care of before pushing this BRFA. If your bot isn't ready yet, why are you posting this BRFA?—cyberpower ^Chat_Online 19:28, 11 June 2012 (UTC)

Not really, Cyberpower. You can have the code all ready and just run the bot like that for a bit in cmd.exe if you want before the toolserver account is ready. Some bots don't even run on toolserver. Rcsprinter (deliver) 19:41, 11 June 2012 (UTC)

I know but if he planned to start the task from toolserver, I would've waited until I was granted access to it before I pushed for this BRFA. Being granted access can take months, especially with nightshade being down and all.—cyberpower ^Chat_Online 19:51, 11 June 2012 (UTC)

Sorry if I did things backwards; I am a bit paranoid and did not see any point in coding if the bot request was not going to be approved. Regarding toolserver, you are wrong: they are very fast - and also with wikimedia labs - it takes about 48 hours max. I'm just super busy with wikimania; wikisym; patches needing testing on bugzilla; and coordinating with new people in this project. I am scheduled to demo this and another bot at wikimania, which is in less than a month, so that is my timeline, and it will probably include some new pywiki code too! Btw the code is now running off the command line - but using single runs on my own account still. OrenBochman (talk) 20:50, 11 June 2012 (UTC)

You are wrong. It can take longer than 48 hours. Mine took about 14 days to get approved and gain access.—cyberpower ^Chat_Online 22:01, 11 June 2012 (UTC)

So how's the Toolserver account application going? It's been a full month since you promised to start the trial, and nearly two months since it was approved for trial. Oh, and the account name will still need sorting out as well. Rcsprinter (rap) 17:03, 3 July 2012 (UTC)

Request Expired. – No results of a trial within almost two months, no response to questions for the operator within a week. This request is closed without prejudice as to submission of a new request when code is ready for testing. — madman 17:15, 15 July 2012 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.