User talk:Sean.hoyland/Archive 16

From Wikipedia, the free encyclopedia

Ramat Eshkol is located in Northern Jerusalem, not East Jerusalem

I included the fact that it's an Israeli settlement, and changed the location of the neighborhood from East Jerusalem to Northern Jerusalem. I also included the population of the neighborhood. Wikieditor738 (talk) 12:30, 30 January 2024 (UTC)

To editor Wikieditor738: Did you read WP:ARBECR? If so, why are you editing the article rather than making an edit request on the talk page? Sean.hoyland (talk) 12:40, 30 January 2024 (UTC)
Where does it say that only extended-confirmed users can edit on the Ramat Eshkol article? Wikieditor738 (talk) 12:45, 30 January 2024 (UTC)
At the top of the talk page. But you should be aware that all articles "related to the Arab–Israeli conflict" broadly construed can only be edited by extended-confirmed users even when that ARBPIA template is absent and even when extended confirmed protection has not been implemented on the article. Israeli settlements are clearly within scope of the restrictions. You should undo your edits and make an edit request. Someone will handle your request. Sean.hoyland (talk) 12:53, 30 January 2024 (UTC)
To editor Wikieditor738:, this edit indicates that you know how to undo edits. You also now know that WP:ARBECR applies to the article and that you should undo your edits and make an edit request. Are you going to do that? Sean.hoyland (talk) 14:06, 30 January 2024 (UTC)
Ok Wikieditor738 (talk) 14:09, 30 January 2024 (UTC)
I undid the edits Wikieditor738 (talk) 14:11, 30 January 2024 (UTC)
Thank you. Sean.hoyland (talk) 14:15, 30 January 2024 (UTC)

Can I borrow/steal your visualiser tool?

You used it 'on me' and I thought it was really interesting. Is it something someone who is a little bit of a Luddite can use, and if so, would it be possible to get it?

Thanks in advance! FortunateSons (talk) 02:40, 13 February 2024 (UTC)

Unfortunately it's still very much work in progress, along with a number of other things I'm trying to work on mainly to help myself understand the state of affairs in ARBPIA. I wasn't planning on releasing anything, it's more about looking at what might be possible and useful. Also, I'm not a big fan of the Wikimedia Cloud Services development environment so I'm moving things over to my own environment. But let me know if you would like to see that kind of information for any editors and I'll generate the plots. Sean.hoyland (talk) 09:59, 13 February 2024 (UTC)
I understand, thank you very much for the kind offer! FortunateSons (talk) 10:03, 13 February 2024 (UTC)
Hi. Neat graphs. Can you do me? Is it possible to do my first 500 and my last 500? I'm just curious what the graphs would look like and if there's a difference. Thanks! Levivich (talk) 22:07, 15 February 2024 (UTC)
Sure, will do. Bear with me as I'm in the middle of migrating/rewriting/repeatedly breaking the code. Sean.hoyland (talk) 01:40, 16 February 2024 (UTC)
Here's your first 600. I'm connecting to Wikimedia's replica databases rather than the API, so they may not have your very latest edits, but I'll have a look. Sean.hoyland (talk) 01:52, 16 February 2024 (UTC)
Cool, thanks! Levivich (talk) 02:45, 16 February 2024 (UTC)
I'm still a bit confused by how Wikipedia counts 'edits'. The account information web page says your enwiki edit count is over 36,000. But both the replica database (which is up to date right now) and the API (which is live) agree that your enwiki revision count is more like ~32,740, including new pages and deleted revisions. And I just pulled all of your contributions out of the API without filtering, so I'm puzzled by the mismatch... I must be missing something. Sean.hoyland (talk) 08:43, 16 February 2024 (UTC)
The privileges required to see those particular deleted edits is the missing something. Sean.hoyland (talk) 10:41, 16 February 2024 (UTC)
Thanks again for doing this--the visualization is interesting. Levivich (talk) 04:42, 20 February 2024 (UTC)
To editor Levivich: There's also this one for interest, an edit count heatmap - day of week vs time of day, for all of your edits...or at least the 32,777 edits the system hasn't hidden behind an event horizon. Sean.hoyland (talk) 08:41, 20 February 2024 (UTC)
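For readers wondering how a day-of-week vs time-of-day edit count heatmap might be assembled, here is a minimal sketch. It is not the code behind the plot mentioned above. It assumes edit timestamps arrive as 14-digit UTC strings (YYYYMMDDHHMMSS); the function name `edit_heatmap` and the sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

def edit_heatmap(timestamps):
    """Count edits per (weekday, hour) cell; weekday 0 is Monday, hours are UTC."""
    counts = Counter()
    for ts in timestamps:
        # Assumed input format: YYYYMMDDHHMMSS in UTC
        dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        counts[(dt.weekday(), dt.hour)] += 1
    return counts

# Illustrative timestamps only, not real contribution data
sample = ["20240220084100", "20240220084500", "20240216014000"]
heatmap = edit_heatmap(sample)
```

Each key in `heatmap` is one cell of the grid, so the result can be handed to any plotting library as a 7 x 24 matrix.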
I'll never tell what's in those 3,470 edits! muahahahaha
(Actually I think it's five years worth of deleted sandbox pages.)
Thanks once again! I think your data driven approach could help bring some objectivity to the whole "gaming" issue. Levivich (talk) 15:38, 20 February 2024 (UTC)
Hi sorry to bother you again, but do you know if it's possible to get cumulative lifetime "page byte size change", i.e. how much total text I've added/removed from mainspace? I'm curious what portion of the encyclopedia I've written/deleted, how many 0's after the decimal point. And then I'm curious who's written the most. Do you think that's possible to figure out? Levivich (talk) 01:50, 21 February 2024 (UTC)

<- You might be interested in Wikimedia's "xtools" if you haven't seen them before. There's a whole bunch of handy tools in there. They don't quite address your questions though.

  • "do you know if it's possible to get cumulative lifetime "page byte size change", i.e. how much total text I've added/removed from mainspace?" The total bytes figures are easy. How much text that actually represents is hard to say exactly because the number of bytes per character depends on the character, but assuming most of that text is made up of normal ASCII letters, numbers etc., it will be roughly one to one, one byte per text character, I think. See below for additions and deletions for all namespaces.

Bytes added.

actor_name  page_namespace  namespace       size_change_bytes  editcount
Levivich    0               (Main/Article)  2999563            3337
Levivich    1               Talk            7018157            5607
Levivich    2               User            78631              363
Levivich    3               User talk       3247621            4172
Levivich    4               Wikipedia       13719981           10633
Levivich    5               Wikipedia talk  1441343            2306
Levivich    6               File            8673               14
Levivich    7               File talk       94                 1
Levivich    10              Template        100475             176
Levivich    11              Template talk   41316              61
Levivich    13              Help talk       15203              24
Levivich    14              Category        213                6
Levivich    15              Category talk   5002               15
Levivich    101             Portal talk     430                1
Levivich    118             Draft           3095               13
Levivich    119             Draft talk      2389               10
Levivich    828             Module          631                2
Levivich    829             Module talk     3933               3

Bytes deleted

actor_name  page_namespace  namespace       size_change_bytes  editcount
Levivich    0               (Main/Article)  -892661            2703
Levivich    1               Talk            -2139330           721
Levivich    2               User            -33662             166
Levivich    3               User talk       -100578            176
Levivich    4               Wikipedia       -4717790           1098
Levivich    5               Wikipedia talk  -158053            150
Levivich    6               File            -900               5
Levivich    10              Template        -5630              45
Levivich    11              Template talk   -337               4
Levivich    14              Category        -3632              25
Levivich    118             Draft           -17                1
Levivich    119             Draft talk      -109               1
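A rough sketch of how per-namespace totals like the ones in the two tables could be tallied from per-revision size changes. This is an illustration, not the query that produced the figures. It assumes each revision is available as a (namespace id, size change in bytes) pair, and it assumes zero-byte changes are left out of both tables, which the tables themselves do not confirm.

```python
from collections import defaultdict

def bytes_by_namespace(revisions):
    """Split size changes into added and removed totals per namespace.

    revisions: iterable of (namespace_id, size_change_bytes) pairs (assumed shape).
    Returns two dicts mapping namespace_id -> [total_bytes, edit_count].
    """
    added = defaultdict(lambda: [0, 0])
    removed = defaultdict(lambda: [0, 0])
    for ns, delta in revisions:
        if delta > 0:
            bucket = added
        elif delta < 0:
            bucket = removed
        else:
            continue  # assumption: zero-byte edits are counted in neither table
        bucket[ns][0] += delta
        bucket[ns][1] += 1
    return dict(added), dict(removed)

# Made-up revisions: two mainspace additions, one mainspace removal, one talk edit
demo = [(0, 120), (0, 30), (0, -45), (1, 500), (1, 0)]
demo_added, demo_removed = bytes_by_namespace(demo)
```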

The other 2 questions...

  • "what portion of the encyclopedia I've written/deleted"
  • "who's written the most"

...are more difficult, and I'm not sure how to do that. It's easy to see the size of mainspace, just the current enwiki article content, excluding images, talk pages etc., and it's small enough to fit on a phone, 60084661068 bytes, about 60 GB it seems. But that doesn't really help because the size of each article and what portion of a user's contributions are preserved over time are dynamic and not easy to track. With xtools you can see that kind of information for individual articles e.g. Nakba (although I assume that is looking at byte volumes per user rather than how much of each user's contributions have been preserved). Sean.hoyland (talk) 11:48, 21 February 2024 (UTC)

Oh, you did it! Thanks! Yes I know xtools, and I think it'd be very cool if someone figured "total authorship" rather than just by-article breakdowns. (Of course, my net +2 MB to mainspace, divided by 60GB current size of mainspace, is not my total authorship percentage, as that would be way too high, and the total amount of text added by everyone would be orders of magnitude higher than 60GB.)
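The arithmetic behind the "net +2 MB divided by 60 GB" remark can be written out using the mainspace figures quoted earlier in this thread. As the comment itself notes, the ratio is not an authorship share; it is only the quotient of the two numbers. The variable names are illustrative.

```python
# Figures quoted in the thread for namespace 0 (Main/Article)
bytes_added = 2_999_563
bytes_removed = -892_661
mainspace_size = 60_084_661_068  # current mainspace size in bytes, as quoted

net_bytes = bytes_added + bytes_removed   # net change in mainspace
ratio = net_bytes / mainspace_size        # NOT an authorship percentage

print(f"net change: {net_bytes} bytes (about {net_bytes / 1e6:.1f} MB)")
print(f"ratio to current mainspace size: {ratio:.7f} ({ratio * 100:.4f}%)")
```

The net change works out to 2,106,902 bytes, and the ratio is about 0.0035%, so there are two zeros after the decimal point when it is expressed as a percentage.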
Which API are you using? mw:WikiWho? I assume you can't use that API for all 6 million articles, but in theory, it should be possible to download a recent DB dump, and then use WikiWho's code on all 6 million articles (locally), and calculate a cumulative total... right? I've played around with pywiki and the APIs a little bit but have never attempted working with a DB dump. I bet you could test whether the pareto principle (80/20 rule) applies to Wikipedia. Hypothesis: 20% of editors are reaching 80% of readers. Levivich (talk) 19:05, 21 February 2024 (UTC)
There are so many nice tools available to look at Wikimedia data. I'm not using any of the APIs right now for the most part, I'm using the replica databases (although, like for the APIs, you still need to do it in a server friendly way, by pulling the query results out in bite-sized chunks, or else it will eventually complain). I have a Wikimedia developer account and Toolforge membership so I ssh to a Toolforge server from my own environment (in VSCode), and open an ssh tunnel to the MySQL server behind that machine hosting the replicas. But you don't need a developer account or Toolforge membership or a local DB dump to talk to the replica databases. You can use Quarry, or Apache's Superset, or the PAWS JupyterHub deployment. I started by using the PAWS environment but you don't have freedom of choice for libraries you can keep in your environment there. I'm trying to make things that can use both Pandas/NumPy and Polars/Apache Arrow to do things, but the administrator does not seem to like Polars being there and keeps removing it, hence the move from the bare-bones PAWS environment to my own environment.
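One common way to pull query results out in bite-sized chunks is keyset pagination: fetch a limited batch ordered by an id, remember the last id seen, and ask for the next batch after it. The sketch below shows the idea against an in-memory SQLite table so that it runs anywhere. The table `revision_demo` and its columns are invented for the example, and this is not a description of the setup used against the replicas.

```python
import sqlite3

def iter_in_chunks(conn, chunk_size=1000):
    """Yield lists of rows, at most chunk_size at a time, using keyset pagination."""
    last_id = -1
    while True:
        rows = conn.execute(
            "SELECT rev_id, size_change FROM revision_demo "
            "WHERE rev_id > ? ORDER BY rev_id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id in this batch

# Hypothetical demo table standing in for a real revision table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision_demo (rev_id INTEGER PRIMARY KEY, size_change INTEGER)")
conn.executemany(
    "INSERT INTO revision_demo VALUES (?, ?)",
    [(i, i % 7 - 3) for i in range(2500)],
)
chunk_sizes = [len(chunk) for chunk in iter_in_chunks(conn, chunk_size=1000)]
```

Each query stays small and short-lived, which is usually what keeps a shared database server from complaining. A MySQL client would use its own placeholder style in place of `?`.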
It's a bit of a rabbit hole, once you start looking around at all of the data. I started looking because of the canvassing. I'm curious whether it's possible to assemble a set of (biometric-like) signals from user data so that I can ask a model "Who's this?" and it says "It's probably a sock of account A because of X, Y, Z". It's a deep rabbit hole. I should probably be doing something more interesting like maybe looking at the combinatorial space of 8 quarter notes, 8 eighth notes and 1 eighth rest constrained to an A major scale from which Bach selected the opening 16 notes of the 2nd movement of BWV 1015 and listen to all the combinations he didn't choose. We have such nice tools nowadays. It almost seems foolish to use them to look for sockpuppets etc.
I hadn't even noticed mw:WikiWho, so thanks for that tip. I suspect the pareto principle will apply, but maybe the 20% will all be bots. Sean.hoyland (talk) 05:19, 22 February 2024 (UTC)
Having looked at the WikiWho API, which is a whole new rabbit hole, I think it probably is possible to use it to rank authorship. It would boil down to token counting I guess. With their limit of "2000 requests/day for unregistered users, and also a 60 requests/minute limit for all users" it might even be possible to get a good first approximation from the replicas by targeting a sensible subset of enwiki. Their WhoColor API is another new rabbit hole - it would be interesting to color code page histories based on things like EC privileges, whether an account was blocked as a sock etc. Sean.hoyland (talk) 09:31, 22 February 2024 (UTC)
It's an interesting rabbit hole though isn't it? :-) I'm guessing that however done, if I wanted to pull authorship data for 6 million articles, I should be querying a local dump and not somebody else's server, because somebody else would get mad, even if I did it in batches? I know next to nothing about APIs or API etiquette. But anyway, a sensible subset is the way to go, and frankly for my part I wouldn't care about all 6 million articles, knowing that there is a "long tail" of stubs and articles with <10 page views a day, etc. I would start with the 1,000 articles with the most page views and work downwards from there. Or alternatively the 1,000 articles with the most edits. Or maybe just WP:VA3 and up. Any set of well-read, well-edited articles would do. And yeah we're going to find out it's mostly Cluebot writing the encyclopedia :-D Levivich (talk) 19:59, 22 February 2024 (UTC)
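To put the quoted WikiWho limits in perspective, here is a back-of-the-envelope calculation. It assumes, purely for illustration, one API request per article; the real number of requests per article is not stated in this thread.

```python
import math

DAILY_LIMIT = 2000       # requests/day for unregistered users, as quoted above
PER_MINUTE_LIMIT = 60    # requests/minute for all users, as quoted above

def days_needed(n_requests, daily_limit=DAILY_LIMIT):
    """Minimum whole days to stay under the daily limit."""
    return math.ceil(n_requests / daily_limit)

def minutes_needed(n_requests, per_minute=PER_MINUTE_LIMIT):
    """Minimum whole minutes to stay under the per-minute limit."""
    return math.ceil(n_requests / per_minute)

all_articles_days = days_needed(6_000_000)   # roughly all of enwiki
subset_days = days_needed(1_000)             # a 1,000-article subset
subset_minutes = minutes_needed(1_000)
```

Under that one-request-per-article assumption, 6 million articles would take 3,000 days at the unregistered daily limit, while a 1,000-article subset fits in a single day and needs at least 17 minutes at the per-minute limit.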
One of the applications is to find "damage" caused by UPE, COI, or POV-pushing editors or sockfarms or meatfarms ("cabals"). Say, for example, Arbcom blocks a group of accounts for whatever reason (UPE, COI, off-wiki canvassing, etc.). We can find what pages those accounts have edited, either alone or together, but it would be nice to know which pages those blocked accounts have high authorship of, as those would be the first to check manually.
Going further, if we had some data about sort of what "average" or "normal" authorship looks like across a large sample of articles, and we had some data about what authorship looks like for, say, UPE'd or COI'd articles, that might help identify "biometric"-like patterns that could then be used to find previously-unknown problems.
For example I'd bet that the "authorship profile" of a known UPE-farm would look different even than the authorship profile of five editors who are interested in comic books and edit comic book articles together. You'd expect comic book articles (or any niche area) to be mostly authored by the same few editors, but I bet bona-fide UPE-farms have even a "higher" level of authorship than groups of fans or similarly-interested editors working in niche areas. Know what I mean? Levivich (talk) 20:10, 22 February 2024 (UTC)
So much to think about here. I'm thinking along similar lines on finding ways to measure "damage". The fact that an undercover NGO Monitor employee has the highest edit count at the NGO Monitor article doesn't seem ideal. Sean.hoyland (talk) 11:25, 24 February 2024 (UTC)

BC

I knew that editor as Daveout, and tbh I had no idea at all. nableezy - 15:01, 26 February 2024 (UTC)

Their editing habits are a bit odd. Most people's editing statistics produce nice probability density functions that reflect the fact that we do sensible things like go to sleep, wake up, commute, have lunch etc. But theirs is just a wall of edits.
Sean.hoyland (talk) 15:26, 26 February 2024 (UTC)

Borrowing your software tool (again)

Some of the edits by User talk:Gimmethegepgun seem fishy to me, and at least imply a rather rapid change in editing style after EC. Would you be so kind as to check them for EC-Gaming? FortunateSons (talk) 12:43, 5 March 2024 (UTC)

Bytes added by namespace.

actor_name      page_namespace  namespace       size_change_bytes  editcount
Gimmethegepgun  0               (Main/Article)  6979               128
Gimmethegepgun  1               Talk            10598              30
Gimmethegepgun  3               User talk       8705               10
Gimmethegepgun  4               Wikipedia       5735               15
Gimmethegepgun  11              Template talk   503                1
Gimmethegepgun  14              Category        151                1
Gimmethegepgun  100             Portal          27170              278
Gimmethegepgun  101             Portal talk     2046               5

Bytes removed by namespace.

actor_name      page_namespace  namespace       size_change_bytes  editcount
Gimmethegepgun  0               (Main/Article)  -3617              91
Gimmethegepgun  1               Talk            -7                 1
Gimmethegepgun  3               User talk       -3926              1
Gimmethegepgun  4               Wikipedia       -3                 1
Gimmethegepgun  100             Portal          -9300              165

Sean.hoyland (talk) 13:09, 5 March 2024 (UTC)

Then it’s harmless, thank you! :) FortunateSons (talk) 13:46, 5 March 2024 (UTC)

The flour massacre

Hello there mate, on what grounds did you take down my edit in the talk about the flour massacre? --Amir Segev Sarusi (talk) 13:32, 13 March 2024 (UTC)

It's because you don't have the extended confirmed privilege. It's a requirement for that page, and anything related to the Israel-Palestine conflict. Have a look at the WARNING: ACTIVE ARBITRATION REMEDIES section near the top of the talk page and WP:ARBECR. You can make edit requests. See WP:MAKINGEREQ. Sean.hoyland (talk) 17:27, 13 March 2024 (UTC)

Thanks :)

Hi Sean

Thanks very much for your help with the Artists4Ceasefire article. As I mentioned, I saw some edits that also looked strange, but again I do not know this topic area at all. If you are interested in looking, here are a few. I'm just flagging them because they are large removals of text without an edit summary in a similar topic area. Honestly I wouldn't know how to identify any conspiracy theories added in this area, so the stuff deleted without explanation jumped out at me.

Thanks

John Cummings (talk) 09:55, 14 March 2024 (UTC)

SPIs

I am confused about SPI. Although the Wikipedia:Signs of sockpuppetry list is exhaustive, using those signs as arguments to request an SPI is not taken into consideration. It seems that in practice only one criterion allows an SPI: whether the user has restored edits by a banned SP. And then when that does happen and an SPI is opened, an IP check and also edit analyses are performed, which could warrant an affirmative or possible match that results in a ban. Is that correct? And what are my options for investigating whether two accounts are likely to be SPs when neither has restored edits by banned SPs? Makeandtoss (talk) 12:11, 10 March 2024 (UTC)

@Nableezy: is probably better placed to advise, but I've filed a number of SPIs and I'm not sure I've ever included evidence of someone restoring edits by a banned user. It's certainly not a prerequisite. In general, I would say, it's useful to adopt the same approach as many prosecutors i.e. don't file a case unless and until you have a reasonably high, evidence-based, confidence of success. For reasons that have never really made sense to me, so-called fishing expeditions are not allowed. Suspicion, and even evidence of sockpuppetry, usually isn't enough for me. I have to have a reason to file a report. The user has to be doing something wrong, something harmful. Things that make the topic area worse like persistent POV pushing (civil or not), aggression and deception get my attention. The cases that seem to work best are the ones that keep it simple. Trying to limit the evidence so that it is just enough to justify the investigation seems to help. You can always add more later. Also, you can ask people to critique the report before you file it to check for weaknesses, ways to improve it. It's hard to say what kind of evidence works best given that it is always circumstantial evidence, always behavioral evidence, because that's all we have, especially when checkuser results are unavailable, ambiguous etc. The "Possible signs" list seems pretty complete. And I think it pays to be quite skeptical about that feeling that the evidence that you personally find compelling is, in fact, compelling for anyone else. Sean.hoyland (talk) 13:55, 10 March 2024 (UTC)
I had a pretty convincing SPI with all the signs and the evidence listed last time and it was not considered specifically because they had not reverted banned SPs.
As for how to build evidence, what are the options? Would asking to use one of the tools you’re using be considered bad faith if it turned out to be false, especially since the usernames in question would have to be publicized?
Plus I am not sure what tools you are using. Are they your own, or WP’s? Can I use them discreetly? Makeandtoss (talk) 14:06, 10 March 2024 (UTC)
What was that SPI that was not considered? nableezy - 15:01, 10 March 2024 (UTC)
Yes, I'm curious which case that was too. I think if someone suspects a user is a sockpuppet of a banned user there's no harm in doing some investigation. You don't need to publicize anything unless you file an SPI. My Wikipedia email is always enabled if you want a second opinion. As for tools, none of them are great. I'm trying to build my own, but identifying sockpuppets is quite a challenge, especially as some of them have become quite good at evasion and it's difficult to separate signal from noise, or even know where to look for the signals...there's so much data. But the tools that I find useful are...
  • The 'Strike out usernames that have been blocked' gadget in preferences. That helps to quickly see accounts that have already been blocked at pages that the suspected sock likes to frequent.
  • The classic Editor Interaction Analyser. Although article overlaps are not necessarily significant, they can find intersections that have a low probability of happening by chance. That kind of tool seems most useful (and effective at SPI) when you have a large set of data for someone's sockpuppets. Many intersections with multiple blocked socks is something that seems to help.
  • I make this kind of web page to help me navigate a set of socks. Having links out to their contributions and seeing which accounts left the largest footprints can help with investigation.
  • The XTools edit counter can help sometimes to get a general idea of the user. The idea of a timecard is useful for comparing editors, but I don't like the XTools implementation. Circle size just doesn't work for me so I make this kind of display, which I find easier to understand.
  • Sometimes I look at edit summary patterns, tone, things like that, but I rarely include them in SPI reports. Sean.hoyland (talk) 16:24, 10 March 2024 (UTC)
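The page-overlap idea in the list above can be sketched as a set intersection. This is only an illustration and makes no claim about how the Editor Interaction Analyser works internally. The account names and page titles below are hypothetical, and, as noted above, an overlap on its own is not necessarily significant.

```python
def shared_pages(pages_by_account, account_a, account_b):
    """Return the sorted list of pages edited by both accounts."""
    return sorted(set(pages_by_account[account_a]) & set(pages_by_account[account_b]))

def overlap_counts(pages_by_account, suspect, known_accounts):
    """Map each known account to the number of pages it shares with the suspect."""
    return {
        known: len(shared_pages(pages_by_account, suspect, known))
        for known in known_accounts
    }

# Hypothetical accounts and pages, for illustration only
pages_by_account = {
    "AccountA": ["Page 1", "Page 2", "Page 3"],
    "AccountB": ["Page 2", "Page 3", "Page 4"],
    "AccountC": ["Page 5"],
}
common = shared_pages(pages_by_account, "AccountA", "AccountB")
counts = overlap_counts(pages_by_account, "AccountA", ["AccountB", "AccountC"])
```

Counting overlaps against several accounts at once reflects the point that many intersections with multiple blocked accounts are more informative than a single shared page.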
Also, for interest, I think this ANI discussion and the associated SPI, are quite good examples of what not to do. Sean.hoyland (talk) 00:51, 11 March 2024 (UTC)
@Nableezy: this SPI Wikipedia:Sockpuppet investigations/Marokwitz/Archive. I think some of them were later indefinitely topic banned from IP articles for canvassing if I remember correctly. Makeandtoss (talk) 12:47, 11 March 2024 (UTC)
Okay thanks for sharing the links for the tools, I just used the tools and seems like a false alarm. Could be canvassing rather than sock puppetry. Would it be controversial if I share the edit link that raised suspicion here? Makeandtoss (talk) 13:29, 11 March 2024 (UTC)
Well I don't think that SPI actually has evidence of socking, just evidence of people from the same area having similar views. I think looking at those editors would show that they all edit in very different ways, have different tones, different levels of English competency. You need to be able to show there is suspicion that they are the same person, not just that they have similar views or even that they are vote-stacking. nableezy - 14:18, 11 March 2024 (UTC)
I don't think it would be controversial to share a link. Also, I agree with Nableezy on that SPI report. I find actively looking for differences, not just similarities, is useful. Sean.hoyland (talk) 15:08, 11 March 2024 (UTC)
@Nableezy: and Sean.hoyland: it was this comment on the talk page [1], which was swiftly followed by a supportive response; by someone who seems to have read the comment, read the linked article, quoted a passage in it, and wrote a response-all of that in four minutes. Both accounts are editing Jewish/Australia-related articles, which raised suspicions even more. The second account barely edits to WP articles, instead spending their time on talk pages and arbitration. But now having used your tools, seems an unlikely case of SP. Could be canvassing. Makeandtoss (talk) 13:26, 12 March 2024 (UTC)
Sean.hoyland, can you please remind me which tool you were using to check activity post 500 edits? Also this seems to be a special case where nonsensical edits were made on a sandbox, and the account suddenly jumped to ARBPIA articles after 7 October 2023 even before reaching 500 edits. [2]. Makeandtoss (talk) 13:44, 12 March 2024 (UTC)
Maybe one of these plots. Did you see this? The way they work on enwiki, in subpages/sandboxes, matches the way they work on ptwiki. Sean.hoyland (talk) 16:20, 12 March 2024 (UTC)
You seem to have a specialization in data analysis; personally, it takes me time to understand these plots, if I do at all. What about them? As for the EC warning, I just saw it, and noticed the match. It's becoming quite irritating how clusters of users who seem to have the exact same opinion are appearing at the same time on certain discussions.[3] Makeandtoss (talk) 10:38, 13 March 2024 (UTC)
Actually, I have no experience analyzing this kind of people-language-centric data. So, the "it takes me time to understand these plots, if I do at all" applies to me too. It's very different from the data I'm used to, which is essentially all about rocks. It seems rocks and people are quite different, so for the wiki data I'm very much in the data exploration phase. Change in activity after 500 edits is often clear on these kinds of plots, but the committed rule breakers know not to change their behavior or focus too much on the topic area. It's probably always been the case that clusters of users with roughly the same opinion appear at the same time on certain discussions in ARBPIA. It seems to be part of the dynamics of the system here. Sean.hoyland (talk) 05:03, 14 March 2024 (UTC)
Yeah maybe I should rely less on my hunch and more on the tools linked above. Thanks for the help. Makeandtoss (talk) 17:05, 14 March 2024 (UTC)

You've got mail

Hello, Sean.hoyland. Please check your email; you've got mail!
It may take a few minutes from the time the email is sent for it to show up in your inbox. You can remove this notice at any time by removing the {{You've got mail}} or {{ygm}} template. Doug Weller talk 11:51, 24 March 2024 (UTC)

Question

About the ARBCOM tag [4], are the 500 edits inclusive of talk page edits? And are editors who did not reach 500 edits allowed to edit or vote on the talk pages of ARBPIA articles? Makeandtoss (talk) 13:16, 24 March 2024 (UTC)

Yes, I think it includes edits anywhere on English Wikipedia. It's subject to EC gaming review of course and I've seen cases where editors have had their edit count effectively reduced because they made lots of edits in ARBPIA before they were extendedconfirmed. They certainly can't vote or comment at RfCs, AfDs, RSN etc. They can only submit edit requests that are supposed to follow the WP:EDITXY guide. In practice, if a non-EC editor makes a very straightforward reasonable request on a talk page to fix something like a typo, people mostly seem to give them a pass and handle the request. Sean.hoyland (talk) 13:53, 24 March 2024 (UTC)

Nomination of Where is Kate? for deletion

A discussion is taking place as to whether the article Where is Kate? is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.

The article will be discussed at Wikipedia:Articles for deletion/Where is Kate? (3rd nomination) until a consensus is reached, and anyone, including you, is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.

Users may edit the article during the discussion, including to improve the article to address concerns raised in the discussion. However, do not remove the article-for-deletion notice from the top of the article until the discussion has finished.

IgnatiusofLondon (he/him☎️) 11:51, 1 April 2024 (UTC)

Do you think Islamophobia is at least partially in the I/p area

Looking at it I think some parts are, but I could be wrong Thanks Doug Weller talk 19:37, 11 April 2024 (UTC)

I agree that some parts are in the topic area, or at least in the fuzzy lawless border regions. There are references to pro-Israel groups, pro-Palestinian groups, Zionism, a citation with the title "The Islamophobia Industry and the Demonization of Palestine: Implications for American Studies" etc. It seems close enough to fit into the "related content" set (Wikipedia:Arbitration/Index/Palestine-Israel_articles#General_sanctions_upon_related_content) and could probably do with a {{ArbCom Arab-Israeli enforcement|relatedcontent=yes}} template. Sean.hoyland (talk) 01:22, 12 April 2024 (UTC)
Thanks. That's my opinion exactly. Doug Weller talk 07:14, 12 April 2024 (UTC)

Possible sock?

I noticed you reported BasedGuy for being a sock and since then another account has appeared with a quite similar editing pattern. Just notifying you just in case. -UtoD 15:57, 15 April 2024 (UTC)

Yes, thanks. I noticed that editor at Battle of Hamad. Not sure what to do about it yet. I'm curious how they noticed the article within half an hour or so of its creation. Blocking these kinds of editors is evidently ineffective. Even extended confirmed protection won't keep them away from the ARBPIA topic area for long given their dedication. Sean.hoyland (talk) 16:22, 15 April 2024 (UTC)

Signature

Thank you for the feedback on the AE post about me. I think you forgot your signature though. JDiala (talk) 19:10, 3 June 2024 (UTC)

Oops. Thanks for letting me know. Sean.hoyland (talk) 04:19, 4 June 2024 (UTC)

Quick request regarding AN comment

Thank you for your input at AE.

I would like to fix the issue you addressed at AE here. For me, it links to the right place, but I guess it doesn’t for others? Would this link to the right place? If so, am I just allowed to fix it after others have responded? FortunateSons (talk) 16:56, 3 June 2024 (UTC)

That works for me now. Yes, I think you can change a link after responses when it's a change like this. More of a fix than a change. More precision. It's helpful. Sean.hoyland (talk) 17:07, 3 June 2024 (UTC)
I will fix it, thank you.
I know we don’t always see eye to eye on content, but you thinking of me as an asset to the topic area is meaningful to me! FortunateSons (talk) 17:28, 3 June 2024 (UTC)
You explain your reasoning and organize it. This is very helpful in the topic area, I think. Sean.hoyland (talk) 18:01, 3 June 2024 (UTC)
Thank you very much :) FortunateSons (talk) 18:03, 3 June 2024 (UTC)
That's probably a kind of dumb question, but the contributions at AE feel like I joined a conversation late. Did I miss something? Was there a specific sock-related issue that I am unaware of? FortunateSons (talk) 07:50, 7 June 2024 (UTC)
Sort of related but more about the notion of casting aspersions...JDiala said "Two people on that thread arguing against me are proven or suspected sockpuppets (Galamore and ElLuzDelSur)." as part of their AE comment. There's no evidence of socking as far as I'm aware, hence The Kip's comment. My comment is more about structural issues/unintended consequences. Sean.hoyland (talk) 08:30, 7 June 2024 (UTC)
Ah, that does make more sense, thank you. Do you think socks in the I/P area are still a major issue? I feel like SPI is pretty good at catching them FortunateSons (talk) 08:34, 7 June 2024 (UTC)
Well, I'm not quite sure how to answer that question. Let me split it into 2 parts. I think there are multiple topic ban evading socks active in the topic area right now and I don't think the EC requirements have proven to be an effective barrier in that regard (despite being one of their objectives) for a variety of reasons, at least for the dedicated socks. But I'm not sure about the extent to which it can be described as "a major issue" because it depends on the behavior of the suspected sock, and that varies a lot. The question of when to file an SPI report, when it is "the right move", "for the best" etc., is not clear in my mind. Should it be in every case or only when there is problematic behavior? Blocked socks can simply come back, and they do, again and again, for years. It would be better if there were a technical barrier, but that's difficult. One of the things that concerns me is that people tend to focus on easy problems they can solve rather than more difficult and perhaps more significant problems. Focusing on tone over honesty may be an example. If I think about a question like "is it a major issue", it depends on relative values, content creation vs honesty vs disruption vs bias etc. It's confusing, especially because socks come back. On the one hand I don't think the presence of well-behaved ban evading socks is intrinsically a major issue, on the other hand I think the presence of people willing to employ deception, regardless of what they do here, is a major issue, partly because it creates 2 classes of editors, the sanctionable and the unsanctionable. Sean.hoyland (talk) 09:28, 7 June 2024 (UTC)
That does make sense, particularly making the distinction between “well-behaved” (thereby indirectly harmful) and directly disruptive socks, as well as different ways of measuring harm to the project. I don’t believe that there is a perfect answer to this sort of question, and would (as a person, not as an editor) probably also draw a distinction between those that were simply blocked or tbanned for technical violations and those that caused severe harm to individual members of the project or the project as a whole, for whom I think some sort of quiet return should be impossible (to any part of the project, not just en.wiki).
Some technical measure would be really nice, but I know a few people that do “disruptive activism” on social media sites, and even the very large platforms struggle with effectively suppressing this sort of stuff, so I’m not sure if it’s doable for us. However, if it could be done, it would be great to add some more sting to the sanctions.
I agree with deception being more important than tone, but (acknowledging my own lack of technical skills) I don’t think we can truly know with people who are decently competent, thereby creating an evolutionary pressure instead of an actual remedy. Personally, I know that I myself focus mostly on requesting actions to be taken about tone and “surface-level-conduct” issues, but do know that those dealing with sockmasters are probably doing a lot more per minute of time spent. FortunateSons (talk) 12:27, 7 June 2024 (UTC)
As for whether the community is good at catching them via SPI, one way to address that question is to look at the delay between registration and blocking as a sock, how long was the sock's account usable. Things don't look great from that perspective e.g. one sockmaster or another sockmaster Sean.hoyland (talk) 11:49, 7 June 2024 (UTC)
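The registration-to-block delay mentioned above can be made concrete with a small calculation. This is only a minimal sketch: the account names and dates are invented for illustration and do not describe any real SPI case, and `usable_days` is a hypothetical helper name.

```python
from datetime import date
from statistics import median

# Hypothetical records: (account, registered, blocked). Invented for illustration only.
ACCOUNTS = [
    ("SockA", date(2021, 1, 10), date(2021, 3, 11)),
    ("SockB", date(2021, 6, 1), date(2023, 6, 1)),
    ("SockC", date(2022, 2, 1), date(2022, 2, 15)),
]

def usable_days(records):
    """Return {account: number of days between registration and block}."""
    return {name: (blocked - registered).days for name, registered, blocked in records}

lifetimes = usable_days(ACCOUNTS)
# The median is less sensitive than the mean to a single long-lived account.
median_lifetime = median(lifetimes.values())
print(lifetimes, median_lifetime)
```

A long median lifetime would suggest slow detection, although accounts that were never caught are invisible to this measure.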
That’s quite interesting, thank you, I always really enjoy looking at this sort of data. Just out of curiosity, are the ones that get away mostly sleeper agents that get ‘activated’, or are many of those active for 1000s or 10,000s of edits before getting caught? FortunateSons (talk) 12:30, 7 June 2024 (UTC)
1000s of edits before getting caught is not very unusual e.g. here, you can see some edit counts. Sean.hoyland (talk) 12:45, 7 June 2024 (UTC)
Hmmm, that is quite concerning. However (even with survivorship bias), it doesn’t seem like they are getting much better on duration, only on edit count, so that’s nice. Do you think that some technical measures could be promising? FortunateSons (talk) 12:51, 7 June 2024 (UTC)
I don't really have any idea what proportion of socks for any given sockmaster are detected, so that's a problem. Visibility into this issue isn't great. As for technical measures, the LITIS labs' SocksCatch could detect between 90% and 95% of sockpuppets depending on the approach used, and that was a long time ago now in machine-learning-time. However, Wikimedia doesn't appear to have followed up on it. Sean.hoyland (talk) 13:06, 7 June 2024 (UTC)
That makes sense.
It is unfortunate that they didn’t follow up on it, it does look promising. Do you know why? It’s not like there is a lack of money FortunateSons (talk) 13:11, 7 June 2024 (UTC)
Maybe because of the state of this workboard. Sean.hoyland (talk) 13:43, 7 June 2024 (UTC)
I know very little about tech, but I’m guessing there is not supposed to be this much stuff? FortunateSons (talk) 13:51, 7 June 2024 (UTC)
They look very busy, working on all sorts of ML related things. That's good I guess. Sean.hoyland (talk) 13:58, 7 June 2024 (UTC)
That makes sense, thanks FortunateSons (talk) 14:03, 7 June 2024 (UTC)
btw, I should clarify that I think concerns about tone in the topic area are important and there does need to be some kind of moderating force to keep things within limits or else it just gets unpleasant to do anything there. Sean.hoyland (talk) 12:07, 7 June 2024 (UTC)
Of course, I got that, don’t worry. I am mindful of not dragging someone somewhere for a single conduct violation (particularly if they have no prior warnings), but I find some types of “continuous low-level disruption” in combination with tone issues to be unpleasant enough for all involved that it’s just better for the project if that kind of editor edits something else for an indefinite (not necessarily infinite) amount of time. FortunateSons (talk) 12:36, 7 June 2024 (UTC)

Fyi

@Sean.hoyland I tried to answer your question at WP:RSN itself, since other noticeboard readers may be curious to understand what's being discussed. Since my own answer was detailed, I collapsed our discussion using a collapse template. I hope you do not mind me collapsing the discussion there.

Last but not least, the request to join in developing the draft is a very sincere and genuine one. Happy editing Bookku (talk) 05:13, 9 June 2024 (UTC)

Also, though the draft article is largely philosophical and theological, with much less of a political side to it, please watchlist the draft article and keep me informed, whenever possible, if any aspect of it comes under CTOP and needs correction at any time. Bookku (talk) 05:29, 9 June 2024 (UTC)
@Bookku:, thanks, no, of course I don't mind you collapsing the discussion. And thanks for your thoughtful response and invitation. Sean.hoyland (talk) 05:38, 9 June 2024 (UTC)

editor similarity

Hi, you mentioned some method for estimating the distance between editors in "metric space". I'd be curious what you mean by this and where I can find more information. Thank you! DMH223344 (talk) 15:29, 21 June 2024 (UTC)

@DMH223344: It's a bit of a long story. Years ago, I read about some work using machine learning to identify socks. It has been rattling around in my mind ever since. Unfortunately, I don't have access to a bunch of A100s and the enwiki database is too big, so machine learning is not an option. But the fact that the system could apparently discover features that enabled it to detect 90%+ socks suggests that it is possible to do some kind of proximity ranking of accounts, albeit in the very high dimensional space of a neural network. Since then, I've been wondering what those features were (the network is a black box so I have no idea) and whether looking at distances between editors in much lower dimensional spaces might still be able to provide clues about sockpuppets. I've just started looking at this, and it's a bit of a rabbit hole, but it might have some potential. My comment was based on this test output (I've anonymized it here). I suggested a match based on the low Wasserstein distance between the editors in a particular space (I'll omit the details). I really have no idea whether it is a good match because my test set is small right now, I just happened to be looking at the test output at the time (although the Editor Interaction Analyser suggests it might be a decent match). It's possible to construct all sorts of spaces from editor data and I don't know which ones could be useful. Also, there are many things that I haven't gotten around to doing and aren't clear to me yet, like the relationship between proximity and dimensionality, how many samples per editor are needed (I have to pull them out of the database), can this approach 'predict' the outcomes of previous SPI reports etc. Sean.hoyland (talk) 17:19, 21 June 2024 (UTC)
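For readers unfamiliar with the Wasserstein distance mentioned above, here is a minimal one-dimensional sketch. The feature space actually used is not described in the comment, so the feature below (edit sizes in bytes) and all sample values are assumptions chosen purely for illustration, and `wasserstein_1d` is a hypothetical helper that computes the area between two empirical cumulative distribution functions.

```python
from bisect import bisect_right

def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance between two empirical samples.

    Computed as the integral of |CDF_u - CDF_v| over the pooled sample values.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    pooled = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(pooled, pooled[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total

# Hypothetical per-editor samples of one feature (edit size in bytes).
editor_a = [10, 20, 30, 40]
editor_b = [12, 22, 28, 41]
editor_c = [200, 300, 400, 500]

d_ab = wasserstein_1d(editor_a, editor_b)
d_ac = wasserstein_1d(editor_a, editor_c)
print(d_ab, d_ac)
```

A small distance only says that two samples are distributed similarly in this one feature; whether that is evidence of anything depends on how many unrelated editors look equally similar.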
interesting! one thing we should be very aware of with any method like this is the false positive rate. It's of course essential to have an idea of how well we've controlled this rate using the proposed method. There are of course many ways we can do that, depending on what data we have and how it was collected.
some brief comments without knowing much of the details of what you're describing: I would *not* expect a neural network based approach to work well for this task. More generally, I would not expect a supervised approach to work well. I also don't think gpus are required, although I'm guessing some approaches might not be open to us without access to many cpus and large working memory.
Feel free to share any other details you might have, even if you consider them overwhelming or disjointed thoughts. DMH223344 (talk) 17:50, 21 June 2024 (UTC)
In general, I would say that a method which tries to identify pairs of user accounts that are socks will not work unless we make some strong assumptions or come up with a clever way to identify candidate pairs. Otherwise our FPR will likely be huge. DMH223344 (talk) 17:53, 21 June 2024 (UTC)
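The concern about pairwise false positives can be illustrated with simple base-rate arithmetic. This is a sketch under invented assumptions: the number of accounts, the number of true sock pairs, and both error rates below are hypothetical and not taken from any measurement.

```python
def pair_screening(n_accounts, true_pairs, sensitivity, false_positive_rate):
    """Expected outcome of testing every pair of accounts with an imperfect classifier."""
    n_pairs = n_accounts * (n_accounts - 1) // 2        # all unordered pairs
    expected_tp = sensitivity * true_pairs               # true sock pairs flagged
    expected_fp = false_positive_rate * (n_pairs - true_pairs)  # innocent pairs flagged
    precision = expected_tp / (expected_tp + expected_fp)
    return n_pairs, expected_tp, expected_fp, precision

# Hypothetical scenario: 10,000 accounts, 50 true sock pairs,
# 95% sensitivity and a 0.1% per-pair false positive rate.
n_pairs, tp, fp, precision = pair_screening(10_000, 50, 0.95, 0.001)
print(n_pairs, tp, fp, precision)
```

Under these made-up numbers, fewer than one in a hundred flagged pairs would be a real match, which is why narrowing the set of candidate pairs first matters so much.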
I would not have expected a ML based approach to work well for this task either because I assumed the signal to noise ratio would be problematic. And yet it does work, quite well it seems. There are several papers. The fact that these systems are doing something (opaquely) that works is encouraging in the sense that it shows there are features there and not just in our imaginations when we see patterns connecting users. I'm trying a no assumptions, just do math approach. Sean.hoyland (talk) 18:17, 21 June 2024 (UTC)
I don't disagree that an ML based approach would work. It was the supervised framing that I was unsure of. but of course I could be wrong! DMH223344 (talk) 19:04, 21 June 2024 (UTC)

Technical question

Hey, how many edits per account are required so that your software can actually (relatively reliably) group them together? Can a “full picture” be created out of multiple low-edit accounts? FortunateSons (talk) 09:14, 24 June 2024 (UTC)

I really have no idea. At this stage I don't have any evidence that it could reliably group identical twins who shared a room, edited together, with a passionate interest in rare Mongolian hats. I'm in the stumbling around in the dark, bumping into things phase where I keep realizing 'oh, to do that, I need to be able to do this first, but how?' My normal approach to problem solving is to not think about it and do something else. Puzzlingly this works quite well for mysterious reasons, albeit slowly. As for the question 'Can a “full picture” be created out of multiple low-edit accounts', maybe, but I guess only if there is high confidence that combining them boosts a signal for a single source rather than introduces noise. Sean.hoyland (talk) 10:37, 24 June 2024 (UTC)
That's unfortunate, but if I find two editors with an obsessive interest in Gugu hats, I'll let you take a swing at it. :)
I can definitely relate to this style of problem solving, and wish you the best of luck with the work.
I'm asking because I seem to have made an acquaintance trying to make me join into their topic-specific discussions through improper means, and while I'm appropriately dealing with the messages themselves, I was curious whether the half dozen accounts with ~10 edits each are enough to create a profile that can be compared to others with an above-random chance of success? Some of them caught joint checkuser blocks and some haven't, so... FortunateSons (talk) 11:15, 24 June 2024 (UTC)
Well, it's always nice to make new friends I guess... That kind of behavior using disposable accounts to gain email access for canvassing is characteristic of Category:Wikipedia_sockpuppets_of_AndresHerutJaim, the guy that caused this ArbCom case. Sean.hoyland (talk) 12:36, 24 June 2024 (UTC)
Yeah, that’s the guy that some got CU’d for. I had only skimmed the case before, and didn’t connect the name and the person. FortunateSons (talk) 12:57, 24 June 2024 (UTC)
I would be interested to know which ones have checkuser blocks, to see how they were logged and categorized. Inconsistencies in logging and categorization can be a bit of a weak link, making potentially valuable information invisible. Sean.hoyland (talk) 12:50, 24 June 2024 (UTC)
I’m happy to provide the usernames of all who have sent me messages through e-mail, assuming that’s permitted? FortunateSons (talk) 12:55, 24 June 2024 (UTC)
In the ArbCom case they said "Private evidence (including emails) can be sent to the Committee at arbcom-en@wikimedia.org". I think they still want people to let them know about inappropriate canvassing emails. I assume it's okay for you to list the usernames here. There's no loss of privacy. But what do I know? Very little apparently, so it might be worth verifying with someone who knows what they are doing first. Maybe ScottishFinnishRadish knows. Sean.hoyland (talk) 13:04, 24 June 2024 (UTC)
I’m going with an abundance of caution and will send them through mail for now. If you find a categorisation error or two, I would have no issue with you doing with it what you wish (in line with policy).
If an admin later says that it was not appropriate, I would expect you to then delete the mail :) FortunateSons (talk) 13:23, 24 June 2024 (UTC)
No problem. Always happy to delete emails. Email received. Sean.hoyland (talk) 14:04, 24 June 2024 (UTC)
Perfect, if something productive comes of it, I would really appreciate a quick update either here or through mail. :) FortunateSons (talk) 14:05, 24 June 2024 (UTC)
I think I can probably see some of the blocked accounts...maybe... e.g. [5][6][7][8][9] Sean.hoyland (talk) 13:11, 24 June 2024 (UTC)
Hello, Sean.hoyland. Please check your email; you've got mail!
It may take a few minutes from the time the email is sent for it to show up in your inbox. You can remove this notice at any time by removing the {{You've got mail}} or {{ygm}} template.

FortunateSons (talk) 13:51, 24 June 2024 (UTC)

For the record

(I don't want to write too much at ARCA, so answering here.) About [this revert], I think it is invalid for three reasons. (1) Userspace is explicitly excluded from the ARBPIA topic area, including by ARBECR and its footnotes. (2) The rules about edit-requests refer to "Talk:" space and I don't think that "User talk:" is a subset of "Talk:". (3) It seems to not be an edit about the conflict but just about where some editor edits. Cheers. Zerotalk 05:27, 29 July 2024 (UTC)

Thanks for the feedback. Appreciated. Sean.hoyland (talk) 05:41, 29 July 2024 (UTC)

Gaming or patience?

For example, if a !vote is reverted, and then becomes the second edit after extended-confirmation is reached (in a fast but not unproductive manner). Would you be willing to provide me with one of your fancy graphs? FortunateSons (talk) 14:40, 30 July 2024 (UTC)

Sure, I'll have a look, assuming I haven't broken that code in the meantime. I have some even fancier/confusing plots now...making a bit of progress grouping those identical twins who like Mongolian hats. Sean.hoyland (talk) 15:35, 30 July 2024 (UTC)
Thank you. :)
The fancier stuff is blurry on my mobile device and looks like something I would find in a museum for modern art, which is probably a sign that I’m significantly under-qualified and shouldn’t even try to interpret it. When you say progress, what does it mean in layman’s terms? FortunateSons (talk) 15:57, 30 July 2024 (UTC)
I lose interest if software doesn't accidentally generate arty things. I think Google Drive's viewer increases the resolution/decompresses interactively in the viewer as you zoom in...sometimes. Can take a bit of time. Sometimes it seems to stay blurry in the viewer. So, the account was created 13 years ago, their first edit was about a black hat group, and they publicly declared a COI for a cybersecurity challenge article. Interesting. Here's the plot. It's not very informative. Again, might be blurry in the viewer, but I think you can download it or use a different viewer. There are only 456 revisions in the database. I guess the other 502-456 were deleted with the now red link European Cybersecurity Challenge article.
Progress in the sense that it can produce results consistent with the results of SPI investigations...sometimes. In that example, it finds a similarity between 3 accounts, 2 of which are confirmed socks of each other. But it doesn't detect the other confirmed sock for reasons that I don't understand. Baby steps. Sean.hoyland (talk) 16:48, 30 July 2024 (UTC)
It stayed blurry for me (I gave it a few minutes), but that could be anything from the poor internet on my train to individual incompetence.
However, the picture is rather beautiful either way. The SPI results sound rather promising, as long as you match or exceed the more sophisticated reporters we have here, I would say that it’s rather worth it. Is it possible that the lack of detection is related to an external factor? Different typing on a different device, a home computer that’s used after work and therefore later than the other devices, etc…
Thanks for the analysis, I would say that it just was someone who waited for the extended-confirmed label, not someone who gamed the rules, so it’s probably fine. I don’t seem to have much talent for sensing who is or isn’t gaming, but might get better with experience :) FortunateSons (talk) 17:05, 30 July 2024 (UTC)
Might be worth mentioning to ScottishFinnishRadish. !Voting just after becoming EC is not a good look.
Sometimes I probably don't have enough samples or maybe it's something more interesting. Sean.hoyland (talk) 17:34, 30 July 2024 (UTC)
Do you think so? I feel like he’s already rather busy with this area, shouldn’t I wait for more disruption?
Yeah, I guess it could be either. If you find out, do let me know, and best of luck with the project. FortunateSons (talk) 17:40, 30 July 2024 (UTC)
It's true, he is a bit busy, but a 13 year old account waking up and making a non-policy based !vote immediately after becoming EC might be something that should at least be on his radar. I guess waiting to see what happens next also works. Thanks. Sean.hoyland (talk) 18:25, 30 July 2024 (UTC)
I think I’m just going to directly link him this discussion, then he can do or not do as he wishes. Thank you very much for the help! :) FortunateSons (talk) 18:44, 30 July 2024 (UTC)
It's not gaming, and there are no policies requiring someone to ease into the topic area when extended-confirmed. ScottishFinnishRadish (talk) 20:39, 30 July 2024 (UTC)
Ok, thanks FortunateSons (talk) 21:03, 30 July 2024 (UTC)

Premature archiving

Can you please revert your archiving?

Clearly, a number of editors think this warrants discussion, and it’s inappropriate for a WP:INVOLVED editor to shut it down prematurely. BilledMammal (talk) 04:03, 4 August 2024 (UTC)

Reverting would reward a WP:TALKNO violation. The behavior is inappropriate. Reverting has, for me, less utility than dissociating the NPOV tag from the archived discussion. The OP, or anyone else, can create a talk page section and follow the Template:POV usage guidelines by "pointing to specific issues that are actionable within the content policies" and "identifying specific issues that are actionable within Wikipedia's content policies". I'm sure there are many given the nature of that article. Then everyone can move on. If you find my argument uncompelling, that's fine, I've been wrong thousands of times, but I encourage you to seek another opinion from someone whose judgement you trust. Sean.hoyland (talk) 05:16, 4 August 2024 (UTC)
The OP raised whether the content passes the "smell test". Other editors then identified that the sources don't appear to support the content.
Both of these are valuable contributions - and as an involved editor, I don't think it's appropriate for you to decide that the discussion needs to be shut down. BilledMammal (talk) 05:27, 4 August 2024 (UTC)
I have no interest in the content, and I'm about as involved as a bot. I understand your view. You understand my view. Clearly, we are using different value systems and priorities when it comes to talk page guideline compliance and measuring utility. That's okay. If you would like a different outcome you will need to talk to a different person. Sean.hoyland (talk) 06:41, 4 August 2024 (UTC)

I have no interest in the content, and I'm about as involved as a bot.

You've participated in previous discussions on the topic, such as this one. BilledMammal (talk) 06:54, 4 August 2024 (UTC)
I think that shows an interest in pedantry rather than the article content. Allow me to change the window of the statement then. I have no current interest in the content. This is true in general for PIA article content nowadays because I am focusing on other (mostly technical and enforcement related) things, unless something interesting catches my eye. To give a concrete example, it makes no difference to me whether the content of that article passes someone's "smell test", whether it offends anyone, whether it contains false information, whether it is a political weapon, whether it is complete garbage, or whether a number of editors think their personal opinions about the world have value here and are not subject to talk page guidelines. The state of articles is not my focus anymore. My interest is in whether enforcing rules at a small scale can make PIA better. It's not surprising that there will be disagreement about decisions, but your argument has not changed my view, probably because I don't care about problems with the content. Again, I encourage you to seek another opinion from someone whose judgement you trust. Sean.hoyland (talk) 08:26, 4 August 2024 (UTC)

Arbitration notice

You are involved in a recently filed request for clarification or amendment from the Arbitration Committee. Please review the request at Wikipedia:Arbitration/Requests/Clarification and Amendment#Amendment request: Referral from the Artibration Enforcement noticeboard regarding behavior in Palestine-Israel articles and, if you wish to do so, enter your statement and any other material you wish to submit to the Arbitration Committee. Additionally, the Wikipedia:Arbitration guide may be of use.

Thanks,

Red-tailed hawk (nest) 17:53, 17 August 2024 (UTC)

Question

Hi, Idk whether it is possible, is there any way to establish whether there are more editors participating in the topic area pre October 2023 compared to recent times? My own perception (top of my head, already determined as of dubious reliability), based on looking at an RSN discussion like ADL or AJ and discussions at articles like Gaza genocide is that there seem to be more but it would be nice to show that somehow. Selfstudier (talk) 10:03, 20 August 2024 (UTC)

Unfortunately, this raises the question "what precisely is the topic area?", a technical black hole that I have been orbiting for a while now. I thought maybe I can just traverse a graph, but it always seems to end with something like "How did I get here, an article about a biscuit?" etc. Another approach is to pretend that everything in the topic area has one of the talk page templates and look for those. It might be possible to come up with something tracking the number of unique editors for articles with the templates over time. Seems like the kind of thing where limiting the scope to a relatively small number of pages, at least to start with, might be a sensible approach. Do you think there's a list of articles that would represent a decent sample of the topic area? Sean.hoyland (talk) 11:19, 20 August 2024 (UTC)
Maybe you could do something via the cats? Category:Arab–Israeli_conflict would be the top of a tree (a sort of graph, right?). To limit the number, drop down to Category:Israeli–Palestinian_conflict but that might still be a lot. Selfstudier (talk) 11:32, 20 August 2024 (UTC)
The graphs rapidly get big. This is with root category Arab–Israeli conflict and a 2 level descent through the hierarchy. Sean.hoyland (talk) 12:02, 20 August 2024 (UTC)
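A depth-limited descent like the one described can be sketched as a breadth-first walk. The category map below is a tiny hypothetical stand-in (only the two category names mentioned in this thread are real; the rest are placeholders), and `descend` is an illustrative helper, not the code used to produce the plot.

```python
from collections import deque

# Hypothetical parent -> subcategories map. Real data would have to be pulled
# from the wiki's category links; this structure is invented for illustration.
SUBCATS = {
    "Arab–Israeli conflict": ["Israeli–Palestinian conflict", "Placeholder subcategory A"],
    "Israeli–Palestinian conflict": ["Placeholder subcategory B"],
    "Placeholder subcategory A": [],
    "Placeholder subcategory B": ["Placeholder subcategory C"],
    "Placeholder subcategory C": [],
}

def descend(root, max_depth):
    """Breadth-first walk returning {category: depth} for every depth <= max_depth."""
    seen = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        if seen[cat] == max_depth:
            continue  # do not expand below the depth limit
        for child in SUBCATS.get(cat, []):
            if child not in seen:  # categories can have several parents
                seen[child] = seen[cat] + 1
                queue.append(child)
    return seen

reached = descend("Arab–Israeli conflict", max_depth=2)
print(reached)
```

The `seen` check matters because category hierarchies are not strict trees, so the same category can be reached by more than one path.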
Very Mandelbrotty :) OK, let me think about a hotlist. Selfstudier (talk) 12:24, 20 August 2024 (UTC)
btw, the yellow things are things without ArbCom_Arab-Israeli_enforcement, Contentious_topics/Arab-Israeli_talk_notice templates or their equivalents. Sean.hoyland (talk) 12:43, 20 August 2024 (UTC)
I am currently using the intersection of Wikipedia:WikiProject Israel and Wikipedia:WikiProject Palestine. So far, it seems reasonably accurate. BilledMammal (talk) 12:28, 20 August 2024 (UTC)
Good idea, thanks. I'll have a look. Sean.hoyland (talk) 12:43, 20 August 2024 (UTC)
You could take the articles that I have "bludgeoned" here as a hotlist. Selfstudier (talk) 13:08, 20 August 2024 (UTC)
Or Nishidani's new articles since October 2023 Selfstudier (talk) 14:57, 20 August 2024 (UTC)
Will have a think about it. Sean.hoyland (talk) 15:09, 20 August 2024 (UTC)
Seems to be possible. Here's an example plot for a single article (Zionism) going back a few years as a test (not handling data gaps in that plot). This is counting unique editors for the article and talk page combined. I'll do a multi-article version when I get a chance. Sean.hoyland (talk) 17:42, 20 August 2024 (UTC)
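Counting unique editors per month for an article and its talk page combined can be sketched as follows. The revision rows and editor names are invented for illustration; in practice they would come from the revision data, and `unique_editors_per_month` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical revision rows: (page, editor, ISO 8601 timestamp).
REVISIONS = [
    ("Zionism", "Alice", "2023-09-03T10:00:00Z"),
    ("Talk:Zionism", "Alice", "2023-09-04T11:00:00Z"),
    ("Zionism", "Bob", "2023-09-20T09:30:00Z"),
    ("Zionism", "Alice", "2023-10-08T12:00:00Z"),
    ("Talk:Zionism", "Carol", "2023-10-09T08:00:00Z"),
    ("Zionism", "Dave", "2023-10-15T17:45:00Z"),
]

def unique_editors_per_month(revisions):
    """Return {YYYY-MM: number of distinct editors}, article and talk page combined."""
    editors = defaultdict(set)
    for _page, editor, timestamp in revisions:
        editors[timestamp[:7]].add(editor)  # first 7 characters are YYYY-MM
    return {month: len(names) for month, names in sorted(editors.items())}

counts = unique_editors_per_month(REVISIONS)
print(counts)
```

An editor who touches both the article and its talk page in the same month is counted once, which is what "unique editors for the article and talk page combined" implies.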
Yes, very good, I would be most interested to see if that pattern is repeated. Selfstudier (talk) 18:30, 20 August 2024 (UTC)
Seems hard to make a performant query that looks at thousands of pages, possibly due to a combination of SQL incompetence, being too lazy to look at the data model and indexing and my brain not working because it's a 'feels like' temperature of 42C today. Anyway, in the meantime, here's a plot for about 3800 articles (excluding talk pages) in the topic area. It would be nice to be able to normalize the data using stats for the number of unique active editors for all of Wikipedia over time or at least have some kind of trend comparison...but I have no idea where or how to get that data. Maybe I could randomly select a much bigger set of articles and compare it to that. Sean.hoyland (talk) 08:30, 21 August 2024 (UTC)
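The normalization idea, comparing the topic area against a baseline such as a large random sample of articles, could look something like the sketch below. All monthly counts here are made up, and `normalized_share` is a hypothetical helper.

```python
def normalized_share(topic_counts, baseline_counts):
    """Topic-area unique editors as a fraction of baseline unique editors, per month.

    Months with a missing or zero baseline are skipped rather than guessed.
    """
    return {
        month: topic_counts[month] / baseline_counts[month]
        for month in sorted(topic_counts)
        if baseline_counts.get(month)
    }

# Hypothetical monthly unique-editor counts.
TOPIC = {"2023-09": 200, "2023-10": 800, "2023-11": 600}
BASELINE = {"2023-09": 10_000, "2023-10": 10_000, "2023-11": 12_000}

share = normalized_share(TOPIC, BASELINE)
print(share)
```

A rising share would indicate growth in the topic area beyond whatever is happening to editing activity overall.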
Don't overdo it, lol. Could you just do a few more? Like the Israel Hamas war, Gaza genocide, "hot" articles? While it's not conclusive without more data, I just want to get a sense if the editor count has gone up. Selfstudier (talk) 09:42, 21 August 2024 (UTC)
Righty dokey. Sean.hoyland (talk) 09:58, 21 August 2024 (UTC)
Can you see these? Sean.hoyland (talk) 10:36, 21 August 2024 (UTC)
I can, on pop up or larger view. Let me go through them. Selfstudier (talk) 10:42, 21 August 2024 (UTC)
Let me know if you have other articles in mind. Only takes a couple of seconds to make a plot. Sean.hoyland (talk) 10:51, 21 August 2024 (UTC)
They confirm the trend. The Israel Hamas war is less useful because it is a new article, although it is possible to see the large number (1600!) at the beginning, which has now dropped off (relatively speaking, still near 200). Selfstudier (talk) 10:52, 21 August 2024 (UTC)

Contentious topics

How did you check for whether an article was within ARBPIA based on the contentious topics template?

Do you do this by parsing the page text, or do you have a better way? BilledMammal (talk) 09:16, 25 August 2024 (UTC)

I wish I knew how to parse content. I usually use a common table expression with something like the following in it to get the titles (and/or namespace) or add a join to page to get the page_ids. There's a reason why I thought only those two template titles were enough, but because I'm not a software engineer, my documentation is not great. It was something to do with redirects...or the result of some testing...maybe.
select
  tl_from,   -- page_id of the page that transcludes the template
  lt_title   -- title of the transcluded template
from linktarget
join templatelinks on lt_id = tl_target_id
where lt_namespace = 10 -- Template
  and lt_title in ('ArbCom_Arab-Israeli_enforcement', 'Contentious_topics/Arab-Israeli_talk_notice')

Sean.hoyland (talk) 10:05, 25 August 2024 (UTC)

Oh! I wasn't aware of those templates - I more commonly see {{Contentious_topics/talk_notice}} with ai as the first parameter.
I'll try your method out, should be an effective way to get at least some of the articles. BilledMammal (talk) 10:09, 25 August 2024 (UTC)
I'm not aware of most things to do with the data model and I know very little about MySQL because it's not used in my world. For example, the other day when I looked at the differences between your 'topic area' and my 'topic area' (which are substantial presumably because of spotty templating), I was confused that MySQL doesn't use MINUS or INTERSECT. Sean.hoyland (talk) 10:21, 25 August 2024 (UTC)
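Where INTERSECT or MINUS is not available, the same comparisons can usually be written with a join and an anti-join. The sketch below uses SQLite through Python's `sqlite3` module purely so that it is runnable; the two tables and their contents are hypothetical stand-ins for the two 'topic area' page sets, and the join-based queries are plain SQL that should carry over to other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: page_ids found via talk page templates and via WikiProjects.
conn.executescript("""
    CREATE TABLE template_pages (page_id INTEGER PRIMARY KEY);
    CREATE TABLE wikiproject_pages (page_id INTEGER PRIMARY KEY);
    INSERT INTO template_pages VALUES (1), (2), (3);
    INSERT INTO wikiproject_pages VALUES (2), (3), (4);
""")

# INTERSECT equivalent: inner join on the shared key.
in_both = [row[0] for row in conn.execute("""
    SELECT t.page_id
    FROM template_pages t
    JOIN wikiproject_pages w ON w.page_id = t.page_id
    ORDER BY t.page_id
""")]

# MINUS / EXCEPT equivalent: anti-join using NOT EXISTS.
only_template = [row[0] for row in conn.execute("""
    SELECT t.page_id
    FROM template_pages t
    WHERE NOT EXISTS (
        SELECT 1 FROM wikiproject_pages w WHERE w.page_id = t.page_id
    )
    ORDER BY t.page_id
""")]

print(in_both, only_template)
conn.close()
```

Because `page_id` is unique in each table here, the join returns the same rows a set intersection would; with duplicate keys a `DISTINCT` would be needed.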