Wikipedia:Wikipedia Signpost/2011-08-15/Technology report
Forks, upload slowness and mobile redirection
Making Wikimedia more forkable
The question of how easy it is to "fork" Wikimedia wikis, or, indeed, to merely mirror their content on another site, was posed this week on the wikitech-l mailing list by Wikimedian David Gerard. The concept is also related to that of backups, since a Wikipedia fork could provide a useful restore point if Wikimedia server areas were affected by simultaneous technical failure, such as that caused by a potent hacking attempt.
During the discussion, Lead Software Architect Brion Vibber suggested that the Wikimedia software setup could be easily recreated, as could page content. Instead, he said, the major challenge would lie in "being able to move data around between different sites (merging changes, distributing new articles)", potentially allowing users of other sites to feed improvements to articles back to Wikimedia whilst also receiving updates from Wikimedia users. So far, at least one site (http://wikipedia.wp.pl/) has been successful in maintaining a live copy of Wikimedia wikis, lagging behind the parent wiki it tries to mirror by only minutes. No site has yet implemented an automated procedure for pushing edits made by its users upstream to its parent wiki, however. Other contributors suggested that few external sites would have the facilities both to host their own copy of images and to keep in line with Wikimedia's strict policy on attribution.
In unrelated news, there were also discussions about making pageview statistics more accessible to operators of tools and apps (also wikitech-l). In particular, the current reliance on the external site http://stats.grok.se to collate data was noted. As MZMcBride wrote, "currently, if you want data on, for example, every article on the English Wikipedia, you'd have to make 3.7 million individual HTTP requests to [the site]".
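To put MZMcBride's figure in perspective, the following Python sketch estimates how long 3.7 million sequential requests would take. The request rates are illustrative assumptions, not limits published by stats.grok.se, and `crawl_days` is a hypothetical helper written for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(n_requests: int, requests_per_second: float) -> float:
    """Return the number of days needed to issue n_requests at a steady rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return n_requests / requests_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    articles = 3_700_000  # figure quoted by MZMcBride
    for rate in (1, 10, 100):  # assumed request rates, in requests per second
        print(f"{rate:>3} req/s: {crawl_days(articles, rate):.1f} days")
```

At an assumed one request per second, a full pass over the English Wikipedia would take roughly six weeks, which suggests why a bulk source of pageview data would be more practical for tool operators.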
Uploading was slower than it used to be, but that's fixed, says bugmeister
Although hampered by a lack of data points, anecdotal evidence collected over the past fortnight pointed to a slowdown in the speed of uploading files to Wikimedia wikis. The slowdown made mass API uploading very difficult, and a bug was opened as a result. "An upload that should take minutes is taking hours", wrote one commenter. Another pinpointed Wikimedia servers as the bottleneck: during a test, uploads to the Internet Archive had been over ten times quicker. As it became clear that the problem was affecting a large number of users, and the data collected seemed to show a dramatic decrease in upload speeds earlier this year, significant resources were devoted to the issue. WMF technicians Chad Horohoe, Roan Kattouw, Sam Reed, Rob Lanphier and Asher Feldman have all worked on the problem.
Once the upload chain was determined to be "User → Europe caching server → US caching server → Application server (Apache) → Network File System → Wikimedia server MS7", members of the operations team worked to profile where the bottleneck was occurring. Unfortunately, an error introduced by the profiling meant that uploads were blocked altogether for several minutes. Then, on 12/13 August, the problem was pinpointed and fixed: a module intended to help optimise network connections, Generic Receive Offload (GRO), had in fact been slowing them down. According to WMF bugmeister Mark Hershberger, smaller data packets were being collated into much larger ones. The new packets were then too large to be handled effectively by other parts of the network infrastructure. Although there are still some reports of slowness, test performance has increased by a factor of at least three. In the future, more data on upload speed is likely to be collected to provide a benchmark against which efficiency can be tested.
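The effect Hershberger describes can be illustrated with a toy model. The Python sketch below is not how GRO is implemented in the kernel, and it does not reproduce the Wikimedia configuration; the segment size, aggregate limit and downstream limit are assumed values chosen only for illustration, and `coalesce` and `oversized` are hypothetical helpers.

```python
def coalesce(segments, max_size):
    """Merge consecutive segment sizes into aggregates no larger than max_size.

    A single segment that already exceeds max_size is passed through unchanged.
    """
    merged = []
    current = 0
    for size in segments:
        # Close the running aggregate if adding this segment would overflow it.
        if current and current + size > max_size:
            merged.append(current)
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


def oversized(packets, downstream_limit):
    """Return the packets that a later hop with the given size limit cannot take as-is."""
    return [p for p in packets if p > downstream_limit]


if __name__ == "__main__":
    segments = [1448] * 10    # assumed payload size per received segment, in bytes
    downstream_limit = 1500   # assumed size the next hop handles comfortably

    without_merge = oversized(segments, downstream_limit)
    with_merge = oversized(coalesce(segments, 65536), downstream_limit)
    print("oversized without coalescing:", len(without_merge))
    print("oversized with coalescing:   ", len(with_merge))
```

In this model every individual segment fits the downstream limit, but once the segments are merged the resulting aggregate does not, so a later hop has to split it or handle it inefficiently. That is the general shape of the problem reported here, under the stated assumptions.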
In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
- This week, the Foundation's Rob Lanphier reiterated that the Foundation is having problems hiring a new Data Analysis engineer and a software developer. Know someone who might be interested? Link them to the details.
- There was a brief incident on Wednesday where users were being inappropriately identified as mobile users and redirected to the mobile version of Wikipedia following a software deployment (discussion). The deployment was aimed at improving levels of redirection ahead of the launch of an improved mobile browsing experience (set to be trialled later this month). Estimates for the amount of time the redirection was in place stand at around 6 minutes. In unrelated news, WMF Data Analyst Erik Zachte this week upgraded his figure for the percentage of Wikimedia page views originating on mobile devices to fifteen per cent.
- On the English Wikipedia this week, bots were approved for a number of tasks including mass TfD tagging and tagging valid files as being eligible for transfer to Wikimedia Commons. BRFAs that are still open cover a number of other tasks, including the import of expert comment from an external site.
- Mark Hershberger has suggested that efforts to get 1.18 released on time had significant "momentum" but needed to sustain that to achieve success. The bugmeister explained that while approximately 160 revisions had been reviewed in the last week, another 210 were still left to review (wikitech-l mailing list). The figures include certain core extensions, and are consequently higher than previously published figures which did not.
- A MediaWiki hackathon has been announced for 14–16 October. Held in the American city of New Orleans, it will include discussion of Wikimedia Labs (a project that will integrate and extend the functionality available to tool developers) and a bugsmash (wikitech-l mailing list).
- As is now becoming a regular event, developers reviewed the list of bugs currently marked as "blocking" the 1.18 release, or otherwise proving particularly problematic for users. Those attending noted their thoughts down in an Etherpad collaborative report.
- A question raised at Wikimania – why the Chinese Wikipedia was getting so much more traffic than it used to – turned out to have a technical answer. The robots.txt file for the Chinese Wikipedia was written in both traditional and simplified Chinese, causing problems for bots from search engines and the like, a Chinese Wikimedian explained.
Discuss this story
Not sure if you meant Network File System (protocol), as opposed to the generic term. The link Network File System is a disambiguation page. The protocol is more likely if the Apache servers were running Linux, for example. W Nowicki (talk) 19:27, 16 August 2011 (UTC)
Forks
I must note that I didn't intend it as an urgent call to action - rather, as something we need to keep in mind, and which will only benefit us. I am quite cognisant that the likely number of forks of English Wikipedia is zero ... but every one of the steps needed to make our projects forkable is actually (a) a good idea technically (b) important to preserving our work.
(I'd also like to make us forkable so that we can tell our more special critics "here, fork it, if you're right you'll do so much better than us." At the least, watching them come up with new excuses not to will be amusing.) - David Gerard (talk) 20:41, 16 August 2011 (UTC)
Upload speed
I upload lots of own-photograph images to Commons, and I've noticed the upload speed dropping. Most of my uploading is done on a major university campus with huge bandwidth, but I've still noticed over the summer that the upload speed is markedly slower than it was over Christmas break. I'd vaguely wondered if snow on the ground made image sizes smaller and if blue sky made them larger, but I'd not really considered server problems. Nyttend (talk) 23:29, 19 August 2011 (UTC)
Chinese Wikipedia Numbers
With reference to the fact that the "page view" numbers on the Chinese Wikipedia have trebled over the last few months, I didn't quite understand what has happened. Did the bug mean that the site was getting far fewer hits from search engines and bots (i.e. the huge increase is not real people but the number of automated programs visiting the wiki), or did the bug mean that the site is now getting far more human viewers through search engines which were previously not displaying these pages as results in response to queries? If it is the latter, as it seems (that whoever found this bug has essentially made the Chinese Wikipedia 3 times more popular!), then that seems like a huge deal, and whoever found that bug should probably get an award or something (or at least a big round of applause!) 86.66.128.117 (talk) 20:01, 20 August 2011 (UTC)