Wikipedia:Wikipedia Signpost/Single/2016-02-10
New internal documents raise questions about the origins of the Knowledge Engine
The 13-page grant agreement between the WMF and the Knight Foundation was released on the Wikimedia Foundation wiki on February 11, following the Signpost's inquiry to the Knight Foundation the day before.
The Knight Foundation grant has been a contentious topic in the Wikimedia community, and ousted WMF Board of Trustees member James Heilman has alleged that his initial opposition to the grant, which he ultimately voted in favor of, was a factor in his dismissal.
Numerous questions remain about the grant, which was intended to kickstart a project formerly called the Knowledge Engine – now referred to as Wikimedia Discovery. Chief among them is the question Andreas Kolbe asked last week in the Signpost, "So, what’s a knowledge engine anyway?".
Key players have repeatedly stated what the Knowledge Engine/Discovery is not – namely, a search engine intended to compete with Google. For example:
- The Discovery FAQ on MediaWiki states that "We are not building Google. We are improving the existing CirrusSearch infrastructure with better relevance, multi-language, multi-projects search and incorporating new data sources for our projects. We want a relevant and consistent experience for users across searches for both wikipedia.org and our project sites."
- In a November 4 email to all WMF staff, provided to the Signpost by several WMF staffers, executive director Lila Tretikov expressly stated that the Knowledge Engine "is NOT ... a search engine".
- Just hours before the release of the grant agreement, Jimmy Wales was even more blunt: "To make this very clear: no one in top positions has proposed or is proposing that WMF should get into the general "searching" or to try to "be google". It's an interesting hypothetical which has not been part of any serious strategy proposal, nor even discussed at the board level, nor proposed to the board by staff, nor a part of any grant, etc. It's a total lie."
However, these statements are flatly contradicted by the now-released grant agreement between the WMF and the Knight Foundation. Quotes such as the following make it abundantly clear that what is envisioned under the terms of the grant is indeed a search engine:
- "Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet." (Page 1.)
- "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" (Page 2.)
- "Knowledge Engine by Wikipedia will democratize the discovery of media, news and information – it will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Today, commercial search engines dominate search-engine use of the Internet, and they're employing proprietary technologies to consolidate channels of access to the Internet's knowledge and information. Their algorithms obscure the way the Internet's information is collected and displayed. ... Knowledge Engine by Wikipedia will be the Internet's first transparent search engine, and the first one originated by the Wikimedia Foundation." (Page 10.)
- "Proceed with the search engine project as deliberately as possible – which is what the Wikimedia Foundation is doing" (Page 13.)
Three internal WMF documents illustrating how WMF thinking about the project evolved have been leaked to the Signpost:
- An "April 2 – FINAL – Knight Search Presentation – 04.02.15"
- A "June 24 Attachment 1 of 2 – Knowledge Engine by Wikipedia"
- An "August 2015 – WMF Submission to Knight"
We describe the documents in detail in this week's "In Focus". The earliest document, dated April 2, 2015, is a 12-slide presentation marked "FINAL". While the phrase "Knowledge Engine" does not appear, it's clear that even at this early stage, the "Wikipedia Search" referred to here was a well-developed concept. The presentation contrasts the ideals and motivations of commercial search engines – they "highlight paid results, track users' internet habits, sell information to marketing firms" – with those of "Wikipedia Search", which will be private, transparent, and globally representative. It repeatedly stresses that "No other search engines carry these ideals".
Several well-designed examples of search results follow, including the one pictured above. They prominently brand Wikipedia and feature multimedia content and multiple Wikimedia projects such as Wiktionary and Wikivoyage. The results include non-wiki sources like Fox News and Open Maps.
The June 24 document is a draft proposal for the project, by then referred to as the Knowledge Engine, which promises to be "a new global project that will once again change the way people access knowledge on the Internet", fully leveraging Wikipedia's and the WMF's resources, values, and reputation. The Knowledge Engine is described as "a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web" that "will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests". Knowledge Engine "will be the Internet’s first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation."
The proposal divides the plan into four stages, each lasting 16–18 months. Interestingly, the first stage is called Discovery, which is the term the WMF currently uses to refer to the Knowledge Engine project. The proposal asks for US$6M from the Knight Foundation over three years. It pledges $2.4M of the WMF's own resources to the project for the current fiscal year, including eight presumably full-time engineers and two data analysts.
The final document, dated August 5, 2015, resembles the publicly released current grant agreement in many ways, including much of the same language. The grant amount has dropped to its current $250,000, but this amount is only for the first Discovery phase of the larger Knowledge Engine project. Both the amount and its designation for phase one appear in the current grant agreement.
These documents raise significant questions about how much the Knowledge Engine has actually evolved since April 2015 and what the technical and social implications of this project will be for Wikimedia.
These questions are at the heart of the current debate regarding transparency, accountability, the relationship between the WMF and the Wikimedia community, and the uncertain direction of the Wikimedia movement.
An in-depth look at the newly revealed documents
This week's "special report" discusses three internal documents from the Wikimedia Foundation that shed light on the history of the Knowledge Engine project. Here, we examine each one in depth.
"April 2 – FINAL – Knight Search Presentation – 04.02.15"
Wikipedia, on the other hand, is characterised as follows:
“ | Wikipedia's Roots
No other search engines carry these ideals Wikipedia Search Originates
No other search engines carry these ideals Wikipedia Search is ... Trusted. Private. Open. Wikipedia Search Globally democratizes knowledge. |
” |
The presentation concludes with screen mock-ups of what a Wikipedia search engine could look like, highlighting content from Wikivoyage, Openmaps, Fox News, Wikipedia and Wikidata.
"June 24 Attachment 1 of 2 – Knowledge Engine by Wikipedia"
Marked "CONFIDENTIAL – DRAFT", this 11-page document addressed to the Knight Foundation has the headline "Knowledge Engine by Wikipedia: A Proposal from the Wikimedia Foundation".
After briefly describing the history and achievements of the Wikipedia project, the document states:
“ | The Wikimedia Foundation is embarking on a new global project that will once again change the way people access knowledge on the Internet. Knowledge Engine By Wikipedia is a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web, applying fundamentals of transparent Wiki-based systems to surfacing the most relevant and important information. Knowledge Engine By Wikipedia will democratize the discovery of media, news and information – it will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests. Our new site will be the Internet’s first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation.
The Problem The emergence of the Internet had promised massive democratization of content delivery. On the creation side, that promise has been largely fulfilled. Any person can easily add content to the enormous internet system. Simultaneously, as the availability of this information exploded, a few proprietary technologies began to consolidate channels of access to this data. This is accomplished through consolidation of access points into giant enterprises that today control user interfaces through device access, search, and media networks. The mechanisms by which the information on the internet is collected and displayed is largely obscured by proprietary algorithms. An exception to this pattern is Wikipedia. As a nonprofit, ad-free and collaboratively built site it has no incentives leveled upon the commercial systems. It is fully transparent in what information takes precedence, and how it is produced. It does not use personal data to market or sell to users or to optimize for ad revenue, and it prioritizes personal information security to avoid undue bias or censorship. In other words, it is aligned with user needs for transparency, clarity and trust. The Solution Knowledge Engine By Wikipedia will differ from commercial search engines in key areas:
Knowledge Engine By Wikipedia will surface important noncommercial results that are:
How Is It Different? The goal of today’s commercial engine is to give the user what they (or the interested party) think they want to know – the fact and data about a query: a medicine sold by a drug company, a movie ticket, or a most popular result. The knowledge engine of tomorrow will guide the user to discover what they need to know that is only available with a crowd-based knowledge engine: a new or alternative medicine producing better results at a lower price point, a book summary and source language and versions of the movies based on it, the most relevant result to the user’s area of exploration. Current engines rely on indexing and interlinking as the primary method for identifying and highlighting relevant results. In a world where data proliferation is rapid and unabiding, Wikipedia has a few advantages:
Our Knowledge Engine Will Be: Performance Based We are building a knowledge engine that has speed, open data, and relevance at its core. A new entry point to the sum of all knowledge, Knowledge Engine By Wikipedia has the responsiveness of commercial search engines and the ethos of Wikipedia and the Wikimedia Foundation. An Efficient Experience Quality is more important than quantity. The user doesn’t always need 10 or 20 or 200 results – they need the right set or even one result that provides a sufficient amount of knowledge with the contextual discovery to dig deeper. Still, in most searches, our knowledge engine will uncover a multitude of quality results, which should encourage a “down the rabbit-hole” discovery experience. The engine’s speed will bring consistency across the user interface, configuration options that adapt to users’ preferences, and an ease of experience that lets the user concentrate on the discovery task rather than the interface. Speed is crucial for global enablement but also for getting things done. Quickness and quality will be hallmarks of Knowledge Engine By Wikipedia. Openly Curated We are building a unique engine that sets us apart from commercial engines. Our knowledge engine leverages open data sources and champions an open understanding of where and how the results are calculated and curated. We have the unique opportunity to merge open knowledge graphs and data sources in a federated landscape. By combining human and machine curation, we are forming a holistic, user-centered model to drive our knowledge engine. A Multifaceted Tool Knowledge Engine By Wikipedia is much more than a search input – it’s like a collection of powerful apps and portals rolled into a singular interface and input.
We’re creating a tool where questions like “show me the progress of an event” display contextual maps and timelines, and where a query reveals multiple types of media and data displayed with charts and visualizations – all in a way that illustrates quicker and more completely than text alone. With Knowledge Engine By Wikipedia, the user instantly gets the context of a query in a larger perspective. From an Open Community We’re focused on creating resources and tools for an open knowledge-engine community, and building on the input of an advisory team. We will strengthen the Application Programming Interface and the resources around the knowledge engine to enable us and others to build, contribute to, and extend the engine. “Openness” – through curation, sourcing, and community – means everyone can contribute to Knowledge Engine By Wikipedia, and everyone can use the results and software without restrictions. It's what the Internet was meant to be and it’s what Wikipedia is, and what our knowledge engine will be, too. |
” |
This is followed by a set of screen mock-ups labeled "Trending", "Multimedia Content", "Smarter Answers" and "Nearby" and an outline of the four stages of the plan:
“ | The Plan in Four Stages
We anticipate each stage will take 16–18 months to develop and transition into the overlapping stages. The Discovery stage has already begun, and each stage has the potential to overlap with other stages.
|
” |
There follows a timeline graphic and a more detailed description of these four stages, each comprising an introductory paragraph followed by an average of half a dozen bullet points. The document concludes with the table of costs reproduced on page 9 of the Knowledge Engine grant agreement, appended to which is the following:
“ | If we see significant progress on the project during the first six months of the fiscal year (July–December 2015), we may petition the Wikimedia Foundation Board of Trustees for permission to seek and spend additional resources in support of the project.
Future Fiscal Years
We anticipate future years’ budgets to increase by 20% per year as we accelerate the growth of the program.
Projected future budgets
FY 16–17: $2,900,000
FY 17–18: $3,500,000
Request of the Knight Foundation
To support the project, we respectfully request $2 million per year for three fiscal years, which would make the Knight Foundation Knowledge Engine By Wikipedia's primary initial sponsor. The remaining initial support will come from the Wikimedia Foundation's general fund or from additional restricted grants. To identify other foundations that would support Knowledge Engine By Wikipedia, we welcome your suggestions and assistance. Thank you. |
” |
"August 2015 – WMF Submission to Knight"
The formal grant application, requesting a much reduced $250,000 from the Knight Foundation, summarizes the proposal as follows:
“ | Knowledge Engine By Wikipedia is a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web, applying fundamentals of transparent Wiki-based systems to surfacing the most relevant and important information.
The funds requested are in support of Stage One of this project. |
” |
The remainder of this document is largely reproduced on the latter pages of the grant agreement itself.
Another WMF departure
Siko Bouterse has announced that she will leave the Foundation on 25 February, the most recent in what has become a trend of departures from the organisation over the past few months. Bouterse joined the WMF nearly five years ago, first as head of community fellowships, then as head of grantmaking, with particular focus on the individual engagement grant scheme. She rose to be the director of community resources, responsible for some 13 staff. Bouterse is much respected among her staff and many members of the international community, and is regarded as a strong mentor. Among projects she has created or co-created are the IdeaLab on meta, the Inspire campaign to kickstart projects relevant to women, the WikiWomen's Collaborative, and the Teahouse, to help new editors on the English Wikipedia.
In her announcement to the WM mailing list, entitled Another goodbye, Bouterse wrote:
“ | Transparency, integrity, community and free knowledge remain deeply important to me, and I believe I will be better placed to represent those values in a volunteer capacity at this time. I am and will always remain a Wikimedian, so you'll still see me around the projects (User:Seeeko), hopefully with renewed energy and joy in volunteering. | ” |
Among others, former Board member Sam Klein wrote in the thread: "Hear, hear. You will always be tops in my book." SarahSV described Bouterse as "a strong supporter of women on Wikipedia and of improved community harmony." Sandra Rientjes, executive director of Wikimedia Netherlands, wrote: "It has been a real pleasure knowing you and working with you." Former English Wikipedia arbitrator Sydney Poore wrote: "Thank you for your enthusiasm and for being brave and bold in the way you support the community and staff. You will be missed."
The Signpost asked Bouterse by email to respond to the possible implication in her announcement that she no longer feels that the WMF’s direction or management are consistent with her personal or professional commitments to transparency, integrity, community, and free knowledge. We did not receive a reply by deadline.
Bouterse's portfolio continues to report to interim senior director of community engagement Maggie Dennis (User:Moonriddengirl), in whose abilities Bouterse wrote she has "full confidence". T
Brief notes
- Biennial Wikimania: Participants at a consultation on Meta supported experimenting with a new model for planning Wikimedia conferences where:
  - Wikimania will be held every other year (with 2018 as the first non-Wikimania year), and
  - In the years Wikimania is not held, regional and thematic conferences will receive increased support.
  - Building connections between different conferences will become increasingly important in order to enhance the values that have been identified. AK
- Steward elections: Editors are alerted to the annual election for Wikimedia stewards, which started 8 February and will continue until 28 February, with 13 candidates standing. In parallel, editors are invited to participate in the annual confirmation of existing stewards. Stewards are users with complete access to the Wikimedia interface on all of the Foundation's sites; this includes the ability to change user rights and groups, and responsibility for technical implementation of community consensus and for dealing with emergencies such as cross-project vandalism. However, except in an emergency or when action across multiple projects is needed, stewards generally do not exercise their powers in a project that has local users with the required rights. T
Jeb Bush swings at Wikipedia and connects
On February 8 in Nashua, New Hampshire, US presidential candidate Jeb Bush took a light-hearted swing at Wikipedia when he suspected the speaker introducing him at an event of having read his Wikipedia biography:
“ | Did you go on Wikipedia to get that introduction? Yeah? So I've got to tell my first story.
It's the story of being introduced by someone, and they were trying to find things they had in common – the guy gets up and says, You know, I'm tired of regular kind of introductions, so I went on Wikipedia to find if I had anything in common with our guest speaker. And he proceeded then to go into this pretty lengthy introduction that I was an avid rock climber and that I had a secret desire to be a movie star. Neither of which is completely true. I'm from Florida – I mean, we don't have rocks, like you all here have in the Granite State – and I have no interest in being a movie star – you can tell that by my candidacy, can't you? So I get up and I have to tell the guy the truth, I am not any of this stuff, and it turns out there are people – they're typically, they're probably unemployed kids with student debt that are stuck in their parents' basement, with Cheeto stains on their T-shirts, that haven't been able to get their first job, and so, what they do, they play games to see how long they can edit Wikipedia pages in order to have games with their friends all around the world. So my advice to you is, if you do have a Wikipedia page, check it once in a while, 'cause you too may be an avid rock climber, or want to be a Hollywood movie star. (video: 5:14 onwards) |
” |
Investigation shows that for three and a half years Wikipedia did indeed claim that Bush was an avid rock climber. The unsourced claim was introduced by an IP on 10 December 2008, and deleted on 27 July 2012 by an IP address belonging to Bush's Foundation for Florida's Future.
Many other questionable claims have been introduced to Bush's Wikipedia biography over the years; directly adjacent to the rock-climbing claim, for example, the article once briefly asserted that Bush "also, with his limited spare time, raises Wolf cubs and releases them into the wild." That passage lasted less than three hours before AlisonW deleted it as unsourced – while leaving the adjacent, equally unsourced rock-climbing claim in the article.
The incident was also mentioned in a Wall Street Journal piece.
Bush went on to come in fourth place in the New Hampshire Republican primary with 11 percent of the vote. In fairness to Jeb Bush, the likelihood that the insertion of the rock climbing claim was performed by a kid living in their parents' basement is > 0.
AK
In brief
- On the knowledge engine: The Register covers the recent fracas around the knowledge engine. (Feb. 12, Feb. 11) AK
- Putting Wales online: Wales Online writes about the work of Jason Evans (Jason.nlw), Wikipedian-in-Residence at The National Library of Wales. (Feb. 9) AK
- Longest-running hoax: The Business Insider and the i100 website, owned by The Independent, write about Wikipedia's longest-running hoax, Jack Robichaux, a fictitious 19th-century serial rapist in New Orleans. (Feb. 5, Feb. 4) AK
- Randomonium: In The Washington Post, Pulitzer Prize–winning journalist Gene Weingarten shares his enjoyment of Wikipedia: "I've been playing a new solitaire-like Internet game involving the Wikipedia "Random articles" option. With each click, the online encyclopedia randomly sends you to one of its millions of pages. I conclude that Wiki random is genuinely random, because it seems to make no effort to be interesting: Roughly 20 percent of the pages I was sent to were about species of moths. The object of the game is to keep doing this, rapidly, until you find a subject that you already know about, which is when the game ends; 23 tries is said to be average. I am currently at 40, and still going, but I’m not bummed out. It has taken me several hours because I'm savoring each site and diverting sideways to promising links." (Feb. 4) AK
- Scientific journal copies from Wikipedia: The Telegraph covers another example of an academic publication plagiarising Wikipedia and provides a round-up of earlier cases. (Feb. 2) AK
- On Wikipedia's US presidential campaign coverage: The New York Times looks at Wikipedia's coverage of the US presidential campaign. (Feb. 1) AK
- Black history: atlasobscura.com covers a black history edit-a-thon. (Jan. 30) AK
- Natasha Zouves: Peter Reynosa in The Huffington Post describes how he wrote the Wikipedia page for San Francisco anchor Natasha Zouves. (Jan. 27) AK
- Political photography: PetaPixel profiles the political photography of Gage Skidmore (Gage), whose high-quality Creative Commons images of politicians and other celebrities are ubiquitous on Wikipedia and Wikimedia Commons. Amazingly, PetaPixel omits all mention of Wikipedia or Wikimedia. (Jan. 26) G
This week's featured content
Text may be adapted from the respective articles and lists; see their page histories for attribution.
Featured articles
Three featured articles were promoted this week.
- Triturus (nominated by Tylototriton) is a genus comprising the crested and the marbled newts, which are found from Great Britain through most of continental Europe to westernmost Siberia, Anatolia, and the Caspian Sea region. They live and breed in vegetation-rich ponds or similar aquatic habitats for two to six months and usually spend the rest of the year in shady, protection-rich land habitats close to their breeding sites. Although not immediately threatened, the genus suffers from population declines.
- Jacob van Ruisdael (nominated by Edwininlondon) (circa 1629–1682) was a Dutch painter, draughtsman, and etcher, who is generally considered the pre-eminent landscape painter of the Dutch Golden Age. Ruisdael's work was in demand in the Dutch Republic during his lifetime, and today it is spread across private and institutional collections around the world (such as the National Gallery in London, the Rijksmuseum in Amsterdam and the Hermitage Museum in St. Petersburg). He shaped landscape painting traditions worldwide, from the English Romantics to the Barbizon School in France and the Hudson River School in the United States, and influenced generations of Dutch landscape artists.
- Liverpool F.C. is a Premier League football club based in Liverpool, England. Its history from 1959 to 1985 (nominated by NapHit) covers the period from the appointment of Bill Shankly as manager to the Heysel Stadium disaster and its aftermath. In 1962 the club regained its place in the Football League First Division, and it won the championship ten times during this period. It also won the FA Cup (2x), the League Cup (4x), the European Cup (4x) and the UEFA Cup (2x).
Featured lists
One featured list was promoted this week.
- Charlize Theron (born 1975) is a South African and American actress, producer and fashion model. Her filmography (nominated by Cowlibob) includes forty-three films (including Hollywood Confidential) and eight television episodes. Four of her films are in the post-production stage, while she is currently filming The Coldest City. Theron is also active as a producer, with six films and a television pilot.
Featured topics
One featured topic was promoted this week.
- The 2015 Vuelta a España (nominated by Relentlessly) was a three-week Grand Tour cycling race. The race was the 70th edition of the Vuelta a España and took place principally in Spain, although two stages took place partly or wholly in Andorra. This topic contains one featured article, one featured list and two good articles.
Featured pictures
Nine featured pictures were promoted this week.
- Promotional flyer for Lagu Kenangan (created by the Persari Film Corporation; restored and nominated by Crisco 1492)
- Your Motherland Will Never Forget (created by Joseph Simpson; restored and nominated by Adam Cuerden)
- Argentine escudo from 1828 (created by the United Provinces of the Rio de la Plata; nominated by Godot13)
A river of revilement
This week, some of the most hated men on Earth (or at least America, where most of our viewers live) line up to be collectively pelted with virtual rotten vegetables. In fairness, some on this list, such as its leader, Donald Trump, and his victorious opponent in the Iowa caucus, Ted Cruz, have almost as many followers as detractors, but you'd be hard-pressed to find anyone who admires Bernie Madoff, the Ponzi schemer who bilked billions before being sent to a well-deserved prison cell, or Martin Shkreli, who jacked the price of a vital AIDS medicine 5000% before being arrested for securities fraud. And then there's O. J. Simpson, about whom you, if you are over a certain age, will have a very definite personal opinion as to whether he brutally murdered his ex-wife.
For the full top-25 list, see WP:TOP25. See this section for an explanation of any exclusions. For a list of the most edited articles of the week, see here.
As prepared by Serendipodous, for the week of 31 January to 6 February 2016, the 10 most popular articles on Wikipedia, as determined from the report of the most viewed pages, were:
| Rank | Article | Views | Notes |
|---|---|---|---|
| 1 | Donald Trump | 3,083,806 | Donald Trump has so far sold his entire campaign on one word: win. He's the winner. Everyone else is a "loser". "We will have so much winning if I'm elected", he told a crowd in September, "that you may get bored with winning." Well, people certainly aren't bored yet because Trump failed to win his first test as a nominee, the Iowa caucus. And that lack of boredom may explain why he's number one on this list. |
| 2 | Iowa caucuses | 1,781,879 | Since 1972, the Iowa caucuses have marked the traditional beginning of the US Presidential primaries, in which the members of each of America's two main parties vote state by state to elect their nominee for President. This has struck a lot of people as mildly odd, since Iowa, with its 97% white, heavily Christian population, is not especially representative of the US as a whole, and since 2000 (a political generation ago) no Republican who has won there has gone on to win the party's nomination. |
| 3 | Zika virus | 1,436,593 | This unassuming flavivirus had, since its discovery in Uganda in 1947, been seen as meek when stood among its more formidable cousins, such as Dengue, Yellow fever and West Nile. Whereas those could often prove fatal, Zika symptoms mostly compared to a nasty case of flu. However, its sudden pandemic spread throughout the Americas has triggered a panic in the US, particularly after a potentially related spate of microcephalic childbirths in Brazil. |
| 4 | Bernie Sanders | 1,116,291 | The self-described democratic socialist has seen his numbers double since last week, and nearly triple those of his rival and ostensible vanquisher in the Iowa caucus, Hillary Clinton, who isn't even on this list. Wikipedia viewers, much like America as a whole it seems, have favoured outsiders like Bernie in this contest, but his hair's-breadth 0.3% loss to Clinton (equivalent to just 750 votes) has shot him to prominence as never before. While even some in his own party view his plans as quixotic at best and confrontational at worst, his idealism has proven catnip to disenchanted young voters. |
| 5 | Ted Cruz | 1,051,037 | Since 2000, the Iowa caucus's Republican vote has been won by a Christian conservative, and it was also in 2000 that said Christian conservative (in the form of George W. Bush) last went on to win the party's nomination. If Ted Cruz bucks that trend, there will be a collective gasp from the Capitol, since before running for President, the Texas senator had a reputation as one of the most loathed men in Washington, at least among his Senate colleagues. John McCain called him a "whacko bird"; John Boehner called him a "jackass", and even fellow Texan and former boss George W. Bush admitted, "I just don't like the guy". He spearheaded a highly unpopular government shutdown in an ultimately failed attempt to stop Obamacare; he has openly embraced organizations that call for the execution of homosexuals and abortion doctors, and he actively disbelieves in the existence of man-made climate change. And yet it is this very antipathy he has generated that seems to have energised his popularity among America's most conservative voters, particularly those of a Christian fundamentalist bent, as voters across the political spectrum turn in rage against "the Establishment." Which goes some way towards explaining how he won this year's Republican Iowa Caucus. Well, that and some skulduggery involving his staffers deliberately releasing false reports of Ben Carson dropping out. |
| 6 | O. J. Simpson | 1,019,172 | As if we didn't have enough to fret about, the scandal once thought safely tucked away in the 90s is back with a vengeance, thanks to, as with most things in this decade, popular entertainment. The former football player, Leslie Nielsen costar and alleged murderer got a doubtless undesired surge in the popular consciousness on 2 February when American Crime Story, the true-crime spinoff of American Horror Story chose his trial as the focus for their first season, which means we will likely be seeing him on this list for months to come. Predictably this has also led to a surge in tabloid media coverage, which has upped the trial's currency by connecting it to the Kardashians. |
| 7 | Groundhog Day | 957,194 | This idiosyncratic American not-really-holiday (I once tried to explain it to a Chinese exchange student in college and failed) fell, as it always does, on 2 February. Thanks to the movie, most people in the world probably think it involves doing the same thing over and over again, but they're wrong; that's an average workday. For the still-perplexed, let me explain: every year, on the second day of February, Americans watch a groundhog, which is a large, potbellied marmot, emerge from its burrow. If it sees its shadow, it goes back in; if it doesn't, it comes out. Coming out heralds an early spring; staying in means six more weeks of winter. The custom is strongest in Pennsylvania, where it originated, and particularly Punxsutawney, home of the world's most famous groundhog, Punxsutawney Phil, who speaks his forecast in Groundhogese into the ear of the chairman of the Groundhog Club Inner Circle, who then translates for the audience. No I did not make that up. |
| 8 | Bernard Madoff | 908,819 | Television again raises a scandal from slumber this week, as the man who made off with tens of billions in a Ponzi scheme that cost the fortunes of, among many others, Kevin Bacon got a biopic miniseries from ABC. Like any good actor, Richard Dreyfuss, who portrayed him in the bio, tried to find some way to sympathise with the man and his actions, but ultimately, could not. “I started out thinking he was an inexcusable monster, [and] that’s the only conclusion,” he told Forbes. “I have no desire to find sympathy. His ability to inflict pain on others was unbelievable.” |
| 9 | Martin Shkreli | 833,443 | On 4 February, the former CEO of Turing Pharmaceuticals was called in to testify before the US Congress, after having been arrested by the FBI in December on charges of securities fraud. The 32-year-old Shkreli was already the prime target for current dissatisfaction with corporate greed after Turing obtained the manufacturing license for an antiparasitic drug and jacked up the price by over five thousand percent. His behaviour at the hearing, described by The New Yorker as "nothing but theater", only increased the vitriol being spewed at him from all sides. Commentators mocked his repeated use of the phrase, “On the advice of counsel, I invoke my Fifth Amendment privilege against self-incrimination and respectfully decline to answer your question,” even when asked to reaffirm the name of the Wu-Tang Clan. |
| 10 | Frederick Douglass | 804,187 | Thank you, Google Doodle, for capping this parade of unrepentant reprobates with someone genuinely admirable and heroic. The former slave whose oratory and literary skills lit a flame under the cause of abolition and also undermined the slavers' claim that the Negro could never attain the intellectual level necessary for free thought received deserved recognition on his 198th birthday. We don't know exactly when Douglass was born, but it was in February, and so Google granted him the first of the month. |
Wikimedia Foundation removes The Diary of Anne Frank due to copyright law requirements
In an unfortunate example of the overreach of the United States’ current copyright law, the Wikimedia Foundation has removed the Dutch-language text of The Diary of a Young Girl—more commonly known in English as The Diary of Anne Frank—from Wikisource.[1]
We took this action to comply with the US Digital Millennium Copyright Act (DMCA), as we believe the diary is still under US copyright protection under the law as it is currently written. Nevertheless, our removal serves as an excellent example of why the law should be changed to prevent repeated extensions of copyright terms, an issue that has plagued our communities for years.
What prompted us to remove the diary?
The deletion was required because the Foundation is under the jurisdiction of US law and is therefore subject to the DMCA, specifically title 17, ch. 5, s. 512 of the United States Code. As we noted in 2013, “The location of the servers, incorporation, and headquarters are just three of many factors that establish US jurisdiction ... if infringing content is linked to or embedded in Wikimedia projects, then the Foundation may still be subject to liability for such use—either as a direct or contributory infringer.”
Based on email discussions sent to the WMF at legal[at]wikimedia.org, we determined that the Foundation had either “actual knowledge” (clause (i) of the statute quoted below) or what is commonly called “red flag knowledge” (clause (ii) below) that the Anne Frank text was hosted on Wikisource and is under copyright. In the statute, a service provider is protected by the DMCA only when it:
(i) has no actual knowledge that the material or an activity using the material on the system or network is infringing; or
(ii) in the absence of such actual knowledge, is unaware of facts or circumstances from which infringing activity is apparent.
Further conditions can apply when a proper DMCA takedown notice is served.
Of particular concern, the US 9th Circuit Court of Appeals stated in UMG Recordings, Inc. v. Shelter Capital Partners LLC that in circumstances where a hosting provider (such as the WMF) is informed by a third party (such as an unrelated user) about infringing copyrighted content, that would likely constitute either actual or red-flag knowledge under the DMCA.
We believe, based on the detail and specificity contained in the emails we received, that we had actual knowledge sufficient for the DMCA to require us to perform a takedown even in the absence of a demand letter.
How is the diary still copyrighted?
You may wonder why or how the Anne Frank text is copyrighted at all, as Anne Frank died in February 1945. With 70 years having passed since her death, the text may have passed into the public domain on January 1, 2016, in the Netherlands, where it was first published, although there is still some dispute about this.
However, in the US, the Anne Frank original text will be under copyright until 2042. This is the result of several factors coming together, and the English Wikipedia has actually covered this issue with a multi-part test on its non-US copyrights content guideline.
In short, three major laws together keep the diary under copyright:
- In general, the US copyright for works published before 1978 is 95 years from date of publication. This came about because copyrights in the US were originally for 28 years, with the ability to then extend that for a second 28 years (making a total of 56). Starting with the 1976 Copyright Act and extending to several more acts, the renewal became automatic and was extended. Today, the total term of works published before 1978 is 95 years from date of publication.
- Foreign works of countries that are treaty partners to the US are covered as if they were US works.
- Even if a country was not a treaty partner under copyright law at the time of a publication, the 1994 Uruguay Round Agreements Act (URAA) restored copyright to works that:
- had been published in a foreign country
- were still under copyright in that country in 1996
- and would have had US copyright but for the fact they were published abroad.
Court challenges to the URAA have all failed, with the most notable (Golan v. Holder) resulting in a Supreme Court ruling that upheld the URAA.
What that means for Anne Frank's diary is regrettably simple: no matter how it wound up in the US and regardless of what formal copyright notices its publishers used, the US grants it copyright until the year 2042, or 95 years after its original publication in 1947.
Under current copyright law, this remains true regardless of its copyright status anywhere else in the world and regardless of whether it may previously have been in the public domain in the US.
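For readers who want to check the arithmetic, the reasoning above can be written out as a short Python sketch. It is only an illustration of the simplified rules as this article describes them, not legal advice or a complete statement of US copyright law. The `ForeignWork` record, its field names, and the two functions are hypothetical names invented for this example, and the flags set for the diary are assumed inputs that follow the article's account.

```python
from dataclasses import dataclass

# Term the article gives for works published before 1978.
PRE_1978_TERM_YEARS = 95


@dataclass(frozen=True)
class ForeignWork:
    """Hypothetical record of the facts the article's test relies on."""
    title: str
    publication_year: int
    published_abroad: bool
    copyrighted_at_home_in_1996: bool
    eligible_but_for_foreign_publication: bool


def uraa_restores(work: ForeignWork) -> bool:
    """True when all three URAA conditions listed in the article hold."""
    return (
        work.published_abroad
        and work.copyrighted_at_home_in_1996
        and work.eligible_but_for_foreign_publication
    )


def us_copyright_until(work: ForeignWork) -> int:
    """Last year of US copyright under the article's 95-year rule."""
    if work.publication_year >= 1978:
        raise ValueError("the 95-year rule described here covers pre-1978 works only")
    return work.publication_year + PRE_1978_TERM_YEARS


# Assumed inputs, taken from the article's account of the diary.
diary = ForeignWork(
    title="The Diary of a Young Girl",
    publication_year=1947,
    published_abroad=True,
    copyrighted_at_home_in_1996=True,
    eligible_but_for_foreign_publication=True,
)

if uraa_restores(diary):
    print(f"{diary.title}: US copyright until {us_copyright_until(diary)}")
```

Run as written, the sketch prints 2042 for the diary, matching the 1947 publication date plus the 95-year term.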
- Jacob Rogers is a Legal Counsel at the Wikimedia Foundation; he thanks Anisha Mangalick, Legal Fellow, for her assistance in this matter.
- ^ The diary text was originally located on the Dutch-language Wikisource.