
Wikipedia:Wikipedia Signpost/2016-02-10/Special report

Special report

New internal documents raise questions about the origins of the Knowledge Engine


  • The Discovery FAQ on MediaWiki states that "We are not building Google. We are improving the existing CirrusSearch infrastructure with better relevance, multi-language, multi-projects search and incorporating new data sources for our projects. We want a relevant and consistent experience for users across searches for both wikipedia.org and our project sites."
  • In a November 4 email to all WMF staff, provided to the Signpost by several WMF staffers, executive director Lila Tretikov expressly stated that the Knowledge Engine "is NOT ... a search engine".
  • Just hours before the release of the grant agreement, Jimmy Wales was even more blunt: "To make this very clear: no one in top positions has proposed or is proposing that WMF should get into the general "searching" or to try to "be google". It's an interesting hypothetical which has not been part of any serious strategy proposal, nor even discussed at the board level, nor proposed to the board by staff, nor a part of any grant, etc. It's a total lie."
  • However, these statements are flatly contradicted by the now-released grant agreement between the WMF and the Knight Foundation. Quotes such as the following make it abundantly clear that what is envisioned under the terms of the grant is indeed a search engine:

  • "Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet." (Page 1.)
  • "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" (Page 2.)
  • "Knowledge Engine by Wikipedia will democratize the discovery of media, news and information – it will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Today, commercial search engines dominate search-engine use of the Internet, and they're employing proprietary technologies to consolidate channels of access to the Internet's knowledge and information. Their algorithms obscure the way the Internet's information is collected and displayed. ... Knowledge Engine by Wikipedia will be the Internet's first transparent search engine, and the first one originated by the Wikimedia Foundation." (Page 10.)
  • "Proceed with the search engine project as deliberately as possible – which is what the Wikimedia Foundation is doing" (Page 13.)
  • Three internal WMF documents illustrating how the WMF's thinking about the project evolved have been leaked to the Signpost:

    • An "April 2 – FINAL – Knight Search Presentation – 04.02.15"
    • A "June 24 Attachment 1 of 2 – Knowledge Engine by Wikipedia"
    • An "August 2015 – WMF Submission to Knight"
    While Heilman did not provide these documents to the Signpost, he confirmed their authenticity and stated that these were the same documents that were released to the entire Board following pressure from him and fellow Board member Dariusz Jemielniak, in the face of reluctance from other Board members and Tretikov. He told the Signpost that after "other board members told us we did not need to see" them "we pushed hard to have these documents released to the Board."

    We describe the documents in detail in this week's "In Focus". The earliest document, dated April 2, 2015, is a 12-slide presentation marked "FINAL". While the phrase "Knowledge Engine" does not appear, it's clear that even at this early stage, the "Wikipedia Search" referred to here was a well-developed concept. The presentation contrasts the ideals and motivations of commercial search engines – they "highlight paid results, track users' internet habits, sell information to marketing firms" – with those of "Wikipedia Search", which will be private, transparent, and globally representative. It repeatedly stresses that "No other search engines carry these ideals".

    Several well-designed examples of search results follow, including the one pictured above. They prominently brand Wikipedia and feature multimedia content and multiple Wikimedia projects such as Wiktionary and Wikivoyage. The results include non-wiki sources like Fox News and Open Maps.

    The June 24 document is a draft proposal for the project, by then referred to as the Knowledge Engine, which promises to be "a new global project that will once again change the way people access knowledge on the Internet", fully leveraging Wikipedia's and the WMF's resources, values, and reputation. The Knowledge Engine is described as "a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web" that "will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests". Knowledge Engine "will be the Internet’s first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation."

    The proposal divides the plan into four stages, each lasting 16–18 months. Interestingly, the first stage is called Discovery, which is the term the WMF currently uses to refer to the Knowledge Engine project. The proposal asks for US$6M from the Knight Foundation over three years. It pledges $2.4M of the WMF's own resources to the project for the current fiscal year, including eight engineers (presumably full-time) and two data analysts.

    The final document, dated August 5, 2015, resembles the publicly released current grant agreement in many ways, including much of the same language. The grant amount has dropped to its current $250,000, but this amount is only for the first Discovery phase of the larger Knowledge Engine project. Both the amount and its designation for phase one appear in the current grant agreement.

    These documents raise significant questions about how much the Knowledge Engine concept has actually changed since April 2015, and about what the technical and social implications of the project will be for Wikimedia.

    These questions are at the heart of the current debate over transparency, accountability, the relationship between the WMF and the Wikimedia community, and the uncertain direction of the Wikimedia movement.