
Wikipedia:Verification proposal

From Wikipedia, the free encyclopedia

Proposal: Academically verified page subset – solutions to the problem of verified information via institutional sources.

In its short history, Wikipedia has grown into one of the premier sources of knowledge on perhaps the widest range of topics ever assembled. It stands today as the largest encyclopedic source in history. Its power comes from the contributions of a worldwide populace working under open controls and a flexible but maintainable hierarchy. The main criticism of the project comes from the academic community, which views the lack of peer review and verification as a mark against the project’s veracity and credibility.

However, the project has garnered much acclaim and has radically shifted the standard model of information gathering. While articles may be contributed by people without accredited backgrounds, the information presented is often correct and, when the author is diligent in their research, exemplary. In short, the Wikipedia model is a good one, despite weaknesses that empower those who wish to disrupt it or to insert biased, emotionally charged, or vandalizing entries.

Goal

The goal of increasing Wikipedia’s academic standing should be to enhance the project, not restrict it. The current model works. Changing it would hinder the contributing community and undermine the project. For that reason, any engineering effort should add features without limiting those already in existence. At the same time, it should bridge the Wikipedia community and the academic community, giving Wikipedia academic credibility without limiting the power of the established public. A further goal is to keep project costs down so that the information remains available to all.

Proposal

Several fallacies of logic commonly appear when a system like this is designed. One is the false dilemma: the belief that only a limited number of options (often two) are available and correct. In this case, we could erroneously believe that academically verified information must stand apart from other sources. On that view there are only two choices. Either a separate edition of Wikipedia containing academically verified information (the one-best-way or one-smart-guy model) must run on a system separate from the publicly contributed one, or a co-opted hybrid must be created that combines the academically accepted one-best-way model with communal contribution (the Wikipedia ideal model), with the one-best-way model superseding the Wikipedia ideal. In a co-opted model, the academic would be able to shut the public out of contributing to the information pool. That model leads to a stagnation of information.

Before we continue, let’s look at the relative strengths and weaknesses of the two accepted models:

Wikipedia Ideal (information contribution is open and available for comment):

Strength:

  • Timely information
  • “Living” information that grows and contracts as necessary
  • Limited controls with community accountability
  • Numerous subjects not addressed by academic fields
  • Publicly affordable (free)
  • Overall quality acceptable if not exceptional
  • Excellent starting point for research

Weakness:

  • Information is debatable
  • Veracity of information can be questionable (but not always)
  • Depth of information sporadic
  • Presentation of information not uniform and sometimes incoherent

One-best-way (Modern Encyclopedic Method – Scholarly input with fact checking and peer review):

Strength:

  • Information is held to rigorous standards (high quality)
  • Academically accepted
  • Information verified

Weakness:

  • Slow process
  • Limited subjects
  • “Embalmed” information (information becomes out of date with changes in the subject environment)
  • Costly (compensation normally needed for contributing sources)

There is another hybrid that achieves the goals stated above and preserves both models in full while addressing some of the weaknesses of each. This is the co-existence model, in which both methodologies exist in the same system.

Co-existence Hybrid Encyclopedic Model

Step one: Preserve the current Wikipedia methodology. In other words, DO NOT CHANGE THE OPERATING MODEL OF THE CURRENT BASE OF INFORMATION OR OF FUTURE CONTRIBUTIONS BY THE PUBLIC. That is not to say the model should not be enhanced as necessary; the current Wikipedia ideal allows for necessary adaptation. However, don’t throw out what appears to be a working system.

Step two: Create a subset of the information that is academically accepted. In other words, within the methodology and software of Wikipedia, add the ability to publish academic information that only academics can contribute. To do this, create a field in the Wikipedia database that marks a page as an academic commentary. Make this a separate page that exists as a subset of the publicly accessible page, and place a button marked “academic contribution” at the top of the public page. On the “academic contribution” page, place a button that takes the reader back to the “public information” page. Design the page to appeal to academics: list each member’s credentials, such as degree and verified area of expertise. This provides a check when, say, someone with a degree in engineering tries to edit a political entry.
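As a rough illustration of the marker field and the two linking buttons described in step two, the following Python sketch uses an in-memory SQLite database. The table layout, the column names (is_academic, companion_id) and the helper functions are assumptions made for this example only; they do not describe Wikipedia’s actual schema or software.

```python
import sqlite3

# Hypothetical schema for illustration; not Wikipedia's real database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE page (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        is_academic INTEGER NOT NULL DEFAULT 0,   -- the proposed marker field
        companion_id INTEGER REFERENCES page(id), -- page that the button links to
        body TEXT NOT NULL
    )
    """
)


def add_public_page(title, body):
    """Insert an ordinary, publicly editable page and return its id."""
    cur = conn.execute("INSERT INTO page (title, body) VALUES (?, ?)", (title, body))
    return cur.lastrowid


def add_academic_page(public_id, body):
    """Create the academic page and link it to its public page in both directions."""
    title = conn.execute(
        "SELECT title FROM page WHERE id = ?", (public_id,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO page (title, is_academic, companion_id, body) VALUES (?, 1, ?, ?)",
        (title, public_id, body),
    )
    conn.execute(
        "UPDATE page SET companion_id = ? WHERE id = ?", (cur.lastrowid, public_id)
    )
    return cur.lastrowid


def companion_button(page_id):
    """Return (button label, target page id), or None if the page has no companion."""
    is_academic, companion_id = conn.execute(
        "SELECT is_academic, companion_id FROM page WHERE id = ?", (page_id,)
    ).fetchone()
    if companion_id is None:
        return None
    label = "public information" if is_academic else "academic contribution"
    return label, companion_id


public_id = add_public_page("Photosynthesis", "Publicly edited text.")
academic_id = add_academic_page(public_id, "Academically reviewed text.")
```

In this sketch the public page is never modified apart from gaining a link, which matches the requirement in step one that the public model stay unchanged.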

Step two and a half (optional): Design an academic page template. While this is an optional implementation, providing a page that meets academic standards and encourages academic contributions would help in gaining acceptance. Academics pride themselves on the quality of their information and are required to publish in order to maintain their academic standing. Automatically listing a contributor’s body of work in a bibliography frame, and allowing academics to upload papers related to their subjects, would greatly encourage them to contribute to a free source. In other words, providing them with a subtle means of advertising not only gives authors an incentive to join the Wikipedia community but also strengthens the repository of knowledge. It is recommended that uploads carry a disclaimer stating that all material must be contributed in compliance with Title 17 of the U.S. Code (copyright law) to avoid copyright infringement. (Please see the technical section below for proposed methods of handling and funding the disk resources for file uploads to the academic portion of Wikipedia.) Also, once the bibliography section (including ISBNs) is added, link each ISBN to Amazon.com or another bookselling site so that the reader can purchase the published work.
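The ISBN linking suggested above could be sketched in Python roughly as follows. The check-digit rules are the standard ISBN-10 and ISBN-13 ones, but the base URL is a placeholder invented for this example; a real deployment would substitute whatever lookup URL the chosen bookseller actually provides.

```python
from urllib.parse import quote

# Placeholder address (assumption): not a real bookseller endpoint.
BOOKSELLER_BASE = "https://bookseller.example/isbn/"


def normalize_isbn(raw):
    """Strip hyphens and spaces so that only the ISBN characters remain."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(isbn):
    """ISBN-13: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn10(isbn):
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11."""
    if len(isbn) != 10 or not isbn.isascii():
        return False
    head, last = isbn[:9], isbn[9]
    if not head.isdigit() or not (last.isdigit() or last == "X"):
        return False
    digits = [int(c) for c in head] + [10 if last == "X" else int(last)]
    total = sum(d * (10 - i) for i, d in enumerate(digits))
    return total % 11 == 0


def isbn_link(raw):
    """Return a purchase link for a valid ISBN, or None if the ISBN is malformed."""
    isbn = normalize_isbn(raw)
    if is_valid_isbn13(isbn) or is_valid_isbn10(isbn):
        return BOOKSELLER_BASE + quote(isbn)
    return None
```

Validating the check digit before building the link keeps mistyped bibliography entries from producing dead links.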

Step three: Create an academic sub-community rights group. Create a rights group whose members have control over academic pages. Membership in this group would be limited to individuals with academic credentials. To reduce overhead, work with local colleges and universities to give institutions the right to add members. Let them supply membership verification and rescind rights. Make sure that top-level Wikipedia members retain the ability to arbitrate membership, so that complete control is not handed to the academic community. This shifts responsibility and day-to-day control to institutions while leaving Wikipedia as the final authority on what gets added. The model for this is the web of trust: Wikipedia extends trust to institutions, which in turn trust their members. The computer security community uses the concept often, and it is a great model.
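The delegation chain in step three can be sketched as a small Python model. All class, method and field names here are hypothetical, and the expertise check reflects the credential check mentioned in step two; this shows one way the rules could fit together, not a description of any existing rights system.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    institution: str
    expertise: set  # verified areas of expertise, e.g. {"engineering"}


@dataclass
class TrustRegistry:
    """Wikipedia trusts institutions; institutions vouch for their own members."""

    trusted_institutions: set = field(default_factory=set)
    members: dict = field(default_factory=dict)   # member name -> Member
    suspended: set = field(default_factory=set)   # set by Wikipedia arbitration

    # Actions reserved for top-level Wikipedia members
    def trust_institution(self, institution):
        self.trusted_institutions.add(institution)

    def revoke_institution(self, institution):
        self.trusted_institutions.discard(institution)

    def arbitrate_suspend(self, name):
        self.suspended.add(name)

    # Actions delegated to institutions
    def add_member(self, institution, name, expertise):
        if institution not in self.trusted_institutions:
            raise PermissionError(f"{institution} is not trusted to add members")
        self.members[name] = Member(name, institution, set(expertise))

    def rescind_member(self, institution, name):
        member = self.members.get(name)
        if member is not None and member.institution == institution:
            del self.members[name]

    # The check applied before an edit to an academic page
    def can_edit(self, name, page_subject):
        member = self.members.get(name)
        return (
            member is not None
            and name not in self.suspended
            and member.institution in self.trusted_institutions
            and page_subject in member.expertise
        )
```

Because can_edit consults the institution’s trust status on every call, revoking an institution immediately removes edit rights from everyone it vouched for, which keeps Wikipedia as the final authority.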

Step four: Academic sub-search. With the fields in the Wikipedia index established, a limiting search option that returns only the academic pages becomes possible. It also enables the Wikipedia team to create sub-indexes for faster searches (see the technical section below). Public contributors need not fear usurpation of their information, since the “public information” button provides a link back to their contributions.
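A limiting search of this kind might look like the following Python sketch, again using an in-memory SQLite database. The table, the is_academic column and the sample rows are assumptions for illustration. The partial index is one possible form a “sub-index” could take; whether the database engine actually uses it for a given query depends on the engine and the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "title TEXT NOT NULL, "
    "is_academic INTEGER NOT NULL DEFAULT 0, "
    "body TEXT NOT NULL)"
)
# A partial index that covers only the academic rows (illustrative sub-index).
conn.execute("CREATE INDEX academic_title_idx ON page (title) WHERE is_academic = 1")

conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        ("Photosynthesis", 0, "Publicly edited text."),
        ("Photosynthesis", 1, "Academically reviewed text."),
        ("Pokemon", 0, "Publicly edited text."),
    ],
)


def search(term, academic_only=False):
    """Return (title, is_academic) rows whose title contains the search term."""
    sql = "SELECT title, is_academic FROM page WHERE title LIKE ?"
    if academic_only:
        sql += " AND is_academic = 1"  # the limiting search option
    sql += " ORDER BY title, is_academic"
    return conn.execute(sql, (f"%{term}%",)).fetchall()
```

The default search still returns public and academic pages together, so the limiting option adds a capability without taking anything away from the existing search.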

Technical

The author of this proposal is at a disadvantage. He does not know the topology of the implemented Wikipedia system and has no insight into the resources of the Wikipedia organization. All of the following is therefore speculative. However, based on publicly available information about the wiki and on his knowledge of systems and network design, he will speculate on possible ways of implementing the proposed addition.

In computer science terms, adding fields to the wiki’s back end (hopefully MySQL or some other database) is trivial. Creating the rights group may not be, since the wiki was designed as an open, trust-based system. However, with a little work the wiki could do what is described above.

The real question is how to store the information, and the Wikipedia team will have to decide the best way to manage resources. One option is to dump all index entries carrying an academic tag to a separate database for searching and retrieval. This is trivial and would provide faster searches of academic sources. Another idea is to clone only the academic entries onto a separate system. In practice it may be best to keep the public Wikipedia back end separate from the academic back end. The two-system approach keeps resources separate and so allows two funding sources (the current Wikipedia funding source for the public side and academic funding for the academic side).

With two systems, however, the same search may yield different results depending on where it is run. A search on the public side will return only matches contained in the public database, while a search on the academic side will return only those in the academic database. Ideally, a search would return any page, public or academic, with a matching entry. A total search of Wikipedia covering both sources would have to fork and run on both systems. Anyone familiar with symmetric multiprocessing will recognize that this is tricky to implement, because the two sets of results are reported at different times. Moreover, the results would need to be concatenated.

An alternative is a master database index that merges the indexes of both systems. Again, implementing this is not trivial. It is possible, however, as long as the namespace across both machines is maintained. This can be done while merging the indexes by making sure that academic page names (assuming the page name is the index key) carry an automatically added character such as * or some character outside the alphanumeric display range. Separating the indexes should nonetheless reduce search time significantly for the different types of information.
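The forked search and the merged master index can both be sketched in a few lines of Python. The two dictionaries stand in for the separate back ends, and every name below is hypothetical. The * suffix follows the suggestion above; it assumes that no public page title already ends in that character.

```python
from concurrent.futures import ThreadPoolExecutor

ACADEMIC_MARK = "*"  # suffix that keeps academic names distinct in a shared namespace

# Stand-ins for the two back ends; real systems would be separate databases.
PUBLIC_INDEX = {"Photosynthesis": "public text", "Pokemon": "public text"}
ACADEMIC_INDEX = {"Photosynthesis": "reviewed text"}


def search_public(term):
    """Search the public back end by title."""
    return [title for title in PUBLIC_INDEX if term.lower() in title.lower()]


def search_academic(term):
    """Search the academic back end and mark each hit as academic."""
    return [
        title + ACADEMIC_MARK
        for title in ACADEMIC_INDEX
        if term.lower() in title.lower()
    ]


def search_all(term):
    """Fork the query to both back ends, wait for both, then merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        public_future = pool.submit(search_public, term)
        academic_future = pool.submit(search_academic, term)
        # result() blocks until that back end has answered, so the two
        # different arrival times are reconciled before merging.
        public_hits = public_future.result()
        academic_hits = academic_future.result()
    return sorted(public_hits + academic_hits)


def build_master_index():
    """Merge both indexes into one, using the suffix to avoid key collisions."""
    merged = dict(PUBLIC_INDEX)
    for title, body in ACADEMIC_INDEX.items():
        merged[title + ACADEMIC_MARK] = body
    return merged
```

Waiting on both futures before merging is the simplest way to handle results that arrive at different times. A production system would also need timeouts and a policy for the case where one back end is unavailable.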

The proposed method of handling academic files such as papers is to create a disk storage array dedicated to storing them. It is suggested that Wikipedia solicit funding for such an endeavor. Doing so will also give institutions a stake in the project.

Devin Cambridge is a systems engineer with over 10 years of experience. He is an expert in information security and high-performance computing, having worked for over 7 years for SGI/Cray Research. He is also a member of the International Information Systems Security Certification Consortium, (ISC)², and is CISSP certified.