
User:Moudy83/conference papers


Conference presentations and papers

See also: Wikimania and WikiSym conference series
This table is sortable.
Columns: Authors | Title | Conference / published in | Year | Online | Notes | Abstract | Keywords
Choi, Key-Sun IT Ontology and Semantic Technology International Conference on Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. 2007
IT (information technology) ontology is to be used both for analyzing information technology and for enhancing it. Semantic technology is compared with syntactic technology; ontology provides the backbone for the meaning-centered reconfiguration of syntactic structure, which is one aspect of semantic technology. The purposes of an IT ontology fall into two categories: to capture the right information and services for user requests, and to give insights into the future of IT and its possible paths by interlinking relations on component classes and instances. Consider ontology-based question answering as a way to improve QA performance. Each question type (e.g., 5W1H) seeks its specific relation from an ontology that has already been acquired from the relevant information resources (e.g., Wikipedia or news articles). The question is whether such relations and related classes are neutral, independent of domain, or are affected by each specific domain. The first step of ontology learning for a question-answering application is to find such a neutral relation-discovery mechanism and to handle distorted relation-instance mappings when populating the ontology from domain resources. We then consider domain ontology acquisition in a top-down manner from existing similar resources (e.g., domain-specific thesauri) and also in a bottom-up manner from the relevant resources. However, the existing resources must be checked against the currently available resources for coverage. The problem is that a thesaurus comprises classes, not the instances of terms that appear in corpora; such classes have little coverage over the resources, and at this stage the mapping between classes and instances has not yet been established. Clustering technology can filter out irrelevant mappings, and the clustering features can be made more accurate by using the semantic features accumulated during these steps. For example, a pattern-based discovery process can evolve by feeding the discovered semantic features back into the patterns. Keeping ontology use for question answering in mind, we ask how well the acquired ontology can represent the resources used in the acquisition process. Two derived questions follow: (1) how such an ideal, complete ontology could be generated for each specification of use, and (2) how much the ontology contributes to the intended problem solving. The ideal case is to convert all resources into their corresponding ontology. But if we presuppose a gap between the meaning of the resources and the acquired ontology, a set of raw chunks from the resources may still be effective for answering given questions, with some help from the acquired ontology or even without resorting to it. Definitions of classes and relations in the ontology would be manifested through a dual structure that supplements the complementary factors between an idealized, complete, noise-free ontology and incomplete, error-prone knowledge. As a result, we confront two problems: how to measure ontology effectiveness for each situation, and how to compare uses of the ontology across applications and transform it into another shape of ontology depending on the application, which could be helped by granularity control and even extended to a reconfiguration of the knowledge structure. In the end, the intended IT ontology is modularized enough to be adapted later for each purpose of use, in efficient and effective ways. We still have to solve definition questions and their translation into ontology forms.
Paci, Giulio; Pedrazzi, Giorgio & Turra, Roberta Wikipedia based semantic metadata annotation of audio transcripts International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th 2010
A method to automatically annotate video items with semantic metadata is presented. The method has been developed in the context of the Papyrus project to annotate documentary-like broadcast videos with a set of relevant keywords, using automatic speech recognition (ASR) transcripts as a primary complementary resource. The task is complicated by the high word error rate (WER) of the ASR for this kind of video. For this reason, a novel relevance criterion based on domain information is proposed. Wikipedia is used both as a source of metadata and as a linguistic resource for disambiguating keywords and for eliminating out-of-topic/out-of-domain keywords. Documents are annotated with relevant links to Wikipedia pages, concept definitions, synonyms, translations and concept categories.
Shachaf, P.; Hara, N.; Herring, S.; Callahan, E.; Solomon, P.; Stvilia, B. & Matei, S. Global perspective on Wikipedia research Proceedings of the American Society for Information Science and Technology 2008 [1]
This panel will provide a global perspective on Wikipedia research. The literature on Wikipedia is mostly anecdotal, and most of the research has focused attention primarily on the English Wikipedia, examining the accuracy of entries compared to established online encyclopedias (Emigh & Herring, 2005; Giles, 2005; Rosenzweig, 2006) and analyzing the evolution of articles over time (Viégas, Wattenberg, & Dave, 2004; Viégas, Wattenberg, Kriss, & van Ham, 2007). Others have examined the quality of contribution (Stvilia et al., 2005). However, only a few studies have conducted comparative analyses across languages or analyzed Wikipedia in languages other than English (e.g., Pfeil, Zaphiris, & Ang, 2006). There is a need for international, cross-cultural understanding of Wikipedia. In an effort to address this gap, this panel will present a range of international and cross-cultural research of Wikipedia. The presenters will contribute different perspectives of Wikipedia as an international sociocultural institution and will describe similarities and differences across various national/language versions of Wikipedia. Shachaf and Hara will present variations in norms and behaviors on talk pages in various languages of Wikipedia. Herring and Callahan will share results from a cross-language comparison of biographical entries that exhibit variations in content of entries in the English and Polish versions of Wikipedia and will explain how they are influenced by the culture and history of the US and Poland. Stvilia will discuss some of the commonalities and variability of quality models used by different Wikipedias, and the problems of cross-language quality measurement aggregation and reasoning. Matei will describe the social structuration and distribution of roles and efforts in wiki teaching environments. Solomon's comments, as a discussant, will focus on how these comparative insights provide evidence of the ways in which an evolving institution, such as Wikipedia, may be a force for supporting cultural identity (or not).
Schumann, E. T.; Brunner, L.; Schulz, K. U. & Ringlstetter, C. A semantic interface for post secondary education programs Proceedings of the American Society for Information Science and Technology 2008 [2]
We describe a prototype for a multilingual semantic interface to the academic programs of a university. Navigating within a graph model of the academic disciplines and fields, the users are led to course and program documents. For core academic concepts, informational support is provided by language-specific links to Wikipedia. The web-based prototype is currently being evaluated in a user study.
Ueda, H. & Murakami, H. Suggesting Japanese subject headings using web information resources Proceedings of the American Society for Information Science and Technology 2006 [3]
We propose a method that suggests BSH4 (Japan Library Association, 1999) subject headings according to user queries when pattern-matching algorithms fail to produce a hit. As user queries are diverse and unpredictable, we explore a method that makes a suggestion even when the query is a new word. We investigate the use of information obtained from Wikipedia ("Wikipedia," n.d.), the Amazon Web Service (AWS), and Google. We implemented the method, and our system suggests ten BSH4 subject headings according to user queries.
Gazan, R.; Shachaf, P.; Barzilai-Nahon, K.; Shankar, K. & Bardzell, S. Social computing as co-created experience Proceedings of the American Society for Information Science and Technology 2007 [4]
One of the most interesting effects of social computing is that the line between users and designers has become increasingly uncertain. Examples abound: user-generated content, rating and recommendation systems, social networking sites, open source software and easy personalization and sharing have effectively allowed users to become design partners in the creation of online experience. This panel will discuss four examples of social computing in practice, including the exercise of virtual social capital by members of the Answerbag online question-answering community, the thriving yet understudied user interactions on Wikipedia talk pages, self-regulation mechanisms of gatekeeping in virtual communities, and collaborative design practices within Second Life, a Massively Multiplayer Online Game (MMOG) that is also an interactive design environment. The aim of this panel is to challenge traditional understanding of users' role in the creation and evolution of information systems, and work toward a more realistic conceptualization of Web 2.0 users as both a source of, and a solution to, the overabundance of information created via social computing.
Buzydlowski, J. W. Exploring co-citation chains Proceedings of the American Society for Information Science and Technology 2006 [5]
The game "Six Degrees of Kevin Bacon" is played by naming an actor and then, by thinking of other actors in movies such that a chain of connections can be made, linking the named actor with Kevin Bacon. The number of different movies that are used to link the actor to Bacon indicates the degree to which the two are linked. For example, using John Travolta as the named actor: he appeared in the movie Look Who's Talking with Kirstie Alley, who was in She's Having a Baby with Kevin Bacon. So, John Travolta has a Bacon number or degree of two, as connected via Kirstie Alley. (For a more thorough discussion, see http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon. The example is taken from http://www.geocities.com/theeac/bacon.html.) Based on the above, perhaps another title for this paper could be the "Six Degrees of Sir Francis Bacon," as it indicates the framework for this paper by relating it to the above technique but placing it in an academic domain through the use of a scholarly bibliographic database. Additionally, the bibliometric technique of author co-citation analysis (ACA) will be used to help by automating the process of finding the connections.
Shachaf, P.; Hara, N.; Bonk, C.; Mackey, T. P.; Hemminger, B.; Stvilia, B. & Rosenbaum, H. Wiki a la carte: Understanding participation behaviors Proceedings of the American Society for Information Science and Technology 2007 [6]
This panel focuses on trends in research on wikis. Wikis have become prevalent in our society and are used for multiple purposes, such as education, knowledge sharing, collaboration, and coordination. Similar to other popular social computing tools, they raise new research questions and have attracted the attention of researchers in information science. While some focus on the semantic web, the automatic processing of data accumulated by users, and tool improvements, others discuss social implications of wikis. This panel presents five studies that address the social uses of wikis that support information sharing. In their studies, the panelists use a variety of novel applications of research methods, such as action research, online ethnography, site observation, surveys, and interviews. The panelists will present their findings: Shachaf and Hara will discuss Wikipedians' norms and behaviors; Bonk will present collaborative writing on Wikibook; Mackey will discuss authorship and collaboration in PBwiki.com; Hemminger will share results from the early use of wikis for conference communications; and Stvilia will outline the community mechanism of information quality assurance in Wikipedia.
Shachaf, P.; Hara, N.; Eschenfelder, K.; Goodrum, A.; Scott, L. C.; Shankar, K.; Ozakca, M. & Robbin Anarchists, pirates, ideologists, and disasters: New digital trends and their impacts Proceedings of the American Society for Information Science and Technology 2006 [7]
This panel will address both online disasters created by anarchists and pirates and disaster relief efforts aided by information and communication technologies (ICTs). An increasing number of people use ICTs to mobilize their resources and enhance their activities. This mobilization has unpredictable consequences for society: on one hand, the use of ICT has allowed for the mobilization of millions of people for disaster relief efforts and peace movements. On the other hand, it has also helped hackers and pirates to carry out destructive activities. In many cases it is hard to judge the moral consequences of the use of ICT by marginalized groups. The panel will present five studies, of which three will focus on online disobedience and two will focus on ICT use for disaster relief. Together these presentations illustrate both positive and negative consequences of the new digital trends. Goodrum deliberates on an ethic of hacktivism in the context of online activism. Eschenfelder discusses user modification of or resistance to technological protection measures. Shachaf and Hara present a study of anarchists who attack information posted on Wikipedia and modify the content by deleting, renaming, reinterpreting, and recreating information according to their ideologies. Scott examines consumer media behaviors after the Hurricane Katrina and Rita disasters. Shankar and Ozakca discuss volunteer efforts in the aftermath of Hurricane Katrina.
Ayers, P. Researching wikipedia - current approaches and new directions Proceedings of the American Society for Information Science and Technology 2006 [8]
Wikipedia, an international, multi-lingual and collaboratively produced free online encyclopedia, has experienced massive growth since its inception in 2001. The site has become the world's single largest encyclopedia as well as one of the world's most diverse online communities. Because of these factors, the site provides a unique view into the processes of collaborative work and the factors that go into producing encyclopedic content. To date, there has been no unified review of the current research that is taking place on and about Wikipedia, and indeed there have been few formal studies of the site, despite its growing importance. This project is a review of social science and information science studies of the site, focusing on research methods and categorizing the areas of the site that have been studied so far. Studies of Wikipedia have focused primarily on the social dynamics of contributors (such as how disputes are resolved and why contributors participate), and the content of Wikipedia (such as whether it is an accurate source), but due to the unique collaborative processes on Wikipedia these two areas are deeply intertwined.
Sundin, O. & Haider, J. Debating information control in web 2.0: The case of Wikipedia vs. Citizendium Proceedings of the American Society for Information Science and Technology 2007 [9]
Wikipedia is continually being scrutinised for the quality of its content. The question addressed in this paper concerns which notions of information, of collaborative knowledge creation, of authority and of the role of the expert are drawn on when information control in Wikipedia is discussed. This is done by focusing on the arguments made in the debates surrounding the launch of Citizendium, a proposed new collaborative online encyclopaedia. While Wikipedia claims not to attribute special status to any of its contributors, Citizendium intends to assign a decision-making role to subject experts. The empirical material for the present study consists of two online threads available from Slashdot. One, "A Look inside Citizendium", dates from September, the second one, "Co-Founder Forks Wikipedia", from October 2006. The textual analysis of these documents was carried out through close interpretative reading. Five themes, related to different aspects of information control, emerged: 1. information types, 2. information responsibility, 3. information perspectives, 4. information organisation, 5. information provenance & creation. Each theme contains a number of different positions. It was found that these positions do not necessarily correspond with the different sides of the argument. Instead, at times the fault lines run through the two camps.
Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike Individual Learning and Collaborative Knowledge Building with Shared Digital Artifacts. Proceedings of World Academy of Science: Engineering & Technology 2008
The development of Internet technology in recent years has led to a more active role of users in creating Web content. This has significant effects both on individual learning and collaborative knowledge building. This paper will present an integrative framework model to describe and explain learning and knowledge building with shared digital artifacts on the basis of Luhmann's systems theory and Piaget's model of equilibration. In this model, knowledge progress is based on cognitive conflicts resulting from incongruities between an individual's prior knowledge and the information which is contained in a digital artifact. Empirical support for the model will be provided by 1) applying it descriptively to texts from Wikipedia, 2) examining knowledge-building processes using a social network analysis, and 3) presenting a survey of a series of experimental laboratory studies.
Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming EFS: Expert Finding System based on Wikipedia link pattern analysis IEEE International Conference on Systems, Man and Cybernetics, 2008. SMC 2008. 2008
Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or Web pages as the corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, to build experts' profiles by using their journal publications. For a given proposal, the EFS first looks up the Wikipedia Web site to get relative link information, and then lists and ranks all associated experts by using that information. In our experiments, we use a real-world dataset which comprises 882 people and 13,654 papers, categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains such as "Artificial Intelligence" and "Image & Pattern Recognition".
Mullins, Matt & Fizzano, Perry Treelicious: A System for Semantically Navigating Tagged Web Pages IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2010
Collaborative tagging has emerged as a popular and effective method for organizing and describing pages on the Web. We present Treelicious, a system that allows hierarchical navigation of tagged web pages. Our system enriches the navigational capabilities of standard tagging systems, which typically exploit only popularity and co-occurrence data. We describe a prototype that leverages the Wikipedia category structure to allow a user to semantically navigate pages from the Delicious social bookmarking service. In our system a user can perform an ordinary keyword search and browse relevant pages but is also given the ability to broaden the search to more general topics and narrow it to more specific topics. We show that Treelicious indeed provides an intuitive framework that allows for improved and effective discovery of knowledge.
Achananuparp, Palakorn; Han, Hyoil; Nasraoui, Olfa & Johnson, Roberta Semantically enhanced user modeling Proceedings of the ACM Symposium on Applied Computing 2007 [10]
Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient representation of the user model as it ignores the semantic relations between terms. In this paper, we present a novel method to enhance a traditional term-based user model with WordNet-based semantic similarity techniques. To achieve this, we use word definitions and relationship hierarchies in WordNet to perform word sense disambiguation and employ domain-specific concepts as category labels for the derived user models. We tested our method on Windows to the Universe, a public educational website covering subjects in the Earth and Space Sciences, and performed an evaluation of our semantically enhanced user models against human judgment. Our approach is distinguishable from existing work because we automatically narrow down the set of domain-specific concepts from initial domain concepts obtained from Wikipedia and because we automatically create semantically enhanced user models.
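The entry above relies on WordNet-based semantic similarity between terms as a building block. As a rough, generic illustration of that building block only (not the authors' user-modeling system), the sketch below uses NLTK's WordNet interface; the sample terms are arbitrary placeholders, and the abstract does not say which similarity measure the system actually uses.

```python
# Generic illustration of WordNet-based similarity between terms.
# Requires: pip install nltk, plus the WordNet data (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def max_path_similarity(term_a: str, term_b: str) -> float:
    """Return the best path similarity over all noun senses of the two terms."""
    best = 0.0
    for syn_a in wn.synsets(term_a, pos=wn.NOUN):
        for syn_b in wn.synsets(term_b, pos=wn.NOUN):
            sim = syn_a.path_similarity(syn_b)
            if sim is not None and sim > best:
                best = sim
    return best

if __name__ == "__main__":
    # Placeholder terms: a related pair and an unrelated pair.
    print(max_path_similarity("planet", "asteroid"))
    print(max_path_similarity("planet", "keyboard"))
```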
Adafre, Sisay Fissaha; Jijkoun, Valentin & de Rijke, Maarten Fact discovery in Wikipedia IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, Nov 2 - 5 2007 Silicon Valley, CA, United states 2007 [11]
We address the task of extracting focused salient information items, relevant and important for a given topic, from a large encyclopedic resource. Specifically, for a given topic (a Wikipedia article) we identify snippets from other articles in Wikipedia that contain important information for the topic of the original article, without duplicates. We compare several methods for addressing the task, and find that a mixture of content-based, link-based, and layout-based features outperforms other methods, especially in combination with the use of so-called reference corpora that capture the key properties of entities of a common type.
Adafre, Sisay Fissaha; Jijkoun, Valentin & Rijke, Maarten De Link-based vs. content-based retrieval for question answering using Wikipedia 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, September 20, 2006 - September 22, 2006 Alicante, Spain 2007
We describe our participation in the WiQA 2006 pilot on question answering using Wikipedia, with a focus on comparing link-based vs. content-based retrieval. Our system currently works for Dutch and English.
Adar, Eytan; Skinner, Michael & Weld, Daniel S. Information arbitrage across multi-lingual Wikipedia 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [12]
The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge both as a result of manual translation and independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning and our experiments demonstrate the method's feasibility, even in the absence of dictionaries.
Alencar, Rafael Odon De; Davis Jr., Clodoveu Augusto & Goncalves, Marcos Andre Geographical classification of documents using evidence from Wikipedia 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland 2010 [13]
Obtaining or approximating a geographic location for search results often motivates users to include place names and other geography-related terms in their queries. Previous work shows that queries that include geography-related terms correspond to a significant share of the users' demand. Therefore, it is important to recognize the association of documents to places in order to adequately respond to such queries. This paper describes strategies for text classification into geography-related categories, using evidence extracted from Wikipedia. We use terms that correspond to entry titles and the connections between entries in Wikipedia's graph to establish a semantic network from which classification features are generated. Results of experiments using a news dataset, classified over Brazilian states, show that such terms constitute valid evidence for the geographical classification of documents, and demonstrate the potential of this technique for text classification.
Amaral, Carlos; Cassan, Adan; Figueira, Helena; Martins, Andre; Mendes, Afonso; Mendes, Pedro; Pinto, Claudia & Vidal, Daniel Priberam's question answering system in QA@CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [14]
This paper accounts for Priberam's participation in the monolingual question answering (QA) track of CLEF 2007. In previous participations, Priberam's QA system obtained encouraging results both in monolingual and cross-language tasks. This year we endowed the system with syntactical processing, in order to capture the syntactic structure of the question. The main goal was to obtain a more tuned question categorisation and consequently a more precise answer extraction. Besides this, we provided our system with the ability to handle topic-related questions and to use encyclopaedic sources like Wikipedia. The paper provides a description of the improvements made in the system, followed by the discussion of the results obtained in Portuguese and Spanish monolingual runs.
Arribillaga, Esnaola Active knowledge generation by university students through cooperative learning 2008 ITI 6th International Conference on Information and Communications Technology, ICICT 2008, December 16, 2008 - December 18, 2008 Cairo, Egypt 2008 [15]
Social and cultural transformations caused by globalisation have fostered changes in current universities, institutions which, making intensive and responsible use of technologies, have to create a continuous-improvement-based pedagogical model built on communities. To this end, we propose here the adoption of the so-called hacker ethic, which highlights the importance of collaborative, passionate, creative, and socially valuable work. Applying this ethic to higher education, current universities may become Net-Academy-based universities. These institutions therefore require a new digital culture that allows the transmission of the hacker ethic's values and, in turn, a Net-Academy-based learning model that enables students to become knowledge generators. In this way, wiki-technology-based systems may help universities achieve the transformation they need. We present here an experiment to check whether these kinds of resources transmit the values of the hacker ethic to students, allowing them to become active knowledge generators. The experiment revealed the problems of such technologies: the limited scope of the community created and the not-so-active knowledge-generator role of the students. Against these shortcomings, we describe a Wikipedia-based methodology and discuss the possibilities of this alternative to help current universities upgrade into Net-Academy-based universities.
Ashoori, Elham & Lalmas, Mounia Using topic shifts in XML retrieval at INEX 2006 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007
This paper describes the retrieval approaches used by Queen Mary, University of London in the INEX 2006 ad hoc track. In our participation, we mainly investigate an element-specific smoothing method within the language modelling framework. We adjust the amount of smoothing required for each XML element depending on its number of topic shifts, to provide focused access to XML elements in the Wikipedia collection. We also investigate whether using non-uniform priors is beneficial for the ad hoc tasks.
Auer, Soren; Bizer, Christian; Kobilarov, Georgi; Lehmann, Jens; Cyganiak, Richard & Ives, Zachary DBpedia: A nucleus for a Web of open data 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [16]
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
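To make the "sophisticated queries" mentioned above concrete, here is a minimal sketch that sends an illustrative SPARQL query to the public DBpedia endpoint using the SPARQLWrapper library; the particular query and result handling are this page's own example, not material taken from the paper.

```python
# Minimal sketch: querying the public DBpedia SPARQL endpoint
# (pip install sparqlwrapper). The query is illustrative only.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?person ?birthDate WHERE {
        ?person dbo:birthPlace dbr:Busan ;
                dbo:birthDate  ?birthDate .
    } LIMIT 10
""")

# Print each matching person URI and birth date from the JSON bindings.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], row["birthDate"]["value"])
```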
Augello, Agnese; Vassallo, Giorgio; Gaglio, Salvatore & Pilato, Giovanni A semantic layer on semi-structured data sources for intuitive chatbots International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2009, March 16, 2009 - March 19, 2009 Fukuoka, Japan 2009 [17]
The main limits of chatbot technology are related to the building of their knowledge representation and to their rigid information retrieval and dialogue capabilities, usually based on simple pattern-matching rules. The analysis of distributional properties of words in a text corpus allows the creation of semantic spaces in which natural language elements can be represented and compared. This space can be interpreted as a "conceptual" space where the axes represent the latent primitive concepts of the analyzed corpus. The presented work aims at exploiting the properties of a data-driven semantic/conceptual space built using semi-structured data sources freely available on the web, such as Wikipedia. This coding is equivalent to adding a conceptual similarity relationship layer into the Wikipedia graph. The chatbot can exploit this layer in order to simulate an "intuitive" behavior, attempting to retrieve semantic relations between Wikipedia resources also through associative sub-symbolic paths.
Ayu, Media A.; Taylor, Ken & Mantoro, Teddy Active learning: Engaging students in the classroom using mobile phones 2009 IEEE Symposium on Industrial Electronics and Applications, ISIEA 2009, October 4, 2009 - October 6, 2009 Kuala Lumpur, Malaysia 2009 [18]
Audience Response Systems (ARS) are used to achieve active learning in lectures and large group environments by facilitating interaction between the presenter and the audience. However, their use is discouraged by the requirement for specialist infrastructure in the lecture theatre and management of the expensive clickers they use. We improve the ARS by removing the need for specialist infrastructure, by using mobile phones instead of clickers, and by providing a web-based interface in the familiar Wikipedia style. Responders usually vote by dialing, and this has been configured to be cost-free in most cases. The desirability of this approach is shown by the use the demonstration system has had to date, with 21,000 voters voting 92,000 times in 14,000 surveys.
Babu, T. Lenin; Ramaiah, M. Seetha; Prabhakar, T.V. & Rambabu, D. ArchVoc - Towards an ontology for software architecture ICSE 2007 Workshops: Second Workshop on SHAring and Reusing architectural Knowledge Architecture, Rationale, and Design Intent, SHARK-ADI'07, May 20, 2007 - May 26, 2007 Minneapolis, MN, United states 2007 [19]
Knowledge management of any domain requires controlled vocabularies, taxonomies, thesauri, ontologies, concept maps and other such artifacts. This paper describes an effort to identify the major concepts in software architecture that can go into such meta-knowledge. The concept terms are identified through two different techniques: (1) manually, through the back-of-the-book indexes of some of the major texts in software architecture, and (2) through a semi-automatic technique that parses Wikipedia pages. Only generic architecture knowledge is considered. Apart from identifying the important concepts of software architecture, we could also see gaps in the software architecture content in Wikipedia.
Baeza-Yates, Ricardo Keynote talk: Mining the web 2.0 for improved image search 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009, December 2, 2009 - December 4, 2009 Graz, Austria 2009 [20]
There are several semantic sources that can be found on the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user-generated content (UGC), or what is today called the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination of many different technologies to leverage the Web itself.
Banerjee, Somnath Boosting inductive transfer for text classification using Wikipedia 6th International Conference on Machine Learning and Applications, ICMLA 2007, December 13, 2007 - December 15, 2007 Cincinnati, OH, United states 2007 [21]
Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting.
Zhou, Baoyao; Luo, Ping; Xiong, Yuhong & Liu, Wei Wikipedia-graph based key concept extraction towards news analysis 2009 IEEE Conference on Commerce and Enterprise Computing, CEC 2009, July 20, 2009 - July 23, 2009 Vienna, Austria 2009 [22]
The well-known Wikipedia can serve as a comprehensive knowledge repository to facilitate textual content analysis, due to its abundance, high quality and good structure. In this paper, we propose WikiRank, a Wikipedia-graph based ranking model which can be used to extract key Wikipedia concepts from a document. These key concepts can be regarded as the most salient terms to represent the theme of the document. Different from other existing graph-based ranking algorithms, the concept graph used for ranking in this model is constructed by leveraging not only the co-occurrence relations within the local context of a document but also the preprocessed hyperlink structure of Wikipedia. We have applied the proposed WikiRank model with the Support Propagation ranking algorithm to analyze news articles, especially enterprise news. Promising applications include Wikipedia Concept Linking and Enterprise Concept Cloud Generation.
Bautin, Mikhail & Skiena, Steven Concordance-based entity-oriented search IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, November 2, 2007 - November 5, 2007 Silicon Valley, CA, United states 2007 [23]
We consider the problem of finding the relevant named entities in response to a search query over a given text corpus. Entity search can readily be used to augment conventional web search engines for a variety of applications. To assess the significance of entity search, we analyzed the AOL dataset of 36 million web search queries with respect to two different sets of entities: namely (a) 2.3 million distinct entities extracted from a news text corpus and (b) 2.9 million Wikipedia article titles. The results clearly indicate that search engines should be aware of entities: under various criteria of matching, between 18% and 39% of all web search queries can be recognized as specifically searching for entities, while 73-87% of all queries contain entities. Our entity search engine creates a concordance document for each entity, consisting of all the sentences in the corpus containing that entity. We then index and search these documents using open-source search software. This gives a ranked list of entities as the result of search. Visit http://www.textmap.com for a demonstration of our entity search engine over a large news corpus. We evaluate our system by comparing the results of each query to the list of entities that have the highest statistical juxtaposition scores with the queried entity. Juxtaposition score is a measure of how strongly two entities are related in terms of a probabilistic upper bound. The results show excellent performance, particularly over well-characterized classes of entities such as people.
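As a toy illustration of the concordance idea described above (one "concordance document" per entity, built from every sentence that mentions it), the following sketch uses a handful of made-up sentences and a trivial term-overlap score; the actual system indexes the concordance documents with open-source search software and evaluates against juxtaposition scores.

```python
# Toy sketch of concordance-based entity search. Sentences, entities,
# and the scoring function are illustrative placeholders only.
from collections import Counter, defaultdict

SENTENCES = [
    "Kevin Bacon starred alongside Kirstie Alley in a 1988 comedy.",
    "Kirstie Alley appeared with John Travolta in Look Who's Talking.",
    "John Travolta received praise for his dancing roles.",
]
ENTITIES = ["Kevin Bacon", "Kirstie Alley", "John Travolta"]

# Concordance: entity -> bag of words from every sentence containing the entity.
concordance = defaultdict(Counter)
for sentence in SENTENCES:
    tokens = [t.strip(".,'\"").lower() for t in sentence.split()]
    for entity in ENTITIES:
        if entity.lower() in sentence.lower():
            concordance[entity].update(tokens)

def rank_entities(query: str):
    """Rank entities by how often the query terms occur in their concordance."""
    q = [t.lower() for t in query.split()]
    scores = {e: sum(c[t] for t in q) for e, c in concordance.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_entities("dancing comedy"))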
Beigbeder, Michel Focused retrieval with proximity scoring 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland 2010 [24]
We present in this paper a scoring method for information retrieval based on the proximity of the query terms in the documents. The idea of the method is first to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed over any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. Some experiments on the Wikipedia collection used in the INEX 2008 evaluation campaign are presented and discussed.
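A minimal sketch of the fuzzy-proximity idea summarized above: every token position receives a value that decays with distance to the nearest occurrence of each query term, the values are aggregated over a passage, and the sum is length-normalized. The linear decay, the influence width K, and the min-combination across query terms are assumptions made for illustration, not the authors' exact model.

```python
# Sketch of fuzzy-proximity passage scoring (illustrative parameters).
K = 5  # assumed influence width of a query term, in tokens

def proximity_profile(tokens, term):
    """Per-position proximity to the nearest occurrence of `term` (0 if absent)."""
    positions = [i for i, t in enumerate(tokens) if t == term]
    if not positions:
        return [0.0] * len(tokens)
    return [max(0.0, 1.0 - min(abs(i - p) for p in positions) / K)
            for i in range(len(tokens))]

def passage_score(tokens, query_terms, start, end):
    """Score tokens[start:end]: summed per-position proximity, length-normalized."""
    profiles = [proximity_profile(tokens, term) for term in query_terms]
    total = sum(min(p[i] for p in profiles) for i in range(start, end))
    return total / max(1, end - start)

doc = "the wiki collection used in the inex campaign covers many wiki topics".split()
print(passage_score(doc, ["wiki", "collection"], 0, 6))
print(passage_score(doc, ["wiki", "collection"], 6, len(doc)))
```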
Beigbeder, Michel; Imafouo, Amelie & Mercier, Annabelle ENSM-SE at INEX 2009: Scoring with proximity and semantic tag information 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [25]
We present in this paper some experiments on the Wikipedia collection used in the INEX 2009 evaluation campaign with an information retrieval method based on proximity. The idea of the method is to assign to each position in the document a fuzzy proximity value depending on its closeness to the surrounding keywords. These proximity values can then be summed over any range of text - including any passage or any element - and after normalization this sum is used as the relevance score for the extent. To take into account the semantic tags, we define a contextual operator which allows us to consider at query time only the occurrences of terms that appear in a given semantic context.
Bekavac, Bozo & Tadic, Marko A generic method for multi word extraction from wikipedia ITI 2008 30th International Conference on Information Technology Interfaces, June 23, 2008 - June 26, 2008 Cavtat/Dubrovnik, Croatia 2008 [26]
This paper presents a generic method for multiword expression extraction from Wikipedia. The method uses the properties of this specific encyclopedic genre in its HTML format and relies on the intention of the authors of articles to link to other articles. The relevant links were processed by applying local regular grammars within the NooJ development environment. We tested the method on the Croatian version of Wikipedia and present the results obtained.
Berkner, Kathrin WikiPrints - Rendering enterprise wiki content for printing Imaging and Printing in a Web 2.0 World; and Multimedia Content Access: Algorithms and Systems IV, January 19, 2010 - January 21, 2010 San Jose, CA, United states 2010 [27]
Wikis have become a tool of choice for collaborative, informative communication. In contrast to the immense Wikipedia, which serves as a reference web site and typically covers only one topic per web page, enterprise wikis are often used as project management tools and contain several closely related pages authored by members of one project. In that scenario it is useful to print closely related content for review or teaching purposes. In this paper we propose a novel technique for rendering enterprise wiki content for printing, called WikiPrints, which creates a linearized version of wiki content formatted as a mixture between web layout and conventional document layout suitable for printing. Compared to existing print options for wiki content, WikiPrints automatically selects content from different wiki pages given user preferences and usage scenarios. Metadata such as content authors or time of content editing are considered. A preview of the linearized content is shown to the user, and an interface for making manual formatting changes is provided.
Bøhn, Christian & Nørvåg, Kjetil Extracting named entities and synonyms from Wikipedia 24th IEEE International Conference on Advanced Information Networking and Applications, AINA 2010, April 20, 2010 - April 23, 2010 Perth, WA, Australia 2010 [28]
In many search domains, both contents and searches are frequently tied to named entities such as a person, a company or similar. An example of such a domain is a news archive. One challenge from an information retrieval point of view is that a single entity can have more than one way of referring to it. In this paper we describe how to use Wikipedia contents to automatically generate a dictionary of named entities and synonyms that are all referring to the same entity. This dictionary can subsequently be used to improve search quality, for example using query expansion. Through an experimental evaluation we show that with our approach, we can find named entities and their synonyms with a high degree of accuracy.
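The dictionary described above lends itself to query expansion; the sketch below shows only that final step, with a couple of hand-written placeholder entries (in the paper the entity-synonym pairs are mined automatically from Wikipedia content).

```python
# Sketch of query expansion driven by an entity-synonym dictionary.
# The dictionary entries below are hand-written placeholders.
SYNONYMS = {
    "ibm": ["international business machines", "big blue"],
    "barack obama": ["obama", "barack hussein obama"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus synonym variants for any entity it contains."""
    expanded = [query]
    lowered = query.lower()
    for entity, variants in SYNONYMS.items():
        if entity in lowered:
            expanded.extend(lowered.replace(entity, v) for v in variants)
    return expanded

print(expand_query("IBM quarterly earnings"))
```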
Bischoff, Andreas The pediaphon - Speech interface to the free wikipedia encyclopedia for mobile phones, PDA's and MP3-players DEXA 2007 18th International Workshop on Database and Expert Systems Applications, September 3, 2007 - September 7, 2007 Regensburg, Germany 2007 [29]
This paper presents an approach to generate audio-based learning material dynamically from Wikipedia articles for M-Learning and ubiquitous access. It introduces the so-called 'Pediaphon', a speech interface to the free Wikipedia online encyclopedia, as an example application for 'microlearning'. The effective generation and the deployment of the audio data to the user via podcast or progressive download (pseudo streaming) are covered. A convenient cell phone interface to the Wikipedia content, which is usable with every mobile phone, is introduced.
Biuk-Aghai, Robert P. Visualizing co-authorship networks in online Wikipedia 2006 International Symposium on Communications and Information Technologies, ISCIT, October 18, 2006 - October 20, 2006 Bangkok, Thailand 2006 [30]
The Wikipedia online user-contributed encyclopedia has rapidly become a highly popular and widely used online reference source. However, perceiving the complex relationships in the network of articles and other entities in Wikipedia is far from easy. We introduce the notion of using co-authorship of articles to determine relationships between articles, and present the WikiVis information visualization system, which visualizes this and other types of relationships in the Wikipedia database in 3D graph form. A 3D star layout and a 3D nested cone tree layout are presented for displaying relationships between entities and between categories, respectively. A novel 3D pinboard layout is presented for displaying search results.
Biuk-Aghai, Robert P.; Tang, Libby Veng-Sam; Fong, Simon & Si, Yain-Whar Wikis as digital ecosystems: An analysis based on authorship 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey 2009 [31]
Wikis, best represented by the popular and highly successful Wikipedia system, have established themselves as important components of a collaboration infrastructure. We suggest that the complex network of user-contributors in volunteer-contributed wikis constitutes a digital ecosystem that bears all the characteristics typical of such systems. This paper presents an analysis supporting this notion based on significance of authorship within the wiki. Our findings confirm the hypothesis that large volunteer-contributed wikis are digital ecosystems, and thus that the findings from the digital ecosystems research stream are applicable to this type of system.
Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard PeerVote: A decentralized voting mechanism for P2P collaboration systems 3rd International Conference on Autonomous Infrastructure, Management and Security, AIMS 2009, June 30, 2009 - July 2, 2009 Enschede, Netherlands 2009 [32]
Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications in articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers.
Bohm, Christoph; Naumann, Felix; Abedjan, Ziawasch; Fenz, Dandy; Grutze, Toni; Hefenbrock, Daniel; Pohl, Matthias & Sonnabend, David Profiling linked open data with ProLOD 2010 IEEE 26th International Conference on Data Engineering Workshops, ICDEW 2010, March 1, 2010 - March 6, 2010 Long Beach, CA, United states 2010 [33]
Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of RDF triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of the source for the problem at hand (e.g., some integration task), and focus on and explore the relevant subset.
Boselli, Roberto; Cesarini, Mirko & Mezzanzanica, Mario Customer knowledge and service development, the Web 2.0 role in co-production Proceedings of World Academy of Science, Engineering and Technology 2009
The paper is concerned with relationships between SSME and ICTs and focuses on the role of Web 2.0 tools in the service development process. The research presented aims at exploring how collaborative technologies can support and improve service processes, highlighting customer centrality and value co-production. The core idea of the paper is the centrality of user participation and the collaborative technologies as enabling factors; Wikipedia is analyzed as an example. The result of such analysis is the identification and description of a pattern characterising specific services in which users collaborate by means of web tools with value co-producers during the service process. The pattern of collaborative co-production concerning several categories of services, including knowledge-based services, is then discussed.
Bouma, Gosse; Kloosterman, Geert; Mur, Jori; Noord, Gertjan Van; Plas, Lonneke Van Der & Tiedemann, Jorg Question answering with Joost at CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [34]
We describe our system for the monolingual Dutch and multilingual English-to-Dutch QA tasks. We describe the preprocessing of Wikipedia, inclusion of query expansion in IR, anaphora resolution in follow-up questions, and a question classification module for the multilingual task. Our best runs achieved 25.5% accuracy for the Dutch monolingual task, and 13.5% accuracy for the multilingual task.
Brandes, Ulrik & Lerner, Jurgen Visual analysis of controversy in user-generated encyclopedias Houndmills, Basingstoke, Hants., RG21 6XS, United Kingdom 2008 [35]
Wikipedia is a large and rapidly growing Web-based collaborative authoring environment, where anyone on the Internet can create, modify, and delete pages about encyclopedic topics. A remarkable property of some Wikipedia pages is that they are written by up to thousands of authors who may have contradicting opinions. In this paper, we show that a visual analysis of the who-revises-whom network gives deep insight into controversies. We propose a set of analysis and visualization techniques that reveal the dominant authors of a page, the roles they play, and the alters they confront. Thereby we provide tools to understand how Wikipedia authors collaborate in the presence of controversy.
Bryant, Susan L.; Forte, Andrea & Bruckman, Amy Becoming Wikipedian: Transformation of participation in a collaborative online encyclopedia 2005 International ACM SIGGROUP Conference on Supporting Group Work, GROUP'05, November 6, 2005 - November 9, 2005 Sanibel Island, FL, United states 2005 [36]
Traditional activities change in surprising ways when computer-mediated communication becomes a component of the activity system. In this descriptive study, we leverage two perspectives on social activity to understand the experiences of individuals who became active collaborators in Wikipedia, a prolific, cooperatively-authored online encyclopedia. Legitimate peripheral participation provides a lens for understanding participation in a community as an adaptable process that evolves over time. We use ideas from activity theory as a framework to describe our results. Finally, we describe how activity on the Wikipedia stands in striking contrast to traditional publishing and suggests a new paradigm for collaborative systems.
Butler, Brian; Joyce, Elisabeth & Pike, Jacqueline Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy 2008 [37]
Wikis are sites that support the development of emergent, collective infrastructures that are highly flexible and open, suggesting that the systems that use them will be egalitarian, free, and unstructured. Yet it is apparent that the flexible infrastructure of wikis allows the development and deployment of a wide range of structures. However, we find that the policies in Wikipedia and the systems and mechanisms that operate around them are multi-faceted. In this descriptive study, we draw on prior work on rules and policies in organizations to propose and apply a conceptual framework for understanding the natures and roles of policies in wikis. We conclude that wikis are capable of supporting a broader range of structures and activities than other collaborative platforms. Wikis allow for and, in fact, facilitate the creation of policies that serve a wide variety of functions.
Buzzi, Marina & Leporini, Barbara Is Wikipedia usable for the blind? W4A'08: 2008 International Cross-Disciplinary Conference on Web Accessibility, W4A, Apr 21 - 22 2008 Beijing, China 2008 [38]
Today wikis are becoming increasingly widespread, and offer great benefits in a variety of collaborative environments. Therefore, to be universally valuable, wiki systems should be easy to use for anyone, regardless of ability. This paper describes obstacles that a blind user may encounter when interacting via screen reader with Wikipedia, and offers some suggestions for improving usability.
Buzzi, M.Claudia; Buzzi, Marina; Leporini, Barbara & Senette, Caterina Making wikipedia editing easier for the blind NordiCHI 2008: Building Bridges - 5th Nordic Conference on Human-Computer Interaction, October 20, 2008 - October 22, 2008 Lund, Sweden 2008 [39]
A key feature of Web 2.0 is the possibility of sharing, creating and editing on-line content. This approach is increasingly used in learning environments to favor interaction and cooperation among students. These functions should be accessible as well as easy to use for all participants. Unfortunately, accessibility and usability issues still exist for Web 2.0-based applications. For instance, Wikipedia presents many difficulties for the blind. In this paper we discuss a possible solution for simplifying the Wikipedia editing page when interacting via screen reader. Building an editing interface that conforms to the W3C ARIA (Accessible Rich Internet Applications) recommendations would overcome accessibility and usability problems that prevent blind users from actively contributing to Wikipedia.
Byna, Surendra; Meng, Jiayuan; Raghunathan, Anand; Chakradhar, Srimat & Cadambi, Srihari Best-effort semantic document search on GPUs 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU-3, Held in cooperation with ACM ASPLOS XV, March 14, 2010 - March 14, 2010 Pittsburg, PA, United states 2010 [40]
Semantic indexing is a popular technique used to access and organize large amounts of unstructured text data. We describe an optimized implementation of semantic indexing and document search on manycore GPU platforms. We observed that a parallel implementation of semantic indexing on a 128-core Tesla C870 GPU is only 2.4X faster than a sequential implementation on an Intel Xeon 2.4 GHz processor. We ascribe the less than spectacular speedup to a mismatch between the workload characteristics of semantic indexing and the unique architectural features of GPUs. Compared to the regular numerical computations that have been ported to GPUs with great success, our semantic indexing algorithm (the recently proposed Supervised Semantic Indexing algorithm, called SSI) has interesting characteristics: the amount of parallelism in each training instance is data-dependent, and each iteration involves the product of a dense matrix with a sparse vector, resulting in random memory access patterns. As a result, we observed that the baseline GPU implementation significantly under-utilizes the hardware resources (processing elements and memory bandwidth) of the GPU platform. However, the SSI algorithm also demonstrates unique characteristics, which we collectively refer to as the "forgiving nature" of the algorithm. These unique characteristics allow for novel optimizations that do not strive to preserve numerical equivalence of each training iteration with the sequential implementation. In particular, we consider best-effort computing techniques such as dependency relaxation and computation dropping to suitably alter the workload characteristics of SSI to leverage the unique architectural features of the GPU. We also show that the realization of dependency relaxation and computation dropping concepts on a GPU is quite different from how one would implement these concepts on a multicore CPU, largely due to the distinct architectural features supported by a GPU. Our new techniques dramatically enhance the amount of parallel workload, leading to much higher performance on the GPU. By optimizing data transfers between CPU and GPU and by reducing GPU kernel invocation overheads, we achieve further performance gains. We evaluated our new GPU-accelerated implementation of semantic document search on a database of over 1.8 million documents from Wikipedia. By applying our novel performance-enhancing strategies, our GPU implementation on a 128-core Tesla C870 achieved a 5.5X acceleration as compared to a baseline parallel implementation on the same GPU. Compared to a baseline parallel TBB implementation on a dual-socket quad-core Intel Xeon multicore CPU (8 cores), the enhanced GPU implementation is 11X faster. Compared to a parallel implementation on the same multicore CPU that also uses data dependency relaxation and computation dropping techniques, our enhanced GPU implementation is 5X faster.
Cabral, Luis Miguel; Costa, Luis Fernando & Santos, Diana What Happened to Esfinge in 2007? 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [41]
Esfinge is a general domain Portuguese question answering system which uses the information available on the Web as an additional resource when searching for answers. Other external resources and tools used are a broad coverage parser, a morphological analyser, a named entity recognizer and a Web-based database of word co-occurrences. In this fourth participation in CLEF, in addition to the new challenges posed by the organization (topics and anaphors in questions and the use of Wikipedia to search and support answers), we experimented with a multiple question and multiple answer approach in QA. 2008 Springer-Verlag Berlin Heidelberg.
Calefato, Caterina; Vernero, Fabiana & Montanari, Roberto Wikipedia as an example of positive technology: How to promote knowledge sharing and collaboration with a persuasive tutorial 2009 2nd Conference on Human System Interactions, HSI '09, May 21, 2009 - May 23, 2009 Catania, Italy 2009 [42]
This paper proposes an improved redesign of the Wikipedia Tutorial following Fogg's concept of persuasive technology. The international Wikipedia project aims at being the biggest free online encyclopedia. It can be considered a persuasive tool which tries to motivate people to collaborate on the development of a shared knowledge corpus, following a specific policy of behavior.
Chahine, C.Abi.; Chaignaud, N.; Kotowicz, J.P. & Pecuchet, J.P. Context and keyword extraction in plain text using a graph representation 4th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2008, November 30, 2008 - December 3, 2008 Bali, Indonesia 2008 [43]
Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve documents relevant to a query, the keywords describing each document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role. But indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resource.
Chandramouli, K.; Kliegr, T.; Nemrava, J.; Svatek, V. & Izquierdo, E. Query refinement and user relevance feedback for contextualized image retrieval 5th International Conference on Visual Information Engineering, VIE 2008, July 29, 2008 - August 1, 2008 Xi'an, China 2008 [44]
The motivation of this paper is to enhance the user-perceived precision of results of content based information retrieval (CBIR) systems with query refinement (QR), visual analysis (VA) and relevance feedback (RF) algorithms. The proposed algorithms were implemented as modules in the K-Space CBIR system. The QR module discovers hypernyms for the given query from a free text corpus (such as Wikipedia) and uses these hypernyms as refinements for the original query. Extracting hypernyms from Wikipedia makes it possible to apply query refinement to more queries than in related approaches that use a static predefined thesaurus such as WordNet. The VA module uses the K-Means algorithm for clustering the images based on low-level MPEG-7 visual features. The RF module uses the preference information expressed by the user to build user profiles by applying SOM-based supervised classification, which is further optimized by a hybrid Particle Swarm Optimization (PSO) algorithm. The experiments evaluating the performance of the QR and VA modules show promising results. 2008 The Institution of Engineering and Technology.
Chandramouli, K.; Kliegr, T.; Svatek, V. & Izquierdo, E. Towards semantic tagging in collaborative environments DSP 2009:16th International Conference on Digital Signal Processing, July 5, 2009 - July 7, 2009 Santorini, Greece 2009 [45]
Tags provide an efficient and effective way of organizing resources, but they are not always available. The SCM/THD technique investigated in this paper extracts entities from free-text annotations and, using the Lin similarity measure over the WordNet thesaurus, classifies them into a controlled vocabulary of tags. Hypernyms extracted from Wikipedia are used to map uncommon entities to WordNet synsets. In collaborative environments, users can assign multiple annotations to the same object, hence increasing the amount of information available. Assuming that the semantics of the annotations overlap, this redundancy can be exploited to generate higher quality tags. A preliminary experiment presented in the paper evaluates the consistency and quality of tags generated from multiple annotations of the same image. The results obtained on an experimental dataset comprising 62 annotations from four annotators show that the accuracy of a simple majority vote surpasses the average accuracy obtained through assessing the annotations individually by 18%. A moderate-strength correlation has been found between the quality of generated tags and the consistency of annotations.
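As a concrete illustration of the classification step, the sketch below uses NLTK's WordNet interface and its Lin similarity (a real library, but not the SCM/THD implementation described above); the entity word, the tag vocabulary, and the choice of first noun synsets are illustrative assumptions, and the NLTK wordnet and wordnet_ic data packages must be downloaded first.

```python
# Hedged sketch, not the SCM/THD system: map an extracted entity to the closest tag in a
# controlled vocabulary using Lin similarity over WordNet (NLTK). Requires
# nltk.download('wordnet') and nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # information-content counts from the Brown corpus

def best_tag(entity_word, tag_vocabulary):
    """Return the tag whose first noun synset is most Lin-similar to the entity's
    first noun synset (None if the entity has no noun synset)."""
    entity_synsets = wn.synsets(entity_word, pos=wn.NOUN)
    if not entity_synsets:
        return None
    best, best_score = None, -1.0
    for tag in tag_vocabulary:
        tag_synsets = wn.synsets(tag, pos=wn.NOUN)
        if not tag_synsets:
            continue
        score = entity_synsets[0].lin_similarity(tag_synsets[0], brown_ic)
        if score > best_score:
            best, best_score = tag, score
    return best

# Invented vocabulary and entity, purely for illustration.
print(best_tag('cathedral', ['building', 'animal', 'vehicle']))  # expected: 'building'
```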
Chatterjee, Madhumita; Sivakumar, G. & Menezes, Bernard Dynamic policy based model for trust based access control in P2P applications 2009 IEEE International Conference on Communications, ICC 2009, June 14, 2009 - June 18, 2009 Dresden, Germany 2009 [46]
Dynamic self-organizing groups like Wikipedia and F/OSS communities have special security requirements not addressed by typical access control mechanisms. An example is the ability to collaboratively modify access control policies based on the evolution of the group and trust and behavior levels. In this paper we propose a new framework for dynamic multi-level access control policies based on trust and reputation. The framework has interesting features wherein the group can switch between policies over time, influenced by the system's state or environment. Based on the behavior and trust level of peers in the group and the current group composition, it is possible for peers to collaboratively modify policies such as join, update and job allocation. We have modeled the framework using the declarative language Prolog. We also performed some simulations to illustrate the features of our framework.
Chen, Jian; Shtykh, Roman Y. & Jin, Qun A web recommender system based on dynamic sampling of user information access behaviors IEEE 9th International Conference on Computer and Information Technology, CIT 2009, October 11, 2009 - October 14, 2009 Xiamen, China 2009 [47]
In this study, we propose a Gradual Adaption Model for a Web recommender system. This model is used to track users' focus of interests and its transition by analyzing their information access behaviors, and to recommend appropriate information. A set of concept classes is extracted from Wikipedia. The pages accessed by users are classified by the concept classes and grouped into three periods (short, medium and long) and two categories (remarkable and exceptional) for each concept class, which are used to describe users' focus of interests and to establish the reuse probability of each concept class in each period for each user by Full Bayesian Estimation. According to the reuse probability and period, the information that a user is likely to be interested in is recommended. In this paper, we propose a new approach by which short and medium periods are determined based on dynamic sampling of user information access behaviors. We further present experimental simulation results, and show the validity and effectiveness of the proposed system.
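For illustration, one plausible (and much simplified) reading of the reuse-probability estimate mentioned above is a Beta-Bernoulli posterior mean over a user's accesses in a period; the prior and the counts below are invented and this is not the paper's model.

```python
# Simplified sketch, not the paper's Full Bayesian Estimation: posterior mean of the
# probability that a user's next access in a given period falls into a concept class,
# under a Beta(alpha, beta) prior. All numbers are invented.
def reuse_probability(accesses_in_class, total_accesses, alpha=1.0, beta=1.0):
    return (accesses_in_class + alpha) / (total_accesses + alpha + beta)

# Example: 40 accesses in the short-period window, 12 of them classified into the
# hypothetical concept class "Machine learning".
print(round(reuse_probability(12, 40), 3))  # 0.31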
Chen, Qing; Shipper, Timothy & Khan, Latifur Tweets mining using Wikipedia and impurity cluster measurement 2010 IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, ISI 2010, May 23, 2010 - May 26, 2010 Vancouver, BC, Canada 2010 [48]
Twitter is one of the fastest growing online social networking services. Tweets can be categorized into trends, and are related with tags and follower/following social relationships. The categorization is neither accurate nor effective due to the short length of tweet messages and the noisy data corpus. In this paper, we attempt to overcome these challenges with an extended feature vector along with a semi-supervised clustering technique. In order to achieve this goal, the training set is expanded with Wikipedia topic search results, and the feature set is extended. When building the clustering model and doing the classification, impurity measurement is introduced into our classifier platform. Our experiment results show that the proposed techniques outperform other classifiers with reasonable precision and recall.
Chen, Scott Deeann; Monga, Vishal & Moulin, Pierre Meta-classifiers for multimodal document classification 2009 IEEE International Workshop on Multimedia Signal Processing, MMSP '09, October 5, 2009 - October 7, 2009 Rio De Janeiro, Brazil 2009 [49]
This paper proposes learning algorithms for the problem of multimodal document classification. Specifically, we develop classifiers that automatically assign documents to categories by exploiting features from both text as well as image content. In particular, we use meta-classifiers that combine state-of-the-art text and image based classifiers into making joint decisions. The two meta-classifiers we choose are based on support vector machines and AdaBoost. Experiments on real-world databases from Wikipedia demonstrate the benefits of a joint exploitation of these modalities.
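A hedged sketch of the meta-classifier idea with scikit-learn and invented stand-in data (not the paper's features, classifiers, or Wikipedia databases): one base classifier per modality, and an SVM meta-classifier over their probability outputs. In practice the meta-classifier would be trained on held-out base-classifier predictions rather than on the training set itself.

```python
# Illustrative only: SVM meta-classifier over text- and image-based base classifiers.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Toy multimodal training data (stand-ins for article text and image features).
texts = ["association football world cup", "deep learning neural networks",
         "goalkeeper penalty kick", "convolutional network image recognition"]
image_features = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.3], [0.1, 0.9]])
labels = np.array([0, 1, 0, 1])  # 0 = sports, 1 = computing

# One base classifier per modality.
vectorizer = TfidfVectorizer()
text_clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)
image_clf = RandomForestClassifier(random_state=0).fit(image_features, labels)

# Meta-classifier: an SVM over the concatenated base-classifier class probabilities.
meta_inputs = np.hstack([text_clf.predict_proba(vectorizer.transform(texts)),
                         image_clf.predict_proba(image_features)])
meta_clf = SVC().fit(meta_inputs, labels)

# Joint decision for a new document.
new_text, new_image = ["world cup penalty kick"], np.array([[0.85, 0.2]])
new_meta = np.hstack([text_clf.predict_proba(vectorizer.transform(new_text)),
                      image_clf.predict_proba(new_image)])
print(meta_clf.predict(new_meta))  # most likely [0], i.e. the sports category
```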
Chevalier, Fanny; Huot, Stephane & Fekete, Jean-Daniel WikipediaViz: Conveying article quality for casual wikipedia readers IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan 2010 [50]
As Wikipedia has become one of the most used knowledge bases worldwide, the problem of the trustworthiness of the information it disseminates becomes central. With WikipediaViz, we introduce five visual indicators integrated into the Wikipedia layout that can keep casual Wikipedia readers aware of important meta-information about the articles they read. The design of WikipediaViz was inspired by two participatory design sessions with expert Wikipedia writers and sociologists who explained the clues they used to quickly assess the trustworthiness of articles. According to these results, we propose five metrics for maturity and quality assessment of Wikipedia articles and their accompanying visualizations to provide the readers with important clues about the editing process at a glance. We also report and discuss the results of the user studies we conducted. Two preliminary pilot studies show that all our subjects trust Wikipedia articles almost blindly. With the third study, we show that WikipediaViz significantly reduces the time required to assess the quality of articles while maintaining a good accuracy.
Chidlovskii, Boris Multi-label wikipedia classification with textual and link features 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [51]
We address the problem of categorizing a large set of linked documents with important content and structure aspects, in particular, from the Wikipedia collection proposed at the INEX 2009 XML Mining challenge. We analyze the network of collection pages and turn it into valuable features for the classification. We combine the content-based and link-based features of pages to train an accurate categorizer for unlabelled pages. In the multi-label setting, we revise a number of existing techniques and test some which show a good scalability. We report evaluation results obtained with a variety of learning methods and techniques on the training set of the Wikipedia corpus. 2010 Springer-Verlag Berlin Heidelberg.
Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini & Eichmann, David Detecting wikipedia vandalism with active learning and statistical language models 4th Workshop on Information Credibility on the Web, WICOW'10, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [52]
This paper proposes an active learning approach using language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, have exposed Wikipedia articles to vandalism. Vandalism is defined as malicious editing intended to compromise the integrity of the content of articles. Extensive manual efforts are being made to combat vandalism and an automated approach to alleviate the laborious process is needed. This paper builds statistical language models, constructing distributions of words from the revision history of Wikipedia articles. As vandalism often involves the use of unexpected words to draw attention, the fitness (or lack thereof) of a new edit when compared with language models built from previous versions may well indicate that an edit is a vandalism instance. In addition, the paper adopts an active learning model to solve the problem of noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain with its revision histories offers a novel context in which to explore the potential of language models in characterizing author intention. As the experimental results presented in the paper demonstrate, these models hold promise for vandalism detection.
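To make the language-model intuition concrete, here is a minimal sketch (not the paper's system, and without its active-learning component): a smoothed unigram model is built from an article's revision history and an incoming edit is scored by the average surprisal of its words; the revision texts are invented.

```python
# Minimal sketch: flag edits whose words are unlikely under a unigram model of the
# article's revision history (add-one smoothing). Not the paper's implementation.
import math, re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_unigram_model(revisions):
    counts = Counter(tok for rev in revisions for tok in tokenize(rev))
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve probability mass for unseen words
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def surprisal_per_word(model, edit_text):
    tokens = tokenize(edit_text)
    return -sum(math.log2(model(t)) for t in tokens) / max(len(tokens), 1)

history = ["The cat is a small domesticated carnivorous mammal.",
           "The cat is a domesticated species of small carnivorous mammal."]
model = build_unigram_model(history)
print(surprisal_per_word(model, "domesticated carnivorous mammal"))  # low: fits the history
print(surprisal_per_word(model, "buy cheap pills now lol"))          # high: suspicious edit
```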
Choubassi, Maha El; Nestares, Oscar; Wu, Yi; Kozintsev, Igor & Haussecker, Horst An augmented reality tourist guide on your mobile devices 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [53]
We present an augmented reality tourist guide on mobile devices. Many of the latest mobile devices contain cameras and location, orientation and motion sensors. We demonstrate how these devices can be used to bring tourism information to users in a much more immersive manner than traditional text or maps. Our system uses a combination of camera, location and orientation sensors to augment the live camera view on a device with the available information about the objects in the view. The augmenting information is obtained by matching a camera image to images in a database on a server that have geotags in the vicinity of the user location. We use a subset of geotagged English Wikipedia pages as the main source of images and augmenting text information. At the time of publication our database contained 50 K pages with more than 150 K images linked to them. A combination of motion estimation algorithms and orientation sensors is used to track objects of interest in the live camera view and place augmented information on top of them. 2010 Springer-Verlag Berlin Heidelberg.
Ciglan, Marek; Rivierez, Etienne & Nørvåg, Kjetil Learning to find interesting connections in Wikipedia 12th International Asia Pacific Web Conference, APWeb 2010, April 6, 2010 - April 8, 2010 Busan, Republic of Korea 2010 [54]
To help users answer the question, what is the relation between (real world) entities or concepts, we might need to go well beyond the borders of traditional information retrieval systems. In this paper, we explore the possibility of exploiting the Wikipedia link graph as a knowledge base for finding interesting connections between two or more given concepts, described by Wikipedia articles. We use a modified Spreading Activation algorithm to identify connections between input concepts. The main challenge in our approach lies in assessing the strength of a relation defined by a link between articles. We propose two approaches for link weighting and evaluate their results with a user evaluation. Our results show a strong correlation between the used weighting methods and user preferences; results indicate that the Wikipedia link graph can be used as a valuable semantic resource.
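As an illustration of the general technique (the paper's modified algorithm and its two link-weighting schemes are not reproduced), here is a plain-Python spreading activation pass over a tiny invented link graph.

```python
# Hedged sketch of spreading activation over a link graph; the graph, decay factor and
# iteration count are invented for illustration.
def spreading_activation(graph, seeds, decay=0.5, iterations=3):
    """graph: {node: [neighbor, ...]}. Nodes that accumulate activation from several
    seeds are candidate 'interesting connections' between them."""
    activation = {node: 0.0 for node in graph}
    for seed in seeds:
        activation[seed] = 1.0
    for _ in range(iterations):
        spread = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if activation[node] > 0 and neighbors:
                share = decay * activation[node] / len(neighbors)
                for neighbor in neighbors:
                    spread[neighbor] += share
        for node in graph:
            activation[node] += spread[node]
    return activation

wiki_links = {"Alan Turing": ["Enigma machine", "Computability"],
              "Enigma machine": ["Alan Turing", "World War II"],
              "Computability": ["Alan Turing"],
              "World War II": ["Enigma machine"]}
scores = spreading_activation(wiki_links, seeds=["Alan Turing", "World War II"])
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # "Enigma machine" scores high
```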
Conde, Tiago; Marcelino, Luis & Fonseca, Benjamim Implementing a system for collaborative search of local services 14th International Workshop of Groupware, CRIWG 2008, September 14, 2008 - September 18, 2008 Omaha, NE, United states 2008 [55]
The internet in the last few years has changed the way people interact with each other. In the past, users were just passive actors, consuming the information available on the web. Nowadays, their behavior is the opposite. With the so-called web 2.0, internet users became active agents and are now responsible for the creation of the content in web sites like MySpace, Wikipedia, YouTube, Yahoo! Answers and many more. Likewise, the way people buy a product or service has changed considerably. Thousands of online communities have been created on the internet, where users can share opinions and ideas about an electronic device, a medical service or a restaurant. An increasing number of consumers use this kind of online communities as information source before buying a product or service. This article describes a web system with the goal of creating an online community, where users could share their knowledge about local services, writing reviews and answering questions made by other members of the community regarding those services. The system will provide means for synchronous and asynchronous communication between users so that they can share their knowledge more easily. 2008 Springer Berlin Heidelberg.
Congle, Zhang & Dikan, Xing Knowledge-supervised learning by co-clustering based approach 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states 2008 [56]
Traditional text learning algorithms need labeled documents to supervise the learning process, but labeling documents of a specific class is often expensive and time consuming. We observe that it is sometimes convenient to describe a class with a few keywords (i.e., class descriptions). However, a short class description usually does not contain enough information to guide classification. Fortunately, large amounts of public data containing enormous knowledge, such as ODP and Wikipedia, are easily acquired. In this paper, we address the text classification problem with such knowledge rather than any labeled documents and propose a co-clustering based knowledge-supervised learning algorithm (CoCKSL) in an information-theoretic framework, which effectively applies the knowledge to classification tasks.
Cotta, Carlos Keeping the ball rolling: Teaching strategies using Wikipedia: An argument in favor of its use in computer science courses 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010
Editing Wikipedia articles has recently been proposed as a learning assignment. I argue that it ideally suits Computer Science courses, due to the intrinsic mathematical nature of the concepts and structures considered in this field. It also provides benefits in terms of autonomous research and team-working, as well as a valuable legacy for future years' students. This view is supported by a two-year experience in sophomore programming subjects at the University of Malaga, Spain.
Craswell, Nick; Demartini, Gianluca; Gaugaz, Julien & Iofciu, Tereza L3S at INEX 2008: Retrieving entities using structured information 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [57]
Entity Ranking is a recently emerging search task in Information Retrieval. In Entity Ranking the goal is not finding documents matching the query words, but instead finding entities which match those requested in the query. In this paper we focus on the Wikipedia corpus, interpreting it as a set of entities, and propose algorithms for finding entities based on their structured representation for three different search tasks: entity ranking, list completion, and entity relation search. The main contribution is a methodology for indexing entities using a structured representation. Our approach focuses on creating an index of facts about entities for the different search tasks. Moreover, we use the category structure information to improve the effectiveness of the List Completion task. 2009 Springer Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Bapat, Salil; Mehta, Sarika & Paranjape, Darshan Finding good elements for focused retrieval 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [58]
This paper describes the integration of our methodology for the dynamic retrieval of XML elements [2] with traditional article retrieval to facilitate the Focused and the Relevant-in-Context Tasks of the INEX 2008 Ad Hoc Track. The particular problems that arise for dynamic element retrieval in working with text containing both tagged and untagged elements have been solved [3]. The current challenge involves utilizing its ability to produce a rank-ordered list of elements in the context of focused retrieval. Our system is based on the Vector Space Model [8]; basic functions are performed using the Smart experimental retrieval system [7]. Experimental results are reported for the Focused, Relevant-in-Context, and Best-in-Context Tasks of both the 2007 and 2008 INEX Ad Hoc Tracks. These results indicate that the goal of our 2008 investigations - namely, finding good focused elements in the context of the Wikipedia collection - has been achieved. 2009 Springer Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Bhirud, Dinesh; Poluri, Pavan; Polumetla, Chaitanya & Sudhakar, Varun A methodology for producing improved focused elements 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [59]
This paper reports the results of our experiments to consistently produce highly ranked focused elements in response to the Focused Task of the INEX Ad Hoc Track. The results of these experiments, performed using the 2008 INEX collection, confirm that our current methodology (described herein) produces such elements for this collection. Our goal for 2009 is to apply this methodology to the new, extended 2009 INEX collection to determine its viability in this environment. (These experiments are currently underway.) Our system uses our method for dynamic element retrieval [4], working with the semi-structured text of Wikipedia [5], to produce a rank-ordered list of elements in the context of focused retrieval. It is based on the Vector Space Model [15]; basic functions are performed using the Smart experimental retrieval system [14]. Experimental results are reported for the Focused Task of both the 2008 and 2009 INEX Ad Hoc Tracks. 2010 Springer-Verlag Berlin Heidelberg.
Crouch, Carolyn J.; Crouch, Donald B.; Kamat, Nachiket; Malik, Vikram & Mone, Aditya Dynamic element retrieval in the wikipedia collection 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [60]
This paper describes the successful adaptation of our methodology for the dynamic retrieval of XML elements to a semi-structured environment. Working with text that contains both tagged and untagged elements presents particular challenges in this context. Our system is based on the Vector Space Model; basic functions are performed using the Smart experimental retrieval system. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (i.e., the paragraph). It returns a rank-ordered list of elements identical to that produced by the same query against an all-element index of the collection. Experimental results are reported for both the 2006 and 2007 Ad-hoc tasks. 2008 Springer-Verlag Berlin Heidelberg.
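Since several entries above build on the Vector Space Model and the Smart system, a minimal VSM ranking sketch may be useful; it uses scikit-learn TF-IDF and cosine similarity, illustrates only the ranking principle (not Smart or dynamic element retrieval), and the "elements" and query are invented.

```python
# Minimal Vector Space Model sketch: rank invented "paragraph elements" against a query
# by TF-IDF cosine similarity. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

elements = ["The vector space model represents documents as term-weight vectors.",
            "Wikipedia articles contain both tagged and untagged elements.",
            "Cosine similarity ranks elements against the query vector."]
query = "rank elements with the vector space model"

vectorizer = TfidfVectorizer(stop_words="english")
element_vectors = vectorizer.fit_transform(elements)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, element_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[idx]), 3), elements[idx])
```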
Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong Automatic acquisition of attributes for ontology construction 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong kong 2009 [61]
An ontology can be seen as an organized structure of concepts according to their relations. A concept is associated with a set of attributes that themselves are also concepts in the ontology. Consequently, ontology construction is the acquisition of concepts and their associated attributes through relations. Manual ontology construction is time-consuming and difficult to maintain. Corpus-based ontology construction methods must be able to distinguish concepts themselves from concept instances. In this paper, a novel and simple method is proposed for automatically identifying concept attributes through the use of Wikipedia as the corpus. The built-in Infobox in Wikipedia is used to acquire concept attributes and identify the semantic types of the attributes. Two simple induction rules are applied to improve the performance. Experimental results show precisions of 92.5% for attribute acquisition and 80% for attribute type identification. This is a very promising result for automatic ontology construction. 2009 Springer Berlin Heidelberg.
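For readers unfamiliar with infobox parsing, here is a hedged sketch using the third-party mwparserfromhell library (not the tool used in the paper); the wikitext snippet is a made-up example.

```python
# Illustrative only: pull attribute-value pairs out of an infobox template with
# mwparserfromhell (pip install mwparserfromhell). The snippet below is invented.
import mwparserfromhell

wikitext = """{{Infobox person
| name       = Ada Lovelace
| birth_date = 10 December 1815
| occupation = Mathematician
}}"""

code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    if str(template.name).strip().lower().startswith("infobox"):
        for param in template.params:
            attribute = str(param.name).strip()
            value = param.value.strip_code().strip()
            print(f"{attribute}: {value}")
# Prints: name: Ada Lovelace / birth_date: 10 December 1815 / occupation: Mathematician
```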
Curino, Carlo A.; Moon, Hyun J.; Tanca, Letizia & Zaniolo, Carlo Schema evolution in wikipedia - Toward a web Information system benchmark ICEIS 2008 - 10th International Conference on Enterprise Information Systems, June 12, 2008 - June 16, 2008 Barcelona, Spain 2008
Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.
Dalip, Daniel Hasan; Goncalves, Marcos Andre; Cristo, Marco & Calado, Pavel Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states 2009 [62]
The old dream of a universal repository containing all the human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information with free access and editing, created by the community in a collaborative manner. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into one single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
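To illustrate the indicator-combination step in code, the sketch below trains a scikit-learn model on a handful of invented length/structure/style feature vectors; the features, labels and model choice are assumptions, not the paper's setup.

```python
# Illustrative only: combine simple quality indicators into one judgment with a
# learned model. All feature values and labels are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Rows: [word count, section headings, references, average sentence length];
# label 1 = high-quality article, 0 = otherwise.
X = np.array([[5200, 14, 85, 21.0],
              [ 300,  1,  0, 34.0],
              [4100, 11, 60, 19.5],
              [ 450,  2,  1, 28.0],
              [6100, 16, 92, 22.5],
              [ 250,  1,  0, 31.0]])
y = np.array([1, 0, 1, 0, 1, 0])

model = GradientBoostingClassifier(random_state=0).fit(X, y)
candidate = np.array([[3800, 10, 55, 20.0]])   # indicators of an unseen article
print(model.predict_proba(candidate)[0][1])    # estimated probability of high quality
```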
Darwish, Kareem CMIC@INEX 2008: Link-the-wiki track 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [63]
This paper describes the runs that I submitted to the INEX 2008 Link-the-Wiki track. I participated in the incoming File-to-File and the outgoing Anchor-to-BEP tasks. For the File-to-File task I used a generic IR engine and constructed queries based on the title, keywords, and keyphrases of the Wikipedia article. My runs performed well for this task achieving the highest precision for low recall levels. Further post-hoc experiments showed that constructing queries using titles only produced even better results than the official submissions. For the Anchor-to-BEP task, I used a keyphrase extraction engine developed in-house and I filtered the keyphrases using existing Wikipedia titles. Unfortunately, my runs performed poorly compared to those of other groups. I suspect that this was the result of using many phrases that were not central to articles as anchors. 2009 Springer Berlin Heidelberg.
Das, Sanmay & Magdon-Ismail, Malik Collective wisdom: Information growth in wikis and blogs 11th ACM Conference on Electronic Commerce, EC'10, June 7, 2010 - June 11, 2010 Cambridge, MA, United states 2010 [64]
Wikis and blogs have become enormously successful media for collaborative information creation. Articles and posts accrue information through the asynchronous editing of users who arrive both seeking information and possibly able to contribute information. Most articles stabilize to high quality, trusted sources of information representing the collective wisdom of all the users who edited the article. We propose a model for information growth which relies on two main observations: (i) as an article's quality improves, it attracts visitors at a faster rate (a rich get richer phenomenon); and, simultaneously, (ii) the chances that a new visitor will improve the article drops (there is only so much that can be said about a particular topic). Our model is able to reproduce many features of the edit dynamics observed on Wikipedia and on blogs collected from LiveJournal; in particular, it captures the observed rise in the edit rate, followed by 1/t decay.
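A toy simulation of the two observations above, with invented parameters; it shows edit activity rising and then tapering off, but does not reproduce the paper's 1/t analysis.

```python
# Hedged toy model: better articles attract more visitors, while each visit is less
# likely to add something new as the article matures. Parameters are invented.
import random
random.seed(0)

MAX_QUALITY = 50.0                      # invented "everything has been said" ceiling
quality = 1.0
edits_per_step = []
for step in range(500):
    visitors = 1 + int(quality)                         # (i) rich-get-richer visits
    p_improve = max(0.0, 1.0 - quality / MAX_QUALITY)   # (ii) less remains to be added
    edits = sum(random.random() < p_improve for _ in range(visitors))
    quality += 0.02 * edits
    edits_per_step.append(edits)

avg = lambda xs: sum(xs) / len(xs)
# Edit activity rises while the article attracts readers, then falls as it saturates.
print(avg(edits_per_step[:50]), avg(edits_per_step[100:150]), avg(edits_per_step[400:450]))
```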
Demartini, Gianluca; Firan, Claudiu S. & Iofciu, Tereza L3S at INEX 2007: Query expansion for entity ranking using a highly accurate ontology 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [65]
Entity ranking on Web scale datasets is still an open challenge. Several resources, such as Wikipedia-based ontologies, can be used to improve the quality of the entity ranking produced by a system. In this paper we focus on the Wikipedia corpus and propose algorithms for finding entities based on query relaxation using category information. The main contribution is a methodology for expanding the user query by exploiting the semantic structure of the dataset. Our approach focuses on constructing queries using not only keywords from the topic, but also information about relevant categories. This is done leveraging a highly accurate ontology which is matched to the character strings of the topic. The evaluation is performed using the INEX 2007 Wikipedia collection and entity ranking topics. The results show that our approach performs effectively, especially for early precision metrics. 2008 Springer-Verlag Berlin Heidelberg.
Demartini, Gianluca; Firan, Claudiu S.; Iofciu, Tereza; Krestel, Ralf & Nejdl, Wolfgang A model for Ranking entities and its application to Wikipedia Latin American Web Conference, LA-WEB 2008, October 28, 2008 - October 30, 2008 Vila Velha, Espirito Santo, Brazil 2008 [66]
Entity Ranking (ER) is a recently emerging search task in Information Retrieval, where the goal is not finding documents matching the query words, but instead finding entities which match the types and attributes mentioned in the query. In this paper we propose a formal model to define entities as well as a complete ER system, providing examples of its application to enterprise, Web, and Wikipedia scenarios. Since searching for entities on Web scale repositories is an open challenge, as the effectiveness of ranking is usually not satisfactory, we present a set of algorithms based on our model and evaluate their retrieval effectiveness. The results show that combining simple Link Analysis, Natural Language Processing, and Named Entity Recognition methods improves the retrieval performance of entity search by over 53% for P@10 and 35% for MAP.
Demartini, Gianluca; Iofciu, Tereza & Vries, Arjen P. De Overview of the INEX 2009 entity ranking track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [67]
In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix" or "German-speaking Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants. 2010 Springer-Verlag Berlin Heidelberg.
Demidova, Elena; Oelze, Irina & Fankhauser, Peter Do we mean the same? Disambiguation of extracted keyword queries for database search 1st International Workshop on Keyword Search on Structured Data, KEYS '09, June 28, 2009 - June 28, 2009 Providence, RI, United states 2009 [68]
Users often try to accumulate information on a topic of interest from multiple information sources. In this case a user's informational need might be expressed in terms of an available relevant document, e.g. a web-page or an e-mail attachment, rather than a query. Database search engines are mostly adapted to the queries manually created by the users. In case a user's informational need is expressed in terms of a document, we need algorithms that map keyword queries automatically extracted from this document to the database content. In this paper we analyze the impact of selected document and database statistics on the effectiveness of keyword disambiguation for manually created as well as automatically extracted keyword queries. Our evaluation is performed using a set of user queries from the AOL query log and a set of queries automatically extracted from Wikipedia articles, both executed against the Internet Movie Database (IMDB). Our experimental results show that (1) knowledge of the document context is crucial in order to extract meaningful keyword queries; (2) statistics which enable effective disambiguation of user queries are not sufficient to achieve the same quality for the automatically extracted requests.
Denoyer, Ludovic & Gallinari, Patrick Overview of the INEX 2008 XML mining track categorization and clustering of XML documents in a graph of documents 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [69]
We describe here the XML Mining Track at INEX 2008. This track was launched for exploring two main ideas: first, identifying key problems for mining semi-structured documents and new challenges of this emerging field, and second, studying and assessing the potential of machine learning techniques for dealing with generic Machine Learning (ML) tasks in the structured domain, i.e. classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with the associated hyperlinks. The participants have developed models using the content information, the internal structure information of the XML documents and also the link information between documents. 2009 Springer Berlin Heidelberg.
Denoyer, Ludovic & Gallinari, Patrick Machine learning for semi-structured multimedia documents: Application to pornographic filtering and thematic categorization Machine Learning Techniques for Multimedia - Case Studies on Organization and Retrieval Tiergartenstrasse 17, Heidelberg, D-69121, Germany 2008
We propose a generative statistical model for the classification of semi-structured multimedia documents. Its main originality is its ability to simultaneously take into account the structural and the content information present in a semi-structured document and also to cope with different types of content (text, image, etc.). We then present the results obtained on two sets of experiments: one set concerns the filtering of pornographic Web pages; the second concerns the thematic classification of Wikipedia documents. 2008 Springer-Verlag Berlin Heidelberg.
Deshpande, Smita & Betke, Margrit RefLink: An interface that enables people with motion impairments to analyze web content and dynamically link to references 9th International Workshop on Pattern Recognition in Information Systems - PRIS 2009 In Conjunction with ICEIS 2009, May 6, 2009 - May 7, 2009 Milan, Italy 2009
In this paper, we present RefLink, an interface that allows users to analyze the content of a web page by dynamically linking to an online encyclopedia such as Wikipedia. Upon opening a webpage, RefLink instantly provides a list of terms extracted from the webpage and annotates each term with the number of its occurrences in the page. RefLink uses a text-to-speech interface to read out the list of terms. The user can select a term of interest and follow its link to the encyclopedia. RefLink thus helps users to perform an informed and efficient contextual analysis. Initial user testing suggests that RefLink is a valuable web browsing tool, in particular for people with motion impairments, because it greatly simplifies the process of obtaining reference material and performing contextual analysis.
Dopichaj, Philipp; Skusa, Andre & He, Andreas Stealing anchors to link the wiki 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [70]
This paper describes the Link-the-Wiki submission of Lycos Europe. We try to learn suitable anchor texts by looking at the anchor texts the Wikipedia authors used. Disambiguation is done by using textual similarity and also by checking whether a set of link targets "makes sense" together. 2009 Springer Berlin Heidelberg.
Doyle, Richard & Devon, Richard Teaching process for technological literacy: The case of nanotechnology and global open source pedagogy 2010 ASEE Annual Conference and Exposition, June 20, 2010 - June 23, 2010 Louisville, KY, United states 2010
In this paper we propose approaching the concern addressed by the technology literacy movement by using process design rather than product design. Rather than requiring people to know an impossible amount about technology, we suggest that we can teach a process for understanding and making decisions about any technology. This process can be applied to new problems and new contexts that emerge from the continuous innovation and transformation of technology markets. Such a process offers a strategy for planning for and abiding the uncertainty intrinsic to the development of modern science and technology. We teach students from diverse backgrounds in an NSF-funded course on the social, human, and ethical (SHE) impacts of nanotechnology. The process we will describe is global open source collective intelligence (GOSSIP). This paper traces out some of the principles of GOSSIP through the example of a course taught to a mixture of engineers and students from the Arts and the Humanities. Open source is obviously a powerful method: witness the development of Linux, and GNU before that, and the extraordinary success of Wikipedia. Democratic, and hence diverse, information flows have been suggested as vital to sustaining a healthy company. American Society for Engineering Education, 2010.
Dupen, Barry Using internet sources to solve materials homework assignments 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburgh, PA, United states 2008
Materials professors commonly ask homework questions derived from textbook readings, only to have students find the answers faster using internet resources such as Wikipedia or Google. While we hope students will actually read their textbooks, we can take advantage of student internet use to teach materials concepts. After graduation, these engineers will use the internet as a resource in their jobs, so it makes sense to use the internet in classroom exercises too. This paper discusses several materials homework assignments requiring internet research, and a few which require the textbook. Students learn that some answers are very difficult to find, and that accuracy is not guaranteed. Students also learn how materials data affect design, economics, and public policy. American Society for Engineering Education, 2008.
Edwards, Lilian Content filtering and the new censorship 4th International Conference on Digital Society, ICDS 2010, Includes CYBERLAWS 2010: 1st International Conference on Technical and Legal Aspects of the e-Society, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands 2010 [71]
Since the famous Time magazine cover of 1995, nation states have been struggling to control access to adult and illegal material on the Internet. In recent years, strategies for such control have shifted from the use of traditional policing - largely ineffective in a transnational medium - to the use of take-down and especially filtering applied by ISPs enrolled as "privatized censors" by the state. The role of the IWF in the UK has become a pivotal case study of how state and private interests have interacted to produce effective but non-transparent and non-accountable censorship even in a Western democracy. The IWF's role has recently been significantly questioned after a stand-off with Wikipedia in December 2008. This paper will set the IWF's recent acts in the context of a massive increase in global filtering of Internet content and suggest the creation of a Speech Impact Assessment process which might inhibit the growth of unchecked censorship.
Elmqvis, Niklas; Do, Thanh-Nghi; Goodell, Howard; Henry, Nathalie & Fekete, Jean-Daniel ZAME: Interactive large-scale graph visualization 2008 Pacific Visualization Symposium, PacificVis 2008, March 4, 2008 - March 7, 2008 Kyoto, Japan 2008 [72]
We present the Zoomable Adjacency Matrix Explorer (ZAME), a visualization tool for exploring graphs at a scale of millions of nodes and edges. ZAME is based on an adjacency matrix graph representation aggregated at multiple scales. It allows analysts to explore a graph at many levels, zooming and panning with interactive performance from an overview to the most detailed views. Several components work together in the ZAME tool to make this possible. Efficient matrix ordering algorithms group related elements. Individual data cases are aggregated into higher-order meta-representations. Aggregates are arranged into a pyramid hierarchy that allows for on-demand paging to GPU shader programs to support smooth multiscale browsing. Using ZAME, we are able to explore the entire French Wikipedia - over 500,000 articles and 6,000,000 links - with interactive performance on standard consumer-level computer hardware.
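A small numpy sketch of the block-aggregation idea behind such a multi-scale adjacency-matrix view (not the ZAME implementation); the graph is random and the block size is arbitrary.

```python
# Illustrative only: coarsen a large adjacency matrix by summing k-by-k blocks, so each
# cell of the aggregate counts the links between two groups of nodes.
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10                                      # 1000 nodes, blocks of 10
adjacency = (rng.random((n, n)) < 0.01).astype(int)  # sparse random link structure

aggregate = adjacency.reshape(n // k, k, n // k, k).sum(axis=(1, 3))

print(adjacency.shape, "->", aggregate.shape)        # (1000, 1000) -> (100, 100)
print("densest aggregated cell contains", aggregate.max(), "links")
```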
Fachry, Khairun Nisa; Kamps, Jaap; Koolen, Marijn & Zhang, Junte Using and detecting links in Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [73]
In this paper, we document our efforts at INEX 2007 where we participated in the Ad Hoc Track, the Link the Wiki Track, and the Interactive Track that continued from INEX 2006. Our main aims at INEX 2007 were the following. For the Ad Hoc Track, we investigated the effectiveness of incorporating link evidence into the model, and of a CAS filtering method exploiting the structural hints in the INEX topics. For the Link the Wiki Track, we investigated the relative effectiveness of link detection based on retrieving similar documents with the Vector Space Model, and then filtering with the names of Wikipedia articles to establish a link. For the Interactive Track, we took part in the interactive experiment comparing an element retrieval system with a passage retrieval system. The main results are the following. For the Ad Hoc Track, we see that link priors improve most of our runs for the Relevant in Context and Best in Context Tasks, and that CAS pool filtering is effective for the Relevant in Context and Best in Context Tasks. For the Link the Wiki Track, the results show that detecting links with name matching works relatively well, though links were generally under-generated, which hurt the performance. For the Interactive Track, our test-persons showed a weak preference for the element retrieval system over the passage retrieval system. 2008 Springer-Verlag Berlin Heidelberg.
Fadaei, Hakimeh & Shamsfard, Mehrnoush Extracting conceptual relations from Persian resources 7th International Conference on Information Technology - New Generations, ITNG 2010, April 12, 2010 - April 14, 2010 Las Vegas, NV, United states 2010 [74]
In this paper we present a relation extraction system which uses a combination of pattern based, structure based and statistical approaches. This system uses raw texts and Wikipedia articles to learn conceptual relations. Wikipedia structures are a rich source of information for relation extraction and are put to good use in this system. A set of patterns is extracted for the Persian language and used to learn both taxonomic and non-taxonomic relations. This system is one of the few relation extraction systems designed for the Persian language and is the first among them to use Wikipedia structures in the process of relation learning.
Fernandez-Garcia, Norberto; Blazquez-Del-Toro, Jose M.; Fisteus, Jesus Arias & Sanchez-Fernandez, Luis A semantic web portal for semantic annotation and search 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2006, October 9, 2006 - October 11, 2006 Bournemouth, United kingdom 2006
The semantic annotation of the contents of Web resources is a required step in order to allow the Semantic Web vision to become a reality. In this paper we describe an approach to manual semantic annotation which tries to integrate both the semantic annotation task and the information retrieval task. Our approach exploits the information provided by Wikipedia pages and takes the form of a semantic Web portal, which allows a community of users to easily define and share annotations on Web resources. Springer-Verlag Berlin Heidelberg 2006.
Ferrandez, Sergio; Toral, Antonio; Ferrandez, Oscar; Ferrandez, Antonio & Munoz, Rafael Applying Wikipedia's multilingual knowledge to cross-lingual question answering 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007
The application of the multilingual knowledge encoded in Wikipedia to an open-domain Cross-Lingual Question Answering system based on the Inter Lingual Index (ILI) module of EuroWordNet is proposed and evaluated. This strategy overcomes the problems due to ILI's low coverage of proper nouns (Named Entities). Moreover, as these are open-class words (highly changing), using a community-based, up-to-date resource avoids the tedious maintenance of hand-coded bilingual dictionaries. A study reveals the importance of translating Named Entities in CL-QA and the advantages of relying on Wikipedia over ILI for doing this. Tests on questions from the Cross-Language Evaluation Forum (CLEF) justify our approach (20% of these are correctly answered thanks to Wikipedia's multilingual knowledge). Springer-Verlag Berlin Heidelberg 2007.
Fišer, Darja & Sagot, Benoît Combining multiple resources to build reliable wordnets 11th International Conference on Text, Speech and Dialogue, TSD 2008, September 8, 2008 - September 12, 2008 Brno, Czech republic 2008 [75]
This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word-alignment of the SEERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. On the other hand, a bilingual approach sufficed to acquire equivalents for monosemous words. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards. 2008 Springer-Verlag Berlin Heidelberg.
Figueroa, Alejandro Surface language models for discovering temporally anchored definitions on the web: Producing chronologies as answers to definition questions 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010
This work presents a data-driven definition question answering (QA) system that outputs a set of temporally anchored definitions as answers. This system builds surface language models on top of a corpus automatically acquired from Wikipedia abstracts, and ranks answer candidates in agreement with these models afterwards. Additionally, this study deals at greater length with the impact of several surface features in the ranking of temporally anchored answers.
Figueroa, Alejandro Are wikipedia resources useful for discovering answers to list questions within web snippets? 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2009 [76]
This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web-sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process. 2009 Springer Berlin Heidelberg.
Figueroa, Alejandro Mining wikipedia for discovering multilingual definitions on the web 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China 2008 [77]
Ml-DfWebQA is a multilingual definition question answering system (QAS) that extracts answers to definition queries from the short descriptions of web-sites returned by search engines, called web snippets. These answers are discriminated on the grounds of lexico-syntactic regularities mined from multilingual resources supplied by Wikipedia. Results support that these regularities serve to significantly strengthen the answering process. In addition, Ml-DfWebQA increases the robustness of multilingual definition QASs by making use of aliases found in Wikipedia.
Figueroa, Alejandro Mining wikipedia resources for discovering answers to list questions in web snippets 4th International Conference on Semantics, Knowledge, and Grid, SKG 2008, December 3, 2008 - December 5, 2008 Beijing, China 2008 [78]
This paper presents LiSnQA, a list question answering system that extracts answers to list queries from the short descriptions of web-sites returned by search engines, called web snippets. LiSnQA mines Wikipedia resources in order to obtain valuable information that assists in the extraction of these answers. The interesting facet of LiSnQA is that, in contrast to current systems, it does not account for lists in Wikipedia, but for its redirections, categories, sandboxes, and first definition sentences. Results show that these resources strengthen the answering process.
Figueroa, Alejandro & Atkinson, John Using dependency paths for answering definition questions on the web 5th International Conference on Web Information Systems and Technologies, WEBIST 2009, March 23, 2009 - March 26, 2009 Lisbon, Portugal 2009
This work presents a new approach to automatically answer definition questions from the Web. This approach learns n-gram language models from lexicalised dependency paths taken from abstracts provided by Wikipedia and uses context information to identify candidate descriptive sentences containing target answers. Results using a prototype of the model showed the effectiveness of lexicalised dependency paths as salient indicators for the presence of definitions in natural language texts.
Finin, Tim & Syed, Zareen Creating and exploiting a Web of semantic data 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain 2010
Twenty years ago Tim Berners-Lee proposed a distributed hypertext system based on standard Internet protocols. The Web that resulted fundamentally changed the ways we share information and services, both on the public Internet and within organizations. That original proposal contained the seeds of another effort that has not yet fully blossomed: a Semantic Web designed to enable computer programs to share and understand structured and semi-structured information easily. We will review the evolution of the idea and technologies to realize a Web of Data and describe how we are exploiting them to enhance information retrieval and information extraction. A key resource in our work is Wikitology, a hybrid knowledge base of structured and unstructured information extracted from Wikipedia.
Fogarolli, Angela Word sense disambiguation based on Wikipedia link structure ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [79]
In this paper an approach to sense disambiguation based on Wikipedia link structure is presented and evaluated. Wikipedia is used as a reference to obtain lexicographic relationships, and in combination with statistical information extraction it is possible to deduce concepts related to the terms extracted from a corpus. In addition, since the corpus covers a representation of a part of the real world, the corpus itself is used as training data for choosing the sense which best fits the corpus.
Fogarolli, Angela & Ronchetti, Marco Domain independent semantic representation of multimedia presentations International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain 2009 [80]
This paper describes a domain independent approach for semantically annotating and representing multimedia presentations. It uses a combination of techniques to automatically discover the content of the media and, through supervised or unsupervised methods, it can generate an RDF description of it. The domain independence is achieved by using Wikipedia as a source of knowledge instead of domain ontologies. The described approach can be relevant for understanding multimedia content, which can be used in Information Retrieval, categorization and summarization.
Fogarolli, Angela & Ronchetti, Marco Discovering semantics in multimedia content using Wikipedia 11th International Conference on Business Information Systems, BIS 2008, May 5, 2008 - May 7, 2008 Innsbruck, Austria 2008 [81]
Semantic-based information retrieval is an area of ongoing work. In this paper we present a solution for giving semantic support to multimedia content information retrieval in an E-Learning environment where very often a large number of multimedia objects and information sources are used in combination. Semantic support is given through intelligent use of Wikipedia in combination with statistical Information Extraction techniques. 2008 Springer Berlin Heidelberg.
Fu, Linyun; Wang, Haofen; Zhu, Haiping; Zhang, Huajie; Wang, Yang & Yu, Yong Making more wikipedians: Facilitating semantics reuse for wikipedia authoring 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [82]
Wikipedia, a killer application in Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It can also serve as an ideal Semantic Web data source due to its abundance, influence, high quality and good structure. However, the heavy burden of building and maintaining such an enormous and ever-growing online encyclopedic knowledge base still rests on a very small group of people. Many casual users may still find it difficult to write high quality Wikipedia articles. In this paper, we use RDF graphs to model the key elements in Wikipedia authoring, and propose an integrated solution to make Wikipedia authoring easier based on RDF graph matching, with the aim of making more Wikipedians. Our solution facilitates semantics reuse and provides users with: 1) a link suggestion module that suggests and auto-completes internal links between Wikipedia articles for the user; 2) a category suggestion module that helps the user place her articles in correct categories. A prototype system is implemented and experimental results show significant improvements over existing solutions to link and category suggestion tasks. The proposed enhancements can be applied to attract more contributors and relieve the burden of professional editors, thus enhancing the current Wikipedia to make it an even better Semantic Web data source. 2008 Springer-Verlag Berlin Heidelberg.
Fukuhara, Tomohiro; Arai, Yoshiaki; Masuda, Hidetaka; Kimura, Akifumi; Yoshinaka, Takayuki; Utsuro, Takehito & Nakagawa, Hiroshi KANSHIN: A cross-lingual concern analysis system using multilingual blog articles 2008 1st International Workshop on Information-Explosion and Next Generation Search, INGS 2008, April 26, 2008 - April 27, 2008 Shenyang, China 2008 [83]
An architecture for cross-lingual concern analysis (CLCA) using multilingual blog articles, and its prototype system, are described. As people living in various countries use the Web, cross-lingual information retrieval (CLIR) plays an important role in next generation search. In this paper, we propose CLCA as one of the CLIR applications for helping users find the concerns of people across languages. We propose a layered architecture for CLCA, and its prototype system called KANSHIN. The system collects Japanese, Chinese, Korean, and English blog articles, and analyzes concerns across languages. Users can find concerns from several viewpoints such as temporal, geographical, and a network of blog sites. The system also facilitates browsing multilingual keywords using Wikipedia, and helps users find spam blogs. An overview of the CLCA architecture and the system is described.
Gang, Wang; Huajie, Zhang; Haofen, Wang & Yong, Yu Enhancing relation extraction by eliciting selectional constraint features from Wikipedia 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007
Selectional constraints are usually checked for detecting semantic relations. Previous work usually defined the constraints manually based on a handcrafted concept taxonomy, which is time-consuming and impractical for large scale relation extraction. Further, the determination of entity type (e.g. NER) based on the taxonomy cannot achieve sufficiently high accuracy. In this paper, we propose a novel approach to extracting relation instances using features elicited from Wikipedia, a free online encyclopedia. The features are represented as selectional constraints and further employed to enhance the extraction of relations. We conduct case studies on the validation of the extracted instances for two common relations, hasArtist(album, artist) and hasDirector(film, director). Substantially high extraction precision (around 0.95) and validation accuracy (near 0.90) are obtained. Springer-Verlag Berlin Heidelberg 2007.
Garza, Sara E.; Brena, Ramon F. & Ramirez, Eduardo Topic calculation and clustering: an application to wikipedia 7th Mexican International Conference on Artificial Intelligence, MICAI 2008, October 27, 2008 - October 31, 2008 Atizapan de Zaragoza, Mexico 2008 [84]
Wikipedia is nowadays one of the most valuable information resources; nevertheless, its current structure, which has no formal organization, does not always allow useful browsing among topics. Moreover, even though most Wikipedia pages include a "See Also" section for navigating to related Wikipedia pages, the only references included there are those the authors are aware of, leading to incompleteness and other irregularities. In this work a method for finding related Wikipedia articles is proposed; this method relies on a framework that clusters documents into semantically calculated topics and selects the closest documents, which could enrich the "See Also" section.
Gaugaz, Julien; Zakrzewski, Jakub; Demartini, Gianluca & Nejdl, Wolfgang How to trace and revise identities 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece 2009 [85]
The Entity Name System (ENS) is a service aiming at providing globally unique URIs for all kinds of real-world entities such as persons, locations and products, based on descriptions of such entities. Because the entity descriptions available to the ENS for deciding on entity identity (do two entity descriptions refer to the same real-world entity?) change over time, the system has to revise its past decisions: one entity has been given two different URIs, or two entities have been attributed the same URI. The question we have to investigate in this context is then: how do we propagate entity decision revisions to the clients which make use of the URIs provided by the ENS? In this paper we propose a solution which relies on labelling the IDs with additional history information. These labels allow clients to locally detect deprecated URIs they are using and also to merge IDs referring to the same real-world entity without needing to consult the ENS. Making update requests to the ENS only for the IDs detected as deprecated considerably reduces the number of update requests, at the cost of a decrease in uniqueness quality. We investigate how much the number of update requests decreases using ID history labelling, as well as how this impacts the uniqueness of the IDs on the client. For the experiments we use both artificially generated entity revision histories as well as a real case study based on the revision history of the Dutch and Simple English Wikipedia. 2009 Springer Berlin Heidelberg.
Gehringer, Edward Assessing students' WIKI contributions 2008 ASEE Annual Conference and Exposition, June 22, 2008 - June 24, 2008 Pittsburgh, PA, United states 2008
Perhaps inspired by the growing attention given to Wikipedia, instructors have increasingly been turning to wikis [1, 2] as an instructional collaborative space. A major advantage of a wiki is that any user can edit it at any time. In a class setting, students may be restricted in what pages they can edit, but usually each page can be edited by multiple students and/or each student can edit multiple pages. This makes assessment a challenge, since it is difficult to keep track of the contributions of each student. Several assessment strategies have been proposed. To our knowledge, this is the first attempt to compare them. We study the assessment strategies used in six North Carolina State University classes in Fall 2007, and offer ideas on how they can be improved. American Society for Engineering Education, 2008.
Geva, Shlomo GPX: Ad-Hoc queries and automated link discovery in the Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [86]
The INEX 2007 evaluation was based on the Wikipedia collection. In this paper we describe some modifications to the GPX search engine and the approach taken in the Ad-hoc and the Link-the-Wiki tracks. In earlier versions of GPX, scores were recursively propagated from text-containing nodes, through ancestors, all the way to the document root of the XML tree. In this paper we describe a simplification whereby the score of each node is computed directly, doing away with the score propagation mechanism. Results indicate slightly improved performance. The GPX search engine was used in the Link-the-Wiki track to identify prospective incoming links to new Wikipedia pages. We also describe a simple and efficient approach to the identification of prospective outgoing links in new Wikipedia pages. We present and discuss evaluation results. 2008 Springer-Verlag Berlin Heidelberg.
Geva, Shlomo; Kamps, Jaap; Lethonen, Miro; Schenkel, Ralf; Thom, James A. & Trotman, Andrew Overview of the INEX 2009 Ad hoc track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [87]
This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were three-fold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For this reason the Ad Hoc Track tasks stayed unchanged, and the Thorough Task of INEX 2002-2006 returned. The second goal was to study the impact of more verbose queries on retrieval effectiveness, by using the available markup as structural constraints (now using both the Wikipedia's layout-based markup and the enriched semantic markup) and by the use of phrases. The third goal was to compare different result granularities by allowing systems to retrieve XML elements, ranges of XML elements, or arbitrary passages of text. This investigates the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. The INEX 2009 Ad Hoc Track featured four tasks: For the Thorough Task a ranked list of results (elements or passages) by estimated relevance was needed. For the Focused Task a ranked list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the setup of the track, and the results for the four tasks. 2010 Springer-Verlag Berlin Heidelberg.
Ghinea, Gheorghita; Bygstad, Bendik & Schmitz, Christoph Multi-dimensional moderation in online communities: Experiences with three Norwegian sites 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states 2009 [88]
Online communities and user contribution of content have become widespread in recent years. This has triggered new and innovative web concepts, and perhaps also changed the power balance in society. Many large corporations have embraced this way of creating content for their sites, which has raised concerns regarding abusive content. Previous research has identified two main types of moderation: one where the users have most of the control, as in Wikipedia, and the other where the owners control everything. The media industry, in particular, is reluctant to lose control of its content by using the member-maintained approach, even though it has proven to cost less and be more efficient. This research proposes to merge these two moderation types through a concept called multidimensional moderation. To test this concept, two prototype solutions have been implemented and tested in large-scale discussion groups. The results from this study show that a combination of owner and user moderation may enhance the moderation process. 2009 Springer Berlin Heidelberg.
Giampiccolo, Danilo; Forner, Pamela; Herrera, Jesus; Penas, Anselmo; Ayache, Christelle; Forascu, Corina; Jijkoun, Valentin; Osenova, Petya; Rocha, Paulo; Sacaleanu, Bogdan & Sutcliffe, Richard Overview of the CLEF 2007 multilingual question answering track 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [89]
The fifth QA campaign at CLEF [1], having had its first edition in 2003, offered not only a main task but also an Answer Validation Exercise (AVE) [2], which continued last year's pilot, and a new pilot: Question Answering on Speech Transcripts (QAST) [3, 15]. The main task was characterized by its focus on cross-linguality, while covering as many European languages as possible. As a novelty, some QA pairs were grouped in clusters. Every cluster was characterized by a topic (not given to participants). The questions from a cluster may contain co-references between one of them and the others. Finally, the need for searching answers in web formats was satisfied by introducing Wikipedia as a document corpus. The results and the analyses reported by the participants suggest that the introduction of Wikipedia and the topic-related questions led to a drop in systems' performance. 2008 Springer-Verlag Berlin Heidelberg.
Giuliano, Claudio; Gliozzo, Alfio Massimiliano; Gangemi, Aldo & Tymoshenko, Kateryna Acquiring thesauri from wikis by exploiting domain models and lexical substitution 7th Extended Semantic Web Conference, ESWC 2010, May 30, 2010 - June 3, 2010 Heraklion, Crete, Greece 2010 [90]
Acquiring structured data from wikis is a problem of increasing interest in knowledge engineering and the Semantic Web. In fact, collaboratively developed resources grow over time, have high quality and are constantly updated. Among these problems, an area of interest is extracting thesauri from wikis. A thesaurus is a resource that lists words grouped together according to similarity of meaning, generally organized into sets of synonyms. Thesauri are useful for a large variety of applications, including information retrieval and knowledge engineering. Most information in wikis is expressed by means of natural language texts and internal links among Web pages, the so-called wikilinks. In this paper, an innovative method for inducing thesauri from Wikipedia is presented. It leverages the Wikipedia structure to extract concepts and the terms denoting them, obtaining a thesaurus that can be profitably used in applications. This method noticeably boosts precision and recall when applied to re-rank a state-of-the-art baseline approach. Finally, we discuss how to represent the extracted results in RDF/OWL, with respect to existing good practices.
Gonzalez-Cristobal, Jose-Carlos; Goni-Menoyo, Jose Miguel; Villena-Roman, Julio & Lana-Serrano, Sara MIRACLE progress in monolingual information retrieval at Ad-Hoc CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [91]
This paper presents the MIRACLE team's 2007 approach to the Ad-Hoc Information Retrieval track. The main work carried out for this campaign has been around monolingual experiments, in the standard and in the robust tracks. The most important contributions have been the general introduction of automatic named-entity extraction and the use of Wikipedia resources. For the 2007 campaign, runs were submitted for the following languages and tracks: a) Monolingual: Bulgarian, Hungarian, and Czech. b) Robust monolingual: French, English and Portuguese. 2008 Springer-Verlag Berlin Heidelberg.
Grac, Marek Trdlo, an open source tool for building transducing dictionary 12th International Conference on Text, Speech and Dialogue, TSD 2009, September 13, 2009 - September 17, 2009 Pilsen, Czech republic 2009 [92]
This paper describes the development of an open-source tool named Trdlo. Trdlo was developed as part of our effort to build a machine translation system between very close languages. These languages usually do not have available pre-processed linguistic resources or dictionaries suitable for computer processing. Bilingual dictionaries have a big impact on the quality of translation. The methods proposed in this paper attempt to extend existing dictionaries with inferable translation pairs. Our approach requires only 'cheap' resources: a list of lemmata for each language and rules for inferring words from one language to another. It is also possible to use other resources like annotated corpora or Wikipedia. Results show that this approach greatly improves the effectiveness of building a Czech-Slovak dictionary. 2009 Springer Berlin Heidelberg.
Granitzer, Michael; Seifert, Christin & Zechner, Mario Context based wikipedia linking 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [93]
Automatically linking Wikipedia pages can be done either content based, by exploiting word similarities, or structure based, by exploiting characteristics of the link graph. Our approach focuses on a content based strategy by detecting Wikipedia titles as link candidates and selecting the most relevant ones as links. The relevance calculation is based on the context, i.e. the surrounding text of a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links and that straightforward inverse document frequency based scoring of anchor texts achieves around 4% lower Mean Average Precision on the provided data set. 2009 Springer Berlin Heidelberg.
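As a rough illustration of the content-based strategy summarized above (not the authors' implementation), the sketch below detects Wikipedia titles occurring in a text as link candidates and scores them by inverse document frequency of the anchor text over a small background corpus; the title list, corpus and smoothing are hypothetical placeholders.

```python
import math
import re
from collections import Counter

def idf_scores(anchor_candidates, background_docs):
    """Score anchor candidates by inverse document frequency over a background corpus."""
    n = len(background_docs)
    df = Counter()
    for doc in background_docs:
        text = doc.lower()
        for anchor in anchor_candidates:
            if anchor.lower() in text:
                df[anchor] += 1
    # Smoothed IDF; rarer anchors get higher scores.
    return {a: math.log((n + 1) / (df[a] + 1)) for a in anchor_candidates}

def detect_link_candidates(text, wikipedia_titles):
    """Return Wikipedia titles that literally occur in the text (case-insensitive)."""
    return [title for title in wikipedia_titles
            if re.search(r"\b" + re.escape(title) + r"\b", text, flags=re.IGNORECASE)]

# Toy usage with made-up data.
titles = ["Information retrieval", "Link analysis", "Wikipedia"]
corpus = ["Wikipedia is a free encyclopedia.",
          "Information retrieval systems rank documents.",
          "Link analysis exploits the link graph."]
page = "Wikipedia articles are often found via information retrieval."
candidates = detect_link_candidates(page, titles)
scores = idf_scores(candidates, corpus)
print(sorted(candidates, key=scores.get, reverse=True))
```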
Guo, Hongzhi; Chen, Qingcai; Cui, Lei & Wang, Xiaolong An interactive semantic knowledge base unifying wikipedia and HowNet 7th International Conference on Information, Communications and Signal Processing, ICICS 2009, December 8, 2009 - December 10, 2009 Macau Fisherman's Wharf, China 2009 [94]
We present an interactive, exoteric semantic knowledge base which integrates HowNet and the online encyclopedia Wikipedia. The semantic knowledge base mainly builds on items, categories, attributes and the relations between them. In the construction process, a mapping relationship is established from HowNet and Wikipedia to the new knowledge base. Different from other online encyclopedias or knowledge dictionaries, the categories in the semantic knowledge base are semantically tagged, and this can be well used in semantic analysis and semantic computing. Currently the knowledge base built in this paper contains more than 200,000 items and 1,000 categories, and these are still increasing every day.
Gupta, Anand; Goyal, Akhil; Bindal, Aman & Gupta, Ankuj Meliorated approach for extracting Bilingual terminology from wikipedia 11th International Conference on Computer and Information Technology, ICCIT 2008, December 25, 2008 - December 27, 2008 Khulna, Bangladesh 2008 [95]
With the demand for accurate and domain-specific bilingual dictionaries, research in the field of automatic dictionary extraction has become popular. Due to the lack of domain-specific terminology in parallel corpora, extraction of bilingual terminology from Wikipedia (a corpus for knowledge extraction with a huge number of articles, links between different languages, a dense link structure and a number of redirect pages) has opened a new line of research in bilingual dictionary creation. Our method not only analyzes interlanguage links along with redirect page titles and link text titles, but also filters out inaccurate translation candidates using pattern matching. The score of each translation candidate is calculated using page parameters, and an appropriate threshold is then set, in contrast to the previous approach, which was based solely on backward links. In our experiment, we demonstrated the advantages of our approach compared to the traditional approach.
Hartrumpf, Sven; Glockner, Ingo & Leveling, Johannes Coreference resolution for questions and answer merging by validation 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [96]
For its fourth participation at QA@CLEF, the German question answering (QA) system InSicht was improved for CLEF 2007 in the following main areas: questions containing pronominal or nominal anaphors are treated by a coreference resolver; the shallow QA methods are improved; and a specialized module is added for answer merging. Results showed a performance drop compared to last year, mainly due to problems in handling the newly added Wikipedia corpus. However, dialog treatment by coreference resolution delivered very accurate results, so that follow-up questions can be handled similarly to isolated questions. 2008 Springer-Verlag Berlin Heidelberg.
Haruechaiyasak, Choochart & Damrongrat, Chaianun Article recommendation based on a topic model for Wikipedia Selection for Schools 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia 2008 [97]
The 2007 Wikipedia Selection for Schools is a collection of 4,625 articles selected from Wikipedia as educational content for children. Users can currently access articles within the collection via two different methods: (1) by browsing either a subject index or a title index sorted alphabetically, and (2) by following hyperlinks embedded within article pages. These two retrieval methods are considered static and subject to human editors. In this paper, we apply the Latent Dirichlet Allocation (LDA) algorithm to generate a topic model from articles in the collection. Each article can be expressed by a probability distribution over the topic model. We can recommend related articles by calculating similarity measures among the articles' topic distribution profiles. Our initial experimental results showed that the proposed approach could generate many highly relevant articles, some of which are not covered by the hyperlinks in a given article. 2008 Springer Berlin Heidelberg.
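For readers unfamiliar with the technique, a minimal sketch of LDA-based article recommendation is shown below. It is not the authors' code: scikit-learn stands in for whatever toolkit they used, the toy articles are invented, and cosine similarity is only one of the possible measures over topic distributions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for articles from the Selection for Schools collection.
articles = [
    "Volcanoes erupt molten rock called lava onto the surface of the Earth.",
    "Earthquakes happen when tectonic plates slip along a fault line.",
    "Photosynthesis lets plants turn sunlight into chemical energy.",
    "Plants use roots to absorb water and nutrients from the soil.",
]

# Bag-of-words representation, then an LDA topic model over the collection.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_distributions = lda.fit_transform(counts)   # one topic distribution per article

# Recommend, for each article, the most similar other article by topic profile.
similarity = cosine_similarity(topic_distributions)
for i in range(len(articles)):
    similarity[i, i] = 0.0                         # ignore self-similarity
    print(i, "->", similarity[i].argmax())
```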
Hatcher-Gallop, Rolanda; Fazal, Zohra & Oluseyi, Maya Quest for excellence in a wiki-based world 2009 IEEE International Professional Communication Conference, IPCC 2009, July 19, 2009 - July 22, 2009 Waikiki, HI, United states 2009 [98]
In an increasingly technological world, the Internet is often the primary source of information. Traditional encyclopedias, once the cornerstone of any worthy reference collection, have been replaced by online encyclopedias, many of which utilize open source software (OSS) to create and update content. One of the most popular and successful encyclopedias of this nature is Wikipedia. In fact, Wikipedia is among the most popular Internet sites in the world. However, it is not without criticism. What are some features of Wikipedia? What are some of its strengths and weaknesses? And what have other wiki-based encyclopedias learned from Wikipedia that they have incorporated into their own websites in a quest for excellence? This paper answers these questions and uses Crawford's six information quality dimensions, (1) scope, (2) format, (3) uniqueness and authority, (4) accuracy, (5) currency, and (6) accessibility, to evaluate Wikipedia and three other online encyclopedias: Citizendium, Scholarpedia, and Medpedia. The latter three have managed to maintain the advantages of Wikipedia while minimizing its weaknesses.
He, Jiyin Link detection with wikipedia 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [99]
This paper describes our participation in the INEX 2008 Link the Wiki track. We focused on the file-to-file task and submitted three runs, which were designed to compare the impact of different features on link generation. For outgoing links, we introduce the anchor likelihood ratio as an indicator for anchor detection, and explore two types of evidence for target identification, namely, the title field evidence and the topic article content evidence. We find that the anchor likelihood ratio is a useful indicator for anchor detection, and that in addition to the title field evidence, re-ranking with the topic article content evidence is effective for improving target identification. For incoming links, we use an exact match approach and a retrieval method based on language modeling, and find that the exact match approach works best. On top of that, our experiment shows that the semantic relatedness between Wikipedia articles also has some ability to indicate links. 2009 Springer Berlin Heidelberg.
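The abstract does not define the anchor likelihood ratio; a common reading is the fraction of a phrase's occurrences in the collection that are marked up as link anchors. The toy sketch below assumes that reading and is purely illustrative, with an invented input format.

```python
from collections import Counter

def anchor_likelihood_ratio(phrase, pages):
    """Fraction of occurrences of `phrase` that appear as a link anchor.

    `pages` is a list of (plain_text, anchor_texts) pairs, where anchor_texts
    lists the anchor strings used in that page's wikilinks (assumed format).
    """
    total, as_anchor = 0, 0
    for text, anchors in pages:
        total += text.lower().count(phrase.lower())
        as_anchor += Counter(a.lower() for a in anchors)[phrase.lower()]
    return as_anchor / total if total else 0.0

# Made-up example: the phrase is linked in one of its two mentions.
pages = [
    ("information retrieval is a core topic.", ["information retrieval"]),
    ("courses on information retrieval and databases.", ["databases"]),
]
print(anchor_likelihood_ratio("information retrieval", pages))  # 0.5
```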
He, Jiyin & Rijke, Maarten De An exploration of learning to link with wikipedia: Features, methods and training collection 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [100]
We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods, and the collection used for training the models. We find that a learning-to-rank approach and a binary classification approach do not differ much. The new Wikipedia collection, which is larger and has more links than the collection previously used, provides better training material for learning our models. In addition, a heuristic run which combines the two intuitively most useful features outperforms the machine learning based runs, which suggests that a further analysis and selection of features is necessary. 2010 Springer-Verlag Berlin Heidelberg.
He, Jiyin; Zhang, Xu; Weerkamp, Wouter & Larson, Martha Metadata and multilinguality in video classification 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [101]
The VideoCLEF 2008 Vid2RSS task involves the assignment of thematic category labels to dual language (Dutch/English) television episode videos. The University of Amsterdam chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. A Support Vector Machine (SVM) classifier was trained on training data collected from Wikipedia. The results provide evidence that combining archival metadata with speech transcripts can improve classification performance, but that adding speech transcripts in an additional language does not yield performance gains. 2009 Springer Berlin Heidelberg.
He, Miao; Cutler, Michal & Wu, Kelvin Categorizing queries by topic directory 9th International Conference on Web-Age Information Management, WAIM 2008, July 20, 2008 - July 22, 2008 Zhangjiajie, China 2008 [102]
The categorization of a web user query by topic or category can be used to select useful web sources that contain the required information. In pursuit of this goal, we explore methods for mapping user queries to category hierarchies under which deep web resources are also assumed to be classified. Our sources for these category hierarchies, or directories, are Yahoo! Directory and Wikipedia. Forwarding an unrefined query (in our case a typical fact finding query sent to a question answering system) directly to these directory resources usually returns no directories or incorrect ones. Instead, we develop techniques to generate more specific directory finding queries from an unrefined query and use these to retrieve better directories. Despite these engineered queries, our two resources often return multiple directories that include many incorrect results, i.e., directories whose categories are not related to the query, and thus web resources for these categories are unlikely to contain the required information. We develop methods for selecting the most useful ones. We consider a directory to be useful if web sources for any of its narrow categories are likely to contain the searched for information. We evaluate our mapping system on a set of 250 TREC questions and obtain precision and recall in the 0.8 to 1.0 range.
Hecht, Brent & Gergle, Darren The tower of Babel meets web 2.0: User-generated content and its applications in a multilingual context 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United states 2010 [103]
This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications".
Hecht, Brent & Moxley, Emily Terabytes of tobler: Evaluating the first law in a massive, domain-neutral representation of world knowledge 9th International Conference on Spatial Information Theory, COSIT 2009, September 21, 2009 - September 25, 2009 Aber Wrac'h, France 2009 [104]
The First Law of Geography states that "everything is related to everything else, but near things are more related than distant things." Despite the fact that it is to a large degree what makes "spatial special", the law has never been empirically evaluated on a large, domain-neutral representation of world knowledge. We address the gap in the literature about this critical idea by statistically examining the multitude of entities and relations between entities present across 22 different language editions of Wikipedia. We find that, at least according to the myriad authors of Wikipedia, the First Law is true to an overwhelming extent regardless of language-defined cultural domain. 2009 Springer Berlin Heidelberg.
Heiskanen, Tero; Kokkonen, Juhana; Hintikka, Kari A.; Kola, Petri; Hintsa, Timo & Nakki, Pirjo Tutkimusparvi the open research swarm in Finland 12th International MindTrek Conference: Entertainment and Media in the Ubiquitous Era, MindTrek'08, October 7, 2008 - October 9, 2008 Tampere, Finland 2008 [105]
In this paper, we introduce a new kind of scientific collaboration type (the open research swarm) and describe a realization (Tutkimusparvi) of this new type of scientific social network. Swarming is an experiment in self-organizing and a novel way to collaborate in the field of academic research. Open research swarms utilize the possibilities of the Internet, especially the social media tools that are now available because of the Web 2.0 boom. The main goal is to collectively attain rapid solutions to given challenges and to develop a distributed intellectual milieu for researchers. Transparency of the research and creative collaboration are central ideas behind open research swarms. Like Wikipedia, an open research swarm is open for everyone to participate in. The questions and research topics can come from open research swarm participants, from a purposed principal, or from general discussions in the mass media.
Hoffman, Joel Employee knowledge: Instantly searchable Digital Energy Conference and Exhibition 2009, April 7, 2009 - April 8, 2009 Houston, TX, United states 2009
The online encyclopedia, Wikipedia, has proven the value of the world community contributing to an instantly searchable world knowledge base. The same technology can be applied to the company community: each individual sharing strategic tips directly related to company interests that are then instantly searchable. Each employee can share, using Microsoft Sharepoint Wiki Pages, those unique hints, tips, tricks, and knowledge that they feel could be of the highest value to other employees: how-to's and shortcuts in company software packages, learnings from pilot projects (successful or not), links to fantastic resources, etc. This growing knowledge base then becomes an instantly searchable, global resource for the entire company. Occidental of Elk Hills, Inc. just recently, October 15, 2008, started a rollout of Wiki page use at its Elk Hills, CA, USA} properties. There are over 300 employees at Elk Hills and its Wiki Home Page received over 1500 hits in its first day, with multiple employees contributing multiple articles. Employees are already talking about time-savers they have learned and applied. A second presentation was demanded by those that missed the first. The rollout has generated a buzz of excitement and interest that we will be encouraging into the indefinite future. The significance of a corporate knowledge base can be major: high-tech professionals not spending hours figuring out how to do what someone else has already figured out and documented, support personnel not having to answer the same questions over and over again but having only to point those asking to steps already documented, employees learning time-saving tips that they may never have learned or thought of, professionals no longer wasting time searching for results of other trials or having to reinvent the wheel. Time is money. Knowledge is power. Applying Wiki technology to corporate knowledge returns time and knowledge to the workforce leading to bottom line benefits and powerful corporate growth. 2009, Society of Petroleum Engineers.
Hong, Richang; Tang, Jinhui; Zha, Zheng-Jun; Luo, Zhiping & Chua, Tat-Seng Mediapedia: Mining web knowledge to construct multimedia encyclopedia 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [106]
In recent years, we have witnessed the blooming of Web 2.0 content such as Wikipedia, Flickr and YouTube. How might we benefit from such rich media resources available on the internet? This paper presents a novel concept called Mediapedia, a dynamic multimedia encyclopedia that takes advantage of, and in fact is built from, the text and image resources on the Web. Mediapedia distinguishes itself from the traditional encyclopedia in four main ways. (1) It tries to present users with multimedia contents (e.g., text, image, video) which we believe are more intuitive and informative to users. (2) It is fully automated, because it downloads the media contents as well as the corresponding textual descriptions from the Web and assembles them for presentation. (3) It is dynamic, as it will use the latest multimedia content to compose the answer. This is not true for the traditional encyclopedia. (4) The design of Mediapedia is flexible and extensible, such that we can easily incorporate new kinds of media, such as video, and new languages into the framework. The effectiveness of Mediapedia is demonstrated and two potential applications are described in this paper. 2010 Springer-Verlag Berlin Heidelberg.
Hori, Kentaro; Oishi, Tetsuya; Mine, Tsunenori; Hasegawa, Ryuzo; Fujita, Hiroshi & Koshimura, Miyuki Related word extraction from wikipedia for web retrieval assistance 2nd International Conference on Agents and Artificial Intelligence, ICAART 2010, January 22, 2010 - January 24, 2010 Valencia, Spain 2010
This paper proposes a web retrieval system with extended queries generated from the contents of Wikipedia. By using the extended queries, we aim to assist users in retrieving Web pages and acquiring knowledge. To extract extended query items, we make heavy use of hyperlinks in Wikipedia in addition to the related word extraction algorithm. We evaluated the system through experimental use by several examinees and questionnaires given to them. Experimental results show that our system works well for users' retrieval and knowledge acquisition.
Huang, Darren Wei Che; Xu, Yue; Trotman, Andrew & Geva, Shlomo Overview of INEX 2007 link the Wiki track 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [107]
Wikipedia is becoming ever more popular. Linking between documents is typically provided in similar environments in order to achieve collaborative knowledge sharing. However, this functionality in Wikipedia is not integrated into the document creation process, and the quality of automatically generated links has never been quantified. The Link the Wiki (LTW) track at INEX in 2007 aimed at producing a standard procedure, metrics and a discussion forum for the evaluation of link discovery. The tasks offered by the LTW track, as well as its evaluation, present considerable research challenges. This paper briefly describes the LTW task and the evaluation procedure used in the LTW track in 2007. Automated link discovery methods used by participants are outlined. An overview of the evaluation results is concisely presented and further experiments are reported. 2008 Springer-Verlag Berlin Heidelberg.
Huang, Jin-Xia; Ryu, Pum-Mo & Choi, Key-Sun An empirical research on extracting relations from Wikipedia text 9th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2008, November 2, 2008 - November 5, 2008 Daejeon, Korea, Republic of 2008 [108]
A feature-based relation classification approach is presented, in which probabilistic and semantic relatedness features between patterns and relation types are employed with other linguistic information. The importance of each feature set is evaluated with a Chi-square estimator, and the experiments show that the relatedness features have a big impact on relation classification performance. A series of experiments is also performed to evaluate different machine learning approaches for relation classification, among which the Bayesian approach outperformed other approaches including the Support Vector Machine (SVM). 2008 Springer Berlin Heidelberg.
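As a generic illustration of Chi-square feature evaluation (not the feature sets, relations or data of this paper), the sketch below ranks bag-of-words features of a tiny, invented relation-classification set with scikit-learn's chi2 scorer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Hypothetical pattern contexts, each labelled with a relation type.
contexts = [
    "album recorded by the artist in 1999",
    "the artist released the album worldwide",
    "the film was directed by the director",
    "the director shot the film on location",
]
labels = ["hasArtist", "hasArtist", "hasDirector", "hasDirector"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(contexts)
scores, p_values = chi2(X, labels)          # Chi-square statistic per word feature

# Rank word features by how strongly they associate with the relation label.
ranked = sorted(zip(vectorizer.get_feature_names_out(), scores),
                key=lambda pair: pair[1], reverse=True)
print(ranked[:5])
```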
Huynh, Dat T.; Cao, Tru H.; Pham, Phuong H.T. & Hoang, Toan N. Using hyperlink texts to improve quality of identifying document topics based on Wikipedia 1st International Conference on Knowledge and Systems Engineering, KSE 2009, October 13, 2009 - October 17, 2009 Hanoi, Viet nam 2009 [109]
This paper presents a method to identify the topics of documents based on the Wikipedia category network. It improves the method previously proposed by Schonhofen by taking into account the weights of words in hyperlink texts in Wikipedia articles. Experiments on the Computing and Team Sport domains have been carried out and show that our proposed method outperforms Schonhofen's.
Iftene, Adrian; Pistol, Ionut & Trandabat, Diana Grammar-based automatic extraction of definitions 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2008, September 26, 2008 - September 29, 2008 Timisoara, Romania 2008 [110]
The paper describes the development and usage of a grammar developed to extract definitions from documents. One of the most important practical usages of the developed grammar is the automatic extraction of definitions from web documents. Three evaluation scenarios were run, the results of these experiments being the main focus of the paper. One scenario uses an e-learning context and previously annotated e-learning documents; the second one involves a large collection of unannotated documents (from Wikipedia) and tries to find answers for definition-type questions. The third scenario performs a similar question-answering task, but this time on the entire web using Google web search and the Google Translation Service. The results are convincing; further development, as well as further integration of the definition extraction system into various related applications, is already under way.
Powell, Adam C., IV & Morris, Arthur E. Wikipedia in materials education 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United states 2007
Wikipedia has become a vast storehouse of human knowledge, and a first point of reference for millions of people from all walks of life, including many materials science and engineering (MSE) students. Its characteristics of open authorship and instant publication lead to both its main strength of broad, timely coverage and also its weakness of non-uniform quality. This talk will discuss the status and potential of this medium as a delivery mechanism for materials education content, some experiences with its use in the classroom, and its fit with other media from textbooks to digital libraries.
Jack, Hugh Using a wiki for professional communication and collaboration 2009 ASEE Annual Conference and Exposition, June 14, 2009 - June 17, 2009 Austin, TX, United states 2009
Since the inception of Wikipedia there has been great interest in the open model of document development. However, this model is not that different from what already exists in many professional groups. In a professional group every member is welcome to contribute, but one individual is tasked with the secretarial duties of collecting, collating and recording communications, or capturing discourse during face-to-face meetings. These are often captured as minutes, letters, reports, and recommendations. These activities can be supported in a more free-flowing manner on a Wiki, where anybody is welcome to add, modify or delete content, and changes can be tracked and undone when necessary. This paper will describe the use of a Wiki to act as a central point for a professional group developing new curriculum standards. The topics will include a prototype structure for the site, governing principles, encouraging user involvement, and resolving differences of opinion. American Society for Engineering Education, 2009.
Jamsen, Janne; Nappila, Turkka & Arvola, Paavo Entity ranking based on category expansion 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [111]
This paper introduces category and link expansion strategies for the XML Entity Ranking track at INEX 2007. Category expansion is a coefficient propagation method for the Wikipedia category hierarchy based on given categories or categories derived from sample entities. Link expansion utilizes links between Wikipedia articles. The strategies are evaluated within the entity ranking and list completion tasks. 2008 Springer-Verlag Berlin Heidelberg.
Janik, Maciej & Kochut, Krys J. Wikipedia in action: Ontological knowledge in text categorization 2nd Annual IEEE International Conference on Semantic Computing, ICSC 2008, August 4, 2008 - August 7, 2008 Santa Clara, CA, United states 2008 [112]
We present a new, ontology-based approach to automatic text categorization. An important and novel aspect of this approach is that our categorization method does not require a training set, which is in contrast to the traditional statistical and probabilistic methods. In the presented method, the ontology, including the domain concepts organized into hierarchies of categories and interconnected by relationships, as well as instances and connections among them, effectively becomes the classifier. Our method focuses on (i) converting a text document into a thematic graph of entities occurring in the document, (ii) ontological classification of the entities in the graph, and (iii) determining the overall categorization of the thematic graph, and as a result, the document itself. In the presented experiments, we used an RDF ontology constructed from the full English version of Wikipedia. Our experiments, conducted on corpora of Reuters news articles, showed that our training-less categorization method achieved a very good overall accuracy.
Javanmardi, Sara; Ganjisaffar, Yasser; Lopes, Cristina & Baldi, Pierre User contribution and trust in Wikipedia 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United states 2009 [113]
Wikipedia, one of the top ten most visited websites, is commonly viewed as the largest online reference for encyclopedic knowledge. Because of its open editing model, allowing anyone to enter and edit content, Wikipedia's overall quality has often been questioned as a source of reliable information. Lack of study of the open editing model of Wikipedia and its effectiveness has resulted in a new generation of wikis that restrict contributions to registered users only, using their real names. In this paper, we present an empirical study of user contributions to Wikipedia. We statistically analyze contributions by both anonymous and registered users. The results show that submissions of anonymous and registered users in Wikipedia suggest power-law behavior. About 80% of the revisions are submitted by fewer than 7% of the users, most of whom are registered users. To further refine the analyses, we use the Wiki Trust Model (WTM), a user reputation model developed in our previous work, to assign a reputation value to each user. As expected, the results show that registered users contribute higher quality content and therefore are assigned higher reputation values. However, a significant number of anonymous users also contribute high-quality content. We provide further evidence that regardless of a user's attribution, registered or anonymous, high reputation users are the dominant contributors that actively edit Wikipedia articles in order to remove vandalism or poor quality content.
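The 80%/7% figure above is the kind of statistic that can be read straight off a revision log. The sketch below is illustrative, not the authors' analysis code, and computes the share of revisions contributed by the most active fraction of editors from a hypothetical list of (user, article) edit records.

```python
from collections import Counter

def top_contributor_share(edit_log, top_fraction=0.07):
    """Fraction of all revisions made by the most active `top_fraction` of users."""
    revisions_per_user = Counter(user for user, _article in edit_log)
    counts = sorted(revisions_per_user.values(), reverse=True)
    k = max(1, int(top_fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)

# Made-up edit log: (user, article) pairs.
edit_log = [("alice", "A"), ("alice", "B"), ("alice", "C"),
            ("bob", "A"), ("carol", "B"), ("dave", "C"),
            ("alice", "D"), ("bob", "D")]
print(top_contributor_share(edit_log, top_fraction=0.25))  # share of the top 25% of users
```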
Jenkinson, Dylan & Trotman, Andrew Wikipedia ad hoc passage retrieval and Wikipedia document linking 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [114]
Ad hoc passage retrieval within the Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a fixed-size window of about 300 terms is consistently seen and that this might be a good retrieval strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document the location of every search term was identified and the center (mean) located. A fixed-size window was then centered on this location. A method of removing outliers was examined in which all terms occurring outside one standard deviation of the center were considered outliers and the center was recomputed without them. Both techniques were examined with and without stemming. For Wikipedia linking we identified terms within the document that were over-represented and from the top few generated queries of different lengths. A BM25 ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed (at a granularity of whole documents). The best performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more. 2008 Springer-Verlag Berlin Heidelberg.
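The window-centering idea described above is simple enough to sketch. The code below is a loose reconstruction under stated assumptions (mean term position as the center, outliers dropped beyond one standard deviation), not the authors' implementation, and the toy document and window width are invented.

```python
from statistics import mean, pstdev

def passage_window(doc_tokens, query_terms, width=300, drop_outliers=True):
    """Return a fixed-size token window centered on the mean position of query terms."""
    terms = {t.lower() for t in query_terms}
    positions = [i for i, tok in enumerate(doc_tokens) if tok.lower() in terms]
    if not positions:
        return []
    if drop_outliers and len(positions) > 1:
        mu, sigma = mean(positions), pstdev(positions)
        kept = [p for p in positions if abs(p - mu) <= sigma]
        positions = kept or positions          # keep originals if everything was dropped
    center = int(mean(positions))
    start = max(0, center - width // 2)
    return doc_tokens[start:start + width]

# Toy usage with a short "document" and a tiny window.
doc = ("the history of wikipedia began in 2001 when the free encyclopedia "
       "was launched and edited by volunteers around the world").split()
print(passage_window(doc, ["wikipedia", "encyclopedia"], width=8))
```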
Jiang, Jiepu; Lu, Wei; Rong, Xianqian & Gao, Yangyan Adapting language modeling methods for expert search to rank wikipedia entities 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [115]
In this paper, we propose two methods to adapt language modeling methods for expert search to the INEX entity ranking task. In our experiments, we notice that language modeling methods for expert search, if directly applied to the INEX entity ranking task, cannot effectively distinguish entity types. Thus, our proposed methods aim at resolving this problem. First, we propose a method to take into account the INEX category query field. Second, we use an interpolation of two language models to rank entities, which can work solely on the text query. Our experiments indicate that both methods can effectively adapt language modeling methods for expert search to the INEX entity ranking task. 2009 Springer Berlin Heidelberg.
Jijkoun, Valentin; Hofmann, Katja; Ahn, David; Khalid, Mahboob Alam; Rantwijk, Joris Van; Rijke, Maarten De & Sang, Erik Tjong Kim The university of amsterdam's question answering system at QA@CLEF 2007 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [116]
We describe a new version of our question answering system, which was applied to the questions of the 2007 CLEF Question Answering Dutch monolingual task. This year, we made three major modifications to the system: (1) we added the contents of Wikipedia to the document collection and the answer tables; (2) we completely rewrote the module interface code in Java; and (3) we included a new table stream which returned answer candidates based on information learned from question-answer pairs. Unfortunately, the changes did not lead to improved performance. Unsolved technical problems at the time of the deadline led to missing justifications for a large number of answers in our submission. Our single run obtained an accuracy of only 8% with an additional 12% of unsupported answers (compared to 21% in last year's task). 2008 Springer-Verlag Berlin Heidelberg.
Jinpan, Liu; Liang, He; Xin, Lin; Mingmin, Xu & Wei, Lu A new method to compute the word relevance in news corpus 2nd International Workshop on Intelligent Systems and Applications, ISA2010, May 22, 2010 - May 23, 2010 Wuhan, China 2010 [117]
In this paper we propose a new method to compute the relevance of terms in a news corpus. Based on the characteristics of the news corpus, we first propose that the corpus should be divided into different channels. Second, making use of the features of news documents, we divide term co-occurrence into two cases, co-occurrence in the news title and co-occurrence in the news text, and use different methods to compute the co-occurrence in each case. Finally, we introduce the web corpus Wikipedia to overcome some shortcomings of the news corpus.
Juffinger, Andreas; Kern, Roman & Granitzer, Michael Crosslanguage Retrieval Based on Wikipedia Statistics 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [118]
In this paper we present the methodology, implementations and evaluation results of the cross-language retrieval system we developed for the Robust WSD task at CLEF 2008. Our system is based on query preprocessing for translation and homogenisation of queries. The presented preprocessing of queries includes two stages: firstly, a query translation step based on term statistics of co-occurring articles in Wikipedia; secondly, different disjunct query composition techniques to search in the CLEF corpus. We apply the same preprocessing steps for the monolingual as well as the cross-lingual task, thereby acting fairly and in a similar way across these tasks. The evaluation revealed that the similar processing comes at nearly no cost for monolingual retrieval but enables us to do cross-language retrieval and also a feasible comparison of our system performance on these two tasks. 2009 Springer Berlin Heidelberg.
Kaiser, Fabian; Schwarz, Holger & Jakob, Mihaly Using wikipedia-based conceptual contexts to calculate document similarity 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico 2009 [119]
Rating the similarity of two or more text documents is an essential task in information retrieval. For example, document similarity can be used to rank search engine results, cluster documents according to topics, etc. A major challenge in calculating document similarity originates from the fact that two documents can have the same topic or even mean the same, while they use different wording to describe the content. A sophisticated algorithm therefore will not directly operate on the texts but will have to find a more abstract representation that captures the texts' meaning. In this paper, we propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts that are derived from content and structure of the Wikipedia hypertext corpus.
Kamps, Jaap; Geva, Shlomo; Trotman, Andrew; Woodley, Alan & Koolen, Marijn Overview of the INEX 2008 Ad hoc track 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [120]
This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were chosen to fairly compare systems retrieving elements, ranges of elements, and arbitrary passages. The second goal was to compare focused retrieval to article retrieval more directly than in earlier years. For this reason, standard document retrieval rankings have been derived from all runs, and evaluated with standard measures. In addition, a set of queries targeting Wikipedia have been derived from a proxy log, and the runs are also evaluated against the clicked Wikipedia pages. The INEX 2008 Ad Hoc Track featured three tasks: For the Focused Task a ranked list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the results for the three tasks, and examine the relative effectiveness of element and passage retrieval. This is examined in the context of content only (CO, or Keyword) search as well as content and structure (CAS, or structured) search. Finally, we look at the ability of focused retrieval techniques to rank articles, using standard document retrieval techniques, both against the judged topics as well as against queries and clicks from a proxy log. 2009 Springer Berlin Heidelberg.
Kamps, Jaap & Koolen, Marijn The impact of document level ranking on focused retrieval 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [121]
Document retrieval techniques have proven to be competitive methods in the evaluation of focused retrieval. Although focused approaches such as XML element retrieval and passage retrieval allow for locating the relevant text within a document, using the larger context of the whole document often leads to superior document level ranking. In this paper we investigate the impact of using the document retrieval ranking in two collections used in the INEX 2008 Ad hoc and Book Tracks: the relatively short documents of the Wikipedia collection and the much longer books in the Book Track collection. We experiment with several methods of combining document and element retrieval approaches. Our findings are that 1) we can get the best of both worlds and improve upon both individual retrieval strategies by retaining the document ranking of the document retrieval approach and replacing the documents by the retrieved elements of the element retrieval approach, and 2) using document level ranking has a positive impact on focused retrieval in Wikipedia, but has more impact on the much longer books in the Book Track collection. 2009 Springer Berlin Heidelberg.
Kamps, Jaap & Koolen, Marijn Is Wikipedia link structure different? 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [122]
In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence is from two IR test collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of Wikipedia and .GOV link structure and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are: First, Wikipedia link structure is similar to the Web, but more densely linked. Second, Wikipedia's outlinks behave similarly to inlinks and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when incorporating link evidence in the retrieval model, for Wikipedia the global link evidence fails and we have to take the local context into account.
Kanhabua, Nattiya & Nrvag, Kjetil Exploiting time-based synonyms in searching document archives 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [123]
Query expansion of named entities can be employed in order to increase the retrieval effectiveness. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extracting synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relationships change over time. Further, we describe how to make use of both types of synonyms to increase the retrieval effectiveness, i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections, we demonstrate how retrieval performance of queries consisting of named entities can be improved using our approach.
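The time-aware expansion idea above can be illustrated with a small sketch (not the authors' code): time-independent synonyms are always added to the query, while time-dependent synonyms are added only if their validity period overlaps the query's temporal criterion. The synonym entries below are invented placeholders rather than output of the Wikipedia-history mining described in the paper.

```python
# Hedged sketch of time-aware query expansion; synonym data are hypothetical.
TIME_INDEPENDENT = {"pope benedict xvi": ["joseph ratzinger"]}
TIME_DEPENDENT = {"pope": [("john paul ii", 1978, 2005), ("benedict xvi", 2005, 2013)]}

def expand(query, year=None):
    """Return the query plus its applicable synonyms."""
    terms = [query]
    terms += TIME_INDEPENDENT.get(query, [])          # always valid
    if year is not None:                               # temporal criterion of the query
        terms += [syn for syn, start, end in TIME_DEPENDENT.get(query, [])
                  if start <= year <= end]
    return terms

print(expand("pope", year=2004))   # -> ['pope', 'john paul ii']
```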
Kaptein, Rianne & Kamps, Jaap Finding entities in wikipedia using links and categories 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [124]
In this paper we describe our participation in the INEX} Entity Ranking track. We explored the relations between Wikipedia pages, categories and links. Our approach is to exploit both category and link information. Category information is used by calculating distances between document categories and target categories. Link information is used for relevance propagation and in the form of a document link prior. Both sources of information have value, but using category information leads to the biggest improvements. 2009 Springer Berlin Heidelberg.
Kaptein, Rianne & Kamps, Jaap Using links to classify wikipedia pages 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [125]
This paper contains a description of experiments for the 2008 INEX XML-mining track. Our goal for the XML-mining track is to explore whether we can use link information to improve classification accuracy. Our approach is to propagate category probabilities over linked pages. We find that using link information leads to marginal improvements over a baseline that uses a Naive Bayes model. For the initially misclassified pages, link information is either not available or contains too much noise. 2009 Springer Berlin Heidelberg.
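A minimal sketch of the kind of propagation described above, assuming a simple linear interpolation between a page's own Naive Bayes class distribution and the average distribution of the pages it links to; the pages, probabilities and mixing weight are hypothetical, not the paper's values.

```python
# Illustrative propagation of category probabilities over links (placeholder data).
import numpy as np

baseline = {                      # per-page class probabilities, e.g. from Naive Bayes
    "A": np.array([0.6, 0.4]),
    "B": np.array([0.2, 0.8]),
    "C": np.array([0.5, 0.5]),
}
links = {"A": ["B"], "B": ["A", "C"], "C": []}
alpha = 0.7                       # weight kept on the page's own distribution (assumed)

propagated = {}
for page, probs in baseline.items():
    if links[page]:
        neighbour_mean = np.mean([baseline[n] for n in links[page]], axis=0)
        probs = alpha * probs + (1 - alpha) * neighbour_mean
    propagated[page] = probs / probs.sum()

print({p: v.round(3).tolist() for p, v in propagated.items()})
```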
Kawaba, Mariko; Nakasaki, Hiroyuki; Yokomoto, Daisuke; Utsuro, Takehito & Fukuhara, Tomohiro Linking Wikipedia entries to blog feeds by machine learning 3rd International Universal Communication Symposium, IUCS 2009, December 3, 2009 - December 4, 2009 Tokyo, Japan 2009 [126]
This paper studies the issue of conceptually indexing the blogosphere through the whole hierarchy of Wikipedia entries. This paper proposes how to link Wikipedia entries to blog feeds in the Japanese blogosphere by machine learning, where about 300,000 Wikipedia entries are used for representing a hierarchy of topics. In our experimental evaluation, we achieved over 80% precision in the task.
Kc, Milly; Chau, Rowena; Hagenbuchner, Markus; Tsoi, Ah Chung & Lee, Vincent A machine learning approach to link prediction for interlinked documents 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [127]
This paper explains how a recently developed machine learning approach, namely the Probability Measure Graph Self-Organizing Map (PM-GraphSOM), can be used for the generation of links between referenced or otherwise interlinked documents. This new generation of SOM models is capable of projecting generic graph-structured data onto a fixed-sized display space. Such a mechanism is normally used for dimension reduction, visualization, or clustering purposes. This paper shows that the PM-GraphSOM training algorithm inadvertently encodes relations that exist between the atomic elements in a graph. If the nodes in the graph represent documents and the links in the graph represent the reference (or hyperlink) structure of the documents, then it is possible to obtain a set of links for a test document whose link structure is unknown. A significant finding of this paper is that the described approach is scalable in that links can be extracted in linear time. It will also be shown that the proposed approach is capable of predicting the pages which would be linked to a new document and is capable of predicting the links to other documents from a given test document. The approach is applied to web pages from Wikipedia, a relatively large XML text database consisting of many referenced documents. 2010 Springer-Verlag Berlin Heidelberg.
Kimelfeid, Benny; Kovacs, Eitan; Sagiv, Yehoshua & Yahav, Dan Using language models and the HITS algorithm for XML retrieval 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007
Our submission to the INEX} 2006 Ad-hoc retrieval track is described. We study how to utilize the Wikipedia structure (XML} documents with hyperlinks) by combining XML} and Web retrieval. In particular, we experiment with different combinations of language models and the {HITS} algorithm. An important feature of our techniques is a filtering phase that identifies the relevant part of the corpus, prior to the processing of the actual XML} elements. We analyze the effect of the above techniques based on the results of our runs in INEX} 2006. Springer-Verlag} Berlin Heidelberg 2007.
Kiritani, Yusuke; Ma, Qiang & Yoshikawa, Masatoshi Classifying web pages by using knowledge bases for entity retrieval 20th International Conference on Database and Expert Systems Applications, DEXA 2009, August 31, 2009 - September 4, 2009 Linz, Austria 2009 [128]
In this paper, we propose a novel method to classify Web pages by using knowledge bases for entity search, which is a kind of typical Web search for information related to a person, location or organization. First, we map a Web page to entities according to the similarities between the page and the entities. Various methods for computing such similarity are applied. For example, we can compute the similarity between a given page and a Wikipedia article describing a certain entity. The frequency of an entity appearing in the page is another factor used in computing the similarity. Second, we construct a directed acyclic graph, named PEC} graph, based on the relations among Web pages, entities, and categories, by referring to YAGO, a knowledge base built on Wikipedia and WordNet.} Finally, by analyzing the PEC} graph, we classify Web pages into categories. The results of some preliminary experiments validate the methods proposed in this paper. 2009 Springer Berlin Heidelberg.
Kirtsis, Nikos; Stamou, Sofia; Tzekou, Paraskevi & Zotos, Nikos Information uniqueness in Wikipedia articles 6th International Conference on Web Information Systems and Technologies, WEBIST 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010
Wikipedia is one of the most successful worldwide collaborative efforts to put together user generated content in a meaningfully organized and intuitive manner. Currently, Wikipedia hosts millions of articles on a variety of topics, supplied by thousands of contributors. A critical factor in Wikipedia's success is its open nature, which enables everyone to edit, revise and/or question (via talk pages) the article contents. Considering the phenomenal growth of Wikipedia and the lack of a peer review process for its contents, it becomes evident that both editors and administrators have difficulty in validating its quality on a systematic and coordinated basis. This difficulty has motivated several research works on how to assess the quality of Wikipedia articles. In this paper, we propose the exploitation of a novel indicator of the Wikipedia articles' quality, namely information uniqueness. In this respect, we describe a method that captures the information duplication across the article contents in an attempt to infer the amount of distinct information every article communicates. Our approach relies on the intuition that an article offering unique information about its subject is of better quality compared to an article that discusses issues already addressed in several other Wikipedia articles.
Kisilevich, Slava; Mansmann, Florian; Bak, Peter; Keim, Daniel & Tchaikin, Alexander Where would you go on your next vacation? A framework for visual exploration of attractive places 2nd International Conference on Advanced Geographic Information Systems, Applications, and Services, GEOProcessing 2010, February 10, 2010 - February 16, 2010 St. Maarten, Netherlands 2010 [129]
Tourists face a great challenge when they gather information about places they want to visit. Geographically tagged information in the form of Wikipedia pages, local tourist information pages, dedicated web sites and the massive amount of information provided by Google Earth is publicly available and commonly used. But the processing of this information is a time-consuming activity. Our goal is to make the search for attractive places simpler for the common user and to provide researchers with methods for exploration and analysis of attractive areas. We assume that an attractive place is characterized by large amounts of photos taken by many people. This paper presents a framework in which we demonstrate a systematic approach for visualization and exploration of attractive places as a zoomable information layer. The presented technique utilizes density-based clustering of image coordinates and smart color scaling to produce interactive visualizations using a Google Earth mashup. We show that our approach can be used as a basis for detailed analysis of attractive areas. In order to demonstrate our method, we use real-world geo-tagged photo data obtained from Flickr and Panoramio to construct interactive visualizations of virtually every region of interest in the world.
Kittur, Aniket; Suh, Bongwon; Pendleton, Bryan A. & Chi, Ed H. He says, she says: Conflict and coordination in Wikipedia 25th SIGCHI Conference on Human Factors in Computing Systems 2007, CHI 2007, April 28, 2007 - May 3, 2007 San Jose, CA, United states 2007 [130]
Wikipedia, a wiki-based encyclopedia, has become one of the most successful experiments in collaborative knowledge building on the Internet. As Wikipedia continues to grow, the potential for conflict and the need for coordination increase as well. This article examines the growth of such non-direct work and describes the development of tools to characterize conflict and coordination costs in Wikipedia. The results may inform the design of new collaborative knowledge systems.
Kiyota, Yoji; Nakagawa, Hiroshi; Sakai, Satoshi; Mori, Tatsuya & Masuda, Hidetaka Exploitation of the Wikipedia category system for enhancing the value of LCSH 2009 ACM/IEEE Joint Conference on Digital Libraries, JCDL'09, June 15, 2009 - June 19, 2009 Austin, TX, United states 2009 [131]
This paper addresses an approach that integrates two different types of information resources: the Web and libraries. Our method begins from any keywords in Wikipedia, and induces related subject headings of LCSH through the Wikipedia category system.
Koolen, Marijn & Kamps, Jaap What's in a link? from document importance to topical relevance 2nd International Conference on the Theory of Information Retrieval, ICTIR 2009, September 10, 2009 - September 12, 2009 Cambridge, United kingdom 2009 [132]
Web information retrieval is best known for its use of the Web's link structure as a source of evidence. Global link evidence is by nature query-independent, and is therefore no direct indicator of the topical relevance of a document for a given search request. As a result, link information is usually considered to be useful to identify the 'importance' of documents. Local link evidence, in contrast, is query-dependent and could in principle be related to the topical relevance. We analyse the link evidence in Wikipedia using a large set of ad hoc retrieval topics and relevance judgements to investigate the relation between link evidence and topical relevance. 2009 Springer Berlin Heidelberg.
Koolen, Marijn; Kaptein, Rianne & Kamps, Jaap Focused search in books and wikipedia: Categories, links and relevance feedback 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [133]
In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc Track we investigate focused link evidence, using only links from retrieved sections. The new collection is not only annotated with Wikipedia categories, but also with YAGO/WordNet categories. We explore how we can use both types of category information, in the Ad Hoc Track as well as in the Entity Ranking Track. Results in the Ad Hoc Track show Wikipedia categories are more effective than WordNet categories, and Wikipedia categories in combination with relevance feedback lead to the best results. Preliminary results of the Book Track show full-text retrieval is effective for high early precision. Relevance feedback further increases early precision. Our findings for the Entity Ranking Track are in direct opposition to our Ad Hoc findings, namely, that the WordNet categories are more effective than the Wikipedia categories. This marks an interesting difference between ad hoc search and entity ranking. 2010 Springer-Verlag Berlin Heidelberg.
Kriplean, Travis; Beschastnikh, Ivan; McDonald, David W. & Golder, Scott A. Community, consensus, coercion, control: CS*W or how policy mediates mass participation 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states 2007 [134]
When large groups cooperate, issues of conflict and control surface because of differences in perspective. Managing such diverse views is a persistent problem in cooperative group work. The Wikipedian community has responded with an evolving body of policies that provide shared principles, processes, and strategies for collaboration. We employ a grounded approach to study a sample of active talk pages and examine how policies are employed as contributors work towards consensus. Although policies help build a stronger community, we find that ambiguities in policies give rise to power plays. This lens demonstrates that support for mass collaboration must take into account policy and power. ""
Kuribara, Shusuke; Abbas, Safia & Sawamura, Hajime Applying the logic of multiple-valued argumentation to social web: SNS and wikipedia 11th Pacific Rim International Conference on Multi-Agents, PRIMA 2008, December 15, 2008 - December 16, 2008 Hanoi, Viet nam 2008 [135]
The Logic of Multiple-Valued} Argumentation (LMA) is an argumentation framework that allows for argument-based reasoning about uncertain issues under uncertain knowledge. In this paper, we describe its applications to Social Web: SNS} and Wikipedia. They are said to be the most influential social Web applications to the present and future information society. For SNS, we present an agent that judges the registration approval for Mymixi in mixi in terms of LMA.} For Wikipedia, we focus on the deletion problem of Wikipedia and present agents that argue about the issue on whether contributed articles should be deleted or not, analyzing arguments proposed for deletion in terms of LMA.} These attempts reveal that LMA} can deal with not only potential applications but also practical ones such as extensive and contemporary applications. 2008 Springer Berlin Heidelberg.
Kürsten, Jens; Richter, Daniel & Eibl, Maximilian VideoCLEF 2008: ASR classification with wikipedia categories 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [136]
This article describes our participation at the VideoCLEF} track. We designed and implemented a prototype for the classification of the Video ASR} data. Our approach was to regard the task as text classification problem. We used terms from Wikipedia categories as training data for our text classifiers. For the text classification the Naive-Bayes} and KNN} classifier from the WEKA} toolkit were used. We submitted experiments for classification task 1 and 2. For the translation of the feeds to English (translation task) Google's AJAX} language API} was used. Although our experiments achieved only low precision of 10 to 15 percent, we assume those results will be useful in a combined setting with the retrieval approach that was widely used. Interestingly, we could not improve the quality of the classification by using the provided metadata. 2009 Springer Berlin Heidelberg.
Kutty, Sangeetha; Tran, Tien; Nayak, Richi & Li, Yuefeng Clustering XML documents using closed frequent subtrees: A structural similarity approach 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [137]
This paper presents the experimental study conducted over the INEX} 2007 Document Mining Challenge corpus employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML} documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progressively cluster the XML} documents. In spite of the large number of documents in INEX} 2007 Wikipedia dataset, the proposed frequent subtree-based incremental clustering approach was successful in clustering the documents. 2008 Springer-Verlag} Berlin Heidelberg.
Lahti, Lauri Personalized learning paths based on wikipedia article statistics 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010
We propose a new semi-automated method for generating personalized learning paths from the Wikipedia online encyclopedia by following inter-article hyperlink chains based on various rankings that are retrieved from the statistics of the articles. Alternative perspectives on learning topics are achieved when the next hyperlink to access is selected based on the hierarchy of hyperlinks, repetition of hyperlink terms, article size, viewing rate, editing rate, or a user-defined weighted mixture of them all. We have implemented the method in a prototype that enables the learner to independently build concept maps according to her needs and judgement. A list of related concepts is shown in a desired type of ranking to label new nodes (titles of target articles for current hyperlinks), accompanied by explanation phrases parsed from the sentences surrounding each hyperlink to label the directed arcs connecting nodes. In experiments, the alternative ranking schemes supported various learning needs well, suggesting new pedagogical networking practices.
Lahti, Lauri & Tarhio, Jorma Semi-automated map generation for concept gaming Computer Graphics and Visualization 2008 and Gaming 2008: Design for Engaging Experience and Social Interaction 2008, MCCSIS'08 - IADIS Multi Conference on Computer Science and Information Systems, July 22, 2008 - July 27, 2008 Amsterdam, Netherlands 2008
Conventional learning games often have limited flexibility to address the individual needs of a learner. The concept gaming approach provides a frame for handling conceptual structures that are defined by a concept map. A single concept map can be used to create many alternative games, and these can be chosen so that personal learning goals are taken well into account. However, the workload of creating new concept maps and sharing them effectively seems to easily hinder adoption of concept gaming. We now propose a new semi-automated map generation method for concept gaming. Due to the fast increase in open-access knowledge available on the Web, the articles of the Wikipedia encyclopedia were chosen to serve as a source for concept map generation. Based on a given entry name, the proposed method produces hierarchical concept maps that can be freely explored and modified. Variants of this approach could be successfully implemented in a wide range of educational tasks. In addition, ideas for further development of concept gaming are proposed.
Lam, Shyong K. & Riedl, John Is Wikipedia growing a longer tail? 2009 ACM SIGCHI International Conference on Supporting Group Work, GROUP'09, May 10, 2009 - May 13, 2009 Sanibel Island, FL, United states 2009 [138]
Wikipedia has millions of articles, many of which receive little attention. One group of Wikipedians believes these obscure entries should be removed because they are uninteresting and neglected; these are the deletionists. Other Wikipedians disagree, arguing that this long tail of articles is precisely Wikipedia's advantage over other encyclopedias; these are the inclusionists. This paper looks at two overarching questions on the debate between deletionists and inclusionists: (1) What are the implications to the long tail of the evolving standards for article birth and death? (2) How is viewership affected by the decreasing notability of articles in the long tail? The answers to five detailed research questions that are inspired by these overarching questions should help better frame this debate and provide insight into how Wikipedia is evolving. ""
Lanamaki, Arto & Paivarinta, Tero Metacommunication patterns in online communities 3rd International Conference on Online Communities and Social Computing, OCSC 2009. Held as Part of HCI International 2009, July 19, 2009 - July 24, 2009 San Diego, CA, United states 2009 [139]
This paper discusses contemporary literature on computer-mediated metacommunication and observes the phenomenon in two online communities. The results contribute by identifying six general-level patterns of how metacommunication refers to primary communication in online communities. A task-oriented, user-administrated community (Wikipedia in Finnish) involved a remarkable number of specialized metacommunication genres. In a centrally moderated, discussion-oriented community (PatientsLikeMe), metacommunication was intertwined more with primary ad hoc communication. We suggest that a focus on specialized metacommunication genres may appear useful in online communities. However, room for ad hoc (meta)communication is needed as well, as it provides a basis for user-initiated community development. 2009 Springer Berlin Heidelberg.
Kuchta, Jaroslaw Passing from requirements specification to class model using application domain ontology 2010 2nd International Conference on Information Technology, ICIT 2010, June 28, 2010 - June 30, 2010 Gdansk, Poland 2010
The quality of a classic software engineering process depends on the completeness of project documents and on inter-phase consistency. In this paper, a method for passing from the requirements specification to the class model is proposed. First, a developer browses the text of the requirements, extracts the word sequences, and places them as terms into the glossary. Next, the internal ontology logic for the glossary needs to be elaborated. External ontology sources, such as Wikipedia or domain ontology services, may be used to support this stage. At the end, the newly built ontology is transformed into the class model. The whole process may be supported with semi-automated, interactive tools. The result should be a class model with better completeness and consistency than one obtained using traditional methods.
Larsen, Jakob Eg; Halling, Søren; Sigurðsson, Magnús & Hansen, Lars Kai MuZeeker: Adapting a music search engine for mobile phones Mobile Multimedia Processing - Fundamentals, Methods, and Applications 2010 [140]
Larson, Martha; Newman, Eamonn & Jones, Gareth J. F. Overview of videoCLEF 2008: Automatic generation of topic-based feeds for dual language audio-visual content 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [141]
The VideoCLEF} track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF} piloted the Vid2RSS} task, whose main subtask was the classification of dual language video (Dutch-language} television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and ten thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS-feeds, displaying title, description and keyframe for each video. Five groups participated in the 2008 VideoCLEF} track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and K-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small scale fluency/adequacy evaluation of the translation task output revealed the translation to be of sufficient quality to make it valuable to a Non-Dutch} speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, with a small user study, to be competitive with manually selected shots. Future years of VideoCLEF} will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks. 2009 Springer Berlin Heidelberg.
Le, Qize & Panchal, Jitesh H. Modeling the effect of product architecture on mass collaborative processes - An agent-based approach 2009 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC2009, August 30, 2009 - September 2, 2009 San Diego, CA, United states 2010
Traditional product development efforts are based on well-structured and hierarchical product development teams. The products are systematically decomposed into subsystems that are designed by dedicated teams with well-defined information flows. Recently, a new product development approach called Mass Collaborative Product Development (MCPD) has emerged. The fundamental difference between a traditional product development process and a MCPD} process is that the former is based on top-down decomposition while the latter is based on evolution and self-organization. The paradigm of MCPD} has resulted in highly successful products such as Wikipedia, Linux and Apache. Despite the success of various projects using MCPD, it is not well understood how the product architecture affects the evolution of products developed using such processes. To address this gap, an agent-based model to study MCPD} processes is presented in this paper. Through this model, the effect of product architectures on the product evolution is studied. The model is executed for different architectures ranging from slot architecture to bus architecture and the rates of product evolution are determined. The simulation-based approach allows us to study how the degree of modularity of products affects the evolution time of products and different modules in the MCPD} processes. The methodology is demonstrated using an illustrative example of mobile phones. This approach provides a simple and intuitive way to study the effects of product architecture on the MCPD} processes. It is helpful in determining the best strategies for product decomposition and identifying the product architectures that are suitable for the MCPD processes.
Le, Minh-Tam; Dang, Hoang-Vu; Lim, Ee-Peng & Datta, Anwitaman WikiNetViz: Visualizing friends and adversaries in implicit social networks IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008, June 17, 2008 - June 20, 2008 Taipei, Taiwan 2008 [142]
When multiple users with diverse backgrounds and beliefs edit Wikipedia together, disputes often arise due to disagreements among the users. In this paper, we introduce a novel visualization tool known as WikiNetViz} to visualize and analyze disputes among users in a dispute-induced social network. WikiNetViz} is designed to quantify the degree of dispute between a pair of users using the article history. Each user (and article) is also assigned a controversy score by our proposed ControversyRank} model so as to measure the degree of controversy of a user (and an article) by the amount of disputes between the user (article) and other users in articles of varying degrees of controversy. On the constructed social network, WikiNetViz} can perform clustering so as to visualize the dynamics of disputes at the user group level. It also provides an article viewer for examining an article revision so as to determine the article content modified by different users. ""
Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo FolksoViz: A semantic relation-based folksonomy visualization using the Wikipedia corpus 10th ACIS Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2009, In conjunction with IWEA 2009 and WEACR 2009, May 27, 2009 - May 29, 2009 Daegu, Korea, Republic of 2009 [143]
Tagging is one of the most popular services in Web 2.0 and folksonomy is a representation of collaborative tagging. Tag cloud has been the one and only visualization of the folksonomy. The tag cloud, however, provides no information about the relations between tags. In this paper, targeting del.icio.us tag data, we propose a technique, FolksoViz, for automatically deriving semantic relations between tags and for visualizing the tags and their relations. In order to find the equivalence, subsumption, and similarity relations, we apply various rules and models based on the Wikipedia corpus. The derived relations are visualized effectively. The experiment shows that the FolksoViz} manages to find the correct semantic relations with high accuracy. ""
Lee, Kangpyo; Kim, Hyunwoo; Shin, Hyopil & Kim, Hyoung-Joo Tag sense disambiguation for clarifying the vocabulary of social tags 2009 IEEE International Conference on Social Computing, SocialCom 2009, August 29, 2009 - August 31, 2009 Vancouver, BC, Canada 2009 [144]
Tagging is one of the most popular services in Web 2.0. As a special form of tagging, social tagging is done collaboratively by many users, which forms a so-called folksonomy. As tagging has become widespread on the Web, the tag vocabulary is now very informal, uncontrolled, and personalized. For this reason, many tags are unfamiliar and ambiguous to users so that they fail to understand the meaning of each tag. In this paper, we propose a tag sense disambiguating method, called Tag Sense Disambiguation (TSD), which works in the social tagging environment. TSD can be applied to the vocabulary of social tags, thereby enabling users to understand the meaning of each tag through Wikipedia. To find the correct mappings from del.icio.us tags to Wikipedia articles, we define the Local Neighbor tags, the Global Neighbor tags, and finally the Neighbor tags that would be the useful keywords for disambiguating the sense of each tag based on the tag co-occurrences. The automatically built mappings are reasonable in most cases. The experiment shows that TSD can find the correct mappings with high accuracy.
Lees-Miller, John; Anderson, Fraser; Hoehn, Bret & Greiner, Russell Does Wikipedia information help Netflix predictions? 7th International Conference on Machine Learning and Applications, ICMLA 2008, December 11, 2008 - December 13, 2008 San Diego, CA, United states 2008 [145]
We explore several ways to estimate movie similarity from the free encyclopedia Wikipedia with the goal of improving our predictions for the Netflix Prize. Our system first uses the content and hyperlink structure of Wikipedia articles to identify similarities between movies. We then predict a user's unknown ratings by using these similarities in conjunction with the user's known ratings to initialize matrix factorization and K-Nearest Neighbours algorithms. We blend these results with existing ratings-based predictors. Finally, we discuss our empirical results, which suggest that external Wikipedia data does not significantly improve the overall prediction accuracy.
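The neighbourhood-based prediction step mentioned above can be sketched as follows; the similarity matrix is a placeholder standing in for similarities mined from Wikipedia content and hyperlinks, and the weighting is a generic similarity-weighted average rather than the authors' exact blending scheme.

```python
# Similarity-weighted k-nearest-neighbour rating prediction (illustrative only).
import numpy as np

def knn_predict(target, known_ratings, similarity, k=2):
    """known_ratings: {movie_id: rating}; similarity: 2-D array of movie similarities."""
    neighbours = sorted(known_ratings,
                        key=lambda m: similarity[target, m], reverse=True)[:k]
    weights = np.array([similarity[target, m] for m in neighbours])
    ratings = np.array([known_ratings[m] for m in neighbours])
    return float(weights @ ratings / weights.sum())

# Placeholder similarity matrix; in the paper this would come from Wikipedia analysis.
sim = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.2],
                [0.3, 0.2, 1.0]])
print(knn_predict(target=0, known_ratings={1: 4.0, 2: 2.0}, similarity=sim))
```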
Lehtonen, Miro & Doucet, Antoine Phrase detection in the Wikipedia 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [146]
The Wikipedia XML} collection turned out to be rich of marked-up phrases as we carried out our INEX} 2007 experiments. Assuming that a phrase occurs at the inline level of the markup, we were able to identify over 18 million phrase occurrences, most of which were either the anchor text of a hyperlink or a passage of text with added emphasis. As our IR} system - EXTIRP} - indexed the documents, the detected inline-level elements were duplicated in the markup with two direct consequences: 1) The frequency of the phrase terms increased, and 2) the word sequences changed. Because the markup was manipulated before computing word sequences for a phrase index, the actual multi-word phrases became easier to detect. The effect of duplicating the inline-level elements was tested by producing two run submissions in ways that were similar except for the duplication. According to the official INEX} 2007 metric, the positive effect of duplicated phrases was clear. 2008 Springer-Verlag} Berlin Heidelberg.
Lehtonen, Miro & Doucet, Antoine EXTIRP: Baseline retrieval from Wikipedia 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007
The Wikipedia XML documents are considered an interesting challenge to any XML retrieval system that is capable of indexing and retrieving XML without prior knowledge of the structure. Although the structure of the Wikipedia XML documents is highly irregular and thus unpredictable, EXTIRP manages to handle all the well-formed XML documents without problems. Whether the high flexibility of EXTIRP also implies high performance concerning the quality of IR has so far been a question without definite answers. The initial results do not confirm any positive answers, but instead, they tempt us to define some requirements for the XML documents that EXTIRP is expected to index. The most interesting question stemming from our results is about the line between high-quality XML markup, which aids accurate IR, and noisy "XML spam" which misleads flexible XML search engines. Springer-Verlag Berlin Heidelberg 2007.
Leong, Peter; Siak, Chia Bin & Miao, Chunyan Cyber engineering co-intelligence digital ecosystem: The GOFASS methodology 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, DEST '09, June 1, 2009 - June 3, 2009 Istanbul, Turkey 2009 [147]
Co-intelligence, also known as collective or collaborative intelligence, is the harnessing of human knowledge and intelligence that allows groups of people to act together in ways that seem to be intelligent. Co-intelligence Internet applications such as Wikipedia are the first steps toward developing digital ecosystems that support collective intelligence. Peer-to-peer (P2P) systems are well fitted to co-intelligence digital ecosystems because they allow each service client machine to act also as a service provider without any central hub in the network of cooperative relationships. However, dealing with server farms, clusters and meshes of wireless edge devices will be the norm in the next generation of computing, yet most present P2P systems have been designed with a fixed, wired infrastructure in mind. This paper proposes a methodology for cyber engineering intelligent-agent-mediated co-intelligence digital ecosystems. Our methodology caters for co-intelligence digital ecosystems with wireless edge devices working with service-oriented information servers.
Li, Bing; Chen, Qing-Cai; Yeung, Daniel S.; Ng, Wing W.Y. & Wang, Xiao-Long Exploring wikipedia and query log's ability for text feature representation 6th International Conference on Machine Learning and Cybernetics, ICMLC 2007, August 19, 2007 - August 22, 2007 Hong Kong, China 2007 [148]
The rapid increase of internet technology requires a better management of web page contents. Much text mining research has been conducted, on tasks like text categorization, information retrieval, and text clustering. When machine learning methods or statistical models are applied to such a large scale of data, the first step we have to solve is how to represent a text document in a way that computers can handle. Traditionally, single words are employed as features in the Vector Space Model, which make up the feature space for all text documents. The single-word based representation assumes word independence and does not consider relations between words, which may cause information loss. This paper proposes Wiki-Query segmented features for text classification, in hopes of better using the text information. The experiment results show that a much better F1 value has been achieved than that of the classical single-word based text representation. This means that Wikipedia and query segmented features could better represent a text document.
Li, Yun; Huang, Kaiyan; Ren, Fuji & Zhong, Yixin Searching and computing for vocabularies with semantic correlations from Chinese Wikipedia China-Ireland International Conference on Information and Communications Technologies, CIICT 2008, September 26, 2008 - September 28, 2008 Beijing, China 2008 [149]
This paper introduces an experiment on searching for semantically correlated vocabularies in Chinese Wikipedia pages and computing semantic correlations. Based on the 54,745 structured documents generated from Wikipedia pages, we explore about 400,000 pairs of Wikipedia vocabularies considering hyperlinks, overlapped text and document positions. Semantic relatedness is calculated based on the relatedness of Wikipedia documents. From comparison experiments we analyze the reliability of our measures and some other properties.
Lian, Li; Ma, Jun; Lei, JingSheng; Song, Ling & Liu, LeBo Automated construction Chinese domain ontology from Wikipedia 4th International Conference on Natural Computation, ICNC 2008, October 18, 2008 - October 20, 2008 Jinan, China 2008 [150]
Wikipedia (Wiki) is a collaborative on-line encyclopedia, where web users are able to share their knowledge about a certain topic. How to make use of the rich knowledge in the Wiki is a big challenge. In this paper we propose a method to construct domain ontology from the Chinese Wiki automatically. The main idea in this paper is based on entry segmenting and Feature Text (FT) extracting: we first segment the names of entries and establish the concept hierarchy. Secondly, we extract the FTs from the descriptions of entries to eliminate redundant information. Finally, we calculate the similarity between pairs of FTs to revise the concept hierarchy and obtain non-taxonomic relations between concepts. A preliminary experiment indicates that our method is useful for Chinese domain ontology construction.
Liang, Chia-Kai; Hsieh, Yu-Ting; Chuang, Tien-Jung; Wang, Yin; Weng, Ming-Fang & Chuang, Yung-Yu Learning landmarks by exploiting social media 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [151]
This paper introduces methods for automatic annotation of landmark photographs via learning textual tags and visual features of landmarks from landmark photographs that are appropriately location-tagged from social media. By analyzing spatial distributions of text tags from Flickr's geotagged photos, we identify thousands of tags that likely refer to landmarks. Further verification by utilizing Wikipedia articles filters out non-landmark tags. Association analysis is used to find the containment relationship between landmark tags and other geographic names, thus forming a geographic hierarchy. Photographs relevant to each landmark tag were retrieved from Flickr and distinctive visual features were extracted from them. The results form an ontology for landmarks, including their names, equivalent names, geographic hierarchy, and visual features. We also propose an efficient indexing method for content-based landmark search. The resultant ontology could be used in tag suggestion and content-relevant re-ranking. 2010 Springer-Verlag Berlin Heidelberg.
Lim, Ee-Peng; Kwee, Agus Trisnajaya; Ibrahim, Nelman Lubis; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Maureen Visualizing and exploring evolving information networks in Wikipedia 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [152]
Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of community's knowledge and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia's information networks. SSNetViz+} supports time-based network browsing, content browsing and search. Using a terrorism information network as an example, we show that different timestamped versions of the network can be interactively explored. As information networks in Wikipedia are created and maintained by collaborative editing efforts, the edit activity data are also shown to help detecting interesting events that may have happened to the network. SSNetViz+} also supports temporal queries that allow other relevant nodes to be added so as to expand the network being analyzed. ""
Lim, Ee-Peng; Wang, Z.; Sadeli, D.; Li, Y.; Chang, Chew-Hung; Chatterjea, Kalyani; Goh, Dion Hoe-Lian; Theng, Yin-Leng; Zhang, Jun & Sun, Aixin Integration of Wikipedia and a geography digital library 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan 2006
In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications that are of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions namely cities and countries. Some empirical performance results will be presented. The paper will also describe the adaptations of G-Portal} and Wikipedia to meet the integration requirements. Springer-Verlag} Berlin Heidelberg 2006.
Linna, Li The design of semantic web services discovery model based on multi proxy 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2009, November 20, 2009 - November 22, 2009 Shanghai, China 2009 [153]
Web services have changed the Web from a database of static documents to a service provider. To improve the automation of Web services interoperation, a number of technologies have been recommended, such as semantic Web services and proxies. In this paper we propose a model for semantic Web service discovery based on semantic Web services and FIPA multi-proxies. This paper provides a broker that offers semantic interoperability between the semantic Web service provider and proxies by translating WSDL to DF descriptions for semantic Web services and DF descriptions to WSDL for FIPA multi-proxies. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management in the broker creates the user ontology and merges it with a general ontology (e.g. WordNet, Yago, Wikipedia). We also describe the recommendation component that recommends the WSDL to the Web service provider to increase its retrieval probability in related queries.
Lintean, Mihai; Moldovan, Cristian; Rus, Vasile & McNamara, Danielle The role of local and global weighting in assessing the semantic similarity of texts using latent semantic analysis 23rd International Florida Artificial Intelligence Research Society Conference, FLAIRS-23, May 19, 2010 - May 21, 2010 Daytona Beach, FL, United states 2010
In this paper, we investigate the impact of several local and global weighting schemes on Latent Semantic Analysis' (LSA) ability to capture semantic similarity between two texts. We worked with texts varying in size from sentences to paragraphs. We present a comparison of 3 local and 3 global weighting schemes across 3 different standardized data sets related to semantic similarity tasks. For local weighting, we used binary weighting, term-frequency, and log-type. For global weighting, we relied on binary, inverse document frequency (IDF) collected from the English Wikipedia, and entropy, which is the standard weighting scheme used by most LSA-based applications. We studied all possible combinations of these weighting schemes on the following three tasks and corresponding data sets: paraphrase identification at sentence level using the Microsoft Research Paraphrase Corpus, paraphrase identification at sentence level using data from the intelligent tutoring system iSTART, and mental model detection based on student-articulated paragraphs in MetaTutor, another intelligent tutoring system. Our experiments revealed that for sentence-level texts a combination of type-frequency local weighting with either IDF or binary global weighting works best. For paragraph-level texts, a log-type local weighting in combination with binary global weighting works best. We also found that global weights have a greater impact for sentence-level similarity, as the local weight is undermined by the small size of such texts. Copyright 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
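A toy illustration of combining one local and one global weighting scheme before the SVD step of LSA (here log-type local weighting with IDF global weighting); the count matrix, the retained dimensionality and the cosine comparison are illustrative only, not the paper's setup.

```python
# Sketch: local * global weighting of a term-document matrix, then truncated SVD (LSA).
import numpy as np

counts = np.array([[2, 0, 1],    # rows: terms, columns: documents (toy data)
                   [1, 1, 0],
                   [0, 3, 1]], dtype=float)

local = np.log1p(counts)                           # log-type local weight
df = np.count_nonzero(counts, axis=1)              # document frequency per term
idf = np.log(counts.shape[1] / df)                 # IDF global weight
weighted = local * idf[:, None]

U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2                                              # retained LSA dimensions (assumed)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T          # documents in the latent space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(doc_vectors[0], doc_vectors[1]))      # similarity of documents 0 and 1
```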
Liu, Changxin; Chen, Huijuan; Tan, Yunlan & Wu, Lanying The design of e-Learning system based on semantic wiki and multi-agent 2nd International Workshop on Education Technology and Computer Science, ETCS 2010, March 6, 2010 - March 7, 2010 Wuhan, Hubei, China 2010 [154]
User interactions and social networks are based on Web 2.0; well-known applications are blogs, wikis, and image/video sharing sites. They have dramatically increased sharing and participation among web users. Knowledge is collected and information is shared using social software. Wikipedia is a successful example of web technology that has helped knowledge-sharing between people. Users can freely create and modify its content, but Wikipedia cannot understand its content. This problem is solved by the semantic Wiki. The e-Learning system has been designed based on the semantic Wiki and multi-agent technology. It can help us to implement distributed learning resource discovery and individualized service. The prototype provides efficient navigation and search.
Liu, Qiaoling; Xu, Kaifeng; Zhang, Lei; Wang, Haofen; Yu, Yong & Pan, Yue Catriple: Extracting triples from wikipedia categories 3rd Asian Semantic Web Conference, ASWC 2008, December 8, 2008 - December 11, 2008 Bangkok, Thailand 2008 [155]
As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One kind of important triples is about Wikipedia articles and their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on the infoboxes and suffer from a low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which is too effort-consuming and explores only very limited knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve a wider article coverage with less manual effort. We manage to realize this goal by utilizing the syntax and semantics brought by super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples with a 12-level confidence ranging from 47.0% to 96.4%, which cover 78.2% of Wikipedia articles. Among them, 1.27M triples have confidence of 96.4%. Applications can on demand use the triples with suitable confidence. 2008 Springer Berlin Heidelberg.
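Very roughly, the super-sub category idea can be pictured as in the hypothetical sketch below: a parent category of the form "X by Y" names a property, and a child category supplies its value for every member article. The regular expression and string handling here are simplified stand-ins for the paper's actual extraction rules and confidence model.

```python
# Simplified, hypothetical triple extraction from a super-sub category pair.
import re

def triples_from_category_pair(parent, child, articles):
    """Derive (article, property, value) triples from a 'X by Y' parent category."""
    m = re.match(r"(?P<head>\w+) by (?P<prop>\w+)", parent)
    if not m:
        return []
    head, prop = m.group("head"), m.group("prop")
    # The child category name minus the head noun is taken as the property value.
    value = child.replace(head.lower(), "").replace(head, "").strip()
    return [(article, prop, value) for article in articles]

print(triples_from_category_pair("Songs by artist",
                                 "The Beatles songs",
                                 ["Hey Jude", "Let It Be"]))
# -> [('Hey Jude', 'artist', 'The Beatles'), ('Let It Be', 'artist', 'The Beatles')]
```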
Lu, Zhiqiang; Shao, Werimin & Yu, Zhenhua Measuring semantic similarity between words using wikipedia 2009 International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China 2009 [156]
Semantic similarity measures play an important role in the extraction of semantic relations. Semantic similarity measures are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). This paper presents a new Web-based method for measuring the semantic similarity between words. Different from other methods, which are based on taxonomies or Internet search engines, our method uses snippets from Wikipedia to calculate the semantic similarity between words by using cosine similarity and TF-IDF. Also, a stemming algorithm and stop-word removal are used in preprocessing the snippets from Wikipedia. We set different thresholds to evaluate our results in order to decrease the interference from noise and redundancy. Our method was empirically evaluated using the Rubenstein-Goodenough benchmark dataset. It gives a higher correlation value (0.615) than some existing methods. Evaluation results show that our method improves accuracy and is more robust for measuring semantic similarity between words.
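A minimal sketch of a snippet-based TF-IDF/cosine measure of the kind described above, assuming scikit-learn is available; the snippet texts are placeholders, and the paper's exact preprocessing (stemming, stop-word handling, thresholds) is only approximated.

```python
# Illustrative snippet-based word similarity using TF-IDF vectors and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def word_similarity(snippets_a, snippets_b):
    """Cosine similarity between two words, each represented by the
    concatenation of Wikipedia snippets retrieved for it (placeholder input)."""
    docs = [" ".join(snippets_a), " ".join(snippets_b)]
    vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
    return cosine_similarity(vectors[0], vectors[1])[0, 0]

print(word_similarity(["a car is a wheeled motor vehicle"],
                      ["an automobile is a road vehicle with an engine"]))
```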
Lukosch, Stephan & Leisen, Andrea Comparing and merging versioned wiki pages 4th International Conference on Web Information Systems and Technologies, WEBIST 2008, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2009 [157]
Collaborative web-based applications support users when creating and sharing information. Wikis are prominent examples of that kind of application. Wikis, e.g. Wikipedia [1], attract loads of users that modify their content. Normally, wikis do not employ any mechanisms to avoid parallel modification of the same page. As a result, conflicting changes can occur. Most wikis record all versions of a page to allow users to review recent changes. However, just recording all versions does not guarantee that conflicting modifications are reflected in the most recent version of a page. In this paper, we identify the requirements for efficiently dealing with conflicting modifications and present a web-based tool which allows users to compare and merge different versions of a wiki page. 2009 Springer Berlin Heidelberg.
Lukosch, Stephan & Leisen, Andrea Dealing with conflicting modifications in a Wiki WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2008
Collaborative web-based applications support users when creating and sharing information. Wikis are prominent examples of that kind of application. Wikis, e.g. Wikipedia (Wikipedia, 2007), attract loads of users that modify their content. Normally, wikis do not employ any mechanisms to avoid parallel modification of the same page. As a result, conflicting changes can occur. Most wikis record all versions of a page to allow users to review recent changes. However, just recording all versions does not guarantee that conflicting modifications are reflected in the most recent version of a page. In this paper, we identify the requirements for efficiently dealing with conflicting modifications and present a web-based tool which allows users to compare and merge different versions of a wiki page.
Mansour, Osama Group Intelligence: A distributed cognition perspective International Conference on Intelligent Networking and Collaborative Systems, INCoS 2009, November 4, 2009 - November 6, 2009 Barcelona, Spain 2009 [158]
The question of whether intelligence can be attributed to groups or not has been raised in many scientific disciplines. In the field of computer-supported collaborative learning, this question has been examined to understand how computer-mediated environments can augment human cognition and learning on a group level. The era of social computing which represents the emergence of Web 2.0 collaborative technologies and social media has stimulated a wide discussion about collective intelligence and the global brain. This paper reviews the theory of distributed cognition in the light of these concepts in an attempt to analyze and understand the emergence process of intelligence that takes place in the context of computer-mediated collaborative and social media environments. It concludes by showing that the cognitive organization, which occurs within social interactions serves as a catalyst for intelligence to emerge on a group level. Also a process model has been developed to show the process of collaborative knowledge construction in Wikipedia that characterizes such cognitive organization. ""
Mataoui, M'hamed; Boughanem, Mohand & Mezghiche, Mohamed Experiments on PageRank algorithm in the XML information retrieval context 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom 2009 [159]
In this paper we present two adaptations of the PageRank algorithm to collections of XML documents and the experimental results obtained for the Wikipedia collection used at INEX 2007. These adaptations, to which we refer as "DOCRANK" and "TOPICAL-docrank", allow re-ranking of the results returned by the base run in order to improve retrieval quality. Our experiments are performed on the results returned by the three best ranked systems in the "Focused" task of INEX 2007. Evaluations have shown improvements in the quality of retrieval results (the improvement for some topics is very significant, e.g. topics 491 and 521). The best improvement achieved on the results returned by the Dalian university system (global rate obtained for the 107 topics of INEX 2007) was about 3.78%.
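As background, a generic power-iteration PageRank over a toy document link graph, combined linearly with a baseline retrieval score, illustrates the kind of re-ranking the abstract refers to; the graph, the damping factor and the combination weights are arbitrary, and the XML-specific graph construction of DOCRANK/TOPICAL-docrank is not reproduced here.

```python
# Generic PageRank by power iteration, then linear combination with baseline scores.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # hypothetical doc-to-doc links
n, d = 4, 0.85                                 # number of docs, damping factor

M = np.zeros((n, n))                           # column-stochastic transition matrix
for src, targets in links.items():
    for t in targets:
        M[t, src] = 1.0 / len(targets)

rank = np.full(n, 1.0 / n)
for _ in range(100):                           # power iteration
    rank = (1 - d) / n + d * M @ rank

baseline = np.array([0.9, 0.7, 0.6, 0.4])      # scores from the base retrieval run (assumed)
combined = 0.7 * baseline + 0.3 * rank / rank.max()
print(np.argsort(-combined))                   # re-ranked document order
```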
Maureen; Sun, Aixin; Lim, Ee-Peng; Datta, Anwitaman & Chang, Kuiyu On visualizing heterogeneous semantic networks from multiple data sources 11th International Conference on Asian Digital Libraries, ICADL 2008, December 2, 2008 - December 5, 2008 Bali, Indonesia 2008 [160]
In this paper, we focus on the visualization of heterogeneous semantic networks obtained from multiple data sources. A semantic network comprising a set of entities and relationships is often used for representing knowledge derived from textual data or database records. Although the semantic networks created for the same domain at different data sources may cover a similar set of entities, these networks could also be very different because of naming conventions, coverage, view points, and other reasons. Since digital libraries often contain data from multiple sources, we propose a visualization tool to integrate and analyze the differences among multiple social networks. Through a case study on two terrorism-related semantic networks derived from Wikipedia and Terrorism Knowledge Base (TKB) respectively, the effectiveness of our proposed visualization tool is demonstrated. 2008 Springer Berlin Heidelberg.
Minier, Zsolt; Bodo, Zalan & Csato, Lehel Wikipedia-based Kernels for text categorization 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2007, September 26, 2007 - September 29, 2007 Timisoara, Romania 2007 [161]
In recent years several models have been proposed for text categorization. Among these, one of the most widely applied is the vector space model (VSM), where independence between indexing terms, usually words, is assumed. Since training corpora are relatively small compared to what would be required for a realistic number of words, the generalization power of the learning algorithms is low. It is assumed that a bigger text corpus can boost the representation and hence the learning process. Based on the work of Gabrilovich and Markovitch [6], we incorporate Wikipedia articles into the system to give a word-distributional representation for documents. The extension with this new corpus causes a dimensionality increase, therefore clustering of features is needed. We use Latent Semantic Analysis (LSA), Kernel Principal Component Analysis (KPCA) and Kernel Canonical Correlation Analysis (KCCA) and present results for these experiments on the Reuters corpus.
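A compact sketch of the general pipeline described here, assuming plain TF-IDF features: LSA via truncated SVD compresses the (potentially Wikipedia-expanded) feature space before a linear classifier. The toy documents and labels are placeholders, and the KPCA/KCCA variants from the paper are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder labeled corpus standing in for a collection such as Reuters.
docs = ["oil prices rose sharply", "the central bank cut interest rates",
        "wheat harvest exceeded forecasts", "stocks rallied after the rate cut"]
labels = ["commodities", "finance", "commodities", "finance"]

model = make_pipeline(
    TfidfVectorizer(),             # vector space model over the indexing terms
    TruncatedSVD(n_components=2),  # LSA: cluster/compress correlated features
    LinearSVC(),                   # categorizer on the reduced representation
)
model.fit(docs, labels)
print(model.predict(["rates were left unchanged"]))
```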
Mishra, Surjeet & Ghosh, Hiranmay Effective visualization and navigation in a multimedia document collection using ontology 3rd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2009, December 16, 2009 - December 20, 2009 New Delhi, India 2009 [162]
We present a novel user interface for visualizing and navigating in a multimedia document collection. A domain ontology is used to depict the background knowledge organization and to map the multimedia information nodes onto that knowledge map, thereby making the implicit knowledge organization in a collection explicit. The ontology is automatically created by analyzing the links in Wikipedia, and is delimited to tightly cover the information nodes in the collection. We present an abstraction of the knowledge map for creating a clear and concise view, which can be progressively 'zoomed in' or 'zoomed out' to navigate the knowledge space. We organize the graph based on mutual similarity scores between the nodes to aid the cognitive process during navigation. 2009 Springer-Verlag Berlin Heidelberg.
Missen, Malik Muhammad Saad; Boughanem, Mohand & Cabanac, Guillaume Using passage-based language model for opinion detection in blogs 25th Annual ACM Symposium on Applied Computing, SAC 2010, March 22, 2010 - March 26, 2010 Sierre, Switzerland 2010 [163]
In this work, we evaluate the importance of passages in blogs, especially for the task of opinion detection. We argue that passages are the basic building blocks of blogs, and therefore use a passage-based language modeling approach for opinion finding in blogs. Our decision to use language modeling (LM) in this work is based on the performance LM has shown in various opinion detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms, using Wikipedia and a relevance-feedback mechanism respectively. We also compare the impact of two different query term weighting (and ranking) approaches on the final results, as well as the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the TREC Blog06 collection with the 50 topics of TREC 2006 over the best TREC-provided baseline, which has an opinion-finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over the best TREC-provided baseline (baseline4).
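As a rough illustration of passage-based query-likelihood scoring with the three aggregation strategies compared in the paper (Linear, Avg, Max), the sketch below uses Dirichlet smoothing; the tokenization, smoothing constant and interpolation weight are simplifying assumptions rather than the authors' settings.

```python
import math
from collections import Counter

def passage_score(query, passage, collection_tf, collection_len, mu=1000):
    """Dirichlet-smoothed query log-likelihood of one passage (list of tokens)."""
    tf, plen = Counter(passage), len(passage)
    score = 0.0
    for q in query:
        p_coll = (collection_tf.get(q, 0) + 1) / (collection_len + 1)  # crude floor
        score += math.log((tf.get(q, 0) + mu * p_coll) / (plen + mu))
    return score

def document_score(query, passages, collection_tf, collection_len, mode="max", lam=0.5):
    """Aggregate passage scores into a document score (linear / avg / max)."""
    scores = [passage_score(query, p, collection_tf, collection_len) for p in passages]
    if mode == "max":
        return max(scores)
    if mode == "avg":
        return sum(scores) / len(scores)
    # "linear": interpolate the best passage with whole-document evidence
    doc = [t for p in passages for t in p]
    return lam * max(scores) + (1 - lam) * passage_score(query, doc, collection_tf, collection_len)
```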
Mølgaard, Lasse L.; Larsen, Jan & Goutte, Cyril Temporal analysis of text data using latent variable models Machine Learning for Signal Processing XIX - 2009 IEEE Signal Processing Society Workshop, MLSP 2009, September 2, 2009 - September 4, 2009 Grenoble, France 2009 [164]
Detecting and tracking temporal data is an important task in multiple applications. In this paper we study temporal text mining methods for music information retrieval. We compare two ways of detecting the temporal latent semantics of a corpus extracted from Wikipedia: a stepwise Probabilistic Latent Semantic Analysis (PLSA) approach and a global multiway PLSA method. The analysis indicates that the global method is able to identify relevant trends which are difficult to obtain using a step-by-step approach. Furthermore, we show that inspecting PLSA models with different numbers of factors may reveal the stability of temporal clusters, making it possible to choose the relevant number of factors.
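To make the latent-variable machinery concrete, a bare-bones PLSA fit by EM on a document-term count matrix is sketched below; the paper's stepwise and global multiway (temporal) extensions are not reproduced, and the dense responsibility tensor is only suitable for toy-sized data.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a document-term count matrix (D x V numpy array).
    Returns P(z|d) with shape (D, K) and P(w|z) with shape (K, V)."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    p_z_d = rng.random((D, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, V)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w), shape (D, V, K)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        weighted = counts[:, :, None] * resp           # n(d,w) * P(z|d,w)
        # M-step
        p_w_z = weighted.sum(axis=0).T                 # P(w|z) over (K, V)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)                   # P(z|d) over (D, K)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```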
Mohammadi, Mehdi & GhasemAghaee, Nasser Building bilingual parallel corpora based on wikipedia 2nd International Conference on Computer Engineering and Applications, ICCEA 2010, March 19, 2010 - March 21, 2010 Indonesia 2010 [165]
Aligned parallel corpora are an important resource for a wide range of multilingual research, specifically corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method of extracting sentence-level alignments using an extended link-based bilingual lexicon method. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
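A sketch of the lexicon-overlap alignment idea under simplifying assumptions: every cross-language sentence pair in a linked article pair is scored by length-normalized lexicon matches, and a greedy one-to-one selection keeps pairs above a threshold. The plain dict lexicon stands in for the paper's extended link-based bilingual lexicon.

```python
def align_sentences(src_sents, tgt_sents, lexicon, threshold=0.25):
    """Greedy one-to-one sentence alignment.
    src_sents / tgt_sents: lists of token lists; lexicon: dict src_word -> set of tgt words."""
    def score(src, tgt):
        tgt_set = set(tgt)
        hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
        return hits / max(len(src), len(tgt))          # length-normalized overlap

    candidates = sorted(
        ((score(s, t), i, j)
         for i, s in enumerate(src_sents) for j, t in enumerate(tgt_sents)),
        reverse=True)
    used_i, used_j, aligned = set(), set(), []
    for sc, i, j in candidates:
        if sc < threshold:
            break
        if i not in used_i and j not in used_j:        # enforce 1-1 alignment
            aligned.append((i, j, sc))
            used_i.add(i)
            used_j.add(j)
    return aligned
```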
Morgan, Jonathan T.; Derthick, Katie; Ferro, Toni; Searle, Elly; Zachry, Mark & Kriplean, Travis Formalization and community investment in wikipedia's regulating texts: The role of essays 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states 2009 [166]
This poster presents ongoing research on how discursive and editing behaviors are regulated on Wikipedia by means of documented rules and practices. Our analysis focuses on three types of collaboratively created policy documents (policies, guidelines and essays) that have been formalized to different degrees and represent different degrees of community investment. We employ a content analysis methodology to explore how these regulating texts differ according to a) the aspects of editor behavior, content standards and community principles that they address, and b) how they are used by Wikipedians engaged in 'talk' page discussions to inform, persuade and coordinate with one another.
Mozina, Martin; Giuliano, Claudio & Bratko, Ivan Argument based machine learning from examples and text 2009 1st Asian Conference on Intelligent Information and Database Systems, ACIIDS 2009, April 1, 2009 - April 3, 2009 Dong Hoi, Viet nam 2009 [167]
We introduce a novel approach to cross-media learning based on argument based machine learning (ABML). ABML is a recent method that combines argumentation and machine learning from examples, and its main idea is to use arguments for some of the learning examples. Arguments are usually provided by a domain expert. In this paper, we present an alternative approach, where arguments used in ABML are automatically extracted from text with a technique for relation extraction. We demonstrate and evaluate the approach through a case study of learning to classify animals by using arguments automatically extracted from Wikipedia.
Mulhem, Philippe & Chevallet, Jean-Pierre Use of language model, phrases and wikipedia forward links for INEX 2009 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [168]
We present in this paper the work of the Information Retrieval Modeling Group (MRIM) of the Computer Science Laboratory of Grenoble (LIG) at the INEX 2009 Ad Hoc Track. Our aim this year was twofold: first, to study the impact of extracted noun phrases used in addition to words as terms, and second, to use forward links present in Wikipedia to expand queries. For retrieval, we use a language model with Dirichlet smoothing on documents and/or doxels, and using a Fetch and Browse approach we select and rank the results. Our best runs according to the doxel evaluation obtain the first rank on the Thorough task, and according to the document evaluation we obtain the first rank for the Focused, Relevance in Context and Best in Context tasks. 2010 Springer-Verlag Berlin Heidelberg.
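The forward-link expansion step can be sketched in a few lines, assuming hypothetical lookup callbacks backed by a local Wikipedia dump; the weighting scheme is illustrative and not the run configuration used by the authors.

```python
def expand_query(query_terms, get_page, forward_links, max_links=10, link_weight=0.3):
    """Expand a query with titles linked from the matching Wikipedia article.

    get_page(title) -> page id or None, and forward_links(page) -> list of linked
    titles, are hypothetical callbacks (e.g. backed by a local Wikipedia dump).
    Returns a dict term -> weight usable by a weighted query-likelihood run."""
    weights = {t: 1.0 for t in query_terms}
    page = get_page(" ".join(query_terms))
    if page is not None:
        for title in forward_links(page)[:max_links]:
            for tok in title.lower().split():
                if tok not in weights:
                    weights[tok] = link_weight   # expansion terms get a lower weight
    return weights
```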
Muller, Christof & Gurevych, Iryna Using Wikipedia and Wiktionary in domain-specific information retrieval 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [169]
The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based on semantic relatedness by comparing their performance to a statistical model as implemented by Lucene. We refer to Wikipedia article titles and Wiktionary word entries as concepts and map query and document terms to concept vectors which are then used to compute the document relevance. In the bilingual task, we translate the English topics into the document language, i.e. German, by using machine translation. For SR-Text, we alternatively perform the translation process by using cross-language links in Wikipedia, whereby the terms are directly mapped to concept vectors in the target language. The evaluation shows that the latter approach especially improves the retrieval performance in cases where the machine translation system incorrectly translates query terms. 2009 Springer Berlin Heidelberg.
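A toy rendering of the concept-vector idea behind SR-Text: each term maps to a sparse vector over Wikipedia/Wiktionary concepts, texts are represented by summed term vectors, and relevance is their cosine. The term-to-concept map is assumed to be precomputed elsewhere.

```python
import math
from collections import defaultdict

def text_to_concepts(tokens, term_concepts):
    """Sum per-term concept vectors (dicts concept_id -> weight) into one sparse vector."""
    vec = defaultdict(float)
    for t in tokens:
        for concept, w in term_concepts.get(t, {}).items():
            vec[concept] += w
    return vec

def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Document relevance in concept space (term_concepts is an assumed precomputed map):
# score = cosine(text_to_concepts(query_tokens, term_concepts),
#                text_to_concepts(doc_tokens, term_concepts))
```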
Muller, Claudia; Meuthrath, Benedikt & Jeschke, Sabina Defining a universal actor content-element model for exploring social and information networks considering the temporal dynamic 2009 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2009, July 20, 2009 - July 22, 2009 Athens, Greece 2009 [170]
The emergence of the Social Web offers new opportunities for scientists to explore open virtual communities. Various approaches have appeared in terms of statistical evaluation, descriptive studies and network analyses, which pursue an enhanced understanding of existing mechanisms developing from the interplay of technical and social infrastructures. Unfortunately, at the moment, all these approaches are separate and no integrated approach exists. This gap is filled by our proposal of a concept which is composed of a universal description model, temporal network definitions, and a measurement system. The approach addresses the necessary interpretation of Social Web communities as dynamic systems. In addition to the explicated models, a software tool is briefly introduced employing the specified models. Furthermore, a scenario is used where an extract from the Wikipedia database shows the practical application of the software.
Murugeshan, Meenakshi Sundaram; Lakshmi, K. & Mukherjee, Saswati Exploiting negative categories and wikipedia structures for document classification ARTCom 2009 - International Conference on Advances in Recent Technologies in Communication and Computing, October 27, 2009 - October 28, 2009 Kottayam, Kerala, India 2009 [171]
This paper explores the effect of a profile-based method for the classification of Wikipedia XML documents. Our approach builds two profiles, exploiting the whole content, the Initial Descriptions and the links in the Wikipedia documents. For building profiles we use negative category information, which has been shown to perform well for classifying unstructured texts. The performance of the Cosine and Fractional Similarity metrics is also compared. The use of two classifiers and their weighted average improves the classification performance.
Nadamoto, Akiyo; Aramaki, Eiji; Abekawa, Takeshi & Murakami, Yohei Content hole search in community-type content using Wikipedia 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [172]
SNSs and blogs, both of which are maintained by a community of people, have become popular in Web 2.0. We call such content "community-type content." The community is associated with the content, and those who use or contribute to community-type content are considered members of the community. Occasionally the members of a community do not understand the theme of the content from multiple viewpoints, and hence the amount of information is often insufficient; it is therefore useful to present users with the information they have missed. As Web 2.0 became popular, the content on the Internet and the types of users changed, and we believe there is a need for next-generation search engines in Web 2.0: search engines that can find information users are unaware of, which we call "content holes." In this paper we propose a method for searching for content holes in community-type content. We attempt to extract and represent content holes from discussions on SNSs and blogs. Conventional Web search techniques are generally based on similarities; our content-hole search is a different kind of search. In this paper we classify and present a number of images of different searching methods, define content holes, and, as a first step toward realizing our aim, propose a content-hole search system using Wikipedia.
Nakabayashi, Takeru; Yumoto, Takayuki; Nii, Manabu; Takahashi, Yutaka & Sumiya, Kazutoshi Measuring peculiarity of text using relation between words on the web 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [173]
We define the peculiarity of text as a metric of information credibility: higher peculiarity means lower credibility. We extract the theme word and the characteristic words from a text and check whether there is a subject-description relation between them. The peculiarity is defined using the ratio of subject-description relations between the theme word and the characteristic words. We evaluate the extent to which peculiarity can be used for this judgment by classifying text from Wikipedia and Uncyclopedia in terms of its peculiarity.
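Read literally, the metric reduces to the fraction of theme/characteristic word pairs without a subject-description relation. A minimal sketch, with the relation test left as a hypothetical callback (in the paper it is estimated from relations between words on the web):

```python
def peculiarity(theme_word, characteristic_words, has_subject_description):
    """Peculiarity = 1 - (supported pairs / all pairs); higher means less credible.

    has_subject_description(theme, word) -> bool is a hypothetical callback,
    e.g. a web-based test for a subject-description relation between the words."""
    if not characteristic_words:
        return 0.0
    supported = sum(1 for w in characteristic_words
                    if has_subject_description(theme_word, w))
    return 1.0 - supported / len(characteristic_words)
```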
Nakasaki, Hiroyuki; Kawaba, Mariko; Utsuro, Takehito & Fukuhara, Tomohiro Mining cross-lingual/cross-cultural differences in concerns and opinions in blogs 22nd International Conference on Computer Processing of Oriental Languages, ICCPOL 2009, March 26, 2009 - March 27, 2009 Hong kong 2009 [174]
The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework for collecting multilingual blogs with a topic keyword is designed as a blog feed retrieval procedure. Multilingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework. 2009 Springer Berlin Heidelberg.
Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro Wikipedia relatedness measurement methods and influential features 2009 International Conference on Advanced Information Networking and Applications Workshops, WAINA 2009, May 26, 2009 - May 29, 2009 Bradford, United kingdom 2009 [175]
As a corpus for knowledge extraction, Wikipedia has become one of the most promising resources among researchers in various domains such as NLP, WWW, IR and AI, since it has great coverage of concepts across a wide range of domains, remarkable accuracy and an easily handled structure for analysis. Relatedness measurement among concepts is one of the traditional research topics in Wikipedia analysis. The value of relatedness measurement research is widely recognized because of its wide range of applications, such as query expansion in IR and context recognition in WSD (Word Sense Disambiguation). A number of approaches have been proposed, and they proved that there are many features that can be used to measure relatedness among concepts in Wikipedia. In previous research, many features such as categories, co-occurrence of terms (links), inter-page links and infoboxes have been used for this aim. What seems lacking, however, is an integrated feature selection model for these dispersed features, since it is still unclear which features are influential and how we can integrate them in order to achieve higher accuracy. This is a position paper that proposes an SVR (Support Vector Regression) based integrated feature selection model to investigate the influence of each feature and seek a combined model of features that achieves high accuracy and coverage.
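The integration step itself is easy to sketch: each concept pair becomes a feature vector of per-feature relatedness signals and an SVR is regressed against gold relatedness judgements. The feature names and values below are illustrative, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# One row per concept pair; columns are per-feature relatedness signals
# (e.g. category overlap, link co-occurrence, infobox similarity) -- toy values.
X = np.array([[0.8, 0.6, 0.7],
              [0.1, 0.2, 0.0],
              [0.5, 0.9, 0.4],
              [0.0, 0.1, 0.1]])
y = np.array([0.9, 0.1, 0.7, 0.05])   # gold-standard relatedness judgements

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict([[0.6, 0.5, 0.5]]))   # integrated relatedness estimate
```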
Nakayama, Kotaro; Ito, Masahiro; Hara, Takahiro & Nishio, Shojiro Wikipedia mining for huge scale Japanese association thesaurus construction 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan 2008 [176]
Wikipedia, a huge scale Web-based dictionary, is an impressive corpus for knowledge extraction. We already proved that Wikipedia can be used for constructing an English association thesaurus and our link structure mining method is significantly effective for this aim. However, we want to find out how we can apply this method to other languages and what the requirements, differences and characteristics are. Nowadays, Wikipedia supports more than 250 languages such as English, German, French, Polish and Japanese. Among Asian languages, the Japanese Wikipedia is the largest corpus in Wikipedia. In this research, therefore, we analyzed all Japanese articles in Wikipedia and constructed a huge scale Japanese association thesaurus. After constructing the thesaurus, we realized that it shows several impressive characteristics depending on language and culture.
Nazir, Fawad & Takeda, Hideaki Extraction and analysis of tripartite relationships from Wikipedia 2008 IEEE International Symposium on Technology and Society: ISTAS '08 - Citizens, Groups, Communities and Information and Communication Technologies, June 26, 2008 - June 28, 2008 Fredericton, NB, Canada 2008 [177]
Social aspects are critical in the decision making process for social actors (human beings). Social aspects can be categorized into social interaction, social communities, social groups or any kind of behavior that emerges from interlinking, overlapping or similarities between interests of a society. These social aspects are dynamic and emergent. Therefore, interlinking them in a social structure, based on a bipartite affiliation network, may result in isolated graphs. The major reason is that as these correspondences are dynamic and emergent, they should be coupled with more than a single affiliation in order to sustain the interconnections during interest evolutions. In this paper we propose to interlink actors using multiple tripartite graphs rather than a bipartite graph, which was the focus of most of the previous social network building techniques. The utmost benefit of using tripartite graphs is that we can have multiple and hierarchical links between social actors. Therefore in this paper we discuss the extraction, plotting and analysis methods of tripartite relations between authors, articles and categories from Wikipedia. Furthermore, we also discuss the advantages of tripartite relationships over bipartite relationships. As a conclusion of this study we argue based on our results that to build useful, robust and dynamic social networks, actors should be interlinked in one or more tripartite networks.
Neiat, Azadeh Ghari; Mohsenzadeh, Mehran; Forsati, Rana & Rahmani, Amir Masoud An agent- based semantic web service discovery framework 2009 International Conference on Computer Modeling and Simulation, ICCMS 2009, February 20, 2009 - February 22, 2009 Macau, China 2009 [178]
Web services have changed the Web from a database of static documents to a service provider. To improve the automation of Web services interoperation, a number of technologies have been recommended, such as semantic Web services and agents. In this paper we propose a framework for semantic Web service discovery based on semantic Web services and FIPA multi-agents. The framework provides a broker that enables semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services, and DF descriptions to WSDL for FIPA multi-agents. We describe how the proposed architecture analyzes the request and matches the search query. The ontology management component in the broker creates the user ontology and merges it with a general ontology (i.e., WordNet, Yago, Wikipedia). We also describe the recommendation component, which recommends the WSDL to Web service providers to increase their retrieval probability in related queries.
Neiat, Azadeh Ghari; Shavalady, Sajjad Haj; Mohsenzadeh, Mehran & Rahmani, Amir Masoud A new approach for semantic web service discovery and propagation based on agents 5th International Conference on Networking and Services, ICNS 2009, April 20, 2009 - April 25, 2009 Valencia, Spain 2009 [179]
Integration of Web-based systems has become a timely challenge. To improve the automation of Web services interoperation, a number of technologies have been recommended, such as semantic Web services and agents. In this paper an approach for semantic Web service discovery and propagation based on semantic Web services and FIPA multi-agents is proposed. A broker that exposes semantic interoperability between semantic Web service providers and agents by translating WSDL to DF descriptions for semantic Web services, and vice versa, is proposed. We describe how the proposed architecture analyzes the request and, once it is analyzed, matches or publishes it. The ontology management component in the broker creates the user ontology and merges it with a general ontology (i.e., WordNet, Yago, Wikipedia ...). We also describe the recommender, which analyzes the created WSDL based on the functional and non-functional requirements and then recommends it to Web service providers to increase their retrieval probability in related queries.
Netzer, Yael; Gabay, David; Adler, Meni; Goldberg, Yoav & Elhadad, Michael Ontology evaluation through text classification APWeb/WAIM 2009 International Workshops: WCMT 2009, RTBI 2009, DBIR-ENQOIR 2009, PAIS 2009, April 2, 2009 - April 4, 2009 Suzhou, China 2009 [180]
We present a new method to evaluate a search ontology, which relies on mapping ontology instances to textual documents. On the basis of this mapping, we evaluate the adequacy of ontology relations by measuring their classification potential over the textual documents. This data-driven method provides concrete feedback to ontology maintainers and a quantitative estimation of the functional adequacy of the ontology relations towards search experience improvement. We specifically evaluate whether an ontology relation can help a semantic search engine support exploratory search. We test this ontology evaluation method on an ontology in the Movies domain, which has been acquired semi-automatically from the integration of multiple semi-structured and textual data sources (e.g., IMDb and Wikipedia). We automatically construct a domain corpus from a set of movie instances by crawling the Web for movie reviews (both professional and user reviews). The 1-1 relation between textual documents (reviews) and movie instances in the ontology enables us to translate ontology relations into text classes. We verify that the text classifiers induced by key ontology relations (genre, keywords, actors) achieve high performance and exploit the properties of the learned text classifiers to provide concrete feedback on the ontology. The proposed ontology evaluation method is general and relies on the possibility to automatically align textual documents to ontology instances. 2009 Springer Berlin Heidelberg.
Newman, David; Noh, Youn; Talley, Edmund; Karimi, Sarvnaz & Baldwin, Timothy Evaluating topic models for digital libraries 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [181]
Topic models could have a huge impact on improving the ways users find and discover content in digital libraries and search interfaces, through their ability to automatically learn and apply subject tags to each and every item in a collection, and their ability to dynamically create virtual collections on the fly. However, much remains to be done to tap this potential and empirically evaluate the true value of a given topic model to humans. In this work, we sketch out some sub-tasks that we suggest pave the way towards this goal, and present methods for assessing the coherence and interpretability of topics learned by topic models. Our large-scale user study includes over 70 human subjects evaluating and scoring almost 500 topics learned from collections from a wide range of genres and domains. We show how a scoring model, based on pointwise mutual information of word pairs using Wikipedia, Google and MEDLINE as external data sources, performs well at predicting human scores. This automated scoring of topics is an important first step to integrating topic modeling into digital libraries.
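The scoring model described here reduces to averaging pointwise mutual information over a topic's top word pairs, with document-frequency counts taken from an external corpus such as Wikipedia. A minimal sketch, with the count tables assumed to be precomputed:

```python
import math
from itertools import combinations

def topic_pmi(top_words, doc_freq, co_doc_freq, n_docs, eps=1.0):
    """Mean PMI over all pairs of a topic's top words.

    doc_freq[w] and co_doc_freq[(w1, w2)] are document-frequency counts from an
    external corpus (e.g. Wikipedia); `eps` smooths unseen pairs."""
    pairs = list(combinations(top_words, 2))
    total = 0.0
    for w1, w2 in pairs:
        p1 = doc_freq.get(w1, 0) / n_docs
        p2 = doc_freq.get(w2, 0) / n_docs
        p12 = (co_doc_freq.get((w1, w2), 0) + eps) / n_docs
        if p1 > 0 and p2 > 0:
            total += math.log(p12 / (p1 * p2))
    return total / len(pairs)
```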
Nguyen, Chau Q. & Phan, Tuoi T. Key phrase extraction: A hybrid assignment and extraction approach 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [182]
Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that combines assignment and extraction approaches. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrase recognition phase as well as part-of-speech (POS) tagging. We then propose a method that exploits specific characteristics of the Vietnamese language and uses the Vietnamese Wikipedia as an ontology for key phrase ambiguity resolution. Finally, we show the results of several experiments that examine the impact of the strategies chosen for Vietnamese key phrase extraction.
Nguyen, Dong; Overwijk, Arnold; Hauff, Claudia; Trieschnigg, Dolf R. B.; Hiemstra, Djoerd & Jong, Franciska De WikiTranslate: Query translation for cross-lingual information retrieval using only wikipedia 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [183]
This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching with topics formulated in Dutch, French and Spanish in an English data collection. The system achieved a performance of 67% compared to the monolingual baseline. 2009 Springer Berlin Heidelberg.
Nguyen, Hien T. & Cao, Tru H. Exploring wikipedia and text features for named entity disambiguation 2010 Asian Conference on Intelligent Information and Database Systems, ACIIDS 2010, March 24, 2010 - March 26, 2010 Hue City, Viet nam 2010 [184]
Precisely identifying entities is essential for semantic annotation. This paper addresses the problem of named entity disambiguation that aims at mapping entity mentions in a text onto the right entities in Wikipedia. The aim of this paper is to explore and evaluate various combinations of features extracted from Wikipedia and texts for the disambiguation task, based on a statistical ranking model of candidate entities. Through experiments, we show which combinations of features are the best choices for disambiguation. 2010 Springer-Verlag Berlin Heidelberg.
Nguyen, Hien T. & Cao, Tru H. Named entity disambiguation on an ontology enriched by Wikipedia RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, July 13, 2008 - July 17, 2008 Ho Chi Minh City, Viet nam 2008 [185]
Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
Nguyen, Thanh C.; Le, Hai M. & Phan, Tuoi T. Building knowledge base for Vietnamese information retrieval 11th International Conference on Information Integration and Web-based Applications and Services, iiWAS2009, December 14, 2009 - December 16, 2009 Kuala Lumpur, Malaysia 2009 [186]
At present, a Vietnamese knowledge base (VnKB) is one of the most important focuses of Vietnamese researchers because of its applications in areas such as Information Retrieval (IR) and Machine Translation (MT). There have been several separate projects developing a VnKB in various domains. Training the VnKB is the most difficult part because of the quantity and quality of training data, and the lack of an available Vietnamese corpus of acceptable quality. This paper introduces an approach which first extracts semantic information from the Vietnamese Wikipedia (vnWK) and then trains the proposed VnKB by applying a support vector machine (SVM) technique. Experiments show that the proposed approach is a promising solution, with good results, and that it can provide further benefits when applied to our Vietnamese Semantic Information Retrieval system.
Ochoa, Xavier & Duval, Erik Measuring learning object reuse 3rd European Conference on Technology Enhanced Learning, EC-TEL 2008, September 16, 2008 - September 19, 2008 Maastricht, Netherlands 2008 [187]
This paper presents a quantitative analysis of the reuse of learning objects in real world settings. The data for this analysis was obtained from three sources: Connexions' modules, University courses and Presentation components. They represent the reuse of learning objects at different granularity levels. Data from other types of reusable components, such as software libraries, Wikipedia images and Web APIs, were used for comparison purposes. Finally, the paper discusses the implications of the findings in the field of Learning Object research. 2008 Springer-Verlag Berlin Heidelberg.
Oh, Jong-Hoon; Kawahara, Daisuke; Uchimoto, Kiyotaka; Kazama, Jun'ichi & Torisawa, Kentaro Enriching multilingual language resources by discovering missing cross-language links in Wikipedia 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [188]
We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We collect candidates for missing cross-language links, i.e. pairs of English and Japanese Wikipedia articles which could be connected by cross-language links, and then select the correct cross-language links among the candidates by using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, it can discover cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, without relying on any external knowledge, we generate the features based on resources automatically obtained from Wikipedia. In this work, we discover approximately 10^5 missing cross-language links in Wikipedia, which are almost two-thirds as many as the existing cross-language links in Wikipedia.
Ohmori, Kenji & Kunii, Tosiyasu L. The mathematical structure of cyberworlds 2007 International Conference on Cyberworlds, CW'07, October 24, 2007 - October 27, 2007 Hannover, Germany 2007 [189]
The mathematical structure of cyberworlds is clarified based on the duality of the homology lifting property and the homotopy extension property. The duality gives bottom-up and top-down methods to model, design and analyze the structure of cyberworlds. The set of homepages representing a cyberworld is transformed into a finite state machine. In the development of the cyberworld, a sequence of finite state machines is obtained. This sequence has a homotopic property, which is clarified by mapping a finite state machine to a simplicial complex. Wikipedia, bottom-up network construction and top-down network analysis are described as examples.
Okoli, Chitu A brief review of studies of Wikipedia in peer-reviewed journals 3rd International Conference on Digital Society, ICDS 2009, February 1, 2009 - February 7, 2009 Cancun, Mexico 2009 [190]
Since its establishment in 2001, Wikipedia, "the free encyclopedia that anyone can edit," has become a cultural icon of the unlimited possibilities of the World Wide Web. Thus it has become a serious subject of scholarly study to objectively and rigorously understand it as a phenomenon. This paper reviews studies of Wikipedia that have been published in peer-reviewed journals. Among the wealth of studies reviewed, major sub-streams of research covered include: how and why Wikipedia works; assessments of the reliability of its content; using it as a data source for various studies; and applications of Wikipedia in different domains of endeavour.
Okoli, Chitu Information product creation through open source encyclopedias ICC2009 - International Conference of Computing in Engineering, Science and Information, April 2, 2009 - April 4, 2009 Fullerton, CA, United states 2009 [191]
The same open source philosophy that has been traditionally applied to software development can be applied to the collaborative creation of non-software information products, such as encyclopedias, books, and dictionaries. Most notably, the eight-year-old Wikipedia is a comprehensive general encyclopedia, comprising over 12 million articles in over 200 languages. It becomes increasingly important to rigorously investigate the workings of the open source process to understand its benefits and motivations. This paper presents a research program funded by the Social Sciences and Humanities Research Council of Canada with the following objectives: (1) Survey open source encyclopedia participants to understand their motivations for participating and their demographic characteristics, and compare them with participants in traditional open source software projects; (2) investigate the process of open source encyclopedia development in a live community to understand how their motivations interact in the open source framework to create quality information products.
Okoli, Chitu & Schabram, Kira Protocol for a systematic literature review of research on the Wikipedia 1st ACM International Conference on Management of Emergent Digital EcoSystems, MEDES '09, October 27, 2009 - October 30, 2009 Lyon, France 2009 [192]
Context: Wikipedia has become one of the ten most-visited sites on the Web, and the world's leading source of Web reference information. Its rapid success has attracted over 1,000 scholarly studies that treat Wikipedia as a major topic or data source. Objectives: This article presents a protocol for conducting a systematic mapping (a broad-based literature review) of research on Wikipedia. It identifies what research has been conducted; what research questions have been asked, which have been answered; and what theories and methodologies have been employed to study Wikipedia. Methods: This protocol follows the rigorous methodology of evidence-based software engineering to conduct a systematic mapping study. Results and conclusions: This protocol reports a study in progress.
Okuoka, Tomoki; Takahashi, Tomokazu; Deguchi, Daisuke; Ide, Ichiro & Murase, Hiroshi Labeling news topic threads with wikipedia entries 11th IEEE International Symposium on Multimedia, ISM 2009, December 14, 2009 - December 16, 2009 San Diego, CA, United states 2009 [193]
Wikipedia is a famous online encyclopedia. However, most Wikipedia entries are explained mainly by text, so it would be very informative to enhance the contents with multimedia information such as videos. Thus we are working on a method to extend the information of Wikipedia entries by means of broadcast videos which explain the entries. In this work, we focus especially on news videos and Wikipedia entries about news events. In order to extend the information of Wikipedia entries, it is necessary to link news videos and Wikipedia entries, so the main issue is a method that labels news videos with Wikipedia entries automatically. In this way, more detailed explanations with news videos can be exhibited, and the context of the news events should become easier to understand. In our experiments, news videos were labeled with Wikipedia entries with a precision of 86% and a recall of 79%.
Olleros, F. Xavier Learning to trust the crowd: Some lessons from Wikipedia 2008 International MCETECH Conference on e-Technologies, MCETECH 2008, January 23, 2008 - January 25, 2008 Montreal, QC, Canada 2008 [194]
Inspired by the open source software (OSS) movement, Wikipedia has gone further than any OSS project in decentralizing its quality control task. This is seen by many as a fatal flaw. In this short paper, I will try to show that it is rather a shrewd and fertile design choice. First, I will describe the precise way in which Wikipedia is more decentralized than OSS projects. Secondly, I will explain why Wikipedia's quality control can be and must be decentralized. Thirdly, I will show why it is wise for Wikipedia to welcome anonymous amateurs. Finally, I will argue that concerns about Wikipedia's quality and sustainable success have to be tempered by the fact that, as disruptive innovations tend to do, Wikipedia is in the process of redefining the pertinent dimensions of quality and value for general encyclopedias.
Ortega, Felipe; Gonzalez-Barahona, Jesus M. & Robles, Gregorio The top-ten wikipedias : A quantitative analysis using wikixray 2nd International Conference on Software and Data Technologies, ICSOFT 2007, July 22, 2007 - July 25, 2007 Barcelona, Spain 2007
In a few years, Wikipedia has become one of the information systems with the largest public (both producers and consumers) on the Internet. Its system and information architecture is relatively simple, but has proven to be capable of supporting the largest and most diverse community of collaborative authorship worldwide. In this paper, we analyze this community in detail, along with the contents it is producing. Using a quantitative methodology based on the analysis of the public Wikipedia databases, we describe the main characteristics of the 10 largest language editions, and the authors that work in them. The methodology (which is almost completely automated) is generic enough to be used on the rest of the editions, providing a convenient framework to develop a complete quantitative analysis of the Wikipedia. Among other parameters, we study the evolution of the number of contributions and articles, their size, and the differences in contributions by different authors, inferring some relationships between contribution patterns and content. These relationships reflect (and in part, explain) the evolution of the different language editions so far, as well as their future trends.
Otjacques, Benoit; Cornil, Mael & Feltz, Fernand Visualizing cooperative activities with ellimaps: The case of wikipedia 6th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2009, September 20, 2009 - September 23, 2009 Luxembourg, Luxembourg 2009 [195]
Cooperation has become a key word in the emerging Web 2.0 paradigm. The nature and motivations of the various behaviours related to this type of cooperative activity remain, however, incompletely understood. Information visualization tools can play a crucial role from this perspective in analysing the collected data. This paper presents a prototype for visualizing data about the Wikipedia history with a technique called ellimaps. In this context the recent CGD algorithm is used in order to increase the scalability of the ellimaps approach. 2009 Springer Berlin Heidelberg.
Overell, Simon; Sigurbjornsson, Borkur & Zwol, Roelof Van Classifying tags using open content resources 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [196]
Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object: they describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to better understand the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and Wii.
Ozyurt, I. Burak A large margin approach to anaphora resolution for neuroscience knowledge discovery 22nd International Florida Artificial Intelligence Research Society Conference, FLAIRS-22, March 19, 2009 - March 21, 2009 Sanibel Island, FL, United states 2009
A discriminative large-margin classifier based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method, combining evidence from three methods that use WordNet and Wikipedia, is also introduced and used for semantic features. The support vector machine anaphora resolution classifier with probabilistic outputs achieved almost a four-fold improvement in accuracy over the baseline method. Copyright 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Pablo-Sanchez, Cesar De; Martinez-Fernandez, Jose L.; Gonzalez-Ledesma, Ana; Samy, Doaa; Martinez, Paloma; Moreno-Sandoval, Antonio & Al-Jumaily, Harith Combining wikipedia and newswire texts for question answering in spanish 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [197]
This paper describes the adaptations of the MIRACLE group QA system in order to participate in the Spanish monolingual question answering task at QA@CLEF 2007. A system initially developed for the EFE collection was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from questions and collections. Reusing the EFE subsystem has proven not to be feasible, and questions with answers only in Wikipedia obtained low accuracy. In addition, a co-reference module based on heuristics was introduced for processing topic-related questions. This module achieves good coverage in different situations but is hindered by the moderate accuracy of the base system and the chaining of incorrect answers. 2008 Springer-Verlag Berlin Heidelberg.
Panchal, Jitesh H. & Fathianathan, Mervyn Product realization in the age of mass collaboration 2008 ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC 2008, August 3, 2008 - August 6, 2008 New York City, NY, United states 2009
There has been a recent emergence of communities working together in large numbers to develop new products, services, and systems. Collaboration at such scales, referred to as mass collaboration, has resulted in various robust products including Linux and Wikipedia. Companies are also beginning to utilize the power of mass collaboration to foster innovation at various levels. Business models based on mass collaboration are also emerging. Such an environment of mass collaboration brings about significant opportunities and challenges for designing next generation products. The objectives in this paper are to discuss these recent developments in the context of engineering design and to identify new research challenges. The recent trends in mass collaboration are discussed and the impacts of these trends on product realization processes are presented. Traditional collaborative product realization is distinguished from mass collaborative product realization. Finally, the open research issues for successful implementation of mass collaborative product realization are discussed.
Panciera, Katherine; Priedhorsky, Reid; Erickson, Thomas & Terveen, Loren Lurking? Cyclopaths? A quantitative lifecycle analysis of user behavior in a geowiki 28th Annual CHI Conference on Human Factors in Computing Systems, CHI 2010, April 10, 2010 - April 15, 2010 Atlanta, GA, United states 2010 [198]
Online communities produce rich behavioral datasets, e.g., Usenet news conversations, Wikipedia edits, and Facebook friend networks. Analysis of such datasets yields important insights (like the "long tail" of user participation) and suggests novel design interventions (like targeting users with personalized opportunities and work requests). However, certain key user data typically are unavailable, specifically viewing, pre-registration and non-logged-in activity. The absence of such data makes some questions hard to answer; access to it can strengthen, extend or cast doubt on previous results. We report on an analysis of user behavior in Cyclopath, a geographic wiki and route-finder for bicyclists. With access to viewing and non-logged-in activity data, we were able to: (a) replicate and extend prior work on user lifecycles in Wikipedia, (b) bring to light some pre-registration activity, thus testing for the presence of "educational lurking", and (c) demonstrate the locality of geographic activity and how editing and viewing are geographically correlated.
Pang, Wenbo & Fan, Xiaozhong Inducing gazetteer for Chinese named entity recognition based on local high-frequent strings 2009 2nd International Conference on Future Information Technology and Management Engineering, FITME 2009, December 13, 2009 - December 14, 2009 Sanya, China 2009 [199]
Gazetteers, or entity dictionaries, are important for named entity recognition (NER). Although the dictionaries extracted automatically by previous methods from a corpus, the web or Wikipedia are very large, they also miss some entities, especially domain-specific ones. We present a novel method of automatic entity dictionary induction, which is able to construct a dictionary more specific to the text being processed at a much lower computational cost than previous methods. It extracts the local high-frequency strings in a document as candidate entities, and filters out invalid candidates using accessor variety (AV) as the entity criterion. The experiments show that the obtained dictionary can effectively improve the performance of a high-precision NER baseline.
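Both stages are straightforward to sketch: harvest frequent substrings of the document as candidates, then keep those whose accessor variety (the number of distinct characters seen immediately to the left and right) is high enough. The thresholds below are illustrative, not the paper's settings.

```python
from collections import Counter

def candidate_strings(text, min_len=2, max_len=6, min_freq=3):
    """Local high-frequency substrings of the document (character n-grams)."""
    counts = Counter(text[i:i + n]
                     for n in range(min_len, max_len + 1)
                     for i in range(len(text) - n + 1))
    return [s for s, c in counts.items() if c >= min_freq]

def accessor_variety(text, s):
    """Number of distinct characters seen immediately left / right of s."""
    left, right, start = set(), set(), 0
    while True:
        i = text.find(s, start)
        if i == -1:
            break
        if i > 0:
            left.add(text[i - 1])
        if i + len(s) < len(text):
            right.add(text[i + len(s)])
        start = i + 1
    return min(len(left), len(right))

def induce_gazetteer(text, min_av=3):
    """Keep candidate strings whose accessor variety meets the threshold."""
    return [s for s in candidate_strings(text) if accessor_variety(text, s) >= min_av]
```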
Paolucci, Alessio Research summary: Intelligent Natural language processing techniques and tools 25th International Conference on Logic Programming, ICLP 2009, July 14, 2009 - July 17, 2009 Pasadena, CA, United states 2009 [200]
My research path started with my master's thesis (supervisor Prof. Stefania Costantini) about a neurobiologically-inspired proposal in the field of natural language processing. In more detail, we proposed the "Semantic Enhanced DCGs" (for short, SE-DCGs) extension to the well-known DCGs to allow for parallel syntactic and semantic analysis and to generate semantically-based descriptions of the sentence at hand. The analysis carried out through SE-DCGs was called "syntactic-semantic fully informed analysis" and was designed to be as close as possible (at least in principle) to the results in the context of neuroscience that I had revised and studied. As a proof of concept I implemented a prototype semantic search engine, the Mnemosine system. Mnemosine is able to interact with a user in natural language and to provide contextual answers at different levels of detail. Mnemosine has been applied to a practical case study, i.e. to Wikipedia Web pages. A brief overview of this work was presented during CICL 08 [1]. 2009 Springer Berlin Heidelberg.
Pedersen, Claus Vesterager Who are the oracles - Is Web 2.0 the fulfilment of our dreams?: Host lecture at the EUSIDIC Annual Conference 11-13 March 2007 at Roskilde University Information Services and Use 2007
Powerful web services will enable integration with Amazon, Library Thing, Google and others, and will make it feasible to construct new applications in very few days rather than the usual months or even years. The fundamental objective for modern university libraries is to create interfaces with the global knowledge system, tailor-made to the individual profile and needs of each university, department, researcher, and student. University libraries must support and use collaborative working and learning spaces and must be able to filter information and make it context relevant and reliant. Wikipedia is a good example of collaborative work between non-professional, non-specialist, non-scientific volunteers with a fine result. Filtering information and making it context relevant and reliant are of very high importance, not only to the students and their education processes but also in connection with science and the scientific processes at the university.
Pei, Minghua; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Constructing a global ontology by concept mapping using Wikipedia thesaurus 22nd International Conference on Advanced Information Networking and Applications Workshops/Symposia, AINA 2008, March 25, 2008 - March 28, 2008 Gino-wan, Okinawa, Japan 2008 [201]
Recently, the importance of semantics on the WWW has been widely recognized and a lot of semantic information (RDF, OWL, etc.) is being built and published on the WWW. However, the lack of ontology mappings becomes a serious problem for the Semantic Web, since it needs well-defined relations to retrieve information correctly by inferring the meaning of information. One-to-one mapping is not an efficient method due to the nature of the distributed environment. Therefore, a reasonable approach is to map concepts by using a large-scale intermediate ontology. On the other hand, Wikipedia is a large-scale concept network covering almost all concepts in the real world. In this paper, we propose an intermediate ontology construction method using Wikipedia Thesaurus, an association thesaurus extracted from Wikipedia. Since Wikipedia Thesaurus provides associated concepts without explicit relation types, we propose an approach to concept mapping using two sub-methods: "name mapping" and "logic-based mapping".
Pereira, Francisco; Alves, Ana; Oliveirinha, João & Biderman, Assaf Perspectives on semantics of the place from online resources ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [202]
We present a methodology for extraction of semantic indexes related to a given geo-referenced place. These lists of words correspond to the concepts that should be semantically related to that place, according to a number of perspectives. Each perspective is provided by a different online resource, namely upcoming.org, Flickr, Wikipedia or open web search (using the Yahoo! search engine). We describe the process by which those lists are obtained, present experimental results and discuss the strengths and weaknesses of the methodology and of each perspective.
Pilato, Giovanni; Augello, Agnese; Scriminaci, Mario; Vassallo, Giorgio & Gaglio, Salvatore Sub-symbolic mapping of Cyc microtheories in data-driven "conceptual" spaces 11th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2007, and 17th Italian Workshop on Neural Networks, WIRN 2007, September 12, 2007 - September 14, 2007 Vietri sul Mare, Italy 2007
The presented work aims to combine statistical and cognitive-oriented approaches with symbolic ones so that a conceptual similarity relationship layer can be added to a Cyc KB microtheory. Given a specific microtheory, an LSA-inspired conceptual space is inferred from a corpus of texts created using both ad hoc extracted pages from the Wikipedia repository and the built-in comments about the concepts of the specific Cyc microtheory. Each concept is projected into the conceptual space and the desired layer of sub-symbolic relationships between concepts is created. This procedure can help a user in finding the concepts that are "sub-symbolically conceptually related" to a new concept that he wants to insert in the microtheory. Experimental results involving two Cyc microtheories are also reported. Springer-Verlag Berlin Heidelberg 2007.
Pinkwart, Niels Applying Web 2.0 design principles in the design of cooperative applications 5th International Conference on Cooperative Design, Visualization, and Engineering, CDVE 2008, September 22, 2008 - September 25, 2008 Calvia, Mallorca, Spain 2008 [203]
Web} 2.0" is a term frequently mentioned in media - apparently applications such as Wikipedia Social Network Services Online Shops with integrated recommender systems or Sharing Services like flickr all of which rely on user's activities contributions and interactions as a central factor are fascinating for the general public. This leads to a success of these systems that seemingly exceeds the impact of most "traditional" groupware applications that have emerged from CSCW} research. This paper discusses differences and similarities between novel Web 2.0 tools and more traditional CSCW} application in terms of technologies system design and success factors. Based on this analysis the design of the cooperative learning application LARGO} is presented to illustrate how Web 2.0 success factors can be considered for the design of cooperative environments. 2008 Springer-Verlag} Berlin Heidelberg."
Pirrone, Roberto; Pipitone, Arianna & Russo, Giuseppe Semantic sense extraction from Wikipedia pages 3rd International Conference on Human System Interaction, HSI'2010, May 13, 2010 - May 15, 2010 Rzeszow, Poland 2010 [204]
This paper discusses a modality to access and to organize unstructured contents related to a particular topic coming from the access to Wikipedia pages. The proposed approach is focused on the acquisition of new knowledge from Wikipedia pages and is based on the definition of useful patterns able to extract and identify novel concepts and relations to be added to the knowledge base. We propose a method that uses information from the wiki page's structure. According to the different parts of the page, we define different strategies to obtain new concepts or relations between them. We analyze not only the structure but also the text directly to obtain relations and concepts and to extract the type of relations to be incorporated in a domain ontology. The purpose is to use the obtained information in an intelligent tutoring system to improve its capabilities in dialogue management with users.
Popescu, Adrian; Borgne, Herve Le & Moellic, Pierre-Alain Conceptual image retrieval over a large scale database 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [205]
Image retrieval in large-scale databases is currently based on a textual chain matching procedure. However, this approach requires an accurate annotation of images, which is not the case on the Web. To tackle this issue, we propose a reformulation method that reduces the influence of noisy image annotations. We extract a ranked list of related concepts for terms in the query from WordNet and Wikipedia, and use them to expand the initial query. Then some visual concepts are used to re-rank the results for queries containing, explicitly or implicitly, visual cues. First evaluations on a diversified corpus of 150,000 images were convincing since the proposed system was ranked 4th and 2nd at the WikipediaMM task of the ImageCLEF 2008 campaign [1]. 2009 Springer Berlin Heidelberg.
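The query reformulation step described in this abstract lends itself to a small illustration. The sketch below (plain Python, with hypothetical related-concept lists and an arbitrary expansion weight; it is not the authors' WordNet/Wikipedia extraction pipeline) shows how a ranked list of related concepts might be appended to a query at reduced weight:

```python
# Minimal sketch of query expansion with ranked related concepts.
# The related-concept lists and weights below are illustrative placeholders,
# not the concept extraction actually used by the authors.

def expand_query(query_terms, related_concepts, max_expansions=3, weight=0.4):
    """Return (term, weight) pairs: original terms at weight 1.0,
    plus a few related concepts at a reduced weight."""
    expanded = [(term, 1.0) for term in query_terms]
    for term in query_terms:
        for concept in related_concepts.get(term, [])[:max_expansions]:
            expanded.append((concept, weight))
    return expanded

# Hypothetical ranked lists of related concepts (e.g. mined from WordNet/Wikipedia).
related = {
    "eiffel": ["tower", "paris", "landmark"],
    "tower": ["building", "structure"],
}

print(expand_query(["eiffel", "tower"], related))
```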
Popescu, Adrian; Grefenstette, Gregory & Moellic, Pierre-Alain Gazetiki: Automatic creation of a geographical gazetteer 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08, June 16, 2008 - June 20, 2008 Pittsburgh, PA, United states 2008 [206]
Geolocalized databases are becoming necessary in a wide variety of application domains. Thus far, the creation of such databases has been a costly, manual process. This drawback has stimulated interest in automating their construction, for example, by mining geographical information from the Web. Here we present and evaluate a new automated technique for creating and enriching a geographical gazetteer, called Gazetiki. Our technique merges disparate information from Wikipedia, Panoramio, and web search engines in order to identify geographical names, categorize these names, find their geographical coordinates and rank them. The information produced in Gazetiki enhances and complements the Geonames database, using a similar domain model. We show that our method provides a richer structure and an improved coverage compared to another known attempt at automatically building a geographic database and, where possible, we compare our Gazetiki to Geonames.
Prasarnphanich, Pattarawan & Wagner, Christian Creating critical mass in collaboration systems: Insights from wikipedia 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, IEEE-DEST 2008, February 26, 2008 - February 29, 2008 Phitsanulok, Thailand 2008 [207]
Digital ecosystems that rely on peer production, where users are consumers as well as producers of information and knowledge, are becoming increasingly popular and viable. Supported by Web 2.0 technologies such as wikis, these systems have the potential to replace existing knowledge management systems, which generally rely on a small group of experts. The fundamental question for all such systems is under which conditions the collective acts of knowledge contribution start and become self-sustaining. Our article addresses this question, using Wikipedia as an exemplary system. Through a collective action framework, we apply critical mass theory to explain the emergence and sustainability of the peer production approach.
Prato, Andrea & Ronchetti, Marco Using Wikipedia as a reference for extracting semantic information from a text 3rd International Conference on Advances in Semantic Processing - SEMAPRO 2009, October 11, 2009 - October 16, 2009 Sliema, Malta 2009 [208]
In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that the output contains far fewer false positives. We were also able to understand which (structural) part of the texts provides most of the semantic information extracted by the algorithm.
Preminger, Michael; Nordlie, Ragnar & Pharo, Nils OUC's participation in the 2009 INEX book track 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [209]
In this article we describe the Oslo University College's participation in the INEX 2009 Book track. This year's tasks featured complex topics containing aspects, which lend themselves to use in both the book retrieval and the focused retrieval tasks. The OUC has submitted retrieval results for both tasks, focusing on using the Wikipedia texts for query expansion, as well as utilizing chapter division information in (a number of) the books. 2010 Springer-Verlag Berlin Heidelberg.
Priedhorsky, Reid; Chen, Jilin; Lam, Shyong K.; Panciera, Katherine; Terveen, Loren & Riedl, John Creating, destroying, and restoring value in wikipedia 2007 International ACM Conference on Supporting Group Work, GROUP'07, November 4, 2007 - November 7, 2007 Sanibel Island, FL, United states 2007 [210]
Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage. Finally, we make policy recommendations for Wikipedia and other wikis in light of these findings.
Pu, Qiang; He, Daqing & Li, Qi Query expansion for effective geographic information retrieval 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [211]
We developed two methods for the monolingual GeoCLEF 2008 task. The GCEC method aims to test the effectiveness of our online geographic coordinates extraction and clustering algorithm, and the WIKIGEO method examines the usefulness of using the geographic coordinate information in Wikipedia for identifying geo-locations. We proposed a measure of topic distance to evaluate these two methods. The experimental results show that: 1) our online geographic coordinates extraction and clustering algorithm is useful for the type of locations that do not have clear corresponding coordinates; 2) the expansion based on the geo-locations generated by GCEC is effective in improving geographic retrieval; 3) Wikipedia can help in finding the coordinates for many geo-locations, but its usage for query expansion still needs further study; 4) query expansion based on the title only obtained better results than that on the title and narrative parts, even though the latter contains more related geographic information. Further study is needed for this part. 2009 Springer Berlin Heidelberg.
Puttaswamy, Krishna P.N.; Marshall, Catherine C.; Ramasubramanian, Venugopalan; Stuedi, Patrick; Terry, Douglas B. & Wobber, Ted Docx2Go: Collaborative editing of fidelity reduced documents on mobile devices 8th Annual International Conference on Mobile Systems, Applications and Services, MobiSys 2010, June 15, 2010 - June 18, 2010 San Francisco, CA, United states 2010 [212]
Docx2Go is a new framework to support editing of shared documents on mobile devices. Three high-level requirements influenced its design - namely, the need to adapt content, especially textual content, on the fly according to the quality of the network connection and the form factor of each device; support for concurrent, uncoordinated editing on different devices, whose effects will later be merged on all devices in a convergent and consistent manner without sacrificing the semantics of the edits; and a flexible replication architecture that accommodates both device-to-device and cloud-mediated synchronization. Docx2Go supports on-the-go editing for XML documents, such as documents in Microsoft Word and other commonly used formats. It combines the best practices from content adaptation systems, weakly consistent replication systems, and collaborative editing systems, while extending the state of the art in each of these fields. The implementation of Docx2Go has been evaluated based on a workload drawn from Wikipedia.
Qiu, Qiang; Zhang, Yang; Zhu, Junping & Qu, Wei Building a text classifier by a keyword and Wikipedia knowledge 5th International Conference on Advanced Data Mining and Applications, ADMA 2009, August 17, 2009 - August 19, 2009 Beijing, China 2009 [213]
Traditional approaches for building text classifiers usually require a lot of labeled documents, which are expensive to obtain. In this paper, we propose a new text classification approach based on a keyword and Wikipedia knowledge, so as to avoid labeling documents manually. Firstly, we retrieve a set of related documents about the keyword from Wikipedia. Then, with the help of related Wikipedia pages, more positive documents are extracted from the unlabeled documents. Finally, we train a text classifier with these positive documents and unlabeled documents. The experimental results on the 20 Newsgroups dataset show that the proposed approach performs very competitively compared with NB-SVM, a PU learner, and NB, a supervised learner. 2009 Springer.
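A minimal sketch of the general keyword-plus-Wikipedia idea (assuming scikit-learn; the similarity threshold and the simple treatment of low-similarity documents as negatives are assumptions, not the paper's PU-learning procedure):

```python
# Sketch: keyword-seeded text classification without manual labels.
# seed_docs would be Wikipedia pages retrieved for the keyword; the
# low-similarity-as-negative shortcut is an illustrative simplification.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB

def build_classifier(seed_docs, unlabeled_docs, sim_threshold=0.2):
    # 1. Find unlabeled documents similar to the Wikipedia seed documents.
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(seed_docs + unlabeled_docs)
    seed_m, unl_m = matrix[:len(seed_docs)], matrix[len(seed_docs):]
    sims = cosine_similarity(unl_m, seed_m).max(axis=1)
    positives = [d for d, s in zip(unlabeled_docs, sims) if s >= sim_threshold]
    negatives = [d for d, s in zip(unlabeled_docs, sims) if s < sim_threshold]

    # 2. Train a simple Naive Bayes classifier on the derived labels.
    counts = CountVectorizer(stop_words="english")
    X = counts.fit_transform(positives + negatives)
    y = np.array([1] * len(positives) + [0] * len(negatives))
    clf = MultinomialNB().fit(X, y)
    return clf, counts
```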
Ramanathan, Madhu; Rajagopal, Srikant; Karthik, Venkatesh; Murugeshan, Meenakshi Sundaram & Mukherjee, Saswati A recursive approach to entity ranking and list completion using entity determining terms, qualifiers and prominent n-grams 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [214]
This paper presents our approach for the INEX 2009 Entity Ranking track, which consists of two subtasks, viz. Entity Ranking and List Completion. Retrieving the correct entities according to the user query is a three-step process: extracting the required information from the query and the provided categories, extracting the relevant documents which may be either prospective entities or intermediate pointers to prospective entities by making use of the structure available in the Wikipedia corpus, and finally ranking the resultant set of documents. We have extracted the Entity Determining Terms (EDTs), Qualifiers and prominent n-grams from the query, strategically exploited the relation between the extracted terms and the structure and connectedness of the corpus to retrieve links which are highly probable of being entities, and then used a recursive mechanism for retrieving relevant documents through the Lucene search. Our ranking mechanism combines various approaches that make use of category information, links, titles and WordNet information, the initial description and the text of the document. 2010 Springer-Verlag Berlin Heidelberg.
Ramezani, Maryam & Witschel, Hans Friedrich An intelligent system for semi-automatic evolution of ontologies 2010 IEEE International Conference on Intelligent Systems, IS 2010, July 7, 2010 - July 9, 2010 London, United kingdom 2010 [215]
Ontologies are an important part of the Semantic Web as well as of many intelligent systems. However, the traditional expert-driven development of ontologies is time-consuming and often results in incomplete and inappropriate ontologies. In addition, since ontology evolution is not controlled by end users, it may take too long for a conceptual change in the domain to be reflected in the ontology. In this paper, we present a recommendation algorithm in a Web 2.0 platform that supports end users to collaboratively evolve ontologies by suggesting semantic relations between new and existing concepts. We use the Wikipedia category hierarchy to evaluate our algorithm and our experimental results show that the proposed algorithm produces high quality recommendations.
Ramirez, Alex; Ji, Shaobo; Riordan, Rob; Ulbrich, Frank & Hine, Michael J. Empowering business students: Using Web 2.0 tools in the classroom 2nd International Conference on Computer Supported Education, CSEDU 2010, April 7, 2010 - April 10, 2010 Valencia, Spain 2010
This paper discusses the design of a course to empower business students using Web 2.0 technologies. We explore the learning phenomenon as a way to bring forward a process of continuous improvement supported by social software. We develop a framework to assess the infrastructure against expectations of skill proficiency using Web 2.0 tools which must emerge as a result of registering in an introductory business information and communication technologies (ICT) course in a business school of a Canadian university. We use Friedman's (2007) thesis that the world is "flat" to discuss issues of globalization and the role of ICT. Students registered in the course are familiar with some of the tools we introduce and use in the course. The students are members of Facebook or MySpace, regularly check YouTube, and use Wikipedia in their studies. They use these tools to socialize. We broaden the students' horizons, explore the potential business benefits of such tools, and empower the students to use Web 2.0 technologies within a business context.
Rao, Weixiong; Fu, Ada Wai-Chee; Chen, Lei & Chen, Hanhua Stairs: Towards efficient full-text filtering and dissemination in a DHT environment 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China 2009 [216]
Nowadays, contents on the Internet like weblogs, Wikipedia and news sites become "live". How to notify and provide users with the relevant contents becomes a challenge. Unlike conventional Web search technology or the RSS feed, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT). Users can subscribe to their interested contents by specifying some terms and threshold values for filtering. Then published contents will be disseminated to the associated subscribers. We propose a novel and simple framework of filter registration and content publication, STAIRS. With the new framework, we propose three algorithms (default forwarding, dynamic forwarding and adaptive forwarding) to reduce the forwarding cost and false dismissal rate; meanwhile, the subscriber can receive the desired contents with no duplicates. In particular, the adaptive forwarding utilizes the filter information to significantly reduce the forwarding cost. Experiments based on two real query logs and two real datasets show the effectiveness of our proposed framework.
Ray, Santosh Kumar; Singh, Shailendra & Joshi, B.P. World wide web based question answering system - A relevance feedback framework for automatic answer validation 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom 2009 [217]
An open domain question answering system is one of the emerging information retrieval systems available on the World Wide Web that is becoming popular day by day to get succinct and relevant answers in response to users' questions. The validation of the correctness of the answer is an important issue in the field of question answering. In this paper, we are proposing a World Wide Web based solution for answer validation where answers returned by open domain Question Answering Systems can be validated using online resources such as Wikipedia and Google. We have applied several heuristics for the answer validation task and tested them against some popular World Wide Web based open domain Question Answering Systems over a collection of 500 questions collected from standard sources such as TREC, the Worldbook, and the Worldfactbook. We found that the proposed method yields promising results for the automatic answer validation task.
Razmara, Majid & Kosseim, Leila A little known fact is... Answering other questions using interest-markers 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007, February 18, 2007 - February 24, 2007 Mexico City, Mexico 2007
In this paper, we present an approach to answering "Other" questions using the notion of interest marking terms. "Other" questions have been introduced in the TREC-QA track to retrieve other interesting facts about a topic. To answer these types of questions, our system extracts from Wikipedia articles a list of interest-marking terms related to the topic and uses them to extract and score sentences from the document collection where the answer should be found. Sentences are then re-ranked using universal interest-markers that are not specific to the topic. The top sentences are then returned as possible answers. When using the 2004 TREC data for development and the 2005 data for testing, the approach achieved an F-score of 0.265, placing it among the top systems. Springer-Verlag Berlin Heidelberg 2007.
Reinoso, Antonio J.; Gonzalez-Barahona, Jesus M.; Robles, Gregorio & Ortega, Felipe A quantitative approach to the use of the wikipedia IEEE Symposium on Computers and Communications 2009, ISCC 2009, July 5, 2009 - July 8, 2009 Sousse, Tunisia 2009 [218]
This paper presents a quantitative study of the use of the Wikipedia system by its users (both readers and editors), with special focus on the identification of time and kind-of-use patterns, characterization of traffic and workload, and comparative analysis of different language editions. The basis of the study is the filtering and analysis of a large sample of the requests directed to the Wikimedia systems for six weeks, one in each month from November 2007 to April 2008. In particular, we have considered the twenty most frequently visited language editions of the Wikipedia, identifying for each access to any of them the corresponding namespace (sets of resources with uniform semantics), resource name (article names, for example) and action (editions, submissions, history reviews, save operations, etc.). The results found include the identification of weekly and daily patterns, and several correlations between several actions on the articles. In summary, the study shows an overall picture of how the most visited language editions of the Wikipedia are being accessed by their users.
Ren, Reede; Misra, Hemant & Jose, Joemon M. Semantic based adaptive movie summarisation 16th International Multimedia Modeling Conference on Advances in Multimedia Modeling, MMM 2010, October 6, 2010 - October 8, 2010 Chongqing, China 2009 [219]
This paper proposes a framework for automatic video summarization by exploiting internal and external textual descriptions. The web knowledge base Wikipedia is used as a middle media layer, which bridges the gap between general user descriptions and exact film subtitles. Latent Dirichlet Allocation (LDA) detects as well as matches the distribution of content topics in Wikipedia items and movie subtitles. A saliency based summarization system then selects perceptually attractive segments from each content topic for summary composition. The evaluation collection consists of six English movies, and a high topic coverage is shown over official trailers from the Internet Movie Database. 2010 Springer-Verlag Berlin Heidelberg.
Riche, Nathalie Henry; Lee, Bongshin & Chevalier, Fanny IChase: Supporting exploration and awareness of editing activities on Wikipedia International Conference on Advanced Visual Interfaces, AVI '10, May 26, 2010 - May 28, 2010 Rome, Italy 2010 [220]
To increase its credibility and preserve the trust of its readers, Wikipedia needs to ensure a good quality of its articles. To that end, it is critical for Wikipedia administrators to be aware of contributors' editing activity to monitor vandalism, encourage reliable contributors to work on specific articles, or find mentors for new contributors. In this paper, we present IChase, a novel interactive visualization tool to provide administrators with better awareness of editing activities on Wikipedia. Unlike the currently used visualizations that provide only page-centric information, IChase visualizes the trend of activities for two entity types, articles and contributors. IChase is based on two heatmaps (one for each entity type) synchronized to one timeline. It allows users to interactively explore the history of changes by drilling down into specific articles, contributors, or time points to access the details of the changes. We also present a case study to illustrate how IChase can be used to monitor the editing activities of Wikipedia authors, as well as a usability study. We conclude by discussing the strengths and weaknesses of IChase.
Riedl, John Altruism, selfishness, and destructiveness on the social web 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, AH 2008, July 29, 2008 - August 1, 2008 Hannover, Germany 2008 [221]
Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). What is the nature of people's participation in building these repositories? What are their motives? In what ways is their behavior destructive instead of constructive? Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose three related research questions: 1) How does intelligent task routing (matching people with work) affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? 3) How do recommender systems affect the evolution of a shared tagging vocabulary among the contributors? We will explore these questions in the context of existing CALVs, including Wikipedia, Facebook, and MovieLens. 2008 Springer-Verlag Berlin Heidelberg.
Roger, Sandra; Vila, Katia; Ferrandez, Antonio; Pardino, Maria; Gomez, Jose Manuel; Puchol-Blasco, Marcel & Peral, Jesus Using AliQAn in monolingual QA@CLEF 2008 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [222]
This paper describes the participation of the system AliQAn in the CLEF 2008 Spanish monolingual QA task. This time, the main goals of the current version of AliQAn were to deal with topic-related questions and to decrease the number of inexact answers. We have also explored the use of the Wikipedia corpora, which have posed some new challenges for the QA task. 2009 Springer Berlin Heidelberg.
Roth, Benjamin & Klakow, Dietrich Combining wikipedia-based concept models for cross-language retrieval 1st Information Retrieval Facility Conference, IRFC 2010, May 31, 2010 - May 31, 2010 Vienna, Austria 2010 [223]
As a low-cost resource that is up to date, Wikipedia has recently gained attention as a means to provide cross-language bridging for information retrieval. Contrary to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR by simply normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can significantly contribute to retrieval based on machine translation, especially when query translations contain errors. The experiments were performed on the Multext JOC corpus and a CLEF dataset.
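The LDA/ESA combination can be pictured as a simple interpolation of two per-document relevance scores. The sketch below assumes both score lists are already computed (the mixing weight alpha is an arbitrary choice, not a value from the paper):

```python
# Sketch: interpolating LDA-based and ESA-based retrieval scores for a query.
# lda_scores / esa_scores map document ids to similarity scores from each model;
# the mixing weight alpha is an assumption, not taken from the paper.

def combine_scores(lda_scores, esa_scores, alpha=0.5):
    docs = set(lda_scores) | set(esa_scores)
    combined = {
        d: alpha * lda_scores.get(d, 0.0) + (1 - alpha) * esa_scores.get(d, 0.0)
        for d in docs
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

lda_scores = {"doc1": 0.8, "doc2": 0.3, "doc3": 0.1}
esa_scores = {"doc1": 0.4, "doc2": 0.7, "doc4": 0.5}
print(combine_scores(lda_scores, esa_scores))
```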
Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Third International Atlantic Web Intelligence Conference on Advances in Web Intelligence, AWIC 2005, June 6, 2005 - June 9, 2005 Lodz, Poland 2005
We describe an approach taken for automatically associating entries from an on-line encyclopedia with concepts in an ontology or a lexical semantic network. It has been tested with the Simple English Wikipedia and WordNet, although it can be used with other resources. The accuracy in disambiguating the sense of the encyclopedia entries reaches 91.11% (83.89% for polysemous words). It will be applied to enriching ontologies with encyclopedic knowledge. Springer-Verlag Berlin Heidelberg 2005.
Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo Automatic extraction of semantic relationships for wordNet by means of pattern learning from wikipedia 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005: Natural Language Processing and Information Systems, June 15, 2005 - June 17, 2005 Alicante, Spain 2005
This paper describes an automatic approach to identify lexical patterns which represent semantic relationships between concepts, from an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 1200 new relationships that did not appear in WordNet originally. The precision of these relationships ranges between 0.61 and 0.69, depending on the relation. Springer-Verlag Berlin Heidelberg 2005.
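As a toy illustration of the lexical-pattern idea (not the paper's generalisation algorithm), a pattern can be read off from the words between two known related concepts and then matched against new sentences to propose new relation instances:

```python
import re

# Sketch: extracting and applying a lexical pattern between two concepts.
# The example sentences and the simple "X is a kind of Y" pattern are
# illustrative; the paper generalises patterns automatically, not shown here.

def extract_pattern(sentence, concept_a, concept_b):
    """Return the token sequence between two known concepts, or None."""
    m = re.search(re.escape(concept_a) + r"\s+(.*?)\s+" + re.escape(concept_b), sentence)
    return m.group(1) if m else None

def apply_pattern(sentence, pattern):
    """Find a new (X, Y) pair joined by a previously learned pattern."""
    m = re.search(r"(\w+)\s+" + re.escape(pattern) + r"\s+(\w+)", sentence)
    return (m.group(1), m.group(2)) if m else None

pat = extract_pattern("A dog is a kind of animal", "dog", "animal")
print(pat)                                                      # "is a kind of"
print(apply_pattern("A violin is a kind of instrument", pat))   # ("violin", "instrument")
```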
Sabin, Mihaela & Leone, Jim IT education 2.0 10th ACM Special Interest Group for Information Technology Education, SIGITE 2009, October 22, 2009 - October 24, 2009 Fairfax, VA, United states 2009 [224]
Today's networked computing and communications technologies have changed how information, knowledge, and culture are produced and exchanged. People around the world join online communities that are set up voluntarily and use their members' collaborative participation to solve problems, share interests, raise awareness, or simply establish social connections. Two online community examples with significant economic and cultural impact are the open source software movement and Wikipedia. The technological infrastructure of these peer production models uses current Web 2.0 tools, such as wikis, blogs, social networking, semantic tagging, and RSS feeds. With no control exercised by property-based markets or managerial hierarchies, commons-based peer production systems contribute to and serve the public domain and public good. The body of cultural, educational, and scientific work of many online communities is made available to the public for free and legal sharing, use, repurposing, and remixing. Higher education's receptiveness to these transformative trends deserves close examination. In the case of the Information Technology (IT) education community, in particular, we note that the curricular content, research questions, and professional skills the IT discipline encompasses have direct linkages with the Web 2.0 phenomenon. For that reason, IT academic programs should pioneer and lead efforts to cultivate peer production online communities. We state the case that free access and open engagement facilitated by technological infrastructures that support a peer production model benefit IT education. We advocate that these technologies be employed to strengthen IT educational programs, advance IT research, and revitalize the IT education community.
Sacarea, C.; Meza, R. & Cimpoi, M. Improving conceptual search results reorganization using term-concept mappings retrieved from wikipedia 2008 IEEE International Conference on Automation, Quality and Testing, Robotics, AQTR 2008 - THETA 16th Edition, May 22, 2008 - May 25, 2008 Cluj-Napoca, Romania 2008 [225]
This paper describes a way of improving search engine results conceptual reorganization that uses formal concept analysis. This is done by using redirections to solve conceptual redundancies and by adding preliminary disambiguation and expanding the concept lattice with extra navigation nodes based on Wikipedia's ontology and strong conceptual links.
Safarkhani, Banafsheh; Mohsenzadeh, Mehran & Rahmani, Amir Masoud Improving website user model automatically using a comprehensive lexical semantic resource 2009 International Conference on E-Business and Information System Security, EBISS 2009, May 23, 2009 - May 24, 2009 Wuhan, China 2009 [226]
A major component in any web personalization system is its user model. Recently a number of studies have been done to incorporate the semantics of a web site in the representation of its users. All of these efforts use either a specific manually constructed taxonomy or ontology or a general purpose one like WordNet to map page views into semantic elements. However, building a hierarchy of concepts manually is time consuming and expensive. On the other hand, general purpose resources suffer from low coverage of domain specific terms. In this paper we intend to address both these shortcomings. Our contribution is that we introduce a mechanism to automatically improve the representation of the user in the website using a comprehensive lexical semantic resource. We utilize Wikipedia, the largest encyclopedia to date, as a rich lexical resource to enhance the automatic construction of the vector model representation of user interests. We evaluate the effectiveness of the resulting model using concepts extracted from this promising resource.
Safarkhani, Banafsheh; Talabeigi, Mojde; Mohsenzadeh, Mehran & Meybodi, Mohammad Reza Deriving semantic sessions from semantic clusters 2009 International Conference on Information Management and Engineering, ICIME 2009, April 3, 2009 - April 5, 2009 Kuala Lumpur, Malaysia 2009 [227]
An important phase in any web personalization system is transaction identification. Recently a number of studies have been done to incorporate the semantics of a web site in the representation of transactions. Building a hierarchy of concepts manually is time consuming and expensive. In this paper we intend to address these shortcomings. Our contribution is that we introduce a mechanism to automatically improve the representation of the user in the website using a comprehensive lexical semantic resource and semantic clusters. We utilize Wikipedia, the largest encyclopedia to date, as a rich lexical resource to enhance the automatic construction of vector model representations of user sessions. We cluster web pages based on their content with hierarchical unsupervised fuzzy clustering algorithms, which are effective methods for exploring the structure of complex real data where grouping of overlapping and vague elements is necessary. Entries in web server logs are used to identify users and visit sessions, while web pages or resources in the site are clustered based on their content and their semantics. These clusters of web documents are used to scrutinize the discovered web sessions in order to identify what we call sub-sessions. Each sub-session has a consistent goal. This process leads to improved derivation of semantic sessions from web site user page views. Our experiments show that the proposed system significantly improves the quality of the web personalization process.
Saito, Kazumi; Kimura, Masahiro & Motoda, Hiroshi Discovering influential nodes for SIS models in social networks 12th International Conference on Discovery Science, DS 2009, October 3, 2009 - October 5, 2009 Porto, Portugal 2009 [228]
We address the problem of efficiently discovering the influential nodes in a social network under the susceptible/infected/susceptible (SIS) model, a diffusion model where nodes are allowed to be activated multiple times. The computational complexity drastically increases because of this multiple activation property. We solve this problem by constructing a layered graph from the original social network with each layer added on top as the time proceeds, and applying the bond percolation with pruning and burnout strategies. We experimentally demonstrate that the proposed method gives much better solutions than the conventional methods that are solely based on the notion of centrality for social network analysis using two large-scale real-world networks (a blog network and a wikipedia network). We further show that the computational complexity of the proposed method is much smaller than the conventional naive probabilistic simulation method by a theoretical analysis and confirm this by experimentation. The properties of the influential nodes discovered are substantially different from those identified by the centrality-based heuristic methods. 2009 Springer Berlin Heidelberg.
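For orientation, the quantity being optimized can be approximated by a naive Monte-Carlo simulation of SIS-style spread from a seed node; this is essentially the expensive baseline that the paper's layered-graph bond percolation with pruning and burnout is designed to avoid. The edge probability, step count and toy graph below are assumptions:

```python
import random

# Sketch: naive Monte-Carlo estimate of a node's influence under an SIS-like
# diffusion for a fixed number of time steps. Edge probability p, step count
# and the toy graph are assumptions; this is the slow baseline, not the
# paper's layered-graph bond percolation method.

def estimate_influence(graph, seed, p=0.2, steps=5, runs=200):
    total = 0
    for _ in range(runs):
        active = {seed}
        infected_ever = {seed}
        for _ in range(steps):
            nxt = set()
            for u in active:
                for v in graph.get(u, []):
                    if random.random() < p:
                        nxt.add(v)
            active = nxt          # SIS: nodes recover and can be re-infected later
            infected_ever |= nxt
        total += len(infected_ever)
    return total / runs

toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
ranking = sorted(toy_graph, key=lambda n: estimate_influence(toy_graph, n), reverse=True)
print(ranking)
```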
Sallaberry, Arnaud; Zaidi, Faraz; Pich, Christian & Melancon, Guy Interactive visualization and navigation of web search results revealing community structures and bridges 36th Graphics Interface Conference, GI 2010, May 31, 2010 - June 2, 2010 Ottawa, ON, Canada 2010
With the information overload on the Internet, organization and visualization of web search results so as to facilitate faster access to information is a necessity. The classical methods present search results as an ordered list of web pages ranked in terms of relevance to the searched topic. Users thus have to scan text snippets or navigate through various pages before finding the required information. In this paper we present an interactive visualization system for content analysis of web search results. The system combines a number of algorithms to present a novel layout methodology which helps users to analyze and navigate through a collection of web pages. We have tested this system with a number of data sets and have found it very useful for the exploration of data. Different case studies are presented based on searching different topics on Wikipedia through Exalead's search engine.
Santos, Diana & Cardoso, Nuno GikiP: Evaluating geographical answers from wikipedia 5th Workshop on Geographic Information Retrieval, GIR'08, Co-located with the ACM 17th Conference on Information and Knowledge Management, CIKM 2008, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states 2008 [229]
This paper describes GikiP, a pilot task that took place in 2008 in CLEF. We present the motivation behind GikiP and the use of Wikipedia as the evaluation collection, detail the task, and list new ideas for its continuation.
Santos, Diana; Cardoso, Nuno; Carvalho, Paula; Dornescu, Iustin; Hartrumpf, Sven; Leveling, Johannes & Skalban, Yvonne GikiP at geoCLEF 2008: Joining GIR and QA forces for querying wikipedia 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [230]
This paper reports on the GikiP pilot that took place in 2008 in GeoCLEF. This pilot task requires a combination of methods from geographical information retrieval and question answering to answer queries to the Wikipedia. We start with the task description, providing details on topic choice and evaluation measures. Then we offer a brief motivation from several perspectives, and we present results in detail. A comparison of participants' approaches is then presented, and the paper concludes with improvements for the next edition. 2009 Springer Berlin Heidelberg.
Sarrafzadeh, Bahareh & Shamsfard, Mehrnoush Parallel annotation and population: A cross-language experience Proceedings - 2009 International Conference on Computer Engineering and Technology, ICCET 2009 445 Hoes Lane - P.O. Box 1331, Piscataway, NJ 08855-1331, United States 2009 [231]
In recent years automatic Ontology Population (OP) from texts has emerged as a new field of application for knowledge acquisition techniques. In OP, the instances of ontology classes are extracted from text and added under the ontology concepts. On the other hand, semantic annotation, which is a key task in moving toward the semantic web, tries to tag instance data in a text with their corresponding ontology classes; so ontology population usually accompanies the generation of semantic annotations. In this paper we introduce a cross-lingual population/annotation system called POPTA which annotates Persian texts according to an English lexicalized ontology and populates the English ontology according to the input Persian texts. It exploits a hybrid approach, a combination of statistical and pattern-based methods as well as techniques founded on the web and search engines and a novel method of resolving translation ambiguities. POPTA also uses Wikipedia as a vast natural language encyclopedia to extract new instances to populate the input ontology.
Sawaki, M.; Minami, Y.; Higashinaka, R.; Dohsaka, K. & Maeda, E. "Who is this" quiz dialogue system and users' evaluation 2008 IEEE Workshop on Spoken Language Technology, SLT 2008, December 15, 2008 - December 19, 2008 Goa, India 2008 [232]
In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's actions on users. This paper describes the "Who is this" quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed toy (or its CG equivalent). Quizzes are automatically generated from Wikipedia articles rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level.
Scardino, Giuseppe; Infantino, Ignazio & Gaglio, Salvatore Automated object shape modelling by clustering of web images 3rd International Conference on Computer Vision Theory and Applications, VISAPP 2008, January 22, 2008 - January 25, 2008 Funchal, Madeira, Portugal 2008
The paper deals with the description of a framework to create shape models of an object using images from the web. Results obtained from different image search engines using simple keywords are filtered, and it is possible to select images showing a single object with a well-defined contour. In order to have a large set of valid images, the implemented system uses lexical web databases (e.g. WordNet) or free web encyclopedias (e.g. Wikipedia) to get more keywords correlated to the given object. The shapes extracted from selected images are represented by Fourier descriptors and are grouped by the K-means algorithm. Finally, the most representative shapes of the main clusters are considered as prototypical contours of the object. Preliminary experimental results are illustrated to show the effectiveness of the proposed approach.
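The shape-grouping step can be sketched as follows, assuming NumPy and scikit-learn: each closed contour (an array of boundary points) is reduced to the magnitudes of its low-order Fourier coefficients, and the descriptors are clustered with K-means. The number of coefficients and clusters are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: Fourier descriptors of closed contours, clustered with K-means.
# Each contour is an (N, 2) array of boundary points; n_coeffs and n_clusters
# are illustrative choices.

def fourier_descriptor(contour, n_coeffs=16):
    z = contour[:, 0] + 1j * contour[:, 1]          # complex boundary representation
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeffs + 1])
    return mags / (np.abs(coeffs[1]) + 1e-9)        # crude scale normalization

def cluster_shapes(contours, n_clusters=3):
    descriptors = np.array([fourier_descriptor(c) for c in contours])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
    return km.labels_, km.cluster_centers_
```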
Scarpazza, Daniele Paolo & Braudaway, Gordon W. Workload characterization and optimization of high-performance text indexing on the Cell Broadband Engine (Cell/B.E.) 2009 IEEE International Symposium on Workload Characterization, IISWC 2009, October 4, 2009 - October 6, 2009 Austin, TX, United states 2009 [233]
In this paper we examine text indexing on the Cell Broadband Engine (Cell/B.E.), an emerging workload on an emerging multicore architecture. The Cell Broadband Engine is a microprocessor jointly developed by Sony Computer Entertainment, Toshiba, and IBM (herein, we refer to it simply as the "Cell"). The importance of text indexing is growing not only because it is the core task of commercial and enterprise-level search engines, but also because it appears more and more frequently in desktop and mobile applications and on network appliances. Text indexing is a computationally intensive task. Multi-core processors promise a multiplicative increase in compute power, but this power is fully available only if workloads exhibit the right amount and kind of parallelism. We present the challenges and the results of mapping text indexing tasks to the Cell processor. The Cell has become known as a platform capable of impressive performance, but only when algorithms have been parallelized with attention paid to its hardware peculiarities (expensive branching, wide SIMD units, small local memories). We propose a parallel software design that provides essential text indexing features at a high throughput (161 Mbyte/s per chip on Wikipedia inputs), and we present a performance analysis that details the resources absorbed by each subtask. Not only does this result affect traditional applications, but it also enables new ones, such as live network traffic indexing for security forensics, until now believed to be too computationally demanding to be performed in real time. We conclude that, at the cost of a radical algorithmic redesign, our Cell-based solution delivers a 4x performance advantage over a recent commodity machine like the Intel Q6600. In a per-chip comparison, ours is the fastest text indexer that we are aware of.
Scheau, Cristina; Rebedea, Traian; Chiru, Costin & Trausan-Matu, Stefan Improving the relevance of search engine results by using semantic information from Wikipedia 9th RoEduNet IEEE International Conference, RoEduNet 2010, June 24, 2010 - June 26, 2010 Sibiu, Romania 2010
Depending on the user's intention, the queries processed by a search engine can be classified into transactional, informational and navigational [1]. In order to meet the three types of searches, at this moment search engines basically use algorithmic analysis of the links between pages, improved by a factor that depends on the number of occurrences of the keywords in the query and the order of these words on each web page returned as a result. For transactional and informational queries, the relevance of the results returned by the search engine may be improved by using semantic information about the query concepts when computing the order of the results presented to the user. Wikipedia is a huge thesaurus which has the advantage of already being multi-lingual and semi-structured, presenting a dense structure of internal links that can be used to extract various types of information. This paper proposes a method to extract semantic relations between concepts, considered as the names of the articles from Wikipedia, and then use these relations to determine the rank of the results returned by a search engine for a given query.
Schonberg, Christian; Pree, Helmuth & Freitag, Burkhard Rich ontology extraction and wikipedia expansion using language resources 11th International Conference on Web-Age Information Management, WAIM 2010, July 15, 2010 - July 17, 2010 Jiuzhaigou, China 2010 [234]
Existing social collaboration projects contain a host of conceptual knowledge, but are often only sparsely structured and hardly machine-accessible. Using the well known Wikipedia as a showcase, we propose new and improved techniques for extracting ontology data from the wiki category structure. Applications like information extraction, data classification, or consistency checking require ontologies of very high quality and with a high number of relationships. We improve upon existing approaches by finding a host of additional relevant relationships between ontology classes, leveraging multi-lingual relations between categories and semantic relations between terms.
Schonhofen, Peter Identifying document topics using the wikipedia category network Web Intelligence and Agent Systems 2009 [235]
In the last few years the size and coverage of Wikipedia, a community edited, freely available on-line encyclopedia, has reached the point where it can be effectively used to identify topics discussed in a document, similarly to an ontology or taxonomy. In this paper we will show that even a fairly simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting categories of Wikipedia articles themselves based on their bodies, and also by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of (or in addition to) their texts. 2009 - IOS Press.
Schonhofen, Peter Annotating documents by Wikipedia concepts 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [236]
We present a technique which is able to reliably label words or phrases of an arbitrary document with the Wikipedia articles (concepts) best describing their meaning. First it scans the document content, and when it finds a word sequence matching the title of a Wikipedia article, it attaches the article to the constituent word(s). The collected articles are then scored based on three factors: (1) how many other detected articles they semantically relate to, according to the Wikipedia link structure; (2) how specific the concept they represent is; and (3) how similar the title by which they were detected is to their "official" title. If a text location refers to multiple Wikipedia articles, only the one with the highest score is retained. Experiments on 24,000 randomly selected Wikipedia article bodies showed that 81% of the phrases annotated by article authors were correctly identified. Moreover, out of the 5 concepts deemed most important by our algorithm during a final ranking, on average 72% were indeed marked in the original text.
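The three-factor scoring can be pictured as a weighted sum over candidate concepts for a text location; the factor values and weights below are illustrative placeholders, not the paper's actual computation from the Wikipedia link structure:

```python
# Sketch: scoring candidate Wikipedia concepts for a text location by
# (1) relatedness to other detected concepts, (2) specificity, and
# (3) similarity of the matched phrase to the article title.
# The weights and factor values are illustrative assumptions.

def score_concept(relatedness, specificity, title_similarity,
                  w_rel=0.5, w_spec=0.3, w_title=0.2):
    return w_rel * relatedness + w_spec * specificity + w_title * title_similarity

def best_concept(candidates):
    """candidates: list of dicts with the three factor values plus a 'title'."""
    return max(candidates,
               key=lambda c: score_concept(c["relatedness"],
                                           c["specificity"],
                                           c["title_similarity"]))

phrase_candidates = [
    {"title": "Jaguar (animal)", "relatedness": 0.7, "specificity": 0.8, "title_similarity": 0.9},
    {"title": "Jaguar Cars",     "relatedness": 0.2, "specificity": 0.6, "title_similarity": 0.9},
]
print(best_concept(phrase_candidates)["title"])
```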
Schonhofen, Peter; Benczur, Andras; Biro, Istvan & Csalogany, Karoly Cross-language retrieval with wikipedia 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [237]
We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia based technique with a method based on bigram statistics of pairs formed by translations of different source language terms. 2008 Springer-Verlag Berlin Heidelberg.
Shahid, Ahmad R. & Kazakov, Dimitar Automatic multilingual lexicon generation using wikipedia as a resource 1st International Conference on Agents and Artificial Intelligence, ICAART 2009, January 19, 2009 - January 21, 2009 Porto, Portugal 2009
This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of the exponential increase in the amount of multilingual information on the web. Wikipedia is a prime example of such a multilingual source of information on any conceivable topic in the world, which is edited by its readers. Here, a web crawler has been used to traverse Wikipedia following the links on a given page. The crawler extracts the title along with the titles of the corresponding pages in other targeted languages. The result is a set of words and phrases that are translations of each other. For efficiency, the URLs are organized using hash tables. A lexicon has been constructed which contains 7-tuples corresponding to 7 different languages, namely: English, German, French, Polish, Bulgarian, Greek and Chinese.
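Today the core lookup (an English title mapped to its titles in other languages) can also be done through the MediaWiki API's langlinks property instead of crawling page HTML; the sketch below takes that shortcut and is therefore not the crawler described in the paper:

```python
import requests

# Sketch: fetching interlanguage links for an English Wikipedia title via the
# MediaWiki API (a shortcut; the paper crawls page links instead).

API = "https://en.wikipedia.org/w/api.php"

def interlanguage_titles(title, languages=("de", "fr", "pl")):
    params = {
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    entry = {"en": title}
    for page in data["query"]["pages"].values():
        for link in page.get("langlinks", []):
            # in the classic JSON format the target title is under the "*" key
            if link["lang"] in languages:
                entry[link["lang"]] = link.get("*") or link.get("title")
    return entry

print(interlanguage_titles("Computer science"))
```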
Shilman, Michael Aggregate documents: Making sense of a patchwork of topical documents 8th ACM Symposium on Document Engineering, DocEng 2008, September 16, 2008 - September 19, 2008 Sao Paulo, Brazil 2008 [238]
With the dramatic increase in the quantity and diversity of online content, particularly in the form of user-generated content, we now have access to unprecedented amounts of information. Whether you are researching the purchase of a new cell phone, planning a vacation, or trying to assess a political candidate, there are now countless resources at your fingertips. However, finding and making sense of all this information is laborious, and it is difficult to assess high-level trends in what is said. Web sites like Wikipedia and Digg democratize the process of organizing the information from countless documents into a single source where it is somewhat easier to understand what is important and interesting. In this talk, I describe a complementary set of automated alternatives to these approaches, demonstrate these approaches with a working example, the commercial web site Wize.com, and derive some basic principles for aggregating a diverse set of documents into a coherent and useful summary.
Shiozaki, Hitohiro & Eguchi, Koji Entity ranking from annotated text collections using multitype topic models 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [239]
Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in the language modeling framework. However, for the task of retrieving annotated documents when using the LDA-based methods, some post-processing is required outside the model in order to make use of multiple word types that are specified by the annotations. In this paper, we explore new retrieval methods using a 'multitype topic model' that can directly handle multiple word types, such as annotated entities, category labels and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection, and show the effectiveness of our methods through experiments on entity ranking using a Wikipedia collection. 2008 Springer-Verlag Berlin Heidelberg.
Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Concept vector extraction from Wikipedia category network 3rd International Conference on Ubiquitous Information Management and Communication, ICUIMC'09, January 15, 2009 - January 16, 2009 Suwon, Korea, Republic of 2009 [240]
The availability of machine readable taxonomies has been demonstrated by various applications such as document classification and information retrieval. One of the main topics of automated taxonomy extraction research is Web mining based statistical NLP (Natural Language Processing), and a significant number of studies have been conducted. However, existing works on automatic dictionary building have accuracy problems due to the technical limitations of statistical NLP and noisy data on the WWW. To solve these problems, in this work, we focus on mining Wikipedia, a large scale Web encyclopedia. Wikipedia has high-quality and huge-scale articles and a category system because many users around the world edit and refine these articles and the category system daily. Using Wikipedia, the decrease of accuracy deriving from NLP can be avoided. However, affiliation relations cannot be extracted by simply descending the category system automatically, since the category system in Wikipedia is not a tree structure but a network structure. We propose concept vectorization methods which are applicable to the category network structure in Wikipedia.
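One way to picture concept vectorization over a category network that is not a tree (a sketch under assumed data, not the paper's exact method): start from a concept's categories and propagate weight upward through parent categories with a per-step decay, guarding against cycles:

```python
from collections import defaultdict, deque

# Sketch: building a weighted category vector for a concept by walking the
# (possibly cyclic) category network upwards with a decay per step.
# The toy parent map, decay and depth limit are illustrative assumptions.

def concept_vector(start_categories, parents, decay=0.5, max_depth=3):
    vector = defaultdict(float)
    queue = deque((c, 1.0, 0) for c in start_categories)
    seen = set(start_categories)
    while queue:
        cat, weight, depth = queue.popleft()
        vector[cat] += weight
        if depth >= max_depth:
            continue
        for parent in parents.get(cat, []):
            if parent not in seen:          # avoid loops in the category network
                seen.add(parent)
                queue.append((parent, weight * decay, depth + 1))
    return dict(vector)

parents = {
    "Machine learning": ["Artificial intelligence", "Statistics"],
    "Artificial intelligence": ["Computer science"],
    "Statistics": ["Mathematics"],
}
print(concept_vector(["Machine learning"], parents))
```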
Siira, Erkki; Tuikka, Tuomo & Tormanen, Vili Location-based mobile wiki using NFC tag infrastructure 2009 1st International Workshop on Near Field Communication, NFC 2009, February 24, 2009 - February 24, 2009 Hagenberg, Austria 2009 [241]
Wikipedia is a widely known encyclopedia on the web, updated by volunteers around the world. A mobile and location-based wiki with NFC, however, brings forward the idea of using Near Field Communication tags as an enabler for seeking information content from a wiki. In this paper we shortly address how an NFC infrastructure can be created in a city for the use of a location-based wiki. The users of the system can read local information from the Wikipedia system and also update the location-based content. We present an implementation of such a system. Finally, we evaluate the restrictions of the technological system and delineate further work.
Silva, Lalindra De & Jayaratne, Lakshman Semi-automatic extraction and modeling of ontologies using wikipedia XML corpus 2nd International Conference on the Applications of Digital Information and Web Technologies, ICADIWT 2009, August 4, 2009 - August 6, 2009 London, United kingdom 2009 [242]
This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus derived from Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well.
Silva, Lalindra De & Jayaratne, Lakshman WikiOnto: A system for semi-automatic extraction and modeling of ontologies using Wikipedia XML corpus ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [243]
This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner using a preprocessed document corpus of one of the largest knowledge bases in the world - the Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies in quick time and a modeling environment to refine these ontologies. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, this system proposes a solution to a task that is generally considered extremely cumbersome. The initial results of the prototype suggest strong potential of the system to become highly successful in ontology extraction and modeling and also inspire further research on extracting ontologies from other semi-structured document corpora as well.
Sipo, Ruben; Bhole, Abhijit; Fortuna, Blaz; Grobelnik, Marko & Mladenic, Dunja Demo: Historyviz - Visualizing events and relations extracted from wikipedia 6th European Semantic Web Conference, ESWC 2009, May 31, 2009 - June 4, 2009 Heraklion, Crete, Greece 2009 [244]
HistoryViz provides a new perspective on a certain kind of textual data, in particular the data available in the Wikipedia, where different entities are described and put in historical perspective. Instead of browsing through pages each describing a certain topic, we can look at the relations between entities and the events connected with the selected entities. The solution implemented in HistoryViz provides the user with a graphical interface allowing viewing events concerning the selected person on a timeline and viewing relations to other entities as a graph that can be dynamically expanded. 2009 Springer Berlin Heidelberg.
Sjobergh, Jonas; Sjobergh, Olof & Araki, Kenji What types of translations hide in Wikipedia? 3rd International Conference on Large-Scale Knowledge Resources, LKR 2008, March 3, 2008 - March 5, 2008 Tokyo, Japan 2008 [245]
We extend an automatically generated bilingual Japanese-Swedish dictionary with new translations, automatically discovered from the multi-lingual online encyclopedia Wikipedia. Over 50,000 translations, most of which are not present in the original dictionary, are generated, with very high translation quality. We analyze what types of translations can be generated by this simple method. The majority of the words are proper nouns, and other types of (usually) uninteresting translations are also generated. Not counting the less interesting words, about 15,000 new translations are still found. Checking against logs of search queries from the old dictionary shows that the new translations would significantly reduce the number of searches with no matching translation. 2008 Springer-Verlag Berlin Heidelberg.
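A minimal sketch (not the authors' implementation) of how such interlanguage-link harvesting might look in Python, assuming a pre-parsed mapping from article titles to their language links; filtering of proper nouns and merging into the dictionary are omitted:

# Sketch: harvest candidate ja<->sv translation pairs from interlanguage links.
# `langlinks` is an assumed, pre-parsed structure: {english_title: {"ja": ..., "sv": ...}}.

def extract_translations(langlinks):
    """Return (japanese, swedish) title pairs for articles linked in both languages."""
    pairs = []
    for title, links in langlinks.items():
        if "ja" in links and "sv" in links:
            pairs.append((links["ja"], links["sv"]))
    return pairs

if __name__ == "__main__":
    sample = {
        "Stockholm": {"ja": "ストックホルム", "sv": "Stockholm"},
        "Kyoto": {"ja": "京都市", "sv": "Kyoto"},
        "Obscure article": {"sv": "Obskyr artikel"},  # no Japanese link -> skipped
    }
    for ja, sv in extract_translations(sample):
        print(ja, "<->", sv)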
Slattery, Shaun "Edit this page": The socio-technological infrastructure of a Wikipedia article 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states 2009 [246]
Networked environments, such as wikis, are commonly used to support work, including the collaborative authoring of information and fact-building. In networked environments, the activity of fact-building is mediated not only by the technological features of the interface but also by the social conventions of the community it supports. This paper examines the social and technological features of a Wikipedia article in order to understand how these features help mediate the activity of fact-building, and highlights the need for communication designers to consider the goals and needs of the communities for which they design.
Sluis, Frans Van Der & Broek, Egon L. Van Den Using complexity measures in Information Retrieval 3rd Information Interaction in Context Symposium, IIiX'10, August 18, 2010 - August 21, 2010 New Brunswick, NJ, United states 2010 [247]
Although Information Retrieval (IR) is meant to serve its users, surprisingly little IR research is user-centered. In contrast, this article utilizes the concept of complexity of information as a determinant of the user's comprehension, not as a formal golden measure. Four aspects of the user's comprehension are applied to a database of simple and normal Wikipedia articles and found to distinguish between them. The results underline the feasibility of the principle of parsimony for IR: where two topical articles are available, the simpler one is preferred.
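The four comprehension aspects are not spelled out in the abstract; purely as an illustration of the parsimony principle, the following Python sketch uses a crude surface proxy for complexity (an assumption for illustration, not the authors' measure) to prefer the simpler of two topical articles:

# Illustrative only: a crude surface proxy for "complexity", not the measures used in the paper.
import re

def surface_complexity(text):
    """Average sentence length (in words) times average word length (in characters)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    avg_sent_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sent_len * avg_word_len

def prefer_simpler(article_a, article_b):
    """Return the article with the lower complexity score (principle of parsimony)."""
    return article_a if surface_complexity(article_a) <= surface_complexity(article_b) else article_b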
Smirnov, Alexander V. & Krizhanovsky, Andrew A. Information filtering based on wiki index database Computational Intelligence in Decision and Control - 8th International FLINS Conference, September 21, 2008 - September 24, 2008 Madrid, Spain 2008
In this paper we present a profile-based approach to information filtering by an analysis of the content of text documents. The Wikipedia index database is created and used to automatically generate the user profile from the user's document collection. The problem-oriented Wikipedia subcorpora are created (using knowledge extracted from the user profile) for each topic of user interests. The index databases of these subcorpora are applied to filtering information flow (e.g., mails, news). Thus, the analyzed texts are classified into several topics explicitly presented in the user profile. The paper concentrates on the indexing part of the approach. The architecture of an application implementing the Wikipedia indexing is described. The indexing method is evaluated using the Russian and Simple English Wikipedia.
Sood, Sara Owsley & Vasserman, Lucy ESSE: Exploring mood on the web 2009 ICWSM Workshop, May 20, 2009 - May 20, 2009 San Jose, CA, United states 2009
Future machines will connect with users on an emotional level in addition to performing complex computations (Norman 2004). In this article, we present a system that adds an emotional dimension to an activity that Internet users engage in frequently: search. ESSE, which stands for Emotional State Search Engine, is a web search engine that goes beyond facilitating a user's exploration of the web by topic, as search engines such as Google or Yahoo! afford. Rather, it enables the user to browse their topically relevant search results by mood, providing the user with a unique perspective on the topic at hand. Consider a user wishing to read opinions about the new president of the United States. Typing "President Obama" into a Google search box will return (among other results) a few recent news stories about Obama, the White House's website, as well as a Wikipedia article about him. Typing "President Obama" into a Google Blog Search box will bring the user a bit closer to their goal in that all of the results are indeed blogs (typically opinions) about Obama. However, where blog search engines fall short is in providing users with a way to navigate and digest the vastness of the blogosphere: the incredible number of results for the query "President Obama" (approximately 17,335,307 as of 2/24/09) (Google Blog Search 2009). ESSE provides another dimension by which users can take in the vastness of the web or the blogosphere. This article outlines the contributions of ESSE, including a new approach to mood classification. Copyright 2009 Association for the Advancement of Artificial Intelligence (www.aaai.org).
Suh, Bongwon; Chi, Ed H.; Kittur, Aniket & Pendleton, Bryan A. Lifting the veil: Improving accountability and social transparency in Wikipedia with WikiDashboard 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, April 5, 2008 - April 10, 2008 Florence, Italy 2008 [248]
Wikis are collaborative systems in which virtually anyone can edit anything. Although wikis have become highly popular in many domains, their mutable nature often leads them to be distrusted as a reliable source of information. Here we describe a social dynamic analysis tool called WikiDashboard which aims to improve social transparency and accountability on Wikipedia articles. Early reactions from users suggest that the increased transparency afforded by the tool can improve the interpretation, communication, and trustworthiness of Wikipedia articles.
Suh, Bongwon; Chi, Ed H.; Pendleton, Bryan A. & Kittur, Aniket Us vs. Them: Understanding social dynamics in wikipedia with revert graph visualizations VAST IEEE Symposium on Visual Analytics Science and Technology 2007, October 30, 2007 - November 1, 2007 Sacramento, CA, United states 2007 [249]
Wikipedia is a wiki-based encyclopedia that has become one of the most popular collaborative on-line knowledge systems. As in any large collaborative system, as Wikipedia has grown, conflicts and coordination costs have increased dramatically. Visual analytic tools provide a mechanism for addressing these issues by enabling users to more quickly and effectively make sense of the status of a collaborative environment. In this paper we describe a model for identifying patterns of conflicts in Wikipedia articles. The model relies on users' editing history and the relationships between user edits, especially revisions that void previous edits, known as "reverts". Based on this model, we constructed Revert Graph, a tool that visualizes the overall conflict patterns between groups of users. It enables visual analysis of opinion groups and rapid interactive exploration of those relationships via detail drill-downs. We present user patterns and case studies that show the effectiveness of these techniques and discuss how they could generalize to other systems.
Swarts, Jason The collaborative construction of 'fact' on wikipedia 27th ACM International Conference on Design of Communication, SIGDOC'09, October 5, 2009 - October 7, 2009 Bloomington, IN, United states 2009 [250]
For years Wikipedia has come to symbolize the potential of Web 2.0 for harnessing the power of mass collaboration and collective intelligence. As wikis continue to develop and move into streams of cultural, social, academic, and enterprise work activity, it is appropriate to consider how collective intelligence emerges from mass collaboration. Collective intelligence can take many forms - this paper examines one, the emergence of stable facts on Wikipedia. More specifically, this paper examines ways of participating that lead to the creation of facts. This research will show how we can be more effective consumers, producers, and managers of wiki information by understanding how collaboration shapes facts.
Szomszor, Martin; Alani, Harith; Cantador, Ivan; O'Hara, Kieron & Shadbolt, Nigel Semantic modelling of user interests based on cross-folksonomy analysis 7th International Semantic Web Conference, ISWC 2008, October 26, 2008 - October 30, 2008 Karlsruhe, Germany 2008 [251]
The continued increase in Web usage, in particular participation in folksonomies, reveals a trend towards a more dynamic and interactive Web where individuals can organise and share resources. Tagging has emerged as the de-facto standard for the organisation of such resources, providing a versatile and reactive knowledge management mechanism that users find easy to use and understand. It is common nowadays for users to have multiple profiles in various folksonomies, thus distributing their tagging activities. In this paper, we present a method for the automatic consolidation of user profiles across two popular social networking sites, and subsequent semantic modelling of their interests utilising Wikipedia as a multi-domain model. We evaluate how much can be learned from such sites, and in which domains the knowledge acquired is focussed. Results show that far richer interest profiles can be generated for users when multiple tag-clouds are combined. 2008 Springer Berlin Heidelberg.
Szymanski, Julian Mining relations between wikipedia categories 2nd International Conference on 'Networked Digital Technologies', NDT 2010, July 7, 2010 - July 9, 2010 Prague, Czech republic 2010 [252]
The paper concerns the problem of automatic category system creation for a set of documents connected by references. The presented approach has been evaluated on the Polish Wikipedia, where two graphs, the Wikipedia category graph and the article graph, have been analyzed. The linkages between Wikipedia articles have been used to create a new category graph with weighted edges. We compare the created category graph with the original Wikipedia category graph, testing its quality in terms of coverage. 2010 Springer-Verlag Berlin Heidelberg.
Szymanski, Julian WordVenture - Cooperative WordNet editor: Architecture for lexical semantic acquisition 1st International Conference on Knowledge Engineering and Ontology Development, KEOD 2009, October 6, 2009 - October 8, 2009 Funchal, Madeira, Portugal 2009
This article presents an architecture for acquiring lexical semantics in a collaborative paradigm. The system provides functionality for editing semantic networks in a Wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation, which is used for semantic network presentation and simultaneously provides modification functionality.
Tan, Saravadee Sae; Kong, Tang Enya & Sodhy, Gian Chand Annotating wikipedia articles with semantic tags for structured retrieval 2nd ACM Workshop on Social Web Search and Mining, SWSM'09, Co-located with the 18th ACM International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [253]
Structured retrieval aims at exploiting the structural information of documents when searching for documents. Structured retrieval makes use of both content and structure of documents to improve information retrieval. Therefore, the availability of semantic structure in the documents is an important factor for the success of structured retrieval. However, the majority of documents in the Web still lack semantically-rich structure. This motivates us to automatically identify the semantic information in web documents and explicitly annotate the information with semantic tags. Based on the well-known Wikipedia corpus, this paper describes an unsupervised learning approach to identify conceptual information and descriptive information of an entity described in a Wikipedia article. Our approach utilizes Wikipedia link structure and Infobox information in order to learn the semantic structure of the Wikipedia articles. We also describe a lazy approach used in the learning process. By utilizing the Wikipedia categories provided by the contributors, only a subset of entities in a Wikipedia category is used as training data in the learning process and the results can be applied to the rest of the entities in the category.
Taneva, Bilyana; Kacimi, Mouna & Weikum, Gerhard Gathering and ranking photos of named entities with high precision, high recall, and diversity 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010, February 3, 2010 - February 6, 2010 New York City, NY, United states 2010 [254]
Knowledge-sharing communities like Wikipedia and automated extraction methods like those of DBpedia enable the construction of large machine-processible knowledge bases with relational facts about entities. These endeavors lack multimodal data like photos and videos of people and places. While photos of famous entities are abundant on the Internet, they are much harder to retrieve for less popular entities such as notable computer scientists or regionally interesting churches. Querying the entity names in image search engines yields large candidate lists, but they often have low precision and unsatisfactory recall. Our goal is to populate a knowledge base with photos of named entities, with high precision, high recall, and diversity of photos for a given entity. We harness relational facts about entities for generating expanded queries to retrieve different candidate lists from image search engines. We use a weighted voting method to determine better rankings of an entity's photos. Appropriate weights are dependent on the type of entity (e.g., scientist vs. politician) and automatically computed from a small set of training entities. We also exploit visual similarity measures based on SIFT features, for higher diversity in the final rankings. Our experiments with photos of persons and landmarks show significant improvements of ranking measures like MAP and NDCG, and also for diversity-aware ranking.
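A hedged Python sketch of the weighted-voting idea described above; the per-query weights and the rank-discounted vote are illustrative assumptions, and the SIFT-based diversification step is omitted:

# Sketch: rank candidate photos by weighted votes from several expanded-query result lists.
# `result_lists` maps an expanded query to its ranked list of photo URLs;
# `weights` (assumed, e.g. learned from training entities) maps the query to a vote weight.

def weighted_vote_ranking(result_lists, weights):
    scores = {}
    for query, photos in result_lists.items():
        w = weights.get(query, 1.0)
        for rank, photo in enumerate(photos):
            # Higher-ranked candidates receive larger, rank-discounted votes.
            scores[photo] = scores.get(photo, 0.0) + w / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    lists = {
        '"Alan Turing" portrait': ["a.jpg", "b.jpg"],
        '"Alan Turing" Bletchley Park': ["b.jpg", "c.jpg"],
    }
    print(weighted_vote_ranking(lists, {'"Alan Turing" portrait': 2.0}))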
Tellez, Alberto; Juarez, Antonio; Hernandez, Gustavo; Denicia, Claudia; Villatoro, Esau; Montes, Manuel & Villasenor, Luis A lexical approach for Spanish question answering 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, September 19, 2007 - September 21, 2007 Budapest, Hungary 2008 [255]
This paper discusses our system's results at the Spanish Question Answering task of CLEF 2007. Our system is centered on a fully data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to answer factoid questions from this resource solely based on the use of lexical overlaps and redundancy. 2008 Springer-Verlag Berlin Heidelberg.
Theng, Yin-Leng; Li, Yuanyuan; Lim, Ee-Peng; Wang, Zhe; Goh, Dion Hoe-Lian; Chang, Chew-Hung; Chatterjea, Kalyani & Zhang, Jun Understanding user perceptions on usefulness and usability of an integrated Wiki-G-Portal 9th International Conference on Asian Digital Libraries, ICADL 2006, November 27, 2006 - November 30, 2006 Kyoto, Japan 2006
This paper describes a pilot study on Wiki-G-Portal, a project integrating Wikipedia, an online encyclopedia, into G-Portal, a Web-based digital library of geography resources. Initial findings from the pilot study seem to suggest positive perceptions of the usefulness and usability of Wiki-G-Portal, as well as subjects' attitude and intention to use it. Springer-Verlag Berlin Heidelberg 2006.
Thomas, Christopher; Mehra, Pankaj; Brooks, Roger & Sheth, Amit Growing fields of interest using an expand and reduce strategy for domain model extraction 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [256]
Domain hierarchies are widely used as models underlying information retrieval tasks. Formal ontologies and taxonomies enrich such hierarchies further with properties and relationships, but require manual effort; therefore they are costly to maintain, and often stale. Folksonomies and vocabularies lack rich category structure. Classification and extraction require the coverage of vocabularies and the alterability of folksonomies and can largely benefit from category relationships and other properties. With Doozer, a program for building conceptual models of information domains, we want to bridge the gap between vocabularies and folksonomies on the one side and the rich, expert-designed ontologies and taxonomies on the other. Doozer mines Wikipedia to produce tight domain hierarchies, starting with simple domain descriptions. It also adds relevancy scores for use in automated classification of information. The output model is described as a hierarchy of domain terms that can be used immediately for classifiers and IR systems or as a basis for manual or semi-automatic creation of formal ontologies.
Tianyi, Shi; Shidou, Jiao; Junqi, Hou & Minglu, Li Improving keyphrase extraction using wikipedia semantics 2008 2nd International Symposium on Intelligent Information Technology Application, IITA 2008, December 21, 2008 - December 22, 2008 Shanghai, China 2008 [257]
Keyphrase extraction plays a key role in various fields such as information retrieval, text classification, etc. However, most traditional keyphrase extraction methods rely on word frequency and position instead of the document's inherent semantic information, often resulting in inaccurate output. In this paper, we propose a novel automatic keyphrase extraction algorithm using semantic features mined from online Wikipedia. This algorithm first identifies candidate keyphrases based on lexical methods, and then constructs a semantic graph that connects candidate keyphrases with document topics. Afterwards, a link analysis algorithm is applied to assign semantic feature weights to the candidate keyphrases. Finally, several statistical and semantic features are assembled by a regression model to predict the quality of candidates. Encouraging results are achieved in our experiments, which show the effectiveness of our method.
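As an illustration of the link-analysis step, the following Python sketch runs a PageRank-style walk over a weighted graph of candidate keyphrases; the graph and its relatedness weights are assumed inputs (the Wikipedia mining and the final regression model are out of scope):

# Sketch: score candidate keyphrases by a PageRank-style walk over a semantic graph.
# `graph` is an assumed weighted adjacency dict: {phrase: {neighbour: relatedness_weight}}.

def rank_phrases(graph, damping=0.85, iterations=50):
    scores = {node: 1.0 / len(graph) for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            rank_sum = 0.0
            for other, neighbours in graph.items():
                if node in neighbours:
                    total_w = sum(neighbours.values())
                    if total_w > 0:
                        rank_sum += scores[other] * neighbours[node] / total_w
            new_scores[node] = (1 - damping) / len(graph) + damping * rank_sum
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    g = {"information retrieval": {"text mining": 0.8},
         "text mining": {"information retrieval": 0.8, "keyphrase": 0.5},
         "keyphrase": {"text mining": 0.5}}
    print(rank_phrases(g))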
Tran, Tien; Kutty, Sangeetha & Nayak, Richi Utilizing the structure and content information for XML document clustering 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [258]
This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. The construction of a latent semantic kernel involves computing a singular value decomposition (SVD). On a large feature-space matrix, the computation of the SVD is very expensive in terms of time and memory requirements. Thus, in this clustering approach, the dimension of the document space of the term-document matrix is reduced before performing the SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has been shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge. 2009 Springer Berlin Heidelberg.
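A minimal Python/numpy sketch of the latent-semantic-kernel idea, assuming a small dense term-document matrix; the structure-based reduction of the document space that the paper applies before the SVD is omitted:

# Sketch: latent-semantic similarity between documents via truncated SVD (numpy only).
import numpy as np

def latent_similarity(term_doc, k=2):
    """term_doc: terms x documents matrix. Returns a documents x documents cosine-similarity
    matrix computed in a k-dimensional latent space."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T          # one k-dimensional vector per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    doc_vecs = doc_vecs / np.where(norms == 0, 1, norms)
    return doc_vecs @ doc_vecs.T

if __name__ == "__main__":
    A = np.array([[2., 0., 1.],
                  [1., 0., 1.],
                  [0., 3., 0.]])
    print(latent_similarity(A, k=2))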
Tran, Tien; Nayak, Richi & Bruza, Peter Document clustering using incremental and pairwise approaches 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [259]
This paper presents the experiments and results of a clustering approach for clustering the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The clustering approach employed makes use of an incremental clustering method and a pairwise clustering method. The approach enables us to perform the clustering task on a large dataset by first reducing the dimension of the dataset to an undefined number of clusters using the incremental method. The lower-dimension dataset is then clustered to the required number of clusters using the pairwise method. In this way, clustering of the large number of documents is performed successfully while the accuracy of the clustering solution is maintained. 2008 Springer-Verlag Berlin Heidelberg.
Tsikrika, Theodora & Kludas, Jana Overview of the WikipediaMM task at ImageCLEF 2008 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [260]
The WikipediaMM task provides a testbed for the system-oriented evaluation of ad-hoc retrieval from a large collection of Wikipedia images. It became a part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the use of visual and textual sources in combination for improving the retrieval performance. This paper presents an overview of the task's resources, topics, assessments, participants' approaches, and main results. 2009 Springer Berlin Heidelberg.
Tsikrika, Theodora; Serdyukov, Pavel; Rode, Henning; Westerveld, Thijs; Aly, Robin; Hiemstra, Djoerd & Vries, Arjen P. De Structured document retrieval, multimedia retrieval, and entity ranking using PF/Tijah 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [261]
CWI and University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing towards longer elements than the ones retrieved by our language modelling approach can be useful. For retrieving images in isolation, we found that their associated text is a very good source of evidence in the Wikipedia collection. For the entity ranking task, we used random walks to model multi-step relevance propagation from the articles describing entities to all related entities and further, and obtained promising results. 2008 Springer-Verlag Berlin Heidelberg.
Urdaneta, Guido; Pierre, Guillaume & Steen, Maarten Van A decentralized wiki engine for collaborative wikipedia hosting 3rd International Conference on Web Information Systems and Technologies, Webist 2007, March 3, 2007 - March 6, 2007 Barcelona, Spain 2007
This paper presents the design of a decentralized system for hosting large-scale wiki web sites like Wikipedia, using a collaborative approach. Our design focuses on distributing the pages that compose the wiki across a network of nodes provided by individuals and organizations willing to collaborate in hosting the wiki. We present algorithms for placing the pages so that the capacity of the nodes is not exceeded and the load is balanced, and algorithms for routing client requests to the appropriate nodes. We also address fault tolerance and security issues.
Vaishnavi, Vijay K.; Vandenberg, Art; Zhang, Yanqing & Duraisamy, Saravanaraj Towards design principles for effective context-and perspective-based web mining 4th International Conference on Design Science Research in Information Systems and Technology, DESRIST '09, May 7, 2009 - May 8, 2009 Philadelphia, PA, United states 2009 [262]
A practical and scalable web mining solution is needed that can assist the user in processing existing web-based resources to discover specific, relevant information content. This is especially important for researcher communities where data deployed on the World Wide Web are characterized by autonomous, dynamically evolving, and conceptually diverse information sources. The paper describes a systematic design research study that is based on prototyping/evaluation and abstraction using existing and new techniques incorporated as plug and play components into a research workbench. The study investigates an approach, DISCOVERY, for using (1) context/perspective information and (2) social networks such as ODP or Wikipedia for designing practical and scalable human-web systems for finding web pages that are relevant and meet the needs and requirements of a user or a group of users. The paper also describes the current implementation of DISCOVERY and its initial use in finding web pages in a targeted web domain. The resulting system arguably meets the common needs and requirements of a group of people based on the information provided by the group in the form of a set of context web pages. The system is evaluated for a scenario in which assistance of the system is sought for a group of faculty members in finding NSF research grant opportunities that they should collaboratively respond to, utilizing the context provided by their recent publications.
Vercoustre, Anne-Marie; Pehcevski, Jovan & Naumovski, Vladimir Topic difficulty prediction in entity ranking 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, December 15, 2008 - December 18, 2008 Dagstuhl Castle, Germany 2009 [263]
Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction where the goal is to tag the names of the entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we show that the knowledge of predicted classes of topic difficulty can be used to further improve the entity ranking performance. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking. 2009 Springer Berlin Heidelberg.
Vercoustre, Anne-Marie; Pehcevski, Jovan & Thom, James A. Using Wikipedia categories and links in entity ranking 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [264]
This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the effectiveness of entity ranking. Our experiments on both the training and the testing data sets demonstrate that the use of categories and the link structure of Wikipedia can significantly improve entity retrieval effectiveness. We also use our system for the ad hoc tasks by inferring target categories from the title of the query. The results were worse than when using a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks. 2008 Springer-Verlag Berlin Heidelberg.
Vercoustre, Anne-Marie; Thom, James A. & Pehcevski, Jovan Entity ranking in Wikipedia 23rd Annual ACM Symposium on Applied Computing, SAC'08, March 16, 2008 - March 20, 2008 Fortaleza, Ceara, Brazil 2008 [265]
The traditional entity extraction problem lies in the ability to extract named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. Wikipedia offers a number of interesting features for entity identification and ranking that we first introduce. We then describe the principles and the architecture of our entity ranking system, and introduce our methodology for evaluation. Our preliminary results show that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness.
Viegas, Fernanda B.; Wattenberg, Martin & Mckeon, Matthew M. The hidden order of wikipedia 2nd International Conference on Online Communities and Social Computing, OCSC 2007, July 22, 2007 - July 27, 2007 Beijing, China 2007
We examine the procedural side of Wikipedia, the well-known internet encyclopedia. Despite the lack of structure in the underlying wiki technology, users abide by hundreds of rules and follow well-defined processes. Our case study is the Featured Article (FA) process, one of the best established procedures on the site. We analyze the FA process through the theoretical framework of commons governance, and demonstrate how this process blends elements of traditional workflow with peer production. We conclude that rather than encouraging anarchy, many aspects of wiki technology lend themselves to the collective creation of formalized process and policy. Springer-Verlag Berlin Heidelberg 2007.
Villarreal, Sara Elena Gaza; Elizalde, Lorena Martinez & Viveros, Adriana Canseco Clustering hyperlinks for topic extraction: An exploratory analysis 8th Mexican International Conference on Artificial Intelligence, MICAI 2009, November 9, 2009 - November 13, 2009 Guanajuato, Guanajuato, Mexico 2009 [266]
In a Web of increasing size and complexity, a key issue is automatic document organization, which includes topic extraction in collections. Since we consider topics as document clusters with semantic properties, we are concerned with exploring suitable clustering techniques for their identification in hyperlinked environments (where we only regard structural information). For this purpose, three algorithms (PDDP, k-means, and graph local clustering) were executed over a document subset of an increasingly popular corpus: Wikipedia. Results were evaluated with unsupervised metrics (cosine similarity, semantic relatedness, Jaccard index) and suggest that promising results can be produced for this particular domain.
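A small Python sketch of two of the unsupervised metrics mentioned, assuming articles are represented by outgoing-link sets (for the Jaccard index) and sparse term-frequency dictionaries (for cosine similarity); these representations are assumptions for illustration:

# Sketch: unsupervised evaluation of a document cluster with Jaccard index and cosine similarity.
import math

def jaccard(links_a, links_b):
    """Jaccard index between two sets of outgoing hyperlinks."""
    if not links_a and not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(vec_a[t] * vec_b.get(t, 0.0) for t in vec_a)
    na = math.sqrt(sum(v * v for v in vec_a.values()))
    nb = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (na * nb) if na and nb else 0.0

def average_pairwise(cluster, sim):
    """Mean pairwise similarity inside a cluster (list of items), given a similarity function."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0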
Vries, Arjen P. De; Vercoustre, Anne-Marie; Thom, James A.; Craswell, Nick & Lalmas, Mounia Overview of the INEX 2007 entity ranking track 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, December 17, 2007 - December 19, 2007 Dagstuhl Castle, Germany 2008 [267]
Many realistic user tasks involve the retrieval of specific entities instead of just any type of documents. Examples of information needs include 'Countries where one can pay with the euro' or 'Impressionist art museums in The Netherlands'. The Initiative for the Evaluation of XML Retrieval (INEX) started the XML Entity Ranking track (INEX-XER) to create a test collection for entity retrieval in Wikipedia. Entities are assumed to correspond to Wikipedia entries. The goal of the track is to evaluate how well systems can rank entities in response to a query; the set of entities to be ranked is assumed to be loosely defined either by a generic category (entity ranking) or by some example entities (list completion). This track overview introduces the track setup, and discusses the implications of the new relevance notion for entity ranking in comparison to ad hoc retrieval. 2008 Springer-Verlag Berlin Heidelberg.
Vries, Christopher M. De; Geva, Shlomo & Vine, Lance De Clustering with random indexing K-tree and XML structure 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [268]
This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies. 2010 Springer-Verlag Berlin Heidelberg.
Vroom, Regine W.; Vossen, Lysanne E. & Geers, Anoek M. Aspects to motivate users of a design engineering wiki to share their knowledge Proceedings of World Academy of Science, Engineering and Technology 2009
Industrial design engineering is an information- and knowledge-intensive job. Although Wikipedia offers a lot of this information, design engineers are better served by a wiki tailored to their job, offering information in a compact manner and functioning as a design tool. For that reason WikID has been developed. However, for the viability of a wiki, an active user community is essential. The main subject of this paper is a study of the influence of the communication and the contents of WikID on users' willingness to contribute. First, the theory about a website's first impression, general usability guidelines, and user motivation in an online community is studied. Using this theory, the aspects of the current site are analyzed for their suitability. These results have been verified with a questionnaire amongst 66 industrial design engineers (or industrial design engineering students). The main conclusion is that design engineers are enchanted with the existence of WikID and its knowledge structure (taxonomy), but this structure has not become clear without any guidance. In other words, the knowledge structure is very helpful for inspiring and guiding design engineers through their tailored knowledge domain in WikID, but this taxonomy has to be better communicated on the main page. In addition, the main page needs to be fitted better to the target group's preferences.
Waltinger, Ulli & Mehler, Alexander Who is it? Context sensitive named entity and instance recognition by means of Wikipedia 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, December 9, 2008 - December 12, 2008 Sydney, NSW, Australia 2008 [269]
This paper presents an approach for predicting context-sensitive entities, exemplified in the domain of person names. Our approach is based on building a weighted context graph as well as a weighted people graph, and predicting the context entity by extracting the best-fitting subgraph using a spreading activation technique. The results of the experiments show a quite promising F-measure of 0.99.
Waltinger, Ulli; Mehler, Alexander & Heyer, Gerhard Towards automatic content tagging - Enhanced web services in digital libraries using lexical chaining WEBIST 2008 - 4th International Conference on Web Information Systems and Technologies, May 4, 2008 - May 7, 2008 Funchal, Madeira, Portugal 2008
This paper proposes a web-based application which combines social tagging, enhanced visual representation of a document, and alignment to an open-ended social ontology. More precisely, we introduce, on the one hand, an approach for automatic extraction of document-related keywords for indexing and representing document content as an alternative to social tagging. On the other hand, we propose automatic classification within a social ontology based on the German Wikipedia category taxonomy. This paper has two main goals: to describe the method of automatic tagging of digital documents and to provide an overview of the algorithmic patterns of lexical chaining that can be applied for topic tracking and labelling of digital documents.
Wang, Gang; Yu, Yong & Zhu, Haiping PORE: Positive-only relation extraction from wikipedia text 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, ASWC 2007, November 11, 2007 - November 15, 2007 Busan, Korea, Republic of 2007 [270]
Extracting semantic relations is of great importance for the creation of Semantic Web content. It is of great benefit to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern matching methods that employ information redundancy cannot work well since there is not much redundant information in Wikipedia, compared to the Web. Multi-class classification methods are not reasonable since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm, B-POL, extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL can work effectively given only a small amount of positive training examples and that it significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for Ontology Population and can be adapted to other domains. 2008 Springer-Verlag Berlin Heidelberg.
Wang, Jun; Jin, Xin & Wu, Yun-Peng An empirical study of knowledge collaboration networks in virtual community: Based on wiki 2009 16th International Conference on Management Science and Engineering, ICMSE 2009, September 14, 2009 - September 16, 2009 Moscow, Russia 2009 [271]
Wikipedia is a typical knowledge-collaboration-oriented virtual community, yet its collaboration mechanism remains unclear. This empirical study explores Wikipedia's archive data and proposes a knowledge collaboration network model. The analysis indicates that the wiki-based knowledge collaboration network is a type of BA scale-free network which obeys a power-law distribution. On the other hand, this network is characterized by a high, stable clustering coefficient and a small average distance, thus presenting an obvious small-world effect. Moreover, the network topology is non-hierarchical because clustering coefficients and degrees do not conform to a power-law distribution. The above results profile the collaboration network and identify its key network properties. Thus we can use the model to describe how people interact with each other and to what extent they collaborate on content creation.
Wang, Juncheng; Ma, Feicheng & Cheng, Jun The impact of research design on the half-life of the wikipedia category system 2010 International Conference on Computer Design and Applications, ICCDA 2010, June 25, 2010 - June 27, 2010 Qinhuangdao, Hebei, China 2010 [272]
The Wikipedia category system has shown a phenomenon of life or obsolescence similar to that of periodical literature, so this paper aims to investigate how factors related to study design and the research process, namely the observation points and the time span, affect the obsolescence of the Wikipedia category system. For the impact of different observation points, we make use of datasets at different time points under the same time span, and the results show that the observation points do have an obvious influence on the category cited half-life; and for the impact of the time span, we use datasets with different intervals at the same time point, and the results indicate that the time span has a certain impact on the categories' obsolescence. Based on the in-depth analysis, the paper further proposes some useful suggestions for similar studies on information obsolescence in the future.
Wang, Li; Yata, Susumu; Atlam, El-Sayed; Fuketa, Masao; Morita, Kazuhiro; Bando, Hiroaki & Aoe, Jun-Ichi A method of building Chinese field association knowledge from Wikipedia 2009 International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2009, September 24, 2009 - September 27, 2009 Dalian, China 2009 [273]
Field Association (FA) terms form a limited set of discriminating terms that give us the knowledge to identify document fields. The primary goal of this research is to make a system that can imitate the process whereby humans recognize the fields by looking at a few Chinese FA terms in a document. This paper proposes a new approach to build a Chinese FA term dictionary automatically from Wikipedia. 104,532 FA terms are added to the dictionary. The resulting FA terms obtained by using this dictionary are applied to recognize the fields of 5,841 documents. The average accuracy in the experiment is 92.04%. The results show that the presented method is effective in building FA terms from Wikipedia automatically.
Wang, Qiuyue; Li, Qiushi; Wang, Shan & Du, Xiaoyong Exploiting semantic tags in XML retrieval 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [274]
With the new semantically annotated Wikipedia XML corpus, we attempt to investigate the following two research questions. Do the structural constraints in CAS queries help in retrieving an XML document collection containing semantically rich tags? How can the semantic tag information be exploited to improve CO queries, given that most users prefer to express the simplest forms of queries? In this paper, we describe and analyze the work done on comparing CO and CAS queries over the document collection at the INEX 2009 ad hoc track, and we propose a method to improve the effectiveness of CO queries by enriching the element content representations with semantic tags. Our results show that the approaches of enriching XML element representations with semantic tags are effective in improving early precision, while on average precision, strict interpretation of CAS queries is generally superior. 2010 Springer-Verlag Berlin Heidelberg.
Wang, Yang; Wang, Haofen; Zhu, Haiping & Yu, Yong Exploit semantic information for category annotation recommendation in Wikipedia 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007
Compared with plain-text resources, the ones in "semi-semantic" web sites such as Wikipedia contain high-level semantic information which will benefit various automatic annotation tasks on them. In this paper we propose a "collaborative annotating" approach to automatically recommend categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming link, outgoing link, section heading and template item, are investigated and exploited as the representation of articles to feed the similarity calculation. The experimental results have not only proven that these semantic features improve the performance of category annotation in comparison to the plain-text feature, but also demonstrated the strength of our approach in discovering missing annotations and proper-level ones for Wikipedia articles. Springer-Verlag Berlin Heidelberg 2007.
Wannemacher, Klaus Articles as assignments - Modalities and experiences of wikipedia use in university courses 8th International Conference on Web Based Learning, ICWL 2009, August 19, 2009 - August 21, 2009 Aachen, Germany 2009 [275]
In spite of perceived quality deficits, Wikipedia is a popular information resource among students. Instructors increasingly take advantage of this positive student attitude by actively integrating Wikipedia as a learning tool into university courses. The contribution raises the question of whether Wikipedia assignments in university courses are suited to making the complex research, editing and bibliographic processes through which scholarship is produced transparent to students, and to effectively improving their research and writing skills. 2009 Springer Berlin Heidelberg.
Wartena, Christian & Brussee, Rogier Topic detection by clustering keywords DEXA 2008, 19th International Conference on Database and Expert Systems Applications, September 1, 2008 - September 5, 2008 Turin, Italy 2008 [276]
We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution taking co-occurrence of terms into account gives best results.
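For reference, the Jensen-Shannon divergence used here as a distance between term distributions can be computed as follows (a generic Python sketch, with distributions given as term-to-probability dictionaries; this is the standard definition, not the paper's code):

# Sketch: Jensen-Shannon divergence between two term probability distributions.
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with zero mass in p contribute nothing."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)

def jensen_shannon(p, q):
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

if __name__ == "__main__":
    p = {"wiki": 0.5, "article": 0.5}
    q = {"wiki": 0.25, "edit": 0.75}
    print(jensen_shannon(p, q))   # 0 for identical distributions, up to 1 bit for disjoint ones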
Wattenberg, Martin; Viegas, Fernanda B. & Hollenbach, Katherine Visualizing activity on wikipedia with chromograms 11th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2007, September 10, 2007 - September 14, 2007 Rio de Janeiro, Brazil 2007
To investigate how participants in peer production systems allocate their time, we examine editing activity on Wikipedia, the well-known online encyclopedia. To analyze the huge edit histories of the site's administrators, we introduce a visualization technique, the chromogram, that can display very long textual sequences through a simple color coding scheme. Using chromograms, we describe a set of characteristic editing patterns. In addition to confirming known patterns, such as reacting to vandalism events, we identify a distinct class of organized systematic activities. We discuss how both reactive and systematic strategies shed light on self-allocation of effort in Wikipedia, and how they may pertain to other peer-production systems. IFIP International Federation for Information Processing 2007.
Wee, Leong Chee & Hassan, Samer Exploiting Wikipedia for directional inferential text similarity International Conference on Information Technology: New Generations, ITNG 2008, April 7, 2008 - April 9, 2008 Las Vegas, NV, United states 2008 [277]
In natural languages, variability of semantic expression refers to the situation where the same meaning can be inferred from different words or texts. Given that many natural language processing tasks nowadays (e.g. question answering, information retrieval, document summarization) often model this variability by requiring a specific target meaning to be inferred from different text variants, it is helpful to capture text similarity in a directional manner to serve such inference needs. In this paper, we show how Wikipedia can be used as a semantic resource to build a directional inferential similarity metric between words, and subsequently, texts. Through experiments, we show that our Wikipedia-based metric performs significantly better when applied to a standard evaluation dataset, with a reduction in error rate of 16.1% over the random metric baseline.
Weikum, Gerhard Chapter 3: Search for knowledge 1st Workshop on Search Computing Challenges and Directions, SeCo 2009, June 17, 2009 - June 19, 2009 Como, Italy 2010 [278]
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. In addition, Semantic-Web-style ontologies, structured Deep-Web sources, and Social-Web networks and tagging communities can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This vision and position paper discusses opportunities and challenges along this research avenue. The technical issues to be looked into include knowledge harvesting to construct large knowledge bases, searching for knowledge in terms of entities and relationships, and ranking the results of such queries.
Weikum, Gerhard Harvesting, searching, and ranking knowledge on the web 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, February 9, 2009 - February 12, 2009 Barcelona, Spain 2009 [279]
There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision. This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14], but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17]. YAGO is publicly available and has been imported into various other knowledge-management projects including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.
Weiping, Wang; Peng, Chen & Bowen, Liu A self-adaptive explicit semantic analysis method for computing semantic relatedness using wikipedia 2008 International Seminar on Future Information Technology and Management Engineering, FITME 2008, November 20, 2008 - November 20, 2008 Leicestershire, United kingdom 2008 [280]
Welker, Andrea L. & Quintiliano, Barbara Information literacy: Moving beyond Wikipedia GeoCongress 2008: Geosustainability and Geohazard Mitigation, March 9, 2008 - March 12, 2008 New Orleans, LA, United states 2008 [281]
In the past, finding information was the challenge. Today, the challenge our students face is to sift through and evaluate the incredible amount of information available. This ability to find and evaluate information is sometimes referred to as information literacy. Information literacy relates to a student's ability to communicate, but, more importantly, information-literate persons are well-poised to learn throughout life because they have learned how to learn. A series of modules to address information literacy were created in a collaborative effort between faculty in the Civil and Environmental Engineering Department at Villanova and the librarians at Falvey Memorial Library. These modules were integrated throughout the curriculum, from sophomore to senior year. Assessment is based on modified ACRL (Association of College and Research Libraries) outcomes. This paper will document the lessons learned in the implementation of this program and provide concrete examples of how to incorporate information literacy into geotechnical engineering classes. Copyright ASCE 2008.
West, Andrew G.; Kannan, Sampath & Lee, Insup Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata? 3rd European Workshop on System Security, EUROSEC'10, April 13, 2010 - April 13, 2010 Paris, France 2010 [282]
Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set.
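A hedged Python sketch of metadata-only scoring in the spirit of this approach; the feature names, thresholds, and weights below are illustrative assumptions, not the paper's learned model:

# Sketch: metadata-only features for scoring a Wikipedia revision; no article text is inspected.
# Field names and the hand-set weights below are assumptions for illustration only.
from datetime import datetime

def revision_features(rev, previous_rev_time):
    """rev: dict with 'timestamp' (datetime), 'anonymous' (bool), 'comment' (str), 'size_delta' (int)."""
    return {
        "is_anonymous": 1.0 if rev["anonymous"] else 0.0,
        "night_edit": 1.0 if rev["timestamp"].hour < 6 else 0.0,          # local off-peak hours
        "no_comment": 1.0 if not rev["comment"].strip() else 0.0,
        "large_removal": 1.0 if rev["size_delta"] < -500 else 0.0,
        "quick_follow_up": 1.0 if (rev["timestamp"] - previous_rev_time).total_seconds() < 60 else 0.0,
    }

def vandalism_score(features, weights):
    """Simple weighted sum; a real system would learn the weights from rollback-tagged edits."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

if __name__ == "__main__":
    rev = {"timestamp": datetime(2010, 4, 13, 3, 15), "anonymous": True,
           "comment": "", "size_delta": -1200}
    feats = revision_features(rev, datetime(2010, 4, 13, 3, 14, 30))
    print(vandalism_score(feats, {"is_anonymous": 1.0, "no_comment": 0.5, "large_removal": 1.5,
                                  "night_edit": 0.3, "quick_follow_up": 0.7}))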
Westerveld, Thijs; Rode, Henning; Os, Roel Van; Hiemstra, Djoerd; Ramirez, Georgina; Mihajlovic, Vojkan & Vries, Arjen P. De Evaluating structured information retrieval and multimedia retrieval using PF/Tijah 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007
We used a flexible XML retrieval system for evaluating structured document retrieval and multimedia retrieval tasks in the context of the INEX 2006 benchmarks. We investigated the differences between article and element retrieval for Wikipedia data as well as the influence of an element's context on its ranking. We found that article retrieval performed well on many tasks and that pinpointing the relevant passages inside an article may hurt more than it helps. We found that for finding images in isolation the associated text is a very good descriptor in the Wikipedia collection, but we were not very successful at identifying relevant multimedia fragments consisting of a combination of text and images. Springer-Verlag Berlin Heidelberg 2007.
Winter, Judith & Kuhne, Gerold Achieving high precisions with peer-to-peer is possible 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010 [283]
Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires many more resources such as processing power, RAM, and index space. It is hence more important than ever to consider efficiency issues when performing XML-retrieval tasks on such a big collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments using distributed search techniques based on XML retrieval. Our aim is to improve both effectiveness and efficiency; we have thus submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to gain more efficiency in terms of load balancing when searching large-scale collections. Since the bandwidth consumption between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. In spite of mainly aiming at efficiency, our search engine SPIRIX achieved quite high precision and made it into the top-10 systems (focused task). It ranked 7th at the Ad Hoc Track (59%) and came first in terms of precision at the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top-10 centralized solutions! 2010 Springer-Verlag Berlin Heidelberg.
Witmer, Jeremy & Kalita, Jugal Extracting geospatial entities from Wikipedia ICSC 2009 - 2009 IEEE International Conference on Semantic Computing, September 14, 2009 - September 16, 2009 Berkeley, CA, United states 2009 [284]
This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. We target for testing a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with a very high recall, close to 100%, with an acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous, and do not immediately geocode to a single location, we present a data structure and algorithm to resolve ambiguity based on sentence and article context, so the correct coordinates can be selected. We achieve an f-measure of 82%, and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on and geovisualization of Wikipedia.
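The context-based resolution step can be pictured with a small sketch: among an ambiguous name's candidate coordinates, pick the one closest on average to places already resolved in the same article. The candidate lists and helper below are hypothetical, not the paper's data structure or algorithm:

```python
# Illustrative sketch only: resolve an ambiguous place name by choosing the
# candidate coordinate closest on average to already-resolved context places.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def resolve(candidates, context_coords):
    """Pick the candidate minimising mean distance to resolved context places."""
    return min(candidates,
               key=lambda c: sum(haversine_km(c, ctx) for ctx in context_coords) / len(context_coords))

# "Springfield" is ambiguous; the article also mentions Chicago and St. Louis.
springfield_candidates = [(39.8, -89.6), (42.1, -72.6), (37.2, -93.3)]  # IL, MA, MO
context = [(41.9, -87.6), (38.6, -90.2)]                                # Chicago, St. Louis
print(resolve(springfield_candidates, context))  # -> (39.8, -89.6), the Illinois one
```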
Wong, Wilson; Liu, Wei & Bennamoun, Mohammed Featureless similarities for terms clustering using tree-traversing ants International Symposium on Practical Cognitive Agents and Robots, PCAR 2006, November 27, 2006 - November 28, 2006 Perth, WA, Australia 2006 [285]
Besides being difficult to scale between different domains and to handle knowledge fluctuations, the results of terms clustering presented by existing ontology engineering systems are far from desirable. In this paper, we propose a new version of an ant-based method for clustering terms known as Tree-Traversing Ants (TTA). With the help of the Normalized Google Distance (NGD) and n of Wikipedia (nW) as measures for similarity and distance between terms, we attempt to achieve an adaptable clustering method that is highly scalable across domains. Initial experiments with two datasets show promising results and demonstrate several advantages that are not simultaneously present in standard ant-based and other conventional clustering methods. Copyright held by author.
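For reference, the Normalized Google Distance mentioned above is commonly defined from page-hit counts; a small sketch with hypothetical counts (the Wikipedia-based nW measure is not shown):

```python
# Sketch of the standard Normalized Google Distance (NGD) formula; the counts
# below are hypothetical search-engine hit counts, not results from the paper.
from math import log

def ngd(fx, fy, fxy, n):
    """fx, fy: hit counts for terms x and y; fxy: joint count; n: total indexed pages."""
    if fxy == 0:
        return float("inf")  # terms never co-occur
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))

# Two terms that co-occur often yield a small distance.
print(ngd(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_058_044_651))
```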
Wongboonsin, Jenjira & Limpiyakorn, Yachai Wikipedia customization for organization's process asset management 2008 International Conference on Advanced Computer Theory and Engineering, ICACTE 2008, December 20, 2008 - December 22, 2008 Phuket, Thailand 2008 [286]
Mature organizations typically establish various process assets that serve as standards for work operations in their units. Process assets include policies, guidelines, standard process definitions, life cycle models, forms and templates, etc. These assets are placed in a repository called an Organization's Process Asset Library, or OPAL. Projects then utilize these assets and tailor organizational standard processes to suit individual project processes. This research proposes an approach to establishing an organization's process asset library by customizing open source software - Wikipedia. The system is called WikiOPAL. CMMI is used as the referenced process improvement model for the establishment of the organization's process assets in this work. We also demonstrate that Wikipedia can be properly used as an approach for constructing a process asset library in a collaborative environment.
Woodley, Alan & Geva, Shlomo NLPX at INEX 2006 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, December 17, 2006 - December 20, 2006 Dagstuhl Castle, Germany 2007
XML information retrieval (XML-IR) systems aim to better fulfil users' information needs than traditional IR systems by returning results lower than the document level. In order to use XML-IR systems users must encapsulate their structural and content information needs in a structured query. Historically, these structured queries have been formatted using formal languages such as NEXI. Unfortunately, formal query languages are very complex, too difficult to be used by experienced - let alone casual - users, and too closely bound to the underlying physical structure of the collection. INEX's NLP task investigates the potential of using natural language to specify structured queries. QUT has participated in the NLP task with our system NLPX since its inception. Here, we discuss the changes we've made to NLPX since last year, including our efforts to port NLPX to Wikipedia. Second, we present the results from the 2006 INEX track where NLPX was the best performing participant in the Thorough and Focused tasks. Springer-Verlag Berlin Heidelberg 2007.
Wu, Shih-Hung; Li, Min-Xiang; Yang, Ping-Che & Ku, Tsun Ubiquitous wikipedia on handheld device for mobile learning 6th IEEE International Conference on Wireless, Mobile and Ubiquitous Technologies in Education, WMUTE 2010, April 12, 2010 - April 16, 2010 Kaohsiung, Taiwan 2010 [287]
Hand-held systems and wireless Internet access have become widely available in recent years. However, mobile learning with web content is still inconvenient. For example, the information is not well organized and it is difficult to browse on the small screen of a handheld device. We propose a mobile system based on the content of Wikipedia. Wikipedia is a free content resource and has abundant text and picture content. We use the Wikipedia wrapper that we developed before to build a mobile-learning interface for cross-language and cross-platform applications. Our system can present the content of Wikipedia on the small screens of PDAs, and can be used for mobile learning. A teaching scenario of mobile learning during a museum visit is discussed in this paper.
Xavier, Clarissa Castella & Lima, Vera Lucia Strube De Construction of a domain ontological structure from Wikipedia 7th Brazilian Symposium in Information and Human Language Technology, STIL 2009, September 8, 2009 - September 11, 2009 Sao Carlos, Sao Paulo, Brazil 2010 [288]
Data extraction from Wikipedia for ontology construction, enrichment and population is an emerging research field. This paper describes a study on automatic extraction of an ontological structure containing hyponymy and location relations from Wikipedia's Tourism category in Portuguese, illustrated with an experiment and an evaluation of its results.
Xu, Hongtao; Zhou, Xiangdong; Wang, Mei; Xiang, Yu & Shi, Baile Exploring Flickr's related tags for semantic annotation of web images ACM International Conference on Image and Video Retrieval, CIVR 2009, July 8, 2009 - July 10, 2009 Santorini Island, Greece 2009 [289]
Exploring social media resources, such as Flickr and Wikipedia, to mitigate the difficulty of the semantic gap has attracted much attention from both academia and industry. In this paper, we first propose a novel approach to derive a semantic correlation matrix from Flickr's related tags resource. We then develop a novel conditional random field model for Web image annotation, which integrates the keyword correlations derived from Flickr, and the textual and visual features of Web images, into a unified graph model to improve the annotation performance. The experimental results on a real Web image data set demonstrate the effectiveness of the proposed keyword correlation matrix and the Web image annotation approach.
Xu, Jinsheng; Yilmaz, Levent & Zhang, Jinghua Agent simulation of collaborative knowledge processing in Wikipedia 2008 Spring Simulation Multiconference, SpringSim'08, April 14, 2008 - April 17, 2008 Ottawa, ON, Canada 2008 [290]
Wikipedia, a User Innovation Community (UIC), is becoming an increasingly influential source of knowledge. The knowledge in Wikipedia is produced and processed collaboratively by the UIC. The results of this collaboration process present various seemingly complex patterns demonstrated by the update history of different articles in Wikipedia. Agent simulation is a powerful method that is used to study the behaviors of complex systems of interacting and autonomous agents. In this paper, we study the collaborative knowledge processing in Wikipedia using a simple agent-based model. The proposed model considers factors including knowledge distribution among agents, number of agents, behavior of agents and vandalism. We use this model to explain content growth rate, number and frequency of updates, edit wars and vandalism in Wikipedia articles. The results demonstrate that the model captures the important empirical aspects of collaborative knowledge processing in Wikipedia.
Yan, Ying; Wang, Chen; Zhou, Aoying; Qian, Weining; Ma, Li & Pan, Yue Efficient indices using graph partitioning in RDF triple stores 25th IEEE International Conference on Data Engineering, ICDE 2009, March 29, 2009 - April 2, 2009 Shanghai, China 2009 [291]
With the advance of the Semantic Web, varying RDF data are increasingly generated, published, queried, and reused via the Web. For example, DBpedia, a community effort to extract structured data from Wikipedia articles, broke 100 million RDF triples in its latest release. Likewise, the Linking Open Data (LOD) project initiated by Tim Berners-Lee has published and interlinked many open licence datasets which consist of over 2 billion RDF triples so far. In this context, fast query response over such large-scale data is one of the challenges for existing RDF data stores. In this paper, we propose a novel triple indexing scheme to help an RDF query engine quickly locate the instances within a small scope. By considering the RDF data as a graph, we partition the graph into multiple subgraph pieces and store them individually, over which a signature tree is built up to index the URIs. When a query arrives, the signature tree index is used to quickly locate the partitions that might include the matches of the query by its constant URIs. Our experiments indicate that the indexing scheme dramatically reduces the query processing time in most cases because many partitions are filtered out early and the expensive exact matching is only performed over a quite small scope against the original dataset.
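The pruning idea can be sketched with plain sets standing in for the signature tree: each partition is summarised by the URIs it contains, and a query's constant URIs eliminate partitions before any exact matching. This is a simplification of the paper's scheme, with made-up data:

```python
# Illustrative sketch only: partition signatures (here simple sets rather than a
# signature tree) prune the partitions a query needs to touch.
partitions = {
    "p1": [("ex:Berlin", "ex:capitalOf", "ex:Germany"),
           ("ex:Berlin", "ex:population", "3600000")],
    "p2": [("ex:Paris", "ex:capitalOf", "ex:France"),
           ("ex:Paris", "ex:population", "2100000")],
}

# Build one signature per partition from every term it contains.
signatures = {pid: {term for triple in triples for term in triple}
              for pid, triples in partitions.items()}

def candidate_partitions(query_constants):
    """Keep only partitions whose signature contains all constant URIs of the query."""
    return [pid for pid, sig in signatures.items() if query_constants <= sig]

# SPARQL-like query: ?x ex:capitalOf ex:France  -> only p2 needs exact matching.
print(candidate_partitions({"ex:capitalOf", "ex:France"}))
```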
Yang, Jingjing; Li, Yuanning; Tian, Yonghong; Duan, Lingyu & Gao, Wen A new multiple kernel approach for visual concept learning 15th International Multimedia Modeling Conference, MMM 2009, January 7, 2009 - January 9, 2009 Sophia-Antipolis, France 2009 [292]
In this paper, we present a novel multiple kernel method to learn the optimal classification function for visual concepts. Although many carefully designed kernels have been proposed in the literature to measure visual similarity, little work has been done on how these kernels really affect the learning performance. We propose a Per-Sample Based Multiple Kernel Learning method (PS-MKL) to investigate the discriminative power of each training sample in different basic kernel spaces. The optimal, sample-specific kernel is learned as a linear combination of a set of basic kernels, which leads to a convex optimization problem with a unique global optimum. As illustrated in the experiments on the Caltech 101 and the Wikipedia MM dataset, the proposed PS-MKL outperforms the traditional Multiple Kernel Learning methods (MKL) and achieves comparable results with the state-of-the-art methods of learning visual concepts. 2008 Springer Berlin Heidelberg.
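A rough sketch of the underlying multiple-kernel machinery: combine base kernel matrices and train an SVM on the precomputed combination. Unlike PS-MKL, the weights below are fixed and global rather than learned per training sample:

```python
# Rough sketch of the multiple-kernel idea with fixed, global weights (the paper
# learns sample-specific weights; that optimisation is omitted here).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

X = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])

base_kernels = [rbf_kernel(X, X, gamma=1.0), linear_kernel(X, X)]
weights = [0.7, 0.3]                       # assumed fixed weights, not learned here
K = sum(w * k for w, k in zip(weights, base_kernels))

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))                      # predictions on the training kernel matrix
```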
Yang, Kai-Hsiang; Chen, Chun-Yu; Lee, Hahn-Ming & Ho, Jan-Ming EFS: Expert finding system based on wikipedia link pattern analysis 2008 IEEE International Conference on Systems, Man and Cybernetics, SMC 2008, October 12, 2008 - October 15, 2008 Singapore, Singapore 2008 [293]
Building an expert finding system is very important for many applications, especially in the academic environment. Previous work uses e-mails or web pages as the corpus to analyze the expertise of each expert. In this paper, we present an Expert Finding System, abbreviated as EFS, that builds experts' profiles by using their journal publications. For a given proposal, the EFS first looks up the Wikipedia web site to get related link information, and then lists and ranks all associated experts by using that information. In our experiments, we use a real-world dataset which comprises 882 people and 13,654 papers, categorized into 9 expertise domains. Our experimental results show that the EFS works well on several expertise domains like "Artificial Intelligence" and "Image Pattern Recognition", etc.
Yap, Poh-Hean; Ong, Kok-Leong & Wang, Xungai Business 2.0: A novel model for delivery of business services 5th International Conference on Service Systems and Service Management, ICSSSM'08, June 30, 2008 - July 2, 2008 Melbourne, Australia 2008 [294]
Web 2.0, regardless of the exact definition, has proven to bring about significant changes to the way the Internet is used. As evidenced by key innovations such as Wikipedia, FaceBook, YouTube, and Blog sites, these community-based Web sites, in which content is generated and consumed by the same group of users, are changing the way businesses operate. Advertisements are no longer 'forced' upon the viewers but are instead 'intelligently' targeted based on the contents of interest. In this paper, we investigate the concept of Web 2.0 in the context of business entities. We ask if Web 2.0 concepts could potentially lead to a change of paradigm or the way businesses operate today. We conclude with a discussion of a Web 2.0 application we recently developed that we think is an indication that businesses will ultimately be affected by these community-based technologies; thus bringing about Business 2.0 - a paradigm for businesses to cooperate with one another to deliver improved products and services to their own customers.
Yuan, Pingpeng; Wang, Guoyin; Zhang, Qin & Jin, Hai SASL: A semantic annotation system for literature International Conference on Web Information Systems and Mining, WISM 2009, November 7, 2009 - November 8, 2009 Shanghai, China 2009 [295]
Due to ambiguity, search engines for scientific literature may not return the right search results. One efficient solution to the problem is to automatically annotate the literature and attach semantic information to it. Generally, semantic annotation requires identifying entities before attaching semantic information to them. However, due to abbreviation and other reasons, it is very difficult to identify entities correctly. The paper presents a Semantic Annotation System for Literature (SASL), which utilizes Wikipedia as a knowledge base to annotate literature. SASL mainly attaches semantic information to terminology, academic institutions, conferences, journals, etc. Many of them are usually abbreviations, which induces ambiguity. Here, SASL uses regular expressions to extract the mapping between full names of entities and their abbreviations. Since the full names of several entities may map to a single abbreviation, SASL introduces a Hidden Markov Model to implement name disambiguation. Finally, the paper presents the experimental results, which confirm that SASL achieves good performance.
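The regular-expression step can be illustrated with a simple sketch that extracts "Full Name (ABBR)" pairs and keeps those whose abbreviation matches the initials; the pattern and example text are hypothetical, and the HMM disambiguation step is not shown:

```python
# Illustrative sketch only: mine "Full Name (ABBR)" pairs from text and keep
# those whose abbreviation agrees with the initials of the full name.
import re

PATTERN = re.compile(r"((?:[A-Z][a-z]+\s+){1,6}[A-Z][a-z]+)\s*\(([A-Z]{2,})\)")

def extract_mappings(text):
    mappings = {}
    for full, abbr in PATTERN.findall(text):
        initials = "".join(w[0] for w in full.split())
        if initials.upper() == abbr:        # keep only initially-consistent pairs
            mappings[abbr] = full
    return mappings

text = "We apply a Hidden Markov Model (HMM) after a Support Vector Machine (SVM) filter."
print(extract_mappings(text))
# {'HMM': 'Hidden Markov Model', 'SVM': 'Support Vector Machine'}
```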
Zacharouli, Polyxeni; Titsias, Michalis & Vazirgiannis, Michalis Web page rank prediction with PCA and EM clustering 6th International Workshop on Algorithms and Models for the Web-Graph, WAW 2009, February 12, 2009 - February 13, 2009 Barcelona, Spain 2009 [296]
In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall the system constitutes a set of tools for high accuracy pagerank prediction which can be used for efficient resource management by search engines. 2009 Springer Berlin Heidelberg.
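A minimal sketch of the simplest variant, fitting a separate linear trend to each page's past ranks and extrapolating one step ahead; the clustering and PCA extensions, and the real datasets, are omitted:

```python
# Illustrative sketch only: per-page linear regression over hypothetical rank
# histories, extrapolated one step into the future.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical weekly ranks for three pages (lower = better rank).
rank_history = {
    "Page_A": [12, 11, 10, 9, 8],
    "Page_B": [3, 3, 4, 4, 5],
    "Page_C": [50, 45, 41, 38, 36],
}

def predict_next_rank(series):
    t = np.arange(len(series)).reshape(-1, 1)          # time index as the only feature
    model = LinearRegression().fit(t, np.array(series))
    return float(model.predict([[len(series)]])[0])    # one step into the future

for page, series in rank_history.items():
    print(page, round(predict_next_rank(series), 1))
```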
Zhang, Congle; Xue, Gui-Rong & Yu, Yong Knowledge supervised text classification with no labeled documents 10th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2008, December 15, 2008 - December 19, 2008 Hanoi, Viet nam 2008 [297]
In traditional text classification approaches, the semantic meanings of the classes are described by the labeled documents. Since labeling documents is often time consuming and expensive, it is a promising idea to ask users to provide some keywords to depict the classes, instead of labeling any documents. However, short pieces of keywords may not contain enough information and therefore may lead to an unreliable classifier. Fortunately, there is a large amount of public data easily available in web directories, such as ODP, Wikipedia, etc. We are interested in exploring the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning" (KSL) which utilizes the knowledge in keywords and the crowd intelligence to learn the classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It can optimize the expected prediction risk and build a high quality classifier. Empirical results verify our claim: our algorithm can achieve above 0.9 Micro-F1 on average, which is much better than the baselines and even comparable with an SVM classifier supervised by labeled documents. 2008 Springer Berlin Heidelberg.
Zhang, Xu; Song, Yi-Cheng; Cao, Juan; Zhang, Yong-Dong & Li, Jin-Tao Large scale incremental web video categorization 1st International Workshop on Web-Scale Multimedia Corpus, WSMC'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09, October 19, 2009 - October 24, 2009 Beijing, China 2009 [298]
With the advent of video sharing websites, the amount of videos on the internet grows rapidly. Web video categorization is an efficient methodology for organizing the huge amount of videos. In this paper we investigate the characteristics of web videos, and make two contributions for large scale incremental web video categorization. First, we develop an effective semantic feature space, Concept Collection for Web Video with Categorization Distinguishability (CCWV-CD), which consists of concepts with a small semantic gap, and the concept correlations are diffused by a novel Wikipedia Propagation (WP) method. Second, we propose an incremental support vector machine with a fixed number of support vectors (n-ISVM) for large scale incremental learning. To evaluate the performance of CCWV-CD, WP and n-ISVM, we conduct extensive experiments on a dataset of the 80,021 most representative videos on a video sharing website. The experiment results show that CCWV-CD and WP are more representative for web videos, and the n-ISVM algorithm greatly improves the efficiency in the situation of incremental learning.
Zhang, Yi; Sun, Aixin; Datta, Anwitaman; Chang, Kuiyu & Lim, Ee-Peng Do wikipedians follow domain experts?: A domain-specific study on wikipedia knowledge building 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010 [299]
Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. We found that the knowledge building in Wikipedia had largely been independent, and did not follow TKB - despite the open and online availability of the latter, as well as awareness of at least some of the Wikipedia contributors about the TKB source. In an attempt to identify possible reasons, we conducted a detailed analysis of contribution behavior demonstrated by Wikipedians. It was found that most Wikipedians contribute to a relatively small set of articles each. Their contribution was biased towards one or very few article(s). At the same time, each article's contributions are often championed by very few active contributors including the article's creator. We finally arrive at a conjecture that the contributions in Wikipedia are more to cover knowledge at the article level rather than at the domain level.
Zhou, Zhi; Tian, Yonghong; Li, Yuanning; Huang, Tiejun & Gao, Wen Large-scale cross-media retrieval of wikipediaMM images with textual and visual query expansion 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, September 17, 2008 - September 19, 2008 Aarhus, Denmark 2009 [300]
In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the expansion terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, the experimental results rank in first place among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Then cross-media retrieval was successfully carried out by independently applying the two meta-search tools and then combining the results through a weighted summation of scores. Though not submitted, this approach outperforms our text-based and content-based approaches remarkably. 2009 Springer Berlin Heidelberg.
Zirn, Cacilia; Nastase, Vivi & Strube, Michael Distinguishing between instances and classes in the wikipedia taxonomy 5th European Semantic Web Conference, ESWC 2008, June 1, 2008 - June 5, 2008 Tenerife, Canary Islands, Spain 2008 [301]
This paper presents an automatic method for differentiating between instances and classes in a large scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. The approach we present is the first attempt to make this distinction automatically in a large scale resource. In contrast, this distinction has been made in WordNet and Cyc based on manual annotations. The result of the process is evaluated against ResearchCyc. On the subnetwork shared by our taxonomy and ResearchCyc we report 84.52% accuracy. 2008 Springer-Verlag Berlin Heidelberg.
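One signal such a method can exploit is whether the head noun of a category label is plural; the crude heuristic below only illustrates that single cue with hypothetical labels, and is not the paper's feature set:

```python
# Very rough illustration: a plural head noun ("novelists", "systems") hints at a
# class, a proper-name label hints at an instance. Real systems use richer
# lexical and structural features of the category network.
def looks_like_class(category_label):
    head = category_label.split()[-1].lower()  # naive head-noun guess: last token
    return head.endswith("s")                  # plural head -> class (very rough)

for label in ["French novelists", "Operating systems", "France", "Albert Einstein"]:
    print(label, "-> class" if looks_like_class(label) else "-> instance")
```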
Focused Retrieval and Evaluation - 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Revised and Selected Papers 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, December 7, 2009 - December 9, 2009 Brisbane, QLD, Australia 2010
The proceedings contain 42 papers. The topics discussed include: is there something quantum-like about the human mental lexicon?; supporting for real-world tasks: producing summaries of scientific articles tailored to the citation context; semantic document processing using wikipedia as a knowledge base; a methodology for producing improved focused elements; use of language model, phrases and wikipedia forward links for INEX 2009; combining language models with NLP and interactive query expansion; exploiting semantic tags in XML retrieval; the book structure extraction competition with the resurgence software at Caen university; ranking and fusion approaches for XML book retrieval; index tuning for efficient proximity-enhanced query processing; fast and effective focused retrieval; combining term-based and category-based representations for entity search; and focused search in books and wikipedia: categories, links and relevance feedback.
IEEE Pacific Visualization Symposium 2010, PacificVis 2010 - Proceedings IEEE Pacific Visualization Symposium 2010, PacificVis 2010, March 2, 2010 - March 5, 2010 Taipei, Taiwan 2010
The proceedings contain 27 papers. The topics discussed include: quantitative effectiveness measures for direct volume rendered images; shape-based transfer functions for volume visualization; volume visualization based on statistical transfer-function spaces; volume exploration using ellipsoidal Gaussian transfer functions; stack zooming for multi-focus interaction in time-series data visualization; a layer-oriented interface for visualizing time-series data from oscilloscopes; wikipediaviz: conveying article quality for casual wikipedia readers; caleydo: design and evaluation of a visual analysis framework for gene expression data in its biological context; visualizing field-measured seismic data; seismic volume visualization for horizon extraction; and verification of the time evolution of cosmological simulations via hypothesis-driven comparative and quantitative visualization.
JCDL'10 - Digital Libraries - 10 Years Past, 10 Years Forward, a 2020 Vision 10th Annual Joint Conference on Digital Libraries, JCDL 2010, June 21, 2010 - June 25, 2010 Gold Coast, QLD, Australia 2010
The proceedings contain 66 papers. The topics discussed include: making web annotations persistent over time; transferring structural markup across translations using multilingual alignment and projection; scholarly paper recommendation via user's recent research interests; effective self-training author name disambiguation in scholarly digital libraries; evaluating methods to rediscover missing web pages from the web infrastructure; exploiting time-based synonyms in searching document archives; using word sense discrimination on historic document collections; Chinese calligraphy specific style rendering system; do wikipedians follow domain experts? a domain-specific study on wikipedia knowledge building; crowdsourcing the assembly of concept hierarchies; a user-centered design of a personal digital library for music exploration; and improving mood classification in music digital libraries by combining lyrics and audio.
Proceedings of the 6th Workshop on Geographic Information Retrieval, GIR'10 6th Workshop on Geographic Information Retrieval, GIR'10, February 18, 2010 - February 19, 2010 Zurich, Switzerland 2010
The proceedings contain 24 papers. The topics discussed include: linkable geographic ontologies; unnamed locations, underspecified regions, and other linguistic phenomena in geographic annotation; an ontology of place and service types to facilitate place-affordance geographic information retrieval; Geotagging: using proximity, sibling, and prominence clues to understand comma groups; evaluation of georeferencing; a GIR architecture with semantic-flavored query reformulation; OGC catalog service for heterogeneous earth observation metadata using extensible search indices; TWinner: understanding news queries with geo-content using Twitter; geographical classification of documents using evidence from Wikipedia; a web platform for the evaluation of vernacular place names in automatically constructed gazetteers; grounding toponyms in an Italian local news corpus; and using the geographic scopes of web documents for contextual advertising.
2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009 2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2009, November 11, 2009 - November 14, 2009 Washington, DC, United states 2009
The proceedings contain 68 papers. The topics discussed include: multi-user multi-account interaction in groupware supporting single-display collaboration; supporting collaborative work through flexible process execution; dynamic data services: data access for collaborative networks in a multi-agent systems architecture; integrating external user profiles in collaboration applications; a collaborative framework for enforcing server commitments, and for regulating server interactive behavior in SOA-based systems; CASTLE: a social framework for collaborative anti-phishing databases; VisGBT: visually analyzing evolving datasets for adaptive learning; an IT appliance for remote collaborative review of mechanisms of injury to children in motor vehicle crashes; user contribution and trust in Wikipedia; and a new perspective on experimental analysis of N-tier systems: evaluating database scalability, multi-bottlenecks, and economical operation.
Internet and Other Electronic Resources for Materials Education 2007 136th TMS Annual Meeting, 2007, February 25, 2007 - March 1, 2007 Orlando, FL, United states 2007
The proceedings contain 1 paper. The topics discussed include: Wikipedia in materials education.
Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, June 27, 2007 - June 29, 2007 Paris, France 2007
The proceedings contain 42 papers. The topics discussed include: an alternative approach to tagging; an efficient denotational semantics for natural language database queries; developing methods and heuristics with low time complexities for filtering spam messages; exploit semantic information for category annotation recommendation in wikipedia; a lightweight approach to semantic annotation of research papers; a new text clustering method using hidden markov model; identifying event sequences using hidden markov model; selecting labels for news document clusters; generating ontologies via language components and ontology reuse; experiences using the researchcyc upper level ontology; ontological text mining of software documents; treatment of passive voice and conjunctions in use case documents; natural language processing and the conceptual model self-organizing map; and automatic issue extraction from a focused dialogue.
Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007
The proceedings contain 20 papers. The topics discussed include: evaluation of datalog extended with an XPath predicate; data allocation scheme based on term weight for P2P information retrieval; distributed monitoring of peer to peer systems; self-optimizing block transfer in web service grids; supporting personalized top-k skyline queries using partial compressed skycube; toward editable web browser: edit-and-propagate operation for web browsing; mining user navigation patterns for personalizing topic directories; an online PPM prediction model for web prefetching; extracting the discussion structure in comments on news-articles; pattern detection from web using AFA set theory; using neighbors to date web documents; on improving wikipedia search using article quality; and SATYA: a reputation-based approach for service discovery and selection in service oriented architectures.
Tamagawa, Susumu; Sakurai, Shinya; Tejima, Takuya; Morita, Takeshi; Izumi, Noriaki & Yamaguchi, Takahira Learning a Large Scale of Ontology from Japanese Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
This paper discusses how to learn a large-scale ontology from the Japanese Wikipedia. The learned ontology includes the following properties: rdfs:subClassOf (IS-A relationships), rdf:type (class-instance relationships), owl:ObjectProperty/owl:DatatypeProperty (infobox triples), rdfs:domain (property domains), and skos:altLabel (synonyms). Experimental case studies show us that the learned Japanese Wikipedia Ontology performs better than existing general linguistic ontologies, such as EDR and Japanese WordNet, in terms of building cost and richness of structural information.
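A sketch of what triples with these properties look like when materialised with rdflib; the terms are hypothetical stand-ins, not output of the authors' pipeline:

```python
# Illustrative sketch only: the kinds of triples such a learned ontology contains,
# expressed with rdflib and made-up example resources.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL, SKOS

EX = Namespace("http://example.org/jwo/")
g = Graph()
g.bind("skos", SKOS)

g.add((EX.Novelist, RDFS.subClassOf, EX.Writer))          # IS-A relationship
g.add((EX.NatsumeSoseki, RDF.type, EX.Novelist))          # class-instance relationship
g.add((EX.birthPlace, RDF.type, OWL.ObjectProperty))      # infobox-derived property
g.add((EX.birthPlace, RDFS.domain, EX.Person))            # property domain
g.add((EX.NatsumeSoseki, SKOS.altLabel, Literal("夏目金之助", lang="ja")))  # synonym

print(g.serialize(format="turtle"))
```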
Jing, Liping; Yun, Jiali; Yu, Jian & Huang, Houkuan Text Clustering via Term Semantic Units Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
How best to represent text data is an important problem in text mining tasks including information retrieval, clustering, and classification. In this paper, we propose a compact document representation with term semantic units which are identified from implicit and explicit semantic information. The implicit semantic information is extracted from syntactic content via statistical methods such as latent semantic indexing and the information bottleneck. The explicit semantic information is mined from an external semantic resource (Wikipedia). The proposed compact representation model can map a document collection into a low-dimensional space (term semantic units, which are much fewer than the number of all unique terms). Experimental results on real data sets show that the compact representation effectively improves the performance of text clustering.
Breuing, Alexa Improving Human-Agent Conversations by Accessing Contextual Knowledge from Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
In order to talk to each other meaningfully, conversational partners utilize different types of conversational knowledge. Due to the fact that speakers often use grammatically incomplete and incorrect sentences in spontaneous language, knowledge about conversational and terminological context turns out to be as important in language understanding as traditional linguistic analysis. In the context of the KnowCIT project we want to improve human-agent conversations by connecting the agent to an adequate representation of such contextual knowledge drawn from the online encyclopedia Wikipedia. Thereby we make use of additional components provided by Wikipedia which go beyond encyclopedic information to identify the current dialog topic and to implement human-like look-up abilities.
Salahli, M.A.; Gasimzade, T.M. & Guliyev, A.I. Domain specific ontology on computer science Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009. ICSCCW 2009. Fifth International Conference on 2009
In this paper we introduce an application system based on a domain specific ontology. Some design problems of the ontology are discussed. The ontology is based on WordNet's database and consists of Turkish and English terms on computer science and informatics. Second, we present a method for determining a set of words which are related to a given concept and for computing the degree of semantic relatedness between them. The presented method has been used for the semantic searching process, which is carried out by our application.
Yang, Kai-Hsiang; Kuo, Tai-Liang; Lee, Hahn-Ming & Ho, Jan-Ming A Reviewer Recommendation System Based on Collaborative Intelligence Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on 2009
In this paper, the expert-finding problem is transformed into a classification problem. We build a knowledge database to represent the expertise characteristics of a domain from web information constructed by collaborative intelligence, and an incremental learning method is proposed to update the database. Furthermore, results are ranked by measuring the correlation in the concept network from an online encyclopedia. In our experiments, we use a real world dataset which comprises 2,701 experts who are categorized into 8 expertise domains. Our experimental results show that the expertise knowledge extracted from collaborative intelligence can improve the efficiency and effectiveness of classification and increase the precision of expert ranking by at least 20%.
Mishra, Surjeet; Gorai, Amarendra; Oberoi, Tavleen & Ghosh, Hiranmay Efficient Visualization of Content and Contextual Information of an Online Multimedia Digital Library for Effective Browsing Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
In this paper, we present a few innovative techniques for visualization of content and contextual information of a multimedia digital library for effective browsing. A traditional collection visualization portal often depicts some metadata or a short synopsis, which is quite inadequate for assessing the documents. We have designed a novel web portal that incorporates a few preview facilities to disclose an abstract of the contents. Moreover, we place the documents on Google Maps to make their geographical context explicit. A semantic network, created automatically around the collection, brings out other contextual information from external knowledge resources like Wikipedia and is used for navigating the collection. This paper also reports economical hosting techniques using Amazon Cloud.
Jinwei, Fu; Jianhong, Sun & Tianqing, Xiao A FAQ online system based on wiki E-Health Networking, Digital Ecosystems and Technologies (EDT), 2010 International Conference on 2010
In this paper, we propose a FAQ online system based on a wiki engine. The goal of this system is to reduce the counseling workload in our university. It can also be used in other counseling fields. The proposed system is built on one of the popular wiki engines, TikiWiki. In practical application, the function of the proposed system has gone far beyond FAQ-platform functionality, thanks to the wiki concept and its characteristics.
Martins, A.; Rodrigues, E. & Nunes, M. Information repositories and learning environments: Creating spaces for the promotion of virtual literacy and social responsibility International Association of School Librarianship. Selected Papers from the ... Annual Conference 2007 [302]
Information repositories are collections of digital information which can be built in several different ways and with different purposes. They can be collaborative and with a soft control of the contents and authority of the documents, as well as directed to the general public (Wikipedia is an example of this). But they can also have a high degree of control and be conceived in order to promote literacy and responsible learning, as well as directed to special groups of users like, for instance, school students. In the new learning environments built upon digital technologies, the need to promote quality information resources that can support formal and informal e-learning emerges as one of the greatest challenges that school libraries have to face. It is now time that school libraries, namely through their regional and national school library networks, start creating their own information repositories, oriented for school pupils and directed to their specific needs of information and learning. The creation of these repositories implies a huge work of collaboration between librarians, school teachers, pupils, families and other social agents that interact within the school community, which is, in itself, a way to promote cooperative learning and social responsibility between all members of such communities. In our presentation, we will discuss the bases and principles that are behind the construction of the proposed information repositories and learning platforms as well as the need for a constant dialogue between technical and content issues.
Lucchese, C.; Orlando, S.; Perego, R.; Silvestri, F. & Tolomei, G. Detecting Task-Based Query Sessions Using Collaborative Knowledge Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
Our research challenge is to provide a mechanism for splitting into user task-based sessions a long-term log of queries submitted to a Web Search Engine (WSE). The hypothesis is that some query sessions entail the concept of user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider queries' inter-arrival times and use a novel distance function that takes care of query lexical content and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.
Cover Art Computational Aspects of Social Networks, 2009. CASON '09. International Conference on 2009
The following topics are dealt with: online social network; pattern clustering; Web page content; Wikipedia article; learning management system; Web database descriptor; genetic algorithm; face recognition; interactive robotics; and security of data.
Liu, Lei & Tan, Pang-Ning A Framework for Co-classification of Articles and Users in Wikipedia Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
The massive size of Wikipedia and the ease with which its content can be created and edited have made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.
Ohmori, K. & Kunii, T.L. Author Index Cyberworlds, 2007. CW '07. International Conference on 2007
The mathematical structure of cyberworlds is clarified based on the duality of the homology lifting property and the homotopy extension property. The duality gives bottom-up and top-down methods to model, design and analyze the structure of cyberworlds. The set of homepages representing a cyberworld is transformed into a finite state machine. In the development of the cyberworld, a sequence of finite state machines is obtained. This sequence has a homotopic property. This property is clarified by mapping a finite state machine to a simplicial complex. Wikipedia, bottom-up network construction and top-down network analysis are described as examples.
Missen, M.M.S. & Boughanem, M. Sentence-Level Opinion-Topic Association for Opinion Detection in Blogs Advanced Information Networking and Applications Workshops, 2009. WAINA '09. International Conference on 2009
Opinion detection from blogs has always been a challenge for researchers. One of the challenges faced is to find documents that specifically contain opinions on users' information needs. This requires text processing at the sentence level rather than at the document level. In this paper, we propose an opinion detection approach. The proposed approach tries to tackle the opinion detection problem by using some document level heuristics and processing documents at the sentence level using different semantic similarity relations of WordNet between sentence words and a list of weighted query terms expanded through the encyclopedia Wikipedia. According to initial results, our approach performs well with a MAP of 0.2177, an improvement of 28.89% over baseline results obtained through the BM25 matching formula. TREC Blog 2006 data is used as the test data collection.
Baeza-Yates, R. Keynote Speakers Web Congress, 2009. LE-WEB '09. Latin American 2009
There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact the search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in the Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
Alemzadeh, Milad & Karray, Fakhri An Efficient Method for Tagging a Query with Category Labels Using Wikipedia towards Enhancing Search Engine Results Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
This paper intends to present a straightforward, extensive, and noise resistant method for efficiently tagging a web query, submitted to a search engine, with proper category labels. These labels are intended to represent the closest categories related to the query which can ultimately be used to enhance the results of any typical search engine by either restricting the results to matching categories or enriching the query itself. The presented method effectively rules out noise words within a query, forms the optimal keyword packs using a density function, and returns a set of category labels which represent the common topics of the given query using Wikipedia category hierarchy.
Indrie, Sergiu & Groza, Adrian Towards social argumentative machines Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on 2010
This research advocates the idea of combining argumentation theory with social web technology, aiming to enable large-scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of Semantic Wikipedia. The Argnet system was developed based on the Semantic MediaWiki framework and on the Argument Interchange Format ontology.
Liu, Ming-Chi; Wen, Dunwei; Kinshuk & Huang, Yueh-Min Learning Animal Concepts with Semantic Hierarchy-Based Location-Aware Image Browsing and Ecology Task Generator Wireless, Mobile and Ubiquitous Technologies in Education (WMUTE), 2010 6th IEEE International Conference on 2010
This study first observes that the lack of an overall ecological knowledge structure is one critical reason for learners' failure with keyword search. Therefore, in order to identify their current sight of interest, the dynamic location-aware and semantic hierarchy (DLASH) is presented for learners to browse images. This hierarchy mainly considers that plant and animal species are discontinuously distributed around the planet; hence it combines location information for constructing the semantic hierarchy through WordNet. After learners confirm their intended information needs, this study also provides learners with three kinds of image-based learning tasks: similar-images comparison, concept map fill-out and placement map fill-out. These tasks are designed based on Ausubel's advance organizers and improve them by integrating three new properties: displaying the nodes of the concepts with authentic images, automatically generating the knowledge structure by computer, and interactively integrating new and old knowledge.
Takemoto, M.; Yokohata, Y.; Tokunaga, T.; Hamada, M. & Nakamura, T. Demo: Implementation of Information-Provision Service with Smart Phone and Field Trial in Shopping Area Mobile and Ubiquitous Systems: Networking \& Services, 2007. MobiQuitous 2007. Fourth Annual International Conference on 2007
To achieve the information-provision service, we adopted the social network concept (http://en.wikipedia.org/wiki/Social_network_service), which handles human relationships in networks. We have implemented the information recommendation mechanism, by which users may obtain suitable information from the system based on relationships with other users in the social network service. We believe that information used by people should be handled based on their behavior. We have developed an information-provision service based on our platform. We have been studying and developing the service coordination and provision architecture - ubiquitous service-oriented network (USON) (Takemoto et al., 2002) - for services in ubiquitous computing environments. We have developed an information-provision service using the social network service based on the USON architecture. This demonstration shows the implementation of the information-provision system with the actual information which was used in the field trial.
Ayyasamy, Ramesh Kumar; Tahayna, Bashar; Alhashmi, Saadat; Eu-Gene, Siew & Egerton, Simon Mining Wikipedia Knowledge to improve document indexing and classification Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on 2010
Weblogs are an important source of information that requires automatic techniques to categorize them into "topic-based" content, to facilitate their future browsing and retrieval. In this paper we propose and illustrate the effectiveness of a new tf-idf measure. The proposed Conf.idf and Catf.idf measures are solely based on the mapping of terms-to-concepts-to-categories (TCONCAT) method that utilizes Wikipedia. The knowledge base Wikipedia is a large-scale Web encyclopaedia that has a huge number of high-quality articles and categorical indexes. Using this system, our proposed framework consists of two stages to solve the weblog classification problem. The first stage is to find the terms belonging to a unique concept (article), as well as to disambiguate the terms belonging to more than one concept. The second stage is the determination of the categories to which these found concepts belong. Experimental results confirm that the proposed system can efficiently distinguish weblogs that belong to more than one category and has better performance and success than traditional statistical Natural Language Processing (NLP) approaches.
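The general idea of weighting concepts rather than raw terms can be sketched as below; the term-to-concept table is hypothetical and this is not the exact Conf.idf/Catf.idf formulation:

```python
# Hedged sketch: map terms to Wikipedia concepts, then compute a tf-idf style
# weight over concepts instead of raw terms. The mapping table is made up and
# the concept disambiguation step of the paper is omitted.
import math
from collections import Counter

TERM_TO_CONCEPT = {"python": "Python_(programming_language)",
                   "pandas": "Pandas_(software)",
                   "snake": "Snake"}

docs = [["python", "pandas", "code"], ["snake", "python", "venom"], ["pandas", "code"]]

def concept_counts(doc):
    return Counter(TERM_TO_CONCEPT[t] for t in doc if t in TERM_TO_CONCEPT)

doc_concepts = [concept_counts(d) for d in docs]
df = Counter(c for counts in doc_concepts for c in counts)   # document frequency per concept
N = len(docs)

def concept_tfidf(doc_index):
    counts = doc_concepts[doc_index]
    return {c: tf * math.log(N / df[c]) for c, tf in counts.items()}

print(concept_tfidf(0))   # concept weights for the first weblog post
```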
Malone, T.W. Collective intelligence Collaborative Technologies and Systems, 2007. CTS 2007. International Symposium on 2007
While people have talked about collective intelligence for decades, new communication technologies - especially the Internet - now allow huge numbers of people all over the planet to work together in new ways. The recent successes of systems like Google and Wikipedia suggest that the time is now ripe for many more such systems, and this talk will examine ways to take advantage of these possibilities. Using examples from business, government, and other areas, the talk will address the fundamental question: How can people and computers be connected so that - collectively - they act more intelligently than any individuals, groups, or computers have ever done before?
Zeng, Honglei; Alhossaini, Maher A.; Fikes, Richard & McGuinness, Deborah L. Mining Revision History to Assess Trustworthiness of Article Fragments Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on 2006
Wikis are a type of collaborative repository system that enables users to create and edit shared content on the Web. The popularity and proliferation of Wikis have created a new set of challenges for trust research because the content in a Wiki can be contributed by a wide variety of users and can change rapidly. Nevertheless, most Wikis lack explicit trust management to help users decide how much they should trust an article or a fragment of an article. In this paper, we investigate the dynamic nature of revisions as we explore ways of utilizing revision history to develop an article fragment trust model. We use our model to compute the trustworthiness of articles and article fragments. We also augment Wikis with a trust view layer with which users can visually identify text fragments of an article and view trust values computed by our model.
qdah, Majdi Al & Falzi, Aznan An Educational Game for School Students World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [303]
Abrial, J. -R. & Hoang, Thai Son Using Design Patterns in Formal Methods: An Event-B Approach Proceedings of the 5th international colloquium on Theoretical Aspects of Computing 2008 [304]
Motivation. Formal Methods users are given sophisticated languages and tools for constructing models of complex systems. But quite often they lack some systematic methodological approaches which could help them. The goal of introducing design patterns within formal methods is precisely to bridge this gap. A design pattern is a general reusable solution to a commonly occurring problem in (software) design . . . It is a description or template for how to solve a problem that can be used in many different situations (Wikipedia on "Design Pattern").
Adafre, Sisay Fissaha & de Rijke, Maarten Discovering missing links in Wikipedia Proceedings of the 3rd international workshop on Link discovery 2005 [305]
In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.
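The second step can be sketched as follows: propose links that occur in several of the similar pages but are absent from the target page. The link lists are made up, and LTRank itself is not shown:

```python
# Illustrative sketch only: candidate missing links are links shared by similar
# pages but absent from the target page, ranked by how many similar pages use them.
from collections import Counter

def missing_link_candidates(target_links, similar_pages_links, min_support=2):
    counts = Counter(link
                     for links in similar_pages_links
                     for link in set(links) - set(target_links))
    return [link for link, c in counts.most_common() if c >= min_support]

target = ["Amsterdam", "Netherlands"]
similar = [["Amsterdam", "Netherlands", "Rotterdam", "Canal"],
           ["Netherlands", "Rotterdam", "Tulip"],
           ["Amsterdam", "Rotterdam", "Canal"]]
print(missing_link_candidates(target, similar))   # ['Rotterdam', 'Canal']
```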
Adams, Catherine Learning Management Systems as sites of surveillance, control, and corporatization: A review of the critical literature Society for Information Technology \& Teacher Education International Conference 2010 [306]
Al-Senaidi, Said Integrating Web 2.0 in Technology based learning environment World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [307]
Allen, Matthew Authentic Assessment and the Internet: Contributions within Knowledge Networks World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [308]
Allen, Nancy; Alnaimi, Tarfa Nasser & Lubaisi, Huda Ak Leadership for Technology Adoption in a Reform Community World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [309]
Allen, R.B. & Nalluru, S. Exploring history with narrative timelines Human Interface and the Management of Information. Designing Information Environments. Symposium on Human Interface 2009, 19-24 July 2009 Berlin, Germany 2009 [310]
We develop novel timeline interfaces which separate the events in timelines into threads and then allow users to select among them. This interface is illustrated with five threads describing the causes of the American Civil War. In addition to selecting each of the threads, the sequence of events it describes can be played. That is, the user can step through the sequence of events and get a description of each event in the context of its thread. In addition, many of the events have links to more focused timelines and to external resources such as Wikipedia.
Amin, Mohammad Shafkat; Bhattacharjee, Anupam & Jamil, Hasan Wikipedia driven autonomous label assignment in wrapper induced tables with missing column names Proceedings of the 2010 ACM Symposium on Applied Computing 2010 [311]
As the volume of information available on the internet is growing exponentially, it is clear that most of this information will have to be processed and digested by computers to produce useful information for human consumption. Unfortunately, most web content is currently designed for direct human consumption, in which it is assumed that a human will decipher the information presented to him in some context and will be able to connect the missing dots, if any. In particular, information presented in tabular form is often not accompanied by descriptive titles or column names similar to attribute names in tables. While such omissions are not really an issue for humans, it is truly hard to extract information in autonomous systems in which a machine is expected to understand the meaning of the table presented and extract the right information in the context of the query. It is even more difficult when the information needed is distributed across the globe and involves semantic heterogeneity. In this paper, our goal is to address the issue of how to interpret tables with missing column names by developing a method for the assignment of attribute names in an arbitrary table extracted from the web in a fully autonomous manner. We propose a novel approach by leveraging Wikipedia for the first time for column name discovery for the purpose of table annotation. We show that this leads to an improved likelihood of capturing the context and interpretation of the table accurately and producing a semantically meaningful query response.
Ammann, Alexander & Matthies, Herbert K. K-Space DentMed/Visual Library: Generating and Presenting Dynamic Knowledge Spaces for Dental Research, Education, Clinical and Laboratory Practice World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [312]
Anderka, M.; Lipka, N. & Stein, B. Evaluating cross-language explicit semantic analysis and cross querying Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [313]
This paper describes our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track. The task is to retrieve items from various multilingual collections of library catalog records, which are relevant to a user's query. Two different strategies are employed: (i) the Cross-Language Explicit Semantic Analysis, CL-ESA, where the library catalog records and the queries are represented in a multilingual concept space that is spanned by aligned Wikipedia articles, and (ii) a Cross Querying approach, where a query is translated into all target languages using Google Translate and where the obtained rankings are combined. The evaluation shows that both strategies outperform the monolingual baseline and achieve comparable results. Furthermore, inspired by the Generalized Vector Space Model we present a formal definition and an alternative interpretation of the CL-ESA model. This interpretation is interesting for real-world retrieval applications since it reveals how the computational effort for CL-ESA can be shifted from the query phase to a preprocessing phase.
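As a rough illustration of the CL-ESA idea (not the authors' code), the sketch below represents a document as a vector of similarities to one language's side of a set of concept-aligned Wikipedia articles; because dimension i denotes the same concept in every language, the resulting vectors can be compared across languages with plain cosine similarity. The names and the simple term weighting are simplifying assumptions.

 import math
 from collections import Counter

 def tf_vector(text):
     return Counter(text.lower().split())

 def cosine(a, b):
     dot = sum(a[t] * b.get(t, 0) for t in a)
     na = math.sqrt(sum(v * v for v in a.values()))
     nb = math.sqrt(sum(v * v for v in b.values()))
     return dot / (na * nb) if na and nb else 0.0

 def esa_vector(doc, concept_articles):
     """concept_articles: article texts in the document's language, aligned
     across languages so that index i means the same Wikipedia concept."""
     d = tf_vector(doc)
     return [cosine(d, tf_vector(article)) for article in concept_articles]

 def cross_lingual_similarity(doc_a, concepts_a, doc_b, concepts_b):
     """Compare two documents written in different languages through their
     ESA vectors over the shared, aligned concept space."""
     va, vb = esa_vector(doc_a, concepts_a), esa_vector(doc_b, concepts_b)
     dot = sum(x * y for x, y in zip(va, vb))
     na = math.sqrt(sum(x * x for x in va))
     nb = math.sqrt(sum(y * y for y in vb))
     return dot / (na * nb) if na and nb else 0.0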
Angel, Albert; Lontou, Chara; Pfoser, Dieter & Efentakis, Alexandros Qualitative geocoding of persistent web pages Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems 2008 [314]
Information and specifically Web pages may be organized, indexed, searched, and navigated using various metadata aspects, such as keywords, categories (themes), and also space. While categories and keywords are up for interpretation, space represents an unambiguous aspect to structure information. The basic problem of providing spatial references to content is solved by geocoding; a task that relates identifiers in texts to geographic co-ordinates. This work presents a methodology for the semiautomatic geocoding of persistent Web pages in the form of collaborative human intervention to improve on automatic geocoding results. While focusing on the Greek language and related Web pages, the developed techniques are universally applicable. The specific contributions of this work are (i) automatic geocoding algorithms for phone numbers, addresses and place name identifiers and (ii) a Web browser extension providing a map-based interface for manual geocoding and updating the automatically generated results. With the geocoding of a Web page being stored as respective annotations in a central repository, this overall mechanism is especially suited for persistent Web pages such as Wikipedia. To illustrate the applicability and usefulness of the overall approach, specific geocoding examples of Greek Web pages are presented.
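A minimal sketch of the kind of automatic geocoding the abstract mentions for phone numbers and place names, assuming tiny hand-made lookup tables; the Greek area codes and coordinates shown are illustrative only, and a real system would rely on full prefix registries and gazetteers.

 import re

 # Hypothetical lookup tables; real systems would use complete telephone
 # prefix registries and gazetteers.
 AREA_CODE_COORDS = {"210": (37.98, 23.73), "2310": (40.64, 22.94)}  # Athens, Thessaloniki
 GAZETTEER = {"athens": (37.98, 23.73), "thessaloniki": (40.64, 22.94)}

 def geocode_phone(text):
     """Map national phone prefixes found in the text to coordinates."""
     hits = []
     for m in re.finditer(r"\b(2\d{2,3})\s?\d{6,7}\b", text):
         prefix = m.group(1)
         if prefix in AREA_CODE_COORDS:
             hits.append((m.group(0), AREA_CODE_COORDS[prefix]))
     return hits

 def geocode_place_names(text):
     """Naive gazetteer lookup on word tokens."""
     return [(w, GAZETTEER[w.lower()])
             for w in re.findall(r"[A-Za-z]+", text) if w.lower() in GAZETTEER]

 page = "Contact our Athens office at 210 1234567."
 print(geocode_phone(page) + geocode_place_names(page))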
Anma, Fumihiko & Okamoto, Toshio Development of a Participatory Learning Support System based on Social Networking Service World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [315]
Antin, Judd & Cheshire, Coye Readers are not free-riders: reading as a form of participation on wikipedia Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [316]
The success of Wikipedia as a large-scale collaborative effort has spurred researchers to examine the motivations and behaviors of Wikipedia's participants. However, this research has tended to focus on active involvement rather than more common forms of participation such as reading. In this paper we argue that Wikipedia's readers should not all be characterized as free-riders -- individuals who knowingly choose to take advantage of others' effort. Furthermore, we illustrate how readers provide a valuable service to Wikipedia. Finally, we use the notion of legitimate peripheral participation to argue that reading is a gateway activity through which newcomers learn about Wikipedia. We find support for our arguments in the results of a survey of Wikipedia usage and knowledge. Implications for future research and design are discussed.
Anzai, Yayoi Digital Trends among Japanese University Students: Focusing on Podcasting and Wikis World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [317]
Anzai, Yayoi Interactions as the key for successful Web 2.0 integrated language learning: Interactions in a planetary community World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [318]
Anzai, Yayoi Introducing a Wiki in EFL Writing Class World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [319]
Aoki, Kumiko & Molnar, Pal International Collaborative Learning using Web 2.0: Learning of Foreign Language and Intercultural Understanding Global Learn Asia Pacific 2010 [320]
Arney, David Cooperative e-Learning and other 21st Century Pedagogies World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [321]
Ashraf, Bill Teaching the Google–Eyed YouTube Generation World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [322]
Atkinson, Tom Cell-Based Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [323]
Auer, Sören & Lehmann, Jens What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content Proceedings of the 4th European conference on The Semantic Web: Research and Applications 2007 [324]
Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating the by far largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used.
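A minimal sketch of the extraction step the abstract describes: turning the attribute-value pairs of a wiki template instance into subject-predicate-object triples. The regular expression and value handling are deliberately simplistic assumptions; the actual extractor deals with nested templates, links and datatypes.

 import re

 def infobox_triples(page_title, wikitext):
     """Turn 'key = value' pairs of a template instance into RDF-like triples.
     A real extractor also handles nesting, wiki links and datatypes."""
     triples = []
     m = re.search(r"\{\{Infobox[^|}]*\|(.*?)\}\}", wikitext, re.S)
     if not m:
         return triples
     for field in m.group(1).split("|"):
         if "=" in field:
             key, value = field.split("=", 1)
             key, value = key.strip(), value.strip()
             if key and value:
                 triples.append((page_title, key, value))
     return triples

 sample = "{{Infobox city | name = Innsbruck | country = Austria | population = 130894 }}"
 for t in infobox_triples("Innsbruck", sample):
     print(t)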
Avgerinou, Maria & Pettersson, Rune How Multimedia Research Can Optimize the Design of Instructional Vodcasts World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [325]
Aybar, Hector; Juell, Paul & Shanmugasundaram, Vijayakumar Increased Flexablity in Display of Course Content World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [326]
Baeza-Yates, R. Mining the Web 2.0 to improve search 2009 Latin American Web Congress. LA-WEB 2009, 9-11 Nov. 2009 Piscataway, NJ, USA 2009 [327]
There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show several applications of mining the wisdom of crowds behind UGC to improve search. These results not only impact the search performance but also the user interface, suggesting new ways of interaction. We will show live demos that find relations in the Wikipedia or improve image search, already available at sandbox.yahoo.com, the demo site of Yahoo! Research. Our final goal is to produce a virtuous data feedback circuit to leverage the Web itself.
Baeza-Yates, Ricardo User generated content: how good is it? Proceedings of the 3rd workshop on Information credibility on the web 2009 [328]
User Generated Content (UGC) is one of the main current trends in the Web. This trend has allowed all people that can access the Internet to publish content in different media, such as text (e.g. blogs), photos or video. This data can be crucial for many applications, in particular for semantic search. It is early to say which impact UGC will have and to what extent. However, the impact will be clearly related to the quality of this content. Hence, how good is the content that people generate in the so-called Web 2.0? Clearly it is not as good as editorial content in the Web site of a publisher. However, success stories such as the case of Wikipedia show that it can be quite good. In addition, the quality gap is balanced by volume, as user generated content is much larger than, say, editorial content. In fact, Ramakrishnan and Tomkins estimate that UGC generates daily from 8 to 10GB while the professional Web only generates 2GB in the same time. How can we estimate the quality of UGC? One possibility is to directly evaluate the quality, but that is not easy as it depends on the type of content and the availability of human judgments. One example of such an approach is the study of Yahoo! Answers done by Agichtein et al. In this work they start from a judged question/answer collection where good questions usually have good answers. Then they predict good questions and good answers, obtaining an AUC (area under the curve of the precision-recall graph) of 0.76 and 0.88, respectively. A second possibility is obtaining indirect evidence of the quality. For example, use UGC for a given task and then evaluate the quality of the task results. One such example is the extraction of semantic relations done by Baeza-Yates and Tiberi. To evaluate the quality of the results they used the Open Directory Project (ODP), showing that the results had a precision of over 60%. For the cases that were not found in the ODP, a manually verified sample showed that the real precision was close to 100%. What happened was that the ODP was not specific enough to contain very specific relations, and every day the problem gets worse as we have more data. This example shows the quality of the ODP as well as the semantics encoded in queries. Notice that we can define queries as implicit UGC, because each query can be considered an implicit tag to Web pages that are clicked for that query, and hence we have an implicit folksonomy. A final alternative is crossing different UGC sources and inferring from there the quality of those sources. An example of this case is the work by Van Zwol et al., where they use collective knowledge (wisdom of crowds) to extend image tags, and prove that almost 70% of the tags can be semantically classified by using Wordnet and Wikipedia. This exposes the quality of both Flickr tags and Wikipedia. Our main motivation is that by being able to generate semantic resources automatically from the Web (and in particular the Web 2.0), even with noise, and coupling that with open content resources, we can create a virtuous feedback circuit. In fact, explicit and implicit folksonomies can be used to do supervised machine learning without the need for manual intervention (or at least drastically reducing it) to improve semantic tagging. After that, we can feed the results back on themselves and repeat the process. Under the right conditions, every iteration should improve the output, obtaining a virtuous cycle. As a side effect, we can also improve Web search, our main goal.
Baker, Peter; Xiao, Yun & Kidd, Jennifer Digital Natives and Digital Immigrants: A Comparison across Course Tasks and Delivery Methodologies Society for Information Technology \& Teacher Education International Conference 2010 [329]
Bakker, A.; Petrocco, R.; Dale, M.; Gerber, J.; Grishchenko, V.; Rabaioli, D. & Pouwelse, J. Online Video Using BitTorrent And HTML5 Applied To Wikipedia 2010 IEEE Tenth International Conference on Peer-to-Peer Computing (P2P 2010), 25-27 Aug. 2010 Piscataway, NJ, USA 2010 [330]
Wikipedia started a project in order to enable users to add video and audio on their Wiki pages. The technical downside of this is that its bandwidth requirements will increase manifold. BitTorrent-based peer-to-peer technology from P2P-Next (a European research project) is explored to handle this bandwidth surge. We discuss the impact on the BitTorrent piece picker and outline our tribe protocol for seamless integration of P2P into the HTML5 video and audio elements. Ongoing work on libswift, which uses UDP, an enhanced transport protocol and integrated NAT/Firewall puncturing, is also described.
Balasuriya, Dominic; Ringland, Nicky; Nothman, Joel; Murphy, Tara & Curran, James R. Named entity recognition in Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [331]
Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7%.
Balmin, Andrey & Curtmola, Emiran WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data WebDB '10 Procceedings of the 13th International Workshop on the Web and Databases 2010 [332]
Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, the solutions based on keyword search are too imprecise to capture the user's intent. To address these limitations, we propose a system, referred to herein as WikiAnalytics, that utilizes a novel search paradigm in order to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records deciding which are and are not relevant. WikiAnalytics uses three categories of clustering features based on record types, fields, and values that matched the query keywords, respectively. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We utilize a novel data structure, universal navigational lattice (UNL), that compactly encodes all possible clusters. WikiAnalytics provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables, which can be further queried and aggregated using the conventional tools.
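The sketch below illustrates just one of the three feature families mentioned in the abstract: grouping heterogeneous infobox-like records by the set of fields in which the query keywords matched. The data layout and names are assumptions for illustration, not the UNL data structure itself.

 from collections import defaultdict

 def cluster_by_matched_fields(records, keywords):
     """Group records (dicts) by the set of fields whose values contain any
     query keyword; the full system also clusters by record type and by the
     matched values themselves."""
     clusters = defaultdict(list)
     kws = [k.lower() for k in keywords]
     for rec in records:
         matched = frozenset(f for f, v in rec.items()
                             if any(k in str(v).lower() for k in kws))
         if matched:
             clusters[matched].append(rec)
     return clusters

 records = [
     {"type": "scientist", "name": "Marie Curie", "field": "physics"},
     {"type": "scientist", "name": "Niels Bohr", "known_for": "physics of the atom"},
     {"type": "city", "name": "Physics Hill"},
 ]
 for fields, recs in cluster_by_matched_fields(records, ["physics"]).items():
     print(sorted(fields), [r["name"] for r in recs])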
Balog-Crisan, Radu; Roxin, Ioan & Smeureanu, Ion e-Learning platforms for Semantic Web World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [333]
Banek, M.; Juric, D. & Skocir, Z. Learning Semantic N-ary Relations from Wikipedia Database and Expert Systems Applications. 21st International Conference, DEXA 2010, 30 Aug.-3 Sept. 2010 Berlin, Germany 2010 [334]
Automated construction of ontologies from text corpora, which saves both time and human effort, is a principal condition for realizing the idea of the Semantic Web. However, the recently proposed automated techniques are still limited in the scope of context that can be captured. Moreover, the source corpora generally lack the consensus of ontology users regarding the understanding and interpretation of ontology concepts. In this paper we introduce an unsupervised method for learning domain n-ary relations from Wikipedia articles, thus harvesting the consensus reached by the largest world community engaged in collecting and classifying knowledge. Providing ontologies with n-ary relations instead of the standard binary relations built on the subject-verb-object paradigm results in preserving the initial context of time, space, cause, reason or quantity that otherwise would be lost irreversibly. Our preliminary experiments with a prototype software tool show highly satisfactory results when extracting ternary and quaternary relations, as well as the traditional binary ones.
Barker, Philip Using Wikis and Weblogs to Enhance Human Performance World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [335]
Barker, Philip Using Wikis for Knowledge Management World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [336]
Baron, Georges-Louis & Bruillard, Eric New learners, Teaching Practices and Teacher Education: Which Synergies? The French case Society for Information Technology \& Teacher Education International Conference 2008 [337]
Bart, Thurber & Pope, Jack The Humanities in the Learning Space World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [338]
Basiel, Anthony Skip The media literacy spectrum: shifting pedagogic design World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [339]
Basile, Anthony & Murphy, John The Path to Open Source in Course Management Systems Used in Distance Education Programs Society for Information Technology \& Teacher Education International Conference 2010 [340]
Basile, Pierpaolo & Semeraro, Giovanni UBA: Using automatic translation and Wikipedia for cross-lingual lexical substitution Proceedings of the 5th International Workshop on Semantic Evaluation 2010 [341]
This paper presents the participation of the University of Bari (UBA) at the SemEval-2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, but there are some differences: cross-lingual lexical substitution targets one word at a time and the main goal is to find as many good translations as possible for the given target word. Moreover, there are some connections with Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to find the best substitutions. An important aspect of this kind of task is the possibility of finding synonyms without using a particular sense inventory or a specific parallel corpus, thus allowing the participation of unsupervised approaches. UBA proposes two systems: the former is based on an automatic translation system which exploits Google Translator, the latter is based on a parallel corpus approach which relies on Wikipedia in order to find the best substitutions.
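A heavily simplified sketch of a Wikipedia-based substitution strategy in the spirit of the second system: candidate target-language substitutes come from a bilingual lexicon (for example one harvested from aligned Wikipedia titles) and are ranked by co-occurrence with the translated context. The lexicon, counts and scoring below are toy placeholders, not the authors' resources.

 def rank_substitutes(target, context_words, lexicon, cooccurrence):
     """lexicon: source word -> list of target-language candidates;
     cooccurrence: (candidate, context word) -> count in a target corpus."""
     candidates = lexicon.get(target, [])
     scores = {c: sum(cooccurrence.get((c, w), 0) for w in context_words)
               for c in candidates}
     return sorted(candidates, key=lambda c: scores[c], reverse=True)

 lexicon = {"bank": ["banca", "riva"]}                      # en -> it, toy entry
 cooccur = {("banca", "denaro"): 12, ("riva", "fiume"): 9}  # toy counts
 print(rank_substitutes("bank", ["fiume"], lexicon, cooccur))  # ['riva', 'banca']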
Basili, Roberto; Bos, Johan & Copestake, Ann Proceedings of the 2008 Conference on Semantics in Text Processing 2008 [342]
Thanks to both statistical approaches and finite state methods, natural language processing (NLP), particularly in the area of robust, open-domain text processing, has made considerable progress in the last couple of decades. It is probably fair to say that NLP tools have reached satisfactory performance at the level of syntactic processing, be the output structures chunks, phrase structures, or dependency graphs. Therefore, the time seems ripe to extend the state-of-the-art and consider deep semantic processing as a serious task in wide-coverage NLP. This is a step that normally requires syntactic parsing, as well as integrating named entity recognition, anaphora resolution, thematic role labelling and word sense disambiguation, and other lower levels of processing for which reasonably good methods have already been developed. The goal of the STEP workshop is to provide a forum for anyone active in semantic processing of text to discuss innovative technologies, representation issues, inference techniques, prototype implementations, and real applications. The preferred processing targets are large quantities of texts: either specialised domains, or open domains such as newswire text, blogs, and Wikipedia-like text. Implemented rather than theoretical work is emphasised in STEP. Featuring in the STEP 2008 workshop is a "shared task" on comparing semantic representations as output by state-of-the-art NLP systems. Participants were asked to supply a (small) text before the workshop. The test data for the shared task is composed out of all the texts submitted by the participants, allowing participants to "challenge" each other. The output of these systems will be judged on a number of aspects by a panel of experts in the field during the workshop.
Bataineh, Emad & Abbar, Hend Al New Mobile-based Electronic Grade Management System World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [343]
Batista, Carlos Eduardo C. F. & Schwabe, Daniel LinkedTube: semantic information on web media objects Proceedings of the XV Brazilian Symposium on Multimedia and the Web 2009 [344]
LinkedTube is a service to create semantic and non-semantic relationships between videos available on services on the Internet (such as YouTube) and external elements (such as Wikipedia, Internet Movie Database, DBPedia, etc). The relationships are defined based on semantic entities obtained through an analysis of textual elements related to the video - its metadata, tags, user comments and external related content (such as sites linking to the video). The set of data comprising the extracted entities and the video metadata are used to define semantic relations between the video and the semantic entities from the Linked Data Cloud. Those relationships are defined using a vocabulary extended from MOWL, based on an extensible set of rules of analysis of the video's related content.
Battye, Greg Turning the ship around while changing horses in mid-stream: Building a University-wide framework for Online and Blended Learning at the University of Canberra World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [345]
Baytiyeh, Hoda & Pfaffman, Jay Why be a Wikipedian Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1 2009 [346]
Bechet, F. & Charton, E. Unsupervised knowledge acquisition for extracting named entities from speech 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010, 14-19 March 2010 Dallas, TX, USA 2010 [347]
This paper presents a Named Entity Recognition (NER) method dedicated to processing speech transcriptions. The main principle behind this method is to collect, in an unsupervised way, lexical knowledge for all entries in the ASR lexicon. This knowledge is gathered with two methods: by automatically extracting NEs on a very large set of textual corpora and by exploiting directly the structure contained in the Wikipedia resource. This lexical knowledge is used to update the statistical models of our NER module based on a mixed approach with generative models (Hidden Markov Models - HMM) and discriminative models (Conditional Random Fields - CRF). This approach has been evaluated within the French ESTER 2 evaluation program and obtained the best results at the NER task on ASR transcripts.
Becker, Katrin Teaching Teachers about Serious Games World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [348]
Belz, Anja; Kow, Eric & Viethen, Jette The GREC named entity generation challenge 2009: overview and evaluation results Proceedings of the 2009 Workshop on Language Generation and Summarisation 2009 [349]
The GREC-NEG Task at Generation Challenges 2009 required participating systems to select coreference chains for all people entities mentioned in short encyclopaedic texts about people collected from Wikipedia. Three teams submitted six systems in total, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in an intrinsic evaluation involving human judges. This report describes the GREC-NEG Task and the evaluation methods applied, gives brief descriptions of the participating systems, and presents the evaluation results.
Belz, Anja; Kow, Eric; Viethen, Jette & Gatt, Albert The GREC challenge: overview and evaluation results Proceedings of the Fifth International Natural Language Generation Conference 2008 [350]
The GREC Task at REG '08 required participating systems to select coreference chains to the main subject of short encyclopaedic texts collected from Wikipedia. Three teams submitted a total of 6 systems, and we additionally created four baseline systems. Systems were tested automatically using a range of existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in a reading/comprehension experiment involving human subjects. This report describes the GREC Task and the evaluation methods, gives brief descriptions of the participating systems, and presents the evaluation results.
Bernardis, Daniela Education and Pervasive Computing. Didactical Use of the Mobile Phone: Create and Share Information Concerning Artistic Heritages and the Environment. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [351]
Bhattacharya, Madhumita & Dron, Jon Mining Collective Intelligence for Creativity and Innovation: A Research proposal World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [352]
Bjelland, Tor Kristian & Nordbotten, Svein A Best Practice Online Course Architect World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [353]
Black, Aprille Noe; Falls, Jane & Black, Aprille Noe The Use of Web 2.0 Tools for Collaboration and the Development of 21st Century Skills Society for Information Technology \& Teacher Education International Conference 2009 [354]
Blocher, Michael & Tu, Chih-Hsiung Utilizing a Wiki to Construct Knowledge Society for Information Technology \& Teacher Education International Conference 2008 [355]
Blok, Rasmus & Godsk, Mikkel Podcasts in Higher Education: What Students Want, What They Really Need, and How This Might be Supported World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [356]
Bocek, Thomas; Peric, Dalibor; Hecht, Fabio; Hausheer, David & Stiller, Burkhard PeerVote: A Decentralized Voting Mechanism for P2P Collaboration Systems Proceedings of the 3rd International Conference on Autonomous Infrastructure, Management and Security: Scalability of Networks and Services 2009 [357]
Peer-to-peer (P2P) systems achieve scalability, fault tolerance, and load balancing with a low-cost infrastructure, characteristics from which collaboration systems, such as Wikipedia, can benefit. A major challenge in P2P collaboration systems is to maintain article quality after each modification in the presence of malicious peers. A way of achieving this goal is to allow modifications to take effect only if a majority of previous editors approve the changes through voting. The absence of a central authority makes voting a challenge in P2P systems. This paper proposes the fully decentralized voting mechanism PeerVote, which enables users to vote on modifications in articles in a P2P collaboration system. Simulations and experiments show the scalability and robustness of PeerVote, even in the presence of malicious peers.
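The acceptance rule described in the abstract (a modification takes effect only if a majority of previous editors approve) can be stated compactly. The sketch below tallies votes centrally for clarity, whereas PeerVote itself distributes this step over the peers; the names and vote encoding are assumptions.

 def accept_modification(votes, previous_editors):
     """Accept an edit only if a strict majority of the article's previous
     editors approve it. `votes` maps editor id to True (approve) or False
     (reject); editors who do not vote are counted as not approving."""
     approvals = sum(1 for e in previous_editors if votes.get(e) is True)
     return approvals > len(previous_editors) / 2

 editors = ["p1", "p2", "p3", "p4", "p5"]
 print(accept_modification({"p1": True, "p2": True, "p4": True}, editors))  # True
 print(accept_modification({"p1": True, "p3": False}, editors))             # False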
Bonk, Curtis The World is Open: How Web Technology Is Revolutionizing Education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [358]
Bouma, Gosse; Duarte, Sergio & Islam, Zahurul Cross-lingual alignment and completion of Wikipedia templates Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies 2009 [359]
For many languages, the size of Wikipedia is an order of magnitude smaller than the English Wikipedia. We present a method for cross-lingual alignment of template and infobox attributes in Wikipedia. The alignment is used to add and complete templates and infoboxes in one language with information derived from Wikipedia in another language. We show that alignment between English and Dutch Wikipedia is accurate and that the result can be used to expand the number of template attribute-value pairs in Dutch Wikipedia by 50%. Furthermore, the alignment provides valuable information for normalization of template and attribute names and can be used to detect potential inconsistencies.
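A minimal sketch of the completion step the abstract describes, assuming the cross-lingual attribute alignment has already been learned: attributes missing in one language's infobox are filled with values from the aligned article in the other language. The attribute names and value handling are illustrative only.

 def complete_infobox(target_infobox, source_infobox, attribute_alignment):
     """Fill attributes missing in one language's infobox with values from
     the aligned article in another language. `attribute_alignment` maps
     source attribute names to target attribute names (the alignment itself
     is what the paper learns); values are copied as-is here, while the real
     system also normalizes them."""
     completed = dict(target_infobox)
     for src_attr, value in source_infobox.items():
         tgt_attr = attribute_alignment.get(src_attr)
         if tgt_attr and tgt_attr not in completed:
             completed[tgt_attr] = value
     return completed

 en = {"birth_place": "Groningen", "occupation": "linguist"}
 nl = {"geboorteplaats": "Groningen"}
 alignment = {"birth_place": "geboorteplaats", "occupation": "beroep"}
 print(complete_infobox(nl, en, alignment))
 # {'geboorteplaats': 'Groningen', 'beroep': 'linguist'}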
Bouma, G.; Fahmi, I.; Mur, J.; van Noord, G.; van der Plas, L. & Tiedemann, J. Using syntactic knowledge for QA* Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
We describe the system of the University of Groningen for the monolingual Dutch and multilingual English to Dutch QA tasks. First, we give a brief outline of the architecture of our QA system, which makes heavy use of syntactic information. Next, we describe the modules that were improved or developed especially for the CLEF tasks, among others incorporation of syntactic knowledge in IR, incorporation of lexical equivalences and coreference resolution, and a baseline multilingual (English to Dutch) QA system, which uses a combination of Systran and Wikipedia (for term recognition and translation) for question translation. For non-list questions, 31% (20%) of the highest ranked answers returned by the monolingual (multilingual) system were correct.
Boyles, Michael; Frend, Chauney; Rogers, Jeff; William, Albert; Reagan, David & Wernert, Eric Leveraging Pre-Existing Resources at Institutions of Higher Education for K-12 STEM Engagement World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [360]
Bra, Paul De; Smits, David; van der Sluijs, Kees; Cristea, Alexandra; Hendrix, Maurice & Bra, Paul De GRAPPLE: Personalization and Adaptation in Learning Management Systems World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [361]
Brachman, Ron Emerging Sciences of the Internet: Some New Opportunities Proceedings of the 4th European conference on The Semantic Web: Research and Applications 2007 [362]
Semantic Web technologies have started to make a difference in enterprise settings and have begun to creep into use in limited parts of the World Wide Web. As is common in overview articles, it is easy to imagine scenarios in which the Semantic Web could provide important infrastructure for activities across the broader Internet. Many of these seem to be focused on improvements to what is essentially a search function (e.g., "list the prices of flat screen HDTVs larger than 40 inches with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings" on the Web), and such capabilities will surely be of use to future Internet users. However, if one looks closely at the research agendas of some of the largest Internet companies, it is not clear that the staples of SW thinking will intersect the most important paths of the major broad-spectrum service providers. Some of the emerging trends in the research labs of key industry players indicate that SW goals generally taken for granted may be less central than envisioned, and that the biggest opportunities may come from some less obvious directions. Given the level of investment and the global reach of big players like Yahoo! and Google, it would pay us to look more closely at some of their fundamental investigations.
Bradshaw, Daniele; Siko, Kari Lee; Hoffman, William; Talvitie-Siple, June; Fine, Bethann; Carano, Ken; Carlson, Lynne A.; Mixon, Natalie K; Rodriguez, Patricia; Sheffield, Caroline C.; Sullens-Mullican, Carey; Bolick, Cheryl & Berson, Michael J. The Use of Videoconferencing as a Medium for Collaboration of Experiences and Dialogue Among Graduate Students: A Case Study from Two Southeastern Universities Society for Information Technology \& Teacher Education International Conference 2006 [363]
Bristow, Paul The Digital Divide an age old question? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [364]
Bruckman, Amy Social Support for Creativity and Learning Online Proceedings of the 2008 Second IEEE International Conference on Digital Game and Intelligent Toy Enhanced Learning 2008 [365]
Brunetti, Korey & Townsend, Lori Extreme (Class) Makeover: Engaging Information Literacy Students with Web 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [366]
Brunvand, Stein & Bouwman, Jeffrey The Math Boot Camp Wiki: Using a Wiki to Extend the Learning Beyond June Society for Information Technology \& Teacher Education International Conference 2009 [367]
Brusilovsky, Peter; Yudelson, Michael & Sosnovsky, Sergey Collaborative Paper Exchange World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [368]
Bucur, Johanna Teacher and Student Support Services for eLearning in Higher Education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [369]
Bulkowski, Aleksander; Nawarecki, Edward & Duda, Andrzej Peer-to-Peer Dissemination of Learning Objects for Creating Collaborative Learning Communities World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [370]
Bullock, Shawn The Challenge of Digital Technologies to Educational Reform World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [371]
Buriol, Luciana S.; Castillo, Carlos; Donato, Debora; Leonardi, Stefano & Millozzi, Stefano Temporal Analysis of the Wikigraph Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence 2006 [372]
Wikipedia is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we have a "Wikigraph", a graph that represents the link structure of Wikipedia. The Wikigraph differs from other Web graphs studied in the literature by the fact that there are explicit timestamps associated with each node's events. This allows us to do a detailed analysis of the Wikipedia evolution over time. In the first part of this study we characterize this evolution in terms of users, editions and articles; in the second part we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs, from which the temporal data is usually not available.
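A small sketch of what the per-node timestamps make possible, assuming a toy Wikigraph with invented creation dates and links: the graph can be sliced at any date to study its growth over time.

 from datetime import date

 # Toy Wikigraph: each article node carries its creation timestamp and a
 # list of outgoing hyperlinks. The dates and links are made up.
 articles = {
     "Physics": {"created": date(2001, 11, 1), "links": ["Energy"]},
     "Energy":  {"created": date(2002, 3, 15), "links": ["Physics"]},
     "Entropy": {"created": date(2003, 7, 2),  "links": ["Energy", "Physics"]},
 }

 def snapshot(articles, at):
     """Return nodes and arcs of the Wikigraph restricted to articles that
     already existed at date `at` (the temporal slicing the timestamps allow)."""
     nodes = {a for a, d in articles.items() if d["created"] <= at}
     arcs = [(a, b) for a in nodes for b in articles[a]["links"] if b in nodes]
     return nodes, arcs

 for year in (2002, 2003, 2004):
     nodes, arcs = snapshot(articles, date(year, 1, 1))
     print(year, len(nodes), "articles,", len(arcs), "links")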
Buscaldi, D. & Rosso, P. A bag-of-words based ranking method for the Wikipedia question answering task Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
This paper presents a simple approach to the Wikipedia question answering pilot task in CLEF 2006. The approach ranks the snippets, retrieved using the Lucene search engine, by means of a similarity measure based on bags of words extracted from both the snippets and the articles in Wikipedia. Our participation was in the monolingual English and Spanish tasks. We obtained the best results in the Spanish one.
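A minimal sketch of a bag-of-words snippet ranking in the spirit of the abstract (the exact similarity measure used in the paper may differ); the tokenization and overlap score are simplifying assumptions.

 import re

 def bag_of_words(text):
     return set(re.findall(r"\w+", text.lower()))

 def rank_snippets(snippets, article_text):
     """Rank retrieved snippets by word overlap with the Wikipedia article."""
     article_bag = bag_of_words(article_text)
     def score(snippet):
         bag = bag_of_words(snippet)
         return len(bag & article_bag) / max(len(bag), 1)
     return sorted(snippets, key=score, reverse=True)

 article = "Lisbon is the capital and largest city of Portugal."
 snippets = ["The capital of Portugal is Lisbon.",
             "Portugal borders Spain.",
             "Lisbon hosted Expo 98."]
 print(rank_snippets(snippets, article))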
Buscaldi, Davide & Rosso, Paolo A comparison of methods for the automatic identification of locations in wikipedia Proceedings of the 4th ACM workshop on Geographical information retrieval 2007 [373]
In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia. The methods are a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier, trained over a randomly selected subset of the English Wikipedia. This task may be included into the broader task of Named Entity classification, a well-known problem in the field of Natural Language Processing. The experiments were carried out considering both the full text of the articles and only the definition of the entity being described in the article. The obtained results show that the information contained in the page templates and the category labels is more useful than the text of the articles.
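The second of the two compared methods, a multinomial Naïve Bayes classifier over article text, can be sketched with scikit-learn as below; the training snippets and labels are invented stand-ins for the randomly selected Wikipedia subset used in the paper.

 from sklearn.feature_extraction.text import CountVectorizer
 from sklearn.naive_bayes import MultinomialNB
 from sklearn.pipeline import make_pipeline

 # Tiny illustrative training set; the paper also evaluates a WordNet
 # keyword baseline, which is not shown here.
 texts = ["city located on the river with a population of 50000",
          "town in the northern province near the mountains",
          "mathematician known for work on number theory",
          "rock band formed in 1990 releasing several albums"]
 labels = ["geo", "geo", "non-geo", "non-geo"]

 classifier = make_pipeline(CountVectorizer(), MultinomialNB())
 classifier.fit(texts, labels)
 print(classifier.predict(["village situated in a valley near the coast"]))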
Butler, Janice W. & Butler, Janice W. A Whodunit in Two Acts: An Online Murder Mystery that Enhances Library and Internet Search Skills Society for Information Technology \& Teacher Education International Conference 2010 [374]
Butnariu, Cristina & Veale, Tony UCD-S1: a hybrid model for detecting semantic relations between noun pairs in text Proceedings of the 4th International Workshop on Semantic Evaluations 2007 [375]
We describe a supervised learning approach to categorizing inter-noun relations, based on Support Vector Machines, that builds a different classifier for each of seven semantic relations. Each model uses the same learning strategy, while a simple voting procedure based on five trained discriminators with various blends of features determines the final categorization. The features that characterize each of the noun pairs are a blend of lexical-semantic categories extracted from WordNet and several flavors of syntactic patterns extracted from various corpora, including Wikipedia and the WMTS corpus.
Byron, Akilah The Use of Open Source to mitigate the costs of implementing E-Government in the Caribbean World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [376]
Bélisle, Claire Academic Use of Online Encyclopedias World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [377]
la Calzada, Gabriel De & Dekhtyar, Alex On measuring the quality of Wikipedia articles Proceedings of the 4th workshop on Information credibility 2010 [378]
This paper discusses an approach to modeling and measuring information quality of Wikipedia articles. The approach is based on the idea that the quality of Wikipedia articles with distinctly different profiles needs to be measured using different information quality models. We report on our initial study, which involved two categories of Wikipedia articles: "stabilized" (those whose content has not undergone major changes for a significant period of time) and "controversial" (articles which have undergone vandalism, revert wars, or whose content is subject to internal discussions between Wikipedia editors). We present simple information quality models and compare their performance on a subset of Wikipedia articles with the information quality evaluations provided by human users. Our experiment shows that using special-purpose models for information quality captures user sentiment about Wikipedia articles better than using a single model for both categories of articles.
Capuano, Nicola; Pierri, Anna; Colace, Francesco; Gaeta, Matteo & Mangione, Giuseppina Rita A mash-up authoring tool for e-learning based on pedagogical templates Proceedings of the first ACM international workshop on Multimedia technologies for distance learning 2009 [379]
The purpose of this paper is twofold. On the one hand, it aims at presenting the "pedagogical template" methodology for the definition of didactic activities through the aggregation of atomic learning entities on the basis of pre-defined schemas. On the other hand, it proposes a Web-based authoring tool to build learning resources applying the defined methodology. The authoring tool is inspired by mash-up principles and allows the combination of local learning entities with learning entities coming from external Web 2.0 sources such as Wikipedia, Flickr, YouTube and SlideShare. Finally, the results of a small-scale experimentation inside a University course, aimed both at defining a pedagogical template for "virtual scientific experiments" and at building and deploying learning resources applying such a template, are presented.
Carano, Kenneth; Keefer, Natalie & Berson, Michael Mobilizing Social Networking Technology to Empower a New Generation of Civic Activism Among Youth Society for Information Technology \& Teacher Education International Conference 2007 [380]
Cardoso, N. GikiCLEF Topics and Wikipedia Articles: Did They Blend? Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [381]
This paper presents a post-hoc analysis on how the Wikipedia collections fared in providing answers and justifications to GikiCLEF topics. Based on all solutions found by all GikiCLEF participant systems, this paper measures how self-sufficient the particular Wikipedia collections were to provide answers and justifications for the topics, in order to better understand the recall limit that a GikiCLEF system specialised in one single language has.
Cardoso, N.; Batista, D.; Lopez-Pellicer, F.J. & Silva, M.J. Where In The Wikipedia Is That Answer? The XLDB At The GikiCLEF 2009 Task Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [382]
We developed a new semantic question analyser for a custom prototype assembled for participating in GikiCLEF 2009, which processes grounded concepts derived from terms, and uses information extracted from knowledge bases to derive answers. We also evaluated a newly developed named-entity recognition module, based on Conditional Random Fields, and a new world geo-ontology, derived from Wikipedia, which is used in the geographic reasoning process.
Carter, B. Beyond Google: Improving learning outcomes through digital literacy International Association of School Librarianship. Selected Papers from the ... Annual Conference 2009
The internet is often students' first choice when researching school assignments; however, students' online search strategies typically consist of a basic Google search and Wikipedia. The creation of library intranet pages providing a range of search tools and the teaching of customised information literacy lessons aim to better utilise library resources and improve students' research skills and learning outcomes.
Cataltepe, Z.; Turan, Y. & Kesgin, F. Turkish document classification using shorter roots 2007 15th IEEE Signal Processing and Communications Applications, 11-13 June 2007 Piscataway, NJ, USA 2007
Stemming is one of the commonly used pre-processing steps in document categorization. Especially when fast and accurate classification of a lot of documents is needed, it is important to have as few and as short roots as possible. This would not only reduce the time it takes to train and test classifiers but would also reduce the storage requirements for each document. In this study, we analyze the performance of classifiers when the longest or shortest roots found by a stemmer are used. We also analyze the effect of using only the consonants in the roots. We use two document data sets, obtained from the Milliyet newspaper and Wikipedia, to analyze the classification accuracy of classifiers when roots obtained under these four conditions are used. We also analyze the classification accuracy when only the first 4, 3 or 2 letters or consonants of the roots are used. Using smaller roots results in a smaller number of TF-IDF vectors. Especially for small-sized TF-IDF vectors, using only consonants in the roots gives better performance than using all letters in the roots.
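A minimal sketch of the pre-processing variants the study compares: truncating stemmer roots to their first few letters, optionally keeping consonants only, before building TF-IDF vectors. The toy roots and the plain TF-IDF weighting are assumptions for illustration.

 import math
 from collections import Counter

 def shorten(root, k=4, consonants_only=False):
     """Reduce a stemmed root: keep only the first k letters, optionally
     after dropping (Turkish) vowels."""
     if consonants_only:
         root = "".join(c for c in root if c not in "aeıioöuü")
     return root[:k]

 def tfidf_vectors(docs_roots, k=4, consonants_only=False):
     docs = [[shorten(r, k, consonants_only) for r in roots] for roots in docs_roots]
     df = Counter(t for d in docs for t in set(d))
     n = len(docs)
     return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
             for d in docs]

 # Roots would come from a Turkish stemmer; these are illustrative tokens only.
 docs = [["gazete", "haber", "ekonomi"], ["ansiklopedi", "haber", "bilim"]]
 print(tfidf_vectors(docs, k=3))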
Chan, Michael; Chan, Stephen Chi-fai & Leung, Cane Wing-ki Online Search Scope Reconstruction by Connectivity Inference Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2007 [383]
To cope with the continuing growth of the web, improvements should be made to the current brute-force techniques commonly used by robot-driven search engines. We propose a model that strikes a balance between robot and directory-based search engines by expanding the search scope of conventional directories to automatically include related categories. Our model makes use of a knowledge-rich and well-structured corpus to infer relationships between documents and topic categories. We show that the hyperlink structure of Wikipedia articles can be effectively exploited to identify relations among topic categories. Our experiments show the average recall rate and precision rate achieved are 91% and between 85% and 215% of Google's, respectively.
Chan, Peter & Dovchin, Tuul Evaluation Study of the Development of Multimedia Cases for Training Mongolian Medical Professionals World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [384]
Charles, Elizabeth S.; Lasry, Nathaniel & Whittaker, Chris Does scale matter: using different lenses to understand collaborative knowledge building Proceedings of the 9th International Conference of the Learning Sciences - Volume 2 2010 [385]
Web-based environments for communicating, networking and sharing information, often referred to collectively as Web 2.0, have become ubiquitous - e.g., Wikipedia, Facebook, Flickr, or YouTube. Understanding how such technologies can promote participation, collaboration and co-construction of knowledge, and how such affordances could be used for educational purposes, has become a focus of research in the Learning Science and CSCL communities (e.g., Dohn, 2009; Greenhow et al., 2009). One important mechanism is self-organization, which includes the regulation of feedback loops and the flows of information and resources within an activity system (Holland, 1996). But the study of such mechanisms calls for new ways of thinking about the unit of analysis, and the development of analytic tools that allow us to move back and forth through levels of activity systems that are designed to promote learning. Here, we propose that content analysis can focus on the flows of resources (i.e., content knowledge, scientific artifacts, epistemic beliefs) in terms of how they are established and the factors affecting whether they are taken up by members of the community.
Charnitski, Christina W. & Harvey, Francis A. The Clash Between School and Corporate Reality Society for Information Technology \& Teacher Education International Conference 2008 [386]
Chen, Irene L. & Beebe, Ronald Assessing Students’ Wiki Projects: Alternatives and Implications World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [387]
Chen, Pearl; Wan, Peiwen & Son, Jung-Eun Web 2.0 and Education: Lessons from Teachers’ Perspectives World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [388]
Chen, Jing-Ying Resource-Oriented Computing: Towards a Univeral Virtual Workspace Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 02 2007 [389]
Emerging popular Web applications such as blogs and Wikipedia are transforming the Internet into a global collaborative environment where most people can participate and contribute. When resources created by and shared among people are not just content but also software artifacts, a much more accommodating, universal, and virtual workspace is foreseeable that can support people with diverse backgrounds and needs. Realizing this goal requires not only the necessary infrastructure support for resource deployment and composition, but also strategies and mechanisms to handle the implied complexity. We propose a service-oriented architecture in which arbitrary resources are associated with syntactical descriptors, called metaphors, based on which runtime services can be instantiated and managed. Furthermore, service composition can be achieved through syntactic metaphor composition. We demonstrate our approach via an E-Science workbench that allows users to access and combine distributed computing and storage resources in a flexible manner.
Cheryl, Cheryl Seals; Zhang, Lei & Gilbert, Juan Human Centered Computing Lab Web Site Redesign Effort World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [390]
Chikhi, Nacim Fateh; Rothenburger, Bernard & Aussenac-Gilles, Nathalie A Comparison of Dimensionality Reduction Techniques for Web Structure Mining Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2007 [391]
In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in the web hyperlink connectivity. We apply and compare four DRTs, namely, Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA) and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; the well-known WebKb dataset used in a large number of works about the analysis of the hyperlink connectivity seems to be not adapted for this task, and we suggest rather to use the recent Wikipedia dataset, which is better suited.
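As an illustration of applying one of the compared techniques to hyperlink connectivity, the sketch below factorizes a toy page-to-page adjacency matrix with scikit-learn's NMF; the matrix and the number of components are invented for the example.

 import numpy as np
 from sklearn.decomposition import NMF

 # Toy 6-page hyperlink adjacency matrix (rows link to columns); two groups
 # of mutually linked pages stand in for latent "structures".
 A = np.array([[0, 1, 1, 0, 0, 0],
               [1, 0, 1, 0, 0, 0],
               [1, 1, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 1],
               [0, 0, 0, 1, 0, 1],
               [0, 0, 0, 1, 1, 0]], dtype=float)

 # NMF factorizes A into two non-negative factors, which is what makes the
 # recovered structures easier to interpret than PCA or ICA components.
 model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
 W = model.fit_transform(A)   # page-by-structure memberships
 H = model.components_        # structure-by-page profiles
 print(np.round(W, 2))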
Chin, Alvin; Hotho, Andreas & Strohmaier, Markus Proceedings of the International Workshop on Modeling Social Media 2010 [392]
In recent years, social media applications such as blogs, microblogs, wikis, news aggregation sites and social tagging systems have pervaded the web and have transformed the way people communicate and interact with each other online. In order to understand and effectively design social media systems, we need to develop models that are capable of reflecting their complex, multi-faceted socio-technological nature. While progress has been made in modeling particular aspects of selected social media applications (such as the architecture of weblog conversations, the evolution of Wikipedia, or the mechanics of news propagation), other aspects are less understood.
Choi, Boreum; Alexander, Kira; Kraut, Robert E. & Levine, John M. Socialization tactics in wikipedia and their effects Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [393]
Socialization of newcomers is critical both for conventional and online groups. It helps groups perform effectively and the newcomers develop commitment. However, little empirical research has investigated the impact of specific socialization tactics on newcomers' commitment to online groups. We examined WikiProjects, subgroups in Wikipedia organized around working on common topics or tasks. In study 1, we identified the seven socialization tactics used most frequently: invitations to join, welcome messages, requests to work on project-related tasks, offers of assistance, positive feedback on a new member's work, constructive criticism, and personal-related comments. In study 2, we examined their impact on newcomers' commitment to the project. Whereas most newcomers contributed fewer edits over time, the declines were slowed or reversed for those socialized with welcome messages, assistance, and constructive criticism. In contrast, invitations led to steeper declines in edits. These results suggest that different socialization tactics play different roles in socializing new members in online groups compared to offline ones.
Chong, Ng & Yamamoto, Michihiro Using Many Wikis for Collaborative Writing World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [394]
Chou, Chen-Hsiung Multimedia in Higher Education of Tourism World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [395]
Choudhury, Monojit; Hassan, Samer; Mukherjee, Animesh & Muresan, Smaranda Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing 2009 [396]
The last few years have shown a steady increase in applying graph-theoretic models to computational linguistics. In many NLP} applications, entities can be naturally represented as nodes in a graph and relations between them can be represented as edges. There have been extensive research showing that graph-based representations of linguistic units such as words, sentences and documents give rise to novel and efficient solutions in a variety of NLP} tasks, ranging from part-of-speech tagging, word sense disambiguation and parsing, to information extraction, semantic role labeling, summarization, and sentiment analysis. More recently, complex network theory, a popular modeling paradigm in statistical mechanics and physics of complex systems, was proven to be a promising tool in understanding the structure and dynamics of languages. Complex network based models have been applied to areas as diverse as language evolution, acquisition, historical linguistics, mining and analyzing the social networks of blogs and emails, link analysis and information retrieval, information extraction, and representation of the mental lexicon. In order to make this field of research more visible, this time the workshop incorporated a special theme on Cognitive and Social Dynamics of Languages in the framework of Complex Networks. Cognitive dynamics of languages include topics focused primarily on language acquisition, which can be extended to language change (historical linguistics) and language evolution as well. Since the latter phenomena are also governed by social factors, we can further classify them under social dynamics of languages. In addition, social dynamics of languages also include topics such as mining the social networks of blogs and emails. A collection of articles pertaining to this special theme will be compiled in a special issue of the Computer Speech and Language journal. This volume contains papers accepted for presentation at the TextGraphs-4} 2009 Workshop on Graph-Based} Methods for Natural Language Processing. The event took place on August 7, 2009, in Suntec, Singapore, immediately following ACL/IJCNLP} 2009, the Joint conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing. Being the fourth workshop on this topic, we were able to build on the success of the previous TextGraphs} workshops, held as part of {HLT-NAACL} 2006, {HLT-NAACL} 2007 and Coling 2008. It aimed at bringing together researchers working on problems related to the use of graph-based algorithms for NLP} and on pure graph-theoretic methods, as well as those applying complex networks for explaining language dynamics. Like last year, TextGraphs-4} has also been endorsed by SIGLEX.} We issued calls for both regular and short papers. Nine regular and three short papers were accepted for presentation, based on the careful reviews of our program committee. Our sincere thanks to all the program committee members for their thoughtful, high quality and elaborate reviews, especially considering our extremely tight time frame for reviewing. The papers appearing in this volume have surely benefited from their expert feedback. This year's workshop attracted papers employing graphs in a wide range of settings and we are therefore proud to present a very diverse program. We received quite a few papers on discovering semantic similarity through random walks. Daniel Ramage et al. 
explore random walk based methods to discover semantic similarity in texts, while Eric Yeh et al. attempt to discover semantic relatedness through random walks on Wikipedia. Amaç Herdağdelen et al. describe a method for measuring semantic relatedness with vector space models and random walks.
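The shared core of these random-walk approaches can be illustrated with a small sketch: the relatedness of two terms is estimated by comparing the stationary distributions of walks that restart at each term. The toy term graph, node names and cosine comparison below are illustrative assumptions, not taken from any of the cited papers.

    import math
    import networkx as nx

    # Toy term graph; edges stand in for co-occurrence or hyperlink evidence.
    G = nx.Graph()
    G.add_edges_from([
        ("car", "vehicle"), ("car", "engine"), ("vehicle", "truck"),
        ("truck", "engine"), ("banana", "fruit"), ("fruit", "apple"),
    ])

    def walk_profile(term, alpha=0.85):
        """Stationary distribution of a walk that restarts at `term`."""
        restart = {n: (1.0 if n == term else 0.0) for n in G}
        return nx.pagerank(G, alpha=alpha, personalization=restart)

    def relatedness(a, b):
        """Cosine similarity between the two walk profiles."""
        pa, pb = walk_profile(a), walk_profile(b)
        dot = sum(pa[n] * pb[n] for n in G)
        norm = math.sqrt(sum(v * v for v in pa.values())) * \
               math.sqrt(sum(v * v for v in pb.values()))
        return dot / norm

    print(relatedness("car", "truck"))    # high: same neighbourhood
    print(relatedness("car", "banana"))   # low: different components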
Choulat, Tracey Teacher Education and Internet Safety Society for Information Technology \& Teacher Education International Conference 2010 [397]
Clauson, Kevin A; Polen, Hyla H; Boulos, Maged N K & Dzenowagis, Joan H Accuracy and completeness of drug information in Wikipedia AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium 2008 [398]
Clow, Doug Resource Discovery: Heavy and Light Metadata Approaches World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [399]
Colazzo, Luigi; Magagnino, Francesco; Molinari, Andrea & Villa, Nicola From e-learning to Social Networking: a Case Study World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [400]
Cook, John Generating New Learning Contexts: Novel Forms of Reuse and Learning on the Move World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [401]
Copeland, Nancy & Bednar, Anne Mobilizing Educational Technologists in a Collaborative Online Community to Develop a Knowledge Management System as a Wiki Society for Information Technology \& Teacher Education International Conference 2010 [402]
Corbeil, Joseph Rene & Valdes-Corbeil, Maria Elena Enhance Your Online Courses by Re-Engineering The Courseware Management System Society for Information Technology \& Teacher Education International Conference 2008 [403]
Cosley, Dan; Frankowski, Dan; Terveen, Loren & Riedl, John Using intelligent task routing and contribution review to help communities build artifacts of lasting value Proceedings of the SIGCHI conference on Human Factors in computing systems 2006 [404]
Many online communities are emerging that, like Wikipedia, bring people together to build community-maintained artifacts of lasting value (CALVs). Motivating people to contribute is a key problem because the quantity and quality of contributions ultimately determine a CALV's value. We pose two related research questions: 1) How does intelligent task routing---matching people with work---affect the quantity of contributions? 2) How does reviewing contributions before accepting them affect the quality of contributions? A field experiment with 197 contributors shows that simple, intelligent task routing algorithms have large effects. We also model the effect of reviewing contributions on the value of CALVs. The model predicts, and experimental data shows, that value grows more slowly with review before acceptance. It also predicts, surprisingly, that a CALV will reach the same final value whether contributions are reviewed before or after they are made available to the community.
Costa, Luís Fernando Using answer retrieval patterns to answer Portuguese questions Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access 2008 [405]
Esfinge is a general-domain Portuguese question answering system which has been participating at QA@CLEF since 2004. It uses the information available in the "official" document collections used in QA@CLEF (newspaper text and Wikipedia) and information from the Web as an additional resource when searching for answers. Regarding the use of external tools, Esfinge uses a syntactic analyzer, a morphological analyzer and a named entity recognizer. This year an alternative approach to retrieving answers was tested: whereas in previous years search patterns were used to retrieve relevant documents, this year a new type of search pattern was also used to extract the answers themselves. We also evaluated the second and third best answers returned by Esfinge. This evaluation showed that when Esfinge answers a question correctly, it usually does so with its first answer. Furthermore, the experiments revealed that the answer retrieval patterns created for this participation improve the results, but only for definition questions.
Coursey, Kino & Mihalcea, Rada Topic identification using Wikipedia graph centrality Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers 2009 [406]
This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.
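A rough sketch of the general idea (not the authors' implementation): spot concepts in the input text, seed a biased PageRank over an encyclopedic link graph with them, and read the top-ranked nodes off as candidate topics. The miniature link graph and seed concepts below are invented for illustration.

    import networkx as nx

    # Tiny stand-in for an encyclopedic (Wikipedia-like) link graph.
    links = [
        ("Graph theory", "Mathematics"), ("Graph theory", "Computer science"),
        ("PageRank", "Graph theory"), ("PageRank", "Search engine"),
        ("Search engine", "Computer science"), ("Mathematics", "Logic"),
    ]
    G = nx.DiGraph(links)

    def identify_topics(seed_concepts, k=3, alpha=0.85):
        """Rank articles by PageRank biased towards the seed concepts."""
        seeds = [c for c in seed_concepts if c in G]
        bias = {n: (1.0 if n in seeds else 0.0) for n in G} if seeds else None
        scores = nx.pagerank(G, alpha=alpha, personalization=bias)
        # Return the k best-scoring articles as candidate topics.
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Concepts spotted in a hypothetical input document:
    print(identify_topics(["PageRank", "Graph theory"]))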
Coutinho, Clara Using Blogs, Podcasts and Google Sites as Educational Tools in a Teacher Education Program World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [407]
Coutinho, Clara Web 2.0 technologies as cognitive tools: preparing future k-12 teachers Society for Information Technology \& Teacher Education International Conference 2009 [408]
Coutinho, Clara & Junior, João Bottentuit Using social bookmarking to enhance cooperation/collaboration in a Teacher Education Program World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [409]
Coutinho, Clara & Junior, João Batista Bottentuit Web 2.0 in Portuguese Academic Community: An Exploratory Survey Society for Information Technology \& Teacher Education International Conference 2008 [410]
Coutinho, Clara & Rocha, Aurora Screencast and Vodcast: An Experience in Secondary Education Society for Information Technology \& Teacher Education International Conference 2010 [411]
Crawford, Caroline; Smith, Richard A. & Smith, Marion S. Podcasting in the Learning Environment: From Podcasts for the Learning Community, Towards the Integration of Podcasts within the Elementary Learning Environment Society for Information Technology \& Teacher Education International Conference 2006 [412]
Crawford, Caroline M. & Thomson, Jennifer Graphic Novels as Visual Human Performance and Training Tools: Towards an Understanding of Information Literacy for Preservice Teachers Society for Information Technology \& Teacher Education International Conference 2007 [413]
Cui, Gaoying; Lu, Qin; Li, Wenjie & Chen, Yirong Mining Concepts from Wikipedia for Ontology Construction Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03 2009 [414]
An ontology is a structured knowledge base of concepts organized by relations among them. But concepts are usually mixed with their instances in the corpora used for knowledge extraction. Concepts and their corresponding instances share similar features and are difficult to distinguish. In this paper, a novel approach is proposed to comprehensively obtain concepts with the help of definition sentences and category labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wiki pages. The precision, reaching 78.5%, makes it an effective approach to mining concepts from Wikipedia for ontology construction.
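As a loose illustration of how definition sentences can yield concept candidates, the regex heuristic below pulls the phrase following "is a/an" from an article's first sentence. This is a simplification of my own, not the authors' n-gram based method, and the example sentences are invented.

    import re

    first_sentences = [
        "Python is a high-level programming language.",
        "The Nile is a major river in northeastern Africa.",
        "Albert Einstein was a theoretical physicist.",
    ]

    # Capture the phrase following "is a(n)" / "was a(n)" up to a boundary.
    pattern = re.compile(r"\b(?:is|was) an? ([a-z][a-z\- ]+?)(?:[.,]| in | of )")

    for sentence in first_sentences:
        match = pattern.search(sentence.lower())
        if match:
            print(match.group(1).strip())
    # -> "high-level programming language", "major river", "theoretical physicist"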
Cummings, Jeff; Massey, Anne P. & Ramesh, V. Web 2.0 proclivity: understanding how personal use influences organizational adoption Proceedings of the 27th ACM international conference on Design of communication 2009 [415]
Web 2.0 represents a major shift in how individuals communicate and collaborate with others. While many of these technologies have been used for public, social interactions (e.g., Wikipedia and YouTube), organizations are just beginning to explore their use in day-to-day operations. Due to its relatively recent introduction and public popularity, Web 2.0 has led to a resurgent focus on how organizations can once again leverage technology within the organization for virtual and mass collaboration. In this paper, we explore some of the key questions facing organizations with regard to Web 2.0 implementation and adoption. We develop a model of "Web 2.0 Proclivity", defined as an individual's propensity to use Web 2.0 tools within the organization. Our model and set of associated hypotheses focus on understanding an employee's internal Web 2.0 content behaviors based on non-work personal use behaviors. To test our model and hypotheses, survey-based data was collected from a global engine design and manufacturing company. Our results show that Web 2.0 Proclivity is positively influenced by an employee's external behaviors and that differences exist across both functional departments and employee work roles. We discuss the research implications of our findings as well as how our findings and model of Web 2.0 Proclivity can be used to help guide organizational practice.
Cusinato, Alberto; Mea, Vincenzo Della; Salvatore, Francesco Di & Mizzaro, Stefano QuWi: quality control in Wikipedia Proceedings of the 3rd workshop on Information credibility on the web 2009 [416]
We propose and evaluate QuWi (Quality in Wikipedia), a framework for quality control in Wikipedia. We build upon a previous proposal by Mizzaro [11], who proposed a method for substituting and/or complementing peer review in scholarly publishing. Since articles in Wikipedia are never finished, and their authors change continuously, we define a modified algorithm that takes into account the different domain, with particular attention to the fact that authors contribute identifiable pieces of information that can be further modified by other authors. The algorithm assigns quality scores to articles and contributors. The scores assigned to articles can be used, e.g., to let the reader understand how reliable the articles he or she is looking at are, or to help contributors identify low quality articles to be enhanced. The scores assigned to users measure the average quality of their contributions to Wikipedia and can be used, e.g., for conflict resolution policies based on the quality of the users involved. Our proposed algorithm is experimentally evaluated by analyzing the obtained quality scores on articles for deletion and featured articles, also on six temporal Wikipedia snapshots. Preliminary results demonstrate that the proposed algorithm seems to appropriately identify high and low quality articles, and that high quality authors produce longer-lived contributions than low quality authors.
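As a loose, hypothetical illustration of the coupling between article and contributor scores (not the actual QuWi update rules), one can iterate between "an article is as good as the survival-weighted quality of its authors" and "an author is as good as the average quality of the articles they touched". The contribution log below is invented.

    # Hypothetical contribution log: (author, article, fraction of text that survived).
    contributions = [
        ("alice", "A1", 0.7), ("bob", "A1", 0.3),
        ("bob", "A2", 0.9), ("carol", "A2", 0.1), ("carol", "A3", 1.0),
    ]
    authors = {a for a, _, _ in contributions}
    articles = {p for _, p, _ in contributions}

    # Seed author quality with how much of their contributed text survives.
    author_q = {a: sum(w for x, _, w in contributions if x == a) /
                   sum(1 for x, _, _ in contributions if x == a)
                for a in authors}
    article_q = {}

    for _ in range(3):  # a few rounds of mutual reinforcement
        # Article quality: survival-weighted mean of its contributors' quality.
        for p in articles:
            rows = [(a, w) for a, art, w in contributions if art == p]
            article_q[p] = sum(author_q[a] * w for a, w in rows) / sum(w for _, w in rows)
        # Author quality: mean quality of the articles they contributed to.
        for a in authors:
            arts = [art for x, art, _ in contributions if x == a]
            author_q[a] = sum(article_q[p] for p in arts) / len(arts)

    print(sorted(article_q.items()))
    print(sorted(author_q.items()))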
Cuthell, John & Preston, Christina Preston An interactivist e-community of practice using Web 2:00 tools Society for Information Technology \& Teacher Education International Conference 2007 [417]
Dale, Michael; Stern, Abram; Deckert, Mark & Sack, Warren System demonstration: Metavid.org: a social website and open archive of congressional video Proceedings of the 10th Annual International Conference on Digital Government Research: Social Networks: Making Connections between Citizens, Data and Government 2009 [418]
We have developed Metavid.org, a site that archives video footage of the U.S. Senate and House floor proceedings. Visitors can search for who said what when, and also download, remix, blog, edit, discuss, and annotate transcripts and metadata. The site has been built with Open Source Software (OSS) and the video is archived in an OSS codec (Ogg Theora). We highlight two aspects of the Metavid design: (1) open standards; and (2) wiki functionality. First, open standards allow Metavid to function both as a platform, on top of which other sites can be built, and as a resource for "mashing" (i.e. semi-automatically assembling custom websites). For example, Voterwatch.org pulls its video from the Metavid archive. Second, Metavid extends the MediaWiki software (which is the foundation of Wikipedia) into the domain of collaborative video authoring. This extension allows closed-captioned text or video sequences to be collectively edited.
Dallman, Alicia & McDonald, Michael Upward Bound Success: Climbing the Collegiate Ladder with Web 2.0 Wikis Society for Information Technology \& Teacher Education International Conference 2010 [419]
Danyaro, K.U.; Jaafar, J.; Lara, R.A.A. De & Downe, A.G. An evaluation of the usage of Web 2.0 among tertiary level students in Malaysia 2010 International Symposium on Information Technology (ITSim 2010), 15-17 June 2010 Piscataway, NJ, USA 2010 [420]
Web 2.0 is increasingly becoming a familiar pedagogical tool in higher education, facilitating the process of teaching and learning. But this advancement in information technology has also provoked problems like plagiarism and other academic misconduct. This paper evaluates the patterns of use and behavior of tertiary level students towards the use of Web 2.0 as an alternative and supplemental e-learning portal. A total of 92 students' data were collected and analyzed according to Self-Determination Theory (SDT). It was found that students use social websites for chatting, gaming and sharing files. Facebook, YouTube and Wikipedia are ranked as the most popular websites used by college students. It also reveals that students have an inherent desire to express ideas and opinions online openly and independently. This sense of freedom makes students feel more competent, autonomous or participative and find learning to be less tedious. Therefore, this report recommends that educators adopt strategies for acknowledging students' feelings and activities online to reinforce positive behavior and effective learning. Finally, we discuss the implications of Web 2.0 on education.
DeGennaro, Donna & Kress, Tricia Looking to Transform Learning: From Social Transformation in the Public Sphere to Authentic Learning in the Classroom World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [421]
Dehinbo, Johnson Strategy for progressing from in-house training into e-learning using Activity Theory at a South African university World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [422]
Dehinbo, Johnson Suitable research paradigms for social inclusion through enhancement of Web applications development in developing countries World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [423]
Desjardins, Francois & vanOostveen, Roland Collaborative Online Learning Environment:Towards a process driven approach and collective knowledge building World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [424]
Desmontils, E.; Jacquin, C. & Monceaux, L. Question types specification for the use of specialized patterns in Prodicos system Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
We present the second version of the Prodicos query answering system, which was developed by the TALN team from the LINA institute. The main improvements concern, on the one hand, the use of external knowledge (Wikipedia) to improve the passage selection step and, on the other hand, the answer extraction step, which is improved by the determination of four different strategies for locating the answer to a question depending on its type. Afterwards, for the passage selection and answer extraction modules, the evaluation is put forward to justify the results obtained.
Dicheva, Darina & Dichev, Christo Helping Courseware Authors to Build Ontologies: The Case of TM4L Proceeding of the 2007 conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work 2007 [425]
The authors of topic map-based learning resources face major difficulties in constructing the underlying ontologies. In this paper we propose two approaches to address this problem. The first one is aimed at automatic construction of a "draft" topic map for the authors to start with. It is based on a set of heuristics for extracting semantic information from HTML documents and transforming it into a topic map format. The second one is aimed at providing help to authors during the topic map creation process by mining the Wikipedia knowledge base. It suggests "standard" names for the new topics (paired with URIs), along with lists of related topics in the considered domain. The proposed approaches are implemented in the educational topic maps editor TM4L.
Diem, Richard Technology and Culture: A Conceptual Framework Society for Information Technology \& Teacher Education International Conference 2007 [426]
Diplaris, S.; Kompatsiaris, I.; Flores, A.; Escriche, M.; Sigurbjornsson, B.; Garcia, L. & van Zwol, R. Collective Intelligence in Mobile Consumer Social Applications 2010 Ninth International Conference on Mobile Business \& 2010 Ninth Global Mobility Roundtable. ICMB-GMR 2010, 13-15 June 2010 Piscataway, NJ, USA 2010 [427]
This paper presents a mobile software application for the provision of mobile guidance, supporting functionalities which are based on automatically extracted Collective Intelligence. Collective Intelligence is the intelligence which emerges from the collaboration, competition and coordination among individuals, and it can be extracted by analyzing the massive amounts of user-contributed data currently available in Web 2.0 applications. More specifically, services including automatic Point of Interest (POI) detection, ranking, search and aggregation with semi-structured sources (e.g. Wikipedia) are developed, based on lexical and statistical analysis of mass data coming from Wikipedia, Yahoo! Geoplanet, query logs and flickr tags. These services, together with personalization functionalities, are integrated in a travel mobile application, enabling their efficient usage while exploiting user location information at the same time. Evaluation with real users indicates the application's potential for providing a higher degree of satisfaction compared to existing travel information management solutions, and also directions for future enhancements.
Dixon, Brian Reflective Video Journals and Adolescent Metacognition: An exploratory study Society for Information Technology \& Teacher Education International Conference 2009 [428]
Dobrila, T.-A.; Diaconasu, M.-C.; Lungu, I.-D. & Iftene, A. Methods for Classifying Videos by Subject and Detecting Narrative Peak Points Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [429]
2009 marked UAIC's first participation at the VideoCLEF evaluation campaign. Our group built two separate systems for the "Subject Classification" and "Affect Detection" tasks. For the first task we created two resources starting from Wikipedia pages and pages identified with Google, and used two tools for classification: Lucene and Weka. For the second task we extracted the audio component from a given video file using FFmpeg. After that we computed the average amplitude for each word from the transcript by applying the Fast Fourier Transform algorithm in order to analyze the sound. A brief description of our systems' components is given in this paper.
Dodge, Bernie & Molebash, Philip Mini-Courses for Teaching with Technology: Thinking Outside the 3-Credit Box Society for Information Technology \& Teacher Education International Conference 2005 [430]
Dominik, Magda The Alternate Reality Game: Learning Situated in the Realities of the 21st Century World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [431]
Dondio, P.; Barrett, S.; Weber, S. & Seigneur, J.M. Extracting trust from domain analysis: a case study on the Wikipedia project Autonomic and Trusted Computing. Third International Conference, ATC 2006. Proceedings, 3-6 Sept. 2006 Berlin, Germany 2006
The problem of identifying trustworthy information on the World Wide Web is becoming increasingly acute as new tools such as wikis and blogs simplify and democratize publication. Wikipedia is the most extraordinary example of this phenomenon and, although a few mechanisms have been put in place to improve contribution quality, trust in Wikipedia content quality has been seriously questioned. We thought that a deeper understanding of what in general defines high standards and expertise in domains related to Wikipedia - i.e. content quality in a collaborative environment - mapped onto Wikipedia elements would lead to a complete set of mechanisms to sustain trust in the Wikipedia context. Our evaluation, conducted on about 8,000 articles representing 65% of the overall Wikipedia editing activity, shows that the new trust evidence that we extracted from Wikipedia allows us to transparently and automatically compute trust values to isolate articles of high or low quality.
Dopichaj, P. The university of Kaiserslautern at INEX 2006 Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007
Digital libraries offer convenient access to large volumes of text, but finding the information that is relevant for a given information need is hard. The workshops of the Initiative for the Evaluation of XML retrieval (INEX) provide a forum for testing the effectiveness of retrieval strategies. In this paper, we present the current version of our search engine that was used for INEX 2006: like at INEX 2005, our search engine exploits structural patterns - in particular, automatic detection of titles - in the retrieval results to find the appropriate results among overlapping elements. This year, we examine how we can change this method to work better with the Wikipedia collection, which is significantly larger than the IEEE collection used in previous years. We show that our optimizations both retain the retrieval quality and reduce retrieval time significantly.
Dormann, Claire & Biddle, Robert Urban expressions and experiential gaming World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [432]
Dornescu, I. Semantic QA for Encyclopaedic Questions: EQUAL in GikiCLEF Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [433]
This paper presents a new question answering (QA) approach and a prototype system, EQUAL, which relies on structural information from Wikipedia to answer open-list questions. The system achieved the highest score amongst the participants in the GikiCLEF 2009 task. Unlike the standard textual QA approach, EQUAL does not rely on identifying the answer within a text snippet by using keyword retrieval. Instead, it explores the Wikipedia page graph, extracting and aggregating information from multiple documents and enforcing semantic constraints. The challenges for such an approach and an error analysis are also discussed.
Dost, Ascander & King, Tracy Holloway Using large-scale parser output to guide grammar development Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks 2009 [434]
This paper reports on guiding parser development by extracting information from output of a large-scale parser applied to Wikipedia documents. Data-driven parser improvement is especially important for applications where the corpus may differ from that originally used to develop the core grammar and where efficiency concerns affect whether a new construction should be added, or existing analyses modified. The large size of the corpus in question also brings scalability concerns to the foreground.
Doucet, A. & Lehtonen, M. Unsupervised classification of text-centric XML document collections Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007
This paper addresses the problem of the unsupervised classification of text-centric XML documents. In the context of the INEX mining track 2006, we present methods to exploit the inherent structural information of XML documents in the document clustering process. Using the k-means algorithm, we have experimented with a couple of feature sets, to discover that a promising direction is to use structural information as a preliminary means to detect and put aside structural outliers. The improvement of the semantic-wise quality of clustering is significantly higher through this approach than through a combination of the structural and textual feature sets. The paper also discusses the problem of the evaluation of XML clustering. Currently, in the INEX mining track, XML clustering techniques are evaluated against semantic categories. We believe there is a mismatch between the task (to exploit the document structure) and the evaluation, which disregards structural aspects. An illustration of this fact is that, over all the clustering track submissions, our text-based runs obtained the 1st rank (Wikipedia collection, out of 7) and 2nd rank (IEEE collection, out of 13).
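A minimal sketch of the clustering step, assuming tf-idf text features and scikit-learn's k-means; adding structural information, as the paper proposes, would amount to appending structure-derived columns to the feature matrix. The toy documents are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Toy "text-centric" documents; in the INEX setting these would be the
    # textual content extracted from XML articles.
    docs = [
        "neural networks learn representations from data",
        "deep learning models require large training data",
        "medieval castles were built for defence",
        "gothic cathedrals and castles of the middle ages",
    ]

    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    # The two machine-learning documents should share one cluster id,
    # the two history documents the other.
    print(km.labels_)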
Dovchin, Tuul & Chan, Peter Multimedia Cases for Training Mongolian Medical Professionals -- An Innovative Strategy for Overcoming Pedagogical Challenges Society for Information Technology \& Teacher Education International Conference 2006 [435]
Dowling, Sherwood Adopting a Long Tail Web Publishing Strategy for Museum Educational Materials at the Smithsonian American Art Museum World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [436]
Dron, Jon & Anderson, Terry Collectives, Networks and Groups in Social Software for E-Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [437]
Dron, Jon & Bhattacharya, Madhumita A Dialogue on E-Learning and Diversity: the Learning Management System vs the Personal Learning Environment World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [438]
Désilets, Alain & Paquet, Sébastien Wiki as a Tool for Web-based Collaborative Story Telling in Primary School: a Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [439]
Díaz, Francisco; Osorio, Maria & Amadeo, Ana Evolution of the use of Moodle in Argentina, adding Web2.0 features World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [440]
Ebner, Martin E-Learning 2.0 = e-Learning 1.0 + Web 2.0? Proceedings of the The Second International Conference on Availability, Reliability and Security 2007 [441]
Ebner, Martin & Nagler, Walther Has Web2.0 Reached the Educated Top? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [442]
Ebner, Martin & Taraghi, Behnam Personal Learning Environment for Higher Education – A First Prototype World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [443]
Els, Christo J. & Blignaut, A. Seugnet Exploring Teachers’ ICT Pedagogy in the North-West Province, South Africa World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [444]
Erlenkötter, Annekatrin; Kühnle, Claas-Michael; Miu, Huey-Ru; Sommer, Franziska & Reiners, Torsten Enhancing the Class Curriculum with Virtual World Use Cases for Production and Logistics World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [445]
van Erp, Marieke; Lendvai, Piroska & van den Bosch, Antal Comparing alternative data-driven ontological vistas of natural history Proceedings of the Eighth International Conference on Computational Semantics 2009 [446]
Traditionally, domain ontologies are created manually, based on human experts' views on the classes and relations of the domain at hand. We present ongoing work on two approaches to the automatic construction of ontologies from a flat database of records, and compare them to a manually constructed ontology. The latter, the CIDOC-CRM ontology, focuses on the organisation of classes and relations. In contrast, the first automatic method, based on machine learning, focuses on the mutual predictiveness between classes, while the second automatic method, created with the aid of Wikipedia, stresses meaningful relations between classes. The three ontologies show little overlap; their differences illustrate that a different focus during ontology construction can lead to radically different ontologies. We discuss the implications of these differences, and argue that the two alternative ontologies may be useful in higher-level information systems such as search engines.
Erren, Patrick & Keil, Reinhard Enabling new Learning Scenarios in the Age of the Web 2.0 via Semantic Positioning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [447]
Every, Vanessa; Garcia, Gna & Young, Michael A Qualitative Study of Public Wiki Use in a Teacher Education Program Society for Information Technology \& Teacher Education International Conference 2010 [448]
Ewbank, Ann; Carter, Heather & Foulger, Teresa MySpace Dilemmas: Ethical Choices for Teachers using Social Networking Society for Information Technology \& Teacher Education International Conference 2008 [449]
Eymard, Oivier; Sanchis, Eric & Selves, Jean-Louis A Peer-to-Peer Collaborative Framework Based on Perceptive Reasoning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [450]
Farhoodi, M.; Yari, A. & Mahmoudi, M. Combining content-based and context-based methods for Persian web page classification 2009 Second International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), 4-6 Aug. 2009 Piscataway, NJ, USA 2009 [451]
As the Internet includes millions of web pages for each and every search query, fast retrieval of the desired and related information from the Web becomes a very challenging subject. Automatic classification of web pages into relevant categories is an important and effective way to deal with the difficulty of retrieving information from the Internet. There are many automatic classification methods and algorithms that have been proposed for content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of features to improve the categorization accuracy of Persian web page classification. We conduct various experiments on a dataset consisting of 352 pages belonging to Persian Wikipedia, using content-based and context-based web page features. Our experiments demonstrate the usefulness of combining these features.
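Combining content-based and context-based features usually amounts to concatenating the two feature blocks before training a single classifier; a small sketch under that assumption (the toy page texts, URL tokens and labels are invented, and this is not the authors' exact feature set):

    from scipy.sparse import hstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    body_texts = ["football league results table", "recipe for saffron rice",
                  "championship match report", "traditional stew recipe"]
    url_tokens = ["sport news football", "food cooking recipe",
                  "sport match report", "food cooking stew"]
    labels = ["sport", "food", "sport", "food"]

    content_vec = TfidfVectorizer().fit(body_texts)   # content-based block
    context_vec = TfidfVectorizer().fit(url_tokens)   # context-based block

    # Concatenate the two feature blocks column-wise.
    X = hstack([content_vec.transform(body_texts),
                context_vec.transform(url_tokens)])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    X_new = hstack([content_vec.transform(["cup final report"]),
                    context_vec.transform(["sport cup final"])])
    print(clf.predict(X_new))   # expected: ['sport']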
Farkas, Richárd; Szarvas, György & Ormándi, Róbert Improving a state-of-the-art named entity recognition system using the world wide web Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications 2007 [452]
The development of highly accurate Named Entity Recognition (NER) systems can be beneficial to a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain independent NER system. Moreover, we describe our investigations on entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task, and proved to be particularly effective in the speech-to-text scenario.
Farley, Alan & Barton, Siew Mee Developing and rewarding advanced teaching expertise in higher education - a different approach World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [453]
Feldmann, Birgit & Franzkowiak, Bettina Studying in Web 2.0 - What (Distance) Students Really Want World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [454]
Ferguson, Donald F. Autonomic business service management Proceedings of the 6th international conference on Autonomic computing 2009 [455]
Medium and large enterprises think of information technology as implementing business services. Examples include online banking or Web commerce. Most systems and application management technology manages individual hardware and software systems. A business service is inherently a composite comprised of multiple HW, SW and logical entities. For example, a Web commerce system may have a Web server, Web application server, database server and messaging system to connect to mainframe inventory management. Each of the systems has various installed software. Businesses want to automate management of the business service, not the individual instances. IT management systems must manage the service, "unwind" the high level policies and operations and apply them to individual HW and SW elements. SOA makes managing composites more difficult due to dynamic binding and request routing. This presentation describes the design and implementation of a business service management system. The core elements include: a Unified Service Model; a real-time management database that extends the concept of a Configuration Management Database (CMDB) [456] and integrates external management and monitoring systems; rule based event correlation and rule based discovery of the structure of a business service; and algorithmic analysis of the composite service to automatically detect and repair availability and end-to-end performance problems. The presentation suggests topics for additional research.
Ferres, D. & Rodriguez, H. TALP at GikiCLEF 2009 Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [457]
This paper describes our experiments in Geographical Information Retrieval with the Wikipedia collection in the context of our participation in the GikiCLEF 2009 Multilingual task in English and Spanish. Our system, called GikiTALP, follows a simple approach that uses standard Information Retrieval with the Sphinx full-text search engine and some Natural Language Processing techniques without Geographical Knowledge.
Ferrés, Daniel & Rodríguez, Horacio Experiments adapting an open-domain question answering system to the geographical domain using scope-based resources Proceedings of the Workshop on Multilingual Question Answering 2006 [458]
This paper describes an approach to adapt an existing multilingual Open-Domain Question Answering (ODQA) system for factoid questions to a Restricted Domain, the Geographical Domain. The adaptation of this ODQA system involved the modification of some components of our system such as: Question Processing, Passage Retrieval and Answer Extraction. The new system uses external resources like GNS Gazetteer for Named Entity (NE) Classification and Wikipedia or Google in order to obtain relevant documents for this domain. The system focuses on a Geographical Scope: given a region, or country, and a language we can semi-automatically obtain multilingual geographical resources (e.g. gazetteers, trigger words, groups of place names, etc.) of this scope. The system has been trained and evaluated for Spanish in the scope of the Spanish Geography. The evaluation reveals that the use of scope-based Geographical resources is a good approach to deal with multilingual Geographical Domain Question Answering.
Fiaidhi, Jinan & Mohammed, Sabah Detecting Some Collaborative Academic Indicators Based on Social Networks: A DBLP Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [459]
Filatova, Elena Directions for exploiting asymmetries in multilingual Wikipedia Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies 2009 [460]
Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and information choice. Keeping these peculiarities in mind is necessary while using multilingual Wikipedia as a corpus for training and testing NLP applications. In this paper we present preliminary results on quantifying Wikipedia multilinguality. Our results support the observation about the substantial variation in descriptions of Wikipedia entries created in different languages. However, we believe that asymmetries in multilingual Wikipedia do not make Wikipedia an undesirable corpus for training NLP applications. On the contrary, we outline research directions that can utilize multilingual Wikipedia asymmetries to bridge the communication gaps in multilingual societies.
Fleet, Gregory & Wallace, Peter How could Web 2.0 be shaping web-assisted learning? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [461]
Flouris, G.; Fundulaki, I.; Pediaditis, P.; Theoharis, Y. & Christophides, V. Coloring RDF triples to capture provenance Semantic Web - ISWC 2009. 8th International Semantic Web Conference, ISWC 2009, 25-29 Oct. 2009 Berlin, Germany 2009 [462]
Recently, the W3C Linking Open Data effort has boosted the publication and inter-linkage of large amounts of RDF datasets on the Semantic Web. Various ontologies and knowledge bases with millions of RDF triples from Wikipedia and other sources, mostly in e-science, have been created and are publicly available. Recording provenance information of RDF triples aggregated from different heterogeneous sources is crucial in order to effectively support trust mechanisms, digital rights and privacy policies. Managing provenance becomes even more important when we consider not only explicitly stated but also implicit triples (through RDFS inference rules) in conjunction with declarative languages for querying and updating RDF graphs. In this paper we rely on colored RDF triples represented as quadruples to capture and manipulate explicit provenance information.
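The representational move described here is to carry a fourth component, a color identifying the source, alongside each triple; a minimal sketch with plain tuples and one toy RDFS-style inference rule (the URIs, source names and rule are illustrative, not the paper's formalism):

    # Quadruples: (subject, predicate, object, color), where the color
    # identifies the source the triple was aggregated from.
    quads = {
        ("dbpedia:Berlin", "rdf:type", "dbpedia:City", "src:wikipedia"),
        ("dbpedia:City", "rdfs:subClassOf", "dbpedia:Place", "src:ontology"),
    }

    def infer_subclass(quads):
        """Toy RDFS-style rule: if (x type C) and (C subClassOf D), derive (x type D).
        The implicit triple keeps track of both colors it was derived from."""
        derived = set()
        for s, p, o, c1 in quads:
            if p != "rdf:type":
                continue
            for s2, p2, o2, c2 in quads:
                if p2 == "rdfs:subClassOf" and s2 == o:
                    derived.add((s, "rdf:type", o2, f"{c1}+{c2}"))
        return derived

    for quad in infer_subclass(quads):
        print(quad)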
Fogarolli, Angela & Ronchetti, Marco A Web 2.0-enabled digital library World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [463]
Foley, Brian & Chang, Tae Wiki as a Professional Development Tool Society for Information Technology \& Teacher Education International Conference 2008 [464]
Forrester, Bruce & Verdon, John Introducing Peer Production into the Department of National Defense World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [465]
Forte, Andrea & Bruckman, Amy From Wikipedia to the classroom: exploring online publication and learning Proceedings of the 7th international conference on Learning sciences 2006 [466]
Wikipedia represents an intriguing new publishing paradigm---can it be used to engage students in authentic collaborative writing activities? How can we design wiki publishing tools and curricula to support learning among student authors? We suggest that wiki publishing environments can create learning opportunities that address four dimensions of authenticity: personal, real world, disciplinary, and assessment. We have begun a series of design studies to investigate links between wiki publishing experiences and writing-to-learn. The results of an initial study in an undergraduate government course indicate that perceived audience plays an important role in helping students monitor the quality of writing; however, students' perception of audience on the Internet is not straightforward. This preliminary iteration resulted in several guidelines that are shaping efforts to design and implement new wiki publishing tools and curricula for students and teachers.
Francke, H. & Sundin, O. An inside view: credibility in Wikipedia from the perspective of editors Information Research 2010
Introduction. The question of credibility in participatory information environments, particularly Wikipedia, has been much debated. This paper investigates how editors on Swedish Wikipedia consider credibility when they edit and read Wikipedia articles. Method. The study builds on interviews with 11 editors on Swedish Wikipedia, supported by a document analysis of policies on Swedish Wikipedia. Analysis. The interview transcripts have been coded qualitatively according to the participants' use of Wikipedia and what they take into consideration in making credibility assessments. Results. The participants use Wikipedia for purposes where it is not vital that the information is correct. Their credibility assessments are mainly based on authorship, verifiability, and the editing history of an article. Conclusions. The situations and purposes for which the editors use Wikipedia are similar to other user groups, but they draw on their knowledge as members of the network of practice of wikipedians to make credibility assessments, including knowledge of certain editors and of the MediaWiki architecture. Their assessments have more similarities to those used in traditional media than to assessments springing from the wisdom of crowds.
Freeman, Wendy Reflecting on the Culture of Research Using Weblogs Society for Information Technology \& Teacher Education International Conference 2006 [467]
Futrell-Schilling, Dawn Teaching and Learning in the Conceptual Age: Integrating a Sense of Symphony into the Curriculum Society for Information Technology \& Teacher Education International Conference 2009 [468]
Gagne, Claude & Fels, Deborah Learning through Weblogs World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [469]
Ganeshan, Kathiravelu A Technological Framework for Improving Education in the Developing World World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [470]
Ganeshan, Kathiravelu & Komosny, Dan Rojak: A New Paradigm in Teaching and Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [471]
Ganjisaffar, Y.; Javanmardi, S. & Lopes, C. Review-based ranking of Wikipedia articles 2009 International Conference on Computational Aspects of Social Networks (CASON), 24-27 June 2009 Piscataway, NJ, USA 2009 [472]
Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied by history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on studies of the "Wisdom of Crowds" and the effectiveness of the knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article into the quality of rankings of the search results. The extent of review is measured by the number of distinct editors who contributed to an article and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text-relevancy, PageRank and extent of review. The results show that the review-based ranking algorithm, which combines the extent of review and text-relevancy, outperforms the rest; it is more accurate and less computationally expensive compared to PageRank-based rankings.
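A toy sketch of the kind of combination the abstract describes, mixing a text-relevancy score with a normalized extent-of-review signal; the articles, scores, log normalization and mixing weight below are invented, not the authors' tuned model:

    import math

    # (article, text-relevancy score from a search engine, distinct editors)
    candidates = [
        ("Quantum mechanics", 0.82, 2400),
        ("Quantum mysticism", 0.85, 60),
        ("Introduction to quantum mechanics", 0.78, 900),
    ]

    def review_based_score(relevancy, editors, weight=0.5, scale=2500):
        """Mix text relevancy with a normalized extent-of-review signal."""
        review = math.log1p(editors) / math.log1p(scale)
        return weight * relevancy + (1 - weight) * review

    ranked = sorted(candidates,
                    key=lambda c: review_based_score(c[1], c[2]),
                    reverse=True)
    for title, rel, editors in ranked:
        print(f"{title}: {review_based_score(rel, editors):.3f}")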
Ganjisaffar, Yasser; Javanmardi, Sara & Lopes, Cristina Review-Based Ranking of Wikipedia Articles Proceedings of the 2009 International Conference on Computational Aspects of Social Networks 2009 [473]
Gantner, Zeno & Schmidt-Thieme, Lars Automatic content-based categorization of Wikipedia articles Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [474]
Wikipedia's article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse -- using text classification methods for predicting the categories of Wikipedia articles -- has attracted less attention so far. We propose to "return the favor" and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments that show that our approach is feasible despite the high number of labels.
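The task can be framed exactly as the abstract says, as multi-label text classification; a compact sketch with scikit-learn's one-vs-rest logistic regression over tf-idf features (toy articles and category labels, not the paper's data or feature design):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    texts = [
        "a programming language created at Bell Labs",
        "a mountain range in South America",
        "a programming language for statistics and data analysis",
        "a river crossing several South American countries",
    ]
    labels = [{"Computing"}, {"Geography"}, {"Computing", "Science"}, {"Geography"}]

    # Encode the label sets as a binary indicator matrix.
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)

    clf = make_pipeline(TfidfVectorizer(),
                        OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    clf.fit(texts, Y)

    pred = clf.predict(["a programming language for data analysis"])
    # Expect 'Computing' among the predicted labels on this toy data.
    print(mlb.inverse_transform(pred))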
Gaonkar, Shravan & Choudhury, Romit Roy Micro-Blog: map-casting from mobile phones to virtual sensor maps Proceedings of the 5th international conference on Embedded networked sensor systems 2007 [475]
The synergy of phone sensors (microphone, camera, GPS, etc.), wireless capability, and ever-increasing device density can lead to novel people-centric applications. Unlike traditional sensor networks, the next generation networks may be participatory, interactive, and at the scale of human users. Millions of global data points can be organized on a visual platform, queried, and sophisticatedly answered through human participation. Recent years have witnessed the isolated impacts of distributed knowledge sharing (Wikipedia), social networks, sensor networks, and mobile communication. We believe that significantly more impact is latent in their convergence, and that it can be drawn out through innovations in applications. This demonstration, called Micro-Blog, is a first step towards this goal.
Gardner, J.; Krowne, A. & Xiong, Li NNexus: towards an automatic linker for a massively-distributed collaborative corpus 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006 Piscataway, NJ, USA 2006
Collaborative online encyclopedias such as Wikipedia and PlanetMath are becoming increasingly popular. In order to understand an article in a corpus, a user must understand the related and underlying concepts through linked articles. In this paper, we introduce NNexus, a generalization of the automatic linking component of PlanetMath.org and the first system that automates the process of linking encyclopedia entries into a semantic network of concepts. We discuss the challenges, present the conceptual models as well as specific mechanisms of the NNexus system, and discuss some of our ongoing and completed work.
Garvoille, Alexa & Buckner, Ginny Writing Wikipedia Pages in the Constructivist Classroom World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [476]
Garza, S.E. & Brena, R.F. Graph local clustering for topic detection in Web collections 2009 Latin American Web Congress. LA-WEB 2009, 9-11 Nov. 2009 Piscataway, NJ, USA 2009 [477]
In the midst of a developing Web that increases its size with a constant rhythm, automatic document organization becomes important. One way to arrange documents is by categorizing them into topics. Even when there are different forms to consider topics and their extraction, a practical option is to view them as document groups and apply clustering algorithms. An attractive alternative that naturally copes with the Web size and complexity is the one proposed by graph local clustering (GLC) methods. In this paper, we define a formal framework for working with topics in hyperlinked environments and analyze the feasibility of GLC for this task. We performed tests over an important Web collection, namely Wikipedia, and our results, which were validated using various kinds of methods (some of them specific for the information domain), indicate that this approach is suitable for topic discovery.
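A simplified greedy sketch of graph local clustering: grow a cluster around a seed page, adding the neighbour that most improves conductance and stopping when no addition helps. This is an illustrative variant under my own assumptions, not the exact GLC algorithm evaluated in the paper; the toy link graph is invented.

    import networkx as nx

    # Toy hyperlink graph with two loosely connected topical groups.
    G = nx.Graph()
    G.add_edges_from([
        ("Jazz", "Blues"), ("Jazz", "Saxophone"), ("Blues", "Saxophone"),
        ("Saxophone", "Music"), ("Music", "Physics"),
        ("Physics", "Quantum"), ("Quantum", "Relativity"), ("Physics", "Relativity"),
    ])

    def conductance(G, cluster):
        """Edges leaving the cluster divided by the cluster's total degree."""
        cut = sum(1 for u, v in G.edges() if (u in cluster) != (v in cluster))
        volume = sum(G.degree(n) for n in cluster)
        return cut / volume if volume else 1.0

    def local_cluster(G, seed):
        """Greedily add the neighbour that most improves conductance."""
        cluster = {seed}
        while True:
            frontier = {n for c in cluster for n in G[c]} - cluster
            best = min(frontier, key=lambda n: conductance(G, cluster | {n}),
                       default=None)
            if best is None or conductance(G, cluster | {best}) >= conductance(G, cluster):
                return cluster
            cluster.add(best)

    # On this toy graph the seed "Jazz" grows into the music-related group.
    print(local_cluster(G, "Jazz"))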
Geiger, R. Stuart & Ribes, David The work of sustaining order in wikipedia: the banning of a vandal Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [478]
In this paper, we examine the social roles of software tools in the English-language Wikipedia, specifically focusing on autonomous editing programs and assisted editing tools. This qualitative research builds on recent research in which we quantitatively demonstrate the growing prevalence of such software in recent years. Using trace ethnography, we show how these often-unofficial technologies have fundamentally transformed the nature of editing and administration in Wikipedia. Specifically, we analyze "vandal fighting" as an epistemic process of distributed cognition, highlighting the role of non-human actors in enabling a decentralized activity of collective intelligence. In all, this case shows that software programs are used for more than enforcing policies and standards. These tools enable coordinated yet decentralized action, independent of the specific norms currently in force.
Gentile, Anna Lisa; Basile, Pierpaolo; Iaquinta, Leo & Semeraro, Giovanni Lexical and Semantic Resources for NLP: From Words to Meanings Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part III 2008 [479]
A user expresses her information need through words with a precise meaning, but from the machine point of view this meaning does not come with the word. A further step is needed to automatically associate it with the words. Techniques that process human language are required, as well as linguistic and semantic knowledge, stored within distinct and heterogeneous resources, which play an important role during all Natural Language Processing (NLP) steps. Resource management is a challenging problem, together with the correct association between URIs coming from the resources and the meanings of the words. This work presents a service that, given a lexeme (an abstract unit of morphological analysis in linguistics, which roughly corresponds to a set of words that are different forms of the same word), returns all syntactic and semantic information collected from a list of lexical and semantic resources. The proposed strategy consists in merging data originating from stable resources, such as WordNet, with data collected dynamically from evolving sources, such as the Web or Wikipedia. That strategy is implemented in a wrapper to a set of popular linguistic resources that provides a single point of access to them, in a transparent way to the user, to accomplish the computational linguistic problem of getting a rich set of linguistic and semantic annotations in a compact way.
Geraci, Michael Implementing a Wiki as a collaboration tool for group projects World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [480]
Ghislandi, Patrizia; Mattei, Antonio; Paolino, Daniela; Pellegrini, Alice & Pisanu, Francesco Designing Online Learning Communities for Higher Education: Possibilities and Limits of Moodle World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [481]
Gibson, David; Reynolds-Alpert, Suzanne; Doering, Aaron & Searson, Michael Participatory Media in Informal Learning Society for Information Technology \& Teacher Education International Conference 2009 [482]
Giza, Brian & McCann, Erin The Use of Free Translation Tools in the Biology Classroom Society for Information Technology \& Teacher Education International Conference 2007 [483]
Gleim, Rüdiger; Mehler, Alexander & Dehmer, Matthias Web corpus mining by instance of Wikipedia Proceedings of the 2nd International Workshop on Web as Corpus 2006 [484]
In this paper we present an approach to structure learning in the area of web documents. This is done in order to approach the goal of webgenre tagging in the area of web corpus linguistics. A central outcome of the paper is that purely structure oriented approaches to web document classification provide an information gain which may be utilized in combined approaches of web content and structure analysis.
Gleim, R.; Mehler, A.; Dehmer, M. & Pustylnikov, O. Aisles through the category forest Third International Conference on Web information systems and technologies, WEBIST 2007, 3-6 March 2007 Setubal, Portugal 2007
The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods be developed in order to tackle the problem of finding and organising relevant information. It has often been motivated that semantic classifications of input documents help solving this task. But while approaches of supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the Web are much more demanding. In order to successfully develop approaches to Web mining, respective corpora are needed. However, the composition of genre- or domain-specific Web corpora is still an unsolved problem. It is time consuming to build large corpora of good quality because Web pages typically lack reliable meta information. Wikipedia along with similar approaches of collaborative text production offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source of corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia category explorer, a tool which supports categorical views to browse through the Wikipedia and to construct domain specific corpora for machine learning.
Glogoff, Stuart Channeling Students and Parents: Promoting the University Through YouTube World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [485]
Glover, Ian & Oliver, Andrew Hybridisation of Social Networking and Learning Environments World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [486]
Glover, Ian; Xu, Zhijie & Hardaker, Glenn Redeveloping an eLearning Annotation System as a Web Service World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [487]
Goh, Hui-Ngo & Kiu, Ching-Chieh Context-based term identification and extraction for ontology construction 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010 Piscataway, NJ, USA 2010 [488]
Ontology construction often requires a domain specific corpus for conceptualizing the domain knowledge; specifically, an ontology is an association of terms, relations between terms and related instances. It is a vital task to identify a list of significant terms for constructing a practical ontology. In this paper, we present the use of a context-based term identification and extraction methodology for ontology construction from text documents. The methodology uses a taxonomy and Wikipedia to support automatic term identification and extraction from structured documents, with the assumption that candidate terms for a topic are often associated with its topic-specific keywords. A hierarchical relationship of super-topics and sub-topics is defined by a taxonomy; meanwhile, Wikipedia is used to provide context and background knowledge for the topics defined in the taxonomy to guide the term identification and extraction. The experimental results have shown that the context-based term identification and extraction methodology is viable for defining topic concepts and their sub-concepts for constructing an ontology. The experimental results have also proven its viability in a small corpus / text size environment in supporting ontology construction.
González-Martínez, MaríaDolores & Herrera-Batista, Miguel Angel Habits and preferences of University Students on the use of Information and Communication Technologies in their academic activities and of socialization Society for Information Technology \& Teacher Education International Conference 2009 [489]
Gool, Luc Van; Breitenstein, Michael D.; Gammeter, Stephan; Grabner, Helmut & Quack, Till Mining from large image sets Proceeding of the ACM International Conference on Image and Video Retrieval 2009 [490]
So far, most image mining was based on interactive querying. Although such querying will remain important in the future, several applications need image mining at such wide scales that it has to run automatically. This adds an additional level to the problem, namely to apply appropriate further processing to different types of images, and to decide on such processing automatically as well. This paper touches on those issues in that we discuss the processing of landmark images and of images coming from webcams. The first part deals with the automated collection of images of landmarks, which are then also automatically annotated and enriched with Wikipedia information. The target application is that users photograph landmarks with their mobile phones or PDAs, and automatically get information about them. Similarly, users can get images in their photo albums annotated automatically. The object of interest can also be automatically delineated in the images. The pipeline we propose actually retrieves more images than manual keyword input would produce. The second part of the paper deals with an entirely different source of image data, but one that also produces massive amounts (although typically not archived): webcams. They produce images at a single location, but rather continuously and over extended periods of time. We propose an approach to summarize data coming from webcams. This data handling is quite different from that applied to the landmark images.
Gore, David; Lee, Marie & Wassus, Kenny New Possibilities with IT and Print Technologies: Variable Data Printing VDP Society for Information Technology \& Teacher Education International Conference 2010 [491]
Gray, Kathleen Originality and Plagiarism Resources for Academic Staff Development in the Era of New Web Authoring Formats World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [492]
Greenberg, Valerie & Carbajal, Darlene Using Convergent Media to Engage Graduate Students in a Digital and Electronic Writing class: Some Surprising Results World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [493]
Greene, M. Epidemiological Monitoring for Emerging Infectious Diseases Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IX, 5-8 April 2010 USA 2010 [494]
The Homeland Security News Wire has been reporting on new ways to fight epidemics using digital tools such as the iPhone, social networks, Wikipedia, and other Internet sites. Instant two-way communication now gives consumers the ability to complement official reports on emerging infectious diseases from health authorities. However, there is increasing concern that these communications networks could open the door to mass panic from unreliable or false reports. There is thus an urgent need to ensure that epidemiological monitoring for emerging infectious diseases gives health authorities the capability to identify, analyze, and report disease outbreaks in as timely and efficient a manner as possible. One of the dilemmas in the global dissemination of information on infectious diseases is the possibility that information overload will create inefficiencies as the volume of Internet-based surveillance information increases. What is needed is a filtering mechanism that will retrieve relevant information for further analysis by epidemiologists, laboratories, and other health organizations so they are not overwhelmed with irrelevant information and will be able to respond quickly. This paper introduces a self-organizing ontology that could be used as a filtering mechanism to increase relevance and allow rapid analysis of disease outbreaks as they evolve in real time.
Greenhow, Christine What Teacher Education Needs to Know about Web 2.0: Preparing New Teachers in the 21st Century Society for Information Technology \& Teacher Education International Conference 2007 [495]
Greenhow, Christine; Searson, Michael & Strudler, Neal FWIW: What the Research Says About Engaging the Web 2.0 Generation Society for Information Technology \& Teacher Education International Conference 2009 [496]
Guerrero, Shannon Web 2.0 in a Preservice Math Methods Course: Teacher Candidates’ Perceptions and Predictions Society for Information Technology \& Teacher Education International Conference 2010 [497]
Guetl, Christian Context-sensitive and Personalized Concept-based Access to Knowledge for Learning and Training Purposes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [498]
Guo, Zinan & Greer, Jim Connecting E-portfolios and Learner Models World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [499]
Gupta, Priyanka; Seals, Cheryl & Wilson, Dale-Marie Design And Evaluation of SimBuilder World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [500]
Gurevych, Iryna & Zesch, Torsten Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [501]
Welcome to the proceedings of the ACL Workshop "The People's Web Meets NLP: Collaboratively Constructed Semantic Resources". The workshop attracted 21 submissions, of which 9 are included in these proceedings. We are gratified by this level of interest. This workshop was motivated by the observation that the NLP community is currently considerably influenced by online resources which are collaboratively constructed by ordinary users on the Web. In many works such resources have been used as semantic resources, overcoming the knowledge acquisition bottleneck and coverage problems pertinent to conventional lexical semantic resources. The resource that has gained the greatest popularity in this respect so far is Wikipedia. However, the scope of the workshop deliberately exceeded Wikipedia. We are happy that the proceedings include papers on resources such as Wiktionary, Mechanical Turk, or creating semantic resources through online games. This encourages us in our belief that collaboratively constructed semantic resources are of growing interest for the natural language processing community. We should also add that we hoped to bring together researchers from both worlds: those using collaboratively created resources in NLP applications and those using NLP applications for improving the resources or extracting different types of semantic information from them. This is also reflected in the proceedings, although the stronger interest was taken in using semantic resources for NLP applications.
Guru, D. S.; Harish, B. S. & Manjunath, S. Symbolic representation of text documents Proceedings of the Third Annual ACM Bangalore Conference 2010 [502]
This paper presents a novel method of representing a text document by the use of interval-valued symbolic features. A method of classification of text documents based on the proposed representation is also presented. The newly proposed model significantly reduces the dimension of feature vectors and also the time taken to classify a given document. Further, extensive experiments are conducted on the vehicles-wikipedia dataset to evaluate the performance of the proposed model. The experimental results reveal that the obtained results are on par with the existing results for the vehicles-wikipedia dataset. However, the advantage of the proposed model is that it takes relatively less time for classification as it is based on a simple matching strategy.
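The paper itself gives no code, but the interval-valued idea can be illustrated with a minimal sketch: each class is summarized by per-term [min, max] intervals computed over its training documents, and a new document is assigned to the class whose intervals cover the largest number of its term weights. All names, the toy data, and the scoring rule below are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def class_intervals(doc_vectors):
        # doc_vectors: (n_docs, n_terms) term-weight matrix for one class
        return np.min(doc_vectors, axis=0), np.max(doc_vectors, axis=0)

    def matching_score(query_vec, low, high):
        # simple matching: count how many term weights fall inside the class interval
        return int(((query_vec >= low) & (query_vec <= high)).sum())

    def classify(query_vec, intervals_by_class):
        return max(intervals_by_class,
                   key=lambda c: matching_score(query_vec, *intervals_by_class[c]))

    # toy usage with a three-term vocabulary
    train = {"sports": np.array([[0.2, 0.0, 0.7], [0.4, 0.1, 0.6]]),
             "politics": np.array([[0.8, 0.5, 0.0], [0.9, 0.4, 0.1]])}
    intervals = {c: class_intervals(v) for c, v in train.items()}
    print(classify(np.array([0.3, 0.05, 0.65]), intervals))  # -> sports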
Gyarmati, A. & Jones, G.J.F. When to Cross Over? Cross-Language Linking Using Wikipedia for VideoCLEF 2009 Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [503]
We describe Dublin City University (DCU)'s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wikipedia. The other method first translated the extracted query using machine translation and then searched the English Wikipedia collection directly. We found that using the original Dutch transcription query for searching the Dutch Wikipedia yielded better results.
Hamilton, Margaret & Howell, Sheila Technology Options for Assessment Purposes and Quality Graduate Outcomes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [504]
Hammond, Thomas; Friedman, Adam; Keeler, Christy; Manfra, Meghan & Metan, Demet Epistemology is elementary: Historical thinking as applied epistemology in an elementary social studies methods class Society for Information Technology \& Teacher Education International Conference 2008 [505]
Haridas, M. & Caragea, D. Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany 2009 [506]
The outgrowth of social networks in the recent years has resulted in opportunities for interesting data mining problems, such as interest or friendship recommendations. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate and compare three approaches to engineering a hierarchical ontology over user interests. The proposed approaches make use of two popular knowledge bases, Wikipedia and Directory Mozilla, to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, the latent semantic analysis technique to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests.
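As a rough illustration of the first approach described above (definition-based similarity plus agglomerative clustering), the sketch below builds TF-IDF vectors over interest definitions, projects them into a latent semantic space with truncated SVD, and groups them hierarchically with scikit-learn. The library choice, toy definitions, and parameter values are assumptions for illustration, not the authors' setup.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import AgglomerativeClustering

    # hypothetical interest definitions, e.g. lead paragraphs looked up on Wikipedia
    definitions = {
        "chess": "Chess is a board game played between two players on a checkered board.",
        "poker": "Poker is a family of card games in which players wager over hands.",
        "hiking": "Hiking is a long, vigorous walk, usually on trails in the countryside.",
        "mountaineering": "Mountaineering is the set of outdoor activities of ascending mountains.",
    }

    interests = list(definitions)
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(definitions.values())
    lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)  # latent semantic analysis

    # agglomerative clustering groups similar interests into higher-level concepts
    labels = AgglomerativeClustering(n_clusters=2).fit_predict(lsa)
    for interest, label in zip(interests, labels):
        print(label, interest)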
Harman, D.; Kando, N.; Lalmas, M. & Peters, C. The Four Ladies of Experimental Evaluation Multilingual and Multimodal Information Access Evaluation. International Conference of the Cross-Language Evaluation Forum, CLEF 2010, 20-23 Sept. 2010 Berlin, Germany 2010 [507]
The goal of the panel is to present some of the main lessons that we have learned in well over a decade of experimental evaluation and to promote discussion with respect to what the future objectives in this field should be. TREC was started in 1992 in conjunction with the building of a new 2 GB test collection for the DARPA TIPSTER project. Whereas the main task in the early TRECs was the adhoc retrieval task in English, many other tasks such as question-answering, web retrieval, and retrieval within specific domains have been tried over the years. NTCIR, the Asian version of TREC, started in 1999 and has run in an 18-month cycle. Whereas NTCIR is similar to TREC, there has always been a tighter connection to the NLP community, allowing for some unique tracks. Additionally NTCIR has done extensive pioneering work with patents, including searching, classification, and translation. The coordination of the European CLIR task moved from TREC to Europe in 2000 and CLEF (Cross-Language Evaluation Forum) was launched. The objective was to expand the European CLIR effort by including more languages and more tasks, and by encouraging more participation from Europe. The INitiative for the Evaluation of XML retrieval (INEX) started in 2002 to provide evaluation of structured document retrieval, in particular to investigate the retrieval of document components that are XML elements of varying granularity. The initiative used 12,107 full-text scientific articles from 18 IEEE Computer Society publications, with each article containing 1,532 XML nodes on average. The collection grew to 16,819 articles in 2005 and moved on to using Wikipedia articles starting in 2006.
Hartrumpf, S.; Bruck, T. Vor Der & Eichhorn, C. Detecting duplicates with shallow and parser-based methods 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE 2010), 21-23 Aug. 2010 Piscataway, NJ, USA 2010 [508]
Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.
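The deep, parser-based recognizer described above cannot be reproduced in a few lines, but the kind of shallow, surface-oriented duplicate recognizer it is combined with can be: compare the word n-gram sets of two texts and flag them as duplicates when the Jaccard overlap exceeds a threshold. Function names, the n-gram order, and the threshold are illustrative assumptions, not the authors' configuration.

    def ngrams(text, n=3):
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def jaccard(a, b):
        union = len(a | b)
        return len(a & b) / union if union else 0.0

    def is_duplicate(text1, text2, n=3, threshold=0.5):
        # shallow check: a high share of common word n-grams suggests a duplicate
        return jaccard(ngrams(text1, n), ngrams(text2, n)) >= threshold

    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox leaps over the lazy dog"
    print(is_duplicate(a, b, n=2))  # True: only one word differs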
Hartrumpf, S. & Leveling, J. Recursive Question Decomposition for Answering Complex Geographic Questions Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [509]
This paper describes the GIRSA-WP system and the experiments performed for GikiCLEF 2009, the geographic information retrieval task in the question answering track at CLEF 2009. Three runs were submitted. The first one contained only results from the InSicht QA system; it showed high precision, but low recall. The combination with results from the GIR system GIRSA increased recall considerably, but reduced precision. The second run used a standard IR query, while the third run combined such queries with a Boolean query with selected keywords. The evaluation showed that the third run achieved significantly higher mean average precision (MAP) than the second run. In both cases, integrating GIR methods and QA methods was successful in combining their strengths (high precision of deep QA, high recall of GIR), resulting in the third-best performance of automatic runs in GikiCLEF. The overall performance still leaves room for improvements. For example, the multilingual approach is too simple. All processing is done in only one Wikipedia (the German one); results for the nine other languages are collected by following the translation links in Wikipedia.
Hattori, S. & Tanaka, K. Extracting concept hierarchy knowledge from the Web based on property inheritance and aggregation WI 2008. 2008 IEEE/WIC/ACM International Conference on Web Intelligence. IAT 2008. 2008 IEEE/WIC/ACM International Conference on Intelligent Agent Technology. WI-IAT Workshop 2008. 2008 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Workshops, 9-12 Dec. 2008 Piscataway, NJ, USA 2008 [510]
Concept hierarchy knowledge, such as hyponymy and meronymy, is very important for various natural language processing systems. While WordNet and Wikipedia are being manually constructed and maintained as lexical ontologies, many researchers have tackled how to extract concept hierarchies from very large corpora of text documents such as the Web not manually but automatically. However, their methods are mostly based on lexico-syntactic patterns as not necessary but sufficient conditions of hyponymy and meronymy, so they can achieve high precision but low recall when using stricter patterns, or high recall but low precision when using looser patterns. Therefore, we need necessary conditions of hyponymy and meronymy to achieve high recall and not low precision. In this paper, not only "Property Inheritance" from a target concept to its hyponyms but also "Property Aggregation" from its hyponyms to the target concept is assumed to be a necessary and sufficient condition of hyponymy, and we propose a method to extract concept hierarchy knowledge from the Web based on property inheritance and property aggregation.
Hauck, Rita Immersion in another Language and Culture through Multimedia and Web Resources World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [511]
Hecht, Brent & Gergle, Darren Measuring self-focus bias in community-maintained knowledge repositories Proceedings of the fourth international conference on Communities and technologies 2009 [512]
Self-focus is a novel way of understanding a type of bias in community-maintained Web 2.0 graph structures. It goes beyond previous measures of topical coverage bias by encapsulating both node- and edge-hosted biases in a single holistic measure of an entire community-maintained graph. We outline two methods to quantify self-focus, one of which is very computationally inexpensive, and present empirical evidence for the existence of self-focus using a "hyperlingual" approach that examines 15 different language editions of Wikipedia. We suggest applications of our methods and discuss the risks of ignoring self-focus bias in technological applications.
Hecht, Brent J. & Gergle, Darren On the "localness" of user-generated content Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [513]
The "localness" of participation in repositories of user-generated content (UGC) with geospatial components has been cited as one of UGC's greatest benefits. However, the degree of localness in major UGC repositories such as Flickr and Wikipedia has never been examined. We show that over 50 percent of Flickr users contribute local information on average, and over 45 percent of Flickr photos are local to the photographer. Across four language editions of Wikipedia, however, we find that participation is less local. We introduce the spatial content production model (SCPM) as a possible factor in the localness of UGC, and discuss other theoretical and applied implications.
Heer, Rex My Space in College: Students Use of Virtual Communities to Define their Fit in Higher Education Society for Information Technology \& Teacher Education International Conference 2007 [514]
Hellmann, S.; Stadler, C.; Lehmann, J. & Auer, S. DBpedia Live Extraction On the Move to Meaningful Internet Systems: OTM 2009. Confederated International Conferences CoopIS, DOA, IS, and ODBASE 2009, 1-6 Nov. 2009 Berlin, Germany 2009 [515]
The DBpedia project extracts information from Wikipedia, interlinks it with other knowledge bases, and makes this data available as RDF. So far the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the heavy-weight extraction process has been a drawback. It requires manual effort to produce a new release and the extracted information is not up-to-date. We extended DBpedia with a live extraction framework, which is capable of processing tens of thousands of changes per day in order to consume the constant stream of Wikipedia updates. This allows direct modifications of the knowledge base and closer interaction of users with DBpedia. We also show how the Wikipedia community itself is now able to take part in the DBpedia ontology engineering process and that an interactive roundtrip engineering between Wikipedia and DBpedia is made possible.
Hengstler, Julia Exploring Open Source for Educators: We're Not in Kansas Anymore--Entering Os World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [516]
Hennis, Thieme; Veen, Wim & Sjoer, Ellen Future of Open Courseware; A Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [517]
Heo, Gyeong Mi; Lee, Romee & Park, Young Blog as a Meaningful Learning Context: Adult Bloggers as Cyworld Users in Korea World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [518]
Herbold, Katy & Hsiao, Wei-Ying Online Learning on Steroids: Combining Brain Research with Time Saving Techniques World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [519]
Herczeg, Michael Educational Media: From Canned Brain Food to Knowledge Traces World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [520]
Herring, Donna & Friery, Kathleen efolios for 21st Century Learners World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [521]
Herring, Donna; Hibbs, Roger; Morgan, Beth & Notar, Charles Show What You Know: ePortfolios for 21st Century Learners Society for Information Technology \& Teacher Education International Conference 2007 [522]
Herrington, Anthony; Kervin, Lisa & Ilias, Joanne Blogging Beginning Teaching Society for Information Technology \& Teacher Education International Conference 2006 [523]
Herrington, Jan Authentic E-Learning in Higher Education: Design Principles for Authentic Learning Environments and Tasks World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [524]
Heuer, Lars Towards converting the internet into topic maps Proceedings of the 2nd international conference on Topic maps research and applications 2006 [525]
This paper describes Semants, a work-in-progress framework that uses Wikipedia as a focal point to collect information from various resources. Semants aims at developing several specialized applications (the ants) that are used to convert a resource into a topic map fragment that is merged into a bigger topic map.
Hewitt, Jim & Peters, Vanessa Using Wikis to Support Knowledge Building in a Graduate Education Course World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [526]
Hewitt, Jim; Peters, Vanessa & Brett, Clare Using Wiki Technologies as an Adjunct to Computer Conferencing in a Graduate Teacher Education Course Society for Information Technology \& Teacher Education International Conference 2006 [527]
Higdon, Jude; Miller, Sean & Paul, Nora Educational Gaming for the Rest of Us: Thinking Worlds and WYSIWYG Game Development World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [528]
Hoehndorf, R.; Prufer, K.; Backhaus, M.; Herre, H.; Kelso, J.; Loebe, F. & Visagie, J. A proposal for a gene functions wiki On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops Berlin, Germany 2006
Large knowledge bases integrating different domains can provide a foundation for new applications in biology such as data mining or automated reasoning. The traditional approach to the construction of such knowledge bases is manual and therefore extremely time consuming. The ubiquity of the Internet now makes large-scale community collaboration for the construction of knowledge bases, such as the successful online encyclopedia Wikipedia, possible. We propose an extension of this model to the collaborative annotation of molecular data. We argue that a semantic wiki provides the functionality required for this project since it can capitalize on the existing representations in biological ontologies. We discuss the use of a different relationship model than the one provided by RDF and OWL to represent the semantic data. We argue that this leads to a more intuitive and correct way to enter semantic content in the wiki. Furthermore, we show how formal ontologies could be used to increase the usability of the software through type-checking and automatic reasoning.
Holcomb, Lori & Beal, Candy Using Web 2.0 to Support Learning in the Social Studies Context Society for Information Technology \& Teacher Education International Conference 2008 [529]
Holifield, Phil Visual History Project: an Image Map Authoring Tool Assisting Students to Present Project Information World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [530]
Holmes, Bryn; Wasty, Shujaat; Hafeez, Khaled & Ahsan, Shakib The Knowledge Box: Can a technology bring schooling to children in crisis? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [531]
Honnibal, Matthew; Nothman, Joel & Curran, James R. Evaluating a statistical CCG parser on Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [532]
The vast majority of parser evaluation is conducted on the 1984 Wall Street Journal (WSJ). In-domain evaluation of this kind is important for system development, but gives little indication about how the parser will perform on many practical problems. Wikipedia is an interesting domain for parsing that has so far been under-explored. We present statistical parsing results that for the first time provide information about what sort of performance a user parsing Wikipedia text can expect. We find that the C&C parser's standard model is 4.3% less accurate on Wikipedia text, but that a simple self-training exercise reduces the gap to 3.8%. The self-training also speeds up the parser on newswire text by 20%.
Hopson, David & Martland, David Network Web Directories: Do they deliver and to whom? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [533]
Hoven, Debra Networking to learn: blogging for social and collaborative purposes World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [534]
Hsu, Yu-Chang; Ching, Yu-Hui & Grabowski, Barbara Bookmarking/Tagging in the Web 2.0 Era: From an Individual Cognitive Tool to a Collaborative Knowledge Construction Tool for Educators World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [535]
Hu, Meiqun; Lim, Ee-Peng; Sun, Aixin; Lauw, Hady Wirawan & Vuong, Ba-Quy On improving wikipedia search using article quality Proceedings of the 9th annual ACM international workshop on Web information and data management 2007 [536]
Wikipedia is presently the largest free-and-open online encyclopedia collaboratively edited and maintained by volunteers. While Wikipedia offers full-text search to its users, the accuracy of its relevance-based search can be compromised by poor quality articles edited by non-experts and inexperienced contributors. In this paper, we propose a framework that re-ranks Wikipedia search results considering article quality. We develop two quality measurement models, namely Basic and Peer Review, to derive article quality based on co-authoring data gathered from articles' edit history. Compared with Wikipedia's full-text search engine, Google and Wikiseek, our experimental results showed that (i) quality-only ranking produced by Peer Review gives comparable performance to that of Wikipedia and Wikiseek; (ii) Peer Review combined with relevance ranking outperforms Wikipedia's full-text search significantly, delivering search accuracy comparable to Google.
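A minimal sketch of the re-ranking idea: mix a relevance score with a per-article quality score through a weighted sum and sort by the combined value. How the paper's Basic and Peer Review models derive quality from edit histories is not reproduced here; the quality scores, weight, and names below are assumed inputs for illustration.

    def rerank(results, quality, alpha=0.7):
        # results: list of (article, relevance); quality: article -> score in [0, 1]
        # alpha weights relevance, (1 - alpha) weights article quality
        def combined(item):
            article, relevance = item
            return alpha * relevance + (1 - alpha) * quality.get(article, 0.0)
        return [article for article, _ in sorted(results, key=combined, reverse=True)]

    results = [("Apple Inc.", 0.92), ("Apple (fruit)", 0.90), ("Apple Records", 0.55)]
    quality = {"Apple Inc.": 0.55, "Apple (fruit)": 0.95, "Apple Records": 0.40}
    print(rerank(results, quality))  # quality lifts "Apple (fruit)" to the top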
Huang, Hsiang-ling & Hung, Yu-ju An overview of information technology on language education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [537]
Huang, Wenhao & Yoo, Sunjoo How Do Web 2.0 Technologies Motivate Learners? A Regression Analysis based on the Motivation, Volition, and Performance Theory World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [538]
Huang, Yin-Fu & Huang, Yu-Yu A framework automating domain ontology construction WEBIST 2008. Fourth International Conference on Web Information Systems and Technologies, 4-7 May 2008 Madeira, Portugal 2008
This paper proposes a general framework that can automatically construct domain ontology on a collection of documents with the help of The Free Dictionary, WordNet, and Wikipedia Categories. Both explicit and implicit features of index terms in documents are used to evaluate word correlations and then to construct Is-A relationships in the framework. Thus, the built ontology consists of 1) concepts, 2) Is-A and Parts-of relationships among concepts, and 3) word relationships. Besides, the built ontology can be further refined by learning from incremental documents periodically. To help users browse the built ontology, an ontology browsing system was implemented, providing different search modes and functionality to facilitate searching a variety of relationships.
Huckell, Travis The Academic Exception as Foundation for Innovation in Online Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [539]
Hussein, Ramlah; Saeed, Moona; Karim, Nor Shahriza Abdul & Mohamed, Norshidah Instructor’s Perspective on Factors influencing Effectiveness of Virtual Learning Environment (VLE) in the Malaysian Context: Proposed Framework World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [540]
Hwang, Jya-Lin University EFL Students’ Learning Strategies On Multimedia YouTube World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [541]
Höller, Harald & Reisinger, Peter Wiki Based Teaching and Learning Scenarios at the University of Vienna World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [542]
Høivik, Helge An Experimental Player/Editor for Web-based Multi-Linguistic Cooperative Lectures World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [543]
Høivik, Helge Read and Write Text and Context - Learning as Poietic Fields of Engagement World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [544]
Iftene, A. Identifying Geographical Entities in Users' Queries Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [545]
In 2009 we built a system in order to compete in the LAGI task (Log Analysis and Geographic Query Identification). The system uses an external resource built into GATE in combination with Wikipedia and Tumba in order to identify geographical entities in users' queries. The results obtained with and without Wikipedia resources are comparable. The main advantage of only using GATE resources is the improved run time. In the process of system evaluation we have identified the main problem of our approach: the system has insufficient external resources for the recognition of geographic entities.
Iftene, Adrian Building a Textual Entailment System for the RTE3 Competition. Application to a QA System Proceedings of the 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing 2008 [546]
Indrie, S.M. & Groza, A. Enacting argumentative web in semantic wikipedia 2010 9th Roedunet International Conference (RoEduNet), 24-26 June 2010 Piscataway, NJ, USA 2010
This research advocates the idea of combining argumentation theory with the social web technology, aiming to enact large scale or mass argumentation. The proposed framework allows mass-collaborative editing of structured arguments in the style of semantic wikipedia. The long term goal is to apply the abstract machinery of argumentation theory to more practical applications based on human generated arguments, such as deliberative democracy, business negotiation, or self-care.
Ingram, Richard JMU/Microsoft Partnership for 21st Century Skills: Overview of Goals, Activities, and Challenges Society for Information Technology \& Teacher Education International Conference 2007 [547]
Inkpen, Kori; Gutwin, Carl & Tang, John Proceedings of the 2010 ACM conference on Computer supported cooperative work 2010 [548]
Welcome to the 2010 ACM Conference on Computer Supported Cooperative Work! We hope that this conference will be a place to hear exciting talks about the latest in CSCW research, an opportunity to learn new things, and a chance to connect with friends in the community. We are pleased to see such a strong and diverse program at this year's conference. We have a mix of research areas represented -- some that are traditionally part of our community, and several that have not been frequently seen at CSCW. There are sessions to suit every taste: from collaborative software development, healthcare, and groupware technologies, to studies of Wikipedia, family communications, games, and volunteering. We are particularly interested in a new kind of forum at the conference this year -- the 'CSCW Horizon' -- which will present novel and challenging ideas, and will do so in a more interactive fashion than standard paper sessions. The program is an exciting and topical mix of cutting-edge research and thought in CSCW. A major change for CSCW beginning this year is our move from being a biennial to an annual conference. This has meant a change in the time of the conference (from November to February), and subsequent changes in all of our normal deadlines and procedures. Despite these changes, the community has responded with enormous enthusiasm, and we look forward to the future of yearly meetings under the ACM CSCW banner.
Ioannou, Andri Towards a Promising Technology for Online Collaborative Learning: Wiki Threaded Discussion World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [549]
Ioannou, Andri & Artino, Anthony Incorporating Wikis in an Educational Technology Course: Ideas, Reflections and Lessons Learned … Society for Information Technology \& Teacher Education International Conference 2008 [550]
Ion, Radu; Ştefănescu, Dan; Ceauşu, Alexandru & Tufiş, Dan RACAI's QA system at the Romanian-Romanian QA@CLEF2008 main task Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access 2008 [551]
This paper describes the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) in the Multiple Language Question Answering Main Task at the CLEF 2008 competition. We present our Question Answering system answering Romanian questions from Romanian Wikipedia documents, focusing on the implementation details. The presentation also emphasizes the fact that question analysis, snippet selection and ranking provide a useful basis for any answer extraction mechanism.
Iqbal, Muhammad; Barton, Greg & Barton, Siew Mee Internet in the pesantren: A tool to promote or continue autonomous learning? Global Learn Asia Pacific 2010 [552]
Ireland, Alice; Kaufman, David & Sauvé, Louise Simulation and Advanced Gaming Environments (SAGE) for Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [553]
Iske, Stefan & Marotzki, Winfried Wikis: Reflexivity, Processuality and Participation World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [554]
Jackson, Allen; Gaudet, Laura; Brammer, Dawn & McDaniel, Larry Curriculum, a Change in Theoretical Thinking Theory World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [555]
Jacquin, Christine; Desmontils, Emmanuel & Monceaux, Laura French EuroWordNet Lexical Database Improvements Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing 2009 [556]
Semantic knowledge is often used in the framework of Natural Language Processing (NLP) applications. However, for some languages different from English, such knowledge is not always easily available. In fact, for example, French thesaurus are not numerous and are not enough developed. In this context, we present two modifications made on the French version of the EuroWordnet} Thesaurus in order to improve it. Firstly, we present the French EuroWordNet} thesaurus and its limits. Then we explain two improvements we have made. We add non-existing relationships by using the bilinguism capability of the EuroWordnet} thesaurus, and definitions by using an external multilingual resource (Wikipedia} [1]).
Jadidinejad, A.H. & Mahmoudi, F. Cross-language Information Retrieval Using Meta-language Index Construction and Structural Queries Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [557]
Structural query languages allow expert users to richly represent their information needs, but unfortunately their complexity makes them impractical in Web search engines. Automatically detecting the concepts in an unstructured user's information need and generating a richly structured, multilingual equivalent query is an ideal solution. We utilize Wikipedia as a large concept repository, along with some state-of-the-art algorithms for extracting Wikipedia concepts from the user's information need. This process is called "Query Wikification". Our experiments on the TEL corpus at CLEF 2009 achieve +23% and +17% improvements in Mean Average Precision and Recall against the baseline. Our approach is unique in that it improves both precision and recall, two measures where improving one often hurts the other.
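The concept-extraction step ("Query Wikification") can be pictured as matching query phrases against Wikipedia article titles. The sketch below does a greedy longest-match of query n-grams against a toy title set; the title list, matching strategy, and parameters are illustrative assumptions rather than the authors' algorithm.

    # toy subset of Wikipedia article titles (hypothetical)
    TITLES = {"information retrieval", "cross-language information retrieval",
              "wikipedia", "query expansion"}

    def wikify(query, titles=TITLES, max_len=4):
        # greedy longest-match of query n-grams against known article titles
        tokens = query.lower().split()
        concepts, i = [], 0
        while i < len(tokens):
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + n])
                if phrase in titles:
                    concepts.append(phrase)
                    i += n
                    break
            else:
                i += 1
        return concepts

    print(wikify("cross-language information retrieval with wikipedia concepts"))
    # ['cross-language information retrieval', 'wikipedia']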
Jamaludin, Rozinah; Annamalai, Subashini & Abdulwahed, Mahmoud Web 1.0, Web 2.0: Implications to move from Education 1.0 to Education 2.0 to enhance collaborative intelligence towards the future of Web 3.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [558]
von Jan, Ute; Ammann, Alexander & Matthies, Herbert K. Generating and Presenting Dynamic Knowledge in Medicine and Dentistry World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [559]
Jang, Soobaek & Green, T.M. Best practices on delivering a wiki collaborative solution for enterprise applications 2006 International Conference on Collaborative Computing: Networking, Applications and Worksharing, 17-20 Nov. 2006 Piscataway, NJ, USA 2006
Wikis have become a hot topic in the world of collaboration tools. Wikipedia.org, a vast, community-driven encyclopedia, has proven to be an invaluable information resource that has been developed through collaboration among thousands of people around the world. Today wikis are increasingly being employed for a wide variety of uses in business. Consequently, one of the key challenges is to enable wikis to interoperate with informational and business process applications. The ability to dynamically change the content of Web pages and reflect the changes within an enterprise application brings the power of collaboration to business applications. This paper includes general information about wikis and describes how to use a wiki solution within an enterprise application. Integrating an enterprise application with a wiki permits real-time updates of pages in the application by certain groups of experts, without deploying files from the Web application server.
Jankowski, Jacek & Decker, Stefan 2LIP: filling the gap between the current and the three-dimensional web Proceedings of the 14th International Conference on 3D Web Technology 2009 [560]
In this article we present a novel approach, the 2-Layer Interface Paradigm (2LIP), for designing simple yet interactive 3D web applications, an attempt to marry advantages of 3D experience with the advantages of the narrative structure of hypertext. The hypertext information, together with graphics, and multimedia, is presented semi-transparently on the foreground layer. It overlays the 3D representation of the information displayed in the background of the interface. Hyperlinks are used for navigation in the 3D scenes (in both layers). We introduce a reference implementation of 2LIP: Copernicus - The Virtual 3D Encyclopedia, which can become a model for building a 3D Wikipedia. Based on the evaluation of Copernicus we show that designing web interfaces according to 2LIP provides users with a better experience during browsing the Web, has a positive effect on the visual and associative memory, improves spatial cognition of presented information, and increases overall user's satisfaction without harming the interaction.
Jansche, Martin & Sproat, Richard Named entity transcription with pair n-gram models Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration 2009 [561]
We submitted results for each of the eight shared tasks. Except for Japanese name kanji restoration, which uses a noisy channel model, our Standard Run submissions were produced by generative long-range pair n-gram models, which we mostly augmented with publicly available data (either from LDC datasets or mined from Wikipedia) for the Non-Standard Runs.
Javanmardi, S. & Lopes, C.V. Modeling trust in collaborative information systems 2007 International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2007), 12 Nov.-15 Nov. 2007 Piscataway, NJ, USA 2007 [562]
Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs and shared forums. All of these systems contain information and resources with different degrees of sensitivity. However, the open nature of such infrastructures makes it difficult for users to determine the reliability of the available information and the trustworthiness of information providers. Hence, integrating trust management systems into open collaborative systems can play a crucial role in the growth and popularity of open information repositories. In this paper, we present a trust model for collaborative systems, namely for platforms based on wiki technology. This model, based on hidden Markov models, estimates the reputation of the contributors and the reliability of the content dynamically. The focus of this paper is on reputation estimation. Evaluation results based on a subset of Wikipedia show that the model can effectively be used for identifying vandals and users with high-quality contributions.
Jijkoun, Valentin; Khalid, Mahboob Alam; Marx, Maarten & de Rijke, Maarten Named entity normalization in user generated content Proceedings of the second workshop on Analytics for noisy unstructured text data 2008 [563]
Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and entity tracking. In many of these tasks it is important to be able to accurately normalize the recognized entities, i.e., to map surface forms to unambiguous references to real world entities. Within the context of structured databases, this task (known as record linkage and data de-duplication) has been a topic of active research for more than five decades. For edited content, such as news articles, the named entity normalization (NEN) task is one that has recently attracted considerable attention. We consider the task in the challenging context of user generated content (UGC), where it forms a key ingredient of tracking and media-analysis systems. A baseline NEN system from the literature (that normalizes surface forms to Wikipedia pages) performs considerably worse on UGC than on edited news: accuracy drops from 80% to 65% for a Dutch language data set and from 94% to 77% for English. We identify several sources of errors: entity recognition errors, multiple ways of referring to the same entity and ambiguous references. To address these issues we propose five improvements to the baseline NEN algorithm, to arrive at a language independent NEN system that achieves overall accuracy scores of 90% on the English data set and 89% on the Dutch data set. We show that each of the improvements contributes to the overall score of our improved NEN algorithm, and conclude with an error analysis on both Dutch and English language UGC. The NEN system is computationally efficient and runs with very modest computational requirements.
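A baseline of the kind referred to above can be sketched as a dictionary lookup from surface forms (Wikipedia titles and redirects) to canonical pages, with a most-common-sense fallback for ambiguous mentions. The lookup table and names below are toy assumptions; the paper's five improvements are not shown.

    # toy lookup table built from Wikipedia titles and redirects (hypothetical data)
    SURFACE_TO_PAGES = {
        "nyc": ["New York City"],
        "new york": ["New York City", "New York (state)"],
        "obama": ["Barack Obama"],
    }

    def normalize(surface_form):
        # map a recognized mention to a Wikipedia page title; ambiguous mentions
        # fall back to the first (assumed most common) candidate, unknown ones
        # are returned unchanged
        candidates = SURFACE_TO_PAGES.get(surface_form.strip().lower())
        return candidates[0] if candidates else surface_form

    print(normalize("NYC"))        # New York City
    print(normalize("new york"))   # New York City (most common sense)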
Jitkrittum, Wittawat; Haruechaiyasak, Choochart & Theeramunkong, Thanaruk QAST: question answering system for Thai Wikipedia Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions 2009 [564]
We propose an open-domain question answering system using Thai Wikipedia as the knowledge base. Two types of information are used for answering a question: (1) structured information extracted and stored in the form of the Resource Description Framework (RDF), and (2) unstructured texts stored as a search index. For the structured information, a SPARQL query transformed from the question is applied to retrieve a short answer from the RDF base. For the unstructured information, a keyword-based query is used to retrieve the shortest text span containing the question's key terms. From the experimental results, the system which integrates both approaches could achieve an average MRR of 0.47 based on 215 test questions.
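The structured half of such a pipeline can be illustrated with rdflib (a library choice not made by the paper): extracted facts are stored as RDF triples and a factoid question is rewritten as a SPARQL query over them. The namespace, facts, and query are hypothetical.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")   # hypothetical namespace for extracted facts
    g = Graph()
    g.add((EX.Bangkok, EX.isCapitalOf, EX.Thailand))
    g.add((EX.Bangkok, EX.population, Literal(8305218)))

    # "What is the capital of Thailand?" rewritten as a SPARQL query
    query = """
        PREFIX ex: <http://example.org/>
        SELECT ?city WHERE { ?city ex:isCapitalOf ex:Thailand . }
    """
    for row in g.query(query):
        print(row.city)   # -> http://example.org/Bangkok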
Johnson, Peter C.; Kapadia, Apu; Tsang, Patrick P. & Smith, Sean W. Nymble: anonymous IP-address blocking Proceedings of the 7th international conference on Privacy enhancing technologies 2007 [565]
Anonymizing networks such as Tor allow users to access Internet services privately using a series of routers to hide the client's IP address from the server. Tor's success, however, has been limited by users employing this anonymity for abusive purposes, such as defacing Wikipedia. Website administrators rely on IP-address blocking for disabling access to misbehaving users, but this is not practical if the abuser routes through Tor. As a result, administrators block all Tor exit nodes, denying anonymous access to honest and dishonest users alike. To address this problem, we present a system in which (1) honest users remain anonymous and their requests unlinkable; (2) a server can complain about a particular anonymous user and gain the ability to blacklist the user for future connections; (3) this blacklisted user's accesses before the complaint remain anonymous; and (4) users are aware of their blacklist status before accessing a service. As a result of these properties, our system is agnostic to different servers' definitions of misbehavior.
Jordan, C.; Watters, C. & Toms, E. Using Wikipedia to make academic abstracts more readable Proceedings of the American Society for Information Science and Technology 2008 [566]
Junior, João Batista Bottentuit & Coutinho, Clara The use of mobile technologies in Higher Education in Portugal: an exploratory survey World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [567]
Kabisch, Thomas; Padur, Ronald & Rother, Dirk UsingWeb Knowledge to Improve the Wrapping of Web Sources Proceedings of the 22nd International Conference on Data Engineering Workshops 2006 [568]
During the wrapping of web interfaces, ontological knowledge is important in order to support an automated interpretation of information. The development of ontologies is a time consuming issue and not realistic in global contexts. On the other hand, the web provides a huge amount of knowledge which can be used instead of ontologies. Three common classes of web knowledge sources are: Web thesauri, search engines and Web encyclopedias. The paper investigates how Web knowledge can be utilized to solve the three semantic problems of Parameter Finding for Query Interfaces, Labeling of Values and Relabeling after interface evolution. For the solution of the parameter finding problem, an algorithm has been implemented using the web encyclopedia Wikipedia for the initial identification of parameter value candidates and the search engine Google for a validation of label-value relationships. The approach has been integrated into a wrapper definition framework.
Kallis, John R. & Patti, Christine Creating an Enhanced Podcast with Section 508 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [569]
Kameyama, Shumei; Uchida, Makoto & Shirayama, Susumu A New Method for Identifying Detected Communities Based on Graph Substructure Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops 2007 [570]
Many methods have been developed that can detect community structures in complex networks. The detection methods can be classified into three groups based on their characteristic properties. In this study, the inherent features of the detection methods were used to develop a method that identifies communities extracted using a given community detection method. Initially, a common detection method is used to divide a network into communities. The communities are then identified using another detection method from a different class. In this paper, the community structures are first extracted from a network using the method proposed by Newman and Girvan. The extracted communities are then identified using the proposed detection method, which is an extension of the vertex similarity method proposed by Leicht et al. The proposed method was used to identify communities in a blog network (blogosphere) and in a Wikipedia word network.
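The two-stage idea (extract communities with one method, then identify them with another) can be sketched with networkx: Girvan-Newman produces the initial split, and each community is matched against communities found by a second method using a Jaccard overlap score, a crude stand-in for the vertex-similarity measure the paper extends. The example graph and the matching score are assumptions for illustration.

    import networkx as nx
    from networkx.algorithms.community import girvan_newman, greedy_modularity_communities

    G = nx.karate_club_graph()                     # stand-in for a blog network

    # stage 1: extract communities (first split of the Girvan-Newman dendrogram)
    gn = [set(c) for c in next(girvan_newman(G))]

    # stage 2: identify each community by matching it against communities found
    # by a detection method from a different class (greedy modularity optimization)
    ref = [set(c) for c in greedy_modularity_communities(G)]

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    for i, c in enumerate(gn):
        best = max(range(len(ref)), key=lambda j: jaccard(c, ref[j]))
        print(f"community {i} best matches reference community {best} "
              f"(Jaccard = {jaccard(c, ref[best]):.2f})")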
Kaminishi, Hidekazu & Murota, Masao Development of Multi-Screen Presentation Software World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [571]
Kapur, Manu; Hung, David; Jacobson, Michael; Voiklis, John; Kinzer, Charles K. & Victor, Chen Der-Thanq Emergence of learning in computer-supported, large-scale collective dynamics: a research agenda Proceedings of the 8th iternational conference on Computer supported collaborative learning 2007 [572]
Seen through the lens of complexity theory, past CSCL research may largely be characterized as small-scale (i.e., small-group) collective dynamics. While this research tradition is substantive and meaningful in its own right, we propose a line of inquiry that seeks to understand computer-supported, large-scale collective dynamics: how large groups of interacting people leverage technology to create emergent organizations (knowledge, structures, norms, values, etc.) at the collective level that are not reducible to any individual, e.g., Wikipedia, online communities, etc. How does learning emerge in such large-scale collectives? Understanding the interactional dynamics of large-scale collectives is a critical and open research question, especially in the increasingly participatory, inter-connected, media-convergent culture of today. Recent CSCL research has alluded to this; we, however, develop the case further in terms of what it means for how one conceives learning, as well as methodologies for seeking understandings of how learning emerges in these large-scale networks. In the final analysis, we leverage complexity theory to advance computational agent-based models (ABMs) as part of an integrated, iteratively-validated Phenomenological-ABM inquiry cycle to understand emergent phenomena "from the bottom up".
Karadag, Zekeriya & McDougall, Douglas E-contests in Mathematics: Technological Challenges versus Technological Innovations World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [573]
Karakus, Turkan; Sancar, Hatice & Cagiltay, Kursat An Eye Tracking Study: The Effects of Individual Differences on Navigation Patterns and Recall Performance on Hypertext Environments World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [574]
Karlsson, Mia Teacher Educators Moving from Learning the Office Package to Learning About Digital Natives' Use of ICT World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [575]
Karsenti, Thierry; Goyer, Sophie; Villeneuve, Stephane & Raby, Carole The efficacy of eportfolios : an experiment with pupils and student teachers from Canada Society for Information Technology \& Teacher Education International Conference 2007 [576]
Karsenti, Thierry; Villeneuve, Stephane & Goyer, Sophie The Development of an Eportfolio for Student Teachers World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [577]
Kasik, Maribeth Montgomery Been there done that: emerged, evolved and ever changing face of e-learning and emerging technologies. Society for Information Technology \& Teacher Education International Conference 2008 [578]
Kasik, Maribeth Montgomery; Mott, Michael & Wasowski, Robert Cyber Bullies Among the Digital Natives and Emerging Technologies World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [579]
Keengwe, Jared Enhacing e-learning through Technology and Constructivist Pedagogy World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [580]
Kennard, Carl Differences in Male and Female Wiki Participation during Educational Group Projects World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [581]
Kennard, Carl Wiki Productivity and Discussion Forum Activity in a Postgraduate Online Distance Learning Course World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [582]
Kennedy, Ian One Encyclopedia Per Child (OEPC) in Simple English World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [583]
Ketterl, Markus & Morisse, Karsten User Generated Web Lecture Snippets to Support a Blended Learning Approach World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [584]
Khalid, Mahboob Alam & Verberne, Suzan Passage retrieval for question answering using sliding windows Proceedings of the 2nd workshop on Information Retrieval for Question Answering 2008 [585]
The information retrieval (IR) community has investigated many different techniques to retrieve passages from large collections of documents for question answering (QA). In this paper, we specifically examine and quantitatively compare the impact of passage retrieval for QA using sliding windows and disjoint windows. We consider two different data sets, the TREC 2002-2003 QA data set, and 93 why-questions against INEX Wikipedia. We discovered that, compared to disjoint windows, using sliding windows results in improved performance of TREC-QA in terms of TDRR, and in improved performance of Why-QA in terms of success@n and MRR.
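The two passage-construction strategies compared above differ only in how window start positions are chosen: disjoint windows tile the token stream, while sliding windows overlap by a stride. The sketch below shows both; window size and stride are arbitrary illustrative values.

    def disjoint_windows(tokens, size=20):
        # consecutive, non-overlapping passages
        return [tokens[i:i + size] for i in range(0, len(tokens), size)]

    def sliding_windows(tokens, size=20, stride=10):
        # overlapping passages: a new window starts every `stride` tokens
        last_start = max(len(tokens) - size, 0)
        return [tokens[i:i + size] for i in range(0, last_start + 1, stride)]

    tokens = "why does the moon appear larger near the horizon than overhead".split()
    print(len(disjoint_windows(tokens, size=4)))           # 3 non-overlapping passages
    print(len(sliding_windows(tokens, size=4, stride=2)))  # 4 overlapping passages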
Kidd, Jennifer; Baker, Peter; Kaufman, Jamie; Hall, Tiffany; O'Shea, Patrick & Allen, Dwight Wikitextbooks: Pedagogical Tool for Student Empowerment Society for Information Technology \& Teacher Education International Conference 2009 [586]
Kidd, Jennifer; O'Shea, Patrick; Baker, Peter; Kaufman, Jamie & Allen, Dwight Student-authored Wikibooks: Textbooks of the Future? Society for Information Technology \& Teacher Education International Conference 2008 [587]
Kidd, Jennifer; O'Shea, Patrick; Kaufman, Jamie; Baker, Peter; Hall, Tiffany & Allen, Dwight An Evaluation of Web 2.0 Pedagogy: Student-authored Wikibook vs Traditional Textbook Society for Information Technology \& Teacher Education International Conference 2009 [588]
Kim, Daesang; Rueckert, Daniel & Hwang, Yeiseon Let’s create a podcast! Society for Information Technology \& Teacher Education International Conference 2008 [589]
Kim, Youngjun & Baek, Youngkyun Educational uses of HUD in Second Life Society for Information Technology \& Teacher Education International Conference 2010 [590]
Kimmerle, Joachim; Moskaliuk, Johannes & Cress, Ulrike Learning and knowledge building with social software Proceedings of the 9th international conference on Computer supported collaborative learning - Volume 1 2009 [591]
The progress of the Internet in recent years has led to the emergence of so-called social software. This technology concedes users a more active role in creating Web content. This has important effects both on individual learning and collaborative knowledge building. In this paper we will present an integrative framework model to describe and explain learning and knowledge building with social software on the basis of systems theoretical and equilibration theoretical considerations. This model assumes that knowledge progress emerges from cognitive conflicts that result from incongruities between an individual's prior knowledge and the information which is contained in a shared digital artifact. This paper will provide empirical support for the model by applying it to Wikipedia articles and by examining knowledge-building processes using network analyses. Finally, this paper will present a review of a series of experimental studies.
Kimmons, Royce Digital Play, Ludology, and the Future of Educational Games World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [592]
Kimmons, Royce What Does Open Collaboration on Wikipedia Really Look Like? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [593]
Kinney, Lance Evidence of Engineering Education in Virtual Worlds World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [594]
Kiran, G.V.R.; Shankar, R. & Pudi, V. Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge Knowledge-Based and Intelligent Information and Engineering Systems. 14th International Conference, KES 2010, 8-10 Sept. 2010 Berlin, Germany 2010
High dimensionality is a major challenge in document clustering. Some of the recent algorithms address this problem by using frequent itemsets for clustering. But most of these algorithms neglect the semantic relationship between the words. On the other hand, there are algorithms that take care of the semantic relations between the words by making use of external knowledge contained in WordNet, MeSH, Wikipedia, etc., but do not handle the high dimensionality. In this paper we present an efficient solution that addresses both these problems. We propose a hierarchical clustering algorithm using closed frequent itemsets that uses Wikipedia as external knowledge to enhance the document representation. We evaluate our methods based on F-Score on standard datasets and show our results to be better than existing approaches.
Kobayashi, Michiko Creating Wikis in the technology class: How do we use Wikis in K-12 classrooms? Society for Information Technology \& Teacher Education International Conference 2010 [595]
Koh, Elizabeth & Lim, John An Integrated Collaboration System to Manage Student Team Projects Global Learn Asia Pacific 2010 [596]
Kohlhase, Andrea MS PowerPoint Use from a Micro-Perspective World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [597]
Kohlhase, Andrea What if PowerPoint became emPowerPoint (through CPoint)? Society for Information Technology \& Teacher Education International Conference 2006 [598]
Kolias, C.; Demertzis, S. & Kambourakis, G. Design and implementation of a secure mobile wiki system Seventh IASTED International Conference on Web-Based Education, 17-19 March 2008 Anaheim, CA, USA 2008
During the last few years wikis have emerged as one of the most popular tool shells. Wikipedia has boosted their popularity, but they also keep a significant share in e-learning and intranet-based applications such as defect tracking, requirements management, test-case management, and project portals. However, existing wiki systems cannot fully support mobile clients due to several incompatibilities that exist. On top of that, an effective secure mobile wiki system must be lightweight enough to support low-end mobile devices having several limitations. In this paper we analyze the requirements for a novel multi-platform secure wiki implementation. The XML Encryption and Signature specifications are employed to realize end-to-end confidentiality and integrity services. Our scheme can be applied selectively and only to sensitive wiki content, thus diminishing by far the computational resources needed at both ends, the server and the client. To address authentication of wiki clients, a simple one-way authentication and session key agreement protocol is also introduced. The proposed solution can be easily applied to both centralized and forthcoming P2P wiki implementations.
Kondo, Mitsumasa; Tanaka, Akimichi & Uchiyama, Tadasu Search your interests everywhere!: wikipedia-based keyphrase extraction from web browsing history Proceedings of the 21st ACM conference on Hypertext and hypermedia 2010 [599]
This paper proposes a method that can extract user interests from the user's Web browsing history. Our method allows easy access to multiple content domains such as blogs, movies, QA sites, etc., since the user does not need to input a separate search query in each domain/site. To extract user interests, the method first extracts candidate keyphrases from the user's web browsed documents. Second, important keyphrases, identified through a link structure analysis of Wikipedia content, are extracted from the main contents of web documents. This technique is based on the idea that important keyphrases in Wikipedia are important keyphrases in the real world. Finally, keyphrases contained in the documents important to the user are ranked as user interests. An experiment shows that our method offers improvements over a conventional method and can recommend interests attractive to the user.
Koolen, Marijn; Kazai, Gabriella & Craswell, Nick Wikipedia pages as entry points for book search Proceedings of the Second ACM International Conference on Web Search and Data Mining 2009 [600]
A lot of the world's knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines, such as Google Books, provide mechanisms for searchers to enter this vast knowledge space using queries as entry points. In this paper, we view Wikipedia as a summary of this world knowledge and aim to use this resource to guide users to relevant books. Thus, we investigate possible ways of using Wikipedia as an intermediary between the user's query and a collection of books being searched. We experiment with traditional query expansion techniques, exploiting Wikipedia articles as rich sources of information that can augment the user's query. We then propose a novel approach based on link distance in an extended Wikipedia graph: we associate books with Wikipedia pages that cite these books and use the link distance between these nodes and the pages that match the user query as an estimation of a book's relevance to the query. Our results show that a) classical query expansion using terms extracted from query pages leads to increased precision, and b) link distance between query and book pages in Wikipedia provides a good indicator of relevance that can boost the retrieval score of relevant books in the result ranking of a book search engine.
Kowase, Yasufumi; Kaneko, Keiichi & Ishikawa, Masatoshi A Learning System for Related Words based on Thesaurus and Image Retrievals World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [601]
Krauskopf, Karsten Developing a psychological framework for teachers’ constructive implementation of digital media in the classroom – media competence from the perspective of socio-cognitive functions of digital tools. Society for Information Technology \& Teacher Education International Conference 2009 [602]
Krishnan, S. & Bieszczad, A. SEW: the semantic Extensions to Wikipedia 2007 International Conference on Semantic Web \& Web Services (SWWS'07), 25-28 June 2007 Las Vegas, NV, USA 2007
The Semantic Web represents the next step in the evolution of the Web. The goal of the Semantic Web initiative is to create a universal medium for data exchange where data can be shared and processed by people as well as by automated tools. The paper presents the research and implementation of an application, SEW (Semantic Extensions to Wikipedia), that uses Semantic Web technologies to extract information from the user and to store the data along with its semantics. SEW addresses the shortcomings of the existing portal, Wikipedia, through its knowledge extraction and representation techniques. The paper focuses on applying SEW to solving a problem in a real-world domain.
Krotzsch, M.; Vrandecic, D. & Volkel, M. Semantic MediaWiki The Semantic Web - ISWC 2006. OTM 2006 Workshops. 5th International Semantic Web Conference, ISWC 2006. Proceedings, 5-9 Nov. 2006 Berlin, Germany 2006
Semantic MediaWiki is an extension of MediaWiki - a widely used wiki engine that also powers Wikipedia. Its aim is to make semantic technologies available to a broad community by smoothly integrating them with the established usage of MediaWiki. The software is already used on a number of productive installations world-wide, but the main target remains to establish "Semantic Wikipedia" as an early adopter of semantic technologies on the Web. Thus usability and scalability are as important as powerful semantic features.
Krupa, Y.; Vercouter, L.; Hubner, J.F. & Herzig, A. Trust based Evaluation of Wikipedia's Contributors Engineering Societies in the Agents World X. 10th International Workshop, ESAW 2009, 18-20 Nov. 2009 Berlin, Germany 2009 [603]
Wikipedia is an encyclopedia on which anybody can change its content. Some users, self-proclaimed "patrollers", regularly check recent changes in order to delete or correct those which are ruining articles' integrity. The huge quantity of updates leads some articles to remain polluted for a certain time before being corrected. In this work we show how a multiagent trust model can help patrollers in their task of controlling Wikipedia. To direct the patrollers' verification towards suspicious contributors, our work relies on a formalisation of Castelfranchi and Falcone's social trust theory to assist them by representing their trust model in a cognitive way.
Kulathuramaiyer, Narayanan & Maurer, Hermann Current Development of Mashups in Shaping Web Applications World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [604]
Kulathuramaiyer, Narayanan; Zaka, Bilal & Helic, Denis Integrating Copy-Paste Checking into an E-Learning Ecosystem World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [605]
Kumar, Swapna Building a Learning Community using Wikis in Educational Technology Courses Society for Information Technology \& Teacher Education International Conference 2009 [606]
Kumar, Swapna Can We Model Wiki Use in Technology Courses to Help Teachers Use Wikis in their Classrooms? Society for Information Technology \& Teacher Education International Conference 2008 [607]
Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou Report of NEWS 2010 transliteration mining shared task Proceedings of the 2010 Named Entities Workshop 2010 [608]
This report documents the details of the Transliteration Mining Shared Task that was run as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared task featured mining of name transliterations from paired Wikipedia titles in 5 different language pairs, specifically, between English and one of Arabic, Chinese, Hindi, Russian and Tamil. In total, 5 groups took part in this shared task, participating in multiple mining tasks in different language pairs. The methodology and the data sets used in this shared task are published in the Shared Task White Paper [Kumaran et al., 2010]. We measure and report 3 metrics on the submitted results to calibrate the performance of individual systems on a commonly available Wikipedia dataset. We believe that the significant contribution of this shared task is in (i) assembling a diverse set of participants working in the area of transliteration mining, (ii) creating a baseline performance of transliteration mining systems in a set of diverse languages using commonly available Wikipedia data, and (iii) providing a basis for meaningful comparison and analysis of trade-offs between various algorithmic approaches used in mining. We believe that this shared task would complement the NEWS 2010 transliteration generation shared task, in enabling development of practical systems with a small amount of seed data in a given pair of languages.
Kumaran, A.; Khapra, Mitesh M. & Li, Haizhou Whitepaper of NEWS 2010 shared task on transliteration mining Proceedings of the 2010 Named Entities Workshop 2010 [609]
Transliteration is generally defined as phonetic translation of names across languages. Machine Transliteration is a critical technology in many domains, such as machine translation, cross-language information retrieval/extraction, etc. Recent research has shown that high quality machine transliteration systems may be developed in a language-neutral manner, using a reasonably sized good quality corpus (~15-25K parallel names) between a given pair of languages. In this shared task, we focus on acquisition of such good quality names corpora in many languages, thus complementing the machine transliteration shared task that is concurrently conducted in the same NEWS 2010 workshop. Specifically, this task focuses on mining the Wikipedia paired entities data (aka, inter-wiki-links) to produce high-quality transliteration data that may be used for transliteration tasks.
Kunnath, Maria Lorna MLAKedusoln eLearnovate's Unified E-Learning Strategy For the Semantic Web World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [610]
Kupatadze, Ketevan Conducting chemistry lessons in Georgian schools with computer-educational programs (exemplificative one concrete program) World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [611]
Kurhila, Jaakko "Unauthorized" Use of Social Software to Support Formal Higher Education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [612]
Kutty, S.; Tran, Tien; Nayak, R. & Li, Yuefeng Clustering XML documents using frequent subtrees Advances in Focused Retrieval. 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, 15-18 Dec. 2008 Berlin, Germany 2009
This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
Lahti, Lauri Guided Generation of Pedagogical Concept Maps from the Wikipedia World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [613]
Lahti, L. Educational tool based on topology and evolution of hyperlinks in the Wikipedia 2010 IEEE 10th International Conference on Advanced Learning Technologies (ICALT 2010), 5-7 July 2010 Los Alamitos, CA, USA 2010 [614]
We propose a new method to support educational exploration in the hyperlink network of the Wikipedia online encyclopedia. The learner is provided with alternative parallel ranking lists, each one promoting hyperlinks that represent a different pedagogical perspective on the desired learning topic. The learner can browse the conceptual relations between the latest versions of articles or the conceptual relations belonging to consecutive temporal versions of an article, or a mixture of both approaches. Based on her needs and intuition, the learner explores the hyperlink network while the method automatically builds concept maps that reflect her conceptualization process and can be used for varied educational purposes. Initial experiments with a prototype tool based on the method indicate enhancement of ordinary learning results and suggest further research.
Lai, Alice An Examination of Technology-Mediated Feminist Consciousness-raising in Art Education Society for Information Technology \& Teacher Education International Conference 2010 [615]
Lapadat, Judith; Atkinson, Maureen & Brown, Willow The Electronic Lives of Teens: Negotiating Access, Producing Digital Narratives, and Recovering From Internet Addiction World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [616]
Lara, Sonia & Naval, Concepción Educative proposal of web 2.0 for the encouragement of social and citizenship competence World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [617]
Larson, M.; Newman, E. & Jones, G.J.F. Overview of VideoCLEF 2009: new perspectives on speech-based multimedia content enrichment Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [618]
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the "Beeldenstorm" collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called 'Finding Related Resources Across Languages', involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names.
Lau, C.; Tjondronegoro, D.; Zhang, J.; Geva, S. & Liu, Y. Fusing visual and textual retrieval techniques to effectively search large collections of Wikipedia images Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007
This paper presents an experimental study that examines the performance of various combination techniques for content-based image retrieval using a fusion of visual and textual search results. The evaluation is comprehensively benchmarked using more than 160,000 samples from the INEX-MM2006 images dataset and the corresponding XML documents. For visual search, we have successfully combined Hough transform, object's color histogram, and texture (H.O.T.). For comparison purposes, we used the provided UvA features. Based on the evaluation, our submissions show that the UvA+Text combination performs most effectively, but it is closely followed by our H.O.T. (visual only) feature. Moreover, H.O.T.+Text performance is still better than UvA (visual) only. These findings show that the combination of effective text and visual search results can improve the overall performance of CBIR in Wikipedia collections which contain a heterogeneous (i.e. wide) range of genres and topics.
Leake, David & Powell, Jay Mining Large-Scale Knowledge Sources for Case Adaptation Knowledge Proceedings of the 7th international conference on Case-Based Reasoning: Case-Based Reasoning Research and Development 2007 [619]
Making case adaptation practical is a longstanding challenge for case-based reasoning. One of the impediments to widespread use of automated case adaptation is the adaptation knowledge bottleneck: the adaptation process may require extensive domain knowledge, which may be difficult or expensive for system developers to provide. This paper advances a new approach to addressing this problem, proposing that systems mine their adaptation knowledge as needed from pre-existing large-scale knowledge sources available on the World Wide Web. The paper begins by discussing the case adaptation problem, opportunities for adaptation knowledge mining, and issues for applying the approach. It then presents an initial illustration of the method in a case study of the testbed system WebAdapt. WebAdapt applies the approach in the travel planning domain, using OpenCyc, Wikipedia, and the Geonames GIS database as knowledge sources for generating substitutions. Experimental results suggest the promise of the approach, especially when information from multiple sources is combined.
Lee, Jennifer Fads and Facts in Technology-Based Learning Environments Society for Information Technology \& Teacher Education International Conference 2009 [620]
Lee, Stella & Dron, Jon Giving Learners Control through Interaction Design World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [621]
Lee, Zeng-Han Attitude Changes Toward Applying Technology (A case study of Meiho Institute of Technology in Taiwan) Society for Information Technology \& Teacher Education International Conference 2008 [622]
Lemay, Philippe Game and flow concepts for learning: some considerations Society for Information Technology \& Teacher Education International Conference 2008 [623]
Leong, Peter; Joseph, Samuel; Ho, Curtis & Fulford, Catherine Learning to learn in a virtual world: An exploratory qualitative study Global Learn Asia Pacific 2010 [624]
Li, Haizhou & Kumaran, A. Proceedings of the 2010 Named Entities Workshop 2010 [625]
Named Entities play a significant role in Natural Language Processing and Information Retrieval. While identifying and analyzing named entities in a given natural language is a challenging research problem by itself, the phenomenal growth in the Internet user population, especially among the Non-English speaking parts of the world, has extended this problem to the crosslingual arena. We specifically focus on research on all aspects of Named Entities in our workshop series, the Named Entities Workshop (NEWS). The first of the NEWS workshops (NEWS 2009) was held as a part of the ACL-IJCNLP 2009 conference in Singapore, and the current edition (NEWS 2010) is being held as a part of ACL 2010, in Uppsala, Sweden. The purpose of the NEWS workshop is to bring together researchers across the world interested in identification, analysis, extraction, mining and transformation of named entities in monolingual or multilingual natural language text. The workshop scope includes many interesting specific research areas pertaining to named entities, such as orthographic and phonetic characteristics, corpus analysis, unsupervised and supervised named entities extraction in monolingual or multilingual corpora, transliteration modelling, and evaluation methodologies, to name a few. For this year's edition, 11 research papers were submitted, each of which was reviewed by at least 3 reviewers from the program committee. 7 papers were chosen for publication, covering main research areas, from named entities recognition, extraction and categorization, to distributional characteristics of named entities, and finally a novel evaluation metric for co-reference resolution. All accepted research papers are published in the workshop proceedings. This year, as part of the NEWS workshop, we organized two shared tasks: one on Machine Transliteration Generation, and another on Machine Transliteration Mining, participated in by research teams from around the world, including industry, government laboratories and academia. The transliteration generation task was introduced in NEWS 2009. While the focus of the 2009 shared task was on establishing the quality metrics and on baselining the transliteration quality based on those metrics, the 2010 shared task expanded the scope of the transliteration generation task to about a dozen languages, and explored the quality depending on the direction of transliteration between the languages. We collected significantly large, hand-crafted parallel named entities corpora in a dozen different languages from 8 language families, and made them available as a common dataset for the shared task. We published the details of the shared task and the training and development data six months ahead of the conference, which attracted an overwhelming response from the research community. In total, 7 teams participated in the transliteration generation task. The approaches ranged from traditional unsupervised learning methods (such as Phrasal SMT-based, Conditional Random Fields, etc.) to somewhat unique approaches (such as the DirectTL approach), combined with several model combinations for results re-ranking. A report of the shared task that summarizes all submissions and the original whitepaper are also included in the proceedings, and will be presented in the workshop.
The participants in the shared task were asked to submit short system papers (4 pages each) describing their approach, and each of these papers was reviewed by at least two members of the program committee to help improve the quality of the content and presentation of the papers. 6 of them were finally accepted to be published in the workshop proceedings (one participating team did not submit their system paper in time). NEWS 2010 also featured a second shared task this year, on Transliteration Mining; in this shared task we focus specifically on mining transliterations from a commonly available resource, Wikipedia titles. The objective of this shared task is to identify transliterations from linked Wikipedia titles between English and another language in a Non-Latin script. 5 teams participated in the mining task, each participating in multiple languages. The shared task was conducted in 5 language pairs, and the paired Wikipedia titles between English and each of the languages were provided to the participants. The participating systems' output was measured using three specific metrics. All the results are reported in the shared task report. We hope that NEWS 2010 would provide an exciting and productive forum for researchers working in this research area. The technical programme includes 7 research papers and 9 system papers (3 as oral papers, and 6 as poster papers) to be presented in the workshop. Further, we are pleased to have Dr Dan Roth, Professor at the University of Illinois and The Beckman Institute, delivering the keynote speech at the workshop.
Li, Yun; Tian, Fang; Ren, F.; Kuroiwa, S. & Zhong, Yixin A method of semantic dictionary construction from online encyclopedia classifications 2007 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE '07), 30 Aug.-1 Sept. 2007 Piscataway, NJ, USA 2007
This paper introduces a method of constructing a semantic dictionary automatically from the keywords and classification relations of the web encyclopedia Chinese Wikipedia. Semantic units, which are affixes (core/modifier) shared between many phrased keywords, are selected using a statistical method and string affix matching, together with other units to explain the semantic meanings. The results are then used to mark the semantic explanations for most Wikipedia keywords by analyzing surface text or upper classes. The form and structure of the features, as well as advantages compared to other semantic resources, are also discussed.
Liao, Ching-Jung & Sun, Cheng-Chieh A RIA-Based Collaborative Learning System for E-Learning 2.0 World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [626]
Liao, Ching-Jung & Yang, Jin-Tan The Development of a Pervasive Collaborative LMS 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [627]
Lim, Keol & Park, So Youn An Exploratory Approach to Understanding the Purposes of Computer and Internet Use in Web 2.0 Trends World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [628]
Lim, Ee-Peng; Vuong, Ba-Quy; Lauw, Hady Wirawan & Sun, Aixin Measuring Qualities of Articles Contributed by Online Communities Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence 2006 [629]
Using open source Web editing software (e.g., wiki), online community users can now easily edit, review and publish articles collaboratively. While much useful knowledge can be derived from these articles, content users and critics are often concerned about their qualities. In this paper, we develop two models, namely the basic model and the peer review model, for measuring the qualities of these articles and the authorities of their contributors. We represent collaboratively edited articles and their contributors in a bipartite graph. While the basic model measures an article's quality using both the authorities of contributors and the amount of contribution from each contributor, the peer review model extends the former by considering the review aspect of article content. We present results of experiments conducted on some Wikipedia pages and their contributors. Our results show that the two models can effectively determine the articles' qualities and contributors' authorities using the collaborative nature of online communities.
Lin, Hong & Kelsey, Kathleen Do Traditional and Online Learning Environments Impact Collaborative Learning with Wiki? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [630]
Lin, S. College students' perceptions, motivations and uses of Wikipedia Proceedings of the American Society for Information Science and Technology 2008 [631]
Lin, Chun-Yi Integrating wikis to support collaborative learning in higher education: A design-based approach to developing the instructional theory World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [632]
Lin, Chun-Yi & Lee, Hyunkyung Adult Learners' Motivations in the Use of Wikis: Wikipedia, Higher Education, and Corporate Settings World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [633]
Lin, Chun-Yi; Lee, Lena & Bonk, Curtis Teaching Innovations on Wikis: Practices and Perspectives of Early Childhood and Elementary School Teachers World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [634]
Lin, Meng-Fen Grace; Sajjapanroj, Suthiporn & Bonk, Curtis Wikibooks and Wikibookians: Loosely-Coupled Community or the Future of the Textbook Industry? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [635]
Lindroth, Tomas & Lundin, Johan Students with laptops – the laptop as portfolio Society for Information Technology \& Teacher Education International Conference 2010 [636]
Linser, Roni; Ip, Albert; Rosser, Elizabeth & Leigh, Elyssebeth On-line Games, Simulations \& Role-plays as Learning Environments: Boundary and Role Characteristics World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [637]
Lisk, Randy & Brown, Victoria Digital Paper: The Possibilities Society for Information Technology \& Teacher Education International Conference 2009 [638]
Liu, Leping & Maddux, Cleborne Online Publishing: A New Online Journal on “Social Media in Education” Society for Information Technology \& Teacher Education International Conference 2009 [639]
Liu, Min; Hamilton, Kurstin & Wivagg, Jennifer Facilitating Pre-Service Teachers’ Understanding of Technology Use With Instructional Activities Society for Information Technology \& Teacher Education International Conference 2010 [640]
Liu, Sandra Shu-Chao & Lin, Elaine Mei-Ying Using the Internet in Developing Taiwanese Students' English Writing Abilities World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [641]
Liu, Xiongyi; Li, Lan & Vonderwell, Selma Digital Ink-Based Engaged Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [642]
Liu, X.; Qin, J.; Chen, M. & Park, J.-H. Automatic semantic mapping between query terms and controlled vocabulary through using WordNet and Wikipedia Proceedings of the American Society for Information Science and Technology 2008 [643]
Livingston, Michael; Strickland, Jane & Moulton, Shane Decolonizing Indigenous Web Sites World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [644]
Livne, Nava; Livne, Oren & Wight, Charles Automated Error Analysis through Parsing Mathematical Expressions in Adaptive Online Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [645]
Llorente, A.; Motta, E. & Ruger, S. Exploring the Semantics behind a Collection to Improve Automated Image Annotation Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [646]
The goal of this research is to explore several semantic relatedness measures that help to refine annotations generated by a baseline non-parametric density estimation algorithm. Thus, we analyse the benefits of performing a statistical correlation using the training set or using the World Wide Web versus approaches based on a thesaurus like WordNet or Wikipedia (considered as a hyperlink structure). Experiments are carried out using the dataset provided by the 2009 edition of the ImageCLEF competition, a subset of the MIR-Flickr 25k collection. Best results correspond to the approaches based on statistical correlation, as they, unlike the WordNet and Wikipedia approaches, do not depend on a prior disambiguation phase. Further work needs to be done to assess whether proper disambiguation schemas might improve their performance.
Lopes, António; Pires, Bruno; Cardoso, Márcio; Santos, Arnaldo; Peixinho, Filipe; Sequeira, Pedro & Morgado, Leonel System for Defining and Reproducing Handball Strategies in Second Life On-Demand for Handball Coaches’ Education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [647]
Lopes, Rui & Carriço, Luis On the credibility of wikipedia: an accessibility perspective Proceeding of the 2nd ACM workshop on Information credibility on the web 2008 [648]
User interfaces play a critical role in the credibility of authoritative information sources on the Web. Citation and referencing mechanisms often provide the required support for the independent verifiability of facts and, consequently, influence the credibility of the conveyed information. Since the quality level of these references has to be verifiable by users without any barriers, user interfaces cannot pose problems for accessing information. This paper presents a study about the influence of the accessibility of user interfaces on the credibility of Wikipedia articles. We have analysed the accessibility quality level of the articles and the external Web pages used as authoritative references. This study has shown that there is a discrepancy in the accessibility of referenced Web pages, which can compromise the overall credibility of Wikipedia. Based on these results, we have analysed the article referencing lifecycle (technologies and policies) and propose a set of improvements that can help increase the accessibility of references within Wikipedia articles.
Lopes, Rui & Carriço, Luís The impact of accessibility assessment in macro scale universal usability studies of the web Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A) 2008 [649]
This paper presents a modelling framework, Web Interaction Environments, to express the synergies and differences of audiences, in order to study universal usability of the Web. Based on this framework, we have expressed the implicit model of WCAG and developed an experimental study to assess the Web accessibility quality of Wikipedia at a macro scale. This resulted in the finding that template mechanisms such as those provided by Wikipedia lower the burden of producing accessible contents, but provide no guarantee that hyperlinking to external websites maintains accessibility quality. We discuss the black-boxed nature of guidelines such as WCAG and how formalising audiences helps leverage universal usability studies of the Web at macro scales.
Lopez, Patrice & Romary, Laurent HUMB: Automatic key term extraction from scientific articles in GROBID Proceedings of the 5th International Workshop on Semantic Evaluation 2010 [650]
The Semeval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
Lops, P.; Basile, P.; de Gemmis, M. & Semeraro, G. Language Is the Skin of My Thought: Integrating Wikipedia and AI to Support a Guillotine Player AI*IA 2009: Emergent Perspectives in Artificial Intelligence. XIth International Conference of the Italian Association for Artificial Intelligence, 9-12 Dec. 2009 Berlin, Germany 2009 [651]
This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game, called Guillotine, which demands knowledge covering a broad range of topics, such as movies, politics, literature, history, proverbs, and popular culture. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system exploits several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia to realize a knowledge infusion process. The paper describes the process of modeling these sources and the reasoning mechanism to find the solution of the game. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Experiments carried out showed promising results. Our feeling is that the presented approach has a great potential for other more practical applications besides solving a language game.
Lotzmann, U. Enhancing agents with normative capabilities 24th European Conference on Modelling and Simulation, ECMS 2010, 1-4 June 2010 Nottingham, UK 2010
This paper describes the derivation of a software architecture (and its implementation, called EMIL-S) from a logical normative agent architecture (called EMIL-A). After a short introduction into the theoretical background of agent-based normative social simulation, the paper focuses on intra-agent structures and processes. The pivotal element in this regard is a rule-based agent design with a corresponding "generalised intra-agent process" that involves decision making and learning capabilities. The resulting simulation dynamics are illustrated afterwards by means of an application sample where agents contribute to a Wikipedia community by writing, editing and discussing articles. Findings and material presented in the paper are part of the results achieved in the FP6 project EMIL (EMergence In the Loop: Simulating the two-way dynamics of norm innovation).
Louis, Ellyn St; McCauley, Pete; Breuch, Tyler; Hatten, Jim & Louis, Ellyn St Artscura: Experiencing Art Through Art World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [652]
Lowerison, Gretchen & Schmid, Richard F Pedagogical Implications of Using Learner-Controlled, Web-based Tools for Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [653]
Lu, Jianguo; Wang, Yan; Liang, Jie; Chen, Jessica & Liu, Jiming An Approach to Deep Web Crawling by Sampling Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2008 [654]
Crawling the deep web is the process of collecting data from search interfaces by issuing queries. With the wide availability of programmable interfaces encoded in web services, deep web crawling has received a large variety of applications. One of the major challenges in crawling the deep web is the selection of the queries so that most of the data can be retrieved at a low cost. We propose a general method in this regard. In order to minimize the duplicates retrieved, we reduce the problem of selecting an optimal set of queries from a sample of the data source to the well-known set-covering problem and adopt a classical algorithm to resolve it. To verify that the queries selected from a sample also produce a good result for the entire data source, we carried out a set of experiments on large corpora including Wikipedia and Reuters. We show that our sampling-based method is effective by empirically proving that 1) the queries selected from samples can harvest most of the data in the original database; 2) the queries with a low overlapping rate in samples will also result in a low overlapping rate in the original database; and 3) the size of the sample and the size of the terms from which to select the queries do not need to be very large.
Lu, Laura Digital Divide: Does the Internet Speak Your Language? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [655]
Lucassen, Teun & Schraagen, Jan Maarten Trust in wikipedia: how users trust information from an unknown source Proceedings of the 4th workshop on Information credibility 2010 [656]
The use of Wikipedia as an information source is becoming increasingly popular. Several studies have shown that its information quality is high. Normally, when considering information trust, the source of information is an important factor. However, because of the open-source nature of Wikipedia articles, their sources remain mostly unknown. This means that other features need to be used to assess the trustworthiness of the articles. We describe article features - such as images and references - which lay Wikipedia readers use to estimate trustworthiness. The quality and the topics of the articles are manipulated in an experiment to reproduce the varying quality on Wikipedia and the familiarity of the readers with the topics. We show that the three most important features are textual features, references and images.
Lund, Andreas & Rasmussen, Ingvill Tasks 2.0: Education Meets Social Computing and Mass Collaboration Society for Information Technology \& Teacher Education International Conference 2010 [657]
Luther, Kurt Supporting and transforming leadership in online creative collaboration Proceedings of the ACM 2009 international conference on Supporting group work 2009 [658]
Behind every successful online creative collaboration, from Wikipedia to Linux, is at least one effective project leader. Yet, we know little about what such leaders do and how technology supports or inhibits their work. My thesis investigates leadership in online creative collaboration, focusing on the novel context of animated movie-making. I first conducted an empirical study of existing leadership practices in this context. I am now designing a Web-based collaborative system, Sandbox, to understand the impact of technological support for centralized versus decentralized leadership in this context. My expected contributions include a comparative investigation of the effects of different types of leadership on online creative collaboration, and a set of empirically validated design principles for supporting leadership in online creative collaboration.
Luyt, B.; Kwek, Wee Tin; Sim, Ju Wei & York, Peng Evaluating the comprehensiveness of wikipedia: the case of biochemistry Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. 10th International Conference on Asian Digital Libraries, ICADL 2007, 10-13 Dec. 2007 Berlin, Germany 2007
In recent years, the world of encyclopedia publishing has been challenged as new collaborative models of online information gathering and sharing have developed. Most notable of these is Wikipedia. Although Wikipedia has a core group of devotees, it has also attracted critical comment and concern, most notably in regard to its quality. In this article we compare the scope of Wikipedia and Encyclopedia Britannica in the subject of biochemistry using a popular first year undergraduate textbook as a benchmark for concepts that should appear in both works, if they are to be considered comprehensive in scope.
Lykourentzou, Ioanna; Vergados, Dimitrios J. & Loumos, Vassili Collective intelligence system engineering Proceedings of the International Conference on Management of Emergent Digital EcoSystems 2009 [659]
Collective intelligence (CI) is an emerging research field which aims at combining human and machine intelligence, to improve community processes usually performed by large groups. CI systems may be collaborative, like Wikipedia, or competitive, like a number of recently established problem-solving companies that attempt to find solutions to difficult R&D or marketing problems drawing on the competition among web users. The benefits that CI systems bring to user communities, combined with the fact that they share a number of basic common characteristics, open up the prospect for the design of a general methodology that will allow the efficient development and evaluation of CI. In the present work, an attempt is made to establish the analytical foundations and main challenges for the design and construction of a generic collective intelligence system. First, collective intelligence systems are categorized into active and passive and specific examples of each category are provided. Then, the basic modeling framework of CI systems is described. This includes concepts such as the set of possible user actions, the CI system state and the individual and community objectives. Additional functions, which estimate the expected user actions, the future state of the system, as well as the level of objective fulfillment, are also established. In addition, certain key issues that need to be considered prior to system launch are also described. The proposed framework is expected to promote efficient CI design, so that the benefit gained by the community and the individuals through the use of CI systems will be maximized.
Mach, Nada Gaming, Learning 2.0, and the Digital Divide World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [660]
Mach, Nada Reorganizing Schools to Engage Learners through Using Learning 2.0 World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [661]
Mach, Nada & Bhattacharya, Madhumita Social Learning Versus Individualized Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [662]
MacKenzie, Kathleen Distance Education Policy: A Study of the SREB Faculty Support Policy Construct at Four Virtual College and University Consortia. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [663]
Maddux, Cleborne; Johnson, Lamont & Ewing-Taylor, Jacque An Annotated Bibliography of Outstanding Educational Technology Sites on the Web: A Study of Usefulness and Design Quality Society for Information Technology \& Teacher Education International Conference 2006 [664]
Mader, Elke; Budka, Philipp; Anderl, Elisabeth; Stockinger, Johann & Halbmayer, Ernst Blended Learning Strategies for Methodology Education in an Austrian Social Science Setting World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [665]
Malik, Manish Work In Progress: Use of Social Software for Final Year Project Supervision at a Campus Based University World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [666]
Malyn-Smith, Joyce; Coulter, Bob; Denner, Jill; Lee, Irene; Stiles, Joel & Werner, Linda Computational Thinking in K-12: Defining the Space Society for Information Technology \& Teacher Education International Conference 2010 [667]
Manfra, Meghan; Friedman, Adam; Hammond, Thomas & Lee, John Peering behind the curtain: Digital history, historiography, and secondary social studies methods Society for Information Technology \& Teacher Education International Conference 2009 [668]
Marenzi, Ivana; Demidova, Elena & Nejdl, Wolfgang LearnWeb 2.0 - Integrating Social Software for Lifelong Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [669]
Margaryan, Anoush; Nicol, David; Littlejohn, Allison & Trinder, Kathryn Students’ use of technologies to support formal and informal learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [670]
Martin, Philippe; Eboueya, Michel; Blumenstein, Michael & Deer, Peter A Network of Semantically Structured Wikipedia to Bind Information World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [671]
Martin, Sylvia S. & Crawford, Caroline M. Special Education Methods Coursework: Information Literacy for Teachers through the Implementation of Graphic Novels Society for Information Technology \& Teacher Education International Conference 2007 [672]
Martinez-Cruz, C. & Angeletou, S. Folksonomy expansion process using soft techniques 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010), 10-12 Aug. 2010 Piscataway, NJ, USA 2010 [673]
The use of folksonomies involves several problems due to the lack of semantics associated with them. The nature of these structures makes it difficult to enrich them semantically by associating meaningful terms from the Semantic Web. This task implies a phase of disambiguation and another of expansion of the initial tagset, returning an enlarged, contextualised set that includes synonyms, hyperonyms, gloss terms, etc. In this novel proposal a technique based on confidence and similarity degrees is applied to weight this extended tagset in order to allow the user to obtain a customised resulting tagset. Moreover, a comparison between the two main thesauri, WordNet and Wikipedia, is presented due to their great influence on the disambiguation and expansion process.
Martland, David The Development of Web/Learning Communities: Is Technology the Way Forward? Society for Information Technology \& Teacher Education International Conference 2004 [674]
Martland, David E-learning: What communication tools does it require? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2003 [675]
Mass, Y. IBM HRL at INEX 06 Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007
In previous INEX years we presented an XML component ranking algorithm that was based on separation of nested XML elements into different indices. This worked fine for the IEEE collection, which has a small number of potential component types that can be returned as query results. However, such an assumption doesn't scale to this year's Wikipedia collection, where there is a large set of potential component types that can be returned. We show a new version of the component ranking algorithm that does not assume any knowledge of the set of component types. We then show some preliminary work we did to exploit the connectivity of the Wikipedia collection to improve ranking.
Matsuno, Ryoji; Tsutsumi, Yutaka; Matsuo, Kanako & Gilbert, Richard MiWIT: Integrated ESL/EFL Text Analysis Tools for Content Creation in MSWord World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [676]
Matthew, Kathryn & Callaway, Rebecca Wiki as a Collaborative Learning Tool World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [677]
Matthew, Kathryn; Callaway, Rebecca; Matthew, Christie & Matthew, Josh Online Solitude: A Lack of Student Interaction World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [678]
Matthew, Kathryn; Felvegi, Emese & Callaway, Rebecca Collaborative Learning Using a Wiki Society for Information Technology \& Teacher Education International Conference 2009 [679]
Maurer, Hermann & Kulathuramaiyer, Narayanan Coping With the Copy-Paste-Syndrome World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [680]
Maurer, Hermann & Safran, Christian Beyond Wikipedia World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [681]
Maurer, Hermann & Schinagl, Wolfgang E-Quiz - A Simple Tool to Enhance Intra-Organisational Knowledge Management eLearning and Edutainment Training World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [682]
Maurer, Hermann & Schinagl, Wolfgang Wikis and other E-communities are Changing the Web World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [683]
Maurer, Hermann & Zaka, Bilal Plagiarism - A Problem And How To Fight It World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [684]
McCulloch, Allison & Smith, Ryan The Nature of Students’ Collaboration in the Creation of a Wiki Society for Information Technology \& Teacher Education International Conference 2009 [685]
McCulloch, Allison; Smith, Ryan; Wilson, P. Holt; McCammon, Lodge; Stein, Catherine & Arias, Cecilia Creating Asynchronous Learning Communities in Mathematics Teacher Education, Part 2 Society for Information Technology \& Teacher Education International Conference 2009 [686]
McDonald, Roger Using the Secure Wiki for Teaching Scientific Collaborative World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [687]
McGee, Patricia; Carmean, Colleen; Rauch, Ulrich; Noakes, Nick & Lomas, Cyprien Learning in a Virtual World, Part 2 World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [688]
McKay, Sean Wiki as CMS Society for Information Technology & Teacher Education International Conference 2005 [689]
McKay, Sean & Headley, Scot Best Practices for the Use of Wikis in Teacher Education Programs Society for Information Technology & Teacher Education International Conference 2007 [690]
McLoughlin, Catherine & Lee, Mark J.W. Listen and learn: A systematic review of the evidence that podcasting supports learning in higher education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [691]
McNeil, Sara; White, Cameron; Angela, Miller & Behling, Debbie Emerging Web 2.0 Technologies to Enhance Teaching and Learning in American History Classrooms Society for Information Technology & Teacher Education International Conference 2009 [692]
Mehdad, Yashar; Moschitti, Alessandro & Zanzotto, Fabio Massimo Syntactic/semantic structures for textual entailment recognition HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [693]
In this paper, we describe an approach based on off-the-shelf parsers and semantic resources for the Recognizing Textual Entailment (RTE) challenge that can be generally applied to any domain. Syntax is exploited by means of tree kernels whereas lexical semantics is derived from heterogeneous resources, e.g. WordNet or distributional semantics through Wikipedia. The joint syntactic/semantic model is realized by means of tree kernels, which can exploit lexical relatedness to match syntactically similar structures, i.e. whose lexical compounds are related. The comparative experiments across different RTE challenges and traditional systems show that our approach consistently and meaningfully achieves high accuracy, without requiring any adaptation or tuning.
Meijer, Erik Fooled by expediency, saved by duality: how I denied the fallacies of distributed programming and trivialized the CAP theorem, but found the truth in math Proceedings of the 2010 Workshop on Analysis and Programming Languages for Web Applications and Cloud Applications 2010 [694]
Serendipitously, I recently picked up a copy of the book "Leadership and Self-Deception" at my local thrift store. According to Wikipedia, "self-deception is a process of denying or rationalizing away the relevance, significance, or importance of opposing evidence and logical argument." While reading the book it occurred to me that I unknowingly minimized and repressed the fallacies of distributed programming and the CAP theorem in my own work on "Democratizing the Cloud". Instead of redirecting the forces that distribution imposes on the design of systems to create the simplest possible and correct solution, I foolishly tried to attack them directly, thereby making things more difficult than necessary and, of course, ultimately failing. Fortunately, I found redemption for my sins by turning to math. By judicious use of categorical duality, literally reversing the arrows, we show how scalable is the dual of non-scalable. The result is a scalable and compositional approach to building distributed systems that we believe is so simple it can be applied by the average developer.
Memmel, Martin; Wolpers, Martin & Tomadaki, Eleftheria An Approach to Enable Collective Intelligence in Digital Repositories World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [695]
Meza, R. & Buchmann, R.A. Real-time Social Networking Profile Information Semantization Using Pipes And FCA 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR 2010), 28-30 May 2010 Piscataway, NJ, USA 2010 [696]
This paper describes a convenient method for processing and contextualizing, in real time, information extracted from users of social networking systems such as Myspace, Facebook or Hi5, using the Yahoo! Pipes feed mash-up service and formal concept analysis. Interests referring to media consumption (favorite movies, favorite music, favorite books or role models) declared by users can be expanded into machine-readable semantic information in conjunction with databases available on the World Wide Web (imdb.com, allmusic.com, amazon.com or wikipedia.org).
Millard, Mark & Essex, Christopher Web 2.0 Technologies for Social and Collaborative E-Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [697]
Milne, David; Medelyan, Olena & Witten, Ian H. Mining Domain-Specific Thesauri from Wikipedia: A Case Study Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence 2006 [698]
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly-constructed manual counterparts.
Min, Jinming; Wilkins, P.; Leveling, J. & Jones, G.J.F. Document Expansion for Text-based Image Retrieval at CLEF 2009 Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010
In this paper, we describe and analyze our participation in the WikipediaMM task at CLEF 2009. Our main efforts concern the expansion of the image metadata from DBpedia, the Wikipedia abstracts collection. In our experiments, we use the Okapi feedback algorithm for document expansion. Compared with our text retrieval baseline, our best document expansion run improves MAP by 17.89%. As one of our conclusions, document expansion from an external resource can be an effective factor in the image metadata retrieval task.
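As a rough, hedged sketch of the general document-expansion idea (not the authors' exact Okapi feedback weighting), one could append the highest-weighted terms from retrieved DBpedia abstracts to an image's metadata. The retrieve_abstracts hook, the doc_freq counts and the tf-idf stand-in weighting below are hypothetical.

    import math
    import re
    from collections import Counter

    def top_expansion_terms(feedback_docs, n_docs_total, doc_freq, k=10):
        # Score candidate terms from the feedback documents with a simple
        # tf-idf weight (a stand-in for the Okapi feedback weighting).
        tf = Counter()
        for doc in feedback_docs:
            tf.update(re.findall(r"\w+", doc.lower()))
        scores = {t: f * math.log(n_docs_total / (1 + doc_freq.get(t, 0)))
                  for t, f in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [t for t, _ in ranked[:k]]

    def expand_metadata(metadata, retrieve_abstracts, n_docs_total, doc_freq):
        # retrieve_abstracts(metadata) is a hypothetical hook returning the
        # top-ranked DBpedia abstracts for the original image metadata.
        feedback = retrieve_abstracts(metadata)
        extra = top_expansion_terms(feedback, n_docs_total, doc_freq)
        return metadata + " " + " ".join(extra)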
Missen, Malik Muhammad Saad; Boughanem, Mohand & Cabanac, Guillaume Using passage-based language model for opinion detection in blogs Proceedings of the 2010 ACM Symposium on Applied Computing 2010 [699]
In this work, we evaluate the importance of passages in blogs, especially for the task of opinion detection. We argue that passages are the basic building blocks of blogs and therefore use a passage-based language modeling approach for opinion finding in blogs. Our decision to use language modeling (LM) in this work is based on the performance LM has shown in various opinion detection approaches. In addition, we propose a novel method for bi-dimensional query expansion with relevant and opinionated terms, using Wikipedia and a relevance-feedback mechanism respectively. We also compare the impact of two different query term weighting (and ranking) approaches on the final results, as well as the performance of three passage-based document ranking functions (Linear, Avg, Max). For evaluation purposes, we use the TREC Blog06 collection with 50 topics from TREC 2006 over the best TREC-provided baseline, which has an opinion-finding MAP of 0.3022. Our approach gives a MAP improvement of almost 9.29% over this baseline (baseline4).
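As a hedged illustration of passage-based language-model scoring with the three aggregation functions named above (Linear, Avg, Max), the sketch below scores each passage with a Dirichlet-smoothed query likelihood and then aggregates; the passage segmentation, the collection_prob statistics and the exact form of the paper's Linear function are assumptions.

    import math
    import re
    from collections import Counter

    def passage_score(query_terms, passage, collection_prob, mu=1000):
        # Dirichlet-smoothed query log-likelihood of one passage.
        tf = Counter(re.findall(r"\w+", passage.lower()))
        plen = sum(tf.values())
        score = 0.0
        for q in query_terms:
            p = (tf[q] + mu * collection_prob.get(q, 1e-8)) / (plen + mu)
            score += math.log(p)
        return score

    def document_score(query_terms, passages, collection_prob, mode="max", alpha=0.5):
        # Aggregate passage scores into a document score (Max, Avg, or a
        # linear mix of the two; the paper's Linear function may differ).
        scores = [passage_score(query_terms, p, collection_prob) for p in passages]
        if mode == "max":
            return max(scores)
        if mode == "avg":
            return sum(scores) / len(scores)
        return alpha * max(scores) + (1 - alpha) * (sum(scores) / len(scores))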
Mitsuhara, Hiroyuki; Kanenishi, Kazuhide & Yano, Yoneo Learning Process Sharing for Educational Modification of the Web World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [700]
Moan, Michael Special student contest on a collective intelligence challenge problem Proceedings of the 2009 conference on American Control Conference 2009 [701]
Students and observers are cordially invited to join a Student Special Session on Thursday afternoon concerning a "Collective Intelligence Challenge Problem" (snacks and coffee provided). According to Wikipedia, "collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals." Sign up at the registration desk beginning on Tuesday for speaking time slots (5 minutes) and the option to present two PowerPoint slides in a speed session focused on using collective intelligence to understand possible research areas for collaborative control system engineering. For instance, come give us your thoughts on how we can better organize and disseminate controls knowledge, control algorithm objects, and control system building blocks within the open cyber world. How do we solve the problem that control system engineers within industry are overwhelmed by the amount of controls-related information available through cyber discovery, when even a simple search on "control system" gives over 1 billion hits? As control system engineers, how should we organize the knowledge within our area of engineering to facilitate expedient development of control systems in an increasingly systems-of-systems world? Please feel free to share this invitation with your colleagues. There is no fee or peer review for this session, and special session participants will receive a token of appreciation for participating. Registration will be accepted on a first-in, first-served basis until all the available time slots are taken. To register, please stop by the registration desk on Tuesday or Wednesday.
Moran, John Mashups - the Web's Collages Society for Information Technology & Teacher Education International Conference 2008 [702]
Morneau, Maxime & Mineau, Guy W. Employing a Domain Specific Ontology to Perform Semantic Search Proceedings of the 16th international conference on Conceptual Structures: Knowledge Visualization and Reasoning 2008 [703]
Increasing the relevancy of Web search results has been a major concern in research over recent years. Boolean search, metadata, natural language based processing and various other techniques have been applied to improve the quality of search results sent to a user. Ontology-based methods were proposed to refine the information extraction process, but they have not yet achieved wide adoption by search engines. This is mainly due to the fact that the ontology building process is time consuming. An all-inclusive ontology for the entire World Wide Web might be difficult, if not impossible, to construct, but a specific domain ontology can be automatically built using statistical and machine learning techniques, as done with our tool, SeseiOnto. In this paper, we describe how we adapted the SeseiOnto software to perform Web search on the Wikipedia page on climate change. SeseiOnto, by using conceptual graphs to represent natural language and an ontology to extract links between concepts, manages to properly answer natural language queries about climate change. Our tests show that SeseiOnto has the potential to be used in domain-specific Web search as well as in corporate intranets.
Moseley, Warren; Campbell, Brian & Campbell, Melaine OK-RMSP-2006-COP : The Oklahoma Rural Math and Science Partnership’s Community of Practice World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [704]
Moseley, Warren; Campbell, Brian; Thomason, Matt & Mengers, Jessica SMART-COP - Legitimate Peripheral Participation in the Science Math Association of Rural Teacher’s Community of Practice World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [705]
Moseley, Warren; Campbell, Brian; Thompson, Matt & Mengers, Jessica A Sense of Urgency: Linking the Tom P. Stafford Air and Space Museum to the Science and Math Association of Rural Teacher’s Community Of Practice (SMART-COP) World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [706]
Moseley, Warren & Raoufi, Mehdi ROCCA: The Rural Oklahoma Collaborative Computing Alliance Society for Information Technology & Teacher Education International Conference 2006 [707]
Moshirnia, Andrew Am I Still Wiki? The Creeping Centralization of Academic Wikis World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [708]
Moshirnia, Andrew The Educational Implications of Synchronous and Asynchronous Peer-Tutoring in Video Games World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [709]
Moshirnia, Andrew Emergent Features and Reciprocal Innovation in Modding Communities World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [710]
Moshirnia, Andrew What do I Press? The Limited role of Collaborative Websites in Teacher Preparation Programs. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [711]
Moshirnia, Andrew & Israel, Maya The Use of Graphic Organizers within E-mentoring Wikis Society for Information Technology & Teacher Education International Conference 2008 [712]
Motschnig, Renate & Figl, Kathrin The Effects of Person Centered Education on Communication and Community Building World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [713]
Moulin, C.; Barat, C.; Lemaitre, C.; Gery, M.; Ducottet, C. & Largeron, C. Combining Text/Image In WikipediaMM Task 2009 Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [714]
This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF Wikipedia task 2009. We extend our previous multimedia model, defined as a vector of textual and visual information based on a bag-of-words approach. We extract additional textual information from the original Wikipedia articles and we compute several image descriptors (local colour and texture features). We show that linearly combining textual and visual information significantly improves the results.
Muller-Birn, Claudia; Lehmann, Janette & Jeschke, Sabina A Composite Calculation for Author Activity in Wikis: Accuracy Needed Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2009 [715]
Researchers in computer science and social science are increasingly interested in the Social Web and its applications. To improve existing infrastructures, to evaluate the success of available services, and to build new virtual communities and their applications, an understanding of the dynamics and evolution of the inherent social and informational structures is essential. One key question is how the communities that exist in these applications are structured in terms of author contributions. Are there similar contribution patterns in different applications? For example, does the so-called onion model revealed in open source software communities apply to Social Web applications as well? In this study, author contributions in the open content project Wikipedia are investigated. Previous studies evaluating author contributions mainly concentrate on editing activities. Extending this approach, we consider the significant content added and investigate which author groups contribute the majority of content in terms of activity and significance. Furthermore, the social information space is described by a dynamic collaboration network and the topic coverage of authors is analyzed. In contrast to existing approaches, the position of an author in the social network is incorporated. Finally, a new composite calculation to evaluate author contributions in wikis is proposed. The actions, the content contribution, and the connectedness of an author are integrated into one equation in order to evaluate author activity.
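The abstract does not state the equation itself; purely as a hypothetical illustration of a composite of the three named factors, a normalized weighted combination might look like the following (the component values and weights are placeholders, not the authors' formula).

    def composite_author_score(edit_activity, content_significance, centrality,
                               w_action=1.0, w_content=1.0, w_connect=1.0):
        # Hypothetical weighted combination of the three factors the abstract
        # names (editing actions, significance of contributed content, and the
        # author's connectedness in the collaboration network), each assumed
        # to be pre-normalized to [0, 1].
        total = w_action + w_content + w_connect
        return (w_action * edit_activity
                + w_content * content_significance
                + w_connect * centrality) / total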
Murakami, Violet The Learning Community Class combining an Introduction to Digital Art class with a Hawaiian Studies Native Plants and Their Uses Class: A Case Study World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [716]
Murugeshan, M.S. & Mukherjee, S. An n-gram and initial description based approach for entity ranking track Focused Access to XML Documents. 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, 17-19 Dec. 2007 Berlin, Germany 2008 [717]
The most important work that takes center stage in the entity ranking track of INEX is proper query formation. Both subtasks, namely entity ranking and list completion, would benefit immensely if the given query could be expanded with more relevant terms, thereby improving the efficiency of the search engine. This paper stresses the correct identification of "meaningful n-grams" from the given title and the proper selection of the "prominent n-grams" among them as the most important task for improving query formation and hence the effectiveness of the overall entity ranking tasks. We also exploit the initial descriptions (IDES) of the Wikipedia articles for ranking the retrieved answers based on their similarities with the given topic. The list completion task is further aided by related Wikipedia articles that boost the score of retrieved answers.
Myoupo, D.; Popescu, A.; Borgne, H. Le & Moellic, P.-A. Multimodal image retrieval over a large database Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [718]
We introduce a new multimodal retrieval technique which combines query reformulation and visual image reranking in order to deal with results sparsity and imprecision, respectively. Textual queries are reformulated using Wikipedia knowledge and results are then reordered using a K-NN based reranking method. We compare textual and multimodal retrieval and show that introducing visual reranking results in a significant improvement of performance.
Mödritscher, Felix; Garcia-Barrios, Victor Manuel; Gütl, Christian & Helic, Denis The first AdeLE Prototype at a Glance World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [719]
Mödritscher, Felix; Garcia-Barrios, Victor Manuel & Maurer, Hermann The Use of a Dynamic Background Library within the Scope of adaptive e-Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [720]
Möller, Manuel; Regel, Sven & Sintek, Michael RadSem: Semantic Annotation and Retrieval for Medical Images Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications 2009 [721]
We present a tool for semantic medical image annotation and retrieval. It leverages the MEDICO ontology, which covers formal background information from various biomedical ontologies such as the Foundational Model of Anatomy (FMA) and terminologies like ICD-10 and RadLex, and covers various aspects of clinical procedures. This ontology is used during several steps of annotation and retrieval: (1) We developed an ontology-driven metadata extractor for the medical image format DICOM. Its output contains, e.g., person name, age, image acquisition parameters, body region, etc. (2) The output from (1) is used to simplify the manual annotation by providing intuitive visualizations and a preselected subset of annotation concepts. Furthermore, the extracted metadata is linked together with anatomical annotations and clinical findings to generate a unified view of a patient's medical history. (3) On the search side we perform query expansion based on the structure of the medical ontologies. (4) Our ontology for clinical data management allows us to link and combine patients, medical images and annotations together in a comprehensive result list. (5) The medical annotations are further extended by links to external sources like Wikipedia to provide additional information.
Nabende, Peter Mining transliterations from Wikipedia using pair HMMs Proceedings of the 2010 Named Entities Workshop 2010 [722]
This paper describes the use of a pair Hidden Markov Model (pair HMM) system in mining transliteration pairs from noisy Wikipedia data. A pair HMM variant that uses nine transition parameters, and emission parameters associated with single-character mappings between source and target language alphabets, is identified and used in estimating transliteration similarity. The system resulted in a precision of 78% and a recall of 83% when evaluated on a random selection of English-Russian Wikipedia topics.
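A much simplified, hedged sketch of pair-HMM transliteration scoring (a forward computation over one match state and two gap states) is shown below; the nine-transition parameterization and trained emission tables of the paper are replaced by caller-supplied dictionaries, so this illustrates only the general technique.

    def pair_hmm_similarity(src, tgt, p_match, p_gap_x, p_gap_y, trans):
        # Forward algorithm over a 3-state pair HMM: M emits an aligned character
        # pair, X emits a source character only, Y emits a target character only.
        # p_match[(s, t)], p_gap_x[s], p_gap_y[t] and trans[from_state][to_state]
        # are assumed to be trained elsewhere (hypothetical inputs).
        n, m = len(src), len(tgt)
        f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
        f["M"][0][0] = 1.0  # start in the match state by convention
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:
                    e = p_match.get((src[i - 1], tgt[j - 1]), 1e-6)
                    f["M"][i][j] += e * sum(trans[s]["M"] * f[s][i - 1][j - 1] for s in "MXY")
                if i > 0:
                    e = p_gap_x.get(src[i - 1], 1e-6)
                    f["X"][i][j] += e * sum(trans[s]["X"] * f[s][i - 1][j] for s in "MXY")
                if j > 0:
                    e = p_gap_y.get(tgt[j - 1], 1e-6)
                    f["Y"][i][j] += e * sum(trans[s]["Y"] * f[s][i][j - 1] for s in "MXY")
        # Total probability of the pair, marginalized over ending states.
        return sum(f[s][n][m] for s in "MXY")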
Nagler, Walther & Ebner, Martin Is Your University Ready For the Ne(x)t-Generation? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [723]
Nagler, Walther; Huber, Thomas & Ebner, Martin The ABC-eBook System - From Content Management Application to Mashup Landscape World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [724]
Najim, Najim Ussiph VLE And Its Impact On Learning Experience Of Students: Echoes From Rural Community School In Ghana World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [725]
Nakamura, Carlos; Lajoie, Susanne & Berdugo, Gloria Do Information Systems Actually Improve Problem-Solving and Decision-Making Performance? -- An Analysis of 3 Different Approaches to the Design of Information Systems World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [726]
Nakasaki, H.; Kawaba, M.; Utsuro, T.; Fukuhara, T.; Nakagawa, H. & Kando, N. Cross-lingual blog analysis by cross-lingual comparison of characteristic terms and blog posts 2008 Second International Symposium on Universal Communication, 15-16 Dec. 2008 Piscataway, NJ, USA 2008 [727]
The goal of this paper is to cross-lingually analyze multilingual blogs collected with a topic keyword. The framework of collecting multilingual blogs with a topic keyword is designed as the blog feed retrieval procedure. Multilingual queries for retrieving blog feeds are created from Wikipedia entries. Finally, we cross-lingually and cross-culturally compare less well known facts and opinions that are closely related to a given topic. Preliminary evaluation results support the effectiveness of the proposed framework.
Nakayama, K.; Hara, T. & Nishio, S. A thesaurus construction method from large scale Web dictionaries 21st International Conference on Advanced Information Networking and Applications (AINA '07), 21-23 May 2007 Piscataway, NJ, USA 2007
Web-based dictionaries, such as Wikipedia, have become dramatically popular among Internet users in the past several years. The important characteristic of Web-based dictionaries is not only the huge number of articles, but also the hyperlinks. Hyperlinks carry more information than simply providing navigation between pages. In this paper, we propose an efficient method for analyzing the link structure of Web-based dictionaries to construct an association thesaurus. We have already applied it to Wikipedia, a huge-scale Web-based dictionary with a dense link structure, as a corpus. We developed a search engine for evaluation, then conducted a number of experiments to compare our method with traditional methods such as co-occurrence analysis.
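One common way to realize a link-structure association measure of this kind (not necessarily the paper's exact measure) is to score two articles by the overlap of the sets of pages that link to them; the inlinks mapping below is a hypothetical precomputed index of the dictionary's hyperlink graph.

    def link_association(a, b, inlinks):
        # Jaccard-style association between two articles based on shared inlinks.
        la, lb = inlinks.get(a, set()), inlinks.get(b, set())
        if not la or not lb:
            return 0.0
        return len(la & lb) / len(la | lb)

    def related_terms(seed, inlinks, k=10):
        # Rank all other articles by their link association with the seed
        # article, yielding thesaurus-style related terms.
        scores = {t: link_association(seed, t, inlinks) for t in inlinks if t != seed}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]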
Nance, Kara; Hay, Brian & Possenti, Karina Communicating Computer Security Issues to K-12 Teachers and Students Society for Information Technology & Teacher Education International Conference 2006 [728]
Naumanen, Minnamari & Tukiainen, Markku Discretionary use of computers and Internet among senior-clubbers – communication, writing and information search World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [729]
Naumanen, Minnamari & Tukiainen, Markku K-60 - Access to ICT granted but not taken for granted World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [730]
Navarro, Emmanuel; Sajous, Franck; Gaume, Bruno; Prévot, Laurent; ShuKai, Hsieh; Tzu-Yi, Kuo; Magistry, Pierre & Chu-Ren, Huang Wiktionary and NLP: improving synonymy networks Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [731]
Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. It must, however, be processed before it can be used efficiently as an NLP resource. After describing the relevant aspects of Wiktionary for our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those extracted from traditional resources. Finally, we describe two methods for semi-automatically improving this network by adding missing relations: (i) using a kind of semantic proximity measure; (ii) using translation relations of Wiktionary itself.
Newbury, Robert Podcasting: Beyond Fad and Into Reality Society for Information Technology & Teacher Education International Conference 2008 [732]
Newman, David; Lau, Jey Han; Grieser, Karl & Baldwin, Timothy Automatic evaluation of topic coherence HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [733]
This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show a simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results. Google produces strong, if less consistent, results, while our results over WordNet are patchy at best.
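A minimal sketch of the pointwise-mutual-information coherence idea described above, assuming precomputed document-frequency and co-document-frequency counts over a Wikipedia dump (the count dictionaries and word lists are hypothetical inputs):

    import math
    from itertools import combinations

    def pmi(w1, w2, doc_freq, co_doc_freq, n_docs, eps=1e-12):
        # Pointwise mutual information estimated from (co-)document frequencies
        # over a reference corpus such as a Wikipedia dump.
        p1 = doc_freq.get(w1, 0) / n_docs
        p2 = doc_freq.get(w2, 0) / n_docs
        p12 = co_doc_freq.get((w1, w2), co_doc_freq.get((w2, w1), 0)) / n_docs
        if p1 == 0 or p2 == 0 or p12 == 0:
            return 0.0
        return math.log(p12 / (p1 * p2) + eps)

    def topic_coherence(top_words, doc_freq, co_doc_freq, n_docs):
        # Average PMI over all pairs of the topic's top-N words.
        pairs = list(combinations(top_words, 2))
        return sum(pmi(a, b, doc_freq, co_doc_freq, n_docs) for a, b in pairs) / len(pairs)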
Nguyen, Huong; Nguyen, Thanh; Nguyen, Hoa & Freire, Juliana Querying Wikipedia documents and relationships WebDB '10 Procceedings of the 13th International Workshop on the Web and Databases 2010 [734]
Wikipedia has become an important source of information which is growing very rapidly. However, the existing infrastructure for querying this information is limited and often ignores the inherent structure in the information and links across documents. In this paper, we present a new approach for querying Wikipedia content that supports simple, yet expressive query interfaces that allow both keyword and structured queries. A unique feature of our approach is that, besides returning documents that match the queries, it also exploits relationships among documents to return richer, multi-document answers. We model Wikipedia as a graph and cast the problem of finding answers for queries as graph search. To guide the answer-search process, we propose a novel weighting scheme to identify important nodes and edges in the graph. By leveraging the structured information available in infoboxes, our approach supports queries that specify constraints over this structure, and we propose a new search algorithm to support these queries. We evaluate our approach using a representative subset of Wikipedia documents and present results which show that our approach is effective and derives high-quality answers.
Niemann, Katja & Wolpers, Martin Real World Object Based Access to Architecture Learning Material – the MACE Experience World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [735]
Nowak, Stefanie; Llorente, Ainhoa; Motta, Enrico & Rüger, Stefan The effect of semantic relatedness measures on multi-label classification evaluation Proceedings of the ACM International Conference on Image and Video Retrieval 2010 [736]
In this paper, we explore different ways of formulating new evaluation measures for multi-label image classification when the vocabulary of the collection adopts the hierarchical structure of an ontology. We apply several semantic relatedness measures based on web-search engines, WordNet, Wikipedia and Flickr to the ontology-based score (OS) proposed in [22]. The final objective is to assess the benefit of integrating semantic distances to the OS measure. Hence, we have evaluated them in a real case scenario: the results (73 runs) provided by 19 research teams during their participation in the ImageCLEF 2009 Photo Annotation Task. Two experiments were conducted with a view to understand what aspect of the annotation behaviour is more effectively captured by each measure. First, we establish a comparison of system rankings brought about by different evaluation measures. This is done by computing the Kendall τ and Kolmogorov-Smirnov correlation between the ranking of pairs of them. Second, we investigate how stable the different measures react to artificially introduced noise in the ground truth. We conclude that the distributional measures based on image information sources show a promising behaviour in terms of ranking and stability.
Nunes, Sérgio; Ribeiro, Cristina & David, Gabriel Term frequency dynamics in collaborative articles Proceedings of the 10th ACM symposium on Document engineering 2010 [737]
Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation we find that term frequency in encyclopedic documents - i.e. comprehensive and focused on a single topic - exhibits a rapid and steady progression towards the document's current version. The content in early versions quickly becomes very similar to the present version of the document.
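One straightforward way to reproduce this kind of revision-to-current-version comparison (a hedged sketch, not necessarily the paper's exact similarity measure) is to compute the cosine similarity between bag-of-words term-frequency vectors of each revision and the latest one:

    import math
    import re
    from collections import Counter

    def tf_vector(text):
        # Bag-of-words term frequencies for one revision's plain text.
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def cosine(a, b):
        common = set(a) & set(b)
        dot = sum(a[t] * b[t] for t in common)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def similarity_curve(revisions):
        # Similarity of every revision (in chronological order) to the latest one.
        current = tf_vector(revisions[-1])
        return [cosine(tf_vector(r), current) for r in revisions]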
O'Bannon, Blanche Using Wikis for Collaboration in Creating A Collection: Perceptions of Pre-service Teachers Society for Information Technology & Teacher Education International Conference 2008 [738]
O'Bannon, Blanche; Baytiyeh, Hoda & Beard, Jeff Using Wikis to Create Collections of Curriculum-based Resources: Perceptions of Pre-service Teachers Society for Information Technology & Teacher Education International Conference 2010 [739]
O'Shea, Patrick Using Voice to Provide Feedback in Online Education Society for Information Technology & Teacher Education International Conference 2008 [740]
O'Shea, Patrick; Curry-Corcoran, Daniel; Baker, Peter; Allen, Dwight & Allen, Douglas A Student-written WikiText for a Foundations Course in Education Society for Information Technology & Teacher Education International Conference 2007 [741]
O'Shea, Patrick; Kidd, Jennifer; Baker, Peter; Kauffman, Jaime & Allen, Dwight Studying the Credibility of a Student-Authored Textbook: Is it worth the effort for the results? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [742]
Oh, Sangchul; Kim, Sung-Wan; Choi, Yonghun & Yang, Youjung A Study on the Learning Participation and Communication Process by Learning Task Types in Wiki-Based Collaborative Learning System Global Learn Asia Pacific 2010 [743]
Ohara, Maggie & Armstrong, Robin Managing a Center for Online Testing: Challenges and Successes World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [744]
Okamoto, A.; Yokoyama, S.; Fukuta, N. & Ishikawa, H. Proposal of Spatiotemporal Data Extraction and Visualization System Based on Wikipedia for Application to Earth Science 2010 IEEE/ACIS 9th International Conference on Computer and Information Science (ICIS 2010), 18-20 Aug. 2010 Los Alamitos, CA, USA 2010 [745]
The evolution of information technologies has simplified the handling of huge amounts of earth science data. However, various problems still prevent these data from being exploited well, so new uses are required. One example is combining earth science data with spatiotemporal data on the Web. In our research, we focus on Wikipedia, a Web encyclopedia with many structural features. We report on our system, which extracts spatiotemporal data from Wikipedia and visualizes it.
Okike, Benjamin Distance/Flexible Learning Education in a Developing Country World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [746]
Oliver, Ron & Luca, Joe Using Mobile Technologies and Podcasts to Enhance Learning Experiences in Lecture-Based University Course Delivery World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [747]
Olson, J.F.; Howison, J. & Carley, K.M. Paying attention to each other in visible work communities: Modeling bursty systems of multiple activity streams 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). the Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010 Los Alamitos, CA, USA 2010 [748]
Online work projects, from open source to Wikipedia, have emerged as an important phenomenon. These communities offer exciting opportunities to investigate social processes because they leave traces of their activity over time. We argue that the rapid visibility of others' work afforded by the information systems used by these projects reaches out and attracts the attention of others who are peripherally aware of the group's online space, prompting them to begin or intensify their participation and binding separate individual streams of activity into a social entity. Previous work has suggested that for certain types of bursty social behavior (e.g. email), the frequency of the behavior is not homogeneously distributed but rather can be divided into two generative mechanisms: active sessions and passive background participation. We extend this work to the case of multiple conditionally independent streams of behavior, where each stream is characterized by these two generative mechanisms. Our model can be characterized by a double-chain hidden Markov model, allowing efficient inference using expectation-maximization. We apply this model to visible work communities by modeling each participant as a single stream of behavior, assessing transition probabilities between active sessions of different participants. This allows us to examine the extent to which the various members of the community are influenced by the active participation of others. Our results indicate that an active session by a participant at least triples the likelihood of another participant beginning an active session.
Olsson, Lena & Sandorf, Monica Increase the Professional Use of Digital Learning Resources among Teachers and Students Society for Information Technology & Teacher Education International Conference 2010 [749]
Ong, Chorng-Shyong & Day, Min-Yuh An Integrated Evaluation Model Of User Satisfaction With Social Media Services 2010 IEEE International Conference on Information Reuse & Integration (IRI 2010), 4-6 Aug. 2010 Piscataway, NJ, USA 2010 [750]
Social media services (SMSs) have been growing rapidly in recent years, and have therefore attracted increasing attention from practitioners and researchers. Social media services are online services that provide users with social media applications like YouTube, Facebook, and Wikipedia. Satisfaction is an important construct, and user satisfaction is critical to successful information systems. This study integrates the expectation-confirmation theory (ECT) with perceived social influence and perceived enjoyment to develop an integrated evaluation model for studying user satisfaction and continuance intention in the context of social media services. Structural equation modeling (SEM) is used to analyze the measurement and structural models. Empirical results show that the proposed model has a good fit in terms of theoretical robustness and practical application. Our findings suggest that the key determinants of user satisfaction with social media services are confirmation, perceived social influence, and perceived enjoyment, while the outcome of user satisfaction is enhanced continuance intention.
Or-Bach, Rachel; Vale, Katie Livingston; del Alamo, Jesus & Lerman, Steven Towards a Collaboration Space for Higher Education Teachers – The Case of MIT iLab Project World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 [751]
Ouyang, John Ronghua Make Statistic Analysis Simple: Solution with a Simple Click on the Screen Society for Information Technology & Teacher Education International Conference 2009 [752]
Overell, S.; Magalhaes, J. & Ruger, S. Forostar: a system for GIR Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
We detail our methods for generating and applying co-occurrence models for the purpose of placename disambiguation, using a model generated from Wikipedia. The presented system is split into two stages: a batch text geographic indexer and a real-time query engine. Four alternative query constructions and six methods of generating a geographic index are compared. The paper concludes with a full description of future work and ways in which the system could be optimised.
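As a hedged sketch of co-occurrence-based placename disambiguation, the snippet below picks the candidate referent whose Wikipedia-derived co-occurrence profile best matches the words surrounding the placename; the cooccurrence_model structure is a hypothetical precomputed index, not the paper's exact model.

    def disambiguate_placename(name, context_words, cooccurrence_model):
        # cooccurrence_model[name] maps each candidate referent (e.g. a Wikipedia
        # article about one of the places called `name`) to a dict of terms that
        # co-occur with it and their weights, mined beforehand from Wikipedia.
        context = set(w.lower() for w in context_words)
        best, best_score = None, float("-inf")
        for candidate, profile in cooccurrence_model.get(name, {}).items():
            score = sum(weight for term, weight in profile.items() if term in context)
            if score > best_score:
                best, best_score = candidate, score
        return best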
Ozkan, Betul Current and future trends in Free and Open Source Software World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [753]
Ozkan, Betul & McKenzie, Barbara Open social software applications and their impact on distance education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [754]
de Pablo-Sanchez, C.; Gonzalez-Ledesma, A.; Moreno-Sandoval, A. & Vicente-Diez, M.T. MIRACLE Experiments in QA@CLEF 2006 in Spanish: Main Task, Real-Time QA and Exploratory QA Using Wikipedia (WiQA) Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
We describe the participation of the MIRACLE group in the QA track at CLEF. We participated in three subtasks and presented two systems that work in Spanish. The first system is a traditional QA system and was evaluated in the main task and the Real-Time QA pilot. The system features improved Named Entity recognition and shallow linguistic analysis, and achieves moderate performance. In contrast, results obtained in RT-QA show that this approach is promising for providing answers in constrained time. The second system focuses on the WiQA pilot task, which aims at retrieving important snippets to complete a Wikipedia article. The system uses collection link structure, cosine similarity and Named Entities to retrieve new and important snippets. Although the experiments have not been exhaustive, it seems that the performance depends on the type of concept.
Padula, Marco; Reggiori, Amanda & Capetti, Giovanna Managing Collective Knowledge in the Web 3.0 Proceedings of the 2009 First International Conference on Evolving Internet 2009 [755]
Knowledge Management (KM) is one of the hottest Internet challenges influencing the design and the architecture of the infrastructures that will be accessed by the future generation. In this paper, we bridge KM to philosophical theories in quest of a theoretical foundation for the discussion, today utterly exciting, about the Web's semantics. Man has always tried to organise the knowledge he gained, using lists, encyclopaedias, libraries, etc., in order to make the consultation and finding of information easier. Nowadays it is possible to get information from the Web, digital archives and databases, but the actual problem is linked to its interpretation, which is now possible only by human beings. The act of interpreting is peculiar to men, not to machines. At the moment there are many available digital tools which are presented as KM technologies, but languages often do not discern meanings. We investigate the meaning of "knowledge" in the digital world, sustaining it with references to the Philosophy of Information and epistemology. After having provided a definition of "knowledge" suitable for the digital environment, it is extended to "collective knowledge", a very common concept in the area of global information proper to the current process of knowledge production and management. The definition is verified by testing whether a well-known growing phenomenon like Wikipedia can be truly regarded as a knowledge management system.
Pan, Shu-Chien & Franklin, Teresa Teacher’s Self-efficacy and the Integration of Web 2.0 Tool/Applications in K-12 Schools Society for Information Technology & Teacher Education International Conference 2010 [756]
Panciera, Katherine; Halfaker, Aaron & Terveen, Loren Wikipedians are born, not made: a study of power editors on Wikipedia Proceedings of the ACM 2009 international conference on Supporting group work 2009 [757]
Open content web sites depend on users to produce information of value. Wikipedia is the largest and most well-known such site. Previous work has shown that a small fraction of editors -- Wikipedians -- do most of the work and produce most of the value. Other work has offered conjectures about how Wikipedians differ from other editors and how Wikipedians change over time. We quantify and test these conjectures. Our key findings include: Wikipedians' edits last longer; Wikipedians invoke community norms more often to justify their edits; on many dimensions of activity, Wikipedians start intensely, tail off a little, then maintain a relatively high level of activity over the course of their career. Finally, we show that the amount of work done by Wikipedians and Non-Wikipedians differs significantly from their very first day. Our results suggest a design opportunity: customizing the initial user experience to improve retention and channel new users' intense energy.
Panke, Stefanie Ingredients of Educational Portals as Infrastructures for Informal Learning Activities World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [758]
Pantola, Alexis Velarde; Pancho-Festin, Susan & Salvador, Florante Rating the raters: a reputation system for wiki-like domains Proceedings of the 3rd international conference on Security of information and networks 2010 [759]
Collaborative sites like Wikipedia allow the public to contribute content to a particular domain to ensure a site's growth. However, a major concern with such sites is the credibility of the information posted. Malicious and "lazy" authors can intentionally or accidentally contribute entries that are inaccurate. This paper presents a user-driven reputation system called Rater Rating that encourages authors to review entries in collaborative sites. It uses concepts adopted from reputation systems in mobile ad-hoc networks (MANETs) that promote cooperation among network nodes. Rater Rating measures the overall reputation of authors based on the quality of their contributions and the "seriousness" of their ratings. Simulations were performed to verify the algorithm's potential in measuring the credibility of ratings made by various raters (i.e. good and lazy). The results show that only 1 out of 4 raters needs to be a good rater in order to make the algorithm effective.
Papadakis, Ioannis; Stefanidakis, Michalis; Stamou, Sofia & Andreou, Ioannis A Query Construction Service for Large-Scale Web Search Engines Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03 2009 [760]
Despite their wide usage, large-scale search engines are not always effective in tracing the best possible information for the user needs. There are times when web searchers spend too much time searching over a large-scale search engine. When (if) they eventually succeed in getting back the anticipated results, they often realize that their successful queries are significantly different from their initial one. In this paper, we introduce a query construction service for assisting web information seekers specify precise and unambiguous queries over large-scale search engines. The proposed service leverages the collective knowledge encapsulated mainly in the Wikipedia corpus and provides an intuitive GUI via which web users can determine the semantic orientation of their searches before these are executed by the desired engine.
Park, Hyungsung; Baek, Youngkyun & Hwang, Jihyun The effect of learner and game variables on social problem-solving in simulation game Society for Information Technology & Teacher Education International Conference 2009 [761]
Parton, Becky Sue; Hancock, Robert; Ennis, Willie; Fulwiler, John & Dawson, John Technology Integration Potential of Physical World Hyperlinks for Teacher Preparation Programs Society for Information Technology & Teacher Education International Conference 2008 [762]
Pearce, Jon A System to Encourage Playful Exploration in a Reflective Environment World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [763]
Penzhorn, Cecilia & Pienaar, Heila The Academic Library as Partner in Support of Scholarship Society for Information Technology & Teacher Education International Conference 2009 [764]
Pesenhofer, Andreas; Edler, Sonja; Berger, Helmut & Dittenbach, Michael Towards a patent taxonomy integration and interaction framework Proceeding of the 1st ACM workshop on Patent information retrieval 2008 [765]
Patent classification schemes such as the International Patent Classification maintained by the World Intellectual Property Organization are of vital importance for patent searchers, because they usually act as an entry point for the search process. We present a method for augmenting patents by assigning them to classes of a different classification scheme, i.e. a science taxonomy derived from the Wikipedia Science Portal. For each scientific discipline contained in the portal, descriptive keywords are extracted from the linked Web pages. These keywords are used to identify relevant patents and associate them to the appropriate scientific disciplines. This ontology is part of the Patent Taxonomy Integration and Interaction Framework, which is a flexible approach allowing for the integration of different patent ontologies enabling a wide range of interaction methods.
Peterson, Rob; Verenikina, Irina & Herrington, Jan Standards for Educational, Edutainment, and Developmentally Beneficial Computer Games World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [766]
Pferdt, Frederik G. Designing Learning Environments with Social Software for the Ne(x)t Generation – New Perspectives and Implications for Effective Research Design World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [767]
Plowman, Travis Wikis As a Social Justice Environment Society for Information Technology & Teacher Education International Conference 2007 [768]
Pohl, Margit & Wieser, Dietmar Enthusiasm or Skepticism? What Students Think about E-Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [769]
Ponzetto, Simone Paolo & Strube, Michael Extracting world and linguistic knowledge from Wikipedia Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts 2009 [770]
Pope, Jack; Thurber, Bart & Meshkaty, Shahra The Classroom as Learning Space: Two Disciplines, Two Views. World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [771]
Popescu, Adrian & Grefenstette, Gregory Spatiotemporal mapping of Wikipedia concepts Proceedings of the 10th annual joint conference on Digital libraries 2010 [772]
Space and time are important dimensions in the representation of a large number of concepts. However, there exists no available resource that provides spatiotemporal mappings of generic concepts. Here we present a link-analysis based method for extracting the main locations and periods associated with all Wikipedia concepts. Relevant locations are selected from a set of geotagged articles, while relevant periods are discovered using a list of people with associated life periods. We analyze article versions over multiple languages and consider the strength of a spatial/temporal reference to be proportional to the number of languages in which it appears. To illustrate the utility of the spatiotemporal mapping of Wikipedia concepts, we present an analysis of cultural interactions and a temporal analysis of two domains. The Wikipedia mapping can also be used to perform rich spatiotemporal document indexing by extracting implicit spatial and temporal references from texts.
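A small sketch of the cross-language aggregation idea described above: the strength of a spatial reference is taken to be the number of language editions in which a geotagged target appears among the concept's outlinks (the data structures below are hypothetical inputs):

    from collections import Counter

    def ranked_locations(outlinks_by_language, geotagged):
        # outlinks_by_language[lang] is the set of articles linked from the
        # concept's article in that language edition; geotagged is the set of
        # articles known to carry coordinates.  The strength of a spatial
        # reference is the number of language editions in which it appears.
        strength = Counter()
        for lang, links in outlinks_by_language.items():
            for target in links:
                if target in geotagged:
                    strength[target] += 1
        return strength.most_common()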
Popescu, Adrian; Grefenstette, Gregory & Bouamor, Houda Mining a Multilingual Geographical Gazetteer from the Web Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2009 [773]
Geographical gazetteers are necessary in a wide variety of applications. In the past, the construction of such gazetteers has been a tedious, manual process, and only recently have the first attempts been made to automate gazetteer creation. Here we describe our approach for mining accurate but large-scale multilingual geographic information by successively filtering information found in heterogeneous data sources (Flickr, Wikipedia, Panoramio, Web pages indexed by search engines). Statistically cross-checking information found in each site, we are able to identify new geographic objects, and to indicate, for each one, its name, its GPS coordinates, its encompassing regions (city, region, country), the language of the name, its popularity, and the type of the object (church, bridge, etc.). We evaluate our approach by comparing, wherever possible, our multilingual gazetteer to other known attempts at automatically building a geographic database and to Geonames, a manually built gazetteer.
Powell, Allison K12 Online Learning: A Global Perspective World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [774]
Preiss, Judita; Dehdari, Jon; King, Josh & Mehay, Dennis Refining the most frequent sense baseline Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions 2009 [775]
We refine the most frequent sense baseline for word sense disambiguation using a number of novel word sense disambiguation techniques. Evaluating on the Senseval-3 English all words task, our combined system focuses on improving every stage of word sense disambiguation: starting with the lemmatization and part of speech tags used, through the accuracy of the most frequent sense baseline, to highly targeted individual systems. Our supervised systems include a ranking algorithm and a Wikipedia similarity measure.
Premchaiswadi, Wichian; Pangma, Sarayoot & Premchaiswadi, Nucharee Knowledge Sharing for an On-Line Test Bank Construction World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [776]
Priest, W. Curtiss What is the Common Ground between TCPK (Technological Pedagogical Content Knowledge) and Learning Objects? Society for Information Technology & Teacher Education International Conference 2007 [777]
Priest, W. Curtiss A Paradigm Shifting Architecture for Education Technology Systems World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [778]
Priest, W. Curtiss & Komoski, P. Kenneth Designing Empathic Learning Games to Improve Emotional Competencies (Intelligence) Using Learning Objects [a work in progress] World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [779]
Purwitasari, D.; Okazaki, Y. & Watanabe, K. A study on Web resources' navigation for e-learning: usage of Fourier domain scoring on Web pages ranking method 2007 Second International Conference on Innovative Computing, Information and Control, 5-7 Sept. 2007 Los Alamitos, CA, USA 2007
Using existing Web resources for e-learning is a very promising idea, especially for reducing the cost of authoring. Envisioned as open source, completely free, and frequently updated, Wikipedia could become a good candidate. Even though Wikipedia is structured by categories, these are sometimes not updated when articles are modified. To serve as a Web resource for e-learning, it is necessary to provide a navigation path in Wikipedia that semantically maps the learning material rather than relying merely on the category structure. The desired learning material could be provided in response to a request, as search results. In this paper we introduce the use of Fourier domain scoring (FDS) as the ranking method for searching a certain collection of Wikipedia Web pages. Unlike other methods that recognize only the number of occurrences of query terms, FDS can also recognize the spread of query terms throughout the content of Web pages. Based on the experiments, we conclude that the non-relevant results retrieved are mainly influenced by the characteristics of Wikipedia. Given that any part of a Wikipedia Web page can be changed by anyone, it is possible that only some parts of the retrieved Web pages are strongly related to the query terms.
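Fourier domain scoring builds a positional signal for each query term and scores pages from that signal's spectrum rather than from raw counts alone; the sketch below is a simplified, hedged reading of the idea (the binning and the way spectrum components are combined are assumptions, not the exact FDS formulation):

    import re
    import numpy as np

    def term_signal(tokens, term, bins=64):
        # Histogram of where the term occurs across the document, i.e. a coarse
        # positional signal whose spectrum reflects how the term is spread.
        positions = [i for i, t in enumerate(tokens) if t == term]
        signal = np.zeros(bins)
        for p in positions:
            signal[int(p * bins / max(len(tokens), 1))] += 1
        return signal

    def fds_like_score(text, query_terms, bins=64):
        tokens = re.findall(r"\w+", text.lower())
        score = 0.0
        for term in query_terms:
            spectrum = np.abs(np.fft.rfft(term_signal(tokens, term, bins)))
            # Reward terms that occur often (DC component) and are spread out
            # rather than clumped (energy not concentrated above the DC term).
            if spectrum[0] > 0:
                score += spectrum[0] + spectrum[0] / (1.0 + spectrum[1:].sum())
        return score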
Qian, Yufeng Meaningful Learning with Wikis: Making a Connection Society for Information Technology & Teacher Education International Conference 2007 [780]
Qiu, Yongqiang & Elsayed, Adel Semantic Structures as Cognitive Tools to Support Reading World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [781]
Qu, Zehui; Wang, Yong; Wang, Juan; Zhang, Fengli & Qin, Zhiguang A classification algorithm of signed networks based on link analysis 2010 International Conference on Communications, Circuits and Systems (ICCCAS), 28-30 July 2010 Piscataway, NJ, USA 2010 [782]
In signed networks the links between nodes can be either positive (the relation is friendship) or negative (the relation is rivalry or confrontation), which is very useful for analyzing real social networks. After studying data sets from the Wikipedia and Slashdot networks, we find that the signs of links in these social networks can be used to classify the nodes and to forecast, with high accuracy, the signs of links that will emerge in the future, using models established across these diverse data sets. Based on the models, the algorithm proposed in this work provides insight into some of the underlying principles extracted from signed links in the networks. At the same time, the algorithm sheds light on social computing applications in which the attitude of one person toward another can be predicted from evidence provided by the relationships of their surrounding friends.
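The abstract does not specify the classifier; a common, hedged setup for this kind of link-sign prediction is a logistic model over local features of the two endpoints (signed degrees and shared neighbours), sketched below with scikit-learn and hypothetical graph dictionaries.

    from sklearn.linear_model import LogisticRegression

    def edge_features(u, v, pos_out, neg_out, pos_in, neg_in, neighbours):
        # Local features: signed out/in-degrees of both endpoints and the number
        # of common neighbours (a simple stand-in for the link-analysis features
        # the paper derives from the Wikipedia and Slashdot data).
        return [
            len(pos_out.get(u, set())), len(neg_out.get(u, set())),
            len(pos_in.get(v, set())), len(neg_in.get(v, set())),
            len(neighbours.get(u, set()) & neighbours.get(v, set())),
        ]

    def train_sign_classifier(edges, signs, graph_stats):
        # edges: list of (u, v) pairs; signs: +1 / -1 labels; graph_stats: the
        # five dictionaries consumed by edge_features, built from the training graph.
        X = [edge_features(u, v, *graph_stats) for u, v in edges]
        return LogisticRegression(max_iter=1000).fit(X, signs)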
Quack, Till; Leibe, Bastian & Gool, Luc Van World-scale mining of objects and events from community photo collections Proceedings of the 2008 international conference on Content-based image and video retrieval 2008 [783]
In this paper, we describe an approach for mining images of objects (such as touristic sights) from community photo collections in an unsupervised fashion. Our approach relies on retrieving geotagged photos from those web-sites using a grid of geospatial tiles. The downloaded photos are clustered into potentially interesting entities through a processing pipeline of several modalities, including visual, textual and spatial proximity. The resulting clusters are analyzed and are automatically classified into objects and events. Using mining techniques, we then find text labels for these clusters, which are used to again assign each cluster to a corresponding Wikipedia article in a fully unsupervised manner. A final verification step uses the contents (including images) from the selected Wikipedia article to verify the cluster-article assignment. We demonstrate this approach on several urban areas, densely covering an area of over 700 square kilometers and mining over 200,000 photos, making it probably the largest experiment of its kind to date.
Quinton, Stephen Unlocking the Knowledge Generation and eLearning Potential of Contemporary Universities World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [784]
Raaijmakers, S.; Versloot, C. & de Wit, J. A Cocktail Approach to the VideoCLEF'09 Linking Task Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [785]
In this paper, we describe the TNO approach to the Finding Related Resources, or linking, task of VideoCLEF'09. Our system consists of a weighted combination of off-the-shelf and proprietary modules, including the Wikipedia Miner toolkit of the University of Waikato. Using this cocktail of largely off-the-shelf technology allows for setting a baseline for future approaches to this task.
Ramakrishnan, Raghu Community systems: the world online Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management 2007 [786]
The Web is about you and me. Until now, for the most part, it has denoted a corpus of information that we put online sometime in the past, and the most celebrated Web application is keyword search over this corpus. Sites such as del.icio.us, flickr, MySpace, Slashdot, Wikipedia, Yahoo! Answers, and YouTube, which are driven by user-generated content, are forcing us to rethink the Web -- it is no longer just a static repository of content; it is a medium that connects us to each other. What are the ramifications of this fundamental shift? What are the new challenges in supporting and amplifying this shift?
Ramanathan, K. & Kapoor, K. Creating User Profiles Using Wikipedia Conceptual Modeling - ER 2009. 28th International Conference on Conceptual Modeling, 9-12 Nov. 2009 Berlin, Germany 2009 [787]
Creating user profiles is an important step in personalization. Many methods for user profile creation have been developed to date using different representations such as term vectors and concepts from an ontology like DMOZ. In this paper, we propose and evaluate different methods for creating user profiles using Wikipedia as the representation. The key idea in our approach is to map documents to Wikipedia concepts at different levels of resolution: words, key phrases, sentences, paragraphs, the document summary and the entire document itself. We suggest a method for evaluating profile recall by pooling the relevant results from the different methods and evaluate our results for both precision and recall. We also suggest a novel method for profile evaluation by assessing the recall over a known ontological profile drawn from DMOZ.
Rapetti, Emanuele; Ciannamea, Samanta; Cantoni, Lorenzo & Tardini, Stefano The Voice of Learners to Understand ICTs Usages in Learning Experiences: a Quanti-qualitative Research Project in Ticino (Switzerland) World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [788]
Ratkiewicz, J.; Flammini, A. & Menczer, F. Traffic in Social Media I: Paths Through Information Networks 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). the Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010 Los Alamitos, CA, USA 2010 [789]
Wikipedia is used every day by people all around the world, to satisfy a variety of information needs. We cross-correlate multiple Wikipedia traffic data sets to infer various behavioral features of its users: their usage patterns (e.g., as a reference or a source); their motivations (e.g., routine tasks such as student homework vs. information needs dictated by news events); their search strategies (how and to what extent accessing an article leads to further related readings inside or outside Wikipedia); and what determines their choice of Wikipedia as an information resource. We primarily study article hit counts to determine how the popularity of articles (and article categories) changes over time, and in response to news events in the English-speaking world. We further leverage logs of actual navigational patterns from a very large sample of Indiana University users over a period of one year, allowing us unprecedented ability to study how users traverse an online encyclopedia. This data allows us to make quantitative claims about how users choose links when navigating Wikipedia. From this same source of data we are further able to extract analogous navigation networks representing other large sites, including Facebook, to compare and contrast the use of these sites with Wikipedia. Finally we present a possible application of traffic analysis to page categorization.
Reiners, Torsten; Holt, Karyn & Reiß, Dirk Google Wave: Unnecessary Hype or Promising Tool for Teachers World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [790]
Reiter, Nils; Hartung, Matthias & Frank, Anette A resource-poor approach for linking ontology classes to Wikipedia articles Proceedings of the 2008 Conference on Semantics in Text Processing 2008 [791]
The applicability of ontologies for natural language processing depends on the ability to link ontological concepts and relations to their realisations in texts. We present a general, resource-poor approach for creating such a linking automatically by extracting the Wikipedia articles corresponding to ontology classes. We evaluate our approach in an experiment with the Music Ontology. We consider linking as a promising starting point for subsequent steps of information extraction.
Rejas-Muslera, R.J.; Cuadrado, J.J.; Abran, A. & Sicilia, M.A. Information economy philosophy in universal education. The Open Educational Resources (OER): technical, socioeconomics and legal aspects 2008 IEEE International Professional Communication Conference (IPCC 2008), 13-16 July 2008 Piscataway, NJ, USA 2008
According to Dr. B.R. Ambedkar's definition, as reported by Deshpande, P.M. (1995), Open Educational Resources (OER) are based on the philosophical view of knowledge as a collective, social product; in consequence, it is also desirable to make it a social property. Terry Foote, one of the Wikipedia project's chairpersons, emphasizes this: "Imagine a world in which every single person is given free access to the sum of all human knowledge." The importance of open educational resources (OERs) has been widely documented and demonstrated, and a high-magnitude impact is to be expected for OERs in the near future. This paper presents an overview of OERs and their current usage, and then goes into detail on several related aspects: What is the impact of OER in socio-economic terms, especially for less developed regions? Which legal aspects influence the diffusion and use of OER? And which technical resources are needed for them?
Repman, Judi; Zinskie, Cordelia & Clark, Kenneth Online Learning, Web 2.0 and Higher Education: A Formula for Reform? Society for Information Technology & Teacher Education International Conference 2008 [792]
Repman, Judi; Zinskie, Cordelia & Downs, Elizabeth On the Horizon: Will Web 2.0 Change the Face of Online Learning? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [793]
Rezaei, Ali Reza Using social networks for language learning Society for Information Technology & Teacher Education International Conference 2010 [794]
Richards, Griff; Lin, Arthur; Eap, Ty Mey & Sehboub, Zohra Where Do They Go? Internet Search Strategies in Grade Five Laptop Classrooms World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [795]
Roberts, Cody; Yu, Chien; Brandenburg, Teri & Du, Jianxia The Impact of Webcasting in Major Corporations World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [796]
Rodda, Paul Social Constructivism as guiding philosophy for Software Development World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [797]
Rodriguez, Mark; Huang, Marcy & Merrill, Marcy Analysis of Web Hosting Services in Collaborative Online Learning Society for Information Technology & Teacher Education International Conference 2008 [798]
Ronda, Natalia Sinitskaya; Owston, Ron; Sanaoui, Razika & Ronda, Natalia Sinitskaya "Voulez-Vous Jouer?" [Do you want to play?]: Game Development Environments for Literacy Skill Enhancement World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [799]
Rosselet, Alan Active Course Notes within a Group Learning Environment World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [800]
Roxin, Ioan; Szilagyi, Ioan & Balog-Crisan, Radu Kernel Design for Semantic Learning Platform World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [801]
Royer, Regina Educational Blogging: Going Beyond Reporting, Journaling, and Commenting to Make Connections and Support Critical Thinking Society for Information Technology & Teacher Education International Conference 2009 [802]
Royer, Regina Using Web 2.0 Tools in an Online Course to Enhance Student Satisfaction World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [803]
Rozovskaya, Alla & Sproat, Richard Multilingual word sense discrimination: a comparative cross-linguistic study Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies 2007 [804]
We describe a study that evaluates an approach to Word Sense Discrimination on three languages with different linguistic structures, English, Hebrew, and Russian. The goal of the study is to determine whether there are significant performance differences for the languages and to identify language-specific problems. The algorithm is tested on semantically ambiguous words using data from Wikipedia, an online encyclopedia. We evaluate the induced clusters against sense clusters created manually. The results suggest a correlation between the algorithm's performance and morphological complexity of the language. In particular, we obtain F-scores of 0.68, 0.66 and 0.61 for English, Hebrew, and Russian, respectively. Moreover, we perform an experiment on Russian, in which the context terms are lemmatized. The lemma-based approach significantly improves the results over the word-based approach, by increasing the F-score by 16%. This result demonstrates the importance of morphological analysis for the task for morphologically rich languages like Russian.
Rubens, Neil; Vilenius, Mikko & Okamoto, Toshio Data-driven Group Formation for Informal Collaborative Learning World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [805]
Rudak, Leszek Vector Graphics as a Mathematical Tool Society for Information Technology & Teacher Education International Conference 2009 [806]
Rueckert, Dan; Kim, Daesang & Yang, Mihwa Using a Wiki as a Communication Tool for Promoting Limited English Proficiency (LEP) Students’ Learning Practices Society for Information Technology & Teacher Education International Conference 2007 [807]
Ruiz-Casado, Maria; Alfonseca, Enrique; Okumura, Manabu & Castells, Pablo Information Extraction and Semantic Annotation of Wikipedia Proceeding of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge 2008 [808]
An architecture is proposed that, focusing on the Wikipedia as a textual repository, aims at enriching it with semantic information in an automatic way. This approach combines linguistic processing, Word Sense Disambiguation and Relation Extraction techniques for adding the semantic annotations to the existing texts.
Rus, Vasile; Lintean, Mihai; Graesser, Art & McNamara, Danielle Assessing Student Paraphrases Using Lexical Semantics and Word Weighting Proceeding of the 2009 conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling 2009 [809]
We present in this paper an approach to assessing student paraphrases in the intelligent tutoring system iSTART. The approach is based on measuring the semantic similarity between a student paraphrase and a reference text, called the textbase. The semantic similarity is estimated using knowledge-based word relatedness measures. The relatedness measures rely on knowledge encoded in WordNet, a lexical database of English. We also experiment with weighting words based on their importance. The word importance information was derived from an analysis of word distributions in 2,225,726 documents from Wikipedia. Performance is reported for 12 different models which resulted from combining 3 different relatedness measures, 2 word sense disambiguation methods, and 2 word-weighting schemes. Furthermore, comparisons are made to other approaches such as Latent Semantic Analysis and the Entailer.
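To illustrate the general idea of combining a knowledge-based relatedness measure with word-importance weights, here is a hedged Python sketch; the greedy matching, the toy idf weights, and the function names are assumptions for illustration, not the paper's 12 evaluated models.

```python
# Illustrative sketch (not the iSTART implementation): score a student paraphrase
# against a textbase by matching each paraphrase word to its most related textbase
# word and weighting matches by (assumed) corpus-derived importance weights.
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def relatedness(w1, w2):
    """WordNet path similarity between first synsets (exact-match fallback)."""
    s1, s2 = wn.synsets(w1), wn.synsets(w2)
    if not s1 or not s2:
        return 1.0 if w1 == w2 else 0.0
    return s1[0].path_similarity(s2[0]) or 0.0

def weighted_similarity(paraphrase, textbase, idf):
    num = den = 0.0
    for w in paraphrase:
        best = max(relatedness(w, t) for t in textbase)
        weight = idf.get(w, 1.0)        # hypothetical word-importance weights
        num += weight * best
        den += weight
    return num / den if den else 0.0

idf = {"comprehend": 3.2, "text": 1.5, "the": 0.1}   # toy weights for illustration
print(weighted_similarity(["students", "comprehend", "the", "text"],
                          ["readers", "understand", "a", "passage"], idf))
```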
Ruth, Alison & Ruutz, Aaron Four Vignettes of Learning: Wiki Wiki Web or What Went Wrong World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [810]
Ryder, Barbara & Hailpern, Brent Proceedings of the third ACM SIGPLAN conference on History of programming languages 2007 [811]
The twelve papers in this proceedings represent original historical perspectives on programming languages that span at least five different programming paradigms and communities: object-oriented, functional, reactive, parallel, and scripting. At the time of the conference, the programming languages community continues to create broader mini-histories of each of those paradigms at http://en.wikipedia.org/wiki/HOPL
Safran, Christian; Ebner, Martin; Garcia-Barrios, Victor Manuel & Kappe, Frank Higher Education m-Learning and e-Learning Scenarios for a Geospatial Wiki World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [812]
Sagara, T. & Hagiwara, M. Natural Language Processing Neural Network for Recall and Inference Artificial Neural Networks - ICANN 2010. 20th International Conference, 15-18 Sept. 2010 Berlin, Germany 2010 [813]
In this paper, we propose a novel neural network which can learn knowledge from natural language documents and can perform recall and inference. The proposed network has a sentence layer, a knowledge layer, ten kinds of deep case layers and a dictionary layer. In the network learning step, connections are updated based on Hebb's learning rule. The proposed network can handle complicated sentences by incorporating the deep case layers and can obtain unlearned knowledge from the dictionary layer. For the dictionary layer, Goi-Taikei, a dictionary containing 400,000 words, is employed. Two kinds of experiments were carried out using the goo encyclopedia and Wikipedia as knowledge sources, and the superior performance of the proposed neural network was confirmed.
Sagers, Glen; Kasliwal, Shobit; Vila, Joaquin & Lim, Billy Geo-Terra: Location-based Learning Using Geo-Tagged Multimedia Global Learn Asia Pacific 2010 [814]
Sajjapanroj, Suthiporn; Bonk, Curtis; Lee, Mimi & Lin, Grace The Challenges and Successes of Wikibookian Experts and Want-To-Bees World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [815]
Salajan, Florin & Mount, Greg Instruction in the Web 2.0 Environment: A Wiki Solution for Multimedia Teaching and Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [816]
Saleh, Iman; Darwish, Kareem & Fahmy, Aly Classifying Wikipedia articles into NE's using SVM's with threshold adjustment Proceedings of the 2010 Named Entities Workshop 2010 [817]
In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a language-independent way. Classification is first done using a Support Vector Machine (SVM) classifier, and the SVM threshold is then adjusted in order to improve the recall of the classification. Threshold adjustment is performed using the beta-gamma threshold adjustment algorithm, a post-learning step that shifts the hyperplane of the SVM. This approach boosted recall with minimal effect on precision.
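The beta-gamma algorithm itself is not reproduced in the abstract; the following hedged sketch only illustrates the general idea of shifting an SVM decision threshold on held-out data to trade precision for recall, using synthetic data and an assumed precision floor.

```python
# Sketch of recall-oriented threshold shifting for an SVM (not the paper's
# beta-gamma algorithm): pick the threshold on a dev set that maximizes recall
# while keeping precision above an assumed floor.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=600, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LinearSVC(max_iter=5000).fit(X_tr, y_tr)
scores = clf.decision_function(X_dev)

best_t, best_r = 0.0, 0.0
for t in np.linspace(scores.min(), scores.max(), 50):
    pred = (scores >= t).astype(int)
    if pred.sum() and precision_score(y_dev, pred) >= 0.5:   # hypothetical floor
        r = recall_score(y_dev, pred)
        if r > best_r:
            best_t, best_r = t, r
print("shifted threshold:", best_t, "recall:", best_r)
```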
SanJuan, E. & Ibekwe-SanJuan, F. Multi Word Term Queries for Focused Information Retrieval Computational Linguistics and Intelligent Text Processing 11th International Conference, CICling 2010, 21-27 March 2010 Berlin, Germany 2010
In this paper, we address both standard and focused retrieval tasks based on comprehensible language models and interactive query expansion (IQE). Query topics are expanded using an initial set of Multi Word Terms (MWTs) selected from top n ranked documents. MWTs are special text units that represent domain concepts and objects. As such, they can better represent query topics than ordinary phrases or n-grams. We tested different query representations: bag-of-words, phrases, flat list of MWTs, subsets of MWTs. We also combined the initial set of MWTs obtained in an IQE process with automatic query expansion (AQE) using language models and smoothing mechanism. We chose as baseline the Indri IR engine based on the language model using Dirichlet smoothing. The experiment is carried out on two benchmarks: TREC Enterprise track (TRECent) 2007 and 2008 collections; INEX 2008 Adhoc track using the Wikipedia collection.
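Since the Indri baseline mentioned above is the query-likelihood language model with Dirichlet smoothing, a minimal sketch of that scoring function may help; the corpus, the mu value and the helper name are illustrative assumptions, and the MWT/IQE expansion is not reproduced.

```python
# Minimal sketch of Dirichlet-smoothed query-likelihood scoring:
#   p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)
import math
from collections import Counter

def dirichlet_score(query, doc, collection_tf, collection_len, mu=2000):
    tf = Counter(doc)
    score = 0.0
    for w in query:
        p_wc = collection_tf.get(w, 0) / collection_len
        if p_wc == 0:          # unseen in the collection: skip in this sketch
            continue
        score += math.log((tf[w] + mu * p_wc) / (len(doc) + mu))
    return score

docs = [["wikipedia", "article", "retrieval"], ["enterprise", "search", "track"]]
coll = Counter(w for d in docs for w in d)
coll_len = sum(coll.values())
for d in docs:
    print(dirichlet_score(["wikipedia", "retrieval"], d, coll, coll_len))
```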
Santos, D. & Cabral, L.M. GikiCLEF: expectations and lessons learned Multilingual Information Access Evaluation I. Text Retrieval Experiments 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [818]
This overview paper is devoted to a critical assessment of GikiCLEF 2009, an evaluation contest specifically designed to expose and investigate cultural and linguistic issues in Wikipedia search, with eight participant systems and 17 runs. After providing a maximally short but self-contained overview of the GikiCLEF task and participation, we present the open source SIGA system, and discuss, for each of the main guiding ideas, the resulting successes or shortcomings, concluding with further work and still unanswered questions.
Sanz-Santamaría, Silvia; Vare, Juan A. Pereira; Serrano, Julián Gutiérrez; Fernández, Tomás A. Pérez & Zorita, José A. Vadillo Practicing L2 Speaking in a Collaborative Video-Conversation Environment World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [819]
Sarjant, Samuel; Legg, Catherine; Robinson, Michael & Medelyan, Olena "All You Can Eat" Ontology-Building: Feeding Wikipedia to Cyc Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2009 [820]
In order to achieve genuine web intelligence, building some kind of large general machine-readable conceptual scheme (i.e. ontology) seems inescapable. Yet the past 20 years have shown that manual ontology-building is not practicable. The recent explosion of free user-supplied knowledge on the Web has led to great strides in automatic ontology-building, but quality-control is still a major issue. Ideally one should automatically build onto an already intelligent base. We suggest that the long-running Cyc project is able to assist here. We describe methods used to add 35K new concepts mined from Wikipedia to collections in ResearchCyc entirely automatically. Evaluation with 22 human subjects shows high precision both for the new concepts’ categorization, and their assignment as individuals or collections. Most importantly we show how Cyc itself can be leveraged for ontological quality control by ‘feeding’ it assertions one by one, enabling it to reject those that contradict its other knowledge.
Schalick, J.A. Technology and changes in the concept of the university: comments on the reinvention of the role of the university east and west Proceedings of PICMET 2006-Technology Management for the Global Future, 8-13 July 2006 Piscataway, NJ, USA 2007
Technology is a driver of social and economic institutions in the 21st century, with many consequences as yet unclear because of the rapidity of its distribution across institutions. The speed of change and the scope of global communications have created alliances that alter organizational and institutional models as well as access to libraries, research and innovations. Universities, long immune to change by virtue of their role as societal institutions, are challenged by technologies on all levels and are being reinvented with or without planned strategies. The extraordinary growth of means of communication, of online access to university courses, and of new Internet-facilitated access to the vast resources of international libraries once held close to the smaller academic community, has exploded the concept of where knowledge resides and how it is to be accessed. New 'brands' of learning follow what may be called, broadly, an e-commerce model driven by new technologies and the rapidity of communications. The trend toward alliances between the university, business, and information distribution technologies is seen as crucial to the survival of both in global networks of private and national universities alike. Alliances in a global economy alter organizational and institutional models. The growth of such operations as Wikipedia, which is user-driven rather than faculty-driven, is an example outside the framework of university-based resources. Connections, alliances, conversation and innovation tie the institutions of education, business, and national policies together in a bold new way, bridging cultural boundaries and challenging them at the same time.
Schneider, Daniel Edutech Wiki - an all-in-one solution to support whole scholarship? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [821]
Scholl, P.; Bohnstedt, D.; Garcia, R.D.; Rensing, C. & Steinmetz, R. Extended explicit semantic analysis for calculating semantic relatedness of web resources Sustaining TEL: From Innovation to Learning and Practice. 5th European Conference on Technology Enhanced Learning, EC-TEL 2010, 28 Sept.-1 Oct. 2010 Berlin, Germany 2010 [822]
Finding semantically similar documents is a common task in Recommender Systems. Explicit Semantic Analysis (ESA) is an approach to calculate semantic relatedness between terms or documents based on similarities to documents of a reference corpus. Here, usually Wikipedia is applied as reference corpus. We propose enhancements to ESA (called Extended Explicit Semantic Analysis) that make use of further semantic properties of Wikipedia like article link structure and categorization, thus utilizing the additional semantic information that is included in Wikipedia. We show how we apply this approach to recommendation of web resource fragments in a resource-based learning scenario for self-directed, on-task learning with web resources.
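For readers unfamiliar with plain ESA, the following minimal Python sketch shows the core idea (texts represented as vectors of similarities to a reference corpus of Wikipedia articles); the toy article list is an assumption, and the link- and category-based extensions described in the paper are not reproduced.

```python
# Minimal sketch of plain Explicit Semantic Analysis with a toy reference corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

wikipedia_articles = [            # toy stand-in for the Wikipedia reference corpus
    "machine learning algorithms build models from data",
    "a wiki is a website edited collaboratively by its users",
    "the solar system consists of the sun and planets",
]

vec = TfidfVectorizer().fit(wikipedia_articles)
concept_space = vec.transform(wikipedia_articles)          # one row per "concept"

def esa_vector(text):
    # similarity of the text to every reference article = its concept vector
    return cosine_similarity(vec.transform([text]), concept_space)

a = esa_vector("collaborative editing of web pages")
b = esa_vector("users writing a wiki together")
print("ESA relatedness:", cosine_similarity(a, b)[0, 0])
```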
Schrader, Pg & Lawless, Kimberly Gamer Discretion Advised: How MMOG Players Determine the Quality and Usefulness of Online Resources Society for Information Technology & Teacher Education International Conference 2008 [823]
Semeraro, Giovanni; Lops, Pasquale; Basile, Pierpaolo & de Gemmis, Marco Knowledge infusion into content-based recommender systems Proceedings of the third ACM conference on Recommender systems 2009 [824]
Content-based recommender systems try to recommend items similar to those a given user has liked in the past. The basic process consists of matching up the attributes of a user profile, in which preferences and interests are stored, with the attributes of a content object (item). Common-sense and domain-specific knowledge may be useful to give some meaning to the content of items, thus helping to generate more informative features than "plain" attributes. The process of learning user profiles could also benefit from the infusion of exogenous knowledge or open source knowledge, with respect to the classical use of endogenous knowledge (extracted from the items themselves). The main contribution of this paper is a proposal for knowledge infusion into content-based recommender systems, which suggests a novel view of this type of system, mostly oriented to content interpretation by way of the infused knowledge. The idea is to provide the system with the "linguistic" and "cultural" background knowledge that hopefully allows a more accurate content analysis than classic approaches based on words. A set of knowledge sources is modeled to create a memory of linguistic competencies and of more specific world "facts" that can be exploited to reason about content as well as to support the user profiling and recommendation processes. The modeled knowledge sources include a dictionary, Wikipedia, and content generated by users (i.e. tags provided on items), while the core of the reasoning component is a spreading activation algorithm.
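Since the reasoning core mentioned above is a spreading activation algorithm, a small hedged Python sketch of a generic spreading activation pass may be useful; the graph, weights, decay and threshold values are toy assumptions, not the paper's modeled knowledge sources.

```python
# Illustrative sketch of a spreading activation pass over a small knowledge graph.
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """graph: {node: [(neighbor, edge_weight), ...]}; seeds: {node: initial activation}."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neigh, w in graph.get(node, []):
                pulse = act * w * decay
                if pulse < threshold:          # too weak to propagate further
                    continue
                activation[neigh] = activation.get(neigh, 0.0) + pulse
                next_frontier[neigh] = next_frontier.get(neigh, 0.0) + pulse
        frontier = next_frontier
        if not frontier:
            break
    return activation

graph = {
    "jazz": [("music", 0.9), ("saxophone", 0.7)],
    "music": [("concert", 0.8)],
    "saxophone": [("instrument", 0.9)],
}
print(spread_activation(graph, {"jazz": 1.0}))
```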
Sendurur, Emine; Sendurur, Polat & Gedik, Nuray Temur Communicational, Social, and Educational Aspects of Virtual Communities: Potential Educational Opportunities for In-Service Teacher Training Society for Information Technology & Teacher Education International Conference 2008 [825]
Senette, C.; Buzzi, M.C.; Buzzi, M. & Leporini, B. Enhancing Wikipedia Editing with WAI-ARIA HCI and Usability for e-Inclusion. 5th Symposium of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society, USAB 2009, 9-10 Nov. 2009 Berlin, Germany 2009
Nowadays Web 2.0 applications allow anyone to create, share and edit online content, but accessibility and usability issues still exist. For instance, Wikipedia presents many difficulties for blind users, especially when they want to write or edit articles. In a previous stage of our study we proposed and discussed how to apply the W3C ARIA suite to simplify the Wikipedia editing page when interacting via screen reader. In this paper we present the results of a user test involving totally blind end-users as they interacted with both the original and the modified Wikipedia editing pages. Specifically, the purpose of the test was to compare the editing and formatting process for the original and ARIA-implemented Wikipedia user interfaces, and to evaluate the improvements.
Seppala, Mika; Caprotti, Olga & Xambo, Sebastian Using Web Technologies to Teach Mathematics Society for Information Technology & Teacher Education International Conference 2006 [826]
Settles, Patty Wikis, Blogs and their use in the Water/Wastewater World Proceedings of the Water Environment Federation 2006 doi:10.2175/193864706783779249
Shakshuki, Elhadi; Lei, Helen & Tomek, Ivan Intelligent Agents in Collaborative E-Learning Environments World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 [827]
Shakya, Aman; Takeda, Hideaki & Wuwongse, Vilas Consolidating User-Defined Concepts with StYLiD Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web 2008 [828]
Information sharing can be effective with structured data. However, there are several challenges for having structured data on the web. Creating structured concept definitions is difficult and multiple conceptualizations may exist due to different user requirements and preferences. We propose consolidating multiple concept definitions into a unified virtual concept and formalize our approach. We have implemented a system called StYLiD to realize this. StYLiD is a social software for sharing a wide variety of structured data. Users can freely define their own structured concepts. The system consolidates multiple definitions for the same concept by different users. Attributes of the multiple concept versions are aligned semi-automatically to provide a unified view. It provides a flexible interface for easy concept definition and data contribution. Popular concepts gradually emerge from the cloud of concepts while concepts evolve incrementally. StYLiD supports linked data by interlinking data instances including external resources like Wikipedia.
Shin, Wonsug & Lowes, Susan Analyzing Web 2.0 Users in an Online Discussion Forum World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [829]
Siemens, George & Tittenberge, Peter Virtual Learning Commons: Designing a Social University World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [830]
Sigurbjornsson, B.; Kamps, J. & de Rijke, M. Focused access to Wikipedia DIR'06. Dutch-Belgian Information Retrieval Workshop. Proceedings, 13-14 March 2006 Enschede, Netherlands 2006
Wikipedia is a "free" online encyclopedia. It contains millions of entries in many languages and is growing at a fast pace. Due to its volume, search engines play an important role in giving access to the information in Wikipedia. The "free" availability of the collection makes it an attractive corpus for information retrieval experiments. In this paper we describe the evaluation of a search engine that provides focused search access to Wikipedia, i.e. a search engine which gives direct access to individual sections of Wikipedia pages. The main contributions of this paper are twofold. First, we introduce Wikipedia as a test corpus for information retrieval experiments in general and for semi-structured retrieval in particular. Second, we demonstrate that focused XML retrieval methods can be applied to a wider range of problems than searching scientific journals in XML format, including accessing reference works.
Singh, V. K.; Jalan, R.; Chaturvedi, S. K. & Gupta, A. K. Collective Intelligence Based Computational Approach to Web Intelligence Proceedings of the 2009 International Conference on Web Information Systems and Mining 2009 [831]
The World Wide Web has undergone major transformation during the last few years, primarily due to its newly discovered ability to harness the collective intelligence of millions of users across the world. Users are no longer only passive consumers; they are actively participating to create new and useful content and richer, more personalized web applications. Techniques to leverage user contributions are on one hand making the Web a collective knowledge system (applications like Wikipedia), and on the other hand providing a new approach to mine unstructured web content for useful inferences and new knowledge. In this paper, we discuss how the collective intelligence phenomenon is bringing about a paradigm shift in the Web, and present our experimental work on a social-inference oriented approach for opinion analysis on the Blogosphere.
Sinha, Hansa; Rosson, Mary Beth & Carroll, John The Role of Technology in the Development of Teachers’ Professional Learning Communities Society for Information Technology & Teacher Education International Conference 2009 [832]
Slusky, Ludwig & Partow-Navid, Parviz Development for Computer Forensics Course Using EnCase World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [833]
Slykhuis, David & Stern, Barbara Whither our Wiki? Society for Information Technology & Teacher Education International Conference 2008 [834]
Smith, Jason R.; Quirk, Chris & Toutanova, Kristina Extracting parallel sentences from comparable corpora using document level alignment HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [835]
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several approaches developed for obtaining parallel sentences from non-parallel, or comparable data, such as news articles published within the same time period (Munteanu and Marcu, 2005), or web pages with a similar structure (Resnik and Smith, 2003). One resource not yet thoroughly explored is Wikipedia, an online encyclopedia containing linked articles in many languages. We advance the state of the art in parallel sentence extraction by modeling the document level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity. We also include features which make use of the additional annotation given by Wikipedia, and features using an automatically induced lexicon model. Results for both accuracy in sentence extraction and downstream improvement in an SMT system are presented.
Son, Moa The Effects of Debriefing on Improvement of Academic Achievements and Game Skills World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [836]
Soriano, Javier; López, Javier; Jiménez, Miguel & Alonso, Fernando Enabling semantics-aware collaborative tagging and social search in an open interoperable tagosphere Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services 2008 [837]
To make the most of a global network effect and to search and filter the Long Tail, a collaborative tagging approach to social search should be based on the global activity of tagging, rating and filtering. We take a further step towards this objective by proposing a shared conceptualization of both the activity of tagging and the organization of the tagosphere in which tagging takes place. We also put forward the necessary data standards to interoperate at both data format and semantic levels. We highlight how this conceptualization makes provision for attaching identity and meaning to tags and tag categorization through a Wikipedia-based collaborative framework. Used together, these concepts are a useful and agile means of unambiguously defining terms used during tagging, and of clarifying any vague search terms. This improves search results in terms of recall and precision, and represents an innovative means of semantics-aware collaborative filtering and content ranking.
Sosin, Adrienne Andi; Pepper-Sanello, Miriam; Eichenholtz, Susan; Buttaro, Lucia & Edwards, Richard Digital Storytelling Curriculum for Social Justice Learners & Leaders Society for Information Technology & Teacher Education International Conference 2007 [838]
Speelman, Pamela & Gore, David IT Proposal - Simulation Project as a Higher Order Thinking Technique for Instruction Society for Information Technology & Teacher Education International Conference 2008 [839]
Speelman, Pamela; Gore, David & Hyde, Scott Simulation: Gaming and Beyond Society for Information Technology & Teacher Education International Conference 2009 [840]
Stefl-Mabry, Joette & William, E. J. Doane Teaching & Learning 2.0: An urgent call to do away with the isolationist practice of educating and retool education as community in the United States. Society for Information Technology & Teacher Education International Conference 2008 [841]
Strohmaier, Mahla; Nance, Kara & Hay, Brian Phishing: What Teachers Need to Know for the Home and Classroom Society for Information Technology & Teacher Education International Conference 2006 [842]
Suanpang, Pannee & Kalceff, Walter Suan Dusit Internet Broadcasting (SDIB):-Educational Innovation in Thailand World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [843]
Sun, Yanling; Masterson, Carolyn & Kahn, Patricia Implementing ePortfolio among Pre-service Teachers: An Approach to Construct Meaning of NCATE Standards to Students Society for Information Technology & Teacher Education International Conference 2007 [844]
Sung, Woonhee Analysis of underlying features allowing educational uses for collaborative learning in Social Networking Sites, Cyworld World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [845]
Sutcliffe, R.F.E.; Steinberger, J.; Kruschwitz, U.; Alexandrov-Kabadjov, M. & Poesio, M. Identifying novel information using latent semantic analysis in the WiQA task at CLEF 2006 Evaluation of Multilingual and Multi-modal Information Retrieval. 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, 20-22 Sept. 2006 Berlin, Germany 2007
In our two-stage system for the English monolingual WiQA task, snippets were first retrieved if they contained an exact match with the title. Candidates were then passed to the latent semantic analysis component, which judged them novel if their match with the article text was less than a threshold. In Run 1, the ten best snippets were returned, and in Run 2 the twenty best. Run 1 was superior, with an average yield per topic of 2.46 and precision of 0.37. Compared to other groups, our performance was in the middle of the range except for precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach used depending on the topic type, exploit co-references in conjunction with exact matches, and make use of the elaborate hyperlink structure which is a unique and most interesting aspect of Wikipedia.
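As an illustration of the second stage only (novelty judged by a latent-semantic similarity threshold), here is a hedged Python sketch; TruncatedSVD over tf-idf stands in for the paper's LSA component, and the threshold and example texts are assumptions.

```python
# Sketch: mark candidate snippets as "novel" when their latent-semantic similarity
# to the topic article falls below an assumed threshold.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

article = "Amsterdam is the capital of the Netherlands, known for its canals."
snippets = [
    "Amsterdam lies in the province of North Holland.",   # adds new information
    "The capital of the Netherlands is Amsterdam.",       # restates the article
]

texts = [article] + snippets
X = TfidfVectorizer().fit_transform(texts)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

threshold = 0.8                      # hypothetical novelty cut-off
for snippet, z in zip(snippets, Z[1:]):
    sim = cosine_similarity([Z[0]], [z])[0, 0]
    print("novel" if sim < threshold else "redundant", round(sim, 2), snippet)
```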
Swenson, Penelope Handheld computer use and the ‘killer application’ Society for Information Technology & Teacher Education International Conference 2005 [846]
Switzer, Anne & Lepkowski, Frank Information Literacy and the Returning Masters Student: Observations from the Library Side Society for Information Technology & Teacher Education International Conference 2007 [847]
Sánchez, Alejandro Campos; Ureña, José David Flores; Sánchez, Raúl Campos; Gutiérrez, José Alberto Castellanos & Sánchez, Alejandro Campos Knowledge Construction Through ICT's: Social Networks World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [848]
Tahayna, B.; Ayyasamy, R.K.; Alhashmi, S. & Eu-Gene, S. A novel weighting scheme for efficient document indexing and classification 2010 International Symposium on Information Technology (ITSim 2010), 15-17 June 2010 Piscataway, NJ, USA 2010 [849]
In this paper we propose and illustrate the effectiveness of a new topic-based document classification method. The proposed method utilizes Wikipedia, a large-scale Web encyclopaedia that has high-quality, huge-scale articles and a category system. Wikipedia is used, via an N-gram technique, to transform each document from a bag of words into a bag of concepts. Based on this transformation, a novel concept-based weighting scheme (denoted Conf.idf) is proposed to index the text in the flavour of the traditional tf.idf indexing scheme. Moreover, a genetic algorithm-based support vector machine optimization method is used for feature subset and instance selection. Experimental results showed that the proposed weighting scheme outperforms the traditional indexing and weighting scheme.
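To make the Conf.idf idea concrete, here is a hedged Python sketch that maps word n-grams to Wikipedia concepts and then applies tf.idf-style weighting over concepts instead of words; the concept dictionary, overlap handling and function names are toy assumptions, not the paper's actual lookup or weighting details.

```python
# Sketch of concept-based tf.idf (conf.idf-style) weighting with a toy concept lookup.
import math
from collections import Counter

concept_lookup = {            # hypothetical n-gram -> Wikipedia concept mapping
    "machine learning": "Machine_learning",
    "neural network": "Artificial_neural_network",
    "learning": "Learning",
}

def to_concepts(tokens, max_n=2):
    # overlapping n-gram matches are allowed in this simplified sketch
    concepts = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in concept_lookup:
                concepts.append(concept_lookup[gram])
    return concepts

def conf_idf(doc_concepts, all_docs_concepts):
    tf = Counter(doc_concepts)
    n_docs = len(all_docs_concepts)
    return {c: tf[c] * math.log(n_docs / sum(c in d for d in all_docs_concepts))
            for c in tf}

docs = [to_concepts("machine learning and neural network models".split()),
        to_concepts("learning theory".split())]
print(conf_idf(docs[0], docs))
```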
Takayuki, Furukawa; Hoshi, Kouki; Aida, Aya; Mitsuhashi, Sachiko; Kamoshida, Hiromi & In, Katsuya Do the traditional classroom-based motivational methods work in e-learning community? World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [850]
Takeuchi, H. An automatic Web site menu structure evaluation 2009 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 20-24 Aug. 2009 Piscataway, NJ, USA 2009 [851]
The purpose of this paper is to propose a method for automatically evaluating Web site menu structures. The evaluation system requires content data and a menu structure with link names. This approach consists of three stages. First, the system classifies the content data into appropriate links. Second, the system identifies the usability problems for all content data. Third, the system calculates an index that indicates the averaged predicted mouse clicks for the menu structure. As applications, a link name selection problem and a link structure evaluation problem are discussed. This system was also applied to real data, such as Encarta's and Wikipedia's menus. The results confirmed the usefulness of the system.
Takeuchi, Toshihiko Development of a VBA Macro that Draws Figures in Profile with PowerPoint World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [852]
Takeuchi, Toshihiko; Kato, Shogo & Kato, Yuuki Suggestion of a quiz-form learning-style using a paid membership bulletin board system Society for Information Technology & Teacher Education International Conference 2008 [853]
Takeuchi, Toshihiko; Kato, Shogo; Kato, Yuuki & Wakui, Tomohiro Manga-Based Beginner-level Textbooks; Proposal of a Website for their Creation World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [854]
Tam, Shuk Ying; Wat, Sin Tung & Kennedy, David M An Evaluation of Two Open Source Digital Library Software Systems World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [855]
Tamashiro, Roy Transforming Curriculum & Pedagogy for Global Thinking with Social Networking Tools Society for Information Technology & Teacher Education International Conference 2009 [856]
Tamashiro, Roy; Rodney, Basiyr D. & Beckmann, Mary Do Student-Authored Wiki Textbook Projects Support 21st Century Learning Outcomes? Society for Information Technology & Teacher Education International Conference 2010 [857]
Tamim, Rana; Shaikh, Kamran & Bethel, Edward EDyoutube: Why not? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [858]
Tanaka, K. Web Information Credibility Web-Age Information Management. 11th International Conference, WAIM 2010, 15-17 July 2010 Berlin, Germany 2010 [859]
Summary form only given. The World Wide Web is the biggest repository of information and knowledge. Such information gives people a framework for organizing their private and professional lives. Research aimed at evaluating the credibility of Web content has recently become increasingly crucial because the Web has started to influence our daily lives. The abundance of content on the Web, the lack of publishing barriers, and poor quality control of Web content raise credibility issues. If users are not aware of the credibility of Web information, they can be easily misled, and sometimes it is dangerous to users. For example, some researchers reported that there are more than twenty thousand health-related sites on the Web, but more than half of such sites have not been reviewed by medical specialists. Wikipedia has become very popular on the Web, but its risks have also been pointed out from the viewpoint of credibility. There are a lot of exaggerated ads and fake images and movies, so image forensic research is also becoming important. Many dimensions concerned with information credibility are grouped into two key components: expertise and trustworthiness. Expertise is a factor about the writer's ability to produce correct or fair information and the degree to which the reader can perceive knowledge and skill from the information. The expertise factor is defined by the terms knowledgeable, experienced, competent, and so on. Trustworthiness is a factor about readers' perceptions that the information is true as they know it, and it is the degree to which readers can perceive the goodness or morality of the target information. The trustworthiness factor is defined by the terms well-intentioned, unbiased, reputable, and so on. In the areas of Web search and mining, however, most conventional research has focused on ranking search results based on popularity by analyzing link structures or on mining useful rules from the Web. It has not focused on the analysis of the credibility of target information. Consequently, few users perform rigorous evaluations of the credibility of obtained information. Therefore, the exploration of a general framework and automatic tools for supporting users in the judgment of web content credibility is becoming increasingly necessary. In this talk, we describe a new framework and methods for evaluating Web information credibility. These include a bipartite-graph framework for evaluating the credibility of relations, and several methods for analyzing Web information credibility from the viewpoint of (1) content analysis, (2) social support analysis and (3) author analysis.
Tanaka, Katsumi; Zhou, Xiaofang; Zhang, Min & Jatowt, Adam Proceedings of the 4th workshop on Information credibility 2010 [860]
It is our great pleasure to welcome you all to the 4th Workshop on Information Credibility on the Web (WICOW'10), organized in conjunction with the 19th World Wide Web Conference in Raleigh, NC, USA on April 27, 2010. The aim of the workshop is to provide a forum for discussion on various issues related to information credibility criteria on the web. Evaluating and improving information credibility requires a combination of different technologies and backgrounds. Through the series of WICOW workshops we hope to exchange novel ideas and findings as well as promote discussions on various aspects of web information credibility. This year we received 22 full paper submissions from 12 countries: Austria, Brazil, China, Egypt, France, Germany, Ireland, Japan, The Netherlands, Saudi Arabia, UK and USA. After a careful review process, with at least three reviews for each paper, the Program Committee selected 10 full papers (45% acceptance rate) covering a variety of topics related to information credibility. The accepted papers were grouped into 3 sessions: "Wikipedia Credibility", "Studies of Web Information Credibility" and "Evaluating Information Credibility". We are also pleased to invite Miriam Metzger from the University of California, Santa Barbara to give a keynote talk entitled "Understanding Credibility across Disciplinary Boundaries".
Tappert, Charles The Interplay of Student Projects and Student-Faculty Research World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [861]
Tappert, Charles Pedagogical Issues in Managing Information Technology Projects Conducted by Geographically Distributed Student Teams Society for Information Technology & Teacher Education International Conference 2009 [862]
Tappert, Charles & Stix, Allen Assessment of Student Work on Geographically Distributed Information Technology Project Teams World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [863]
Tarkowski, Diane; Donovan, Marie; Salwach, Joe; Avgerinou, Maria; Rotenberg, Robert & Lin, Wen-Der Supporting Faculty and Students with Podcast Workshops World Conference on Educational Multimedia, Hypermedia and Telecommunications 2007 [864]
Theng, Yin-Leng & Jiang, Tao Determinant Factors of Information Use or Misuse in Wikipedia World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [865]
Thomas, Christopher & Sheth, Amit P. Semantic Convergence of Wikipedia Articles Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence 2007 [866]
Social networking, distributed problem solving and human computation have gained high visibility. Wikipedia is a well established service that incorporates aspects of these three fields of research. For this reason it is a good object of study for determining quality of solutions in a social setting that is open, completely distributed, bottom up and not peer reviewed by certified experts. In particular, this paper aims at identifying semantic convergence of Wikipedia articles; the notion that the content of an article stays stable regardless of continuing edits. This could lead to an automatic recommendation of good article tags but also add to the usability of Wikipedia as a Web Service and to its reliability for information extraction. The methods used and the results obtained in this research can be generalized to other communities that iteratively produce textual content.
Thompson, Nicole ICT and globalization in Education World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [867]
Toledo, Cheri Setting the Stage to Use Blogging as a Reflective Tool in Teacher Education Society for Information Technology & Teacher Education International Conference 2007 [868]
Tomuro, Noriko & Shepitsen, Andriy Construction of disambiguated Folksonomy ontologies using Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [869]
One of the difficulties in using Folksonomies in computational systems is tag ambiguity: tags with multiple meanings. This paper presents a novel method for building Folksonomy tag ontologies in which the nodes are disambiguated. Our method utilizes a clustering algorithm called DSCBC, which was originally developed in Natural Language Processing (NLP), to derive committees of tags, each of which corresponds to one meaning or domain. In this work, we use Wikipedia as the external knowledge source for the domains of the tags. Using the committees, an ambiguous tag is identified as one which belongs to more than one committee. Then we apply a hierarchical agglomerative clustering algorithm to build an ontology of tags. The nodes in the derived ontology are disambiguated in that an ambiguous tag appears in several nodes in the ontology, each of which corresponds to one meaning of the tag. We evaluate the derived ontology for its ontological density (how close similar tags are placed), and its usefulness in applications, in particular for a personalized tag retrieval task. The results showed marked improvements over other approaches.
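The DSCBC algorithm itself is not reproduced here; the following hedged Python sketch only illustrates the committee-based ambiguity test described above (a tag is treated as ambiguous when it belongs to more than one committee), with toy committees standing in for those derived from Wikipedia domains.

```python
# Sketch of committee-based tag ambiguity detection (DSCBC not reproduced).
from collections import defaultdict

committees = {                      # hypothetical committees, one per domain/meaning
    "programming": {"python", "java", "code"},
    "animals":     {"python", "snake", "reptile"},
    "coffee":      {"java", "espresso", "beans"},
}

membership = defaultdict(list)
for name, tags in committees.items():
    for tag in tags:
        membership[tag].append(name)

# A tag belonging to more than one committee has more than one meaning.
ambiguous = {t: doms for t, doms in membership.items() if len(doms) > 1}
print(ambiguous)   # e.g. python -> ['programming', 'animals'], java -> ['programming', 'coffee']
```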
Traina, Michael; Doctor, Denise; Bean, Erik & Wooldridge, Vernon STUDENT CODE of CONDUCT in the ONLINE CLASSROOM: A CONSIDERATION of ZERO TOLERANCE POLICIES World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 [870]
Tran, T. & Nayak, R. Evaluating the performance of XML document clustering by structure only Comparative Evaluation of XML Information Retrieval Systems. 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, 17-20 Dec. 2006 Berlin, Germany 2007
This paper reports the results and experiments performed on the INEX 2006 document mining challenge corpus with the PCXSS clustering method. The PCXSS method is a progressive clustering method that computes the similarity between a new XML document and existing clusters by considering the structures within documents. We conducted the clustering task on the INEX and Wikipedia data sets.
Tripp, Lisa Teaching Digital Media Production in Online Instruction: Strategies and Recommendations World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [871]
Tsikrika, T. & Kludas, J. Overview of the WikipediaMM Task at ImageCLEF 2009 Multilingual Information Access Evaluation II. Multimedia Experiments. 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, 30 Sept.-2 Oct. 2009 Berlin, Germany 2010 [872]
ImageCLEF's WikipediaMM task provides a testbed for the system-oriented evaluation of multimedia information retrieval from a collection of Wikipedia images. The aim is to investigate retrieval approaches in the context of a large and heterogeneous collection of images (similar to those encountered on the Web) that are searched for by users with diverse information needs. This paper presents an overview of the resources, topics, and assessments of the WikipediaMM task at ImageCLEF 2009, summarises the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results.
Turek, P.; Wierzbicki, A.; Nielek, R.; Hupa, A. & Datta, A. Learning about the quality of teamwork from Wikiteams 2010 IEEE Second International Conference on Social Computing (SocialCom 2010). the Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010), 20-22 Aug. 2010 Los Alamitos, CA, USA 2010 [873]
This paper describes an approach to evaluating teams of contributors in Wikipedia based on social network analysis. We present the idea of creating an implicit social network based on characteristics of pages' edit history and collaboration between contributors. This network consists of four dimensions: trust, distrust, acquaintance and knowledge. Trust and distrust are based on content modifications (copying and deleting respectively), acquaintance is based on the amount of discussion on articles' talk pages between a given pair of authors, and knowledge is based on the categories in which an author typically contributes. Our social network is based on the entire Wikipedia edit history, and therefore is a summary of all recorded author interactions. This social network can be used to assess the quality of a team of authors and, consequently, to recommend good teams. The social network can also be used by Wikipedia authors and editors as an additional tool that allows them to improve their collaboration, as it expresses each author's social environment and can be navigated to discover new projects that an author can participate in, or to recommend new collaborators.
Turgut, Yildiz EFL Learners’ Experience of Online Writing by PBWiki World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [874]
Tynan, Belinda; Lee, Mark J.W. & Barnes, Cameron Polar bears, black gold, and light bulbs: Creating stable futures for tertiary education through instructor training and support in the use of ICTs World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [875]
Téllez, Alberto; Juárez, Antonio; Hernández, Gustavo; Denicia, Claudia; Villatoro, Esaú; Montes, Manuel & Villaseñor, Luis A Lexical Approach for Spanish Question Answering Advances in Multilingual and Multimodal Information Retrieval 2008 [876]
This paper discusses our system's results at the Spanish Question Answering task of CLEF 2007. Our system is centered on a fully data-driven approach that combines information retrieval and machine learning techniques. It mainly relies on the use of lexical information and avoids any complex language processing procedure. Evaluation results indicate that this approach is very effective for answering definition questions from Wikipedia. In contrast, they also reveal that it is very difficult to answer factoid questions from this resource based solely on the use of lexical overlaps and redundancy.
Udupa, Raghavendra & Khapra, Mitesh Improving the multilingual user experience of Wikipedia using cross-language name search HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [877]
Although Wikipedia has emerged as a powerful collaborative Encyclopedia on the Web, it is only partially multilingual as most of the content is in English and a small number of other languages. In real-life scenarios, non-English users in general, and ESL/EFL users in particular, have a need to search for relevant English Wikipedia articles as no relevant articles are available in their language. The multilingual experience of such users can be significantly improved if they could express their information need in their native language while searching for English Wikipedia articles. In this paper, we propose a novel cross-language name search algorithm and employ it for searching English Wikipedia articles in a diverse set of languages including Hebrew, Hindi, Russian, Kannada, Bangla and Tamil. Our empirical study shows that the multilingual experience of users is significantly improved by our approach.
Unal, Zafer & Unal, Aslihan Measuring the Preservice Teachers’ Satisfaction with the use of Moodle Learning Management System during Online Educational Technology Course Society for Information Technology & Teacher Education International Conference 2009 [878]
Valencia, Delailah E-Learning Implementation Model for Blended Learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [879]
Vallance, Michael & Wiz, Charles The Realities of Working in Virtual Worlds World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [880]
Varadharajan, Vijay Evolution and challenges in trust and security in information system infrastructures Proceedings of the 2nd international conference on Security of information and networks 2009 [881]
In these uncertain economic times, two key ingredients which are in short supply are trust and confidence. The concept of trust has been around for many decades (if not for centuries) in different disciplines such as business, psychology, philosophy as well as in security technology. The current financial climate gives a particularly prescient example. As financial journalist Walter Bagehot wrote some 135 years ago, after a great calamity "everybody is suspicious of everybody" and "credit, the disposition of one man to trust another, is singularly varying." The problem as Bagehot observed it was trust, or rather the lack of it, and it's as true today as it was in his time. Financial mechanisms aren't the only entities that must deal with trust: today's social networking communities such as Facebook, Wikipedia and other online communities have to constantly reconcile trust issues, from searching and locating credible information to conveying and protecting personal information. Furthermore, with ever increasing reliance on the digital economy, most business and government activities today depend on networked information systems for their operations. In this talk we'll take a short journey through the concept and evolution of trust in the secure computing technology world and examine some of the challenges involved in trusted computing today.
Vaughan, Norm Supporting Deep Approaches to Learning through the Use of Wikis and Weblogs Society for Information Technology \& Teacher Education International Conference 2008 [882]
Vegnaduzzo, Stefano Morphological productivity rankings of complex adjectives Proceedings of the Workshop on Computational Approaches to Linguistic Creativity 2009 [883]
This paper investigates a little-studied class of adjectives that we refer to as 'complex adjectives', i.e., operationally, adjectives constituted of at least two word tokens separated by a hyphen. We study the properties of these adjectives using two very large text collections: a portion of Wikipedia and a Web corpus. We consider three corpus-based measures of morphological productivity, and we investigate how productivity rankings based on them correlate with each other under different conditions, thus providing different angles both on the morphological productivity of complex adjectives, and on the productivity measures themselves.
Veletsianos, George & Kimmons, Royce Networked Participatory Scholarship: Socio-cultural \& Techno-cultural Pressures on Scholarly Practice World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2010 [884]
Verhaart, Michael & Kinshuk The virtualMe: An integrated teaching and learning framework World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [885]
Viana, Windson; Hammiche, Samira; Moisuc, Bogdan; Villanova-Oliver, Marlène; Gensel, Jérôme & Martin, Hervé Semantic keyword-based retrieval of photos taken with mobile devices Proceedings of the 6th International Conference on Advances in Mobile Computing and Multimedia 2008 [886]
This paper presents an approach for incorporating contextual metadata in a keyword-based photo retrieval process. We use our mobile annotation system PhotoMap in order to create metadata describing the photo shoot context (e.g., street address, nearby objects, season, lighting, nearby people). These metadata are then used to generate a set of stamped words for indexing each photo. We adapt the Vector Space Model (VSM) in order to transform these shoot context words into document-vector terms. Furthermore, spatial reasoning is used for inferring new potential indexing terms. We define methods for weighting those terms and for handling query matching. We also detail retrieval experiments carried out using PhotoMap and Flickr geotagged photos. We illustrate the advantages of using Wikipedia georeferenced objects for indexing photos.
Viégas, Fernanda B.; Wattenberg, Martin & Dave, Kushal Studying cooperation and conflict between authors with history flow visualizations Proceedings of the SIGCHI conference on Human factors in computing systems 2004 [887]
The Internet has fostered an unconventional and powerful style of collaboration: "wiki" web sites, where every visitor has the power to become an editor. In this paper we investigate the dynamics of Wikipedia, a prominent, thriving wiki. We make three contributions. First, we introduce a new exploratory data analysis tool, the history flow visualization, which is effective in revealing patterns within the wiki context and which we believe will be useful in other collaborative situations as well. Second, we discuss several collaboration patterns highlighted by this visualization tool and corroborate them with statistical analysis. Third, we discuss the implications of these patterns for the design and governance of online collaborative social spaces. We focus on the relevance of authorship, the value of community surveillance in ameliorating antisocial behavior, and how authors with competing perspectives negotiate their differences.
Vonrueden, Michael; Hampel, Thorsten & Geissler, Sabrina Collaborative Ontologies in Knowledge Management World Conference on Educational Multimedia, Hypermedia and Telecommunications 2005 [888]
Vroom, R.W.; Kooijman, A. & Jelierse, R. Efficient community management in an industrial design engineering wiki: distributed leadership 11th International Conference on Enterprise Information Systems. DISI, 6-10 May 2009 Setubal, Portugal 2009
Industrial design engineers use a wide variety of research fields when making decisions that will eventually have significant impact on their designs. Obviously, designers cannot master every field, so they are often looking for a simple set of rules of thumb on a particular subject. For this reason a wiki has been set up: www.wikid.eu. Whilst Wikipedia already offers a lot of this information, there is a distinct difference between WikID and Wikipedia; Wikipedia aims to be an encyclopaedia, and therefore tries to be as complete as possible. WikID aims to be a design tool. It offers information in a compact manner tailored to its user group, being the Industrial Designers. The main subjects of this paper are the research on how to create an efficient structure for the community of WikID and the creation of a tool for managing the community. With the new functionality for managing group memberships and viewing information on users, it will be easier to maintain the community. This will also help in creating a better community which will be more inviting to participate in, provided that the assumptions made in this area hold true.
Vuong, Ba-Quy; Lim, Ee-Peng; Sun, Aixin; Chang, Chew-Hung; Chatterjea, K.; Goh, Dion Hoe-Lian; Theng, Yin-Leng & Zhang, Jun Key element-context model: an approach to efficient Web metadata maintenance Research and Advanced Technology for Digital Libraries. 11th European Conference, ECDL 2007, 16-21 Sept. 2007 Berlin, Germany 2007
In this paper, we study the problem of maintaining metadata for open Web content. In digital libraries such as DLESE, NSDL and G-Portal, metadata records are created for some good quality Web content objects so as to make them more accessible. These Web objects are dynamic, making it necessary to update their metadata records. As Web metadata maintenance involves manual efforts, we propose to reduce the efforts by introducing the Key Element-Context (KeC) model to monitor only those changes made on Web page content regions that concern metadata attributes while ignoring other changes. We also develop evaluation metrics to measure the number of alerts and the amount of effort in updating Web metadata records. The KeC model has been experimented on metadata records defined for Wikipedia articles, and its performance with different settings is reported. The model is implemented in G-Portal as a metadata maintenance module.
Vuong, Ba-Quy; Lim, Ee-Peng; Sun, Aixin; Le, Minh-Tam & Lauw, Hady Wirawan On ranking controversies in wikipedia: models and evaluation Proceedings of the international conference on Web search and web data mining 2008 [889]
Wikipedia is a very large and successful Web 2.0 example. As the number of Wikipedia articles and contributors grows at a very fast pace, there are also increasing disputes occurring among the contributors. Disputes often happen in articles with controversial content. They also occur frequently among contributors who are "aggressive" or controversial in their personalities. In this paper, we aim to identify controversial articles in Wikipedia. We propose three models, namely the Basic model and two Controversy Rank (CR) models. These models draw clues from collaboration and edit history instead of interpreting the actual articles or edited content. While the Basic model only considers the amount of disputes within an article, the two Controversy Rank models extend the former by considering the relationships between articles and contributors. We also derive enhanced versions of these models by considering the age of articles. Our experiments on a collection of 19,456 Wikipedia articles show that the Controversy Rank models can more effectively determine controversial articles compared to the Basic and other baseline models.
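To make the Basic-model idea concrete, the sketch below counts revert-like dispute pairs in an article's edit history and sums them into a controversy score. This is only an illustrative toy, not the paper's actual Basic or Controversy Rank formulation; the revision history and the naive revert heuristic are invented for the example.

```python
# Illustrative sketch (not the paper's exact model): a Basic-model-style
# controversy score that counts revert-like dispute pairs in an edit history.
# Revision data and the revert heuristic are hypothetical.

from collections import Counter

def dispute_pairs(revisions):
    """Count pairs of editors where one editor undoes another, detected
    naively as a revision text reappearing after an intervening change."""
    disputes = Counter()
    for i in range(2, len(revisions)):
        prev_editor, _prev_text = revisions[i - 1]
        editor, text = revisions[i]
        # Crude revert check: the new text equals the text two steps back.
        if text == revisions[i - 2][1] and editor != prev_editor:
            disputes[frozenset((editor, prev_editor))] += 1
    return disputes

def basic_controversy_score(revisions):
    """Basic-model-style score: total number of dispute (revert) pairs."""
    return sum(dispute_pairs(revisions).values())

if __name__ == "__main__":
    history = [
        ("alice", "v1"), ("bob", "v2"), ("alice", "v1"),
        ("bob", "v2"), ("alice", "v1"), ("carol", "v3"),
    ]
    print(basic_controversy_score(history))  # -> 3
```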
Völker, Johanna; Hitzler, Pascal & Cimiano, Philipp Acquisition of OWL DL Axioms from Lexical Resources Proceedings of the 4th European conference on The Semantic Web: Research and Applications 2007 [890]
State-of-the-art research on automated learning of ontologies from text currently focuses on inexpressive ontologies. The acquisition of complex axioms involving logical connectives, role restrictions, and other expressive features of the Web Ontology Language OWL remains largely unexplored. In this paper, we present a method and implementation for enriching inexpressive OWL ontologies with expressive axioms which is based on a deep syntactic analysis of natural language definitions. We argue that it can serve as a core for a semi-automatic ontology engineering process supported by a methodology that integrates methods for both ontology learning and evaluation. The feasibility of our approach is demonstrated by generating complex class descriptions from Wikipedia definitions and from a fishery glossary provided by the Food and Agriculture Organization of the United Nations.
Wake, Donna & Sain, Nathan Exploring Learning Theory the Wiki Way Society for Information Technology \& Teacher Education International Conference 2009 [891]
Wald, Mike; Seale, Jane & Draffan, E A Disabled Learners’ Experiences of E-learning World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [892]
Walker, J. Collective intelligence: the wisdom of crowds Online Information 2007, 4-6 Dec. 2007 London, UK 2007
Web 2.0 technologies can focus the wisdom of crowds that is latent in social networks. Technologies like Wikipedia and blogs demonstrate how the actions of individuals, when aggregated, can lead to enormous value. Of all these new technologies, blogs and wikis are the most successful. Wikis have become as useful as email to many organisations. This phenomenon is about three things: 1. The social dimension: software that aggregates people around an activity. 2. Collective intelligence: software that facilitates building knowledge. 3. Lightweight software: that is very different from traditionally more complex and more expensive software. These technologies are no longer 'bleeding-edge' or risky ventures: SAP hosts a public wiki with 750,000 registered users building knowledge on SAP products. Pixar uses a wiki for all project management of their animated film production. The Los Angeles Fire Department uses Twitter to broadcast emergent activity. IBM's policy on blogging articulates how blogs are critical to their innovation and corporate citizen values. What accommodations need to be made so that these tools produce value? Don't approach these tools as a way to automate business processes in the traditional sense; they are all about the social interaction of knowledge workers. Avoid the myth of accuracy: the fear that Wikipedia, wikis, and blogs are riddled with bad information. Don't be trapped by the illusion of control: letting go allows the social network to produce the value of collective intelligence. Be prepared for more democratisation of information within the bounds of truly confidential information. Be willing to experiment with less complex software that requires less IT support.
Wallace, A. Open and Transparent Consensus: a Snapshot of Teachers' use of Wikipedia 8th European Conference on e-Learning, 29-30 Oct. 2009 Reading, UK 2009
The title of this paper (Open and Transparent Consensus) is derived from Wikipedia's own description of itself, and reflects its philosophy and approach to collaborative knowledge production and use. Wikipedia is a popular, multi-lingual, web-based, free-content encyclopaedia and is the most well-known of wikis, collaborative websites that can be directly edited by anyone with access to them. Many teachers and students have experience with Wikipedia, and in this survey teachers were asked how Wiki-based practices might contribute to teaching and learning. This study was conducted in England with 133 teachers from a wide range of schools, who have used Wikipedia in some way. The survey was anonymous to protect individuals' and schools' privacy; there was no way of identifying individual responses. The survey was conducted online and respondents were encouraged to be as open and honest as possible. Participation in this survey was entirely voluntary. Many of the questions were based upon descriptions by Wikipedia about itself and these were intended to elicit responses from teachers that reflect how closely their usage relates to the original intention and philosophy of the encyclopaedia. Other questions were intended to probe different ways in which teachers use the website.
Wang, Hong Wiki as a Collaborative Tool to Support Faculty in Mobile Teaching and Learning Society for Information Technology \& Teacher Education International Conference 2008 [893]
Wang, Huan; Chia, Liang-Tien & Gao, Shenghua Wikipedia-assisted concept thesaurus for better web media understanding Proceedings of the international conference on Multimedia information retrieval 2010 [894]
Concept ontologies have been used in artificial intelligence, biomedical informatics and library science, and have been shown to be an effective approach to better understanding data in the respective domains. One main difficulty that hinders the development of ontology approaches is the extra work required for ontology construction and annotation. With the emergence of lexical dictionaries and encyclopedias such as WordNet and Wikipedia, innovations from different directions have been proposed to automatically extract concept ontologies. Unfortunately, many of the proposed ontologies are not fully exploited with respect to general human knowledge. We study the various knowledge sources and aim to construct from Wikipedia a scalable concept thesaurus suitable for better understanding of media on the World Wide Web. With its wide concept coverage, finely organized categories, diverse concept relations, and up-to-date information, the collaborative encyclopedia Wikipedia has almost all the requisite attributes to contribute to a well-defined concept ontology. Besides explicit concept relations such as disambiguation and synonymy, Wikipedia also provides implicit concept relations through cross-references between articles. In our previous work, we built an ontology with explicit relations from Wikipedia page contents. Even though the method works, mining explicit semantic relations from every Wikipedia concept page has unresolved scalability issues when more concepts are involved. This paper describes our attempt to automatically build a concept thesaurus which encodes both explicit and implicit semantic relations for a large set of concepts from Wikipedia. Our proposed thesaurus construction takes advantage of both structure and content features of the downloaded Wikipedia database, and defines concept entries with their related concepts and relations. This thesaurus is further used to exploit semantics from web page context to build a more semantically meaningful space. We move a step further by combining the similarity distance from the image feature space to boost the performance. We evaluate our approach through application of the constructed concept thesaurus to web image retrieval. The results show that it is possible to use implicit semantic relations to improve the retrieval performance.
Wang, Yang; Wang, Haofen; Zhu, Haiping & Yu, Yong Exploit semantic information for category annotation recommendation in Wikipedia Natural Language Processing and Information Systems. 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, 27-29 June 2007 Berlin, Germany 2007
Compared with plain-text resources, the ones in "semi-semantic" Web sites such as Wikipedia contain high-level semantic information which will benefit various automatic annotation tasks on them. In this paper we propose a "collaborative annotating" approach to automatically recommend categories for a Wikipedia article by reusing category annotations from its most similar articles and ranking these annotations by their confidence. In this approach, four typical semantic features in Wikipedia, namely incoming links, outgoing links, section headings and template items, are investigated and exploited as the representation of articles to feed the similarity calculation. The experiment results have not only proven that these semantic features improve the performance of category annotation in comparison to the plain-text feature, but also demonstrated the strength of our approach in discovering missing annotations and proper-level ones for Wikipedia articles.
Wang, Shiang-Kwei Effects of Playing a History-Simulation Game: Romance of Three Kingdoms World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [895]
Wang, Sy-Chyi & Chern, Jin-Yuan The new era of “School 2.0”—Teaching with Pleasure, not Pressure: An Innovative Teaching Experience in a Software-oriented Course World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [896]
Wartena, Christian & Brussee, Rogier Instanced-Based Mapping between Thesauri and Folksonomies Proceedings of the 7th International Conference on The Semantic Web 2008 [897]
The emergence of web based systems in which users can annotate items, raises the question of the semantic interoperability between vocabularies originating from collaborative annotation processes, often called folksonomies, and keywords assigned in a more traditional way. If collections are annotated according to two systems, e.g. with tags and keywords, the annotated data can be used for instance based mapping between the vocabularies. The basis for this kind of matching is an appropriate similarity measure between concepts, based on their distribution as annotations. In this paper we propose a new similarity measure that can take advantage of some special properties of user generated metadata. We have evaluated this measure with a set of articles from Wikipedia which are both classified according to the topic structure of Wikipedia and annotated by users of the bookmarking service del.icio.us. The results using the new measure are significantly better than those obtained using standard similarity measures proposed for this task in the literature, i.e., it correlates better with human judgments. We argue that the measure also has benefits for instance based mapping of more traditionally developed vocabularies.
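The general idea of instance-based mapping described above can be illustrated with a much simpler stand-in for the paper's measure: compare a folksonomy tag and a controlled keyword by the sets of items they annotate. The sketch below uses plain cosine overlap on invented data; the paper proposes its own, more refined similarity measure, which is not reproduced here.

```python
# Minimal sketch of instance-based vocabulary mapping (not the paper's
# specific measure): concepts from two vocabularies are compared via the
# items they co-annotate, using cosine overlap. All data is invented.

import math

def cosine(a, b):
    """Cosine similarity between two sets of annotated item ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# item ids annotated with each folksonomy tag / thesaurus keyword (hypothetical)
tag_items = {"semweb": {1, 2, 3, 5}, "cooking": {7, 8}}
keyword_items = {"Semantic Web": {1, 2, 3, 4}, "Gastronomy": {7, 9}}

for tag, items in tag_items.items():
    best = max(keyword_items, key=lambda k: cosine(items, keyword_items[k]))
    print(tag, "->", best, round(cosine(items, keyword_items[best]), 3))
```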
Watson, Rachel & Boggs, Christine The Virtual Classroom: Student Perceptions of Podcast Lectures in a General Microbiology Classroom World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [898]
Watson, Rachel & Boggs, Christine Vodcast Venture: How Formative Evaluation of Vodcasting in a Traditional On-Campus Microbiology Class Led to the Development of a Fully Vodcasted Online Biochemistry Course World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [899]
Weaver, Debbi & McIntosh, P. Craig Providing Feedback on Collaboration and Teamwork Amongst Off-Campus Students World Conference on Educational Multimedia, Hypermedia and Telecommunications 2009 [900]
Weaver, Gabriel; Strickland, Barbara & Crane, Gregory Quantifying the accuracy of relational statements in Wikipedia: a methodology Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries 2006 [901]
An initial evaluation of the English Wikipedia indicates that it may provide accurate data for disambiguating and finding relations among named entities.
Weikum, Gerhard Harvesting and organizing knowledge from the web Proceedings of the 11th East European conference on Advances in databases and information systems 2007 [902]
Information organization and search on the Web is gaining structure and context awareness and more semantic flavor, for example, in the forms of faceted search, vertical search, entity search, and Deep-Web search. I envision another big leap forward by automatically harvesting and organizing knowledge from the Web, represented in terms of explicit entities and relations as well as ontological concepts. This will be made possible by the confluence of three strong trends: 1) rich Semantic-Web-style knowledge repositories like ontologies and taxonomies, 2) large-scale information extraction from high-quality text sources such as Wikipedia, and 3) social tagging in the spirit of Web 2.0. I refer to the three directions as Semantic Web, Statistical Web, and Social Web (at the risk of some oversimplification), and I briefly characterize each of them.
Weiland, Steven Online Abilities for Teacher Education: The Second Subject in Distance Learning Society for Information Technology \& Teacher Education International Conference 2008 [903]
West, Richard; Wright, Geoff & Graham, Charles Blogs, Wikis, and Aggregators: A New Vocabulary for Promoting Reflection and Collaboration in a Preservice Technology Integration Course Society for Information Technology \& Teacher Education International Conference 2005 [904]
Whittier, David & Supavai, Eisara Supporting Knowledge Building Communities with an Online Application Society for Information Technology \& Teacher Education International Conference 2009 [905]
Wichelhaus, Svenja; Schüler, Thomas; Ramm, Michaela & Morisse, Karsten More than Podcasting - An evaluation of an integrated blended learning scenario World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [906]
Wijaya, Senoaji; Spruit, Marco R. & Scheper, Wim J. Webstrategy Formulation: Benefiting from Web 2.0 Concepts to Deliver Business Values Proceedings of the 1st world summit on The Knowledge Society: Emerging Technologies and Information Systems for the Knowledge Society 2008 [907]
With the accelerating growth of internet users, a rise of globalization, distributed work environments, knowledge-based economies, and collaborative business models, it becomes clear that there is currently a high and growing number of organizations that demand a proper webstrategy. The emergence of web 2.0 technologies has led many internet companies, such as Google, Amazon, Wikipedia, and Facebook, to successfully adjust their webstrategy by adopting web 2.0 concepts to sustain their competitive advantage and reach their objectives. This has raised an interest for more traditional organizations to benefit from web 2.0 concepts in order to enhance their competitive advantage. This article discusses the effective webstrategy formulation based on the web 2.0 concepts in [21] and the differing requirements, characteristics, and objectives in different types of organizations. This research categorizes organizations into Customer Intimacy, Operational Excellence, and Product Leadership, according to the Value Disciplines model in [26].
Wilks, Yorick Artificial companions as dialogue agents Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2009 [908]
COMPANIONS is an EU project that aims to change the way we think about the relationships of people to computers and the Internet by developing a virtual conversational 'Companion'. This is intended as an agent or 'presence' that stays with the user for long periods of time, developing a relationship and 'knowing' its owner's preferences and wishes. The Companion communicates with the user primarily through speech. This paper describes the functionality and system modules of the Senior Companion, one of two initial prototypes built in the first two years of the project. The Senior Companion provides a multimodal interface for eliciting and retrieving personal information from the elderly user through a conversation about their photographs. The Companion will, through conversation, elicit their life memories, often prompted by discussion of their photographs; the aim is that the Companion should come to know a great deal about its user, their tastes, likes, dislikes, emotional reactions etc., through long periods of conversation. It is a further assumption that most life information will be stored on the internet (as in the Memories for Life project: http://www.memoriesforlife.org/), and the SC is linked directly to photo inventories in Facebook, to gain initial information about people and relationships, as well as to Wikipedia to enable it to respond about places mentioned in conversations about images. The overall aim of the SC, not yet achieved, is to produce a coherent life narrative for its user from these materials, although its short term goals are to assist, amuse, entertain and gain the trust of the user. The Senior Companion uses Information Extraction to get content from the speech input, rather than conventional parsing, and retains utterance content, extracted internet information and ontologies all in RDF formalism, over which it does primitive reasoning about people. It has a dialogue manager virtual machine intended to capture mixed initiative between Companion and user, and which can be a basis for later replacement by learned components.
Williams, Alexandria; Seals, Cheryl; Rouse, Kenneth & Gilbert, Juan E. Visual Programming with Squeak SimBuilder: Techniques for E-Learning in the Creation of Science Frameworks World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [909]
Williams, Vicki Assessing the Web 2.0 Technologies: Mission Impossible? World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2009 [910]
Williams, Vicki Educational Gaming as an Instructional Strategy World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 [911]
Williams, Vicki & Williams, Barry Way of the Wiki: The Zen of Social Computing World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2006 [912]
Winkler, Thomas; Ide, Martina & Herczeg, Michael Connecting Second Life and Real Life: Integrating Mixed-Reality-Technology into Teacher Education Society for Information Technology \& Teacher Education International Conference 2009 [913]
Witteman, Holly; Chandrashekar, Sambhavi; Betel, Lisa & O’Grady, Laura Sense-making and credibility of health information on the social web: A multi-method study accessing tagging and tag clouds World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [914]
Witten, Ian Wikipedia and How to Use It for Semantic Document Representation Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on 2010
Witten, I.H. Semantic Document Processing Using Wikipedia as a Knowledge Base Focused Retrieval and Evaluation. 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, 7-9 Dec. 2009 Berlin, Germany 2010
Summary form only given. Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This talk will introduce the process of "wikification"; that is, automatically and judiciously augmenting a plain-text document with pertinent hyperlinks to Wikipedia articles, as though the document were itself a Wikipedia article. This amounts to a new semantic representation of text in terms of the salient concepts it mentions, where "concept" is equated to "Wikipedia article." Wikification is a useful process in itself, adding value to plain text documents. More importantly, it supports new methods of document processing. I first describe how Wikipedia can be used to determine semantic relatedness, and then introduce a new high-performance method of wikification that exploits Wikipedia's 60 M internal hyperlinks for relational information and their anchor texts as lexical information, using simple machine learning. I go on to discuss applications to knowledge-based information retrieval, topic indexing, document tagging and document clustering. Some of these perform at human levels. For example, on CiteULike data, automatically extracted tags are competitive with tag sets assigned by the best human taggers, according to a measure of consistency with other human taggers. Although this work is based on English, it involves no syntactic parsing and the techniques are largely language independent. The talk will include live demos.
Wojcik, Isaac The Industrialization of Education: Creating an Open Virtual Mega-University for the Developing World (OVMUDW). World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [915]
Wojtanowski, Scott Using Wikis to Build Collaborative Knowing Society for Information Technology \& Teacher Education International Conference 2009 [916]
Wong, C.; Vrijmoed, L. & Wong, E. Learning environment for digital natives: Web 2.0 meets globalization Hybrid Learning and Education. First International Conference, ICHL 2008, 13-15 Aug. 2008 Berlin, Germany 2008 [917]
Web 2.0 services and communities constitute the daily lives of digital natives with online utilities such as Wikipedia and Facebook. Attempts to apply Web 2.0 at the University of Illinois at Urbana-Champaign demonstrated that the transformation to writing exercises could improve students' learning experiences. Inspired by their success, blogging technology was adopted to pilot a writing-across-the-curriculum project via the learning management system at City University of Hong Kong. Instead of promoting peer assessment, one-on-one tutoring interactions were induced by providing feedback to written assignments. Taking advantage of the "flat world", tutors were hired from the United States, Canada, Australia, New Zealand and Spain to experiment with outsourcing and offshoring some of the English enhancement schemes. For the university-wide project deployment in the fall of 2008, a globalized network of online language tutors needs to be built up with support from universities in countries with English as the native language.
Wong, Wai-Yat & Wong, Loong Using Wikiweb for Community Information Sharing and e-Governance Society for Information Technology \& Teacher Education International Conference 2005 [918]
Woodman, William & Krier, Dan An Unblinking Eye: Steps for Replacing Traditional With Visual Scholarship World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [919]
Wu, Kewen; Zhu, Qinghua; Zhao, Yuxiang & Zheng, Hua Mining the Factors Affecting the Quality of Wikipedia Articles 2010 International Conference of Information Science and Management Engineering. ISME 2010, 7-8 Aug. 2010 Los Alamitos, CA, USA 2010 [920]
In order to observe the variation of factors affecting the quality of Wikipedia articles during the information quality improvement process, we proposed 28 metrics from four aspects, covering lingual, structural, historical and reputational features, and then weighted each metric in different stages by using a neural network. We found that lingual features weighted more heavily in the lower quality stages, while structural features, along with historical features, became more important as article quality improved. However, reputational features did not turn out to be as important as expected. The findings indicate that information quality is mainly affected by completeness, and that being well-written is a basic requirement in the initial stage. Reputation of authors or editors is not so important in Wikipedia because of its horizontal structure.
Wu, Youzheng & Kashioka, Hideki An Unsupervised Model of Exploiting the Web to Answer Definitional Questions Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2009 [921]
In order to build accurate target profiles, most definition question answering (QA) systems primarily involve utilizing various external resources, such as WordNet, Wikipedia, Biography.com, etc. However, these external resources are not always available or helpful when answering definition questions. In contrast, this paper proposes an unsupervised classification model, called the U-Model, which can liberate definitional QA systems from heavy dependence on a variety of external resources by applying sentence expansion (SE) and an SVM classifier. Experimental results from testing on English TREC test sets reveal that the proposed U-Model not only significantly outperforms the baseline system but also requires no specific external resources.
Wubben, Sander & van den Bosch, Antal A semantic relatedness metric based on free link structure Proceedings of the Eighth International Conference on Computational Semantics 2009 [922]
While shortest paths in WordNet are known to correlate well with semantic similarity, an is-a hierarchy is less suited for estimating semantic relatedness. We demonstrate this by comparing two scale-free networks (ConceptNet and Wikipedia) to WordNet. Using the Finkelstein-353 dataset, we show that a shortest path metric run on Wikipedia attains a better correlation than WordNet-based metrics. ConceptNet attains a good correlation as well, but suffers from low concept coverage.
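A shortest-path relatedness metric of the kind described here can be sketched in a few lines: breadth-first search over an undirected link graph, with path length mapped to a score. The toy graph and the 1/(1+d) transform below are assumptions for illustration only, not the paper's exact metric or data.

```python
# Sketch of a shortest-path relatedness metric on a free link structure.
# Toy graph, not real Wikipedia/ConceptNet data; 1/(1+d) is just one
# common way to turn a path length into a relatedness score.

from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search over an undirected adjacency dict."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nb in graph.get(node, ()):
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return None  # disconnected

def relatedness(graph, a, b):
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

toy_links = {
    "tiger": {"cat", "zoo"},
    "cat": {"tiger", "pet"},
    "pet": {"cat", "dog"},
    "zoo": {"tiger"},
}
print(relatedness(toy_links, "tiger", "dog"))  # 1/(1+3) = 0.25
```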
Yamane, Y.; Ishida, H.; Hattori, F. & Yasuda, K. Conversation support system for people with language disorders - Making topic lists from Wikipedia 2010 9th IEEE International Conference on Cognitive Informatics (ICCI), 7-9 July 2010 Piscataway, NJ, USA 2010 [923]
A conversation support system for people with language disorders is proposed. Although the existing conversation support system "Raku-raku Jiyu Kaiwa" (Easy Free Conversation) is effective, it has insufficient topic words and a rigid topic list structure. To solve these problems, this paper proposes a method that makes topic lists from Wikipedia's millions of topic words. Experiments using the proposed topic list showed that subject utterances increased and the variety of spoken topics was expanded.
Yan, Y.; Li, Haibo; Matsuo, Y. & Ishizuka, M. Multi-view Bootstrapping for Relation Extraction by Exploring Web Features and Linguistic Features Computational Linguistics and Intelligent Text Processing 11th International Conference, CICling 2010, 21-27 March 2010 Berlin, Germany 2010
Binary semantic relation extraction from Wikipedia is particularly useful for various NLP and Web applications. Currently, frequent pattern mining-based methods and syntactic analysis-based methods are the two leading types of methods for the semantic relation extraction task. With a novel view on integrating syntactic analysis on Wikipedia text with redundancy information from the Web, we propose a multi-view learning approach for bootstrapping relationships between entities, exploiting the complementarity between the Web view and the linguistic view. On the one hand, from the linguistic view, linguistic features are generated from linguistic parsing on Wikipedia texts by abstracting away from different surface realizations of semantic relations. On the other hand, Web features are extracted from the Web corpus to provide frequency information for relation extraction. Experimental evaluation on a relational dataset demonstrates that linguistic analysis on Wikipedia texts and Web collective information reveal different aspects of the nature of entity-related semantic relationships. It also shows that our multi-view learning method considerably boosts the performance compared to learning with only one view of features, with the weaknesses of one view complemented by the strengths of the other.
Yang, Junghoon; Han, Jangwhan; Oh, Inseok & Kwak, Mingyung Using Wikipedia technology for topic maps design Proceedings of the 45th annual southeast regional conference 2007 [924]
In this paper we present a method for automatically generating a collection of topics from Wikipedia/Wikibooks based on user input. The constructed collection is intended to be displayed through an intuitive interface as assistance to the user creating Topic Maps for a given subject. We discuss the motivation behind the developed tool and outline the technique used for crawling and collecting relevant concepts from Wikipedia/Wikibooks and for building the topic structure to be output to the user.
Yang, Qingxiong; Chen, Xin & Wang, Gang Web 2.0 dictionary Proceedings of the 2008 international conference on Content-based image and video retrieval 2008 [925]
How might we benefit from the billions of tagged multimedia files (e.g. image, video, audio) available on the Internet? This paper presents a new concept called the Web 2.0 Dictionary, a dynamic dictionary that takes advantage of, and is in fact built from, the huge database of tags available on the Web. The Web 2.0 Dictionary distinguishes itself from the traditional dictionary in six main ways: (1) it is fully automatic, because it downloads tags from the Web and inserts this new information into the dictionary; (2) it is dynamic, because each time a new shared image/video is uploaded, a "bag-of-tags" corresponding to the image/video will be downloaded, thus updating the Web 2.0 Dictionary. The Web 2.0 Dictionary is literally updating every second, which is not true of the traditional dictionary; (3) it integrates all kinds of languages (e.g. English, Chinese) as long as the images/videos are tagged with words from such languages; (4) it is built by distilling a small amount of useful information from a massive and noisy tag database maintained by the entire Internet community, therefore the relatively small amount of noise present in the database will not affect it; (5) it truly reflects the most prevalent and relevant explanations in the world, unaffected by majoritarian views and political leanings. It is a real free dictionary. Unlike "Wikipedia" [5], which can be easily revised by even a single person, the Web 2.0 Dictionary is very stable because its contents are informed by a whole community of users that upload photos/videos; (6) it provides a correlation value between every two words, ranging from 0 to 1. The correlation values stored in the dictionary have wide applications. We demonstrate the effectiveness of the Web 2.0 Dictionary for image/video understanding and retrieval, object categorization, tagging recommendation, etc. in this paper.
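Point (6), a correlation value in [0, 1] between any two words, can be illustrated with a simple co-tagging overlap such as Jaccard similarity over the images carrying each tag. The sketch below uses invented tag data; the paper's actual correlation definition may well differ, so this is only a stand-in for the general idea.

```python
# Toy illustration of a [0, 1] correlation between two tags, computed as
# Jaccard overlap of the images they annotate. The paper's actual
# correlation definition may differ; all data is invented.

def tag_correlation(tag_a, tag_b, image_tags):
    """Jaccard overlap between the sets of images carrying each tag."""
    with_a = {img for img, tags in image_tags.items() if tag_a in tags}
    with_b = {img for img, tags in image_tags.items() if tag_b in tags}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0

# hypothetical "bag-of-tags" per uploaded image
image_tags = {
    "img1": {"beach", "sea", "sunset"},
    "img2": {"beach", "sea"},
    "img3": {"sea", "boat"},
    "img4": {"city", "night"},
}
print(tag_correlation("beach", "sea", image_tags))    # 2/3 ~ 0.67
print(tag_correlation("beach", "night", image_tags))  # 0.0
```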
Yang, Yin; Bansal, Nilesh; Dakka, Wisam; Ipeirotis, Panagiotis; Koudas, Nick & Papadias, Dimitris Query by document Proceedings of the Second ACM International Conference on Web Search and Data Mining 2009 [926]
We are experiencing an unprecedented increase of content contributed by users in forums such as blogs, social networking sites and microblogging services. Such abundance of content complements content on web sites and traditional media forums such as newspapers, news and financial streams, and so on. Given such a plethora of information, there is a pressing need to cross-reference information across textual services. For example, commonly we read a news item and wonder if there are any blogs reporting related content, or vice versa. In this paper, we present techniques to automate the process of cross-referencing online information content. We introduce methodologies to extract phrases from a given "query document" to be used as queries to search interfaces, with the goal of retrieving content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia, and introduce an algorithm called RelevanceRank for this purpose. We discuss both these techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.
Yao, Jian-Min; Sun, Chang-Long; Hong, Yu; Ge, Yun-Dong & Zhu, Qiao-Min Study on Wikipedia for translation mining for CLIR 2010 International Conference on Machine Learning and Cybernetics (ICMLC 2010), 11-14 July 2010 Piscataway, NJ, USA 2010 [927]
The query translation of Out of Vocabulary (OOV) terms is one of the key factors that affect the performance of Cross-Language Information Retrieval (CLIR). Based on Wikipedia's data structure and language features, the paper divides the translation environment into target-existence and target-deficit environments. To overcome the difficulty of translation mining in the target-deficit environment, frequency change information and adjacency information are used to extract candidate units and to establish a strategy of mixed translation mining based on the frequency-distance model, the surface pattern matching model and the summary-score model. Search engine based OOV translation mining is taken as the baseline to test the performance on TOP1 results. It is verified that the mixed translation mining method based on Wikipedia can achieve a precision rate of 0.6279, an improvement of 6.98% over the baseline.
Yatskar, Mark; Pang, Bo; Danescu-Niculescu-Mizil, Cristian & Lee, Lillian For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2010 [928]
We report on work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently-created, manually prepared list.
Yeh, Eric; Ramage, Daniel; Manning, Christopher D.; Agirre, Eneko & Soroa, Aitor WikiWalk: random walks on Wikipedia for semantic relatedness Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing 2009 [929]
Computing semantic relatedness of natural language texts is a key component of tasks such as information retrieval and summarization, and often depends on knowledge of a broad range of real-world concepts and relationships. We address this knowledge integration issue by computing semantic relatedness using personalized PageRank (random walks) on a graph derived from Wikipedia. This paper evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes: one based on a dictionary lookup, the other based on Explicit Semantic Analysis. We evaluate our techniques on standard word relatedness and text similarity datasets, finding that they capture similarity information complementary to existing Wikipedia-based relatedness measures, resulting in small improvements on a state-of-the-art measure.
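The random-walk core assumed by this kind of approach is standard personalized PageRank: power iteration where the restart mass is concentrated on the concepts found in the input text. The sketch below runs it on a tiny invented article-link graph; the paper's full pipeline (link selection, ESA-based input distributions, relatedness comparison of two walk distributions) is not reproduced.

```python
# Minimal personalized PageRank by power iteration on a toy directed graph.
# Only the random-walk core is shown; graph and personalization are invented.

def personalized_pagerank(graph, personalization, damping=0.85, iters=50):
    nodes = list(graph)
    rank = {n: personalization.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        # restart component, concentrated on the input text's concepts
        new = {n: (1 - damping) * personalization.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                continue
            share = damping * rank[n] / len(out)
            for m in out:
                new[m] += share
        rank = new
    return rank

toy_graph = {  # hypothetical article-to-article links
    "Jaguar": ["Cat", "Car"],
    "Cat": ["Animal"],
    "Car": ["Vehicle"],
    "Animal": ["Cat"],
    "Vehicle": ["Car"],
}
print(personalized_pagerank(toy_graph, {"Jaguar": 1.0}))
```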
Yesilada, Yeliz & Sloan, David Proceedings of the 2008 international cross-disciplinary conference on Web accessibility (W4A) 2008 [930]
The World Wide Web (Web) is returning to its origins. Surfers are not just passive readers but content creators. Wikis allow open editing and access, blogs enable personal expression. MySpace, Bebo and Facebook encourage social networking by enabling designs to be 'created' and 'wrapped' around content. Flickr and YouTube are examples of sites that allow sharing of photos, audio and video, which through informal taxonomies can be discovered and shared in the most efficient ways possible. Template-based tools enable fast, professional-looking Web content creation using automated placement, with templates for blogging, picture sharing, and social networking. The Web is becoming ever more democratised as a publishing medium, regardless of technical ability. But with this change come new challenges for accessibility. New tools, new types of content creator: where does accessibility fit in this process? The call for participation in W4A 2008 asked you to consider whether the conjugation of authoring tools and user agents represents an opportunity for automatically generated Web Accessibility or yet another problem for Web Accessibility. Will form-based and highly graphical interfaces exclude disabled people from creation, expression and social networking? And what about educating users, and customers, in accessible design? How, for example, do we collectively demand that the producers of the next MySpace or Second Life adhere to the W3C Authoring Tool Accessibility Guidelines (ATAG)? What effect will this have on the wider Web? We posed the question: What happens when surfers become authors and designers? We have collected together an excitingly diverse range of papers for W4A 2008, each contributing in their own way to helping provide an answer to this question. Papers cover topics as diverse as evaluating the accessibility of Wikipedia, one of the most popular user-generated resources on the Web, and considering the accessibility challenges of geo-referenced information often found in user-generated content. We hear about the challenges of raising awareness of accessibility, through experiences of accessibility education in Brazil, the particular challenges of encouraging accessible design to embrace the needs of older Web users, and the challenges of providing appropriate guidance to policymakers and technology developers alike that gives them freedom to provide innovative and holistic accessible Web solutions while building on the technical framework provided by W3C WAI. We also see a continuing focus on Web 2.0; several papers focus directly on making Web 2.0 technologies as accessible as possible, or on adapting assistive technology to cope more effectively with the increasingly interactive behaviour of Web 2.0 Web sites.
Yildiz, Ismail; Kursun, Engin; Saltan, Fatih; Gok, Ali & Karaaslan, Hasan Using Wiki in a Collaborative Group Project: Experiences from a Distance Education Course Society for Information Technology \& Teacher Education International Conference 2009 [931]
Yildiz, Melda & Hao, Yungwei Power of Social Interaction Technologies in Youth Activism and Civic Engagement Society for Information Technology \& Teacher Education International Conference 2009 [932]
Yildiz, Melda; Mongillo, Gerri & Roux, Yvonne Literacy from A to Z: Power of New Media and Technologies in Teacher Education Society for Information Technology \& Teacher Education International Conference 2007 [933]
Yildiz, Melda N. & Geldymuradova, Gul Global Positioning System and Social Interaction Software Across Content Areas Society for Information Technology \& Teacher Education International Conference 2010 [934]
Yildiz, Melda N.; Geldymuradova, Gul & Komekova, Guncha Different Continents Similar Challenges: Integrating Social Media in Teacher Education World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 [935]
Yuen, Steve Chi-Yin; Liu, Leping & Maddux, Cleborne Publishing Papers in the International Journal of Technology in Teaching and Learning: Guidelines and Tips Society for Information Technology \& Teacher Education International Conference 2007 [936]
Yun, Jiali; Jing, Liping; Yu, Jian & Huang, Houkuan Semantics-based Representation Model for Multi-layer Text Classification Knowledge-Based and Intelligent Information and Engineering Systems. 14th International Conference, KES 2010, 8-10 Sept. 2010 Berlin, Germany 2010
Text categorization is one of the most common themes in the data mining and machine learning fields. Unlike structured data, unstructured text data is more complicated to analyze because it contains many kinds of information, e.g., syntactic and semantic. In this paper, we propose a semantics-based model to represent text data on two levels. One level is for syntactic information and the other is for semantic information. The syntactic level represents each document as a term vector, whose components record the tf-idf value of each term. The semantic level represents the document with Wikipedia concepts related to the terms in the syntactic level. The syntactic and semantic information are efficiently combined by our proposed multi-layer classification framework. Experimental results on a benchmark dataset (Reuters-21578) have shown that the proposed representation model plus the proposed classification framework improves the performance of text classification in comparison with flat text representation models (term VSM, concept VSM, term+concept VSM) plus existing classification methods.
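The two-level representation can be illustrated with a very small sketch: a term-level tf-idf vector alongside a concept-level vector obtained through a term-to-Wikipedia-concept mapping. The mapping, documents and combination below are all invented for illustration; the paper's own multi-layer framework and weighting are not reproduced.

```python
# Minimal sketch of a two-level text representation: term-level tf-idf plus
# a concept level via a (hypothetical) term-to-Wikipedia-concept mapping.

import math
from collections import Counter

docs = [
    "the jaguar runs fast",
    "the car engine runs",
    "jaguar is a big cat",
]
term_to_concept = {  # assumed mapping, normally derived from Wikipedia
    "jaguar": "Jaguar (animal)", "cat": "Felidae",
    "car": "Automobile", "engine": "Engine",
}

def tfidf_vector(doc, corpus):
    """Term-level (syntactic) layer: tf-idf weights per term."""
    tokens = doc.split()
    tf = Counter(tokens)
    n = len(corpus)
    vec = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d.split())
        vec[term] = (count / len(tokens)) * math.log((1 + n) / (1 + df))
    return vec

def concept_vector(doc):
    """Concept-level (semantic) layer: counts of mapped Wikipedia concepts."""
    return Counter(term_to_concept[t] for t in doc.split() if t in term_to_concept)

for d in docs:
    print(tfidf_vector(d, docs), dict(concept_vector(d)))
```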
Zaidi, Faraz; Sallaberry, Arnaud & Melancon, Guy Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 2009 [937]
The emergence of scale-free and small-world properties in real-world complex networks has stimulated lots of activity in the field of network analysis. An example of such a network comes from the field of Content Analysis (CA) and Text Mining, where the goal is to analyze the contents of a set of web pages. The network can be represented by the words appearing in the web pages as nodes, with edges representing a relation between two words if they appear in a document together. In this paper we present a CA system that helps users visually analyze these networks representing the textual contents of a set of web pages. Major contributions include a methodology to cluster complex networks based on duplication of nodes, and the identification of bridges, i.e. words that might be of user interest but have a low frequency in the document corpus. We have tested this system with a number of data sets and users have found it very useful for the exploration of data. One of the case studies, based on browsing a collection of web pages on Wikipedia, is presented in detail.
Zarro, M.A. & Allen, R.B. User-contributed descriptive metadata for libraries and cultural institutions Research and Advanced Technology for Digital Libraries. 14th European Conference, ECDL 2010, 6-10 Sept. 2010 Berlin, Germany 2010 [938]
The Library of Congress and other cultural institutions are collecting highly informative user-contributed metadata as comments and notes expressing historical and factual information not previously identified with a resource. In this observational study we find a number of valuable annotations added to sets of images posted by the Library of Congress on the Flickr Commons. We propose a classification scheme to manage contributions and mitigate information overload issues. Implications for information retrieval and search are discussed. Additionally, the limits of a "collection" are becoming blurred as connections are being built via hyperlinks to related resources outside of the library collection, such as Wikipedia and locally relevant websites. Ideas are suggested for future projects, including interface design and institutional use of user-contributed information.
Zavitsanos, E.; Tsatsaronis, G.; Varlamis, I. & Paliouras, G. Scalable Semantic Annotation of Text Using Lexical and Web Resources Artificial Intelligence: Theories, Models and Applications. 6th Hellenic Conference on AI (SETN 2010), 4-7 May 2010 Berlin, Germany 2010
In this paper we are dealing with the task of adding domain-specific semantic tags to a document, based solely on the domain ontology and generic lexical and Web resources. In this manner, we avoid the need for trained domain-specific lexical resources, which hinder the scalability of semantic annotation. More specifically, the proposed method maps the content of the document to concepts of the ontology, using the WordNet lexicon and Wikipedia. The method comprises a novel combination of measures of semantic relatedness and word sense disambiguation techniques to identify the most related ontology concepts for the document. We test the method on two case studies: (a) a set of summaries accompanying environmental news videos, (b) a set of medical abstracts. The results in both cases show that the proposed method achieves reasonable performance, thus pointing to a promising path for scalable semantic annotation of documents.
Zhang, Liming & Li, Dong Web-Based Home School Collaboration System Design and Development World Conference on Educational Multimedia, Hypermedia and Telecommunications 2008 [939]
Zhang, Lei; Liu, QiaoLing; Zhang, Jie; Wang, HaoFen; Pan, Yue & Yu, Yong Semplore: an IR approach to scalable hybrid query of semantic web data Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference 2007 [940]
As an extension to the current Web, the Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
Zhang, Weiwei & Zhu, Xiaodong Activity Theoretical Framework for Wiki-based Collaborative Content Creation 2010 International Conference on Management and Service Science (MASS 2010), 24-26 Aug. 2010 Piscataway, NJ, USA 2010 [941]
Recently, the use of the collaboration element within information behavior research, namely Collaborative Information Behavior (CIB), has been increasing. In addition, the success of wiki-based, large-scale, open collaborative content creation systems such as Wikipedia has aroused increasing interest in studies of their collaborative model. In contrast to previous related work, this paper focuses on an integrated theoretical framework of collaborative content creation activities in the context of wiki-based systems. An activity-theoretical approach is used to construct an activity system of wiki-based collaborative content creation and to analyze its components, mediators, subsystems and dynamic processes. It is argued that collaborative content creation is the most important component of wiki-based CIB. Four stages involved in the dynamic process of collaborative content creation activity are learning, editing, feedback and collaboration, as well as conflicts and coordination. The result of the study is an integrated theoretical framework of collaborative content creation activities which combines almost all elements, such as motive, goal, subject, object, community, tools, rules, roles and collaboration, conflicts, outcome, etc., into one model. It is argued that an activity-theoretical approach to collaborative content creation systems and information behavior research would provide a sound basis for the elaboration of complex collaboration and self-organization mechanisms.
Zhang, Ziqi & Iria, José A novel approach to automatic gazetteer generation using Wikipedia Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources 2009 [942]
Gazetteers or entity dictionaries are important knowledge resources for solving a wide range of NLP problems, such as entity extraction. We introduce a novel method to automatically generate gazetteers from seed lists using an external knowledge resource, Wikipedia. Unlike previous methods, our method exploits the rich content and various structural elements of Wikipedia, and does not rely on language- or domain-specific knowledge. Furthermore, applying the extended gazetteers to an entity extraction task in a scientific domain, we empirically observed a significant improvement in system accuracy when compared with those using seed gazetteers.
Zhou, Baoyao; Luo, Ping; Xiong, Yuhong & Liu, Wei Wikipedia-Graph Based Key Concept Extraction towards News Analysis Proceedings of the 2009 IEEE Conference on Commerce and Enterprise Computing 2009 [943]
The well-known Wikipedia can serve as a comprehensive knowledge repository to facilitate textual content analysis, due to its abundance, high quality and well-structured content. In this paper, we propose WikiRank, a Wikipedia-graph-based ranking model which can be used to extract key Wikipedia concepts from a document. These key concepts can be regarded as the most salient terms to represent the theme of the document. Different from other existing graph-based ranking algorithms, the concept graph used for ranking in this model is constructed by leveraging not only the co-occurrence relations within the local context of a document but also the preprocessed hyperlink structure of Wikipedia. We have applied the proposed WikiRank model with the Support Propagation ranking algorithm to analyze news articles, especially enterprise news. Promising applications include Wikipedia Concept Linking and Enterprise Concept Cloud Generation.
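As a hedged sketch of the general idea (not the paper's Support Propagation algorithm), the following Python fragment ranks concepts with a PageRank-style power iteration over a small concept graph whose edges would, in the paper's setting, mix in-document co-occurrence with Wikipedia hyperlinks:

 # edges: dict mapping every concept to the set of concepts it links to (possibly empty)
 def rank_concepts(edges, damping=0.85, iterations=50):
     nodes = set(edges) | {v for targets in edges.values() for v in targets}
     score = {n: 1.0 / len(nodes) for n in nodes}
     for _ in range(iterations):
         new = {n: (1.0 - damping) / len(nodes) for n in nodes}
         for src, targets in edges.items():
             if targets:
                 share = damping * score[src] / len(targets)
                 for dst in targets:
                     new[dst] += share
             else:  # dangling concept: spread its mass uniformly
                 for n in nodes:
                     new[n] += damping * score[src] / len(nodes)
         score = new
     return sorted(score.items(), key=lambda kv: -kv[1])

 graph = {"Wikipedia": {"Encyclopedia", "Wiki"},
          "Wiki": {"Wikipedia"},
          "Encyclopedia": set()}
 print(rank_concepts(graph)[:2])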
Zhou, Yunqing; Guo, Zhongqi; Ren, Peng & Yu, Yong Applying Wikipedia-based Explicit Semantic Analysis For Query-biased Document Summarization Advanced Intelligent Computing Theories and Applications. 6th International Conference on Intelligent Computing, ICIC 2010, 18-21 Aug. 2010 Berlin, Germany 2010 [944]
A query-biased summary is a query-centered brief representation of a document. In many scenarios, query-biased summarization can be accomplished by implementing query-customized ranking of the sentences within a web page. However, generating this summary is difficult, since it is hard to assess the similarity between the query and the sentences of a particular document given the lack of information and background knowledge behind these short texts. We focus on this problem and improve summary generation effectiveness by incorporating semantic information into the machine learning process. We find that these improvements are more significant when query term occurrences are relatively low in the document.
Zhu, Shanyuan Games, simulations and virtual environment in education Society for Information Technology & Teacher Education International Conference 2010 [945]
Zhu, Shiai; Wang, Gang; Ngo, Chong-Wah & Jiang, Yu-Gang On the sampling of web images for learning visual concept classifiers Proceedings of the ACM International Conference on Image and Video Retrieval 2010 [946]
Visual concept learning often requires a large set of training images. In practice, nevertheless, acquiring noise-free training labels with sufficient positive examples is always expensive. A plausible solution for training data collection is sampling the largely available user-tagged images from social media websites. With the general belief that the probability of correct tagging is higher than that of incorrect tagging, such a solution often sounds feasible, though it is not without challenges. First, user tags can be subjective and, to a certain extent, ambiguous. For instance, an image tagged with "whales" may simply be a picture of an ocean museum. Learning the concept "whales" with such training samples will not be effective. Second, user tags can be overly abbreviated. For instance, an image about the concept "wedding" may be tagged with "love" or simply the couple's names. As a result, crawling sufficient positive training examples is difficult. This paper empirically studies the impact of exploiting tagged images for concept learning, investigating how the quality of pseudo training images affects concept detection performance. In addition, we propose a simple approach named semantic field for predicting the relevance between a target concept and the tag list associated with an image. Specifically, the relevance is determined through concept-tag co-occurrence by exploring external sources such as WordNet and Wikipedia. The proposed approach is shown to be effective in selecting pseudo training examples, exhibiting better performance in concept learning than other approaches such as those based on keyword sampling and tag voting.
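The abstract does not give the exact co-occurrence measure, so the following sketch assumes a PMI-style score for illustration: the relevance of a target concept to an image's tag list is the average concept-tag association, with counts that would in practice be mined from external sources such as Wikipedia.

 import math

 def pmi(concept, tag, pair_count, term_count, total):
     joint = pair_count.get((concept, tag), 0)
     if joint == 0:
         return 0.0
     p_xy = joint / total
     p_x = term_count[concept] / total
     p_y = term_count[tag] / total
     return math.log(p_xy / (p_x * p_y))

 def semantic_field_score(concept, tags, pair_count, term_count, total):
     # Average association between the concept and each tag of the image.
     if not tags:
         return 0.0
     return sum(pmi(concept, t, pair_count, term_count, total) for t in tags) / len(tags)

 # Toy counts standing in for statistics mined from an external corpus.
 term_count = {"whales": 50, "ocean": 400, "museum": 300}
 pair_count = {("whales", "ocean"): 30, ("whales", "museum"): 2}
 print(semantic_field_score("whales", ["ocean", "museum"], pair_count, term_count, total=10000))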
Zinskie, Cordelia & Repman, Judi Teaching Qualitative Research Online: Strategies, Issues, and Resources World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2007 [947]
Lee, M.J.W. & McLoughlin, C. Harnessing the affordances of Web 2.0 and social software tools: Can we finally make "student-centered" learning a reality? EDMEDIA 2008 [948]
This paper highlights the importance of considering the educational affordances of information and communication technologies (ICTs), in particular the raft of new and emerging Web 2.0 and social software tools that offer rich opportunities for collaboration, interactivity, and socio-experiential learning. The authors argue that perceived affordances, which are a function of individual users' or learners' perceptions and views, are of central significance, and encourage educators to empower learners with freedom and autonomy to select and personalize the tools and technology available to them, as well as allowing them to determine how best to use the technology to support their learning. While "student-centered" learning has become somewhat of a mantra for educators in recent decades, the adoption of social software tools driven by appropriate pedagogies may offer an opportunity for this goal to be truly realized.


Jankowski, Jacek & Kruk, Sebastian Ryszard 2Lip: The step towards the web3D 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [949]
The World Wide Web allows users to create and publish a variety of resources, including multimedia ones. Most of the contemporary best practices for designing web interfaces, however, do not take 3D techniques into account. In this paper we present a novel approach for designing interactive web applications: the 2-Layer Interface Paradigm (2LIP). The background layer of a 2LIP-type user interface is a 3D scene, which a user cannot directly interact with. The foreground layer is HTML content. Only taking an action on this content (e.g. pressing a hyperlink, scrolling a page) can affect the 3D scene. We introduce a reference implementation of 2LIP: Copernicus, the Virtual 3D Encyclopedia, which shows one of the potential paths of the evolution of Wikipedia towards Web 3.0. Based on the evaluation of Copernicus we prove that designing web interfaces according to 2LIP provides users with a better browsing experience, without harming the interaction.
Tjong Kim Sang, Erik A baseline approach for detecting sentences containing uncertainty Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [950]
Erdmann, Maike; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro A bilingual dictionary extracted from the Wikipedia link structure 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India 2008 [951]
A lot of bilingual dictionaries have been released on the WWW. However, these dictionaries insufficiently cover new and domain-specific terminology. In our demonstration, we present a dictionary constructed by analyzing the link structure of Wikipedia, a huge-scale encyclopedia containing a large number of links between articles in different languages. We analyzed not only these interlanguage links but extracted even more translation candidates from redirect page and link text information. In an experiment, we already proved the advantages of our dictionary compared to manually created dictionaries as well as to extracting bilingual terminology from parallel corpora.
Tang, Buzhou; Wang, Xiaolong; Wang, Xuan; Yuan, Bo & Fan, Shixi A cascade method for detecting hedges and their scope in natural language text Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [952]
Detecting hedges and their scope in natural language text is very important for information inference. In this paper, we present a system based on a cascade method for the CoNLL-2010 shared task. The system consists of two components: one for detecting hedges and another for detecting their scope. For detecting hedges, we build a cascade subsystem. First, a conditional random field (CRF) model and a large margin-based model are trained respectively. Then, we train another CRF model using the result of the first phase. For detecting the scope of hedges, a CRF model is trained according to the result of the first subtask. The experiments show that our system achieves 86.36% F-measure on the biological corpus and 55.05% F-measure on the Wikipedia corpus for hedge detection, and 49.95% F-measure on the biological corpus for hedge scope detection. Among them, 86.36% is the best result on the biological corpus for hedge detection.
Plank, Barbara A comparison of structural correspondence learning and self-training for discriminative parse selection Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing 2009 [953]
This paper evaluates two semi-supervised techniques for the adaptation of a parse selection model to Wikipedia domains. The techniques examined are Structural Correspondence Learning (SCL) (Blitzer et al., 2006) and Self-training (Abney, 2007; McClosky et al., 2006). A preliminary evaluation favors the use of SCL over the simpler self-training techniques.
Gleave, E.; Welser, H.T.; Lento, T.M. & Smith, M.A. A conceptual and operational definition of 'social role' in online community 2009 42nd Hawaii International Conference on System Sciences. HICSS-42, 5-8 Jan. 2009 Piscataway, NJ, USA 2008 [954]
Both online and off, people frequently perform particular social roles. These roles organize behavior and give structure to positions in local networks. As more of social life becomes embedded in online systems, the concept of social role becomes increasingly valuable as a tool for simplifying patterns of action, recognizing distinct user types, and cultivating and managing communities. This paper standardizes the usage of the term 'social role' in online community as a combination of social psychological, social structural, and behavioral attributes. Beyond the conceptual definition, we describe measurement and analysis strategies for identifying social roles in online community. We demonstrate this process in two domains, Usenet and Wikipedia, identifying key social roles in each domain. We conclude with directions for future research, with a particular focus on the analysis of communities as role ecologies.
Adler, B. Thomas & de Alfaro, Luca A content-driven reputation system for the wikipedia Proceedings of the 16th international conference on World Wide Web 2007 [955]
We present a content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order. Thus, author reputation is computed solely on the basis of content evolution; user-to-user comments or ratings are not used. The author reputation we compute could be used to flag new contributions from low-reputation authors, or it could be used to allow only authors with high reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia could also provide an incentive for high-quality contributions. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, as judged by human observers, and of being later undone, as measured by our algorithms.
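A minimal sketch of the core idea, assuming a simple word-survival criterion (the authors' actual algorithms use edit distances and careful scaling): an author's reputation increases for introduced text that survives a later revision and decreases for text that is removed shortly afterwards.

 from collections import defaultdict

 reputation = defaultdict(float)

 def update_reputation(author, introduced_words, later_revision_words,
                       gain=0.1, penalty=0.2):
     # Words still present in the later revision count in the author's favor;
     # removed words count against them.
     surviving = [w for w in introduced_words if w in later_revision_words]
     removed = len(introduced_words) - len(surviving)
     reputation[author] += gain * len(surviving) - penalty * removed

 update_reputation("alice",
                   introduced_words=["reliable", "sources", "cited"],
                   later_revision_words={"reliable", "sources", "cited", "here"})
 update_reputation("bob",
                   introduced_words=["spam", "link"],
                   later_revision_words={"reliable", "sources"})
 print(dict(reputation))   # alice gains, bob loses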
Weerkamp, Wouter; Balog, Krisztian & de Rijke, Maarten A generative blog post retrieval model that uses query expansion based on external collections Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2 2009 [956]
User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's information need and documents in a specific user generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighing query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. Different instantiations of our model are discussed and make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user generated content is effective; besides, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.
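One plausible way to write such an externally expanded query model (a hedged sketch; the paper's exact factorization and independence assumptions may differ) is

 \[ P(t \mid \hat{\theta}_Q) \;=\; \lambda\, P(t \mid Q) \;+\; (1-\lambda) \sum_{e \in E} P(t \mid e)\, P(e \mid Q), \]

where E is the external collection (e.g., news or Wikipedia), P(e | Q) scores an expansion document against the query, and documents are then ranked by the likelihood of the expanded query model under their language models.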
Ye, Zheng; Huang, Xiangji & Lin, Hongfei A graph-based approach to mining multilingual word associations from Wikipedia 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, July 19, 2009 - July 23, 2009 Boston, MA, United states 2009 [957]
In this paper, we propose a graph-based approach to constructing a multilingual association dictionary from Wikipedia, in which we exploit two kinds of links in Wikipedia articles to associate multilingual words and concepts together in a graph. The mined association dictionary is applied in cross-language information retrieval (CLIR) to verify its quality. We evaluate our approach on four CLIR data sets and the experimental results show that it is possible to mine a good multilingual association dictionary from Wikipedia articles.
Georgescul, Maria A hedgehop over a max-margin framework using hedge cues Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [958]
In this paper, we describe the experimental settings we adopted in the context of the 2010 CoNLL shared task for detecting sentences containing uncertainty. The classification results reported are obtained using discriminative learning with features essentially incorporating lexical information. Hyper-parameters are tuned for each domain: using BioScope training data for the biomedical domain and Wikipedia training data for the Wikipedia test set. By allowing an efficient handling of combinations of large-scale input features, the discriminative approach we adopted showed highly competitive empirical results for hedge detection on the Wikipedia dataset: our system ranked first with an F-score of 60.17%.
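A minimal sketch of a max-margin hedge classifier over lexical features, in the spirit of the system described above (the actual feature set, per-domain tuning and the toy training data here are assumptions), using scikit-learn:

 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.svm import LinearSVC
 from sklearn.pipeline import make_pipeline

 train_sentences = [
     "The protein may be involved in cell signalling.",
     "It is thought that the gene regulates growth.",
     "The protein binds to the receptor.",
     "The gene encodes a kinase.",
 ]
 train_labels = ["uncertain", "uncertain", "certain", "certain"]

 # Linear SVM (max-margin) over unigram/bigram tf-idf lexical features.
 model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
 model.fit(train_sentences, train_labels)
 print(model.predict(["The results suggest that the drug might work."]))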
Kilicoglu, Halil & Bergler, Sabine A high-precision approach to detecting hedges and their scopes Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [959]
We extend our prior work on speculative sentence recognition and speculation scope detection in biomedical text to the CoNLL-2010 Shared Task on Hedge Detection. In our participation, we sought to assess the extensibility and portability of our prior work, which relies on linguistic categorization and weighting of hedging cues and on syntactic patterns in which these cues play a role. For Task 1B, we tuned our categorization and weighting scheme to recognize hedging in biological text. By accommodating a small number of vagueness quantifiers, we were able to extend our methodology to detecting vague sentences in Wikipedia articles. We exploited constituent parse trees in addition to syntactic dependency relations in resolving hedging scope. Our results are competitive with those of closed-domain trained systems and demonstrate that our high-precision oriented methodology is extensible and portable.
Halfaker, Aaron; Kittur, Aniket; Kraut, Robert & Riedl, John A jury of your peers: Quality, experience and ownership in Wikipedia 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [960]
Wikipedia is a highly successful example of what mass collaboration in an informal peer review system can accomplish. In this paper, we examine the role that the quality of the contributions, the experience of the contributors and the ownership of the content play in the decisions over which contributions become part of Wikipedia and which ones are rejected by the community. We introduce and justify a versatile metric for automatically measuring the quality of a contribution. We find little evidence that experience helps contributors avoid rejection. In fact, as they gain experience, contributors are even more likely to have their work rejected. We also find strong evidence of ownership behaviors in practice despite the fact that ownership of content is discouraged within Wikipedia.
Milne, David; Witten, Ian H. & Nichols, David M. A knowledge-based search engine powered by Wikipedia 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007 [961]
This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offer significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it, making query entry more efficient, improving the relevance of the documents returned, and narrowing the gap between expert and novice seekers.
Kane, Gerald; Majchrzak, Ann; Johnson, Jeremiah & Chenisern, Lily A Longitudinal Model of Perspective Making and Perspective Taking Within Fluid Online Collectives 2009 [962]
Although considerable research has investigated perspective making and perspective taking processes in existing communities of practice, little research has explored how these processes are manifest in fluid online collectives. Fluid collectives do not share the common emotional bonds, shared languages, mental models, or clearly defined boundaries that are common in communities of practice and that aid in the perspective development process. This paper conducts a retrospective case study of a revelatory online collective – the autism article on Wikipedia – to explore how the collective develops a perspective over time with a fluid group of diverse participants surrounding a highly contentious issue. We find that the collective develops a perspective over time through three archetypical challenges – chaotic perspective taking, perspective shaping, and perspective defending. Using this data, we develop a longitudinal model of perspective development. The theoretical implications are discussed and a set of propositions is developed for testing in more generalized settings.
Chen, Lin & Eugenio, Barbara Di A Lucene and maximum entropy model based hedge detection system Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [963]
This paper describes the approach to hedge detection we developed in order to participate in the shared task at CoNLL-2010. A supervised learning approach is employed in our implementation. Hedge cue annotations in the training data are used as the seed to build a reliable hedge cue set. A Maximum Entropy (MaxEnt) model is used as the learning technique to determine uncertainty. By making use of Apache Lucene, we are able to do fuzzy string matching to extract hedge cues and to incorporate part-of-speech (POS) tags in hedge cues. Not only can our system determine the certainty of a sentence, but it is also able to find all the contained hedges. Our system was ranked third on the Wikipedia dataset. In later experiments with different parameters, we further improved our results, with a 0.612 F-score on the Wikipedia dataset and a 0.802 F-score on the biological dataset.
Pang, Cheong-Iao & Biuk-Aghai, Robert P. A method for category similarity calculation in wikis 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [964]
Wikis, such as Wikipedia, allow their authors to assign categories to articles in order to better organize related content. This paper presents a method to calculate similarities between categories, illustrated by a calculation for the top-level categories in the Simple English version of Wikipedia.
Webster, David; Xu, Jie; Mundy, Darren & Warren, Paul A practical model for conceptual comparison using a wiki 2009 9th IEEE International Conference on Advanced Learning Technologies, ICALT 2009, July 15, 2009 - July 17, 2009 Riga, Latvia 2009 [965]
One of the key concerns in the conceptualisation of a single object is understanding the context under which that object exists (or can exist). This contextual understanding should provide us with clear conceptual identification of an object, including implicit situational information and detail of surrounding objects. For example, in learning terms, a learner should be aware of concepts related to the context of their field of study and a surrounding cloud of contextually related concepts. This paper explores the use of an evolving, community-maintained knowledge base (that of Wikipedia) in order to prioritise concepts that are semantically relevant to the user's interest space.
He, Jiyin & Rijke, Maarten De A ranking approach to target detection for automatic link generation 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [966]
We focus on the task of target detection in automatic link generation with Wikipedia, i.e., given an N-gram in a snippet of text, find the relevant Wikipedia concepts that explain or provide background knowledge for it. We formulate the task as a ranking problem and investigate the effectiveness of learning to rank approaches and of the features that we use to rank the target concepts for a given N-gram. Our experiments show that learning to rank approaches outperform traditional binary classification approaches. Also, our proposed features are effective both in binary classification and learning to rank settings.
Chu, Eric; Baid, Akanksha; Chen, Ting; Doan, AnHai & Naughton, Jeffrey A relational approach to incrementally extracting and querying structure in unstructured data Proceedings of the 33rd international conference on Very large data bases 2007 [967]
There is a growing consensus that it is desirable to query over the structure implicit in unstructured documents, and that ideally this capability should be provided incrementally. However, there is no consensus about what kind of system should be used to support this kind of incremental capability. We explore using a relational system as the basis for a workbench for extracting and querying structure from unstructured data. As a proof of concept, we applied our relational approach to support structured queries over Wikipedia. We show that the data set is always available for some form of querying, and that as it is processed, users can pose a richer set of structured queries. We also provide examples of how we can incrementally evolve our understanding of the data in the context of the relational workbench.
Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro A search engine for browsing the Wikipedia thesaurus 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India 2008 [968]
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In our previous work, we proposed link structure mining algorithms to extract a huge-scale and accurate association thesaurus from Wikipedia. The association thesaurus covers almost 1.3 million concepts and its accuracy is demonstrated in detailed experiments. To prove its practicality, we implemented three features on top of the association thesaurus: a search engine for browsing the Wikipedia Thesaurus, an XML Web service for the thesaurus and a Semantic Web support feature. We show these features in this demonstration.
Li, Decong; Li, Sujian; Li, Wenjie; Wang, Wei & Qu, Weiguang A semi-supervised key phrase extraction approach: learning from title phrases through a document semantic network Proceedings of the ACL 2010 Conference Short Papers 2010 [969]
It is a fundamental and important task to extract key phrases from documents. Generally, phrases in a document are not independent in delivering the content of the document. In order to capture and make better use of their relationships in key phrase extraction, we suggest exploring the Wikipedia knowledge to model a document as a semantic network, where both n-ary and binary relationships among phrases are formulated. Based on a commonly accepted assumption that the title of a document is always elaborated to reflect the content of a document and consequently key phrases tend to have close semantics to the title, we propose a novel semi-supervised key phrase extraction approach in this paper by computing the phrase importance in the semantic network, through which the influence of title phrases is propagated to the other phrases iteratively. Experimental results demonstrate the remarkable performance of this approach.
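As a hedged sketch of title-seeded propagation (the paper's semantic network uses Wikipedia-based relations, including n-ary ones; the personalized random-walk update and the toy weights below are assumptions for illustration):

 # phrases: all candidate phrases; edges: dict {(p, q): weight} with p, q in phrases.
 def propagate(phrases, edges, title_phrases, alpha=0.85, iterations=30):
     seed = {p: (1.0 / len(title_phrases) if p in title_phrases else 0.0)
             for p in phrases}
     score = dict(seed)
     out_weight = {p: sum(w for (a, b), w in edges.items() if a == p) for p in phrases}
     for _ in range(iterations):
         new = {p: (1 - alpha) * seed[p] for p in phrases}
         for (a, b), w in edges.items():
             if out_weight[a] > 0:
                 new[b] += alpha * score[a] * w / out_weight[a]
         score = new
     return sorted(score.items(), key=lambda kv: -kv[1])

 phrases = ["key phrase extraction", "semantic network", "document title", "weather"]
 edges = {("document title", "key phrase extraction"): 1.0,
          ("key phrase extraction", "document title"): 1.0,
          ("key phrase extraction", "semantic network"): 0.8,
          ("semantic network", "key phrase extraction"): 0.8}
 print(propagate(phrases, edges, title_phrases={"document title"}))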
Müller, Christof & Gurevych, Iryna A study on the semantic relatedness of query and document terms in information retrieval Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3 2009 [970]
The use of lexical semantic knowledge in information retrieval has been a field of active study for a long time. Collaborative knowledge bases like Wikipedia and Wiktionary, which have been applied in computational methods only recently, offer new possibilities to enhance information retrieval. In order to find the most beneficial way to employ these resources, we analyze the lexical semantic relations that hold among query and document terms and compare how these relations are represented by a measure for semantic relatedness. We explore the potential of different indicators of document relevance that are based on semantic relatedness and compare the characteristics and performance of the knowledge bases Wikipedia, Wiktionary and WordNet.
Yin, Xiaoshi; Huang, Jimmy Xiangji; Zhou, Xiaofeng & Li, Zhoujun A survival modeling approach to biomedical search result diversification using wikipedia Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010 [971]
In this paper, we propose a probabilistic survival model derived from the survival analysis theory for measuring aspect novelty. The retrieved documents' query-relevance and novelty are combined at the aspect level for re-ranking. Experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed approach in promoting ranking diversity for biomedical information retrieval.
Poole, Erika Shehan & Grudin, Jonathan A taxonomy of wiki genres in enterprise settings 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [972]
A growing body of work examines enterprise wikis. In this paper, we argue that "enterprise wiki" is a blanket term describing three different genres of wiki: single-contributor wikis, group or team wikis, and internal-use encyclopedias emulating Wikipedia. Based on the results of a study of wiki usage in a multinational software company, we provide a taxonomy of enterprise wiki genres. We discuss emerging challenges specific to company-wide encyclopedias, for which platforms such as Wikipedia provide surprisingly little guidance. These challenges include platform and content management decisions, and territoriality.
Yang, Xintian; Asur, Sitaram; Parthasarathy, Srinivasan & Mehta, Sameep A visual-analytic toolkit for dynamic interaction graphs Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008 [973]
In this article we describe a visual-analytic tool for the interrogation of evolving interaction network data such as those found in social, bibliometric, WWW and biological applications. The tool we have developed incorporates common visualization paradigms such as zooming, coarsening and filtering while naturally integrating information extracted by a previously described event-driven framework for characterizing the evolution of such networks. The visual front-end provides features that are specifically useful in the analysis of interaction networks, capturing the dynamic nature of both individual entities as well as interactions among them. The tool provides the user with the option of selecting multiple views, designed to capture different aspects of the evolving graph from the perspective of a node, a community or a subset of nodes of interest. Standard visual templates and cues are used to highlight critical changes that have occurred during the evolution of the network. A key challenge we address in this work is that of scalability - handling large graphs both in terms of the efficiency of the back-end, and in terms of the efficiency of the visual layout and rendering. Two case studies based on bibliometric and Wikipedia data are presented to demonstrate the utility of the toolkit for visual knowledge discovery.
Potthast, Martin; Stein, Benno & Anderka, Maik A Wikipedia-based multilingual retrieval model 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom 2008 [974]
This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document d written in language L, we construct a concept vector d for d, where each dimension i of d quantifies the similarity of d with respect to a document d*_i chosen from the L-subset of Wikipedia. Likewise, for a second document d' written in language L', L ≠ L', we construct a concept vector d' using, from the L'-subset of Wikipedia, the topic-aligned counterparts d'*_i of our previously chosen documents. Since the two concept vectors d and d' are collection-relative representations of d and d', they are language-independent, i.e., their similarity can be computed directly with the cosine similarity measure.
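In notation assumed for this summary (the paper's own symbols may differ), the CL-ESA representation and similarity can be written as

 \[ \mathbf{d} = \big(\varphi(d, d^{*}_{1}), \ldots, \varphi(d, d^{*}_{n})\big), \qquad \mathbf{d}' = \big(\varphi(d', d'^{*}_{1}), \ldots, \varphi(d', d'^{*}_{n})\big), \]
 \[ \operatorname{sim}_{\text{CL-ESA}}(d, d') \;=\; \cos(\mathbf{d}, \mathbf{d}') \;=\; \frac{\mathbf{d} \cdot \mathbf{d}'}{\lVert \mathbf{d} \rVert\, \lVert \mathbf{d}' \rVert}, \]

where φ is a monolingual similarity (e.g., tf-idf cosine) and d*_i, d'*_i are topic-aligned Wikipedia articles in L and L'.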
Anand, Sarabjot Singh; Bunescu, Razvan; Carvalho, Vitor; Chomicki, Jan; Conitzer, Vincent; Cox, Michael T.; Dignum, Virginia; Dodds, Zachary; Dredze, Mark; Furcy, David; Gabrilovich, Evgeniy; Goker, Mehmet H.; Guesgen, Hans; Hirsh, Haym; Jannach, Dietmar; Junker, Ulrich; Ketter, Wolfgang; Kobsa, Alfred; Koenig, Sven; Lau, Tessa; Lewis, Lundy; Matson, Eric; Metzler, Ted; Mihalcea, Rada; Mobasher, Bamshad; Pineau, Joelle; Poupart, Pascal; Raja, Anita; Ruml, Wheeler; Sadeh, Norman; Shani, Guy; Shapiro, Daniel; Smith, Trey; Taylor, Matthew E.; Wagstaff, Kiri; Walsh, William & Zhou, Rong AAAI 2008 workshop reports 445 Burgess Drive, Menlo Park, CA 94025-3496, United States 2009
AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13-14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems; Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy.
Huang, Zhiheng; Zeng, Guangping; Xu, Weiqun & Celikyilmaz, Asli Accurate semantic class classifier for coreference resolution Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3 2009 [975]
There have been considerable attempts to incorporate semantic knowledge into coreference resolution systems: different knowledge sources such as WordNet and Wikipedia have been used to boost performance. In this paper, we propose new ways to extract WordNet features. These features, along with other features such as named entity features, can be used to build an accurate semantic class (SC) classifier. In addition, we analyze the SC classification errors and propose to use relaxed SC agreement features. The proposed accurate SC classifier and the relaxation of SC agreement features on ACE2 coreference evaluation can boost our baseline system by 10.4% and 9.7% using MUC score and anaphor accuracy respectively.
Kelly, Colin; Devereux, Barry & Korhonen, Anna Acquiring human-like feature-based conceptual representations from corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics 2010 [976]
The automatic acquisition of feature-based conceptual representations from text corpora can be challenging, given the unconstrained nature of human-generated features. We examine large-scale extraction of concept-relation-feature triples and the utility of syntactic, semantic, and encyclopedic information in guiding this complex task. Methods traditionally employed do not investigate the full range of triples occurring in human-generated norms (e.g. flute produce sound), rather targeting concept-feature pairs (e.g. flute - sound) or triples involving specific relations (e.g. is-a, part-of). We introduce a novel method that extracts candidate triples (e.g. deer have antlers, flute produce sound) from parsed data and re-ranks them using semantic information. We apply this technique to Wikipedia and the British National Corpus and assess its accuracy in a variety of ways. Our work demonstrates the utility of external knowledge in guiding feature extraction, and suggests a number of avenues for future work.
Yang, Jeongwon & Shim, J.P. Adoption Factors of Online Knowledge Sharing Service in the Era of Web 2.0 2009 [977]
While the topic of online knowledge sharing services based on Web 2.0 has received considerable attention, virtually all the studies dealing with online knowledge sharing services have neglected or given cursory attention to users’ perception regarding the usage of those services and the corresponding level of interaction. This study focuses on users’ different attitudes and expectations toward a domestic online knowledge sharing service represented by Korea’s ‘Jisik IN’ (translation: Knowledge IN) of Naver and a foreign counterpart represented by Wikipedia, both of which are often presented as models of Web 2.0 applications. In Korea, the popularity gap between Jisik IN and Wikipedia hints at the necessity of grasping which factors are more important in allowing for greater user engagement and satisfaction with regard to online knowledge sharing services. This study presents and suggests an integrated model which is based on the constructs of WebQual, subjective norms, and cultural dimensions.
Advances in Information Retrieval. Proceedings 32nd European Conference on IR Research, ECIR 2010 Advances in Information Retrieval. 32nd European Conference on IR Research, ECIR 2010, 28-31 March 2010 Berlin, Germany 2010
The following topics are dealt with: natural language processing; multimedia information retrieval; language modeling; temporal information; recovering the broken Web; attitude identification; PICO element; Web search queries; correlation analysis; automatic system evaluation; spatial diversity; online prediction; image detection; gene sequence; ranking fusion methods; peer-to-peer networks; probabilistic Wikipedia-based semantic smoothing; collaborative filtering; contextual image retrieval; XML ranked retrieval; filtering documents; multilingual retrieval; machine translation; data analysis.
Hoffmann, Raphael; Amershi, Saleema; Patel, Kayur; Wu, Fei; Fogarty, James & Weld, Daniel S. Amplifying community content creation with mixed initiative information extraction Proceedings of the 27th international conference on Human factors in computing systems 2009 [978]
Although existing work has explored both information extraction and community content creation, most research has focused on them in isolation. In contrast, we see the greatest leverage in the synergistic pairing of these methods as two interlocking feedback cycles. This paper explores the potential synergy promised if these cycles can be made to accelerate each other by exploiting the same edits to advance both community content creation and learning-based information extraction. We examine our proposed synergy in the context of Wikipedia infoboxes and the Kylin information extraction system. After developing and refining a set of interfaces to present the verification of Kylin extractions as a non primary task in the context of Wikipedia articles, we develop an innovative use of Web search advertising services to study people engaged in some other primary task. We demonstrate our proposed synergy by analyzing our deployment from two complementary perspectives: (1) we show we accelerate community content creation by using Kylin's information extraction to significantly increase the likelihood that a person visiting a Wikipedia article as a part of some other primary task will spontaneously choose to help improve the article's infobox, and (2) we show we accelerate information extraction by using contributions collected from people interacting with our designs to significantly improve Kylin's extraction performance.
Ponzetto, Simone Paolo & Strube, Michael An API for measuring the relatedness of words in Wikipedia Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions 2007 [979]
We present an API for computing the semantic relatedness of words in Wikipedia.
Erdmann, Maike; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro An approach for extracting bilingual terminology from Wikipedia 13th International Conference on Database Systems for Advanced Applications, DASFAA 2008, March 19, 2008 - March 21, 2008 New Delhi, India 2008 [980]
With the demand for bilingual dictionaries covering domain-specific terminology, research in the field of automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries created from bilingual text corpora are often not sufficient for domain-specific terms. Therefore, we present an approach to extracting bilingual dictionaries from the link structure of Wikipedia, a huge-scale encyclopedia that contains a vast amount of links between articles in different languages. Our methods analyze not only these interlanguage links but extract even more translation candidates from redirect page and link text information. In an experiment, we proved the advantages of our methods compared to a traditional approach of extracting bilingual terminology from parallel corpora.
Gollapudi, Sreenivas & Sharma, Aneesh An axiomatic approach for result diversification Proceedings of the 18th international conference on World wide web 2009 [981]
Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the probability ranking principle-based approach of presenting the most relevant results on top can be sub-optimal, and hence the search engine would like to trade-off relevance for diversity in the results. In analogy to prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously. We illustrate the use of the axiomatic framework by providing three example diversification objectives that satisfy different subsets of the axioms. We also uncover a rich link to the facility dispersion problem that results in algorithms for a number of diversification objectives. Finally, we propose an evaluation methodology to characterize the objectives and the underlying axioms. We conduct a large scale evaluation of our objectives based on two data sets: a data set derived from the Wikipedia disambiguation pages and a product database.
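As a hedged illustration of the kind of objective involved (not one of the paper's specific axioms or objectives), a greedy procedure can trade off relevance against distance from the already-selected results:

 def diversify(candidates, relevance, distance, k, trade_off=0.5):
     # Greedily pick the item maximizing relevance plus distance to the selection so far.
     selected = []
     remaining = list(candidates)
     while remaining and len(selected) < k:
         def gain(doc):
             div = min((distance(doc, s) for s in selected), default=0.0)
             return trade_off * relevance[doc] + (1 - trade_off) * div
         best = max(remaining, key=gain)
         selected.append(best)
         remaining.remove(best)
     return selected

 # Toy disambiguation-style example: several results per word sense.
 sense = {"jaguar_car_1": "car", "jaguar_car_2": "car",
          "jaguar_cat": "animal", "jaguar_os": "software"}
 rel = {"jaguar_car_1": 0.9, "jaguar_car_2": 0.85, "jaguar_cat": 0.8, "jaguar_os": 0.5}
 def dist(a, b):
     return 0.1 if sense[a] == sense[b] else 1.0
 print(diversify(rel, rel, dist, k=3))   # one result per sense is picked first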
Popovici, Eugen; Marteau, Pierre-François & Ménier, Gildas An effective method for finding best entry points in semi-structured documents Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval 2007 [982]
Focused structured document retrieval employs the concept of best entry point (BEP), which is intended to provide an optimal starting point from which users can browse to relevant document components [4]. In this paper we describe and evaluate a method for finding BEPs in XML documents. Experiments conducted within the framework of the INEX 2006 evaluation campaign on the Wikipedia XML collection [2] show the effectiveness of the proposed approach.
Ben-Chaim, Yochai; Farchi, Eitan & Raz, Orna An effective method for keeping design artifacts up-to-date 2009 ICSE Workshop on Wikis for Software Engineering, Wikis4SE 2009, May 16, 2009 - May 24, 2009 Vancouver, BC, Canada 2009 [983]
A major problem in the software development process is that design documents are rarely kept up-to-date with the implementation, and thus become irrelevant for extracting test plans or reviews. Furthermore, design documents tend to become very long and often impossible to review and comprehend. This paper describes an experimental method conducted in a development group at IBM. The group uses a Wikipedia-like process to maintain design documents, while taking measures to keep them up-to-date and in use, and thus relevant. The method uses a wiki enhanced with hierarchical glossaries of terms to maintain design artifacts. Initial results indicate that these enhancements are successful and assist in the creation of more effective design documents. We maintained a large portion of the group's design documents in use and relevant over a period of three months. Additionally, by archiving artifacts that were not in use, we were able to validate that they were no longer relevant.
Milne, David & Witten, Ian H. An effective, low-cost measure of semantic relatedness obtained from wikipedia links 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
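A minimal sketch of a link-based relatedness measure in this spirit, assuming a Normalized Google Distance-style formula over inlink sets (constants and edge-case handling are assumptions, not the authors' exact definition):

 import math

 def link_relatedness(inlinks_a, inlinks_b, total_articles):
     # inlinks_*: sets of Wikipedia articles that link to each concept.
     a, b = set(inlinks_a), set(inlinks_b)
     common = a & b
     if not a or not b or not common:
         return 0.0
     distance = ((math.log(max(len(a), len(b))) - math.log(len(common))) /
                 (math.log(total_articles) - math.log(min(len(a), len(b)))))
     return max(0.0, 1.0 - distance)

 # Toy inlink sets standing in for "articles that link to X" in Wikipedia.
 cat = {"Felidae", "Pet", "Mammal", "Carnivore"}
 dog = {"Canidae", "Pet", "Mammal", "Carnivore"}
 print(link_relatedness(cat, dog, total_articles=3_000_000))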
Yu, Xiaofeng & Lam, Wai An integrated probabilistic and logic approach to encyclopedia relation extraction with multiple features Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1 2008 [984]
We propose a new integrated approach based on Markov logic networks (MLNs), an effective combination of probabilistic graphical models and first-order logic for statistical relational learning, to extracting relations between entities in encyclopedic articles from Wikipedia. The MLNs model entity relations in a unified undirected graph collectively using multiple features, including contextual, morphological, syntactic, semantic as well as Wikipedia characteristic features which can capture the essential characteristics of the relation extraction task. This model makes simultaneous statistical judgments about the relations for a set of related entities. More importantly, implicit relations can also be identified easily. Our experimental results showed that this integrated probabilistic and logic model significantly outperforms the current state-of-the-art probabilistic model, Conditional Random Fields (CRFs), for relation extraction from encyclopedic articles.
Nguyen, Chau Q. & Phan, Tuoi T. An ontology-based approach for key phrase extraction Proceedings of the ACL-IJCNLP 2009 Conference Short Papers 2009 [985]
Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specific characteristics of the Vietnamese language for the key phrase selection stage. We also explore NLP techniques that we propose for the analysis of Vietnamese texts, focusing on the advanced candidate phrase recognition phase as well as part-of-speech (POS) tagging. Finally, we review the results of several experiments that have examined the impact of strategies chosen for Vietnamese key phrase extraction.
Nothman, Joel; Murphy, Tara & Curran, James R. Analysing Wikipedia and gold-standard corpora for NER training Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics 2009 [986]
Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold standard corpora on cross-corpus evaluation by up to 11%.
Lizorkin, Dmitry; Medelyan, Olena & Grineva, Maria Analysis of community structure in Wikipedia Proceedings of the 18th international conference on World wide web 2009 [987]
We present the results of a community detection analysis of the Wikipedia graph. Distinct communities in Wikipedia contain semantically closely related articles. The central topic of a community can be identified using PageRank. Extracted communities can be organized hierarchically, similar to the manually created Wikipedia category structure.
Zhang, Xinpeng; Asano, Y. & Yoshikawa, M. Analysis of Implicit Relations on Wikipedia: Measuring Strength through Mining Elucidatory Objects Database Systems for Advanced Applications. 15th International Conference, DASFAA 2010, 1-4 April 2010 Berlin, Germany 2010 [988]
We focus on measuring relations between pairs of objects in Wikipedia whose pages can be regarded as individual objects. Two kinds of relations between two objects exist: in Wikipedia, an explicit relation is represented by a single link between the two pages for the objects, and an implicit relation is represented by a link structure containing the two pages. Previously proposed methods are inadequate for measuring implicit relations because they use only one or two of the following three important factors: distance, connectivity, and co-citation. We propose a new method reflecting all the three factors by using a generalized maximum flow. We confirm that our method can measure the strength of a relation more appropriately than these previously proposed methods do. Another remarkable aspect of our method is mining elucidatory objects, that is, objects constituting a relation. We explain that mining elucidatory objects opens a novel way to deeply understand a relation.
Muhr, Markus; Kern, Roman & Granitzer, Michael Analysis of structural relationships for hierarchical cluster labeling 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [989]
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties and therefore the impact of hierarchical structures on labeling accuracy is yet unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, in the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, χ2 Test, and Information Gain, to make use of those relationships and evaluate their impact on 4 different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed and the CLEF-IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
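As a hedged sketch of one of the adapted measures (details assumed; the paper's exact scoring is not reproduced), candidate labels can be scored by each term's cluster-side contribution to the Jensen-Shannon divergence between a cluster's term distribution and its parent's:

 import math

 def label_scores(cluster_dist, parent_dist):
     # Terms over-represented in the cluster relative to its parent score highest.
     terms = set(cluster_dist) | set(parent_dist)
     scores = {}
     for t in terms:
         p = cluster_dist.get(t, 1e-12)
         q = parent_dist.get(t, 1e-12)
         m = 0.5 * (p + q)
         scores[t] = p * math.log(p / m)   # cluster-side JS contribution
     return sorted(scores.items(), key=lambda kv: -kv[1])

 cluster = {"neural": 0.30, "network": 0.25, "learning": 0.25, "data": 0.20}
 parent  = {"data": 0.40, "learning": 0.20, "database": 0.25, "query": 0.15}
 print(label_scores(cluster, parent)[:3])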
Gupta, Rahul & Sarawagi, Sunita Answering table augmentation queries from unstructured lists on the web Proceedings of the VLDB Endowment 2009 [990]
We present the design of a system for assembling a table from a few example rows by harnessing the huge corpus of information-rich but unstructured lists on the web. We developed a totally unsupervised end-to-end approach which, given the sample query rows, (a) retrieves HTML lists relevant to the query from a pre-indexed crawl of web lists, (b) segments the list records and maps the segments to the query schema using a statistical model, (c) consolidates the results from multiple lists into a unified merged table, and (d) presents to the user the consolidated records ranked by their estimated membership in the target relation. The key challenges in this task include construction of new rows from very few examples, and an abundance of noisy and irrelevant lists that swamp the consolidation and ranking of rows. We propose modifications to statistical record segmentation models, and present novel consolidation and ranking techniques that can process input tables of arbitrary schema without requiring any human supervision. Experiments with Wikipedia target tables and 16 million unstructured lists show that even with just three sample rows, our system is very effective at recreating Wikipedia tables, with a mean runtime of around 20s.
Kriplean, Travis; Beschastnikh, Ivan & McDonald, David W. Articulations of wikiwork: uncovering valued work in wikipedia through barnstars Proceedings of the 2008 ACM conference on Computer supported cooperative work 2008 [991]
Successful online communities have complex cooperative arrangements, articulations of work, and integration practices. They require technical infrastructure to support a broad division of labor. Yet the research literature lacks empirical studies that detail which types of work are valued by participants in an online community. A content analysis of Wikipedia barnstars -- personalized tokens of appreciation given to participants -- reveals a wide range of valued work extending far beyond simple editing to include social support, administrative actions, and types of articulation work. Our analysis develops a theoretical lens for understanding how wiki software supports the creation of articulations of work. We give implications of our results for communities engaged in large-scale collaborations.
Wohner, Thomas & Peters, Ralf Assessing the quality of Wikipedia articles with lifecycle based metrics 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [992]
The main feature of the free online encyclopedia Wikipedia is the wiki tool, which allows viewers to edit articles directly in the web browser. A weakness of this openness is that, for example, manipulation and vandalism cannot be ruled out, so the quality of any given Wikipedia article is not guaranteed. Hence, automatic quality assessment has become a highly active research field. In this paper we offer new metrics for efficient quality measurement. The metrics are based on the lifecycles of low- and high-quality articles, which refer to the changes of the persistent and transient contributions throughout the entire life span.
Adler, B. Thomas; Chatterjee, Krishnendu; Alfaro, Luca De; Faella, Marco; Pye, Ian & Raman, Vishwanath Assigning trust to Wikipedia content 4th International Symposium on Wikis, WikiSym 2008, September 8, 2008 - September 10, 2008 Porto, Portugal 2008 [993]
The Wikipedia is a collaborative encyclopedia: anyone can contribute to its articles simply by clicking on an "edit" button. The open nature of the Wikipedia has been key to its success, but has also created a challenge: how can readers develop an informed opinion on its reliability? We propose a system that computes quantitative values of trust for the text in Wikipedia articles; these trust values provide an indication of text reliability. The system uses as input the revision history of each article, as well as information about the reputation of the contributing authors as provided by a reputation system. The trust of a word in an article is computed on the basis of the reputation of the original author of the word, as well as the reputation of all authors who edited text near the word. The algorithm computes word trust values that vary smoothly across the text; the trust values can be visualized using varying text-background colors. The algorithm ensures that all changes to an article's text are reflected in the trust values.
Ito, Masahiro; Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Association thesaurus construction methods based on link co-occurrence analysis for wikipedia 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states 2008 [994]
Wikipedia, a huge-scale Web-based encyclopedia, attracts great attention as an invaluable corpus for knowledge extraction because it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief anchor texts and URL identification for concepts. We have already proved that we can use Wikipedia to construct a huge-scale, accurate association thesaurus. The association thesaurus we constructed covers almost 1.3 million concepts and its accuracy is proved in detailed experiments. However, we still need scalable methods to analyze the huge number of Web pages and hyperlinks among articles in the Web-based encyclopedia. In this paper, we propose a scalable method for constructing an association thesaurus from Wikipedia based on link co-occurrences. Link co-occurrence analysis is more scalable than link structure analysis because it is a one-pass process. We also propose an integration method of tf-idf and link co-occurrence analysis. Experimental results show that both our proposed methods are more accurate and scalable than conventional methods. Furthermore, the integration of tf-idf achieved higher accuracy than using only link co-occurrences.
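A minimal sketch of one-pass link co-occurrence counting, assuming a Dice-coefficient association score for illustration (the paper's weighting and its tf-idf integration are not reproduced):

 from collections import Counter
 from itertools import combinations

 def build_association_thesaurus(articles):
     # articles: dict mapping article title -> list of link targets it contains
     cooccur = Counter()
     link_freq = Counter()
     for links in articles.values():           # single pass over the dump
         unique = sorted(set(links))
         link_freq.update(unique)
         for a, b in combinations(unique, 2):
             cooccur[(a, b)] += 1
     # Dice coefficient as a simple association score between linked concepts.
     return {pair: 2.0 * c / (link_freq[pair[0]] + link_freq[pair[1]])
             for pair, c in cooccur.items()}

 articles = {"Tokyo": ["Japan", "City", "Honshu"],
             "Kyoto": ["Japan", "City", "Honshu"],
             "Paris": ["France", "City"]}
 print(build_association_thesaurus(articles)[("Honshu", "Japan")])   # 1.0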
Chi, Ed H.; Pirolli, Peter; Suh, Bongwon; Kittur, Aniket; Pendleton, Bryan & Mytkowicz, Todd Augmented Social Cognition 2008 AAAI Spring Symposium, March 26, 2008 - March 28, 2008 Stanford, CA, United states 2008
Research in Augmented Social Cognition is aimed at enhancing the ability of a group of people to remember, think, and reason; to augment their speed and capacity to acquire, produce, communicate, and use knowledge; and to advance collective and individual intelligence in socially mediated information environments. In this paper, we describe the emergence of this research endeavor, and summarize some results from the research. In particular, we have found that (1) analyses of conflicts and coordination in Wikipedia have shown us the scientific need to understand social sensemaking environments; and (2) information theoretic analyses of social tagging behavior in del.icio.us show the need to understand human vocabulary systems.
Chi, Ed H. Augmented social cognition: Using social web technology to enhance the ability of groups to remember, think, and reason International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09, June 29, 2009 - July 2, 2009 Providence, RI, United states 2009 [995]
We are experiencing a new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by systems like Wikipedia, Twitter, and delicious.com, these environments are turning people into social information foragers and sharers. Groups interact to resolve conflicts and jointly make sense of topic areas from "Obama vs. Clinton" to "Islam." PARC's Augmented Social Cognition researchers, who come from cognitive psychology, computer science, HCI, CSCW, and other disciplines, focus on understanding how to enhance a group of people's ability to remember, think, and reason.
Wu, Fei; Hoffmann, Raphael & Weld, Daniel S. Augmenting wikipedia-extraction with results from the web 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper explains and evaluates a method for improving recall by extracting from the broader Web. There are two key advances necessary to make Web supplementation effective: 1) a method to filter promising sentences from Web pages, and 2) a novel retraining technique to broaden extractor recall. Experiments show that, used in concert with shrinkage, our techniques increase recall by a factor of up to 8 while maintaining or increasing precision.
Dakka, Wisam & Ipeirotis, Panagiotis G. Automatic Extraction of Useful Facet Hierarchies from Text Databases Proceedings of the 2008 IEEE 24th International Conference on Data Engineering 2008 [996]
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a powerful new paradigm that has proved to be a successful complement to searching. Thus far, the identification of the facets was either a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper we present an unsupervised technique for the automatic extraction of facets useful for browsing text databases. In particular, we observe through a pilot study that facet terms rarely appear in text documents, showing that we need external resources to identify useful facet terms. For this, we first identify important phrases in each document. Then we expand each phrase with "context" phrases using external resources such as WordNet and Wikipedia, causing facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies using the Amazon Mechanical Turk service show that our techniques produce facets with high precision and recall that are superior to existing approaches and help users locate interesting items faster.
Balasubramanian, Niranjan & Cucerzan, Silviu Automatic generation of topic pages using query-based aspect models ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [997]
We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. We describe a general framework, which combines query log analysis to build aspect models, sentence selection methods for identifying relevant and non-redundant Web sentences, and a technique for sentence ordering. We evaluate our approach on biographical topics both automatically and manually, by using Wikipedia as reference.
Gardner, James J. & Xiong, Li Automatic link detection: a sequence labeling approach Proceeding of the 18th ACM conference on Information and knowledge management 2009 [998]
The popularity of Wikipedia and other online knowledge bases has recently produced an interest in the machine learning community for the problem of automatic linking. Automatic hyperlinking can be viewed as two subproblems: link detection, which determines the source of a link, and link disambiguation, which determines the destination of a link. Wikipedia is a rich corpus with hyperlink data provided by authors. It is possible to use this data to train classifiers to mimic the authors in some capacity. In this paper, we introduce automatic link detection as a sequence labeling problem. Conditional random fields (CRFs) are a probabilistic framework for labeling sequential data. We show that training a CRF with different types of features from the Wikipedia dataset can be used to automatically detect links with almost perfect precision and high recall.
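A minimal sketch of link detection framed as sequence labeling, assuming the third-party sklearn-crfsuite package is available; the token features, labels, and toy sentence are invented for illustration and are not the features used in the paper.

```python
# Hypothetical CRF link-detection sketch: B/I mark tokens inside a link anchor, O the rest.
import sklearn_crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_capitalized": w[0].isupper(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

# Toy training data derived from author-provided wiki links.
sents = [["Alan", "Turing", "studied", "mathematics", "."]]
labels = [["B", "I", "O", "B", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))   # predicted label sequence marking candidate link anchors
```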
Potthast, Martin; Stein, Benno & Gerling, Robert Automatic vandalism detection in Wikipedia 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom 2008 [999]
We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature until now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-measure performance by 49% while being faster at the same time.
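A small sketch of the kind of logistic-regression edit classifier described above, using scikit-learn; the three edit features and the toy labels are invented for illustration and do not reproduce the paper's feature set.

```python
# Hypothetical vandalism classifier over simple per-edit features.
from sklearn.linear_model import LogisticRegression

# Features per edit: [fraction of upper-case chars, characters added, is_anonymous]
X = [
    [0.90, 12, 1],   # shouting, tiny anonymous edit            -> vandalism
    [0.05, 340, 0],  # normal prose added by a registered user  -> regular
    [0.75, 3, 1],
    [0.10, 150, 0],
]
y = [1, 0, 1, 0]     # 1 = vandalism, 0 = regular edit

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.80, 5, 1]])[0][1])  # estimated vandalism probability
```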
Smets, Koen; Goethals, Bart & Verdonk, Brigitte Automatic vandalism detection in wikipedia: Towards a machine learning approach 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
Since the end of 2006 several autonomous bots are, or have been, running on Wikipedia to keep the encyclopedia free from vandalism and other damaging edits. These expert systems, however, are far from optimal and should be improved to relieve the human editors from the burden of manually reverting such edits. We investigate the possibility of using machine learning techniques to build an autonomous system capable of distinguishing vandalism from legitimate edits. We highlight the results of a small but important step in this direction by applying commonly known machine learning algorithms using a straightforward feature representation. Despite the promising results, this study reveals that elementary features, which are also used by the current approaches to fight vandalism, are not sufficient to build such a system. They will need to be accompanied by additional information which, among other things, incorporates the semantics of a revision.
Sauper, Christina & Barzilay, Regina Automatically generating Wikipedia articles: a structure-aware approach Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1 2009 [1,000]
In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structure of human-authored texts to automatically induce a domain-specific template for the topic structure of a new overview. The algorithmic innovation of our work is a method to learn topic-specific extractors for content selection jointly for the entire template. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview. The results of our evaluation confirm the benefits of incorporating structural information into the content selection process.
Wu, Fei & Weld, Daniel S. Automatically refining the wikipedia infobox ontology 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [1,001]
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly-structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia's infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia's infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.
Wu, Fei & Weld, Daniel S. Autonomously semantifying wikipedia 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007 [1,002]
Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method: creating enough structured data to motivate the development of applications. This paper argues that autonomously "semantifying" Wikipedia is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process.
Navigli, Roberto & Ponzetto, Simone Paolo BabelNet: building a very large multilingual semantic network Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,003]
In this paper we present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is also applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.
Kittur, Aniket & Kraut, Robert E. Beyond Wikipedia: Coordination and conflict in online production groups 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United states 2010 [1,004]
Online production groups have the potential to transform the way that knowledge is produced and disseminated. One of the most widely used forms of online production is the wiki, which has been used in domains ranging from science to education to enterprise. We examined the development of and interactions between coordination and conflict in a sample of 6811 wiki production groups. We investigated the influence of four coordination mechanisms: intra-article communication, inter-user communication, concentration of workgroup structure, and policy and procedures. We also examined the growth of conflict, finding the density of users in an information space to be a significant predictor. Finally, we analyzed the effectiveness of the four coordination mechanisms on managing conflict, finding differences in how each scaled to large numbers of contributors. Our results suggest that coordination mechanisms effective for managing conflict are not always the same as those effective for managing task quality, and that designers must take into account the social benefits of coordination mechanisms in addition to their production benefits.
Oh, Jong-Hoon; Uchimoto, Kiyotaka & Torisawa, Kentaro Bilingual co-training for monolingual hyponymy-relation acquisition Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1 2009 [1,005]
This paper proposes a novel framework called bilingual co-training for a large-scale, accurate acquisition method for monolingual semantic knowledge. In this framework, we combine the independent processes of monolingual semantic-knowledge acquisition for two languages using bilingual resources to boost performance. We apply this framework to large-scale hyponymy-relation acquisition from Wikipedia. Experimental results show that our approach improved the F-measure by 3.6-10.3%. We also show that bilingual co-training enables us to build classifiers for two languages in tandem with the same combined amount of data as required for training a single classifier in isolation while achieving superior performance.
Liu, Xiaojiang; Nie, Zaiqing; Yu, Nenghai & Wen, Ji-Rong BioSnowball: Automated population of wikis 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD-2010, July 25, 2010 - July 28, 2010 Washington, DC, United states 2010 [1,006]
Internet users regularly have the need to find biographies and facts of people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wikipedia can only provide information for celebrities because of its neutral point of view (NPOV) editorial policy. In this paper we propose an integrated bootstrapping framework named BioSnowball to automatically summarize the Web to generate Wikipedia-style pages for any person with a modest web presence. In BioSnowball, biography ranking and fact extraction are performed together in a single integrated training and inference process using Markov Logic Networks (MLNs) as its underlying statistical model. The bootstrapping framework starts with only a small number of seeds and iteratively finds new facts and biographies. As biography paragraphs on the Web are composed of the most important facts, our joint summarization model can improve the accuracy of both fact extraction and biography ranking compared to decoupled methods in the literature. Empirical results on both a small labeled data set and a real Web-scale data set show the effectiveness of BioSnowball. We also empirically show that BioSnowball outperforms the decoupled methods.
Jesus, Rut; Schwartz, Martin & Lehmann, Sune Bipartite networks of Wikipedia's articles and authors: A meso-level approach 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,007]
This exploratory study investigates the bipartite network of articles linked by common editors in Wikipedia, 'The Free Encyclopedia that Anyone Can Edit'. We use the articles in the categories (to depth three) of Physics and Philosophy and extract and focus on significant editors (at least 7 or 10 edits per article). We construct a bipartite network, and from it, overlapping cliques of densely connected articles and editors. We cluster these densely connected cliques into larger modules to study examples of larger groups that display how volunteer editors flock around articles driven by interest, real-world controversies, or the result of coordination in WikiProjects. Our results confirm that topics aggregate editors, and show that highly coordinated efforts result in dense clusters.
Yardi, Sarita; Golder, Scott A. & Brzozowski, Michael J. Blogging at work and the corporate attention economy Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,008]
The attention economy motivates participation in peer-produced sites on the Web like YouTube} and Wikipedia. However, this economy appears to break down at work. We studied a large internal corporate blogging community using log files and interviews and found that employees expected to receive attention when they contributed to blogs, but these expectations often went unmet. Like in the external blogosphere, a few people received most of the attention, and many people received little or none. Employees expressed frustration if they invested time and received little or no perceived return on investment. While many corporations are looking to adopt Web-based communication tools like blogs, wikis, and forums, these efforts will fail unless employees are motivated to participate and contribute content. We identify where the attention economy breaks down in a corporate blog community and suggest mechanisms for improvement.
Ngomo, Axel-Cyrille Ngonga & Schumacher, Frank BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing 2009 [1,009]
In this paper, we introduce BorderFlow, a novel local graph clustering algorithm, and its application to natural language processing problems. For this purpose, we first present a formal description of the algorithm. Then, we use BorderFlow to cluster large graphs and to extract concepts from word similarity graphs. The clustering of large graphs is carried out on graphs extracted from the Wikipedia Category Graph. The subsequent low-bias extraction of concepts is carried out on two data sets consisting of noisy and clean data. We show that BorderFlow efficiently computes clusters of high quality and purity. Therefore, BorderFlow can be integrated in several other natural language processing applications.
Mindel, Joshua & Verma, Sameer Building Collaborative Knowledge Bases: An Open Source Approach Using Wiki Software in Teaching and Research 2005 [1,010]
To open-minded students and professors alike, a classroom is an experience in which all participants collaborate to expand their knowledge. The collective knowledge is typically documented via a mix of lecture slides, notes taken by students, writings submitted by individuals or teams, online discussion forums, etc. A Wiki is a collection of hyperlinked web pages that are assembled with Wiki software. It differs from the traditional process of developing a web site in that any registered participant can edit without knowing how to build a web site. It enables a group to asynchronously develop and refine a body of knowledge in full view of all participants. The emergence of Wikipedia and Wikitravel demonstrates that this collaborative process is scalable. In this tutorial, we will provide an overview of the Wiki collaboration process, explain how it can be used in teaching courses, and also how it provides an efficient mechanism for collaborating researchers to document their growing body of knowledge. For teaching, students can collectively post and refine each other's writings. Participants: If possible, please bring a laptop with Wi-Fi capability.
DeRose, Pedro; Chai, Xiaoyong; Gao, Byron J.; Shen, Warren; Doan, AnHai; Bohannon, Philip & Zhu, Xiaojin Building community wikipedias: A machine-human partnership approach 2008 IEEE 24th International Conference on Data Engineering, ICDE'08, April 7, 2008 - April 12, 2008 Cancun, Mexico 2008 [1,011]
The rapid growth of Web communities has motivated many solutions for building community data portals. These solutions follow roughly two approaches. The first approach (e.g., Libra, Citeseer, Cimple) employs semi-automatic methods to extract and integrate data from a multitude of data sources. The second approach (e.g., Wikipedia, Intellipedia) deploys an initial portal in wiki format, then invites community members to revise and add material. In this paper we consider combining the above two approaches to building community portals. The new hybrid machine-human approach brings significant benefits. It can achieve broader and deeper coverage, provide more incentives for users to contribute, and keep the portal more up-to-date with less user effort. In a sense, it enables building "community wikipedias" backed by an underlying structured database that is continuously updated using automatic techniques. We outline our ideas for the new approach, describe its challenges and opportunities, and provide initial solutions. Finally, we describe a real-world implementation and preliminary experiments that demonstrate the utility of the new approach.
Wang, Pu & Domeniconi, Carlotta Building semantic kernels for text classification using wikipedia 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, August 24, 2008 - August 27, 2008 Las Vegas, NV, United states 2008 [1,012]
Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms. In this paper, we overcome the shortcomings of the BOW approach by embedding background knowledge derived from Wikipedia into a semantic kernel, which is then used to enrich the representation of documents. Our empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the BOW technique, and to other recently developed methods.
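A toy sketch of the semantic-kernel idea: documents stay BOW vectors, but a word-by-concept matrix (here invented, standing in for Wikipedia-derived knowledge) makes documents that share concepts similar even when they share no words. This illustrates the general construction, not the paper's exact kernel.

```python
# Hypothetical Wikipedia-derived semantic kernel over BOW vectors.
import numpy as np

# Rows: words [car, automobile, banana]; columns: concepts [Vehicle, Fruit]
S = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Two documents in the same word space: one says "car", the other "automobile"
d1 = np.array([1.0, 0.0, 0.0])
d2 = np.array([0.0, 1.0, 0.0])

def semantic_kernel(a, b, S):
    # k(a, b) = (a S) . (b S): similarity measured in concept space
    return float((a @ S) @ (b @ S))

print(d1 @ d2)                     # 0.0 under plain BOW
print(semantic_kernel(d1, d2, S))  # 1.0 once background concepts are folded in
```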
Yin, Xiaoxin & Shah, Sarthak Building taxonomy of web search intents for name entity queries 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [1,013]
A significant portion of web search queries are name entity queries. The major search engines have been exploring various ways to provide better user experiences for name entity queries, such as showing "search tasks" (Bing search) and showing direct answers (Yahoo!, Kosmix). In order to provide the search tasks or direct answers that can satisfy most popular user intents, we need to capture these intents together with relationships between them. In this paper we propose an approach for building a hierarchical taxonomy of the generic search intents for a class of name entities (e.g., musicians or cities). The proposed approach can find phrases representing generic intents from user queries.
Blanco, Roi; Bortnikov, Edward; Junqueira, Flavio; Lempel, Ronny; Telloli, Luca & Zaragoza, Hugo Caching search engine results over incremental indices 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [1,014]
A Web search engine must update its index periodically to incorporate changes to the Web, and we argue in this work that index updates fundamentally impact the design of search engine result caches. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. To enable efficient invalidation of cached results, we propose a framework for developing invalidation predictors and some concrete predictors. Evaluation using Wikipedia documents and a query log from Yahoo shows that selective invalidation of cached search results can lower the number of query re-evaluations by as much as 30% compared to a baseline time-to-live scheme, while returning results of similar freshness.
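A minimal sketch contrasting the time-to-live baseline with explicit invalidation of cached results after an index update, as discussed above; the cache class and the idea of a predictor handing it a list of affected queries are illustrative assumptions, not the paper's framework.

```python
# Hypothetical result cache: TTL baseline plus selective invalidation hook.
import time

class ResultCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}                            # query -> (results, timestamp)

    def get(self, query):
        entry = self.store.get(query)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]                        # fresh enough under the TTL policy
        return None                                # miss: the query must be re-evaluated

    def put(self, query, results):
        self.store[query] = (results, time.time())

    def invalidate(self, affected_queries):
        """Called after an index update with queries a predictor flags as stale."""
        for q in affected_queries:
            self.store.pop(q, None)

cache = ResultCache(ttl_seconds=60)
cache.put("wikipedia history", ["doc1", "doc7"])
cache.invalidate(["wikipedia history"])            # selective invalidation, no TTL wait
print(cache.get("wikipedia history"))              # None -> re-evaluate against the new index
```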
Kittur, Aniket; Suh, Bongwon & Chi, Ed H. Can you ever trust a wiki? Impacting perceived trustworthiness in wikipedia 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states 2008 [1,015]
Wikipedia has become one of the most important information resources on the Web by promoting peer collaboration and enabling virtually anyone to edit anything. However, this mutability also leads many to distrust it as a reliable source of information. Although there have been many attempts at developing metrics to help users judge the trustworthiness of content, it is unknown how much impact such measures can have on a system that is perceived as inherently unstable. Here we examine whether a visualization that exposes hidden article information can impact readers' perceptions of trustworthiness in a wiki environment. Our results suggest that surfacing information relevant to the stability of the article and the patterns of editor behavior can have a significant impact on users' trust across a variety of page types.
Wang, Haofen; Tran, Thanh & Liu, Chang CE2: towards a large scale hybrid search engine with integrated ranking support Proceeding of the 17th ACM conference on Information and knowledge management 2008 [1,016]
The Web contains a large number of documents and increasingly, also semantic data in the form of RDF triples. Many of these triples are annotations that are associated with documents. While structured query is the principal means of retrieving semantic data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these formalisms to query both documents and semantic data can address more complex information needs. In this paper, we present CE2, an integrated solution that leverages mature database and information retrieval technologies to tackle challenges in hybrid search on the large scale. For scalable storage, CE2 integrates databases with inverted indices. Hybrid query processing is supported in CE2 through novel algorithms and data structures, which allow for advanced ranking schemes to be integrated more tightly into the process. Experiments conducted on DBpedia and Wikipedia show that CE2 can provide good performance in terms of both effectiveness and efficiency.
Cowling, Peter; Remde, Stephen; Hartley, Peter; Stewart, Will; Stock-Brooks, Joe & Woolley, Tom C-Link: Concept linkage in knowledge repositories 2010 AAAI Spring Symposium, March 22, 2010 - March 24, 2010 Stanford, CA, United states 2010
When searching a knowledge repository such as Wikipedia or the Internet, the user doesn't always know what they are looking for. Indeed, it is often the case that a user wishes to find information about a concept that was completely unknown to them prior to the search. In this paper we describe C-Link, which provides the user with a method for searching for unknown concepts which lie between two known concepts. C-Link does this by modeling the knowledge repository as a weighted, directed graph where nodes are concepts and arc weights give the degree of "relatedness" between concepts. An experimental study was undertaken with 59 participants to investigate the performance of C-Link compared to standard search approaches. Statistical analysis of the results shows great potential for C-Link as a search tool.
Huang, Anna; Milne, David; Frank, Eibe & Witten, Ian H. Clustering documents using a wikipedia-based concept representation 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009, April 27, 2009 - April 30, 2009 Bangkok, Thailand 2009 [1,017]
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also developed a similarity measure that evaluates the semantic relatedness between concept sets for two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that although further optimizations could be performed, our approach already improves upon related techniques.
Huang, Anna; Milne, David; Frank, Eibe & Witten, Ian H. Clustering documents with active learning using wikipedia 8th IEEE International Conference on Data Mining, ICDM 2008, December 15, 2008 - December 19, 2008 Pisa, Italy 2008 [1,018]
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated to a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields comparable performance to previous attempts; and adding constraints improves clustering performance further by up to 20%.
Banerjee, Somnath; Ramanathan, Krishnan & Gupta, Ajay Clustering short texts using wikipedia 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands 2007 [1,019]
Subscribers to the popular news or blog feeds (RSS/Atom) often face the problem of information overload as these feed sources usually deliver a large number of items periodically. One solution to this problem could be clustering similar items in the feed reader to make the information more manageable for a user. Clustering items at the feed reader end is a challenging task as usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag of words representation.
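A rough sketch of the enrichment idea above: append Wikipedia-derived concept labels to each short feed item before clustering it with an ordinary bag-of-words pipeline. The hard-coded `enrich` mapping stands in for a real Wikipedia lookup and is purely illustrative.

```python
# Hypothetical short-text clustering with Wikipedia-style concept enrichment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

enrich = {
    "fed raises rates": "Federal_Reserve Interest_rate Economics",
    "ecb holds rates steady": "European_Central_Bank Interest_rate Economics",
    "new iphone unveiled": "Apple_Inc Smartphone Technology",
    "android update released": "Google Smartphone Technology",
}

items = list(enrich)
texts = [f"{item} {enrich[item]}" for item in items]   # BOW terms + concept labels

X = TfidfVectorizer().fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(items, labels)))   # shared concepts pull finance and tech items apart
```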
Emigh, William & Herring, Susan C. Collaborative Authoring on the Web: A Genre Analysis of Online Encyclopedias Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 4 - Volume 04 2005 [1,020]
This paper presents the results of a genre analysis of two web-based collaborative authoring environments, Wikipedia and Everything2, both of which are intended as repositories of encyclopedic knowledge and are open to contributions from the public. Using corpus linguistic methods and factor analysis of word counts for features of formality and informality, we show that the greater the degree of post-production editorial control afforded by the system, the more formal and standardized the language of the collaboratively-authored documents becomes, analogous to that found in traditional print encyclopedias. Paradoxically, users who faithfully appropriate such systems create homogeneous entries, at odds with the goal of open-access authoring environments to create diverse content. The findings shed light on how users, acting through mechanisms provided by the system, can shape (or not) features of content in particular ways. We conclude by identifying sub-genres of web-based collaborative authoring environments based on their technical affordances.
Ahmadi, Navid; Repenning, Alexander & Ioannidou, Andri Collaborative end-user development on handheld devices 2008 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2008, September 15, 2008 - September 19, 2008 Herrsching am Ammersee, Germany 2008 [1,021]
Web 2.0 has enabled end users to collaborate through their own developed artifacts, moving on from text (e.g., Wikipedia, Blogs) to images (e.g., Flickr) and movies (e.g., YouTube), changing end users' role from consumer to producer. But still there is no support for collaboration through interactive end-user developed artifacts, especially for emerging handheld devices, which are the next collaborative platform. Featuring fast always-on networks, Web browsers that are as powerful as their desktop counterparts, and innovative user interfaces, the newest generation of handheld devices can run highly interactive content as Web applications. We have created Ristretto Mobile, a Web-compliant framework for running end-user developed applications on handheld devices. The Web-based Ristretto Mobile includes compiler and runtime components to turn end-user applications into Web applications that can run on compatible handheld devices, including the Apple iPhone and Nokia N800. Our paper reports on the technological and cognitive challenges in creating interactive content that runs efficiently and is user accessible on handheld devices.
Shieh, Jyh-Ren; Yeh, Yang-Ting; Lin, Chih-Hung; Lin, Ching-Yung & Wu, Ja-Ling Collaborative knowledge semantic graph image search 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [1,022]
In this paper, we propose a Collaborative Knowledge Semantic Graphs Image Search (CKSGIS) system. It provides a novel way to conduct image search by utilizing the collaborative nature in Wikipedia and by performing network analysis to form semantic graphs for search-term expansion. The collaborative article editing process used by Wikipedia's contributors is formalized as bipartite graphs that are folded into networks between terms. When a user types in a search term, CKSGIS automatically retrieves an interactive semantic graph of related terms that allows users to easily find related images not limited to a specific search term. The interactive semantic graph then serves as an interface to retrieve images through existing commercial search engines. This method significantly saves users' time by avoiding multiple search keywords that are usually required in generic search engines. It benefits both naive users who do not possess a large vocabulary and professionals who look for images on a regular basis. In our experiments, 85% of the participants favored the CKSGIS system rather than commercial search engines.
Kulkarni, Sayali; Singh, Amit; Ramakrishnan, Ganesh & Chakrabarti, Soumen Collective annotation of wikipedia entities in web text 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, June 28, 2009 - July 1, 2009 Paris, France 2009 [1,023]
To take the first step beyond keyword-based search toward entity-based search, suitable token spans ("spots") on documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but in limited ways. We propose a general collective disambiguation approach. Our premise is that coherent documents refer to entities from one or a few related topics or domains. We give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard. We investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently proposed algorithms.
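An illustrative greedy hill-climbing disambiguator in the spirit of the local-compatibility versus global-coherence trade-off described above: each spot's entity is repeatedly re-chosen to maximize its local score plus its coherence with the entities currently assigned to the other spots. The scores, candidates, and function names are toy assumptions, not the paper's model.

```python
# Hypothetical collective entity assignment by local hill-climbing.
def disambiguate(spots, candidates, local, coherence, iterations=10):
    # start from the locally best candidate for every spot
    assign = {s: max(candidates[s], key=lambda e: local[s][e]) for s in spots}
    for _ in range(iterations):
        changed = False
        for s in spots:
            def objective(e):
                others = [assign[t] for t in spots if t != s]
                return local[s][e] + sum(coherence.get(frozenset((e, o)), 0.0) for o in others)
            best = max(candidates[s], key=objective)
            if best != assign[s]:
                assign[s], changed = best, True
        if not changed:
            break                                  # local optimum reached
    return assign

spots = ["Jaguar", "Amazon"]
candidates = {"Jaguar": ["Jaguar_(animal)", "Jaguar_Cars"],
              "Amazon": ["Amazon_River", "Amazon_(company)"]}
local = {"Jaguar": {"Jaguar_(animal)": 0.4, "Jaguar_Cars": 0.5},
         "Amazon": {"Amazon_River": 0.6, "Amazon_(company)": 0.5}}
coherence = {frozenset(("Jaguar_(animal)", "Amazon_River")): 0.8}
print(disambiguate(spots, candidates, local, coherence))  # coherence flips "Jaguar" to the animal
```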
Yao, Limin; Riedel, Sebastian & McCallum, Andrew Collective cross-document relation extraction without labelled data Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing 2010 [1,024]
We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.
Vibber, Brion Community performance optimization: making your people run as smoothly as your site Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,025]
Collaborative communities such as those building wikis and open source software often discover that their human interactions have just as many scaling problems as their web infrastructure. As the number of people involved in a project grows, key decision-makers often become bottlenecks, and community structure needs to change or a project can become stalled despite the best intentions of all participants. I'll describe some of the community scaling challenges in both Wikipedia's editor community and the development of its underlying MediaWiki software, and how we've overcome -- or are still working to overcome -- decision-making bottlenecks to maximize community "throughput".
He, Jinru; Yan, Hao & Suel, Torsten Compact full-text indexing of versioned document collections ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,026]
We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, such that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of new techniques and the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection, show significant benefits over previous approaches.
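The toy sketch below illustrates the general idea of exploiting redundancy across versions: instead of one posting per document version, each term's posting stores a document id plus a bitmask of the versions that contain it. This is an invented simplification for illustration, not the paper's actual index organization or compression scheme.

```python
# Hypothetical versioned index: term -> {doc_id: bitmask of versions containing the term}.
from collections import defaultdict

def build_versioned_index(doc_versions):
    """doc_versions: {doc_id: [set_of_terms_in_version_0, set_of_terms_in_version_1, ...]}"""
    index = defaultdict(dict)
    for doc_id, versions in doc_versions.items():
        for v, terms in enumerate(versions):
            for term in terms:
                index[term][doc_id] = index[term].get(doc_id, 0) | (1 << v)
    return index

idx = build_versioned_index({
    "Alan_Turing": [{"turing", "logic"}, {"turing", "logic", "enigma"}],
})
# "turing" appears in both versions (0b11); "enigma" only in version 1 (0b10)
print(bin(idx["turing"]["Alan_Turing"]), bin(idx["enigma"]["Alan_Turing"]))
```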
Zesch, Torsten; Gurevych, Iryna & Mühlhäuser, Max Comparing Wikipedia and German wordnet by evaluating semantic relatedness on multiple datasets NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers 2007 [1,027]
West, Robert; Precup, Doina & Pineau, Joelle Completing Wikipedia's hyperlink structure through dimensionality reduction ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,028]
Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyperlinks. However, since these links are created by human authors, links one would expect to see are often missing. The goal of this work is to detect such gaps automatically. In this paper, we propose a novel method for augmenting the structure of hyperlinked document collections such as Wikipedia. It does not require the extraction of any manually defined features from the article to be augmented. Instead, it is based on principal component analysis, a well-founded mathematical generalization technique, and predicts new links purely based on the statistical structure of the graph formed by the existing links. Our method does not rely on the textual content of articles; we are exploiting only hyperlinks. A user evaluation of our technique shows that it improves the quality of top link suggestions over the state of the art and that the best predicted links are significantly more valuable than the 'average' link already present in Wikipedia. Beyond link prediction, our algorithm can potentially be used to point out topics an article fails to cover and to cluster articles semantically.
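A small sketch of predicting missing links from the statistical structure of the existing link graph: take a low-rank reconstruction of the adjacency matrix and rank absent links by their reconstructed scores. The tiny graph is invented, and truncated SVD is used here as a stand-in for the PCA-style reduction the paper applies to Wikipedia's full link matrix.

```python
# Hypothetical low-rank link suggestion over a toy adjacency matrix.
import numpy as np

A = np.array([            # rows/cols: articles 0..3, A[i, j] = 1 if article i links to j
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                              # keep only the top-k components
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # low-rank reconstruction

# Candidate links = pairs with no existing link; higher score = stronger suggestion
candidates = [(i, j, A_hat[i, j]) for i in range(4) for j in range(4)
              if i != j and A[i, j] == 0]
print(sorted(candidates, key=lambda t: -t[2])[:3])
```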
Zhang, Bingjun; Xiang, Qiaoliang; Lu, Huanhuan; Shen, Jialie & Wang, Ye Comprehensive query-dependent fusion using regression-on-folksonomies: a case study of multimodal music search Proceedings of the seventeen ACM international conference on Multimedia 2009 [1,029]
The combination of heterogeneous knowledge sources has been widely regarded as an effective approach to boost retrieval accuracy in many information retrieval domains. While various technologies have been recently developed for information retrieval, multimodal music search has not kept pace with the enormous growth of data on the Internet. In this paper, we study the problem of integrating multiple online information sources to conduct effective query dependent fusion (QDF) of multiple search experts for music retrieval. We have developed a novel framework to construct a knowledge space of users' information need from online folksonomy data. With this innovation, a large number of comprehensive queries can be automatically constructed to train a better generalized QDF system against unseen user queries. In addition, our framework models the QDF problem by regression of the optimal combination strategy on a query. Distinguished from the previous approaches, the regression model of QDF (RQDF) offers superior modeling capability with less constraints and more efficient computation. To validate our approach, a large scale test collection has been collected from different online sources, such as Last.fm, Wikipedia, and YouTube. All test data will be released to the public for better research synergy in multimodal music search. Our performance study indicates that the accuracy, efficiency, and robustness of the multimodal music search can be improved significantly by the proposed Folksonomy-RQDF approach. In addition, since no human involvement is required to collect training examples, our approach offers great feasibility and practicality in system development.
Gabrilovich, Evgeniy & Markovitch, Shaul Computing semantic relatedness using Wikipedia-based explicit semantic analysis Proceedings of the 20th international joint conference on Artifical intelligence 2007 [1,030]
Egozi, Ofer; Gabrilovich, Evgeniy & Markovitch, Shaul Concept-based feature generation and selection for information retrieval Proceedings of the 23rd national conference on Artificial intelligence - Volume 2 2008 [1,031]
Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High quality feature selection is necessary to maintain high precision, but here we do not have the labeled training data for evaluating features, that we have in supervised learning. We present a new feature selection method that is inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
Chang, Jonathan; Boyd-Graber, Jordan & Blei, David M. Connections between the lines: augmenting social networks with text Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining 2009 [1,032]
Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora. In this paper we present a novel probabilistic topic model to analyze text corpora and infer descriptions of its entities and of relationships between those entities. We develop variational methods for performing approximate inference on our model and demonstrate that our model can be practically deployed on large corpora such as Wikipedia. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.
Wilkinson, Dennis M. & Huberman, Bernardo A. Cooperation and quality in Wikipedia 2007 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA - 2007 International Symposium on Wikis, WikiSym, October 21, 2007 - October 25, 2007 Montreal, QC, Canada 2007 [1,033]
The rise of the Internet has enabled collaboration and cooperation on an unprecedentedly large scale. The online encyclopedia Wikipedia, which presently comprises 7.2 million articles created by 7.04 million distinct editors, provides a consummate example. We examined all 50 million edits made to the 1.5 million English-language Wikipedia articles and found that the high-quality articles are distinguished by a marked increase in number of edits, number of editors, and intensity of cooperative behavior, as compared to other articles of similar visibility and age. This is significant because in other domains, fruitful cooperation has proven to be difficult to sustain as the size of the collaboration increases. Furthermore, in spite of the vagaries of human behavior, we show that Wikipedia articles accrete edits according to a simple stochastic mechanism in which edits beget edits. Topics of high interest or relevance are thus naturally brought to the forefront of quality.
Krieger, Michel; Stark, Emily Margarete & Klemmer, Scott R. Coordinating tasks on the commons: designing for personal goals, expertise and serendipity Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,034]
How is work created, assigned, and completed on large-scale, crowd-powered systems like Wikipedia? And what design principles might enable these federated online systems to be more effective? This paper reports on a qualitative study of work and task practices on Wikipedia. Despite the availability of tag-based community-wide task assignment mechanisms, informants reported that self-directed goals, within-topic expertise, and fortuitous discovery are more frequently used than community-tagged tasks. We examine how Wikipedia editors organize their actions and the actions of other participants, and what implications this has for understanding, and building tools for, crowd-powered systems, or any web site where the main force of production comes from a crowd of online participants. From these observations and insights, we developed WikiTasks, a tool that integrates with Wikipedia and supports both grassroots creation of site-wide tasks and self-selection of personal tasks, accepted from this larger pool of community tasks.
Rossi, Alessandro; Gaio, Loris; Besten, Matthijs Den & Dalle, Jean-Michel Coordination and division of labor in open content communities: The role of template messages in Wikipedia 43rd Annual Hawaii International Conference on System Sciences, HICSS-43, January 5, 2010 - January 8, 2010 Koloa, Kauai, {HI, United states 2010 [1,035]
Though largely spontaneous and loosely regulated, the process of peer production within online communities is also supplemented by additional coordination mechanisms. In this respect, we study an emergent organizational practice of the Wikipedia community, the use of template messages, which seems to act as an effective and parsimonious coordination device to signal quality concerns or other issues that need to be addressed. We focus on the template "NPOV", which signals breaches of the fundamental policy of neutrality of Wikipedia articles, and we show how and to what extent putting such a template on a page affects the editing process. We notably find that the intensity of editing increases immediately after the "NPOV" template appears, and that controversies about articles which have received the attention of a more limited group of editors before they were tagged as controversial have a lower chance of being treated quickly.
Kittur, Aniket; Lee, Bryant & Kraut, Robert E. Coordination in collective intelligence: the role of team structure and task interdependence Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,036]
The success of Wikipedia has demonstrated the power of peer production in knowledge building. However, unlike many other examples of collective intelligence, tasks in Wikipedia can be deeply interdependent and may incur high coordination costs among editors. Increasing the number of editors increases the resources available to the system, but it also raises the costs of coordination. This suggests that the dependencies of tasks in Wikipedia may determine whether they benefit from increasing the number of editors involved. Specifically, we hypothesize that adding editors may benefit low-coordination tasks but have negative consequences for tasks requiring a high degree of coordination. Furthermore, concentrating the work to reduce coordination dependencies should enable more efficient work by many editors. Analyses of both article ratings and article review comments provide support for both hypotheses. These results suggest ways to better harness the efforts of many editors in social collaborative systems involving high coordination tasks.
Jankowski, Jacek Copernicus: 3D Wikipedia ACM SIGGRAPH 2008 Posters 2008, SIGGRAPH'08, August 11, 2008 - August 15, 2008 Los Angeles, CA, United states 2008 [1,037]
In this paper we present one of the potential paths of the evolution of Wikipedia towards Web 3.0. We introduce Copernicus - The Virtual 3D Encyclopedia, which was built according to the 2-Layer Interface Paradigm (2LIP). The background layer of the 2LIP-type user interface is a 3D scene, which a user cannot directly interact with. The foreground layer is HTML content. Only taking an action on this content (e.g. pressing a hyperlink) can affect the 3D scene.
Zhao, Shubin & Betz, Jonathan Corroborate and learn facts from the web Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining 2007 [1,038]
The web contains lots of interesting factual information about entities, such as celebrities, movies or products. This paper describes a robust bootstrapping approach to corroborate facts and learn more facts simultaneously. This approach starts with retrieving relevant pages from a crawl repository for each entity in the seed set. In each learning cycle, known facts of an entity are corroborated first in a relevant page to find fact mentions. When fact mentions are found, they are taken as examples for learning new facts from the page via HTML pattern discovery. Extracted new facts are added to the known fact set for the next learning cycle. The bootstrapping process continues until no new facts can be learned. This approach is language-independent. It demonstrated good performance in an experiment on country facts. Results of a large scale experiment will also be shown with initial facts imported from Wikipedia.
Sato, Satoshi Crawling English-Japanese person-name transliterations from the web Proceedings of the 18th international conference on World wide web 2009 [1,039]
Automatic compilation of lexicons is a dream of lexicon compilers as well as lexicon users. This paper proposes a system that crawls English-Japanese person-name transliterations from the Web, which works as a back-end collector for automatic compilation of a bilingual person-name lexicon. Our crawler collected 561K transliterations in five months. From them, an English-Japanese person-name lexicon with 406K entries has been compiled by automatic post-processing. This lexicon is much larger than other similar resources, including the English-Japanese lexicon of HeiNER obtained from Wikipedia.
Lindsay, Brooks Creating the "Wikipedia of pros and cons" 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,040]
Debatepedia founder Brooks Lindsay will host a panel focusing on projects and individuals attempting to build what amounts to the "Wikipedia of debates" or the "Wikipedia of pros and cons". The panel will bring together Debatepedia founder Brooks Lindsay, Debatewise founder David Crane, Opposing Views founder Russell Fine, and ProCon.org editor Kambiz Akhavan. We will discuss our successes and failures over the past three years and the way forward for clarifying public debates via wiki and other technologies.
Shachaf, Pnina & Hara, Noriko Cross-Cultural Analysis of the Wikipedia Community 2009 [1,041]
This paper reports a cross-cultural analysis of Wikipedia communities of practice (CoPs). First, this paper argues that Wikipedia communities can be analyzed and understood as CoPs. Second, the similarities and differences in norms of behavior across three different languages (English, Hebrew, and Japanese) and on three types of discussion spaces (Talk, User Talk, and Wikipedia Talk) are identified. These are explained by Hofstede's dimensions of cultural diversity, the size of the community, and the role of each discussion area. This paper expands the research on online CoPs, which has not performed in-depth examinations of cultural variations across multiple languages.
Roth, Benjamin & Klakow, Dietrich Cross-language retrieval using link-based language models 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [1,042]
We propose a cross-language retrieval model that is solely based on Wikipedia as a training corpus. The main contributions of our work are: 1. A translation model based on linked text in Wikipedia and a term weighting method associated with it. 2. A combination scheme to interpolate the link translation model with retrieval based on Latent Dirichlet Allocation. On the CLEF 2000 data we achieve improvement with respect to the best German-English system at the bilingual track (non-significant) and improvement against a baseline based on machine translation (significant).
Hassan, Samer & Mihalcea, Rada Cross-lingual semantic relatedness using encyclopedic knowledge Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3 2009 [1,043]
In this paper, we address the task of crosslingual semantic relatedness. We introduce a method that relies on the information extracted from Wikipedia, by exploiting the interlanguage links available between Wikipedia versions in multiple languages. Through experiments performed on several language pairs, we show that the method performs well, with a performance comparable to monolingual measures of relatedness.
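The interlanguage-link idea in this abstract can be sketched in a few lines: represent each word as a weighted vector of Wikipedia concepts in its own language edition, project one vector into the other language's concept space via the interlanguage links, and compare the two vectors. The concept vectors and the link table below are hypothetical toy data, and the projection-plus-cosine formulation is only an assumed reading of the method:

    # Each word is associated with the Wikipedia concepts (articles) it evokes
    # in its own language edition, with weights (hypothetical toy data).
    en_vector = {"Dog": 0.9, "Pet": 0.6, "Wolf": 0.3}          # English word "dog"
    es_vector = {"Perro": 0.8, "Mascota": 0.7, "Gato": 0.2}    # Spanish word "perro"

    # Interlanguage links map Spanish articles onto English ones.
    es_to_en = {"Perro": "Dog", "Mascota": "Pet", "Gato": "Cat"}

    def cosine(u, v):
        keys = set(u) | set(v)
        dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
        nu = sum(x * x for x in u.values()) ** 0.5
        nv = sum(x * x for x in v.values()) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    def cross_lingual_relatedness(en_vec, es_vec, link_table):
        # Project the Spanish concept vector into the English concept space.
        projected = {}
        for concept, weight in es_vec.items():
            if concept in link_table:
                target = link_table[concept]
                projected[target] = projected.get(target, 0.0) + weight
        return cosine(en_vec, projected)

    print(round(cross_lingual_relatedness(en_vector, es_vector, es_to_en), 3))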
Potthast, Martin Crowdsourcing a Wikipedia vandalism corpus 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [1,044]
We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon's Mechanical Turk. The corpus compiles 32,452 edits on 28,468 Wikipedia articles, among which 2,391 vandalism edits have been identified. 753 human annotators cast a total of 193,022 votes on the edits, so that each edit was reviewed by at least 3 annotators; the achieved level of agreement was analyzed in order to label each edit as "regular" or "vandalism." The corpus is available free of charge.
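A minimal sketch of how per-edit crowd votes might be turned into "regular"/"vandalism" labels with a simple agreement score is shown below; the vote data, the three-vote minimum, and the majority-vote rule are illustrative assumptions rather than the corpus's actual adjudication procedure:

    from collections import Counter

    votes = {  # edit id -> list of worker votes (hypothetical)
        "edit-1": ["vandalism", "vandalism", "regular"],
        "edit-2": ["regular", "regular", "regular"],
    }

    def label_edit(edit_votes, min_votes=3):
        if len(edit_votes) < min_votes:
            return None, 0.0                  # needs more annotators
        counts = Counter(edit_votes)
        label, top = counts.most_common(1)[0]
        agreement = top / len(edit_votes)     # fraction agreeing with the majority
        return label, agreement

    for edit_id, edit_votes in votes.items():
        print(edit_id, label_edit(edit_votes))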
Amer-Yahia, Sihem; Markl, Volker; Halevy, Alon; Doan, AnHai; Alonso, Gustavo; Kossmann, Donald & Weikum, Gerhard Databases and Web 2.0 panel at VLDB 2007 2008 [1,045]
Web 2.0 refers to a set of technologies that enables individuals to create and share content on the Web. The types of content that are shared on Web 2.0 are quite varied and include photos and videos (e.g., Flickr, YouTube), encyclopedic knowledge (e.g., Wikipedia), the blogosphere, social book-marking and even structured data (e.g., Swivel, Many-eyes). One of the important distinguishing features of Web 2.0 is the creation of communities of users. Online communities such as LinkedIn, Friendster, Facebook, MySpace and Orkut attract millions of users who build networks of their contacts and utilize them for social and professional purposes. In a nutshell, Web 2.0 offers an architecture of participation and democracy that encourages users to add value to the application as they use it.
Nastase, Vivi & Strube, Michael Decoding Wikipedia categories for knowledge acquisition 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states 2008
This paper presents an approach to acquiring knowledge from Wikipedia categories and the category network. Many Wikipedia categories have complex names which reflect how humans classify and organize instances, and thus encode knowledge about class attributes, taxonomic and other semantic relations. We decode the names and refer back to the network to induce relations between concepts in Wikipedia represented through pages or categories. The category structure allows us to propagate a relation detected between constituents of a category name to numerous concept links. The results of the process are evaluated against ResearchCyc and a subset also by human judges. The results support the idea that Wikipedia category names are a rich source of useful and accurate knowledge.
Grishchenko, Victor Deep hypertext with embedded revision control implemented in regular expressions 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [1,046]
While text versioning was definitely a part of the original hypertext concept [21, 36, 44], it is rarely considered in this context today. Still, we know that revision control underlies the most exciting social co-authoring projects of today's Internet, namely Wikipedia and the Linux kernel. With the intention to adapt advanced revision control technologies and practices to the conditions of the Web, the paper reconsiders some obsolete assumptions and develops a new versioned text format that is fully processable with standard regular expressions (PCRE [6]). The resulting deep hypertext model allows instant access to past/concurrent versions, authorship and changes, and enables deep links that reference changing parts of a changing text. Effectively, it allows distributed and real-time revision control on the Web, implementing the vision of co-evolution and mutation exchange among multiple competing versions of the same text.
Ponzetto, Simone Paolo & Strube, Michael Deriving a large scale taxonomy from Wikipedia AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada 2007
We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexico-syntactic matching. As a result we are able to derive a large scale taxonomy containing a large number of subsumption, i.e. is-a, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as by computing semantic similarity between words in benchmarking datasets.
Arazy, Ofer & Nov, Oded Determinants of wikipedia quality: The roles of global and local contribution inequality 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United states 2010 [1,047]
The success of Wikipedia and the relatively high quality of its articles seem to contradict conventional wisdom. Recent studies have begun shedding light on the processes contributing to Wikipedia's success, highlighting the role of coordination and contribution inequality. In this study, we expand on these works in two ways. First, we make a distinction between global (Wikipedia-wide) and local (article-specific) inequality and investigate both constructs. Second, we explore both direct and indirect effects of these inequalities, exposing the intricate relationships between global inequality, local inequality, coordination, and article quality. We tested our hypotheses on a sample of Wikipedia articles using structural equation modeling and found that global inequality exerts a significant positive impact on article quality, while the effect of local inequality is indirect and is mediated by coordination.
Wilson, Shomir Distinguishing use and mention in natural language Proceedings of the NAACL HLT 2010 Student Research Workshop 2010 [1,048]
When humans communicate via natural language, they frequently make use of metalanguage to clarify what they mean and promote a felicitous exchange of ideas. One key aspect of metalanguage is the mention of words and phrases, as distinguished from their use. This paper presents ongoing work on identifying and categorizing instances of language-mention, with the goal of building a system capable of automatic recognition of the phenomenon. A definition of language-mention and a corpus of instances gathered from Wikipedia are discussed, and the future direction of the project is described.
Rafiei, Davood; Bharat, Krishna & Shukla, Anand Diversifying web search results 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [1,049]
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However, the underlying questions of how 'diversity' interplays with 'quality' and when preference should be given to one or both are not well-understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is correlations between pages, which we estimate using textual contents of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from the disambiguation topics of Wikipedia. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in the top 5, while maintaining a precision very close to Google. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.
Reagle, Joseph M. Jr. Do as I do: Authorial leadership in Wikipedia 2007 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA - 2007 International Symposium on Wikis, WikiSym, October 21, 2007 - October 25, 2007 Montreal, QC, Canada 2007 [1,050]
In seemingly egalitarian collaborative on-line communities, like Wikipedia, there is often a paradoxical, or perhaps merely playful, use of the title "Benevolent Dictator" for leaders. I explore discourse around the use of this title so as to address how leadership works in open content communities. I first review existing literature on "emergent leadership" and then relate excerpts from community discourse on how leadership is understood, performed, and discussed by Wikipedians. I conclude by integrating concepts from existing literature and my own findings into a theory of "authorial" leadership.
Stein, Klaus & Hess, Claudia Does it matter who contributes - A study on featured articles in the german wikipedia Hypertext 2007: 18th ACM Conference on Hypertext and Hypermedia, HT'07, September 10, 2007 - September 12, 2007 Manchester, United kingdom 2007 [1,051]
The considerably high quality of Wikipedia articles is often credited to the large number of users who contribute to Wikipedia's encyclopedia articles, watch articles, and correct errors immediately. In this paper, we are in particular interested in a certain type of Wikipedia article, namely featured articles - articles marked by a community vote as being of outstanding quality. The German Wikipedia has the nice property that it has two types of featured articles: excellent and worth reading. We explore on the German Wikipedia whether only the mere number of contributors makes the difference or whether the high quality of featured articles results from having experienced authors with a reputation for high quality contributions. Our results indicate that it does matter who contributes.
U, Leong Hou; Mamoulis, Nikos; Berberich, Klaus & Bedathur, Srikanta Durable top-k search in document archives 2010 International Conference on Management of Data, SIGMOD '10, June 6, 2010 - June 11, 2010 Indianapolis, IN, United states 2010 [1,052]
We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
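The durable top-k semantics itself is easy to state as a naive baseline: compute the top-k at every time point of the interval and intersect the result sets. The sketch below does exactly that with hypothetical per-year scores; the paper's NRA-based and shared-execution techniques exist precisely to avoid this per-version recomputation:

    # scores[t][doc] = relevance of doc for the query at time t (hypothetical)
    scores = {
        2008: {"A": 0.9, "B": 0.8, "C": 0.4},
        2009: {"A": 0.7, "B": 0.9, "C": 0.8},
        2010: {"A": 0.6, "B": 0.9, "C": 0.5},
    }

    def durable_top_k(scores, interval, k):
        durable = None
        for t in interval:
            top_k = {doc for doc, _ in sorted(scores[t].items(),
                                              key=lambda kv: kv[1],
                                              reverse=True)[:k]}
            durable = top_k if durable is None else durable & top_k
        return durable

    print(durable_top_k(scores, [2008, 2009, 2010], k=2))  # intersection of the yearly top-2 sets -> {'B'}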
Sinclair, Patrick A. S.; Martinez, Kirk & Lewis, Paul H. Dynamic link service 2.0: Using wikipedia as a linkbase Hypertext 2007: 18th ACM Conference on Hypertext and Hypermedia, HT'07, September 10, 2007 - September 12, 2007 Manchester, United kingdom 2007 [1,053]
This paper describes how a Web 2.0 mashup approach, reusing technologies and services freely available on the web, has enabled the development of a dynamic link service system that uses Wikipedia as its linkbase.
Nakatani, Makoto; Jatowt, Adam & Tanaka, Katsumi Easiest-first search: Towards comprehension-based web search ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,054]
Although Web search engines have become information gateways to the Internet, for queries containing technical terms, search results often contain pages that are difficult for non-expert users to understand. Therefore, re-ranking search results in descending order of their comprehensibility should be effective for non-expert users. In our approach, the comprehensibility of Web pages is estimated considering both document readability and the difficulty of technical terms in the domain of the search query. To extract technical terms, we exploit domain knowledge extracted from Wikipedia. Our proposed method can be applied to general Web search engines, as Wikipedia covers nearly every field of human knowledge. We demonstrate the usefulness of our approach through user experiments.
Grineva, Maria; Grinev, Maxim & Lizorkin, Dmitry Effective extraction of thematically grouped key terms from text Social Semantic Web: Where Web 2.0 Meets Web 3.0 - Papers from the AAAI Spring Symposium, March 23, 2009 - March 25, 2009 Stanford, CA, United states 2009
We present a novel method for extraction of key terms from text documents. The important and novel feature of our method is that it produces groups of key terms, where each group contains key terms semantically related to one of the main themes of the document. Our method is based on a combination of the following two techniques: a Wikipedia-based semantic relatedness measure of terms and an algorithm for detecting the community structure of a network. One of the advantages of our method is that it does not require any training, as it works upon the Wikipedia knowledge base. Our experimental evaluation using human judgments shows that our method produces key terms with high precision and recall.
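A rough sketch of this kind of pipeline, under stated assumptions, might look as follows: build a term graph whose edges are weighted by a semantic relatedness measure and partition it with a community detection algorithm. The pairwise relatedness table is a hypothetical stub standing in for a Wikipedia-based measure, and greedy modularity optimization (via networkx) stands in for whichever community detection method the paper actually uses:

    import itertools
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    terms = ["python", "java", "compiler", "guitar", "piano", "melody"]

    # Hypothetical pairwise relatedness scores in [0, 1].
    relatedness = {
        frozenset(p): s for p, s in [
            (("python", "java"), 0.8), (("python", "compiler"), 0.6),
            (("java", "compiler"), 0.7), (("guitar", "piano"), 0.8),
            (("guitar", "melody"), 0.6), (("piano", "melody"), 0.7),
        ]
    }

    def related(a, b):
        return relatedness.get(frozenset((a, b)), 0.0)

    graph = nx.Graph()
    graph.add_nodes_from(terms)
    for a, b in itertools.combinations(terms, 2):
        w = related(a, b)
        if w >= 0.5:                      # keep only sufficiently related term pairs
            graph.add_edge(a, b, weight=w)

    groups = greedy_modularity_communities(graph, weight="weight")
    print([sorted(g) for g in groups])    # two thematic groups: programming vs. music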
Keegan, Brian & Gergle, Darren Egalitarians at the gate: One-sided gatekeeping practices in social media 2010 ACM Conference on Computer Supported Cooperative Work, CSCW 2010, February 6, 2010 - February 10, 2010 Savannah, GA, United states 2010 [1,055]
Although Wikipedia has increasingly attracted attention for its in-depth and timely coverage of breaking news stories, the social dynamics of how Wikipedia editors process breaking news items have not been systematically examined. Through a 3-month study of 161 deliberations over whether a news item should appear on Wikipedia's front page, we demonstrate that elite users fulfill a unique gatekeeping role that permits them to leverage their community position to block the promotion of inappropriate items. However, these elite users are unable to promote their supported news items more effectively than other types of editors. These findings suggest that "one-sided gatekeeping" may reflect a crucial stasis in social media where the community has to balance the experience of its elite users while encouraging contributions from non-elite users.
Carmel, David; Roitman, Haggai & Zwerdling, Naama Enhancing cluster labeling using wikipedia 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, July 19, 2009 - July 23, 2009 Boston, MA, United states 2009 [1,056]
This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges and the top evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with manual labels associated by humans to a cluster much more than with significant terms that are extracted directly from the text. We show that in most cases, even when a human-associated label appears in the text, pure statistical methods have difficulty in identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection the manual label (or an inflection ...
Hu, Jian; Fang, Lujun; Cao, Yang; Zeng, Hua-Jun; Li, Hua; Yang, Qiang & Chen, Zheng Enhancing text clustering by leveraging wikipedia semantics 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008, July 20, 2008 - July 24, 2008 Singapore, Singapore 2008 [1,057]
Most traditional text clustering methods are based on a "bag of words" (BOW) representation built from frequency statistics in a set of documents. BOW, however, ignores important information on the semantic relationships between key terms. To overcome this problem, several methods have been proposed in the past to enrich text representation with external resources, such as WordNet. However ...
Sorg, Philipp & Cimiano, Philipp Enriching the crosslingual link structure of wikipedia - A classification-based approach 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
The crosslingual link structure of Wikipedia represents a valuable resource which can be exploited for crosslingual natural language processing applications. However, this requires that it has reasonable coverage and is furthermore accurate. For the specific language pair German/English that we consider in our experiments, we show that roughly 50% of the articles are linked from German to English and only 14% from English to German. These figures clearly corroborate the need for an approach to automatically induce new cross-language links, especially in light of such a dynamically growing resource as Wikipedia. In this paper we present a classification-based approach with the goal of inferring new cross-language links. Our experiments show that this approach has a recall of 70% with a precision of 94% for the task of learning cross-language links on a test dataset.
Pennacchiotti, Marco & Pantel, Patrick Entity extraction via ensemble semantics Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1 2009 [1,058]
Combining information extraction systems yields significantly higher quality resources than each system in isolation. In this paper, we generalize such a mixing of sources and features in a framework called Ensemble Semantics. We show very large gains in entity extraction by combining state-of-the-art distributional and pattern-based systems with a large set of features from a web crawl, query logs, and Wikipedia. Experimental results on a web-scale extraction of actors, athletes and musicians show significantly higher mean average precision scores (a 29% gain) compared with the current state of the art.
Xu, Yang; Ding, Fan & Wang, Bin Entity-based query reformulation using Wikipedia 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states 2008 [1,059]
Many real-world applications increasingly involve both structured data and text, and entity-based retrieval is an important problem in this realm. In this paper, we present an automatic query reformulation approach based on entities detected in each query. The aim is to utilize semantics associated with entities to enhance document retrieval. This is done by expanding a query with terms/phrases related to entities in the query. We exploit Wikipedia as a large repository of entity information. Our reformulation approach consists of three major steps: (1) detect the representative entity in a query; (2) expand the query with entity-related terms/phrases; and (3) facilitate term dependency features. We evaluate our approach on the ad-hoc retrieval task on four TREC collections, including two large web collections. Experimental results show that significant improvement is possible by utilizing information corresponding to entities.
Bast, Holger; Chitea, Alexandru; Suchanek, Fabian & Weber, Ingmar ESTER: Efficient search on text, entities, and relations 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands 2007 [1,060]
We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB.
Moturu, Sai T. & Liu, Huan Evaluating the trustworthiness of Wikipedia articles through quality and credibility 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,061]
Wikipedia has become a very popular destination for Web surfers seeking knowledge about a wide variety of subjects. While it contains many helpful articles with accurate information, it also contains unreliable articles with inaccurate or incomplete information. A casual observer might not be able to differentiate between the good and the bad. In this work, we identify the necessity and challenges of trust assessment in Wikipedia, and propose a framework that can help address these challenges by identifying relevant features and providing empirical means to meet the requirements for such an evaluation. We select relevant variables and perform experiments to evaluate our approach. The results demonstrate promising performance that is better than comparable approaches and could possibly be replicated with other social media applications.
Cimiano, Philipp; Schultz, Antje; Sizov, Sergej; Sorg, Philipp & Staab, Steffen Explicit versus latent concept models for cross-language information retrieval Proceedings of the 21st international jont conference on Artifical intelligence 2009 [1,062]
The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-words based models. Many approaches aim at concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such as WordNet, to latent topics derived from the data itself - as in Latent Semantic Indexing (LSI) or Latent Dirichlet Allocation (LDA) - to Wikipedia articles as proxies for concepts, as in the recently proposed Explicit Semantic Analysis (ESA) model. A crucial question which has not been answered so far is whether models based on explicitly given concepts (as in the ESA model, for instance) perform inherently better than retrieval models based on "latent" concepts (as in LSI and/or LDA). In this paper we investigate this question more closely in the context of a cross-language setting, which inherently requires concept-based retrieval bridging between different languages. In particular we compare the recently proposed ESA model with two latent models (LSI and LDA), showing that the former is clearly superior to both. From a general perspective, our results contribute to clarifying the role of explicit vs. implicitly derived or latent concepts in (cross-language) information retrieval research.
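The explicit-concept representation at the heart of ESA can be sketched compactly: a text becomes a weighted vector over Wikipedia articles, and relatedness is the cosine of two such vectors (the cross-language variant additionally maps concepts across language editions, which is omitted here). The three one-line "articles" and the simple term-overlap weighting below are hypothetical simplifications of the real TF-IDF-style weighting:

    import math
    from collections import Counter

    # Hypothetical one-line stand-ins for real Wikipedia articles ("explicit concepts").
    concepts = {
        "Dog":    "dog domestic animal pet bark breed",
        "Cat":    "cat domestic animal pet feline",
        "Guitar": "guitar musical instrument string music",
    }

    def esa_vector(text):
        """Weight of each concept = term-overlap score between text and article."""
        words = Counter(text.lower().split())
        vec = {}
        for concept, article in concepts.items():
            article_words = Counter(article.split())
            vec[concept] = sum(words[w] * article_words[w] for w in words)
        return vec

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    print(cosine(esa_vector("my pet dog"), esa_vector("a feline pet")))      # high
    print(cosine(esa_vector("my pet dog"), esa_vector("electric guitar")))   # low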
Sánchez, Liliana Mamani; Li, Baoli & Vogel, Carl Exploiting CCG structures with tree kernels for speculation detection Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [1,063]
Our CoNLL-2010 speculative sentence detector disambiguates putative keywords based on the following considerations: a speculative keyword may be composed of one or more word tokens; a speculative sentence may have one or more speculative keywords; and if a sentence contains at least one real speculative keyword, it is deemed speculative. A tree kernel classifier is used to assess whether a potential speculative keyword conveys speculation. We exploit information implicit in tree structures. For prediction efficiency, only a segment of the whole tree around a speculation keyword is considered, along with morphological features inside the segment and information about the containing document. A maximum entropy classifier is used for sentences not covered by the tree kernel classifier. Experiments on the Wikipedia data set show that our system achieves 0.55 F-measure (in-domain).
Billerbeck, Bodo; Demartini, Gianluca; Firan, Claudiu S.; Iofciu, Tereza & Krestel, Ralf Exploiting click-through data for entity retrieval 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [1,064]
We present an approach for answering Entity Retrieval queries using click-through information in query log data from a commercial Web search engine. We compare results using click graphs and session graphs and present an evaluation test set making use of Wikipedia "List of" pages.
Hu, Xia; Sun, Nan; Zhang, Chao & Chua, Tat-Seng Exploiting internal and external semantics for the clustering of short texts using world knowledge ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,065]
Clustering of short texts, such as snippets, presents great challenges in existing aggregated search techniques due to the problem of data sparseness and the complex semantics of natural language. As short texts do not provide sufficient term-occurrence information, traditional text representation methods, such as the bag-of-words model, have several limitations when directly applied to short text tasks. In this paper we propose a novel framework to improve the performance of short text clustering by exploiting the internal semantics of the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparsity problem of original short texts and reconstructs the corresponding feature space with the integration of multiple semantic knowledge bases - Wikipedia and WordNet. Empirical evaluation with Reuters and a real web dataset demonstrates that our approach is able to achieve significant improvement as compared to state-of-the-art methods.
Pehcevski, Jovan; Vercoustre, Anne-Marie & Thom, James A. Exploiting locality of wikipedia links in entity ranking 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom 2008 [1,066]
Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking entities from the Wikipedia XML document collection. Our approach utilises the known categories and the link structure of Wikipedia, and more importantly, exploits link co-occurrences to improve the effectiveness of entity ranking. Using the broad context of a full Wikipedia page as a baseline, we evaluate two different algorithms for identifying narrow contexts around the entity examples: one that uses predefined types of elements such as paragraphs, lists and tables; and another that dynamically identifies the contexts by utilising the underlying XML document structure. Our experiments demonstrate that the locality of Wikipedia links can be exploited to significantly improve the effectiveness of entity ranking.
Ponzetto, Simone Paolo & Strube, Michael Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics 2006 [1,067]
In this paper we present an extension of a machine learning based coreference resolution system which uses features induced from different semantic knowledge sources. These features represent knowledge mined from WordNet and Wikipedia, as well as information about semantic role labels. We show that semantic features indeed improve the performance on different referring expression types such as pronouns and common nouns.
Milne, David N. Exploiting web 2.0 for all knowledge-based information retrieval Proceedings of the ACM first Ph.D. workshop in CIKM 2007 [1,068]
This paper describes ongoing research into obtaining and using knowledge bases to assist information retrieval. These structures are prohibitively expensive to obtain manually, yet automatic approaches have been researched for decades with limited success. This research investigates a potential shortcut: a way to provide knowledge bases automatically, without expecting computers to replace expert human indexers. Instead we aim to replace the professionals with thousands or even millions of amateurs: with the growing community of contributors who form the core of Web 2.0. Specifically we focus on Wikipedia, which represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide manually-defined yet inexpensive knowledge-bases that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We are also concerned with how best to make these structures available to users, and aim to produce a complete knowledge-based retrieval system - both the knowledge base and the tools to apply it - that can be evaluated by how well it assists real users in performing realistic and practical information retrieval tasks. To this end we have developed Koru, a new search engine that offers concrete evidence of the effectiveness of our Web 2.0 based techniques for assisting information retrieval.
Hu, Xiaohua; Zhang, Xiaodan; Lu, Caimei; Park, E.K. & Zhou, Xiaohua Exploiting wikipedia as external knowledge for document clustering 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, June 28, 2009 - July 1, 2009 Paris, France 2009 [1,069]
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared core words, although the core words they use are probably synonyms or semantically associated in other forms. The most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or MeSH ...
Winter, Judith Exploiting XML structure to improve information retrieval in peer-to-peer systems Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval 2008 [1,070]
With the advent of XML as a standard for representation and exchange of structured documents, a growing number of XML documents are being stored in Peer-to-Peer (P2P) networks. Current research on P2P search engines proposes the use of Information Retrieval (IR) techniques to perform content-based search, but does not take into account structural features of documents. P2P systems typically have no central index, thus avoiding single points of failure, but distribute all information among participating peers. Accordingly, a querying peer has only limited access to the index information and should select carefully which peers can help answering a given query by contributing resources such as local index information or CPU time for ranking computations. Bandwidth consumption is a major issue. To guarantee scalability, P2P systems have to reduce the number of peers involved in the retrieval process. As a result, the retrieval quality in terms of recall and precision may suffer substantially. In the proposed thesis, document structure is considered as an extra source of information to improve the retrieval quality of XML documents in a P2P environment. The thesis centres on the following questions: how can structural information help to improve the retrieval of XML documents in terms of result quality such as precision, recall, and specificity? Can XML structure support the routing of queries in distributed environments, especially the selection of promising peers? How can XML IR techniques be used in a P2P network while minimizing bandwidth consumption and considering performance aspects? To answer these questions and to analyze possible achievements, a search engine is proposed that exploits structural hints expressed explicitly by the user or implicitly by the self-describing structure of XML documents. Additionally, more focused and specific results are obtained by providing ranked retrieval units that can be either XML documents as a whole or the most relevant passages of these documents. XML information retrieval techniques are applied in two ways: to select the peers participating in the retrieval process, and to compute the relevance of documents. The indexing approach includes both content and structural information of documents. To support efficient execution of multi-term queries, index keys consist of rare combinations of (content, structure) tuples. Performance is increased by using only fixed-sized posting lists: frequent index keys are combined with each other iteratively until the new combination is rare, with a posting list size under a pre-set threshold. All posting lists are sorted by taking into account classical IR measures such as term frequency and inverted term frequency as well as weights for potential retrieval units of a document, with a slight bias towards documents on peers with good collections regarding the current index key and with good peer characteristics such as online times, available bandwidth, and latency. When extracting the posting list for a specific query, a re-ordering of the posting list is performed that takes into account the structural similarity between key and query. According to this pre-ranking, peers are selected that are expected to hold information about potentially relevant documents and retrieval units. The final ranking is computed in parallel on those selected peers. The computation is based on an extension of the vector space model and distinguishes between weights for different structures of the same content.
This allows weighting XML elements with respect to their discriminative power, e.g. a title will be weighted much higher than a footnote. Additionally, relevance is computed as a mixture of content relevance and structural similarity between a given query and a potential retrieval unit. Currently, a first prototype for P2P Information Retrieval of XML documents called SPIRIX is being implemented. Experiments to evaluate the proposed techniques and the use of structural hints will be performed on a distributed version of the INEX Wikipedia Collection.
Grineva, Maria; Grinev, Maxim & Lizorkin, Dmitry Extracting key terms from noisy and multitheme documents Proceedings of the 18th international conference on World wide web 2009 [1,071]
We present a novel method for key term extraction from text documents. In our method, a document is modeled as a graph of semantic relationships between the terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs or communities, while non-important terms fall into weakly interconnected communities, or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms. We introduce a criterion function to select groups that contain key terms, discarding groups with unimportant terms. To weight terms and determine semantic relatedness between them we exploit information extracted from Wikipedia. Using such an approach gives us the following two advantages. First, it allows effective processing of multi-theme documents. Second, it is good at filtering out noisy information in the document, such as, for example, navigational bars or headers in web pages. Evaluations of the method show that it outperforms existing methods, producing key terms with higher precision and recall. Additional experiments on web pages prove that our method is substantially more effective on noisy and multi-theme documents than existing methods.
Shnarch, Eyal; Barak, Libby & Dagan, Ido Extracting lexical reference rules from Wikipedia Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1 2009 [1,072]
This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms. We present extraction methods geared to cover the broad range of the lexical reference relation and analyze them extensively. Most extraction methods yield high precision levels, and our rule base is shown to perform better than other automatically constructed baselines in a couple of lexical expansion and matching tasks. Our rule base yields comparable performance to WordNet while providing largely complementary information.
Davidov, Dmitry & Rappoport, Ari Extraction and approximation of numerical attributes from the Web Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,073]
We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation-defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, along with information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
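The approximation step, as described, can be illustrated with a toy sketch: when no exact value is found for the target object, aggregate the attribute values retrieved for similar objects into a range and a point estimate. The retrieved values and the median/range aggregation below are illustrative assumptions, not the paper's exact selection procedure:

    import statistics

    # Heights (in cm) retrieved for objects judged similar to the target object (hypothetical).
    retrieved = {"object A": 170, "object B": 182, "object C": 175, "object D": 178}

    values = sorted(retrieved.values())
    estimate = statistics.median(values)          # point approximation
    value_range = (values[0], values[-1])         # plausible value range

    print(f"approximate value: {estimate}, range: {value_range}")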
Pasca, Marius Extraction of open-domain class attributes from text: building blocks for faceted search Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010 [1,074]
Knowledge automatically extracted from text captures instances, classes of instances and relations among them. In particular, the acquisition of class attributes (e.g., "top speed", "body style" and "number of cylinders" for the class of "sports cars") from text is a particularly appealing task and has received much attention recently, given its natural fit as a building block towards the far-reaching goal of constructing knowledge bases from text. This tutorial provides an overview of extraction methods developed in the area of Web-based information extraction with the purpose of acquiring attributes of open-domain classes. The attributes are extracted for classes organized either as a flat set or hierarchically. The extraction methods operate over unstructured or semi-structured text available within collections of Web documents, or over relatively more intriguing data sources consisting of anonymized search queries. The methods take advantage of weak supervision provided in the form of seed examples or small amounts of annotated data.
Li, Chengkai; Yan, Ning; Roy, Senjuti B.; Lisham, Lekhendro & Das, Gautam Facetedpedia: Dynamic generation of query-dependent faceted interfaces for Wikipedia 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [1,075]
This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration in Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with other faceted retrieval systems, Facetedpedia is fully automatic and dynamic in both facet generation and hierarchy construction, and the facets are based on the rich semantic information from Wikipedia. The essence of our approach is to build upon the collaborative vocabulary in Wikipedia, more specifically the intensive internal structures (hyperlinks) and folksonomy (category system). Given the sheer size and complexity of this corpus, the space of possible choices of faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by user's navigational cost, and metrics for ranking interfaces (each with k facets) by both their average pairwise similarities and average navigational costs. We thus develop faceted interface discovery algorithms that optimize the ranking metrics. Our experimental evaluation and user study verify the effectiveness of the system.
Kummerfeld, Jonathan K.; Roesner, Jessika; Dawborn, Tim; Haggerty, James; Curran, James R. & Clark, Stephen Faster parsing by supertagger adaptation Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,076]
We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.
Kuhlman, C.J.; Kumar, V.S.A.; Marathe, M.V.; Ravi, S.S. & Rosenkrantz, D.J. Finding critical nodes for inhibiting diffusion of complex contagions in social networks Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2010, 20-24 Sept. 2010 Berlin, Germany 2010 [1,077]
We study the problem of inhibiting diffusion of complex contagions such as rumors, undesirable fads and mob behavior in social networks by removing a small number of nodes (called critical nodes) from the network. We show that, in general, for any ρ ≥ 1, even obtaining a ρ-approximate solution to these problems is NP-hard. We develop efficient heuristics for these problems and carry out an empirical study of their performance on three well known social networks, namely Epinions, Wikipedia and Slashdot. Our results show that the heuristics perform well on the three social networks.
Ganter, Viola & Strube, Michael Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features Proceedings of the ACL-IJCNLP 2009 Conference Short Papers 2009 [1,078]
We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.
Ollivier, Yann & Senellart, Pierre Finding related pages using green measures: an illustration with Wikipedia AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada 2007
We introduce a new method for finding nodes semantically related to a given node in a hyperlinked graph: the Green method, based on a classical Markov chain tool. It is generic, adjustment-free and easy to implement. We test it in the case of the hyperlink structure of the English version of Wikipedia, the on-line encyclopedia. We present an extensive comparative study of the performance of our method versus several other classical methods in the case of Wikipedia. The Green method is found to have both the best average results and the best robustness.
Giuliano, Claudio Fine-grained classification of named entities exploiting latent semantic kernels Proceedings of the Thirteenth Conference on Computational Natural Language Learning 2009 [1,079]
We present a kernel-based approach for fine-grained classification of named entities. The only training data for our algorithm is a few manually annotated entities for each class. We defined kernel functions that implicitly map entities, represented by aggregating all contexts in which they occur, into a latent semantic space derived from Wikipedia. Our method achieves a significant improvement over the state of the art for the task of populating an ontology of people, while requiring considerably fewer training instances than previous approaches.
Zimmer, Christian; Bedathur, Srikanta & Weikum, Gerhard Flood little, cache more: effective result-reuse in P2P IR systems Proceedings of the 13th international conference on Database systems for advanced applications 2008 [1,080]
State-of-the-art Peer-to-Peer Information Retrieval (P2P IR) systems suffer from their lack of a response time guarantee, especially at scale. To address this issue, a number of techniques for caching of multi-term inverted list intersections and query results have been proposed recently. Although these enable speedy query evaluations with low network overheads, they fail to consider the potential impact of caching on result quality improvements. In this paper, we propose the use of a cache-aware query routing scheme that not only reduces the response delays for a query, but also presents an opportunity to improve the result quality while keeping the network usage low. In this regard, we make threefold contributions in this paper. First of all, we develop a cache-aware, multi-round query routing strategy that balances query efficiency and result quality. Next, we propose to aggressively reuse the cached results of even subsets of a query towards an approximate caching technique that can drastically reduce the bandwidth overheads, and study the conditions under which such a scheme can retain good result quality. Finally, we empirically evaluate these techniques over a fully functional P2P IR system, using a large-scale Wikipedia benchmark, and using both synthetic and real-world query workloads. Our results show that our proposal to combine result caching with multi-round, cache-aware query routing can reduce network traffic by more than half while doubling the result quality.
Lee, Kangpyo; Kim, Hyunwoo; Jang, Chungsu & Kim, Hyoung-Joo FolksoViz: A subsumption-based folksonomy visualization using wikipedia texts 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [1,081]
In this paper, targeting del.icio.us tag data, we propose a method, FolksoViz, for deriving subsumption relationships between tags by using Wikipedia texts, and for visualizing a folksonomy. To fulfill this method, we propose a statistical model for deriving subsumption relationships based on the frequency of each tag in the Wikipedia texts, as well as a TSD (Tag Sense Disambiguation) method for mapping each tag to a corresponding Wikipedia text. The derived subsumption pairs are visualized effectively on the screen. The experiment shows that FolksoViz manages to find the correct subsumption pairs with high accuracy.
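One plausible reading of such a frequency-based statistical model is the classic co-occurrence subsumption heuristic: tag x subsumes tag y when most texts mentioning y also mention x, but not vice versa. The document sets and the 0.8 threshold in the sketch below are hypothetical illustrations rather than FolksoViz's actual model:

    docs_with = {                      # tag -> ids of Wikipedia texts mentioning it (hypothetical)
        "programming": {1, 2, 3, 4, 5, 6},
        "python":      {1, 2, 3},
        "music":       {7, 8, 9},
    }

    def p(x_given, y):
        """Estimate P(x | y) from document co-occurrence counts."""
        return len(docs_with[x_given] & docs_with[y]) / len(docs_with[y])

    def subsumes(x, y, threshold=0.8):
        return p(x, y) >= threshold and p(y, x) < threshold

    pairs = [(x, y) for x in docs_with for y in docs_with
             if x != y and subsumes(x, y)]
    print(pairs)    # [('programming', 'python')]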
Pentzold, Christian & Seidenglanz, Sebastian Foucault@Wiki: First steps towards a conceptual framework for the analysis of Wiki discourses WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark 2006 [1,082]
In this paper, we examine the discursive situation of Wikipedia. The primary goal is to explore principal ways of analyzing and characterizing the various forms of communicative user interaction using Foucault's discourse theory. First, the communicative situation of Wikipedia is addressed and a list of possible forms of communication is compiled. Second, the current research on the linguistic features of Wikis, especially Wikipedia, is reviewed. Third, some key issues of Foucault's theory are explored: the notion of 'discourse', the discursive formation, and the methods of archaeology and genealogy, respectively. Finally, first steps towards a qualitative discourse analysis of the English Wikipedia are elaborated. The paper argues that Wikipedia can be understood as a discursive formation that regulates and structures the production of statements. Most of the discursive regularities named by Foucault are established in the collaborative writing processes of Wikipedia, too. Moreover, the editing processes can be described in Foucault's terms as discursive knowledge production.
Bollacker, Kurt; Cook, Robert & Tufts, Patrick Freebase: A shared database of structured general human knowledge AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada 2007
Freebase is a practical, scalable, graph-shaped database of structured general human knowledge, inspired by Semantic Web research and collaborative data communities such as Wikipedia. Freebase allows public read and write access through an HTTP-based graph-query API for research, the creation and maintenance of structured data, and application building. Access is free and all data in Freebase has a very open (e.g. Creative Commons, GFDL) license.
Weikum, Gerhard & Theobald, Martin From information to knowledge: Harvesting entities and relationships from web sources 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6, 2010 - June 11, 2010 Indianapolis, IN, United states 2010 [1,083]
There are major trends to advance the functionality of search engines to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project (among others). The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.
Bu, Fan; Zhu, Xingwei; Hao, Yu & Zhu, Xiaoyan Function-based question classification for general QA Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing 2010 [1,084]
In contrast with the booming increase of internet data, state-of-the-art QA (question answering) systems have so far been concerned with data from specific domains or resources, such as search engine snippets, online forums and Wikipedia, in a somewhat isolated way. Users may welcome a more general QA system for its capability to answer questions from various sources, integrated from existing specialized sub-QA engines. In this framework, question classification is the primary task. However, current paradigms of question classification focus on specific types of questions, i.e. factoid questions, which are inappropriate for general QA. In this paper, we propose a new question classification paradigm, which includes a question taxonomy suitable for general QA and a question classifier based on an MLN (Markov logic network), where rule-based methods and statistical methods are unified into a single framework in a fuzzy discriminative learning approach. Experiments show that our method outperforms traditional question classification approaches.
Hecht, Brent; Starosielski, Nicole & Dara-Abrams, Drew Generating educational tourism narratives from wikipedia 2007 AAAI Fall Symposium, November 9, 2007 - November 11, 2007 Arlington, VA, United states 2007
We present a narrative theory-based approach to data mining that generates cohesive stories from a Wikipedia corpus. This approach is based on a data mining-friendly view of narrative derived from narratology, and uses a prototype mining algorithm that implements this view. Our initial test case and focus is that of field-based educational tour narrative generation, for which we have successfully implemented a proof-of-concept system called Minotour. This system operates on a client-server model, in which the server mines a Wikipedia database dump to generate narratives between any two spatial features that have associated Wikipedia articles. The server then delivers those narratives to mobile device clients.
Aker, Ahmet & Gaizauskas, Robert Generating image descriptions using dependency relational patterns Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,085]
This paper presents a novel approach to automatic captioning of geo-tagged images by summarizing multiple web-documents that contain information related to an image's location. The summarizer is biased by dependency pattern models towards sentences which contain features typically provided for different scene types such as those of churches, bridges, etc. Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries. Summaries generated using dependency patterns also lead to more readable summaries than those generated without dependency patterns.
Li, Peng; Jiang, Jing & Wang, Yinglin Generating templates of entity summaries with an entity-aspect model and pattern mining Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,086]
In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.
Overell, Simon E & Ruger, Stefan Geographic co-occurrence as a tool for GIR 4th ACM Workshop on Geographical Information Retrieval, GIR '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007 [1,087]
In this paper we describe the development of a geographic co-occurrence model and how it can be applied to geographic information retrieval. The model consists of mining co-occurrences of placenames from Wikipedia, and then mapping these placenames to locations in the Getty Thesaurus of Geographical Names. We begin by quantifying the accuracy of our model and compute theoretical bounds for the accuracy achievable when applied to placename disambiguation in free text. We conclude with a discussion of the improvement such a model could provide for placename disambiguation and geographic relevance ranking over traditional methods.
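To make the co-occurrence idea above concrete, here is a minimal Python sketch (not the authors' implementation): an ambiguous placename is resolved to the candidate location whose mined co-occurrence profile best matches the other placenames in the same text. The candidate locations and counts below are invented for illustration; a real model would be mined from Wikipedia and grounded in the Getty Thesaurus.

from collections import defaultdict

# Hypothetical co-occurrence model: counts of context placenames per candidate location.
cooccurrence = {
    "Cambridge, England": {"Oxford": 120, "London": 300, "Ely": 40},
    "Cambridge, Massachusetts": {"Boston": 450, "Somerville": 80, "London": 20},
}

def disambiguate(candidates, context_placenames):
    """Pick the candidate location with the highest total co-occurrence
    with the placenames found in the surrounding text."""
    best, best_score = None, -1.0
    for cand in candidates:
        counts = cooccurrence.get(cand, {})
        score = sum(counts.get(p, 0) for p in context_placenames)
        if score > best_score:
            best, best_score = cand, score
    return best

print(disambiguate(["Cambridge, England", "Cambridge, Massachusetts"],
                   ["Boston", "London"]))  # -> Cambridge, Massachusetts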
Song, Yi-Cheng; Zhang, Yong-Dong; Zhang, Xu; Cao, Juan & Li, Jing-Tao Google challenge: Incremental-learning for web video categorization on robust semantic feature space 17th ACM International Conference on Multimedia, MM'09, with Co-located Workshops and Symposiums, October 19, 2009 - October 24, 2009 Beijing, China 2009 [1,088]
With the advent of video sharing websites, the amount of videos on the internet grows rapidly. Web video categorization is an efficient methodology to organize the huge amount of data. In this paper, we propose an effective web video categorization algorithm for the large scale dataset. It includes two factors: 1) For the great diversity of web videos, we develop an effective semantic feature space called Concept Collection for Web Video Categorization (CCWV-CD) to represent web videos, which consists of concepts with small semantic gap and high distinguishing ability. Meanwhile, the online Wikipedia API is employed to diffuse the concept correlations in this space. 2) We propose an incremental support vector machine with a fixed number of support vectors (n-ISVM) to fit the large scale incremental learning problem in web video categorization. Extensive experiments conducted on a dataset of 80,024 of the most representative videos on YouTube demonstrate that the semantic space with Wikipedia propagation is more representative for web videos, and that N-ISVM outperforms other algorithms in efficiency when performing incremental learning.
Curino, Carlo A.; Moon, Hyun J. & Zaniolo, Carlo Graceful database schema evolution: the PRISM workbench Proceedings of the VLDB Endowment VLDB Endowment Hompage 2008 [1,089]
Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database. Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility, into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.
Hajishirzi, Hannaneh; Shirazi, Afsaneh; Choi, Jaesik & Amir, Eyal Greedy algorithms for sequential sensing decisions Proceedings of the 21st international jont conference on Artifical intelligence 2009 [1,090]
In many real-world situations we are charged with detecting change as soon as possible. Important examples include detecting medical conditions, detecting security breaches, and updating caches of distributed databases. In those situations, sensing can be expensive, but it is also important to detect change in a timely manner. In this paper we present tractable greedy algorithms and prove that they solve this decision problem either optimally or approximate the optimal solution in many cases. Our problem model is a POMDP that includes a cost for sensing, a cost for delayed detection, a reward for successful detection, and no-cost partial observations. Making optimal decisions is difficult in general. We show that our tractable greedy approach finds optimal policies for sensing both a single variable and multiple correlated variables. Further, we provide approximations for the optimal solution to multiple hidden or observed variables per step. Our algorithms outperform previous algorithms in experiments over simulated data and live Wikipedia WWW pages.
Kittur, Aniket & Kraut, Robert E. Harnessing the wisdom of crowds in wikipedia: Quality through coordination 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states 2008 [1,091]
Wikipedia's success is often attributed to the large numbers of contributors who improve the accuracy, completeness and clarity of articles while reducing bias. However, because of the coordination needed to write an article collaboratively, adding contributors is costly. We examined how the number of editors in Wikipedia and the coordination methods they use affect article quality. We distinguish between explicit coordination, in which editors plan the article through communication, and implicit coordination, in which a subset of editors structure the work by doing the majority of it. Adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not. Implicit coordination through concentrating the work was more helpful when many editors contributed, but explicit coordination through communication was not. Both types of coordination improved quality more when an article was in a formative stage. These results demonstrate the critical importance of coordination in effectively harnessing the "wisdom of the crowd" in online production environments.
Zheng, Yi; Dai, Qifeng; Luo, Qiming & Chen, Enhong Hedge classification with syntactic dependency features based on an ensemble classifier Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [1,092]
We present our CoNLL-2010 Shared Task system in this paper. The system operates in three steps: sequence labeling, syntactic dependency parsing, and classification. We participated in Shared Task 1. Our experimental results measured by the in-domain and cross-domain F-scores on the biological domain are 81.11% and 67.99%, and on the Wikipedia domain 55.48% and 55.41%.
Clausen, David HedgeHunter: a system for hedge detection and uncertainty classification Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [1,093]
With the dramatic growth of scientific publishing, Information Extraction (IE) systems are becoming an increasingly important tool for large scale data analysis. Hedge detection and uncertainty classification are important components of a high precision IE system. This paper describes a two part supervised system which classifies words as hedge or non-hedged and sentences as certain or uncertain in biomedical and Wikipedia data. In the first stage, our system trains a logistic regression classifier to detect hedges based on lexical and Part-of-Speech collocation features. In the second stage, we use the output of the hedge classifier to generate sentence level features based on the number of hedge cues, the identity of hedge cues, and a Bag-of-Words feature vector to train a logistic regression classifier for sentence level uncertainty. With the resulting classification, an IE system can then discard facts and relations extracted from these sentences or treat them as appropriately doubtful. We present results for in domain training and testing and cross domain training and testing based on a simple union of training sets.
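The two-stage architecture described above can be illustrated with a minimal scikit-learn sketch: a token-level logistic regression flags hedge cues, and aggregate features over those cues feed a sentence-level logistic regression. The feature sets, training snippets and labels below are toy placeholders, not the paper's data or features.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1: token-level hedge cue detector (toy lexical features).
token_feats = [{"word": w.lower()} for w in
               ["may", "indicate", "results", "suggest", "is", "shows"]]
token_labels = [1, 0, 0, 1, 0, 0]          # 1 = hedge cue

tok_vec = DictVectorizer()
tok_clf = LogisticRegression().fit(tok_vec.fit_transform(token_feats), token_labels)

def sentence_features(tokens):
    """Aggregate token-level hedge predictions into sentence-level features."""
    X = tok_vec.transform([{"word": t.lower()} for t in tokens])
    cues = tok_clf.predict(X)
    return {"n_cues": int(cues.sum()), "has_cue": int(cues.any())}

# Stage 2: sentence-level certainty classifier built on the stage-1 output.
sent_feats = [sentence_features(s.split()) for s in
              ["results may indicate a link", "the protein shows binding"]]
sent_labels = [1, 0]                         # 1 = uncertain sentence

sent_vec = DictVectorizer()
sent_clf = LogisticRegression().fit(sent_vec.fit_transform(sent_feats), sent_labels)
print(sent_clf.predict(sent_vec.transform([sentence_features("this may suggest".split())])))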
Kittur, Aniket; Pendleton, Bryan & Kraut, Robert E. Herding the cats: The influence of groups in coordinating peer production 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,094]
Peer production systems rely on users to self-select appropriate tasks and "scratch their personal itch". However, many such systems require significant maintenance work, which also implies the need for collective action, that is, individuals following goals set by the group and performing good citizenship behaviors. How can this paradox be resolved? Here we examine one potential answer: the influence of social identification with the larger group on contributors' behavior. We examine Wikipedia
Kiayias, Aggelos & Zhou, Hong-Sheng Hidden identity-based signatures Proceedings of the 11th International Conference on Financial cryptography and 1st International conference on Usable Security 2007 [1,095]
This paper introduces Hidden Identity-based Signatures (Hidden-IBS), a type of digital signatures that provide mediated signer-anonymity on top of Shamir's Identity-based signatures. The motivation of our new signature primitive is to resolve an important issue with the kind of anonymity offered by "group signatures", where it is required that either the group membership list is public or that the opening authority is dependent on the group manager for its operation. Contrary to this, Hidden-IBS do not require the maintenance of a group membership list and they enable an opening authority that is totally independent of the group manager. As we argue, this makes Hidden-IBS much more attractive than group signatures for a number of applications. In this paper we provide a formal model of Hidden-IBS as well as two efficient constructions that realize the new primitive. Our elliptic curve construction that is based on the SDH/DLDH assumptions produces signatures that are merely 4605 bits long and can be implemented very efficiently. To demonstrate the power of the new primitive we apply it to solve a problem of current onion-routing systems focusing on the Tor system in particular. Posting through Tor is currently blocked by sites such as Wikipedia due to the real concern that anonymous channels can be used to vandalize online content. By injecting a Hidden-IBS inside the header of an HTTP POST request and requiring the exit-policy of Tor to forward only properly signed POST requests, we demonstrate how sites like Wikipedia may allow anonymous posting while being ensured that the recovery of (say) the IP address of a vandal would still be possible through a dispute resolution system. Using our new Hidden-IBS primitive in this scenario allows keeping the listing of identities (e.g.
Scarpazza, Daniele Paolo & Russell, Gregory F. High-performance regular expression scanning on the Cell/B.E. processor Proceedings of the 23rd international conference on Supercomputing 2009 [1,096]
Matching regular expressions (regexps) is a very common work-load. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and represents the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators like flex haven't changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input: a remarkable performance, comparable to dedicated hardware solutions. Also, our kernels show speedups of 57-81× over flex on the Cell. Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.
Beesley, Angela How and why Wikipedia works WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark 2006 [1,097]
This talk discusses the inner workings of Wikipedia. Angela will address the roles, processes, and sociology that make up the project, with information on what happens behind the scenes and how the community builds and defends its encyclopedia on a daily basis. The talk will give some insight into why Wikipedia has worked so far and why we believe it will keep working in the future despite the many criticisms that can be made of it. It is hoped that this review inspires further Wikipedia research. For this, please also see our Wikipedia Research workshop on Wednesday, which is open to walk-ins.
Riehle, Dirk How and why Wikipedia works: An interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark 2006 [1,098]
This article presents an interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko. All three are leading Wikipedia practitioners in the English, German, and Japanese Wikipedias and related projects. The interview focuses on how Wikipedia works and why these three practitioners believe it will keep working. The interview was conducted via email in preparation of WikiSym 2006, the 2006 International Symposium on Wikis, with the goal of furthering Wikipedia research [1]. Interviewer was Dirk Riehle, the chair of WikiSym 2006. An online version of the article provides simplified access to URLs [2].
Xu, Sean & Zhang, Xiaoquan How Do Social Media Shape the Information Environment in the Financial Market? 2009 [1,099]
Lindholm, Tancred & Kangasharju, Jaakko How to edit gigabyte XML files on a mobile phone with XAS, RefTrees, and RAXS Proceedings of the 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services 2008 [1,100]
The Open Source mobility middleware developed in the Fuego Core project provides a stack for efficient XML processing on limited devices. Its components are a persistent map API, advanced XML serialization and out-of-order parsing with byte-level access (XAS), data structures and algorithms for lazy manipulation and random access to XML trees (RefTree), and a component for XML document management (RAXS) such as packaging, versioning, and synchronization. The components provide a toolbox of simple and lightweight XML processing techniques rather than a complete XML database. We demonstrate the Fuego XML stack by building a viewer and multiversion editor capable of processing gigabyte-sized Wikipedia XML files on a mobile phone. We present performance measurements obtained on the phone, and a comparison to implementations based on existing technologies. These show that the Fuego XML stack allows going beyond what is commonly considered feasible on limited devices in terms of XML processing, and that it provides advantages in terms of decreased set-up time and storage space requirements compared to existing approaches.
Medelyan, Olena; Frank, Eibe & Witten, Ian H. Human-competitive tagging using automatic keyphrase extraction Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3 2009 [1,101]
This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and further improve performance in this new domain using a new algorithm, "Maui", that utilizes semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best performing human taggers.
Yamada, Ichiro; Torisawa, Kentaro; Kazama, Jun'ichi; Kuroda, Kow; Murata, Masaki; Saeger, Stijn De; Bond, Francis & Sumida, Asuka Hypernym discovery based on distributional similarity and hierarchical structures Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2 2009 [1,102]
This paper presents a new method of developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from documents on the Web. For a given target word, our algorithm first finds k similar words from the Wikipedia database. Then, the hypernyms of these k similar words are assigned scores by considering the distributional similarities and hierarchical distances in the Wikipedia database. Finally, new hyponymy relations are output according to the scores. In this paper, we tested two distributional similarities. One is based on raw verb-noun dependencies (which we call "RVD") and the other is based on a large-scale clustering of verb-noun dependencies (called "CVD"). Our method achieved an attachment accuracy of 91.0% for the top 10000 relations and an attachment accuracy of 74.5% for the top 100000 relations when using CVD. This was a far better outcome compared to the other baseline approaches. Excluding the region that had very high scores
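The scoring step described above can be sketched in a few lines of Python: for a target word, take its most distributionally similar words that already appear in the Wikipedia-derived database and accumulate scores for their hypernyms, weighted by similarity. The hypernym database and similarity lists below are toy stand-ins, and the paper's hierarchical-distance weighting is only noted in a comment.

from collections import defaultdict

# Hypothetical resources (the real ones come from Wikipedia and Web corpora).
hypernyms_of = {
    "trout": ["fish", "animal"],
    "salmon": ["fish", "animal"],
    "kayak": ["boat"],
}
similar_words = {  # (similar word, distributional similarity) per target word
    "tilapia": [("trout", 0.8), ("salmon", 0.7), ("kayak", 0.1)],
}

def rank_hypernyms(target, k=3):
    scores = defaultdict(float)
    for word, sim in sorted(similar_words[target], key=lambda x: -x[1])[:k]:
        for h in hypernyms_of.get(word, []):
            scores[h] += sim   # the paper additionally weights by hierarchical distance
    return sorted(scores.items(), key=lambda x: -x[1])

print(rank_hypernyms("tilapia"))  # "fish" and "animal" outrank "boat"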
Iftene, Adrian & Balahur-Dobrescu, Alexandra Hypothesis transformation and semantic variability rules used in recognizing textual entailment Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing 2007 [1,103]
Based on the core approach of the tree edit distance algorithm, the system central module is designed to target the scope of TE -- semantic variability. The main idea is to transform the hypothesis making use of extensive semantic knowledge from sources like DIRT, WordNet, Wikipedia, acronyms database. Additionally, we built a system to acquire the extra background knowledge needed and applied complex grammar rules for rephrasing in English.
Wen, Dunwei; Liu, Ming-Chi; Huang, Yueh-Min; Kinshuk & Hung, Pi-Hsia Identifying Animals with Dynamic Location-aware and Semantic Hierarchy-Based Image Browsing for Different Cognitive Style Learners Advanced Learning Technologies (ICALT), 2010 IEEE 10th International Conference on 2010
Lack of overall ecological knowledge structure is a critical reason for learners' failure in keyword-based search. To address this issue, this paper firstly presents the dynamic location-aware and semantic hierarchy (DLASH) designed for the learners to browse images, which aims to identify learners' current interesting sights and provide adaptive assistance accordingly in ecological learning. The main idea is based on the observation that the species of plants and animals are discontinuously distributed around the planet, and hence their semantic hierarchy, besides its structural similarity with WordNet, is related to location information. This study then investigates how different cognitive styles of the learners influence the use of DLASH in their image browsing. The preliminary results show that the learners perform better when using DLASH-based image browsing than using the Flickr one. In addition, cognitive styles have more effects on image browsing in the DLASH version than in the Flickr one.
Lipka, Nedim & Stein, Benno Identifying featured articles in Wikipedia: Writing style matters 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United states 2010 [1,104]
Wikipedia provides an information quality assessment model with criteria for human peer reviewers to identify featured articles. For this classification task, "Is an article featured or not?", we present a machine learning approach that exploits an article's character trigram distribution. Our approach differs from existing research in that it aims at writing style rather than evaluating meta features like the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We underpin these claims by an experiment design where, among others, the domain transferability is analyzed. The achieved performances in terms of the F-measure for featured articles are 0.964 within a single Wikipedia domain and 0.880 in a domain transfer situation. 2010 Copyright is held by the author/owner(s).
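A character-trigram classifier of this kind is easy to prototype; the following is a minimal sketch, assuming a list of article texts with featured/non-featured labels (the texts, labels and classifier choice below are placeholders, not the paper's setup).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["A carefully sourced, comprehensive article ...",
         "short stub with little content"]
labels = [1, 0]   # 1 = featured, 0 = not featured

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),  # character trigram distribution
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["another thoroughly referenced and well structured article ..."]))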
Chang, Ming-Wei; Ratinov, Lev; Roth, Dan & Srikumar, Vivek Importance of semantic representation: Dataless classification 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states 2008
Traditionally, text categorization has been studied as the problem of training of a classifier using labeled data. However, people can categorize documents into named categories without any explicit training because we know the meaning of category names. In this paper, we introduce Dataless Classification, a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts. We propose a model for dataless classification and show that the label name alone is often sufficient to induce classifiers. Using Wikipedia as our source of world knowledge, we get 85.29% accuracy on tasks from the 20 Newsgroup dataset and 88.62% accuracy on tasks from a Yahoo! Answers dataset without any labeled or unlabeled data from the datasets. With unlabeled data, we can further improve the results and show quite competitive performance to a supervised learning algorithm that uses 100 labeled examples. Copyright 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Fuxman, Ariel; Kannan, Anitha; Goldberg, Andrew B.; Agrawal, Rakesh; Tsaparas, Panayiotis & Shafer, John Improving classification accuracy using automatically extracted training data Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining 2009 [1,105]
Classification is a core task in knowledge discovery and data mining, and there has been substantial research effort in developing sophisticated classification models. In a parallel thread, recent work from the NLP community suggests that for tasks such as natural language disambiguation even a simple algorithm can outperform a sophisticated one, if it is provided with large quantities of high quality training data. In those applications, training data occurs naturally in text corpora, and high quality training data sets running into billions of words have been reportedly used. We explore how we can apply the lessons from the NLP community to KDD tasks. Specifically, we investigate how to identify data sources that can yield training data at low cost and study whether the quantity of the automatically extracted training data can compensate for its lower quality. We carry out this investigation for the specific task of inferring whether a search query has commercial intent. We mine toolbar and click logs to extract queries from sites that are predominantly commercial (e.g., Amazon) and non-commercial (e.g., Wikipedia). We compare the accuracy obtained using such training data against manually labeled training data. Our results show that we can have large accuracy gains using automatically extracted training data at much lower cost.
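The labeling idea can be illustrated with a short Python sketch (not the authors' pipeline): queries clicked through to predominantly commercial sites become positive examples of commercial intent, queries clicked through to non-commercial sites become negatives, and a standard text classifier is trained on this automatically labeled data. The click log and site lists below are hypothetical.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

COMMERCIAL_SITES = {"amazon.com", "ebay.com"}
NON_COMMERCIAL_SITES = {"wikipedia.org"}

click_log = [  # (query, clicked domain) -- invented examples
    ("cheap laptop deals", "amazon.com"),
    ("history of rome", "wikipedia.org"),
    ("buy running shoes", "ebay.com"),
    ("photosynthesis definition", "wikipedia.org"),
]

queries, labels = [], []
for query, domain in click_log:
    if domain in COMMERCIAL_SITES:
        queries.append(query); labels.append(1)   # commercial intent
    elif domain in NON_COMMERCIAL_SITES:
        queries.append(query); labels.append(0)   # non-commercial intent

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(queries), labels)
print(clf.predict(vec.transform(["buy cheap shoes"])))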
MacKinnon, Ian & Vechtomova, Olga Improving complex interactive question answering with Wikipedia anchor text 30th Annual European Conference on Information Retrieval, ECIR 2008, March 30, 2008 - April 3, 2008 Glasgow, United kingdom 2008 [1,106]
When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document will often not be used in the most relevant information nugget in the document. In this paper, a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected automatically. Evaluated with the Nuggeteer automatic scoring software, an increase in the F-scores is found from the TREC Complex Interactive Question Answering task when integrating this expansion into an already high-performing baseline system. 2008 Springer-Verlag Berlin Heidelberg.
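One simple way to picture anchor-text-based expansion is the sketch below (an illustrative simplification, not the paper's method): take the Wikipedia articles judged most relevant to the query, collect the anchor phrases used elsewhere in Wikipedia to link to them, and append the most frequent phrases to the query. The article-to-anchor mapping is a toy placeholder.

from collections import Counter

# Hypothetical: anchor texts used across Wikipedia to link to each article.
anchors_to = {
    "Global warming": ["climate change", "global warming", "warming climate",
                       "climate change", "greenhouse effect"],
}

def expand_query(query, relevant_articles, n_terms=2):
    counts = Counter()
    for article in relevant_articles:
        counts.update(anchors_to.get(article, []))
    expansion = [phrase for phrase, _ in counts.most_common(n_terms)
                 if phrase.lower() != query.lower()]
    return query + " " + " ".join(expansion)

print(expand_query("global warming", ["Global warming"]))  # adds "climate change"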
Wang, Pu; Hu, Jian; Zeng, Hua-Jun; Chen, Lijun & Chen, Zheng Improving text classification by using encyclopedia knowledge 7th IEEE International Conference on Data Mining, ICDM 2007, October 28, 2007 - October 31, 2007 Omaha, NE, United states 2007 [1,107]
The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. In order to deal with this problem, we integrate background knowledge - in our application: Wikipedia - into the process of classifying text documents. The experimental evaluation on Reuters newsfeeds and several other corpora shows that our classification results with encyclopedia knowledge are much better than the baseline "bag of words" methods.
Li, Yinghao; Luk, Wing Pong Robert; Ho, Kei Shiu Edward & Chung, Fu Lai Korris Improving weak ad-hoc queries using wikipedia as external corpus 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07, July 23, 2007 - July 27, 2007 Amsterdam, Netherlands 2007 [1,108]
In an ad-hoc retrieval task, the query is usually short and the user expects to find the relevant documents in the first several result pages. We explored the possibilities of using Wikipedia's articles as an external corpus to expand ad-hoc queries. Results show promising improvements over measures that emphasize weak queries.
Wan, Stephen & Paris, Cécile In-browser summarisation: generating elaborative summaries biased towards the reading context Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers 2008 [1,109]
We investigate elaborative summarisation, where the aim is to identify supplementary information that expands upon a key fact. We envisage such summaries being useful when browsing certain kinds of (hyper-)linked document sets, such as Wikipedia articles or repositories of publications linked by citations. For these collections, an elaborative summary is intended to provide additional information on the linking anchor text. Our contribution in this paper focuses on identifying and exploring a real task in which summarisation is situated, realised as an In-Browser tool. We also introduce a neighbourhood scoring heuristic as a means of scoring matches to relevant passages of the document. In a preliminary evaluation using this method, our summarisation system scores above our baselines and achieves a recall of 57% of annotated gold standard sentences.
Wu, Fei; Hoffmann, Raphael & Weld, Daniel S. Information extraction from Wikipedia: Moving down the long tail 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, August 24, 2008 - August 27, 2008 Las Vegas, NV, United states 2008 [1,110]
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
Wagner, Christian & Prasarnphanich, Pattarawan Innovating collaborative content creation: The role of altruism and wiki technology 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, {HI, United states 2007 [1,111]
Wikipedia demonstrates the feasibility and success of an innovative form of content creation, namely openly shared, collaborative writing. This research sought to understand the success of Wikipedia as a collaborative model, considering both technology and participant motivations. The research finds that while participants have both individualistic and collaborative motives, collaborative (altruistic) motives dominate. The collaboration model differs from that of open source software development, which is less inclusive with respect to participation, and more "selfish" with respect to contributor motives. The success of the Wikipedia model appears to be related to wiki technology and the "wiki way" of collaboration.
Medelyan, Olena & Legg, Catherine Integrating cyc and wikipedia: Folksonomy meets rigorously defined common-sense 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
Integration of ontologies begins with establishing mappings between their concept entries. We map categories from the largest manually-built ontology, Cyc, onto Wikipedia articles describing corresponding concepts. Our method draws both on Wikipedia's rich but chaotic hyperlink structure and Cyc's carefully defined taxonomic and common-sense knowledge. On 9,333 manual alignments by one person, we achieve an F-measure of 90%; on 100 alignments by six human subjects the average agreement of the method with the subject is close to their agreement with each other. We cover 62.8% of Cyc categories relating to common-sense knowledge and discuss what further information might be added to Cyc given this substantial new alignment. Copyright 2008.
Weld, Daniel S.; Wu, Fei; Adar, Eytan; Amershi, Saleema; Fogarty, James; Hoffmann, Raphael; Patel, Kayur & Skinner, Michael Intelligence in wikipedia 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08, July 13, 2008 - July 17, 2008 Chicago, IL, United states 2008
The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed initiative interface designed to encourage communal content creation (CCC). Since IE and CCC are each powerful ways to produce large amounts of structured information, they have been studied extensively - but only in isolation. By combining the two methods in a virtuous feedback cycle, we aim for substantial synergy. While previous papers have described the details of individual aspects of our endeavor [25, 26, 24, 13], this report provides an overview of the project's progress and vision. Copyright 2008.
Halpin, Harry Is there anything worth finding on the semantic web? Proceedings of the 18th international conference on World wide web 2009 [1,112]
There has recently been an upsurge of interest in the possibilities of combining structured data and ad-hoc information retrieval from traditional hypertext. In this experiment, we run queries extracted from a query log of a major search engine against the Semantic Web to discover if the Semantic Web has anything of interest to the average user. We show that there is indeed much information on the Semantic Web that could be relevant for many queries for people, places and even abstract concepts, although they are overwhelmingly clustered around a Semantic Web-enabled export of Wikipedia known as DBPedia.
Sato, Issei & Nakagawa, Hiroshi Knowledge discovery of multiple-topic document using parametric mixture model with dirichlet prior KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12, 2007 - August 15, 2007 San Jose, CA, United states 2007 [1,113]
Documents, such as those seen on Wikipedia and Folksonomy, have tended to be assigned with multiple topics as meta-data. Therefore, it is more and more important to analyze the relationship between a document and the topics assigned to it. In this paper, we propose a novel probabilistic generative model of documents with multiple topics as meta-data. By focusing on modeling the generation process of a document with multiple topics, we can extract specific properties of documents with multiple topics. The proposed model is an expansion of an existing probabilistic generative model: the Parametric Mixture Model (PMM). PMM models documents with multiple topics by mixing model parameters of each single topic. Since, however, PMM assigns the same mixture ratio to each single topic, PMM cannot take into account the bias of each topic within a document. To deal with this problem, we propose a model that uses a Dirichlet distribution as a prior distribution of the mixture ratio. We adopt the Variational Bayes method to infer the bias of each topic within a document. We evaluate the proposed model and PMM using the MEDLINE corpus. The results of F-measure, precision and recall show that the proposed model is more effective than PMM on multiple-topic classification. Moreover, we indicate the potential of the proposed model to extract topics and document-specific keywords using information about the assigned topics.
Weikum, G. Knowledge on the Web: Robust and Scalable Harvesting of Entity-Relationship Facts Database Systems for Advanced Applications. 15th International Conference, DASFAA 2010, 1-4 April 2010 Berlin, Germany 2010 [1,114]
Summary form only given. The proliferation of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from semistructured and textual Web data have enabled the construction of very large knowledge bases. These knowledge collections contain facts about many millions of entities and relationships between them, and can be conveniently represented in the RDF data model. Prominent examples are DBpedia, YAGO, Freebase, Trueknowledge, and others. These structured knowledge collections can be viewed as "Semantic Wikipedia Databases" and they can answer many advanced questions by SPARQL-like query languages and appropriate ranking models. In addition the knowledge bases can boost the semantic capabilities and precision of entity-oriented Web search and they are enablers for value-added knowledge services and applications in enterprises and online communities. The talk discusses recent advances in the large-scale harvesting of entity-relationship facts from Web sources and it points out the next frontiers in building comprehensive knowledge bases and enabling semantic search services. In particular
Ponzetto, Simone Paolo & Navigli, Roberto Knowledge-rich Word Sense Disambiguation rivaling supervised systems Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,115]
One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.
Elbassuoni, Shady; Ramanath, Maya; Schenkel, Ralf; Sydow, Marcin & Weikum, Gerhard Language-model-based ranking for queries on RDF-graphs ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,116]
The success of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from textual and Web sources have made it possible to build large "knowledge repositories" such as DBpedia, Freebase, and YAGO. These collections can be viewed as graphs of entities and relationships (ER graphs) and can be represented as a set of subject-property-object (SPO) triples in the Semantic-Web data model RDF. Queries can be expressed in the W3C-endorsed SPARQL language or by similarly designed graph-pattern search. However, exact-match query semantics often fall short of satisfying the users' needs by returning too many or too few results. Therefore, IR-style ranking models are crucially needed. In this paper
Ponzetto, Simone Paolo & Navigli, Roberto Large-scale taxonomy mapping for restructuring and integrating wikipedia Proceedings of the 21st international jont conference on Artifical intelligence 2009 [1,117]
We present a knowledge-rich methodology for disambiguating Wikipedia categories with WordNet synsets and using this semantic information to restructure a taxonomy automatically generated from the Wikipedia system of categories. We evaluate against a manual gold standard and show that both category disambiguation and taxonomy restructuring perform with high accuracy. Besides, we assess these methods on automatically generated datasets and show that we are able to effectively enrich WordNet with a large number of instances from Wikipedia. Our approach produces an integrated resource, thus bringing together the fine-grained classification of instances in Wikipedia and a well-structured top-level taxonomy from WordNet.
Luther, Kurt & Bruckman, Amy Leadership in online creative collaboration Proceedings of the 2008 ACM conference on Computer supported cooperative work 2008 [1,118]
Leadership plays a central role in the success of many forms of online creative collaboration, yet little is known about the challenges leaders must manage. In this paper, we report on a qualitative study of leadership in three online communities whose members collaborate over the Internet to create computer-animated movies called "collabs". Our results indicate that most collabs fail. Collab leaders face two major challenges. First, leaders must design collaborative projects. Second, leaders must manage artists during the collab production process. We contrast these challenges with the available empirical research on leadership in open-source software and Wikipedia, identifying four themes: originality, completion
Wierzbicki, Adam; Turek, Piotr & Nielek, Radoslaw Learning about team collaboration from wikipedia edit history 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [1,119]
This work presents an evaluation method for teams of authors in Wikipedia based on social network analysis. We have created an implicit social network based on the edit history of articles. This network consists of four dimensions: trust, distrust, acquaintance and knowledge. Trust and distrust are based on content modifications (copying and deleting respectively); acquaintance is based on the amount of discussion on articles' talk pages between a given pair of authors and knowledge is based on the categories in which an author typically contributes. As authors edit the Wikipedia, the social network grows and changes to take into account their collaboration patterns, creating a succinct summary of the entire edit history.
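A network of this kind can be sketched with networkx as below; this is an illustrative simplification, not the authors' implementation. The edit-history events, the mapping of event types to dimensions, and the omission of the knowledge dimension (which would come from article categories) are all assumptions made for the example.

import networkx as nx

edits = [  # hypothetical edit-history events between authors
    {"type": "copied_text",  "from_author": "alice", "to_author": "bob"},
    {"type": "deleted_text", "from_author": "carol", "to_author": "bob"},
    {"type": "talk_reply",   "from_author": "alice", "to_author": "carol"},
]

# Map event types to network dimensions (knowledge dimension omitted here).
DIMENSION = {"copied_text": "trust", "deleted_text": "distrust", "talk_reply": "acquaintance"}

G = nx.MultiDiGraph()
for e in edits:
    G.add_edge(e["from_author"], e["to_author"], dimension=DIMENSION[e["type"]])

# Example query: how many trust edges point at each author.
trust_in = {n: sum(1 for _, _, d in G.in_edges(n, data=True) if d["dimension"] == "trust")
            for n in G.nodes}
print(trust_in)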
Pasternack, Jeff & Roth, Dan Learning better transliterations Proceeding of the 18th ACM conference on Information and knowledge management 2009 [1,120]
We introduce a new probabilistic model for transliteration that performs significantly better than previous approaches, is language-agnostic, requiring no knowledge of the source or target languages, and is capable of both generation (creating the most likely transliteration of a source word) and discovery (selecting the most likely transliteration from a list of candidate words). Our experimental results demonstrate improved accuracy over the existing state-of-the-art by more than 10% in Chinese, Hebrew and Russian. While past work has commonly made use of fixed-size n-gram features along with more traditional models such as HMM or Perceptron, we utilize an intuitive notion of "productions", where each source word can be segmented into a series of contiguous non-overlapping substrings of any size, each of which independently transliterates to a substring in the target language with a given probability. To learn these parameters we employ Expectation-Maximization (EM)
Napoles, Courtney & Dredze, Mark Learning simple Wikipedia: a cogitation in ascertaining abecedarian language Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids 2010 [1,121]
Text simplification is the process of changing vocabulary and grammatical structure to create a more accessible version of the text while maintaining the underlying information and content. Automated tools for text simplification are a practical way to make large corpora of text accessible to a wider audience lacking high levels of fluency in the corpus language. In this work, we investigate the potential of Simple Wikipedia to assist automatic text simplification by building a statistical classification system that discriminates simple English from ordinary English. Most text simplification systems are based on hand-written rules (e.g., PEST (Carroll et al., 1999) and its module SYSTAR (Canning et al., 2000)), and therefore face limitations scaling and transferring across domains. The potential for using Simple Wikipedia for text simplification is significant; it contains nearly 60,000 articles with revision histories and aligned articles to ordinary English Wikipedia. Using articles from Simple Wikipedia and ordinary Wikipedia, we evaluated different classifiers and feature sets to identify the most discriminative features of simple English for use across domains. These findings help further understanding of what makes text simple and can be applied as a tool to help writers craft simple text.
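A minimal sketch of such a classifier is shown below, using generic surface features (sentence length, word length, lexical variety) of the kind these systems typically rely on; the snippets, labels and exact feature set are placeholders rather than the paper's data or features.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def surface_features(text):
    words = text.split()
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "n_words": len(words),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

texts = ["The cat sat on the mat. It was warm.",
         "Notwithstanding prior meteorological assessments, precipitation persisted."]
labels = [1, 0]   # 1 = simple English, 0 = ordinary English

vec = DictVectorizer()
clf = LogisticRegression().fit(vec.fit_transform([surface_features(t) for t in texts]), labels)
print(clf.predict(vec.transform([surface_features("The dog ran to the park.")])))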
Phan, Xuan-Hieu; Nguyen, Le-Minh & Horiguchi, Susumu Learning to classify short and sparse text web with hidden topics from large-scale data collections 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [1,122]
This paper presents a general framework for building classifiers that deal with short and sparse text Web segments by making the most of hidden topics discovered from large-scale data collections. The main motivation of this work is that many classification tasks working with short segments of Web text, such as search snippets, forum chat messages, blog news feeds, product reviews, and book and movie summaries, fail to achieve high accuracy due to data sparseness. We, therefore, come up with an idea of gaining external knowledge to make the data more related as well as expand the coverage of classifiers to handle future data better. The underlying idea of the framework is that for each classification task, we collect a large-scale external data collection called a "universal dataset" and then build a classifier on both a (small) set of labeled training data and a rich set of hidden topics discovered from that data collection. The framework is general enough to be applied to different data domains and genres ranging from Web search results to medical text. We did a careful evaluation on several hundred megabytes of Wikipedia (30M words) and MEDLINE (18M words) with two tasks: "Web search domain disambiguation" and "disease categorization for medical text", and achieved significant quality enhancement.
Milne, David & Witten, Ian H. Learning to link with wikipedia 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United states 2008 [1,123]
This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents. This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.
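The core disambiguation step in this line of work is often described in terms of two signals: "commonness" (how often an anchor phrase links to a given article in Wikipedia) and relatedness to unambiguous context terms. The Python sketch below combines the two with a simple weighted sum; the statistics, the combination rule and the weight are invented for illustration, not the paper's learned model.

commonness = {  # P(article | anchor phrase), estimated from Wikipedia links (toy values)
    "tree": {"Tree": 0.85, "Tree (data structure)": 0.15},
}
relatedness = {  # pairwise semantic relatedness between articles (toy values)
    ("Tree", "Forest"): 0.7, ("Tree (data structure)", "Forest"): 0.1,
    ("Tree", "Binary search"): 0.05, ("Tree (data structure)", "Binary search"): 0.8,
}

def rel(a, b):
    return relatedness.get((a, b), relatedness.get((b, a), 0.0))

def disambiguate(anchor, context_articles, alpha=0.5):
    """Combine commonness and average relatedness to the context."""
    best, best_score = None, -1.0
    for article, common in commonness[anchor].items():
        avg_rel = sum(rel(article, c) for c in context_articles) / len(context_articles)
        score = alpha * common + (1 - alpha) * avg_rel
        if score > best_score:
            best, best_score = article, score
    return best

print(disambiguate("tree", ["Binary search"]))  # -> Tree (data structure)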
Druck, Gregory; Miklau, Gerome & McCallum, Andrew Learning to predict the quality of contributions to wikipedia 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality - vandalism, bias, and errors can be problems. Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist users in creating and maintaining quality. In this paper we propose metrics that quantify the quality of contributions to Wikipedia through implicit feedback from the community. We then learn discriminative probabilistic models that predict the quality of a new edit using features of the changes made, the author of the edit
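The general setup can be pictured with a small sketch: features of an edit, together with an implicit feedback signal such as whether the contributed text survived later revisions, are used to train a probabilistic model of edit quality. The features, records and labels below are illustrative placeholders, not those of the paper.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

edits = [  # hypothetical edit features
    {"chars_added": 350, "chars_deleted": 10, "is_anonymous": 0, "added_refs": 2},
    {"chars_added": 5, "chars_deleted": 900, "is_anonymous": 1, "added_refs": 0},
    {"chars_added": 120, "chars_deleted": 40, "is_anonymous": 0, "added_refs": 1},
    {"chars_added": 0, "chars_deleted": 400, "is_anonymous": 1, "added_refs": 0},
]
survived = [1, 0, 1, 0]   # implicit community feedback: did the contribution persist?

vec = DictVectorizer()
model = LogisticRegression().fit(vec.fit_transform(edits), survived)
new_edit = {"chars_added": 200, "chars_deleted": 20, "is_anonymous": 0, "added_refs": 1}
print(model.predict_proba(vec.transform([new_edit]))[0, 1])  # estimated quality score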
Banerjee, Somnath; Chakrabarti, Soumen & Ramakrishnan, Ganesh Learning to rank for quantity consensus queries Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,124]
Web search is increasingly exploiting named entities like persons, places, businesses, addresses and dates. Entity ranking is also of current interest at INEX and TREC. Numerical quantities are an important class of entities, especially in queries about prices and features related to products, services and travel. We introduce Quantity Consensus Queries (QCQs), where each answer is a tight quantity interval distilled from evidence of relevance in thousands of snippets. Entity search and factoid question answering have benefited from aggregating evidence from multiple promising snippets, but these do not readily apply to quantities. Here we propose two new algorithms that learn to aggregate information from multiple snippets. We show that typical signals used in entity ranking, like rarity of query words and their lexical proximity to candidate quantities, are very noisy. Our algorithms learn to score and rank quantity intervals directly, combining snippet quantity and snippet text information. We report on experiments using hundreds of QCQs with ground truth taken from TREC QA, Wikipedia Infoboxes, and other sources, leading to tens of thousands of candidate snippets and quantities. Our algorithms yield about 20% better MAP and NDCG compared to the best-known collective rankers, and are 35% better than scoring snippets independent of each other.
Navigli, Roberto & Velardi, Paola Learning word-class lattices for definition and hypernym extraction Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,125]
Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches -- mostly focused on lexicosyntactic patterns -- suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.
Ganjisaffar, Yasser; Javanmardi, Sara & Lopes, Cristina Leveraging crowdsourcing heuristics to improve search in Wikipedia 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,126]
Wikipedia articles are usually accompanied with history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We analyze the quality of search results of the current major Web search engines (Google, Yahoo! and Live) in Wikipedia. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. We investigate the effect of incorporating the extent of review of an article into ranking of search results. The extent of review is measured by the number of distinct editors who have contributed to the articles and is extracted by processing Wikipedia's history pages. Our experimental results show that re-ranking search results of the three major Web search engines, using the review feature, improves quality of their rankings for Wikipedia-specific searches.
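One simple way to realise such re-ranking is to combine the original search score with a review feature, here the number of distinct editors, via a log-scaled linear combination; the sketch below is illustrative only, and the weight, scaling and numbers are invented.

import math

results = [  # (article, original search score, number of distinct editors) -- toy data
    ("Article A", 0.92, 4),
    ("Article B", 0.85, 310),
    ("Article C", 0.80, 95),
]

def rerank(results, weight=0.3):
    def combined(item):
        _, score, editors = item
        # Blend the engine's score with a log-scaled review signal.
        return (1 - weight) * score + weight * math.log1p(editors) / 10.0
    return sorted(results, key=combined, reverse=True)

for article, score, editors in rerank(results):
    print(article, score, editors)  # heavily reviewed articles move up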
Kirschenbaum, Amit & Wintner, Shuly Lightly supervised transliteration for machine translation Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics 2009 [1,127]
We present a Hebrew to English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms are to be transliterated rather than translated. The training corpus for this purpose includes only positive examples, acquired semi-automatically. Our classifier reduces more than 38% of the errors made by a baseline method. The identified terms are then transliterated. We present an SMT-based transliteration model trained with a parallel corpus extracted from Wikipedia using a fairly simple method which requires minimal knowledge. The correct result is produced in more than 76% of the cases, and in 92% of the instances it is one of the top-5 results. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.
Kaptein, Rianne; Serdyukov, Pavel & Kamps, Jaap Linking wikipedia to the web 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, July 19, 2010 - July 23, 2010 Geneva, Switzerland 2010 [1,128]
We investigate the task of finding links from Wikipedia pages to external web pages. Such external links significantly extend the information in Wikipedia with information from the Web at large, while retaining the encyclopedic organization of Wikipedia. We use a language modeling approach to create full-text and anchor text runs, and experiment with different document priors. In addition, we explore whether the social bookmarking site Delicious can be exploited to further improve our performance. We have constructed a test collection of 53 topics, which are Wikipedia pages on different entities. Our findings are that the anchor text index is a very effective method to retrieve home pages. URL class and anchor text length priors and their combination lead to the best results. Using Delicious on its own does not lead to very good results, but it does contain valuable information. Combining the best anchor text run and the Delicious run leads to further improvements.
Weiss, Stephane; Urso, Pascal & Molli, Pascal Logoot: A scalable optimistic replication algorithm for collaborative editing on P2P networks 2009 29th IEEE International Conference on Distributed Computing Systems Workshops, ICDCS, 09, June 22, 2009 - June 26, 2009 Montreal, QC, Canada 2009 [1,129]
Massive collaborative editing becomes a reality through leading projects such as Wikipedia. This massive collaboration is currently supported with a costly central service. In order to avoid such costs, we aim to provide a peer-to-peer collaborative editing system. Existing approaches to building distributed collaborative editing systems do not scale either in terms of number of users or in terms of number of edits. We present the Logoot approach, which scales in both of these dimensions while ensuring the causality, consistency and intention preservation criteria. We evaluate the Logoot approach and compare it to others using a corpus of all the edits applied to a set of the most edited and the biggest pages of Wikipedia.
Moon, Hyun J.; Curino, Carlo A.; Deutsch, Alin; Hou, Chien-Yi & Zaniolo, Carlo Managing and querying transaction-time databases under schema evolution Proceedings of the VLDB Endowment 2008 [1,130]
The old problem of managing the history of database information is now made more urgent and complex by fast spreading web information systems, such as Wikipedia. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly important for historical queries spanning over potentially hundreds of different schema versions and it is realized in PRIMA by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
Curino, Carlo A.; Moon, Hyun J. & Zaniolo, Carlo Managing the History of Metadata in Support for DB Archiving and Schema Evolution Proceedings of the ER 2008 Workshops (CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM) on Advances in Conceptual Modeling: Challenges and Opportunities 2008 [1,131]
Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage them and preserve the schema evolution history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Database Administrator in planning and evaluating changes, (ii) support automatic rewriting of legacy queries against the current schema version, (iii) enable efficient archiving of the histories of data and metadata, and (iv) support complex temporal queries over such histories. We then introduce the Historical Metadata Manager (HMM), a tool designed to facilitate the process of documenting and querying the schema evolution itself. We use the schema history of the Wikipedia database as a telling example of the many uses and benefits of HMM.
Hu, Meiqun; Lim, Ee-Peng; Sun, Aixin; Lauw, Hady W. & Vuong, Ba-Quy Measuring article quality in wikipedia: Models and evaluation 16th ACM Conference on Information and Knowledge Management, CIKM 2007, November 6, 2007 - November 9, 2007 Lisboa, Portugal 2007 [1,132]
Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.
Adler, B. Thomas; Alfaro, Luca De; Pye, Ian & Raman, Vishwanath Measuring author contributions to the Wikipedia 4th International Symposium on Wikis, WikiSym 2008, September 8, 2008 - September 10, 2008 Porto, Portugal 2008 [1,133]
We consider the problem of measuring user contributions to versioned, collaborative bodies of information, such as wikis. Measuring the contributions of individual authors can be used to divide revenue, to recognize merit, to award status promotions, and to choose the order of authors when citing the content. In the context of the Wikipedia, previous works on author contribution estimation have focused on two criteria: the total text created, and the total number of edits performed. We show that neither of these criteria work well: both techniques are vulnerable to manipulation, and the total-text criterion fails to reward people who polish or re-arrange the content. We consider and compare various alternative criteria that take into account the quality of a contribution, in addition to the quantity, and we analyze how the criteria differ in the way they rank authors according to their contributions. As an outcome of this study, we propose to adopt total edit longevity as a measure of author contribution. Edit longevity is resistant to simple attacks, since edits are counted towards an author's contribution only if other authors accept the contribution. Edit longevity equally rewards people who create content, and people who rearrange or polish the content. Finally, edit longevity distinguishes the people who contribute little (who have contribution close to zero) from spammers or vandals, whose contribution quickly grows negative.
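Illustrative sketch (not from the paper): a survival-style contribution measure in the spirit of edit longevity, under the assumed simplification that an author is credited by the fraction of tokens added in a revision that still survive k revisions later; the paper's exact definition differs (Python).

def added_tokens(prev, curr):
    """Tokens that appear in the current revision but not in the previous one."""
    prev_set = set(prev.split())
    return [tok for tok in curr.split() if tok not in prev_set]

def edit_longevity(revisions, authors, k=5):
    """revisions: revision texts in order; authors: author of each revision (same length).
    Assumed measure: credit = fraction of a revision's added tokens still present k revisions later."""
    credit = {}
    for i in range(1, len(revisions)):
        added = added_tokens(revisions[i - 1], revisions[i])
        if not added:
            continue
        future = set(revisions[min(i + k, len(revisions) - 1)].split())
        surviving = sum(1 for tok in added if tok in future)
        credit[authors[i]] = credit.get(authors[i], 0.0) + surviving / len(added)
    return credit

revs = ["the cat", "the cat sat", "the cat sat down", "the dog sat down"]
print(edit_longevity(revs, ["A", "B", "C", "D"], k=2))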
Volkovich, Yana; Litvak, Nelly & Zwart, Bert Measuring extremal dependencies in Web graphs 17th International Conference on World Wide Web 2008, WWW'08, April 21, 2008 - April 25, 2008 Beijing, China 2008 [1,134]
We analyze dependencies in power law graph data (Web sample, Wikipedia sample and a preferential attachment graph) using statistical inference for multivariate regular variation. The well developed theory of regular variation is widely applied in extreme value theory, telecommunications and mathematical finance, and it provides a natural mathematical formalism for analyzing dependencies between variables with power laws. However, most of the proposed methods have never been used in Web graph data mining. The present work fills this gap. The new insights this yields are striking: the three above-mentioned data sets are shown to have a totally different dependence structure between different graph parameters, such as in-degree and PageRank.
Stuckman, Jeff & Purtilo, James Measuring the wikisphere 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,135]
Due to the inherent difficulty in obtaining experimental data from wikis, past quantitative wiki research has largely been focused on Wikipedia, limiting the degree that it can be generalized. We developed WikiCrawler, a tool that automatically downloads and analyzes wikis, and studied 151 popular wikis running Mediawiki (none of them Wikipedias). We found that our studied wikis displayed signs of collaborative authorship, validating them as objects of study. We also discovered that, as in Wikipedia, the relative contribution levels of users in the studied wikis were highly unequal, with a small number of users contributing a disproportionate amount of work. In addition, power-law distributions were successfully fitted to the contribution levels of most of the studied wikis, and the parameters of the fitted distributions largely predicted the high inequality that was found. Along with demonstrating our methodology of analyzing wikis from diverse sources, the discovered similarities between wikis suggest that most wikis accumulate edits through a similar underlying mechanism, which could motivate a model of user activity that is applicable to wikis in general.
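Illustrative sketch (not from the paper): fitting a power-law exponent to per-user contribution counts can be done with the standard continuous maximum-likelihood estimator; the choice of xmin below is a hand-picked assumption rather than the paper's fitting procedure (Python).

import math

def power_law_alpha(contributions, xmin=1.0):
    """MLE of the exponent alpha for a continuous power law, for values >= xmin."""
    xs = [x for x in contributions if x >= xmin]
    n = len(xs)
    return 1.0 + n / sum(math.log(x / xmin) for x in xs)

if __name__ == "__main__":
    edits_per_user = [1, 1, 2, 2, 3, 5, 8, 40, 900]   # hypothetical wiki contribution counts
    print(round(power_law_alpha(edits_per_user), 2))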
Roth, Camille; Taraborelli, Dario & Gilbert, Nigel Measuring wiki viability: an empirical assessment of the social dynamics of a large sample of wikis Proceedings of the 4th International Symposium on Wikis 2008 [1,136]
This paper assesses the content- and population-dynamics of a large sample of wikis, over a timespan of several months, in order to identify basic features that may predict or induce different types of fate. We analyze and discuss, in particular, the correlation of various macroscopic indicators, structural features and governance policies with wiki growth patterns. While recent analyses of wiki dynamics have mostly focused on popular projects such as Wikipedia, we suggest research directions towards a more general theory of the dynamics of such communities.
Alfaro, Luca De & Ortega, Felipe Measuring Wikipedia: A hands-on tutorial 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United states 2009 [1,137]
This tutorial is an introduction to the best methodologies, tools and practices for Wikipedia research. The tutorial will be led by Luca de Alfaro (Wiki Lab at UCSC, California, USA) and Felipe Ortega (Libresoft, URJC, Madrid, Spain). Both have accumulated several years of practical experience exploring and processing Wikipedia data [1], [2], [3]. As well, their respective research groups have led the development of two cutting-edge software tools (WikiTrust and WikiXRay) for analyzing Wikipedia. WikiTrust implements an author reputation system, and a text trust system, for wikis. WikiXRay is a tool automating the quantitative analysis of any language version of Wikipedia (in general, any wiki based on MediaWiki).
Morante, Roser; Asch, Vincent Van & Daelemans, Walter Memory-based resolution of in-sentence scopes of hedge cues Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [1,138]
In this paper we describe the machine learning systems that we submitted to the CoNLL-2010 Shared Task on Learning to Detect Hedges and Their Scope in Natural Language Text. Task 1, on detecting uncertain information, was performed by an SVM-based system to process the Wikipedia data and by a memory-based system to process the biological data. Task 2, on resolving in-sentence scopes of hedge cues, was performed by a memory-based system that relies on information from syntactic dependencies. This system scored the highest F1 (57.32) on Task 2.
Yasuda, Keiji & Sumita, Eiichiro Method for building sentence-aligned corpus from wikipedia 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
We propose the framework of a Machine Translation (MT) bootstrapping method by using multilingual Wikipedia articles. This novel method can simultaneously generate a statistical machine translation (SMT) and a sentence-aligned corpus. In this study, we perform two types of experiments. The aim of the first type of experiments is to verify the sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach. For the first type of experiments, we use JENAAD, which is a sentence-aligned corpus built by the conventional sentence alignment method. The second type of experiments uses actual English and Japanese Wikipedia articles for sentence alignment. The result of the first type of experiments shows that the performance of the proposed method is comparable to that of the conventional sentence alignment method. Additionally, the second type of experiments shows that we can obtain the English translation of 10% of Japanese sentences while maintaining high alignment quality (rank-A ratio of over 0.8). Copyright 2008.
Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian & Chen, Zheng Mining multilingual topics from wikipedia Proceedings of the 18th international conference on World wide web 2009 [1,139]
In this paper, we try to leverage a large-scale and multilingual knowledge base, Wikipedia, to help effectively analyze and organize Web information written in different languages. Based on the observation that one Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm for mining multilingual topics from this knowledge base. The extracted 'universal' topics have multiple types of representations, with each type corresponding to one language. Accordingly, new documents of different languages can be represented in a space using a group of universal topics, which makes various multilingual Web applications feasible.
Witmer, Jeremy & Kalita, Jugal Mining wikipedia article clusters for geospatial entities and relationships Social Semantic Web: Where Web 2.0 Meets Web 3.0 - Papers from the AAAI Spring Symposium, March 23, 2009 - March 25, 2009 Stanford, CA, United states 2009
We present in this paper a method to extract geospatial entities and relationships from the unstructured text of the English language Wikipedia. Using a novel approach that applies SVMs trained from purely structural features of text strings, we extract candidate geospatial entities and relationships. Using a combination of further techniques, along with an external gazetteer, the candidate entities and relationships are disambiguated and the Wikipedia article pages are modified to include the semantic information provided by the extraction process. We successfully extracted location entities with an F-measure of 81%, and location relations with an F-measure of 54%. Copyright 2009, Association for the Advancement of Artificial Intelligence.
Yamangil, Elif & Nelken, Rani Mining wikipedia revision histories for improving sentence compression Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers 2008 [1,140]
A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this new-found data, we propose a novel lexicalized noisy channel model for sentence compression, achieving improved results in grammaticality and compression rate criteria with a slight decrease in importance.
Nelken, Rani & Yamangil, Elif Mining wikipedia's article revision history for training computational linguistics algorithms 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
We present a novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia's article revision history. By comparing adjacent versions of the same article, we extract voluminous training data for tasks for which data is usually scarce or costly to obtain. We illustrate this paradigm by applying it to three separate text processing tasks at various levels of linguistic granularity. We first apply this approach to the collection of textual errors and their correction, focusing on the specific type of lexical errors known as "eggcorns". Second, moving up to the sentential level, we show how to mine Wikipedia revisions for training sentence compression algorithms. By dramatically increasing the size of the available training data, we are able to create more discerning lexicalized models, providing improved compression results. Finally ...
Minitrack Introduction Proceedings of the 41st Annual Hawaii International Conference on System Sciences 2008 [1,141]
This year's minitrack on Open Movements: Open Source Software and Open Content provides a forum for discussion of an increasingly important mode of collaborative content and software development. OSS is a broad term used to embrace software that is developed and released under some sort of open source license (as is free software, a closely related phenomenon). There are thousands of OSS projects spanning a range of applications, Linux and Apache being two of the most visible. Open Content refers to published content (e.g., articles, engineering designs, pictures, etc.) released under a license allowing the content to be freely used and possibly modified and redistributed. Examples of OC are Wikipedia and MIT's Open Courseware.
Gan, Daniel Dengyang & Chia, Liang-Tien MobileMaps@sg - Mappedia version 1.1 IEEE International Conference onMultimedia and Expo, ICME 2007, July 2, 2007 - July 5, 2007 Beijing, China 2007
Technology has always been moving. Throughout the decades, improvements in various technological areas have led to a greater sense of convenience for ordinary people, whether it is cutting down the time spent on normal day-to-day activities or getting privileged services. One of the technological areas that has been moving very rapidly is that of mobile computing. The common mobile device now has mobility, provides entertainment via multimedia, connects to the Internet and is powered by intelligent and powerful chips. This paper will touch on an idea that is currently in the works: an integration of a recent technology that has netizens talking all over the world, Google Maps, which provides street and satellite images via the Internet to the PC, with Wikipedia's user-contributed content idea, the biggest free-content encyclopedia on the Internet. We will show how it is able to integrate such a technology with the idea of free-form editing into one application on a small mobile device. The new features provided by this application will work toward supporting the development of multimedia applications and computing.
Diaz, Oscar & Puente, Gorka Model-aware wiki analysis tools: The case of HistoryFlow 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [1,142]
Wikis are becoming mainstream. Studies confirm how wikis are finding their way into organizations. This paper focuses on requirements for analysis tools for corporate wikis. Corporate wikis differ from their grown-up counterparts such as Wikipedia. First, they tend to be much smaller. Second, they require analysis to be customized for their own domains. So far, most analysis tools focus on large wikis where handling large bulks of data efficiently is paramount. This tends to make analysis tools access the wiki database directly. This binds the tool to the wiki engine, hence jeopardizing customizability and interoperability. However, corporate wikis are not so big, while customizability is a desirable feature. This change in requirements advocates for analysis tools to be decoupled from the underlying wiki engines. Our approach argues for characterizing analysis tools in terms of their abstract analysis model (e.g. a graph model, a contributor model). How this analysis model is then mapped into wiki-implementation terms is left to the wiki administrator. The administrator, as the domain expert, can better assess which are the right terms/granularity to conduct the analysis. This accounts for suitability and interoperability gains. The approach is borne out for HistoryFlow, an IBM tool for visualizing evolving wiki pages and the interactions of multiple wiki authors.
Chi, Ed H. Model-Driven Research in Human-Centric Computing Visual Languages and Human-Centric Computing (VL/HCC), 2010 IEEE Symposium on 2010
How can we build systems that enable users to mix and match tools together? How will we know whether we have done a good job in creating usable visual interactive systems that help users accomplish a wide variety of goals? How can people share the results of their explorations with each other, and how can innovative tools be remixed? Widely-used tools such as Web browsers, wikis, spreadsheets, and analytics environments like R all contain models of how people mix and combine operators and functionalities. In my own research, system developments are very much informed by models such as information scent, sensemaking, information theory, probabilistic models, and more recently, evolutionary dynamic models. These models have been used to understand a wide variety of user behaviors in human-centric computing, from individuals interacting with a search system like MrTaggy.com to groups of people working on articles in Wikipedia. These models range in complexity from a simple set of assumptions to complex equations describing human and group behavior. In this talk, I will attempt to illustrate how a model-driven approach to answering the above questions should help to illuminate the path forward for Human-Centric Computing.
Burke, Moira & Kraut, Robert Mopping up: Modeling wikipedia promotion decisions 2008 ACM Conference on Computer Supported Cooperative Work, CSCW 08, November 8, 2008 - November 12, 2008 San Diego, CA, United states 2008 [1,143]
This paper presents a model of the behavior of candidates for promotion to administrator status in Wikipedia. It uses a policy capture framework to highlight similarities and differences between the community's stated criteria for promotion decisions and those criteria actually correlated with promotion success. As promotions are determined by the consensus of dozens of voters with conflicting opinions and unwritten expectations, the results highlight the degree to which consensus is truly reached. The model is fast and easily computable on the fly, and thus could be applied as a self-evaluation tool for editors considering becoming administrators, as a dashboard for voters to view a nominee's relevant statistics, or as a tool to automatically search for likely future administrators. Implications for distributed consensus-building in online communities are discussed.
Sarmento, Luis; Jijkuon, Valentin; de Rijke, Maarten & Oliveira, Eugenio "More like these": growing entity classes from seeds Proceedings of the sixteenth ACM conference on Conference on information and knowledge management 2007 [1,144]
We present a corpus-based approach to the class expansion task. For a given set of seed entities we use co-occurrence statistics taken from a text collection to define a membership function that is used to rank candidate entities for inclusion in the set. We describe an evaluation framework that uses data from Wikipedia. The performance of our class extension method improves as the size of the text collection increases.
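Illustrative sketch (not from the paper): a toy co-occurrence membership function for class expansion, scoring candidate entities by how often they co-occur with the seed set; the scoring rule and data are assumptions, not the authors' definition (Python).

from collections import Counter

def rank_candidates(seeds, candidates, documents):
    """Score each candidate by the number of seed mentions in the documents it shares."""
    scores = Counter()
    for doc in documents:
        present_seeds = sum(1 for s in seeds if s in doc)
        if present_seeds == 0:
            continue
        for c in candidates:
            if c in doc:
                scores[c] += present_seeds
    return scores.most_common()

docs = ["madrid paris berlin weather", "paris rome travel", "python java code"]
print(rank_candidates(["madrid", "paris"], ["berlin", "rome", "java"], docs))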
Chaudhuri, Kamalika; Kakade, Sham M.; Livescu, Karen & Sridharan, Karthik Multi-view clustering via canonical correlation analysis 26th Annual International Conference on Machine Learning, ICML'09, June 14, 2009 - June 18, 2009 Montreal, QC, Canada 2009 [1,145]
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure). Copyright 2009.
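Illustrative sketch (not from the paper): CCA-based multi-view clustering on synthetic two-view data, projecting one view onto the learned canonical directions and clustering there; the library choices, parameters and data are assumptions (Python).

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
view1 = labels[:, None] * 2.0 + rng.normal(size=(200, 10))   # e.g. a "words" view
view2 = labels[:, None] * 2.0 + rng.normal(size=(200, 8))    # e.g. a "links" view

cca = CCA(n_components=2)
z1, z2 = cca.fit_transform(view1, view2)          # low-dimensional canonical projections
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z1)
print("cluster sizes:", np.bincount(pred))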
Kasneci, Gjergji; Suchanek, Fabian M.; Ifrim, Georgiana; Elbassuoni, Shady; Ramanath, Maya & Weikum, Gerhard NAGA: Harvesting, searching and ranking knowledge 2008 ACM SIGMOD International Conference on Management of Data 2008, SIGMOD'08, June 9, 2008 - June 12, 2008 Vancouver, BC, Canada 2008 [1,146]
The presence of encyclopedic Web sources, such as Wikipedia, the Internet Movie Database (IMDB), World Factbook, etc. calls for new querying techniques that are simple and yet more expressive than those provided by standard keyword-based search engines. Searching for explicit knowledge needs to consider inherent semantic structures involving entities and relationships. In this demonstration proposal, we describe a semantic search system named NAGA. NAGA operates on a knowledge graph, which contains millions of entities and relationships derived from various encyclopedic Web sources, such as the ones above. NAGA's graph-based query language is geared towards expressing queries with additional semantic information. Its scoring model is based on the principles of generative language models, and formalizes several desiderata such as confidence, informativeness and compactness of answers. We propose a demonstration of NAGA which will allow users to browse the knowledge base through a user interface, enter queries in NAGA's query language and tune the ranking parameters to test various ranking aspects.
Han, Xianpei & Zhao, Jun Named entity disambiguation by leveraging wikipedia semantic knowledge ACM 18th International Conference on Information and Knowledge Management, CIKM 2009, November 2, 2009 - November 6, 2009 Hong Kong, China 2009 [1,147]
The name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem of named entity disambiguation is to measure the similarity between occurrences of names. The traditional methods measure the similarity using the bag of words (BOW) model. The BOW model, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, and polysemy and synonymy between key terms. So the BOW model cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, can only capture the social relatedness between named entities, and often suffer from the limited coverage problem. To overcome the previous methods' deficiencies, this paper proposes to use Wikipedia as the background knowledge for disambiguation, which surpasses other knowledge bases in the coverage of concepts, rich semantic information and up-to-date content. By leveraging Wikipedia's semantic knowledge, like social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia, in order that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, a novel similarity measure is proposed to leverage Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that the disambiguation performance of our method gets a 10.7% improvement over the traditional BOW-based methods and a 16.7% improvement over the traditional social network based methods.
Maskey, Sameer & Dakka, Wisam Named entity network based on wikipedia 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, September 6, 2009 - September 10, 2009 Brighton, United kingdom 2009
Named Entities (NEs) play an important role in many natural language and speech processing tasks. A resource that identifies relations between NEs could potentially be very useful. We present such an automatically generated knowledge resource from Wikipedia, Named Entity Network (NE-NET), that provides a list of related Named Entities (NEs) and the degree of relation for any given NE. Unlike some manually built knowledge resources, NE-NET has a wide coverage, consisting of 1.5 million NEs represented as nodes of a graph with 6.5 million arcs relating them. NE-NET also provides the ranks of the related NEs using a simple ranking function that we propose. In this paper, we present NE-NET and our experiments showing how NE-NET can be used to improve the retrieval of spoken (Broadcast News) and text documents.
Krioukov, Andrew; Mohan, Prashanth; Alspaugh, Sara; Keys, Laura; Culler, David & Katz, Randy H. NapSAC: design and implementation of a power-proportional web cluster Proceedings of the first ACM SIGCOMM workshop on Green networking 2010 [1,148]
Energy consumption is a major and costly problem in data centers. A large fraction of this energy goes to powering idle machines that are not doing any useful work. We identify two causes of this inefficiency: low server utilization and a lack of power-proportionality. To address this problem we present a design for a power-proportional cluster consisting of a power-aware cluster manager and a set of heterogeneous machines. Our design makes use of currently available energy-efficient hardware, mechanisms for transitioning in and out of low-power sleep states, and dynamic provisioning and scheduling to continually adjust to workload and minimize power consumption. With our design we are able to reduce energy consumption while maintaining acceptable response times for a web service workload based on Wikipedia. With our dynamic provisioning algorithms we demonstrate via simulation a 63% savings in power usage compared to a typically provisioned datacenter where all machines are left on and awake at all times. Our results show that we are able to achieve close to 90% of the savings a theoretically optimal provisioning scheme would achieve. We have also built a prototype cluster which runs Wikipedia to demonstrate the use of our design in a real environment.
Brandes, Ulrik; Kenis, Patrick; Lerner, Jürgen & van Raaij, Denise Network analysis of collaboration structure in Wikipedia Proceedings of the 18th international conference on World wide web 2009 [1,149]
In this paper we give models and algorithms to describe and analyze the collaboration among authors of Wikipedia from a network analytical perspective. The edit network encodes who interacts how with whom when editing an article; it significantly extends previous network models that code author communities in Wikipedia. Several characteristics summarizing some aspects of the organization process and allowing the analyst to identify certain types of authors can be obtained from the edit network. Moreover, we propose several indicators characterizing the global network structure and methods to visualize edit networks. It is shown that the structural network indicators are correlated with quality labels of the associated Wikipedia articles.
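Illustrative sketch (not from the paper): a heavily simplified edit network that only links the author of each revision to the author of the revision it changed; the paper's model additionally tracks whose text is deleted or restored, so treat this as a toy approximation (Python).

import networkx as nx

def edit_network(revision_authors):
    """Build a directed graph: an edge u -> v, weighted by count, means u revised v's revision."""
    g = nx.DiGraph()
    for prev, curr in zip(revision_authors, revision_authors[1:]):
        if prev == curr:
            continue
        w = g.get_edge_data(curr, prev, {"weight": 0})["weight"]
        g.add_edge(curr, prev, weight=w + 1)
    return g

g = edit_network(["Alice", "Bob", "Alice", "Carol", "Alice"])   # hypothetical revision log
print(sorted(g.edges(data="weight")))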
Gardner, James; Krowne, Aaron & Xiong, Li NNexus: an automatic linker for collaborative web-based corpora Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology 2009 [1,150]
Collaborative online encyclopedias or knowledge bases such as Wikipedia and PlanetMath are becoming increasingly popular. We demonstrate NNexus, a generalization of the automatic linking engine of PlanetMath.org and the first system that automates the process of linking disparate "encyclopedia" entries into a fully-connected conceptual network. The main challenges of this problem space include: 1) linking quality (correctly identifying which terms to link and which entry to link to, with minimal effort on the part of users); 2) efficiency and scalability; and 3) generalization to multiple knowledge bases and web-based information environments. We present NNexus, which utilizes subject classification and other metadata to address these challenges, and demonstrate its effectiveness and efficiency through multiple real-world corpora.
von dem Bussche, Franziska; Weiand, Klara; Linse, Benedikt; Furche, Tim & Bry, François Not so creepy crawler: easy crawler generation with standard xml queries Proceedings of the 19th international conference on World wide web 2010 [1,151]
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far more uniformly structured than in the general Web and thus crawlers can use the structure of Web pages for more precise data extraction and more expressive analysis. In this demonstration, we present a focused, structure-based crawler generator, the "Not so Creepy Crawler" (nc2). What sets nc2 apart is that all analysis and decision tasks of the crawling process are delegated to an (arbitrary) XML query engine such as XQuery or Xcerpt. Customizing crawlers just means writing (declarative) XML queries that can access the currently crawled document as well as the metadata of the crawl process. We identify four types of queries that together suffice to realize a wide variety of focused crawlers. We demonstrate nc2 with two applications: The first extracts data about cities from Wikipedia with a customizable set of attributes for selecting and reporting these cities. It illustrates the power of nc2 where data extraction from Wiki-style, fairly homogeneous knowledge sites is required. In contrast, the second use case demonstrates how easy nc2 makes even complex analysis tasks on social networking sites, here exemplified by last.fm.
Wang, Gang & Forsyth, David Object image retrieval by exploiting online knowledge resources 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, June 23, 2008 - June 28, 2008 Anchorage, AK, United states 2008 [1,152]
We describe a method to retrieve images found on web pages with specified object class labels, using an analysis of text around the image and of image appearance. Our method determines whether an object is both described in text and appears in an image using a discriminative image model and a generative text model. Our models are learnt by exploiting established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). These resources provide rich text and object appearance information. We describe results on two data sets. The first is Berg's collection of ten animal categories; on this data set, we outperform previous approaches [7, 33]. We have also collected five more categories. Experimental results show the effectiveness of our approach on this new data set.
Pedro, Vasco Calais; Niculescu, Radu Stefan & Lita, Lucian Vlad Okinet: Automatic extraction of a medical ontology from wikipedia 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
The medical domain provides a fertile ground for the application of ontological knowledge. Ontologies are an essential part of many approaches to medical text processing, understanding and reasoning. However, the creation of large, high quality medical ontologies is not trivial, requiring the analysis of domain sources, background knowledge, as well as obtaining consensus among experts. Current methods are labor intensive, prone to generate inconsistencies, and often require expert knowledge. Fortunately, semi-structured information repositories, like Wikipedia, provide a valuable resource from which to mine structured information. In this paper we propose a novel framework for automatically creating medical ontologies from semi-structured data. As part of this framework, we present a Directional Feedback Edge Labeling (DFEL) algorithm. We successfully demonstrate the effectiveness of the DFEL algorithm on the task of labeling the relations of Okinet, a Wikipedia-based medical ontology. Current results demonstrate the high performance, utility, and flexibility of our approach. We conclude by describing ROSE, an application that combines Okinet with other medical ontologies.
Ortega, Felipe; Izquierdo-Cortazar, Daniel; Gonzalez-Barahona, Jesus M. & Robles, Gregorio On the analysis of contributions from privileged users in virtual open communities 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009 Waikoloa, HI, United states 2009 [1,153]
Collaborative projects built around virtual communities on the Internet have gained momentum over the last decade. Nevertheless, their rapid growth rate raises some questions: what is the most effective approach to manage and organize their content creation process? Can these communities scale, controlling their projects as their size continues to grow over time? To answer these questions, we undertake a quantitative analysis of privileged users in FLOSS development projects and in Wikipedia. From our results, we conclude that the inequality level of user contributions in both types of initiatives is remarkably distinct, even though both communities present almost identical patterns regarding the number of distinct contributors per file (in FLOSS projects) or per article (in Wikipedia). As a result, totally open projects like Wikipedia can effectively deal with faster growth rates, while FLOSS projects may be affected by bottlenecks on committers who play critical roles.
Ortega, Felipe; Gonzalez-Barahona, Jesus M. & Robles, Gregorio On the inequality of contributions to wikipedia 41st Annual Hawaii International Conference on System Sciences 2008, HICSS, January 7, 2008 - January 10, 2008 Big Island, HI, United states 2008 [1,154]
Wikipedia is one of the most successful examples of massive collaborative content development. However, many of the mechanisms and procedures that it uses are still not known in detail. For instance, how equal (or unequal) the contributions to it are has been discussed in recent years, with no conclusive results. In this paper, we study exactly that aspect by using Lorenz curves and Gini coefficients, instruments very well known to economists. We analyze the trends in the inequality of distributions for the ten biggest language editions of Wikipedia, and their evolution over time. As a result, we have found large differences in the number of contributions by different authors (something also observed in free, open source software development), and a trend towards stable patterns of inequality in the long run.
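Illustrative sketch (not from the paper): the Gini coefficient of per-author contribution counts, computed with the standard Lorenz-curve-based formula used in inequality analysis; the toy data is made up (Python).

def gini(contributions):
    """Standard Gini coefficient: 0 = perfectly equal, values near 1 = highly unequal."""
    xs = sorted(contributions)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

print(gini([1, 1, 2, 3, 500]))   # hypothetical edit counts; highly unequal, so close to 1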
Suda, Martin; Weidenbach, Christoph & Wischnewski, Patrick On the saturation of YAGO 5th International Joint Conference on Automated Reasoning, IJCAR 2010, July 16, 2010 - July 19, 2010 Edinburgh, United kingdom 2010 [1,155]
YAGO is an ontology automatically generated from Wikipedia and WordNet. It is eventually represented in a proprietary flat text file format, and its core comprises 10 million facts and formulas. We present a translation of YAGO into the Bernays-Schonfinkel Horn class with equality. A new variant of the superposition calculus is sound, complete and terminating for this class. Together with extended term indexing data structures, the new calculus is implemented in Spass-YAGO. YAGO can be finitely saturated by Spass-YAGO in about 1 hour. We have found 49 inconsistencies in the original generated ontology, which we have fixed. Spass-YAGO can then prove non-trivial conjectures with respect to the resulting saturated and consistent clause set of about 1.4 GB in less than one second.
Wang, Huan; Jiang, Xing; Chia, Liang-Tien & Tan, Ah-Hwee Ontology enhanced web image retrieval: Aided by wikipedia spreading activation theory 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08, August 30, 2008 - August 31, 2008 Vancouver, BC, Canada 2008 [1,156]
Ontology, as an effective approach to bridge the semantic gap in various domains, has attracted a lot of interest from multimedia researchers. Among the numerous possibilities enabled by ontology, we are particularly interested in exploiting ontology for a better understanding of media (particularly, images) on the World Wide Web. To achieve our goal, two open issues are inevitably involved: 1) How to avoid the tedious manual work for ontology construction? 2) What are the effective inference models when using an ontology? Recent works [11, 16] about ontologies learned from Wikipedia have been reported in conferences targeting the areas of knowledge management and artificial intelligence. There are also reports of different inference models being investigated [5, 13, 15]. However, so far there has not been any comprehensive solution. In this paper, we look at these challenges and attempt to provide a general solution to both questions. Through a careful analysis of the online encyclopedia Wikipedia's categorization and page content, we choose it as our knowledge source and propose an automatic ontology construction approach. We prove that it is a viable way to build ontologies under various domains. To address the inference model issue, we provide a novel understanding of the ontology and consider it as a type of semantic network, which is similar to brain models in the cognitive research field. Spreading Activation Techniques, which have been proved to be a correct information processing model in the semantic network, are consequently introduced for inference. We have implemented a prototype system with the developed solutions for web image retrieval. By comprehensive experiments on the canine category of the animal kingdom, we show that this is a scalable architecture for our proposed methods.
Yu, Jonathan; Thom, James A. & Tam, Audrey Ontology evaluation using wikipedia categories for browsing Proceedings of the sixteenth ACM conference on Conference on information and knowledge management 2007 [1,157]
Ontology evaluation is a maturing discipline with methodologies and measures being developed and proposed. However, evaluation methods that have been proposed have not been applied to specific examples. In this paper, we present the state-of-the-art in ontology evaluation - current methodologies, criteria and measures, analyse appropriate evaluations that are important to our application - browsing in Wikipedia, and apply these evaluations in the context of ontologies with varied properties. Specifically, we seek to evaluate ontologies based on categories found in Wikipedia.
Aleahmad, Turadg; Aleven, Vincent & Kraut, Robert Open community authoring of targeted worked example problems 9th International Conference on Intelligent Tutoring Systems, ITS 2008, June 23, 2008 - June 27, 2008 Montreal, QC, Canada 2008 [1,158]
Open collaborative authoring systems such as Wikipedia are growing in use and impact. How well does this model work for the development of educational resources? In particular, can volunteers contribute materials of sufficient quality? Could they create resources that meet students' specific learning needs and engage their personal characteristics? Our experiment explored these questions using a novel web-based tool for authoring worked examples. Participants were professional teachers (math and non-math) and amateurs. Participants were randomly assigned to the basic tool, or to an enhanced version that prompted authors to create materials for a specific (fictitious) student. We find that while there are differences by teaching status, all three groups make contributions of worth and that targeting a specific student leads contributors to author materials with greater potential to engage students. The experiment suggests that community authoring of educational resources is a feasible model of development and can enable new levels of personalization. 2008 Springer-Verlag Berlin Heidelberg.
Wu, Fei & Weld, Daniel S. Open information extraction using Wikipedia Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,159]
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
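Illustrative sketch (not from the paper): the flavor of heuristic self-labelling described above, where a sentence is taken as a positive example for an infobox attribute when it mentions both the article subject and the attribute value; the matching rule, names and data are deliberately crude assumptions (Python).

import re

def label_sentences(subject, infobox, article_text):
    """Return (sentence, attribute, value) triples where subject and value co-occur."""
    sentences = re.split(r"(?<=[.!?])\s+", article_text)
    examples = []
    for attr, value in infobox.items():
        for sent in sentences:
            if subject in sent and value in sent:
                examples.append((sent, attr, value))
    return examples

infobox = {"spouse": "Jane Doe", "birth_place": "Springfield"}          # hypothetical infobox
text = "John Smith was born in Springfield. John Smith married Jane Doe in 1990."
print(label_sentences("John Smith", infobox, text))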
Utiyama, Masao; Tanimura, Midori & Isahara, Hitoshi Organizing English reading materials for vocabulary learning Proceedings of the ACL 2005 on Interactive poster and demonstration sessions 2005 [1,160]
We propose a method of organizing reading materials for vocabulary learning. It enables us to select a concise set of reading texts (from a target corpus) that contains all the target vocabulary to be learned. We used a specialized vocabulary for an English certification test as the target vocabulary and used English Wikipedia, a free-content encyclopedia, as the target corpus. The organized reading materials would enable learners not only to study the target vocabulary efficiently but also to gain a variety of knowledge through reading. The reading materials are available on our web site.
Gorgeon, Arnaud & Swanson, E. Burton Organizing the vision for web 2.0: a study of the evolution of the concept in Wikipedia Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,161]
Information Systems (IS) innovations are often characterized by buzzwords, reflecting organizing visions that structure and express the images and ideas formed by a wide community of users about their meaning and purpose. In this paper, we examine the evolution of Web 2.0, a buzzword that is now part of the discourse of a broad community, and look at its entry in Wikipedia over the three years since its inception in March 2005. We imported the revision history from Wikipedia, and analyzed and categorized the edits that were performed and the users that contributed to the article. The patterns of evolution of the types and numbers of contributors and edits lead us to propose four major periods in the evolution of the Web 2.0 article: Seeding, Germination, Growth and Maturity. During the Seeding period, the article evolved mostly underground, with few edits and few contributors active. The article's growth took off during the Germination period, receiving increasing attention. Growth was the most active period of development, but also the most controversial. During the last period, Maturity, the article received a decreasing level of attention, with current and potential contributors losing interest, as a consensus about what the concept of Web 2.0 means seemed to have been reached.
Basile, Pierpaolo; Gemmis, Marco De; Lops, Pasquale & Semeraro, Giovanni OTTHO: On the tip of my THOught European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2009, September 7, 2009 - September 11, 2009 Bled, Slovenia 2009 [1,162]
This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game called Guillotine. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system exploits several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia to realize a knowledge infusion process. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Our feeling is that the approach presented in this work has a great potential for other more practical applications besides solving a language game. 2009 Springer Berlin Heidelberg.
Paşca, Marius Outclassing Wikipedia in open-domain information extraction: weakly-supervised acquisition of attributes over conceptual hierarchies Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics 2009 [1,163]
A set of labeled classes of instances is extracted from text and linked into an existing conceptual hierarchy. Besides a significant increase in the coverage of the class labels assigned to individual instances, the resulting resource of labeled classes is more effective than similar data derived from the manually-created Wikipedia, in the task of attribute extraction over conceptual hierarchies.
Gabrilovich, E. & Markovitch, S. Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge Twenty-First National Conference on Artificial Intelligence (AAAI-06). Eighteenth Innovative Applications of Artificial Intelligence Conference (IAAI-06), 16-20 July 2006 Menlo Park, CA, USA 2007
When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle: they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time", how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group", how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We propose to enrich document representation through automatic use of a vast compendium of human knowledge: an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept ...
Ingawale, Myshkin; Roy, Rahul & Seetharaman, Priya Persistence of Cultural Norms in Online Communities: The Curious Case of WikiLove 2009 [1,164]
Tremendous progress in information and communication technologies in the last two decades has enabled the phenomenon of Internet-based groups and collectives, generally referred to as online communities. Many online communities have developed distinct cultures of their own, with accompanying norms. A particular research puzzle is the persistence and stability of such norms in online communities, even in the face of often exponential growth rates in uninitiated new users. We propose a network-theoretic approach to explain this persistence. Our approach consists of modelling the online community as a network of interactions, and representing cultural norms as transmissible ideas (or ‘memes’) propagating through this network. We argue that persistence of a norm over time depends, amongst other things, on the structure of the network through which it propagates. Using previous results from Network Science and Epidemiology, we show that certain structures are better than others to ensure persistence: namely, structures which have scale-free degree distributions and assortative mixing. We illustrate this theory using the case of the community of contributors at Wikipedia, a collaboratively generated online encyclopaedia.
Mimno, David; Wallach, Hanna M.; Naradowsky, Jason; Smith, David A. & McCallum, Andrew Polylingual topic models Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2 2009 [1,165]
Converse, Tim; Kaplan, Ronald M.; Pell, Barney; Prevost, Scott; Thione, Lorenzo & Walters, Chad Powerset's natural language wikipedia search engine 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
This demonstration shows the capabilities and features of Powerset's natural language search engine as applied to the English Wikipedia. Powerset has assembled scalable document retrieval technology to construct a semantic index of the World Wide Web. In order to develop and test our technology, we have released a search product (at http://www.powerset.com) that incorporates all the information from the English Wikipedia. The product also integrates community-edited content from Metaweb's Freebase database of structured information. Users may query the index using keywords, natural language questions or phrases. Retrieval latency is comparable to standard keyword-based consumer search engines. Powerset semantic indexing is based on the XLE, natural language processing technology licensed from the Palo Alto Research Center (PARC). During both indexing and querying, we apply our deep natural language analysis methods to extract "semantic facts" - relations and semantic connections between words and concepts - from all the sentences in Wikipedia. At query time, advanced search-engineering technology makes these facts available for retrieval by matching them against facts or partial facts extracted from the query. In this demonstration we show how retrieved information is presented as conventional search results with links to relevant Wikipedia pages. We also demonstrate how the distilled semantic relations are organized in a browsing format that shows relevant subject/relation/object triples related to the user's query. This makes it easy both to find other relevant pages and to use our Search-Within-The-Page feature to localize additional semantic searches to the text of the selected target page. Together these features summarize the facts on a page and allow navigation directly to information of interest to individual users. Looking ahead, beyond continuous improvements to core search and scaling to much larger collections of content, Powerset's automatic extraction of semantic facts can be used to create and extend knowledge resources, including lexicons and ontologies ...
Leskovec, Jure; Huttenlocher, Daniel & Kleinberg, Jon Predicting positive and negative links in online social networks Proceedings of the 19th international conference on World wide web 2010 [1,166]
We study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). Such a mix of positive and negative links arises in a variety of online settings; we study datasets from Epinions, Slashdot and Wikipedia. We find that the signs of links in the underlying social networks can be predicted with high accuracy, using models that generalize across this diverse range of sites. These models provide insight into some of the fundamental principles that drive the formation of signed links in networks, shedding light on theories of balance and status from social psychology; they also suggest social computing applications by which the attitude of one user toward another can be estimated from evidence provided by their relationships with other members of the surrounding social network.
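Illustrative sketch (not from the paper): edge-sign prediction cast as supervised classification over simple degree features of the two endpoints; the paper's feature set (degree and triad features) is richer, and the toy data below is made up (Python).

import numpy as np
from sklearn.linear_model import LogisticRegression

# each row: (out_pos_deg_u, out_neg_deg_u, in_pos_deg_v, in_neg_deg_v) for a directed edge u -> v
X = np.array([[5, 0, 7, 1], [0, 4, 1, 6], [3, 1, 4, 0], [1, 5, 0, 3]])
y = np.array([1, -1, 1, -1])   # observed edge signs

clf = LogisticRegression().fit(X, y)
print(clf.predict([[4, 1, 5, 0]]))   # predicted sign of a new edge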
Wissner-Gross, Alexander D. Preparation of topical reading lists from the link structure of Wikipedia 6th International Conference on Advanced Learning Technologies, ICALT 2006, July 5, 2006 - July 7, 2006 Kerkrade, Netherlands 2006
Personalized reading preparation poses an important challenge for education and continuing education. Using a PageRank derivative and graph distance ordering, we show that personalized background reading lists can be generated automatically from the link structure of Wikipedia. We examine the operation of our new tool in professional, student, and interdisciplinary researcher learning models. Additionally, we present desktop and mobile interfaces for the generated reading lists.
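As a minimal sketch of the kind of link-structure computation the abstract mentions, the code below runs a personalized PageRank (a common PageRank derivative) over a small, made-up article link graph and orders articles by score. The graph, seed article and parameters are illustrative assumptions, not the paper's tool or data.

```python
# Personalized PageRank over a toy Wikipedia-style link graph, then order
# articles by score to form a rough background reading list.

def personalized_pagerank(links, seed, damping=0.85, iters=50):
    nodes = set(links) | {v for vs in links.values() for v in vs}
    rank = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) * (1.0 if n == seed else 0.0) for n in nodes}
        for u, outs in links.items():
            if not outs:
                continue
            share = damping * rank[u] / len(outs)   # spread rank along outlinks
            for v in outs:
                new[v] += share
        rank = new
    return rank

links = {  # hypothetical article link graph
    "Machine learning": ["Statistics", "Artificial intelligence"],
    "Artificial intelligence": ["Logic", "Machine learning"],
    "Statistics": ["Probability"],
    "Probability": ["Statistics"],
    "Logic": [],
}
scores = personalized_pagerank(links, seed="Machine learning")
reading_list = sorted(scores, key=scores.get, reverse=True)
print(reading_list)
```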
Dondio, Pierpaolo & Barrett, Stephen Presumptive selection of trust evidence Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems 2007 [1,167]
This paper proposes a generic method for identifying elements in a domain that can be used as trust evidences. As an alternative to external infrastructure-based approaches relying on certificates or user recommendations, we propose a computation based on evidences gathered directly from application elements that have been recognized to have a trust meaning. However, when the selection of evidences is done using a dedicated infrastructure or users' collaboration, it remains a well-bounded problem. Instead, when evidences must be selected directly from domain activity, selection is generally unsystematic and subjective, typically resulting in an unbounded problem. To address these issues, our paper proposes a general methodology for selecting trust evidences among elements of the domain under analysis. The method uses presumptive reasoning combined with a human-based and intuitive notion of trust. Using the method, the problem of evidence selection becomes the critical analysis of identified evidences' plausibility against the situation and their logical consistency. We present an evaluation, in the context of the Wikipedia project, in which trust predictions based on evidences identified by our method are compared to a computation based on domain-specific expertise.
Moon, Hyun J.; Curino, Carlo A.; Ham, Myungwon & Zaniolo, Carlo PRIMA: archiving and querying historical data with evolving schemas Proceedings of the 35th SIGMOD international conference on Management of data 2009 [1,168]
Schema evolution poses serious challenges in historical data management. Traditionally, historical data have been archived either by (i) migrating them into the current schema version that is well-understood by users but compromising archival quality, or (ii) by maintaining them under the original schema version in which the data was originally created, leading to perfect archival quality, but forcing users to formulate queries against complex histories of evolving schemas. In the PRIMA system, we achieve the best of both approaches, by (i) archiving historical data under the schema version under which they were originally created, and (ii) letting users express temporal queries using the current schema version. Thus, in PRIMA, the system rewrites the queries to the (potentially many) pertinent versions of the evolving schema. Moreover, the system offers automatic documentation of the schema history, and allows the users to pose temporal queries over the metadata history itself. The proposed demonstration highlights the system features exploiting both a synthetic-educational running example and real-life evolution histories (schemas and data), which include hundreds of schema versions from Wikipedia and Ensembl. The demonstration offers a thorough walk-through of the system features and a hands-on system testing phase, where the audience is invited to directly interact with the advanced query interface of PRIMA.
Riehle, Dirk & Noble, James Proceedings of the 2006 international symposium on Wikis HT '06 17th Conference on Hypertext and Hypermedia 2006 [1,169]
It is our great pleasure to welcome you to the 2nd International Symposium on Wikis -- WikiSym 2006. As its name suggests, this is the second meeting of the academic community of research, practice, and use that has built up around lightweight editable web sites. Growing from Ward Cunningham's original wiki --- the Portland Pattern Repository --- up to Wikipedia, one of the largest sites on the Web (over 1.2 million pages in English alone at time of writing), wikis are rapidly becoming commonplace in the experience of millions of people every day. The Wiki Symposium exists to report new developments in wiki technology, to describe the state of practice, and to reflect on that experience. The call for papers for this year's symposium attracted 27 submissions of research papers and practitioner reports, on a wide variety of topics. Ten research papers and one practitioner report were accepted by the symposium programme committee for presentation at the symposium; these presentations make up the symposium's technical programme. Every submitted paper was reviewed by at least three members of the programme committee, while papers submitted by committee members were reviewed by at least four committee members who had not themselves submitted papers to the symposium. The symposium proceedings also include abstracts for the keynote talks, panels, workshops, and demonstrations to provide a record of the whole of the symposium, as well as an interview with Angela Beesley, the opening keynote speaker, on the topic of her talk.
Aguiar, Ademar & Bernstein, Mark Proceedings of the 4th International Symposium on Wikis WikiSym08 2008 International Symposium on Wikis 2008 [1,170]
Welcome to the 4th International Symposium on Wikis, Porto, Portugal. The Faculty of Engineering of the University of Porto (FEUP) is honoured to host on its campus this year's WikiSym - the premier conference devoted to research and practice on wikis. Once again, WikiSym will gather researchers, professionals, writers, scholars and users to share knowledge and experiences on many topics related to wikis and wiki philosophy, ranging from wiki linguistics to graphic visualization in wikis, and from the vast Wikipedia to tiny location-based wikis. A large and diverse program committee exhaustively reviewed more than fifty technical papers, from which the highest-quality ones were selected. During a meeting at Porto, the final structure of the conference program was defined, and later consolidated with workshops, panels, tutorials, posters and demos. For the first time, there will be a Doctoral Space for young researchers, and the WikiFest, devoted to practitioners. More than twenty hours of OpenSpace will be available for you to fill in. After the conference, WikiWalk will take the symposium out into the streetscape, as investigators and users walk through the city of Porto, joined by citizens, journalists, and other leaders. Casual and spontaneous discussions will allow the users to share their experiences, concerns and challenges with wiki experts and researchers. Such a diverse conference program reflects the nature of wikis, the tremendous vitality of the wiki spirit, and its ever-widening community.
Ayers, Phoebe & Ortega, Felipe Proceedings of the 6th International Symposium on Wikis and Open Collaboration WikiSym '10 2010 International Symposium on Wikis and Open Collaboration 2010 [1,171]
Welcome to WikiSym 2010, the 6th International Symposium on Wikis and Open Collaboration! WikiSym 2010 is located in the picturesque city of Gdansk, Poland, at the Dom Muzyka, a historic music academy. The event includes 3 days of cutting-edge research and practice on topics related to open collaboration. These proceedings of WikiSym 2010 are intended to act as a permanent record of the conference activities. This year, for the first time, WikiSym is co-located with Wikimania 2010, the international community conference of the Wikimedia Foundation projects, which is taking place right after WikiSym. The general program of WikiSym 2010 builds on the success of previous years, formally embracing different aspects of open collaboration research and practice. To support this, for the first time the program is divided into 3 complementary tracks, each focusing on a specific area of interest in this field. The Wiki Track includes contributions specifically dealing with research, deployment, use and management of wiki platforms and the communities around them. The Industry Track draws together practitioners, entrepreneurs and industry managers and employees to better understand open collaboration ecosystems in corporate environments. Finally, the Open Collaboration Track comprises all other aspects related to open cooperative initiatives and projects. Related to this, you will find a growing number of contributions dealing with nontechnical perspectives of open collaboration, such as debates on educational resources and sociopolitical aspects. You will also find the traditional technical papers, plus tutorials, workshops, panels and demos. The success of the new broadened scope of WikiSym reflects the very high interest in wikis and open collaboration existing today. Cliff Lampe from Michigan State University will be opening the symposium with a talk on "The Machine in the Ghost: A SocioTechnical Systems Approach to User-Generated Content Research". Likewise, Andrew Lih will be giving the closing keynote session on "What Hath Wikipedia Wrought". These represent only two of the talks and sessions that attendees will find at WikiSym 2010. Forty-one research papers were submitted this year to the academic program and sixteen were accepted, for an acceptance rate of 39%. All papers were reviewed by at least three reviewers, though some of them had up to five different reviewers. Authors of accepted papers come from 18 different countries.
Nack, Frank Proceedings of the ACM workshop on Multimedia for human communication: from capture to convey SPAA99 11th Annual ACM Symposium on Parallel Algorithms and Architectures 2005 [1,172]
It gives us great pleasure to welcome you to the 1st ACM International Workshop on Multimedia for Human Communication -- From Capture to Convey (MHC'05). This workshop was inspired by the Dagstuhl meeting 05091 "Multimedia Research -- where do we need to go tomorrow?" (http://www.dagstuhl.de/05091/), organised by Susanne Boll, Ramesh Jain, Tat-Seng Chua and Nevenka Dimitrova. Members of the working group were: Lynda Hardman, Brigitte Kerhervé, Stephen Kimani
Gil, Yolanda & Noy, Natasha Proceedings of the fifth international conference on Knowledge capture K-CAP '09 Fifth International Conference on Knowledge Capture 2009 2009 [1,173]
In today's knowledge-driven world, effective access to and use of information is a key enabler for progress. Modern technologies not only are themselves knowledge-intensive technologies, but also produce enormous amounts of new information that we must process and aggregate. These technologies require knowledge capture, which involves the extraction of useful knowledge from vast and diverse sources of information as well as its acquisition directly from users. Driven by the demands for knowledge-based applications and the unprecedented availability of information on the Web, the study of knowledge capture has a renewed importance. This volume presents the papers and poster and demo descriptions for the Fifth International Conference on Knowledge Capture (K-CAP 2009). K-CAP 2009 brought together researchers who belong to several distinct research communities, including knowledge engineering, machine learning, natural language processing, human-computer interaction, artificial intelligence and the Semantic Web. This year's conference continues its tradition of being the premier forum for presentation of research results and experience reports on leading edge issues of knowledge capture. The call for papers attracted 81 submissions from Asia, Europe, and North America. The international program committee accepted 21 papers that cover a variety of topics, including research on knowledge extraction, ontologies and vocabularies, interactive systems, evaluation of knowledge-based systems, and other topics. In addition, this volume includes descriptions of 21 posters and demos that were presented at the conference. The K-CAP 2009 program included two keynote talks. Professor Daniel Weld gave the keynote address entitled "Machine Reading: from Wikipedia to the Web". Professor Nigel Shadbolt talked about "Web Science: A New Frontier" in his keynote address. Two tutorials and four workshops rounded out the conference program. We hope that these proceedings will serve as a valuable reference for researchers and developers.
Proceedings of WikiSym 2010 - The 6th International Symposium on Wikis and Open Collaboration 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010
The proceedings contain 35 papers. The topics discussed include: who integrates the networks of knowledge in wikipedia?; deep hypertext with embedded revision control implemented in regular expressions; semantic search on heterogeneous wiki systems; wikis at work: success factors and challenges for sustainability of enterprise wikis; model-aware wiki analysis tools: the case of HistoryFlow; ThinkFree: using a visual wiki for IT knowledge management in a tertiary institution; openness as an asset: a classification system for online communities based on actor-network theory; the Austrian way of wiki(pedia)! development of a structured wiki-based encyclopedia within a local Austrian context; a wiki-based collective intelligence approach to formulate a body of knowledge (BOK) for a new discipline; project management in the wikipedia community; a taxonomy of wiki genres in enterprise settings; and towards sensitive information redaction in a collaborative multilevel security environment.
Proceedings of WikiSym'06 - 2006 International Symposium on Wikis WikiSym'06 - 2006 International Symposium on Wikis, August 21, 2006 - August 23, 2006 Odense, Denmark 2006
The proceedings contain 26 papers. The topics discussed include: how and why wikipedia works; how and why wikipedia works: an interview with Angela Beesley, Elisabeth Bauer, and Kizu Naoko; intimate information: organic hypertext structure and incremental; the augmented wiki; wiki uses in teaching and learning; the future of wikis; translation the wiki way; the radeox wiki render engine; is there a space for the teacher in a WIKI?; wikitrails: augmenting wiki structure for collaborative, interdisciplinary learning; towards wikis as semantic hypermedia; constrained wiki: an oxymoron?; corporate wiki users: results of a survey; workshop on wikipedia research; wiki markup standard workshop; wiki-based knowledge engineering: second workshop on semantic wikis; semantic wikipedia; and ontowiki: community-driven ontology engineering and ontology usage based on wikis.
Ung, Hang & Dalle, Jean-Michel Project management in the Wikipedia community Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,174]
A feature of online communities, and notably Wikipedia, is the increasing use of managerial techniques to coordinate the efforts of volunteers. In this short paper, we explore the influence of the organization of Wikipedia into so-called projects. We examine the project-based coordination activity and find bursts of activity, which appear to be related to individual leadership. Using time series, we show that coordination activity is positively correlated with contributions on articles. Finally, we bring evidence that this positive correlation relies on two types of coordination: group coordination, with project leadership and article editors strongly coinciding, and directed coordination, with differentiated online roles.
Yin, Xiaoshi; Huang, Xiangji & Li, Zhoujun Promoting Ranking Diversity for Biomedical Information Retrieval Using Wikipedia Advances in Information Retrieval. 32nd European Conference on IR Research, ECIR 2010, 28-31 March 2010 Berlin, Germany 2010
In this paper, we propose a cost-based re-ranking method to promote ranking diversity for biomedical information retrieval. The proposed method is concerned with finding passages that cover many different aspects of a query topic. First, aspects covered by retrieved passages are detected and explicitly represented by Wikipedia concepts. Then, an aspect filter based on a two-stage model is introduced. It ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, retrieved passages are re-ranked using the proposed cost-based re-ranking method, which ranks a passage according to the number of new aspects covered by the passage and the query-relevance of aspects covered by the passage. A series of experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed method in promoting ranking diversity for biomedical information retrieval.
Yin, Xiaoshi; Huang, Xiangji & Li, Zhoujun Promoting ranking diversity for biomedical information retrieval using wikipedia 32nd European Conference on Information Retrieval, ECIR 2010, March 28, 2010 - March 31, 2010 Milton Keynes, United Kingdom 2010 [1,175]
In this paper, we propose a cost-based re-ranking method to promote ranking diversity for biomedical information retrieval. The proposed method is concerned with finding passages that cover many different aspects of a query topic. First, aspects covered by retrieved passages are detected and explicitly represented by Wikipedia concepts. Then, an aspect filter based on a two-stage model is introduced. It ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, retrieved passages are re-ranked using the proposed cost-based re-ranking method, which ranks a passage according to the number of new aspects covered by the passage and the query-relevance of aspects covered by the passage. A series of experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed method in promoting ranking diversity for biomedical information retrieval.
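As an illustration of the general idea behind such diversity-aware re-ranking (not the paper's exact cost function or aspect detection), the sketch below greedily re-orders passages so that covering aspects not yet seen higher in the list is rewarded. The passages, aspect sets and novelty weight are invented for the example.

```python
# Greedy re-ranking that rewards passages covering aspects (here: Wikipedia
# concepts) that have not yet appeared higher in the list.

def rerank(passages, novelty_weight=1.0):
    """passages: list of (passage_id, relevance, set_of_aspects)."""
    covered, ranked = set(), []
    remaining = list(passages)
    while remaining:
        def gain(p):
            _, rel, aspects = p
            return rel + novelty_weight * len(aspects - covered)
        best = max(remaining, key=gain)   # highest relevance + novelty gain
        remaining.remove(best)
        covered |= best[2]
        ranked.append(best[0])
    return ranked

passages = [
    ("p1", 0.9, {"BRCA1", "breast cancer"}),
    ("p2", 0.8, {"BRCA1"}),
    ("p3", 0.6, {"gene therapy", "breast cancer"}),
]
print(rerank(passages))  # ['p1', 'p3', 'p2']: p3 rises because it adds a new aspect
```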
Luther, Kurt; Flaschen, Matthew; Forte, Andrea; Jordan, Christopher & Bruckman, Amy ProveIt: a new tool for supporting citation in MediaWiki Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,176]
ProveIt is an extension to the Mozilla Firefox browser designed to support editors in citing sources in Wikipedia and other projects that use the MediaWiki platform.
Nakatani, Makoto; Jatowt, Adam; Ohshima, Hiroaki & Tanaka, Katsumi Quality evaluation of search results by typicality and speciality of terms extracted from wikipedia 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009, April 21, 2009 - April 23, 2009 Brisbane, QLD, Australia 2009 [1,177]
In Web search, it is often difficult for users to judge which page they should choose among search results and which page provides high-quality and credible content. For example, some results may describe query topics from narrow or inclined viewpoints, or they may contain only shallow information. While there are many factors influencing quality perception of search results, we propose two important aspects that determine their usefulness: "topic coverage" and "topic detailedness". "Topic coverage" means the extent to which a page covers typical topics related to the query terms. On the other hand, "topic detailedness" measures how many special topics are discussed in a Web page. We propose a method to discover typical topic terms and special topic terms for a search query by using the information gained from the structural features of Wikipedia, the free encyclopedia. Moreover, we propose an application to calculate topic coverage and topic detailedness of Web search results by using terms extracted from Wikipedia.
Reinoso, Antonio J.; Gonzalez-Barahona, Jesus M.; Ortega, Felipe & Robles, Greogrio Quantitative analysis and characterization of Wikipedia requests Proceedings of the 4th International Symposium on Wikis 2008 [1,178]
Our poster describes the quantitative analysis carried out to study the use of the Wikipedia system by its users, with special focus on the identification of time and kind-of-use patterns, characterization of traffic and workload, and comparative analysis of different language editions. By filtering and classifying a large sample of the requests directed to the Wikimedia systems over 7 days, we have been able to identify important information such as the targeted namespaces, the visited resources or the requested actions. The results found include the identification of weekly and daily patterns, and several correlations between different actions on the articles. In summary, the study shows an overall picture of how the most visited language editions of Wikipedia are being accessed by their users.
Ortega, Felipe & Barahona, Jesus M. Gonzalez Quantitative analysis of the wikipedia community of users Proceedings of the 2007 international symposium on Wikis 2007 [1,179]
Many activities of editors in Wikipedia can be traced using its database dumps, which register detailed information about every single change to every article. Several researchers have used this information to gain knowledge about the production process of articles, and about activity patterns of authors. In this analysis, we have focused on one of those previous works, by Kittur et al. First, we have followed the same methodology with more recent and comprehensive data. Then, we have extended this methodology to precisely identify which fraction of authors are producing most of the changes in Wikipedia's articles, and how the behaviour of these authors evolves over time. This enabled us not only to validate some of the previous results, but also to find new interesting evidence. We have found that the analysis of sysops is not a good method for estimating different levels of contributions, since it is dependent on the policy for electing them (which changes over time and for each language). Moreover, we have found new activity patterns by classifying authors by their contributions during specific periods of time, instead of using their total number of contributions over the whole life of Wikipedia. Finally, we present a tool that automates this extended methodology, implementing a quick and complete quantitative analysis of every language edition in Wikipedia.
Xu, Yang; Jones, Gareth J.F. & Wang, Bin Query dependent pseudo-relevance feedback based on wikipedia Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,180]
Pseudo-relevance feedback (PRF) via query-expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query-dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
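A minimal sketch of the basic PRF-with-Wikipedia idea follows: score candidate expansion terms by their frequency in pseudo-relevant Wikipedia text and append the best ones to the query. The feedback snippets, stop list and scoring are stand-ins for illustration; the paper's query-type classification and its three term-selection models are not reproduced here.

```python
# Pick expansion terms from pseudo-relevant Wikipedia text by raw frequency.
from collections import Counter

def expansion_terms(query_terms, feedback_docs, k=5):
    stop = {"the", "a", "of", "and", "is", "in", "to"}   # tiny illustrative stop list
    counts = Counter()
    for doc in feedback_docs:
        for term in doc.lower().split():
            if term not in stop and term not in query_terms:
                counts[term] += 1
    return [t for t, _ in counts.most_common(k)]

query = {"jaguar"}
feedback_docs = [   # placeholder snippets standing in for top-ranked Wikipedia articles
    "The jaguar is a large cat native to the Americas",
    "Jaguar Cars is a British luxury car manufacturer",
]
expanded_query = set(query) | set(expansion_terms(query, feedback_docs, k=3))
print(expanded_query)
```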
Xu, Yang; Jones, G.J.F. & Wang, Bin Query dependent pseudo-relevance feedback based on Wikipedia 32nd Annual ACM SIGIR Conference. SIGIR 2009, 19-23 July 2009 New York, NY, USA 2009
Pseudo-relevance feedback (PRF) via query-expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from an initial search are assumed to be relevant and used for PRF. One problem with this approach is that one or more of the top retrieved documents may be non-relevant, which can introduce noise into the feedback process. Besides, existing methods generally do not take into account the significantly different types of queries that are often entered into an IR system. Intuitively, Wikipedia can be seen as a large, manually edited document collection which could be exploited to improve document retrieval effectiveness within PRF. It is not obvious how we might best utilize information from Wikipedia in PRF, and to date, the potential of Wikipedia for this task has been largely unexplored. In our work, we present a systematic exploration of the utilization of Wikipedia in PRF for query-dependent expansion. Specifically, we classify TREC topics into three categories based on Wikipedia: 1) entity queries, 2) ambiguous queries, and 3) broader queries. We propose and study the effectiveness of three methods for expansion term selection, each modeling the Wikipedia-based pseudo-relevance information from a different perspective. We incorporate the expansion terms into the original query and use language modeling IR to evaluate these methods. Experiments on four TREC test collections, including the large web collection GOV2, show that retrieval performance of each type of query can be improved. In addition, we demonstrate that the proposed method outperforms the baseline relevance model in terms of precision and robustness.
Kanhabua, Nattiya & Nørvåg, Kjetil QUEST: Query expansion using synonyms over time European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010, September 20, 2010 - September 24, 2010 Barcelona, Spain 2010 [1,181]
A particular problem of searching news archives with named entities is that they are very dynamic in appearance compared to other vocabulary terms, and synonym relationships between terms change with time. In previous work, we proposed an approach to extracting time-based synonyms of named entities from the whole history of Wikipedia. In this paper, we present QUEST (Query Expansion using Synonyms over Time), a system that exploits time-based synonyms in searching news archives. The system takes as input a named entity query, and automatically determines time-based synonyms for a given query with respect to time criteria. Query expansion using the determined synonyms can be employed in order to improve the retrieval effectiveness.
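The core mechanism described above, expanding a named-entity query with synonyms that were valid at the requested time, can be sketched as follows. The synonym table with validity intervals is an invented stand-in for what the paper derives from Wikipedia's edit history, and the interval check is a simplification of the system's time criteria.

```python
# Expand a named-entity query with synonyms valid in the requested year.

# entity -> list of (synonym, valid_from_year, valid_to_year); illustrative data
synonyms = {
    "Pope Benedict XVI": [("Joseph Ratzinger", 1927, 2005)],
    "Saint Petersburg": [("Leningrad", 1924, 1991), ("Petrograd", 1914, 1924)],
}

def expand(entity, year):
    terms = [entity]
    for syn, start, end in synonyms.get(entity, []):
        if start <= year <= end:     # keep only synonyms valid at that time
            terms.append(syn)
    return terms

print(expand("Saint Petersburg", 1950))  # ['Saint Petersburg', 'Leningrad']
```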
Zaragoza, Hugo; Rode, Henning; Mika, Peter; Atserias, Jordi; Ciaramita, Massimiliano & Attardi, Giuseppe Ranking very many typed entities on wikipedia Proceedings of the sixteenth ACM conference on Conference on information and knowledge management 2007 [1,182]
We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine to compute entity relevance. We evaluate these approaches on the real task of ranking Wikipedia entities typed with a state-of-the-art named-entity tagger. Results show that both approaches can greatly increase the performance of methods based only on passage retrieval.
Heilman, Michael & Smith, Noah A. Rating computer-generated questions with Mechanical Turk Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk 2010 [1,183]
We use Amazon Mechanical Turk to rate computer-generated reading comprehension questions about Wikipedia articles. Such application-specific ratings can be used to train statistical rankers to improve systems' final output, or to evaluate technologies that generate natural language. We discuss the question rating scheme we developed, assess the quality of the ratings that we gathered through Amazon Mechanical Turk, and show evidence that these ratings can be used to improve question generation.
Nguyen, Dat P. T.; Matsuo, Yutaka & Ishizuka, Mitsuru Relation extraction from Wikipedia using subtree mining AAAI-07/IAAI-07 Proceedings: 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, July 22, 2007 - July 26, 2007 Vancouver, BC, Canada 2007
The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia's English articles, which in turn can serve for intelligent systems to satisfy users' information needs. Our proposed method first anchors the appearance of entities in Wikipedia articles using some heuristic rules that are supported by their encyclopedic style. Therefore, it uses neither the Named Entity Recognizer (NER) nor the Coreference Resolution tool, which are sources of errors for relation extraction. It then classifies the relationships among entity pairs using SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. The innovations behind our work are the following: a) our method makes use of Wikipedia characteristics for entity allocation and entity classification, which are essential for relation extraction; b) our algorithm extracts a core tree, which accurately reflects a relationship between a given entity pair, and subsequently identifies key features with respect to the relationship from the core tree. We demonstrate the effectiveness of our approach through evaluation of manually annotated data from actual Wikipedia articles.
Utiyama, Masao & Yamamoto, Mikio Relevance feedback models for recommendation Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing 2006 [1,184]
We extended language modeling approaches in information retrieval (IR) to combine collaborative filtering (CF) and content-based filtering (CBF). Our approach is based on the analogy between IR and CF, especially between CF and relevance feedback (RF). Both CF and RF exploit users' preference/relevance judgments to recommend items. We first introduce a multinomial model that combines CF and CBF in a language modeling framework. We then generalize the model to another multinomial model that approximates the Polya distribution. This generalized model outperforms the multinomial model by 3.4% for CBF and 17.4% for CF in recommending English Wikipedia articles. The performance of the generalized model for three different datasets was comparable to that of a state-of-the-art item-based CF method.
Collins, Allan M. Rethinking Education in the Age of Technology Proceedings of the 9th international conference on Intelligent Tutoring Systems 2008 [1,185]
All around us people are learning with the aid of new technologies: children are playing complex video games, workers are taking online courses to get an advanced degree, students are taking courses at commercial learning centers to prepare for tests, adults are consulting Wikipedia, etc. New technologies create learning opportunities that challenge traditional schools and colleges. These new learning niches enable people of all ages to pursue learning on their own terms. People around the world are taking their education out of school into homes, libraries, Internet cafes, and workplaces, where they can decide what they want to learn, when they want to learn, and how they want to learn.
Elsas, Jonathan L.; Arguello, Jaime; Callan, Jamie & Carbonell, Jaime G. Retrieval and feedback models for blog feed search 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008, July 20, 2008 - July 24, 2008 Singapore, Singapore 2008 [1,186]
Blog feed search poses different and interesting challenges from traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task [12]. We also show that typical query expansion techniques such as pseudo-relevance feedback using the blog corpus do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This query expansion technique provides significant and consistent performance improvements for this task, yielding a 22% and 14% improvement in MAP over the unexpanded query for our baseline and federated algorithms respectively.
Jordan, Chris & Watters, Carolyn Retrieval of single wikipedia articles while reading abstracts 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009 Waikoloa, {HI, United states 2009 [1,187]
When reading online, users sometimes need auxiliary information to complement or fill in their own background knowledge in order to better understand a document that they are reading. We believe that delivering this information in the least intrusive fashion possible will improve their understanding. We have prototyped a system that selects a single Wikipedia article for users when they highlight text in an abstract. This prototype employs a contextual retrieval algorithm developed for high precision retrieval of Wikipedia articles that uses the terms in the abstract, currently being read, as a context for the search. The results from our evaluation reveal that the top-performing algorithm is able to respond with a single relevant article 77% of the time. The user study that we conducted indicates that participants have a strong preference for this approach to searching while reading.
Retrieval of Single Wikipedia Articles While Reading Abstracts Proceedings of the 42nd Hawaii International Conference on System Sciences 2009 [1,188]
When reading online, users sometimes need auxiliary information to complement or fill in their own background knowledge in order to better understand a document that they are reading. We believe that delivering this information in the least intrusive fashion possible will improve their understanding. We have prototyped a system that selects a single Wikipedia article for users when they highlight text in an abstract. This prototype employs a contextual retrieval algorithm developed for high precision retrieval of Wikipedia articles that uses the terms in the abstract, currently being read, as a context for the search. The results from our evaluation reveal that the top-performing algorithm is able to respond with a single relevant article 77% of the time. The user study that we conducted indicates that participants have a strong preference for this approach to searching while reading.
Schütt, Thorsten; Schintke, Florian & Reinefeld, Alexander Scalaris: reliable transactional p2p key/value store Proceedings of the 7th ACM SIGPLAN workshop on ERLANG 2008 [1,189]
We present Scalaris, an Erlang implementation of a distributed key/value store. It uses, on top of a structured overlay network, replication for data availability and majority-based distributed transactions for data consistency. In combination, this implements the ACID properties on a scalable structured overlay. By directly mapping the keys to the overlay without hashing, arbitrary key ranges can be assigned to nodes, thereby allowing better load-balancing than would be possible with traditional DHTs. Consequently, Scalaris can be tuned for fast data access by taking, e.g., the nodes' geographic location or the regional popularity of certain keys into account. This improves Scalaris' lookup speed in datacenter or cloud computing environments. Scalaris is implemented in Erlang. We describe the Erlang software architecture, including the transactional Java interface to access Scalaris. Additionally, we present a generic design pattern to implement a responsive server in Erlang that serializes update operations on a common state, while concurrently performing fast asynchronous read requests on the same state. As a proof of concept, we implemented a simplified Wikipedia frontend and attached it to the Scalaris data store backend. Wikipedia is a challenging application. It requires - besides thousands of concurrent read requests per second - serialized, consistent write operations. For Wikipedia's category and backlink pages, keys must be consistently changed within transactions. We discuss how these features are implemented in Scalaris and show its performance.
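One detail worth illustrating is the key placement scheme: keys are mapped to nodes by range rather than by hashing, so contiguous ranges (and their load) can be reassigned. The Python sketch below shows only that range-lookup idea under invented node boundaries; Scalaris itself is written in Erlang, and its replication and transaction layers are not modelled here.

```python
# Map keys to nodes by contiguous key range instead of hashing.
import bisect

# upper bound of each node's key range, kept sorted (illustrative layout)
boundaries = ["g", "n", "t", "~"]
nodes = ["node-1", "node-2", "node-3", "node-4"]

def node_for(key: str) -> str:
    # first boundary >= key identifies the responsible node
    i = bisect.bisect_left(boundaries, key)
    return nodes[min(i, len(nodes) - 1)]

for key in ["algorithm", "mozart", "wikipedia"]:
    print(key, "->", node_for(key))
```

Because the mapping preserves key order, rebalancing is a matter of moving a boundary, which is what makes locality- or popularity-aware tuning possible in the first place.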
Forte, Andrea & Bruckman, Amy Scaling consensus: Increasing decentralization in Wikipedia governance 41st Annual Hawaii International Conference on System Sciences 2008, HICSS, January 7, 2008 - January 10, 2008 Big Island, {HI, United states 2008 [1,190]
How does "self-governance" happen in Wikipedia? Through in-depth interviews with eleven individuals who have held a variety of responsibilities in the English Wikipedia, we obtained rich descriptions of how various forces produce and regulate social structures on the site. Our analysis describes Wikipedia as an organization with highly refined policies, norms, and a technological architecture that supports organizational ideals of consensus building and discussion. We describe how governance in the site is becoming increasingly decentralized as the community grows, and how this is predicted by theories of commons-based governance developed in offline contexts. The trend of decentralization is noticeable with respect to both content-related decision making processes and social structures that regulate user behavior.
Ukkonen, Antti; Castillo, Carlos; Donato, Debora & Gionis, Aristides Searching the wikipedia with contextual information 17th ACM Conference on Information and Knowledge Management, CIKM'08, October 26, 2008 - October 30, 2008 Napa Valley, CA, United States 2008 [1,191]
We propose a framework for searching the Wikipedia with contextual information. Our framework extends the typical keyword search by considering queries of the form ⟨q, p⟩, where q is a set of terms (as in classical Web search), and p is a source Wikipedia document. The query terms q represent the information that the user is interested in finding, and the document p provides the context of the query. The task is to rank other documents in Wikipedia with respect to their relevance to the query terms q given the context document p. By associating a context to the query terms, the search results of a search initiated in a particular page can be made more relevant. We suggest a number of features that extend the classical query-search model so that the context document p is considered. We then use RankSVM (Joachims, 2002) to learn weights for the individual features given suitably constructed training data. Documents are ranked at query time using the inner product of the feature and the weight vectors. The experiments indicate that the proposed method considerably improves results obtained by a more traditional approach that does not take the context into account.
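The ranking step itself is a plain weighted feature combination, which can be sketched as below. The three features (query match, context match, link from the context page) and the weight vector are invented examples of the kind of signals such a model could feed to RankSVM; they are not the paper's feature set or learned weights.

```python
# Score a candidate page for a contextual query (q, p) as a weighted sum of features.

def bag(text):
    counts = {}
    for t in text.lower().split():
        counts[t] = counts.get(t, 0) + 1
    return counts

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def score(query, context_doc, candidate, links_from_context, weights):
    features = [
        cosine(bag(query), bag(candidate["text"])),         # match with query terms q
        cosine(bag(context_doc), bag(candidate["text"])),   # match with context document p
        1.0 if candidate["title"] in links_from_context else 0.0,  # linked from p?
    ]
    return sum(w * f for w, f in zip(weights, features))    # inner product with weights

weights = [0.6, 0.3, 0.1]  # stand-in for weights a learning-to-rank step would produce
candidate = {"title": "Jaguar (animal)", "text": "the jaguar is a big cat"}
print(score("jaguar speed", "article about big cats in south america",
            candidate, links_from_context={"Jaguar (animal)"}, weights=weights))
```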
Bast, Holger; Suchanek, Fabian & Weber, Ingmar Semantic full-text search with ESTER: Scalable, easy, fast IEEE International Conference on Data Mining Workshops, ICDM Workshops 2008, December 15, 2008 - December 19, 2008 Pisa, Italy 2008 [1,192]
We present a demo of ESTER, a search engine that combines the ease of use, speed and scalability of full-text search with the powerful semantic capabilities of ontologies. ESTER supports full-text queries, ontological queries and combinations of these, yet its interface is as easy as can be: a standard search field with semantic information provided interactively as one types. ESTER works by reducing all queries to two basic operations: prefix search and join, which can be implemented very efficiently in terms of both processing time and index space. We demonstrate the capabilities of ESTER on a combination of the English Wikipedia with the Yago ontology, with response times below 100 milliseconds for most queries, and an index size of about 4 GB. The system can be run both stand-alone and as a Web application.
Völkel, Max; Krötzsch, Markus; Vrandecic, Denny; Haller, Heiko & Studer, Rudi Semantic Wikipedia Proceedings of the 15th international conference on World Wide Web 2006 [1,193]
Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also, the wealth of numerical data is only available as plain text and thus can not be processed by its actual meaning. We provide an extension to be integrated in Wikipedia that allows the typing of links between articles and the specification of typed data inside the articles in an easy-to-use manner. Enabling even casual users to participate in the creation of an open semantic knowledge base, Wikipedia has the chance to become a resource of semantic statements, hitherto unknown regarding size, scope, openness, and internationalisation. These semantic enhancements bring to Wikipedia benefits of today's semantic technologies: more specific ways of searching and browsing. Also, the RDF export, which gives direct access to the formalised knowledge, opens Wikipedia up to a wide range of external applications that will be able to use it as a background knowledge base. In this paper, we present the design, implementation, and possible uses of this extension.
Haller, Heiko; Krötzsch, Markus; Völkel, Max & Vrandecic, Denny Semantic Wikipedia Proceedings of the 2006 international symposium on Wikis 2006 [1,194]
Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also, the wealth of numerical data is only available as plain text and thus can not be processed by its actual meaning. We provide an extension to be integrated in Wikipedia that allows even casual users the typing of links between articles and the specification of typed data inside the articles. Wiki users profit from more specific ways of searching and browsing. Each page has an RDF export, which gives direct access to the formalised knowledge. This allows applications to use Wikipedia as a background knowledge base.
Demartini, G.; Firan, C.S.; Iofciu, T. & Nejdl, W. Semantically enhanced entity ranking Web Information Systems Engineering - WISE 2008. 9th International Conference, 1-3 Sept. 2008 Berlin, Germany 2008 [1,195]
Users often want to find entities instead of just documents, i.e., finding documents entirely about specific real-world entities rather than general documents where the entities are merely mentioned. Searching for entities on Web-scale repositories is still an open challenge, as the effectiveness of ranking is usually not satisfactory. Semantics can be used in this context to improve the results, leveraging entity-driven ontologies. In this paper we propose three categories of algorithms for query adaptation, using (1) semantic information, (2) NLP techniques, and (3) link structure, to rank entities in Wikipedia. Our approaches focus on constructing queries using not only keywords but also additional syntactic information, while semantically relaxing the query relying on a highly accurate ontology. The results show that our approaches perform effectively, and that the combination of simple NLP, link analysis and semantic techniques improves the retrieval performance of entity search.
Filippova, Katja & Strube, Michael Sentence fusion via dependency graph compression Proceedings of the Conference on Empirical Methods in Natural Language Processing 2008 [1,196]
We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.
Sakai, Tetsuya & Nogami, Kenichi Serendipitous search via wikipedia: a query log analysis Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,197]
We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.
Ciglan, Marek & Nørvåg, Kjetil SGDB - Simple graph database optimized for activation spreading computation 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010, April 1, 2010 - April 4, 2010 Tsukuba, Japan 2010 [1,198]
In this paper, we present SGDB, a graph database with a storage model optimized for computation of Spreading Activation (SA) queries. The primary goal of the system is to minimize the execution time of the spreading activation algorithm over large graph structures stored on persistent media, without pre-loading the whole graph into memory. We propose a storage model aiming to minimize the number of accesses to the storage media during execution of SA, and we propose a graph query type for the activation spreading operation. Finally, we present the implementation and its performance characteristics in the scope of our pilot application that uses activation spreading over the Wikipedia link graph.
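For readers unfamiliar with the query type being optimized, a compact sketch of spreading activation over an adjacency-list graph follows: activation starts at seed nodes and is propagated to neighbours with a decay factor until it falls below a firing threshold. The toy graph and parameters are invented, and SGDB's storage-level optimizations are deliberately not modelled.

```python
# Spreading activation over an in-memory adjacency-list graph.

def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_hops=3):
    activation = dict(seeds)            # node -> accumulated activation
    frontier = dict(seeds)              # nodes activated in the previous hop
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            out = graph.get(node, [])
            if not out:
                continue
            passed = energy * decay / len(out)   # decayed energy per outgoing edge
            if passed < threshold:               # stop spreading weak signals
                continue
            for neigh in out:
                activation[neigh] = activation.get(neigh, 0.0) + passed
                next_frontier[neigh] = next_frontier.get(neigh, 0.0) + passed
        frontier = next_frontier
    return activation

graph = {  # tiny stand-in for the Wikipedia link graph
    "Graph database": ["Graph (data structure)", "NoSQL"],
    "NoSQL": ["Key-value store"],
    "Graph (data structure)": ["Adjacency list"],
}
print(spread_activation(graph, {"Graph database": 1.0}))
```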
Novotney, Scott & Callison-Burch, Chris Shared task: crowdsourced accessibility elicitation of Wikipedia articles Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk 2010 [1,199]
Mechanical Turk is useful for generating complex speech resources like conversational speech transcription. In this work, we explore the next step of eliciting narrations of Wikipedia articles to improve accessibility for low-literacy users. This task proves a useful test-bed to implement qualitative vetting of workers based on difficult to define metrics like narrative quality. Working with the Mechanical Turk API, we collected sample narrations, had other Turkers rate these samples and then granted access to full narration HITs depending on aggregate quality. While narrating full articles proved too onerous a task to be viable, using other Turkers to perform vetting was very successful. Elicitation is possible on Mechanical Turk, but it should conform to suggested best practices of simple tasks that can be completed in a streamlined workflow.
Parton, Kristen; McKeown, Kathleen R.; Allan, James & Henestroza, Enrique Simultaneous multilingual search for translingual information retrieval Proceeding of the 17th ACM conference on Information and knowledge management 2008 [1,200]
We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a different language than the document language(s) and the results must be returned in the language they know, the query language. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed "pseudo-parallel" corpus, where each document contains the text of the document along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as synonym sets. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation, we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search also has other important features suited to translingual search, since it can provide an indication of poor document translation when a match with the source document is found. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.
Blumenstock, Joshua E. Size matters: word count as a measure of quality on wikipedia Proceeding of the 17th international conference on World Wide Web 2008 [1,201]
Wikipedia, "the free encyclopedia", now contains over two million English articles and is widely regarded as a high-quality, authoritative encyclopedia. Some Wikipedia articles however
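The title's claim, that word count alone can serve as a quality signal, can be illustrated with the toy check below. The threshold and sample texts are invented for the example; the paper evaluates word count against real Wikipedia quality labels, which is not reproduced here.

```python
# Flag articles whose plain-text word count exceeds a threshold as likely high quality.

def looks_high_quality(article_text, min_words=2000):
    # crude proxy: longer articles tend to be more developed
    return len(article_text.split()) >= min_words

articles = {
    "Stub about a village": "A village in the north. " * 10,
    "Long, well-developed article": "Detailed, sourced prose. " * 800,
}
for title, text in articles.items():
    print(title, "->", looks_high_quality(text))
```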
Pirolli, Peter; Wollny, Evelin & Suh, Bongwon So you know you're getting the best possible information: a tool that increases Wikipedia credibility Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,202]
An experiment was conducted to study how credibility judgments about Wikipedia are affected by providing users with an interactive visualization (WikiDashboard) of article and author editing history. Overall, users who self-reported higher use of Internet information and higher rates of Wikipedia usage tended to produce lower credibility judgments about Wikipedia articles and authors. However, use of WikiDashboard significantly increased article and author credibility judgments, with effect sizes larger than any other measured effects of background media usage and attitudes on Wikipedia credibility. The results suggest that increased exposure to the editing/authoring histories of Wikipedia increases credibility judgments.
Rodrigues, Eduarda Mendes & Milic-Frayling, Natasa Socializing or knowledge sharing?: characterizing social intent in community question answering Proceeding of the 18th ACM conference on Information and knowledge management 2009 [1,203]
Knowledge sharing communities, such as Wikipedia or Yahoo! Answers, add greatly to the wealth of information available on the Web. They represent complex social ecosystems that rely on user participation and the quality of users' contributions to prosper. However, quality is harder to achieve when knowledge sharing is facilitated through a high degree of personal interactions. The individuals' objectives may change from knowledge sharing to socializing, with a profound impact on the community and the value it delivers to the broader population of Web users. In this paper we provide new insights into the types of content that are shared through Community Question Answering (CQA) services. We demonstrate an approach that combines in-depth content analysis with social network analysis techniques. We adapted the Undirected Inductive Coding method to analyze samples of user questions and arrive at a comprehensive typology of user intent. In our analysis we focused on two types of intent, social vs. non-social, and defined measures of social engagement to characterize the users' participation and content contributions. Our approach is applicable to a broad class of online communities and can be used to monitor the dynamics of community ecosystems.
Atzenbeck, Claus & Hicks, David L. Socs: increasing social and group awareness for Wikis by example of Wikipedia Proceedings of the 4th International Symposium on Wikis 2008 [1,204]
Many wikis provide good workspace awareness. Users see quickly what changes have been made or get notified about modifications on selected pages. However, they do not support a more sophisticated social or group awareness. Being aware of social structures is important for collaborative work. Adequate tools permit team members to reflect upon their and others' roles, detect and solve related conflicts in good time, and provide a means to communicate team developments. This makes such applications an effective means for new collaborators (to understand the team), long-term team members (to see what is going on), and team coordinators (to manage teams and identify potential problems). This becomes especially important for fragile, large, or ad hoc virtual teams as we find around many wikis, such as Wikipedia. Furthermore, we argue that social and group awareness increases the quality of articles indirectly and is beneficial for both experts and novice users. We introduce Socs, a prototype that permits authoring social structures using spatial hypertext methods via a so-called "social space". It serves as a means to express, store and communicate social information about people such as wiki authors. Furthermore, Socs integrates a Web browser and the system-wide address book, which act as sources for the social space and as a basis for sophisticated awareness services. Socs provides awareness about the authors of a wiki page and which of them are part of the user's structure on the social space.
West, Andrew G.; Kannan, Sampath & Lee, Insup Spatio-temporal analysis of Wikipedia metadata and the STiki anti-vandalism tool Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,205]
The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text. However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well-suited for use in less-patrolled wiki installations.
Lim, Ee-Peng; Maureen; Ibrahim, Nelman Lubis; Sun, Aixin; Datta, Anwitaman & Chang, Kuiyu SSnetViz: a visualization engine for heterogeneous semantic social networks Proceedings of the 11th International Conference on Electronic Commerce 2009 [1,206]
SSnetViz is an ongoing research effort to design and implement a visualization engine for heterogeneous semantic social networks. A semantic social network is a multi-modal network that contains nodes representing different types of people or object entities, and edges representing relationships among them. When multiple heterogeneous semantic social networks are to be visualized together, SSnetViz provides a suite of functions to store heterogeneous semantic social networks and to integrate them for searching and analysis. We will illustrate these functions using social networks related to terrorism research, one crafted by domain experts and another from Wikipedia.
West, Andrew G.; Kannan, Sampath & Lee, Insup STiki: an anti-vandalism tool for Wikipedia using spatio-temporal analysis of revision metadata Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,207]
STiki is an anti-vandalism tool for Wikipedia. Unlike similar tools, STiki does not rely on natural language processing (NLP) over the article or diff text to locate vandalism. Instead, STiki leverages spatio-temporal properties of revision metadata. The feasibility of utilizing such properties was demonstrated in our prior work, which found they perform comparably to NLP efforts while being more efficient, robust to evasion, and language independent. STiki is a real-time, on-Wikipedia implementation based on these properties. It consists of (1) a server-side processing engine that examines revisions, scoring the likelihood each is vandalism, and (2) a client-side GUI that presents likely vandalism to end-users for definitive classification (and, if necessary, reversion on Wikipedia). Our demonstration will provide an introduction to spatio-temporal properties, demonstrate the STiki software, and discuss alternative research uses for the open-source code.
Stein, Benno; zu Eissen, Sven Meyer & Potthast, Martin Strategies for retrieving plagiarized documents Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval 2007 [1,208]
For the identification of plagiarized passages in large document collections, we present retrieval strategies which rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.
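To make the chunk-index idea above concrete, here is a minimal Python sketch of retrieving candidate source documents via hashed word n-grams, with a deterministic sampling rule standing in for stochastic fingerprint selection; the function names, sampling rule, and thresholds are illustrative and not taken from the paper.

# Illustrative sketch (not the authors' implementation): index documents by
# hashed word n-grams, keep a sample of the hashes as a compact fingerprint,
# and retrieve candidate sources for a suspicious passage by counting shared
# fingerprint hashes.
import hashlib
from collections import defaultdict

def shingles(text, n=5):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fingerprint(text, n=5, keep_mod=4):
    # Sampling rule (illustrative): keep only hashes in one residue class,
    # shrinking the index while long overlaps still share many hashes.
    hashes = {int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16) for s in shingles(text, n)}
    return {h for h in hashes if h % keep_mod == 0}

class ChunkIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # hash -> ids of documents containing it

    def add(self, doc_id, text):
        for h in fingerprint(text):
            self.postings[h].add(doc_id)

    def candidates(self, passage, min_shared=3):
        counts = defaultdict(int)
        for h in fingerprint(passage):
            for doc_id in self.postings[h]:
                counts[doc_id] += 1
        return sorted((d for d, c in counts.items() if c >= min_shared),
                      key=lambda d: -counts[d])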
Plank, Barbara Structural correspondence learning for parse disambiguation Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop 2009 [1,209]
The paper presents an application of Structural Correspondence Learning (SCL) (Blitzer et al., 2006) for domain adaptation of a stochastic attribute-value grammar (SAVG). So far, SCL has been applied successfully in NLP for Part-of-Speech tagging and Sentiment Analysis (Blitzer et al., 2006; Blitzer et al., 2007). An attempt was made in the CoNLL 2007 shared task to apply SCL to non-projective dependency parsing (Shimizu and Nakagawa, 2007), however, without any clear conclusions. We report on our exploration of applying SCL to adapt a syntactic disambiguation model and show promising initial results on Wikipedia domains.
Han, Xianpei & Zhao, Jun Structural semantic relatedness: a knowledge-based method to named entity disambiguation Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,210]
The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms which can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with the classical BOW-based methods and social network based methods, our method can significantly improve the disambiguation performance, by 8.7% and 14.7% respectively.
Sabel, Mikalai Structuring wiki revision history Proceedings of the 2007 international symposium on Wikis 2007 [1,211]
Revision history of a wiki page is traditionally maintained as a linear chronological sequence. We propose to represent revision history as a tree of versions. Every edge in the tree is given a weight, called the adoption coefficient, indicating similarity between the two corresponding page versions. The same coefficients are used to build the tree. In the implementation described, adoption coefficients are derived from comparing the texts of the versions, similarly to computing edit distance. The tree structure reflects the actual evolution of page content, revealing reverts, vandalism, and edit wars, which is demonstrated on Wikipedia examples. The tree representation is useful for both human editors and automated algorithms, including trust and reputation schemes for wikis.
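A minimal sketch of the version-tree construction described above: each new revision attaches to the earlier revision it most resembles. Python's difflib ratio stands in for the paper's edit-distance-based adoption coefficient, and the function names are illustrative.

# Minimal sketch, assuming similarity via difflib as a stand-in for the
# paper's edit-distance-based adoption coefficient.
from difflib import SequenceMatcher

def adoption_coefficient(old_text, new_text):
    return SequenceMatcher(None, old_text, new_text).ratio()

def build_revision_tree(revisions):
    """revisions: chronologically ordered list of page texts.
    Returns parent[i] = index of the version revision i was derived from."""
    parent = {0: None}
    for i in range(1, len(revisions)):
        scores = [(adoption_coefficient(revisions[j], revisions[i]), j) for j in range(i)]
        best_score, best_j = max(scores)
        parent[i] = best_j  # a revert attaches to the old version, not the latest one
    return parent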
Nguyen, Dat P. T.; Matsuo, Yutaka & Ishizuka, Mitsuru Subtree mining for relation extraction from Wikipedia NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers 2007 [1,212]
In this study, we address the problem of extracting relations between entities from Wikipedia's English articles. Our proposed method first anchors the appearance of entities in Wikipedia's articles, using neither a named entity recognizer (NER) nor a coreference resolution tool. It then classifies the relationships between entity pairs using an SVM with features extracted from the web structure and subtrees mined from the syntactic structure of the text. We evaluate our method on manually annotated data from actual Wikipedia articles.
Cosley, Dan; Frankowski, Dan; Terveen, Loren & Riedl, John SuggestBot: using intelligent task routing to help people find work in wikipedia Proceedings of the 12th international conference on Intelligent user interfaces 2007 [1,213]
Member-maintained communities ask their users to perform tasks the community needs. From Slashdot, to IMDb, to Wikipedia, groups with diverse interests create community-maintained artifacts of lasting value (CALV) that support the group's main purpose and provide value to others. Said communities don't help members find work to do, or do so without regard to individual preferences, such as Slashdot assigning meta-moderation randomly. Yet social science theory suggests that reducing the cost and increasing the personal value of contribution would motivate members to participate more. We present SuggestBot, software that performs intelligent task routing (matching people with tasks) in Wikipedia. SuggestBot uses broadly applicable strategies of text analysis, collaborative filtering, and hyperlink following to recommend tasks. SuggestBot's intelligent task routing increases the number of edits by roughly four times compared to suggesting random articles. Our contributions are: 1) demonstrating the value of intelligent task routing in a real deployment; 2) showing how to do intelligent task routing; and 3) sharing our experience of deploying a tool in Wikipedia, which offered both challenges and opportunities for research.
Ye, Shiren; Chua, Tat-Seng & Lu, Jie Summarizing definition from Wikipedia Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1 2009 [1,214]
Wikipedia provides a wealth of knowledge, where the first sentence, infobox (and relevant sentences), and even the entire document of a wiki article could be considered as diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries with various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia. In addition, we develop an extended document concept lattice model to combine wiki concepts and non-textual features such as the outline and infobox. The model can concatenate representative sentences from non-overlapping salient local topics for summary generation. We test our model on our annotated wiki articles whose topics come from the TREC-QA 2004--2006 evaluations. The results show that the model is effective in summarization and definition QA.
Meij, Edgar & de Rijke, Maarten Supervised query modeling using wikipedia Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval 2010 [1,215]
We use Wikipedia articles to semantically inform the generation of query models. To this end, we apply supervised machine learning to automatically link queries to Wikipedia articles and sample terms from the linked articles to re-estimate the query model. On a recent large web corpus, we observe substantial gains in terms of both traditional metrics and diversity measures.
Bai, Bing; Weston, Jason; Grangier, David; Collobert, Ronan; Sadamasa, Kunihiko; Qi, Yanjun; Chapelle, Olivier & Weinberger, Kilian Supervised semantic indexing Proceeding of the 18th ACM conference on Information and knowledge management 2009 [1,216]
In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models on all pairs of word features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low rank (but diagonal preserving) representations, and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
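The scoring function behind SSI can be illustrated as follows; this is a rough sketch assuming a low-rank-plus-diagonal weight matrix and a margin ranking update, with dimensions, initialization scale, and learning rate chosen arbitrarily rather than taken from the paper.

# Rough sketch of SSI-style scoring with a low-rank (plus diagonal) weight
# matrix, trained by a margin ranking update on (query, relevant document,
# non-relevant document) triples. All constants here are illustrative.
import numpy as np

dim_vocab, dim_latent = 10000, 50
rng = np.random.default_rng(0)
U = 0.01 * rng.standard_normal((dim_latent, dim_vocab))
V = 0.01 * rng.standard_normal((dim_latent, dim_vocab))
w_diag = np.ones(dim_vocab)            # diagonal-preserving part

def score(q, d):
    # f(q, d) = q^T (U^T V + D) d, computed without forming the full matrix
    return float((U @ q) @ (V @ d) + np.sum(w_diag * q * d))

def ranking_update(q, d_pos, d_neg, lr=0.01, margin=1.0):
    # One stochastic step on the margin ranking loss
    global U, V, w_diag
    if score(q, d_pos) - score(q, d_neg) < margin:
        diff = d_pos - d_neg
        grad_U = np.outer(V @ diff, q)
        grad_V = np.outer(U @ q, diff)
        U = U + lr * grad_U
        V = V + lr * grad_V
        w_diag = w_diag + lr * q * diff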
Raisanen, Teppo Supporting the sense-making processes of web users by using a proxy server 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009 Waikoloa, HI, United States 2009 [1,217]
This paper presents a study on how we can support knowledge creation - especially a process called comprehension - in a Web 2.0 environment by providing new functionalities to users of existing Web services. The contribution of this paper is twofold. Firstly, a framework for providing new functionalities is presented. Secondly, a prototype Web service is implemented and evaluated. The prototype uses Wikipedia as an example and as a knowledge repository. The emphasis is on the prototype, a service that allows us to 1) insert sticky notes in Wikipedia articles, 2) enhance the translation capabilities of Wikipedia, and 3) highlight texts in Wikipedia. The analysis of the prototype service shows that we can provide new functionalities to Web users with a proxy server and that the implemented tools offer some support for the knowledge creation process called comprehension. The translation service proved especially useful.
Ortega, Felipe & Izquierdo-Cortazar, Daniel Survival analysis in open development projects Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development 2009 [1,219]
Open collaborative projects, like FLOSS development projects and open content creation projects (e.g. Wikipedia), heavily depend on contributions from their respective communities to improve. In this context, an important question for both researchers and practitioners is: what is the expected lifetime of contributors in a community? By answering this question, we will be able to characterize these communities, as an appropriate model can show whether or not users maintain their interest in contributing, for how long we could expect them to collaborate and, as a result, improve the organization and management of the project. In this paper, we demonstrate that survival analysis, a well-known statistical methodology in other research areas such as epidemiology, biology or demographic studies, is a useful methodology to undertake a quantitative comparison of the lifetime of contributors in open collaborative initiatives, like the development of FLOSS projects and Wikipedia, providing insightful answers to this challenging question.
Stampouli, Anastasia; Giannakidou, Eirini & Vakali, Athena Tag disambiguation through Flickr and Wikipedia 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010, April 1, 2010 - April 4, 2010 Tsukuba, Japan 2010 [1,220]
Given the popularity of social tagging systems and the limitations these systems have, due to the lack of any structure, a common issue that arises involves the low retrieval quality in such systems due to ambiguities of certain terms. In this paper, an approach for improving the retrieval in these systems, in case of ambiguous terms, is presented that attempts to perform tag disambiguation and, at the same time, provide users with relevant content. The idea is based on a mashup that combines data and functionality of two major web 2.0 sites, namely Flickr and Wikipedia, and aims at enhancing content retrieval for web users. A case study with the ambiguous notion "Apple" illustrates the value of the proposed approach.
Burke, Moira & Kraut, Robert Taking up the mop: identifying future wikipedia administrators CHI '08 CHI '08 extended abstracts on Human factors in computing systems 2008 [1,221]
As Wikipedia grows, so do the messy byproducts of collaboration. Backlogs of administrative work are increasing, suggesting the need for more users with privileged admin status. This paper presents a model of editors who have successfully passed the peer review process to become admins. The lightweight model is based on behavioral metadata and comments, and does not require any page text. It demonstrates that the Wikipedia community has shifted in the last two years to prioritizing policymaking and organization experience over simple article-level coordination, and that mere edit count does not lead to adminship. The model can be applied as an "AdminFinderBot" to automatically search all editors' histories and pick out likely future admins, as a self-evaluation tool, or as a dashboard of relevant statistics for voters evaluating admin candidates.
Viegas, Fernanda B.; Wattenberg, Martin; Kriss, Jesse & Ham, Frank Van Talk before you type: Coordination in Wikipedia 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, HI, United States 2007 [1,222]
Wikipedia, the online encyclopedia, has attracted attention both because of its popularity and its unconventional policy of letting anyone on the internet edit its articles. This paper describes the results of an empirical analysis of Wikipedia and discusses ways in which the Wikipedia community has evolved as it has grown. We contrast our findings with an earlier study [11] and present three main results. First, the community maintains a strong resilience to malicious editing, despite tremendous growth and high traffic. Second, the fastest growing areas of Wikipedia are devoted to coordination and organization. Finally, we focus on a particular set of pages used to coordinate work, the "Talk" pages. By manually coding the content of a subset of these pages, we find that these pages serve many purposes, notably supporting strategic planning of edits and enforcement of standard guidelines and conventions. Our results suggest that, despite the potential for anarchy, the Wikipedia community places a strong emphasis on group coordination and policy.
Konieczny, Piotr Teaching with Wikipedia and other Wikimedia foundation wikis Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,223]
Wikipedia and other wikis operated by the Wikimedia Foundation are finding increasing applications in teaching and learning. This workshop will demonstrate how teachers from academia and beyond can use those wikis in their courses. Wikipedia can be used for various assignments: for example, students can be asked to reference an unreferenced article or create a completely new one. Students can also work on creating a free textbook on Wikibooks, learn about journalism on Wikinews or engage in a variety of media-related projects on Commons. In doing so, students will see that writing an article and related assignments are not a 'tedious assignment' but activities that millions do 'for fun'. They will also gain a deeper understanding of what Wikipedia is, and how (un)reliable it can be. They and the course leaders are assisted by a lively, real-world community. Last, but not least, their work will also benefit -- and be improved upon -- by the entire world. The workshop will focus on English Wikipedia, the most popular WMF wiki with regard to where teaching assignments are taking place, but will also discuss the educational opportunities on other WMF wikis, such as Wikibooks. An overview of the Wikipedia School and University Project will be presented. There will be a discussion of Wikipedia policies related to teaching assignments, and a presentation of tools developed to make teaching with Wikipedia easier. The participants will see what kind of assignments can be done on Wikipedia (from learning simple wiki editing skills, through assignments designed to teach students about proper referencing and source reliability, to writing paper assignments with the goal of developing Good and Featured Articles), and how they can be implemented most easily and efficiently, avoiding common pitfalls and dealing with common problems (such as how to avoid having your students' articles deleted minutes after creation). Finally, the participants will be given an opportunity to create a draft syllabus for a future course they may want to teach on a WMF wiki (bringing your laptops for that part is highly recommended).
Longo, Luca; Dondio, Pierpaolo & Barrett, Stephen Temporal factors to evaluate trustworthiness of virtual identities Security and Privacy in Communications Networks and the Workshops, 2007. SecureComm 2007. Third International Conference on 2007
In this paper we investigate how temporal factors (i.e. factors computed by considering only the time-distribution of interactions) can be used as evidence of an entity’s trustworthiness. While reputation and direct experience are the two most widely used sources of trust in applications, we believe that new sources of evidence and new applications should be investigated [1]. Moreover, while these two classical techniques are based on evaluating the outcomes of interactions (direct or indirect), temporal factors are based on quantitative analysis, representing an alternative way of assessing trust. Our presumption is that, even with this limited information, temporal factors could be plausible evidence of trust that might be aggregated with more traditional sources. After defining our formal model of four main temporal factors (activity, presence, regularity, and frequency), we performed an evaluation over the Wikipedia project, considering more than 12,000 users and 94,000 articles. Our encouraging results show how, based solely on temporal factors, plausible trust decisions can be achieved.
Mousavidin, Elham & Silva, Leiser Testimonial Knowledge and Trust in Virtual Communities: A Research in Progress of the Case of Wikipedia 2009 [1,224]
Wikipedia is one of the fastest growing phenomena of today’s world. People increasingly rely on this free source for gaining encyclopedic knowledge regardless of its controversial nature. The goal of this research in progress is to investigate the reasons why people trust the content of Wikipedia. The findings of this research might shed light on understanding knowledge sharing in virtual communities. There has been extensive research on virtual communities as sources of knowledge. Researchers have studied the factors that motivate individuals to consult and contribute to the content of these communities. This concept has been studied from perspectives such as sociology, psychology and economics. Moreover, trust has been mentioned as an important factor in consulting these communities as sources of knowledge. This research in progress intends to apply the idea of “testimonial knowledge,” popularized by John Hardwig, to studying Wikipedia. In so doing, we discuss trust in the accuracy of the content of Wikipedia.
Gupta, Rakesh & Ratinov, Lev Text categorization with knowledge transfer from heterogeneous data sources Proceedings of the 23rd national conference on Artificial intelligence - Volume 2 2008 [1,225]
Multi-category classification of short dialogues is a common task performed by humans. When assigning a question to an expert, a customer service operator tries to classify the customer query into one of N different classes for which experts are available. Similarly, questions on the web (for example, questions at Yahoo Answers) can be automatically forwarded to a restricted group of people with a specific expertise. Typical questions are short and assume background world knowledge for correct classification. With an exponentially increasing amount of knowledge available, with distinct properties (labeled vs unlabeled, structured vs unstructured), no single knowledge-transfer algorithm such as transfer learning, multi-task learning or self-taught learning can be applied universally. In this work we show that bag-of-words classifiers perform poorly on noisy short conversational text snippets. We present an algorithm for leveraging heterogeneous data sources and algorithms, with significant improvements over any single algorithm, rivaling human performance. Using different algorithms for each knowledge source, we use mutual information to aggressively prune features. With heterogeneous data sources including Wikipedia, Open Directory Project (ODP), and Yahoo Answers, we show 89.4% and 96.8% correct classification on the Google Answers corpus and the Switchboard corpus using only 200 features/class. This reflects a huge improvement over bag-of-words approaches and a 48-65% error reduction over the previously published state of the art (Gabrilovich et al. 2006).
Wang, Kai; Lin, Chien-Liang; Chen, Chun-Der & Yang, Shu-Chen The Adoption of Wikipedia: A Community- and Information Quality-Based View 2008 [1,226]
Trattner, Christoph; Hasani-Mavriqi, Ilire; Helic, Denis & Leitner, Helmut The Austrian way of wiki(pedia)! Development of a structured wiki-based encyclopedia within a local Austrian context 6th International Symposium on Wikis and Open Collaboration, WikiSym 2010, July 7, 2010 - July 9, 2010 Gdansk, Poland 2010 [1,227]
Although the success of online encyclopedias such as Wikipedia is indisputable, researchers have questioned the usefulness of Wikipedia in educational settings. Problems such as the copy-paste syndrome, unchecked quality, or fragmentation of knowledge have been recognized as serious drawbacks for a widespread application of Wikipedia in universities or high schools. In this paper we present a Wiki-based encyclopedia called Austria-Forum that aims to combine the openness and collaboration aspects of Wikipedia with approaches to build a structured, quality-inspected, and context-sensitive online encyclopedia. To ensure tractability of the publishing process, the system focuses on providing information within a local Austrian context. It is our experience that such an approach represents a first step toward a proper application of online encyclopedias in educational settings.
Huner, Kai M. & Otto, Boris The effect of using a semantic wiki for metadata management: A controlled experiment 42nd Annual Hawaii International Conference on System Sciences, HICSS, January 5, 2009 - January 9, 2009 Waikoloa, HI, United States 2009 [1,228]
A coherent and consistent understanding of corporate data is an important factor for effective management of diversified companies and implies a need for company-wide unambiguous data definitions. Inspired by the success of Wikipedia, wiki software has become a broadly discussed alternative for corporate metadata management. However, in contrast to the performance and sustainability of wikis in general, the benefits of using semantic wikis have not been investigated sufficiently. The paper at hand presents the results of a controlled experiment that investigates the effects of using a semantic wiki for metadata management in comparison to a classical wiki. Considering threats to validity, the analysis (i.e. 74 subjects using both a classical and a semantic wiki) shows that the semantic wiki is superior to the classical variant regarding information retrieval tasks. At the same time, the results indicate that more effort is needed to build up the semantically annotated wiki content in the semantic wiki.
Chen, Jilin; Ren, Yuqing & Riedl, John The effects of diversity on group productivity and member withdrawal in online volunteer groups Proceedings of the 28th international conference on Human factors in computing systems 2010 [1,229]
The "wisdom of crowds" argument emphasizes the importance of diversity in online collaborations, such as open source projects and Wikipedia. However, decades of research on diversity in offline work groups have painted an inconclusive picture. On the one hand, the broader range of insights from a diverse group can lead to improved outcomes. On the other hand, individual differences can lead to conflict and diminished performance. In this paper
Khalid, M.A.; Jijkoun, V. & de Rijke, M. The impact of named entity normalization on information retrieval for question answering Advances in Information Retrieval. 30th European Conference on IR Research, ECIR 2008, 30 March-3 April 2008 Berlin, Germany 2008
In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval. Moreover, better normalization results in better retrieval performance.
Kamps, J. & Koolen, M. The importance of link evidence in Wikipedia Advances in Information Retrieval. 30th European Conference on IR Research, ECIR 2008, 30 March-3 April 2008 Berlin, Germany 2008
Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: internal links in Wikipedia are typically based on words naturally occurring in a page, and link to another semantically related entry. Our main aim is to find out if Wikipedia's link structure can be exploited to improve ad hoc information retrieval. We first analyse the relation between Wikipedia links and the relevance of pages. We then experiment with the use of link evidence in the focused retrieval of Wikipedia content, based on the test collection of INEX 2006. Our main findings are: First, our analysis of the link structure reveals that the Wikipedia link structure is a (possibly weak) indicator of relevance. Second, our experiments on INEX ad hoc retrieval tasks reveal that if the link evidence is made sensitive to the local context, we see a significant improvement of retrieval effectiveness. Hence, in contrast with earlier TREC experiments using crawled Web data, we have shown that Wikipedia's link structure can help improve the effectiveness of ad hoc retrieval.
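One simplified reading of "link evidence made sensitive to the local context" is re-ranking an initial content-based result list by incoming links that originate within that same result list; the sketch below illustrates only this interpretation, with an arbitrary interpolation weight, and is not the authors' exact formula.

# Illustrative re-ranking with context-sensitive link evidence: boost pages
# that receive links from other pages in the same initial result set.
# The interpolation weight lam and the log dampening are arbitrary choices.
from math import log

def rerank_with_local_links(results, outlinks, lam=0.5):
    """results: dict page -> content-based score;
    outlinks: dict page -> iterable of link targets."""
    result_set = set(results)
    local_indegree = {page: 0 for page in results}
    for page in results:
        for target in outlinks.get(page, ()):
            if target in result_set and target != page:
                local_indegree[target] += 1
    return sorted(results,
                  key=lambda p: results[p] + lam * log(1 + local_indegree[p]),
                  reverse=True)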
Huang, Wei Che; Trotman, Andrew & Geva, Shlomo The importance of manual assessment in link discovery Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,230]
Using a ground truth extracted from Wikipedia, and a ground truth created through manual assessment, we show that the apparent performance advantage seen in machine learning approaches to link discovery is an artifact of trivial links that are actively rejected by manual assessors.
Vora, Parul; Komura, Naoko & Team, Stanton Usability The n00b Wikipedia Editing Experience Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,231]
Wikipedia is one of the largest online collaborative projects. At present, the multilingual encyclopedia is the fifth most popular website and contains more than 13 million articles in 271 languages. The technical barriers to contribution, however, remain quite high. This paper describes the qualitative research and design methods used in our efforts to identify and reduce those barriers to participation for non-editors and to measurably increase their ability to contribute to the project.
Curino, Carlo A.; Moon, Hyun J.; Ham, MyungWon & Zaniolo, Carlo The PRISM Workwench: Database Schema Evolution without Tears Proceedings of the 2009 IEEE International Conference on Data Engineering 2009 [1,232]
Information Systems are subject to a perpetual evolution, which is particularly pressing in Web Information Systems, due to their distributed and often collaborative nature. Such a continuous adaptation process comes with a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes upon database-centric Information System software. Therefore, there is a need to automate and simplify the schema evolution process and to ensure predictability and logical independence upon schema changes. Current relational technology makes it easy to change the database content or to revise the underlying storage and indexes, but does little to support logical schema evolution, which nowadays remains poorly supported by commercial tools. The PRISM system demonstrates a major new advance toward automating schema evolution (including query mapping and database conversion), by improving predictability, logical independence, and auditability of the process. In fact, PRISM exploits recent theoretical results on mapping composition, invertibility and query rewriting to provide DB administrators with an intuitive, operational workbench usable in their everyday activities, thus enabling graceful schema evolution. In this demonstration, we will show (i) the functionality of PRISM and its supportive AJAX interface, (ii) its architecture built upon a simple SQL-inspired language of Schema Modification Operators, and (iii) we will allow conference participants to directly interact with the system to test its capabilities. Finally, some of the most interesting evolution steps of popular Web Information Systems, such as Wikipedia, will be reviewed in a brief "Saga of Famous Schema Evolutions".
Kaisser, Michael The QuALiM question answering demo: supplementing answers with paragraphs drawn from Wikipedia Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session 2008 [1,233]
This paper describes the online demo of the QuALiM Question Answering system. While the system actually gets answers from the web by querying major search engines, during presentation, answers are supplemented with relevant passages from Wikipedia. We believe that this additional information improves a user's search experience.
Jain, Shaili & Parkes, David C. The role of game theory in human computation systems Proceedings of the ACM SIGKDD Workshop on Human Computation 2009 [1,234]
The paradigm of "human computation" seeks to harness human abilities to solve computational problems or otherwise perform distributed work that is beyond the scope of current AI technologies. One aspect of human computation has become known as "games with a purpose" and seeks to elicit useful computational work in fun (typically) multi-player games. Human computation also encompasses distributed work (or "peer production") systems such as Wikipedia and Question and Answer forums. In this short paper we survey existing game-theoretic models for various human computation designs and outline research challenges in advancing a theory that can enable better design.
Suh, Bongwon; Convertino, Gregorio; Chi, Ed H. & Pirolli, Peter The singularity is not near: Slowing growth of Wikipedia 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United States 2009 [1,235]
Prior research on Wikipedia has characterized the growth in content and editors as being fundamentally exponential in nature, extrapolating current trends into the future. We show that recent editing activity suggests that Wikipedia growth has slowed, and perhaps plateaued, indicating that it may have come against its limits to growth. We measure growth, population shifts, and patterns of editor and administrator activities, contrasting these against past results where possible. Both the rate of page growth and the rate of editor growth have declined. As growth has declined, there are indicators of increased coordination and overhead costs, exclusion of newcomers, and resistance to new edits. We discuss some possible explanations for these new developments in Wikipedia, including decreased opportunities for sharing existing knowledge and increased bureaucratic stress on the socio-technical system itself.
Dutta, Amitava; Roy, Rahul; Seetharaman, Priya & Ingawale, Myshkin The Small Worlds of Wikipedia: Implications for Growth, Quality and Sustainability of Collaborative Knowledge Networks 2009 [1,236]
This work is a longitudinal network analysis of the interaction networks of Wikipedia, a free, user-led, collaboratively generated online encyclopedia. Making a case for representing Wikipedia as a knowledge network, and using the lens of contemporary graph theory, we attempt to unravel its knowledge creation process and growth dynamics over time. Typical small-world characteristics of short path-length and high clustering have important theoretical implications for knowledge networks. We show Wikipedia’s small-world nature to be increasing over time, while also uncovering power laws and assortative mixing. Investigating the process by which an apparently un-coordinated, diversely motivated swarm of assorted contributors creates and maintains remarkably high-quality content, we find an association between Quality and Structural Holes. We find that a few key high-degree, cluster-spanning nodes - ‘hubs’ - hold the growing network together, and discuss implications for the network’s growth and emergent quality.
Geiger, R. Stuart The social roles of bots and assisted editing programs 5th International Symposium on Wikis and Open Collaboration, WiKiSym 2009, October 25, 2009 - October 27, 2009 Orlando, FL, United States 2009 [1,237]
This paper investigates software programs as non-human social actors in Wikipedia, arguing that their influence must not be overlooked in social scientific research of the on-line encyclopedia project. Using statistical and archival methods, the roles of assisted editing programs and bots are examined. The proportion of edits made by these non-human actors is shown to be significantly greater than previously described in earlier research.
Viegas, Fernanda B. The visual side of Wikipedia 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, HI, United States 2007 [1,238]
The name "Wikipedia" has been associated with terms such as collaboration, volunteers, reliability, vandalism, and edit-war. Fewer people might think of "images".
Wang, Yafang; Zhu, Mingjie; Qu, Lizhen; Spaniol, Marc & Weikum, Gerhard Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia Proceedings of the 13th International Conference on Extending Database Technology 2010 [1,239]
Recent progress in information extraction has shown how to automatically build large ontologies from high-quality sources like Wikipedia. But knowledge evolves over time; facts have associated validity intervals. Therefore, ontologies should include time as a first-class dimension. In this paper, we introduce Timely YAGO, which extends our previously built knowledge base YAGO with temporal aspects. This prototype system extracts temporal facts from Wikipedia infoboxes, categories, and lists in articles, and integrates these into the Timely YAGO knowledge base. We also support querying temporal facts, by temporal predicates in a SPARQL-style language. Visualization of query results is provided in order to better understand the dynamic nature of knowledge.
Medelyan, Olena; Witten, Ian H. & Milne, David Topic indexing with wikipedia 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United States 2008
Wikipedia can be utilized as a controlled vocabulary for identifying the main topics in a document, with article titles serving as index terms and redirect titles as their synonyms. Wikipedia contains over 4M such titles, covering the terminology of nearly any document collection. This permits controlled indexing in the absence of manually created vocabularies. We combine state-of-the-art strategies for automatic controlled indexing with Wikipedia's unique property: a richly hyperlinked encyclopedia. We evaluate the scheme by comparing automatically assigned topics with those chosen manually by human indexers. Analysis of indexing consistency shows that our algorithm performs as well as the average person.
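The core mapping from document phrases to Wikipedia titles and redirects can be sketched as follows; the published system combines this with richer evidence, so ranking candidates by raw frequency here is only an illustrative simplification.

# Illustrative mapping of document n-grams onto Wikipedia article titles,
# with redirect titles folded into their targets as synonyms.
from collections import Counter

def candidate_topics(text, titles, redirects, max_n=3, top_k=10):
    """titles: set of lower-cased article titles;
    redirects: dict lower-cased redirect -> target title."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in titles:
                counts[phrase] += 1
            elif phrase in redirects:
                counts[redirects[phrase]] += 1   # synonym folded into its target
    return [topic for topic, _ in counts.most_common(top_k)]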
Wahabzada, Mirwaes; Xu, Zhao & Kersting, Kristian Topic models conditioned on relations European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010, September 20, 2010 - September 24, 2010 Barcelona, Spain 2010 [1,240]
Latent Dirichlet allocation is a fully generative statistical language model that has been proven to be successful in capturing both the content and the topics of a corpus of documents. Recently, it was even shown that relations among documents such as hyper-links or citations allow one to share information between documents and in turn to improve topic generation. Although fully generative, in many situations we are actually not interested in predicting relations among documents. In this paper, we therefore present a Dirichlet-multinomial nonparametric regression topic model that includes a Gaussian process prior on joint document and topic distributions that is a function of document relations. On networks of scientific abstracts and of Wikipedia documents we show that this approach meets or exceeds the performance of several baseline topic models.
Nastase, Vivi Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation Proceedings of the Conference on Empirical Methods in Natural Language Processing 2008 [1,241]
Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic -- a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To "understand" the query, we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic-expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.
Gehres, Peter; Singleton, Nathan; Louthan, George & Hale, John Toward sensitive information redaction in a collaborative, multilevel security environment Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,242]
Wikis have proven to be an invaluable tool for collaboration. The most prominent is, of course, Wikipedia. Its open nature is not suitable for all environments; in corporate, government, and research environments it is often necessary to control access to some or all of the information due to confidentiality, privacy, or security concerns. This paper proposes a method by which information classified at multiple sensitivity levels can be securely stored and made accessible via the wiki only to authenticated and authorized users. The model allows each page to be viewed at the appropriate level of classification, with content transparently included or excluded based on the user's access level.
Wang, Pu & Domeniconi, Carlotta Towards a universal text classifier: Transfer learning using encyclopedic knowledge 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009, December 6, 2009 - December 6, 2009 Miami, FL, United States 2009 [1,243]
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents based on background knowledge. As such, we build a classifier that can effectively group documents based on their content, under the guidance of a few words describing the classes of interest. Background knowledge is modeled using encyclopedic knowledge, namely Wikipedia. The universal text classifier can also be used to perform document retrieval. In our experiments with real data, we test the feasibility of our approach for both the classification and retrieval tasks.
Ronchetti, Marco & Sant, Joseph Towards automatic syllabi matching Proceedings of the 14th annual ACM SIGCSE conference on Innovation and technology in computer science education 2009 [1,244]
Student mobility is a priority in the European Union since it not only allows academic interchange but also fosters the awareness of being a European citizen amongst students. The Bologna Process aimed at homogenizing the structure of the European Universities to facilitate the recognition of academic titles as foreseen by the Lisbon Recognition Convention and student mobility during their matriculation. Over one and a half million students have already benefited from mobility programs such as the Erasmus programme. Students that participate in a mobility program must consider a destination, a selection of courses to follow abroad and how their home institution will recognize their foreign credits. Selecting the most appropriate courses is not a simple task since a course title doesn't always reflect its content. As a result, manual inspection of syllabi is necessary. This makes the task time-consuming since it might require manual inspection and comparison of many syllabi from different institutions. It would be nice to be able to at least partially automate the process -- i.e. given a set of syllabi from two different universities, to be able to automatically find the best match among courses in the two institutions. We started experimenting with this possibility, and although we do not yet have final results we will present the main idea of our project. Our plan is to try to apply similarity matching algorithms to available documents. Similarity matching is often based on co-occurrence of common words. However, a naïve application of such an algorithm would probably end up generating spurious similarities from the co-occurrence of general terms like "hour", "exercise", "exam", etc. Using a stop-word strategy in which these words are catalogued and ignored might seem a viable solution, but generally does not significantly improve the results: words that may be considered irrelevant in one context might be important in a different context. The path we are following is to assume the existence of a reference ontology where all terms have a description
Kotov, Alexander & Zhai, ChengXiang Towards natural question-guided search 19th International World Wide Web Conference, WWW2010, April 26, 2010 - April 30, 2010 Raleigh, NC, United States 2010 [1,245]
Web search is generally motivated by an information need. Since asking well-formulated questions is the fastest and the most natural way for human beings to obtain information, almost all queries posed to search engines correspond to some underlying questions, which reflect the user's information need. Accurate determination of these questions may substantially improve the quality of search results and the usability of search interfaces. In this paper, we propose a new framework for question-guided search, in which a retrieval system would automatically generate potentially interesting questions to users based on the search results of a query. Since the answers to such questions are known to exist in the search results, these questions can potentially guide users directly to the answers that they are looking for, eliminating the need to scan the documents in the result list. Moreover, in case of imprecise or ambiguous queries, automatically generated questions can naturally engage users in a feedback cycle to refine their information need and guide them towards their search goals. Implementation of the proposed strategy raises new challenges in content indexing, question generation, ranking and feedback. We propose new methods to address these challenges and evaluate them with a prototype system on a subset of Wikipedia. Evaluation results show the promise of this new question-guided search strategy.
Veale, Tony Tracking the Lexical Zeitgeist with WordNet and Wikipedia Proceeding of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy 2006 [1,246]
Gleich, David F.; Constantine, Paul G.; Flaxman, Abraham D. & Gunawardana, Asela Tracking the random surfer: empirically measured teleportation parameters in PageRank Proceedings of the 19th international conference on World wide web 2010 [1,247]
PageRank computes the importance of each node in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter models the probability of following an edge inside the graph or, when the graph comes from a network of web pages and links, clicking a link on a web page. We empirically measure the teleportation parameter based on browser toolbar logs and a click trail analysis. For a particular user or machine, such analysis produces a value of alpha. We find that these values nicely fit a Beta distribution with mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank scores where PageRank is computed with respect to a distribution as the teleportation parameter, rather than a constant teleportation parameter. These new metrics are evaluated on the graph of pages in Wikipedia.
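One way to read "PageRank with respect to a distribution as the teleportation parameter" is to average standard PageRank vectors over alpha values sampled from the fitted Beta distribution; the Monte Carlo sketch below illustrates this reading with arbitrary Beta parameters, not the site-specific values measured in the paper, and is not the authors' exact algorithm.

# Monte Carlo sketch of "random-alpha" PageRank: average PageRank vectors
# over alpha drawn from a Beta distribution. Assumes P is a column-stochastic
# adjacency matrix (dangling nodes handled elsewhere); Beta(2, 2) is arbitrary.
import numpy as np

def pagerank(P, alpha, tol=1e-10, max_iter=200):
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = alpha * (P @ x) + (1 - alpha) / n   # power iteration step
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

def random_alpha_pagerank(P, a=2.0, b=2.0, samples=50, seed=0):
    rng = np.random.default_rng(seed)
    acc = np.zeros(P.shape[0])
    for alpha in rng.beta(a, b, size=samples):
        acc += pagerank(P, alpha)
    return acc / samples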
Désilets, Alain; Gonzalez, Lucas; Paquet, Sébastien & Stojanovic, Marta Translation the Wiki way Proceedings of the 2006 international symposium on Wikis 2006 [1,248]
This paper discusses the design and implementation of processes and tools to support the collaborative creation and maintenance of multilingual wiki content. A wiki is a website where a large number of participants are allowed to create and modify content using their Web browser. This simple concept has revolutionized collaborative authoring on the web, enabling, among others, the creation of Wikipedia, the world's largest online encyclopedia. On many of the largest and highest profile wiki sites, content needs to be provided in more than one language. Yet, current wiki engines do not support the efficient creation and maintenance of such content. Consequently, most wiki sites deal with the issue of multilingualism by spawning a separate and independent site for each language. This approach leads to much wasted effort since the same content must be researched, tracked and written from scratch for every language. In this paper, we investigate what features could be implemented in wiki engines in order to deal more effectively with multilingual content. We look at how multilingual content is currently managed in more traditional industrial contexts, and show how this approach is not appropriate in a wiki world. We then describe the results of a User-Centered Design exercise performed to explore what a multilingual wiki engine should look like from the point of view of its various end users. We describe a partial implementation of those requirements in our own wiki engine (LizzyWiki), to deal with the special case of bilingual sites. We also discuss how this simple implementation could be extended to provide even more sophisticated features, and in particular, to support the general case of a site with more than two languages. Finally, even though the paper focuses primarily on multilingual content in a wiki context, we argue that translating in this "Wiki Way" may also be useful in some traditional industrial settings as a way of dealing better with the fast and ever-changing nature of our modern internet world.
Platt, John C.; Toutanova, Kristina & tau Yih, Wen Translingual document representations from discriminative projections Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing 2010 [1,249]
Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparable and not parallel. The OPCA method is shown to perform best.
Vukovic, Maja; Kumara, Soundar & Greenshpan, Ohad Ubiquitous crowdsourcing Proceedings of the 12th ACM international conference adjunct papers on Ubiquitous computing 2010 [1,250]
Web 2.0 provides the technological foundations upon which the crowdsourcing paradigm evolves and operates, enabling networked experts to work on various problem solving and data-intensive tasks. During the past decade crowdsourcing grew from a number of purpose-built initiatives, such as Wikipedia and Mechanical Turk, to a technique that today attracts and engages over 2 million people worldwide. As computing systems become more intimately embedded in physical and social contexts, promising truly ubiquitous computing, crowdsourcing takes new forms. Increasingly, crowds are engaged through mobile devices to capture, share and validate sheer amounts of data (e.g. reporting security threats or capturing social events). This workshop challenges researchers and practitioners to think about three key aspects of ubiquitous crowdsourcing. Firstly, to establish technological foundations, what are the interaction models and protocols between ubiquitous computing systems and the crowd? Secondly, how is crowdsourcing going to face the challenges in quality assurance, while providing valuable incentive frameworks that enable honest contributions? Finally, what are the novel applications of crowdsourcing enabled by ubiquitous computing systems?
Täckström, Oscar; Velupillai, Sumithra; Hassel, Martin; Eriksson, Gunnar; Dalianis, Hercules & Karlgren, Jussi Uncertainty detection as approximate max-margin sequence labelling Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task 2010 [1,251]
This paper reports experiments for the CoNLL-2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence-level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence-level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO encoding. In addition to surface-level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain are 85.2 F1-score, for the Wikipedia set 55.4 F1-score. For Task 2, our official results are 2.1 for the entire task, with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6 and cues: 78.5.
Billings, Matt & Watts, Leon A. Understanding dispute resolution online: using text to reflect personal and substantive issues in conflict Proceedings of the 28th international conference on Human factors in computing systems 2010 [1,252]
Conflict is a natural part of human communication, with implications for the work and well-being of a community. It can cause projects to stall or fail. Alternatively, new insights can be produced that are valuable to the community, and membership can be strengthened. We describe how Wikipedia mediators create and maintain a 'safe space'. They help conflicting parties to express, recognize and respond positively to their personal and substantive differences. We show how the 'mutability' of wiki text can be used productively by mediators: to legitimize and restructure the personal and substantive issues under dispute; to actively and visibly differentiate personal from substantive elements in the dispute; and to maintain asynchronous engagement by adjusting expectations of timeliness. We argue that online conflicts could be effectively conciliated in other text-based web communities, provided power differences can be controlled, by policies and technical measures for maintaining special 'safe' conflict resolution spaces.
Ingawale, Myshkin Understanding the wikipedia phenomenon: a case for agent based modeling Proceeding of the 2nd PhD workshop on Information and knowledge management 2008 [1,253]
Wikipedia, the user-led and monitored "open" encyclopedia, has been an undoubted popular success. Of particular interest are the diffusion process of the innovation throughout the "contributor" community and the question as to why unpaid, often well-qualified volunteers contribute content and time. Explanations for 'altruistic' contributor behavior based on the positivistic paradigm and with roots in organizational psychology
Hu, Jian; Wang, Gang; Lochovsky, Fred; tao Sun, Jian & Chen, Zheng Understanding user's query intent with wikipedia Proceedings of the 18th international conference on World wide web 2009 [1,254]
Understanding the intent behind a user's query can help a search engine automatically route the query to corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) intent representation; (2) domain coverage and (3) semantic interpretation. Current approaches to predicting the user's intent mainly utilize machine learning techniques. However, it is difficult and often requires much human effort to meet all these challenges with statistical machine learning approaches. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method can achieve much better coverage to classify queries in an intent domain even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
Tan, Bin & Peng, Fuchun Unsupervised query segmentation using generative language models and wikipedia Proceeding of the 17th international conference on World Wide Web 2008 [1,255]
In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query's underlying concepts that compose its original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia. Experiments show that our approach dramatically improves performance over the traditional approach that is based on mutual information, and produces results comparable to a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total of 46% improvement (from 0.530 to 0.774).
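A rough sketch of the segmentation idea, assuming invented concept probabilities and a hypothetical Wikipedia title list in place of the EM/MDL-estimated model described in the abstract:

```python
# Sketch: score all segmentations of a short query under a simple
# "bag of concepts" model, boosting segments that match Wikipedia titles.
# Probabilities and the title set below are invented placeholders; the paper
# estimates concept probabilities with EM under an MDL objective.
import math

concept_prob = {            # hypothetical unigram concept probabilities
    "new": 0.02, "york": 0.01, "times": 0.015,
    "new york": 0.008, "new york times": 0.004, "york times": 0.0001,
}
wiki_titles = {"new york", "new york times"}  # hypothetical title lookup

def segmentations(tokens):
    n = len(tokens)
    for cuts in range(1 << (n - 1)):            # each bit = break after token i
        segs, start = [], 0
        for i in range(n - 1):
            if cuts & (1 << i):
                segs.append(" ".join(tokens[start:i + 1])); start = i + 1
        segs.append(" ".join(tokens[start:]))
        yield segs

def score(segs, alpha=2.0):
    s = 0.0
    for seg in segs:
        p = concept_prob.get(seg, 1e-9)
        if seg in wiki_titles:
            p *= alpha                            # Wikipedia evidence bonus
        s += math.log(p)
    return s

query = ["new", "york", "times"]
best = max(segmentations(query), key=score)
print(best)   # expected: ['new york times']
```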
Yan, Yulan; Okazaki, Naoaki; Matsuo, Yutaka; Yang, Zhenglu & Ishizuka, Mitsuru Unsupervised relation extraction by mining Wikipedia texts using information from the web Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2 2009 [1,256]
This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using respective characteristics of Wikipedia articles and Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information related to the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. Fundamentally, our method demonstrates how deep linguistic patterns contribute complementarily with Web surface patterns to the generation of various relations.
Syed, Zareen & Finin, Tim Unsupervised techniques for discovering ontology elements from Wikipedia article links Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading 2010 [1,257]
We present an unsupervised and unrestricted approach to discovering an infobox-like ontology by exploiting the inter-article links within Wikipedia. It discovers new slots and fillers that may not be available in the Wikipedia infoboxes. Our results demonstrate that there are certain types of properties that are evident in the link structure of resources like Wikipedia that can be predicted with high accuracy using little or no linguistic analysis. The discovered properties can be further used to discover a class hierarchy. Our experiments have focused on analyzing people in Wikipedia, but the techniques can be directly applied to other types of entities in text resources that are rich with hyperlinks.
de Melo, Gerard & Weikum, Gerhard Untangling the cross-lingual link structure of Wikipedia Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,258]
Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world's entities and concepts.
Wang, Qihua; Jin, Hongxia & Li, Ninghui Usable access control in collaborative environments: Authorization based on people-tagging 14th European Symposium on Research in Computer Security, ESORICS 2009, September 21, 2009 - September 23, 2009 Saint-Malo, France 2009 [1,259]
We study attribute-based access control for resource sharing in collaborative work environments. The goal of our work is to encourage sharing within an organization by striking a balance between usability and security. Inspired by the great success of a number of collaboration-based Web 2.0 systems, such as Wikipedia and Del.icio.us, we propose a novel attribute-based access control framework that acquires information on users' attributes from the collaborative efforts of all users in a system, instead of from a small number of trusted agents. Intuitively, if several users say that someone has a certain attribute, our system believes that the latter indeed has the attribute. In order to allow users to specify and maintain the attributes of each other, we employ the mechanism of people-tagging, where users can tag each other with the terms they want, and tags from different users are combined and viewable by all users in the system. In this article, we describe the system framework of our solution, propose a language to specify access control policies, and design an example-based policy specification method that is friendly to ordinary users. We have implemented a prototype of our solution based on a real-world and large-scale people-tagging system in IBM. Experiments have been performed on the data collected by the system.
Albertsen, Johannes & Bouvin, Niels Olof User defined structural searches in mediawiki Proceedings of the nineteenth ACM conference on Hypertext and hypermedia 2008 [1,260]
Wikipedia has been the poster child of user contributed content, using the space of MediaWiki as the canvas on which to write. While well suited for authoring simple hypermedia documents, MediaWiki does not lend itself easily to letting the author create dynamically assembled documents, or create pages that monitor other pages. While it is possible to create such "special" pages, it requires PHP coding and thus administrative rights to the MediaWiki server. We present in this paper work on a structural query language (MediaWiki Query Language - MWQL) to allow users to add dynamically evaluated searches to ordinary wiki-pages.
Itakura, Kelly Y. & Clarke, Charles L. A. Using dynamic markov compression to detect vandalism in the wikipedia Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,261]
We apply the Dynamic Markov Compression model to detect spam edits in the Wikipedia. The method appears to outperform previous efforts based on compression models, providing performance comparable to methods based on manually constructed rules.
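The compression-based classification idea can be sketched as follows, with zlib standing in for the Dynamic Markov Compression model and invented toy corpora; this illustrates the general technique, not the authors' implementation.

```python
# Sketch of compression-based classification: an edit is assigned to the class
# (regular vs. vandalism) whose training text compresses it best. zlib is a
# crude stand-in for Dynamic Markov Compression; the tiny corpora are invented.
import zlib

regular = b"The city has a population of 120,000 and was founded in 1850. "
vandalism = b"lol this page sucks haha u r dumb lol lol!!! "

def extra_bits(corpus: bytes, edit: bytes) -> int:
    """Additional compressed size when the edit is appended to the corpus."""
    return len(zlib.compress(corpus + edit, 9)) - len(zlib.compress(corpus, 9))

def classify(edit: str) -> str:
    e = edit.encode()
    return "vandalism" if extra_bits(vandalism, e) < extra_bits(regular, e) else "regular"

print(classify("lol u suck haha"))                           # likely 'vandalism'
print(classify("The population grew to 150,000 by 1900."))   # likely 'regular'
```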
Coursey, Kino; Mihalcea, Rada & Moen, William Using encyclopedic knowledge for automatic topic identification Proceedings of the Thirteenth Conference on Computational Natural Language Learning 2009 [1,262]
This paper presents a method for automatic topic identification using an encyclopedic graph derived from Wikipedia. The system is found to exceed the performance of previously proposed machine learning algorithms for topic identification, with an annotation consistency comparable to human annotations.
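A hedged sketch of graph-based topic identification, assuming a personalised PageRank over a tiny, invented Wikipedia-style graph; networkx is used for convenience, and the authors' actual graph and ranking machinery may differ.

```python
# Sketch: concepts found in the input document seed a biased random walk over
# a toy encyclopedic graph, and the highest-scoring nodes become topics.
import networkx as nx

G = nx.Graph()   # toy graph with articles/categories as nodes (invented)
G.add_edges_from([
    ("Neural network", "Machine learning"),
    ("Machine learning", "Artificial intelligence"),
    ("Backpropagation", "Neural network"),
    ("Artificial intelligence", "Computer science"),
    ("Computer science", "Mathematics"),
])

doc_concepts = {"Neural network", "Backpropagation"}    # found in the document
personalization = {n: (1.0 if n in doc_concepts else 0.0) for n in G}

scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
topics = sorted(scores, key=scores.get, reverse=True)[:3]
print(topics)   # e.g. ['Neural network', 'Backpropagation', 'Machine learning']
```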
Irvine, Ann & Klementiev, Alexandre Using Mechanical Turk to annotate lexicons for less commonly used languages Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk 2010 [1,263]
In this work we present results from using Amazon's Mechanical Turk (MTurk) to annotate translation lexicons between English and a large set of less commonly used languages. We generate candidate translations for 100 English words in each of 42 foreign languages using Wikipedia and a lexicon induction framework. We evaluate the MTurk annotations by using positive and negative control candidate translations. Additionally, we evaluate the annotations by adding pairs to our seed dictionaries, providing a feedback loop into the induction system. MTurk workers are more successful in annotating some languages than others and are not evenly distributed around the world or among the world's languages. However, in general, we find that MTurk is a valuable resource for gathering cheap and simple annotations for most of the languages that we explored, and these annotations provide useful feedback in building a larger, more accurate lexicon.
Shieh, Jyh-Ren; Yeh, Yang-Ting; Lin, Chih-Hung; Lin, Ching-Yung & Wu, Ja-Ling Using Semantic Graphs for Image Search 2008 IEEE International Conference on Multimedia and Expo, ICME 2008, June 23, 2008 - June 26, 2008 Hannover, Germany 2008 [1,264]
In this paper, we propose a Semantic Graphs for Image Search (SGIS) system, which provides a novel way to search for images by utilizing collaborative knowledge in Wikipedia and network analysis to form semantic graphs for search-term suggestion. The collaborative article editing process of Wikipedia's contributors is formalized as bipartite graphs that are folded into networks between terms. When a user types in a search term, SGIS automatically retrieves an interactive semantic graph of related terms that allows users to easily find related images not limited to a specific search term. The interactive semantic graph then serves as an interface to retrieve images through existing commercial search engines. This method significantly saves users' time by avoiding the multiple search keywords that are usually required in generic search engines. It benefits both naive users who do not possess a large vocabulary (e.g., students) and professionals who look for images on a regular basis. In our experiments, 85% of the participants favored the SGIS system over commercial search engines.
Blohm, Sebastian & Cimiano, Philipp Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases 2007 [1,265]
Textual patterns have been used effectively to extract information from large text collections. However they rely heavily on textual redundancy in the sense that facts have to be mentioned in a similar manner in order to be generalized to a textual pattern. Data sparseness thus becomes a problem when trying to extract information from hardly redundant sources like corporate intranets, encyclopedic works or scientific databases. We present results on applying a weakly supervised pattern induction algorithm to Wikipedia to extract instances of arbitrary relations. In particular, we apply different configurations of a basic algorithm for pattern induction on seven different datasets. We show that the lack of redundancy leads to the need for a large amount of training data, but that integrating Web extraction into the process leads to a significant reduction of required training data while maintaining the accuracy of Wikipedia. In particular we show that, though the use of the Web can have similar effects as produced by increasing the number of seeds, it leads overall to better results. Our approach thus allows us to combine the advantages of two sources: the high reliability of a closed corpus and the high redundancy of the Web.
Kaptein, Rianne; Koolen, Marijn & Kamps, Jaap Using wikipedia categories for ad hoc search Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,266]
In this paper we explore the use of category information for ad hoc retrieval in Wikipedia. We show that techniques for entity ranking exploiting this category information can also be applied to ad hoc topics and lead to significant improvements. Automatically assigned target categories are good surrogates for manually assigned categories, which perform only slightly better.
Wang, Pu; Domeniconi, Carlotta & Hu, Jian Using wikipedia for co-clustering based cross-domain text classification 8th IEEE International Conference on Data Mining, ICDM 2008, December 15, 2008 - December 19, 2008 Pisa, Italy 2008 [1,267]
Traditional approaches to document classification require labeled data in order to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available, and often too expensive to obtain. Given a learning task for which training data are not available, abundant labeled data may exist for a different but related domain. One would like to use the related labeled data as auxiliary information to accomplish the classification task in the target domain. Recently, the paradigm of transfer learning has been introduced to enable effective learning strategies when auxiliary data obey a different probability distribution. A co-clustering based classification algorithm has been previously proposed to tackle cross-domain text classification. In this work, we extend the idea underlying this approach by making the latent semantic relationship between the two domains explicit. This goal is achieved with the use of Wikipedia. As a result, the pathway that allows labels to be propagated between the two domains not only captures common words, but also semantic concepts based on the content of documents. We empirically demonstrate the efficacy of our semantic-based approach to cross-domain classification using a variety of real data.
Gabay, David; Ziv, Ben-Eliahu & Elhadad, Michael Using wikipedia links to construct word segmentation corpora 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
Tagged corpora are essential for evaluating and training natural language processing tools. The cost of constructing large enough manually tagged corpora is high, even when the annotation level is shallow. This article describes a simple method to automatically create a partially tagged corpus, using Wikipedia hyperlinks. The resulting corpus contains information about the correct segmentation of 523,599 non-consecutive words in 363,090 sentences. We used our method to construct a corpus of Modern Hebrew (which we have made available at http://www.cs.bgu.ac.il/-nlpproj). The method can also be applied to other languages where word segmentation is difficult to determine, such as East and South-East Asian languages.
Finin, Tim; Syed, Zareen; Mayfield, James; Mcnamee, Paul & Piatko, Christine Using wikitology for cross-document entity coreference resolution Learning by Reading and Learning to Read - Papers from the AAAI Spring Symposium, March 23, 2009 - March 25, 2009 Stanford, CA, United states 2009
We describe the use of the Wikitology knowledge base as a resource for a variety of applications with special focus on a cross-document entity coreference resolution task. This task involves recognizing when entities and relations mentioned in different documents refer to the same object or relation in the world. Wikitology is a knowledge base system constructed with material from Wikipedia, DBpedia and Freebase that includes both unstructured text and semi-structured information. Wikitology was used to define features that were part of a system implemented by the Johns Hopkins University Human Language Technology Center of Excellence for the 2008 Automatic Content Extraction cross-document coreference resolution evaluation organized by the National Institute of Standards and Technology.
Zesch, Torsten; Müller, Christof & Gurevych, Iryna Using wiktionary for computing semantic relatedness Proceedings of the 23rd national conference on Artificial intelligence - Volume 2 2008 [1,268]
We introduce Wiktionary as an emerging lexical semantic resource that can be used as a substitute for expert-made resources in AI applications. We evaluate Wiktionary on the pervasive task of computing semantic relatedness for English and German by means of correlation with human rankings and solving word choice problems. For the first time, we apply a concept vector based measure to a set of different concept representations like Wiktionary pseudo glosses, the first paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo glosses. We show that: (i) Wiktionary is the best lexical semantic resource in the ranking task and performs comparably to other resources in the word choice task, and (ii) the concept vector based approach yields the best results on all datasets in both evaluations.
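A minimal sketch of a concept-vector relatedness measure of this kind, assuming invented one-line glosses in place of Wiktionary pseudo glosses or Wikipedia first paragraphs:

```python
# Sketch: each word is represented by a tf-idf vector built from a gloss
# (invented stand-ins for Wiktionary/Wikipedia text); relatedness = cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

glosses = {
    "car":    "a road vehicle powered by an engine with four wheels",
    "truck":  "a large road vehicle used for transporting goods",
    "banana": "a long curved tropical fruit with yellow skin",
}

words = list(glosses)
vectors = TfidfVectorizer().fit_transform(glosses[w] for w in words)

def relatedness(w1, w2):
    i, j = words.index(w1), words.index(w2)
    return cosine_similarity(vectors[i], vectors[j])[0, 0]

print(relatedness("car", "truck"))    # relatively high
print(relatedness("car", "banana"))   # close to 0
```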
Missen, Malik Muhammad Saad & Boughanem, Mohand Using wordnet's semantic relations for opinion detection in blogs 31st European Conference on Information Retrieval, ECIR 2009, April 6, 2009 - April 9, 2009 Toulouse, France 2009 [1,269]
Opinion detection in blogs has always been a challenge for researchers. One of the challenges faced is to find documents that specifically contain opinion on users' information need. This requires text processing at the sentence level rather than at the document level. In this paper, we propose an opinion detection approach. The proposed approach addresses the above problem by processing documents at the sentence level, using different semantic similarity relations of WordNet between sentence words and a list of weighted query words expanded through the encyclopedia Wikipedia. According to initial results, our approach performs well with a MAP of 0.28 and P@10 of 0.64, an improvement of 27% over baseline results. TREC Blog 2006 data is used as the test data collection.
Xu, Yang; Ding, Fan & Wang, Bin Utilizing phrase based semantic information for term dependency Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval 2008 [1,270]
Previous work on term dependency has not taken into account semantic information underlying query phrases. In this work, we study the impact of utilizing phrase based concepts for term dependency. We use Wikipedia to separate important and less important term dependencies, and treat them accordingly as features in a linear feature-based retrieval model. We compare our method with a Markov Random Field (MRF) model on four TREC document collections. Our experimental results show that utilizing phrase based concepts improves the retrieval effectiveness of term dependency, and reduces the size of the feature set to a large extent.
Roth, Camille Viable wikis: struggle for life in the wikisphere Proceedings of the 2007 international symposium on Wikis 2007 [1,271]
Wikis are collaborative platforms enabling collective elaboration of knowledge, the most famous and possibly the most successful thereof being the Wikipedia. There are currently plenty of other active open-access wikis, with varying success: some recruit many users and achieve sustainability, while others strive to attract sufficient active contributors, irrespective of the topic of the wiki. We make an exploratory investigation of some factors likely to account for these various destinies (such as distinct policies, norms, user incentives, technical and structural features), examining the demographics of a portion of the wikisphere. We underline the intertwining of population and content dynamics and emphasize the existence of different periods of development of a wiki-based community, from bootstrapping by founders with a pre-established set of rules, to more stable regimes where constant enrollment and training of new users balances out the occasional departure of more advanced users.
Chan, Bryan; Wu, Leslie; Talbot, Justin; Cammarano, Mike & Hanrahan, Pat Vispedia*: Interactive visual exploration of wikipedia data via search-based integration 2008 [1,272]
Wikipedia is an example of the collaborative, semi-structured data sets emerging on the Web. These data sets have large, non-uniform schema that require costly data integration into structured tables before visualization can begin. We present Vispedia, a Web-based visualization system that reduces the cost of this data integration. Users can browse Wikipedia, select an interesting data table, then use a search interface to discover, integrate, and visualize additional columns of data drawn from multiple Wikipedia articles. This interaction is supported by a fast path search algorithm over DBpedia, a semantic graph extracted from Wikipedia's hyperlink structure. Vispedia can also export the augmented data tables produced for use in traditional visualization systems. We believe that these techniques begin to address the "long tail" of visualization by allowing a wider audience to visualize a broader class of data. We evaluated this system in a first-use formative lab study. Study participants were able to quickly create effective visualizations for a diverse set of domains, performing data integration as needed.
Chan, Bryan; Talbot, Justin; Wu, Leslie; Sakunkoo, Nathan; Cammarano, Mike & Hanrahan, Pat Vispedia: on-demand data integration for interactive visualization and exploration Proceedings of the 35th SIGMOD international conference on Management of data 2009 [1,273]
Wikipedia is an example of the large, collaborative, semi-structured data sets emerging on the Web. Typically, before these data sets can be used, they must be transformed into structured tables via data integration. We present Vispedia, a Web-based visualization system which incorporates data integration into an iterative, interactive data exploration and analysis process. This reduces the upfront cost of using heterogeneous data sets like Wikipedia. Vispedia is driven by a keyword-query-based integration interface implemented using a fast graph search. The search occurs interactively over DBpedia's semantic graph of Wikipedia, without depending on the existence of a structured ontology. This combination of data integration and visualization enables a broad class of non-expert users to more effectively use the semi-structured data available on the Web.
Cruz, Pedro & Machado, Penousal Visualizing empires decline SIGGRAPH '10 ACM SIGGRAPH 2010 Posters 2010 [1,274]
This is an information visualization project that narrates the decline of the British, French, Portuguese and Spanish empires during the 19th and 20th centuries. These empires were the main maritime empires in terms of land area during the referred centuries [Wikipedia]. The land area of the empires and their former colonies is continuously represented in the simulation. The size of the empires varies during the simulation as they gain, or lose, territories. The graphic representation forms were selected to attain a narrative that depicts the volatility, instability and dynamics of the expansion and decline of the empires. Furthermore, the graphic representation also aims at emphasizing the contrast between their maximum and current size, and portraying the contemporary heritage and legacy of the empires.
Athenikos, Sofia J. & Lin, Xia Visualizing intellectual connections among philosophers using the hyperlink & semantic data from Wikipedia Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,275]
Wikipedia, with its unique structural features and rich user-generated content, is being increasingly recognized as a valuable knowledge source that can be exploited for various applications. The objective of the ongoing project reported in this paper is to create a Web-based knowledge portal for digital humanities based on the data extracted from Wikipedia (and other data sources). In this paper we present the interesting results we have obtained by extracting and visualizing various connections among 300 major philosophers using the structured data available in Wikipedia.
Sundara, Seema; Atre, Medha; Kolovski, Vladimir; Das, Souripriya; Wu, Zhe; Chong, Eugene Inseok & Srinivasan, Jagannathan Visualizing large-scale RDF data using subsets, summaries, and sampling in oracle 26th IEEE International Conference on Data Engineering, ICDE 2010, March 1, 2010 - March 6, 2010 Long Beach, CA, United states 2010 [1,276]
The paper addresses the problem of visualizing large scale RDF data via a 3-S approach, namely, by using: 1) Subsets: to present only relevant data for visualisation; both static and dynamic subsets can be specified; 2) Summaries: to capture the essence of the RDF data being viewed; summarized data can be expanded on demand, thereby allowing users to create hybrid (summary-detail) fisheye views of RDF data; and 3) Sampling: to further optimize visualization of large-scale data where a representative sample suffices. The visualization scheme works with both asserted and inferred triples (generated using RDF(S) and OWL semantics). This scheme is implemented in Oracle by developing a plug-in for the Cytoscape graph visualization tool, which uses functions defined in an Oracle PL/SQL package to provide fast and optimized access to the Oracle Semantic Store containing RDF data. Interactive visualization of a synthesized RDF data set (LUBM, 1 million triples), two native RDF datasets (Wikipedia, 47 million triples, and UniProt, 700 million triples), and an OWL ontology (eClassOwl, with a large class hierarchy including over 25,000 OWL classes, 5,000 properties, and 400,000 class-properties) demonstrates the effectiveness of our visualization scheme.
Viégas, Fernanda & Wattenberg, Martin Visualizing the inner lives of texts Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,277]
Visualization is often viewed as a way to unlock the secrets of numeric data. But what about political speeches, novels, and blogs? These texts hold at least as many surprises. On the Many Eyes site, a place for collective visualization, we have seen an increasing appetite for analyzing documents. We present a series of techniques for visualizing and analyzing unstructured text. We also discuss how a technique developed for visualizing the authoring patterns of Wikipedia articles has recently revealed the collective lives of a much broader class of documents.
Harrer, Andreas; Moskaliuk, Johannes; Kimmerle, Joachim & Cress, Ulrike Visualizing wiki-supported knowledge building: co-evolution of individual and collective knowledge Proceedings of the 4th International Symposium on Wikis 2008 [1,278]
It is widely accepted that wikis are valuable tools for successful collaborative knowledge building. In this paper, we describe how processes of knowledge building with wikis may be visualized, citing Wikipedia as an example. The underlying theoretical basis of our paper is the framework for collaborative knowledge building with wikis, as introduced by Cress and Kimmerle [2], [3], [4]. This model describes collaborative knowledge building as a co-evolution of individual and collective knowledge, or of cognitive and social systems respectively. These co-evolutionary processes may be visualized graphically, applying methods from social network analysis, especially those methods that take dynamic changes into account [5], [18]. For this purpose, we have undertaken to analyze, on the one hand, the temporal development of an article in the German version of Wikipedia and related articles that are linked to this core article. On the other hand, we analyzed the temporal development of those users who worked on these articles. The resulting graphics show an analogous process, both with regard to the articles that refer to the core article and to the users involved. These results provide empirical support for the co-evolution model. Some implications of our findings and the potential for future research on collaborative knowledge building with wikis and on the application of social network analysis are discussed at the end of the article.
Sherwani, J.; Yu, Dong; Paek, Tim; Czerwinski, Mary; Ju, Y.C. & Acero, Alex Voicepedia: Towards speech-based access to unstructured information 8th Annual Conference of the International Speech Communication Association, Interspeech 2007, August 27, 2007 - August 31, 2007 Antwerp, Belgium 2007
Currently there are no dialog systems that enable purely voice-based access to the unstructured information on websites such as Wikipedia. Such systems could be revolutionary for non-literate users in the developing world. To investigate interface issues in such a system, we developed VoicePedia, a telephone-based dialog system for searching and browsing Wikipedia. In this paper, we present the system, as well as a user study comparing the use of VoicePedia to SmartPedia, a Smartphone GUI-based alternative. Keyword entry through the voice interface was significantly faster, while search result navigation and page browsing were significantly slower. Although users preferred the GUI-based interface, task success rates between both systems were comparable - a promising result for regions where Smartphones and data plans are not viable.
Gordon, Jonathan; Durme, Benjamin Van & Schubert, Lenhart Weblogs as a source for extracting general world knowledge Proceedings of the fifth international conference on Knowledge capture 2009 [1,279]
Knowledge extraction (KE) efforts have often used corpora of heavily edited writing and sources written to provide the desired knowledge (e.g., newspapers or textbooks). However, the proliferation of diverse, up-to-date, unedited writing on the Web, especially in weblogs, offers new challenges for KE tools. We describe our efforts to extract general knowledge implicit in this noisy data and examine whether such sources can be an adequate substitute for resources like Wikipedia.
Pantel, Patrick; Crestan, Eric; Borkovsky, Arkady; Popescu, Ana-Maria & Vyas, Vishnu Web-scale distributional similarity and entity set expansion Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2 2009 [1,280]
Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly scalable implementation based on distributional similarity, implemented in the MapReduce framework and deployed over a 200 billion word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study to quantify the effect on expansion performance of corpus size, corpus quality, seed composition and seed size. We make public an experimental testbed for set expansion analysis that includes a large collection of diverse entity sets extracted from Wikipedia.
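A single-machine sketch of the underlying distributional-similarity computation, using a toy corpus and small context windows; the MapReduce parallelisation that makes this feasible at Web scale is omitted.

```python
# Sketch: each term is described by counts of words appearing within a +/-2
# word window, and similarity is the cosine of these context vectors.
from collections import Counter, defaultdict
from math import sqrt

corpus = ("the cat chased the mouse . the dog chased the cat . "
          "the mouse ate the cheese .").split()

contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    window = corpus[max(0, i - 2):i] + corpus[i + 1:i + 3]
    contexts[w].update(window)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine(contexts["cat"], contexts["dog"]))     # distributionally similar
print(cosine(contexts["cat"], contexts["cheese"]))  # less similar
```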
Jesus, Rut What cognition does for Wikis Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,281]
Theoretical frameworks need to be developed to account for the phenomenon of Wikipedia and writing in wikis. In this paper, a cognitive framework divides processes into the categories of Cognition for Planning and Cognition for Improvising. This distinction is applied to Wikipedia to understand the many small and the few big edits by which Wikipedia's articles grow. The paper relates the distinction to Lessig's Read-Only and Read-Write, to Benkler's modularity and granularity of contributions, and to Turkle and Papert's bricoleurs and planners. It argues that Wikipedia thrives because it harnesses a Cognition for Improvising surplus oriented by kindness and trust towards distant others, and proposes that Cognition for Improvising is a determinant mode for the success of wikis and Wikipedia. The theoretical framework can be a starting point for a cognitive discussion of wikis, peer-produced commons and new patterns of collaboration.
Amer-Yahia, Sihem & Halevy, Alon What does web 2.0 have to do with databases? Proceedings of the 33rd international conference on Very large data bases 2007 [1,282]
Web 2.0 is a buzzword we have been hearing for over 2 years. According to Wikipedia, it hints at an improved form of the World Wide Web built on technologies such as weblogs, social bookmarking, RSS feeds, and photo and video sharing, based on an architecture of participation and democracy that encourages users to add value to the application as they use it. Web 2.0 enables social networking on the Web by allowing users to contribute content, share it, rate it, create a network of friends, and decide what they like to see and how they want it to look.
Oxley, Meghan; Morgan, Jonathan T.; Zachry, Mark & Hutchinson, Brian "What I know is...": establishing credibility on Wikipedia talk pages Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,283]
This poster presents a new theoretical framework and research method for studying the relationship between specific types of authority claims and the attempts of contributors to establish credibility in online, collaborative environments. We describe a content analysis method for coding authority claims based on linguistic and rhetorical cues in naturally occurring, text-based discourse. We present results from a preliminary analysis of a sample of Wikipedia talk page discussions focused on recent news events. This method provides a novel framework for capturing and understanding these persuasion-oriented behaviors, and shows potential as a tool for online communication research, including automated text analysis using trained natural language processing systems.
Kittur, A.; Chi, E. H. & Suh, B. What's in Wikipedia? Mapping Topics and Conflict Using Socially Annotated Category Structure 2009 [1,284]
Wikipedia is an online encyclopedia which has undergone tremendous growth. However, this same growth has made it difficult to characterize its content and coverage. In this paper the authors develop measures to map Wikipedia using its socially annotated, hierarchical category structure. They introduce a mapping technique that takes advantage of socially-annotated hierarchical categories while dealing with the inconsistencies and noise inherent in the distributed way that they are generated. The technique is demonstrated through two applications: mapping the distribution of topics in Wikipedia and how they have changed over time; and mapping the degree of conflict found in each topic area. The authors also discuss the utility of the approach for other applications and datasets involving collaboratively annotated category hierarchies.
Thom-Santelli, J.; Cosley, D. & Gay, G. What’s Mine is Mine: Territoriality in Collaborative Authoring 2009 [1,285]
Territoriality, the expression of ownership towards an object, can emerge when social actors occupy a shared social space. In the case of Wikipedia, the prevailing cultural norm is one that warns against ownership of one’s work. However, the authors observe the emergence of territoriality in online space with respect to a subset of articles that have been tagged with the Maintained template through a qualitative study of 15 editors who have self-designated as Maintainers. The participants communicated ownership, demarcated boundaries and asserted their control over artifacts for the sake of quality by appropriating existing features of Wikipedia. The authors then suggest design strategies to support these behaviours in the proper context within collaborative authoring systems more generally.
Kittur, Aniket; Chi, Ed H. & Suh, Bongwon What's in Wikipedia?: mapping topics and conflict using socially annotated category structure Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,286]
Wikipedia is an online encyclopedia which has undergone tremendous growth. However, this same growth has made it difficult to characterize its content and coverage. In this paper we develop measures to map Wikipedia using its socially annotated, hierarchical category structure. We introduce a mapping technique that takes advantage of socially-annotated hierarchical categories while dealing with the inconsistencies and noise inherent in the distributed way that they are generated. The technique is demonstrated through two applications: mapping the distribution of topics in Wikipedia and how they have changed over time; and mapping the degree of conflict found in each topic area. We also discuss the utility of the approach for other applications and datasets involving collaboratively annotated category hierarchies.
Thom-Santelli, Jennifer; Cosley, Dan R. & Gay, Geri What's mine is mine: territoriality in collaborative authoring Proceedings of the 27th international conference on Human factors in computing systems 2009 [1,287]
Territoriality, the expression of ownership towards an object, can emerge when social actors occupy a shared social space. In the case of Wikipedia, the prevailing cultural norm is one that warns against ownership of one's work. However, we observe the emergence of territoriality in online space with respect to a subset of articles that have been tagged with the Maintained template through a qualitative study of 15 editors who have self-designated as Maintainers. Our participants communicated ownership, demarcated boundaries and asserted their control over artifacts for the sake of quality by appropriating existing features of Wikipedia. We then suggest design strategies to support these behaviors in the proper context within collaborative authoring systems more generally.
Halatchliyski, Iassen; Moskaliuk, Johannes; Kimmerle, Joachim & Cress, Ulrike Who integrates the networks of knowledge in Wikipedia? Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,288]
In the study presented in this article we investigated two related knowledge domains, physiology and pharmacology, from the German version of Wikipedia. Applying the theory of knowledge building to this community, we studied the authors of integrative knowledge. Network analysis indices of betweenness and closeness centrality were calculated for the network of relevant articles. We compared the work of authors who wrote exclusively in one domain with that of authors who contributed to both domains. Double-domain authors occupy an outstanding position in a knowledge building wiki community. They are not only responsible for the integration of knowledge from a different background, but also for the composition of the single-knowledge domains. Predominantly they write articles which are integrative and central in the context of such domains.
Halim, Felix; Yongzheng, Wu & Yap, Roland Wiki credibility enhancement Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,289]
Wikipedia has been very successful as an open encyclopedia which is editable by anybody. However, the anonymous nature of Wikipedia means that readers may have less trust, since there is no way of verifying the credibility of the authors or contributors. We propose to automatically transfer external information about the authors from outside Wikipedia to Wikipedia pages. This additional information is meant to enhance the credibility of the content. For example, it could be the education level, professional expertise or affiliation of the author. We do this while maintaining anonymity. In this paper, we present the design and architecture of such a system together with a prototype.
Zhang, Yuejiao Wiki means more: hyperreading in Wikipedia Proceedings of the seventeenth conference on Hypertext and hypermedia 2006 [1,290]
Based on the open-sourcing technology of wiki, Wikipedia has initiated a new fashion of hyperreading. Reading Wikipedia creates an experience distinct from reading a traditional encyclopedia. In an attempt to disclose one of the site's major appeals to the Web users, this paper approaches the characteristics of hyperreading activities in Wikipedia from three perspectives. Discussions are made regarding reading path, user participation, and navigational apparatus in Wikipedia.
Kumaran, A.; Saravanan, K.; Datha, Naren; Ashok, B. & Dendi, Vikram WikiBABEL: a wiki-style platform for creation of parallel data Proceedings of the ACL-IJCNLP 2009 Software Demonstrations 2009 [1,291]
In this demo, we present a wiki-style platform -- WikiBABEL -- that enables easy collaborative creation of multilingual content in many non-English Wikipedias, by leveraging the relatively larger and more stable content in the English Wikipedia. The platform provides an intuitive user interface that maintains the user focus on the multilingual Wikipedia content creation, by engaging search tools for easy discoverability of related English source material, and a set of linguistic and collaborative tools to make the content translation simple. We present two different usage scenarios and discuss our experience in testing them with real users. Such an integrated content creation platform in Wikipedia may yield, as a by-product, parallel corpora that are critical for research in statistical machine translation systems in many languages of the world.
Gaio, Loris; den Besten, Matthijs; Rossi, Alessandro & Dalle, Jean-Michel Wikibugs: using template messages in open content collections Proceedings of the 5th International Symposium on Wikis and Open Collaboration 2009 [1,292]
In the paper we investigate an organizational practice meant to increase the quality of commons-based peer production: the use of template messages in wiki collections to highlight editorial bugs and call for intervention. In the context of SimpleWiki, an online encyclopedia of the Wikipedia family, we focus on "Complex", a template which is used to flag articles disregarding the overall goals of simplicity and readability. We characterize how this template is placed on and removed from articles, and we use survival analysis to study the emergence and successful treatment of these bugs in the collection.
Nunes, Sérgio; Ribeiro, Cristina & David, Gabriel WikiChanges: exposing Wikipedia revision activity Proceedings of the 4th International Symposium on Wikis 2008 [1,293]
Wikis are popular tools commonly used to support distributed collaborative work. Wikis can be seen as virtual scrap-books that anyone can edit without having any specific technical know-how. Wikipedia is a flagship example of a real-world application of wikis. Due to the large scale of Wikipedia, it is difficult to easily grasp much of the information that is stored in this wiki. We address one particular aspect of this issue by looking at the revision history of each article. By plotting the revision activity in a timeline, we expose an article's complete history in an easily understandable format. We present WikiChanges, a web-based application designed to plot an article's revision timeline in real time. WikiChanges also includes a web browser extension that incorporates activity sparklines in the real Wikipedia. Finally, we introduce a revisions summarization task that addresses the need to understand what occurred during a given set of revisions. We present a first approach to this task using tag clouds to present the revisions made.
Mihalcea, Rada & Csomai, Andras Wikify!: linking documents to encyclopedic knowledge Proceedings of the sixteenth ACM conference on Conference on information and knowledge management 2007 [1,294]
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
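The keyword extraction step can be illustrated with a sketch based on link probability ("keyphraseness"); the counts below are invented placeholders for statistics that would be mined from a Wikipedia dump.

```python
# Sketch: rank candidate phrases by how often they appear as links in
# Wikipedia relative to how often they appear at all, then link the
# top-ranked ones above a threshold. Counts are invented placeholders.
link_count  = {"machine learning": 9_000, "neural network": 7_500, "the": 120}
total_count = {"machine learning": 11_000, "neural network": 10_000, "the": 9_000_000}

def keyphraseness(phrase: str) -> float:
    return link_count.get(phrase, 0) / max(total_count.get(phrase, 1), 1)

text = "a neural network is a machine learning model"
candidates = ["neural network", "machine learning", "the"]

ranked = sorted((p for p in candidates if p in text),
                key=keyphraseness, reverse=True)
threshold = 0.05
annotations = [p for p in ranked if keyphraseness(p) > threshold]
print(annotations)   # ['machine learning', 'neural network']
```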
Frisa, Raquel; Anglés, Rosana & Puyal, Óscar WikiMob: wiki mobile interaction Proceedings of the 4th International Symposium on Wikis 2008 [1,295]
This paper describes a feasible way to integrate wiki-based projects with access from mobile devices, in order to contribute to the convergence between Internet services and the new mobile telecommunications space. Vodafone R&D, as one of the leading telecommunications operators, launches this initiative because of the social phenomenon that Wikipedia, supported by the Wikimedia Foundation, has become for the Internet community. Wikipedia is the most successful project based on a wiki environment and, thanks to it, collaborative tools like wikis have become the perfect artifact to spread knowledge across the Internet. Additionally, we contribute to the creation of open tools to access contents published under a free content license.
Wikipedia and Artificial Intelligence: An Evolving Synergy - Papers from the 2008 AAAI Workshop 2008 AAAI Workshop, July 13, 2008 - July 13, 2008 Chicago, IL, United states 2008
The proceedings contain 13 papers. The topics discussed include: the fast and the numerous - combining machine and community intelligence for semantic annotation; learning to predict the quality of contributions to Wikipedia; integrating Cyc and Wikipedia: folksonomy meets rigorously defined common-sense; topic indexing with Wikipedia; an effective, low-cost measure of semantic relatedness obtained from Wikipedia links; mining Wikipedia's article revision history for training computational linguistics algorithms; Okinet: automatic extraction of a medical ontology from Wikipedia; automatic vandalism detection in Wikipedia: towards a machine learning approach; enriching the crosslingual link structure of Wikipedia - a classification-based approach; augmenting Wikipedia-extraction with results from the Web; using Wikipedia links to construct word segmentation corpora; and method for building sentence-aligned corpus from Wikipedia.
Dooley, Patricia L. Wikipedia and the two-faced professoriate Proceedings of the 6th International Symposium on Wikis and Open Collaboration 2010 [1,296]
A primary responsibility of university teachers is to guide their students in the process of using only the most accurate research resources in their completion of assignments. Thus, it is not surprising to hear that faculty routinely coach their students to use Wikipedia carefully. Even more pronounced anti-Wikipedia backlashes have developed on some campuses, leading faculty to forbid their students to use the popular on-line compendium of information. Within this context, but directing the spotlight away from students, this pilot study uses survey and content analysis research methods to explore how faculty at U.S. universities and colleges regard Wikipedia's credibility as an information source, as well as how they use Wikipedia in their academic work. The results of the survey reveal that while none of the university faculty who completed it regard Wikipedia as an extremely credible source of information, more than half stated it has moderate to high credibility, and many use it in both their teaching and research. The results of the content analysis component of the study demonstrate that academic researchers from across the disciplines are citing Wikipedia as a source of scholarly information in their peer-reviewed research reports. Although the study's research findings are not generalizable, they are surprising considering the professoriate's oft-stated lack of trust in Wikipedia.
Tonelli, Sara & Giuliano, Claudio Wikipedia as frame information repository Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1 2009 [1,297]
In this paper, we address the issue of automatically extending lexical resources by exploiting existing knowledge repositories. In particular, we deal with the new task of linking FrameNet and Wikipedia using a word sense disambiguation system that, for a given pair of frame and lexical unit (F, l), finds the Wikipage that best expresses the meaning of l. The mapping can be exploited to straightforwardly acquire new example sentences and new lexical units, both for English and for all languages available in Wikipedia. In this way, it is possible to easily acquire good-quality data as a starting point for the creation of FrameNet in new languages. The evaluation reported both for the monolingual and the multilingual expansion of FrameNet shows that the approach is promising.
Hansen, Sean; Berente, Nicholas & Lyytinen, Kalle Wikipedia as rational discourse: An illustration of the emancipatory potential of information systems 40th Annual Hawaii International Conference on System Sciences 2007, HICSS'07, January 3, 2007 - January 6, 2007 Big Island, HI, United states 2007 [1,298]
Critical social theorists often emphasize the control and surveillance aspects of information systems, building upon a characterization of information technology as a tool for increased rationalization. The emancipatory potential of information systems is often overlooked. In this paper, we apply the Habermasian ideal of rational discourse to Wikipedia as an illustration of the emancipatory potential of information systems. We conclude that Wikipedia does embody an approximation of rational discourse, while several challenges remain.
Santamaría, Celina; Gonzalo, Julio & Artiles, Javier Wikipedia as sense inventory to improve diversity in Web search results Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics 2010 [1,299]
Is it possible to use sense inventories to improve Web search results diversity for one-word queries? To answer this question, we focus on two broad-coverage lexical resources of a different nature: WordNet, as a de-facto standard used in Word Sense Disambiguation experiments; and Wikipedia, as a large-coverage, updated encyclopaedic resource which may have a better coverage of relevant senses in Web pages. Our results indicate that (i) Wikipedia has a much better coverage of search results, (ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia, and (iii) by associating Web pages to Wikipedia senses with simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.
Wales, Jimmy Wikipedia in the free culture revolution OOPSLA '05 Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications 2005 [1,300]
Jimmy Jimbo"} Wales is the founder of Wikipedia.org the free encyclopedia project and Wikicities.com which extends the social concepts of Wikipedia into new areas. Jimmy was formerly a futures and options trader in Chicago and currently travels the world evangelizing the success of Wikipedia and the importance of free culture. When not traveling
Potthast, Martin Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval 2007 [1,301]
We develop and implement a new indexing technology which allows us to use complete (and possibly very large) documents as queries, while having a retrieval performance comparable to a standard term query. Our approach aims at retrieval tasks such as near-duplicate detection and high similarity search. To demonstrate the performance of our technology we have compiled the search index "Wikipedia in the Pocket", which contains about 2 million English and German Wikipedia articles. This index--along with a search interface--fits on a conventional CD (0.7 gigabyte). The ingredients of our indexing technology are similarity hashing and minimal perfect hashing.
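A sketch of hashing-based high-similarity search, using MinHash over word shingles as an illustrative substitute for the paper's similarity hashing and minimal perfect hashing:

```python
# Sketch: documents are reduced to small MinHash fingerprints over word
# 3-shingles; two documents are near-duplicates when their fingerprints
# largely agree. MinHash is an illustrative hash family, not necessarily
# the one used in the paper.
import hashlib

def shingles(text: str, k: int = 3):
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash(text: str, num_hashes: int = 64):
    return [min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                for s in shingles(text))
            for seed in range(num_hashes)]

def estimated_similarity(a: str, b: str) -> float:
    sa, sb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

doc1 = "Wikipedia is a free online encyclopedia edited by volunteers around the world"
doc2 = "Wikipedia is a free online encyclopedia that volunteers around the world edit"
print(estimated_similarity(doc1, doc2))   # noticeably higher for near-duplicates
print(estimated_similarity(doc1, "completely unrelated short text here"))  # near 0
```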
Nakayama, Kotaro; Hara, Takahiro & Nishio, Shojiro Wikipedia mining for an association web thesaurus construction 8th International Conference on Web Information Systems Engineering, WISE 2007, December 3, 2007 - December 7, 2007 Nancy, France 2007
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has various impressive characteristics such as a huge number of articles, live updates, a dense link structure, brief link texts and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and the extension method "forward / backward link weighting (FB weighting)", in order to construct a huge scale association thesaurus. We proved the effectiveness of our proposed methods compared with other conventional methods such as co-occurrence analysis and TF-IDF.
Sunercan, Omer & Birturk, Aysenur Wikipedia missing link discovery: A comparative study 2010 AAAI Spring Symposium, March 22, 2010 - March 24, 2010 Stanford, CA, United states 2010
In this paper, we describe our work on discovering missing links in Wikipedia articles. This task is important for both readers and authors of Wikipedia. The readers will benefit from the increased article quality with better navigation support. On the other hand, the system can be employed to support the authors during editing. This study combines the strengths of different approaches previously applied for the task, and adds its own techniques to reach satisfactory results. Because of the subjectivity in the nature of the task, automatic evaluation is hard to apply. Comparing approaches seems to be the best method to evaluate new techniques, and we offer a semi-automatized method for evaluation of the results. The recall is calculated automatically using existing links in Wikipedia. The precision is calculated according to manual evaluations of human assessors. Comparative results for different techniques are presented, showing the success of our improvements. We employ the Turkish Wikipedia, on which we are the first to study, to examine whether a small instance is scalable enough for such purposes.
Dutta, Amitava; Roy, Rahul & Seetharaman, Priya Wikipedia Usage Patterns: The Dynamics of Growth 2008 [1,302]
Wikis have attracted attention as a powerful technological platform on which to harness the potential benefits of collective knowledge. Current literature identifies different behavioral factors that modulate the interaction between contributors and wikis. Some inhibit growth while others enhance it. However, while these individual factors have been identified in the literature, their collective effects have not yet been identified. In this paper, we use the system dynamics methodology, and a survey of Wikipedia users, to propose a holistic model of the interaction among different factors and their collective impact on Wikipedia growth. The model is simulated to examine its ability to replicate observed growth patterns of Wikipedia metrics. Results indicate that the model is a reasonable starting point for understanding observed Wiki growth patterns. To the best of our knowledge, this is the first attempt in the literature to synthesize a holistic model of the forces underlying Wiki growth.
Tu, Xinhui; He, Tingting; Chen, Long; Luo, Jing & Zhang, Maoyuan Wikipedia-based semantic smoothing for the language modeling approach to information retrieval 32nd European Conference on Information Retrieval, ECIR 2010, March 28, 2010 - March 31, 2010 Milton Keynes, United kingdom 2010 [1,303]
Semantic smoothing for the language modeling approach to information retrieval is significant and effective in improving retrieval performance. In previous methods such as the translation model, individual terms or phrases are used to do semantic mapping. These models are not very efficient when faced with ambiguous words and phrases because they are unable to incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous Wikipedia concepts into query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated through the EM algorithm. Document models based on Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track (Disks 1, 2, and 3) collections. Experiments show significant improvements over the two-stage language model, as well as the language model with translation-based semantic smoothing. 2010 Springer-Verlag Berlin Heidelberg.
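Concept-based smoothing of this kind is commonly written as a mixture of a maximum likelihood document model and a concept-to-term translation component; the sketch below shows that general form with hypothetical probability tables, and does not reproduce the paper's EM training procedure (in the paper those concept-to-term probabilities are what EM estimates).

```python
def smoothed_term_prob(term, doc_terms, doc_concepts, p_term_given_concept, lam=0.5):
    """P(term | document) as a mixture of the ML estimate from the document text
    and a translation component mapping Wikipedia concepts to terms.
    doc_terms: list of tokens in the document.
    doc_concepts: {concept: weight}, weights assumed to sum to 1.
    p_term_given_concept: {(term, concept): probability}, assumed given.
    lam: interpolation weight (illustrative value)."""
    ml = doc_terms.count(term) / len(doc_terms) if doc_terms else 0.0
    translated = sum(w * p_term_given_concept.get((term, c), 0.0)
                     for c, w in doc_concepts.items())
    return (1 - lam) * ml + lam * translated
```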
Gray, D.; Kozintsev, I.; Wu, Yi & Haussecker, H. Wikireality: augmenting reality with community driven Websites 2009 IEEE International Conference on Multimedia and Expo (ICME), 28 June-3 July 2009 Piscataway, NJ, USA 2009 [1,304]
We present a system for making community driven websites easily accessible from the latest mobile devices. Many of these new devices contain an ensemble of sensors such as cameras, GPS and inertial sensors. We demonstrate how these new sensors can be used to bring the information contained in sites like Wikipedia to users in a much more immersive manner than text or maps. We have collected a large database of images and articles from Wikipedia and show how a user can query this database by simply snapping a photo. Our system uses the location sensors to assist with image matching and the inertial sensors to provide a unique and intuitive user interface for browsing results.
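The abstract says location sensors assist image matching; a hedged sketch of that pre-filtering step is shown below. The haversine distance, the 500 m radius and the database record layout are illustrative assumptions, and the visual descriptor matching itself is left out.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_articles(query_lat, query_lon, database, radius_m=500):
    """Keep only database entries whose geotag lies near the query photo, so that
    expensive visual matching only runs on a small candidate set."""
    return [entry for entry in database
            if haversine_m(query_lat, query_lon, entry["lat"], entry["lon"]) <= radius_m]
```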
Strube, Michael & Ponzetto, Simone Paolo WikiRelate! computing semantic relatedness using wikipedia AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2 2006 [1,305]
Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.
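Among the measures compared in this line of work are path-based ones computed over Wikipedia's category graph; the snippet below sketches a generic inverse-path-length relatedness score over a concept graph given as plain dictionaries. It is a simplified stand-in, not the authors' implementation, and the graph representation is an assumption.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search over an undirected concept graph given as
    {node: set(neighbours)}; returns None if the nodes are unconnected."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

def path_relatedness(graph, a, b):
    """Simple inverse-path-length relatedness: closer concepts score higher."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```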
Toms, Elaine G.; Mackenzie, Tayze; Jordan, Chris & Hall, Sam wikiSearch: enabling interactivity in search Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval 2009 [1,306]
WikiSearch is a search engine customized for the Wikipedia corpus but with design features that may be generalized to other search systems. Its features enhance basic functionality and enable more fluid interactivity while supporting both the workflow of the search process and the experimental process used in lab testing.
Ferrari, Luna De; Aitken, Stuart; van Hemert, Jano & Goryanin, Igor WikiSim: simulating knowledge collection and curation in structured wikis Proceedings of the 4th International Symposium on Wikis 2008 [1,307]
The aim of this work is to model quantitatively one of the main properties of wikis: how high quality knowledge can emerge from the individual work of independent volunteers. The approach chosen is to simulate knowledge collection and curation in wikis. The basic model represents the wiki as a set of true/false values, added and edited at each simulation round by software agents (users) following a fixed set of rules. The resulting WikiSim simulations already manage to reach distributions of edits and user contributions very close to those reported for Wikipedia. WikiSim can also span conditions not easily measurable in real-life wikis, such as the impact of various amounts of user mistakes. WikiSim could be extended to model wiki software features, such as discussion pages and watch lists, while monitoring the impact they have on user actions and consensus, and their effect on knowledge quality. The method could also be used to compare wikis with other curation scenarios based on centralised editing by experts. The future challenges for WikiSim will be to find appropriate ways to evaluate and validate the models and to keep them simple while still capturing relevant properties of wiki systems.
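Following the abstract's description (a wiki as a set of true/false values edited each round by agents, with a tunable rate of user mistakes), a toy round of such a simulation could look like the sketch below; the specific rules and probabilities are invented for illustration and are not the WikiSim rule set.

```python
import random

def wikisim_round(facts, n_agents=100, p_add=0.02, p_edit=0.1, p_mistake=0.05):
    """One illustrative simulation round: the wiki is a list of True/False
    statements (True = correct). Each agent may add a statement or re-edit an
    existing one; with probability p_mistake an edit introduces an error."""
    for _ in range(n_agents):
        if random.random() < p_add or not facts:
            facts.append(random.random() >= p_mistake)    # new statement, usually correct
        elif random.random() < p_edit:
            i = random.randrange(len(facts))
            facts[i] = random.random() >= p_mistake       # re-edit, usually to the correct value
    return facts

# Hypothetical usage: knowledge quality as the fraction of correct statements.
# facts = []
# for _ in range(50):
#     wikisim_round(facts)
# quality = sum(facts) / len(facts)
```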
West, Robert; Pineau, Joelle & Precup, Doina Wikispeedia: an online game for inferring semantic distances between concepts Proceedings of the 21st international joint conference on Artificial intelligence 2009 [1,308]
Ponzetto, Simone Paolo & Strube, Michael WikiTaxonomy: A Large Scale Knowledge Resource Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence 2008 [1,309]
Mazur, Paweł & Dale, Robert WikiWars: a new corpus for research on temporal expressions Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing 2010 [1,310]
The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally-rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120,000 tokens and 2,600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our temporal expression tagger, to process this corpus, and point to where further work is required. The corpus is publicly available for research purposes.
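For readers unfamiliar with the annotation scheme, the snippet below shows what a TIMEX2-style temporal annotation looks like and one way it might be pulled out with a regular expression; the sentence, the attribute values and the regex are invented for illustration and are not drawn from the WikiWars corpus.

```python
import re

# Invented example of a TIMEX2-style annotation on Wikipedia-like prose.
annotated = ('The invasion began on '
             '<TIMEX2 VAL="1939-09-01">1 September 1939</TIMEX2> '
             'and lasted <TIMEX2 VAL="P6Y">six years</TIMEX2>.')

# Extract (normalized value, surface text) pairs from the markup.
timexes = re.findall(r'<TIMEX2 VAL="([^"]+)">([^<]+)</TIMEX2>', annotated)
print(timexes)  # [('1939-09-01', '1 September 1939'), ('P6Y', 'six years')]
```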
Sarini, Marcello; Durante, Federica & Gabbiadini, Alessandro Workflow management social systems: A new socio-psychological perspective on process management Business Process Management Workshops - BPM 2009 International Workshops, September 7, 2009 - September 7, 2009 Ulm, Germany 2010 [1,311]
The paper presents a study about one of the most successful cases of social software: Wikipedia. In particular, we focused on the investigation of some socio-psychological aspects related to the use of the Italian Wikipedia. In our study, we considered active Wikipedia users classified into three different roles: registered users, administrators, and bureaucrats, in order to discuss our findings with respect to these different groups of users. Workflow Management Systems are applications designed to support the definition and execution of business processes. Since we consider that social aspects are relevant in the accomplishment and coordination of activities managed by such technologies, we advocate for a new class of Workflow Management Systems, i.e., Workflow Management Social Systems. These systems should emphasize the social nature of workflow management. For this reason, we propose to consider some of the relevant psychological aspects we identified in our study, interpreted in the light of some relevant socio-psychological theories, for the design of this socially enriched workflow technology.
Ortega, Felipe; Reagle, Joseph; Reinoso, Antonio J. & Jesus, Rut Workshop on interdisciplinary research on Wikipedia and wiki communities Proceedings of the 4th International Symposium on Wikis 2008 [1,312]
A growing number of projects seek to build upon the collective intelligence of Internet users, looking for more dynamic, open and creative approaches to content creation and knowledge sharing. To this end, many projects have chosen the wiki, and it is therefore the subject of much research interest, particularly Wikipedia, from varied disciplines. The array of approaches to study wikis is a source of wealth, but also a possible source of confusion: What are appropriate methodologies for the analysis of wiki communities? Which are the most critical parameters (both quantitative and qualitative) for study in wiki evolution and outcomes? Is it possible to find effective interdisciplinary approaches to augment our overall understanding of these dynamic, creative environments? This workshop intends to provide an opportunity to explore these questions, by researchers and practitioners willing to participate in a "brainstorming research meeting".
Voss, Jakob Workshop on Wikipedia research Proceedings of the 2006 international symposium on Wikis 2006 [1,313]
In the first Workshop on Wikipedia Research, an overview of current research in and around the free encyclopedia will be given, as well as some practical guidelines on how to obtain and analyze data and how to get in contact with the community. Together we want to talk about differences and commonalities of Wikipedia and other wikis, and hot topics in Wikipedia research.
Amer-Yahia, Sihem; Baeza-Yates, Ricardo; Consens, Mariano P. & Lalmas, Mounia XML Retrieval: DB/IR in theory, web in practice Proceedings of the 33rd international conference on Very large data bases 2007 [1,314]
The world of data has been developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web with its semi-structured XML model. Data in Digital Libraries and in Enterprise Environments also shares many of the semi-structured characteristics of web data. As web-style searching becomes a ubiquitous tool, the need for integrating these two viewpoints becomes even more important. In particular, we consider the application of DB and IR research to querying Web data in the context of online communities. With Web 2.0, the question arises: how can search interfaces remain simple when users are allowed to contribute content (Wikipedia), share it (Flickr), and rate it (YouTube)? When they can decide who their friends are (del.icio.us), what they like to see, and how they want it to look (MySpace)? While we want to keep the user interface simple (keyword search), we would like to study the applicability of querying structure and content in a context where new forms of data-driven dynamic web content (e.g. user feedback, tags, contributed multimedia) are provided. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content. In particular, the context of querying content in online communities is an excellent example of such an application. Both earlier proposals as well as recent ones will be discussed. A variety of application scenarios for XML Retrieval will be covered, including examples of current tools and techniques.
Suchanek, Fabian M.; Kasneci, Gjergji & Weikum, Gerhard Yago: a core of semantic knowledge Proceedings of the 16th international conference on World Wide Web 2007 [1,315]
We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as HASWONPRIZE). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships, and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.
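YAGO's facts are entity-relation-entity triples; the fragment below sketches how such a knowledge base could be represented and queried for Is-A questions. The two relation names and the example facts are invented stand-ins for the real extraction output, not YAGO's actual data.

```python
from collections import defaultdict

# A tiny stand-in for a YAGO-style fact store: (subject, relation, object) triples.
facts = [
    ("Albert_Einstein", "type", "physicist"),        # invented example facts
    ("physicist", "subClassOf", "scientist"),
    ("Albert_Einstein", "bornIn", "Ulm"),
]

index = defaultdict(set)
for s, r, o in facts:
    index[(s, r)].add(o)

def is_a(entity, cls):
    """Follow type/subClassOf edges upward to answer Is-A queries."""
    frontier = set(index[(entity, "type")])
    seen = set()
    while frontier:
        c = frontier.pop()
        if c == cls:
            return True
        if c not in seen:
            seen.add(c)
            frontier |= index[(c, "subClassOf")]
    return False

print(is_a("Albert_Einstein", "scientist"))  # True
```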