User:Yurik/Query API/User Manual
Attention Query API users: Query API is being migrated into the new API interface. Please use the new API, which is now part of the standard MediaWiki engine.
- New API live: http://en.wikipedia.org/w/api.php
- Query API live: http://en.wikipedia.org/w/query.php
- View the Source Code
Overview
The Query API provides a way for your applications to query data directly from the MediaWiki servers. One or more pieces of information about the site and/or a given list of pages can be retrieved. The information may be returned in either a machine-readable (xml, json, php, wddx) or a human-readable format, and more than one piece of information may be requested with a single query.
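For example, a request for the links on the Main Page, returned as XML, looks like the URL below. The what, titles, and format parameters are the same ones used by the examples in the Usage section; format=jsonfm (described under Browser-based) gives a human-readable rendering instead.
http://en.wikipedia.org/w/query.php?what=links&titles=Main%20Page&format=xml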
Installation
These notes cover my experience - Fortyfoxes 00:50, 8 August 2006 (UTC) - of installing query.php on a shared virtual host [1], and may not apply to all setups. I have the following configuration:
- MediaWiki: 1.7.1
- PHP: 5.1.2 (cgi-fcgi)
- MySQL: 5.0.18-standard-log
Installation is fairly straightforward once you grasp the principles. Query.php is not like other documented "extensions" to MediaWiki - it does its own thing and does not need to be integrated into the overall environment so that it can be called from within wiki pages - so there is no registering it in LocalSettings.php (my first mistake).
Installation Don'ts
To be explicit: do *NOT* place a require_once( "extensions/query.php" ); line in LocalSettings.php!
Installation Do's
All Query API files must be placed two levels below the main MediaWiki directory. For example:
/home/myuserName/myDomainDir/w/extensions/botquery/query.php
where the directory "w/" is the standard MediaWiki directory, named in such a way as not to clash - i.e. not MediaWiki or Wiki. This allows easier redirection with .htaccess for tidier URLs.
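As a rough sketch, the corresponding shell steps might look like the following. The source path is a hypothetical placeholder for wherever you unpacked the Query API files; the directory names follow the example above.
$ cd /home/myuserName/myDomainDir/w/extensions
$ mkdir botquery
$ cp /path/to/unpacked/queryapi/* botquery/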
Apache Rewrite Rules and URLs
This is not required, but shorter URLs can be desirable when debugging; a rough sketch of one approach follows the note below.
- In progress - have to see how pointing a subdomain (wiki.mydomain.org) at the installation affects query.php!
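If you can use a .htaccess file but cannot edit httpd.conf, a rewrite rule can give the same short URL as the alias shown further down. This is only an untested sketch, assuming the layout above (wiki under w/, query.php under w/extensions/botquery/); adjust the paths to your own installation.
# Hypothetical .htaccess fragment for the web root; the paths are assumptions
# based on the example layout above and may need adjusting.
RewriteEngine On
RewriteRule ^w/query\.php$ w/extensions/botquery/query.php [L]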
Short URLs with a symlink
Using the conventions above:
$ cd /home/myuserName/myDomainDir/w    # change to the directory containing LocalSettings.php
$ ln -s extensions/botquery/query.php .
Short URLs the proper way
If you have permission to edit the "httpd.conf" file (the Apache server configuration file), it is much better to create an alias for "query.php". To do that, just add the following line to the aliases section of "httpd.conf":
Alias /w/query.php "c:/wamp/www/w/extensions/botquery/query.php"
Of course, the path could be different on your system. Enjoy. --CodeMonk 16:00, 27 January 2007 (UTC)
Usage
[edit]Python
This sample uses the simplejson library found here.
import simplejson, urllib, urllib2

QUERY_URL = u"http://en.wikipedia.org/w/query.php"
HEADERS = {"User-Agent": "QueryApiTest/1.0"}

def Query(**args):
    args.update({
        "noprofile": "",    # Do not return profiling information
        "format": "json",   # Output in JSON format
    })
    req = urllib2.Request(QUERY_URL, urllib.urlencode(args), HEADERS)
    return simplejson.load(urllib2.urlopen(req))

# Request links for Main Page
data = Query(titles="Main Page", what="links")

# If it exists, print the list of links from 'Main Page'
if "pages" not in data:
    print "No pages"
else:
    for pageID, pageData in data["pages"].iteritems():
        if "links" not in pageData:
            print "No links"
        else:
            for link in pageData["links"]:
                # To safely print unicode characters on the console,
                # use 'cp850' for Windows and 'iso-8859-1' for Linux
                print link["*"].encode("cp850", "replace")
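For reference, the decoded JSON has roughly the shape sketched below. This is only a reconstruction inferred from the fields the script reads above; the page id and link titles are placeholders, and real responses carry additional fields.
# Approximate structure of the decoded response, inferred from the lookups above.
# "12345" and the link titles are placeholders, not real values.
data = {
    "pages": {
        "12345": {
            "links": [
                {"*": "Wikipedia"},
                {"*": "Free content"},
            ],
        },
    },
}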
Ruby
This example prints all the links on the Ruby (programming language) page.
require 'net/http'
require 'yaml'
require 'uri'

@http = Net::HTTP.new("en.wikipedia.org", 80)

def query(args={})
  options = { :format => "yaml", :noprofile => "" }.merge args
  url = "/w/query.php?" << options.collect{|k,v| "#{k}=#{URI.escape v}"}.join("&")
  response = @http.start do |http|
    request = Net::HTTP::Get.new(url)
    http.request(request)
  end
  YAML.load response.body
end

result = query(:what => 'links', :titles => 'Ruby (programming language)')
if result["pages"].first["links"]
  result["pages"].first["links"].each{|link| puts link["*"]}
else
  puts "no links"
end
Browser-based
You want to use the JSON output by setting format=json. However, until you've figured out which parameters to supply to query.php and where the data will be in the response, you can use format=jsonfm instead.
Once this is done, you eval the response text returned by query.php and extract your data from it, as the JavaScript example below does.
JavaScript
// this function attempts to download the data at url.
// if it succeeds, it runs the callback function, passing
// it the data downloaded and the article argument
function download(url, callback, article) {
    var http = window.XMLHttpRequest ? new XMLHttpRequest() :
               window.ActiveXObject ? new ActiveXObject("Microsoft.XMLHTTP") : false;
    if (http) {
        http.onreadystatechange = function() {
            if (http.readyState == 4) {
                callback(http.responseText, article);
            }
        };
        http.open("GET", url, true);
        http.send(null);
    }
}

// convenience function for getting children whose keys are unknown,
// such as children of pages subobjects, whose keys are numeric page ids
function anyChild(obj) {
    for (var key in obj) {
        return obj[key];
    }
    return null;
}

// tell the user a page that is linked to from article
function someLink(article) {
    // use format=jsonfm for human-readable output
    var url = "http://en.wikipedia.org/w/query.php?format=json&what=links&titles=" + escape(article);
    download(url, finishSomeLink, article);
}

// the callback, run after the queried data is downloaded
function finishSomeLink(data, article) {
    try {
        // convert the downloaded data into a javascript object
        eval("var queryResult=" + data);
        // we could combine these steps into one line
        var page = anyChild(queryResult.pages);
        var links = page.links;
    } catch (someError) {
        alert("Oh dear, the JSON stuff went awry");  // do something drastic here
    }
    if (links && links.length) {
        alert(links[0]["*"] + " is linked from " + article);
    } else {
        alert("No links on " + article + " found");
    }
}

someLink("User:Yurik");
How to run JavaScript examples
In Firefox, drag the JSENV link (the 2nd one) at this site to your bookmarks toolbar. While on a wiki site, click the button and copy/paste the code into the debug window. Click Execute at the top.
Perl
This example was inherited from MediaWiki Perl module code by User:Edward Chernenko.
- Do NOT get MediaWiki data using LWP. Please use a module such as MediaWiki::API instead.
use LWP::UserAgent;

sub readcat($)
{
    my $cat = shift;
    my $ua = LWP::UserAgent->new();
    my $res = $ua->get("http://en.wikipedia.org/w/query.php?format=xml&what=category&cptitle=$cat");
    return unless $res->is_success();
    $res = $res->content();

    # good for MediaWiki module, but ugly as example!
    # it should _parse_ XML, not match known parts...
    while($res =~ /(?<=<page>).*?(?=<\/page>)/sg)
    {
        my $page = $&;
        $page =~ /(?<=<ns>).*?(?=<\/ns>)/;
        my $ns = $&;
        $page =~ /(?<=<title>).*?(?=<\/title>)/;
        my $title = $&;

        if($ns == 14)
        {
            my @a = split /:/, $title;
            shift @a;
            $title = join ":", @a;
            push @subs, $title;
        }
        else
        {
            push @pages, $title;
        }
    }
    return(\@pages, \@subs);
}

my($pages_p, $subcat_p) = readcat("Unix");
print "Pages: " . join(", ", sort @$pages_p) . "\n";
print "Subcategories: " . join(", ", sort @$subcat_p) . "\n";
C# (Microsoft .NET Framework 2.0)
The following function is a simplified code fragment from the DotNetWikiBot Framework.
- Attention: This example needs to be revised to remove RegEx parsing of the XML data. There are plenty of XML, JSON, and other parsers available or built into the framework. --Yurik 05:44, 13 February 2007 (UTC)
using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
using System.Net;
using System.Web;

/// <summary>This internal function gets all page titles from the specified
/// category page using "Query API" interface. It gets titles portion by portion.
/// It gets subcategories too. The result is contained in "strCol" collection.</summary>
/// <param name="categoryName">Name of category with prefix, like "Category:...".</param>
public void FillAllFromCategoryEx(string categoryName)
{
    string src = "";
    StringCollection strCol = new StringCollection();
    MatchCollection matches;
    Regex nextPortionRE = new Regex("<category next=\"(.+?)\" />");
    Regex pageTitleTagRE = new Regex("<title>([^<]*?)</title>");
    WebClient wc = new WebClient();
    do
    {
        Uri res = new Uri(site.site + site.indexPath + "query.php?what=category&cptitle=" +
            categoryName + "&cpfrom=" + nextPortionRE.Match(src).Groups[1].Value + "&format=xml");
        wc.Credentials = CredentialCache.DefaultCredentials;
        wc.Encoding = System.Text.Encoding.UTF8;
        wc.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
        wc.Headers.Add("User-agent", "DotNetWikiBot/1.0");
        src = wc.DownloadString(res);
        matches = pageTitleTagRE.Matches(src);
        foreach (Match match in matches)
            strCol.Add(match.Groups[1].Value);
    } while (nextPortionRE.IsMatch(src));
}
PHP
// Please remember that this example requires PHP5.
ini_set('user_agent', 'Draicone\'s bot');
// This function returns a portion of the data at a url / path
function fetch($url,$start,$end){
$page = file_get_contents($url);
$s1=explode($start, $page);
$s2=explode($end, $page);
$page=str_replace($s1[0], '', $page);
$page=str_replace($s2[1], '', $page);
return $page;
}
// This grabs the RC feed (-bots) in xml format and selects everything between the pages tags (inclusive)
$xml = fetch("http://en.wikipedia.org/w/query.php?what=recentchanges&rchide=bots&format=xml","<pages>","</pages>");
// This establishes a SimpleXMLElement - this is NOT available in PHP4.
$xmlData = new SimpleXMLElement($xml);
// This outputs a link to the current diff of each article
foreach($xmlData->page as $page) {
echo "<a href=\"http://en.wikipedia.org/w/index.php?title=". $page->title . "&diff=curr\">". $page->title . "</a><br />\n";
}
Scheme
This example prints HTML links to the latest recent changes (excluding bots).
;; Write a list of html links to the latest changes
;;
;; NOTES
;; http:GET takes a URL and returns the document as a character string
;; SSAX:XML->SXML reads a character-stream of XML from a port and returns
;; a list of SXML equivalent to the XML.
;; sxpath takes an sxml path and produces a procedure to return a list of all
;; nodes corresponding to that path in an sxml expression.
;;
(require-extension http-client)
(require-extension ssax)
(require-extension sxml-tools)
;;
(define sxml
  (with-input-from-string
    (http:GET "http://en.wikipedia.org/w/query.php?what=recentchanges&rchide=bots&format=xml&rclimit=200")
    (lambda () (SSAX:XML->SXML (current-input-port) '()))))

(for-each
  (lambda (x) (display x) (newline))
  (map
    (lambda (x)
      (string-append "<a href=\"http://en.wikipedia.org/w/index.php?title="
                     (cadr x)
                     "&diff=cur\">"
                     (cadr x)
                     "</a><br/>"))
    ((sxpath "yurik/pages/page/title") sxml)))