User talk:Yurik/Query API
Attention visitors
This page discusses the Query Interface at http://en.wikipedia.org/w/query.php
The new API is being developed at mw:API, using the knowledge gained from this project and your feedback. The API will allow querying as well as posting back to the wiki.
I will be spending most of my available time on the new API, so for now the work on new query features is postponed. Any feedback is welcome, especially on the way the new API will incorporate the existing capabilities. --Yurik 06:41, 17 September 2006 (UTC)
Completed Requests
See Completed Requests Archive for older requests.
Postponed issues
Core Changes to the API (IMPORTANT!)
Please monitor this section as it may impact your code, and make any suggestions below.
- This section will no longer be maintained, due to the new API currently being developed at mw:API. Please make any suggestions there. --Yurik 06:12, 15 September 2006 (UTC)
There are several kinds of page entries the API currently returns (a client-side sketch follows this list):
- Redirect entries: have a <redirect> tag. With what=redirects, a <to id=xxx>Redirect Target</to> element is included, with the id being 0 for missing pages. (JSON/PHP - dictionary key is the same as the redirect page ID)
- Name normalization entries: when the given page title differs from the proper wiki title (localized namespaces, different case, extra spacing, ...), an entry is created with a <normalizedTitle> tag containing the proper page title. A <refid> is also given, which is 0 when the page does not exist. Page info for the normalized title is always included as a separate entry. (JSON/PHP - dictionary key is negative)
- Non-existing pages: have the <id> tag set to 0. (JSON/PHP - dictionary key is negative)
- When a non-existent pageids=xxx is specified, an entry with a <badId> element and an <id> set to 0 is created. (JSON/PHP - dictionary key is negative)
- A regular page has a positive <id> element and none of the above-mentioned elements. (JSON/PHP - dictionary key is the same as the page ID)
- All of the above entries reside inside the <pages> element.
- When revids=xxx is given, pages that contain the listed revisions are included. No special errors are returned for incorrect revids.
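For illustration, here is a minimal client-side sketch in Python (using format=json) of how a caller might classify these entries by their dictionary keys. The request parameters and the exact result layout are assumptions based on the description above, not a definitive spec.

    import json
    import urllib.parse
    import urllib.request

    # Sketch: fetch page info as JSON and classify each entry in the "pages"
    # dictionary using the key conventions described above (negative keys for
    # normalized/missing/bad-id entries, positive keys for real pages).
    url = "http://en.wikipedia.org/w/query.php?" + urllib.parse.urlencode({
        "what": "redirects",
        "titles": "Main Page|No Such Page Xyz",
        "format": "json",
    })
    data = json.load(urllib.request.urlopen(url))

    for key, entry in data.get("pages", {}).items():
        if int(key) < 0:
            print("normalized/missing/bad id:", entry)
        elif "redirect" in entry:
            print("redirect:", entry.get("title"))
        else:
            print("regular page:", entry.get("title"))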
Proposals
- Instead of having everything inside the <pages> element, create <normalized>, <missing> and <redirects> top-level elements. The <id> tag will be removed from badID and missing page entries. This will optimize processing code, as it will no longer have to make multi-pass scans when requesting multiple pages.
- what=redirects should include the pages that are targets of the redirects, up to 2 levels deep (by default), or up to the relimit parameter (max=20?). When asked for P1, which redirects to P2, both P1 and P2 will be returned. In case P2 is also a redirect, to P3, it will also be included by default, but no more than that. Setting relimit to 0 will make the query API behave as it does now. Without what=redirects, no target of the redirect will be included in the result, although the page will still be included in the <redirects> section. Additionally, regardless of the what parameter, the <redirect> tag might be slightly changed to reduce the output size.
Outstanding issues:
- Separating bulk page requests from unbounded items per page to limit server load.
- Example - a query for N revisions of M given pages: the moment more than one page is requested, paging becomes increasingly difficult. If the limit is set to 50, ten pages were requested, and the very first page has 200 revisions, paging is practically impossible. The initial idea would be to disallow paging for multi-page requests: a check would be made that the <pages> section has 0 or 1 elements. In other words, the query API should have two modes of operation - single-page and multi-page. Some operations will be single-page mode only, whereas the rest will work in both modes.
- Deleted pages
- Logs -- I see two usage patterns: to generate pages based on the log, and to provide log data about given pages.
- PageGenerator: users may request log entries matching some criteria, similar to Special:Log. Such request may generate a list of pages to be populated by other properties.
- Multiple log entries may point to the same page - need to insert it into the page element.
- Not all log entries may refer to real pages - some refer to Special:Renameuser or Special:Userlogin, which currently cannot be handled by the query.
- PageProperty: similar to PageGenerator, except that this time other pages are already given (either by titles=... parameter, or by some other generators like allpages). Log information is retrieved for the given pages (optionally with additional filters).
- It has occurred to me that only one request of the kind that returns a list of pages ("logs", "watched pages", "all pages", "users", etc.) should be allowed at a time. There is no reason to ask for logs AND watched pages. We should separate them from the what=... parameter, instead using something like pagegen=..., or similar. In addition, most of these requests may be used either as page generators or as properties of existing pages. An example is the logs - you can ask for logs on a given list of pages, or you can ask for properties of all pages that appear in the logs (see the logs discussion above). More thinking is needed.
Add contfrom to revisions and usercontribs queries
This would enable things like Interiot's javascript edit counter to use the query interface instead of screen-scraping contribs pages. I'm not sure why the rvoffset parameter is disappearing (has disappeared?) but it'd be good to have a way to query entire article histories if that's really what's needed. Lupin|talk|popups 13:14, 12 June 2006 (UTC)
- rvoffset: Basically, one feature of the query conflicts with another. On one hand, the query allows many pages to be requested at once (allpages, recentchanges, and nolanglinks properties; the titles parameter). On the other, it allows requesting many things about each found page (langlinks, templates, ...). This is fine if the list for each page is fairly small - you wouldn't have more than 200 langlinks or that many templates on a page. On the other hand, things like revisions and backlinks may be practically infinite - e.g. {{disambig}} is used by half the pages.
- So the problem is to combine the two - allow many pages at once, and unbound number of subitems for each page.
- Revisions: I can see these usage scenarios.
- Get last N revisions for each page.
- An antivandal bot may also ask for uniqusr for recentchanges|revisions or for titles + revisions - thus in one request it gets all the possible vandalism candidates and takes action. uniqusr is very slow at the moment, and I might have to rethink this approach.
- Get revisions for one title, and page through them as appropriate. Various filters may be applied. rvlimit in this case means just the maximum number of records to return. If more than one page is given, and the first page has more results than the limit, the query will just return results for page #1, and on subsequent calls continue exactly from the point where it left off - the rest of the results for page one, then results from page #2, etc. This is exactly how contfrom is implemented for other properties. For this exact reason, contfrom in backlinks currently (not guaranteed in the future) has the format ns|db_key|page_id - the last page the query was processing and the first page_id of the backlink to continue from.
- As a result: if you ask for two pages with backlinks and one of them has no backlinks in the result, it may mean either that a) there are no links, or b) they exist but you didn't get to them yet - use contfrom until it's empty to guarantee the completeness of the result (a paging sketch follows).
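To make the contfrom contract concrete, here is a rough Python sketch of that paging loop. The continuation parameter name (blcontfrom) and the location of the continuation value in the result are assumptions for illustration only.

    import json
    import urllib.parse
    import urllib.request

    # Sketch of the paging loop described above: re-issue the query, feeding
    # the returned continuation value back in, until none is returned.
    # "blcontfrom" and the result field names are assumptions, not a spec.
    def all_backlinks(title):
        params = {"what": "backlinks", "titles": title, "format": "json"}
        while True:
            url = ("http://en.wikipedia.org/w/query.php?" +
                   urllib.parse.urlencode(params))
            data = json.load(urllib.request.urlopen(url))
            for page in data.get("pages", {}).values():
                for link in page.get("backlinks", []):
                    yield link
            cont = data.get("query", {}).get("backlinks", {}).get("contfrom")
            if not cont:
                break  # no continuation value: the result is complete
            params["blcontfrom"] = cont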
- All this brings up the question - how should revisions be returned? Should I limit each request for revisions to just one page? Or implement paging similar to the backlinks approach described above for #2 (I just noticed it's very slow, so it might not be the right approach). Will have to think hard on this one :) Any suggestions are welcome.
--Yurik 17:57, 12 June 2006 (UTC)
- My preference (possibly impractical) would be to always allow multiple queries in one request. This would cut down on network traffic and be easier for script writers to code for. (At the moment, if I want to get two pieces of information in two queries then I have to synchronize them as I don't know when they'll arrive.) If this was done, then you could restrict the revisions query to single articles without sacrificing functionality. Lupin|talk|popups 00:17, 13 June 2006 (UTC)
- I'm not sure I'm following. You already have the functionality of asking for multiple things (using the '|' symbol). --Yurik 13:20, 13 June 2006 (UTC)
- Oh, I hadn't realised this. Are the queries submitted in this way independent, though? I'm a bit unclear about how chained queries like that affect one another. Lupin|talk|popups 03:26, 14 June 2006 (UTC)
- Every query appends its result to the common data object, like toys onto a Christmas tree (or, for the politically inclined, similar to pork barrel legislation). There are meta queries - those form the basis of the tree - the page objects. allpages works that way, as do titles and pageids. Later, the other properties get run one by one, appending various information onto that tree for all the previously found pages. This way you can have recentchanges and revisions requested together, in which case you will get the last N revisions for all recently changed pages. --Yurik 04:55, 14 June 2006 (UTC)
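As a small illustration of that chaining, the sketch below builds one request that combines a meta query with a property. Parameter values are examples; the what and rvlimit names are taken from this page.

    import urllib.parse

    # Sketch: one request combining a meta query (recentchanges, which builds
    # the page objects) with a property (revisions, appended onto each page).
    url = "http://en.wikipedia.org/w/query.php?" + urllib.parse.urlencode({
        "what": "recentchanges|revisions",  # chained with '|'
        "rvlimit": 5,                       # last N revisions per page
        "format": "json",
    })
    print(url)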
Log queries
I would like to have a bot interface for Special:Log - i.e. a way to query the logging table. It should be possible to filter the result by type (upload, delete, new account, etc.), page, page namespace, user, and timestamp. Ideally, it should be possible to give more than one value for the type and namespace options, maybe also for user and page.
This feature could be used by CommonsTicker as a fallback in case the replication lag on the toolserver gets too large (as it has in the last few days).
Thanks for your great work! -- G. Gearloose (?!) 22:53, 19 June 2006 (UTC)
- I got a bit confused by the m:Logging_table layout. At some point I will dig through the code to figure it out. --Yurik 19:44, 24 June 2006 (UTC)
- Hm, what confuses you? I have been working a lot with that table for CommonsTicker -- de:Benutzer:Duesentrieb 10:00, 3 July 2006 (UTC)
- Please describe its fields at m:Logging_table. Thanks! --Yurik 15:46, 21 July 2006 (UTC)
Categories
Is it possible to list the contents of a category (possibly filtered by namespace) using query.php? I couldn't figure it out... It would be nice to have, especially as I find people spidering CategoryTree to retrieve the category structure. -- de:Benutzer:Duesentrieb 10:00, 3 July 2006 (UTC)
- Yes. For example, http://en.wikipedia.org/w/query.php?what=category&cptitle=Category:Wikipedia_tools. Lupin|talk|popups 19:45, 3 July 2006 (UTC)
Conversely, is it possible to return the categories to which a page belongs (or constrain the links to ns14 or something)? Or am I stupidly overlooking that possibility? maarten 00:16, 8 July 2006 (UTC)
- Done. --Yurik 00:15, 24 July 2006 (UTC)
- Oops, I only did the opposite -- there is currently no way to list the categories a page belongs to. I will add it shortly. --Yurik 14:06, 26 July 2006 (UTC)
- Now it's done :)... pending server sync. --Yurik 14:42, 27 July 2006 (UTC)
- Ah, that clears up the confusion ;-), thanks. maarten 17:12, 27 July 2006 (UTC)
- Is there a delay with the synchronisation, or am I just being impatient? maarten 16:24, 1 August 2006 (UTC)
- Unfortunately admins are reluctant to do a sync at this point because some breaking changes are in the works. Query API does not depend on any of them, so it can be safely synced, but to do that would require someone to pester admins on the IRC tech channel. You are welcome to do that :) --Yurik 17:11, 1 August 2006 (UTC)
Image links and namespace
Hi, great tool! Would it be possible to always give the namespace attribute, even when it is 0? For example, when the page is in mainspace the ns attribute is left out. Not critical, but it would make coding a bit easier. thanks Martin 10:34, 12 July 2006 (UTC)
- Given the capabilities of the XmlTextReader class from .Net 2.0 that we use in AWB, this is rather easy to handle. So I wouldn't change that in the Query API implementation, in favor of saving bandwidth. BTW, many thanks to Yurik and all contributors from my side, too. Very nice API. --Ligulem 11:24, 12 July 2006 (UTC)
- It's good to have things standardised though. Also, is there a way to find what redirects to an article? Martin 11:59, 12 July 2006 (UTC)
- I would rather try to save bandwidth on this one. The xml schema can have defaults, so that solves it. By the way, if someone would contribute the schema file, it would simplify life for everyone doing it in XML. Thanks! --Yurik 14:44, 12 July 2006 (UTC)
- Duh. If I were bored enough I'd do it in RELAX NG#Compact syntax notation. Would that be helpful? --Ligulem 15:19, 12 July 2006 (UTC)
- Isn't there a converter of schema to/from RelaxNG? Might be nice to have the proper schema + ng :) --Yurik 17:49, 12 July 2006 (UTC)
Counting feature?
What about having a counting feature, for example of how many pages there are in a combination of categories? --193.175.201.126 12:19, 26 July 2006 (UTC)
- I don't think this should be part of the query - we are getting too much into analyzing data vs. simply accessing it. count(*) queries are notoriously expensive for large aggregates. --Yurik 14:09, 26 July 2006 (UTC)
- I've just added category listings for pages - use that to get category lists for all pages, and do the stats. You can also combine what=category|categories and provide cptitle as one of the categories you are interested in - this way you can get all the other categories that the pages in the cptitle category belong to. --Yurik 14:43, 27 July 2006 (UTC)
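Here is a hedged Python sketch of that client-side counting approach; the field names in the JSON result ("pages", "categories") are assumptions based on this page's description.

    import json
    import urllib.parse
    import urllib.request
    from collections import Counter

    # Sketch: list one category's pages together with each page's other
    # categories, then tally the overlaps client-side instead of asking the
    # server for count(*). Result field names are assumptions.
    url = "http://en.wikipedia.org/w/query.php?" + urllib.parse.urlencode({
        "what": "category|categories",
        "cptitle": "Category:Wikipedia_tools",
        "format": "json",
    })
    data = json.load(urllib.request.urlopen(url))

    counts = Counter()
    for page in data.get("pages", {}).values():
        counts.update(str(c) for c in page.get("categories", []))
    print(counts.most_common(10))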
Special pages
Would it be possible to query special pages? e.g. Special:Deadendpages, but obviously not others like Special:Userlogin. If so, it would be great if the results for each different page used exactly the same format, so that the only variable would be the url. Martin 10:40, 27 July 2006 (UTC)
- It's possible, but very hard - every special page is code of its own, some with their own result caching. I would have to understand each page in turn, and then re-implement most of their logic. If we ever want to separate the data access layer from the presentation layer, the query API is well positioned to become that, but tons of work will have to be done to get all the special queries into it. --Yurik 15:51, 27 July 2006 (UTC)
- So official queries are okay, but user-requested queries are probably asking too much? (which, I guess is probably one good place to draw the line) Though if official queries are okay, one would think that count(*) should be no problem... --Interiot 16:03, 27 July 2006 (UTC)
- Ok, don't worry about it, I can already parse the HTML of the special pages, but it is not nearly as clean as using the API, as I do with categories etc. Martin 16:35, 27 July 2006 (UTC)
- I will add any queries requested (that also make sense to me ;)), as long as they do not tax the database too much. If you have any specific requests, please describe how that information should be returned (in the present structure), the parameters it should support, and, if possible, the SQL that can get it for me. Thanks! --Yurik 16:41, 27 July 2006 (UTC)
HTML rendering of the pages
Render wiki markup as HTML and return that, without the skin customizations. Not sure if it's possible to have a generic html version of the page so that CSS changes everything... --Yurik 15:51, 27 July 2006 (UTC)
- Would this be very different/more efficient than action=render? Lupin|talk|popups 02:03, 28 July 2006 (UTC)
- It might be, because query is being optimized for an extremely fast startup time and very little overhead. Plus, it can work on multiple pages in one request. In addition, users will be able to get multiple things at once, without making many requests - like requesting interwiki, links, wiki markup, and the rendered page in one go. --Yurik 02:11, 28 July 2006 (UTC)
- I see - sounds nice :) Lupin|talk|popups 02:22, 28 July 2006 (UTC)
Revids with direction=prev/next
These urls [1] [2] give you revisions adjacent to revision 61168673 of the Main Page. Could query.php support such relative revid specifications? I'd like to use this to download two revisions in a single request in order to perform a diff on them, for example to generate a preview for this link. Currently I can perform two separate requests with action=raw (which does support the direction parameter), but it should be possible to make just one request. Lupin|talk|popups 02:51, 28 July 2006 (UTC)
It just occurred to me that it may be nice to support YAML output. Not that I need it or anything :P Just a thought. -- G. Gearloose (?!) 10:38, 4 August 2006 (UTC)
- If you find a yaml php library, I will add it :) --Yurik 12:27, 4 August 2006 (UTC)
/sign this request. http://spyc.sourceforge.net/, too. (I was feeling lucky with Google.) AKX 11:57, 13 August 2006 (UTC)
- Done. Should be available shortly. --Yurik 22:53, 15 August 2006 (UTC)
Bug?
This query's XML output doesn't parse in MSIE, Firefox, or the Perl parser I'm using. Firefox displays the error "XML Parsing Error: xml declaration not at start of external entity". It looks like there's an extra line at the beginning that's throwing the parsers off. (a workaround of removing all whitespace from the beginning worked for me) --Interiot 05:48, 6 August 2006 (UTC)
- I think it might be related to the bug I made with the last checkin - in the process of syncing it up, and will see if it works afterwards. Thanks! --Yurik 19:17, 6 August 2006 (UTC)
- Very strange - everything works fine the moment you replace zh-yue: with en:. Thanks! --Yurik 00:22, 8 August 2006 (UTC)
- I think this issue was resolved, as it is no longer happening on zh-yue. Query API has not been changed, so it must have been the startup code somewhere. --Yurik 16:11, 13 August 2006 (UTC)
- I got this problem and resolved it by removing an extra newline or two following the closing php tag in the LocalSettings.php file. This was sending the extra newline to the rss feed somehow... --Kevincolyer 09:50, 31 March 2007 (UTC)
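A small Python sketch of the workaround mentioned in this thread - stripping stray leading whitespace before handing the response to a strict XML parser (the query URL is just an example):

    import urllib.request
    import xml.etree.ElementTree as ET

    # Workaround from this thread: some installations emit blank lines before
    # the XML declaration (e.g. stray newlines after the closing php tag in
    # LocalSettings.php), which strict parsers reject. Strip them first.
    url = ("http://en.wikipedia.org/w/query.php"
           "?what=info&titles=Main%20Page&format=xml")
    raw = urllib.request.urlopen(url).read()
    root = ET.fromstring(raw.lstrip())  # drop whitespace before <?xml ... ?>
    print(root.tag)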
Installation
ERRORMESSAGE: after uploading query.php to mysite/wiki/extensions/botquery/query.php, I can run it, but it stops immediately saying:
Parse error: parse error, unexpected '{' in /.../wiki/extensions/botquery/query.php on line 580
* MediaWiki: 1.6.10
* PHP: 4.3.10-19 (apache2handler)
* MySQL: 4.1.11-Debian_4sarge7-log
Do you have an idea about the problem???
Written by Fortyfoxes 00:30, 7 August 2006 (UTC)

With the latest version of query.php installed and the following server setup:
* MediaWiki: 1.6.7
* PHP: 4.4.2 (cgi)
* MySQL: 5.0.18-standard-log
I get the following error:
Parse error: syntax error, unexpected '&', expecting T_VARIABLE or '$' in pathToMyDomain/w/extensions/query.php on line 557... and also 721, 722, 740....
Also, now that I have php5 and a later version of MediaWiki installed:
* MediaWiki: 1.6.7
* PHP: 4.4.2 (cgi)
* MySQL: 5.0.18-standard-log
I get the following:
Warning: require_once(/home/.goethe/fortyfoxes/architex.tv/w/extensions/../../includes/Defines.php) [function.require-once]: failed to open stream: No such file or directory in /home/.goethe/fortyfoxes/architex.tv/w/extensions/query.php on line 56
Fatal error: require_once() [function.require]: Failed opening required '/home/.goethe/fortyfoxes/architex.tv/w/extensions/../../includes/Defines.php' (include_path='/home/.goethe/fortyfoxes/architex.tv/w:/home/.goethe/fortyfoxes/architex.tv/w/includes:/home/.goethe/fortyfoxes/architex.tv/w/languages:.:/usr/local/php5/lib/php') in /home/.goethe/fortyfoxes/architex.tv/w/extensions/query.php on line 56
I have a .htaccess rewrite for short URLs which may be causing the problem?
RewriteEngine on
# uncomment this rule if you want Apache to redirect from www.mysite.com/ to www.mysite.com/wiki/Main_Page
# RewriteRule ^$ /wiki/Main_Page [R]
# do the rewrite
RewriteRule ^wiki/?(.*)$ /w/index.php?title=$1 [L,QSA]
Either way, a few more instructions are needed for setting up on a virtual host (in my case DreamHost).
- Solved over IM. You do not need any of this. Query is not a typical extension - it does not need to be activated by including it in LocalSettings.php. You do not need a virtual host or anything at all. If LocalSettings.php is in the folder w/, then query.php must be at w/extensions/botquery/query.php. It is likely that the folder names are not important - what IS important is that it is two levels below LocalSettings.php. --Yurik 00:06, 8 August 2006 (UTC)
Watchlist
It would be great to have output for the currently logged-in user's watchlist. Presently I can parse the watchlist page directly, but it isn't as nice as using this API. It's nothing to worry about, especially if it is particularly difficult for any reason; it just would be useful to have eventually. Martin 19:23, 6 August 2006 (UTC)
Edit interface
Would it be possible to set up an edit interface so bots could edit pages without downloading tens of kilobytes of unneeded HTML? --Carnildo 20:04, 6 August 2006 (UTC)
- Actually, it could be enough to have four values returned (together with the raw page) to be used as wpStarttime/wpEdittime/wpEditToken/wpAutoSummary for submitting edits via the regular interface. I see the interface already returns rollback tokens (if one is logged in as an admin), so this might be possible? (Liberatore, 2006). 10:39, 11 August 2006 (UTC)
- If you don't get an edit conflict, this is nice. If you do, you want to have the version your edit conflicted with; you'd still have to scrape it from the page the server returns after the store. -- ∂ 15:04, 11 August 2006 (UTC)
- In case of an edit conflict, you can still come back to query.php and get four new values. Yes, I agree you end up loading more data in this case (and you increase the probability of another edit conflict), but I do not think that bots generally encounter many edit conflicts (Liberatore, 2006). 15:32, 11 August 2006 (UTC)
- About once in a blue moon. OrphanBot's edit-conflicted twice in 130,000+ edits. --Carnildo 18:03, 11 August 2006 (UTC)
- We talked about this at the hacking days; take a look at API for the results of that discussion :) Amongst others, an editing interface is part of the basic requirements for a mediawiki API. Henna 09:14, 12 August 2006 (UTC)
Since this request has not been (formally) rejected (so far :-(), I guess I may post another suggestion here. An easy way to make editing possible could be to add two "global" fields when querying "what=revisions":
- wpStarttime, produced by calling wfTimestampNow() before anything else;
- wpEditToken, generated by calling $wgUser->editToken() (or maybe htmlspecialchars($wgUser->editToken())?).
Of the other two hidden fields I mentioned above, wpAutoSummary seems not necessary (so far). As for wpEdittime: if the page already exists, wpEdittime is the timestamp of the last revision of the article, so this is already returned when querying "what=revisions"; otherwise, it is equal to wpStarttime. (Liberatore, 2006). 16:01, 28 August 2006 (UTC)
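For illustration, here is a hedged Python sketch of how a bot might then submit an edit through the regular form handler using those values. The form field names (wpTextbox1, wpSummary, wpSave) are assumptions based on the standard edit form of that era, not part of query.php, and a real bot would also need to send its login cookies.

    import urllib.parse
    import urllib.request

    # Sketch only: submit an edit via the regular index.php form handler,
    # using wpStarttime/wpEdittime/wpEditToken obtained from a query.
    # Field names other than those three are assumptions, and login
    # cookies are omitted for brevity.
    def submit_edit(title, new_text, starttime, edittime, token):
        form = urllib.parse.urlencode({
            "wpTextbox1": new_text,
            "wpSummary": "bot edit",
            "wpStarttime": starttime,
            "wpEdittime": edittime,
            "wpEditToken": token,
            "wpSave": "Save page",
        }).encode()
        url = ("http://en.wikipedia.org/w/index.php?action=submit&title=" +
               urllib.parse.quote(title))
        return urllib.request.urlopen(url, data=form)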
SOAP
Hiya, I think it may be useful to make this work as a SOAP service. If there is a WSDL file then third-party application developers will have a much easier time developing tools to use Query. We made something similar for LyricWiki's SOAP API (check out the External links on that page for help implementing a SOAP service in PHP using nuSOAP).
ISBN & images
Is it possible to somehow expose the ISBN numbers on a given page, used for those Special:Booksources links (would be great for combining data)? Also, I fail to understand how to use an (imageinfo?) query to return the (upload.wikimedia.*) URLs of the images displayed on a given page. maarten 12:28, 15 August 2006 (UTC)
Ordering output
I note that "what=links" returns the links in an alphabetically sorted order. Would it be possible to add an optional flag such as "sort=none" or something like that so that one could retrieve them in the same order in which they appear within the text? This could also be used with "what=categories", "what=langlinks" and "what=templates". --Russ Blau (talk) 14:08, 16 August 2006 (UTC)
- Much of the link data comes straight from the database, and is not parsed from the wikitext every time. I don't think the database stores the links in any meaningful order, does it? --Interiot 14:13, 16 August 2006 (UTC)
- Correct, all data comes from the database, no way to get the ordering. Parsing pages every time would kill the server :) --Yurik 14:11, 17 August 2006 (UTC)
Querying user status
I suppose this is a feature request, but being able to query a user's status, or even fetch the entire list of admins or whatever, would be tremendously helpful to me. joshbuddy, talk 18:26, 23 August 2006 (UTC)
touched
What exactly is the meaning of the touched attribute? I wrongly supposed it was the last modified date, but I have a bunch of articles which don't match... So I suppose it's the last generated date (templates & co).
Is there a way to get the last modified date?
Thanks
Gonioul 20:37, 26 August 2006 (UTC)
backlinks
Is there any way to filter the output of what=backlinks to determine which of the linking pages are actually redirects? At first I thought blfilter=redirects would do this, but that is not what it does -- it filters the list of titles, not the output. (Presumably there should be a way to implement this, since the output of [[Special:Whatlinkshere/Page]] does show which links are redirects, and even includes the backlinks to those pages.) --Russ Blau (talk) 20:04, 30 August 2006 (UTC)
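Until such a filter exists, one possible client-side workaround is a two-step query; this is a sketch, and the result field names ("pages", "backlinks", "redirect", "title") are assumptions based on this talk page.

    import json
    import urllib.parse
    import urllib.request

    # Sketch: fetch the backlinks of a page, then ask what=redirects for
    # those titles to learn which backlinks are themselves redirects.
    def redirecting_backlinks(title):
        q1 = urllib.parse.urlencode(
            {"what": "backlinks", "titles": title, "format": "json"})
        data = json.load(urllib.request.urlopen(
            "http://en.wikipedia.org/w/query.php?" + q1))
        page = next(iter(data.get("pages", {}).values()), {})
        linking = [bl["title"] for bl in page.get("backlinks", [])
                   if "title" in bl]
        if not linking:
            return []

        q2 = urllib.parse.urlencode(
            {"what": "redirects", "titles": "|".join(linking),
             "format": "json"})
        data2 = json.load(urllib.request.urlopen(
            "http://en.wikipedia.org/w/query.php?" + q2))
        return [p["title"] for p in data2.get("pages", {}).values()
                if "redirect" in p and "title" in p]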
Namespace for "recentchanges"
Hi, query is great :) but I think the "recentchanges" property is missing an "ns" field. I would be interested in filtering the RC and retrieving those related to talk pages, not articles. Thanks, keep up the good work. Dake 21:39, 6 September 2006 (UTC)
wfMsg() and wfMsgForContent()
Some of my JavaScript extensions look up interface strings which are stored in the MediaWiki namespace and also in the "message cache". It turns out this is not always very easy or possible with action=raw, and even when it is, it can require up to 4 xmlhttprequests per message. It would be nice to have a Query API which takes the arguments for these functions, calls them, and returns the result in some very simple format - preferably as simple as action=raw — Hippietrail 12:36, 8 September 2006 (UTC)
more options for allpages
I've written two autocomplete tools that use allpages, one for the search inputbox, and one for the article edit box (see links at top of my user page). There are two things I need to make them better:
- add <to id="foo">bar</to> to the output for redirects for what=allpages, maybe something like &apredirto
- case insensitive searching, like Special:Allpages, maybe something like &apcase=false.
Otherwise, query.php rocks :) Zocky | picture popups 17:30, 12 September 2006 (UTC)
- OK, stupid me, just figured out the redirects. BTW do you have a standard javascript function for accessing query.php? The one I use on User:Zocky/Link Complete does a fine job, but standardization is probably a good idea. Zocky | picture popups 18:03, 12 September 2006 (UTC)
Please test
May I ask the people watching this page to test the following links:
I get the incoming links in the first three cases, in a second. However, I get a timeout on the fourth query; this is just the combination of both pages plus the restriction on the namespace. (Liberatore, 2006). 14:19, 22 September 2006 (UTC)
- I get a timeout as well. Martin 15:10, 22 September 2006 (UTC)
- Works for me :( --Yurik 22:25, 25 September 2006 (UTC)
- It now works to me as well. Thanks! (Liberatore, 2006). 16:19, 29 September 2006 (UTC)
- Works for me :( --Yurik 22:25, 25 September 2006 (UTC)
server snag
Just FYI, I've discovered a little snag using the api. If a query URL is too long, you get an HTTP 400 error that contains "The following error was encountered: Invalid URL". 17:56, 25 September 2006 (UTC)
- That's not a server snag, but a URL length limitation :) -- pass the long parameters as a POST request instead. --Yurik 21:53, 25 September 2006 (UTC)
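A minimal Python sketch of that POST approach (the request parameters are just examples):

    import urllib.parse
    import urllib.request

    # Sketch: send long parameter lists (e.g. hundreds of titles) in the
    # POST body instead of the URL, avoiding the URL length limit.
    titles = "|".join("Page %d" % i for i in range(500))
    body = urllib.parse.urlencode({
        "what": "info",
        "titles": titles,
        "format": "json",
    }).encode()
    resp = urllib.request.urlopen("http://en.wikipedia.org/w/query.php",
                                  data=body)
    print(resp.getcode())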
List of used images
Hi, how can I make query.php return a list of images used in an article? "what=links" doesn't seem to include images. --Magnus Manske 12:11, 26 September 2006 (UTC)
- There is currently no query for that. Needs to be added. --Yurik 16:14, 26 September 2006 (UTC)
- In the meta:API I redefined what imagelinks means, and added the imgembeddedin command. Once it's implemented... :) --Yurik 16:30, 26 September 2006 (UTC)
- Thanks, that will be a great help in rewriting my missing images tool :-) --Magnus Manske 17:40, 26 September 2006 (UTC)
- Hi, I would like to renew this request if possible. I have reported it as a bug, after asking Rob Church on the mediawiki IRC. We have had a request at AutoWikiBrowser to get a list of images on the page, and I hear you're the guy! An extension of "what=links" would be brilliant. Keep up the good work!! Reedy Boy 22:13, 30 March 2007 (UTC)
- This has now been added! Thank you!! Reedy Boy 10:02, 21 May 2007 (UTC)
Counting deleted edits
It was recommended here that I bring my request to this page. On the old Kate's tool, it was possible to display deleted edits as well as "actual" edits; could this feature be in the new query.php? Thanks Batmanand | Talk 19:21, 27 September 2006 (UTC)
- For a while I had a user contribution counter, but I recently discovered it was a huge load on the server, so it was stopped. I am not sure the deleted-edit counter will be available, unless the DB admins approve. --Yurik 20:24, 27 September 2006 (UTC)
- OK, if that is the way it has to be, so be it. It was only a thought. Thanks again Batmanand | Talk 22:18, 27 September 2006 (UTC)
LinkBatch isEmpty() undefined method
- mediawiki-1.6.7
- PHP Version 5.1.2
Running into an issue where the main query.php page is functioning and some of the queries are working, but when I try to perform something like this:
query.php?what=content|templates&titles=Junk
I'm getting:
Fatal error: Call to undefined method LinkBatch::isEmpty() in /var/www/mediawiki-1.6.7/w/BotQuery/query.php on line 2040
I did notice this commit, but I don't think that is in 1.6.7. Is that version just not supported? Note: putting in the isEmpty() and getSize() functions fixes the problem, but I'm not sure what else could be missing (worried about production-level readiness on 1.6.7). Thanks! --Malcom 04:09, 9 October 2006 (UTC)
- Malcom, there were very few checkins I did outside of query that were needed for proper query operation. Query is a readonly interface, thus any issues would be limited to a bad response. --Yurik 17:18, 9 October 2006 (UTC)
- Agreed, but this code (in query.php) does require LinkBatch to have an isEmpty() method, which from what I can see is not in 1.6.7:
1599 $linkBatch = new LinkBatch;
...
1620 if( $linkBatch->isEmpty() ) {
- Am I just seeing this wrong? --Malcom 04:02, 10 October 2006 (UTC)
- Sorry for not being clear -- there are very few commits made by me outside of the query (and the current api). You will easily find any other changes required, and I seriously doubt there is anything else, other than LinkBatch change that you need to run query.php. Since query.php is a readonly interface, even if something else is missing, you are not risking anything, just some minor broken functionality that you will easily notice. --Yurik 14:45, 10 October 2006 (UTC)
Info about interwiki/language links
Hi, Yurik. Is it possible for your query.php to provide info about interwiki/language links? I only need language links (links that will disappear and be put into the left bar), so I don't have to parse Special:SiteMatrix and put special knowledge about commons and other exceptions into my bot; but complete information about interwiki links and their URL templates wouldn't be bad either. Anyway, yes or no, I want to thank you very much for this useful tool. Greetings. --es:Usuario:Angus 19:12, 18 November 2006 (UTC)
- Isn't this what is done by, for example http://en.wikipedia.org/w/query.php?what=langlinks&titles=Rome ? Tizio 18:05, 20 November 2006 (UTC)
- No, that's the language links used in a specific article. I want a list of the prefixes that will generate a language link in the project ("ar", "af", ..., "zh" for any wikipedia or commons, nothing for meta, etc.). Sorry for being unclear. es:Usuario:Angus 16:09, 21 November 2006 (UTC)
Query API limiting results
Hi Yurik. Congratulations on the work on Query API/api.php; at last someone is working on it. I'm trying to rewrite our Java library for interfacing (when possible) with Query API/api.php, but all I'm getting are limited results - more precisely, the error «Error: User requested 99999 pages, which is over 500 pages allowed (cplimit)». It doesn't seem to be possible to query for something like a "give me everything", such as querying for ALL articles in a category; am I correct? Could you please explain why (or point me to a relevant discussion on the subject) - if I am correct (hope not) - or how I can achieve such results? Best regards, Nuno Tavares 03:01, 2 January 2007 (UTC)
- You have to use paging to receive all results. The API automatically gives you the point from which to continue getting the next batch of information. This is done mostly to prevent abuse and reduce the server load. --Yurik 20:05, 26 January 2007 (UTC)
Special:Prefixindex
Hello, I would like to know if some special pages can be used to extend the query API. For example, I need Special:Prefixindex. Thanks a lot. fr:user:bayo 193.248.56.95 01:54, 26 January 2007 (UTC)
Problem at Memory Alpha
Hi, I expected to get a list of 10 articles starting at "h" with this:
http://memory-alpha.org/en/query.php?what=allpages&aplimit=10&apnamespace=0&apfrom=h
But as you can see, that is not what is returned. Is this the expected behaviour, and am I missing something? Also, is there a way to make the results case-insensitive? I'm using it for this. Thanks. --Bp0 01:11, 2 February 2007 (UTC)
- Appears to be a problem with capitalization, as this one works:
http://memory-alpha.org/en/query.php?what=allpages&aplimit=10&apnamespace=0&apfrom=H
- (the difference is that `H' is now capitalized). Tizio 12:06, 2 February 2007 (UTC)
- Ah right, I forgot for a second that 'h' comes after 'Z'. I'll change the script to upper-case the first letter. --216.16.66.30 16:26, 2 February 2007 (UTC)
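For reference, a tiny sketch of that client-side fix - upper-casing only the first letter of the apfrom value before sending it:

    # Client-side fix from this thread: on wikis whose page lists are sorted
    # case-sensitively, capitalize the first letter of apfrom so "h" is
    # looked up as "H" rather than sorting after "Z".
    def normalize_apfrom(prefix):
        return prefix[:1].upper() + prefix[1:]

    assert normalize_apfrom("h") == "H"
    assert normalize_apfrom("hope diamond") == "Hope diamond"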
limited userinfo on other users
userinfo only provides information on the current user, but it would be very useful to be able to query some information on other users; for example, ProxyDB needs to check if a proxy is blocked. This could be done by only using $wgUser if no username is provided, possibly providing less information (no preferences, for example) on other users.
It would also be very useful to be able to query multiple users, returning the results in an array:
http://en.wikipedia.org/w/query.php?what=userinfo&uiisblocked&names=foo|bar

    <yurik>
      <meta>
        <user>
          <name>foo</name>
          <isblocked>0</isblocked>
        </user>
        <user>
          <name>bar</name>
          <isblocked>1</isblocked>
        </user>
      </meta>
    </yurik>
—{admin} Pathoschild 09:02:43, 08 February 2007 (UTC)
- More generally, it would be useful to have a sort of "direct access" to the records of the various tables, excluding the fields that are not indirectly accessible via the web interface to anon users. Tizio 14:53, 9 February 2007 (UTC)
UnusedImage suggestion
Hi. I have a small suggestion: add a query for the list of unused images. The list would be similar to Special:UnusedImages. --Jutiphan | Talk - 05:28, 12 February 2007 (UTC)
New bug with iishared
This url gives an error, which seems to be the result of a recent change. Lupin|talk|popups 22:43, 27 February 2007 (UTC)
revision length?
Is there any way to make the API return the length of a revision when retrieving the RecentChanges list (e.g. via rc_old_len and rc_new_len)? Those values don't seem to be covered by the API yet. I'm trying to use it with this query... -- Ace NoOne 14:33, 14 March 2007 (UTC)
User edit count
Does this count deleted edits or not? There's about a one-thousand-edit difference between query.php and Wannabe Kate for my edit count. Will (We're flying the flag all over the world) 01:59, 15 May 2007 (UTC)
- I think the query API one just does mainspace edits... Reedy Boy 08:40, 15 May 2007 (UTC)
- [7] gives my edit count at just over 17,000.
- [8] gives it at about 15,600.
- Will (We're flying the flag all over the world) 16:47, 15 May 2007 (UTC)
- Yurik, I left a similar question on your main talk page before I noticed your request to put such comments here. I think deleted edits are getting picked up, but is there a way to know for sure? Thanks again for the very quick counter. Casey Abell 18:33, 15 May 2007 (UTC)
- By the way, there's an interesting test of the deleted edit question coming up. I'm due to get two edits deleted (boo-hoo) on an orphaned image on May 16. If the difference between Interiot's counter and Yurik's counter increases from its current 36 to 38, then the mystery is solved. Casey Abell 19:08, 15 May 2007 (UTC)
- Yes. It does count deleted edits. Will (We're flying the flag all over the world) 01:04, 16 May 2007 (UTC)
- No doubt about it. My two edits on the orphaned image just got deleted, and the difference between the two counters went from 36 to 38. Casey Abell 12:34, 16 May 2007 (UTC)
iishared now gives a blank page
http://en.wikipedia.org/w/query.php?titles=Image:Hope_Diamond.jpg&what=imageinfo&iishared
This is coming up blank for me right now... if you remove iishared then it behaves correctly. Any idea what's wrong? Lupin|talk|popups 16:38, 3 June 2007 (UTC)
- Hmm, weird. I guess they changed something in the commons code again :(. Will have to look. Btw, if you can, see if you can start using the new api.php (images are still not there, but should be soon). --Yurik 12:54, 4 June 2007 (UTC)
- It no longer comes up blank, but it does give an error.
*------ Error: This site does not have shared image repository (ii_noshared) ------*
- Is this the intended behaviour? (The same happens if I change en to commons in the url... I think one of them should probably not give an error, at least.) Lupin|talk|popups 09:09, 3 July 2007 (UTC)
Categories (copied from VPT)
[the following was on WP:VPT; I have copied it here. Tizio 12:12, 5 June 2007 (UTC)]
Does anyone know why the first of these two queries fails but the second one works? success error. — Carl (CBM · talk) 16:32, 4 June 2007 (UTC)
- Replacing %20 with underscore works. –Pomte 16:37, 4 June 2007 (UTC)
- Thanks, that is at least a temporary hack. Both of the queries have two-word continuations, and both of them have %20 in the URL, but only one works, so something more complex is going on. — Carl (CBM · talk) 16:42, 4 June 2007 (UTC)
- Actually, the same error occurs for this query which has only one word in the continuation and is returned by this query as right one to fetch next. — Carl (CBM · talk) 16:46, 4 June 2007 (UTC)
- It seems that %20 is actually the right character to use, according to User_talk:Yurik/Query_API/Completed_Requests#Category_downloading. Was "Diffusion process" returned in a "category next" tag? Tizio 16:54, 4 June 2007 (UTC)
- No, but this one is the first error I get using %20. That's the second continuation, the first is this. — Carl (CBM · talk) 17:03, 4 June 2007 (UTC)
Bug
I emptied the category Category:Stub-Class mathematics articles and I still get pi_badpageids when I try to query its contents, instead of the usual "emptyresult". — Carl (CBM · talk) 13:09, 9 June 2007 (UTC)
- The behavior changed from this morning to this evening, most likely because someone fixed something. Now all the math-related categories appear to be queryable (there were other broken cats before). Thanks, it's appreciated. — Carl (CBM · talk) 01:38, 10 June 2007 (UTC)
Same timestamp edits
Ok, I know that this is about to be replaced by the API, but I am still using old scripts, so I noticed this bug: when two edits have the very same timestamp, they may be listed in the wrong order. For example:
http://en.wikipedia.org/w/query.php?format=xml&what=revisions&titles=Homicidal&rvlimit=3
lists 169395135 before 169395138, while the web interface got them in the right order:
http://en.wikipedia.org/w/index.php?title=Homicidal&action=history
I don't know if this is really worth fixing. Tizio 18:51, 6 December 2007 (UTC)
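If anyone needs a client-side workaround in the meantime, here is a sketch: sort by revision id as a tiebreaker, since ids grow with edit order (the key names are assumptions, and the timestamp below is made up for illustration):

    # Sketch of a client-side workaround: when two revisions share a
    # timestamp, fall back to the revision id, which increases
    # monotonically with edit order.
    def sort_revisions(revs):
        return sorted(revs, key=lambda r: (r["timestamp"], r["revid"]),
                      reverse=True)  # newest first, like the history page

    revs = [{"timestamp": "2007-12-06T00:00:00Z", "revid": 169395135},
            {"timestamp": "2007-12-06T00:00:00Z", "revid": 169395138}]
    print([r["revid"] for r in sort_revisions(revs)])  # 169395138 first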