User:Mx. Granger/Photo program
I've written a Python program to find Wikipedia articles that need photos near a particular location.
It's far from perfect: it produces false negatives and false positives for various reasons. If you notice a bug that you think I would be able to correct, or if you have any questions, let me know at User talk:Mx. Granger.
You may also find Special:Nearby, WikiShootMe, Wikidata locations, or the Commons mobile app useful. You can find older versions of the program in this page's history.
I release all of the text on this page, including the program, under CC-0.
How to use
- Make sure that Python 3 is installed on your computer.
- Make sure you have requests installed (run pip install requests in the terminal, or possibly pip3 install requests).
- Save the program with a name such as findNeededPhotos.py. Make sure to change the values that are labeled "REQUIRED" near the top of the file.
- Run the program in the terminal with python findNeededPhotos.py. After a little while (usually less than a minute, depending on how many Wikipedia articles there are in the area), it'll start displaying a list of the articles needing pictures that are closest to the location you entered, sorted by distance.
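As a rough illustration of what "sorted by distance" means, the sketch below computes great-circle distances with the haversine formula and orders a few places by how far they are from a starting point. It is not part of the program, which appears to rely on the order in which the API returns results. The function names, the Earth-radius constant, and the sample places are all assumptions made for this example.

```python
import math

# Mean Earth radius in meters (an approximation chosen for this sketch).
EARTH_RADIUS_M = 6371000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sort_by_distance(origin, places):
    """Return places ordered from nearest to farthest from origin.

    origin is a (lat, lon) pair; each place is a (name, lat, lon) tuple.
    """
    lat0, lon0 = origin
    return sorted(places, key=lambda p: haversine_m(lat0, lon0, p[1], p[2]))


if __name__ == "__main__":
    # Hypothetical places near the default coordinates used in the program.
    origin = (27.175, 78.041944)
    places = [("Far example", 27.2, 78.1), ("Near example", 27.176, 78.042)]
    for name, lat, lon in sort_by_distance(origin, places):
        print(name, round(haversine_m(origin[0], origin[1], lat, lon)), "m")
```

The haversine formula treats the Earth as a sphere, which is accurate enough for distances of a few kilometers like the ones the program deals with.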
For more information see mw:Extension:GeoData#list=geosearch and mw:API:Images.
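The geosearch query mentioned above can also be expressed as a parameter dictionary instead of a hand-built URL, and the JSON reply can be read as a dictionary instead of split as text. The following is a minimal sketch under those assumptions, not the way the program itself works. The helper names and the sample reply are invented for illustration; the sample is hand-written and is not a real API response.

```python
def build_geosearch_request(lat, lon, radius=10000, limit=500, lang="en"):
    """Return (url, params) for a geosearch query around the given point."""
    url = "https://" + lang + ".wikipedia.org/w/api.php"
    params = {
        "format": "json",
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": str(radius),
        "gslimit": str(limit),
    }
    return url, params


def titles_from_geosearch(response_json):
    """Extract page titles from a decoded geosearch reply (a dict)."""
    entries = response_json.get("query", {}).get("geosearch", [])
    return [entry["title"] for entry in entries]


# Hand-written sample shaped like a geosearch reply (hypothetical data).
SAMPLE_REPLY = {
    "query": {
        "geosearch": [
            {"pageid": 1, "title": "Example Place A", "dist": 12.3},
            {"pageid": 2, "title": 'Example "Quoted" Place', "dist": 480.0},
        ]
    }
}

if __name__ == "__main__":
    url, params = build_geosearch_request(27.175, 78.041944)
    print(url, params)
    print(titles_from_geosearch(SAMPLE_REPLY))
```

With requests installed, the pair could be sent with requests.get(url, params=params, headers=headers) and the reply decoded with .json(). Passing params this way lets requests handle URL encoding, so titles and coordinates with special characters need no manual escaping.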
The program
import requests
import time
import codecs

####################################################
######## Options to be changed by the user #########

# REQUIRED: Set this to your username or name and a way for the WMF to contact you
# This is needed because the program uses the API, see [[meta:User-Agent policy]] for details
headers = {'User-Agent': 'Insert username or name and contact info here'}

# REQUIRED: Change these numbers to the latitude and longitude you're interested in. This can be your current location, somewhere you're planning to travel to, or any place where you want to take pictures for Wikipedia.
mylat = 27.175
mylong = 78.041944

# Optional: If you want to search a different (non-English) Wikipedia, you can change the language code.
lang = "en"

# Optional: You can change this number to increase or decrease the maximum number of articles the program will output. (It may still output less than the maximum.)
maxplaces = 40

# The maximum distance (in meters) between the location inputted above and the articles that will be returned. Unfortunately the API doesn't support distances greater than 10000 meters.
maxdistance = 10000

####################################################
####################################################

def stripcomments(text):
    text = text + " " #This line is to deal with the rare case where an article ends with "<" or "<-"
    newtext = ""
    comment = False
    for i in range(0,len(text)):
        if(text[i]=='<' and text[i+1]=='!' and text[i+2]=='-'):
            comment=True
        elif(comment==False):
            newtext = newtext + text[i]
        elif(comment==True):
            if(text[i-2]=='-' and text[i-1]=='-' and text[i]=='>'):
                comment=False
    return newtext

mylat = str(mylat)
mylong = str(mylong)
maxdistance = str(maxdistance)

listOfNearbyArticles = []
url= "https://" + lang + ".wikipedia.org/w/api.php?format=json&action=query&list=geosearch&gscoord=" + mylat + "|" + mylong + "&gsradius=" + maxdistance + "&gslimit=500"
r=requests.get(url, headers=headers)
splitOutput = codecs.decode(r.text,'unicode-escape').split('"title":"')
for part in splitOutput[1:]: #using [1:] to get rid of the first part, which doesn't have a page title.
    newTitle = part.split('"', 1)[0]
    listOfNearbyArticles.append(newTitle)

#This is the list of file extensions which I consider to indicate a plausibly good image (an image that probably disqualifies the article in question from being outputted by my program)
listOfFileExtensions = [".jpg", ".JPG", ".jpeg", ".JPEG"]

listOfArticlesToOutput = []
for title in listOfNearbyArticles:
    urlTitle = title.replace("&","%26")
    url= "https://" + lang + ".wikipedia.org/w/api.php?format=json&action=query&prop=images&titles=" + urlTitle + "&imlimit=40"
    r=requests.get(url, headers=headers)
    textFromAPI = r.text
    if any(x in textFromAPI for x in listOfFileExtensions): #The page probably has at least one good image
        #Second check
        url= "https://" + lang + ".wikipedia.org/w/api.php?format=json&action=parse&page=" + urlTitle + "&prop=wikitext"
        r=requests.get(url, headers=headers)
        article = stripcomments(r.text)
        if any(x in article for x in listOfFileExtensions):
            pass #both the list of images and the wikitext contain one of our file extensions, so it seems the article has a good image
        else: #The page has a good image, but it's not in the wikitext, so probably in a navigation template or similar
            listOfArticlesToOutput.append(title)
    else: #The page has no good images
        listOfArticlesToOutput.append(title)
        print(title)
    if(len(listOfArticlesToOutput)==maxplaces):
        break
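The comment-stripping step in the program can also be written with a regular expression. The sketch below is an alternative under stated assumptions, not a drop-in copy of stripcomments: the name strip_comments_re is invented here, it only treats the full "<!--" sequence as the start of a comment (the program reacts to "<!-"), and like the program it discards everything after a comment that is never closed.

```python
import re

# Matches an HTML comment, or an unclosed comment running to the end of the text.
# re.DOTALL lets "." span newlines, since comments in wikitext can cover several lines.
COMMENT_RE = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def strip_comments_re(text):
    """Return text with HTML comments removed."""
    return COMMENT_RE.sub("", text)


if __name__ == "__main__":
    # A commented-out image should no longer count as a photo in the article.
    wikitext = "Intro text <!-- [[File:Hidden.jpg]] --> more text"
    print(strip_comments_re(wikitext))
```

The point of stripping comments before looking for file extensions is the same as in the program: an image name that only appears inside a comment is not displayed in the article, so it should not count as an existing photo.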