#!/usr/bin/env python
# print the urls of all the images in a category of Wikimedia Commons
# example:
# $ python get_commons.py "Category:Illustrations_of_fungi"
# pipe to wget for download:
# $ python get_commons.py [category] | wget -i - --wait 1

import sys
import json
import urllib2
from urllib import quote

def make_api_query(category, q_continue=""):
    if q_continue:
        q_continue = '&gcmcontinue=' + q_continue
    url = 'http://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=' + category + q_continue + '&gcmlimit=500&prop=imageinfo&iiprop=url&format=json'
    request = json.loads(urllib2.urlopen(url).read())
    if 'error' in request:
        sys.exit(request['error']['info'])
    for page in request['query']['pages'].values():
        try:
            print page['imageinfo'][0]['url']
        except KeyError:
            pass
    # there is a maximum of 500 results in one request, for paging
    # we use the query-continue value:
    if 'query-continue' in request:
        q_continue = quote(request['query-continue']['categorymembers']['gcmcontinue'])
        make_api_query(category, q_continue)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python get_commons.py [category]")
    make_api_query(sys.argv[1])
We used a variation on this script while creating a design for Radio Panik.
by baseline - February 3, 2012 5:54 PM
This is an example of a simple script written in the Python programming language. print outputs lines to the terminal. There is one command line argument: the name of the category the script will retrieve the image urls for. If you just run the script

python get_commons.py "Category:Illustrations_of_Mushrooms"

you get a list of urls printed to the terminal, one per line. The script was designed to be used in a pipeline with the wget program (which, if you have installed Homebrew, you can install by typing brew install wget). We pass wget the option -i -, which means: take the list of urls to download from standard input.
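For readers without wget, here is a minimal sketch of what the receiving end of that pipeline does, written in Python 2 like the script above. The script name fetch.py and the filename handling (taking the last path segment of each url as the local filename) are our assumptions for illustration, not something the original pipeline specifies:

#!/usr/bin/env python
# minimal stand-in for `wget -i - --wait 1`: read one url per line from
# standard input, download each file, and pause between requests
# usage: python get_commons.py "Category:Illustrations_of_fungi" | python fetch.py
import os
import sys
import time
import urllib

for line in sys.stdin:
    url = line.strip()
    if not url:
        continue
    # assumption: use the last segment of the url as the local filename
    filename = os.path.basename(url)
    print 'fetching', filename
    urllib.urlretrieve(url, filename)
    time.sleep(1)  # be polite to the server, like wget's --wait 1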
by tellyou - February 3, 2012 5:57 PM
The script requests a url that returns a list of images, and their properties, in JSON format (just like the twitter api provides). The script contains a function that calls a url of the Wikimedia Commons api, then extracts the image locations from the response. If there are more images available than can be returned at once, the api also returns a continuation value; querying again with that value appended produces the next set of images, and so on. To handle this, the function that queries the api checks whether such a continuation value is present, and if so, calls itself with that value as an argument. The fact that a programming function can call itself is rather baffling at first. This is what programmers call recursion. Python is not optimized for recursion, but there are styles of programming built entirely around this concept.

Recursion is an appealing analytical trick. I imagine there is some kind of abstraction gland stimulated by paradoxes and self-referential logic tricks, a kind of weird sister to sexual excitement. Like the symbols preferred by medieval alchemists. Jenseits likes circles, spirals and ouroboroses.
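To make the recursion visible, here is the same paging logic rewritten as a plain loop; a sketch for comparison, not a replacement for the script above. Python has no tail-call optimization, so a category with very many continuation pages would eventually hit the interpreter's recursion limit, while the loop version would not:

#!/usr/bin/env python
# the paging logic of make_api_query, written iteratively instead of
# recursively: each pass of the loop fetches one batch of up to 500 results
import sys
import json
import urllib2
from urllib import quote

def make_api_query_loop(category):
    q_continue = ""
    while True:
        url = ('http://commons.wikimedia.org/w/api.php?action=query'
               '&generator=categorymembers&gcmtitle=' + category + q_continue +
               '&gcmlimit=500&prop=imageinfo&iiprop=url&format=json')
        request = json.loads(urllib2.urlopen(url).read())
        if 'error' in request:
            sys.exit(request['error']['info'])
        for page in request['query']['pages'].values():
            try:
                print page['imageinfo'][0]['url']
            except KeyError:
                pass
        if 'query-continue' not in request:
            break  # no continuation value means we have seen every page
        q_continue = ('&gcmcontinue=' +
                      quote(request['query-continue']['categorymembers']['gcmcontinue']))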
by habitus - February 3, 2012 6:01 PM