Downloading funghi, from Wikimedia Commons

#!/usr/bin/env python
# print the urls of all the images in a category of Wikimedia Commons
# example:
# $ python "Category:Illustrations_of_fungi"

# pipe to wget for download:
# $ python [category] | wget -i - --wait 1

import sys
import json
import urllib2
from urllib import quote

def make_api_query(category, q_continue=""):
    if q_continue:
        q_continue = '&gcmcontinue=' + q_continue
    url = ('https://commons.wikimedia.org/w/api.php?action=query'
           '&generator=categorymembers&gcmtitle=' + category + q_continue +
           '&gcmlimit=500&prop=imageinfo&iiprop=url&format=json')
    request = json.loads(urllib2.urlopen(url).read())
    if 'error' in request:
        sys.exit(request['error']['info'])
    for page in request['query']['pages'].values():
        try:
            print page['imageinfo'][0]['url']
        except KeyError: pass
    # there is a maximum of 500 results in one request, for paging
    # we use the query-continue value:
    if 'query-continue' in request:
        q_continue = quote(request['query-continue']['categorymembers']['gcmcontinue'])
        make_api_query(category, q_continue)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python [category]")
    make_api_query(sys.argv[1])



We used a variation on this script while creating a design for Radio Panik.

This is an example of a simple script composed in the Python programming language. print outputs lines to the terminal. There is one command line argument: the name of the category the script will retrieve the image files for. If you just ran the script, python Illustrations_of_Mushrooms, you would get a list of urls printed to the terminal. The script was designed to be used in a pipeline with the wget program (which, if you’ve installed Homebrew, you can install by typing brew install wget). For wget, we specify -i -, which means: take input from the standard input.

The script requests a url that returns a list of images, and their properties, in a json format (just like the twitter api provides). The script contains a function that calls a url from the Wikimedia Commons api, then extracts the image locations available in the response. If there are more images available than can be returned at once, the api also returns a continue value that will provide the next set of images, and so on. To handle this, the function that queries the api checks if such a continue value is present and, if so, calls itself with this value as an argument. The fact that a programming function can call itself is rather baffling at first. This is what programmers call recursion. Python is not optimized for recursion, but there are styles of programming built around this concept. Recursion is an appealing analytical trick. I imagine there is some kind of abstraction gland stimulated by paradoxes and self-referential logic tricks, a kind of weird sister to sexual excitement. Like the symbols preferred by medieval alchemists. Jenseits likes circles, spirals and Ouroboroses.
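The recursive paging pattern can be sketched without any network at all. In this toy version (hypothetical data, not the real api), each fake “page” of results stands in for one api response: it holds some urls plus, when more pages remain, a continue value pointing at the next one, and the collecting function calls itself until no continue value is left:

```python
# Hypothetical stand-in for successive api responses.
PAGES = {
    "": {"urls": ["a.png", "b.png"], "continue": "p2"},
    "p2": {"urls": ["c.png"], "continue": "p3"},
    "p3": {"urls": ["d.png", "e.png"]},
}

def collect(q_continue="", results=None):
    if results is None:
        results = []
    page = PAGES[q_continue]
    results.extend(page["urls"])
    # just like the script: if the response carries a continue
    # value, the function calls itself to fetch the next page
    if "continue" in page:
        collect(page["continue"], results)
    return results

print(collect())
```

Each recursive call handles one page and passes the growing list of results along, so the recursion bottoms out as soon as a page arrives without a continue value.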

