Serving Python scripts with Apache mod_wsgi, part I

I admit it, I’m a long-time PHP programmer. I got stuck with it for a long time with all my web-based stuff, simply because it is so easy to set up. Well, there is no set-up, it just works. You just add pages to your web server root and give a .php extension to them, and they get requests and give back responses. Nice and simple.

I still use PHP for quick and dirty things (like running shell commands through web interfaces – yes, really, really naughty…). For more complex work, I prefer Python. But I miss the PHP-like way of just adding pages or “web applications” with zero setup. In this article I will examine what Python offers for this kind of behaviour.

Please remember, there are many, some say even too many, Python web frameworks available already that handle all this stuff for you “automagically”. You are almost certainly better off with using one of them, if you want to get work done. But they all require some kind of setup work.

Then there’s the thing with all kinds of frameworks, that instead of you calling some library, the behaviour of which you understand, to do your work, a framework calls your code. That is all right when things work as expected. But whenever there are glitches, you need to start digging around the framework code to see exactly what’s wrong. And if it is a big framework, that could mean a lot of digging around. For that reason, if you’re a reasonably seasoned programmer, I think it might not be half bad an idea to create your own minimalistic framework, using existing, good quality libraries the behaviour of which you understand, and which you can easily poke in the Python shell if you think something’s not working like it should. Or just to see what’s available, and try out the available code.

Also, knowing how stuff works never hurts. This article is also about learning how Apache dispatches HTTP requests to Python code through mod_wsgi. I insist on knowing how things work, so I’m doing it the hard way.

Apache configuration

Now let’s get on to the subject matter. First, we must install the mod_wsgi module for Apache. On Debian and Ubuntu, the Python 2 build is packaged as libapache2-mod-wsgi.

There’s a module for Python 3 as well in the repos (libapache2-mod-wsgi-py3) if you wish to use it. I haven’t tested this with Python 3, but it should work if all the modules are there.

The following Apache directives will make mod_wsgi handle all requests for files with a .py extension. You can save it for example into file /etc/apache2/mods-enabled/wsgi_script.conf:

Note that any compiled .pyc or .pyo files must be denied explicitly. You don’t want people downloading your compiled code.

One more thing to configure for Apache: the directory where your scripts are must have the ExecCGI option set (for example in /etc/apache2/sites-enabled/000-default):

This might be a security issue if you have other executable stuff exposed. Now, after reloading the Apache configuration, of course, we are set to add .py pages.

Pages in mod_wsgi and Python

One thing to note about mod_wsgi is that it will require you to set up a function (or any callable) which handles the requests. By default, it calls the function called “application”, with a couple of parameters. Here’s a simple example. Let’s put it in a new directory /var/www/python_app and call it hello.py:
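
A minimal hello.py along these lines might look like the sketch below. The page content is just a placeholder, and the body is written as a byte string so that the same file should work on both Python 2 and Python 3.

```python
def application(environ, start_response):
    # The response body must be bytes; a b"" literal works on Python 2.6+ and 3.
    body = b"<html><body><h1>Hello from mod_wsgi!</h1></body></html>"

    status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # start_response must be called before the body is handed back.
    start_response(status, headers)

    # WSGI expects an iterable of byte strings.
    return [body]
```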

Now, point your browser to the hello.py script (for example http://localhost/python_app/hello.py) and you should see a page.

As you can see, mod_wsgi passes two arguments to the function application. The first is a dictionary containing lots of environment variables from the HTTP request, like parameters, headers, server and client information, etc.

The second argument is a function which should be used to start the response. It takes the HTTP status line and the response headers as arguments. The start_response function must be called in order to produce anything other than an internal server error for the client.

The last thing we did here, is construct a simple web page, and return it as the content.

The mod_wsgi environment

Now, let’s take a look at the environment argument. It is probably easiest to dump it to YAML, so let’s install the Python module:

Add the following application into for example /var/www/python_app/environ.py:
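
Here is a sketch of what such an environ.py could look like. It assumes PyYAML is importable as yaml and, as a convenience that is my own addition, falls back to plain "KEY: repr(value)" lines when it is not. The dump_environ helper name is hypothetical. Depending on the Python and PyYAML versions, some of the objects mod_wsgi puts into the environment may not be representable, so treat this as a starting point.

```python
try:
    import yaml  # PyYAML, a third-party package
except ImportError:
    yaml = None


def dump_environ(environ):
    """Return the WSGI environment as text, as YAML when PyYAML is available."""
    if yaml is not None:
        # Block style gives one "KEY: value" entry per line.
        return yaml.dump(dict(environ), default_flow_style=False)
    # Fallback (not YAML): sorted "KEY: repr(value)" lines.
    return "".join("%s: %r\n" % (key, environ[key]) for key in sorted(environ))


def application(environ, start_response):
    body = dump_environ(environ)
    if not isinstance(body, bytes):
        body = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```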

And go check what the page (http://localhost/python_app/environ.py) shows you. There should be a list of environment variables, both from Apache and mod_wsgi. Something like this:

A lot of info! Let’s go through some of those. The definitive reference can be found in PEP 3333.

DOCUMENT_ROOT: /var/www

This is obviously the Apache document root.

PATH_INFO: ''

This is everything in the URL path part that comes after the script name. Here it is empty, because there’s nothing after our script name. But if you go to, say http://localhost/python_app/environ.py/here/is/a/path, PATH_INFO will contain the string ‘/here/is/a/path’.

QUERY_STRING: ''

These are the URL parameters, meaning the stuff that comes after the question mark sign in the URL. We didn’t give any parameters, so it is empty. Try this: http://localhost/python_app/environ.py?a=1&b=2. Now the QUERY_STRING should be ‘a=1&b=2’.
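
QUERY_STRING arrives as one raw string, so you have to parse it yourself. A small sketch using the standard library follows; the query_params helper name is just an illustration.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2


def query_params(environ):
    """Parse QUERY_STRING into a dict mapping each name to a list of values."""
    return parse_qs(environ.get("QUERY_STRING", ""))
```

For the example URL above, query_params(environ) would give {'a': ['1'], 'b': ['2']}. The values are lists because a parameter may appear more than once in a URL.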

REQUEST_METHOD: GET

Well, this is the HTTP method used to request the page. It could be GET, POST, PUT, DELETE, HEAD, or possibly something else.

REQUEST_URI: /python_app/environ.py

This is the complete path requested from the server, including the query string but without the scheme and host name. For example, for http://localhost/python_app/environ.py/here/is/a/path/?a=1&b=2 the REQUEST_URI will contain the string ‘/python_app/environ.py/here/is/a/path/?a=1&b=2’.

SCRIPT_FILENAME: /var/www/python_app/environ.py

This is the full file system path of the script on the server. This is “me” from the application’s point of view, in the file system.

SCRIPT_NAME: /python_app/environ.py

This is the part of the URL path that identifies the script that was called. This is “me” from the application’s point of view, in the requested URL.
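
To see how SCRIPT_NAME, PATH_INFO and QUERY_STRING fit together, the standard library's wsgiref.util module can rebuild URLs from the environment. The dictionary below is a hand-made stand-in for a real request environment, with values borrowed from the examples above.

```python
from wsgiref.util import application_uri, request_uri

# Hand-made stand-in for the environment of one request (illustrative values).
example_environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "localhost",
    "SCRIPT_NAME": "/python_app/environ.py",
    "PATH_INFO": "/here/is/a/path",
    "QUERY_STRING": "a=1&b=2",
}

# URL of the script itself: scheme + host + SCRIPT_NAME.
script_url = application_uri(example_environ)

# Full requested URL: script URL + PATH_INFO + "?" + QUERY_STRING.
requested_url = request_uri(example_environ)
```

Here script_url is 'http://localhost/python_app/environ.py' and requested_url is 'http://localhost/python_app/environ.py/here/is/a/path?a=1&b=2'.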

mod_wsgi.application_group: 127.0.1.1|/python_app/environ.py

mod_wsgi has a concept of “application groups”. This means, that you can configure a bunch of applications to run in the same context, or Python (sub-)interpreter. See the WSGIApplicationGroup configuration directive for more information. For the purposes of this article, we are using the default group given to us by mod_wsgi, which means we have a separate group for each of our scripts.

mod_wsgi.callable_object: application

This is the function (or any callable) mod_wsgi will try to call upon receiving a request. It is configurable through the WSGICallableObject configuration directive.

wsgi.errors: !!python/object:mod_wsgi.Log {}

An output stream (file-like object) to which error output can be written. Stuff written here should go to the error log.
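
A quick sketch of writing to that stream from inside an application. The message text is arbitrary.

```python
def application(environ, start_response):
    # Write one diagnostic line to the server's error log stream.
    errors = environ["wsgi.errors"]
    errors.write("handling a %s request\n" % environ.get("REQUEST_METHOD", "?"))
    errors.flush()

    body = b"Logged a line to wsgi.errors.\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```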

wsgi.file_wrapper: !!python/name:None.file_wrapper ''

You can use this to output any file-like object as the response to the requestor. The second argument is a “block-size suggestion” according to the documentation. I don’t know what mod_wsgi does with it. Here’s how to use it to return a file as a response:
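
A sketch following the pattern described in PEP 3333. The file path, the content type, and the send_file helper name are placeholders of mine. Because wsgi.file_wrapper is optional in the specification, the code falls back to a plain block-by-block iterator when the server does not provide it.

```python
BLOCK_SIZE = 8192
FILE_PATH = "/var/www/python_app/example.txt"  # hypothetical file to serve


def send_file(environ, start_response, path):
    """Start a 200 response and return an iterable over the file's bytes."""
    fileobj = open(path, "rb")
    start_response("200 OK", [("Content-Type", "text/plain")])

    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # Let the server stream the file; BLOCK_SIZE is only a suggestion.
        return wrapper(fileobj, BLOCK_SIZE)
    # Portable fallback: read fixed-size blocks until end of file.
    return iter(lambda: fileobj.read(BLOCK_SIZE), b"")


def application(environ, start_response):
    return send_file(environ, start_response, FILE_PATH)
```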

wsgi.input: !!python/object:mod_wsgi.Input {}

An input stream (file-like object) from which the HTTP request body bytes can be read. You can read the raw POST or PUT request content through this.
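
A minimal sketch of reading the request body. It relies on CONTENT_LENGTH to know how many bytes to read, and simply echoes the body back; the read_body helper name is my own.

```python
def read_body(environ):
    """Read the raw request body, using CONTENT_LENGTH as the byte count."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b""
    return environ["wsgi.input"].read(length)


def application(environ, start_response):
    body = read_body(environ)  # echo whatever the client sent
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```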

wsgi.multiprocess: true

This tells us whether we are running in multiprocessing mode or not. As we are, we must expect that multiple instances of the script are running in parallel as separate processes. As they are separate processes, we can use our own local variables without worry at this point. What we do have to worry about is using shared objects outside the process, like files.

wsgi.multithread: false

This tells us whether we are running in a multithreaded process or not. If this were true, we would have to be very careful and use locking or thread-local objects. I encourage you to take care of it anyway, if you expect someone will actually be using your application. It is very likely that someone will enable multithreading and run into very strange, random errors that are hard to debug. But let’s not worry about it at this point; we’ll dive into this subject in a later article.
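
As a small taste of what “being careful” means, here is a sketch of a per-process hit counter guarded by a lock. The counter and the count_hit name are illustrative only.

```python
import threading

_hits = 0
_hits_lock = threading.Lock()


def count_hit():
    """Increment the shared counter under a lock and return the new value."""
    global _hits
    with _hits_lock:
        _hits += 1
        return _hits


def application(environ, start_response):
    body = ("Hits in this process: %d\n" % count_hit()).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Without the lock, two threads could read the same old value and both write back the same new one, losing a hit.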

wsgi.run_once: false

If this were true, it would mean the script is loaded and executed from scratch for every request. As it is not, mod_wsgi will only reload the script if it detects that the file has changed. So you can do all kinds of initialization outside the application callable, and the results stay around for the lifetime of the process. This usually means that the first request will take a bit more time, as that is when the loading and initialization is done, but all subsequent requests handled by the same process are much quicker (just executing the application).
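
A sketch that makes this visible: the module-level values are computed once when the script is loaded, and every request served by the same process reports the same values. The variable names are arbitrary.

```python
import os
import time

# Runs once, when the script is loaded (or reloaded), not on every request.
LOADED_AT = time.time()
PROCESS_ID = os.getpid()


def application(environ, start_response):
    # Only this function runs per request; the values above are reused.
    text = "Script loaded at %.0f in process %d\n" % (LOADED_AT, PROCESS_ID)
    body = text.encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```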

The mod_wsgi directive WSGIImportScript can be used to preload scripts before even the first request comes in.

wsgi.url_scheme: http

This would be “https” for SSL-encrypted connections.

Modules and packages

After coding for a while, you want to start moving stuff to separate modules and package them up. Usually (like from the command line) you could just add module files in the same directory where you are running your Python program. But unfortunately this does not work so easily with mod_wsgi. Let me show you. Add a module /var/www/python_app/mymodule.py:
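
Any small module will do for the experiment. For instance, a stand-in with a single made-up function:

```python
# mymodule.py: a stand-in helper module with one illustrative function
def greeting(name="world"):
    """Return a greeting string for the given name."""
    return "Hello, %s!" % name
```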

Then, add an import statement in hello.py:

If you try to get the page, you’ll end up with a 500 Internal Server Error and an error message in the Apache log file saying something like this:

So the script directory is not in the list of paths where Python is looking for modules and packages. We could do something like this:
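
One way to write that boilerplate, as a sketch: put the script's own directory on sys.path before importing anything from it. The add_script_dir helper name is my own, and the approach assumes __file__ is set for the script.

```python
import os
import sys


def add_script_dir(script_file):
    """Put the directory containing script_file at the front of sys.path."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir


# At the top of hello.py you would then write:
#     add_script_dir(__file__)
#     import mymodule
```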

Now that works, but it is ugly, and I don’t want to be adding boilerplate code like that to every script of mine. There’s a mod_wsgi directive called WSGIPythonPath which can be used to achieve the same goal. But it has another problem: you can’t include it in any .htaccess file, because it needs to be set outside the scope of any VirtualHost or Directory directive in the Apache configuration. But you can do it if you want. Just add a line like this to your Apache default site configuration (just add it out of the VirtualHost directive – perhaps as the first line in the file, for example):

And reload Apache. Now the import statement will work. The downside is, now all your scripts will use that path as an import dir, which may or may not be what you want. You could also do this:

Reload, and make your app directory a Python package:

Now you can import your module thus:

It works, but feels a bit odd as you’re actually inside the package doing the import. And web site directories are something that I would like to be able to rename without changing code (common modules should anyway be located somewhere else, like /usr/local/lib/python2.7/site-packages).

So, there are a few ways of doing the imports, none of them perfect. The choice is yours. If you happen to know a nice, clean way of doing imports in a mod_wsgi script, please report to me! Thanks.

Hiding things we don’t want to serve

What’s left in this section is checking out what happens when the user tries to request our modules, compiled modules, and packages, which we don’t want to serve.

First the module: http://localhost/python_app/mymodule.py

That gives a 404 Not Found. Seems fine to me.

Then the compiled module: http://localhost/python_app/mymodule.pyc

That gives a 403 Forbidden. Remember, we explicitly forbade the users from fetching our .pyc or .pyo files in the Apache configuration (right at the beginning of this article).

What about the directory: http://localhost/python_app/

That gives a 403 Forbidden as well. You could add an “index.html” file there to get a default page, but we can do better.

Lastly, what about the package __init__.py file: http://localhost/python_app/__init__.py

That gives a 404 Not Found.

If this mix of 403’s and 404’s hurts your sense of harmony like it does mine, you can alter the /etc/apache2/mods-enabled/wsgi_script.conf to look like this:

And reload Apache. Now there’s a 404 Not Found for all the forbidden stuff. So now you can put your Python applications and helper modules all in the web site document root. If a .py file has the application callable, it is a web page or application, and will be served to the client. If it doesn’t have the application callable, it is a page not found.

Also, with the DirectoryIndex directive, you can add a default page for any directory by adding an index.py page (with the application callable).

Now almost everything seems perfect. Let’s move on to the last subject for tonight.

WebOb – WSGI request and response objects

Parsing the raw WSGI environment variables for each request and adding the response headers and status codes with robust exception handling is a tedious, error-prone and, most of all, boring job after a while. Because I don’t want you to start hacking away implementing that next, I will just quickly introduce you to an excellent library for doing all the boring stuff, and a bit more. That library is called WebOb. So let’s just:

And then make a WebOb enabled app (let’s make it hello_webob.py):

It’s a bit longer than the previous hello.py, but it does a lot more. Try giving it parameters using the inputs, using URL vars, or even both. And it takes extra path arguments, everything at the same time.

This Sunday night is getting late, so that’s all for now. I will write a follow-up on this. I need to discuss WebOb some more, multiprocessing and multithreading issues, exception handling, and there’s lots of other stuff to talk about.

Have fun with trying it out! And report any security issues my setup might have, or any other insights.

Edit: Check out part II – mod_rewrite
