Serving Python scripts with Apache mod_wsgi, part II – mod_rewrite

In part I, we learned how to configure Apache to serve any .py file as a web application using mod_wsgi. I promised to tell you more about WebOb, multiprocessing, multithreading and exception handling, but I’ll save those topics for later articles. Instead, this part covers mod_rewrite: whether, why and how to get rid of the .py extension. You will need the test apps from part I to try these out.

Removing the .py extension using mod_rewrite

Many people (myself included) want to get rid of the .py extension in their URLs. There are valid reasons for this:

  • The URLs would be more readable. A URL like http://localhost/myapp/myfeature is much easier to remember and understand than something like http://localhost/myapp.py?action=myfeature
  • The URLs would be more portable. You could, say, rewrite your page in C or Haskell, or in whatever language you like, and the users would not notice a thing.

There are also valid reasons against rewriting URLs with mod_rewrite:

  • It might be confusing for the developer, or the application integrator, or the webadmin taking care of the production server. If you have both a script called myapp.py and a directory myapp, which one is going to be called?
  • mod_rewrite allows some really nasty tricks in the wrong hands, so for security reasons, it is not a good idea to allow anybody on a shared server to just write their own rules without admin review.

Keeping these things in mind, I’ll try to show you how to use mod_rewrite to hide the .py extension. First, enable the rewrite module (on Debian or Ubuntu, a2enmod rewrite does this).

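Then add the rewrite directives to the Apache site configuration (or to .htaccess in /var/www if you have AllowOverride FileInfo Options set for that directory): Options +FollowSymLinks, RewriteEngine on, two RewriteCond conditions and one RewriteRule, each dissected in the next section.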

Restart Apache. Now you can browse to http://localhost/python_app/hello or http://localhost/python_app/environ. In the latter, observe that the REQUEST_URI variable has no .py extension, while SCRIPT_NAME still does.
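To look at just those two variables, a small WSGI script along the following lines can stand in for the environ test app from part I. Treat it as a minimal sketch: the file name environ_min.py is hypothetical, and REQUEST_URI is supplied by Apache rather than required by the WSGI specification, so it is read with a fallback.

```python
# Hypothetical file: /var/www/python_app/environ_min.py (name chosen for illustration)

def application(environ, start_response):
    """Report the two variables that differ after the rewrite."""
    lines = [
        # REQUEST_URI comes from Apache and is not guaranteed by WSGI, hence .get()
        "REQUEST_URI: %s" % environ.get("REQUEST_URI", "(not set)"),
        "SCRIPT_NAME: %s" % environ.get("SCRIPT_NAME", ""),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Requested without the extension, such a script should report a REQUEST_URI without .py and a SCRIPT_NAME with it, which is the difference described above.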

Dissecting the rewrite rules

So what do all those Apache directives do? The first line, Options +FollowSymLinks, is required for rewritten URLs to work, otherwise Apache will deny the requests. The RewriteEngine on directive is needed in order to have any rewriting take place at all. The real magic happens in the last three lines. According to the documentation, the RewriteCond directive “Defines a condition under which rewriting will take place”, while the RewriteRule directive “Defines rules for the rewriting engine”. All the conditions preceding the rule must evaluate to true in order for the rule to be followed and the URL to be rewritten.

The first condition, RewriteCond %{REQUEST_FILENAME} !-d, states that the requested filename (e.g. “hello”) must not (!) be an existing directory (-d).

The second condition, RewriteCond %{REQUEST_FILENAME}.py -f, states that the requested file (e.g. “hello”) with a .py extension added (“hello.py”) must be an existing file (-f).

If both conditions are met, the rule RewriteRule ^(.*)$ $1.py [L] rewrites the requested URL (e.g. “/python_app/hello”), matched here by the regular expression “^(.*)$”, with a .py extension appended (“/python_app/hello.py”). The $1 is a backreference to the group captured by the regular expression. The final [L] flag says that processing should stop here. It makes no difference while this is the only rule, but once you add more rules, mod_rewrite will continue evaluating them unless you tell it to stop.
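The decision made by the two conditions and the rule can be mimicked in a few lines of Python. This is only an illustrative sketch, not how mod_rewrite is implemented: the function name rewrite and the doc_root parameter are made up for the example, the directives quoted in the docstring are reconstructed from the description above, and the mapping from URL to filename is simplified to a plain path join.

```python
import os
import tempfile


def rewrite(doc_root, url_path):
    """Mimic the rewrite decision described above (simplified).

    Directives being imitated, as reconstructed from the text:
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_FILENAME}.py -f
        RewriteRule ^(.*)$ $1.py [L]
    """
    # Assumption: the requested filename is just doc_root + URL path.
    filename = os.path.join(doc_root, url_path.lstrip("/"))
    not_a_directory = not os.path.isdir(filename)      # !-d
    py_file_exists = os.path.isfile(filename + ".py")  # -f
    if not_a_directory and py_file_exists:
        return url_path + ".py"                        # $1.py
    return url_path                                    # left untouched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        app_dir = os.path.join(root, "python_app")
        os.mkdir(app_dir)
        open(os.path.join(app_dir, "hello.py"), "w").close()
        print(rewrite(root, "/python_app/hello"))    # hello.py exists
        print(rewrite(root, "/python_app/missing"))  # no missing.py
        os.mkdir(os.path.join(app_dir, "hello"))     # directory now wins
        print(rewrite(root, "/python_app/hello"))
```

The last call shows why a directory with the same name as the script takes precedence: the first condition fails, so no rewriting happens.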

Observing URL dispatch after rewriting

Now, what happens if we create a directory called “hello” in the python_app directory? There’s already a “hello.py” file, which we should be able to request without the .py extension thanks to our new rewrite rules. Let’s create the directory and see what happens.

Now http://localhost/python_app/hello goes to the directory. Was this expected? Well, yes. After all, the first rewrite condition explicitly states that the requested file must not be an existing directory for any rewriting to take place.

If you want it the other way around, you need to take the first RewriteCond out and do a bit more configuring. If you look closely, Apache actually redirected you to another URL with an additional slash at the end (http://localhost/python_app/hello/). The “culprit” for the extra slash is mod_dir and its DirectorySlash directive, which is on by default. Turning it off and taking out the first RewriteCond makes the URL without the slash call the script, and the URL with the slash call the directory. Please note that this might be a security risk: with DirectorySlash off, requesting a directory name without the trailing slash can list all files in that directory when directory listings (mod_autoindex) are enabled, even if the directory has an index file. So if you have a directory, but no script with the same name, your directory contents can be listed.

Anyway, I think the latter behaviour is harder to use and understand, and I think it is a good idea to avoid using the same name for a directory and a script. Have Apache handle the URL dispatch up to your script, and handle it from there on in your code.
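Handling the dispatch “from there on” inside the script could look roughly like the sketch below, which routes on the action query parameter used in the example URL near the top of the article. The handler names and the HANDLERS table are hypothetical; the point is only that Apache picks the script and the script picks the feature.

```python
from urllib.parse import parse_qs


def show_index(environ):
    """Hypothetical default handler."""
    return "index"


def show_feature(environ):
    """Hypothetical handler for ?action=myfeature."""
    return "myfeature"


# Hypothetical routing table: action name -> handler function
HANDLERS = {"": show_index, "myfeature": show_feature}


def application(environ, start_response):
    # Apache has already chosen this script; the rest is decided here.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    action = query.get("action", [""])[0]
    handler = HANDLERS.get(action)
    if handler is None:
        status, text = "404 Not Found", "unknown action"
    else:
        status, text = "200 OK", handler(environ)
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```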

4 thoughts on “Serving Python scripts with Apache mod_wsgi, part II – mod_rewrite”

  1. good manual – but You can also accomplish (almost) the same with WSGIScriptAliasMatch :

    Example (put this in the virtual host section of the site configuration file; WSGIScriptAliasMatch is not allowed inside a Directory section) :

    WSGIScriptAliasMatch ^/python_app/([^/]+) /var/wsgiscripts/$1.py

    the advantage is that You can also supply additional path information, which You can’t do with the method You introduced here.

    If You have the hello.py in /var/wsgiscripts/ You can call :
    http://localhost/python_app/hello/test?a=1 and supply “test” as additional path.

    WSGIScriptAliasMatch is equivalent to :

    SetHandler wsgi-script
    Options ExecCGI
    Rewrite …

    So if You use WSGIScriptAliasMatch You don’t need mod_rewrite, nor the additional configuration to set the wsgi-handler for *.py files, nor the configuration for the rewriting –
    resulting in a cleaner configuration that is easier to understand.

    for more information see : http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives

    Since I am new to WSGI and WebOb I wait desperately for Your posts about the WebOb multiprocessing, multithreading and exception handling issues …

    yours sincerely

    Ing. Robert Nowotny

    1. Thanks for the feedback, that really seems like a better way to set it up. I need to test it myself. I’ll get back to mod_wsgi in a couple of weeks after I finish my current project.
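The mapping in the WSGIScriptAliasMatch example above can be tried out with Python’s re module. This is a rough illustration only: the resolve function is made up for the example, and the split into script name and additional path reflects my understanding of how the matched and unmatched parts of the URL are used, not something verified against mod_wsgi itself.

```python
import re

# Pattern and target copied from the WSGIScriptAliasMatch example above.
PATTERN = re.compile(r"^/python_app/([^/]+)")
TARGET = "/var/wsgiscripts/$1.py"


def resolve(url_path):
    """Return (script_file, script_name, path_info), or None if no match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    script_file = TARGET.replace("$1", match.group(1))  # substitute the backreference
    script_name = match.group(0)                        # part of the URL that matched
    path_info = url_path[match.end():]                  # whatever follows the match
    return script_file, script_name, path_info


if __name__ == "__main__":
    print(resolve("/python_app/hello/test"))
```

For /python_app/hello/test the script file becomes /var/wsgiscripts/hello.py and the leftover /test is the additional path mentioned in the comment.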

  2. Dear Sir,
    I have a Python script containing a function that can only be executed by root. Could you please explain how I can execute such a script using mod_wsgi or mod_python?

    1. That’s a bit tricky because Apache is probably not running as root. What you could do is make your function an external script file and then use the “sudo” facility to give the apache server the permission to run it as root.

      So perhaps put your root-only stuff in an external script called /usr/local/bin/root_only_script. Add execute permission.

      Then edit the sudoers file using the command “visudo”. Put a line looking something like this there:

      www-data ALL=(root) NOPASSWD: /usr/local/bin/root_only_script

      “www-data” is the Apache user on Ubuntu, check out what yours is. Then call the external command from mod_python using sudo, something like this:

      import subprocess
      subprocess.call(["/usr/bin/sudo", "/usr/local/bin/root_only_script"])

      Please note that I didn’t test any of this so it might not work just like that. But I think that’s how it should work in principle.

      And please be careful with root privilege scripts. Sanitize all your input and make sure the script only does the one single thing that needs root privileges. Keep everything else in scripts that run with regular user privileges.
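A slightly more defensive version of the call might look like the sketch below. It is untested against a real sudo setup, just like the snippet above, and assumes Python 3.7 or later. The function names are made up for the example; the paths are the ones used earlier in this reply. The command is fixed in the code and takes no user input, in line with the advice about keeping the privileged part minimal.

```python
import subprocess

SUDO = "/usr/bin/sudo"
ROOT_SCRIPT = "/usr/local/bin/root_only_script"  # path from the reply above


def build_command():
    """Build the fixed command line; nothing from the request goes in here."""
    # -n tells sudo never to prompt for a password and to fail instead.
    return [SUDO, "-n", ROOT_SCRIPT]


def run_root_script(timeout=30):
    """Run the root-only script via sudo and return its standard output."""
    result = subprocess.run(
        build_command(),
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "root_only_script failed (%d): %s"
            % (result.returncode, result.stderr.strip())
        )
    return result.stdout
```

If the sudoers entry is missing, sudo should exit with an error instead of waiting for a password, and the function raises rather than silently carrying on.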
