Monday, May 07, 2007

APACHE Rewrite Module mod_rewrite for URL

Looking around the web, you’ve run across plenty of URLs that name a script and pass it a query string, such as /content.cgi?date=2000-02-21.

Server-side scripts generate the content of those pages. The content of a particular page is uniquely determined by the URL, just as if you requested a page with the URL /content/2000-02-01.html or /article/46.1.html. These pages are different from server-generated pages created in response to a form, such as a shopping cart or an enrollment form. However, search engines will not index these content pages, because they treat pages generated by CGI scripts as potential blind alleys and ignore them.

A search engine would follow a URL like /content/2000/02/21, so some way of mapping that URL to the script /content.cgi?date=2000-02-21 would be useful. Not only will search engines follow such a link, but the URL itself is easy to remember. A frequent visitor to the site would know how to reach the page for any day the site published content. When I changed the interface for viewing entries by topic in my WebLog from /meta.php3?meta=XML to /meta/XML, search engines such as Google started indexing those pages, and I now get more visits referred by search engines.

The trick is to tell the outside world that your interface is one thing: /content/YYYY/MM/DD, but when you fetch the page, you’re accessing /content.cgi?date=YYYY-MM-DD. Web servers such as Apache and content management systems such as Userland’s Manila and the open source Zope support this abstraction.

The abstraction is also useful because a site’s infrastructure is rarely stable over time. When engineering replaces the Perl CGI scripts with Java Server Pages, and the URLs become /content.jsp?date=YYYY-MM-DD, your users’ bookmarked URLs break. When you use an abstraction, your users bookmark /content/YYYY/MM/DD, and when you change your back end, you update /content/YYYY/MM/DD to point at /content.jsp?date=YYYY-MM-DD without breaking bookmarks.
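
As a rough sketch of what that update looks like in Apache (the directives are covered later in this article), only the right-hand side of the rule changes when the back end does. This assumes the rule lives in an .htaccess file at the document root and that both scripts take a date parameter in the form described above; the [L] flag, which stops further rule processing, is my addition.

```apache
RewriteEngine On

# Before the migration, the public URL pointed at the Perl CGI script:
# RewriteRule ^content/([0-9]+)/([0-9]+)/([0-9]+)$ /content.cgi?date=$1-$2-$3 [L]

# After the migration, the same public URL points at the JSP page.
# Bookmarks to /content/YYYY/MM/DD keep working.
RewriteRule ^content/([0-9]+)/([0-9]+)/([0-9]+)$ /content.jsp?date=$1-$2-$3 [L]
```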

If you’re not publishing content dynamically, and have URIs that point at static files, such as /content/2000-02-01.html, you don’t have the indexing problem that dynamic content has. However, you still may want to adopt this type of URI for consistency with other sites. Remember that people coming to your site want to use an interface they are familiar with, and URIs are part of your interface.

Rewriting the URL in Apache

The Apache Web server is ubiquitous on both Unix and NT, and it has an optional component, mod_rewrite, that will rewrite URLs for you. It isn’t part of the standard install, but Pair Networks, Dreamhost, and Hurricane Electric have it enabled on their servers. If you are running your own server, check with your systems administrator to see if it’s installed, or have her install it for you.

The mod_rewrite module works by examining each requested URL. If the requested URL matches the pattern of one of the rewriting rules, that rule is triggered, and the server handles the request as if it had been made for the rewritten URL.

If you’re not familiar with Apache, you’ll want to read up on the way its configuration files work. The best place to put mod_rewrite directives is your server’s httpd.conf file, but you can put them in a per-directory .htaccess file as well. If you don’t have control of your server’s configuration files, you’ll need to use .htaccess, but understand there’s a performance hit because Apache has to read .htaccess every time a URL is requested.
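
For readers who do control httpd.conf, here is a minimal sketch of what has to be in place before rules in an .htaccess file will take effect. The module path and the directory path are assumptions for illustration and vary from install to install; check your own server's layout.

```apache
# Load mod_rewrite (the path to the .so file is an assumption)
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical document root; substitute your own
<Directory "/var/www/html">
    # Permit .htaccess files here to use mod_rewrite directives
    AllowOverride FileInfo
    # Per-directory rewriting requires symlink following to be enabled
    Options +FollowSymLinks
</Directory>
```

If AllowOverride does not include FileInfo for the directory, Apache will not accept rewrite directives placed in .htaccess there.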

The Goal

The goal is to create a mod_rewrite ruleset that will turn a URI such as /archives/YYYY/MM/DD into a parameterized version such as /archives.cgi?date=YYYY-MM-DD, or into something similar, as long as it’s the right URI for your script.

The Plan

We start with the URI /archives/YYYY/MM/DD and want to get to /archives.cgi?date=YYYY-MM-DD. So we need to do a few things:

  1. Recognize the URI
  2. Extract /YYYY/MM/DD and turn it into YYYY-MM-DD
  3. Write the final form of the URI /archives.cgi?date=YYYY-MM-DD

Regular Expressions and RewriteRule

This transform requires two of the directives from mod_rewrite: RewriteEngine and RewriteRule. RewriteEngine is a directive that flips the rewrite switch on and off. It’s there to save administrators typing when they want or need to disable rewriting URLs. RewriteRule uses a regular-expression parser that compares the requested URL to the rule’s pattern and fires if it matches.

If we’re setting the rule in the .htaccess file of the directory where it fires, then we need the following:

RewriteEngine On
RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)»

That rule first matches the string ‘archives’ followed by three groups of one or more digits (the [0-9]+) separated by ‘/’s, and then rewrites the request as archives.cgi?date=YYYY-MM-DD. The parser keeps a back reference for each parenthesized group that matched, and we can substitute those back in using $1, $2, $3, etc.
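
The rule as printed above is cut off before its substitution. Going by the description, one plausible complete form looks like the sketch below. The closing $ anchor, the [L] flag, and the stricter digit counts in the commented variant are my assumptions, not part of the original listing.

```apache
RewriteEngine On

# $1 = year, $2 = month, $3 = day, captured by the three groups
RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)$ /archives.cgi?date=$1-$2-$3 [L]

# A stricter variant that only accepts YYYY/MM/DD-shaped paths:
# RewriteRule ^archives/([0-9]{4})/([0-9]{2})/([0-9]{2})$ /archives.cgi?date=$1-$2-$3 [L]
```

With this in place, a request for /archives/2000/02/21 would be served by /archives.cgi?date=2000-02-21. In httpd.conf rather than .htaccess, the pattern is matched against the full URL path, so it would start with ^/archives/ instead.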

If your page has relative links, the links will resolve as relative to /archives/YYYY/MM/DD, not /archives. That means your relative links will break. You should use the base element in the head of the page to reanchor the page.

RewriteRule for Static Content

If you have a series of static HTML files at your document root, named on the pattern news-YYYY-MM-DD.html, and want your readers to access them with URLs like /archives/1999/12/31, then you would need a rewrite rule at the document root, such as:

RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)$ /news-$1-$2-$3.html
RewriteRule ^archives$ /index.html

If the news-YYYY-MM-DD.html files are in a folder called /archives, the rewrite rule should be:

RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)$ /archives/»
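
The listing above is truncated after /archives/. Assuming the files keep the news-YYYY-MM-DD.html naming and the rule sits in the .htaccess file at the document root (so the pattern has no leading slash), a complete version might look like this sketch:

```apache
RewriteEngine On

# /archives/1999/12/31 -> /archives/news-1999-12-31.html
RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)$ /archives/news-$1-$2-$3.html [L]
```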

If you want to use an .htaccess file at the archive folder level, then the rule becomes:

RewriteRule ^([0-9]+)/([0-9]+)/([0-9]+)$ news-$1-$2-$3.html

Also, you may delete the second rewrite rule (the one that maps the bare archive URL to index.html), since you can use a DirectoryIndex directive instead.

DirectoryIndex index.html

Corner Cases

What if someone enters /archives, with no date, instead of a full /archives/YYYY/MM/DD URL? The rule is that mod_rewrite steps through each rewrite rule in turn until one matches or no rules are left. We can add another rule to handle that case.

RewriteEngine On
RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)»
RewriteRule ^archives$ index.html

In this case, the request is rewritten to an index page, but you could instead send it to a page that generates a search interface.
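
Putting the two cases together, a complete ruleset might look like the sketch below. It assumes an .htaccess file at the document root and the archives.cgi substitution described earlier; the optional trailing slash, the [L] flags, and the commented-out external redirect are my additions.

```apache
RewriteEngine On

# Dated URLs: /archives/YYYY/MM/DD -> the archive script
RewriteRule ^archives/([0-9]+)/([0-9]+)/([0-9]+)$ /archives.cgi?date=$1-$2-$3 [L]

# Bare /archives or /archives/: serve the index page (internal rewrite)
RewriteRule ^archives/?$ /index.html [L]

# Alternative: send the browser to the index page with a visible redirect
# RewriteRule ^archives/?$ /index.html [R=302,L]
```

The internal rewrite leaves /archives in the browser's address bar, while the [R] form changes the address the visitor sees.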

What If My Server’s not Apache?

Unfortunately, IIS does not come with a rewrite mechanism. You can, however, write an ISAPI filter to do this for you.

If you are running the Manila content management system that comes with Userland’s Frontier, the options allow you to map a particular story in the system to a simple URL.

The Zope publishing system also supports mapping of paths into arguments for server scripts.


Good URLs are part of interface design, a point Jakob Nielsen discusses in his Alertbox column.

This article was inspired in part by Tim Berners-Lee’s observation that good URLs don’t change.

Ralf Engelschall has many examples of mod_rewrite in ‘cookbook’ form at his site.
