Content and Language Arbitration

by Edward Chow



To support multiple languages, a web server can provide versions of a page written in different languages, each with an additional file extension that identifies the language, e.g. index.html.en for the English version. The web client can specify its language preference through the Accept-Language header of the HTTP request, for example:
Accept-Language: it,en-us
Most browsers provide a menu that lets the user select the language preference.
Apache provides directives that allow the webmaster to specify the list of language versions, their file suffixes, and the default language version to serve.

 Site.multiview


AddLanguage it .it
AddLanguage en .en
AddLanguage ko .ko
LanguagePriority it en
#LanguagePriority en it ko

<Directory /mpc/home/guest/sites/site.multiview/htdocs>
Options Multiviews
</Directory>

Note that if we do not list ko in the LanguagePriority directive, ko will be the first choice.  The Apache web server basically follows the order of the AddLanguage directives, with the latest one having the highest priority.

We also found that Apache 1.3.12 requires a very specific MIME-language value as the first parameter of AddLanguage.  For example, the AddLanguage en .en above will not match the header Accept-Language: en-us in the HTTP request.  We need to change it to

AddLanguage en-us .en
LanguagePriority it en-us
#LanguagePriority en-us it ko

or we can serve both cases by adding both en and en-us:

AddLanguage en-us .en
AddLanguage en .en
LanguagePriority it en-us en
#LanguagePriority en-us en it ko
 

To change the language preference in the browser:
In Netscape, select the language menu under Edit/Preferences.

Make sure you use shift-reload to avoid caching.
Here is the HTTP request that the web server received from a proxy server (on frodo) on behalf of a Netscape browser (on wallop):

GET / HTTP/1.0
Host: bilbo:8888
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Charset: iso-8859-1,*,utf-8
Accept-Encoding: gzip
Accept-Language: it,en-us
User-Agent: Mozilla/4.5 [en] (Win98; I)
 

In MS IE, select Tools | Internet Options... | Languages

then you can add or adjust the language preferences.
If we specify US English, Italian, and Chinese (Taiwan) in that order, then here is the HTTP request issued by the IE browser:

bash$ /mpc/home/chow/src/ws 8088
argv[1]=8088
portno=8088
before getsockname socket has port #8088
socket has port #8088
rcvd msg-->GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-comet, */*
Accept-Language: en-us,it;q=0.7,zh-tw;q=0.3
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: bilbo:8088
Connection: Keep-Alive

Here we have it;q=0.7 and zh-tw;q=0.3.   Italian is associated with a "quality" value of 0.7, and en-us has the default quality value of 1.  These quality values are computed by the browser according to the priority order.
HTTP/1.1 specifies the quality value as follows:

   Accept-Language = "Accept-Language" ":"
                         1#( language-range [ ";" "q" "=" qvalue ] )
       language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1". For example,

   Accept-Language: da, en-gb;q=0.8, en;q=0.7

would mean: "I prefer Danish, but will accept British English and other types of English."
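
The grammar above can be exercised with a small parser. The following Python sketch is ours, not part of Apache or of any browser; the function name parse_accept_language is made up for illustration. It splits a header value into language ranges, applies the default q=1, and sorts by descending quality.

```python
def parse_accept_language(header_value):
    """Return (language_range, q) pairs sorted by descending q.

    Ranges without an explicit q parameter default to 1.0,
    as HTTP/1.1 specifies.
    """
    prefs = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    # Our own choice: treat a malformed q as "not acceptable".
                    q = 0.0
        prefs.append((lang, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The IE request captured above.
    print(parse_accept_language("en-us,it;q=0.7,zh-tw;q=0.3"))
    # The example from the HTTP/1.1 specification.
    print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
```

Because the sort is stable, the Netscape header it,en-us (both with q=1) keeps it ahead of en-us.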
 

Note that there is en for English and en-us for English [United States]. On the Apache web server, if we would like to handle both types of language preference, two AddLanguage directives need to be added:

AddLanguage en-us .en
AddLanguage en .en
LanguagePriority it en-us en
#LanguagePriority en-us en it ko
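
HTTP/1.1 says that a language-range matches a language tag if it equals the tag exactly, or if it equals a prefix of the tag and the next character of the tag is "-". Under that rule a client asking for en accepts a variant tagged en-us, but a client asking for en-us does not accept a variant tagged only en, which is consistent with what we saw earlier when AddLanguage en .en failed to match Accept-Language: en-us. The Python sketch below only illustrates the rule from the specification; we have not checked that Apache 1.3.12 implements exactly this logic, and the function name range_matches is ours.

```python
def range_matches(language_range, language_tag):
    """Apply the HTTP/1.1 prefix rule for Accept-Language ranges.

    language_range comes from the client's Accept-Language header,
    language_tag is the language assigned to a variant on the server.
    """
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        return True  # the wildcard range matches every tag
    return tag == rng or tag.startswith(rng + "-")


if __name__ == "__main__":
    for rng, tag in [("en", "en-us"), ("en-us", "en"), ("en-us", "en-us")]:
        print(rng, tag, range_matches(rng, tag))
```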


A type map is a more general way to map content and language.
Instead of hardcoding the choices with AddLanguage and LanguagePriority, we can use a file, called a type map file, to specify the language or content preferences.  We can change this type mapping without having to restart the web server.
For reference, see http://frodo.uccs.edu/manual/content-negotiation.html
Here are the directives added:

AddType application/x-type-map var
AddHandler type-map var

DirectoryIndex index.var

Here is the type map file index.var:

URI: index;  vary="language"
URI: index.en.html
Content-type: text/html
Content-language: en

URI: index.en-us.html
Content-type: text/html
Content-language: en-us

# Seems we _must_ have the Content-type or it doesn't work...
URI: index.it.html
Content-type: text/html
Content-language: it

Type map files have an entry for each available variant; these entries consist of contiguous HTTP-format header lines. Entries for different variants are separated by blank lines. Blank lines are illegal within an entry. It is conventional to begin a map file with an entry for the combined entity as a whole (although this is not required, and if present will be ignored).
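
To make the file format concrete, here is a small Python sketch that splits a type map into entries the way the paragraph above describes: header lines grouped together, with blank lines separating the variants. It is our own illustration, not Apache's parser; the function name parse_type_map is made up, and skipping lines that start with # is an assumption based on the comment line used in index.var.

```python
SAMPLE_MAP = """\
URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
"""


def parse_type_map(text):
    """Return a list of entries; each entry maps lower-cased header names to values."""
    entries = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # assumption: comment lines are skipped
        if not line:
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


if __name__ == "__main__":
    for entry in parse_type_map(SAMPLE_MAP):
        print(entry)
```

The first entry returned here is the one for the combined entity (URI: foo), which the server ignores.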

If we comment out the first entry (en), the web server will serve index.en-us.html for a request with en-GB, even though the language of that variant is en-us in index.var.     If the browser does not specify a language preference, then the last entry in index.var seems to have the highest priority.

An example map file is:

  URI: foo

  URI: foo.en.html
  Content-type: text/html
  Content-language: en

  URI: foo.fr.de.html
  Content-type: text/html;charset=iso-8859-2
  Content-language: fr, de

If the variants have different source qualities, that may be indicated by the "qs" parameter to the media type, as in this picture (available as jpeg, gif, or ASCII-art):

  URI: foo

  URI: foo.jpeg
  Content-type: image/jpeg; qs=0.8

  URI: foo.gif
  Content-type: image/gif; qs=0.5

  URI: foo.txt
  Content-type: text/plain; qs=0.01

qs values can vary in the range 0.000 to 1.000. Note that any variant with a qs value of 0.000 will never be chosen. Variants with no 'qs' parameter value are given a qs factor of 1.0. The qs parameter indicates the relative 'quality' of this variant compared to the other available variants, independent of the client's capabilities. For example, a jpeg file is usually of higher source quality than an ascii file if it is attempting to represent a photograph. However, if the resource being represented is an original ascii art, then an ascii representation would have a higher source quality than a jpeg representation. A qs value is therefore specific to a given variant depending on the nature of the resource it represents.
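
As a rough illustration of how qs interacts with the client's preferences, the Python sketch below multiplies the client's q value for a media type by the variant's qs and picks the highest product. This is a deliberate simplification of ours: the real Apache negotiation algorithm considers more dimensions (language, charset, encoding, content length), and the names client_quality and choose_variant are made up for this example.

```python
# (URI, media type, qs) for the picture example above.
VARIANTS = [
    ("foo.jpeg", "image/jpeg", 0.8),
    ("foo.gif", "image/gif", 0.5),
    ("foo.txt", "text/plain", 0.01),
]


def client_quality(media_type, accept):
    """Look up the client's q: exact type first, then type/*, then */*."""
    major = media_type.split("/")[0]
    for key in (media_type, major + "/*", "*/*"):
        if key in accept:
            return accept[key]
    return 0.0  # not acceptable to this client


def choose_variant(variants, accept):
    """Return the URI with the highest q * qs, or None if nothing is acceptable."""
    best_uri = None
    best_score = 0.0
    for uri, media_type, qs in variants:
        score = client_quality(media_type, accept) * qs
        if score > best_score:  # a score of 0 is never chosen
            best_uri, best_score = uri, score
    return best_uri


if __name__ == "__main__":
    print(choose_variant(VARIANTS, {"*/*": 1.0}))
    print(choose_variant(VARIANTS, {"image/gif": 1.0, "image/jpeg": 0.5}))
```

With a client that accepts everything equally, the jpeg wins on source quality alone; a client that rates image/jpeg at 0.5 gets the gif, since 1.0 * 0.5 is greater than 0.5 * 0.8.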

The full list of headers recognized is:

URI:
URI of the file containing the variant (of the given media type, encoded with the given content encoding). These are interpreted as URLs relative to the map file; they must be on the same server (!), and they must refer to files to which the client would be granted access if they were to be requested directly.
Content-Type:
media type --- charset, level and "qs" parameters may be given. These are often referred to as MIME types; typical media types are image/gif, text/plain, or text/html; level=3.
Content-Language:
The languages of the variant, specified as an Internet standard language tag from RFC 1766 (e.g., en for English, ko for Korean, etc.).
Content-Encoding:
If the file is compressed, or otherwise encoded, rather than containing the actual raw data, this says how that was done. Apache only recognizes encodings that are defined by an AddEncoding directive. This normally includes the encodings x-compress for compress'd files, and x-gzip for gzip'd files. The x-prefix is ignored for encoding comparisons.
Content-Length:
The size of the file. Specifying content lengths in the type-map allows the server to compare file sizes without checking the actual files.
Description:
A human-readable textual description of the variant. If Apache cannot find any appropriate variant to return, it will return an error response which lists all available variants instead. Such a variant list will include the human-readable variant descriptions.

Indexing

 Site.fancyindex

<Directory /mpc/home/guest/sites/site.fancyindex/htdocs>
FancyIndexing on
#AddDescription "One of our wonderful catalogs" catalog_summer.html catalog_autumn.html
AddDescription "One of our wonderful catalogs" catalog*.html
#IndexIgnore *.jpg
IndexIgnore  ..
IndexIgnore  icons HEADER README
AddIconByType (PL_,/icons/sphere1.gif) application/*
AddIconByType (JPEG,/icons/image1.gif) image/jpeg
AddIconByType (GIF,/icons/image3.gif) image/gif
AddIconByType (CAT,/icons/bomb.gif) text/*
DefaultIcon /icons/burst.gif
AddIcon /icons/blank.gif ^^BLANKICON^^
AddIcon (DIR,/icons/dir.gif) ^^DIRECTORY^^
HeaderName HEADER
ReadMeName README
</Directory>

Make sure there is a / before icons in the above httpd.conf file.
It refers to the icons directory under the htdocs DocumentRoot directory.
The original icons directory comes with only 3 gif files.  I copied the icons directory from the apache_1.3.12 distribution to /mpc/home/guest/sites/site.fancyindex/htdocs

For Apache and browsers to recognize .pl files, you need to add the pl file extension to the application/x-httpd-cgi line in the mime.types file.

application/x-httpd-cgi cgi pl

The alternate text PL_, JPEG, GIF, and CAT will be displayed in browsers that do not have
graphic capability (such as lynx).  On our Linux PCs,  you can type
lynx bilbo:<portno>
to access the web site you set up on bilbo in text mode.
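
The AddIconByType lines in the configuration above pair a MIME type pattern with an alternate text and an icon. The Python sketch below mimics that lookup for illustration only. It assumes that the first matching rule wins and that the pattern is an ordinary shell-style wildcard; both are our assumptions rather than documented behavior, and the function name icon_for is made up.

```python
from fnmatch import fnmatchcase

# (MIME pattern, alternate text, icon), in the order of the directives above.
ICON_RULES = [
    ("application/*", "PL_", "/icons/sphere1.gif"),
    ("image/jpeg", "JPEG", "/icons/image1.gif"),
    ("image/gif", "GIF", "/icons/image3.gif"),
    ("text/*", "CAT", "/icons/bomb.gif"),
]
DEFAULT_ICON = "/icons/burst.gif"  # from the DefaultIcon directive


def icon_for(mime_type):
    """Return (alternate_text, icon_path) for a MIME type."""
    for pattern, alt_text, icon in ICON_RULES:
        if fnmatchcase(mime_type.lower(), pattern):
            return alt_text, icon  # assumption: first match wins
    return None, DEFAULT_ICON


if __name__ == "__main__":
    for mime in ("application/x-httpd-cgi", "image/gif", "text/html", "image/png"):
        print(mime, icon_for(mime))
```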


Redirection

Alias can be used to rationalize directories spread around the file system.
For example, we have a web page directory at /mpc/home/cs401/public_html, and we can map it to /cs401 by using

Alias /cs401 /mpc/home/cs401/public_html

We can then access the directory using http://bilbo.uccs.edu:<portno>/cs401/
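
What Alias does to a request path can be sketched in a few lines of Python. This is only our model of the URL-to-filename step, not Apache code; the function name translate is made up, and the DocumentRoot value is the one used in the site.alias configuration in this section.

```python
# (URL prefix, filesystem directory), as in the Alias directive above.
ALIASES = [("/cs401", "/mpc/home/cs401/public_html")]
DOCUMENT_ROOT = "/mpc/home/guest/sites/site.alias/htdocs/customers"


def translate(url_path):
    """Map a URL path to a filesystem path, honoring aliases first."""
    for prefix, target in ALIASES:
        # Match whole path segments only, so /cs4010 is not treated as /cs401.
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return target + url_path[len(prefix):]
    return DOCUMENT_ROOT + url_path


if __name__ == "__main__":
    print(translate("/cs401/index.html"))
    print(translate("/catalog.html"))
```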

Site.alias
User webuser
Group webgroup
ServerName bilbo.uccs.edu
Port 8088

ServerAdmin sales@butterthlies.com
DocumentRoot /mpc/home/guest/sites/site.alias/htdocs/customers
ErrorLog /mpc/home/guest/sites/site.alias/logs/customers/error_log
TransferLog /mpc/home/guest/sites/site.alias/logs/customers/access_log
Alias /cs401 /mpc/home/cs401/public_html
ScriptAlias /cgi-bin /mpc/home/guest/sites/cgi-bin

<VirtualHost sales-bilbo.uccs.edu>
ServerAdmin sales_mgr@butterthlies.com
Redirect  /secrets  http://frodo.uccs.edu

DocumentRoot /mpc/home/guest/sites/site.alias/htdocs/salesmen
ServerName bilbo.uccs.edu
Port 8088
ErrorLog /mpc/home/guest/sites/site.alias/logs/salesmen/error_log
TransferLog /mpc/home/guest/sites/site.alias/logs/salesmen/access_log
</VirtualHost>

For a redirect, the browser is informed of the new location through the Location header.
Here is the result of accessing /secrets on sales-bilbo.uccs.edu:

wetterhorn>  telnet  bilbo.uccs.edu  8088
Connected to bilbo.uccs.edu.
Escape character is '^]'.
GET  /secrets   HTTP/1.1
Host: sales-bilbo.uccs.edu

HTTP/1.1 302 Found
Date: Wed, 04 Oct 2000 06:30:46 GMT
Server: Apache/1.3.12 (Unix)
Location: http://frodo.uccs.edu/
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://frodo.uccs.edu/">here</A>.<P>
</BODY></HTML>
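
A client follows the redirect by reading the status code and the Location header from a response like the one above. The Python sketch below does this on the raw header text copied from the telnet session, so it runs without a network connection; the function name redirect_target is ours, and a real client would normally use an HTTP library instead of parsing by hand.

```python
RAW_RESPONSE = (
    "HTTP/1.1 302 Found\r\n"
    "Date: Wed, 04 Oct 2000 06:30:46 GMT\r\n"
    "Server: Apache/1.3.12 (Unix)\r\n"
    "Location: http://frodo.uccs.edu/\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
    "\r\n"
)


def redirect_target(raw_response):
    """Return (status_code, location); location is None unless the status is 3xx."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    status_code = int(lines[0].split()[1])  # e.g. "HTTP/1.1 302 Found"
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if 300 <= status_code < 400:
        return status_code, headers.get("location")
    return status_code, None


if __name__ == "__main__":
    print(redirect_target(RAW_RESPONSE))
```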
 

Rewrite: A more powerful and flexible directive

Here is an example of using part of the path info in a URL as parameters for a CGI script, and of mapping a URL with ~<login> to the user's public_html directory.  We use a special version of httpd in which the rewrite module is compiled as a DSO (Dynamic Shared Object) module.  It is loaded with the "LoadModule rewrite_module     /path/to/apache/libexec/mod_rewrite.so" directive.
The shared module is in /path/to/apache/libexec.
The httpd is in /path/to/apache/bin/httpd; therefore we modify the go script in /mpc/home/guest/sites/site.rewrite accordingly.

For a reference on RewriteRule, see the description in
http://frodo.uccs.edu/manual/mod/mod_rewrite.html#RewriteRule

RewriteRule

     Syntax: RewriteRule Pattern Substitution [Flags]
 
 

Site.rewrite

LoadModule rewrite_module     /path/to/apache/libexec/mod_rewrite.so
 

RewriteEngine on
RewriteLog logs/rewrite
RewriteLogLevel 9
RewriteRule ^/info/([^/]+)/([^/]+)$   /cgi-bin/cardinfo?$2+$1 [PT]
RewriteRule ^/~([^/]+)/(.*)$   /home/$1/public_html/$2
 

Here [PT] is the passthrough flag.  It allows other URI-to-filename mapping directives such as ScriptAlias, Alias, or Redirect to take effect after the current rewrite.  Without it, the rewritten URL will not be interpreted as a request to execute the CGI script.
[R] is the redirect flag.

With the above setup, if you enter "http://sales-bilbo.uccs.edu:8088/info/t1/p1"
the web server will return "You make the query p1 on the card t1 "
Note that you cannot enter "http://sales-bilbo.uccs.edu:8088/info/t1/p1/":
the web server will return file not found, since according to the syntax of the regular expression, the last parameter cannot be terminated with /.
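
The two regular expressions can be tried out with Python's re module, which handles these particular patterns the same way. The sketch below is only a model of the pattern matching and substitution; it returns after the first rule that matches and ignores flags such as [PT], which is a simplification of what mod_rewrite does. The function name rewrite is ours.

```python
import re

# (pattern, substitution) pairs; $1 and $2 in the RewriteRule lines
# are written as \1 and \2 in Python templates.
RULES = [
    (re.compile(r"^/info/([^/]+)/([^/]+)$"), r"/cgi-bin/cardinfo?\2+\1"),
    (re.compile(r"^/~([^/]+)/(.*)$"), r"/home/\1/public_html/\2"),
]


def rewrite(url_path):
    """Return the rewritten path, or the path unchanged if no rule matches."""
    for pattern, substitution in RULES:
        match = pattern.match(url_path)
        if match:
            return match.expand(substitution)
    return url_path


if __name__ == "__main__":
    for path in ("/info/t1/p1", "/info/t1/p1/", "/~chow/index.html"):
        print(path, "->", rewrite(path))
```

The path /info/t1/p1 becomes /cgi-bin/cardinfo?p1+t1, while /info/t1/p1/ is left unchanged because [^/]+$ cannot match the trailing slash; that is why the server reports file not found for it.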