Thursday, March 29, 2007

Why HTML renders differently in different browsers

The fundamental reason why your site may look slightly different in various browsers.

Margins and Padding

One of the main causes of positional differences between layouts in various browsers is the default stylesheet that each browser applies to style certain elements. This usually involves setting default margins and padding on some elements to make them behave in a certain way.

For instance, paragraph (p) tags will have a margin applied to them so that paragraphs are separated by vertical white space and do not run into each other. The same applies to many other tags, including heading tags (h1 etc). The problem occurs because the amount of margin (or padding) applied to these elements is not consistent across browsers. On many occasions Mozilla/Firefox will add a top margin to the element as well as a bottom margin. IE, however, will only add a bottom margin. If you were then to view a page in these two browsers side by side you would see that the alignment differs because of the top margin applied by Mozilla, which could make your design not line up as expected.

In some designs this may not be a problem but in cases where position is important, such as aligning with other elements on the page, then the design may look bad or at least not as expected.

Here are some styles taken from the default Firefox 2.0 stylesheet (html.css), which immediately show what is going on:

CSS:
  body {
    display: block;
    margin: 8px;
  }

  p, dl {
    display: block;
    margin: 1em 0;
  }

  h1 {
    display: block;
    font-size: 2em;
    font-weight: bold;
    margin: .67em 0;
  }

  h2 {
    display: block;
    font-size: 1.5em;
    font-weight: bold;
    margin: .83em 0;
  }

  h3 {
    display: block;
    font-size: 1.17em;
    font-weight: bold;
    margin: 1em 0;
  }

  h4 {
    display: block;
    font-weight: bold;
    margin: 1.33em 0;
  }

  h5 {
    display: block;
    font-size: 0.83em;
    font-weight: bold;
    margin: 1.67em 0;
  }

  h6 {
    display: block;
    font-size: 0.67em;
    font-weight: bold;
    margin: 2.33em 0;
  }

As you can clearly see, various properties have been set, but the most important are the margins and padding, as they vary considerably. If you were to look at the default IE stylesheet you would find that few of its styles match the ones above.

What Can Be Done

Since we can never be sure whether the browser's stylesheet has applied margin or padding to an element the only real option is to explicitly set the margins and padding ourselves. This way we can over-ride the default stylesheet so that we know exactly how each element will behave in each browser.

As we don't really know what elements have default styling applied to them (across all browsers) we must set the margin and padding for every element we use. In most cases we are just talking about block level elements -- you do not need to do this for inline elements such as em, strong, a, etc, which seldom have any margin or padding applied to them, although em and strong will already have some styling applied to give them their strong and emphasized look.

Here is how you can reset the padding and margin of block elements when you use them:

CSS:
  html,body{margin:0;padding:0}
  p {margin:0 0 1em 0;padding:0}
  h1{margin:0 0 .7em 0;padding:0}
  form {margin:0;padding:0}

Take the body element for example, and notice that we have included the html element also, and then we have re-set padding and margins to zero. As explained above, various browsers will apply different amounts of margin to the body to give the default gap around the page. It is important to note that Opera does not use margins for the default body spacing but uses padding instead. Therefore we must always reset padding and margins to be 100% sure we are starting on an even footing.

If you did not reset the margins or padding and you simply defined something like this:

CSS:
  body{margin:1em}

Then in Opera you would now have the default padding on the body plus the extra margin you just defined, thereby doubling up the initial space around the body in error.
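The effect is easy to see with a few numbers. The following is a minimal Python sketch of the cascade for the body element only, not a model of any real browser. The 8px Firefox margin comes from the stylesheet quoted earlier; the Opera padding value, the 16px size of 1em and the function name body_gap are all assumptions made for illustration.

```python
# Hypothetical browser defaults for <body>, in pixels. The 8px margin matches
# the Firefox stylesheet quoted earlier; the Opera padding value is assumed.
DEFAULTS = {
    "firefox": {"margin": 8, "padding": 0},
    "opera": {"margin": 0, "padding": 8},
}


def body_gap(browser, author_rule):
    """Return the gap around the page: margin plus padding after the cascade."""
    computed = dict(DEFAULTS[browser])  # start from the browser default
    computed.update(author_rule)        # author declarations override per property
    return computed["margin"] + computed["padding"]


only_margin = {"margin": 16}               # body{margin:1em}, assuming 1em = 16px
reset_both = {"margin": 16, "padding": 0}  # body{margin:1em;padding:0}

for browser in DEFAULTS:
    print(browser, body_gap(browser, only_margin), body_gap(browser, reset_both))
```

Under those assumed numbers the margin-only rule leaves Firefox with a 16px gap but Opera with 24px, because the default padding was never overridden. Setting both properties gives 16px in each.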

Also be wary of doing things like this:

CSS:
  html,body {margin:0;padding:1em}

You have now defined 1em padding on the html element and 1em padding on the body element giving you 2em padding overall which probably was not what you intended.

Global White Space Reset

These days it is common to use the global reset technique which uses the universal selector (*) to reset all the padding and margins to zero in one fell swoop and save a lot of messing around with individual styles.

e.g.

CSS:
  * {margin:0;padding:0}

The universal selector (the asterisk *) matches any element at all and to turn all elements blue we could do something like this:

CSS:
  * {color:blue}

(Of course they would only be blue as long as they have not been over-ridden by more specific styles later on in the stylesheet.)
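Why does almost any other rule beat the universal selector? It carries no specificity at all, so a plain element selector already outranks it, and when two rules have equal specificity the later one wins. Here is a rough Python sketch of that comparison. The specificity tuples are written out by hand rather than parsed from the selectors, every rule is assumed to match the element in question, and the name winning_value is invented for the example.

```python
# Each rule: (selector, specificity as (ids, classes, elements), declarations).
# Specificities are written out by hand rather than parsed from the selector.
RULES = [
    ("*", (0, 0, 0), {"color": "blue"}),
    ("p", (0, 0, 1), {"color": "black"}),
    ("#content p", (1, 0, 1), {"color": "green"}),
]


def winning_value(prop, matching_rules):
    """Pick the declaration with the highest specificity; later rules win ties."""
    best = None
    for order, (_selector, specificity, declarations) in enumerate(matching_rules):
        if prop in declarations:
            key = (specificity, order)
            if best is None or key > best[0]:
                best = (key, declarations[prop])
    return best[1] if best else None


print(winning_value("color", RULES[:1]))  # only the universal rule applies
print(winning_value("color", RULES))      # all three rules match the element
```

With only the universal rule in play the text is blue. As soon as the other two rules match, the #content p rule takes over.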

The global reset is a neat little trick that saves you having to remember to reset every element you use and you can be sure that all browsers are now starting on even footing.

Lists need a special mention here, as it is not often understood that the default space for the bullet in lists is simply provided by the default stylesheet in the form of some left margin. Usually about 16px of left margin is added by default to the UL to allow the bullet image to show; otherwise there is nowhere for it to go. As with the problems already mentioned, we also need to cater for browsers that use left padding instead of left margin.

This can be quite a big issue if, for instance, you have not reset the default padding and margin to zero and try something like this.

CSS:
  ul {padding:1em}

In browsers that have a default margin applied you will now get the default left margin of 16px (approx) plus the 1em padding you just defined, giving you approximately twice the amount of space on the left side of the list. This would, of course, make the design look quite different in the various browsers, which is not something you would wish for.

In essence the margin should have been reset to zero, either initially with the global reset, or by simply doing the following:

CSS:
  ul {margin:0;padding:1em}

Now all browsers will display the same, but you will need to ensure that the 1em is still enough room for the bullet to show. I usually allow 16px left margin (or padding) as a rough guide and that seems to work well. (You can use either padding or margin for the default bullet space.)

Drawbacks

However, as with all things that make life easier there is a price to be paid.

First of all, certain form elements are affected by this global reset and do not behave as per their normal defaults. The input button in Mozilla will lose its "depressed when clicked effect" and will not show any action when clicked other than submitting the form, of course. IE and Opera do not suffer from this problem and it is not really a major issue but any loss of visual clues can be a detriment to accessibility.

You may think that you can simply re-instate the margin and padding to regain the depressed effect in Mozilla, but alas this is not so. Once you have removed the padding, the element's behavior changes and it cannot be restored even by adding more padding.

There is also an issue with select/option drop down lists in Mozilla and Opera. You will find that using the global reset will take away the right padding/margin on the drop down list items and that they will be flush against the drop down arrow and look a little squashed. Again, we have problems in re-instating this padding/margin in a cross browser way.

You can't add padding to the select element because Mozilla will add the padding all around which includes the little drop down arrow that now suddenly becomes detached from its position and has a big white gap around it. You can, however, add padding right to the option element instead to give you some space and this looks fine in Mozilla but unfortunately doesn't work in Opera. Opera in fact needs the padding on the select element which as we already found out messes up Mozilla.

Here is an image showing the problems in Firefox and Opera:

[Image: Select element in Firefox and Opera]

There is no easy fix -- it's something you have to live with if you use the global reset method.

If you do not have any forms in your site (unlikely) then you don't have to worry about these issues or you can simply choose to ignore them if you think your forms are still accessible and don't look too bad. This will vary depending on the complexity of your form design and is something you will need to design for yourself. If you are careful with the amount of padding you add then you can get away with a passable design that doesn't look too bad cross-browser.

Another perceived drawback, of which there has been a lot of discussion, is whether the global reset method could have speed implications for the browser's rendering of the page. As the universal selector applies to every single element on the page, including elements that don't really need it, it has been put forward that this could slow the browser down in cases where the html is very long and there are many nodes for the parser to traverse.

While I agree with this logic and accept that this may be true, I have yet to encounter an occasion where this has been an issue. Even if it were an issue I doubt very much that in the normal scheme of things it would even be noticeable, but it is still something to be aware of and to look out for.

The final drawback of using the global reset method is that it is like taking a hammer to your layout when a screwdriver would have been better. As I have noted above, there is no need to reset things like em, b, i, a, strong etc anyway, and perhaps it's just as easy to set the margins and padding as you go.

As an example of what I mean take this code.

CSS:
  * {margin:0;padding:0}
  p,ol,ul,h1,h2,h3,h4,h5,h6 {margin:0 0 1em 0}

I have negated the padding on all elements and then given a few defaults for the most popular elements that I am going to use. However, when coding the page, I get to the content section and decide I need some different margins so I define the following:

CSS:
  #content p {margin-top:.5em}

So now I have a situation where I have addressed that element three times already. If I hadn't used the global reset or the default common styling as shown above then I could simply have said:

#content p {margin:.5em 0 1em 0;padding:0}

This way I have addressed the element only once and avoided all issues related to the global reset method. It is likely that you will apply alternate styling to all the elements that you use on the page anyway and therefore you simply need to remember to define the padding and margin as you go.

CSS:
  form{width:300px;margin:0;padding:0}
  h1{color:red;background:white;margin:1em;padding:2px}

Conclusion

The safest method is simply to define the margins and padding as you go, because nine times out of ten you will be changing something on these elements and, more than likely, it will involve the padding and margins. This saves duplication and also solves all the issues that the global reset may have.

The global reset is useful for beginners who don't understand that they need to control everything or who simply forget that elements like forms have a great big margin in IE but none in other browsers.

In the end it's a matter of choice and of consistency. Whatever method you use make sure you are consistent and logical and you won't go wrong. It is up to the designer to take control of the page and explicitly control every element that is used. Do not let the browser's defaults get in your way and be aware that elements can have different amounts of padding and margin as determined by the browser's own default stylesheet. It is your job to control this explicitly.

Top 20 Websites in US

Web metrics firm Compete has an interesting post, outlining the top 20 websites (for US traffic). According to Compete, all 20 of them got over 20 million unique visitors in October 2006. Here is the chart:



A couple of people noted in the comments that if you add Microsoft's 4 top 20 properties together (msn.com, live.com, microsoft.com and passport.net), then they would probably be number 1. However, a counter to that is that a lot of passport.net domains currently re-direct to live.com. I think there may be some crossover between live.com and MSN too. So it may well be that Yahoo remains number 1, even accounting for Microsoft's multiple brands. Plus of course Yahoo and Google both have separately branded properties too - e.g. Flickr, YouTube. If I were to estimate, I'd put Microsoft at number 2 overall - but I'm interested to hear what others think.


Compete notes that Adobe.com, Live.com, Wikipedia.org and YouTube.com are new to the top 20 over the past year, while Expedia.com, Monster.com, Paypal.com and Weather.com have all dropped out.


Looking at the Alexa data for US traffic, the top 20 is quite different:


1. yahoo.com
2. google.com
3. myspace.com
4. msn.com
5. ebay.com
6. amazon.com
7. youtube.com
8. craigslist.org
9. wikipedia.org
10. cnn.com
11. facebook.com
12. go.com
13. live.com
14. blogger.com
15. aol.com
16. microsoft.com
17. comcast.net
18. imdb.com
19. digg.com
20. flickr.com

The Alexa list is (I think) only counting US traffic, but it is quite different from Compete's stats. The presence of non-mainstream web 2.0 sites in Alexa's top 20 (blogger.com, digg, flickr) suggests that the traffic is heavily skewed towards technical users - which makes sense, given Alexa relies on toolbar downloads to get their stats.


Also interesting to note there is just one Microsoft property in the top 10 in Alexa, compared to 3 for Compete.

The Top 100 Alternative Search Engine List

AllTha.at www.allth.at The search engine that keeps on looking.
Ask Mobile www.m.ask.com Mobile search engine from Ask.com
ASK VOX www.askvox.com A second talking female user interface.
AnswerBus www.answerbus.com Ask in English, French, Spanish, German or Italian.
Blabline www.blabline.com Podcast / videocast search engine
blinkx www.blinkx.com Video Search
boing www.boing.mobi Search the Mobile web
bookmach.com www.bookmach.com Searches for posts related to your keywords.
ChaCha www.chacha.com Human Guides are available to aid in your search.
ClipBlast! www.clipblast.com Video Search
Clusty www.clusty.com Clustering search engine
collarity www.collarity.com Behavioral personalized search / Collarity Compass
CONGOO www.congoo.com Searches for Premium Content
crossEngine www.crossengine.com Searches Search Engines; formerly mrSAPO
d e c i p h o www.decipho.com Behavioral personalized search / Social Meter
Ditto www.ditto.com Visual search engine
Dogpile www.dogpile.com MetaSearch Engine
dumbfind www.dumbfind.com Featuring the Two-Box search method.
exalead www.exalead.com/search Web / Image search with a European flavor
factbites www.factbites.com Search Result snippets are complete sentences.
fazzle www.fazzle.com Search engine that emphasizes Boolean Search
filangy www.filangy.com Personalized Search Engine
FIND FORWARD www.findforward.com Multi-featured search engine; check this one out!
FindSounds www.findsounds.com Search for sound effects and musical samples.
FyberSearch www.fybersearch.com Parent site for some interesting new search engines.
GIGABLAST www.gigablast.com A multi-featured search engine.
girafa www.girafa.com Visual search engine - results are thumbnails
gnod www.gnod.net Outstanding recommendation search engines
gnosh www.gnosh.org Metasearch engine
GoLexa www.golexa.com "COMPLETE page analysis for each result."
goshme Beta 3.0 www.goshme.com A search engine for search engines. Top 10 pick.
GoYams www.goyams.com Metasearch engine where you select the mix.
grokker www.grokker.com A multi-featured meta-search engine.
GRUUVE www.gruuve.com Groovy music recommendation search engine.
hakia www.hakia.com "Meaning based" search engine
ICEROCKET www.icerocket.com Blog search engine
ixquick www.ixquick.com Metasearch engine
KartOO www.kartoo.com Visually appealing clustering search engine
Lexxe www.lexxe.com Natural language processing (NLP) search engine
like www.like.com Visual shopping engine; see also riya
liveplasma www.liveplasma.com Attractive music / movies clustering / recommendation engine
Local.com www.local.com Search for local businesses, products, and services
lurpo www.lurpo.com Searches for custom Google search engines
mamma www.mamma.com metasearch engine
MetaGlossary www.metaglossary.com Searches for definitions, phrases and acronyms.
mnemomap www.mnemo.org Clustering search engine
Mojeek www.mojeek.com Customize your own personal search engine.
Mooter www.mooter.com Clustering search engine
mrquery www.mrquery.com Metasearch engine / metasearch providers
MS. DEWEY www.msdewey.com Unique user interface - enough said.
Omgili www.omgili.com Social community search engine
onkosh www.onkosh.com Arabic / English Search Engine
Pagebull www.pagebull.com Visual results search engine
pipl http://pipl.com People search engine
PlanetSearch www.planetsearch.com Metasearch engine
PolyMeta www.polymeta.com Metasearch and clustering search engine
pronto.com www.pronto.com Metasearch engine
qksearch www.qksearch.com Multi-featured "3-in-1" multi-search engine
Quintura www.quintura.com Clustering search engine with a new interface
Quintura for kids http://kids.quintura.com/ Search engine for kids by Quintura
RedZee www.redzee.com Search Engine with nice preview results
retrievr http://labs.systemone.at/retrievr/ Visual search engine
riya www.riya.com Visual search engine; see also Like
scirus http://scirus.com Scientific information only search engine
searchbots www.searchbots.net Have a little fun, create your own searchbot.
SearchTheWeb2 www.searchtheweb2.com Search The Popular Head and The Long Tail
sidekiq www.sidekiq.com Multi-category search engine. Very nice.
Slideshow http://slideshow.zmpgroup.com/ Displays search results as a moving slideshow.
Slifter www.slifter.com A mobile shopping search engine.
soople www.soople.com A simplified version of Google's search options.
Speegle www.speegle.com The speeglebot talks to you.
Sphere www.sphere.com A blog search engine.
Sproose www.sproose.com Social search engine
S R C H R www.srchr.com Metasearch engine
SurfWax www.surfwax.com Meaning-based search engine
Swamii www.swamii.com Search engine that keeps on searching for you.
Swoogle http://swoogle.umbc.edu Semantic Web search engine
thefind.com www.thefind.com Shopping search engine
Trexy www.trexy.com Follow "trails" and "trailblazers" with Trexy.
turboscout www.turboscout.com Metasearch engine
TWERQ www.twerq.com Multi-category search engine with tabbed results.
UJIKO www.ujiko.com A fun interface where you can vote on the results.
url.com www.url.com "Search with many" community metasearch engine.
VMGO.com www.vmgo.com Vote on the search results with emoticons.
WASALive www.wasalive.com A new member of the list.
Web 2.0 www.web20searchengine.com Web 2.0 search engines
WEBBRAIN www.webbrain.com Clustering "see the web" search engine.
whonu? www.whonu.com Deluxe metasearch engine.
WIKIO www.wikio.com "Live information from 33981 media and blogs"
Windows Live Mobile www.wls.live.com Windows Live Mobile search engine
WiseNut www.wisenut.com Clustering search engine
Yahoo! Mobile http://m.yahoo.com Yahoo! Mobile search engine
Yahoo! MINDSET www.mindset.research.yahoo.com Intention-driven search; commercial versus research
yoono www.yoono.com People-rated community web search
yoople www.yoople.net Yoople! = Yahoo! + Google + People
yubnub www.yubnub.org Use command lines to search the web.
ZABASEARCH www.zabasearch.com People and Public Information Search Engine.
zapmeta www.zapmeta.com Metasearch engine
Zippy www.zippy.co.uk Search engine for webmasters
ZUULA www.zuula.com Multi-category, multi-search engine, with good tabs.

Wednesday, March 28, 2007

Apache: Authentication, Authorization and Access Control

Authentication, Authorization, and Access Control

Introduction

Apache has three distinct ways of dealing with the question of whether a particular request for a resource will result in that resource actually being returned. These criteria are called Authorization, Authentication, and Access control.

Authentication is any process by which you verify that someone is who they claim they are. This usually involves a username and a password, but can include any other method of demonstrating identity, such as a smart card, retina scan, voice recognition, or fingerprints. Authentication is equivalent to showing your driver's license at the ticket counter at the airport.

Authorization is finding out if the person, once identified, is permitted to have the resource. This is usually determined by finding out if that person is a part of a particular group, if that person has paid admission, or has a particular level of security clearance. Authorization is equivalent to checking the guest list at an exclusive party, or checking for your ticket when you go to the opera.

Finally, access control is a much more general way of talking about controlling access to a web resource. Access can be granted or denied based on a wide variety of criteria, such as the network address of the client, the time of day, the phase of the moon, or the browser which the visitor is using. Access control is analogous to locking the gate at closing time, or only letting people onto the ride who are more than 48 inches tall - it's controlling entrance by some arbitrary condition which may or may not have anything to do with the attributes of the particular visitor.

Because these three techniques are so closely related in most real applications, it is difficult to talk about them separate from one another. In particular, authentication and authorization are, in most actual implementations, inextricable.

If you have information on your web site that is sensitive, or intended for only a small group of people, the techniques in this tutorial will help you make sure that the people that see those pages are the people that you wanted to see them.

Basic authentication

As the name implies, basic authentication is the simplest method of authentication, and for a long time was the most common authentication method used. However, other methods of authentication have recently passed basic in common usage, due to usability issues that will be discussed in a minute.


How basic authentication works

When a particular resource has been protected using basic authentication, Apache answers the request with a 401 response (Unauthorized, in HTTP terms) to notify the client that user credentials must be supplied before the resource will be returned.

Upon receiving a 401 response header, the client's browser, if it supports basic authentication, will ask the user to supply a username and password to be sent to the server. If you are using a graphical browser, such as Netscape or Internet Explorer, what you will see is a box which pops up and gives you a place to type in your username and password, to be sent back to the server. If the username is in the approved list, and if the password supplied is correct, the resource will be returned to the client.

Because the HTTP protocol is stateless, each request will be treated in the same way, even though they are from the same client. That is, every request for a protected resource will have to supply the authentication credentials again in order to receive it.

Fortunately, the browser takes care of the details here, so that you only have to type in your username and password one time per browser session - that is, you might have to type it in again the next time you open up your browser and visit the same web site.

Along with the 401 response, certain other information will be passed back to the client. In particular, it sends a name which is associated with the protected area of the web site. This is called the realm, or just the authentication name. The client browser caches the username and password that you supplied, and stores them along with the authentication realm, so that if other resources are requested from the same realm, the same username and password can be returned to authenticate that request without requiring the user to type them in again. This caching is usually just for the current browser session, but some browsers allow you to store them permanently, so that you never have to type in your password again.

The authentication name, or realm, will appear in the pop-up box, in order to identify what the username and password are being requested for.
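To make the exchange a little more concrete, here is a minimal Python sketch of what the browser sends back after the 401. In the Basic scheme the credentials travel in an Authorization header, as the word Basic followed by the base64 encoding of username:password. The two function names are my own, and the sketch leaves out everything else a real client or server does.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the two parts; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("rbowen", "mypassword")
print(header)
print(decode_basic_auth(header))
```

Note that base64 is an encoding, not encryption: anyone who can see the header can run the second function and read the password.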

Configuration: Protecting content with basic authentication

There are two configuration steps which you must complete in order to protect a resource using basic authentication. Or three, depending on what you are trying to do.

  1. Create a password file
  2. Set the configuration to use this password file
  3. Optionally, create a group file


Create a password file

In order to determine whether a particular username/password combination is valid, the username and password supplied by the user will need to be compared to some authoritative listing of usernames and passwords. This is the password file, which you will need to create on the server side, and populate with valid users and their passwords.

Because this file contains sensitive information, it should be stored outside of the document directory. Although, as you will see in a moment, the passwords are encrypted in the file, if a cracker were to gain access to the file, it would be an aid in their attempt to figure out the passwords. And, because people tend to be sloppy with the passwords that they choose, and use the same password for web site authentication as for their bank account, this could potentially be a very serious breach of security, even if the content on your web site is not particularly sensitive.

Caution: Encourage your users to use a different password for your web site than for other more essential things. For example, many people tend to use two passwords - one for all of their extremely important things, such as the login to their desktop computer, and for their bank account, and another for less sensitive things, the compromise of which would be less serious.

To create the password file, use the htpasswd utility that came with Apache. This will be located in the bin directory of wherever you installed Apache. For example, it will probably be located at /usr/local/apache/bin/htpasswd if you installed Apache from source.

To create the file, type:

htpasswd -c /usr/local/apache/passwd/passwords username

htpasswd will ask you for the password, and then ask you to type it again to confirm it:

# htpasswd -c /usr/local/apache/passwd/passwords rbowen
New password: mypassword
Re-type new password: mypassword
Adding password for user rbowen

Note that in the example shown, a password file is being created containing a user called rbowen, and this password file is being placed in the location /usr/local/apache/passwd/passwords. You will substitute the location, and the username, which you want to use to start your password file.

If htpasswd is not in your path, you will have to type the full path to the file to get it to run. That is, in the example above, you would replace htpasswd with /usr/local/apache/bin/htpasswd

The -c flag is used only when you are creating a new file. After the first time, you will omit the -c flag, when you are adding new users to an already-existing password file.

htpasswd /usr/local/apache/passwd/passwords sungo

The example just shown will add a user named sungo to a password file which has already been created earlier. As before, you will be asked for the password at the command line, and then will be asked to confirm the password by typing it again.

Caution: Be very careful when you add new users to an existing password file that you don't use the -c flag by mistake. Using the -c flag will create a new password file, even if you already have an existing file of that name. That is, it will remove the contents of the file that is there, and replace it with a new file containing only the one username which you were adding.
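If you script the creation of users, one way to avoid that mistake is to let the script decide whether -c is needed. The following is a small Python sketch under that assumption: the helper name htpasswd_command is invented, it only builds the argument list and never runs htpasswd, and the demo uses a throwaway temporary directory instead of a real password file.

```python
import os
import tempfile


def htpasswd_command(password_file, username):
    """Build the htpasswd argument list, adding -c only when the file is missing."""
    command = ["htpasswd"]
    if not os.path.exists(password_file):
        command.append("-c")  # create the file; never added for an existing file
    command += [password_file, username]
    return command


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "passwords")  # throwaway path for the demo
    first = htpasswd_command(path, "rbowen")     # file missing, so -c is included
    open(path, "w").close()                      # stand-in for the file htpasswd creates
    second = htpasswd_command(path, "sungo")     # file exists, so -c is left out

print(first[:2])
print("-c" in second)
```

The resulting list could be handed to something like subprocess.run, which would then prompt for the password exactly as shown above.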

The password is stored in the password file in encrypted form, so that users on the system will not be able to read the file and immediately determine the passwords of all the users. Nevertheless, you should store the file in as secure a location as possible, with the minimum permissions that still allow the web server itself to read it. For example, if your server is configured to run as user nobody and group nogroup, then you should set permissions on the file so that only the webserver can read the file and only root can write to it:

chown root:nogroup /usr/local/apache/passwd/passwords
chmod 640 /usr/local/apache/passwd/passwords
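If the octal number is not familiar, a couple of lines of Python will spell it out. This sketch uses the standard library's stat.filemode to render a mode the way ls -l would show it for a regular file; the wrapper name describe_mode is just for illustration.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode the way `ls -l` shows it for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


print(describe_mode(0o640))
```

For 640 that is read and write for the owner (root), read only for the group (nogroup, which the web server belongs to), and nothing at all for everyone else.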

On Windows, a similar precaution should be taken, changing the ownership of the password file to the web server user, so that other users cannot read the file.


Set the configuration to use this password file

Once you have created the password file, you need to tell Apache about it, and tell Apache to use this file in order to require user credentials for admission. This configuration is done with the following directives:

  AuthType: The authentication type being used. In this case, it will be set to Basic
  AuthName: The authentication realm or name
  AuthUserFile: The location of the password file
  AuthGroupFile: The location of the group file, if any
  Require: The requirement(s) which must be satisfied in order to grant admission

These directives may be placed in a .htaccess file in the particular directory being protected, or may go in the main server configuration file, in a <Directory> section, or other scope container.

The example shown below defines an authentication realm called "By Invitation Only". The password file located at /usr/local/apache/passwd/passwords will be used to verify the user's identity. Only users named rbowen or sungo will be granted access, and even then only if they provide a password which matches the password stored in the password file.

AuthType Basic
AuthName "By Invitation Only"
AuthUserFile /usr/local/apache/passwd/passwords
Require user rbowen sungo

The phrase "By Invitation Only" will be displayed in the password pop-up box, where the user will have to type their credentials.

You will need to restart your Apache server in order for the new configuration to take effect, if these directives were put in the main server configuration file. Directives placed in .htaccess files take effect immediately, since .htaccess files are parsed each time files are served.

The next time that you load a file from that directory, you will see the familiar username/password dialog box pop up, requiring that you type the username and password before you are permitted to proceed.

Note that in addition to specifically listing the users to whom you want to grant access, you can specify that any valid user should be let in. This is done with the valid-user keyword:

Require valid-user


Optionally, create a group file

Most of the time, you will want more than one, or two, or even a dozen, people to have access to a resource. You want to be able to define a group of people that have access to that resource, and be able to manage that group of people, adding and removing members, without having to edit the server configuration file, and restart Apache, each time.

This is handled using authentication groups. An authentication group is, as you would expect, a group name associated with a list of members. This list is stored in a group file, which should be stored in the same location as the password file, so that you are able to keep track of these things.

The format of the group file is exceedingly simple. A group name appears first on a line, followed by a colon, and then a list of the members of the group, separated by spaces. For example:

authors: rich daniel allan
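
The format is simple enough to read with a few lines of code. The following Python sketch is only an illustration of the file format, not how Apache itself reads the file. The function names parse_group_file and user_in_group are made up for this example, and skipping blank lines and lines starting with # is an assumption of the sketch.

```python
def parse_group_file(text):
    """Parse 'group: member member ...' lines into a dict of member sets."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: ignore blank lines and comment lines
        name, _, members = line.partition(":")
        groups[name.strip()] = set(members.split())
    return groups


def user_in_group(groups, user, group):
    """Return True if the user is listed as a member of the group."""
    return user in groups.get(group, set())


groups = parse_group_file("authors: rich daniel allan\n")
print(user_in_group(groups, "daniel", "authors"))  # True
```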

Once this file has been created, you can Require that someone be in a particular group in order to get the requested resource. This is done with the AuthGroupFile directive, as shown in the following example.

AuthType Basic
AuthName "Apache Admin Guide Authors"
AuthUserFile /usr/local/apache/passwd/passwords
AuthGroupFile /usr/local/apache/passwd/groups
Require group authors

The authentication process is now one step more involved. When a request is received, and the requested username and password are supplied, the group file is first checked to see if the supplied username is even in the required group. If it is, then the password file will be checked to see if the username is in there, and if the supplied password matches the password stored in that file. If any of these steps fails, access will be forbidden.


Frequently asked questions about basic auth

The following questions tend to get asked very frequently with regard to basic authentication. It should be understood that basic authentication is very basic, and so is limited to the set of features that has been presented above. Most of the more interesting things that people tend to want, need to be implemented using some alternate authentication scheme.


How do I log out?

Since browsers first started implementing basic authentication, website administrators have wanted to know how to let the user log out. Since the browser caches the username and password with the authentication realm, as described earlier in this tutorial, this is not a function of the server configuration, but is a question of getting the browser to forget the credential information, so that the next time the resource is requested, the username and password must be supplied again. There are numerous situations in which this is desirable, such as when using a browser in a public location, and not wishing to leave the browser logged in, so that the next person can get into your bank account.

However, although this is perhaps the most frequently asked question about basic authentication, thus far none of the major browser manufacturers have seen this as being a desirable feature to put into their products.

Consequently, the answer to this question is, you can't. Sorry.


How can I change what the password box looks like?

The dialog that pops up for the user to enter their username and password is ugly. It contains text that you did not indicate that you wanted in there. It looks different in Internet Explorer and Netscape, and contains different text. And it asks for fields that the user might not understand - for example, Netscape asks the user to type in their "User ID", and they might not know what that means. Or, you might want to provide additional explanatory text so that the user has a better idea what is going on.

Unfortunately, these things are features of the browser, and cannot be controlled from the server side. If you want the login to look different, then you will need to implement your own authentication scheme. There is no way to change what this login box looks like if you are using basic authentication.


How do I make it not ask me for my password the next time?

Because most browsers store your password information only for the current browser session, when you close your browser it forgets your username and password. So, when you visit the same web site again, you will need to re-enter your username and password.

There is nothing that can be done about this on the server side.

However, the most recent versions of the major browsers contain the ability to remember your password forever, so that you never have to log in again. While it is debatable whether this is a good idea, since it effectively overrides the entire point of having security in the first place, it is certainly convenient for the user, and simplifies the user experience.


Why does it sometimes ask me for my password twice?

When entering a password-protected web site for the first time, you will occasionally notice that you are asked for your password twice. This may happen immediately after you entered the password the first time, or it may happen when you click on the first link after authenticating the first time.

This happens for a very simple, but nonetheless confusing, reason, again having to do with the way that the browser caches the login information.

Login information is stored on the browser based on the authentication realm, specified by the AuthName directive, and by the server name. In this way, the browser can distinguish between the Private authentication realm on one site and on another. So, if you go to a site using one name for the server, and internal links on the server refer to that server by a different name, the browser has no way to know that they are in fact the same server.

For example, if you were to visit the URL http://example.com/private/, which required authentication, your browser would remember the supplied username and password, associated with the hostname example.com. If, by virtue of an internal redirect, or fully-qualified HTML links in pages, you are then sent to the URL http://www.example.com/private/, even though this is really exactly the same URL, the browser does not know this for sure, and is forced to request the authentication information again, since example.com and www.example.com are not exactly the same hostname. Your browser has no particular way to know that these are the same web site.
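
The caching behavior can be pictured as a lookup table keyed by hostname and realm. The Python sketch below is a simplified model, not the code of any real browser; the class name CredentialCache and the password 'secret' are invented for the illustration, and real browsers may take more than the hostname and realm into account.

```python
from urllib.parse import urlsplit


class CredentialCache:
    """Toy model of a browser's basic-auth cache, keyed by (hostname, realm)."""

    def __init__(self):
        self._store = {}

    def remember(self, url, realm, username, password):
        host = urlsplit(url).hostname
        self._store[(host, realm)] = (username, password)

    def lookup(self, url, realm):
        host = urlsplit(url).hostname
        return self._store.get((host, realm))


cache = CredentialCache()
cache.remember("http://example.com/private/", "Private", "rbowen", "secret")
print(cache.lookup("http://example.com/private/index.html", "Private"))  # ('rbowen', 'secret')
print(cache.lookup("http://www.example.com/private/", "Private"))        # None
```

Because the second hostname is a different key, the lookup misses and the user is prompted again.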


Security caveat

Basic authentication should not be considered secure for any particularly rigorous definition of secure.

Although the password is stored on the server in encrypted format, it is passed from the client to the server in plain text across the network. Anyone listening with any variety of packet sniffer will be able to read the username and password in the clear as it goes across.

Not only that, but remember that the username and password are passed with every request, not just when the user first types them in. So the packet sniffer need not be listening at a particularly strategic time, but just for long enough to see any single request come across the wire.

And, in addition to that, the content itself is also going across the network in the clear, and so if the web site contains sensitive information, the same packet sniffer would have access to that information as it went past, even if the username and password were not used to gain direct access to the web site.

Don't use basic authentication for anything that requires real security. It is a deterrent for most users, since very few people will take the trouble, or have the necessary software and/or equipment, to find out passwords. However, if someone had a desire to get in, it would take very little for them to do so.

Basic authentication across an SSL connection, however, will be secure, since everything is going to be encrypted, including the username and password.
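
To see why a packet sniffer has such an easy time, recall that basic authentication sends the username and password joined by a colon and Base64-encoded, which is an encoding rather than encryption. The following Python sketch builds and then reverses such a header value. The function names and the sample password are made up for this example.

```python
import base64


def basic_auth_header(username, password):
    """Build the value a browser sends in the Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a captured header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization value")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("rbowen", "mypassword")
print(decode_basic_auth_header(header))  # ('rbowen', 'mypassword')
```

No key or secret is needed to reverse the encoding, which is why only an encrypted connection protects these credentials.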

Digest authentication

Addressing one of the security caveats of basic authentication, digest authentication provides an alternate method for protecting your web content. However, it too has a few caveats.

How digest auth works

Digest authentication is implemented by the module mod_auth_digest. There is an older module, mod_digest, which implemented an older version of the digest authentication specification, but which will probably not work with newer browsers.

Using digest authentication, your password is never sent across the network in the clear, but is always transmitted as an MD5 digest of the user's password. In this way, the password cannot be determined by sniffing network traffic.

The full specification of digest authentication can be seen in the internet standards document RFC 2617, which you can see at http://www1.ics.uci.edu/pub/ietf/http/rfc2617.txt. Additional information and resources about MD5 can be found at http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

Configuration: Protecting content with digest authentication

The steps for configuring your server for digest authentication are very similar to those for basic authentication.

  1. Create the password file
  2. Set the configuration to use this password file
  3. Optionally, create a group file

Creating a password file

As with basic authentication, a simple utility is provided to create and maintain the password file which will be used to determine whether a particular user's name and password are valid. This utility is called htdigest, and will be located in the bin directory of wherever you installed Apache. If you installed Apache from some variety of package manager, htdigest is likely to have been placed somewhere in your path.

To create a new digest password file, type:

htdigest -c /usr/local/apache/passwd/digest realm username

htdigest will ask you for the desired password, and then ask you to type it again to confirm it.

Note that the realm for which the authentication will be required is part of the argument list.

Once again, as with basic authentication, you are encouraged to place the generated file somewhere outside of the document directory.

And, as with the htpasswd utility, the -c flag creates a new file, or, if a file of that name already exists, deletes the contents of that file and generates a new file in its place. Omit the -c flag in order to add new user information to an existing password file.
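
The realm matters because it is mixed into the stored hash. Each line of a digest password file has the form username:realm:hash, where the hash is the MD5 digest of the string username:realm:password. The Python sketch below produces such a line; it is an illustration of the file format rather than a replacement for htdigest, and the function name htdigest_line and the sample password are invented for this example.

```python
import hashlib


def htdigest_line(username, realm, password):
    """Return one 'username:realm:hash' line as stored in a digest password file."""
    secret = f"{username}:{realm}:{password}".encode("utf-8")
    return f"{username}:{realm}:{hashlib.md5(secret).hexdigest()}"


print(htdigest_line("drbacchus", "Private", "mypassword"))
```

The same password under a different realm produces a different hash, so an entry created for one realm cannot be used for another.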

Set the configuration to use this password file

Once you have created a password file, you need to tell Apache about it in order to start using it as a source of authenticated user information. This configuration is done with the following directives:

AuthType              Authentication type being used. In this case, it will be set to Digest
AuthName              The authentication realm or name
AuthDigestFile        The location of the password file
AuthDigestGroupFile   Location of the group file, if any
Require               The requirement(s) which must be satisfied in order to grant admission

These directives may be placed in a .htaccess file in the particular directory being protected, or may go in the main server configuration file, in a <Directory> section, or another scope container.

The following example defines an authentication realm called "Private". The password file located at /usr/local/apache/passwd/digest will be used to verify the user's identity. Only users named drbacchus or dorfl will be granted access, if they provide a password that matches the password stored in the password file.

AuthType Digest
AuthName "Private"
AuthDigestFile /usr/local/apache/passwd/digest
Require user drbacchus dorfl

The phrase "Private" will be displayed in the password pop-up box, where the user will have to type their credentials.

Optionally, create a group file

As you have observed, there are not many differences between this configuration process and that required by basic authentication, described in the previous section. This is true also of group functionality. The group file used for digest authentication is exactly the same as that used for basic authentication. That is to say, lines in the group file consist of the name of the group, a colon, and a list of the members of that group. For example:

admins: jim roy ed anne

Once this file has been created, you can Require that someone be in a particular group in order to get the requested resource. This is done with the AuthDigestGroupFile directive, as shown in the following example.

AuthType Digest
AuthName "Private"
AuthDigestFile /usr/local/apache/passwd/digest
AuthDigestGroupFile /usr/local/apache/passwd/digest.groups
Require group admins

The authentication process is the same as that used by basic authentication. It is first verified that the user is in the required group, and, if this is true, then the password is verified.

Caveats

Before you leap into using digest authentication instead of basic authentication, there are a few things that you should know about.

Most importantly, you need to know that, although digest authentication has the great advantage that your password is not sent across the network in the clear, it is not supported by all major browsers in use today. You should therefore use it only on a web site where you can control the browsers that people will be using, such as your intranet site. In particular, Opera 4.0 or later, Microsoft Internet Explorer 5.0 or later, Mozilla 1.0.1 and Netscape 7 or later, as well as Amaya, support digest authentication, while various other browsers do not.

Next, with regard to security considerations, you should understand two things. Although your password is not passed in the clear, all of your data is, and so this is a rather small measure of security. And, although your password is not really sent at all, but a digest form of it, someone very familiar with the workings of HTTP could use just your digested password to gain access to the content, since that digested password is really all the information required to access the web site.

The moral of this is that if you have content that really needs to be kept secure, use SSL.
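
The point about the digested password can be made concrete. In the simplest form of the scheme described in RFC 2617 (without the optional qop parameter), the client's response is the MD5 digest of HA1:nonce:HA2, where HA1 is the MD5 digest of username:realm:password and HA2 is the MD5 digest of method:uri. The Python sketch below uses hypothetical helper names and a made-up nonce and password; it shows that anyone who holds HA1 can compute a valid response without ever knowing the password.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def ha1_from_password(username, realm, password):
    """The digested password, as stored in the digest password file."""
    return md5_hex(f"{username}:{realm}:{password}")


def digest_response(ha1, method, uri, nonce):
    """Response value for the simplest RFC 2617 form (no qop)."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


nonce = "abc123"  # made-up server nonce for the illustration
ha1 = ha1_from_password("drbacchus", "Private", "mypassword")

# The legitimate user starts from the password; an attacker starts from HA1.
from_password = digest_response(
    ha1_from_password("drbacchus", "Private", "mypassword"), "GET", "/private/", nonce)
from_stolen_ha1 = digest_response(ha1, "GET", "/private/", nonce)
print(from_password == from_stolen_ha1)  # True
```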

Database authentication modules

Basic authentication and digest authentication both suffer from the same major flaw. They use text files to store the authentication information. The problem with this is that looking something up in a text file is very slow. It's rather like trying to find something in a book that has no index. You have to start at the beginning, and work through it one page at a time until you find what you are looking for. Now imagine that the next time you need to find the same thing, you don't remember where it was before, so you have to start at the beginning again, and work through one page at a time until you find it again. And the next time. And the time after that.

Since HTTP is stateless, authentication has to be verified every time that content is requested. And so every time a document is accessed which is secured with basic or digest authentication, Apache has to open up those text password files and look through them one line at a time, until it finds the user that is trying to log in, and verifies their password. In the worst case, if the username supplied is not in there at all, every line in the file will need to be checked. On average, half of the file will need to be read before the user is found. This is very slow.

While this is not a big problem for small sets of users, when you get into larger numbers of users (where "larger" means a few hundred) this becomes prohibitively slow. In many cases, in fact, valid username/password combinations will get rejected because the authentication module just had to spend so much time looking for the username in the file that Apache will just get tired of waiting and return a failed authentication.

In these cases, you need an alternative, and that alternative is to use some variety of database. Databases are optimized for looking for a particular piece of information in a very large data set. They build indexes in order to rapidly locate a particular record, and they have query languages for swiftly locating records that match particular criteria.
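
The difference between scanning a text file and using an index can be sketched in a few lines of Python. This is a toy comparison under simplified assumptions (records held in memory as user:hash strings, and a Python dict standing in for a database index); the function names are invented for the illustration.

```python
def find_linear(lines, username):
    """Scan 'user:hash' lines in order; return (hash, lines examined)."""
    for examined, line in enumerate(lines, start=1):
        user, _, pw_hash = line.partition(":")
        if user == username:
            return pw_hash, examined
    return None, len(lines)


def build_index(lines):
    """Build a username -> hash lookup table once, up front."""
    return dict(line.split(":", 1) for line in lines)


lines = [f"user{i}:hash{i}" for i in range(1000)]
print(find_linear(lines, "user999"))   # ('hash999', 1000)
print(find_linear(lines, "missing"))   # (None, 1000)

index = build_index(lines)
print(index.get("user999"))            # hash999
```

The linear search examines every line for the last user and for a missing user, on every request. The indexed lookup goes straight to the record.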

There are numerous modules available for Apache to authenticate using a variety of different databases. In this section, we'll just look at two modules which ship with Apache.

mod_auth_db and mod_auth_dbm

mod_auth_db and mod_auth_dbm are modules which let you keep your usernames and passwords in DB or DBM files. There are few practical differences between DB files and DBM files. And, on some operating systems, such as various BSDs, and Linux, they are exactly the same thing. You should pick whichever of the two modules makes the most sense on your particular platform of choice. If you do not have DB support on your platform, you may need to install it. You can download an implementation of DB at http://www.sleepycat.com/.

Berkeley DB files

DB files, also known as Berkeley database files, are the simplest form of database, and are rather ideally suited for the sort of data that needs to be stored for HTTP authentication. DB files store key/value pairs. That is, the name of a variable, and the value of that variable. While other databases allow the storage of many fields in a given record, a DB file allows only this pairing of key and value. This is ideal for authentication, which requires only the pair of a username and password.
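
The key/value model can be tried out with Python's standard dbm module. Note the assumptions: the file this sketch writes uses whichever DBM backend your Python installation provides, which is not necessarily the Berkeley DB format that Apache expects, and the function names and the stored hash string are placeholders invented for the example.

```python
import dbm
import os
import tempfile


def add_user(db_path, username, password_hash):
    """Store one username -> password hash pair, creating the file if needed."""
    with dbm.open(db_path, "c") as db:
        db[username.encode("utf-8")] = password_hash.encode("utf-8")


def lookup_user(db_path, username):
    """Return the stored hash for a username, or None if it is absent."""
    with dbm.open(db_path, "r") as db:
        value = db.get(username.encode("utf-8"))
    return value.decode("utf-8") if value is not None else None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "passwords")
    add_user(path, "rbowen", "placeholder-hash")
    print(lookup_user(path, "rbowen"))  # placeholder-hash
    print(lookup_user(path, "sungo"))   # None
```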

Installing mod_auth_db

For the purposes of this tutorial, we'll talk about installing and configuring mod_auth_db. However, everything that is said here can be directly applied to mod_auth_dbm by simply replacing 'db' with 'dbm' and 'DB' with 'DBM' in the various commands, file names, and directives.

Since mod_auth_db is not compiled in by default, you will need to rebuild Apache in order to get the functionality, unless you built in everything when you first installed it. Note that if you installed Apache with shared object support, you may be able to just build the module and load it into Apache.

To build Apache from scratch with mod_auth_db built in, use the following ./configure line in your apache source code directory.

./configure --enable-module=auth_db

Or, if you had a more complex configure command line, you can just add the --enable-module=auth_db option to that command line, and you'll get mod_auth_db built into your server.

Protecting a directory with mod_auth_db

Once you have compiled the mod_auth_db module, and loaded it into your web server, you'll find that there's very little difference between using regular authentication and using mod_auth_db authentication. The procedure is the same as that we went through with basic and digest authentication:

  1. Create the user file.
  2. Configure Apache to use that file for authentication.
  3. Optionally, create a group file.

Create the user file

The user file for authentication is, this time, not a flat text file, but is a DB file. Fortunately, once again, Apache provides us with a simple utility for the purpose of managing this user file. This time, the utility is called dbmmanage, and will be located in the bin subdirectory of wherever you installed Apache.

dbmmanage is somewhat more complicated to use than htpasswd or htdigest, but it is still fairly simple. The syntax which you will usually be using is as follows:

dbmmanage passwords.dat adduser montressor

As with htpasswd, you will at this point be prompted for a password, and then asked to confirm that password by typing it again. The main difference here is that rather than a text file being created, you are creating a binary file containing the information that you have supplied.

Type dbmmanage with no arguments to get the full list of options available with this utility.

Creating your user file with Perl

Note that, if you are so inclined, you can manage your user file with Perl, or any other language which has a DB-file module, for interfacing with this type of database. This covers a number of popular programming languages.

The following Perl code, for example, will add a user 'rbowen', with password 'mypassword', to your password file:

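use strict;
use warnings;
use DB_File;

# Tie a hash to the DB file; changes to the hash are written to the file.
tie my %database, 'DB_File', "passwords.dat"
    or die "Can't initialize database: $!\n";

my $username = 'rbowen';
my $password = 'mypassword';

# Pick a random two-character salt for crypt().
my @chars = (0..9, 'a'..'z');
my $salt  = $chars[int rand @chars] . $chars[int rand @chars];

# Store the crypt()-encrypted password under the username.
$database{$username} = crypt($password, $salt);

untie %database;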

As you can imagine, this makes it very simple to write tools to manage the user and password information stored in these files.

Passwords are stored in Unix crypt format, just as they were in the "regular" password files. The 'salt' that is created in the middle there is part of the process, generating a random starting point for that encryption. The technique being used is called a 'tied hash'. The idea is to tie a built-in data structure to the contents of the file, such that when the data structure is changed, the file is automatically modified at the same time.

Configuring Apache to use this password file

Once you have created the password file, you need to tell Apache about it, and tell Apache to use this file to verify user credentials. This configuration will look almost the same as that for basic authentication. This configuration can go in a .htaccess file in the directory to be protected, or can go in the main server configuration, in a <Directory> section, or other scope container directive.

The configuration will look something like the following:

AuthName "Members Only"
AuthType Basic
AuthDBUserFile /usr/local/apache/passwd/passwords.dat
require user rbowen

Now, users accessing the directory will be required to authenticate against the list of valid users who are in /usr/local/apache/passwd/passwords.dat.


Optionally, create a group file

As mentioned earlier, DB files store a key/value pair. In the case of group files, the key is the name of the user, and the value is a comma-separated list of the groups to which the user belongs.

While this is the opposite of the way that group files are stored elsewhere, note that we will primarily be looking up records based on the username, so it is more efficient to index the file by username, rather than by the group name.
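
The relationship between the two layouts can be sketched in Python. This is only an illustration of the data layout, with plain dictionaries standing in for the files; the function names invert_groups and belongs_to and the sample memberships are invented for the example.

```python
def invert_groups(groups):
    """Turn {group: [members]} into {username: 'group,group,...'}."""
    per_user = {}
    for group, members in groups.items():
        for user in members:
            per_user.setdefault(user, []).append(group)
    return {user: ",".join(sorted(names)) for user, names in per_user.items()}


def belongs_to(records, username, group):
    """Check membership with a single lookup by username."""
    return group in records.get(username, "").split(",")


flat = {"one": ["rbowen"], "two": ["rbowen", "sungo"], "three": ["rbowen"]}
records = invert_groups(flat)
print(records["rbowen"])                     # one,three,two
print(belongs_to(records, "sungo", "two"))   # True
print(belongs_to(records, "sungo", "three")) # False
```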

Groups can be added to your group file using dbmmanage and the add command:

dbmmanage groupfile add rbowen one,two,three

In the above example, groupfile is the literal name of the group file, rbowen is the user being added, and one, two, and three are names of three groups to which this user belongs.

Once you have your groups in the file, you can require a group in the regular way:

AuthName "Members Only"
AuthType Basic
AuthDBUserFile /usr/local/apache/passwd/passwords.dat
AuthDBGroupFile /usr/local/apache/passwd/groups.dat
require group three

Note that if you want to use the same file for both password and group information, you can do so, but this is a little more complicated to manage, as you have to encrypt the password yourself before you feed it to the dbmmanage utility.

Access control

Authentication by username and password is only part of the story. Frequently you want to let people in based on something other than who they are, such as where they are coming from. Restricting access based on something other than the identity of the user is generally referred to as Access Control.

Allow and Deny

The Allow and Deny directives let you allow and deny access based on the host name, or host address, of the machine requesting a document. The directive that goes hand-in-hand with these is the Order directive, which tells Apache in which order to apply the filters.

The usage of these directives is:

allow from address

where address is an IP address (or a partial IP address) or a fully qualified domain name (or a partial domain name); you may provide multiple addresses or domain names, if desired.

For example, if you have someone spamming your message board, and you want to keep them out, you could do the following:

deny from 11.22.33.44

Visitors coming from that address will not be able to see the content behind this directive. If, instead, you have a machine name, rather than an IP address, you can use that.

deny from hostname.example.com

And, if you'd like to block access from an entire domain, or even from an entire tld (top level domain, such as .com or .gov) you can specify just part of an address or domain name:

deny from 192.101.205
deny from exampleone.com exampletwo.com
deny from tld

Using Order will let you be sure that you are actually restricting things to the group that you want to let in, by combining a deny and an allow directive:

Order Deny,Allow
Deny from all
Allow from hostname.example.com

Listing just the allow directive would not do what you want, because it would let users from that host in without keeping anyone else out. What you want is to let in only users from that host.
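
The way Order combines the two lists can be modeled in a few lines of Python. This is a simplified sketch, not Apache's implementation: it handles only IPv4 addresses and host names, assumes the client's host name is already known, and treats any argument starting with a digit as an IP address. The function names are invented for the example. With Order Deny,Allow the default is to allow and a matching Allow overrides a matching Deny; with Order Allow,Deny the default is to deny and a matching Deny overrides a matching Allow.

```python
def matches(spec, ip, hostname):
    """Return True if the client matches one Allow/Deny argument."""
    if spec.lower() == "all":
        return True
    if spec[0].isdigit():
        # Full or partial IP address: compare whole dotted components.
        parts = spec.rstrip(".").split(".")
        return ip.split(".")[:len(parts)] == parts
    # Full or partial domain name: the host itself or anything beneath it.
    domain = spec.lower().lstrip(".")
    host = hostname.lower()
    return host == domain or host.endswith("." + domain)


def host_allowed(order, allow, deny, ip, hostname):
    """Apply the Allow and Deny lists according to the Order setting."""
    allowed = any(matches(s, ip, hostname) for s in allow)
    denied = any(matches(s, ip, hostname) for s in deny)
    mode = order.lower().replace(" ", "")
    if mode == "deny,allow":
        return allowed or not denied   # default allow, Allow wins
    if mode == "allow,deny":
        return allowed and not denied  # default deny, Deny wins
    raise ValueError("unsupported Order value: " + order)


allow, deny = ["hostname.example.com"], ["all"]
print(host_allowed("Deny,Allow", allow, deny, "10.0.0.5", "hostname.example.com"))  # True
print(host_allowed("Deny,Allow", allow, deny, "11.22.33.44", "other.example.net"))  # False
```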

Satisfy

The Satisfy directive can be used to specify that several criteria may be considered when trying to decide if a particular user will be granted admission. Satisfy can take as an argument one of two options - all or any. By default, it is assumed that the value is all. This means that if several criteria are specified, then all of them must be met in order for someone to get in. However, if set to any, then several criteria may be specified, but if the user satisfies any of these, then they will be granted entrance.

A very good example of this is using access control to ensure that, although a resource is password protected from outside your network, all hosts inside the network will be given free access to the resource. This would be accomplished by using the Satisfy directive, as shown below.


AuthType Basic
AuthName intranet
AuthUserFile /www/passwd/users
AuthGroupFile /www/passwd/groups
Require group customers
Order allow,deny
Allow from internal.com
Satisfy any

In this scenario, users will be let in if they either have a password, or if they are in the internal network.
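
The decision that Satisfy makes can be summarized as a small Python function. This is a sketch of the logic described above, not Apache's code; the function name access_granted and its boolean parameters are invented for the illustration, with host_ok standing for the result of the Allow and Deny checks and user_ok for the result of the Require check.

```python
def access_granted(satisfy, host_ok, user_ok):
    """Combine the host-based check and the user-based check."""
    mode = satisfy.lower()
    if mode == "all":
        return host_ok and user_ok  # both restrictions must pass
    if mode == "any":
        return host_ok or user_ok   # either restriction is enough
    raise ValueError("Satisfy must be 'all' or 'any'")


# An internal host with no password, under each setting:
print(access_granted("any", host_ok=True, user_ok=False))  # True
print(access_granted("all", host_ok=True, user_ok=False))  # False
```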

Summary

The various authentication modules provide a number of ways to restrict access to your host based on the identity of the user. They offer a somewhat standard interface to this functionality, but provide different back-end mechanisms for actually authenticating the user.

And the access control mechanism allows you to restrict access based on criteria unrelated to the identity of the user.