Tuesday, April 17, 2007

Save an entire website into a single file

Most of us have saved web pages while browsing and then struggled to manage them, since saving a single page leaves many files on the disk and often results in broken links. I have often seen my friends trying to save a website by clicking each link on a page and saving the pages individually. Life becomes easier if you follow a few painless steps. These are the steps I follow when I need to save a website.

  1. Use a web copier (offline browser) to save an entire website to the disk. This lets you browse the saved pages as if you were browsing online.
  2. There is a lot of software available for this purpose, but in my experience WinHTTrack is the best. It is freeware and works well. (It is available from the WinHTTrack download page.)
  3. Run the program after installing it.
  4. Keep the default settings for the proxy and the other options.
  5. Give the project an appropriate name and click Next to continue.
  6. Enter the address of the website(s) you want to download in the URL box.
  7. Click Next, keep the default settings and click Finish to start copying the website.
  8. When HTTrack finishes copying the website, you will find a file called index.html in the project folder you chose. Double-click the file to open the saved website.
  9. Now you have the website copied to your computer and can browse it.
  10. Even this can cause problems when you move the copy or copy it to another drive. It also takes up a lot of disk space (more than the total size of all the files), because each page consists of many small files. A disk has a cluster size, which is the minimum space that can be allocated to a file. For example, if a file is 1KB and the cluster size is 4KB, that file takes up at least 4KB on the disk and the remaining 3KB stays unused. If we pack all the files into a single file, such as a zip archive, we can save a lot of disk space.
  11. As far as I know, there are two main ways to do this. One is to compile the site into a .CHM (Windows compiled HTML Help) file. The other is to use a custom packager such as WebSiteZip Packer, a free e-book compiler (which also gives you the option to password-protect the output file).
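
The cluster arithmetic in step 10 can be checked with a short script. What follows is only a minimal Python sketch and not part of the tools described in this post: the 4KB cluster size and the function names are assumptions chosen for illustration, and the result is an estimate, since real file systems may store very small files differently. It adds up the space a folder of small files occupies once every file is rounded up to whole clusters, and packs the folder into a single zip archive using the standard library.

```python
import os
import zipfile

CLUSTER_SIZE = 4096  # assumed cluster size in bytes; the real value depends on the disk


def allocated_size(file_size, cluster_size=CLUSTER_SIZE):
    """Space a file occupies when its size is rounded up to whole clusters."""
    clusters = (file_size + cluster_size - 1) // cluster_size
    return clusters * cluster_size


def folder_usage(folder, cluster_size=CLUSTER_SIZE):
    """Return (total bytes of file content, estimated bytes allocated on disk)."""
    actual = 0
    allocated = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            actual += size
            allocated += allocated_size(size, cluster_size)
    return actual, allocated


def pack_folder(folder, archive_path):
    """Pack every file under folder into one zip archive, keeping relative paths.

    archive_path should point outside folder, otherwise the archive
    would be picked up while the folder is being walked.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, folder))
```

Calling `folder_usage` on the project folder and comparing the two numbers shows how much space is lost to partly filled clusters. After `pack_folder`, the whole site is one file, so at most one cluster is partly filled.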
Using WebSiteZip Packer

  1. Download and install WebSiteZip Packer.
  2. When you run it you will see something like this.
  3. Click the Wizard button and then click Next.
  4. Now click the Homepage button and browse to the index.html file in the project folder of the website copied using HTTrack.
  5. Click Next and select the EXE format to get a standalone e-book.
  6. Give the output file name and click next.
  7. Now you can keep the default settings or change them as you like until you reach the page below.
  8. On this page, check the option if you want to build the output from files that were taken from different sites.
  9. Now click Next, make changes if you want, and click Finish to start compiling the e-book from the website.
Using CHM makers (CHM compilers)

  1. If you search the net for CHM compilers or HTML-to-CHM converters, you will find plenty of software. Most of it, however, is shareware or trial software with limitations and an expiry period.
  2. Most of these programs have an interface similar to that of WebSiteZip Packer and can be used in a similar way.
  3. One CHM builder that I found useful is Easy CHM. It comes as a 30-day trial.
  4. Screen shot of Easy Chm.
  5. Click the New button to start a new project.
  6. Then select the folder containing the downloaded files as the project folder. Make sure that the "Include sub directories" box is selected.
  7. If the site contains files with extensions other than the listed ones, choose *.* from the dropdown menu and click OK.
  8. Now click the preview tab to see the preview of the web pages.
  9. Remove unnecessary folders, such as the cache created by HTTrack, by selecting them and clicking the 'x' button. If necessary, other files can be added by clicking the '+' button.
  10. When everything is ready, click the Compile button.
  11. Make changes if necessary and click the create button to create the CHM output file.
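
The cache folder mentioned in step 9 can also be left out before the project folder is handed to a compiler. The following is a minimal Python sketch under stated assumptions, not a feature of Easy CHM or HTTrack: it assumes the cache lives in a folder named `hts-cache` (the name HTTrack normally uses, so check your own project folder), and the function name `copy_without_cache` is hypothetical. It makes a clean copy of the mirrored site and skips the cache along the way.

```python
import shutil


def copy_without_cache(project_folder, clean_folder, skip=("hts-cache",)):
    """Copy the mirrored site to clean_folder, leaving out the folders named in skip.

    clean_folder must not exist yet; copytree creates it.
    The default name "hts-cache" is an assumption about the HTTrack layout.
    """
    shutil.copytree(
        project_folder,
        clean_folder,
        ignore=shutil.ignore_patterns(*skip),
    )
```

The clean copy can then be selected as the project folder in step 6, so nothing needs to be removed by hand in the preview tab.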