HTML files are not ideal for archiving as they’re often dependent on other external files like images, CSS and fonts. A better way to store webpages is to use a portable format like PDF.
PDF files are self-contained since they store all the files needed to render the document within the file itself. This includes images, external fonts and annotations (comments) like notes, highlights, audio files etc.
Converting HTML pages to PDF is also easy, since modern browsers like Chrome and Firefox include a Save to PDF printer. Alternatively, you can use the default Microsoft Print to PDF printer that ships with Windows 10 or one of the many free and commercial PDF printers that exist. Online PDF converters are also a convenient option in case you don’t want to install additional software.
While all these options work reasonably well, none of them are ideal for bulk converting multiple webpages or local HTML files. I say “reasonably well” because most PDF printers are only as good as the source material: the more complex the HTML, the more likely it is to produce a poor-quality PDF with formatting issues.
Specialized HTML to PDF conversion tools like PDF Friendly and wkhtmltopdf, however, tend to produce superior PDFs. The latter is not only open source and cross-platform but, best of all, can be used to carry out batch conversions. Note, however, that it has no GUI and is run from the command line. Nevertheless, it’s quite easy to use, as you only need to give it the input and output files.
I’ve personally used wkhtmltopdf to batch convert numerous HTML files that I had downloaded using HTTrack and the output PDFs were great. The program even automatically generates PDF bookmarks from the heading tags (h2, h3, h4 etc.) which is incredibly useful when converting long articles. Did I mention it’s incredibly fast?
Batch Convert HTML to PDF using wkhtmltopdf
- Download and install wkhtmltopdf from the official page. There’s also a portable version provided in a 7z archive that you can use if you prefer not to use the installer version. For this guide I’ll be using Windows, but the program is also available on macOS and various Linux distributions.
- Put all your HTML files (and their linked images / folders, if any) in an easily accessible folder (e.g. C:\HTML). Avoid long paths with spaces to lessen the likelihood of errors.
- The command for converting a single HTML file to PDF is as follows:
wkhtmltopdf.exe input.html output.pdf
We’ll use the same logic in a batch (bat) file to convert multiple HTML files. Open Notepad, then copy and paste the following script:
@echo off
for /R %%i in (*.html) do "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" "%%i" "C:\HTML\output\%%~ni.pdf"
  - Replace C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe with the path where wkhtmltopdf.exe is located if you installed it in a different folder or are using the portable version.
  - C:\HTML\output\ is the path where the PDF files will be saved. You can change it if you want to use a different output folder.
  - Replace *.html with the correct extension of your HTML files. For instance, some programs save HTML files as .htm, in which case you’ll need to use *.htm instead. If you have a mix of both, you can use *.htm*, which will catch both extensions.
- The latest version (0.12.6) at the time of this writing gives the error ‘Warning: Blocked access to file’ when it encounters local files in the HTML, such as images. Consequently, the output PDF does not contain images. To avoid this error, you can either use the previous version (0.12.5), which is provided in the Archive section of the download page, or add the option --enable-local-file-access to the command as follows:
@echo off
for /R %%i in (*.html) do "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --enable-local-file-access "%%i" "C:\HTML\output\%%~ni.pdf"
- Save the file as a bat file (e.g. pdf.bat) in the root of the folder where the HTML files are located (i.e. C:\HTML).
- Double-click the bat file to run the command. A command prompt window will open and convert all the HTML files inside the folder (and its subfolders) to PDF, one after another.
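If you’d rather not use a bat file, or you’re on macOS or Linux, the same loop can be sketched in Python. This is only a rough equivalent of the batch script above, not something from the wkhtmltopdf project: the function names (find_html_files, build_command, convert_all) are my own, and the WKHTMLTOPDF path simply assumes the default Windows install location used in this guide.

```python
import subprocess
from pathlib import Path

# Assumed install location (the same one used in the bat file); adjust as needed.
WKHTMLTOPDF = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"


def find_html_files(root):
    """Return every .htm/.html file under root, including subfolders."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in {".htm", ".html"}
    )


def build_command(html_file, output_dir, exe=WKHTMLTOPDF, local_access=True):
    """Build the wkhtmltopdf argument list for a single HTML file."""
    # Same naming rule as %%~ni.pdf: file name without its extension, plus .pdf.
    pdf_file = Path(output_dir) / (Path(html_file).stem + ".pdf")
    cmd = [exe]
    if local_access:
        # Needed with 0.12.6 so local images are not blocked.
        cmd.append("--enable-local-file-access")
    cmd += [str(html_file), str(pdf_file)]
    return cmd


def convert_all(root, output_dir, exe=WKHTMLTOPDF):
    """Convert each HTML file under root to PDF, one after another."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for html_file in find_html_files(root):
        # check=False: carry on with the next file if one conversion fails.
        subprocess.run(build_command(html_file, output_dir, exe), check=False)
```

Calling convert_all(r"C:\HTML", r"C:\HTML\output") would mirror what the bat file does. As with %%~ni in the batch script, files that share a name in different subfolders map to the same PDF name, so keep an eye out for clashes.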
Note that wkhtmltopdf has many options that you can use to customize the output PDF. You can explore these options in its documentation by running wkhtmltopdf.exe -H. Cheers!