So the other day I was referring to an offline document that was actually just a compiled collection of many HTML pages. Opening each page from the index every time proved to be quite tedious, not to mention time-consuming.
For that reason I set out to compile all these pages into one HTML file so that I could easily search through one page instead of the then 22 pages.
At first I thought I could get a free tool online that would do just that, and luckily enough I found HTML Merge on SourceForge. However, my luck did not last long, because the software was not up to the task: it rejected these particular HTML pages for being in an unsupported encoding, without mentioning which encodings it did support.
So I tried saving a few of them in what I thought was the standard encoding (UTF-8), but that threw the exact same error.
Left with no option, I decided to go the manual route: open each HTML file individually, then use the godsend that is copy and paste. I had gotten through a few pages when I recalled that Notepad++ had helped me not long ago to combine multiple plain text files. So what about HTML? Turned out it could handle those too.
Merging HTML Files Using the Combine Plugin
- When I say combine or merge, I mean just that: appending one file after the other with no kind of HTML tag editing whatsoever.
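- A less ambiguous, though less popular, term for this is, I believe, concatenating. Since no tags are edited, the merged file still carries each page’s own <html>, <head> and <body> tags, so it is not valid HTML as a whole. As such I wouldn’t advise using this method for content you plan to publish on a website. However, for offline HTML documents from the same source (like a book) I don’t see the harm.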
1. Open Notepad++ first. You can get the portable or installable version from the Notepad++ website.
2. Now we need to install a plugin called Combine (previously called NPP Combine) for this to work. You can do that in either of two ways:
a. While connected to the Internet, go to Plugins in the menu and under Plugin Manager select Show Plugin Manager. The plugin manager will automatically fetch all available plugins and list them there. Look for and select Combine then hit the Install button.
b. Get the plugin manually from the developer’s page and install it. To install, just copy the downloaded file (combine.dll) into the plugins sub-folder located inside the Notepad++ installation folder. Restart the program to load the plugin.
3. Open all the HTML files you need to merge in Notepad++. To do this the easy way, just select all of them in your file manager, then drag and drop them into the Notepad++ window.
4. Now go to the Plugins menu and select Start under the Combine plugin.
That will launch the plugin’s window with a few settings. Since these are HTML files, I don’t think it’s wise to add anything between them, so just hit the OK button.
5. Doing that will combine all the opened files into one large file in the order they’ve been opened (i.e. from the first to the last tab). To finish, save this new file and you’re done.
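If you would rather script it, the same kind of concatenation can be sketched in a few lines of Python. This is not what the Combine plugin runs internally, just a minimal illustration of appending files in a chosen order with no tag editing. The function names are my own, and the sketch assumes the pages can be read as UTF-8 (bytes that cannot be decoded are replaced rather than stopping the merge).

```python
from pathlib import Path


def combine_texts(texts, separator="\n"):
    """Join page contents end to end, in the order given, with no tag editing."""
    return separator.join(texts)


def combine_files(paths, output_path, encoding="utf-8"):
    """Read each file in the order listed and write them out as one file.

    Assumption: the pages decode as `encoding`; undecodable bytes are
    replaced instead of raising an error.
    """
    texts = [Path(p).read_text(encoding=encoding, errors="replace") for p in paths]
    merged = combine_texts(texts)
    Path(output_path).write_text(merged, encoding=encoding)
    return merged
```

For example, calling combine_files(["page01.html", "page02.html"], "merged.html") (hypothetical file names) would write both pages, one after the other, into merged.html. The order of the list plays the role of the tab order in Notepad++.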
You can now go ahead and open the merged HTML page with your browser to see the output. If you need to remove any repeated element from the pages (like images or navigation linking to other pages that are now non-existent), just open the merged file using Notepad++ and use the Replace function (Ctrl+H) to remove the elements in one go.
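The same clean-up can also be sketched in Python, assuming the repeated element appears as exactly the same text on every page, which is also what a plain Replace All relies on. The function name and the sample markup in the note after the code are made up for illustration.

```python
def remove_repeated(html, snippet):
    """Remove every exact occurrence of `snippet` from `html`.

    Returns the cleaned text and the number of occurrences removed,
    much like Replace All with an empty replacement.
    """
    count = html.count(snippet)
    return html.replace(snippet, ""), count
```

For instance, passing the merged text and a navigation block such as '<div class="nav">Next</div>' returns the text without that block, along with how many copies were removed. If the block differs slightly from page to page, an exact match like this will miss the variants.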
After that, if it’s a book like in my case, I presume you’d like to convert the merged HTML into a more portable format like PDF, or into Word if you wish to edit the content.
Exporting the HTML to Other Formats
1. HTML to PDF
For PDF, I would recommend opening the HTML file in the Chrome browser and using its superb export to PDF feature, which also offers some neat customizations.
If you’re on Windows 10, you can also use the built-in Microsoft Print to PDF printer to export to PDF from any browser. There are also plenty of free programs and online services that can help you with that.
2. HTML to Word
For Word, the good news is that pretty much any MS Word version handles HTML files by default. MS Word renders the HTML rather than displaying the raw markup.
So just open the file in MS Word, then save the document in an editable Word format (*.docx, *.doc).