Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.

HPR3367: Making books with linux - part 1

Hosted by Andrew Conway on 2021-06-29 00:00:00

Andrew and Dave describe a common itch they have been scratching. Andrew talks through his approach to document creation in this episode and Dave will describe his in the next episode.

Andrew was inspired by a simple and elegant approach to eBook creation by Jon Kulp, possibly from listening to HPR 1909 several years ago.

In Andrew's approach, bash and Python scripts assemble various text files into the book, inserting figures and tables using a simple home-brew tag system to generate reference numbers such as Figure 3.7 or Table 2.2. Such auto-numbering is of course provided by many other document authoring systems, such as LaTeX, but the scripts also use the tags to hunt down data in CSV files and convert it into the figures. In this way, nearly all information in the book can start off as text and then be processed into anything that can be included with HTML: prose, graphics, sound or even movies. A CSS file keeps a clean separation between content and appearance.
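
The auto-numbering idea can be illustrated with a minimal Python sketch. The tag syntax ({{fig:label}} and {{tab:label}}), the function name number_references and the one-string-per-chapter input are all assumptions made for illustration; the show notes do not describe Andrew's actual tags or code.

```python
import re

# Hypothetical tag syntax, e.g. {{fig:gdp}} or {{tab:tax}}.
# The real home-brew tags are not described in the show notes.
TAG = re.compile(r"\{\{(fig|tab):([A-Za-z0-9_-]+)\}\}")
NAMES = {"fig": "Figure", "tab": "Table"}


def number_references(chapters):
    """Replace tags with 'Figure c.n' / 'Table c.n' labels.

    Numbers are assigned per chapter in order of first appearance,
    and a repeated tag reuses the number it was first given.
    """
    labels = {}  # (kind, label) -> e.g. "Figure 3.7"

    # First pass: assign a number to every distinct tag.
    for chapter_no, text in enumerate(chapters, start=1):
        counters = {"fig": 0, "tab": 0}
        for kind, label in TAG.findall(text):
            if (kind, label) not in labels:
                counters[kind] += 1
                labels[(kind, label)] = f"{NAMES[kind]} {chapter_no}.{counters[kind]}"

    # Second pass: substitute the labels, so forward references also work.
    return [
        TAG.sub(lambda m: labels[(m.group(1), m.group(2))], text)
        for text in chapters
    ]


if __name__ == "__main__":
    book = [
        "See {{fig:a}} and {{tab:t}}.",
        "{{fig:b}} then {{fig:c}}, as in {{fig:a}}.",
    ]
    for chapter in number_references(book):
        print(chapter)
```

Doing the work in two passes means a tag can be cited before the chapter in which its figure is first numbered.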

This is not WYSIWYG (what you see is what you get), but the entr command can monitor the source files for changes and trigger regeneration of the HTML and even a browser refresh (using a feature found in Midori and Falkon but not many other browsers).

Dave describes how he achieves something similar to what Andrew has created by using make to co-ordinate the processing. The process of compiling the source text files into a final document does have some similarities with code compilation.
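
The similarity with code compilation comes down to rebuilding an output only when one of its inputs is newer. A rough Python sketch of that timestamp check follows. It is not Dave's Makefile, and the function name needs_rebuild and the file names in the usage note are hypothetical.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources.

    This mirrors the basic rule make applies to each target: compare
    modification times and rebuild only when an input has changed.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A build script might call needs_rebuild("book.html", ["chapter1.txt", "chapter2.txt"]) and regenerate the HTML only when it returns True.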

Dave and Andrew discuss how useful their methods might be to others. Some of Andrew's scripts are too specific to his own needs for wider use, but the figure processing code is available online as part of the content and code of his book How Scotland Works.

Andrew describes his horror at the suggestion that a non-fiction book does not need an index, which prompted him to write some simple code to generate an index from a PDF. This was also motivated by laziness and a reluctance to read his own writing for the umpteenth time. Andrew then describes how this code works. The code itself can be found here.
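
As an illustration only, and not Andrew's code, here is one way such an index could be built in Python. It assumes the PDF text has already been extracted into one string per page (for example, pdftotext separates pages with a form feed character) and that the author supplies the list of terms to index. The names build_index and format_index are hypothetical.

```python
import re


def build_index(pages, terms):
    """Map each term to the sorted 1-based page numbers it appears on.

    pages is a list of strings, one per page of extracted text.
    Matching is case-insensitive and on whole words only.
    """
    index = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = [n for n, text in enumerate(pages, start=1) if pattern.search(text)]
        if hits:  # terms that never occur are left out of the index
            index[term] = hits
    return index


def format_index(index):
    """Render the index as alphabetised 'term: 1, 3' lines."""
    entries = sorted(index.items(), key=lambda kv: kv[0].lower())
    return "\n".join(
        f"{term}: {', '.join(str(p) for p in page_numbers)}"
        for term, page_numbers in entries
    )


if __name__ == "__main__":
    # Stand-in for extracted PDF text, with pages split on form feeds.
    extracted = "Oil and gas\fTaxes\fOil revenue and taxes"
    pages = extracted.split("\f")
    print(format_index(build_index(pages, ["taxes", "oil", "whisky"])))
```

Because the page numbers come from the finished PDF, the index has to be regenerated whenever the layout changes.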

Dave brings up the issue of other formats, such as EPUB, which have no concept of pages, or at least do not insist on one natively.

The discussion moves on to other tools for document and text processing that are relevant to the tasks involved, such as pandoc, LaTeX and AsciiDoc. In particular, Dave mentions that the "look" of LaTeX is simpler to control these days, at least as compared to the 1990s!

Copyright Information

Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

The HPR Website Design is released to the Public Domain.