5.0 Post Processing
5.1 Introduction
In a previous section, we applied proofreading checks to our plain-text source file. Now we will post-process it into HTML and EPUB3 formats for posting to bookcove.net or other sites.
Our text file will become the HTML "source of truth" for a short story, "Old Slowpoke," from Western Story Magazine, July 19, 1930.
5.2 Overview
There are many ways to get from our text file to a source file that can generate other formats. Source formats I have used include XML, TEI, nroff, Markdown, and even "no markup," where code attempts to mimic human recognition of poetry, block quotes, and so on. Some of the more successful projects combined more than one of these source formats.
For most projects on BookCove, I use HTML5 as the source markup: HTML5 with a few concessions to XHTML, which are mostly trivial to implement. From that single HTML "source of truth," we generate the HTML, EPUB3, and plain-text versions.
You can see the source files for "Old Slowpoke" on GitHub: https://github.com/bookcovebooks/old-slowpoke. Its HTML and EPUB are an example of a project where the whole story fits in one chunk, with no chapter divisions.
A more advanced example is "The Old Lady Flies." That story is organized by chapters, which requires a more complex EPUB. The approach is the same, though: get the HTML right and everything else flows from it. The repository for "The Old Lady Flies" is on GitHub: https://github.com/bookcovebooks/the-old-lady-flies.
5.3 Walkthrough
We need to generate an HTML file from the plain text. The text is almost entirely paragraphs, which can be converted with a regex that replaces (\S)\n\n(\S) with \1</p>\n\n<p>\2, plus manual adjustments at the start and end (an opening <p> before the first paragraph and a closing </p> after the last). Then edit everything else in the body into HTML5, add the header and footer, and add the images folder. Let's look at each piece next.
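The paragraph conversion described above can be sketched in Python. This is a minimal illustration, assuming the story text is already loaded into a string; the function name text_to_paragraphs is hypothetical, and the only logic taken from the text is the regex itself plus wrapping the first and last paragraphs by hand.

```python
import re


def text_to_paragraphs(text: str) -> str:
    r"""Wrap blank-line-separated paragraphs in <p>...</p> tags.

    Applies (\S)\n\n(\S) -> \1</p>\n\n<p>\2, then supplies the opening
    <p> and closing </p> that the regex cannot add at the very ends.
    """
    # Assumption: normalize Windows line endings so "\n\n" can match.
    text = text.replace("\r\n", "\n").strip()
    # Close the paragraph before each blank line and open the next one.
    body = re.sub(r"(\S)\n\n(\S)", r"\1</p>\n\n<p>\2", text)
    # The "manual adjustments": open the first paragraph, close the last.
    return "<p>" + body + "</p>"
```

Because each match consumes the first character of the following paragraph, a one-character paragraph would not be split; the lookaround form re.sub(r"(?<=\S)\n\n(?=\S)", "</p>\n\n<p>", text) avoids that by consuming nothing. Runs of two or more blank lines are not matched either, which is one more reason the result still needs a manual look.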
5.3.1 The header block
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8"/>
<title>Old Slowpoke</title>
<meta name="author" content="Howard E. Morgan"/>
<meta name="author-sort" content="Morgan, Howard E."/>
<meta name="publisher" content="Street &amp; Smith Publications"/>
<meta name="source" content="Western Story Magazine, July 19, 1930"/>
<meta name="date" content="1930-07-19"/>
<meta name="identifier" content="urn:uuid:10471f15-0855-4499-9932-48c744d16c00"/>
<style>
  body { line-height: 1.1; margin: 0 auto; padding: 40px 8%; }
  p {
    text-indent: 1.15em;
    margin-top: 0.1em;
    margin-bottom: 0.1em;
    text-align: justify;
  }
  .tac { text-align: center; }
  hr.tb { border: none; height: 1em; }
  h1 {
    margin-bottom: 0.4em;
    font-weight: normal;
    text-align: center;
    font-size: 1.25em;
    margin-top: 1em;
  }
  .ni { text-indent: 0; }
  .byline { text-align: center; margin-bottom: 1em; }
  .frontispiece { text-align: center; margin: 2em 0; }
  .frontispiece img { max-width: 90%; height: auto; }
</style>
</head>
<body>
Note: you get the UUID by running the EPUB3 creation program (html_to_epub, run by "make epub"), which gives you the string to put in the header. Once you have edited it into the header, rerun the conversion script. From then on, every run uses that same UUID.
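Before rerunning the conversion, a quick check can confirm that the identifier line in the header holds a well-formed UUID. This is a minimal sketch using Python's standard uuid module; the regex assumes the meta tag is written exactly as in the header block above (name before content, double quotes), and header_uuid is a hypothetical helper, not part of the author's tooling.

```python
import re
import uuid
from typing import Optional

# Matches the identifier <meta> line as it appears in the header block.
IDENT_RE = re.compile(
    r'<meta\s+name="identifier"\s+content="urn:uuid:([0-9a-fA-F-]+)"\s*/?>'
)


def header_uuid(html: str) -> Optional[uuid.UUID]:
    """Return the UUID from the identifier <meta> tag, or None if absent or malformed."""
    match = IDENT_RE.search(html)
    if match is None:
        return None  # no identifier line yet
    try:
        return uuid.UUID(match.group(1))
    except ValueError:
        return None  # present, but not a well-formed UUID
```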
5.3.2 The body
In the <body>, there are a few guidelines to follow:
- Close all self-closing (void) tags explicitly. Example: use <br /> and not <br>.
- Prefer actual UTF-8 characters where possible. Example: use — instead of &mdash; or &#8212;.
- Use numeric, not named, entities for characters that you don't want to have in the source as the actual character. A good example is using &#160; for a non-breaking space.
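The first and third guidelines are easy to check mechanically. Below is a minimal Python sketch that flags void elements written without the closing slash and any named entity. Beyond the text, it assumes the five entities XML predefines (&amp;, &lt;, &gt;, &quot;, &apos;) are acceptable, since markup characters still need escaping. The regexes are line-based heuristics rather than an HTML parser, and check_body is a hypothetical name.

```python
import re

# HTML void elements; per the guideline these are written <br />, <img ... />, etc.
VOID_ELEMENTS = ("area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr")
# A void tag whose closing ">" is not preceded by "/".
UNCLOSED_VOID = re.compile(r"<(?:%s)\b[^>]*(?<!/)>" % "|".join(VOID_ELEMENTS),
                           re.IGNORECASE)
# Named entities such as &mdash; or &nbsp; (numeric ones like &#160; don't match).
NAMED_ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")
# Assumption: XML's five predefined entities are allowed for markup characters.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def check_body(html: str) -> list[str]:
    """Return one message per guideline violation, with line numbers."""
    problems = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for m in UNCLOSED_VOID.finditer(line):
            problems.append(f"line {lineno}: void element not self-closed: {m.group(0)}")
        for m in NAMED_ENTITY.finditer(line):
            if m.group(0)[1:-1] not in XML_PREDEFINED:
                problems.append(f"line {lineno}: named entity: {m.group(0)}")
    return problems
```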
5.3.3 The footer
The footer is the usual </body> followed by </html>.
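Because the source is HTML5 with concessions to XHTML, the assembled file should also parse as well-formed XML. One quick way to test that, sketched here with Python's standard xml.etree.ElementTree, is simply to attempt a parse: an unclosed <br>, a bare &, or a named entity such as &nbsp; (XML knows only five named entities) each makes it fail. This is an illustration only; it is not the validate_xhtml tool used in the Makefile, and is_well_formed is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def is_well_formed(html: str) -> bool:
    """True if the document parses as XML (tags closed, & escaped, no unknown entities)."""
    try:
        ET.fromstring(html)
    except (ET.ParseError, expat.ExpatError):
        return False
    return True
```

For a whole file, read it as UTF-8 text and pass the string in; a False result means the EPUB side of the pipeline is likely to complain as well.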
5.4 The Makefile
Once the HTML file is created, a series of Makefile targets runs the checks and generates the outputs. I'll simply show the Makefile; if there is interest, I can write a more detailed walkthrough.
FILENAME := $(shell ls -1 *.html)
NAME := $(basename $(FILENAME))
# for GitHub
GHNAME := $(NAME)-GH
RESOURCES := /tank/home/rfrank/books/Resources
BK := bookcove
GITHUB_ORG := bookcovebooks
REPO_FULL := $(GITHUB_ORG)/$(NAME)
# for Levenshtein checks
THRESHOLD ?= 2
ifeq (,$(FILENAME))
$(error No *.html file found - aborting)
endif

info:
	@echo "Makefile for $(NAME)"
	@echo "Prepared by volunteers at BookCove (bookcove.net)"

# "make init" will start a repository for this book
# in the current directory for use during development
init:
	@rm -rf .git
	@rm -f .gitignore
	@touch $(NAME).html
	@touch README.txt
	@echo "*" > .gitignore
	@echo '!'$(NAME).html >> .gitignore
	@echo '!Makefile' >> .gitignore
	@echo '!README.txt' >> .gitignore
	@git init
	@git add .
	@git commit -am 'Initial commit'

# "make snap" will commit any changes to the local repository
snap:
	@git add .
	@git commit -am "snapshot" || true

tidy:
	@tidy -e -utf8 -q --doctype html5 $(NAME).html

vhtml5:
	html5validator $(NAME).html

vxhtml:
	validate_xhtml $(NAME).html

epub:
	html_to_epub $(NAME).html $(NAME).epub

epubcheck:
	epubcheck $(NAME).epub

sr:
	cp $(NAME).txt $(NAME)-sr.txt
	@dos2unix -q $(NAME)-sr.txt
	@unix2dos -q $(NAME)-sr.txt
	@check_bare_cr_lf $(NAME)-sr.txt
	@file $(NAME)-sr.txt
	@rm -f $(NAME)-sr.zip
	@(zip -rq $(NAME)-sr.zip $(NAME)-sr.txt)
	@rm $(NAME)-sr.txt

# text file checks
# pplev output goes to the terminal; only capitalized words are checked.
# To change the threshold: "make pplev THRESHOLD=1"
pplev:
	@echo "running Levenshtein distance checks for edit distance=$(THRESHOLD)"
	@pplev --threshold $(THRESHOLD) --infile $(NAME).txt

ppspell:
	@ppspell $(NAME).txt /tmp/ppspell-report.txt 30
	@cat /tmp/ppspell-report.txt

pptext:
	pptext $(NAME).txt -o /tmp/pptext-report.txt
	cat /tmp/pptext-report.txt

loupe:
	cp $(NAME).txt /tmp/$(NAME).txt
	unix2dos /tmp/$(NAME).txt
	perl -pi -e 's/“/"/g' /tmp/$(NAME).txt
	perl -pi -e 's/”/"/g' /tmp/$(NAME).txt
	perl -pi -e "s/‘/'/g" /tmp/$(NAME).txt
	perl -pi -e "s/’/'/g" /tmp/$(NAME).txt
	bookloupe /tmp/$(NAME).txt

# OPENAI_API_KEY must be set in the environment
aiproof:
	@rm -f report.txt
	proofread_api_chunks $(NAME).txt --model gpt-4.1 --temperature 0 --min-words 200 --max-words 300 --format text > report.txt
	@echo "results in report.txt"

# packaging for GitHub ------------------------------------------------------
ghinit:
	@if [ -d $(GHNAME) ]; then \
		echo "Error: $(GHNAME) already exists. Stopping."; \
		exit 1; \
	fi
	@mkdir -p $(GHNAME)
	@cp $(RESOURCES)/METADATA-json.txt $(GHNAME)/metadata.json
	@cp $(RESOURCES)/RIGHTS-md.txt $(GHNAME)/RIGHTS.md
	@cp $(NAME).html $(GHNAME)
	@cp $(NAME).txt $(GHNAME)
	@cp -r images $(GHNAME)
	(cd $(GHNAME) && for file in *.md *.txt *.json *.html; do dos2unix $$file; done)
	find $(GHNAME) -type d -exec chmod 0755 '{}' \;
	find $(GHNAME) -type f -exec chmod 0644 '{}' \;
	(cd $(GHNAME) && exiftool -overwrite_original -all= images/*)
	@echo "cover size: $$(du -h $(GHNAME)/images/cover.jpg | cut -f1)"
	@echo "dimensions (px): $$(identify -format '%wx%h' $(GHNAME)/images/cover.jpg)"
	(cd $(GHNAME) && lastlook $(NAME))
	@echo "edit metadata and generate README.md in subdirectory $(GHNAME)"
	@echo "for Claude:"
	@echo "Based on the story excerpt I will provide next, write a concise, back-cover–style flyleaf description (3–5 sentences). Then generate a list of no more than 10 SEO-optimized keywords that reflect the story’s themes, setting, and genre. Avoid spoilers. Optimize keywords for ebook and search engine discoverability."
	@echo 'Output the result as valid JSON with two fields: “description”: a single string and “keywords”: an array of up to 10 keyword strings.'
	@echo "Do not include any explanatory text outside the JSON. Use clear, professional language suitable for literary or archival publication."

# Using Claude (or ChatGPT), provide the prompt printed by "make showpr"
# and the first hundred lines or so of the book, not including the title page.
# It will return a JSON segment to be placed in metadata.json.
#
# Edit that JSON file to add *ALL* metadata.
ghreadme:
	cd $(GHNAME) && \
		python3 $(RESOURCES)/json_to_readme_md.py metadata.json

# repository
# requires the `gh` CLI to be set up (installed and authenticated)
# and the local $(GHNAME) directory from ghinit to be in place
ghcreate:
	cp $(NAME).epub $(GHNAME)
	@test -f $(GHNAME)/$(NAME).epub || { echo "$(NAME).epub missing"; exit 1; }
	@test -f $(GHNAME)/$(NAME).txt || { echo "$(NAME).txt missing"; exit 1; }
	@test -f $(GHNAME)/$(NAME).html || { echo "$(NAME).html missing"; exit 1; }
	cd $(GHNAME) && \
		rm -rf .git .gitignore && \
		git init && \
		git add . && \
		git commit -am "Initial commit"
	@echo "Creating GitHub repository $(REPO_FULL)..."
	cd $(GHNAME) && gh repo create $(REPO_FULL) --public --source=. --push
	@echo "Repository created and code pushed!"

showbc:
	@echo "ssh rfrank@rfrank.net"
	@echo "cd /var/www/bookcove/html/scripts"
	@echo "bash ./add_book.sh $(NAME)"

showpr:
	@echo "Based on the story excerpt I will provide next, write a concise, back-cover–style flyleaf description (3–5 sentences). Then generate a list of no more than 10 SEO-optimized keywords that reflect the story’s themes, setting, and genre. Avoid spoilers. Optimize keywords for ebook and search engine discoverability."
	@echo 'Output the result as valid JSON with two fields: “description”: a single string and “keywords”: an array of up to 10 keyword strings.'
	@echo "Do not include any explanatory text outside the JSON. Use clear, professional language suitable for literary or archival publication."

delete-ghrepo:
	@echo "WARNING: This will permanently delete $(REPO_FULL)!"
	@read -p "Are you sure? (yes/no): " confirm && [ "$$confirm" = "yes" ]
	gh repo delete $(REPO_FULL) --yes
	@echo "Repository deleted!"

# utilities -------------------------------------------------------------------
cover-size:
	@(cd images && jpegoptim --size=512k cover.jpg)

all-images:
	@(cd images && jpegoptim --size=512k *.jpg)

ebm:
	rm -rf $(NAME)-ebm.zip
	zip -r $(NAME)-ebm.zip $(NAME).html images

clean:
	@rm -rf report.txt
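The pplev target runs the author's pplev tool, whose internals aren't shown. To illustrate the kind of check its comments describe (edit distance between capitalized words, with a default threshold of 2), here is a minimal Python sketch using the classic dynamic-programming Levenshtein distance. The function names and the min_len filter are hypothetical, chosen for illustration, and are not pplev's interface.

```python
import re
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum inserts, deletes, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def near_misses(text: str, threshold: int = 2, min_len: int = 4) -> list[tuple[str, str, int]]:
    """Return pairs of distinct capitalized words at most `threshold` edits apart.

    min_len is a hypothetical filter (not from pplev) that skips short words.
    """
    # Capitalized words only, as the Makefile comment describes.
    words = sorted({w for w in re.findall(r"\b[A-Z][A-Za-z]+\b", text)
                    if len(w) >= min_len})
    pairs = []
    for a, b in combinations(words, 2):
        distance = levenshtein(a, b)
        if distance <= threshold:
            pairs.append((a, b, distance))
    return pairs
```

Even with the length cutoff, common sentence-initial words such as Then and They will pair up at distance 1, so a practical checker needs more filtering than this sketch does; comparing every pair is fine for a short story but grows quadratically with vocabulary size.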
5.5 Making a PG version
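To make a postable book for Project Gutenberg (PG):
1. cd to the book's directory.
2. export BOOK=${NAME}, where NAME is the book's base filename, the same value the Makefile derives from the .html file (for "Old Slowpoke," old-slowpoke).
3. Run makepgzip.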
That will create an uploadable zip file for Project Gutenberg.