OFT Specifications as PDF

Sebastian · July 3, 2018 · July 6, 2018 · 4 Comments

PDF is a fixed-size document format, which made more sense in the days when PCs all had roughly the same screen resolution and geometry. Even then, PDFs were never ideal for on-screen display, because most are in portrait orientation and monitors very seldom were. Nowadays displays, especially on mobile devices, come in all shapes and sizes, so fixed-size formats are even more outdated.

Where PDF still excels today is as a universal exchange format for read-only documents, especially if you plan on printing them.

While specifications seldom get printed these days, they still tend to get archived (especially as PDF/A), and a universally accepted format helps. That being said, we plan to make converting OFT's native format (a.k.a. “requirement-enhanced Markdown”) to PDF easy.

The requirements are:

* Creates PDFs
* Is platform-independent
* Separates content from layout
* Customizable style (so that projects or companies can apply their corporate design)
* Based on free software

While there is nothing wrong with users replacing parts of the tool chain with proprietary choices, our reference implementation is going to be free-software only.

These are the options we are discussing so far:

HTML + CSS + HTML2PDF Renderer

In this variant we let a Markdown renderer create HTML for us and use print-oriented CSS to customize style and layout. The benefit is that there is a broad base of developers these days who know CSS, so it is easy for them to adapt the stylesheet however they need. The downside is that the quality depends mostly on how good the HTML-to-PDF converter is.
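As a rough illustration of where the customization would happen, here is a minimal Python sketch that wraps the renderer's HTML output in a page with an embedded print stylesheet. The function name `wrap_with_print_css`, the default stylesheet and the sample fragment are assumptions made up for this example. The Markdown rendering and the HTML-to-PDF step are left out on purpose, since no tool has been chosen for either.

```python
from html import escape

# Hypothetical default style; a project would replace this with its corporate design.
DEFAULT_PRINT_CSS = """
@page { size: A4; margin: 25mm 20mm; }
body { font-family: serif; font-size: 11pt; }
h1, h2, h3 { page-break-after: avoid; }
pre, table { page-break-inside: avoid; }
"""


def wrap_with_print_css(body_html: str, title: str, css: str = DEFAULT_PRINT_CSS) -> str:
    """Return a complete HTML document that carries its own stylesheet."""
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"<style>{css}</style>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Stand-in for the HTML a Markdown renderer would produce.
    fragment = "<h1>System Requirements</h1>\n<p>The tracer imports Markdown.</p>"
    print(wrap_with_print_css(fragment, "Example Specification"))
```

Swapping the `css` argument is all a project would need to do to apply its own layout. Whether rules like `@page` are honored depends on the converter that turns the result into a PDF.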

LaTeX 2 PDF

LaTeX makes absolutely beautiful and professional documents. There is no doubt about that. The ability to customize it via macros is limited only by the user’s imagination.

On the other hand, outside of the academic world there are not many people who have previous experience with LaTeX. Also, there are a lot of dependencies involved, and they differ depending on the platform.

DocBook

DocBook shares the basic concept of separation of content and style with TeX. DocBook layouts are customizable through XSLT stylesheets. The DocBook documents are XML files that have a strictly defined schema, so the content structure is not customizable like in TeX. Depending on your perspective this is either a weakness or a strength (since it enforces a uniform document format).

DocBook is also known for producing quality PDFs, and the dependencies should be pretty homogeneous between platforms.

Popularity Contest

I tried to find numbers about the popularity of LaTeX vs. DocBook. Since none popped up right away, I went for a different approach: comparing search term popularity.

https://trends.google.com/trends/explore?q=DocBook,%2Fm%2F04mdr

I know that the results need to be treated with a healthy dose of skepticism, since more searches could simply mean one of the two is harder to use. Also, while there is only one DocBook, there are a whole bunch of TeX variants out there.

If search term popularity is any indication, LaTeX wins this contest with flying colors.

What’s your opinion? Any arguments I missed?

Join the Conversation

kaklakariada says:

July 4, 2018 at 18:42

I would prefer an integrated toolchain that requires no external tools, i.e. a Maven or Gradle plugin without external dependencies.

Example: Asciidoctor (https://asciidoctor.org/, written in Ruby) can be used with Maven (https://github.com/asciidoctor/asciidoctor-maven-plugin) and Gradle (https://asciidoctor.org/docs/asciidoctor-gradle-plugin/).

There are old DocBook plugins for Gradle (https://github.com/spring-gradle-plugins/docbook-reference-plugin) and Maven (http://maven-plugins.sourceforge.net/maven-sdocbook-plugin/index.html).

I see the following alternatives:
* Find a native converter from Markdown to PDF
* Use AsciiDoc instead of Markdown and use existing converters
* Build a Maven/Gradle plugin for Markdown → HTML → PDF using https://github.com/jhonnymertz/java-wkhtmltopdf-wrapper

Solutions with an intermediate format like HTML or DocBook XML have the disadvantage of added complexity but may be the only feasible way.

Sascha says:

July 5, 2018 at 06:23

Hi,

Why not use the Swiss army knife (Pandoc) when it comes to converting files from one markup format into another?

Pandoc can convert documents in

* Markdown
* reStructuredText
* textile
* HTML
* DocBook
* LaTeX
* MediaWiki markup
* TWiki markup
* TikiWiki markup
* Creole 1.0
* Vimwiki markup
* OPML
* Emacs Org-Mode
* Emacs Muse
* txt2tags
* Microsoft Word docx
* LibreOffice ODT
* EPUB
* Haddock markup

to

HTML formats
XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides

Word processor formats
Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML, Microsoft PowerPoint.

Ebooks
EPUB version 2 or 3, FictionBook2

Documentation formats
DocBook version 4 or 5, TEI Simple, GNU TexInfo, Groff man, Groff ms, Haddock markup

Archival formats
JATS

Page layout formats
InDesign ICML

Outline formats
OPML

TeX formats
LaTeX, ConTeXt, LaTeX Beamer slides

PDF
via pdflatex, xelatex, lualatex, pdfroff, wkhtmltopdf, prince, or weasyprint.

Lightweight markup formats
Markdown (including CommonMark and GitHub-flavored Markdown), reStructuredText, AsciiDoc, Emacs Org-Mode, Emacs Muse, Textile, txt2tags, MediaWiki markup, DokuWiki markup, TikiWiki markup, TWiki markup, Vimwiki markup, and ZimWiki markup.

Custom formats
Custom writers can be written in Lua.

Pandoc also already includes a template system and a system for writing filters.
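To give an idea of how little glue this would need, here is a minimal Python sketch that assembles and optionally runs a Pandoc call for Markdown to PDF. It assumes Pandoc 2.x (for the `--pdf-engine` option) and an installed engine such as xelatex. The function names, the default engine and the file names `spec.md` and `spec.pdf` are made up for the example.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(
    source: str,
    target: str,
    pdf_engine: str = "xelatex",
    template: Optional[str] = None,
) -> List[str]:
    """Assemble a Pandoc command line that converts Markdown to PDF."""
    command = [
        "pandoc",
        source,
        "--from=markdown",
        f"--pdf-engine={pdf_engine}",
        "-o",
        target,
    ]
    if template is not None:
        # A custom template is one place where a corporate design could go.
        command.append(f"--template={template}")
    return command


def convert(source: str, target: str) -> bool:
    """Run Pandoc if it is on the PATH; return True if the conversion succeeded."""
    if shutil.which("pandoc") is None:
        return False
    result = subprocess.run(build_pandoc_command(source, target))
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder file names; only the command line is printed here.
    print(" ".join(build_pandoc_command("spec.md", "spec.pdf")))
```

Choosing a different engine from the list above would only mean passing another `pdf_engine` value.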

Simon says:

July 8, 2018 at 15:06

HTML+CSS -> PDF is of course a modern technology, but since it was designed for displaying content in web browsers, it has a lot of downsides when it comes to print production (e.g. no real pagination that you can refer to in the text when referring to a figure or table, no outline with page numbers, and typesetting that will always lag behind well-established print solutions). I also searched for a professional free solution to convert HTML+CSS into PDF some time ago, but it turned out that most of the free solutions lag behind the commercial flagship Prince (https://www.princexml.com) in features.

I have personally worked a lot with LaTeX and consider it very professional and powerful for documentation. The options for maintaining the source are similar to DocBook (e.g. including chapters from separate files), while the typesetting of the generated high-quality PDF documents is outstanding. A lot of available packages enrich the system, so the possibilities are basically endless. Of course, a LaTeX distribution is quite huge (several gigabytes), which makes it hard to bundle into the toolchain. But I could live with manually installing LaTeX as a prerequisite for generating high-quality specification PDFs on top of OFT's main functionality, requirements tracing.

Regarding DocBook I am a bit disinclined: I have worked with that technology for almost a year now and still find it extremely hard to modify the formatting and typesetting, because there are so many different technologies involved that one must master (e.g. XML, XPath, XSLT, XSL-FO). Also, certain things are quite cumbersome to achieve in DocBook, where LaTeX needs only a single line of code plus an external package containing the actual functionality. In DocBook the XSL-FO code for a relatively simple document layout can be a few thousand lines long, simply because each change requires so many commands. Other drawbacks are listed here: https://en.wikipedia.org/wiki/XSL_Formatting_Objects#Drawbacks

redcatbear says:

July 8, 2018 at 17:45

Thanks everyone for your opinions and for adding new pros and cons to the different solutions. Looks like the options are tied.
I think this calls for prototyping the solutions and seeing which of the prototypes gives our users the best effort-to-benefit ratio. I would suggest starting a separate repository for each approach under “itsallcode”.
Any volunteers to prototype one of the approaches?
