About this site


Technical notes about how this site is made (spoiler: hugo, emacs and org-mode).

Hugo, native org

As you probably saw, this site is built with hugo. Posts are written in org-mode, using hugo's "native" support for org. Hugo's primary format is markdown and its org support is obviously not as extensive as emacs's, so an alternative is to use ox-hugo to export org to markdown. This way one can enjoy full org features and customize or extend within emacs. I took another road: I use hugo's org support when it's enough and, when necessary, export (emacs) org to "hugo org".

In the following I describe the difficulties I met and how I dealt with them.

Links to RFCs

In the GNU mailutils post I use a custom kind of org link to refer to RFC documents (either their status or their content). Hugo obviously knows nothing about it. The solution is to set things up so that, when exporting to org, rfc links are exported as classic org URL links.
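The setup itself is not shown in this post, so here is only a minimal sketch of how such a link type could be declared. The names `my/rfc-url` and `my/rfc-link-export` are hypothetical, and the rfc-editor.org URL scheme is an assumption (the real setup distinguishes status and content links, which this sketch does not).

```elisp
(require 'ol)  ; provides `org-link-set-parameters' (Org 9.3 or later)

(defun my/rfc-url (number)
  "Return the URL of RFC NUMBER (a string such as \"5321\").
The rfc-editor.org URL scheme is an assumption of this sketch."
  (concat "https://www.rfc-editor.org/rfc/rfc" number))

(defun my/rfc-link-export (path description backend &optional _info)
  "Export an rfc:PATH link with DESCRIPTION for BACKEND.
For the `org' backend, return a classic bracketed URL link.
Return nil otherwise, so that the exporter uses its default."
  (when (eq backend 'org)
    (format "[[%s][%s]]"
            (my/rfc-url path)
            (or description (concat "RFC " path)))))

;; Register the hypothetical "rfc" link type.
(org-link-set-parameters
 "rfc"
 :follow (lambda (path &rest _) (browse-url (my/rfc-url path)))
 :export #'my/rfc-link-export)
```

With this in place, a link written as [[rfc:5321][SMTP]] should come out of the org-to-org export as [[https://www.rfc-editor.org/rfc/rfc5321][SMTP]], which hugo handles like any other URL link.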

Also, by default a "timestamp" comment line is added at the top of the generated org file:

# Created 2025-01-22 Wed 17:55

But this is where hugo expects to find the front matter! Fortunately this is just a matter of setting the timestamp option to nil.

#+OPTIONS: timestamp:nil
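If the per-file option line is not wanted, the same setting can presumably be made once in the emacs configuration. This is a sketch assuming a global change is acceptable for all exports:

```elisp
;; Assumption: disabling the creation-time comment for every export is fine.
;; The per-file "#+OPTIONS: timestamp:nil" line is the more local alternative.
(setq org-export-time-stamp-file nil)
```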

Handling bibliographies

The post "The Evolution of an OCaml Programmer" was the first time I used a bibtex bibliography in org. The initial setup is quite straightforward, but I've had to fix a few things to feed hugo something it can manage.

Generating the bibliography

Generating a bibliography for an org document can be done with the citeproc-org package (installed through melpa). It reads bib entries from a bibtex file, then exports to various backends, according to a CSL (style) file. The style is defined in the variable citeproc-org-default-style-file:

(defcustom citeproc-org-default-style-file nil
  "Default CSL style file.
If nil then the chicago-author-date style is used as a fallback."
  :type 'file
  :group 'citeproc-org)

Org keywords can be used to give all necessary information:

  • which bib file(s) to get the information from, and the export style to use:

       #+bibliography: biblio.bib
       #+cite_export: csl
    
  • and the place where to put the bibliography:

       #+BIBLIOGRAPHY: here
    

Run the command citeproc-org-setup and then export the org file to… org. The new file (file.org.org) contains the bibliography and can be given to hugo.

There are, however, no fewer than three "small" problems to solve: the first two are clearly related to limitations or bugs in hugo's org support; I'm not sure about the status of the last one (but it's not hugo-related anyway).

Links to bib entries

citeproc generates references as

[[citeproc_bib_item_13][Unknown 1990]]

and bib entries as

<<citeproc_bib_item_13>>Unknown. 1990. “The Evolution of a Programmer.”
http://www.pvv.ntnu.no/~steinl/vitser/evolution.html.

Too bad: hugo apparently does not know about the <<dedicated target>> org syntax… Since we have to export org to org to include the bibliography anyway, it's certainly doable to generate html links and anchors directly through the @@html:...@@ construct (which allows inserting literal html code in the org document).

The bib cites and anchors are defined in citeproc-formatters.el:

(defconst citeproc-fmt--org-alist
  `((unformatted . identity)
    (href . ,#'citeproc-fmt--org-link)
    (cited-item-no . ,(lambda (x y) (concat "[[citeproc_bib_item_" y "][" x "]]")))
    (bib-item-no . ,(lambda (x y) (concat "<<citeproc_bib_item_" y ">>" x)))
    ;; Warning: The next four formatter lines put protective zero-width spaces

It seems obvious to modify this constant to insert html constructs directly,[1] like this:

(defconst citeproc-fmt--org-alist
  `((unformatted . identity)
    (href . ,#'citeproc-fmt--org-link)
    (cited-item-no . ,(lambda (x y) (concat "@@html:<a href=\"#citeproc_bib_item_" y "\">" x "</a>@@")))
    (bib-item-no . ,(lambda (x y) (concat "@@html:<a id=\"#citeproc_bib_item_" y "\"></a>@@" x)))
    ;; Warning: The next four formatter lines put protective zero-width spaces

but for some reason, this does not work.[2]

So I decided to simply run a post-processing function on the exported org file to rewrite the bib refs and anchors, and to save the new file where hugo will look for it:

(defun ox-org-postprocess-cite-html (file)
  "Rewrite citeproc_bib_item links as html code in FILE.org.
Save the rewritten file where hugo looks for posts."
  (find-file (concat file ".org"))
  ;; Citations: [[citeproc_bib_item_N][text]] becomes an html link.
  (goto-char (point-min))
  (while (re-search-forward "\\[\\[citeproc_bib_item_\\([^]]*\\)\\]\\[\\([^]]*\\)\\]\\]" nil t)
    (replace-match "@@html:<a href=\"#citeproc_bib_item_\\1\">\\2</a>@@" t))
  ;; Bib entries: <<citeproc_bib_item_N>> becomes an html anchor.
  (goto-char (point-min))
  (while (re-search-forward "<<citeproc_bib_item_\\([^>]*\\)>>" nil t)
    (replace-match "@@html:<a id=\"citeproc_bib_item_\\1\"></a>@@" t))
  (write-file (concat "~/hugo/content/posts/" file)))

Now these links are correctly handled by hugo.
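To check the two regexps without touching any file, the same rewriting can be tried on a plain string. The helper `my/cite-html-rewrite` below is hypothetical and only mirrors the substitutions described above; it is not part of the actual setup.

```elisp
(defun my/cite-html-rewrite (text)
  "Return TEXT with citeproc links and targets rewritten as inline html."
  (let ((with-links
         ;; [[citeproc_bib_item_N][label]] -> html link to #citeproc_bib_item_N
         (replace-regexp-in-string
          "\\[\\[citeproc_bib_item_\\([^]]*\\)\\]\\[\\([^]]*\\)\\]\\]"
          "@@html:<a href=\"#citeproc_bib_item_\\1\">\\2</a>@@"
          text t)))
    ;; <<citeproc_bib_item_N>> -> empty html anchor with id citeproc_bib_item_N
    (replace-regexp-in-string
     "<<citeproc_bib_item_\\([^>]*\\)>>"
     "@@html:<a id=\"citeproc_bib_item_\\1\"></a>@@"
     with-links t)))

(my/cite-html-rewrite "[[citeproc_bib_item_13][Unknown 1990]]")
;; => "@@html:<a href=\"#citeproc_bib_item_13\">Unknown 1990</a>@@"
```

Note that the href carries a leading # while the id does not, which is what lets the browser match the two.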

URLs in bib entries and org links

The org documentation describes org links this way:

Org recognizes plain URIs, possibly wrapped within angle brackets(1),
and activate them as clickable links.

   The general link format, however, looks like this:

     [[LINK][DESCRIPTION]]

or alternatively

     [[LINK]]

So an org link is normally written as [[link proper][description]] (although for visual convenience only the description is shown in an emacs buffer), but org also recognizes plain URIs, such as https://deleuzec.gitlab.io, where the description is implicitly a copy of the link. Guess what, hugo does not work very well with such "plain links".

More precisely, I found that:

  1. Such a plain link is not recognized as a link if it starts at the beginning of a line. This is surprising but not likely to be a problem in practice.
  2. More seriously, the link may gobble the non-whitespace characters that follow it. In particular, a period after the plain link will be considered part of the link by hugo (whereas emacs considers that it comes after the link). This is a serious problem, as bib entries generated by citeproc-org will often be something like

    <<citeproc_bib_item_11>>Ruehr, Fritz. 2001. “The Evolution of a Haskell Programmer.”
    https://www.willamette.edu/~fruehr/haskell/evolution.html.
    

    In the html generated by hugo, the trailing period will be included in the html link!

Some tests about that here.

An obvious solution is to generate only "bracket enclosed" org links for urls in the bib entries. Such org links are generated by the function citeproc-fmt--org-link, which refrains from generating a bracketed link if the description is identical to the link proper:

(defun citeproc-fmt--org-link (anchor target)
  "Return an Org link with ANCHOR and TARGET.
If ANCHOR is string= to TARGET then return ANCHOR."
  (if (string= anchor target)
      anchor
    (concat "[[" target "][" anchor "]]")))

An easy fix is to generate a bracketed link in any case:

(defun citeproc-fmt--org-link (anchor target)
  "Return an Org link with ANCHOR and TARGET.
If ANCHOR is string= to TARGET then return ANCHOR as a link."
  (if (string= anchor target)
      (concat "[[" anchor "]]")
    (concat "[[" target "][" anchor "]]")))

With this change, the previous bib entry will be

<<citeproc_bib_item_11>>Ruehr, Fritz. 2001. “The Evolution of a Haskell Programmer.”
[[https://www.willamette.edu/~fruehr/haskell/evolution.html]].
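Rather than editing the package source, the same change could perhaps be installed as an advice from the emacs configuration. This is a sketch under assumptions: `my/citeproc-org-link-always-bracketed` is a hypothetical name, and I have not verified that citeproc looks the function up by symbol at call time (the `,#'citeproc-fmt--org-link` entry in the formatter alist suggests it does).

```elisp
(defun my/citeproc-org-link-always-bracketed (anchor target)
  "Return a bracketed Org link with ANCHOR and TARGET.
Unlike the stock function, bracket the link even when ANCHOR equals TARGET."
  (if (string= anchor target)
      (concat "[[" anchor "]]")
    (concat "[[" target "][" anchor "]]")))

;; Install the override once citeproc's formatter code is loaded.
;; Assumption: the file citeproc-formatters.el provides a feature of that name.
(with-eval-after-load 'citeproc-formatters
  (advice-add 'citeproc-fmt--org-link :override
              #'my/citeproc-org-link-always-bracketed))
```

The advantage of an advice is that it survives package updates; the drawback is that it silently stops matching if the function is renamed upstream.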

URLs in bib entries and ~ characters

Most bibtex entries contain a url field:

@Misc{ruerh01evolution,
  author =       {Fritz Ruehr},
  title =        {The Evolution of a Haskell Programmer},
  howpublished = {Web page},
  year =         2001,
  url =          {https://www.willamette.edu/~fruehr/haskell/evolution.html}
}

It turns out citeproc converts any ~ in the url (or in most bibtex fields, for that matter) into a space. This probably makes sense for most fields, but certainly not for the url field.[3] I found that this is performed by citeproc-bt--to-csl:

(defun citeproc-bt--to-csl (s &optional with-nocase)
  "Convert a BibTeX field S to a CSL one.
If optional WITH-NOCASE is non-nil then convert BibTeX no-case
brackets to the corresponding CSL XML spans."
  (if (> (length s) 0)
      (--> s
	   (citeproc-bt--preprocess-for-decode it)
	   (citeproc-bt--decode it)
	   (citeproc-bt--process-brackets
	    it
	    (when with-nocase "<span class=\"nocase\">")
	    (when with-nocase "</span>"))
	   (citeproc-s-replace-all-seq it '(("\n" . " ") ("~" . " ") ("--" . "–")))
	   (s-trim it))
    s))

which is called by citeproc-bt-entry-to-csl:

(defun citeproc-bt-entry-to-csl (b)
  "Return a CSL form of normalized parsed BibTeX entry B."
  (let ((type (assoc-default (downcase (assoc-default "=type=" b))
			     citeproc-bt--to-csl-types-alist))
	result year month)
    (cl-loop for (key . value) in b do
	     (let ((key (downcase key))
		   (value (citeproc-bt--to-csl value)))
	       (-if-let (csl-key (assoc-default key citeproc-bt--to-csl-keys-alist))
		   ;; Vars mapped simply to a differently named CSL var
		   (push (cons csl-key value) result)
		 (pcase key
		   ((or "author" "editor") ; Name vars
		    (push (cons (intern key) (citeproc-bt--to-csl-names value))
   ...

As a quick/hacky fix, I simply inserted a special case for the url field:

    (cl-loop for (key . value) in b do
	     (let ((key (downcase key))
		   (pvalue (citeproc-bt--process-brackets value))
		   (value (citeproc-bt--to-csl value)))
	       (if (string= key "url") (push (cons 'URL pvalue) result)
	       (-if-let (csl-key (assoc-default key citeproc-bt--to-csl-keys-alist))

And that's it!

Footnotes


[1] Once done, we need to redefine citeproc-fmt--org-format-rt-1, which is built from this definition.

[2] I haven't tried very hard to understand why.

[3] It's not very clear to me what markup can actually be used in bibtex field values. There must be a spec somewhere, mustn't there?