
Migrating Markup Languages

There is a broad range of markup languages that let you structure, highlight and format your documents. Different wiki platforms use various markup languages for rendering text, and GitHub and other code-sharing platforms use their own dedicated languages for documentation files. Markup languages such as Textile or Markdown work rather similarly: they provide users with a basic set of formats and structural elements for text. To avoid vendor lock-in, or when you need to move to a different system, you have to extract the information from the old markup and express it in a different language. Copying the rendered text is insufficient, because the semantics contained in the markup are lost. Therefore you need a conversion tool such as pandoc for this task. The pandoc website says that the tool

can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org mode, DocBook, txt2tags, EPUB, ODT and Word docx; and it can write plain text, Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, reStructuredText, XHTML, HTML5, LaTeX (including beamer slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org mode, AsciiDoc, InDesign ICML, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX or ConTeXt is installed.

Pandoc’s enhanced version of Markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscripts and subscripts, strikeout, metadata blocks, automatic tables of contents, embedded LaTeX math, citations, and Markdown inside HTML block elements.

Install the tool from the Ubuntu repositories:

sudo apt-get install pandoc

You can then convert between any of the supported formats, for instance from Markdown to Textile (e.g. when moving from GitHub to Redmine):

pandoc -f markdown_github -t textile inputFile.markdown -o outputFile.textile
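Newer pandoc releases also offer a `gfm` reader for GitHub-Flavored Markdown and describe `markdown_github` as deprecated, whereas older packages from distribution repositories may only know `markdown_github`. The following is a minimal sketch, assuming bash, that picks whichever name the installed pandoc lists via `--list-input-formats` and falls back to `markdown_github` if the list is unavailable. The function name `gfm_reader` is hypothetical. The two readers do not necessarily produce identical output, so it is worth comparing a few converted pages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the reader name to pass to "pandoc -f".
# Prefers "gfm" when the installed pandoc lists it, else "markdown_github".
gfm_reader() {
    if command -v pandoc >/dev/null 2>&1 &&
       pandoc --list-input-formats 2>/dev/null | grep -qx gfm; then
        echo gfm
    else
        # pandoc missing, too old for --list-input-formats, or no gfm reader
        echo markdown_github
    fi
}

# Usage: the same conversion as above, with the reader chosen at run time
# pandoc -f "$(gfm_reader)" -t textile inputFile.markdown -o outputFile.textile
```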

To migrate your GitHub wiki pages, for instance, you can clone the wiki repository and convert all Markdown documents in a loop:

# clone repository
git clone https://gitlab.example.org/user/project.wiki.git
cd project.wiki

for fileName in *.markdown; do
    # Skip the literal pattern if no .markdown files exist
    [ -e "$fileName" ] || continue
    # Strip the .markdown extension and append .textile instead
    pandoc -f markdown_github -t textile "$fileName" -o "${fileName%.markdown}.textile"
done
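Wiki pages are not always stored with the `.markdown` extension; `.md` is common as well. The loop above can be wrapped into a reusable function that handles both. This is a sketch under stated assumptions: bash, pages named `*.md` or `*.markdown`, and a hypothetical function name `convert_wiki`. The function body runs in a subshell `( … )` so that enabling `nullglob` does not leak into the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.md and *.markdown page in a directory
# (default: the current one) to Textile, writing Page.textile next to Page.md.
convert_wiki() (
    shopt -s nullglob                 # unmatched patterns expand to nothing
    dir="${1:-.}"
    for src in "$dir"/*.md "$dir"/*.markdown; do
        dst="${src%.*}.textile"       # drop only the last extension
        pandoc -f markdown_github -t textile "$src" -o "$dst" || exit 1
        printf '%s -> %s\n' "$src" "$dst"
    done
)
```

Inside the cloned wiki you would call `convert_wiki .`. The function stops at the first page pandoc fails to convert and returns a non-zero status.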