Convert Confluence pages to Markdown

There may come a day when you want to stop using Confluence and convert all your pages to plain Markdown text files.

Luckily, this is achievable using Confluence's XML export function, some custom XML parsing code, and the markdownify Python package.

Confluence stores page content in an Atlassian-specific XHTML format (the "storage format") that mixes HTML with custom tags using the namespace prefix "ac". I have created a fork of markdownify with adaptations for Confluence XML and wiki markup.
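To make the "ac" tags concrete: a code block in storage format is an `<ac:structured-macro ac:name="code">` element wrapping an `<ac:plain-text-body>` with the code in a CDATA section. Confluence also uses an "ri" prefix for resource identifiers such as page links. Page bodies do not declare these prefixes, so a strict XML parser rejects them as-is. The sketch below uses only Python's standard library (not the markdownify fork) and binds both prefixes to made-up namespace URIs before parsing. The URIs and the helper names `parse_storage_format` and `code_macro_bodies` are hypothetical, and the sketch assumes the body contains no HTML named entities such as `&nbsp;`, which ElementTree does not recognize.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical namespace URIs: storage-format fragments use the ac: and ri:
# prefixes without declaring them, so we bind them to arbitrary URIs ourselves.
AC_NS = "urn:example:confluence-ac"
RI_NS = "urn:example:confluence-ri"


def parse_storage_format(xhtml: str) -> ET.Element:
    """Wrap a page body in a root element that declares the ac/ri prefixes."""
    wrapped = f'<root xmlns:ac="{AC_NS}" xmlns:ri="{RI_NS}">{xhtml}</root>'
    return ET.fromstring(wrapped)


def code_macro_bodies(xhtml: str) -> List[str]:
    """Return the plain-text bodies of all code macros on a page."""
    root = parse_storage_format(xhtml)
    bodies = []
    # ElementTree expands ac:structured-macro to "{AC_NS}structured-macro".
    for macro in root.iter(f"{{{AC_NS}}}structured-macro"):
        # Namespaced attributes are expanded the same way: ac:name -> {AC_NS}name.
        if macro.get(f"{{{AC_NS}}}name") != "code":
            continue
        body = macro.find(f"{{{AC_NS}}}plain-text-body")
        bodies.append((body.text or "") if body is not None else "")
    return bodies
```

A converter that handles the "ac" tags has to make the same kind of decision for each macro type: recognize the macro by its `ac:name` attribute, then decide how its body should appear in Markdown.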

To use the code, download the repository as a ZIP file and edit the __init__.py file in the markdownify directory.

You will need to update the configuration at the top of the __init__.py file and provide:

Next, you will need to:

  1. Export an XML archive of your Confluence pages. Confluence only exports one space at a time, unless you create a full Confluence site backup.

  2. Write code that walks the XML tree and extracts the XHTML body of each page.

  3. Call your modified version of markdownify on each page, e.g.:

    import markdownify
    md = markdownify.markdownify(xhtml, heading_style='ATX')
    
  4. Save the resulting Markdown text to files in a folder.

  5. Live your new Confluence-free life.
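The parsing and saving steps above can be sketched in plain Python. This is a minimal sketch, not the author's code, and it rests on assumptions about the export that you should check against your own archive: the ZIP contains an entities.xml file; each page is an `<object class="Page">` with `title` and `contentStatus` properties; older versions of a page carry an `originalVersion` property; and each page body is a separate `<object class="BodyContent">` whose `content` property refers back to the page id. The helper names `extract_pages`, `safe_filename` and `convert_space` are hypothetical. The Markdown converter is passed in as a function so that your modified markdownify can be plugged in.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, List, Optional, Tuple, Union


def _prop_text(obj: ET.Element, name: str) -> Optional[str]:
    """Text of <property name="..."> directly under an <object>, or None."""
    prop = obj.find(f"property[@name='{name}']")
    return prop.text if prop is not None else None


def _prop_ref_id(obj: ET.Element, name: str) -> Optional[str]:
    """Id referenced by a property such as <property name="content"><id>42</id></property>."""
    prop = obj.find(f"property[@name='{name}']")
    return prop.findtext("id") if prop is not None else None


def extract_pages(entities_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (title, storage-format XHTML) pairs for live pages (assumed layout)."""
    root = ET.fromstring(entities_xml)

    # Pass 1: collect ids and titles of live pages.
    titles = {}
    for obj in root.iter("object"):
        if obj.get("class") != "Page":
            continue
        # Assumption: drafts/deleted pages have another contentStatus value.
        status = _prop_text(obj, "contentStatus")
        if status is not None and status != "current":
            continue
        # Assumption: historical versions point at the live page via originalVersion.
        if obj.find("property[@name='originalVersion']") is not None:
            continue
        titles[obj.findtext("id")] = _prop_text(obj, "title") or "untitled"

    # Pass 2: attach each BodyContent to the page it belongs to.
    pages = []
    for obj in root.iter("object"):
        if obj.get("class") != "BodyContent":
            continue
        page_id = _prop_ref_id(obj, "content")
        if page_id in titles:
            pages.append((titles[page_id], _prop_text(obj, "body") or ""))
    return pages


def safe_filename(title: str) -> str:
    """Replace characters that are not allowed in common file systems."""
    name = re.sub(r'[\\/:*?"<>|]+', "_", title).strip() or "untitled"
    return name + ".md"


def convert_space(entities_xml: Union[str, bytes], out_dir: Path,
                  to_markdown: Callable[[str], str]) -> List[Path]:
    """Convert every live page and write one .md file per page."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for title, xhtml in extract_pages(entities_xml):
        path = out_dir / safe_filename(title)
        path.write_text(to_markdown(xhtml), encoding="utf-8")
        written.append(path)
    return written


# Example usage (assumes the modified markdownify fork is importable):
# import markdownify
# xml_bytes = Path("export/entities.xml").read_bytes()
# convert_space(xml_bytes, Path("markdown"),
#               lambda x: markdownify.markdownify(x, heading_style='ATX'))
```

Keeping the converter as a parameter separates the XML handling from the markdownify fork. With a full site backup, two spaces can contain pages with the same title, so you may want to add the space key to each file name. For very large exports, `xml.etree.ElementTree.iterparse` would avoid loading the whole file into memory at once.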