Using Pandoc to export to Obsidian markdown?

DrakeRichards@lemmy.world · 9 months ago

I got this mostly working, but it was not easy. Not only does Obsidian have a few peculiarities that make it less compatible with standard Markdown, but Word also does a few funny things.

Here’s the config.yaml I used for Pandoc:

from: docx
to: markdown-smart-simple_tables-multiline_tables-grid_tables+pipe_tables+yaml_metadata_block-superscript-subscript-bracketed_spans-native_spans-link_attributes-raw_html+rebase_relative_paths+four_space_rule
extract-media: "./"
wrap: preserve
markdown-headings: atx
tab-stop: 2
shift-heading-level-by: 1
standalone: true
template: obsidian.md
filters:
  - compact-list.lua
  - remove-single-characters.py
  - remove-extra-linebreaks.py
metadata:
  tags: "tags/go/here"
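
For anyone wanting to reuse a config like this: Pandoc can read it as a defaults file via its --defaults option. Below is a minimal sketch of driving that from Python. The helper names and the example file names are hypothetical, and it assumes pandoc is on the PATH and that the template and filters named in the config can be found from the working directory.

```python
import subprocess


def build_pandoc_command(source, output, defaults="config.yaml"):
    """Return the argument list for a Pandoc run driven by a defaults file."""
    return ["pandoc", f"--defaults={defaults}", source, "-o", output]


def convert(source, output, defaults="config.yaml"):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(source, output, defaults), check=True)


# Example with hypothetical file names:
# convert("notes.docx", "notes.md")
```

Keeping the command construction separate from the call makes it easy to print the command and check it before converting a whole folder of documents.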

The three filters:

Removes the extra line breaks added between bulleted list items, to make the lists more compact.
Removes lines with only a single character in them. This is usually an invisible character such as a non-breaking space, which stops Pandoc from dropping the line automatically as empty.
Removes line breaks enclosed in Strong tags. This is an artifact from Word, where a line is bolded but has no content: technically the line break is bolded.
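
The filter sources are not shown in the post, so the following is only a rough sketch of what the second and third filters might do, not the original code. It works directly on Pandoc's JSON AST using the standard library; the function names and the sample document are made up for illustration. A real JSON filter would read the document with json.load(sys.stdin) and write the result with json.dump(..., sys.stdout).

```python
BREAKS = ("LineBreak", "SoftBreak", "Space")


def is_single_char_para(node):
    """True for a paragraph holding exactly one one-character string."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Para"
        and len(node["c"]) == 1
        and node["c"][0].get("t") == "Str"
        and len(node["c"][0]["c"]) == 1
    )


def is_blank_strong(node):
    """True for a Strong inline that contains only breaks or spaces."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Strong"
        and all(child.get("t") in BREAKS for child in node["c"])
    )


def clean(node):
    """Recursively drop single-character paragraphs and unwrap blank bold."""
    if isinstance(node, list):
        out = []
        for child in node:
            if is_single_char_para(child):
                continue  # drop the stray paragraph entirely
            if is_blank_strong(child):
                out.extend(child["c"])  # keep the break, drop the bold wrapper
                continue
            out.append(clean(child))
        return out
    if isinstance(node, dict):
        return {key: clean(value) for key, value in node.items()}
    return node


# Hand-written sample AST (assumption: mirrors what a Word export can produce).
sample = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Title"}]},
        {"t": "Para", "c": [{"t": "Str", "c": "\u00a0"}]},
        {"t": "Para", "c": [
            {"t": "Strong", "c": [{"t": "LineBreak"}]},
            {"t": "Str", "c": "Body"},
        ]},
    ],
}
cleaned = clean(sample)
```

Note that this would also delete a paragraph consisting of one legitimate visible character, so a stricter check for whitespace-only characters may be safer in practice.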

I then ran the resulting file through a RegExp replacement to change the superscript carets into HTML <sup> tags.
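
The exact pattern is not given in the post, so here is one possible version of that replacement as a sketch. It assumes superscripts show up either as ^text^ or as ^(text), and the function name is made up for illustration.

```python
import re

# ^(text) form: a caret followed by a parenthesised group.
CARET_PAREN = re.compile(r"\^\(([^()]*)\)")
# ^text^ form: caret-delimited run without whitespace or carets.
CARET_PAIR = re.compile(r"\^([^\s^]+)\^")


def carets_to_sup(text):
    """Rewrite caret-style superscripts as HTML <sup> tags."""
    text = CARET_PAREN.sub(r"<sup>\1</sup>", text)
    return CARET_PAIR.sub(r"<sup>\1</sup>", text)


print(carets_to_sup("x^2^ and the 1^(st) item"))
```

Superscripts that contain spaces or nested parentheses are not handled here and would need a more careful pattern.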

Even after all this, I still have to go through with an Obsidian plugin to convert the standard Markdown links and embeds into [[Wikilink]] style, since Obsidian will only do one or the other throughout your whole vault.

whereisk@lemmy.world · 9 months ago

Not sure about a specific plugin but couldn’t you sed your way out of it?
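
A sed-style substitution along those lines could look roughly like this in Python. This is a sketch, not a tested replacement for the plugin: the function name is hypothetical, and it assumes simple inline links and embeds with no spaces or parentheses in the raw target, leaving anything with a URL scheme (such as https:) untouched.

```python
import re
from urllib.parse import unquote

EMBED = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")
SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def to_wikilinks(text):
    """Convert local Markdown links and embeds to Obsidian wikilink syntax."""

    def embed(match):
        target = unquote(match.group(2))
        if SCHEME.match(target):
            return match.group(0)  # external image: leave as-is
        return f"![[{target}]]"

    def link(match):
        label, target = match.group(1), unquote(match.group(2))
        if SCHEME.match(target):
            return match.group(0)  # external link: leave as-is
        note = target[:-3] if target.endswith(".md") else target
        return f"[[{note}]]" if label == note else f"[[{note}|{label}]]"

    # Embeds first, so the link pattern never sees the "![...](...)" form.
    return LINK.sub(link, EMBED.sub(embed, text))


print(to_wikilinks("See [the notes](Other%20Note.md) and ![](media/image1.png)."))
```

A regex pass like this will also rewrite link-like text inside code blocks, which is one reason a Markdown-aware plugin can still be the safer choice.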

trijste@lemmy.world · 9 months ago

I’ve done something like this converting HTML to Obsidian md. I asked GPT-3.5 specifically what I needed to accomplish and went from there. If you can’t fix a formatting quirk within the conversion itself, you could run extra passes after the conversion to clean it up. I’ve done similar with BBEdit and VS Code, basically find and replace across a lot of documents.

trijste@lemmy.world · 9 months ago

Oh wait I think you want to expressly use pandoc, my bad