Does anyone have a good setup/configuration for converting documents to Obsidian-flavored markdown with Pandoc? I’ve been fiddling with it for a few hours but can’t seem to get everything right:

  • Obsidian markdown doesn’t support ^superscript^. I can get Pandoc to use `<sup>` instead by allowing raw_html, but then…
  • Image embeds don’t work. Pandoc insists on using `<img>` tags for some reason, and no matter what relative src I use, the image just won’t show up.

I could fix all of this by running the files through a linter of some sort, but I feel like I’m missing something. Surely someone must have had these issues before me, right?

  • DrakeRichards@lemmy.world (OP) · 9 months ago

    I got this mostly working, but it was not easy. Not only does Obsidian have a few peculiarities that make it less compatible with standard Markdown, but Word also does a few funny things.

    Here’s the config.yaml I used for Pandoc:

    ```yaml
    from: docx
    to: markdown-smart-simple_tables-multiline_tables-grid_tables+pipe_tables+yaml_metadata_block-superscript-subscript-bracketed_spans-native_spans-link_attributes-raw_html+rebase_relative_paths+four_space_rule
    extract-media: "./"
    wrap: preserve
    markdown-headings: atx
    tab-stop: 2
    shift-heading-level-by: 1
    standalone: true
    template: obsidian.md
    filters:
      - compact-list.lua
      - remove-single-characters.py
      - remove-extra-linebreaks.py
    metadata:
      tags: "tags/go/here"
    ```
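The YAML above is a Pandoc defaults file, which is applied with `pandoc --defaults config.yaml input.docx -o output.md`. For a whole folder of Word files, a small wrapper can loop over them. This is a hypothetical sketch (the `pandoc_command` and `convert_folder` names are illustrative), assuming `pandoc` is on your PATH and that you run it from the folder holding config.yaml, obsidian.md and the filters, since Pandoc resolves relative paths in a defaults file against the working directory.

```python
import subprocess
import sys
from pathlib import Path


def pandoc_command(src: Path, defaults: Path = Path("config.yaml")) -> list:
    """Build the pandoc call that applies the defaults file to one .docx."""
    dst = src.with_suffix(".md")  # Report.docx -> Report.md, same folder
    return ["pandoc", "--defaults", str(defaults), str(src), "--output", str(dst)]


def convert_folder(folder: Path) -> None:
    """Convert every .docx in `folder`, stopping on the first pandoc error."""
    for src in sorted(folder.glob("*.docx")):
        if src.name.startswith("~$"):
            continue  # skip the lock files Word leaves next to open documents
        subprocess.run(pandoc_command(src), check=True)


if __name__ == "__main__":
    # Hypothetical usage: python convert_all.py path/to/word/files
    for name in sys.argv[1:]:
        convert_folder(Path(name))
```

`check=True` makes the loop stop with a `CalledProcessError` when Pandoc fails on a file, instead of silently leaving a half-converted folder behind.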
    

    The three filters:

    • compact-list.lua removes the extra blank lines Pandoc puts between the items of bulleted lists, so the lists come out compact.
    • remove-single-characters.py removes lines that contain only a single character. Usually that character is invisible, like a non-breaking space, which is why Pandoc didn’t strip those lines on its own.
    • remove-extra-linebreaks.py removes line breaks wrapped in Strong elements. This is a Word artifact: when an empty line is bolded, the line break itself is technically bold.
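The filter files themselves aren’t shown. As a rough illustration only, here is a hypothetical stdlib-only Python JSON filter (call it cleanup.py) that performs the same three cleanups in one pass; the original does the list compaction in Lua. It assumes Pandoc’s JSON AST layout, where blocks and inlines are objects like `{"t": "Para", "c": [...]}`, and that a bolded empty line arrives as a Strong element holding only breaks or spaces.

```python
#!/usr/bin/env python3
"""Hypothetical cleanup.py: a stdlib-only Pandoc JSON filter sketch.

Pandoc writes the document AST as JSON to the filter's stdin, passes the
output format as argv[1], and reads the modified AST back from stdout.
"""
import json
import sys

# Inline elements that carry no visible text.
BREAKS = {"LineBreak", "SoftBreak", "Space"}


def is_lone_char_block(block):
    """Para/Plain whose only content is a one-character Str (e.g. a stray nbsp)."""
    if block.get("t") not in ("Para", "Plain"):
        return False
    inlines = block["c"]
    return (len(inlines) == 1 and inlines[0].get("t") == "Str"
            and len(inlines[0]["c"]) == 1)


def is_empty_strong(inline):
    """Strong element holding only breaks/spaces (a bolded empty line in Word)."""
    return inline.get("t") == "Strong" and all(i.get("t") in BREAKS for i in inline["c"])


def para_to_plain(block):
    """Plain list items render without blank lines between them (a tight list)."""
    return {"t": "Plain", "c": block["c"]} if block.get("t") == "Para" else block


def walk(node):
    """Recursively apply the three cleanups to any AST node."""
    if isinstance(node, list):
        kept = [item for item in node
                if not (isinstance(item, dict)
                        and (is_lone_char_block(item) or is_empty_strong(item)))]
        return [walk(item) for item in kept]
    if isinstance(node, dict):
        if node.get("t") == "BulletList":
            # Each list item is a list of blocks; make every Para a Plain.
            node = {"t": "BulletList",
                    "c": [[para_to_plain(b) for b in item] for item in node["c"]]}
        return {key: walk(value) for key, value in node.items()}
    return node


def main():
    doc = json.load(sys.stdin.buffer)  # Pandoc emits UTF-8 JSON
    doc["blocks"] = walk(doc["blocks"])
    json.dump(doc, sys.stdout)


# Pandoc passes the target format as argv[1]; running the file bare
# (e.g. to import or test it) skips the stdin read.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Like the described filter, the single-character rule also drops a paragraph that is legitimately just “I” or “a”, so it may be worth narrowing it to whitespace-like characters.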

    I then ran the resulting file through a regex replacement to turn the superscript carets into HTML `<sup>` tags.
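A regex pass like this could look as follows in Python. It is a sketch that assumes the Markdown uses Pandoc’s `^text^` superscript syntax (no unescaped spaces inside); adjust the pattern if your output marks superscripts differently. The lookbehind leaves footnote markers such as `[^1]` alone, since Obsidian uses the same footnote syntax, but the pattern does not skip code spans or code blocks.

```python
import re
import sys
from pathlib import Path

# Pandoc's superscript syntax: ^text^ with no unescaped whitespace inside.
# (?<!\[) skips footnote markers such as [^1], which Pandoc and Obsidian share.
SUPERSCRIPT = re.compile(r"(?<!\[)\^((?:\\.|[^\s^\\])+)\^")


def carets_to_sup(markdown: str) -> str:
    """Rewrite ^text^ superscripts as <sup>text</sup>."""
    def repl(match):
        # Pandoc escapes spaces inside a superscript as "\ ".
        return "<sup>" + match.group(1).replace("\\ ", " ") + "</sup>"
    return SUPERSCRIPT.sub(repl, markdown)


if __name__ == "__main__":
    # Hypothetical usage: python carets_to_sup.py notes/*.md (rewrites files in place)
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(carets_to_sup(path.read_text(encoding="utf-8")), encoding="utf-8")
```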

    Even after all this, I still have to go through the notes with an Obsidian plugin to convert the standard Markdown links and embeds into [[Wikilink]] style, since Obsidian’s link format setting (Markdown links vs. [[Wikilinks]]) applies to the whole vault.
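If you would rather script that step than use a plugin, a regex-based conversion is one option. This is a hypothetical Python sketch, not how any particular plugin works: it rewrites relative `![alt](path)` embeds to `![[path]]` (dropping the alt text) and `[text](note.md)` links to `[[note|text]]`, and leaves anything with a URL scheme or a `#` anchor unchanged. It does not handle link titles, brackets inside link text, or code spans, and it assumes Obsidian can resolve the resulting paths in your vault.

```python
import re
from urllib.parse import unquote

# Standard Markdown embeds and links; link titles ("...") are not handled.
IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")
# Anything with a URL scheme (https:, mailto:, ...) is left alone.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def _local_path(target: str) -> str:
    """Decode %20 etc. and drop a leading ./ from a relative target."""
    path = unquote(target)
    return path[2:] if path.startswith("./") else path


def to_wikilinks(markdown: str) -> str:
    """Convert relative Markdown embeds/links to Obsidian [[wikilink]] style."""
    def embed(m):
        if SCHEME.match(m.group(2)):
            return m.group(0)
        return f"![[{_local_path(m.group(2))}]]"

    def link(m):
        text, target = m.group(1), m.group(2)
        if SCHEME.match(target) or target.startswith("#"):
            return m.group(0)
        path = _local_path(target)
        if path.endswith(".md"):
            path = path[: -len(".md")]
        # Use the alias form only when the display text differs from the note name.
        return f"[[{path}]]" if text == path else f"[[{path}|{text}]]"

    # Embeds first, so the link pattern never sees a leftover "![...](...)".
    return LINK.sub(link, IMAGE.sub(embed, markdown))
```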

  • trijste@lemmy.world · 9 months ago

    I’ve done something like this converting HTML to Obsidian Markdown. I told GPT-3.5 specifically what I needed to accomplish and went from there. If you can’t fix a formatting quirk during the conversion itself, you can run further passes afterwards to clean it up. I’ve done something similar with BBEdit and VS Code, basically find-and-replace across a lot of documents.