A text-only Markdown preview with Elixir (and Haskell)

Recently, I was working with Markdown in Phoenix, more specifically with Earmark, and I needed a text-only preview of some Markdown text. That is, I wanted to strip the tags but keep the content.

The Earmark AST

According to the Earmark documentation, the AST returned by EarmarkParser.as_ast/2 is a list of 4-tuples shaped like this:

{
  block,      # The name of the HTML block
  attrs,      # The HTML attributes of the block
  content,    # The content, a list of ASTs and/or strings
  annotations # A map from atom to string
}

So the AST for

# A title
A paragraph with **bold** and _italic_. 

Another with a [link](www.duckduckgo.com).

is something like

[
  {"h1", [], ["A title"], %{}},
  {"p", [],
   [
     "A paragraph with ",
     {"strong", [], ["bold"], %{}},
     " and ",
     {"em", [], ["italic"], %{}},
     "."
   ], %{}},
  {"p", [],
   [
     "Another with a ",
     {"a", [{"href", "www.duckduckgo.com"}], ["link"], %{}},
     "."
   ], %{}}
]

The Elixir solution

What I came up with was something like this [1]:

defmodule MarkdownHelpers do
  @block_tags ~w(p div ul ol h1 h2 h3 ...) # and so on

  def markdown_preview(markdown) do
    case EarmarkParser.as_ast(markdown) do
      {:ok, ast, _warnings}   -> strip_tags(ast)
      {:error, _ast, _errors} -> ""
    end
  end

  defp strip_tags(s) when is_binary(s), do: s
  
  defp strip_tags({b, _, ast, _}) when b in @block_tags do
    # Here I add "\n\n" because I want to keep
    # the original blocks separated.
    # Want everything on one line? Just use " "
    strip_tags(ast) <> "\n\n"
  end

  defp strip_tags({_, _, ast, _}), do: strip_tags(ast)

  defp strip_tags(xs) when is_list(xs) do
    xs
    |> Enum.map(&strip_tags/1)
    |> Enum.join()
  end
end

And it works quite nicely:

iex> MarkdownHelpers.markdown_preview("""
...> # A title
...> A paragraph with **bold** and _italic_. 
...> 
...> Another with a [link](www.duckduckgo.com).
...> """) |> IO.puts()
A title

A paragraph with bold and italic.

Another with a link.


:ok

The Elixir code is quite nice, and the recursive structure of the AST is (obviously) well suited for the use of recursion.

The Haskell solution

As soon as I had done this, I wanted to try a Haskell version. First of all, Elixir is a dynamically typed language, while Haskell is statically typed, so I need to invent a type for the Earmark representation of an AST. One solution could be something like this [2]:

module Earmark where

import Data.Map (Map)

type Markdown    = String
type BlockName   = String
type Attrs       = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

asAst :: Markdown -> Either String [AST]
asAst = ...

With these types in place, the code translates almost directly.

module MarkdownHelpers (markdownPreview) where

import Earmark

blockTags :: [String]
blockTags = ["p", "div", "ul", "ol", "h1", "h2", "h3", ...] -- and so on

markdownPreview :: Markdown -> Either String String
markdownPreview m = stripTagsList <$> asAst m

stripTagsList :: [AST] -> String
stripTagsList = concatMap stripTags

stripTags :: AST -> String
stripTags (Str s)      = s
stripTags (Block p _ ast _)
  | p `elem` blockTags = stripTagsList ast ++ "\n\n"
  | otherwise          = stripTagsList ast

Notice the two functions stripTagsList and stripTags: since Haskell is statically typed, a plain function stripTags can't accept both a list and a single AST as it would in Elixir.
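If you really want a single name for both cases, one option is a type class. The following is only a sketch: the class name StripTags is my own invention for illustration, the tag list is cut short, and the AST type is repeated so the snippet stands on its own.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type BlockName   = String
type Attrs       = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

-- Shortened tag list, for illustration only.
blockTags :: [String]
blockTags = ["p", "div", "ul", "ol", "h1", "h2", "h3"]

-- Hypothetical class: anything that can be flattened to plain text.
class StripTags a where
  stripTags :: a -> String

-- A single node: keep the text, drop the tag.
instance StripTags AST where
  stripTags (Str s) = s
  stripTags (Block p _ children _)
    | p `elem` blockTags = stripTags children ++ "\n\n"
    | otherwise          = stripTags children

-- A list of strippable things: strip each one and concatenate.
instance StripTags a => StripTags [a] where
  stripTags = concatMap stripTags

-- Small example value: a paragraph containing an emphasised word.
example :: [AST]
example = [Block "p" [] [Str "a ", Block "em" [] [Str "b"] Map.empty] Map.empty]
```

Here stripTags works on both example (a list) and on each node inside it, with the compiler picking the instance from the type. For a helper this small, the two plain functions are arguably easier to read.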

But the code is quite short, and if we take the AST for the same piece of Markdown as before, i.e.

-- Assumes `import qualified Data.Map as Map` is in scope.
block p a c = Block p a c Map.empty

[ block "h1" [] [Str "A title"]
, block "p" []
  [ Str "A paragraph with ", block "strong" [] [Str "bold"]
  , Str " and ", block "em" [] [Str "italic"]
  , Str "."
  ]
, block "p" []
  [ Str "Another with a "
  , block "a" [("href", "www.duckduckgo.com")] [Str "link"]
  , Str "."
  ]
]

The result is the same.

ghci> Right s = markdownPreview "# A title ..."
ghci> putStrLn s
A title

A paragraph with bold and italic.

Another with a link.

ghci>
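Since asAst is left undefined above, the session cannot be reproduced as written. As a minimal sketch that can be loaded into GHCi, the pieces can be put into one file and run against the hand-written AST instead of a parsed one. The name sampleAst is mine, and the tag list is cut short for illustration.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type BlockName   = String
type Attrs       = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

-- Shortened tag list, for illustration only.
blockTags :: [String]
blockTags = ["p", "div", "ul", "ol", "h1", "h2", "h3"]

stripTagsList :: [AST] -> String
stripTagsList = concatMap stripTags

stripTags :: AST -> String
stripTags (Str s) = s
stripTags (Block p _ ast _)
  | p `elem` blockTags = stripTagsList ast ++ "\n\n"
  | otherwise          = stripTagsList ast

-- Helper that fills in empty annotations.
block :: BlockName -> Attrs -> [AST] -> AST
block p a c = Block p a c Map.empty

-- Hand-written AST for the sample Markdown; no parser is involved.
sampleAst :: [AST]
sampleAst =
  [ block "h1" [] [Str "A title"]
  , block "p" []
    [ Str "A paragraph with ", block "strong" [] [Str "bold"]
    , Str " and ", block "em" [] [Str "italic"]
    , Str "."
    ]
  , block "p" []
    [ Str "Another with a "
    , block "a" [("href", "www.duckduckgo.com")] [Str "link"]
    , Str "."
    ]
  ]
```

Evaluating putStr (stripTagsList sampleAst) in GHCi prints the three lines of text, each followed by a blank line.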

Conclusion

The Elixir code doesn't have to deal with types, so it is concise and simple, but the syntax (although I like it) is a little "noisier" than Haskell's, which is a lot more pristine.

On the other hand, in the Haskell world this is possible only because there is a (maybe not trivial) system of types that represents all the possibilities explicitly. You get very strong type safety, but the code can be harder to write.


[1] There are probably better ways to do this, but I wanted to keep things quick and simple.

[2] Since I only cared about the AST, here I ignored the possible warnings or errors returned by as_ast and just used a generic String for the error message.