A text-only Markdown preview with Elixir (and Haskell)
24 Feb 2022

Recently, I was working with Markdown in Phoenix, more specifically with Earmark, and I needed a "textual only" preview of some Markdown text. That is, I wanted to strip the tags but keep the content.
The Earmark AST
According to the Earmark documentation, the AST returned by EarmarkParser.as_ast/2 is a list of 4-tuples shaped like this:
{
  block,       # The name of the HTML block
  attrs,       # The HTML attributes of the block
  content,     # The content, a list of ASTs and/or strings
  annotations  # A map from atom to string
}
So the AST for

# A title
A paragraph with **bold** and _italic_.

Another with a [link](www.duckduckgo.com).

is something like
[
  {"h1", [], ["A title"], %{}},
  {"p", [],
   [
     "A paragraph with ",
     {"strong", [], ["bold"], %{}},
     " and ",
     {"em", [], ["italic"], %{}},
     "."
   ], %{}},
  {"p", [],
   [
     "Another with a ",
     {"a", [{"href", "www.duckduckgo.com"}], ["link"], %{}},
     "."
   ], %{}}
]
The Elixir solution
What I came up with was something like this [1]:
defmodule MarkdownHelpers do
  @block_tags ~w(p div ul ol h1 h2 h3 ...) # and so on

  def markdown_preview(markdown) do
    case EarmarkParser.as_ast(markdown) do
      {:ok, ast, _warnings} -> strip_tags(ast)
      {:error, _ast, _errors} -> ""
    end
  end

  # A string is already plain text (is_binary is the usual guard for strings).
  defp strip_tags(s) when is_binary(s), do: s

  defp strip_tags({b, _, ast, _}) when b in @block_tags do
    # Here I add "\n\n" because I want to keep
    # the original blocks separated.
    # Want everything on one line? Just use " "
    strip_tags(ast) <> "\n\n"
  end

  # Any other tag: drop the tag, keep its content.
  defp strip_tags({_, _, ast, _}), do: strip_tags(ast)

  defp strip_tags(xs) when is_list(xs) do
    xs
    |> Enum.map(&strip_tags/1)
    |> Enum.join()
  end
end
And it works quite nicely:

iex> MarkdownHelpers.markdown_preview("""
...> # A title
...> A paragraph with **bold** and _italic_.
...>
...> Another with a [link](www.duckduckgo.com).
...> """) |> IO.puts()
A title

A paragraph with bold and italic.

Another with a link.


:ok
The Elixir code is quite nice: the AST is a recursive structure, so a recursive function with one clause per kind of node handles it naturally.
The Haskell solution
And as soon as I did this I wanted to try a Haskell version. First of all, Elixir is a dynamically typed language, while Haskell is statically typed, so I need to invent a type for the Earmark representation of an AST. One solution could be something like this [2]:
module Earmark where

import Data.Map (Map)

type Markdown = String
type BlockName = String
type Attrs = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

asAst :: Markdown -> Either String [AST]
asAst = ...
Now the code translates almost directly:
module MarkdownHelpers (markdownPreview) where

import Earmark

blockTags :: [String]
blockTags = ["p", "div", "ul", "ol", "h1", "h2", "h3", ...] -- and so on

markdownPreview :: Markdown -> Either String String
markdownPreview m = stripTagsList <$> asAst m

stripTagsList :: [AST] -> String
stripTagsList = concatMap stripTags

stripTags :: AST -> String
stripTags (Str s) = s
stripTags (Block p _ ast _)
  | p `elem` blockTags = stripTagsList ast ++ "\n\n"
  | otherwise = stripTagsList ast
Notice the two functions stripTagsList and stripTags: since Haskell is statically typed, you can't make stripTags accept both a list and a single AST as you would in Elixir.
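If a single name for both cases is still wanted, one option is a type class with one instance for AST and one for lists. This is only a sketch of mine, not something the code above uses: the class name StripTags is hypothetical, and blockTags is cut down to the two tags the example needs.

```haskell
module Main (main) where

import Data.Map (Map)
import qualified Data.Map as Map

type BlockName = String
type Attrs = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

-- Assumption: a shortened list, just enough for this example.
blockTags :: [String]
blockTags = ["p", "h1"]

-- Hypothetical class: one name, resolved by the type of the argument.
class StripTags a where
  stripTags :: a -> String

-- A single node: text is kept, tags are dropped.
instance StripTags AST where
  stripTags (Str s) = s
  stripTags (Block name _ children _)
    | name `elem` blockTags = stripTags children ++ "\n\n"
    | otherwise = stripTags children

-- A list of anything strippable: strip each element and concatenate.
instance StripTags a => StripTags [a] where
  stripTags = concatMap stripTags

main :: IO ()
main = do
  let em = Block "em" [] [Str "italic"] Map.empty
      para = Block "p" [] [Str "Some ", em, Str " text."] Map.empty
  putStr (stripTags para)          -- called on a single node
  putStr (stripTags [para, para])  -- called on a list of nodes
```

The compiler still picks the instance at compile time, so this is overloading rather than the runtime dispatch on guards that Elixir does. For two cases, the pair of plain functions is arguably simpler.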
Even with two functions, the code is quite short, and if we take the AST for the same piece of Markdown as before, i.e.
import qualified Data.Map as Map -- needed for Map.empty

-- Shorthand for a block without annotations.
block p a c = Block p a c Map.empty

[ block "h1" [] [Str "A title"]
, block "p" []
    [ Str "A paragraph with ", block "strong" [] [Str "bold"]
    , Str " and ", block "em" [] [Str "italic"]
    , Str "."
    ]
, block "p" []
    [ Str "Another with a "
    , block "a" [("href", "www.duckduckgo.com")] [Str "link"]
    , Str "."
    ]
]
The result is the same.
ghci> Right s = markdownPreview "# A title ..."
ghci> putStrLn s
A title

A paragraph with bold and italic.

Another with a link.


ghci>
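Since asAst is left undefined here, the session cannot be reproduced as is. A minimal self-contained sketch that skips parsing and feeds the hand-built AST straight to stripTagsList could look like the following. The names sampleAst and main are mine, and blockTags is shortened to the tags this example needs, which is an assumption rather than the full list.

```haskell
module Main (main) where

import Data.Map (Map)
import qualified Data.Map as Map

type BlockName = String
type Attrs = [(String, String)]
type Annotations = Map String String

data AST
  = Str String
  | Block BlockName Attrs [AST] Annotations

-- Assumption: only the block-level tags that appear in the sample.
blockTags :: [String]
blockTags = ["p", "h1"]

stripTagsList :: [AST] -> String
stripTagsList = concatMap stripTags

stripTags :: AST -> String
stripTags (Str s) = s
stripTags (Block name _ children _)
  | name `elem` blockTags = stripTagsList children ++ "\n\n"
  | otherwise = stripTagsList children

-- Shorthand for a block without annotations.
block :: BlockName -> Attrs -> [AST] -> AST
block name attrs children = Block name attrs children Map.empty

-- Hand-written AST for the sample Markdown (no parser involved).
sampleAst :: [AST]
sampleAst =
  [ block "h1" [] [Str "A title"]
  , block "p" []
      [ Str "A paragraph with ", block "strong" [] [Str "bold"]
      , Str " and ", block "em" [] [Str "italic"]
      , Str "."
      ]
  , block "p" []
      [ Str "Another with a "
      , block "a" [("href", "www.duckduckgo.com")] [Str "link"]
      , Str "."
      ]
  ]

main :: IO ()
main = putStr (stripTagsList sampleAst)
```

It only needs the containers package that ships with GHC, so running it with runghc should be enough to check the stripping logic without a Markdown parser.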
Conclusion
The Elixir code doesn't have to deal with types, so it is concise and simple, but its syntax (although I like it) is a little bit more "noisy" than Haskell's, which is a lot more pristine.
On the other hand, in the Haskell world this is possible only because there is a (maybe not trivial) system of types that represents all the possibilities explicitly. You get strong type safety, but the code can be harder to write.
[1] There are probably better ways to do this, but I wanted to keep things quick and simple.
[2] Since I only cared about the AST, here I ignored the possible warnings or errors returned by as_ast and just used a generic String for the error message.