Jsoup
The module emil-jsoup makes dealing with HTML mails easier. It is based on the well-known jsoup library.
Usage
With sbt:
libraryDependencies += "com.github.eikek" %% "emil-jsoup" % "0.15.0"
Description
This module provides the following:
- A custom transformation for the builder DSL that modifies the HTML content of an e-mail. It can be used to remove unwanted content from the HTML, based on a custom whitelist of allowed elements. For more information, please refer to the jsoup documentation.
- A way to create a unified HTML view of a mail body.
For the examples, consider the following mail:
import cats.effect._
import emil._
import emil.builder._
val htmlMail = """<h1>A header</h2><p onclick="alert('hi!');">Hello</p><p>World<p>"""
// htmlMail: String = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
val mail: Mail[IO] = MailBuilder.build(
From("me@test.com"),
To("test@test.com"),
Subject("Hello!"),
HtmlBody(htmlMail)
)
// mail: Mail[IO] = Mail(
// header = MailHeader(
// id = "",
// messageId = None,
// folder = None,
// recipients = Recipients(
// to = List(MailAddress(name = None, address = "test@test.com")),
// cc = List(),
// bcc = List()
// ),
// sender = None,
// from = Some(value = MailAddress(name = None, address = "me@test.com")),
// replyTo = None,
// originationDate = None,
// subject = "Hello!",
// received = List(),
// flags = Set()
// ),
// additionalHeaders = Headers(all = List()),
// body = Html(
// html = Pure(
// value = StringContent(
// asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
// )
// )
// ),
// attachments = Attachments(all = Vector())
// )
Cleaning HTML
Note the evil onclick attribute and the malformed HTML. Clean content can be created using the BodyClean transformation:
import emil.jsoup._
val cleanMail = mail.asBuilder
.add(BodyClean(EmailWhitelist.default))
.build
// cleanMail: Mail[IO] = Mail(
// header = MailHeader(
// id = "",
// messageId = None,
// folder = None,
// recipients = Recipients(
// to = List(MailAddress(name = None, address = "test@test.com")),
// cc = List(),
// bcc = List()
// ),
// sender = None,
// from = Some(value = MailAddress(name = None, address = "me@test.com")),
// replyTo = None,
// originationDate = None,
// subject = "Hello!",
// received = List(),
// flags = Set()
// ),
// additionalHeaders = Headers(all = List()),
// body = Html(
// html = Map(
// ioe = Pure(
// value = StringContent(
// asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
// )
// ),
// f = emil.jsoup.BodyClean$$$Lambda$2261/0x000000080188ec60@39626e48,
// event = cats.effect.tracing.TracingEvent$StackTrace
// )
// ),
// attachments = Attachments(all = Vector())
// )
This creates a new mail whose body is annotated with a cleaning function, which only applies to HTML parts. When the body is evaluated, the resulting string looks like this:
import cats.effect.unsafe.implicits.global
cleanMail.body.htmlPart.map(_.map(_.asString)).unsafeRunSync()
// res0: Option[String] = Some(
// value = "<html><head><meta charset=\"UTF-8\"></head><body><h1>A header</h1><p>Hello</p><p>World</p><p></p></body></html>"
// )
Jsoup even fixes the invalid HTML tree: the stray </h2> closes the <h1>, and the unclosed <p> elements are closed.
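To preview what a particular whitelist keeps, jsoup can also be called directly, outside of emil. The sketch below assumes jsoup 1.14.1 or newer, where the whitelist class is called Safelist (older versions call it Whitelist). The object SafelistPreview and its tag list are illustrative only; this is not how BodyClean is implemented internally. Whether BodyClean accepts such a Safelist directly depends on your emil version, so check the type of EmailWhitelist.default before passing your own.

```scala
import org.jsoup.Jsoup
import org.jsoup.safety.Safelist

object SafelistPreview {
  // Illustrative whitelist (assumption, not emil's default): headings,
  // paragraphs and a few inline tags. No attributes are allowed on them,
  // so event handlers such as onclick are dropped.
  val customSafelist: Safelist =
    new Safelist().addTags("h1", "h2", "p", "b", "i", "br")

  // Parses the (possibly malformed) input, repairs the tree and removes
  // every element or attribute that is not in the safelist.
  def preview(html: String): String =
    Jsoup.clean(html, customSafelist)

  def main(args: Array[String]): Unit = {
    val htmlMail =
      """<h1>A header</h2><p onclick="alert('hi!');">Hello</p><p>World<p>"""
    println(preview(htmlMail))
  }
}
```

Changing the tags passed to addTags (or starting from Safelist.relaxed()) is a quick way to see the effect of a whitelist before using it on real mails.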
Html View
The HtmlBodyView class can be used to create a unified view of an e-mail body. It produces HTML, converting a text-only body into HTML. For better results with text-only bodies, use the emil-markdown module.
Example:
val htmlView = HtmlBodyView(
mail.body,
Some(mail.header)
)
// htmlView: IO[BodyContent] = Map(
// ioe = Pure(
// value = StringContent(
// asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
// )
// ),
// f = emil.jsoup.HtmlBodyView$$$Lambda$2285/0x00000008018b2318@3db75056,
// event = cats.effect.tracing.TracingEvent$StackTrace
// )
If the mailHeader is given (second argument), a short header with the sender, receiver, subject and date is included in the result. The third argument is a config object of type HtmlBodyViewConfig, whose default value contains:
- a function to convert a text-only body into HTML. It uses a very basic string replacement approach and also escapes special HTML characters in the text. Use the emil-markdown module for more sophisticated text-to-HTML conversion.
- a datetime formatter and a timezone, used when inserting the e-mail date into the document
- a function to modify the HTML document tree, which by default uses the cleaner from BodyClean to remove unwanted content
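To give an idea of what a "very basic string replacement approach" for text-only bodies can look like, here is a hypothetical plain-Scala sketch. The names TextToHtmlSketch, escapeHtml and textToHtml are made up for illustration; this is not emil's actual conversion function.

```scala
object TextToHtmlSketch {
  // Replaces the characters that are significant in HTML with entities.
  def escapeHtml(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '&'  => sb.append("&amp;")
      case '<'  => sb.append("&lt;")
      case '>'  => sb.append("&gt;")
      case '"'  => sb.append("&quot;")
      case '\'' => sb.append("&#39;")
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Very basic conversion (illustrative): normalize line endings, split the
  // text into blocks at blank lines, escape each block, turn single line
  // breaks into <br> and wrap each block in a <p> element.
  def textToHtml(text: String): String =
    text
      .replace("\r\n", "\n")
      .split("\n\\s*\n")
      .map(block => escapeHtml(block.trim).replace("\n", "<br>\n"))
      .filter(_.nonEmpty)
      .map(p => s"<p>$p</p>")
      .mkString("\n")
}
```

Escaping before inserting any markup is the important design choice: it guarantees that text such as "<b>" in a plain-text mail is shown literally instead of being interpreted as HTML.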
The result of the example is:
htmlView.map(_.asString).unsafeRunSync()
// res1: String = """<html><head><meta charset="UTF-8"></head><body><div style="padding-bottom: 0.8em;">
// <strong>From:</strong> <code>me@test.com</code><br>
// <strong>To:</strong> <code>test@test.com</code><br>
// <strong>Subject:</strong> <code>Hello!</code><br>
// <strong>Date:</strong> <code>-</code>
// </div>
// <h1>A header</h1><p>Hello</p><p>World</p><p></p></body></html>"""
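The header block at the top of this output can be approximated as follows. This is a hypothetical plain-Scala sketch, not emil's code: the pattern yyyy-MM-dd HH:mm and the UTC zone only stand in for the formatter and timezone from HtmlBodyViewConfig, whose actual defaults are not shown here. A missing date (originationDate = None, as in the example mail) is rendered as "-", matching the output above.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object HeaderBlockSketch {
  // Assumed values for illustration; emil's defaults may differ.
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm")
  val zone: ZoneId = ZoneId.of("UTC")

  // Minimal escaping so header values cannot inject markup.
  private def esc(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Formats an optional date in the configured zone, or "-" if absent.
  def formatDate(date: Option[Instant]): String =
    date.map(d => formatter.format(d.atZone(zone))).getOrElse("-")

  // Renders a short header block similar to the one in the output above.
  def headerBlock(from: String, to: String, subject: String, date: Option[Instant]): String =
    s"""<div style="padding-bottom: 0.8em;">
       |<strong>From:</strong> <code>${esc(from)}</code><br>
       |<strong>To:</strong> <code>${esc(to)}</code><br>
       |<strong>Subject:</strong> <code>${esc(subject)}</code><br>
       |<strong>Date:</strong> <code>${formatDate(date)}</code>
       |</div>""".stripMargin
}
```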