Jsoup

The emil-jsoup module makes it easier to work with HTML mails. It is based on the jsoup library.

Usage

With sbt:

libraryDependencies += "com.github.eikek" %% "emil-jsoup" % "0.15.0"

Description

This module provides the following:

  • A custom transformation for the builder DSL that modifies the HTML content of an e-mail. It can strip unwanted content from the HTML using a custom whitelist of allowed elements. For more information, please refer to the jsoup documentation.
  • A way to create a unified HTML view of a mail body.

For the examples, consider the following mail:

import cats.effect._
import emil._
import emil.builder._

val htmlMail = """<h1>A header</h2><p onclick="alert('hi!');">Hello</p><p>World<p>"""
// htmlMail: String = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"

val mail: Mail[IO] = MailBuilder.build(
  From("me@test.com"),
  To("test@test.com"),
  Subject("Hello!"),
  HtmlBody(htmlMail)
)
// mail: Mail[IO] = Mail(
//   header = MailHeader(
//     id = "",
//     messageId = None,
//     folder = None,
//     recipients = Recipients(
//       to = List(MailAddress(name = None, address = "test@test.com")),
//       cc = List(),
//       bcc = List()
//     ),
//     sender = None,
//     from = Some(value = MailAddress(name = None, address = "me@test.com")),
//     replyTo = None,
//     originationDate = None,
//     subject = "Hello!",
//     received = List(),
//     flags = Set()
//   ),
//   additionalHeaders = Headers(all = List()),
//   body = Html(
//     html = Pure(
//       value = StringContent(
//         asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
//       )
//     )
//   ),
//   attachments = Attachments(all = Vector())
// )

Cleaning HTML

Note the evil onclick attribute and the malformed HTML. Clean content can be created using the BodyClean transformation:

import emil.jsoup._

val cleanMail = mail.asBuilder
  .add(BodyClean(EmailWhitelist.default))
  .build
// cleanMail: Mail[IO] = Mail(
//   header = MailHeader(
//     id = "",
//     messageId = None,
//     folder = None,
//     recipients = Recipients(
//       to = List(MailAddress(name = None, address = "test@test.com")),
//       cc = List(),
//       bcc = List()
//     ),
//     sender = None,
//     from = Some(value = MailAddress(name = None, address = "me@test.com")),
//     replyTo = None,
//     originationDate = None,
//     subject = "Hello!",
//     received = List(),
//     flags = Set()
//   ),
//   additionalHeaders = Headers(all = List()),
//   body = Html(
//     html = Map(
//       ioe = Pure(
//         value = StringContent(
//           asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
//         )
//       ),
//       f = emil.jsoup.BodyClean$$$Lambda$2261/0x000000080188ec60@39626e48,
//       event = cats.effect.tracing.TracingEvent$StackTrace
//     )
//   ),
//   attachments = Attachments(all = Vector())
// )

This creates a new mail whose body is annotated with a cleaning function. Nothing is cleaned yet, as the Map node in the output above shows. The function only applies to HTML parts. When the body is evaluated, the string looks like this:

import cats.effect.unsafe.implicits.global

cleanMail.body.htmlPart.map(_.map(_.asString)).unsafeRunSync()
// res0: Option[String] = Some(
//   value = "<html><head><meta charset=\"UTF-8\"></head><body><h1>A header</h1><p>Hello</p><p>World</p><p></p></body></html>"
// )

Jsoup even fixes the invalid html tree.
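To illustrate what cleaning against a list of allowed elements means, here is a minimal sketch that calls jsoup directly rather than going through emil. It assumes jsoup 1.14.2 or later, where the class is named Safelist (older releases call it Whitelist), and it uses jsoup's built-in relaxed list instead of EmailWhitelist.default. The object name CleanSketch is made up for this example, and this is not necessarily how BodyClean works internally.

```scala
import org.jsoup.Jsoup
import org.jsoup.safety.Safelist

// Hypothetical helper object, only for illustration.
object CleanSketch {
  // The same malformed markup with an onclick handler as in the example mail.
  val dirty: String =
    """<h1>A header</h2><p onclick="alert('hi!');">Hello</p><p>World<p>"""

  // Safelist.relaxed() is one of jsoup's predefined lists. It permits common
  // formatting tags such as h1 and p, but no event-handler attributes.
  // Jsoup.clean returns a cleaned body fragment, not a full document.
  def clean(html: String): String =
    Jsoup.clean(html, Safelist.relaxed())
}
```

Calling CleanSketch.clean(CleanSketch.dirty) should return markup that still contains the text but no onclick attribute. A stricter or more permissive list changes which tags survive.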

Html View

The HtmlBodyView class can be used to create a unified view of an e-mail body. It always produces HTML; a text-only body is converted to HTML. For better results with text bodies, use the emil-markdown module.

Example:

val htmlView = HtmlBodyView(
  mail.body,
  Some(mail.header)
)
// htmlView: IO[BodyContent] = Map(
//   ioe = Pure(
//     value = StringContent(
//       asString = "<h1>A header</h2><p onclick=\"alert('hi!');\">Hello</p><p>World<p>"
//     )
//   ),
//   f = emil.jsoup.HtmlBodyView$$$Lambda$2285/0x00000008018b2318@3db75056,
//   event = cats.effect.tracing.TracingEvent$StackTrace
// )

If the mail header is given (second argument), a short header with the sender, receiver, subject and date is included in the result. The third argument is a config object of type HtmlBodyViewConfig. Its default value contains:

  • a function to convert a text-only body into html. This uses a very basic string replacement approach and also escapes html entities in the text. Use the emil-markdown module for more sophisticated text-to-html conversion.
  • a datetime-formatter and a timezone to use when inserting the e-mail date into the document
  • a function to modify the HTML document tree, which by default uses the cleaner from BodyClean to remove unwanted content
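The first item, a basic string-replacement conversion that also escapes HTML entities, could look roughly like the following sketch. It uses only the Scala standard library. The names TextToHtmlSketch, escape and textToHtml are hypothetical, and the actual default function in HtmlBodyViewConfig may behave differently.

```scala
// Hypothetical sketch of a very basic text-to-HTML conversion.
object TextToHtmlSketch {
  // Map a single character to its HTML-safe representation.
  private def escapeChar(c: Char): String = c match {
    case '&' => "&amp;"
    case '<' => "&lt;"
    case '>' => "&gt;"
    case '"' => "&quot;"
    case _   => c.toString
  }

  // Escape character by character, so nothing is escaped twice.
  def escape(text: String): String = {
    val sb = new StringBuilder
    text.foreach(c => sb.append(escapeChar(c)))
    sb.toString
  }

  // Blank lines separate paragraphs; single newlines become <br>.
  def textToHtml(text: String): String =
    text
      .replace("\r\n", "\n")
      .split("\n{2,}")
      .iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(p => "<p>" + escape(p).replace("\n", "<br>") + "</p>")
      .mkString("\n")
}
```

Because escaping happens before any tags are added, markup typed into a plain-text mail is displayed literally instead of being interpreted by the browser.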

The result of the example is:

htmlView.map(_.asString).unsafeRunSync()
// res1: String = """<html><head><meta charset="UTF-8"></head><body><div style="padding-bottom: 0.8em;">
// <strong>From:</strong> <code>me@test.com</code><br>
// <strong>To:</strong> <code>test@test.com</code><br>
// <strong>Subject:</strong> <code>Hello!</code><br>
// <strong>Date:</strong> <code>-</code>
// </div>
// <h1>A header</h1><p>Hello</p><p>World</p><p></p></body></html>"""
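The Date row shows - here, presumably because the example mail has no originationDate. As a rough sketch of how a datetime formatter and a timezone can be combined to render such a row, the following uses only java.time. The object DateRowSketch, the pattern and the fallback string are assumptions for illustration and are not taken from HtmlBodyViewConfig.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical helper, only for illustration.
object DateRowSketch {
  // An assumed pattern; the real default formatter may differ.
  val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm")

  // Render the date in the given zone, or "-" when the mail has no date.
  def render(date: Option[Instant], zone: ZoneId): String =
    date
      .map(instant => formatter.format(instant.atZone(zone)))
      .getOrElse("-")
}
```

The same instant renders differently depending on the zone, which is why a view like this needs both a formatter and a timezone.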