Functional data validation: to monad or not to monad?
Article info
Posted on by Matthias Valvekens
Tags: functional-programming, haskell, tech
Problem statement
“Make invalid states unrepresentable” is an oft-repeated adage in software design, particularly popular in languages with expressive type systems. It’s good advice, and it generally leads to cleaner code bases with fewer opportunities for bugs to creep in.
However, the programs we write are not idealised spherical cows in a vacuum: they actually have to communicate with their environment to do anything useful, e.g. through a REST API, a streaming service or some other I/O layer. Maybe you’re sending data over the wire using a protocol with strict backwards compatibility requirements, or your database schema has changed over the years, and you have to find ways to cope with potential defects in historical data. We can’t let these earthly concerns get in the way of modelling our computations properly now, can we?
Briefly put, in order for the internal domain model to be “nice”, one typically has to do a lot of data conversion and validation at the edge of the system. Last week, I found myself in such a situation, which led to a discussion on the merits and demerits of various FP-based error handling techniques. That discussion is what prompted me to write this post.
This validation/conversion problem is not a new or unsolved problem in FP, but some design choices need to be made:
- Do we want the data ingestion process to fail fast when encountering errors, or do we rather want to try to process as much data as possible and report the resulting errors?
- Do we want to distinguish between errors and warnings?
- Is the validation logic required to compose cleanly with the way we handle errors?
In this post, we’ll take a look at a few patterns to tackle these questions in a functional programming setting, with plenty of example code. I’ll be using Haskell for the examples, but they should work just as well in other languages with a decent type system like Scala or OCaml.
A toy example
Let’s imagine a situation where we’re processing information about human users of some application, with a domain model looking like this:
import Data.Time.Calendar
data User = User
name :: String,
{ dateJoined :: Day,
dateOfBirth :: Maybe Day
}deriving (Show)
In other words, for every user we record a name and the date they joined, and possibly their date of birth, if applicable. Next, let’s imagine that we have to extract this user data from a Map String String
. Obviously, this is a contrived example, but it’ll do for now.
What about errors? To keep things simple, let’s say that we are interested in two error conditions: missing required fields, and parse errors in fields. Those can be represented in the following error type:
type FieldName = String -- convenient alias for readability
data ConversionErr
= MissingField FieldName
| FieldParsingError FieldName String
deriving (Show)
Warm-up: error handling with Either
The most basic error handler in functional programming is arguably Either
.
Applied to our toy example, this is what the API looks like:
type FieldData = Map String String
toUser :: FieldData -> Either ConversionErr User
In plain terms, the toUser
function takes some field data as input, and spits out either a ConversionErr
, or a User
value.
Here’s an example module implementing this toUser
API.
module ExampleAbstract
User (..),
( ConversionErr (..),
FieldName,
FieldParser,
FieldData,
ConvertWithErrors (..),
requiredField,
optionalField,
parseDate,
toUser,
)where
import Data.Map
import Data.Time.Calendar (Day)
import Data.Time.Format.ISO8601 (iso8601ParseM)
import Prelude hiding (fail)
data User = User
name :: String,
{ dateJoined :: Day,
dateOfBirth :: Maybe Day
}deriving (Show)
type FieldName = String
data ConversionErr
= MissingField FieldName
| FieldParsingError FieldName String
deriving (Show)
type FieldData = Map String String
type FieldParser a = String -> Either String a
-- | Parse an ISO date string into a 'Day' value.
parseDate :: FieldParser Day
= case iso8601ParseM str of
parseDate str Just day -> Right day
Nothing -> Left $ str ++ " is not a valid date string"
-- | Look up a required field and parse it using the provided 'FieldParser'.
requiredField ::
FieldName ->
FieldParser a ->
FieldData ->
Either ConversionErr a
= case lookup fieldName input of
requiredField fieldName parse input Just value -> first (FieldParsingError fieldName) (parse value)
Nothing -> Left (MissingField fieldName)
-- | Look up an optional field and parse it using the provided 'FieldParser'
-- if a value is found. If thoe field is not present, return 'Nothing'.
-- If the field can't be parsed, return an error.
optionalField ::
FieldName ->
FieldParser a ->
FieldData ->
Either ConversionErr (Maybe a)
= case lookup fieldName input of
optionalField fieldName parse input Just value -> Just <$> first (FieldParsingError fieldName) (parse value)
Nothing -> Right Nothing
-- | Construct a 'User' from 'FieldData'.
toUser :: FieldData -> Either ConversionErr User
= User <$> getName <*> getDateJoined <*> getDateOfBirth
toUser input where
= requiredField "name" pure input
getName = requiredField "dateJoined" parseDate input
getDateJoined = optionalField "dateOfBirth" parseDate input getDateOfBirth
Notice how the toUser
function actually only uses Applicative
-style combinators. This abstraction will be useful later.
But first, let’s try this out on some inputs.
ghci> toUser $ fromList [("name", "John Doe"), ("dateJoined", "2022-12-14")]
Right (User {name = "John Doe", dateJoined = 2022-12-14, dateOfBirth = Nothing})
ghci> toUser $ fromList [("name", "John Doe"), ("dateJoined", "2022-12-14"), ("dateOfBirth", "1960-01-01")]
Right (User {name = "John Doe", dateJoined = 2022-12-14, dateOfBirth = Just 1960-01-01})
ghci> toUser $ fromList [("name", "John Doe"), ("dateJoined", "2022-12-32")]
Left (FieldParsingError "dateJoined" "2022-12-32 is not a valid date string")
ghci> toUser $ fromList [("name", "John Doe"), ("dateOfBirth", "1960-01-01")]
Left (MissingField "dateJoined")
ghci> toUser $ fromList [("name", "John Doe"), ("dateOfBirth", "1960-01-32")]
Left (MissingField "dateJoined")
Some observations:
- When the input is valid, we get back a
Right (User {...})
value. - When the input is not valid, we get back a
Left (...)
with aConversionErr
value that tells us what went wrong. Either
fails fast: it will stop processing input on the first error it encounters. As a consequence of that, you’ll always see at most one error in the finalEither
value, even if the input has multiple issues!
While Either
is useful to get syntactically cheap error messages, its “fail-fast” behaviour is not always desirable. For example, if you’re processing data submitted by a human user, having them fix the input one error at a time doesn’t exactly make for a great user experience. Can we do better?
Abstract setup
Before we proceed to experiment with alternative approaches to error management, let’s rewrite that code a little more abstractly, so we can easily swap between error handling strategies.
module ExampleAbstract where
import qualified Data.Map as M
import Data.Time.Calendar (Day)
import Data.Time.Format.ISO8601 (iso8601ParseM)
import Prelude hiding (fail, lookup)
data User = User
name :: String,
{ dateJoined :: Day,
dateOfBirth :: Maybe Day
}deriving (Show)
type FieldName = String
data ConversionErr
= MissingField FieldName
| FieldParsingError FieldName String
deriving (Show)
type FieldData = M.Map String String
class Applicative f => ConvertWithErrors f where
fail :: ConversionErr -> f a
parseField :: ConvertWithErrors f => FieldName -> Either String a -> f a
Right result) = pure result
parseField _ (Left err) = fail (FieldParsingError fieldName err)
parseField fieldName (
type FieldParser a = String -> Either String a
-- | Parse an ISO date string into a 'Day' value.
parseDate :: FieldParser Day
= case iso8601ParseM str of
parseDate str Just day -> Right day
Nothing -> Left $ str ++ " is not a valid date string"
-- | Look up a required field and parse it using the provided 'FieldParser'.
requiredField ::
ConvertWithErrors f =>
FieldName ->
FieldParser a ->
FieldData ->
f a= case M.lookup fieldName input of
requiredField fieldName parse input Just value -> parseField fieldName (parse value)
Nothing -> fail (MissingField fieldName)
-- | Look up an optional field and parse it using the provided 'FieldParser'
-- if a value is found. If the field is not present, return 'Nothing'.
-- If the field can't be parsed, return an error.
optionalField ::
ConvertWithErrors f =>
FieldName ->
FieldParser a ->
FieldData ->
Maybe a)
f (= case M.lookup fieldName input of
optionalField fieldName parse input Just value -> Just <$> parseField fieldName (parse value)
Nothing -> pure Nothing
-- | Construct a 'User' from 'FieldData'.
toUser :: ConvertWithErrors f => FieldData -> f User
= User <$> getName <*> getDateJoined <*> getDateOfBirth
toUser input where
= requiredField "name" pure input
getName = requiredField "dateJoined" parseDate input
getDateJoined = optionalField "dateOfBirth" parseDate input getDateOfBirth
A few remarks:
Notice how
ConvertWithErrors
is defined to extendApplicative
instead ofMonad
. This is actually key to our approach.For simplicity’s sake, we kept the
Either String a
type for parsing individual fields. If you’re parsing nested structures, you might want to track errors across multiple “levels”. This requires more sophisticated typing.
To recover our original toUser
, it now suffices to write
instance ConvertWithErrors (Either ConversionErr) where
fail = Left
Gathering multiple errors with Validated
Recall how Either
did not allow us to return all the errors in a computation at once. This is a natural consequence of Either
’s fail-fast approach.
The Validated
functor (found in Scala’s cats
suite, among others) doesn’t have that limitation. Typically, it’s defined as follows:
data Validated e a = Invalid e | Valid a deriving (Show)
valid :: Validated e a -> Maybe a
Valid x) = Just x
valid (= Nothing
valid _
invalid :: Validated e a -> Maybe e
Invalid err) = Just err
invalid (= Nothing
invalid _
instance Functor (Validated e) where
fmap f (Valid x) = Valid (f x)
fmap _ (Invalid e) = Invalid e
instance Semigroup e => Applicative (Validated e) where
pure = Valid
Valid f <*> Valid x = Valid (f x)
Invalid err <*> Valid _ = Invalid err
Invalid err <*> Invalid err' = Invalid (err <> err')
Valid _ <*> Invalid err = Invalid err
The definition of Validated
as a type looks very similar to Either
. In fact, the two are isomorphic as types (and as (bi)functors).
The fundamental difference is in the Applicative
instance:
- Unlike
Either e
,Validated e
has aSemigroup
constraint on the error typee
.This is necessary to combine the errors! - The implementation of
<*>
is different: withEither
,Left err <*> Left err'
would always yieldLeft err
, whileValidated
allows the two errors to be combined using theSemigroup
instance fore
. - Perhaps less obvious is the fact that the
Applicative
instance forValidated e
is actually not monadic! In other words, there is no way to (polymorphically ine
1) define an instance ofMonad (Validated e)
such that theApplicative
structure induced by(<*>) = ap
reduces to the one defined here!
Let’s try to apply this to our toUser
example.
import qualified Data.Map as M
import qualified Data.List.NonEmpty as L
import ExampleAbstract
instance ConvertWithErrors (Validated (L.NonEmpty ConversionErr)) where
fail err = Invalid (L.singleton err)
testUserValidated :: [(String, String)] -> IO ()
= case toUser (M.fromList fields) of
testUserValidated fields Valid usr -> print usr
Invalid (errs :: L.NonEmpty ConversionErr) -> sequence_ (print <$> errs)
We use NonEmpty ConversionErr
instead of the perhaps more straightforward [ConversionErr]
because Invalid []
is a nonsensical result state. We don’t want those in our data model.
This is also why the Semigroup
’s used for error handling are typically not Monoid
s.
Running this on some invalid input indeed shows that we now get back all the errors, as expected:
ghci> testUserValidated [("name", "John Doe"), ("dateJoined", "2020-12-31")]
User {name = "John Doe", dateJoined = 2020-12-31, dateOfBirth = Nothing}
ghci> testUserValidated [("dateJoined", "2020-12-32"), ("dateOfBirth", "2000-13-01")]
MissingField "name"
FieldParsingError "dateJoined" "2020-12-32 is not a valid date string"
FieldParsingError "dateOfBirth" "2000-13-01 is not a valid date string"
Here’s what we have so far:
With a simple change in types, we got more aggressive validation logic without having to change any of the “business logic”.
The price we paid for that is that our error-gathering type is no longer a monad, but “merely” an applicative. Fortunately, in practice, many interesting computations where one would want to use this type of error handling can be restructured to not really require monads in the first place.
Recoverable errors with Ior
What if you want to be able to distinguish between critical errors and recoverable ones? In our toUser
example, the dateOfBirth
field is a Maybe Day
, so we could elect to just set it to Nothing
if the input is invalid, while still logging a warning.
Let’s start by extending our example a little to spell out what we want.
module ExampleAbstractWarnings
module ExampleAbstract,
( ConvertWithWarnings (..),
toUserTolerant,
)where
import qualified Data.Map as M
import ExampleAbstract
class ConvertWithErrors f => ConvertWithWarnings f where
-- | Record a warning and return a default value.
warn :: ConversionErr -> a -> f a
parseFieldOrDefault ::
ConvertWithWarnings f =>
FieldName ->
->
a Either String a ->
f aRight x) = pure x
parseFieldOrDefault _ _ (Left err) =
parseFieldOrDefault fieldName defResult (FieldParsingError fieldName err) defResult
warn (
tolerantOptionalField ::
ConvertWithWarnings f =>
FieldName ->
FieldParser a ->
FieldData ->
Maybe a)
f (= case M.lookup fieldName input of
tolerantOptionalField fieldName parse input Just value -> parseFieldOrDefault fieldName Nothing (Just <$> parse value)
Nothing -> pure Nothing
-- | Construct a 'User' from 'FieldData'.
toUserTolerant :: ConvertWithWarnings f => FieldData -> f User
= User <$> getName <*> getDateJoined <*> getDateOfBirth
toUserTolerant input where
= requiredField "name" pure input
getName = requiredField "dateJoined" parseDate input
getDateJoined = tolerantOptionalField "dateOfBirth" parseDate input getDateOfBirth
One way to implement this new ConvertWithWarnings
typeclass goes through the Ior
(“inclusive or”) type, which is also part of Scala’s cats
toolkit. In a nutshell, Ior
is a variant of Either
with an added Both
constructor.
data Ior a b = JustLeft a | JustRight b | Both a b deriving (Show)
instance Functor (Ior a) where
fmap _ (JustLeft x) = JustLeft x
fmap f (JustRight y) = JustRight (f y)
fmap f (Both x y) = Both x (f y)
In order to define a suitable Applicative
instance, we again need the error type to be a Semigroup
, like with Validated
.
-- | Utility function to "attach" existing errors to a continued computation.
combineErrors :: Semigroup e => e -> Ior e a -> Ior e a
JustRight x) = Both err x
combineErrors err (JustLeft err') = JustLeft (err <> err')
combineErrors err (Both err' y) = Both (err <> err') y
combineErrors err (
instance Semigroup e => Applicative (Ior e) where
pure = JustRight
JustLeft err <*> _ = JustLeft err
JustRight f <*> xF = fmap f xF
Both err f <*> xF = combineErrors err (fmap f xF)
Actually, we didn’t even have to bother spelling out the (<*>)
implementation, because Ior e
is actually a monad2 under this constraint! We could just as well have written
instance Semigroup e => Applicative (Ior e) where
pure = JustRight
<*>) = ap -- from Control.Monad
(
instance Semigroup e => Monad (Ior e) where
JustLeft err >>= _ = JustLeft err
JustRight x >>= f = f x
Both err x >>= f = combineErrors err (f x)
We now pretty much have everything we need.
import qualified Data.List.NonEmpty as L
import qualified Data.Map as M
import ExampleAbstractWarnings
instance ConvertWithErrors (Ior (L.NonEmpty ConversionErr)) where
fail err = JustLeft (L.singleton err)
instance ConvertWithWarnings (Ior (L.NonEmpty ConversionErr)) where
= Both (L.singleton err) defResult
warn err defResult
testUserIor :: [(String, String)] -> IO ()
= case toUserTolerant (M.fromList fields) of
testUserIor fields JustRight usr -> print usr
JustLeft errs -> printErrs errs
Both warnings usr -> print usr >> putStrLn "Warnings:" >> printErrs warnings
where printErrs :: L.NonEmpty ConversionErr -> IO ()
= sequence_ (print <$> errs) printErrs errs
Let’s try that on some inputs as well:
ghci> testUserIor [("name", "John Doe"), ("dateJoined", "2020-12-31"), ("dateOfBirth", "2000-13-01")]
User {name = "John Doe", dateJoined = 2020-12-31, dateOfBirth = Nothing}
Warnings:
FieldParsingError "dateOfBirth" "2000-13-01 is not a valid date string"
ghci> testUserIor [("name", "John Doe"), ("dateJoined", "2020-12-31")]
User {name = "John Doe", dateJoined = 2020-12-31, dateOfBirth = Nothing}
ghci> testUserIor [("dateJoined", "2020-12-32"), ("dateOfBirth", "2000-13-01")]
MissingField "name"
So, we’ve now gotten our monad property back, and we’re able to distinguish between errors and warnings. However, the third invocation shows that, like Either
, Ior
also fails fast on critical errors! What if that’s not what you want?
Aggregating both critical and recoverable errors
Let’s try to figure out how to define an applicative that is to Ior
as Validated
is to Either
.
Unlike the previous examples, I haven’t actually encountered this one in any libraries yet, but maybe I didn’t look hard enough.
Given what we’ve done so far, it seems reasonable to start with something isomorphic to Ior
.
data Pedantic e a = Accept a | Reject e | Nitpick e a deriving (Show)
accepted :: Pedantic e a -> Maybe a
Accept x) = Just x
accepted (Nitpick _ x) = Just x
accepted (= Nothing
accepted _
errors :: Pedantic e a -> Maybe e
Reject err) = Just err
errors (Nitpick err _) = Just err
errors (= Nothing
errors _
instance Functor (Pedantic e) where
fmap f (Accept x) = Accept (f x)
fmap _ (Reject err) = Reject err
fmap f (Nitpick err x) = Nitpick err (f x)
In order to get inspiration for a sensible Applicative
instance for Pedantic e
, it is instructive to note that the <*>
implementations for Validated e
and Ior e
can be rewritten in a more point-free style as follows.
instance Semigroup e => Applicative (Validated e) where
pure = Valid
-- This relation is forced by the applicative laws.
<*>) (Valid f) = fmap f
(
-- What this says informally is that applying (Invalid err <*>), we know we
-- always end up with an Invalid state in the end, but we grab
-- remaining errors (if any!) first before we return it.
<*>) (Invalid err) = Invalid . maybe err (err <>) . invalid
(
instance Semigroup e => Applicative (Ior e) where
pure = JustRight
-- Again: applicative laws.
<*>) (JustRight f) = fmap f
(-- Fail immediately without bothering to evaluate the second argument.
<*>) (JustLeft err) = const (JustLeft err)
(-- Apply the function to what comes next, and tack on the errors we already
-- accumulated after evaluating the result.
<*>) (Both err f) = combineErrors err . fmap f (
Combining these, the Applicative (Pedantic e)
instance we were looking for is now pretty obvious.
addPedantry :: Semigroup e => e -> Pedantic e a -> Pedantic e a
Accept x) = Nitpick err x
addPedantry err (Reject err') = Reject (err <> err')
addPedantry err (Nitpick err' x) = Nitpick (err <> err') x
addPedantry err (
instance Semigroup e => Applicative (Pedantic e) where
pure = Accept
<*>) (Accept f) = fmap f
(<*>) (Reject err) = Reject . maybe err (err <>) . errors
(<*>) (Nitpick err f) = addPedantry err . fmap f (
At this point, you could also decide to unify Accept
and Nitpick
by having Nitpick
take a Maybe e
instead of an e
as its first argument. That’s ultimately a matter of taste.
The identity, homomorphism and interchange laws are easy to verify. Checking the composition law requires some effort, but the proof is an ultimately straightforward exercise in symbol gymnastics.
However, as was the case with Validated
, the Applicative
instance for Pedantic
is not monadic.
If you want to sit down and prove the composition law for Pedantic
, here’s a hint: verify the following identities first.
<$> (v <*> w) = ((f.) <$> v) <*> w
f <*> w) = errors v <> errors w -- (Maybe e) is a Semigroup too!
errors (v <*> w) = addPedantry err (v <*> w) (addPedantry err v
It goes without saying that the associativity of (<>)
is essential to the argument.
To test out our fancy new Applicative
, let’s implement our test example with it.
import qualified Data.List.NonEmpty as L
import qualified Data.Map as M
import ExampleAbstractWarnings
instance ConvertWithErrors (Pedantic (L.NonEmpty ConversionErr)) where
fail err = Reject (L.singleton err)
instance ConvertWithWarnings (Pedantic (L.NonEmpty ConversionErr)) where
= Nitpick (L.singleton err) defResult
warn err defResult
testUserPedantic :: [(String, String)] -> IO ()
= case toUserTolerant (M.fromList fields) of
testUserPedantic fields Accept usr -> print usr
Reject errs -> printErrs errs
Nitpick warnings usr ->
print usr >> putStrLn "Warnings:" >> printErrs warnings
where
printErrs :: L.NonEmpty ConversionErr -> IO ()
= sequence_ (print <$> errs) printErrs errs
Running this test function against the usual set of inputs then produces the following:
ghci> testUserPedantic [("name", "John Doe"), ("dateJoined", "2020-12-31")]
User {name = "John Doe", dateJoined = 2020-12-31, dateOfBirth = Nothing}
ghci> testUserPedantic [("name", "John Doe"), ("dateJoined", "2020-12-31"), ("dateOfBirth", "2000-13-01")]
User {name = "John Doe", dateJoined = 2020-12-31, dateOfBirth = Nothing}
Warnings:
FieldParsingError "dateOfBirth" "2000-13-01 is not a valid date string"
ghci> testUserPedantic [("dateJoined", "2020-12-32"), ("dateOfBirth", "2000-13-01")]
MissingField "name"
FieldParsingError "dateJoined" "2020-12-32 is not a valid date string"
FieldParsingError "dateOfBirth" "2000-13-01 is not a valid date string"
That’s it! We have gotten rid of the fail-fast behaviour while also keeping the distinction between critical and recoverable errors!
Taking all of this one step further, one could even rework the typing a little so that errors and warnings are tracked completely separately, e.g. to report them separately to a human user.
Here’s what that type definition might look like:
data Pedantic' e w a = Nitpick' (Maybe w) a | Reject' (Maybe w) e
Writing down the appropriate Applicative (Pedantic' e w)
instance is left as an exercise to the reader.
Wrapping up
We’ve covered several strategies for handling error conditions here, each with their own advantages and drawbacks.
Either
andIor
are monadic and fail fast on unrecoverable errors. They’re also monad transformers, which makes them easy to use in code that already makes heavy use of monad transformer stacks.Validated
andPedantic
never fail fast, and will attempt to gather as much error information as the structure of the computation allows. They can’t be made into monads, though.Ior
andPedantic
have a recover-with-warning mechanism built in, but can also be used to model unrecoverable error behaviour.Ior
,Validated
andPedantic
require the error type to supply aSemigroup
instance.Either
doesn’t care.
Unless the
Semigroup
instance one
is sufficiently degenerate, that is. For example, in the case wheree = ()
, we of course haveEither () ≅ Validated ()
as applicatives.↩︎Even better, it’s not hard to write down a monad transformer instance for
data IorT e m a = IorT (m (Ior e a))
such thatIor e ≅ IorT e Identity
.↩︎