Emad's Blog - Simple data validation in Haskell

In the previous post, we saw how to use the power of Applicative to validate data, with Either Text as our instance of choice. It did well: we could compose several simple helper functions into a few easy-to-read and easy-to-change smart constructors.

But it wasn’t perfect. Our smart constructor made sure we only get a valid product or an error, but when the input had several invalid pieces of data, it could only give us one error message. In this post, we will see why this happens and how to fix it.

Let’s have a closer look at our mkProduct smart constructor again:

mkProduct :: Text -> Text -> Double -> Either Text Product
mkProduct name description price = 
  Product                               
  <$> mkProductName name                
  <*> mkProductDescription description  
  <*> mkProductPrice price

As we know from Part 1, (Product <$> mkProductName name) applies Product to its first argument, which is wrapped in an Applicative (in our case Either Text). The result is a new function waiting for its remaining arguments, and this new function is itself wrapped in Either Text.

We also know that (mkProductName name) could be either Right or Left (it would be Left if the name isn’t valid). To understand what happens in (Product <$> mkProductName name), we need to understand how <$> works for Either, and since <$> is just an infix synonym for fmap, let’s look for the Functor instance in the Data.Either source code:

instance Functor (Either a) where
    fmap _ (Left x) = Left x
    fmap f (Right y) = Right (f y)

As we can see, if the wrapped argument is a Right, the function is applied to the value inside and we get the result back, wrapped in a Right too. However, if the argument is a Left, the function is completely ignored and we get the same Left back.
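To see both behaviours side by side, here is a small sketch that uses only base. It assumes a hypothetical two-field Pair constructor as a stand-in for Product: mapping it over a Right yields a wrapped function still waiting for its second argument, while mapping it over a Left never calls it.

```haskell
-- Functor behaviour of Either, base only.
-- 'Pair' is a hypothetical two-field stand-in for Product.
data Pair = Pair String Int
  deriving (Show, Eq)

-- fmap over a Right applies the constructor; the result is a function
-- still waiting for its Int argument, wrapped in Right.
partialOk :: Either String (Int -> Pair)
partialOk = Pair <$> Right "Keyboard"

-- fmap over a Left never calls Pair; the Left passes through unchanged.
partialBad :: Either String (Int -> Pair)
partialBad = Pair <$> Left "Invalid product name"

-- <*> supplies the remaining argument inside Either.
completeOk, completeBad :: Either String Pair
completeOk = partialOk <*> Right 42
completeBad = partialBad <*> Right 42
```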

In our case, this means that if we got an invalid name, (mkProductName name) would be a Left with an error message, and (Product <$> mkProductName name) wouldn’t even try to apply the Product function. That makes perfect sense: you can’t apply Product to an invalid name anyway, because Product needs a ProductName and we can’t get one from an invalid name.

So far, so good. We now understand that if we have an invalid name, the whole (Product <$> mkProductName name) would be an error message wrapped in a Left. Let’s move on to the next part.

This one is easy: we know that (mkProductDescription description) will also be an error message wrapped in a Left if we got an invalid description, right? … So, for the sake of example, I’ll substitute both expressions with Left values. For a moment, think of this:

Product                               
  <$> mkProductName name                
  <*> mkProductDescription description

As if it’s this:

Left "Invalid product name" <*> Left "Invalid product description"

This is exactly what we get when both the name and the description are invalid.

In fact, the previous expression is valid Haskell, and if you evaluate it in GHCi it will print Left "Invalid product name", completely ignoring the second error message. To understand why, we need to look at how <*> works for Either … Again, we can find this in the Data.Either source code:

instance Applicative (Either e) where
    pure          = Right
    Left  e <*> _ = Left e
    Right f <*> r = fmap f r

Look at this specific line:

Left  e <*> _ = Left e

As we can see, here’s exactly where things go wrong!

When <*> gets an error (wrapped in Left) as its first argument, it doesn’t even look at the second argument.
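A quick way to confirm this short-circuiting, using nothing but base. A pair constructor stands in for Product here (an assumption made to keep the sketch short), and both values below end up equal to the first Left only:

```haskell
-- Either's (<*>) returns the first Left it meets and never inspects
-- its second argument. (,) stands in for Product.
bothInvalid :: Either String (Int, Int)
bothInvalid =
  (,) <$> Left "Invalid product name" <*> Left "Invalid product description"

-- The bare expression from the text behaves the same way.
rawInvalid :: Either String ()
rawInvalid = Left "Invalid product name" <*> Left "Invalid product description"
```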

Now imagine if it looked like this:

Left  e1 <*> Left e2 = Left $ combine e1 e2

With this implementation, we wouldn’t lose error messages! … That is, if our imaginary function combine could somehow merge both errors into a new one that represents both e1 and e2 … Sounds good, right?

But why didn’t the smart Haskell designers actually make it like that? .. The answer is: because it’s actually not possible to write this imaginary combine function!

Why impossible? .. Because, as we said before, Either lets you use any type to represent your errors, and we can’t simply write a function that combines any two pieces of data of any type!

The only thing we know here is that e1 and e2 have the same type, but we don’t know anything else about that type, so we can’t really combine them!
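To make this concrete: by parametricity, a total function of type e -> e -> e that knows nothing about e can only return one of its two arguments. The sketch below spells out those two options, plus what becomes possible once the error type is fixed to String. The names keepFirst, keepSecond and combineStrings are made up for illustration:

```haskell
-- With nothing known about e, a total function of type e -> e -> e can
-- only hand back one of its arguments, so one error is always lost.
keepFirst :: e -> e -> e
keepFirst e1 _ = e1

keepSecond :: e -> e -> e
keepSecond _ e2 = e2

-- Fixing a concrete error type lifts the restriction: for String we can
-- build a message that mentions both errors.
combineStrings :: String -> String -> String
combineStrings e1 e2 = e1 ++ "; " ++ e2
```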

But we don’t give up … We look in our category theory jungle for a solution (it always has one) and we find something very interesting called Semigroup.

A type a is a Semigroup if it provides an associative function (<>) that lets you combine any two values of type a into one.

This means we don’t need to know the exact type of e1 and e2; we only need to make sure this type is an instance of Semigroup, and then we can rely on its implementation of (<>) … This sounds brilliant!!
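As a rough sketch of the idea, we can write the accumulating behaviour as a standalone operator for Either without touching base’s instance. apAccum is a hypothetical helper, not something base provides; its only requirement on the error type is Semigroup, and the fixity declaration lets it chain after <$> the same way <*> does:

```haskell
-- 'apAccum' is a hypothetical helper, not part of base: an applicative-style
-- operator for Either that keeps both errors, assuming only Semigroup e.
infixl 4 `apAccum`

apAccum :: Semigroup e => Either e (a -> b) -> Either e a -> Either e b
apAccum (Left e1) (Left e2) = Left (e1 <> e2) -- both failed: merge the errors
apAccum (Left e1) (Right _) = Left e1         -- only the function side failed
apAccum (Right f) r         = fmap f r        -- apply f, or pass r's Left along

-- Lists are a Semigroup, so [String] works as the error type.
bothErrors :: Either [String] (Int, Int)
bothErrors =
  (,) <$> Left ["Invalid product name"] `apAccum` Left ["Invalid product description"]
```

Because (<>) for lists is append, both messages survive in bothErrors, which is exactly the behaviour the imaginary combine was supposed to provide.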

So now imagine if <*> were defined like this:

instance Semigroup e => Applicative (Either e) where
    pure                  = Right
    Left  e1 <*> Left e2  = Left (e1 <> e2)
    Left  e1 <*> Right _  = Left e1  -- still needed so every case is covered
    Right f  <*> r        = fmap f r

This would actually work, but it would be a bit restrictive. We wouldn’t be able to choose any type for our errors anymore; only instances of Semigroup would be allowed! … For validation purposes, this isn’t really a problem: we will have only one type of validation error anyway, and in fact even the simple Text is an instance of Semigroup!

But Either isn’t only for validation; it is used for many other things. That’s why the Haskell designers didn’t want to restrict it to semigroups … Luckily, there’s another type that looks exactly like Either, except that its developers happily restricted its Applicative instance to semigroups, making it a perfect fit for data validation!

This type is called Validation. It also has two constructors, but instead of Left and Right they are called Failure and Success.

data Validation err a =
  Failure err
  | Success a

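All we need to do is install it (it ships in the validation package on Hackage), import Data.Validation, replace Either with Validation in our types and replace Left with Failure. Validation is also an instance of Applicative, as long as its error type is a Semigroup (which Text is), so everything should just work after this very simple refactor.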
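To see why the swap works, here is a minimal local stand-in for the Validation type with just its Functor and Applicative instances. It is written for illustration and is not the validation package’s source, though it should behave the way described above; nonEmptyField and twoFields are hypothetical helpers:

```haskell
-- A minimal local stand-in for Data.Validation's type, for illustration only;
-- the real package defines more instances and helper functions.
data Validation err a
  = Failure err
  | Success a
  deriving (Show, Eq)

instance Functor (Validation err) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- The Semigroup constraint is what allows two failures to be merged.
instance Semigroup err => Applicative (Validation err) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2) -- keep both errors
  Failure e1 <*> Success _  = Failure e1          -- only the left side failed
  Success f  <*> r          = fmap f r            -- apply f, or pass r's failure on

-- Hypothetical field check used to show accumulation.
nonEmptyField :: String -> String -> Validation [String] String
nonEmptyField field value =
  if null value then Failure [field ++ " is empty"] else Success value

twoFields :: String -> String -> Validation [String] (String, String)
twoFields name description =
  (,) <$> nonEmptyField "name" name <*> nonEmptyField "description" description
```

With both fields empty, twoFields reports both problems instead of stopping at the first one.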

So now our validation helpers should look like this:

{-# LANGUAGE OverloadedStrings #-}
import RIO
import qualified RIO.Text as T
import Data.Validation

minLength :: Text -> Int -> Text -> Validation Text ()
minLength fName n text =
  if T.length text >= n
    then pure ()
    else Failure $ fName <> " should have at least " <> (T.pack . show) n <> " characters"

maxLength :: Text -> Int -> Text -> Validation Text ()
maxLength fName n text =
  if T.length text <= n
    then pure ()
    else Failure $ fName <> " shouldn't exceed " <> (T.pack . show) n <> " characters"

minNumber :: (Ord a, Show a) => Text -> a -> a -> Validation Text ()
minNumber fName n m =
  if m >= n
    then pure ()
    else Failure $ fName <> " should be at least " <> (T.pack . show) n

maxNumber :: (Ord a, Show a) => Text -> a -> a -> Validation Text ()
maxNumber fName n m =
  if m <= n
    then pure ()
    else Failure $ fName <> " shouldn't exceed " <> (T.pack . show) n

And our smart constructors should look like this:

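newtype ProductName = ProductName Text deriving Show
newtype ProductDescription = ProductDescription Text deriving Show
newtype ProductPrice = ProductPrice Double deriving Show

-- Show instances are needed so main can print the validation result
data Product = Product ProductName ProductDescription ProductPrice deriving Show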

mkProductName :: Text -> Validation Text ProductName
mkProductName name =
  minLength "Product name" 5 name
    *> maxLength "Product name" 30 name
    *> (pure $ ProductName name)

mkProductDescription :: Text -> Validation Text ProductDescription
mkProductDescription description =
  minLength "Product description" 15 description
    *> maxLength "Product description" 300 description
    *> (pure $ ProductDescription description)

mkProductPrice :: Double -> Validation Text ProductPrice
mkProductPrice price =
  minNumber "Product price" 0.01 price
    *> maxNumber "Product price" 1000000 price
    *> (pure $ ProductPrice price)

mkProduct :: Text -> Text -> Double -> Validation Text Product
mkProduct name description price =
  Product 
    <$> mkProductName name
    <*> mkProductDescription description
    <*> mkProductPrice price

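Notice that Validation isn’t an instance of Monad. A lawful Monad instance would have to make <*> agree with ap, which is built on >>= and therefore stops at the first Failure, because the next step needs a successful value to continue. So if we had used do syntax with Either, it would have been harder to refactor our code to use Validation, and we would have ended up using the applicative style anyway.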
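A small base-only sketch of the problem: in a do block, each step may use the value produced by the previous one, so Either’s Monad has no choice but to stop at the first Left. mkListing is a hypothetical example, and viaAp shows that ap (the <*> that the Monad laws would require) drops the second error too:

```haskell
import Control.Monad (ap)

-- Either's Monad: each step could use the value from the previous one,
-- so evaluation must stop at the first Left (there is no value to go on with).
mkListing :: String -> Int -> Either [String] (String, Int)
mkListing name price = do
  n <- if length name >= 5 then Right name else Left ["name too short"]
  p <- if price > 0 then Right price else Left ["price must be positive"]
  pure (n, p)

-- 'ap' is defined via (>>=), so it short-circuits as well.
viaAp :: Either [String] (Int, Int)
viaAp = ((,) <$> Left ["name too short"]) `ap` Left ["price must be positive"]
```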

Also notice that because we used pure instead of Right, we only had to change Either to Validation in our smart constructors’ types. We didn’t change any implementation code, only the types! .. We could even omit those type signatures altogether, and GHC is smart enough to infer them.

Now it’s time to test our code again. I’ll leave the simple cases for you to test on your own and jump directly to the last test we made in Part 1.

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product With Very Long Name" "Short Des" 20

Output: Failure "Product name shouldn't exceed 30 charactersProduct description should have at least 15 characters"

The first change we notice in the output is that we got Failure instead of Left, which is to be expected.

The second and more interesting change is that we actually got both error messages, but they were concatenated in a very ugly way! … Well, at least we are on the right track.

Remember this line?

Left  e1 <*> Left e2 = Left (e1 <> e2)

For Validation, the corresponding line is:

Failure  e1 <*> Failure e2 = Failure (e1 <> e2)

But that isn’t the point. The point is that e1 and e2 are of type Text, and (<>) for Text is simply concatenation! … We don’t want to concatenate error messages; it would be better to get them in a list, right?

The type [Text] is also a Semigroup, so we can use it to represent errors, and luckily (<>) for lists behaves just the way we want: it appends two lists together.
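Here is a quick side-by-side of the two Semigroup instances, assuming Data.Text from the text package in place of RIO’s re-export: Text’s (<>) glues the messages into one string, while the list instance keeps them as separate elements.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

nameError, descError :: Text
nameError = T.pack "Product name shouldn't exceed 30 characters"
descError = T.pack "Product description should have at least 15 characters"

-- Semigroup for Text: plain concatenation, with no separator.
asText :: Text
asText = nameError <> descError

-- Semigroup for [Text]: list append, so each message stays its own element.
asList :: [Text]
asList = [nameError] <> [descError]
```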

So let’s refactor our helper functions to use [Text] for errors:

...

minLength :: Text -> Int -> Text -> Validation [Text] ()
minLength fName n text =
  if T.length text >= n
    then pure ()
    else Failure $ [fName <> " should have at least " <> (T.pack . show) n <> " characters"]

...

And our smart constructors:

...

mkProductName :: Text -> Validation [Text] ProductName
mkProductName name =
  minLength "Product name" 5 name
    *> maxLength "Product name" 30 name
    *> (pure $ ProductName name)

...

mkProduct :: Text -> Text -> Double -> Validation [Text] Product
mkProduct name description price =
  Product 
    <$> mkProductName name
    <*> mkProductDescription description
    <*> mkProductPrice price

Notice again that for the smart constructors, no change in implementation was needed. We only changed types.

Now try testing it again:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product With Very Long Name" "Short Des" 20

Output: Failure ["Product name shouldn't exceed 30 characters","Product description should have at least 15 characters"]

And finally it works as expected!
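For reference, here is the whole [Text] version as one self-contained file, written as a sketch under two assumptions: a local Validation type stands in for Data.Validation, and Data.Text replaces RIO’s Text. The Eq instances are only there so results can be compared:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Assumptions: local Validation instead of Data.Validation,
-- Data.Text instead of RIO's Text re-export.
import Data.Text (Text)
import qualified Data.Text as T

data Validation err a = Failure err | Success a
  deriving (Show, Eq)

instance Functor (Validation err) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Failures on both sides are merged with (<>), i.e. list append for [Text].
instance Semigroup err => Applicative (Validation err) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success f  <*> r          = fmap f r

tshow :: Show a => a -> Text
tshow = T.pack . show

minLength :: Text -> Int -> Text -> Validation [Text] ()
minLength fName n text =
  if T.length text >= n
    then pure ()
    else Failure [fName <> " should have at least " <> tshow n <> " characters"]

maxLength :: Text -> Int -> Text -> Validation [Text] ()
maxLength fName n text =
  if T.length text <= n
    then pure ()
    else Failure [fName <> " shouldn't exceed " <> tshow n <> " characters"]

minNumber :: (Ord a, Show a) => Text -> a -> a -> Validation [Text] ()
minNumber fName n m =
  if m >= n then pure () else Failure [fName <> " should be at least " <> tshow n]

maxNumber :: (Ord a, Show a) => Text -> a -> a -> Validation [Text] ()
maxNumber fName n m =
  if m <= n then pure () else Failure [fName <> " shouldn't exceed " <> tshow n]

newtype ProductName = ProductName Text deriving (Show, Eq)
newtype ProductDescription = ProductDescription Text deriving (Show, Eq)
newtype ProductPrice = ProductPrice Double deriving (Show, Eq)

data Product = Product ProductName ProductDescription ProductPrice
  deriving (Show, Eq)

mkProductName :: Text -> Validation [Text] ProductName
mkProductName name =
  minLength "Product name" 5 name
    *> maxLength "Product name" 30 name
    *> pure (ProductName name)

mkProductDescription :: Text -> Validation [Text] ProductDescription
mkProductDescription description =
  minLength "Product description" 15 description
    *> maxLength "Product description" 300 description
    *> pure (ProductDescription description)

mkProductPrice :: Double -> Validation [Text] ProductPrice
mkProductPrice price =
  minNumber "Product price" 0.01 price
    *> maxNumber "Product price" 1000000 price
    *> pure (ProductPrice price)

mkProduct :: Text -> Text -> Double -> Validation [Text] Product
mkProduct name description price =
  Product
    <$> mkProductName name
    <*> mkProductDescription description
    <*> mkProductPrice price
```

Loaded into GHCi with :set -XOverloadedStrings, evaluating mkProduct "Test Product With Very Long Name" "Short Des" 20 should give the same Failure with both messages shown above.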

We could stop here, but one of the advantages (and disadvantages) of using Haskell is that you can always do better.

So far, we have to wrap our error messages in a list, even in the simplest helper functions, which can only emit one error! … I don’t like that, and it’s what we’re going to change in Part 3. Stay tuned!

Simple data validation in Haskell - Part 2