Emad's Blog - Simple data validation in Haskell

One of the perks of using Haskell (or any language with ADTs) is that you can encode many business rules in the data types and make invalid data unrepresentable. But usually you can’t represent everything at the type level (unless you go crazy with dependent types), so you often need to do some validation at runtime, like checking the minimum or maximum length of a text.

For example, let’s say we want to model products that have a name, a description, and a price.

data Product = Product Text Text Double

This could work, but it isn’t very descriptive: we can’t even tell which Text is the name and which is the description. newtype comes to the rescue.

newtype ProductName = ProductName Text
newtype ProductDescription = ProductDescription Text
newtype ProductPrice = ProductPrice Double

data Product = Product ProductName ProductDescription ProductPrice

Now we can’t accidentally mix up names and descriptions. We feel good and start thinking about business rules. We ask the domain experts and they say:

The product name shouldn’t have fewer than 5 characters.
The product name shouldn’t have more than 30 characters.
The product description shouldn’t have fewer than 15 characters.
The product description shouldn’t have more than 300 characters.
The product price shouldn’t be less than 0.01 of our imaginary currency.
The product price shouldn’t be more than 1000000 of our imaginary currency.

So the question now is: how can we make product names with fewer than 5 characters unrepresentable?

Our ProductName constructor converts any text to a ProductName, and this isn’t what we want, so we can’t allow code in other modules to use this constructor and make invalid product names! But then how can they create valid product names? The answer is to use a smart constructor: put the business rules inside it, export it instead of the real constructor, and only create product names through it. If you come from the object-oriented world, this is kinda like the Factory pattern.

import RIO
import qualified RIO.Text as T

mkProductName :: Text -> ProductName
mkProductName name =
  if T.length name >= 5
  then ProductName name
  else ???

Now we don’t know what to return when the name is invalid! We could throw a runtime error, but this would break the totality of our function, and of course you know how much we love total functions. Throwing errors also makes it hard to see that this function may fail, so it’s likely that we will forget to handle its failure case somewhere.

In Haskell, we love to encode the possibility of failures in types and let the compiler force us to handle those failures whenever necessary, so let’s see our options here.

The first and simplest option to encode failures is Maybe. It represents something that either has a value or has nothing, so in our case the function would either produce a ProductName or produce nothing at all.

import RIO
import qualified RIO.Text as T

mkProductName :: Text -> Maybe ProductName
mkProductName name =
  if T.length name >= 5
  then Just $ ProductName name
  else Nothing

Now it’s obvious to anyone using this function that it can fail and that they have to take care of that. But sadly, this way we can’t tell the user why we refused their product name, and this doesn’t sound very helpful.
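To make the two ideas so far a bit more concrete, here is a minimal sketch of how a caller ends up using the Maybe version. It is only an illustration: it uses the plain text package instead of RIO so it loads without extra dependencies, the export list is shown as a comment, and productNameText and describeName are hypothetical names that don’t appear anywhere else in this post.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only. In a real project this would live in its own module with an
-- export list along the lines of
--   module ProductName (ProductName, mkProductName, productNameText) where
-- which exports the type but NOT its data constructor.
import Data.Text (Text)
import qualified Data.Text as T

newtype ProductName = ProductName Text

-- Smart constructor: the only intended way to obtain a ProductName.
mkProductName :: Text -> Maybe ProductName
mkProductName name =
  if T.length name >= 5
    then Just (ProductName name)
    else Nothing

-- Hypothetical accessor, so callers can read the text back
-- without having access to the constructor.
productNameText :: ProductName -> Text
productNameText (ProductName t) = t

-- Hypothetical caller: the Maybe in the type forces it to handle both outcomes.
describeName :: Text -> Text
describeName input =
  case mkProductName input of
    Just name -> "Accepted: " <> productNameText name
    Nothing   -> "Rejected"
```

Notice that in the Nothing branch the caller has nothing to say about why the name was rejected, which is exactly the limitation described above.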

A better solution is to use Either, which can represent values with two possibilities, so in our case, we can use it to represent either ProductName or some sort of error.

import RIO
import qualified RIO.Text as T

mkProductName :: Text -> Either ??? ProductName
mkProductName name =
  if T.length name >= 5
  then Right $ ProductName name
  else Left ???

Now we only need to choose a type to represent validation errors and return something of this type in case of any failure.

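A good thing about Either is that we can use any type to represent errors. For simplicity, we will choose Text, and the error will be just the failure message. (Writing a Text value as a string literal, as in the code below, requires the OverloadedStrings extension.)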

import RIO
import qualified RIO.Text as T

mkProductName :: Text -> Either Text ProductName
mkProductName name =
  if T.length name >= 5
  then Right $ ProductName name
  else Left "Product name should have at least 5 characters"
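Since any type can sit on the Left side, a dedicated error type is another option. The following is just a sketch of that alternative, not the approach this post continues with: NameError, NameTooShort and renderNameError are made-up names, and the plain text package stands in for RIO.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

newtype ProductName = ProductName Text
  deriving (Eq, Show)

-- Hypothetical error type; not part of the post's code.
data NameError
  = NameTooShort Int -- carries the required minimum length
  deriving (Eq, Show)

mkProductName :: Text -> Either NameError ProductName
mkProductName name =
  if T.length name >= 5
    then Right (ProductName name)
    else Left (NameTooShort 5)

-- Turning the error into a message becomes a separate step.
renderNameError :: NameError -> String
renderNameError (NameTooShort n) =
  "Product name should have at least " <> show n <> " characters"
```

The trade-off is a bit more code in exchange for errors that callers can pattern match on. A plain Text message is simpler, which is why the rest of the post sticks with it.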

So far so good. Using this smart constructor, nobody can create a product name that violates the first business rule. But what about the other business rules? We could complicate the if condition so that it covers all the rules related to the product name, but this doesn’t scale very well and would make our code ugly and hard to understand. We would also have to repeat all this complicated validation logic for other types of data like ProductDescription, so we’d better refactor.

import RIO
import qualified RIO.Text as T

minLength :: Text -> Int -> Text -> Either Text ()
minLength fName n text = 
  if T.length text >= n
  then Right ()
  else Left $ fName <> " should have at least " <> (T.pack . show) n <> " characters"

mkProductName :: Text -> Either Text ProductName
mkProductName name =
  minLength "Product name" 5 name *> 
  (Right $ ProductName name)

Now we have a new function called minLength which holds the logic of our first business rule, but we parameterized the field name and the minimum length to keep it general enough to work with other pieces of text data and still produce a meaningful error message.

The minLength function returns an Either Text (): a Left holds the error message, while Right () (the unit value) means that the validation passed without problems. In our smart constructor we use this function to validate the name: if the name is valid we create a ProductName, and if it isn’t we return the error we got from minLength.

In this example, I used the *> operator to sequence the two actions (validating the name and creating it). It works because Either Text is an instance of Applicative. For Either, *> sequences two values: if the first one is a Left (an error, in our case), the result of the whole thing is that Left; if it’s a Right, its value is discarded and the second Either becomes the result (in our case, the unit is discarded and the ProductName is created and returned).
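If you want to poke at this behaviour in GHCi, the three cases can be written down directly. This is a tiny sketch that uses String errors and made-up names, just to keep it dependency-free:

```haskell
-- The first value is a Left, so the second one is never looked at.
firstFails :: Either String Int
firstFails = Left "first check failed" *> Right 42
-- Left "first check failed"

-- The first value is a Right, so its unit is dropped and the second value wins.
bothPass :: Either String Int
bothPass = Right () *> Right 42
-- Right 42

-- The first check passes but the second one is itself an error.
secondFails :: Either String Int
secondFails = Right () *> Left "second check failed"
-- Left "second check failed"
```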

Note that Either Text is also an instance of Monad, so it would be just the same if we used >> instead of *>. We could even write this function using do notation like this:

mkProductName :: Text -> Either Text ProductName
mkProductName name = do
  minLength "Product name" 5 name
  Right $ ProductName name

I admit that the do syntax looks easier than the applicative style, but I preferred the applicative style because I’m cool and because Applicative is more general than Monad: it has more instances, so it’s usually easier to change code that depends on Applicative than code that depends on Monad.

In general, I think it’s a good idea to use the applicative style when you don’t really need monads.
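Here is a small sketch of what “easier to change” can look like in practice. The helper checked is hypothetical (it isn’t used anywhere else in this post); the point is only that a function written against Applicative doesn’t care which applicative it ends up running in.

```haskell
-- Runs two checks and, if both succeed, returns the value.
-- Only Applicative is required, so any applicative can be plugged in.
checked :: Applicative f => f () -> f () -> a -> f a
checked check1 check2 value = check1 *> check2 *> pure value

-- The same helper used with Either String ...
viaEither :: Either String Int
viaEither = checked (Right ()) (Left "too big") 7
-- Left "too big"

-- ... and with Maybe, without touching the helper.
viaMaybe :: Maybe Int
viaMaybe = checked (Just ()) (Just ()) 7
-- Just 7
```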

Now, let’s add more validations.

import RIO
import qualified RIO.Text as T

minLength :: Text -> Int -> Text -> Either Text ()
minLength fName n text = 
  if T.length text >= n
  then pure ()
  else Left $ fName <> " should have at least " <> (T.pack . show) n <> " characters"

maxLength :: Text -> Int -> Text -> Either Text ()
maxLength fName n text = 
  if T.length text <= n
  then pure ()
  else Left $ fName <> " shouldn't exceed " <> (T.pack . show) n <> " characters"


mkProductName :: Text -> Either Text ProductName
mkProductName name = 
  minLength "Product name" 5 name *>
  maxLength "Product name" 30 name *>
  (pure $ ProductName name)

mkProductDescription :: Text -> Either Text ProductDescription
mkProductDescription description = 
  minLength "Product description" 15 description *>
  maxLength "Product description" 300 description *>
  (pure $ ProductDescription description)

Notice that I replaced all calls to Right with calls to pure. pure just lifts a value into an applicative, and when the expected applicative is Either Text, pure is just the same as Right.

It just makes more sense to me to use pure there because it works for all applicatives! So again, it makes our code easier to change in case we want to use something other than Either later.
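A quick way to convince yourself is to ask for the same pure 1 at a few different types. This is just an illustrative sketch with made-up names:

```haskell
-- The type annotation alone decides what pure builds.
asEither :: Either String Int
asEither = pure 1 -- same as Right 1

asMaybe :: Maybe Int
asMaybe = pure 1 -- same as Just 1

asList :: [Int]
asList = pure 1 -- same as [1]
```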

Also, notice the way *> allows us to chain validations one after the other.

Now we have smart constructors for ProductName and ProductDescription. We still need one for ProductPrice, and then we will finally be able to make one for Product itself and start making and validating products.

minNumber :: (Ord a, Show a) => Text -> a -> a -> Either Text ()
minNumber fName n m =
  if m >= n
    then pure ()
    else Left $ fName <> " should be at least " <> (T.pack . show) n

maxNumber :: (Ord a, Show a) => Text -> a -> a -> Either Text ()
maxNumber fName n m =
  if m <= n
    then pure ()
    else Left $ fName <> " shouldn't exceed " <> (T.pack . show) n

mkProductPrice :: Double -> Either Text ProductPrice
mkProductPrice price = 
  minNumber "Product price" 0.01 price *>
  maxNumber "Product price" 1000000 price *>
  (pure $ ProductPrice price)

The only thing that’s not very obvious here is the (Ord a, Show a) => ... part. It means that the function accepts values of any type a, provided that a is an instance of both Ord and Show, which basically means that our function isn’t limited to Double numbers.
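As a small illustration of that flexibility, the same minNumber can check an Int and a Double. The two fields below (a stock count and a weight) are invented for this sketch and aren’t part of our product model, and the plain text package is used in place of RIO:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

minNumber :: (Ord a, Show a) => Text -> a -> a -> Either Text ()
minNumber fName n m =
  if m >= n
    then pure ()
    else Left $ fName <> " should be at least " <> (T.pack . show) n

-- Hypothetical Int field: 0 is below the minimum of 1, so this is a Left.
stockCheck :: Either Text ()
stockCheck = minNumber "Stock count" (1 :: Int) 0
-- Left "Stock count should be at least 1"

-- Hypothetical Double field: 2.0 is above the minimum of 0.5, so this passes.
weightCheck :: Either Text ()
weightCheck = minNumber "Weight" (0.5 :: Double) 2.0
-- Right ()
```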

Now we’re finally ready to write a smart constructor for the whole Product, and make sure we always have valid products!

But we have a problem! So far, the only way to create a product is by using the dumb constructor Product, which is a function with this type:

Product :: ProductName -> ProductDescription -> ProductPrice -> Product

It needs a ProductName as its first argument, but our lovely mkProductName doesn’t actually produce a ProductName! It gives us an Either Text ProductName, and the same goes for all of our smart constructors!

It seems like we need a function with this type instead:

Either Text ProductName -> Either Text ProductDescription -> Either Text ProductPrice -> Either Text Product

And this is exactly what liftA3 gives us; again we make use of Either Text being an Applicative here. So we can use liftA3 to write our new smart constructor for Product like this:

mkProduct :: Text -> Text -> Double -> Either Text Product
mkProduct name description price =
  liftA3 Product 
    (mkProductName name) 
    (mkProductDescription description) 
    (mkProductPrice price)

To understand what happens here, let’s look at the type of liftA3.

liftA3 :: Applicative f => (a -> b -> c -> d) -> f a -> f b -> f c -> f d

This function simply converts a function of any 3 parameters into a function that expects the same parameters wrapped in some f, provided that f is an instance of Applicative (in our case, f is Either Text).

Very handy, right? And yes, there’s a liftA2 which does the same thing for functions with 2 parameters. There’s also liftA for functions with 1 parameter, but it’s much less popular because it does the same thing as the famous fmap (or <$>).
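To see the whole liftA family side by side on something smaller than Product, here is a short sketch. sum3 and the other names are made up for the example:

```haskell
import Control.Applicative (liftA, liftA2, liftA3)

sum3 :: Int -> Int -> Int -> Int
sum3 a b c = a + b + c

-- liftA3: all three arguments are present, so the function is applied.
allPresent :: Maybe Int
allPresent = liftA3 sum3 (Just 1) (Just 2) (Just 3)
-- Just 6

-- One argument is missing, so the whole result is missing.
oneMissing :: Maybe Int
oneMissing = liftA3 sum3 (Just 1) Nothing (Just 3)
-- Nothing

-- liftA2 does the same for a 2-parameter function (here, the pair constructor).
pair :: Either String (Int, Char)
pair = liftA2 (,) (Right 1) (Right 'x')
-- Right (1,'x')

-- liftA and fmap give the same answer.
viaLiftA, viaFmap :: Either String Int
viaLiftA = liftA (+ 1) (Right 1)
viaFmap = fmap (+ 1) (Right 1)
-- both are Right 2
```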

But what if we have more than 3 parameters? We could write a liftA4, a liftA5, and so on, but this doesn’t sound like a generic solution. We need something that works for any function regardless of its arity!

And here currying comes to the rescue. Currying lets us treat every function as a 1-ary function (a function with exactly one parameter). To understand this, let’s look at the Product dumb constructor. Again, it has this type:

Product :: ProductName -> ProductDescription -> ProductPrice -> Product

Look at this type. You can think of it as a 3-ary function that accepts a ProductName, a ProductDescription, and a ProductPrice as parameters and returns a Product. Or you can think of it as a 1-ary function that accepts a ProductName and returns another function of this type:

ProductDescription -> ProductPrice -> Product
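If currying is new to you, it may help to watch it happen with a simpler constructor first. The Item type below is invented for this sketch (it has nothing to do with our Product); each step supplies exactly one argument and gets back a function waiting for the rest.

```haskell
-- Hypothetical type, used only to demonstrate partial application.
data Item = Item String Int Double
  deriving (Eq, Show)

-- Supplying only the first argument yields a function awaiting the other two.
withName :: Int -> Double -> Item
withName = Item "Keyboard"

-- Supplying the second argument leaves a function awaiting the last one.
withNameAndQty :: Double -> Item
withNameAndQty = withName 3

-- Supplying the last argument finally produces an Item.
item :: Item
item = withNameAndQty 49.9
-- Item "Keyboard" 3 49.9
```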

And this allows us to think of the lifting problem one parameter at a time! So let’s focus on the first parameter, ProductName: we want to convert our function Product, which accepts a ProductName, into another function that accepts an Either Text ProductName.

We’ve seen this before, and since it’s only one parameter we only need liftA so the following should happily compile.

liftedProduct :: Either Text ProductName -> Either Text (ProductDescription -> ProductPrice -> Product)
liftedProduct = liftA Product

And since liftA is identical to fmap we can write the same code like this:

liftedProduct :: Either Text ProductName -> Either Text (ProductDescription -> ProductPrice -> Product)
liftedProduct = fmap Product

Or even like this:

liftedProduct :: Either Text ProductName -> Either Text (ProductDescription -> ProductPrice -> Product)
liftedProduct = (<$>) Product

We go one step further and give this new function what it wants (an Either Text ProductName) and see what happens:

liftedProduct :: Either Text (ProductDescription -> ProductPrice -> Product)
liftedProduct = fmap Product (mkProductName "Test Product")

Or better:

liftedProduct :: Either Text (ProductDescription -> ProductPrice -> Product)
liftedProduct = Product <$> mkProductName "Test Product"

So far so good: we could apply our function to the first argument. But is it ready to accept the second argument? As we can see, liftedProduct has the function we need in its type, but it’s wrapped inside Either Text!

Unfortunately, we can’t use liftA or fmap again: they only work when we need to apply a normal function to a wrapped argument, but now we have a wrapped function and we want to apply it to a wrapped argument!

Again, Applicative has a neat solution for this. Because both our function and the argument are wrapped in the same Applicative (Either Text in our case), we can use <*>, which solves this exact problem. Let’s see how:

liftedProduct' :: Either Text (ProductPrice -> Product)
liftedProduct' =
  Product
  <$> mkProductName "Test Product"
  <*> mkProductDescription "Test Product Description"
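Stripped of the product types, <*> for Either boils down to three cases. Here is a minimal sketch with String errors and made-up names that you can paste into a file and load in GHCi:

```haskell
-- A function that is already wrapped in Either.
wrappedFn :: Either String (Int -> Int)
wrappedFn = Right (+ 1)

-- Wrapped function applied to a wrapped argument.
applied :: Either String Int
applied = wrappedFn <*> Right 2
-- Right 3

-- If the argument is an error, the error is the result.
badArgument :: Either String Int
badArgument = wrappedFn <*> Left "no number"
-- Left "no number"

-- If the function itself is an error, that error is the result.
badFunction :: Either String Int
badFunction = Left "no function" <*> Right 2
-- Left "no function"
```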

And we can do the same trick for the last argument:

finallyAProduct :: Either Text Product
finallyAProduct =
  Product
  <$> mkProductName "Test Product"
  <*> mkProductDescription "Test Product Description"
  <*> mkProductPrice 20

And finally, we got a valid Product! We can use what we learned so far to refactor the mkProduct smart constructor to use <$> and <*>, because it looks cooler than liftA3 and we want our constructor to be ready for more parameters later.

mkProduct :: Text -> Text -> Double -> Either Text Product
mkProduct name description price = 
  Product                               -- ProductName -> ProductDescription -> ProductPrice -> Product
  <$> mkProductName name                -- Either Text (ProductDescription -> ProductPrice -> Product)
  <*> mkProductDescription description  -- Either Text (ProductPrice -> Product)
  <*> mkProductPrice price              -- Either Text Product

Notice that Either Text is also a Monad and we could write our smart constructor using do syntax like this:

mkProduct :: Text -> Text -> Double -> Either Text Product
mkProduct name description price = do
  productName <- mkProductName name
  productDescription <- mkProductDescription description
  productPrice <- mkProductPrice price
  pure $ Product productName productDescription productPrice

This does exactly the same thing, except that the applicative style is cooler and more general. Again, remember that Applicative has more instances than Monad (in other words, all monads are applicatives, but not all applicatives are monads).

Another thing to note here is that the way we used <$> and <*> is very common in idiomatic Haskell code, so you want to make sure you understand them well!

That was a lot of talk about Applicative; I almost forgot I’m writing about data validation :) But I think it was worth it.

Now let’s see what we’ve done so far. We wrote some helper validation functions:

import RIO
import qualified RIO.Text as T

minLength :: Text -> Int -> Text -> Either Text ()
minLength fName n text =
  if T.length text >= n
    then pure ()
    else Left $ fName <> " should have at least " <> (T.pack . show) n <> " characters"

maxLength :: Text -> Int -> Text -> Either Text ()
maxLength fName n text =
  if T.length text <= n
    then pure ()
    else Left $ fName <> " shouldn't exceed " <> (T.pack . show) n <> " characters"

minNumber :: (Ord a, Show a) => Text -> a -> a -> Either Text ()
minNumber fName n m =
  if m >= n
    then pure ()
    else Left $ fName <> " should be at least " <> (T.pack . show) n

maxNumber :: (Ord a, Show a) => Text -> a -> a -> Either Text ()
maxNumber fName n m =
  if m <= n
    then pure ()
    else Left $ fName <> " shouldn't exceed " <> (T.pack . show) n

And we used those functions in our smart constructors to validate our products and their inner pieces of data:

newtype ProductName = ProductName Text
newtype ProductDescription = ProductDescription Text
newtype ProductPrice = ProductPrice Double

data Product = Product ProductName ProductDescription ProductPrice

mkProductName :: Text -> Either Text ProductName
mkProductName name =
  minLength "Product name" 5 name
    *> maxLength "Product name" 30 name
    *> (pure $ ProductName name)

mkProductDescription :: Text -> Either Text ProductDescription
mkProductDescription description =
  minLength "Product description" 15 description
    *> maxLength "Product description" 300 description
    *> (pure $ ProductDescription description)

mkProductPrice :: Double -> Either Text ProductPrice
mkProductPrice price =
  minNumber "Product price" 0.01 price
    *> maxNumber "Product price" 1000000 price
    *> (pure $ ProductPrice price)

mkProduct :: Text -> Text -> Double -> Either Text Product
mkProduct name description price =
  Product 
    <$> mkProductName name
    <*> mkProductDescription description
    <*> mkProductPrice price

Now let’s try to test our code and make some products. But before that, we need our new types to be instances of the Show typeclass to be able to print our products to console, so let’s ask GHC to derive Show for us:

newtype ProductName = ProductName Text deriving Show
newtype ProductDescription = ProductDescription Text deriving Show
newtype ProductPrice = ProductPrice Double deriving Show

data Product = Product ProductName ProductDescription ProductPrice
  deriving Show

Now we make a valid product and print it to the console:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product" "Test Product Description" 20

Output: Right (Product (ProductName "Test Product") (ProductDescription "Test Product Description") (ProductPrice 20.0))

It worked fine, now let’s try an invalid product. For example, one with a very short name:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "T" "Test Product Description" 20

Output: Left "Product name should have at least 5 characters"

So far so good, let’s try another one with a short description:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product" "Short Des" 20

Output: Left "Product description should have at least 15 characters"

Another one with an insane price:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product" "Test Product Description" 99999999999999

Output: Left "Product price shouldn't exceed 1000000.0"

It seems to work well, but what if we have more than one invalid piece of data, like a very long name and a very short description? Let’s find out:

main :: IO ()
main = 
  putStrLn $ show $ mkProduct "Test Product With Very Long Name" "Short Des" 20

Output: Left "Product name shouldn't exceed 30 characters"

Hmm, it seems that it can only give us one error at a time. It behaves as if it short-circuits the execution once it finds the first error and just returns that error!
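You can reproduce this behaviour without any of our product code. In the sketch below (String errors, made-up names) both sides are errors, yet only the first Left makes it to the result:

```haskell
-- Both arguments are invalid, yet only the first Left survives.
bothInvalid :: Either String (Int, Int)
bothInvalid = (,) <$> Left "first error" <*> Left "second error"
-- Left "first error"

-- The same happens with *>, which is what our smart constructors chain with.
twoChecks :: Either String ()
twoChecks = Left "too short" *> Left "bad characters"
-- Left "too short"
```

Either can only carry one Left value, so there is simply no room in the result for the second error.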

If this is what you want, congratulations, we’re done and you can start using your smart constructors everywhere. However, our users would usually prefer to see all errors and fix them all at once instead of fixing one at a time.

So this short-circuiting behavior isn’t always sufficient, and we need to know why it happens and improve it somehow. We will do this in the next post!

I hope this post was useful to you. In part 2 we will improve our validation strategy to return all errors, not only the first one. While doing this, we will learn even more about Applicative and Semigroup. Stay tuned!
