Following a Paginated API with Haskell and UnfoldrM
I’ve been using the Twitter API for Milo - a little command line application that limits my interaction with Twitter, to a select few specific accounts and searches. As with most APIs that have a large number of results, the Twitter API is paginated and you need to keep supplying it with some manner of cursor when going from one page of results to the next.
For the purposes of Milo, I wanted to access the latest direct messages for my user. Unfortunately the call to get your direct messages may return multiple pages of “empty” results and a cursor to the next page of results. I wanted to navigate these empty results and grab the first page with any results that were not empty - essentially just the latest direct message(s).
An example of an empty result:
{
"next_cursor": "some_hash",
"events": []
}
An example of a non-empty result:
{
"next_cursor": "some_hash",
"events": [
{ "id": "110", "created_timestamp": "5300", ... },
{ "id": "109", "created_timestamp": "5200", ... },
{ "id": "108", "created_timestamp": "5200", ... },
{ "id": "107", "created_timestamp": "5200", ... },
{ "id": "106", "created_timestamp": "5100", ... },
{ "id": "105", "created_timestamp": "5100", ... },
...
]
}
As we can see, the resulting Json payload has both the result (contents of the events
field) and the cursor
(next_cursor
).
Please note that I use Aeson to convert the Json payload into the DirectMessages
data type used in the example, but
I have omitted the bindings for clarity. The full Milo source can be found on Github.
I initially came up with this solution:
import qualified Data.Text as T
-- Wrapper around the hash return by the Twitter API
newtype Cursor = Cursor { unCursor :: T.Text } deriving Show
-- Data type to hold the list of direct messages and the cursor (if any)
-- Each DirectMessage maps to a single element in the `events` array
data DirectMessages = DirectMessages { messages :: [DirectMessage], cursorPosition :: Maybe T.Text } deriving Show
-- Function that returns the direct messages or an error
getDirectMessages :: IO (Either String DirectMessages)
= getMoreDirectMessages Nothing
getDirectMessages
-- Function that loops through the result pages using a cursor
getMoreDirectMessages :: Maybe Cursor -> IO (Either String DirectMessages)
= do
getMoreDirectMessages nextCursor <- callTwitterApi nextCursor
dmsE case dmsE of
Right dms@(DirectMessages messageList (Just nextCursor')) ->
if (null messageList) then -- if the messages are empty try to get more
do
fmap (combineDms dms)) <$> (getMoreDirectMessages (Just . Cursor $ nextCursor'))
(else pure . Right $ dms
Right dms@(DirectMessages _ Nothing) -> pure (Right dms) -- No more cursors so just stop
Left dmError -> pure . Left $ dmError
-- Function that collates direct messages
combineDms :: DirectMessages -> DirectMessages -> DirectMessages
-- Function that calls the Twitter API with the cursor (if any)
callTwitterApi :: Maybe Cursor -> IO (Either String DirectMessages)
Now while this works it has a few problems:
- It does not look very reusable, which it should be because pagination is a common problem
- There are dangling error cases where we just lift the error into some outer context
At this point I recalled seeing a function called unfold
somewhere that produced values until some exit condition was reached. I decided to track it down. I found unfoldr
which sort of fitted what I needed.
unfoldr
unfoldr :: (a -> Maybe (b, a)) -> a -> [b]
From the docs:
The unfoldr function is a `dual’ to foldr: while foldr reduces a list to a summary value, unfoldr builds a list from a seed value. The function takes the element and returns Nothing if it is done producing the list or returns Just (b,a), in which case, b is a prepended to the list and a is used as the next element in a recursive call.
This sounded promising. I needed to keep “producing” direct message results and stop as soon as I got some results that
were not empty. Unfortunately I needed to work within an effect (IO
) which this function did not support.
In any event let’s try and understand how this function works. This is the implementation of the the unfoldr
function:
unfoldr :: (a -> Maybe (b, a)) -> a -> [b]
=
unfoldr f a case f a of
Just (b,new_a) -> b : unfoldr f new_a
Nothing -> []
Given some generator function f
that takes a value of type a
, call f
with a
which returns a Maybe
with a pair of
values consisting of a result b
and the next value of a
to feed into the same function. The Maybe
is either a Just
value
with a new result b
and the next value of a
: new_a
. In this case the result b
is prepended to a list of results which
will be generated by recursively calling the unfoldr
function with f
and new_a
. In the Nothing
case
return an empty List.
Here’s a simple example that produces numbers from 1 to 10 and then stops:
import Data.List (unfoldr)
-> if a < 11 then Just (a, a + 1) else Nothing) 1
unfoldr (\a > [1,2,3,4,5,6,7,8,9,10]
Pretty neat but not what I exactly needed.
After some more digging around I stumbled across a library called monad-loops which had what I was after.
unfoldrM
unfoldrM :: Monad m => (a -> m (Maybe (b, a))) -> a -> m [b]
We can see from its function definition that it’s exactly the same as unfoldr
except the intermediate and final results
are within some Monad m
:
unfoldrM :: Monad m => (a -> m (Maybe (b, a))) -> a -> m [b]
unfoldr :: (a -> Maybe (b, a)) -> a -> [b]
This is the implementation of the unfoldrM
function (which is an alias to unfoldrM'
):
unfoldrM' :: (Monad m, MonadPlus f) => (a -> m (Maybe (b,a))) -> a -> m (f b)
= go
unfoldrM' f where go a = do
<- f a
x case x of
Nothing -> return mzero
Just (b, new_a) -> do
<- go new_a
rest return (return b `mplus` rest)
The implementation is very similar to unfoldr
with differences due to the selected effect type m
and result type f
.
Given some generator function f
that takes a value of type a
, it calls f
with a
within a do
block. It returns a Maybe
with a
pair of values; the result b
and the next value of type a
: new_a
, within a context m
. It extracts and pattern matches on the contextual result.
If it’s a Nothing
it returns the default value for the MonadPlus type f
(mzero
).
If the result is a Just
, it creates a nested do
block and recurses with the new_a
value to extract the final result rest
. It then
combines the rest
with the previous result b
according to the mplus
implementation for the MonadPlus
type f
and
returns the results in the context m
. unfoldrM
is a specialized version of unfoldrM'
where f
is a []
.
The solution
Now while this seemed to be what I needed it took a little while for me to understand how to use it in my use case. One
thing that stumped me was why the initial value a
was not a Maybe
. Surely the first time I called the Twitter API,
I would not have a cursor, so how could I represent it as an a
? Even if I did make a
a Maybe a
, how would I
distinguish between the initial call where I had no cursor and the final call where I would also have no cursor?!
My friend Adam stated the obvious so I could understand it:
Maybe does not satisfy your requirements because you need more than two states
Oh dear! Was I supposed to create some ADT with three states? I thought this was supposed to be some plug-and-play solution and it was turning out not to be.
I started off by creating the ADT for the states:
-- | An ADT to capture the three states of a cursor:
data CursorState a
= NewCursor -- NewCursor - Initial cursor state
| GoCursor (Cursor a) -- GoCursor - A state of having a cursor, typically used for iteration
| StopCursor -- StopCursor - The final cursor state
Now if I plug in my types into the unfoldrM
function I get the following:
-> m (Maybe (b, a))) -> a -> m [b]
unfoldrM (a -- 'a' is CursorState c
-- 'c' is the type of cursor data
unfoldrM :: (CursorState c -> m (Maybe (b, CursorState c))) -> CursorState c -> m [b]
-- 'm' is IO
unfoldrM :: (CursorState -> IO (Maybe (b, CursorState c ))) -> CursorState c -> IO [b]
-- 'b' is DirectMessages
unfoldrM :: (CursorState c -> IO (Maybe (DirectMessages, CursorState c))) -> CursorState c -> IO [DirectMessages]
Now this seems to make sense.
Given that I already had a function of type:
callTwitterApi :: Maybe Cursor -> IO (Either String DirectMessages)
How could I convert it to work with the above function definition?
I could define a function unfoldWith
as:
unfoldWith :: forall m b c. CursorState c -> m (Maybe (b, CursorState c))
NewCursor = undefined
unfoldWith GoCursor (Cursor nextCursor)) = undefined
unfoldWith (StopCursor = undefined unfoldWith
The simplest one to define is the StopCursor
variant:
unfoldWith :: forall m b c. Applicative m => CursorState c -> m (Maybe (b, CursorState c))
NewCursor = undefined
unfoldWith GoCursor (Cursor nextCursor)) = undefined
unfoldWith (StopCursor = pure Nothing unfoldWith
and that compiles!
Next I can tried to implement the NewCursor
variant:
unfoldWith :: forall m b c. Applicative m => CursorState c -> m (Maybe (b, CursorState c))
NewCursor = undefined -- I need to be able to use callTwitterApi here
unfoldWith GoCursor (Cursor nextCursor)) = undefined
unfoldWith (StopCursor = pure Nothing unfoldWith
generalising the callTwitterApi
function:
callTwitterApi :: Maybe (Cursor c) -> IO (Either String DirectMessages)
-- Since we need to reduce our Monad to an m, wrap the IO (Either String) in ExceptT
-- ExceptT String IO is 'm' (Essentially a wrapped (IO Either String))
-- DirectMessages is 'a'
callTwitterApi :: Maybe (Cursor c) -> ExceptT String IO DirectMessages
-- which simplifies to:
callTwitterApi :: Maybe (Cursor c) -> m DirectMessages
-- Now we should be able to define any API that gets some `a` as:
someApi :: Maybe (Cursor c) -> m a
-- passing in someApi to unfoldWith
unfoldWith :: forall m a b c. Applicative m => (Maybe (Cursor c) -> m a) -> CursorState c -> m (Maybe (b, CursorState c))
NewCursor = f Nothing -- call it with Nothing because we don't have a Cursor
unfoldWith f GoCursor (Cursor nextCursor)) = undefined
unfoldWith f (StopCursor = pure Nothing unfoldWith _
So far so good. But now we need to extract the result and the next cursor from response of the api call. When we call
someApi
we get a m a
in return:
someApi :: Maybe (Cursor c) -> m a
Note: To add type annotation to let expressions you need to enable the ScopedTypeVariables
language extension:
{-# LANGUAGE ScopedTypeVariables #-}
We need a function that transforms that a
into a pair of (b, CursorState c)
:
extractPayload :: a -> (b, CusorState c)
passing that into our unfoldWith
function:
unfoldWith :: forall m a b c. Applicative m => (a -> (b, CursorState c)) -> (Maybe (Cursor c) -> m a) -> CursorState c -> m (Maybe (b, CursorState c))
NewCursor =
unfoldWith extractPayload callApiWith let resultM :: m a = callApiWith Nothing
in Just . extractPayload <$> resultM
GoCursor (Cursor nextCursor)) = undefined
unfoldWith extract callApiWith (StopCursor = pure Nothing unfoldWith _ _
Seems to compile. Now we do the same of the GoCursor
case:
unfoldWith :: forall m a b c. Applicative m => (a -> (b, CursorState c)) -> (Maybe (Cursor c) -> m a) -> CursorState -> m (Maybe (b, CursorState))
NewCursor =
unfoldWith extractPayload callApiWith let resultM :: m a = callApiWith Nothing
in Just . extractPayload <$> resultM
GoCursor (Cursor nextCursor)) =
unfoldWith extract callApiWith (let resultM :: m a = callApiWith (Just nextCursor)
in Just . extractPayload <$> resultM
StopCursor = pure Nothing unfoldWith _ _
A DirectMessages
is defined as:
data DirectMessages = DirectMessages { messages :: [DirectMessage], cursorPosition :: Maybe T.Text } deriving Show
And now I just define a function that takes in a DirectMessages
type and returns a pair of ([DirectMessage], CursorState T.Text)
:
extractState :: DirectMessages -> ([DirectMessage], CursorState T.Text)
DirectMessages [] (Just c)) = ([], GoCursor (Cursor c)) -- No messages, but we have a cursor, then try to get more
extractState (DirectMessages [] Nothing) = ([], StopCursor) -- No messages and no cursor, then stop
extractState (DirectMessages msgs _) = (msgs, StopCursor) -- Messages so we can stop irrespective of the cursor extractState (
Now I can call unfoldrM
with:
import qualified Control.Monad.Except as Ex
callTwitterApi :: Ex.ExceptT String IO DirectMessages
getDirectMessages :: IO (Either String DirectMessages)
= Ex.runExceptT $ unfoldrM (unfoldWith extractState callTwitterApi) NewCursor getDirectMessages
and we have pagination!
Note how we had to unwrap the ExceptT
with Ex.runExceptT
to retrieve the wrapped IO (Either String DirectMessages)
.
The interesting point is that we now have a reusable function unfoldWith
which we can use with
any paginated API that returns us a payload with a result and a cursor.
A Simpler Example
If you got a little lost in the details of the above example, don’t worry. Here’s a simpler example to give you some intuition.
-- Payload type
data Packet = Packet { value :: String, cursor :: Maybe (Cursor Int) }
-- Function that mimicks a server response
serviceCall :: forall m . Applicative m => Maybe (Cursor Int) -> m Packet
Nothing = pure $ Packet "packet one" (Just $ Cursor 1)
serviceCall Just (Cursor cur))
serviceCall (| cur == 1 = pure $ Packet "packet two" (Just $ Cursor 2)
| cur == 2 = pure $ Packet "packet three" (Just $ Cursor 3)
| cur == 3 = pure $ Packet "packet four" (Just $ Cursor 4)
| otherwise = pure $ Packet "packet five" Nothing
-- Function that splits the payload into a result and the next CursorState
extractState :: Packet -> (String, CursorState Int)
Packet v (Just c)) = (v, GoCursor c)
extractState (Packet v Nothing) = (v, StopCursor) extractState (
As before we can use it with:
NewCursor unfoldrM (unfoldWith extractState serviceCall)
Using the above to log out the first three page responses:
import Control.Monad.Loops (unfoldrM)
import Data.List (intercalate)
run :: IO ()
=
run let resultsIO :: IO [String] = unfoldrM (unfoldWith extractState serviceCall) NewCursor
stringyfied :: IO String = (intercalate "," . take 3) <$> resultsIO
in stringyfied >>= putStrLn
which prints out:
packet one,packet two,packet three
I’m not sure if this is a “pattern” that people generally use but I can see myself using this for other paginated APIs.
The code for the Simpler Example.