19 Jun 2020

Hacker News GraphQL

18 minutes

Hello dear reader, it's May 2020, amid the seemingly endless Covid-19 crisis, which left some developers with nowhere to go to grab a (lot of) espresso and sent them back to old, forgotten projects.
While I was studying GraphQL last year, I thought it would be cool to apply it to a practical Clojure application and learn two things at once. It's a good idea, I thought. I had also only done some introductions to Datomic, so why not throw it in the bucket too… and maybe Re-Frame as well, though let me make it clear: Re-Frame was only an afterthought.
While development guides are awesome, I find that they are just a starting point. If you truly want to understand a programming tool, you should apply it afterwards without a guide: only you, your debugger and, in our case, the REPL.

Datomic

The big powerhouse, Datomic. I probably wouldn't use it in an application like Hacker News: first, I don't see the need to store the data as immutable facts here, and Hacker News also blocks edits shortly after you post, so there isn't much sensitive history that must never be lost. Still, it was quite interesting to use.
I decided to use the On-Prem version: I started with the Free edition and moved to Starter. It's really easy to set up, and so far Cognitect has never spammed me. There is also the Cloud version, hosted on AWS. I didn't test it in this project, but I'm almost sure it would work effortlessly, since I only used the Datomic Client API; check here to read about the differences between the Client and Peer libraries.

Datalog

Datomic uses Datalog instead of SQL. There are a lot of talks about it on YouTube, mostly on ClojureTV; if you have never used Datomic, I recommend watching the most recent Day of Datomic videos on that channel.
I don't remember whose video it was, but someone said that one of the main benefits of Datalog over SQL is that it should feel more "natural" and be easier to write; I have also read that some people just hate it. I stay in the middle ground and don't mind either. I don't remember having many issues with SQL, and I didn't have them with Datalog either (although I have never used it in a "real" job), so I can't say much. Also, my only use of Datalog was with Datomic, which is itself different from any SQL or NoSQL database I've used.

GraphQL

The hot new (not so new anymore) take on APIs. Working with it should be an easy and straightforward experience: there is a single endpoint where you can query or mutate your data, and it comes with a built-in interface for quick testing. Still, I had some issues with it… I chose Lacinia Pedestal as the library, since I had some experience with Pedestal.

JWT

To secure my application I chose JWTs, using the Buddy libraries. My setup was to issue a JWT valid for 5 minutes together with a refresh token valid for 30 days.
The idea of a JWT is that you issue a token that you don't need to store anywhere: it can be validated by a function using only the token itself and the server's secret key. It's stateless.
The refresh token works differently: it is used to renew the client's JWT. Remember that website you log in to almost every day that always asks for your username and password? Maybe they haven't implemented refresh tokens, which I guess is still better than a long-lived JWT. Without storing and checking a refresh token, a device holding a long-lived JWT could stay logged in to someone's account indefinitely, and something like revoking access on all devices after you change your password wouldn't be possible. There is a huge discussion about this, but at least for me it is a good solution. Check this auth0 post for a better explanation.
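
Buddy does the real work in the project, but the property that makes a JWT stateless fits in a few lines: the server signs the payload with a secret, and whoever holds that secret can later verify the signature without any database lookup. The sketch below is a simplified illustration using only the JDK's HMAC-SHA256; it is not Buddy's API and not a complete JWT (no header, no exp claim), and sign-token and verify-token are hypothetical names.

```clojure
(ns example.stateless-token
  ;; Hypothetical sketch: an HMAC-signed "payload.signature" token, not a real JWT.
  (:require [clojure.string :as str])
  (:import (java.nio.charset StandardCharsets)
           (java.security MessageDigest)
           (java.util Base64)
           (javax.crypto Mac)
           (javax.crypto.spec SecretKeySpec)))

(defn- utf8 [^String s]
  (.getBytes s StandardCharsets/UTF_8))

(defn- b64url [^bytes bs]
  (.encodeToString (.withoutPadding (Base64/getUrlEncoder)) bs))

(defn- hmac-sha256 [^String secret ^String data]
  (let [mac (Mac/getInstance "HmacSHA256")]
    (.init mac (SecretKeySpec. (utf8 secret) "HmacSHA256"))
    (.doFinal mac (utf8 data))))

(defn sign-token
  "Returns \"<payload>.<signature>\", both parts base64url-encoded."
  [secret ^String payload]
  (let [p (b64url (utf8 payload))]
    (str p "." (b64url (hmac-sha256 secret p)))))

(defn verify-token
  "Returns the decoded payload when the signature matches, otherwise nil.
  Only the token and the secret are needed: nothing is stored server-side."
  [secret ^String token]
  (let [[p sig] (str/split token #"\." 2)]
    (when (and p sig
               ;; constant-time comparison of the received and expected signatures
               (MessageDigest/isEqual (utf8 sig)
                                      (utf8 (b64url (hmac-sha256 secret p)))))
      (String. (.decode (Base64/getUrlDecoder) ^String p) StandardCharsets/UTF_8))))
```

The flip side is that such a token cannot be revoked before it expires, which is exactly why the JWT is kept short-lived and the revocable state lives in the refresh token.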

Re-Frame

After working with Reagent it looked like the logical next step, and I regret not using it sooner. It's super simple, even when you need to query an API. You must read and do their walkthrough before touching it; it's not super fast, but it's worth it. One of the things I like the most is the "events" and "subs" syntax: it's really easy to read, and if you use Cursive it's also easy to navigate. The single data store is also really nice.

The Project

Well, I think that's enough about the chosen tools; time to start with the project. I will be using Component again, and I expect you to have some knowledge of the tools mentioned above, but if you need any help feel free to reach out to me on Twitter. Let me start with the back-end.

One of the main differences from a "usual" Pedestal API is that we have a GraphQL schema file and also a Schema namespace where the resolvers live. The Schema namespace may contain extra resolvers that are only added to the schema file later, but every query and mutation in the schema file must have a resolver.

These are all the resolvers:


(defn resolver-map [db]
  {:query/feed              (get-feed db)
   :query/link              (get-link db)
   :query/comment           (get-comment db)
   :query/comments          (get-comments db)
   :query/user_comments     (get-user-comments db)
   :query/user_posts        (get-user-posts db)
   :query/user_description  (get-user-description db)
   :mutation/delete         (delete-link db)
   :mutation/post           (post-link db)
   :mutation/signup         (signup db)
   :mutation/update         (update-link db)
   :mutation/vote           (vote-link db)
   :mutation/login          (login-user db)
   :mutation/refresh        (refresh db)
   :mutation/post_comment   (post-comment db)
   :mutation/vote_comment   (vote-comment db)
   :mutation/delete_comment (delete-comment db)
   :mutation/edit_comment   (edit-comment db)})

And one of the queries, as an example:


:feed
  {:type        (list :Link)
   :description "Feed query"
   :args
                {:skip    {:type Int}
                 :first   {:type Int}
                 :orderby {:type String}}
   :resolve     :query/feed}
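
Lacinia connects each :resolve keyword in the schema to a function from the resolver map (the library provides com.walmartlabs.lacinia.util/attach-resolvers for this). The sketch below is only a plain-Clojure model of that step, not Lacinia's implementation, and attach-resolvers-sketch is a hypothetical name. In the sketch, a keyword with no matching entry in the map makes the attachment fail, mirroring the rule that every query and mutation needs a resolver.

```clojure
(ns example.attach-resolvers
  (:require [clojure.walk :as walk]))

;; Hypothetical sketch, not Lacinia's code: swap every :resolve keyword in the
;; schema for the function registered under that keyword.
(defn attach-resolvers-sketch [schema resolver-map]
  (walk/postwalk
   (fn [node]
     (if (and (map? node) (keyword? (:resolve node)))
       (let [k (:resolve node)
             f (get resolver-map k)]
         (when-not f
           ;; fail fast: a schema field points at a resolver that doesn't exist
           (throw (ex-info "No resolver for schema field" {:resolver k})))
         (assoc node :resolve f))
       node))
   schema))
```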

The resolvers must return a map, or a vector of maps, whose keys match the field names in the schema. For example, the function returned by (get-feed db) must return something like [{:id 1 :url xyz :order 2 …},{…},…]. Keys that are not in the schema are dropped, and a missing key comes back as null when that field is requested.
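
To make that rule concrete, here is a toy model of it on plain maps: only the requested fields survive, and a field the resolver didn't provide comes back as nil (null in the JSON response). It illustrates the behavior described above and is not how Lacinia works internally; shape-row and shape-result are hypothetical names.

```clojure
(ns example.field-selection)

;; Hypothetical toy model: keep exactly the requested fields of one result map.
(defn shape-row [requested-fields row]
  (into {} (map (fn [field] [field (get row field)])) requested-fields))

;; Same thing for a resolver that returns a vector of maps.
(defn shape-result [requested-fields rows]
  (mapv #(shape-row requested-fields %) rows))
```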

Since Datomic queries can return maps directly (with :keys), this part was fairly painless; I just had to make sure not to mess up the key names.

Here I wrote my first custom Pedestal interceptor, to turn the request headers into a keyword map. If you need to write interceptors, keep in mind that their position in the chain matters: here I was changing data that would be added to the Lacinia application context, so I had to inject the new interceptor at a step before that data was added to the context.


(defn p-h [p-k]
  [(keyword (first p-k)) (last p-k)])

(def attach-headers
  {:name  ::attach-headers
   :enter (fn [context]
            (assoc-in context [:request :headers] (into {} (map p-h (:headers (:request context))))))})

(defn- inject-user-info-interceptor [interceptors]
  (lacinia/inject interceptors attach-headers :before ::lacinia/inject-app-context))

This lets me extract the token and the refresh token in the resolvers that need them:


(defn- token-extractor [ctx]
  (:authorization (:headers (:request ctx))))

(defn- refresh-extractor [ctx]
  (:refresh (:headers (:request ctx))))
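
Ring, and therefore Pedestal, delivers header names as lowercase strings, so a client sending Authorization ends up as "authorization" in the request map, and the interceptor turns it into :authorization. The self-contained sketch below repeats the same idea with update-in on a hand-built context, so the transformation can be checked at the REPL without running Pedestal; keywordize-headers is a hypothetical name and the header values are made up.

```clojure
(ns example.headers)

;; Hypothetical sketch of the attach-headers idea: keywordize the
;; lowercase string header names found under [:request :headers].
(defn keywordize-headers [context]
  (update-in context [:request :headers]
             (fn [headers]
               (into {} (map (fn [[k v]] [(keyword k) v])) headers))))

;; Same extractors as above, written with get-in.
(defn token-extractor [ctx]
  (get-in ctx [:request :headers :authorization]))

(defn refresh-extractor [ctx]
  (get-in ctx [:request :headers :refresh]))
```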

Most of the resolvers are open, but some require authentication. In those, I need to extract the token and check that it's valid, and, when the request edits or deletes something, also check that the user is authorized to do it.

The update-link function below first checks that there is a token, then that the link URL and description are valid, and then that the user is authorized to update the post, by checking that the e-mail in the JWT matches the post's author in the database. If every check passes, the update is executed. I could have used a single if statement, but I wanted to return distinct error messages, and I think it's still easy to read.


(defn update-link
  [db]
  (fn [context args _value]
    (let [bearer-token (token-extractor context)
          {description :description
           url         :url
           id          :id} args]
      (if (nil? bearer-token)
        {:error "You must log in to edit"}
        (let [post-data            (datomic/get-post-user-info-by-id db id)
              user-email           (:user (authentication/get-user-from-token bearer-token))
              validate-url         (utils/validate-url url)
              validate-description (utils/validate-description description)]
          (if (and validate-url validate-description)
            ;; only the author of the post may edit it
            (if (authorization/authorized-delete-post? post-data user-email)
              (datomic/update-link db id description url)
              {:error "You are not authorized to edit this post"})
            {:error "You must include only https:// links and at least 20 words in the description"}))))))
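
If the checks ever multiply, cond keeps the same distinct error messages without the staircase of ifs. A minimal sketch, assuming the checks have already been reduced to booleans; update-error and its argument keys are hypothetical. In the real resolver each cond test would call the validators directly, because cond only evaluates a test when the previous ones failed, so a missing token would never be decoded.

```clojure
(ns example.update-checks)

;; Hypothetical sketch: returns nil when the edit may proceed, otherwise the
;; first failing check as an {:error ...} map, in the same order as update-link.
(defn update-error [{:keys [token url-valid? description-valid? owner?]}]
  (cond
    (nil? token)
    {:error "You must log in to edit"}

    (not (and url-valid? description-valid?))
    {:error "You must include only https:// links and at least 20 words in the description"}

    (not owner?)
    {:error "You are not authorized to edit this post"}

    :else nil))
```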

Since post ownership can't change, I can simply pass the "db" argument between the functions instead of a fixed database value.

For issuing and validating JWTs in the login process I used the Buddy library: first the hashed password is fetched from the database and checked, then the JWT is issued. A refresh token is issued along with the bearer token, which takes a few extra steps, since the refresh token must be stored in the database.


(defn generate-refresh-token [db email pwd enc-pwd]
  (if (auth-user pwd enc-pwd)
    (let [token (datomic/register-refresh-token db email)
          sign (claims-refresh token)]
      (jwt/sign sign system-secret))))

The process of issuing a new JWT with the refresh token is shown below. It extracts the id of the refresh token and uses it to query the database; if the token is still flagged as valid and hasn't expired, it generates a new JWT for the user. This process is quite important. First, the date check prevents a refresh token from being valid forever. Second, it allows refresh tokens to be revoked. For example, if you use your account on computers X and Y and for some reason lose access to computer Y, you could revoke its access without even changing your password, if the system had a "devices" page. And when a user changes the password, every refresh token of that user can be voided, so other devices could only access the account for at most 5 more minutes, until their current JWT expires.

There is a step not implemented here: issuing a new refresh token when the expiration date approaches. If the refresh token has only 10 days left, for example, a new one should be issued, so the user would never be asked for the password again, as long as they use the site once in a while.


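(defn refresh-process [db token]
  (try
    (let [unsigned       (jwt/unsign token system-secret)
          datomic-result (datomic/get-refresh-token-data db (:id unsigned))
          user-email     (:email datomic-result)
          date           (:until datomic-result)
          valid          (:valid datomic-result)]
      ;; the token must still be flagged valid in the database and not be expired
      (if (and (= :true valid) (jt/before? (jt/instant) (jt/instant date)))
        {:user (datomic/get-user-info-auth db user-email) :token (sign-data user-email) :refresh token}
        {:error "Refresh token is not valid"}))
    ;; buddy's jwt/unsign throws an ExceptionInfo for tampered or malformed tokens
    (catch clojure.lang.ExceptionInfo _
      {:error "Refresh token is not valid"})))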
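
The checks around refresh tokens are easy to model with java.time. The sketch below assumes a hypothetical in-memory store mapping a token id to :email, :until and a boolean :valid (the project's code stores :valid as :true), and shows the expiry check, revoking every token of a user after a password change, and the rotation step left unimplemented above. usable?, revoke-user-tokens and needs-rotation? are illustrative names, not part of the project.

```clojure
(ns example.refresh-tokens
  (:import (java.time Duration Instant)))

;; Hypothetical sketch: a refresh token is usable while flagged valid and unexpired.
(defn usable? [{:keys [valid until]} ^Instant now]
  (boolean (and valid until (.isBefore now ^Instant until))))

;; Hypothetical sketch: void every token of `email`, e.g. after a password change.
;; `store` maps token id -> {:email .. :until .. :valid ..}.
(defn revoke-user-tokens [store email]
  (into {}
        (map (fn [[id t]]
               [id (if (= email (:email t)) (assoc t :valid false) t)]))
        store))

;; Hypothetical sketch of the missing step: rotate when less than `threshold` is left.
(defn needs-rotation? [{:keys [until]} ^Instant now ^Duration threshold]
  (.isBefore ^Instant until (.plus now threshold)))
```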

So this is pretty much how the resolvers work. Now let's move on to the Datomic part. There are five "tables" (groups of attributes sharing a namespace): :link, :auth, :user, :vote and :comment.

I don't think I did anything too special in the tables. :link has a :link/order field, which is used for pagination. :auth stores the hashed refresh token, to check its date and validity as mentioned above. :user stores user data, but it would need changes for a "real" Hacker News, since there is no :user/type here: this project has no admins, which a forum should probably have to deal with trolls (you know, they are everywhere). :vote stores both comment and link votes. Something that may be interesting to look at is in the :comment table.


{:db/ident       :comment/father
 :db/valueType   :db.type/ref
 :db/cardinality :db.cardinality/one}
{:db/ident       :comment/link
 :db/valueType   :db.type/ref
 :db/cardinality :db.cardinality/one}

To query every comment of a link faster, I later added the :comment/link field: a comment's father may be the link itself, but it may also be another comment. The new field also made counting the comments of a link easier and faster. The :comment/father field is still used, even when the father is the link, because the front-end needs it. I also added some transactions to seed the database.

To summarize most of the Datomic queries, I will use the one with the most "features" among those I wrote. It gets all the comments of a link.

Let me describe it. :find defines the values that should be returned; :keys names them, so each result comes back as a map; :with adds ?data-point to the rows considered by the aggregate, so that duplicate rows (several votes of 1 for the same comment) aren't collapsed before sum, since Datomic returns sets; :in lists the inputs of the query, including the database itself ($); and :where is our "filtering". The less common features used here are the get-some function and the or-join clause.

get-some lets me read from two different attributes. I don't want to return the :comment/father value, because it is a Datomic entity id; I want to return the UUID from :link/id or :comment/id. With (get-some $ ?father2 :link/id :comment/id) [?father ?father] I managed to return the ?father value after Datomic searched the :link and :comment attributes using the internal id stored in ?father2.

The or-join is used to count the votes of each comment. I couldn't just use :find (count ?votes) with :where [?votes :vote/comment ?e2], because that would drop the comments without votes. With the or-join, each vote of a comment contributes a 1, and the second branch contributes a 0 for every comment, so every comment gets at least a 0 count.


(def get-comments-link-father
  '[:find ?id ?text ?postedBy ?createdAt ?father (sum ?votes)
    :with ?data-point
    :keys id text postedBy createdAt father votes
    :in $ ?father-t
    :where
    [?e4 :link/id ?father-t]
    [?e2 :comment/link ?e4]
    [?e2 :comment/father ?father2]
    [(get-some $ ?father2 :link/id :comment/id) [?father ?father]]
    [?e2 :comment/id ?id]
    [?e2 :comment/text ?text]
    [?e2 :comment/postedBy ?e3]
    [?e2 :comment/createdAt ?createdAt]
    [?e3 :user/name ?postedBy]
    (or-join [?e2 ?id ?text ?postedBy ?createdAt ?father ?votes ?data-point]
             (and [?vote :vote/comment ?e2]
                  [(identity ?vote) ?data-point]
                  [(ground 1) ?votes])
             (and
               [(identity ?e2) ?data-point]
               [(ground 0) ?votes]))])
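
The or-join behaves like starting every comment at zero and adding one per vote. An in-memory analogue in plain Clojure makes that explicit; it is a way to reason about the query, not a replacement for it, and votes-per-comment is a hypothetical name, with plain keywords standing in for entity ids.

```clojure
(ns example.vote-counts)

;; Hypothetical in-memory analogue of the or-join: every comment starts at 0
;; (second branch) and each vote pointing at it adds 1 (first branch).
(defn votes-per-comment [comment-ids vote-comment-ids]
  (merge-with +
              (zipmap comment-ids (repeat 0))
              ;; ignore votes for comments outside this link
              (frequencies (filter (set comment-ids) vote-comment-ids))))
```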

Also, Datomic queries return a collection of maps, so if your schema endpoint returns a single map, don't forget to call first or something similar. The transaction below shows a link update, and it also uses first to return only one map. Note the result binding too: the transaction returns a map whose :db-after key holds the state of the database just after the transaction, and I pass that database value to the query, so I'm sure to read the state right after the link update.


(defn update-link
  [con id description url]
  (let [uuid (UUID/fromString id)
        {result :db-after} (d/transact con {:tx-data [[:db/add [:link/id uuid] :link/url url]
                                                      [:db/add [:link/id uuid] :link/description description]]})]
    (first (d/q get-link-by-id result uuid))))

I guess we finished the back-end, so let’s move to the front-end.

Together with the "stock" Re-frame, I will be using re-graph and Secretary. Re-frame keeps all your data in a single store, initialized from default-db; I added some extra data there to bootstrap the application.

Basically, we then have views, events and subscriptions, but I had issues with some choices. The first one was with re-graph: it's dead simple to initialize it in your default-db, but after startup the user may log in, and then I needed to update it with the token so my back-end could process the requests. It took me a while to realize that I could just update re-graph after getting the tokens…

Default start:


(def re-graph-init {:ws-url          nil
                    :http-url        "http://localhost:8080/graphql"
                    :http-parameters {:with-credentials? false}})

To update it later, after a login, or when a new page loads and the app reads the tokens from local storage:


(re-frame/reg-event-fx
  ::update-re-graph
  (fn [{db :db} [_ [token refresh]]]
    {:db (assoc-in db [:re-graph :re-graph.internals/default :http-parameters :headers] {"Authorization" token "Refresh" refresh})}))

Another issue I had for a while: even if a mutation on your GraphQL server takes no arguments, you must pass an empty map, as in the event below. I kept getting errors and missing it. You may also notice that the event below is far from ideal: it's impure, since it reads the refresh token from local storage. Why not read it from the default-db? After login, I store the JWT, the username and the refresh token in local storage to be accessed later, because if you close the page or open it in a new tab, the default-db resets to its default value, without the tokens. The username is copied to the default-db as soon as possible, and while I didn't want to keep the tokens in the default-db, I ended up storing them in the re-graph headers anyway… What are your thoughts about it?


(re-frame/reg-event-fx
  ::refresh
  (fn [{db :db} _]
    ;; read the username straight from db: subscribing inside an event handler is discouraged in re-frame
    (let [username      (:username db)
          refresh-token (get-local-storage "refresh-token")]
      ;; only refresh when a refresh token is stored and nobody is logged in yet
      (when (and (some? refresh-token) (nil? username))
        {:dispatch [::re-graph/mutate
                    graph/refresh
                    {}
                    [::refresh-result]]
         :db       (assoc db :loading? true)}))))

Subscriptions are the simplest part; you just need to be careful with the names you used in the events. This is also why it's recommended to use some kind of spec, which I didn't, but at least I had no problems here.


(re-frame/reg-sub
  ::username
  (fn [db _]
    (:username db)))

Routing is also easy to read. I highly recommend defining route names and using them in the views instead of typing the paths.


(defroute submit "/submit" []
  (re-frame/dispatch-sync [::events/start-headers])
  (re-frame/dispatch [::events/refresh])
  (re-frame/dispatch [::events/set-active-panel :post-panel]))

Then come the views, which are Reagent components plus subscriptions and event dispatches. You can conditionally render something based on a subscription, as in the example below:


(let [login-error @(re-frame/subscribe [::subs/login-error])]
  (if-not (nil? login-error)
    [:div.columns.is-centered.space-left [:span [:strong login-error]]]))

Re-frame broke when I had nil values, so wherever a value could be nil I called this function:


(defn- avoid-null [x]
  (if (nil? x)
    ""
    x))

My main issue with the project happened near the end. I was trying to request all the comments, but GraphQL can't request recursively nested objects (here is the GitHub discussion about it), so I could only request all the comments of a link if I knew how deep the nesting went. My back-end was already working and I was like… now what?! Then came the idea of storing a "father id" in each comment. Let me explain.

I have the link id, which is used to request the comments from the back-end. I know that if there are comments, at least one will have the link id as its father, and I can read the father ids in the front-end and place each comment as I find it. So, as a first step, I look for every top-level comment, that is, every comment whose father is the page's link id.


(declare organize-print-comments-linear-second-step)

(defn- organize-print-comments-linear [comments first-father]
  ;; First pass: keep the top-level comments (whose father is the link id)
  ;; in order, and collect every reply in `missing` for the second step.
  (loop [i       0
         missing []
         added   []]
    (if (< i (count comments))
      (let [c (nth comments i)]
        (if (= first-father (:father c))
          (recur (inc i) missing (conj added {:id    (:id c)
                                              :depth 0
                                              :row   (comment-row (:id c)
                                                                  (:postedBy c)
                                                                  (:text c)
                                                                  (:createdAt c)
                                                                  (:votes c)
                                                                  0)}))
          (recur (inc i) (conj missing c) added)))
      (if (empty? missing)
        added
        (organize-print-comments-linear-second-step added missing)))))

Nice, the top-level comments are now in place, but the replies still need to be organized:


(defn- insert-index [v i e]
  (vec (concat (subvec v 0 i) [e] (subvec v i))))

(defn- organize-print-comments-linear-second-step [added missing]
  ;; f walks over the comments already placed; every reply to the comment at f
  ;; is inserted right after it (and after its earlier siblings, counted by k).
  ;; The indentation follows the parent's depth, not the position in the list.
  (loop [i     0
         f     0
         k     0
         added added]
    (if (< i (count missing))
      (let [parent (nth added f)
            c      (nth missing i)]
        (if (= (:id parent) (:father c))
          (let [depth (inc (or (:depth parent) 0))]
            (recur (inc i) f (inc k)
                   (insert-index added (+ f 1 k)
                                 {:id    (:id c)
                                  :depth depth
                                  :row   (comment-row (:id c)
                                                      (:postedBy c)
                                                      (:text c)
                                                      (:createdAt c)
                                                      (:votes c)
                                                      (* depth 3))})))
          (recur (inc i) f k added)))
      (if (< f (dec (count added)))
        (recur 0 (inc f) 0 added)
        added))))

We finally print them with:


(for [c comments]
  (:row c))

My main issue with this approach is that if a thread gets thousands of comments, like on Reddit, it will probably be too laggy: too much data to request and sort. I would probably try something different, like sorting in the back-end and adding fields such as :order and :depth, so the front-end could print the comments directly. But done that way, GraphQL doesn't look as practical as promised.
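
For that back-end sorting idea, or simply a faster front-end, the flat list can be ordered in one pass: group-by builds an index of replies per father once, and a depth-first walk emits each comment followed by its replies, with a :depth the view can turn into indentation. A minimal sketch, assuming comments shaped like the GraphQL response (:id and :father); thread-order is a hypothetical name. It avoids the repeated scans of the two-step loop, though extremely deep threads would still be bounded by recursion depth.

```clojure
(ns example.comment-tree)

;; Hypothetical sketch: depth-first display order of a flat comment list,
;; keeping siblings in their original order and tagging each with :depth.
(defn thread-order [comments root-id]
  (let [children (group-by :father comments)   ; father id -> direct replies
        walk     (fn walk [father depth]
                   (mapcat (fn [c]
                             (cons (assoc c :depth depth)
                                   (walk (:id c) (inc depth))))
                           (get children father)))]
    (vec (walk root-id 0))))
```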

Conclusion

I think this is a good overview of the project. What do you think about the GraphQL trick with the subcomments?
To store or not to store tokens in the default-db? I could also use dispatch-sync, so I wouldn't need to keep them in the re-graph headers all the time.

Are you a fan of GraphQL or Datomic? I found both easy to work with, apart from the GraphQL recursion issue. Not every project needs recursively nested objects, but remember this limitation before starting a new project with it. Anyway, it was a fun project, I think my best Clojure one so far. I hope you enjoyed it and find my code easy to read.

Follow me on Twitter for other updates.