Imagine you are writing a backend that talks to some API, exchanging JSON. This API could be paid and/or rate-limited, so you don't want to call it every time you need the data.

Of course you are already thinking "I could use a cache like Redis". In this post I want to show you an approach to caching that I recently used. This approach has some benefits over the traditional caches I know.

Enter PostgreSQL

In my case I am writing a Kotlin application that needs data from a service which allows only 150 requests per month on the free plan. Therefore I want to make sure the service only gets hit when really necessary. At the same time I want to store all of the data it returns, so that I can add future features that rely on data I currently don't use.

I am using PostgreSQL for data storage, so I started writing mapping classes for the JSON data and creating relational tables. While doing this I remembered a feature of Postgres that I had wanted to try out for a long time: since version 9.2, Postgres can store JSON values natively, and since version 9.3 there are plenty of functions and operators to work with them. The binary jsonb type used below was added in version 9.4.

Hence I could kill two birds with one stone: persistent storage of the responses, and the ability to query the data I need directly from that storage.

Storing and Reading JSON

The following snippet shows an example table structure and queries to operate on it.

CREATE TABLE api_responses (
    name VARCHAR,
    body JSONB
);

INSERT INTO api_responses (name, body)
VALUES ('request-x', '{ "id": 1234, "data": { "people": [ { "name": "Moritz", "github": "FrontierPsychiatrist" }, { "name": "Alexander", "github": "alexhanschke" } ] } }');

SELECT people->'name', people->'github'
FROM api_responses, jsonb_array_elements(body->'data'->'people') people
WHERE name = 'request-x';
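
Note that the -> operator returns jsonb values, so the names above come back as JSON strings including their quotes. If you want plain text instead, use the ->> operator:

SELECT people->>'name', people->>'github'
FROM api_responses, jsonb_array_elements(body->'data'->'people') people
WHERE name = 'request-x';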

Happily, I deleted my mapping classes and started to research. It turns out inserting is not hard at all! With JDBC you only need to add a type cast for the body column:

String sql = "INSERT INTO api_responses (name, body) VALUES (?, ?::jsonb)";

I am using jsonb here, which has slower insert times than plain json but more efficient storage and querying. Additionally it even allows indexing!
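
For example, a GIN index on the body column supports jsonb operators like the containment operator @>. A minimal sketch (the index name is my own choice):

CREATE INDEX api_responses_body_idx ON api_responses USING GIN (body);

-- find all cached responses containing a given person
SELECT name
FROM api_responses
WHERE body @> '{"data": {"people": [{"github": "FrontierPsychiatrist"}]}}';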

The expected JSON format for this example is:

{
  "id": 1331412,
  "data": {
    "people": [
      { "name": "Moritz", "github": "FrontierPsychiatrist" },
      { "name": "Alexander", "github": "alexhanschke" }
    ]
  }
}

Considerations

Of course I am aware that the flexibility and ease of this approach do not come for free. The main concern I have is that I am now dependent on the structure of the response without any validation: if the property people is suddenly called accounts in newer responses, my select will simply not return any new data. This is the part you have to be sure to understand when using this approach. You have no real guarantee for the schema of the body column.
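
Since the data sits in Postgres anyway, you can at least detect such drift with a query. A sketch against the table from above, listing cached responses that no longer have the expected people array:

SELECT name
FROM api_responses
WHERE body->'data' IS NULL
   OR NOT body->'data' ? 'people';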

As with any cache, I will have to think about invalidation.
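
A simple starting point is a timestamp column plus a periodic cleanup job. The column name and the 30-day time-to-live below are just assumptions for the sketch:

ALTER TABLE api_responses
ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT now();

-- run periodically: drop everything older than 30 days
DELETE FROM api_responses
WHERE created_at < now() - INTERVAL '30 days';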

Conclusion

In conclusion, storing the data as JSON in Postgres is a very good fit for my use case. I needed a cache that is not necessarily fast but supports mid- to long-term storage and complex queries. Postgres allows this while retaining all of the response data. So if I find a use for another part of the response in the future, I have it at my fingertips without having to create a complex table structure upfront.