I’ve been looking into how to implement a history for entities in my thesis project. Essentially, I’d need an endpoint that returns the timestamps and actions performed on a site’s data, to be displayed as: ‘[timestamp] User X updated Y, Z’, ‘[timestamp] User Y removed attachments..’ and so on.

Useful video on audit tables in database design: The Best Way To Add Audit Tables to Your Database - YouTube. Shadow tables that mimic the structure of the table being audited seemed like a good approach. However, sites also have m:m relationships with taxonomies and localities, which raises the question of whether to audit those join tables separately or to store these relationships as part of the sites’ audit records.

A more flexible approach, also mentioned in the comments of the YouTube video, is a single generic audit table that stores the entities as jsonb. This was showcased in this Supabase article: Postgres Auditing in 150 lines of SQL. Essential to this is using triggers instead of modifying the service code for auditing.

The challenging part is that we do not have just one table for sites that captures all the information about this entity; the data is spread across several join tables.

The final approach I took was storing deltas between versions instead of full snapshots. Triggers fire on the main sites table, as well as on sites_images, sites_documents, sites_research, sites_taxonomies, and sites_localities (you get the gist). They all call the same function, audit_trigger_func(). The audit table has columns for the table name; the entity id, which is the site’s id, since we only fetch version history for sites as of now; the operation (CREATE, UPDATE, DELETE); the JSON changes structure; the ID of the user who made the change; the timestamp; and the related-entity information. The latter concerns child rows (image, document, research, taxonomy, locality) and is null for the sites table itself.
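To make the column layout concrete, here is a TypeScript sketch of one audit row as I’d type it on the API side. The names are my own shorthand for illustration, not necessarily the exact column names:

```typescript
// Illustrative model of one audit log row (field names are
// my own shorthand, not the exact column names).
type AuditOperation = "CREATE" | "UPDATE" | "DELETE";

interface AuditRow {
  tableName: string; // which audited table the trigger fired on
  entityId: string; // always the site's id, even for child-table rows
  operation: AuditOperation;
  changes: unknown; // the JSONB changes structure
  changedBy: string; // user id, taken from a Postgres session variable
  changedAt: string; // timestamp of the change
  // Set for child rows (image, document, research, taxonomy, locality);
  // null when the row concerns the sites table itself.
  relatedEntity: { type: string; id: string } | null;
}

// Example row for a scalar update on the site itself.
const example: AuditRow = {
  tableName: "sites",
  entityId: "site-1",
  operation: "UPDATE",
  changes: [{ fieldKey: "name", from: "Old name", to: "New name" }],
  changedBy: "user-1",
  changedAt: new Date().toISOString(),
  relatedEntity: null,
};
```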

The shape of the changes column was refined over time until it fully captured everything I needed for the API response and the UI messages. This was the most challenging part of the feature for me, especially given all the sibling tables and the need to produce clear, comprehensive change messages on the frontend.

For scalar field diffs used on the sites table itself, we capture this:

{ fieldKey: string; from?: unknown; to?: unknown }

For soft-deletion we get this:

[{ "fieldKey": "deleted", "from": false, "to": true }]

For FK fields in the sites table that point to taxonomies, the trigger resolves the name too, not just the ID, just as we do for the child table rows.

For the child table rows, the shape is like this:

{ fieldKey: string; added: Array<{ id: string; [key: string]: unknown }> | null; removed: ... | null }

When adding a new image, it looks like this:

[{
  "fieldKey": "images",
  "added": [{ "id": "...", "fileKey": "photo.jpg", "caption": "...", "imageType": "general_photograph", ... }],
  "removed": null
}]
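The actual diffing happens in the PL/pgSQL trigger, but the idea behind the added/removed shape can be sketched in TypeScript (helper and parameter names here are hypothetical, not the real trigger code):

```typescript
// Sketch of computing the added/removed shape for one child table,
// assuming we can see the child rows before and after the change.
type ChildRow = { id: string; [key: string]: unknown };

interface ChildChange {
  fieldKey: string;
  added: ChildRow[] | null;
  removed: ChildRow[] | null;
}

function diffChildRows(
  fieldKey: string,
  before: ChildRow[],
  after: ChildRow[],
): ChildChange | null {
  const beforeIds = new Set(before.map((r) => r.id));
  const afterIds = new Set(after.map((r) => r.id));
  const added = after.filter((r) => !beforeIds.has(r.id));
  const removed = before.filter((r) => !afterIds.has(r.id));
  // No membership change: report nothing, so no audit row is written.
  if (added.length === 0 && removed.length === 0) return null;
  return {
    fieldKey,
    added: added.length > 0 ? added : null,
    removed: removed.length > 0 ? removed : null,
  };
}

// Adding one image yields the shape shown above:
const change = diffChildRows("images", [], [
  { id: "img-1", fileKey: "photo.jpg", imageType: "general_photograph" },
]);
```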

The fields in the added and removed objects are determined based on the information we store for each type. Images have a different structure than taxonomies, for example. Furthermore, regular UPDATEs that produce no meaningful diffs are skipped.
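The scalar diff and the skip-empty-updates rule can be sketched like this (again in TypeScript with illustrative names; the real logic lives in the trigger function):

```typescript
// Sketch of the scalar field diff for the sites table itself,
// including the rule that UPDATEs with no meaningful diff are skipped.
interface ScalarChange {
  fieldKey: string;
  from?: unknown;
  to?: unknown;
}

function diffScalars(
  oldRow: Record<string, unknown>,
  newRow: Record<string, unknown>,
  auditedFields: string[],
): ScalarChange[] {
  const changes: ScalarChange[] = [];
  for (const fieldKey of auditedFields) {
    const from = oldRow[fieldKey];
    const to = newRow[fieldKey];
    // Compare serialized values so structurally equal objects don't
    // register as a change.
    if (JSON.stringify(from) !== JSON.stringify(to)) {
      changes.push({ fieldKey, from, to });
    }
  }
  // An empty array means: write no audit row at all.
  return changes;
}
```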

I also got to learn about Postgres session variables, which I used to set the ID of the current user performing the change.
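Concretely, Postgres lets you set a custom variable with set_config() and read it back with current_setting() inside the trigger. The variable name app.current_user_id below is my own choice (any "prefix.name" works), and the service-layer shape is a sketch:

```typescript
// Sketch: the statement a service layer would issue at the start of
// each transaction so the trigger knows who is making the change.
// "app.current_user_id" is a hypothetical variable name.
function sessionUserStatement(userId: string): { text: string; values: string[] } {
  return {
    // The third argument `true` makes the setting local to the current
    // transaction, so it cannot leak across pooled connections.
    text: "SELECT set_config('app.current_user_id', $1, true)",
    values: [userId],
  };
}

// Inside the trigger function the value is read back with:
//   current_setting('app.current_user_id', true)
// (the second argument makes it return NULL instead of raising an
// error when the variable was never set).
```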