Thinking about the Outbox Pattern and CockroachDB

Motivation

I certainly don't claim to be an expert on the Outbox Pattern but, as I understand it, one key feature is that it involves rows written into a database table which is configured to emit events into a change feed. These rows are ephemeral in the database table; their chief function is to aggregate some data and trigger that CDC event.

A couple of recent or upcoming (in version 22.2) CockroachDB features seem particularly well suited to this pattern:

  1. Row level time to live (TTL, already available)
  2. Changefeed expressions (coming in v. 22.2)

It was with that in mind that I started experimenting with what's shown below.

Experiment

Connect using the psql CLI:

$ psql postgres://root@localhost:26257/defaultdb
Timing is on.
psql (14.5 (Homebrew), server 13.0.0)
Type "help" for help.

I'm using the 22.2 alpha because of its support of "changefeed expressions":

defaultdb=> SELECT version();
                                           version                                            
----------------------------------------------------------------------------------------------
 CockroachDB CCL v22.2.0-alpha.3 (x86_64-apple-darwin19, built 2022/09/12 16:13:44, go1.18.4)
(1 row)

In creating the outbox table, I used a sequence as the source of the n column just to show things in order in the SELECT. Note the WITH ... clause which specifies the TTL for the row (10 seconds) along with a cron-like expression for how often to purge the expired rows (every 3 minutes). Note that the cron expression is flexible; it can, for example, be set to a specific time of day: ttl_job_cron = '0 21-23 * * *' to run at 9, 10, and 11 PM.

DROP SEQUENCE IF EXISTS outbox_sequence CASCADE;
CREATE SEQUENCE outbox_sequence CACHE 1000;

DROP TABLE IF EXISTS outbox;
CREATE TABLE outbox
(
  id UUID NOT NULL PRIMARY KEY DEFAULT gen_random_uuid()
  , n INT NOT NULL DEFAULT nextval('outbox_sequence')
  , last_updated TIMESTAMP NOT NULL DEFAULT now()
  , message TEXT NOT NULL
) WITH (ttl_expire_after = '10 s', ttl_job_cron = '*/3 * * * *');

The changefeed uses an HTTP endpoint to a short Python script which just prints each of the CDC events. It specifies an action to take on_error and that the rows shouldn't be GC'd when this is paused, so that, if the endpoint isn't available, the events will not be lost. The definition also includes an expression (new in 22.2) which selects only the message column and the n column, separated by a space. Finally, it specifies to skip deleted rows, since we don't need to see an event when the TTL expression (above) removes rows.

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE CHANGEFEED INTO 'http://localhost:3000/'
WITH on_error=pause, protect_data_from_gc_on_pause, schema_change_policy=stop, diff
AS
  SELECT message || ' ' || n::TEXT
  FROM outbox 
  WHERE NOT cdc_is_delete();

Next, three rows are inserted into the table:

INSERT INTO outbox (message) VALUES ('Message');
INSERT INTO outbox (message) VALUES ('Message');
INSERT INTO outbox (message) VALUES ('Message');

The corresponding output from the Python script looks like this:

{
  "after": {
    "column_1": "Outbox message 1"
  },
  "before": null,
  "key": [
    "b325baa1-b5c8-4279-8ca3-1d827316705b"
  ]
}
{
  "after": {
    "column_1": "Outbox message 2"
  },
  "before": null,
  "key": [
    "0f2d1d48-7f69-4588-93b4-d744da8ea889"
  ]
}
{
  "after": {
    "column_1": "Outbox message 3"
  },
  "before": null,
  "key": [
    "c51b9e12-2711-4316-9a98-f1075b67912a"
  ]
}

From there, we do a SELECT on the outbox table, then sleep for a bit longer than the 3 minute interval specified in the TTL config, then re-run that same SELECT:

SELECT * FROM outbox ORDER BY n ASC;
                  id                  | n |        last_updated        |    message     
--------------------------------------+---+----------------------------+----------------
 b325baa1-b5c8-4279-8ca3-1d827316705b | 1 | 2022-09-23 14:36:58.484504 | Outbox message
 0f2d1d48-7f69-4588-93b4-d744da8ea889 | 2 | 2022-09-23 14:36:58.651621 | Outbox message
 c51b9e12-2711-4316-9a98-f1075b67912a | 3 | 2022-09-23 14:36:58.691519 | Outbox message

SELECT pg_sleep(188);
 pg_sleep 
----------
 t
(1 row)

SELECT * FROM outbox ORDER BY n ASC;
 id | n | last_updated | message 
----+---+--------------+---------
(0 rows)

And that illustrates the effect of row level TTL and how it can be applied to the case of the Outbox Pattern.

I appreciate you taking the time to review this experiment. See you again, soon.