CockroachDB: An Experiment With on Disk Encryption

CockroachDB: An Experiment With on Disk Encryption

Table of contents

No heading

No headings in the article.

When we hear about our data being encrypted, kept safe from prying eyes, we're reassured. Or, we could be skeptical and seek some empirical proof that the data is indeed encrypted. That's the motivation behind this brief exercise, to show the difference between unencrypted and encrypted data in CockroachDB. This post deals with on disk encryption, as described in the CockroachDB docs.

What we seek to illustrate here is that, initially, when the files aren't encrypted, we can easily read the data files and retrieve strings we've loaded into a table but, once we've enabled on disk encryption, we're no longer able to do this. Let's get started ...

Start up a single node CockroachDB instance (downloads available here):

$ cockroach start-single-node --insecure --listen-addr=localhost:26257 --http-addr=localhost:8080 --background

The cockroach binary is installed in /usr/local/bin, which is in my $PATH (I'm doing this on a MacBook computer).

Connect using some PostgreSQL compatible client:

$ psql postgres://root@localhost:26257/defaultdb

Create a very simple table:

defaultdb=> create table t1 (id uuid primary key default gen_random_uuid(), msg string);
CREATE TABLE
Time: 134.324 ms

Insert one specific row into this table, followed by 100k rows of random data (so we get some SST files on disk):

defaultdb=> insert into t1 (msg) values ('Sensitive data I want to protect');
INSERT 0 1
Time: 169.029 ms
defaultdb=> insert into t1 select gen_random_uuid(), substr(md5(random()::text), 0, 25) from generate_series(1, 100000);
INSERT 0 100000
Time: 3421.874 ms (00:03.422)

Exit that SQL client:

defaultdb=> \q
$

Change to the data directory and then run ls to see which files exist:

$ cd cockroach-data/
$ ls
000004.log                000025.sst                STORAGE_MIN_VERSION            cockroach.listen-addr
000005.log                000026.sst                auxiliary                cockroach.sql-addr
000006.log                CURRENT                    cockroach-temp937546811            logs
000007.log                LOCK                    cockroach.advertise-addr        marker.format-version.000003.004
000011.log                MANIFEST-000001                cockroach.advertise-sql-addr        marker.manifest.000001.MANIFEST-000001
000021.log                OPTIONS-000003                cockroach.http-addr            temp-dirs-record.txt

Of these, the interesting ones for this discussion are the ones which contain data: the *.log files are the write-ahead logs (WAL) and the *.sst files are the "sorted string table" (hence the SST extension) files, where the data ultimately is persisted. ( This reference provides a much more detailed discussion of these files.)

Since we are currently running without on-disk encryption, we expect to be able to somehow read our stored values. To do this, we can use the strings command to read the .log files:

$ for log in *.log ; do strings $log | grep Sensitive && echo "LOG: $log" ; done
& Sensitive data I want to protect 
& Sensitive data I want to protect
LOG: 000006.log

... and a built-in CockroachDB command to read the .sst files:

$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
/Table/20/1/52/726137662145003521/0/1641670028.297543000,0#0,SET "ϦK\f\n6\b__auto__\x1d\x04\x01\x03\x01\x04\x18\x98\x96ϝ\f\xb0\x95\xe1\x9b\x02\x13\x02\x13\x02\x13\x00\x169\n'\b\x01\x1a#\x12Sensitive data I want to protect\x00\x01\x12\f\b\a\x10\x00\x18\x000\x00P\x19`\x00\x18\x01"
SST: 000018.sst
/Table/52/1/"\xb6)%\xbeCAL\x18\xa3\am2\x02:\f\x91"/0/1641669970.184220000,0#0,SET "\xfafh\xf9\n& Sensitive data I want to protect"
SST: 000019.sst

Ok, that's reassuring. We can read the sensitive data in the data files when they aren't encrypted. The next step is to follow the instructions and encrypt our storage.

Change back into the original directory:

$ cd -

First, we generate an encryption key:

$ cockroach gen encryption-key -s 128 aes-128.key
successfully created AES-128 key: aes-128.key

Next, we kill the running cockroach process:

$ pkill cockroach
$ initiating graceful shutdown of server
server drained and shutdown completed

Finally, we start it back up, passing an additional command line argument specifying we want to encrypt the data, along with the path to the newly generated key:

$ cockroach start-single-node --insecure --listen-addr=localhost:26257 --http-addr=localhost:8080 --background --enterprise-encryption=path=cockroach-data,key=aes-128.key,old-key=plain

Now to run the final part of the experiment. We'll change into the data directory again and they run the command used earlier to look for our string in the .log files as well as the one for the .sst files:

$ cd cockroach-data/
$ for log in *.log ; do strings $log | grep Sensitive && echo "LOG: $log" ; done
$
$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
$

That second command, the one where you're looking for the string "Sensitive" in the SST files, could potentially return a match. If this happens, it's because these files haven't yet been rewritten. A simple way to get them to be rewritten is to simulate a workload on the database by inserting another 100k rows into this table:

$ psql postgres://root@localhost:26257/defaultdb
Timing is on.
psql (13.2, server 13.0.0)
Type "help" for help.

defaultdb=> insert into t1 select gen_random_uuid(), substr(md5(random()::text), 0, 25) from generate_series(1, 100000);
INSERT 0 100000
Time: 2994.137 ms (00:02.994)

Having done that, re-run that check against the SST file and you shouldn't see a match:

$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
$

That's it for this note. Thank you for reading and I hope this we helpful.