Table of contents
No headings in the article.
When we hear about our data being encrypted, kept safe from prying eyes, we're reassured. Or, we could be skeptical and seek some empirical proof that the data is indeed encrypted. That's the motivation behind this brief exercise, to show the difference between unencrypted and encrypted data in CockroachDB. This post deals with on disk encryption, as described in the CockroachDB docs.
What we seek to illustrate here is that, initially, when the files aren't encrypted, we can easily read the data files and retrieve strings we've loaded into a table but, once we've enabled on disk encryption, we're no longer able to do this. Let's get started ...
Start up a single node CockroachDB instance (downloads available here):
$ cockroach start-single-node --insecure --listen-addr=localhost:26257 --http-addr=localhost:8080 --background
The cockroach
binary is installed in /usr/local/bin
, which is in my $PATH
(I'm doing this on a MacBook computer).
Connect using some PostgreSQL compatible client:
$ psql postgres://root@localhost:26257/defaultdb
Create a very simple table:
defaultdb=> create table t1 (id uuid primary key default gen_random_uuid(), msg string);
CREATE TABLE
Time: 134.324 ms
Insert one specific row into this table, followed by 100k rows of random data (so we get some SST files on disk):
defaultdb=> insert into t1 (msg) values ('Sensitive data I want to protect');
INSERT 0 1
Time: 169.029 ms
defaultdb=> insert into t1 select gen_random_uuid(), substr(md5(random()::text), 0, 25) from generate_series(1, 100000);
INSERT 0 100000
Time: 3421.874 ms (00:03.422)
Exit that SQL client:
defaultdb=> \q
$
Change to the data directory and then run ls
to see which files exist:
$ cd cockroach-data/
$ ls
000004.log 000025.sst STORAGE_MIN_VERSION cockroach.listen-addr
000005.log 000026.sst auxiliary cockroach.sql-addr
000006.log CURRENT cockroach-temp937546811 logs
000007.log LOCK cockroach.advertise-addr marker.format-version.000003.004
000011.log MANIFEST-000001 cockroach.advertise-sql-addr marker.manifest.000001.MANIFEST-000001
000021.log OPTIONS-000003 cockroach.http-addr temp-dirs-record.txt
Of these, the interesting ones for this discussion are the ones which contain data: the *.log
files are the write-ahead logs (WAL) and the *.sst
files are the "sorted string table" (hence the SST extension) files, where the data ultimately is persisted. ( This reference provides a much more detailed discussion of these files.)
Since we are currently running without on-disk encryption, we expect to be able to somehow read our stored values. To do this, we can use the strings
command to read the .log files:
$ for log in *.log ; do strings $log | grep Sensitive && echo "LOG: $log" ; done
& Sensitive data I want to protect
& Sensitive data I want to protect
LOG: 000006.log
... and a built-in CockroachDB command to read the .sst files:
$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
/Table/20/1/52/726137662145003521/0/1641670028.297543000,0#0,SET "ϦK\f\n6\b__auto__\x1d\x04\x01\x03\x01\x04\x18\x98\x96ϝ\f\xb0\x95\xe1\x9b\x02\x13\x02\x13\x02\x13\x00\x169\n'\b\x01\x1a#\x12Sensitive data I want to protect\x00\x01\x12\f\b\a\x10\x00\x18\x000\x00P\x19`\x00\x18\x01"
SST: 000018.sst
/Table/52/1/"\xb6)%\xbeCAL\x18\xa3\am2\x02:\f\x91"/0/1641669970.184220000,0#0,SET "\xfafh\xf9\n& Sensitive data I want to protect"
SST: 000019.sst
Ok, that's reassuring. We can read the sensitive data in the data files when they aren't encrypted. The next step is to follow the instructions and encrypt our storage.
Change back into the original directory:
$ cd -
First, we generate an encryption key:
$ cockroach gen encryption-key -s 128 aes-128.key
successfully created AES-128 key: aes-128.key
Next, we kill the running cockroach
process:
$ pkill cockroach
$ initiating graceful shutdown of server
server drained and shutdown completed
Finally, we start it back up, passing an additional command line argument specifying we want to encrypt the data, along with the path to the newly generated key:
$ cockroach start-single-node --insecure --listen-addr=localhost:26257 --http-addr=localhost:8080 --background --enterprise-encryption=path=cockroach-data,key=aes-128.key,old-key=plain
Now to run the final part of the experiment. We'll change into the data directory again and they run the command used earlier to look for our string in the .log
files as well as the one for the .sst
files:
$ cd cockroach-data/
$ for log in *.log ; do strings $log | grep Sensitive && echo "LOG: $log" ; done
$
$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
$
That second command, the one where you're looking for the string "Sensitive" in the SST files, could potentially return a match. If this happens, it's because these files haven't yet been rewritten. A simple way to get them to be rewritten is to simulate a workload on the database by inserting another 100k rows into this table:
$ psql postgres://root@localhost:26257/defaultdb
Timing is on.
psql (13.2, server 13.0.0)
Type "help" for help.
defaultdb=> insert into t1 select gen_random_uuid(), substr(md5(random()::text), 0, 25) from generate_series(1, 100000);
INSERT 0 100000
Time: 2994.137 ms (00:02.994)
Having done that, re-run that check against the SST file and you shouldn't see a match:
$ for sst in *.sst ; do cockroach debug pebble sstable scan $sst | grep Sensitive && echo "SST: $sst" ; done
$
That's it for this note. Thank you for reading and I hope this we helpful.