Skip to content

Move default data bind mount from system directory to local directory#596

Closed
spwoodcock wants to merge 4 commits intogetodk:nextfrom
hotosm:build/move-data-dir
Closed

Move default data bind mount from system directory to local directory#596
spwoodcock wants to merge 4 commits intogetodk:nextfrom
hotosm:build/move-data-dir

Conversation

@spwoodcock
Copy link
Contributor

@spwoodcock spwoodcock commented Feb 7, 2024

See https://forum.getodk.org/t/odk-central-make-data-directory-configurable/44760/5

What has been done to verify that this works as intended?

I can start the compose stack without issue.

Why is this the best possible solution? Were any other approaches considered?

This offers flexibility of user configuration, plus a sane default.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Any users with data located under /data/transfer will have to either:

  1. Update their .env with DATA_DIR=/data/transfer.
  2. Migrate the data to ./data/transfer, relative to the git repo.
  3. Move the data anywhere else on their system and set DATA_DIR accordingly.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Yes, I will PR the docs after this is approved.

Before submitting this PR, please make sure you have:

  • branched off and targeted the next branch OR only changed documentation/infrastructure (master is stable and used in production)
  • verified that any code or assets from external sources are properly credited in comments or that everything is internally sourced

@alxndrsn
Copy link
Contributor

alxndrsn commented Feb 9, 2026

@spwoodcock or anyone else: any context on what /data/transfer is used for?

It looks like it was introduced in fe34e41, but I can't find any reference to it in any of the following repos:

@lognaturel
Copy link
Member

It's intended to be a convenience that gives us a single way to document how to do backup restores: https://docs.getodk.org/central-backup/#restoring-a-backup

@brontolosone is also looking at changes to backups so now's a good time to revive this, I think.

@brontolosone
Copy link
Contributor

It's intended to be a convenience that gives us a single way to document how to do backup restores: https://docs.getodk.org/central-backup/#restoring-a-backup

@brontolosone is also looking at changes to backups so now's a good time to revive this, I think.

If that's its sole purpose, then I wonder if we could get rid of it altogether — that busywork of shuffling files around, and thus the fuss around this mount and its defaults and migration strategy. Less is more!

From the backup restore docs, this is when it's used:

docker compose exec service node /usr/odk/lib/bin/restore.js /data/transfer/backup-2018-01-01T00:00:00Z.zip 'SECRET_PASSPHRASE'

So I'm thinking, we could spool the backup in via restore.js's stdin, along the lines of:

docker compose exec service -T node /usr/odk/lib/bin/restore.js - 'SECRET_PASSPHRASE' -- < /path/to/your/backup.zip

One can convince oneself that this works by running the following, which makes the service container's cat echo out what it receives on its stdin (-, technically optional, just added for clarity): docker compose exec -T service cat - -- < .env. The -- terminates docker's argv, so that we can tell the shell on our side to pipe in the .env file. Anyway, you get the idea.

On the central side we'd adapt the restore.js to, if - is specified as the "file", read from stdin, spool to a temporary file (sadly, the current backup format is not streamable), and then proceed as usual.

One sidecatch bonus effect of this approach is that things are easier to utilize, automate, and test; one could in fact curl the backup API of one host and pipe* the output straight into the restore command.

This way we could remove the /data/transfer from the docker circus, and we'd just need an update to restore.js and the restore command in the documentation. 🥳
Of course one may still bind some directory into the container, and pass the file path to restore.js, but one doesn't have to.

footnotes

*) Though due to the spooling, that's a fake streaming sensation: the central side needs to spool its stdin to a tempfile because the current archive format is not actually streamable (as the keys.json that's needed to decrypt each file's content is at the end of the archive).
A better designed format would put the key materials at the start of the archive, which is in fact possible with a small alteration to the way things are done (as we know what we're going to pack up, we could just generate the IVs beforehand and make keys.json the first archive member). And then we could stream the zip in, though still, due to the choice of pgdump format (directory), we'd still need a tempdir to decrypt the whole dump to in postgres format, before we can start pg_restore. And moreover we wouldn't so easily know beforehand if stdin will give us the improved streamable subformat, or the current unstreamable one (which we'd need to spool to a seekable file first).
And even more moreover, we don't want this archive format anyway, we want a more versatile and portable format that can (should one want to) be 100% streamed out one side and in on the other side in the first place, loosely: pg_dump --format=c | encrypt | /v1/backup api | http client | restore wrapper | decrypt | pg_restore.

@spwoodcock
Copy link
Contributor Author

I generally pipe + stdin for my pg_dump and pg_restore, so that sounds good to me!

Perhaps it looks more convoluted to an end user, but as long as there are well documented commands.

Plus the less bind mounts used, the better the transferability to Kubernetes 😃

@lognaturel
Copy link
Member

Great! How about we close this and @brontolosone you can remove the bind mount and update the restore process as part of the backup overhaul?

@spwoodcock spwoodcock closed this Feb 11, 2026
@spwoodcock spwoodcock deleted the build/move-data-dir branch February 11, 2026 18:54
@lognaturel
Copy link
Member

Thanks, @spwoodcock!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants