Skip to content

Comments

Add job_schema_version column#334

Merged
ZimbiX merged 14 commits intomasterfrom
add-que-version-column
Feb 25, 2022
Merged

Add job_schema_version column#334
ZimbiX merged 14 commits intomasterfrom
add-que-version-column

Conversation

@maddymarkovitz
Copy link
Contributor

@maddymarkovitz maddymarkovitz commented Feb 22, 2022

This is a change best made prior to merging #319.

We've added the job_schema_version column to the que_jobs table.
This will allow us to:

  • In Que 2.x, start recording the que_version as 2
  • Build a migration process from 1.x to 2.x whereby Que workers only process jobs that were enqueued with their version of Que.
  • In Que 2.x, we can delete the old poller index.

We're thinking this will require a major version bump due to the additional DB migrations but we're open to suggestions.

Copy link
Member

@ZimbiX ZimbiX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a quick read. I'll have a look at whether we need to check the version in any other queries

Copy link
Member

@ZimbiX ZimbiX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further changes are needed. Polling mode works (any v2 jobs enqueued prior to starting que remain un-run), but notify mode still runs a job with the wrong que_version.

Test

With app.rb of:

require 'bundler/setup'

require 'sequel'
require 'que'

DB = Sequel.connect(ENV['DATABASE_URL'])
Que.connection = DB

Que.migrate!(version: 5)

class MyJob < Que::Job
  def run
    puts 'Running MyJob'
  end
end

Start Que:

./auto/dev bash -c 'RUBYOPT= bundle exec ./bin/que ./app.rb'

Then enqueue a Que v2 job with SQL:

./auto/psql -c "insert into que_jobs (job_class, que_version) values ('MyJob', 2);"

Result:

Running MyJob

@ZimbiX
Copy link
Member

ZimbiX commented Feb 23, 2022

We should add specs for the behaviour too

@oeoeaio
Copy link
Contributor

oeoeaio commented Feb 23, 2022

Polling mode works (any v2 jobs enqueued prior to starting que remain un-run), but notify mode still runs a job with the wrong que_version.

Nice @ZimbiX, yes we will need some more thorough tests for this.

@ZimbiX ZimbiX self-assigned this Feb 23, 2022
@ZimbiX
Copy link
Member

ZimbiX commented Feb 23, 2022

I've added a failing spec.

I haven't had an easy time with the implementation. I see there as being a couple of options for addressing the notify ingestion:

  1. Make the que_job_notify PostgreSQL function select a worker which has a matching Que version

    Pros:

    • Efficient

    Cons:

    • The que_job_notify function is already complex, and we may introduce bugs
    • May require recording the Que version elsewhere in the database, e.g. the que_lockers table, to be able to tell which workers are suitable
  2. After the worker is notified of the new jobs, skip jobs which have the wrong Que version

    Pros:

    • Maybe simpler implementation

    Cons:

    • It's less efficient for a worker to be notified of a job, but then ignore it, and leave it to have to be picked up by a suitable worker doing a poll

    There doesn't seem to be an obvious place to put this. Maybe in the job_available message resolver? But the job_available message format (what's sent to the worker by the db function) only contains queue, id, run_at, and priority. Annoyingly, the job's other columns are only loaded after the job gets locked.

    Approaches:

    • Add another query to load the que_version column early
    • Change the job_available message format to include que_version. But that necessitates a que_job_notify function change. And may not be backwards-compatible
  3. Have multiple copies of the que_job_notify function - one for each Que version

    Pros:

    • Probably simpler than making the function smarter

    Cons:

    • Code duplication
    • Would resulting in another migration to deduplicate the function after the upgrade is done
  4. Get users to disable notify during the upgrade process

    Pros:

    • No more implementation to do

    Cons:

    • Relying solely on polling will cripple throughput
    • Some users may not be aware that they need to do this, and could end up with bad data

@ZimbiX
Copy link
Member

ZimbiX commented Feb 23, 2022

I thought I'd first have a go at updating que_job_notify, since it seemed like that'd produce the cleanest result. And it was actually a lot easier than I expected!

Now the que_version population and checks are dynamically based off the Que::VERSION major version. I thought this was preferable to needing to remember and update all the places we'd put the magic number 1 for when we release Que 2 soon. If we were to, at some point, release a later major version, this would cause potentially unnecessary isolation - but who knows what changes that version would be bringing, so I think that's fine - we can change it back at that point if we want to. Or perhaps this is worth documenting above the version number constant, so you see it when updating the version?

It passes the specs I added. Manually tested with poll, notify, and threads - working great! 🎉

Screenshot from 2022-02-23 22-25-01

Note that when migrating while an old worker is running, unfortunately the following error occurs in the worker:

#<Thread:0x000055eb45e09eb8 /home/brendan/Projects/que/lib/que/locker.rb:161 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	8: from /home/brendan/Projects/que/lib/que/locker.rb:195:in `block in initialize'
	7: from /home/brendan/Projects/que/lib/que/locker.rb:243:in `work_loop'
	6: from /home/brendan/Projects/que/lib/que/locker.rb:275:in `cycle'
	5: from /home/brendan/Projects/que/lib/que/locker.rb:315:in `poll'
	4: from /home/brendan/Projects/que/lib/que/locker.rb:315:in `each'
	3: from /home/brendan/Projects/que/lib/que/locker.rb:324:in `block in poll'
	2: from /home/brendan/Projects/que/lib/que/poller.rb:150:in `poll'
	1: from /home/brendan/Projects/que/lib/que/connection.rb:98:in `execute_prepared'
/home/brendan/Projects/que/lib/que/connection.rb:98:in `exec_prepared': ERROR:  cached plan must not change result type (PG::FeatureNotSupported)

I suspect this is unavoidable, and we should just advise that if you want to avoid the error, to briefly shut down the workers while the migration is taking place. Otherwise, it should be fine once restarted by the supervisor (e.g. Kubernetes).

@ZimbiX
Copy link
Member

ZimbiX commented Feb 23, 2022

Commands used in testing:

# Env for all terminals:

export DATABASE_URL=postgres://que:que@localhost:5697/que-test
export PGHOST=localhost PGPORT=5697 PGUSER=que PGPASSWORD=que PGDATABASE=que-test

# Setup:

docker ps -aq | xargs -r docker stop && docker container ls -aq | xargs -r docker rm && docker volume rm -f que_db-data && docker-compose up -d db # reset db

ruby -r bundler/setup -r que -r sequel -e "Que.connection = Sequel.connect(ENV['DATABASE_URL']); Que.migrate!(version: 5)"

# Terminal 1:

bundle exec ./bin/que --min-buffer-size=1 --max-buffer-size=1 --worker-count=2 --poll-interval=1 ./app.rb

# Terminal 2:

# Update the version in `lib/que/version.rb` to `2.0.0`, then start a second worker:
bundle exec ./bin/que --min-buffer-size=1 --max-buffer-size=1 --worker-count=2 --poll-interval=1 ./app.rb

# Terminal 3:

psql -c "insert into que_jobs (job_class, que_version) values ('MyJobForQue1', 1), ('MyJobForQue1', 1), ('MyJobForQue1', 1), ('MyJobForQue1', 1), ('MyJobForQue2', 2), ('MyJobForQue2', 2), ('MyJobForQue2', 2), ('MyJobForQue2', 2)"

app.rb:

require 'bundler/setup'

require 'sequel'
require 'que'

DB = Sequel.connect(ENV['DATABASE_URL'])
Que.connection = DB

class MyJobForQue1 < Que::Job
  def run
    puts 'Running MyJobForQue1'
  end
end

class MyJobForQue2 < Que::Job
  def run
    puts 'Running MyJobForQue2'
  end
end

To run just the spec that was failing:

TEST=lib/que/locker.spec.rb TESTOPTS="--name='/with listen only/' -v" bundle exec rake spec

@ZimbiX ZimbiX force-pushed the add-que-version-column branch from 464b2d9 to ffc4be2 Compare February 23, 2022 12:38
@ZimbiX
Copy link
Member

ZimbiX commented Feb 23, 2022

Rebased to resolve a merge conflict in the version file.

@ZimbiX ZimbiX marked this pull request as ready for review February 23, 2022 12:39
maddymarkovitz and others added 11 commits February 24, 2022 14:11
Co-authored-by: Rob Harrington <oeoeaio@gmail.com>
This can be used to indicate which version of the gem enqueued a job

Co-authored-by: Rob Harrington <oeoeaio@gmail.com>
Co-authored-by: Rob Harrington <oeoeaio@gmail.com>
…, incl. a failing one for listen

To run just the failing spec:

```bash
DATABASE_URL=postgres://que:que@localhost:5697/que-test TEST=lib/que/locker.spec.rb TESTOPTS="--name='/with listen only/' -v" bundle exec rake spec
```
… to take into account que_jobs.que_version

All that's changed from the previous version of `que_job_notify` is the addition of `AND ql.que_version = NEW.que_version`, in order to notify only a worker which has a Que version that matches what was used to schedule the job.
…till pass when the major version gets bumped
This more accurately reflects its purpose
To allow control over when it needs to change, rather than with every major version
@ZimbiX ZimbiX force-pushed the add-que-version-column branch from e7df05d to 60af431 Compare February 24, 2022 03:16
@ZimbiX
Copy link
Member

ZimbiX commented Feb 24, 2022

Rebased to resolve a merge conflict around the startup message.

@oeoeaio oeoeaio changed the title Add que version column Add job_schema_version column Feb 24, 2022
Copy link
Contributor

@oeoeaio oeoeaio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks good, let's do it.

@ZimbiX ZimbiX force-pushed the add-que-version-column branch from 4a6577b to e186c8f Compare February 25, 2022 03:32
@ZimbiX ZimbiX merged commit ec66e60 into master Feb 25, 2022
@ZimbiX ZimbiX deleted the add-que-version-column branch February 25, 2022 05:06
@ZimbiX
Copy link
Member

ZimbiX commented Feb 25, 2022

Note: I've changed the advice around 3rd-party code using Que.job_schema_version in 18f9479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants