
[api] add done event to signify when a single job is done so that we can hook into that #7

Open
jcrugzz wants to merge 1 commit into dominictarr:master from jcrugzz:job-done

Conversation

@jcrugzz
Contributor

@jcrugzz jcrugzz commented Aug 6, 2014

I'm open to a different event name if you want, but I want to be able to hook into this to keep track of when individual jobs have been executed.

@jcrugzz jcrugzz changed the title [api] add done event to signify when a single job is done so that we can [api] add done event to signify when a single job is done so that we can hook into that Aug 6, 2014
@dominictarr
Owner

sounds reasonable - but can you give me a description of what you are using this for?
It's very useful to know how people are using a given feature.

@jcrugzz
Contributor Author

jcrugzz commented Aug 6, 2014

@dominictarr a little wrapper around this essentially. https://github.com/jcrugzz/atomicize/blob/master/index.js

[edit]: and I thought code was better than a description 🎿

@dominictarr
Owner

hmm, are you sure that is how it should work?

level-trigger doesn't do that because if the job is already running, then it might be running with the old data, and you need to run it again to make sure the job is processed with the new input.

Maybe this does make sense in your case, though. What does the data actually represent?
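The re-run semantics described above could be sketched like this (assumed behaviour for illustration, not the actual level-trigger implementation): if new data arrives while a job is in flight, remember the latest input and run the job again when the current run finishes.

```javascript
// Sketch of "run again with the new input" semantics (illustrative;
// names and signature are assumptions, not level-trigger's API).
function rerunner(work) {
  var running = false, pendingData, hasPending = false;
  return function trigger(data) {
    if (running) {
      // a run is in flight, possibly with stale data: remember the
      // newest input so we can re-run once the current run completes
      pendingData = data;
      hasPending = true;
      return;
    }
    running = true;
    work(data, function onDone() {
      running = false;
      if (hasPending) {
        hasPending = false;
        trigger(pendingData); // re-run with the latest input
      }
    });
  };
}
```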

@jcrugzz
Contributor Author

jcrugzz commented Aug 6, 2014

@dominictarr in my case there are ephemeral messages that can be sent in rapid fire to do work as soon as possible. The work being done is fairly well defined so in my situation, queueing pending messages, even if they have new data, can cause unnecessary work. I'd rather drop all messages while the job is executing because there is a high probability messages will stop being received once the job is complete.

The work being done in my initial use case is sshing into particular machines to execute well defined commands. What I'm reducing here is the number of times the box gets sshed into as the data that actually changes is fetched externally outside of the "job" message. The only data in this specific use case that ends up changing is the IP address which would be allowed to act concurrently.
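The "drop while running" behaviour described here could be sketched as follows (assumed semantics for illustration, not the actual atomicize implementation): while a job for a given key is in flight, further messages for that key are discarded rather than queued.

```javascript
// Sketch of drop-while-running dispatch (illustrative; the function
// names and callback shape are assumptions, not atomicize's API).
function dropper(work) {
  var running = {};
  return function (key, data, done) {
    // a job for this key is already executing: drop the message
    if (running[key]) return done && done(null, false);
    running[key] = true;
    work(key, data, function (err) {
      delete running[key];
      if (done) done(err, true); // true: this message was executed
    });
  };
}
```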

@jcrugzz
Contributor Author

jcrugzz commented Aug 8, 2014

@dominictarr if there is no real issue here, I'd love a merge, as I'd like to get rid of that dirty git dependency in my module ;). I'm currently using it in production with success. Let me know if you want any more details though.

@dominictarr
Owner

Well, I'm not really against merging, but the use case you are describing doesn't have the intended properties of the job function - maybe there is a better way?

I ask because issues end up being documentation, so other people with similar problems might end up here and see this. So even if we end up merging this, we need a discussion about what is the best approach to this class of problem.

So the thing here is that you are triggering state in an external system from changes in level.
The idea with level-trigger jobs is that they are idempotent - i.e. they should have the same effect if they are run twice (or more) as they do when they run just once. Say, for example, you trigger a write in an external database - maybe the process crashes after the write in the external db, but before level-trigger has registered the success. To be sure, it runs the job again.

If you are just writing transformed data to another database, then just overwriting the old data will probably be fine - but if you are altering the state of an external system, like, say, starting a new server process or spinning up a new machine etc., then you could make that idempotent by first checking whether the job has already run, and if so exiting with a success.

You'll still ssh in twice, but you won't perform the actual work - if you could make your script work like that, I think it should make your system overall more predictable.
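The idempotency pattern described here could be sketched like this (a generic illustration; `isAlreadyDone` and `doWork` are hypothetical names, not part of level-trigger): before doing the real work, check whether its effect is already in place, and exit with success if so.

```javascript
// Sketch of an idempotent job wrapper: re-running it is a safe no-op
// once the work's effect is in place. All names here are illustrative.
function idempotentJob(isAlreadyDone, doWork) {
  return function (data, done) {
    isAlreadyDone(data, function (err, alreadyDone) {
      if (err) return done(err);
      // effect already present: exit with success without redoing work
      if (alreadyDone) return done(null);
      doWork(data, done);
    });
  };
}
```

Running the wrapped job twice performs the side effect only once, which is exactly the property that makes a retry after a crash safe.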

@jcrugzz
Contributor Author

jcrugzz commented Aug 9, 2014

@dominictarr I have no problem with the back and forth :). I find it a valuable exercise. Technically I am using level-trigger in a way that may not be as it was intended. In my case I willingly choose to ignore the ability to have pending operations be queued as a post optimization for my specific use. It just creates less churn as there could be many of the same messages coming in. The service itself already operates by ensuring the command cannot do any harm (by checking to see if it already worked as intended) and atomicize ensures this behavior with an api that I like.

I could have done the same by just using level-create-batch when inserting into the database, as the locks technically take care of the possible race condition that could cause two jobs to be run simultaneously. This happens when two jobs with the same key manage to get inserted at the same time and overwrite each other, but both hit the pre-hook and create two separate jobs, since the timestamps are guaranteed to be monotonic as expected.

What I wouldn't get in this case is the ability to return the messages I want when managing my own state. This change is only really necessary because I want to do this (which could be considered unnecessary), but it's kind of nice to have this introspection regardless, even if my specific use of the added event is to satisfy my own OCD.

@jcrugzz
Contributor Author

jcrugzz commented Aug 13, 2014

@dominictarr this is rebased if you want to consider merging ;).

@dominictarr
Owner

I'm not yet persuaded that this is the best solution to your problem. To merge this, we will have to satisfy both your OCD and mine ;)

It would help to have a more concrete idea of what your ssh scripts do and why an update is likely to happen many times in quick succession. My gut feeling here is that we may arrive at a better solution by reframing the problem.

@jcrugzz
Contributor Author

jcrugzz commented Aug 15, 2014

@dominictarr main use case currently is the following.

  1. Balancer receives a request for an app
  2. Proxy to the app gets ECONNREFUSED
  3. We send a message over TCP to our metrics cluster
  4. Metrics cluster hits a web service with an HTTP request containing the IP address
  5. Web service SSHes into the box and finds that the app died due to a memory leak, as it wasn't restarted by the process runner
  6. App is then started back up over SSH

During this process we are holding the initial request and doing retries on a backoff. If this app is being accessed by a different person or multiple people that are hitting different balancers, many of these messages could be coming in at once until that first command then completes (which brings the app back up). This is basically a hack around people who aren't in the space to have certain awareness around their code until we can provide that awareness.

This was the solution I came up with that seemed simplest at the time and seems to be working well.
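The "holding the initial request and doing retries on a backoff" step in the flow above could be sketched roughly like this (a generic illustration; the function names and the retry/delay parameters are assumptions, not the actual balancer code):

```javascript
// Sketch of holding a request and retrying on an exponential backoff
// while the app is being brought back up. Parameters are illustrative.
function retryWithBackoff(attempt, opts, done) {
  var tries = 0;
  var max = opts.retries || 5;
  (function next() {
    attempt(function (err, result) {
      if (!err) return done(null, result);       // app answered: release the held request
      if (++tries >= max) return done(err);      // give up after max attempts
      setTimeout(next, opts.base * Math.pow(2, tries)); // back off exponentially
    });
  })();
}
```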
