# Fix vanishing thread data between deserialising and performing an ActiveJob instance #36
Open · caius wants to merge 2 commits into Betterment:main from caius:cd/activejob_base_execute (+56 −3)
I managed to dig up, in an old archive repo, the original comment thread pertaining to the choice of `job.perform_now` vs `ActiveJob::Base.execute(job_data)` in this custom job wrapper class.

Basically, it's because of the `delegate_missing_to :job` call (above), and how we wanted to support defining things like `max_attempts` and `reschedule_at` (and other DJ-specific interfaces) on the ActiveJob class, and have the wrapper be able to call into the ActiveJob class in order to expose those interfaces to the `Delayed::Worker`.

And the implementation choice was made here to ensure that `job` is the same object instance as the thing that ultimately gets performed within `execute(job_data)`. Otherwise, you'd have one instance of the ActiveJob (`@job`, memoized below) responding to the worker, and a different instance deserialized within ActiveJob's `.execute` call.

So the implementation here was based on the assumption that, internally, `execute` runs the `:execute` callbacks, deserializes `job_data` inside that callback block, and then calls `perform_now` on the fresh instance. We wanted to avoid calling `deserialize(job_data)` twice - not just because it requires doing 2x the work, but because, in rare cases, you may actually want the legacy DelayedJob methods to be able to change their response depending on the result of the `perform`. I've only ever seen this use case once or twice, though. Usually they are a pure function or return a static value, but it's possible to make them react to the results of the `perform` method - and that breaks if `@job` in the wrapper (which the worker can see) is a completely different object from the deserialized job inside of `.execute`.

All of that said, I can see exactly what problem you are encountering. Because, if we deserialize the job outside of the `run_callbacks` block, callbacks that deal with setting up global state will swap out that global state from underneath the existing `job` instance. No bueno!

The way I can think to resolve this is to essentially pull everything back into the `run_callbacks` block, but expose an `attr_reader` so that the instantiated `@job` instance can be read back out by the worker.

This assumes that the worker doesn't need the job instance before it calls `perform`. If it does... maybe there's still a double-instantiation that you'd have to do to make it happy (and then throw away the first instance you instantiated), but at the very least we can guarantee that any delegated method calls that occur after the job has completed will be delegated to the same instance of `@job` that actually just ran its `perform` method.

Does that make sense to you @caius?
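The assumed shape of `execute` can be sketched in miniature like this (a simplified model, not the actual Rails source; `MiniActiveJob` and `ToyJob` are invented stand-ins):

```ruby
# Simplified model of the assumed ActiveJob::Base.execute behaviour: the
# :execute callbacks wrap BOTH deserialization and perform, and each call
# deserializes a brand-new job instance.
module MiniActiveJob
  def self.execute(job_data)
    run_execute_callbacks do
      job = deserialize(job_data) # a fresh instance on every execute call
      job.perform_now
    end
  end

  # Stand-in for ActiveJob::Callbacks.run_callbacks(:execute)
  def self.run_execute_callbacks
    yield
  end

  # Stand-in for ActiveJob::Base.deserialize
  def self.deserialize(job_data)
    Object.const_get(job_data["job_class"]).new
  end
end

class ToyJob
  def perform_now
    "performed"
  end
end

MiniActiveJob.execute("job_class" => "ToyJob") # => "performed"
```

Because `deserialize` runs inside the callback block here, calling it again outside the block would produce a second, different instance - which is exactly what the `perform_now`-vs-`execute` choice above was trying to avoid.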
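For illustration (class names invented), the difference between the usual static configuration methods and the rarer `perform`-reactive ones mentioned above:

```ruby
# Hypothetical jobs showing the two styles of legacy DelayedJob
# configuration methods discussed above.

class StaticConfigJob
  def perform; end

  # The common case: a pure function / static value.
  def max_attempts
    3
  end
end

class DynamicConfigJob
  def perform
    @rescheduling_needed = true # state set during perform...
  end

  # ...read back by the worker afterwards, via the wrapper's delegation.
  def max_attempts
    @rescheduling_needed ? 10 : 1
  end
end

job = DynamicConfigJob.new
job.max_attempts # => 1 (before perform)
job.perform
job.max_attempts # => 10 (only if the worker holds the SAME instance)
```

The second style only works if the object the worker queries afterwards is the very instance that ran `perform` - hence the concern about two deserialized copies.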
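The proposed resolution - pull deserialization back into the callback block but expose an `attr_reader` - can be sketched as follows (invented stand-ins, not the actual `Delayed::JobWrapper` source):

```ruby
# Sketch of the proposed fix: deserialize INSIDE the callback block, but
# memoize into @job behind an attr_reader so the worker can read the same
# instance back out after perform completes.
class SketchedJobWrapper
  attr_reader :job # worker reads @job through this after perform

  def initialize(job_data)
    @job_data = job_data
  end

  def perform
    run_execute_callbacks do
      @job = deserialize(@job_data) # the same instance the worker will see
      @job.perform_now
    end
  end

  private

  # Stand-in for ActiveJob::Callbacks.run_callbacks(:execute)
  def run_execute_callbacks
    yield
  end

  # Stand-in for ActiveJob::Base.deserialize
  def deserialize(_job_data)
    SketchedJob.new
  end
end

class SketchedJob
  def perform_now
    @performed = true
  end

  def performed?
    !!@performed
  end
end

wrapper = SketchedJobWrapper.new({})
wrapper.job            # => nil - the worker must not need it before perform
wrapper.perform
wrapper.job.performed? # => true - delegated calls now hit the performed instance
```

If the worker did need `job` before `perform`, `job` would still be `nil` here - that's the double-instantiation caveat noted above.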
(And, thank you for shining the flashlight on this very specific but very important piece of behavior. I learned a lot going down this rabbit hole, and I'm sorry I didn't get a chance to do it sooner!)
---
(Another choice we could make here would be to not support that very niche "read the ivar from the perform method to respond dynamically to the worker" kind of configuration method. Technically a breaking change, and I'm hesitant to make it if there's still a way to make the job instance internally consistent.)
---
Hey, sorry it's taken a while to get back to you on this.
Thanks for going code-spelunking in the history, it's much clearer to understand why choices were made previously - and makes sense as to why it's written the way it is now given the context. I suspected this might be a case of DelayedJob meets ActiveJob and they have subtly different behaviours that don't interlace nicely.
Ah, this is an interesting use case I'd not considered. In my mind callbacks are either pure, or respond based on data held outside the instance, as the Ruby objects are basically ephemeral within the worker's runloop. In some ways it's quite a neat idea though, and in theory the ivar could be included in `job_data` to persist through different attempts of the job as well. Hrm. :thinking_face:

I can definitely see benefits to trying to support both use cases here, but I also come down on the side of not wanting to make breaking changes if we can avoid them. The ivar-callback pattern seems like a useful option to maintain, even if rarely used - especially if you don't have (or need) somewhere outside the instance to store that data.
I started out agreeing, then thought about this some more and realised that if that would work, it should work now, as `JobWrapper#job` is already lazily loading the job data. But then we wouldn't have seen the behaviour we did, so there must be something accessing `JobWrapper#job` before `perform` is invoked. If no messages are delegated to missing through the `JobWrapper`, it wouldn't be loaded (ie, `ActiveJob::Base#deserialize` wouldn't be called) before entering the callback block in `JobWrapper#perform`.

Looking at the worker class, we go through `Delayed::Runnable#start`, `Delayed::Worker#run!`, `Delayed::Worker#work_off`, then descend into iterating the reserved jobs, where `job` is a `Delayed::Job` instance and `job.payload_object` will return the `Delayed::JobWrapper` instance - which at this point hasn't yet loaded `@job`.

Presuming no Delayed callbacks are defined in this example, we call through `Worker#run_thread_callbacks` into `Worker#run_job`, passing the `Delayed::Job` instance with `#payload_object` still not having loaded `@job` yet. `#run_job` then invokes any Delayed perform callbacks and calls into `Delayed::Worker#run`.

The first thing we do in `Worker#run` is generate a metadata hash by calling some methods on the `Delayed::Job` instance. Most of these are database columns, so won't trigger the `JobWrapper` loading the `@job` ivar. We do however call `job.name`, which is `Delayed::Backend::Base#name`, and that checks whether `payload_object` `respond_to?`s a couple of things. Turns out `delegate_missing_to` also delegates `respond_to?` (via `respond_to_missing?`), which causes `@job` to be loaded to answer the `respond_to?`. This means the job is deserialized before we get to `job.invoke_job` a couple of lines later in `Worker#run`.

I've not used `delegate_missing_to` before, so I wrote a little in-memory reproduction to prove to myself that was the case: after calling `j.name`, the `JobWrapper` instance gains the `@job` ivar, because `delegate_missing_to` calls `job.respond_to?`, which in turn deserializes our application job instance. I then went and proved it was due to `respond_to_missing?` falling through.

I think we're at a point where the two use cases just don't mesh in a way we can make happen: lazily deserializing the job instance outside of the callbacks means anything run during deserialize that affects things outside the job instance isn't guaranteed to persist through the ActiveJob callbacks being invoked. And if we deserialize the job instance again, we break any ivar-trickery that works with DelayedJob natively.
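That reproduction can be approximated in plain Ruby without ActiveSupport (names invented; `method_missing`/`respond_to_missing?` stand in for what `delegate_missing_to :job` generates):

```ruby
# Plain-Ruby approximation of the reproduction described above: merely
# asking the wrapper respond_to? forces the lazy @job "deserialization".
class FakeJob
  def max_attempts
    3
  end
end

class FakeJobWrapper
  def job
    @job ||= FakeJob.new # stand-in for lazy ActiveJob deserialization
  end

  # Roughly what delegate_missing_to :job compiles down to:
  def method_missing(name, *args, &block)
    job.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    job.respond_to?(name, include_private) # forces the lazy load!
  end
end

w = FakeJobWrapper.new
w.instance_variable_get(:@job) # => nil - not loaded yet
w.respond_to?(:max_attempts)   # => true - but answering loaded the job...
w.instance_variable_get(:@job) # => #<FakeJob> - deserialized early
```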
I'm half-tempted to suggest we shouldn't be doing anything that operates outside the job instance in `#deserialize` in our application code. Currently we've worked around this issue by moving the loading logic for `Current.services =` into a `before_perform` callback, which is working fine. The job gets deserialized before the ActiveJob callbacks are invoked, then the `before_perform` callback sets up the environment for us just before the `#perform` method is invoked.

I wonder if potentially we could move the ActiveJob callbacks up a level in Delayed, so job deserialization happens inside the callbacks even when Delayed callbacks are accessing the job instance. :thinking_face:
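That workaround can be modeled in miniature (all names invented; the toy `Current` stands in for an `ActiveSupport::CurrentAttributes` subclass):

```ruby
# Toy model: setting up global state in before_perform instead of in
# deserialize means it survives anything that resets it in between.
module Current
  class << self
    attr_accessor :services
  end
end

class WorkaroundJob
  def self.deserialize(_job_data)
    new # deserialize builds the instance and nothing else
  end

  def before_perform
    Current.services = :loaded # environment set up right before perform
  end

  def perform
    before_perform # ActiveJob would invoke this hook for us
    Current.services
  end
end

job = WorkaroundJob.deserialize({})
Current.services = nil # global state wiped between deserialize and perform
job.perform            # => :loaded - the hook re-established it in time
```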
We'd have to keep supporting non-ActiveJob jobs being scheduled, so it would require branching in `Delayed::Job#run_thread_callbacks`, as that's the earliest point non-Delayed code could trigger job deserialization. And then the `JobWrapper` could just perform the job, knowing that the worker has already invoked the callbacks.

Not sure what the knock-on effect of an ActiveJob callback erroring at that point would be vs the current behaviour though. I think currently that's handled by `Delayed::Job#run` having `rescue Exception`, which would be bypassed by moving it earlier. We'd potentially have to catch errors in `run_thread_callbacks` and invoke the error callbacks to maintain error-handling parity.

I wonder if an easier option might be to try to stop the `#run` metadata hash loading the job instance, and document that any Delayed callbacks accessing the job can lead to ActiveJob's `#deserialize` being called before the ActiveJob callbacks are executed, which can cause anything using `ActiveSupport::CurrentAttributes` to be wiped between `#deserialize` and `#perform`. I suspect the correct answer for our use case is to use a `before_perform` callback, as we are doing now, and treat `#deserialize` as something that operates on the job instance and nothing else.

What do you think about interleaving the `ActiveJob::Callbacks` earlier in the Worker, and having ActiveJob logic outside of `JobWrapper`, @smudge?