Skip to content

Conversation

@ChunMinChang
Copy link
Member

This solves the deadlock mentioned in BMO 1572273 comment 6. It's very
likely to have a deadlock when the output callback tries to stop the
input AudioUnit when the input device is unplugged during the duplex
stream is working.

When the input device is unplugged, the stream will be reinitialized and
the AudioUnits will be stopped then. Hence we could only feed silence to
the output buffer and let stream reinit stop the AudioUnits.

@kinetiknz kinetiknz requested a review from achronop August 13, 2019 23:16
@achronop
Copy link
Contributor

Thanks, this is exactly what we discussed. A new question came up and I would like your opinion. If the stream has drained, should reinit start the newly updated stream? With that fix, does the status callback report a drain and then the stream restarts?

@achronop
Copy link
Contributor

Also, I am looking at the rust code and the logic there seems similar to this backend. I am wondering why the error does not reproduce in rust. Is it due to different timing or I am missing something?

@ChunMinChang
Copy link
Member Author

ChunMinChang commented Aug 14, 2019

Also, I am looking at the rust code and the logic there seems similar to this backend. I am wondering why the error does not reproduce in rust. Is it due to different timing or I am missing something?

Technically, in the Rust backend, the deadlock might be formed. The GraphDriver will initiate a stream stop/shutdown task and then set the stream to drain in an output callback when it detects the device switching. Then, if the next output callback comes before the stream stop/shutdown task, the next output callback will try requiring the CAMutex for the input AudioUnit and then forms a deadlock.

I'll update the Rust code soon.

@ChunMinChang
Copy link
Member Author

With that fix, does the status callback report a drain and then the stream restarts?

Yes. The drain is set by GraphDriver so the stream state is still drained. With the fix, the output callback will feed the silence to output buffer directly without stoping the AudioUnit and canceling the callback. Next, the stream reinit will work as usual. It will stop and then restart the AudioUnits. Finally, (technically) there are two cases to stop the stream and cancel the callback.

  1. if the stream shutdown task comes before the output callback, then the AudioUnits will be stopped and there are no callbacks anymore.
  2. if the output callback comes before the stream shutdown task, output callback will stop the AudioUnits since the state is still drained.

If the stream has drained, should reinit start the newly updated stream?

I could add a check for stm->draining. If it's drained, reinit can simply stop the AudioUnits and do nothing more. In that case, the output callback won't be fired and stream shutdown task will be called without racing with output callback.

@achronop
Copy link
Contributor

Thank you for the information so far. I gave it a try this morning and I found out that the drain logic inside the callback is odd. I create a couple of patches to improve it and to solve this error. You can find the patches here. Can you please test them and tell me if the problem is solved. If I had your device I would have tested myself. If the problem is solved please let me know what you think about the patches and if I should push them for review.

This solves the deadlock mentioned in BMO 1572273 comment 6. It's very
likely to have a deadlock when the output callback tries to stop the
input AudioUnit when the input device is unplugged while the output
device is unchanged during a WebRTC call.

One of the thread forming the deadlock is internal so we don't have any
control on it. The only way to free the deadlock is to prevent the
output callback to require the CAMutex tied to the input AudioUnit.

With the change, the input AudioUnit will be stopped in its own callback
so the deadlock won't be formed.
@ChunMinChang
Copy link
Member Author

Thank you for the information so far. I gave it a try this morning and I found out that the drain logic inside the callback is odd.

Sorry, I change my mind. I think adding the reinit stuff into callback makes things more complicated.

I think we can stop the AudioUnits in their own callbacks. The callback will definitely hold the corresponding CAMutex so deadlock won't be formed by requiring that CAMutex.

The try result is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fdc52c30096e03312535bb507774db6d73684d64

You can find the patches here.

I think it's good to add a force-drain test, but how do you simulate the unplug event that causes the deadlock?

@ChunMinChang
Copy link
Member Author

This solves the deadlock mentioned in BMO 1572273 comment 6.

Sorry for the force update. I should leave it as a reference for the discussion. The original patch could be found here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b2368127419bb229a9a42fd07fb2883ca4a580f

@achronop
Copy link
Contributor

I think it's good to add a force-drain test, but how do you simulate the unplug event that causes the deadlock?

I do not simulate that. Even if I tested unplugging manually, I am unable to test your scenario without your device. I would rely on you to test my patches with your device if you can.

@achronop
Copy link
Contributor

achronop commented Sep 4, 2019

Since #529 confused the things let's finish this one first and I'll take care of it later. I was rushing to push everything before PTO and mixed them all together.

Regarding the deadlock, the fix looks good, since you report that is gone. My concern is about restarting the stream when we should not. Since the stream has drained we need to end up with a drained stream.

I am thinking of using the stm->reinit_pending flag to exit early any input/output callback during reinit. This will allow the drain to appear after the end of reinit and the new (reinited) stream will be drained/stopped as expected.

Let me know what you think and ping me if you have questions. Thanks!

@achronop
Copy link
Contributor

achronop commented Sep 4, 2019

I am looking at it further and I am realizing that the drain after reinit problem mentioned above exists regardless of your fix. Thus I'll go on and r+ this one. I will raise an issue for the reinit problem.

Copy link
Contributor

@achronop achronop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@achronop
Copy link
Contributor

This is ready, I'm gonna merge it. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants