Feat: Voice activity threshold slider #492

Closed
hugohutri wants to merge 42 commits into element-hq:full-mesh from hugohutri:main

@hugohutri

@hugohutri hugohutri commented Aug 1, 2022

This adds a voice activity threshold slider, since this is a pretty important feature for voice calls and a similar feature can be found in Discord and comparable apps. If the volume is below the threshold, the track will be muted.

Added features:

  • Reusable slider component
  • Voice activity threshold slider in the settings modal
  • Volume indicator in the slider track
  • Working voice activity detection
elem-vad.mp4

Requires:
matrix-org/matrix-js-sdk#2556
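
For readers following along, the muting rule described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `rmsDb`, `applyThreshold` and `GateableTrack` are hypothetical and are not taken from this PR or from matrix-js-sdk, and the level is computed as a plain RMS in dBFS, which may differ from how the SDK reports volume.

```typescript
// Hypothetical sketch; none of these names come from the PR or matrix-js-sdk.

/** Root-mean-square level of a block of samples in [-1, 1], expressed in dBFS. */
function rmsDb(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  // Math.log10(0) is -Infinity, which reads naturally as "silence".
  return 20 * Math.log10(rms);
}

/** Minimal shape of the thing we mute (assumption: only `enabled` is needed). */
interface GateableTrack {
  enabled: boolean;
}

/** Mute the track while the measured level is below the threshold. */
function applyThreshold(track: GateableTrack, levelDb: number, thresholdDb: number): void {
  track.enabled = levelDb >= thresholdDb;
}

// Usage: a quiet block against a -50 dB threshold disables the track.
const exampleTrack: GateableTrack = { enabled: true };
applyThreshold(exampleTrack, rmsDb(new Float32Array([0.001, -0.001, 0.001, -0.001])), -50);
console.log(exampleTrack.enabled);
```

In a browser, the `track` argument could be a `MediaStreamTrack` (which has an `enabled` property) and the samples could come from an `AnalyserNode`, but that wiring is left out here.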

@@ -0,0 +1,68 @@
.slider[type="range"] {
Contributor

We already have a slider in VideoTileSettingsModal, could you please use this there too for consistency?

Author

Yes 👍

This pull request is a bit messy atm. We will continue to refactor it, but currently it is just at proof-of-concept level.


The slider is now the same as in LocalVolume. We also made that slider reusable, so we can use it anywhere we want in the future.

Author

I think that in the future the slider should be replaced with a slider from a library, since those have more functionality and better accessibility.

@dbkr
Member

dbkr commented Aug 19, 2022

Some initial bits of feedback on this - mainly that it looks like this is done using the volume analysis: I would think that this is always going to miss the very start of the phrase since it will already have been discarded by the time it's triggered a volume event, so this approach might not work. I'm not an expert on how VAD is done 'properly' though (ie. without introducing latency to do the analysis).

(Also, it's "threshold" with an 'h'.)
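
The trade-off raised here can be made concrete with a small sketch. One common way to keep the start of a phrase is to hold back the last few frames while the gate is closed and release them when the level crosses the threshold, which in a live call amounts to delaying the audio by that many frames. The class below is a hypothetical illustration of that bookkeeping (plus a short hold period so the gate does not snap shut between words); it is not how this PR works, and `PreRollGate` and its parameters are invented names.

```typescript
// Hypothetical sketch of a gate with pre-roll and hold; not code from this PR.
class PreRollGate {
  private readonly thresholdDb: number;
  private readonly preRollFrames: number; // frames kept from before the trigger
  private readonly holdFrames: number; // frames the gate stays open after the level drops
  private preRoll: Float32Array[] = [];
  private holdLeft = 0;

  constructor(thresholdDb: number, preRollFrames: number, holdFrames: number) {
    this.thresholdDb = thresholdDb;
    this.preRollFrames = preRollFrames;
    this.holdFrames = holdFrames;
  }

  /** Returns the frames to forward for this input frame (none, one, or several). */
  push(frame: Float32Array, levelDb: number): Float32Array[] {
    if (levelDb >= this.thresholdDb) {
      // Gate opens or stays open: release the buffered frames first so the onset survives.
      const out = this.preRoll.concat([frame]);
      this.preRoll = [];
      this.holdLeft = this.holdFrames;
      return out;
    }
    if (this.holdLeft > 0) {
      // Below the threshold but still inside the hold period: keep forwarding.
      this.holdLeft--;
      return [frame];
    }
    // Gate closed: remember the frame in case speech starts right after it.
    this.preRoll.push(frame);
    if (this.preRoll.length > this.preRollFrames) this.preRoll.shift();
    return [];
  }
}
```

With `preRollFrames` set to 0 this degenerates to a plain threshold gate with no added delay, which is the case where the first frame of a phrase can be lost.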

@DashieTM

@hugohutri and I both tried it on our machines and it seemed to work fine; we can do some more testing ofc.
But other than that, do you have any other ideas on how we can accomplish it?
In theory it shouldn't be that much of a hassle, but we need to get the samples and this seemed to be the best way.

@hugohutri
Author

Typo is now fixed. I haven't noticed any problems with the start of a phrase, and it seems to capture it.

@hugohutri hugohutri changed the title from "Draft: Voice activity treshold slider" to "Feat: Voice activity threshold slider" Aug 25, 2022
@hugohutri hugohutri marked this pull request as ready for review August 25, 2022 19:03
@hugohutri hugohutri requested a review from a team as a code owner August 25, 2022 19:03
@hugohutri
Author

Signed-off-by: Hugo Hutri hugo.hutri98@gmail.com

@DashieTM

Signed-off-by: Fabio Lenherr / DashieTM fabio.lenherr@gmail.com

@gaelledel
Collaborator

@hugohutri @DashieTM and everyone else who is contributing to this, Thank you! I'm Gaelle, member of the product design team here at Element. I have reviewed the above and my conclusions are the following:

  1. What strikes me the most from a UX perspective is that you are proposing a setting that should apply to the general behaviour of a selected mic, yet it lives inside a call room's settings. As this is a general setting, we should place the user in the right context in the system, where adjustments are made at a universal level rather than a local level. So in order for us to integrate this work we'd need to create a general settings access point, accessible via the Lobby screen. In the top right-hand corner you will see your avatar. On click you currently have: ProfileName > Sign Out. We could instead have: ProfileName > (cogwheel icon) Settings > Sign Out. Clicking Settings would show the same settings modal as the room settings, but with the addition of the slider in the Audio settings tab, just like you have done above.

Screenshot 2022-10-05 at 14 38 54

  2. The first implementation with a slider and a defined threshold point seems most appropriate - I'm no acoustics expert, but this guy here https://acousticnature.com/journal/which-microphone-sensitivity-is-better-dbv-vs-mv suggests the default should be at around -50 dB to -60 dB

  3. I am a fan of the way Discord implemented their functionality - it is simple and effective, both visually and interaction-wise

  4. I agree with adding the extra string of text for context under the slider to explain what the functionality does.

  5. A way to automatically determine mic sensitivity would be super useful. Similar to the Discord functionality with the toggle switch - a no-brainer for the user, since the system does it!

Let me know if you'd like to sync in person as well - we can organise a call np

DashieTM and others added 5 commits October 6, 2022 08:30
Co-authored-by: Šimon Brandner <simon.bra.ag@gmail.com>
Co-authored-by: Šimon Brandner <simon.bra.ag@gmail.com>
Co-authored-by: Šimon Brandner <simon.bra.ag@gmail.com>
Co-authored-by: Šimon Brandner <simon.bra.ag@gmail.com>
Co-authored-by: Šimon Brandner <simon.bra.ag@gmail.com>
@SimonBrandner
Contributor

(re-request my review when ready)

@DashieTM

DashieTM commented Oct 7, 2022

Alright thanks @gaelledel for the reply!
I will reply to each point individually.

  1. When we first started the PR I had the same thought about it not being a global change, since it only affects each room individually. However, we also knew that at some point Element Call will be integrated into Element itself, which is why we left it as is.
    If these settings on the top right will not be available in Element Web / Element Desktop etc., then we would need to implement them in the Element settings itself as well.
    This would also be the proper implementation in my opinion, as all Element settings can then be changed in one place.
    We have already addressed some points from Simon for now and we will update the branches on the weekend; then we can also move this to a new settings page on the top right.

  2. I absolutely agree, we still have the slider, the threshold can be changed back to 60.

  3. That's exactly where the idea comes from :D

  4. Alright, we can always change it later on if needed.

  5. This would be nice, but the first time I looked at it, it was quite a task with all the math involved.
    I will give it another look, and we will see if we can implement it in this PR.
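
As a rough idea of what point 5 could involve, one simple heuristic is to treat a low percentile of recently measured levels as the background noise floor and place the threshold a fixed margin above it. The sketch below is purely an assumption for illustration: `suggestThresholdDb`, the 10 dB margin and the 10th percentile are made-up choices, and this is not how Discord or this PR determines sensitivity.

```typescript
// Hypothetical heuristic; the names and default values are assumptions for illustration.

/**
 * Suggest a gate threshold from recent level measurements (in dB).
 * The noise floor is estimated as a low percentile of the finite levels,
 * and the threshold is placed `marginDb` above it.
 * Returns null when there is nothing to estimate from.
 */
function suggestThresholdDb(
  levelsDb: number[],
  marginDb: number = 10,
  percentile: number = 0.1,
): number | null {
  // Drop -Infinity (digital silence) and NaN, then sort ascending.
  const finite = levelsDb.filter((v) => isFinite(v)).sort((a, b) => a - b);
  if (finite.length === 0) return null;
  const index = Math.min(finite.length - 1, Math.floor(percentile * finite.length));
  return finite[index] + marginDb;
}

// Usage: mostly background noise around -60 dB with a few louder speech frames.
const recentLevels = [-62, -60, -61, -59, -60, -25, -20, -22, -60, -61];
console.log(suggestThresholdDb(recentLevels));
```

A percentile is used rather than the mean so that a few loud speech frames in the window do not drag the estimate upwards.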

About syncing, I am open to whatever you would like to use.
For Matrix, the ID is fabio.lenherr:matrix.org
My time zone is GMT+1; writing is fine during the day, and a call would probably be best in the evening or on Thursdays.

@iakat

iakat commented Jan 20, 2023

Related: #714 (Noise reduction)

@robintown
Member

Hi @hugohutri and @DashieTM, really sorry to have let this PR languish. Unfortunately, the reality is that we don't have the product or design bandwidth to support a feature like this at the moment, given that it doesn't fit in with our roadmap items for the foreseeable future. To reflect this, I'm going to close the PR.

Please don't hesitate to reach out to us again in the public WebRTC room if you want to discuss other ways to contribute to Element Call. I think there's still a lot of opportunities where we'd be happy to let the community get involved, but they're generally more aligned to a video call use case, rather than voice-first.

@robintown robintown closed this Jul 19, 2023
@SplittyDev

@robintown I honestly don't understand this decision. Even if it doesn't fit with the roadmap, this PR is basically done and has been for quite some time. Why not just merge it so the users wanting/needing it are happy, and move on with your roadmap?

@Neotamandua

I'm quite disappointed with the progress of this PR, especially considering it's the only one I've been following in this repository. I've been regularly checking for availability on this feature, but no update came.

People contributed to this feature because they consider it important, and I share the same sentiment. Widely used applications like Discord have a similar feature to address background noise quickly. Video calls are in fact voice-first because you communicate via voice, so having a volume threshold is crucial for a smooth experience. The absence of this feature is the main reason I haven't been using element-call, and I know others who face the same issue. Additionally, I think this should be a straightforward feature accessible to all users, not just confined to an "advanced section".

It's unfortunate to see that something seemingly simple, even with the support of the open-source community, can't be implemented due to not enough "bandwidth".

@fkwp
Contributor

fkwp commented Jul 20, 2023

Hi @Neotamandua @SplittyDev @fti7,

first of all, I would also like to apologise for being so hesitant with this longstanding PR, which also shows that the decision not to merge it was not an easy one. In fact, we had passionate discussions about it several times. Element Call was designed to be a slim, lightweight VoIP app which just works, also for non-expert users. Since the proposed voice gate would require a decent amount of technical knowledge from non-Discord, non-power users, we decided against it.

Now coming to the actual functionality of this PR, "voice activity threshold / gate", and, if applicable, possible workarounds. At least on Linux using the PipeWire audio daemon, the following projects provide similar functionality:

  • EasyEffects, which is a collection of audio filters (including a gate) controlled by a nice GUI
  • A virtual mic source using DeepFilterNet's LADSPA plugin, which provides ML noise reduction on the level of krispAI

PS: Talking about 'we': I want to make you aware that this was a team decision and @robintown is only the messenger.

@DashieTM

DashieTM commented Jul 23, 2023

I hope this is not considered spam, and I am sorry for the late response, but I was a bit busy this week.

While I am, of course, sad to hear that this feature will not be available in Element Call, I also can't say that this PR was maintained for long; we stopped as we were simply waiting for a response. (See the backend PR for what I mean by no longer maintained.)
I guess the only thing I would hope for is a bit more transparency for future PRs. It is quite frustrating to see a PR just sit still, but I of course understand that this was by far not the only PR.

At the end I want to thank all the nice devs, especially Simon for the helpful reviews! Also quite fast!

Some additional things:
I still believe that Discord users are an untapped source for you guys here. I know they will not bring monetary value, but they could bring a larger userbase.
All they want essentially is a one-click room join without any accept button and some voice features; if not VAD, then perhaps regular noise suppression etc.
Inspiration can always be taken from other open source projects such as Mumble and Revolt.

I wish you the best with this project, and I hope to see this in other matrix clients as well at some point, cheers.

@Neotamandua

Hello @fkwp,

I have to disagree with you and everyone involved in making this decision.

Considering that Element Call was intended to be a slim and lightweight VoIP app, it is interesting to see support for screenshare, which undoubtedly is a valuable feature, but considerably more sophisticated than a simple volume threshold. Precisely because the app is designed for simplicity, a volume threshold serves as the simplest built-in solution, eliminating the need for complex ML noise reduction or other algorithms. This feature, being optional, serves both technical and non-technical users. Non-power users who may not feel comfortable using it can simply choose not to, but depriving everyone of this option seems unreasonable.

Considering that this app is designed for VoIP, adding a volume threshold is in line with providing an all-encompassing experience. I have personally encountered instances where meetings suffered from poor voice quality, and a volume threshold would have been immensely helpful in rectifying the issue. Even for non-technical users, colleagues in the meeting could easily guide them (even with screenshare) to set the volume threshold correctly, enhancing the overall voice experience for everyone involved, making the whole application more appealing.

Coming back to Discord again: it is a prime example of a mass-adopted application with a user base that also includes non-power users, and it successfully implemented a volume gate. This feature has been a market standard, also adopted by successful predecessors such as TeamSpeak, making it a widely recognized and easily explainable tool for managing voice quality. I don't see how this needs a decent amount of technical knowledge in order to be used properly, especially given the previously mentioned fact that other meeting attendees can help out.

I kindly recommend you to reconsider this decision.

@MLWeber

MLWeber commented Sep 2, 2023

To summarize, I understand that the main objections against this feature are:

  1. Limited design bandwidth to support this feature / features outside the roadmap (brought forward by @robintown)
  2. Element Call needs to just work for non-expert, non-Discord, non-power users (@fkwp)
  3. Element Call should be kept lightweight (@fkwp)

Regarding point 1 @SplittyDev has raised an excellent point: Seeing how this PR is basically done, the bandwidth requirement should be minimal.

Regarding point 2 and 3, I fully agree with @Neotamandua and I would like to add a few more thoughts:

About point 2: I believe this argument is misplaced in this discussion, because the feature does not take away from the existing functionality or hamper it in any way. Non-experts can and likely will just ignore such a setting.

About point 3: While I understand that you want to keep Element Call as lightweight as possible, I do believe that the benefits of this feature by far outweigh the added complexity, because it solves a big fraction of a more complex problem (see also #714) in a very simple way. In none of the major voice or video conferencing software that I am using (this includes not only TeamSpeak and Discord, but also more business-focused applications like Slack, Teams and Zoom) do I hear as much typing, mouse clicking and other background noise as in Element Call. I expect silence when nobody is talking, especially since this is a solved problem virtually everywhere else (and has been solved for more than 15 years, e.g. with TeamSpeak 2).

With that, I would like to join @Neotamandua in politely asking you to reconsider this decision.

@eonrider

I've been following this PR since the beginning and I was pretty shocked to see it rejected. It's frustrating that Element is constantly being touted as an alternative to Discord and, while there is some truth in that, the things that make Discord Discord are constantly being ignored.

It's getting to the point that whenever I see a social media post from Element or one of the Matrix people comparing Element to Discord, my brain subconsciously edits the message to "better than generic messenger" because Element is leagues behind Discord despite only needing to implement a small number of features to get their foot in.

@hoptional

I agree completely with the above couple of comments and I cannot wrap my head around how implementing these basic features is not one of their top priorities. I am absolutely certain that the moment Element had Discord / TeamSpeak / Mumble style voice-only rooms with voice activation, they would instantly gain tens if not hundreds of thousands of users. Just search for "self-hosted discord alternative" and you will see that there is a huge demand.

Instead, the focus seems to lie on becoming another competitor to Slack or Teams, which in my opinion is a lost cause, especially if they don't already have a large userbase that would advocate for Element. The only explanation I can come up with is that it is easier to secure investments with a business-focused target userbase. I think this can never succeed.

Sorry for the off-topic rant. It is so frustrating to see a project with so much potential go absolutely nowhere due to narrow-minded strategic decisions.

Labels

X-Needs-Design: May require input from the design team
X-Needs-Product: More input needed from the Product team