
Add CAM++ speaker verification/embedding model #3

Merged: BrandonWeng merged 4 commits into FluidInference:main from hamzaq2000:cam++-coreml on Sep 23, 2025

Conversation

@hamzaq2000 (Contributor)

CAM++ is an efficient speaker embedding model that I use in my diarization pipeline, Senko.

This PR adds a conversion script for it from torch to CoreML, as well as a test script to verify correctness and benchmark inference speed vs torch.
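A correctness check of this kind typically compares the torch and CoreML embeddings for the same audio via cosine similarity. A minimal numpy sketch (the vectors below are made up for illustration; the real test script would feed actual model outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up embeddings standing in for the torch and CoreML outputs;
# after a faithful conversion they should agree almost exactly.
torch_emb = np.array([0.10, 0.50, -0.30, 0.80])
coreml_emb = np.array([0.1001, 0.4999, -0.3002, 0.7998])
assert cosine_similarity(torch_emb, coreml_emb) > 0.999
```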

@@ -0,0 +1,210 @@
# https://github.com/modelscope/3D-Speaker/tree/main/speakerlab/models/campplus

Should we add a dependency here instead of copying the model?

Any specific changes you had to make to keep it CoreML-friendly?

@hamzaq2000 (Contributor, Author)

That would be cumbersome, imo.
Yeah, I did put comments in camplusplus_coreml.py for the changes that were made.

@BrandonWeng (Member)

It looks like they might not have a pip package, so copying is fine as well. Next time you could even just build off a cloned repo if that's easier.

Though it would be good to include the commit hash you worked off, for future reference.

@hamzaq2000 (Contributor, Author)

Ok, added the commit hash.


input_type = ct.TensorType(
name="input_features",
shape=(BATCH_SIZE, FIXED_FRAMES, FEATURE_DIM),


Maybe make BATCH_SIZE a RangeDim, so that inference can be done with a dynamic batch size.

@hamzaq2000 (Contributor, Author) commented Sep 22, 2025

Dynamic batch size hurts performance from my testing.

coreml_model.output_description["embeddings"] = f"Speaker embeddings: ({BATCH_SIZE}, {EMBEDDING_DIM})"

# Save the model
output_path = "./models/camplusplus_batch16.mlpackage"


nit: replace 16 with BATCH_SIZE

@hamzaq2000 (Contributor, Author)

Oh good catch, I'll add that.
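The fix amounts to interpolating the constant into the path instead of hardcoding the number:

```python
BATCH_SIZE = 16

# Derive the filename from the constant so the path stays correct
# if BATCH_SIZE ever changes.
output_path = f"./models/camplusplus_batch{BATCH_SIZE}.mlpackage"
assert output_path == "./models/camplusplus_batch16.mlpackage"
```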


warnings.filterwarnings('ignore')

def extract_fbank_features(waveform, sample_rate=16000):

Not for this PR: is it possible to move this into CoreML as a separate model? Otherwise, anyone who wants to implement this in Swift has to implement the fbank computation in Swift too.

@hamzaq2000 (Contributor, Author) commented Sep 22, 2025

In my Senko pipeline, I do fbank extraction efficiently in C++. So perhaps I can link that in here? Not sure if that should be part of Mobius though; what do you think?
I kept this test script in pure Python just as an example of how to use the CoreML model, not for production deployment.

@BrandonWeng (Member)

Yeah, please link Senko. It would be useful to show how the model could be used.

I think what Bharat is asking is whether it's possible to freeze the fbank operations into a CoreML model as well. Could be beneficial for Senko too, so you can strip out the C++ code.

But I would say it's optional, not a blocker for this PR. Fbank is probably simple enough to vibe code in Swift, and from what we've seen, FFT/STFT operations don't benefit much from being in CoreML; it might be faster to use Accelerate via Swift.

@hamzaq2000 (Contributor, Author)

Ok, linked Senko.

Very interesting; didn't know FFT/STFT operations aren't sped up much by CoreML. I have looked at Accelerate in the past. I think optimizing Fbank extraction using that will be my next optimization target. If I end up doing that, I'll create another PR to contribute that here as well.
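For anyone weighing the Swift/Accelerate route, the core fbank computation is fairly small. A toy numpy sketch (illustrative only: the Kaldi-style fbank features CAM++ actually expects also involve dithering, pre-emphasis, and different windowing/framing details):

```python
import numpy as np

def log_mel_fbank(waveform, sample_rate=16000, n_fft=400, hop=160, n_mels=80):
    """Toy log-mel filterbank extractor: frame, FFT, mel-warp, log."""
    # Frame the signal with a Hann window and take the power spectrum.
    n_frames = 1 + (len(waveform) - n_fft) // hop
    frames = np.stack([waveform[i * hop:i * hop + n_fft] * np.hanning(n_fft)
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2

    # Triangular mel filterbank, equally spaced on the mel scale.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)

    return np.log(spec @ fbank.T + 1e-10)  # shape: (n_frames, n_mels)

# One second of noise at 16 kHz -> 98 frames of 80 mel bins.
feats = log_mel_fbank(np.random.randn(16000).astype(np.float32))
assert feats.shape == (98, 80)
```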


@Bharat0091

Thanks for this PR @hamzaq2000. I gave some high-level comments; once we resolve these, I will review in detail.

@BrandonWeng (Member) left a review; his inline comments appear above.
@hamzaq2000 (Contributor, Author) commented Sep 23, 2025

Ok, added the 3D-Speaker commit hash.

Linked the Senko fbank_extractor C++ code as well, for production deployment.

Wanted to ask: since the 3D-Speaker code will be part of the repo, shall I also create a THIRD_PARTY_LICENSES file in the root of the repo, with 3D-Speaker credit and the license text? Or if that's too much, I can make convert.py clone the 3D-Speaker repo and use the CAM++ model definition from that.

@BrandonWeng (Member) commented Sep 23, 2025

Good question - a copy of their license in the root of models/emb/cam++/ should be sufficient. The idea is that each folder is its own isolated environment/copy

@hamzaq2000 (Contributor, Author)

Great, added the attribution and license text in models/emb/cam++/THIRD_PARTY_LICENSES.

@BrandonWeng BrandonWeng merged commit 7a92545 into FluidInference:main Sep 23, 2025