
Thousands of Clusters that Include Many Different People as Same Person #475

@bsaggy

Describe the bug
I'm running Recognize against ~35k images. It's creating way too many clusters, currently above 7k and growing.

MariaDB [nextcloud]> select count(*) from oc_recognize_face_clusters;
+----------+
| count(*) |
+----------+
|     7392 |
+----------+
1 row in set (0.005 sec)
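A raw cluster count doesn't show whether these are mostly tiny clusters or a few huge mixed ones, so a size histogram could help narrow it down. Below is a minimal sketch of that idea in Python, run against an in-memory SQLite stand-in. The table name `face_detections` and the column `cluster_id` are assumptions made for illustration, not the confirmed Recognize schema, so they would need to be adjusted to the real tables.

```python
import sqlite3
from collections import Counter


def cluster_size_histogram(conn, table="face_detections", column="cluster_id"):
    """Return {faces_per_cluster: number_of_clusters_with_that_size}.

    `table` and `column` are hypothetical names; adapt them to the real schema.
    """
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column}"
    ).fetchall()
    # Each row is (cluster_id, face_count); count how often each size occurs.
    return dict(Counter(size for _, size in rows))


# Stand-in data: cluster 1 has three faces, clusters 2 and 3 have one each,
# and one detection is not assigned to any cluster.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE face_detections (id INTEGER PRIMARY KEY, cluster_id INTEGER)"
)
conn.executemany(
    "INSERT INTO face_detections (cluster_id) VALUES (?)",
    [(1,), (1,), (1,), (2,), (3,), (None,)],
)

histogram = cluster_size_histogram(conn)
print(histogram)
```

A histogram dominated by single-face clusters would suggest over-splitting, while a handful of very large clusters would fit the mixed-people symptom described below.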

The craziest part is that when I click on a cluster in the Memories app with Mark Person in Preview enabled, I see multiple different people with the green bounding box around them across the pictures in that same cluster.

For example, one cluster has myself, my wife, my grandmother, my mother-in-law, my sister-in-law, my brother-in-law, and a friend of a different skin color, all marked as the person of interest in this cluster. Another cluster has my 2-month-old son, myself, my wife, my sister-in-law, my grandfather, etc., all marked as the person of interest in the cluster.

While I understand there is a margin for error in facial recognition, I have to believe something is wrong here. With over 7,000 clusters, and every cluster I've checked containing all kinds of people of interest as indicated by the green bounding box, this is pretty much useless to me at this point. There are still ~92k queued files to go.

To Reproduce
Steps to reproduce the behavior:
I can't say that this is necessarily "reproducible", but this is the evolution of how I have used Recognize thus far.

  1. Installed Recognize v2.x on Nextcloud v24.x and let it do its thing.
  2. Upgraded to Nextcloud 25.0.0 and then 25.0.1 while also upgrading to Recognize v3.1.2.
  3. At some point, I did a Reset Tags for Classified Files and a Reset Faces for Classified Files since I didn't seem to have made that much progress yet coming from v2.x and wanted to start fresh on v3.x. I also issued a recrawl. Otherwise I've been letting it just run for about a week now.

Expected behavior
Facial recognition should work more accurately and not identify my whole family, including friends, as the same person. It should create fewer clusters, each with more accuracy.

Recognize (please complete the following information):

  • JS-only mode: WASM Mode
  • Enabled modes: face recognition only

Server (please complete the following information):
System Configuration

Ubuntu 20.04 VM
32 GB RAM (I increased memory & vCPU in hopes of speeding things up to no avail)
8 vCPU
NC v25.0.1
Recognize v3.1.2

Recognize Configuration:

Face Recognition is enabled
6 Cores allowed for use
WASM mode is enabled

Additional context
If there's anything else I can check or do, please let me know.
