Skip to content

RFC 110: EXIF metadata removal of Wagtail images#110

Open
Stormheg wants to merge 6 commits intowagtail:mainfrom
Stormheg:exif-stripping
Open

RFC 110: EXIF metadata removal of Wagtail images#110
Stormheg wants to merge 6 commits intowagtail:mainfrom
Stormheg:exif-stripping

Conversation

@Stormheg
Copy link
Member

@Stormheg Stormheg commented Sep 3, 2025

@Stormheg Stormheg changed the title RFC XXX: EXIF metadata removal of Wagtail images RFC 110: EXIF metadata removal of Wagtail images Sep 3, 2025
Copy link
Member Author

@Stormheg Stormheg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Briefly discussed during core team meeting today.

Questions raised for me to answer:

  • When using Willow image optimizers, will the metadata stay intact or is it removed entirely?
    • May need to work around that or document it as a limitation
  • Investigate more into why other CMS have not made GPS metadata stripping the default
    • Done, seems to boil down to backwards compatibility and shifting responsibility to users to enable.

Copy link
Member

@thibaudcolas thibaudcolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I’d love to see this in. I think it’d be important for the RFC to do more on configuration as this is probably the trickiest thing to get right here.

In particular I’m a bit worried the current approach would lead to lots of additions of this or that extra tag to strip over time. It might be harder initially but much simpler in the long run if we agreed to "strip everything except essential tags", and then individual sites could opt into keeping the 3-10 extra tags they need. Rather than us all constantly chasing privacy/performance improvements by adding this and that extra tag to the list over time.


## Impact on users of Wagtail

The author is not aware of any Wagtail users who actively rely on EXIF metadata (please make yourself and your use-cases known!), but some may. photography-focused sites are most likely to be affected; for them, defaults can be disabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have worked with multiple museum / art sites where publishing licensing metadata within the files was a must. For example Tate in the UK - see this collection page, excerpt from exiftool on the main image:

Copyright Notice: (c) Spire Healthcare, (c) Noise Abatement Society, (c) Samaritans  / Photo (c) Tate
Object Name: T02140
Headline: Scylla 1938 by Ithell Colquhoun 1906-1988
Caption-Abstract: Scylla 1938 Ithell Colquhoun 1906-1988 Purchased 1977 http://www.tate.org.uk/art/work/T02140

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this Thibaud! I have added Tate to the RFC as a known users and expanded the wording of the 'likely affected parties' to include art galleries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the Tate website seems down at the moment, 'Error 503 first byte timeout - varnish cache server' so I cannot check the image in question myself.

Based on the tag names from the output you provided, this information is instead stored in IPTC tags. See this documentation page: https://exiftool.org/TagNames/IPTC.html

It might be wise for this RFC to call out this difference in metadata formats in the RFC. This RFC explicitly targets EXIF.


## Out of scope / future possibilities

This RFC opens the door to further possibilities, like showing EXIF data in the Wagtail admin interface, or allowing users to edit certain EXIF tags directly from Wagtail. However, these features are considered out of scope for this RFC and can be explored in future RFCs or feature proposals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

- DateTimeOriginal (when the photo was taken)
- (any other tags determined to be non-sensitive and useful - please suggest!)

Because this feature is much more rigorous in stripping EXIF data, it will not be the default configuration and must be explicitly enabled by site administrators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure I understand the rationale here. Some of this metadata strikes me as essential (without Orientation some images are displayed upside-down on Windows?). Some not at all (DateTimeOriginal).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly rewritten this when I rearranged the configuration sections.

This section explains what metadata to keep (you seem to be under the impression these tags are removed?) when used with the optional allowlist configuration option.

I've also added language explaining why it is not a good idea to make this the default (future tags that are important for proper display not in this allowlist, breaking forward compatibility)

Thanks to Thibaud Colas for reporting.
Note how WordPress did not change defaults and defers to plugins and
configuration options.
Thanks to Thibaud Colas for the feedback.
@Stormheg
Copy link
Member Author

Stormheg commented Sep 6, 2025

@thibaudcolas

In particular I’m a bit worried the current approach would lead to lots of additions of this or that extra tag to strip over time. It might be harder initially but much simpler in the long run if we agreed to "strip everything except essential tags", and then individual sites could opt into keeping the 3-10 extra tags they need. Rather than us all constantly chasing privacy/performance improvements by adding this and that extra tag to the list over time.

Ultimate file size savings is not what I'm going after with this RFC - the estimated savings of a 1 or 2 KB at most for the majority of images are just too negligible to justify this, since I see unnecessary risk involved with going the 'strip all unless told not to' route.

There may be (future) tags not in Wagtail's allowlist that are important for some reason (proper display, or have legal implications for site developers if not kept) that Wagtail should keep. It would be much simpler for site developers if they wouldn't have to fix Wagtail's overzealous defaults. And it would be simpler for us Wagtail maintainers if we did not have to update Wagtails allowlist of tags because image display is broken otherwise.

I'm not worried about the Wagtail team having to deal with 'this or that extra tag to strip' over time. We have provided a mechanism for people to customize this themselves, which provides an 'escape hatch'. We can accept contributions to improve Wagtail defaults as we see fit.

Hope this convinces you? 😄

@gasman
Copy link
Contributor

gasman commented Sep 6, 2025

It seems to me that the question of whether to keep or strip by default comes down to two things:

  • If we encounter an unfamiliar EXIF tag, is it more likely to be necessary for image processing (= keep it) or to be an unwanted information leak (= strip it)?
  • How serious are the consequences of making a wrong decision?

It's hard to imagine what use-cases there might be for EXIF tags that have a meaningful effect on the image (things like orientation and colour profile) that haven't been established already. If any do come up, then it's reasonable to expect that there will be a lot of existing software that doesn't support it, so Wagtail won't be seen particularly "at fault" - and it's not guaranteed that keeping the tag from the source image intact will be enough to support it.

Conversely, it's not hard to imagine phone or camera manufacturers coming up with new schemes for metadata representing the person taking the photo, or location, or the type of camera, so I think it's far more likely that any unknown EXIF tags are of the unwanted kind.

As for the consequences of getting it wrong - in the worst case, stripping a tag that should have been kept means that the final image will display incorrectly in some way. As mentioned, this broken behaviour is likely to be shared by plenty of other software, and be fairly minor (no phone/camera manufacturer would want to introduce a new feature that breaks a whole load of existing software). Most likely, it will be something that we fix in a patch release (or tell users to update in their configuration) and move on with no drama.

However, keeping a tag that identifies a user / location that should have been private could, in extreme cases, be a matter of life or death. (Hopefully in those cases Wagtail isn't their only guard against that, but as per the "swiss cheese" model of security it's better to have that layer in place than not.)

For those reasons, I would lean towards stripping unknown tags by default.

@thibaudcolas
Copy link
Member

Yep, same reasoning to lean towards stripping more by default. In the privacy scenario we should err on the side of stripping more than needed. For the compatibility scenario, I assume it will be simple to review those requests as they come in. And for the legal scenario I expect the projects with those requirements will be able to either decide to change the config for their project or make a feature request.

I suppose we also could review what similar software does for this – even if the CMS options you reviewed don’t have this turned on by default @Stormheg, presumably some of the implementations have had to make similar choices and possibly receive the kind of feedback we’re discussing here over time?

@nimasmi
Copy link

nimasmi commented Sep 16, 2025

Default behaviour

I came here to recommend that the safer default choice of processor would be retain_only_common_exif, or one that behaves in that way – an allowlist rather than a blocklist. I'm pleased to see that this has been discussed already.

The argument in favour of strip_common_sensitive_exif (i.e. a blocklist) as default

The likelihood of a newly-invented EXIF tag being necessary for correct rendering is unknown.

However, for it to be significant, and need fixing, its absence would need to break rendering in lots of contexts.

An allowlist-based approach for EXIF stripping would need a code change for compatibility with this new tag, whereas a blocklist-based approach would be up-to-date immediately. However, if such a new tag were introduced, adoption would be slow enough, and loud enough, that the Wagtail project would have time to adapt.

The argument in favour of retain_only_common_exif (i.e. an allowlist) as default

The likelihood of a newly-invented EXIF tag being added which contains some PII is unknown.

However, for it to be significant (either publicly embarrassing for Wagtail, or risking somebody’s safety) only a small number of manufacturers need to start including it, enough for one person's PII to be made public. Given we don't know the full gamut of EXIF tags added by all manufacturers, it's possible that such a camera already exists that is adding this tag to files.

An allowlist-based approach would block this by default, whereas a blocklist-based approach would need updating every time a new vendor-specific or non-standard tag is discovered which is a PII-violation. I think it's less safe.


Event Likelihood Saturation needed to be significant Harm Allowlist must change Blocklist must change
New EXIF tag needed for correct rendering ? High Low Yes No
New EXIF tag is PII violation ? Low High No Yes

A parallel in Django is the recommendation to use fields rather than exclude when creating a ModelForm: https://docs.djangoproject.com/en/5.2/topics/forms/modelforms/#selecting-the-fields-to-use. It's a failsafe, security-by-default, approach, even if it adds a bit of friction for developers.


This will be documented in release notes as a breaking change affecting users that rely on EXIF metadata. The default behaviour (stripping GPS and camera info) will be noted on guide.wagtail.org to make editors aware of the existence of EXIF metadata and how Wagtail handles it.

Existing images are unaffected until renditions are regenerated (via `manage.py wagtail wagtail_update_image_renditions`). Earlier behaviour can be restored by disabling EXIF processing in settings.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You discuss this being a breaking change. This seems reasonable, given enough warning and deprecation. I recommend that the release notes, and documentation give clear instructions on how to retain the current behaviour. I presume this would be to add the setting:

WAGTAILIMAGES_EXIF_PROCESSORS = []

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants