Skip to content

Use deterministic time for generated sdist files#142

Merged
kasteph merged 1 commit into
python-poetry:masterfrom
abn:deterministic-sdists
Apr 19, 2021
Merged

Use deterministic time for generated sdist files#142
kasteph merged 1 commit into
python-poetry:masterfrom
abn:deterministic-sdists

Conversation

@abn
Copy link
Copy Markdown
Member

@abn abn commented Mar 22, 2021

When generating setup.py and PKG-INFO files, ensure that generated
files use a deterministic timestamp to enhance reproducibility of
source distributions.

Resolves: python-poetry/poetry#1102

@abn abn requested a review from a team March 22, 2021 00:56
When generating setup.py and PKG-INFO files, ensure that generated
files use a deterministic timestamp to enhance reproducibility of
source distributions.
@abn abn force-pushed the deterministic-sdists branch from 7bca967 to ee3989e Compare March 22, 2021 01:05
@Natureshadow
Copy link
Copy Markdown

Natureshadow commented Mar 22, 2021

I propose that this code should use SOURCE_DATE_EPOCH from environment if set.

cf. https://reproducible-builds.org/docs/source-date-epoch/

(Maybe setting this should even be required, and the timestamp left unchanged if the variable is not set.)

Copy link
Copy Markdown

@Natureshadow Natureshadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested earlier, I would propose to use SOURCE_DATE_EPOCH.

Defaulting to 0 if it is not set might be ok, I am not sure of that. I somehow tend to leaving the current timestamp as default.

Comment on lines +99 to +100
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
Copy link
Copy Markdown

@Natureshadow Natureshadow Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
tar_info.mtime = int(os.environ.get("SOURCE_DATE_EPOCH", 0))
tar_info = self.clean_tarinfo(tar_info)

Comment on lines +107 to +108
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
Copy link
Copy Markdown

@Natureshadow Natureshadow Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tar_info.mtime = 0
tar_info = self.clean_tarinfo(tar_info)
tar_info.mtime = int(os.environ.get("SOURCE_DATE_EPOCH", 0))
tar_info = self.clean_tarinfo(tar_info)

@Natureshadow
Copy link
Copy Markdown

Natureshadow commented Mar 22, 2021

Oh, quoting from https://reproducible-builds.org/specs/source-date-epoch/ , it seems that honouring this variable is a MUST to comply with the reproducible builds standard. I was not aware of this. As such, I make my proposal a bit harder ;).

It also talks about "timestamp clamping", which says that all timestamps that do not use the value from this variable must be earlier, so 0 looks good. It does, however, look wrong to me to use such a pseudo-random magic value. I would therefore stick with time.time() if the source epoch is not set.

@Natureshadow
Copy link
Copy Markdown

Oh, quoting from https://reproducible-builds.org/specs/source-date-epoch/ , it seems that honouring this variable is a MUST to comply with the reproducible builds standard. I was not aware of this. As such, I make my proposal a bit harder ;).

Had a chat with the people form reproducible-builds.org.

It seems there were a few misunderstnadings on my side:

  • SOURCE_DATE_EPOCH is mainly for binary builds, to align it with the source timestamp
  • 0 is perfectly fine for reproducibility
  • Using a timestamp that somehow reflects the last time the source was modified would be good. One idea is looking at git's HEAD if available, another would be to copy the mtime of pyproject.toml.

So if the maintainers want to keep it simple, just sticking with 0 seems fine.

@Natureshadow
Copy link
Copy Markdown

@stephsamson It seems you reviewed this PR. Did you take my comments into account?

@kasteph
Copy link
Copy Markdown
Member

kasteph commented Apr 17, 2021

@Natureshadow for the purposes of this PR, I think that keeping tar_info.mtime to 0 is fine as you mentioned here yourself or did I misunderstand? Also, SOURCE_DATE_EPOCH as you said is for binary builds so I find that that is out of scope for sdist and subsequently this PR and python-poetry#1102.

@Natureshadow
Copy link
Copy Markdown

@stephsamson Yes, I did say that using 0 is fine as well. I just wanted to know your opinion, as you left my change proposals open.

@kasteph
Copy link
Copy Markdown
Member

kasteph commented Apr 19, 2021

@Natureshadow I think that your suggestion, specifically for checking mtime of pyproject.toml to use as a timestamp for binary distributions, is a good candidate for a separate issue/refactoring.

@kasteph kasteph merged commit f478ec3 into python-poetry:master Apr 19, 2021
@abn abn deleted the deterministic-sdists branch April 19, 2021 12:16
@sdispater sdispater mentioned this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reproducible sdist builds

3 participants