Skip to content

Binary license management system#7998

Merged
jihoonson merged 47 commits intoapache:masterfrom
jihoonson:auto-bin-license
Jul 8, 2019
Merged

Binary license management system#7998
jihoonson merged 47 commits intoapache:masterfrom
jihoonson:auto-bin-license

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented Jun 29, 2019

Motivation

We are currently maintaining LICENSE.BINARY file manually. The way we do now is

  1. Generating dependency reports using maven and npm
  2. Compare the licenses in LICENSE.BINARY against generated reports
  3. Fix LICENSE.BINARY if something is wrong

This is a huge burden for the release manager since we have tons of binary dependency. We should automate this check.

Description

This PR adds a binary license management system. All binary licenses are maintained in license.yaml file. Here is a snippet of the file.

name: modified portions of the Guava library
license_category: source
license_name: Apache License version 2.0
module: java-core
copyright: The Guava Authors (https://github.com/google/guava)
source_paths:
  - Closer class: core/src/main/java/org/apache/druid/java/util/common/io/Closer.java
  - Splitter.splitToList() method: core/src/main/java/org/apache/druid/java/util/common/parsers/DelimitedParser.java
  - DirectExecutorService class: core/src/main/java/org/apache/druid/java/util/common/concurrent/DirectExecutorService.java

---

name: Checker Qual
license_category: binary
module: java-core
license_name: MIT License
version: 2.5.7
copyright: the Checker Framework developers
license_file_path: licenses/bin/checker-qual.MIT
libraries:
  - org.checkerframework: checker-qual

Note that skip_dependency_report_check is set to true for some licenses. For FindBugs JSR305, it's set to true since maven dependency plugin reports a wrong license (LGPL). Note that FindBugs is licensed under LGPL but JSR305 is licensed under BSD-3 license. This script also skips dependency report check for "Java Concurrency In Practice" Book Annotations since it points to a book and doesn't have a valid library version.

docs/_bin/generate-license.py reads this file, checks the registered licenses with maven dependency report, and generates the contents of LICENSE.BINARY file. You can run this program as below:

$ docs/_bin/generate-license-dependency-reports.py /path/to/druid/source/code /path/to/root/of/license-reports/
$ docs/_bin/generate-license.py licenses/APACHE2 license.yaml /path/to/root/of/license-reports/ LICENSE.BINARY

The generated LICENSE.BINARY file is a little bit different from the current one, mostly because of the difficulty of auto generation.

I will adjust distribution/pom.xml to generate LICENSE.BINARY when building binary distribution in a follow-up PR. I will also set up Travis to run license check automatically.
I have adjusted distribution/pom.xml to generate LICENSE.BINARY when building binary distribution. This is currently being checked automatically in Travis.

@jihoonson jihoonson added the Apache Items related to being a part of the ASF label Jun 29, 2019
@clintropolis
Copy link
Copy Markdown
Member

Heh, travis rat check failing for license.yaml https://travis-ci.org/apache/incubator-druid/jobs/552284036#L644

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice.

For additional future improvement stuff, I think we should also probably add something to contributing guide docs and to the PR template about maintaining license.yaml, mentioning that if you add a new dependency or change a version of a dependency to update it accordingly. There is probably room for automating this part as well to some extent.

Comment thread docs/_bin/generate-license.py Outdated
from html.parser import HTMLParser


apache_license_v2 = "\n\
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big string, should we just keep this in an external file somewhere (or do we already have it somewhere maybe) and consider just loading it from that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted as licenses/APACHE2.

Comment thread docs/_bin/generate-license.py Outdated


class DependencyReportParser(HTMLParser):
# TODO: Change to comments
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this was resolved. Removed now.

Comment thread docs/_bin/generate-license.py Outdated
print(" * {}:{}".format(group_id, artifact_id))


def check_licenses(license_yaml, dependencie_reports_root):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependencie_reports_root -> dependencies_reports_root

Copy link
Copy Markdown
Contributor Author

@jihoonson jihoonson Jun 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

Comment thread docs/_bin/generate-license.py Outdated

print_error("Found {} reported licenses".format(len(reported_dep_to_licenses)))

# Compare licenses in registry and those in dependency reports
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a few additional comments about what is going on here to break up all these loops? makes it kind of rough to follow along with what is going on here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments here and other places. Please let me know if it's still unclear.

@clintropolis
Copy link
Copy Markdown
Member

The generated LICENSE.BINARY file is a little bit different from the current one, mostly because of the difficulty of auto generation.

To clarify, are the contents of the generated LICENSE.BINARY currently the same, just not the same format?

@jihoonson
Copy link
Copy Markdown
Contributor Author

To clarify, are the contents of the generated LICENSE.BINARY currently the same, just not the same format?

Yes, the contents should be same. Only the format is slightly different.

@jihoonson jihoonson added the WIP label Jun 30, 2019
@Fokko
Copy link
Copy Markdown
Contributor

Fokko commented Jul 1, 2019

Great work @jihoonson. Thanks for picking this up.

I fully agree with you; if it isn't checked by the CI, then it is hard to maintain. Maybe it is a good idea to add the lines on how to run the checker to the CONTRIBUTING.md file for further reference.

One final note on the findbugs dependency. I'm looking for replacing this one with spotbugs: https://mvnrepository.com/artifact/com.github.spotbugs/spotbugs-annotations/3.1.12
But it looks like this one has the same issue with the wrong license. They are working on removing the JSR305 component: spotbugs/spotbugs#421 but this will be tricky as we might expect.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me after you get CI fixed

@jihoonson
Copy link
Copy Markdown
Contributor Author

@Fokko thanks for taking a look!

Maybe it is a good idea to add the lines on how to run the checker to the CONTRIBUTING.md file for further reference.

It sounds good. I think it would be nicer if we add that to our PullRequest template as well.

One final note on the findbugs dependency. I'm looking for replacing this one with spotbugs: https://mvnrepository.com/artifact/com.github.spotbugs/spotbugs-annotations/3.1.12
But it looks like this one has the same issue with the wrong license. They are working on removing the JSR305 component: spotbugs/spotbugs#421 but this will be tricky as we might expect.

What do you mean by "wrong license"? Spotbugs is also licensed under LGPL, so we shouldn't include it in our binary distribution unless the part you want to use it is particularly licensed under a compatible license.

@jihoonson jihoonson removed the WIP label Jul 6, 2019
@jihoonson
Copy link
Copy Markdown
Contributor Author

Ok, now Travis looks stable.

@jihoonson jihoonson merged commit 12f1267 into apache:master Jul 8, 2019
@jihoonson
Copy link
Copy Markdown
Contributor Author

Hmm, looks like Travis is currently getting stuck at generate-dependency-reports phase. Looking into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apache Items related to being a part of the ASF Area - Dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants