@seddonm1
Contributor

DataFusion aims to be compatible with PostgreSQL. To achieve that compatibility, parts of the DataFusion code base may reproduce code and documentation from the PostgreSQL project, and the license needs to reflect this.

@alamb alamb requested review from kou and pitrou February 17, 2021 21:42
@kou
Member

kou commented Feb 18, 2021

Could you show the existing DataFusion code and documentation that are derived from PostgreSQL?
Do we want to relicense them under the ASF-copyrighted Apache 2.0 license, or do we want to reuse them under the original PostgreSQL license?

In C++, we reuse existing code under its original license. For example, https://github.com/apache/arrow/blob/master/cpp/src/arrow/status.h keeps the original BSD-style license.

@seddonm1
Contributor Author

Thanks @kou

Here is the original discussion that details what we are referencing: #9243 (comment)

Do you have any feedback given this context?

Contributor

@alamb alamb left a comment

This makes sense to me, but I am neither a lawyer nor an expert in such matters, so I think we should wait for other comments before merging this in.

@kou
Member

kou commented Feb 19, 2021

Thanks.

I think that we should state which code and documentation are derived from PostgreSQL. For example, the sample documentation shown in #9243 (comment) should be mentioned.

For example, in C++, https://github.com/apache/arrow/blob/master/cpp/src/arrow/status.h does this in its file header.

Can we do it in DataFusion?

@seddonm1
Contributor Author

> Thanks.
>
> I think that we should state which code and documentation are derived from PostgreSQL. For example, the sample documentation shown in #9243 (comment) should be mentioned.
>
> For example, in C++, https://github.com/apache/arrow/blob/master/cpp/src/arrow/status.h does this in its file header.
>
> Can we do it in DataFusion?

My concern is that if we use specific file-based references for this license, they will be very easy to miss when more work is done in the future, which may make the project non-compliant. This is why I used general language.

@kou
Member

kou commented Feb 19, 2021

Hmm. In general, I think we should be as careful as possible when we reuse code and documentation from other projects. For example, we should state that "this code/documentation is derived from PostgreSQL" in each pull request that adds code/documentation derived from PostgreSQL. My concern is that the general-language approach may not encourage this practice. (Note that we don't yet have a consensus on whether this practice is expected.)

FYI: I'm not attached to the file-based approach. We can use whichever approach is suitable. For example, there is a comment-based approach in C++:

# Add Boost dependencies (code adapted from Apache Kudu)

How about discussing this on arrow-devel@?

@seddonm1
Contributor Author

Thanks @kou.

How about we merge this and also add wording like this to the relevant code:

// Some of these functions reference the Postgres documentation

@kou
Member

kou commented Feb 22, 2021

In general, the "general language plus ..." approach looks good to me, but the added wording concerns me.
The approach says that string_expressions.rs contains PostgreSQL-licensed code/documentation, but it doesn't explicitly say which code/documentation uses the PostgreSQL license, right?
(string_expressions.rs contains both ASF-copyrighted code/documentation under the Apache License 2.0 and code/documentation copyrighted by The PostgreSQL Global Development Group under the PostgreSQL license, but we don't say which parts are licensed under the PostgreSQL license.)

I'm not sure that it satisfies "provided that the above copyright notice" in the PostgreSQL license.


I found a guideline for using third-party works:

https://www.apache.org/legal/src-headers.html#3party

Treatment of Third-Party Works
0. The term "third-party work" refers to a work not submitted directly to the ASF by the copyright owner or owner's agent. This includes parts of a work submitted directly to the ASF for which the submitter is not the copyright owner or owner's agent.

  1. Do not modify or remove any copyright notices or licenses within third-party works.
  2. Do ensure that every third-party work includes its associated license, even if that requires adding a copy of the license from the third-party download site into the distribution.
  3. Do not add the standard Apache License header to the top of third-party source files.
  4. Minor modifications/additions to third-party source files should typically be licensed under the same terms as the rest of the third-party source for convenience.
  5. Major modifications/additions to third-party should be dealt with on a case-by-case basis by the PMC.

If we don't explicitly identify the PostgreSQL-derived code/documentation, I think that neither we nor users can associate it with The PostgreSQL Global Development Group copyright. I'm not sure that this satisfies item 1.

If we choose approach 4, can we use arrow/rust/datafusion/src/physical_plan/string_expressions.rs only for our original implementations that are licensed under the Apache License 2.0, and arrow/rust/datafusion/src/physical_plan/string_postgresql_expressions.rs (or something similar) for PostgreSQL-derived works that are licensed under the PostgreSQL license?

If we choose approach 5, we should discuss this on dev@.


Anyway, I think that we should discuss this on dev@. If needed, the ASF will help us with legal concerns.

@seddonm1
Contributor Author

@kou
Member

kou commented Feb 23, 2021

Thanks!

alamb pushed a commit that referenced this pull request Feb 24, 2021
This PR is a child of #9243

It does a few things that are hard to separate:

- fixes the behavior of `concat` and `trim` functions to be in line with the Postgres implementations
- restructures some of the code base (mainly sorting and adding tests) to facilitate easier testing and implementation of the remainder of #9243

@alamb @jorgecarleitao
please review but merging will be dependent on #9507

Closes #9551 from seddonm1/concat

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@kszucs
Member

kszucs commented Jun 29, 2021

The DataFusion source code has been moved to its own repository: https://github.com/apache/arrow-datafusion
If this PR is still valid, then it should be reopened there.

@kszucs kszucs closed this Jun 29, 2021