Skip to content

Open-API: Add Python code as an example#7751

Merged
Fokko merged 9 commits intoapache:masterfrom
Fokko:fd-add-python-code
Jun 6, 2023
Merged

Open-API: Add Python code as an example#7751
Fokko merged 9 commits intoapache:masterfrom
Fokko:fd-add-python-code

Conversation

@Fokko
Copy link
Copy Markdown
Contributor

@Fokko Fokko commented Jun 1, 2023

This make it easier (at least for me), to see the impact on the code that's being generated. Polymorphism and Inheritance is not always clear to me from the spec, but it is from the Python code.

This make it easier (at least for me), to see the impact on the
code that's being generated. Polymorphism and Inheritance is not
always clear to me from the spec, but it is from the Python code.
Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think it's a good idea. I left a minor comment to make a sentence slightly better to understand

Comment thread open-api/README.md Outdated
Comment thread open-api/README.md Outdated
@Fokko Fokko force-pushed the fd-add-python-code branch from b1b2395 to c0e1404 Compare June 1, 2023 13:20
Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the license check is happy

@Fokko Fokko force-pushed the fd-add-python-code branch from c0e1404 to fa69dcb Compare June 2, 2023 13:45
Copy link
Copy Markdown
Contributor

@dramaticlly dramaticlly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an example helps to visualize the change is great! but maybe consider move such into a different folder instead of co-live with yaml?

something like?

open-api
├── python-example
│   ├── Makefile
│   ├── README.md
│   ├── header.txt
│   ├── requirements.txt
│   └── rest-catalog-open-api.py
└── rest-catalog-open-api.yaml

@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 3, 2023

@dramaticlly Thanks for the suggestion. People won't directly use the Python code, nor I would expect a Java/Javascript version of it. It is more for the developer to see the impact on the code when making changes to the yaml. I think having everything in the same folder makes things simpler (otherwise I'd be inclined to add another README.md to the open-api/ dir). I think adding the directory doesn't add much value other than adding more complexity to the directory structure. WDYT?

@dramaticlly
Copy link
Copy Markdown
Contributor

@dramaticlly Thanks for the suggestion. People won't directly use the Python code, nor I would expect a Java/Javascript version of it. It is more for the developer to see the impact on the code when making changes to the yaml. I think having everything in the same folder makes things simpler (otherwise I'd be inclined to add another README.md to the open-api/ dir). I think adding the directory doesn't add much value other than adding more complexity to the directory structure. WDYT?

I see, thanks for the explanation. For those generated code, do we want every change in spec to kick off the python generation and include such in PR or do you plan to add some sort of git action to apply it automatically? Anyway, change LGTM

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Jun 5, 2023

Why add the Python code? We could just have a README that explains how to generate it. I think it's a little confusing to have the code here, since it isn't used by the pyiceberg implementation.

@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 5, 2023

@rdblue The idea is that you'll have the changes in the open-api spec and also the changes in the Python code alongside. Small changes can have a significant impact on the code itself.

Let's take #7710 (comment)

    ReportMetricsRequest:
      anyOf:
        - $ref: '#/components/schemas/ScanReport'
        - $ref: '#/components/schemas/CommitReport'
      required:
        - report-type
      properties:
        report-type:
          type: string

Above is the the current code on master, and that generates:

class ReportMetricsRequest(ScanReport):
    report_type: str = Field(..., alias='report-type')


class ReportMetricsRequest1(CommitReport):
    report_type: str = Field(..., alias='report-type')


class ReportMetricsRequest2(BaseModel):
    __root__: Union[ReportMetricsRequest, ReportMetricsRequest1]

This looks weird, and it is. ReportMetricsRequest is anyOf (validates the value against any (one or more) of the subschemas) of both ScanReport and CommitReport. It is actually the other way around. ScanReport and CommitReport is allOf ReportMetricsRequest. And based on the report-type it is oneOf the subtypes based on the report-type:

    ReportMetricsRequest:
      type: object
      discriminator:
        propertyName: report-type
        mapping:
          commit-report: '#/components/schemas/CommitReport'
          scan-report: '#/components/schemas/ScanReport'
      required:
        - report-type
      properties:
        report-type:
          type: string

The point that I'm trying to make here is that is very easy to get confused in the allOf, anyOf, and oneOf. And for me, it really helps to see the code alongside the PR. The idea is that we keep the Python code up to sync, and when you update the open-api code, you'll see the impact on the code. If you do it the other way around, you'll end up with:

class ReportMetricsRequest(BaseModel):
    report_type: str = Field(..., alias='report-type')


class CommitReport(ReportMetricsRequest):
    report_type: Literal['commit-report'] = Field(..., alias='report-type')
    table_name: str = Field(..., alias='table-name')
    snapshot_id: int = Field(..., alias='snapshot-id')
    sequence_number: int = Field(..., alias='sequence-number')
    operation: str
    metrics: Metrics
    metadata: Optional[Dict[str, str]] = None


class ScanReport(ReportMetricsRequest):
    report_type: Literal['scan-report'] = Field(..., alias='report-type')
    table_name: str = Field(..., alias='table-name')
    snapshot_id: int = Field(..., alias='snapshot-id')
    filter: Expression
    schema_id: int = Field(..., alias='schema-id')
    projected_field_ids: List[int] = Field(..., alias='projected-field-ids')
    projected_field_names: List[str] = Field(..., alias='projected-field-names')
    metrics: Metrics
    metadata: Optional[Dict[str, str]] = None

If we want to generate clients in other languages as well, it helps to keep the open-API as close to the supposed implementation as possible.

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Jun 5, 2023

Makes sense to me! Can we also add a README explaining why there is code here? It's not like we actually think people would use this as a library, right?

Copy link
Copy Markdown
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think @Fokko's explanation of adding the Python code makes sense. I'm +1

@github-actions github-actions Bot added the INFRA label Jun 6, 2023
@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 6, 2023

@dramaticlly Thanks. I would suggest doing the update by hand, for now, we can always automate stuff later on. I don't think it is a big hurdle since the open-api spec doesn't get updated very often.

@Fokko Fokko merged commit 5464826 into apache:master Jun 6, 2023
@Fokko Fokko deleted the fd-add-python-code branch June 6, 2023 09:44
@Fokko
Copy link
Copy Markdown
Contributor Author

Fokko commented Jun 6, 2023

Thanks @dramaticlly, @nastra and @rdblue for the review, much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants