Add Amazon Comprehend Document Classifier #40287
Merged
Adds the Amazon Comprehend document classifier: docs, operator, sensor, trigger, waiter, unit tests, and a system test.
Manually tested in Breeze with:
- wait_for_completion=False with a Sensor
- deferrable=True
- wait_for_completion=True
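The three modes above differ mainly in where the waiting happens: in the operator itself, in a separate sensor, or in the triggerer. As a rough, standard-library-only illustration of the blocking case, the sketch below polls a status callback until the classifier reaches a terminal state. The function `wait_until_trained` and its parameters are hypothetical and are not the provider's waiter; treating TRAINED_WITH_WARNING as success is a simplifying assumption.

```python
import time
from typing import Callable

# Classifier statuses as reported by Amazon Comprehend.
SUCCESS_STATES = {"TRAINED", "TRAINED_WITH_WARNING"}  # assumption: warnings count as success
FAILURE_STATES = {"IN_ERROR", "STOPPED"}


def wait_until_trained(
    get_status: Callable[[], str],
    delay: float = 60.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: poll get_status() until training ends.

    Returns the final status on success, raises RuntimeError if the
    classifier fails, and TimeoutError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"Classifier training ended in state {status}")
        if attempt < max_attempts:
            sleep(delay)  # still SUBMITTED or TRAINING, so wait and retry
    raise TimeoutError(f"Classifier not trained after {max_attempts} attempts")
```

With wait_for_completion=False, a loop of this kind would instead live in the sensor, and with deferrable=True the waiting is handed off to the trigger so the worker slot is freed.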
For the system test, I used two documents from AWS samples and created multiple copies of each, since the classifier requires a minimum of 10 training documents per label. With this limited number of labels and documents, I've observed that training takes about 10 to 15 minutes at most. This is the minimum setup I was able to get running, so it can be executed in the daily system test suite.
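The duplication step described above could look roughly like the following sketch, which expands one sample document per label into 10 copy names and builds the matching annotation rows. The labels, file names, and the helper `build_annotations` are hypothetical, and the row layout (label, document name, page number) is an assumption about the annotations CSV rather than something taken from this PR's system test.

```python
import csv
import io
import os

MIN_DOCS_PER_LABEL = 10  # minimum training documents per label, as noted above


def build_annotations(samples: dict[str, str], copies: int = MIN_DOCS_PER_LABEL):
    """Hypothetical helper: return (label, copy_name, page) rows.

    `samples` maps each label to one sample file name; every sample is
    expanded into `copies` distinct copy names.
    """
    rows = []
    for label, filename in samples.items():
        stem, ext = os.path.splitext(filename)
        for i in range(copies):
            rows.append((label, f"{stem}_{i}{ext}", 1))  # page 1 of each copy
    return rows


def to_csv(rows) -> str:
    """Serialize annotation rows as CSV text with one row per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Example labels and file names are placeholders, not the actual AWS samples.
example_rows = build_annotations({"LABEL_A": "sample_a.pdf", "LABEL_B": "sample_b.pdf"})
```

The copies themselves would still need to be uploaded under the generated names; this sketch only covers the naming and the annotation rows.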
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.