In #193, a set of `traffic_*` streams were added to the tap, with a customised `metadata` property that deselects them if no catalog is passed as input to the tap.
Unfortunately, when I run the tap with `poetry run tap-github --config /tmp/tmpmt8fq0pn/tmp7896kkwh.json --test=schema` and the following config (which does not seem to matter much; the key ingredient is the `--test=schema` CLI option):
```json
{"metrics_log_level": "error", "auth_token": "<mytoken>", "additional_auth_tokens": [], "rate_limit_buffer": 1000, "start_date": "2021-05-24 13:44:42.693145", "skip_parent_streams": true, "repositories": []}
```
the tap emits invalid `SCHEMA` messages like:
```json
{
  "type": "SCHEMA",
  "stream": "traffic_pageviews",
  "schema": {"properties": {}, "type": "object"},
  "key_properties": ["repo", "org", "timestamp"]
}
```
Specifically, `properties` is empty, so downstream targets cannot look up the `key_properties`.
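To make the failure concrete, here is a minimal Python sketch that checks a single `SCHEMA` message for key properties that are declared but not defined under `properties`. The helper name `missing_key_properties` is hypothetical; it is not part of tap-github or the Singer SDK.

```python
from __future__ import annotations

import json


def missing_key_properties(schema_message: dict) -> list[str]:
    """Return the key properties a SCHEMA message declares but does not define."""
    properties = schema_message.get("schema", {}).get("properties", {})
    return [key for key in schema_message.get("key_properties") or [] if key not in properties]


# The message reported above, as emitted with --test=schema.
message = json.loads(
    '{"type": "SCHEMA", "stream": "traffic_pageviews",'
    ' "schema": {"properties": {}, "type": "object"},'
    ' "key_properties": ["repo", "org", "timestamp"]}'
)
print(missing_key_properties(message))  # ['repo', 'org', 'timestamp']
```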
The line that causes the problem is here: https://github.com/MeltanoLabs/tap-github/pull/193/files#diff-06dc9c6115cbc069ce355913de0c101fedf6956d6f6b4873c5112434596934d3R2260
I have not dug into the details yet, but it looks like schema generation does not correctly take the selection metadata into account.
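One way this symptom could arise, sketched under assumptions: if schema generation drops every property whose effective selection is false, and a property without its own metadata entry inherits the stream-level `selected` value, then deselecting the whole stream empties `properties` while `key_properties` is left untouched. The functions `is_property_selected` and `selected_schema` below are hypothetical and only illustrate this hypothesis; they are not the SDK's actual code. Breadcrumbs are written as tuples so they can serve as dict keys (Singer catalogs store them as lists).

```python
from __future__ import annotations


def is_property_selected(metadata: dict[tuple, dict], name: str) -> bool:
    """Assumed rule: a property inherits the stream's `selected` flag unless it has its own."""
    stream_selected = metadata.get((), {}).get("selected", True)
    return metadata.get(("properties", name), {}).get("selected", stream_selected)


def selected_schema(schema: dict, metadata: dict[tuple, dict]) -> dict:
    """Hypothetical filter: keep only the properties that are effectively selected."""
    kept = {
        name: prop
        for name, prop in schema["properties"].items()
        if is_property_selected(metadata, name)
    }
    return {**schema, "properties": kept}


schema = {
    "type": "object",
    "properties": {
        "repo": {"type": "string"},
        "org": {"type": "string"},
        "timestamp": {"type": "string", "format": "date-time"},
    },
}
# Stream-level deselection, as applied when no catalog is passed.
metadata = {(): {"selected": False}}
print(selected_schema(schema, metadata))  # {'type': 'object', 'properties': {}}
```

If this is indeed the cause, a fix would likely either skip the `SCHEMA` message for a deselected stream altogether or keep key properties (which Singer metadata usually marks `inclusion: automatic`) in the emitted schema.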
Pinging @edgarrmondragon as you suggested that code, and you might have a fix for it :)
I also think the SDK should not allow a tap to produce invalid messages like this. Is there a way to test for it without too much overhead? Obviously, we could validate each record before sending it out, but that might be a bit heavy ;)
Interestingly, there is a test for this, `_test_replication_keys_in_schema`, but it does not validate the `SCHEMA` messages that are actually sent.
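On the overhead question: checking `SCHEMA` messages is far cheaper than validating every record, since there is only one per stream. A minimal sketch, assuming the tap's stdout has been captured as lines of JSON (for example by redirecting the `--test=schema` run above to a file): the hypothetical `invalid_schema_messages` helper reports, per stream, any declared key or bookmark properties missing from the schema. The `issues` message in the sample input is made up for illustration.

```python
from __future__ import annotations

import json
from typing import Iterable


def invalid_schema_messages(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map stream name -> declared key/bookmark properties missing from its schema."""
    problems: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the captured output
        message = json.loads(line)
        if message.get("type") != "SCHEMA":
            continue  # RECORD/STATE messages are not inspected, which keeps this cheap
        properties = message.get("schema", {}).get("properties", {})
        declared = (message.get("key_properties") or []) + (message.get("bookmark_properties") or [])
        missing = [name for name in declared if name not in properties]
        if missing:
            problems[message["stream"]] = missing
    return problems


captured = [
    '{"type": "SCHEMA", "stream": "issues", "schema": {"type": "object", "properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}',
    '{"type": "SCHEMA", "stream": "traffic_pageviews", "schema": {"properties": {}, "type": "object"}, "key_properties": ["repo", "org", "timestamp"]}',
]
print(invalid_schema_messages(captured))  # {'traffic_pageviews': ['repo', 'org', 'timestamp']}
```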