Skip to content
This repository was archived by the owner on Feb 23, 2022. It is now read-only.

Specify key csv#351

Merged
jjnesbitt merged 17 commits intomasterfrom
specify_key_csv
Mar 26, 2020
Merged

Specify key csv#351
jjnesbitt merged 17 commits intomasterfrom
specify_key_csv

Conversation

@jjnesbitt
Copy link
Copy Markdown
Member

Backend support for multinet-app/multinet-client#15.


One thing to note, after adding this support, I think we should split up the validate_csv function, as I'm no longer satisfied with the way it operates:

def validate_csv(rows: Sequence[MutableMapping], key_field: str = "_key") -> None:
"""Perform any necessary CSV validation, and return appropriate errors."""
data_errors: List[ValidationFailure] = []
fieldnames = rows[0].keys()
if key_field != "_key" and key_field not in fieldnames:
data_errors.append(KeyFieldDoesNotExist(key=key_field))
raise ValidationFailed(data_errors)
if "_key" in fieldnames and key_field != "_key":
data_errors.append(KeyFieldAlreadyExists(key=key_field))
raise ValidationFailed(data_errors)
if key_field in fieldnames:
# Node Table, check for key uniqueness
keys = [row[key_field] for row in rows]
unique_keys: Set[str] = set()
for key in keys:
if key in unique_keys:
data_errors.append(DuplicateKey(key=key))
else:
unique_keys.add(key)
elif "_from" in fieldnames and "_to" in fieldnames:
# Edge Table, check that each cell has the correct format
valid_cell = re.compile("[^/]+/[^/]+")
for i, row in enumerate(rows):
fields: List[str] = []
if not valid_cell.match(row["_from"]):
fields.append("_from")
if not valid_cell.match(row["_to"]):
fields.append("_to")
if fields:
# i+2 -> +1 for index offset, +1 due to header row
data_errors.append(InvalidRow(fields=fields, row=i + 2))
else:
# Unsupported Table, error since we don't know what's coming in
data_errors.append(UnsupportedTable())
if len(data_errors) > 0:
raise ValidationFailed(data_errors)

The current way it works is messy, and does the testing for both node and edge tables, but really they're mutually exclusive test cases, and have no real reason to be in the same function. Splitting them up to do their respective checking would eliminate the need for confusing conditionals against the key argument. I would have done this here, but I thought it'd be too big of a change, and thought it'd be better to do that in a separate PR.


Anyway, please lmk what you guys think.

@jjnesbitt jjnesbitt requested review from JackWilb and waxlamp March 19, 2020 21:45
@jjnesbitt
Copy link
Copy Markdown
Member Author

Also I reworked the test_csv_uploader file a bit, as before all the tests were in one function, and it would have grown uncomfortably large with these additions. Hopefully that wasn't too off topic for this PR.

Copy link
Copy Markdown
Member

@JackWilb JackWilb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and works as intended. I'm not sure we should force users to use _key if it exists though. I think we should suggest it, if it exists, but I don't think it should throw a validation error if another key variable is specified

Comment thread multinet/uploaders/swagger/csv.yaml
@jjnesbitt
Copy link
Copy Markdown
Member Author

I'm not sure we should force users to use _key if it exists though. I think we should suggest it, if it exists, but I don't think it should throw a validation error if another key variable is specified

I see your point, but if we allow that then we'll be overwriting the _key field. This isn't a problem necessarily, if the user is fully aware of what they're doing, but I'm uneasy about it. I'd like to hear @waxlamp weigh in.

@JackWilb
Copy link
Copy Markdown
Member

I'm not sure we should force users to use _key if it exists though. I think we should suggest it, if it exists, but I don't think it should throw a validation error if another key variable is specified

I see your point, but if we allow that then we'll be overwriting the _key field. This isn't a problem necessarily, if the user is fully aware of what they're doing, but I'm uneasy about it. I'd like to hear @waxlamp weigh in.

Makes sense. Maybe a suitable solution would be to just rename the field the original _key column to something else and write the new column? Then nothing is overwritten and the users intentions are executed. I think Roni will have useful insight, though.

Copy link
Copy Markdown
Contributor

@waxlamp waxlamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Withholding approval until we resolve the discussion about what to do in case of existing _key.

Comment thread multinet/uploaders/csv.py Outdated
Comment thread multinet/uploaders/swagger/csv.yaml
Comment thread multinet/uploaders/swagger/csv.yaml
@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Mar 20, 2020

I'm not sure we should force users to use _key if it exists though. I think we should suggest it, if it exists, but I don't think it should throw a validation error if another key variable is specified

I see your point, but if we allow that then we'll be overwriting the _key field. This isn't a problem necessarily, if the user is fully aware of what they're doing, but I'm uneasy about it. I'd like to hear @waxlamp weigh in.

Makes sense. Maybe a suitable solution would be to just rename the field the original _key column to something else and write the new column? Then nothing is overwritten and the users intentions are executed. I think Roni will have useful insight, though.

The best thing to do is add one more toggle in the UI, a checkbox that says something like "Overwrite _key field if it exists". If that checkbox is unchecked, and the uploaded file contains a _key field, then it should fail validation with something like "Existing _key field would be overwritten". On the serverside, this ought to be fairly simple to add--it requires one more bool argument to the upload and validation functions, and the validation function would wrap the new checks for existing _key field in an if-block conditioned on that argument.

What do you think, @AlmightyYakob?

@jjnesbitt
Copy link
Copy Markdown
Member Author

The best thing to do is add one more toggle in the UI, a checkbox that says something like "Overwrite _key field if it exists". If that checkbox is unchecked, and the uploaded file contains a _key field, then it should fail validation with something like "Existing _key field would be overwritten". On the serverside, this ought to be fairly simple to add--it requires one more bool argument to the upload and validation functions, and the validation function would wrap the new checks for existing _key field in an if-block conditioned on that argument.

What do you think, @AlmightyYakob?

That sounds like a good idea. Specifically, I think the default should be the unchecked state, to be safe.

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Mar 20, 2020

The best thing to do is add one more toggle in the UI, a checkbox that says something like "Overwrite _key field if it exists". If that checkbox is unchecked, and the uploaded file contains a _key field, then it should fail validation with something like "Existing _key field would be overwritten". On the serverside, this ought to be fairly simple to add--it requires one more bool argument to the upload and validation functions, and the validation function would wrap the new checks for existing _key field in an if-block conditioned on that argument.
What do you think, @AlmightyYakob?

That sounds like a good idea. Specifically, I think the default should be the unchecked state, to be safe.

Agreed.

Do you want to add this feature in on the current PR, or in a followup?

@jjnesbitt
Copy link
Copy Markdown
Member Author

Do you want to add this feature in on the current PR, or in a followup?

I can add it here, it's pretty relevant to this change so I think it's warranted.

@jjnesbitt jjnesbitt requested a review from waxlamp March 20, 2020 15:53
Comment thread test/test_csv_uploader.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants