Skip to content
This repository was archived by the owner on Feb 23, 2022. It is now read-only.

Add functionality to catch many upload problems at once and return them all#141

Merged
JackWilb merged 27 commits intomasterfrom
upload-validation-errors
Aug 28, 2019
Merged

Add functionality to catch many upload problems at once and return them all#141
JackWilb merged 27 commits intomasterfrom
upload-validation-errors

Conversation

@JackWilb
Copy link
Copy Markdown
Member

@JackWilb JackWilb commented Aug 21, 2019

Right now this just deals with the csv uploader. Looks like some of this work overlaps with @AlmightyYakob but expands functionality to catch many errors in one pass and to catch decoding errors. (closes #101)

@jjnesbitt
Copy link
Copy Markdown
Member

My other PR only contains one change, which yours also contains, so I'll close that one. For clarity, I'll just re-iterate that this deals with the backend side of #68, but doesn't add the client component. Also, there's more that can be done to validate the CSV upload, but this is a start. That can probably be done separate from this.

Comment thread multinet/uploaders/csv.py
Comment thread multinet/uploaders/csv.py Outdated
Comment thread multinet/uploaders/csv.py Outdated
@JackWilb
Copy link
Copy Markdown
Member Author

The next step for me is to implement some json testing but that's tricky without examples. Are there existing examples, @waxlamp, or will I have to generate some?

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 22, 2019

What do mean by "json testing"?

@JackWilb
Copy link
Copy Markdown
Member Author

JackWilb commented Aug 23, 2019

Sorry, that wasn't clear. I meant json upload validation, i.e. verifying the structure is correct (like we've done with CSV).

@JackWilb JackWilb marked this pull request as ready for review August 26, 2019 18:09
@JackWilb
Copy link
Copy Markdown
Member Author

JackWilb commented Aug 26, 2019

There is currently no validation for the nested json file structure so I've opened a new issue for that (#159).

@JackWilb JackWilb requested review from jjnesbitt and waxlamp August 26, 2019 18:11
Comment thread test/test_csv_uploader.py Outdated
Comment thread multinet/uploaders/csv.py
Comment thread multinet/uploaders/csv.py Outdated
Comment thread multinet/uploaders/newick.py
Comment thread multinet/uploaders/newick.py Outdated
@JackWilb
Copy link
Copy Markdown
Member Author

JackWilb commented Aug 27, 2019

@AlmightyYakob making those changes has broken the validation tests since they now don't return the errors, they raise the error instead. Any idea how we can handle this? Importing unittest? Or return the errors and then raise the error in the upload methods

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 27, 2019

I think you can have pytest assert that such-and-such exception was raised.

@JackWilb
Copy link
Copy Markdown
Member Author

Ready for review @AlmightyYakob @waxlamp

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 27, 2019

I think you could also probably catch the exception yourself, then run the asserts in the exception handler.

@JackWilb
Copy link
Copy Markdown
Member Author

JackWilb commented Aug 27, 2019

Is that a preferable solution compared to what I just pushed?
b7941f7

@jjnesbitt
Copy link
Copy Markdown
Member

jjnesbitt commented Aug 27, 2019

I think the preferred solution is something like:

with pytest.raises(ValidationFailed) as v_error:
    validate_csv(rows)

validation_resp = v_error.value.errors
...etc...

Pytest docs regarding this. FWIW I've testing this method locally and it does work, and still allows us to make those assertions.

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 27, 2019

@AlmightyYakob, would the validation_resp line go inside the scope of the with block?

In any case, that pattern gets us the best of both worlds: you can assert that the function raises an exception, and you can examine the structure of that exception too.

@JackWilb
Copy link
Copy Markdown
Member Author

Yeah I like it. Can you review my last commit and see if that aligns better with the new tests

@jjnesbitt
Copy link
Copy Markdown
Member

@AlmightyYakob, would the validation_resp line go inside the scope of the with block?

In any case, that pattern gets us the best of both worlds: you can assert that the function raises an exception, and you can examine the structure of that exception too.

I think you may be able to, but it's not necessary. I prefer to not include that line in the scope, because the only line that would raise the exception is the call to validate_csv.

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 27, 2019

I think you may be able to, but it's not necessary. I prefer to not include that line in the scope, because the only line that would raise the exception is the call to validate_csv.

Ah so the block just converts a try-except into a linear flow, neat.

@waxlamp
Copy link
Copy Markdown
Contributor

waxlamp commented Aug 27, 2019

Can you review my last commit and see if that aligns better with the new tests

Looks right to me!

jjnesbitt
jjnesbitt previously approved these changes Aug 27, 2019
@jjnesbitt
Copy link
Copy Markdown
Member

Ah so the block just converts a try-except into a linear flow, neat.

Yes, and also will fail the test if the exception is not raised, which is very handy.

Copy link
Copy Markdown
Contributor

@waxlamp waxlamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall!

Comment thread multinet/uploaders/csv.py Outdated
try:
body = input.decode("utf8")
except UnicodeDecodeError:
raise DecodeFailed([{"error": "unsupported", "detail": "not utf8"}])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error should read "utf8", and the detail should get whatever error message or error data UnicodeDecodeError provides.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have caught this earlier, but DecodeFailed ought to just receive a single argument value, rather than a list, and that value should just be the string value of the UnicodeDecodeError object.

So I would change this line to take in str(e), and change the definition of DecodeFailed to reflect that it's expecting just a string to describe how the decode failed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is probably best for this one function but since ValidationFailed returns {"errors": errors} maybe we should be consistent? This would allow us to do just one check from the client.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any reason for DecodeFailed to be made consistent with ValidationFailed -- they are different kinds of error conditions. DecodeFailed is really just our variant of UnicodeDecodeError, which does signal a single error, so we should be consistent with that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see.

I think ValidationError should also lose the "errors" dict format. Let's change both.

Copy link
Copy Markdown
Member Author

@JackWilb JackWilb Aug 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that solution is that ValidationError can return multiple errors. How should we update that to handle the multiple errors?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already handles multiple errors properly; the change here is in how it reports those errors. It can just return the JSONified string representing the list of errors, instead of the JSON object with the errors key that it currently uses.

Comment thread multinet/uploaders/newick.py Outdated
Comment thread multinet/uploaders/newick.py Outdated
@JackWilb JackWilb merged commit c9ecdfb into master Aug 28, 2019
@JackWilb JackWilb deleted the upload-validation-errors branch August 28, 2019 20:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modify the upload endpoints to return a list of errors that can be used to diagnose the problem in bad input

3 participants