Add support for csv selectors#1966
Conversation
jrobble
left a comment
There was a problem hiding this comment.
@jrobble reviewed 13 of 13 files at r1.
Reviewable status: 12 of 13 files reviewed, 4 unresolved discussions (waiting on @brosenberg42)
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):
import org.mitre.mpf.wfm.util.CharsetDetectingReader; public class TestCsvColSelectorService {
Please add tests for the following:
- Column expression returns no results
- Leading / trailing white space sensitivity (e.g. user provides
"1,2 , 3,"). This currently fails. Can we be insensitive? - White space and/or punctuation in the column name (e.g.
"1,\"2,0\",\"hello world\"")
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):
import org.mitre.mpf.wfm.util.CharsetDetectingReader; public class TestCsvColSelectorService {
No exception is thrown when a column index is provided and that index doesn't exist; however, an exception is thrown when a column name is provided and the column name doesn't exist. We should be consistent and throw in both cases.
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 144 at r1 (raw file):
1,2,3,4,5 """; var mediaSelector = createMediaSelector("a,b,1,2");
Consider the case where the header name is a literal number, like "2".
I think we should force the user to be explicit. Either they're providing col indices, or they're providing col names.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/camel/operations/MediaSelectorsMapper.java line 1 at r1 (raw file):
package org.mitre.mpf.wfm.camel.operations;
Please add header comments.
|
Please use quotes around the header name. |
|
Previously, jrobble (Jeff Robble) wrote…
We decided to use the existing output file for a test where we treat the first row as a header row. This will let us test for white space and punctuation in the header names. The leading / trailing white space sensitivity issue is must likely to occur when a user doesn't notice that kind of white space in the file they are trying to process. |
brosenberg42
left a comment
There was a problem hiding this comment.
Reviewable status: 7 of 16 files reviewed, 5 unresolved discussions (waiting on @brosenberg42 and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/camel/operations/MediaSelectorsMapper.java line 1 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please add header comments.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/CsvColSelectorService.java line 191 at r2 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please use quotes around the header name.
Done.
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
We decided to use the existing output file for a test where we treat the first row as a header row. This will let us test for white space and punctuation in the header names.
The leading / trailing white space sensitivity issue is must likely to occur when a user doesn't notice that kind of white space in the file they are trying to process.
Done.
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
No exception is thrown when a column index is provided and that index doesn't exist; however, an exception is thrown when a column name is provided and the column name doesn't exist. We should be consistent and throw in both cases.
That is to support CSV with variable width rows.
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 144 at r1 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Consider the case where the header name is a literal number, like "2".
I think we should force the user to be explicit. Either they're providing col indices, or they're providing col names.
Done.
jrobble
left a comment
There was a problem hiding this comment.
@jrobble reviewed 9 of 9 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42)
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 1 at r3 (raw file):
/******************************************************************************
Please add this test:
@Test
public void handlesDuplicateContent() {
var csvContent = """
a,b,c,c,b
1,1,1,1,1
1,1,1,1,1
""";
var mediaSelector = createMediaSelector("a,b", false, false);
var media = createTestMedia(csvContent, mediaSelector);
var updatedStrings = Map.of(
"1", "I");
assertExtractionsEqualTo(media, updatedStrings.keySet());
var resultBytes = createOutputDocument(media, updatedStrings);
String[][] expectedOutput = {
{"a", "b", "c", "c", "b"},
{"I", "I", "1", "1", "I"},
{"I", "I", "1", "1", "I"}
};
assertOutputMatches(resultBytes, expectedOutput);
}The point of this is to ensure we're only replacing the cells that correspond to the specified cols, even if other cells have the same content.
brosenberg42
left a comment
There was a problem hiding this comment.
Reviewable status: 15 of 16 files reviewed, 1 unresolved discussion (waiting on @jrobble)
trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 1 at r3 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please add this test:
@Test public void handlesDuplicateContent() { var csvContent = """ a,b,c,c,b 1,1,1,1,1 1,1,1,1,1 """; var mediaSelector = createMediaSelector("a,b", false, false); var media = createTestMedia(csvContent, mediaSelector); var updatedStrings = Map.of( "1", "I"); assertExtractionsEqualTo(media, updatedStrings.keySet()); var resultBytes = createOutputDocument(media, updatedStrings); String[][] expectedOutput = { {"a", "b", "c", "c", "b"}, {"I", "I", "1", "1", "I"}, {"I", "I", "1", "1", "I"} }; assertOutputMatches(resultBytes, expectedOutput); }The point of this is to ensure we're only replacing the cells that correspond to the specified cols, even if other cells have the same content.
Done.
jrobble
left a comment
There was a problem hiding this comment.
@jrobble reviewed 1 of 1 files at r4, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)
Issues:
This change is