Skip to content

Add support for csv selectors#1966

Merged
brosenberg42 merged 7 commits into
developfrom
feat/csv-selector
Nov 19, 2025
Merged

Add support for csv selectors#1966
brosenberg42 merged 7 commits into
developfrom
feat/csv-selector

Conversation

@brosenberg42
Copy link
Copy Markdown
Member

@brosenberg42 brosenberg42 commented Sep 19, 2025

Issues:


This change is Reviewable

@brosenberg42 brosenberg42 self-assigned this Sep 19, 2025
Copy link
Copy Markdown
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrobble reviewed 13 of 13 files at r1.
Reviewable status: 12 of 13 files reviewed, 4 unresolved discussions (waiting on @brosenberg42)


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):

import org.mitre.mpf.wfm.util.CharsetDetectingReader;

public class TestCsvColSelectorService {

Please add tests for the following:

  • Column expression returns no results
  • Leading / trailing white space sensitivity (e.g. user provides "1,2 , 3,"). This currently fails. Can we be insensitive?
  • White space and/or punctuation in the column name (e.g. "1,\"2,0\",\"hello world\"")

trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):

import org.mitre.mpf.wfm.util.CharsetDetectingReader;

public class TestCsvColSelectorService {

No exception is thrown when a column index is provided and that index doesn't exist; however, an exception is thrown when a column name is provided and the column name doesn't exist. We should be consistent and throw in both cases.


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 144 at r1 (raw file):

            1,2,3,4,5
            """;
        var mediaSelector = createMediaSelector("a,b,1,2");

Consider the case where the header name is a literal number, like "2".

I think we should force the user to be explicit. Either they're providing col indices, or they're providing col names.


trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/camel/operations/MediaSelectorsMapper.java line 1 at r1 (raw file):

package org.mitre.mpf.wfm.camel.operations;

Please add header comments.

@jrobble
Copy link
Copy Markdown
Member

jrobble commented Nov 6, 2025

trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/CsvColSelectorService.java line 191 at r2 (raw file):

        if (result.isEmpty()) {
            throw new WfmProcessingException(
                "The CSV file does not have a header named: " + headerName);

Please use quotes around the header name.

@jrobble
Copy link
Copy Markdown
Member

jrobble commented Nov 7, 2025

trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Please add tests for the following:

  • Column expression returns no results
  • Leading / trailing white space sensitivity (e.g. user provides "1,2 , 3,"). This currently fails. Can we be insensitive?
  • White space and/or punctuation in the column name (e.g. "1,\"2,0\",\"hello world\"")

We decided to use the existing output file for a test where we treat the first row as a header row. This will let us test for white space and punctuation in the header names.

The leading / trailing white space sensitivity issue is must likely to occur when a user doesn't notice that kind of white space in the file they are trying to process.

Copy link
Copy Markdown
Member Author

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 16 files reviewed, 5 unresolved discussions (waiting on @brosenberg42 and @jrobble)


trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/camel/operations/MediaSelectorsMapper.java line 1 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Please add header comments.

Done.


trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/CsvColSelectorService.java line 191 at r2 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Please use quotes around the header name.

Done.


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

We decided to use the existing output file for a test where we treat the first row as a header row. This will let us test for white space and punctuation in the header names.

The leading / trailing white space sensitivity issue is must likely to occur when a user doesn't notice that kind of white space in the file they are trying to process.

Done.


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 60 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

No exception is thrown when a column index is provided and that index doesn't exist; however, an exception is thrown when a column name is provided and the column name doesn't exist. We should be consistent and throw in both cases.

That is to support CSV with variable width rows.


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 144 at r1 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Consider the case where the header name is a literal number, like "2".

I think we should force the user to be explicit. Either they're providing col indices, or they're providing col names.

Done.

Copy link
Copy Markdown
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrobble reviewed 9 of 9 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @brosenberg42)


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 1 at r3 (raw file):

/******************************************************************************

Please add this test:

    @Test
    public void handlesDuplicateContent() {
        var csvContent = """
            a,b,c,c,b
            1,1,1,1,1
            1,1,1,1,1
            """;
        var mediaSelector = createMediaSelector("a,b", false, false);
        var media = createTestMedia(csvContent, mediaSelector);

        var updatedStrings = Map.of(
            "1", "I");
        assertExtractionsEqualTo(media, updatedStrings.keySet());

        var resultBytes = createOutputDocument(media, updatedStrings);

        String[][] expectedOutput = {
            {"a", "b", "c", "c", "b"},
            {"I", "I", "1", "1", "I"},
            {"I", "I", "1", "1", "I"}
        };
        assertOutputMatches(resultBytes, expectedOutput);
    }

The point of this is to ensure we're only replacing the cells that correspond to the specified cols, even if other cells have the same content.

Copy link
Copy Markdown
Member Author

@brosenberg42 brosenberg42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 15 of 16 files reviewed, 1 unresolved discussion (waiting on @jrobble)


trunk/workflow-manager/src/test/java/org/mitre/mpf/wfm/service/TestCsvColSelectorService.java line 1 at r3 (raw file):

Previously, jrobble (Jeff Robble) wrote…

Please add this test:

    @Test
    public void handlesDuplicateContent() {
        var csvContent = """
            a,b,c,c,b
            1,1,1,1,1
            1,1,1,1,1
            """;
        var mediaSelector = createMediaSelector("a,b", false, false);
        var media = createTestMedia(csvContent, mediaSelector);

        var updatedStrings = Map.of(
            "1", "I");
        assertExtractionsEqualTo(media, updatedStrings.keySet());

        var resultBytes = createOutputDocument(media, updatedStrings);

        String[][] expectedOutput = {
            {"a", "b", "c", "c", "b"},
            {"I", "I", "1", "1", "I"},
            {"I", "I", "1", "1", "I"}
        };
        assertOutputMatches(resultBytes, expectedOutput);
    }

The point of this is to ensure we're only replacing the cells that correspond to the specified cols, even if other cells have the same content.

Done.

Copy link
Copy Markdown
Member

@jrobble jrobble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrobble reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @brosenberg42)

@brosenberg42 brosenberg42 merged commit 77b44e7 into develop Nov 19, 2025
2 checks passed
@brosenberg42 brosenberg42 deleted the feat/csv-selector branch November 19, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants