@szarnyasg szarnyasg commented Aug 24, 2022

Will fix #205.

We can use the DuckDB appender to populate the tables.

Current validation scripts are in:

A lot of time is spent parsing the results from CSV files back into Java data structures. This could also be improved by loading the files directly with DuckDB's COPY ... FROM 'filename.csv' (DELIMITER ' ', FORMAT csv) clause.
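A minimal sketch of building such a COPY statement in Java, assuming space-delimited result files as in the clause above. The class and method names (CopyStatements, copyFrom) and the file path are hypothetical. The resulting string could be passed to Statement.execute on a DuckDB JDBC connection, so that DuckDB parses the CSV instead of Java code. Single quotes in the path are doubled, which is the standard SQL way to escape a string literal.

```java
final class CopyStatements {
    // Quotes a value as a SQL string literal by doubling embedded single quotes.
    static String sqlLiteral(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Quotes a table name as a SQL identifier by doubling embedded double quotes.
    static String sqlIdentifier(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Builds a COPY ... FROM statement for a space-delimited CSV file
    // (hypothetical helper; the options mirror the clause quoted in the PR).
    static String copyFrom(String table, String path) {
        return "COPY " + sqlIdentifier(table) + " FROM " + sqlLiteral(path)
                + " (DELIMITER ' ', FORMAT csv)";
    }

    public static void main(String[] args) {
        // hypothetical file path, for illustration only
        System.out.println(copyFrom("expected", "results/wcc-expected.csv"));
    }
}
```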

Validation tests (which test the validation rules themselves) are in:

Populating tables using the DuckDB appender and comparing WCC results

A snippet for using appenders (not sure whether it is useful):

import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

// outputGraph holds the expected results; actualResults is a hypothetical
// Map<Long, Double> holding the results produced by the platform under test
try (DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
     Statement stmt = conn.createStatement()) {

    // fill 'expected' table
    stmt.execute("DROP TABLE IF EXISTS expected");
    stmt.execute("CREATE TABLE expected(v BIGINT NOT NULL, x DOUBLE NOT NULL)");
    try (DuckDBAppender expectedAppender = conn.createAppender("main", "expected")) {
        for (long vertexId : outputGraph.getVertices()) {
            expectedAppender.beginRow();
            expectedAppender.append(vertexId);
            expectedAppender.append(outputGraph.getVertexValue(vertexId));
            expectedAppender.endRow();
        }
    }

    // fill 'actual' table from the platform's output
    stmt.execute("DROP TABLE IF EXISTS actual");
    stmt.execute("CREATE TABLE actual(v BIGINT NOT NULL, x DOUBLE NOT NULL)");
    try (DuckDBAppender actualAppender = conn.createAppender("main", "actual")) {
        for (Map.Entry<Long, Double> entry : actualResults.entrySet()) {
            actualAppender.beginRow();
            actualAppender.append(entry.getKey());
            actualAppender.append(entry.getValue());
            actualAppender.endRow();
        }
    }

    // report vertices that share an equivalence class with another vertex
    // in the expected table but not in the actual table
    try (ResultSet rs = stmt.executeQuery(
            "SELECT e1.v AS v, e1.x AS expected_x, a1.x AS actual_x\n" +
            "FROM expected e1, actual a1\n" +
            "WHERE e1.v = a1.v -- select a node in the expected-actual tables\n" +
            "  AND EXISTS (\n" +
            "    SELECT 1\n" +
            "    FROM expected e2, actual a2\n" +
            "    WHERE e2.v = a2.v   -- another node in the expected-actual tables\n" +
            "      AND e1.x = e2.x   -- which is in the same equivalence class in the expected table\n" +
            "      AND a1.x != a2.x  -- but not in the actual table\n" +
            "  )")) {
        while (rs.next()) {
            // Java's Formatter uses %d for integers (C-style %ld is rejected)
            System.out.format("%d: %s != %s%n", rs.getLong(1), rs.getDouble(2), rs.getDouble(3));
        }
    }
}
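The query above reports vertices whose expected component is split in the actual output. On its own it does not flag the opposite error, where the actual output merges two expected components into one: if every vertex gets the same actual label, no vertex has a same-class partner with a different actual label, so nothing is reported. A minimal in-memory sketch of a two-directional check, assuming WCC labels fit in a long and both results are available as maps from vertex ID to component ID (WccEquivalence and samePartition are hypothetical names). The two results describe the same partition exactly when the mapping from expected labels to actual labels is a bijection.

```java
import java.util.HashMap;
import java.util.Map;

final class WccEquivalence {
    // Returns true if both maps put the same vertices into the same components,
    // regardless of which representative ID each side uses for a component.
    static boolean samePartition(Map<Long, Long> expected, Map<Long, Long> actual) {
        if (!expected.keySet().equals(actual.keySet())) {
            return false; // a vertex is missing from one of the results
        }
        Map<Long, Long> expectedToActual = new HashMap<>();
        Map<Long, Long> actualToExpected = new HashMap<>();
        for (Map.Entry<Long, Long> e : expected.entrySet()) {
            Long exp = e.getValue();
            Long act = actual.get(e.getKey());
            // putIfAbsent returns the label mapped earlier, or null if there was none
            Long prevAct = expectedToActual.putIfAbsent(exp, act);
            if (prevAct != null && !prevAct.equals(act)) {
                return false; // one expected component is split in the actual output
            }
            Long prevExp = actualToExpected.putIfAbsent(act, exp);
            if (prevExp != null && !prevExp.equals(exp)) {
                return false; // two expected components are merged in the actual output
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Long, Long> expected = Map.of(1L, 1L, 2L, 1L, 3L, 3L);
        Map<Long, Long> relabeled = Map.of(1L, 7L, 2L, 7L, 3L, 9L);
        Map<Long, Long> merged = Map.of(1L, 7L, 2L, 7L, 3L, 7L);
        System.out.println(samePartition(expected, relabeled)); // true
        System.out.println(samePartition(expected, merged));    // false
    }
}
```

In SQL, the same idea would translate to running the existing query a second time with the roles of the expected and actual tables swapped.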

Handling infinity values

Infinity values need special care, as several spellings should be accepted:

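// hypothetical helper name; falls back to Double.parseDouble for other values
static double parseValue(String value) {
    String low = value.trim().toLowerCase(java.util.Locale.ROOT);
    if (low.equals("inf") || low.equals("+inf") || low.equals("infinity") || low.equals("+infinity")) {
        return Double.POSITIVE_INFINITY;
    } else if (low.equals("-inf") || low.equals("-infinity")) {
        return Double.NEGATIVE_INFINITY;
    }
    return Double.parseDouble(low);
}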
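Once infinities are parsed, comparing values with a tolerance also needs care. In IEEE 754 arithmetic, infinity minus infinity is NaN, so a naive check such as Math.abs(a - b) <= epsilon rejects two matching infinities. A minimal sketch, assuming a combined absolute/relative tolerance; the epsilon value and the scaling rule are illustrative assumptions, not the project's validation rule, and ToleranceCompare is a hypothetical name.

```java
final class ToleranceCompare {
    // Infinite values must match exactly; finite values must agree within
    // epsilon, scaled by their magnitude once that magnitude exceeds 1.
    static boolean approxEquals(double a, double b, double epsilon) {
        if (Double.isInfinite(a) || Double.isInfinite(b)) {
            return a == b; // +inf matches only +inf, -inf only -inf
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= epsilon * Math.max(scale, 1.0);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(Math.abs(inf - inf) <= 1e-4);        // false: inf - inf is NaN
        System.out.println(approxEquals(inf, inf, 1e-4));       // true
        System.out.println(approxEquals(inf, 1e300, 1e-4));     // false
        System.out.println(approxEquals(0.15, 0.15001, 1e-4));  // true
    }
}
```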

Validation of completeness

The validation should check not only whether the results are correct but also whether all vertices are included in the result set.
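A minimal sketch of such a completeness check, assuming the vertex IDs of both results are available as sets (Completeness and missing are hypothetical names). Running the set difference in both directions reports vertices absent from the actual output as well as vertices the output contains but the expected result does not. With the expected and actual tables loaded into DuckDB, a query such as SELECT v FROM expected EXCEPT SELECT v FROM actual would express the first direction in SQL.

```java
import java.util.HashSet;
import java.util.Set;

final class Completeness {
    // Vertices present in 'from' but absent from 'other' (set difference).
    static Set<Long> missing(Set<Long> from, Set<Long> other) {
        Set<Long> result = new HashSet<>(from);
        result.removeAll(other);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> expected = Set.of(1L, 2L, 3L, 4L);
        Set<Long> actual = Set.of(1L, 2L, 5L);
        // contains 3 and 4 (HashSet iteration order is not guaranteed)
        System.out.println("missing: " + missing(expected, actual));
        // contains only 5
        System.out.println("unexpected: " + missing(actual, expected));
    }
}
```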

@szarnyasg szarnyasg force-pushed the output-validation-using-matching-in-sql branch 2 times, most recently from df28286 to 1407616 August 27, 2022 17:52
@szarnyasg szarnyasg force-pushed the output-validation-using-matching-in-sql branch from 929a105 to 57c4366 August 27, 2022 20:07
@szarnyasg szarnyasg marked this pull request as ready for review January 12, 2023 13:49
@szarnyasg szarnyasg force-pushed the main branch 3 times, most recently from 243bf48 to f4acf27 February 14, 2023 11:14
@szarnyasg szarnyasg merged commit c2bca48 into main Feb 14, 2023
@szarnyasg szarnyasg deleted the output-validation-using-matching-in-sql branch February 14, 2023 13:51
Linked issue: Validation is slow for large graphs