Dataset returning empty strings #102

@p-ferreira

Sometimes the dataset samples a string that is empty or whitespace-only, such as \n\n. That string is then passed through every step of the validator's chain of thought (CoT), which produces odd conversation flows.

This can be fixed easily by adding an "is string empty" check before returning the text sampled from the dataset, so that empty or whitespace-only samples are skipped.
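
A minimal sketch of what such a check could look like. It assumes the dataset yields dicts with a "text" key, which I have not confirmed against dataset.py. The names is_blank, NonEmptyTextIterator and max_skips are hypothetical and only for illustration.

```python
from typing import Dict, Iterable


def is_blank(text: str) -> bool:
    """Return True when the text is empty or contains only whitespace."""
    return not text.strip()


class NonEmptyTextIterator:
    """Wrap an iterable of {"text": ...} records and skip blank samples.

    Hypothetical helper; the record shape is an assumption.
    """

    def __init__(self, source: Iterable[Dict[str, str]], max_skips: int = 100):
        self._source = iter(source)
        self._max_skips = max_skips

    def __iter__(self) -> "NonEmptyTextIterator":
        return self

    def __next__(self) -> Dict[str, str]:
        # Draw at most max_skips + 1 records, so a source that only ever
        # yields blank text cannot make this loop forever.
        for _ in range(self._max_skips + 1):
            record = next(self._source)  # StopIteration propagates when exhausted
            if not is_blank(record["text"]):
                return record
        raise RuntimeError("too many consecutive blank samples from the dataset")


# Small in-memory stand-in for the real streaming dataset.
samples = [
    {"text": "\n\n"},
    {"text": "A real paragraph."},
    {"text": "   "},
    {"text": "Another one."},
]
texts = [record["text"] for record in NonEmptyTextIterator(samples)]
print(texts)
```

The max_skips cap is a design choice, not a requirement: it turns a dataset that only produces blank text into a visible error instead of a silent hang.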

Some examples of conversation flows with empty strings:

'\n\nSummarize the preceding context in 5 sentences.\n\n',
'\n\nSummarize the preceding context in 4 sentences.\n\n',
'\n\nSummarize the preceding context in 6 sentences.\n\n',
'\n\nSummarize the preceding context in 7 sentences.\n\n'

Location to implement change:
https://github.com/opentensor/validators/blob/e422d2a5e402e814e9dd325c4c5b5675cf976380/openvalidators/dataset.py#L30C18-L30C18
