Category: [B2]; Team name: NPL; Dataset: Chordonomicon #238
Open
PierrickLeroy wants to merge 14 commits into geometric-intelligence:main
1. Find the chord segments based on the <verse> label.
2. Infer the key (tonality) of each segment.
3. Transpose the chords into Roman numeral notation.
The file now includes a systematic analysis, but it has not yet been applied to every song in the dataframe. It also includes a dictionary for looking up the mapping between chords and scales (see the last part).
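The three steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the helper names `split_segments` and `to_roman`, the plain chord spelling (`C`, `Am`, `F#m`), and the assumption that the tonic is already known (step 2 is not implemented here) are all hypothetical.

```python
import re

# Pitch class of each note name, with C = 0 (sharps and flats spelled with # and b).
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
              "Bb": 10, "B": 11}

# Roman numeral for each semitone offset above the tonic (major-key convention).
DEGREES = ["I", "bII", "II", "bIII", "III", "IV", "#IV", "V",
           "bVI", "VI", "bVII", "VII"]


def split_segments(progression):
    """Step 1: group chords under the most recent <label> tag."""
    segments = []
    for token in progression.split():
        if re.fullmatch(r"<[^>]+>", token):
            segments.append((token.strip("<>"), []))
        elif segments:
            segments[-1][1].append(token)
        else:
            # Chords that appear before any tag go into an "unlabeled" segment.
            segments.append(("unlabeled", [token]))
    return segments


def to_roman(chord, tonic):
    """Step 3: express a chord root as a scale degree relative to a given tonic."""
    match = re.match(r"([A-G][#b]?)(.*)", chord)
    root, quality = match.group(1), match.group(2)
    offset = (NOTE_TO_PC[root] - NOTE_TO_PC[tonic]) % 12
    numeral = DEGREES[offset]
    # Lowercase numerals for minor chords ("m", "min", "m7"), but not "maj".
    if quality.startswith("m") and not quality.startswith("maj"):
        numeral = numeral.lower()
    return numeral


segments = split_segments("<verse_1> C G Am F <chorus_1> F G C")
romans = [to_roman(chord, "C") for chord in segments[0][1]]
```

Here the tonic `"C"` is passed in by hand; a key-inference step would replace that argument with a per-segment estimate.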
PierrickLeroy (Author) commented on Nov 24, 2025:
Small possible workaround to allow edge-level prediction, at least with NoReadOut.
Checklist
Description
This pull request introduces a benchmark task based on musical chord synergy and redundancy.
The goal is to assess how well topological models capture the structure of chords, which are naturally occurring higher-order objects.
The task consists of predicting a theoretically grounded information-theoretic quantity, the local O-information [1], computed from real musical data, thereby enabling evaluation at the hyperedge level.
Concretely, we propose a hyperedge regression task with the following characteristics:
Dataset
We integrated a chord dataset derived from the CHORDONOMICON dataset [3], which contains over 600,000 songs annotated with their chord progressions. We preprocess this dataset by standardizing chord notation, mapping every chord to a common alphabet of $12$ notes in the base version.
We then aggregate occurrences to construct a hypergraph in which each of the 226 hyperedges corresponds to a unique chord. The resulting processed datasets are publicly available on Hugging Face: link.
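As a rough illustration of this aggregation step, the sketch below maps chords to sets of pitch classes and counts how often each set occurs. It is not the preprocessing code of this pull request: the interval table `QUALITY_INTERVALS`, the helper names, and the `(root, quality)` input format are assumptions made for the example.

```python
from collections import Counter

# Pitch class of each note name, with C = 0.
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
              "Bb": 10, "B": 11}

# Semitone offsets above the root for a few chord qualities (illustrative subset).
QUALITY_INTERVALS = {
    "": (0, 4, 7),         # major triad
    "m": (0, 3, 7),        # minor triad
    "7": (0, 4, 7, 10),    # dominant seventh
    "maj7": (0, 4, 7, 11), # major seventh
    "m7": (0, 3, 7, 10),   # minor seventh
}


def chord_to_hyperedge(root, quality=""):
    """Return the chord as a frozenset of pitch classes (one hyperedge over 12 nodes)."""
    root_pc = NOTE_TO_PC[root]
    return frozenset((root_pc + step) % 12 for step in QUALITY_INTERVALS[quality])


def aggregate(chords):
    """Count occurrences of each distinct hyperedge across a list of (root, quality) chords."""
    return Counter(chord_to_hyperedge(root, quality) for root, quality in chords)


song = [("C", ""), ("A", "m"), ("F", ""), ("G", "7"), ("C", "")]
hyperedge_counts = aggregate(song)
```

Each key of `hyperedge_counts` is a set of nodes (pitch classes) and each value is its empirical count, which is the kind of frequency information a label such as the local O-information would be estimated from.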
We additionally provide the pre-aggregation data, which can support alternative tasks such as predicting musical genre from hyperedges.
The dataset is available in 2 versions, depending on how musical scales are treated:
- single_scale: notes are merged across octaves (e.g., C♯2 and C♯3 are treated as the same pitch class), yielding 12 distinct notes.
- all_scales: octave information is preserved, and notes at different octaves are treated as distinct, yielding 38 total notes. In this case the number of hyperedges is 4,313.

The choice of which dataset to load is made in the configuration file (chordonomicon.yaml) or directly with an argument in the dataset class (ChordonomiconDataset).
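The difference between the two versions can be illustrated with a small sketch of how a note name might be turned into a node identifier. The function `node_id` and the note-string format are hypothetical and only mirror the description above; the actual loader logic in `ChordonomiconDataset` is not shown here.

```python
import re

# Pitch class of each note name, with C = 0.
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10,
              "Bb": 10, "B": 11}


def node_id(note, version):
    """Map a note such as 'C♯2' to a node identifier for the chosen dataset version."""
    # Normalise Unicode accidentals to ASCII before parsing.
    note = note.replace("♯", "#").replace("♭", "b")
    match = re.fullmatch(r"([A-G][#b]?)(-?\d+)", note)
    if match is None:
        raise ValueError(f"cannot parse note: {note!r}")
    pitch_class = NOTE_TO_PC[match.group(1)]
    octave = int(match.group(2))
    if version == "single_scale":
        return pitch_class            # octaves merged: 12 possible nodes
    if version == "all_scales":
        return (pitch_class, octave)  # octaves kept: one node per (pitch class, octave)
    raise ValueError(f"unknown version: {version!r}")


same = node_id("C♯2", "single_scale") == node_id("C♯3", "single_scale")
distinct = node_id("C♯2", "all_scales") != node_id("C♯3", "all_scales")
```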
Issue
This benchmark task introduces local O-information into TDL evaluation. Local O-information is a mathematically rigorous measure of synergy and redundancy in multivariate systems.
Using it as a regression target, the goal is to set up a task for which:
Expressivity of TDL models
Local O-information [1], derived from the O-information [2], is an interesting quantity because it assesses, for each hyperedge, whether the information it contains is redundant (recoverable from lower-order interactions) or synergistic (emerging only from higher-order interactions).
Concretely, the local O-information for an $n$-tuple $x^n$ is given as follows (see eq. 4 in [1] for more details):

$$\omega(x^n) = (n-2)\,h(x^n) + \sum_{j=1}^{n} \left[ h(x_j) - h(x^n_{-j}) \right]$$

where $h$ is the information-content function, $x^n$ corresponds to a hyperedge, $x_j$ is the marginalisation for variable $j$ (over the other $(n-1)$ variables), and $x^n_{-j}$ is the marginalisation over $j$, that is, a function of all variables except $j$.
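To make the definition concrete, here is a minimal sketch that estimates the local O-information of one pattern from empirical frequencies. It assumes the pointwise form of the O-information from [2], $\omega(x^n) = (n-2)\,h(x^n) + \sum_j [h(x_j) - h(x^n_{-j})]$ with $h(x) = -\log_2 p(x)$, and plug-in probability estimates. The function name and the sample format (tuples of discrete values) are illustrative and are not taken from the code in this pull request.

```python
import math


def local_o_information(samples, pattern):
    """Plug-in estimate of the local O-information of `pattern` given `samples`.

    samples: list of equal-length tuples of discrete values.
    pattern: one tuple of the same length, which must occur in `samples`.
    """
    n = len(pattern)
    total = len(samples)

    def info_content(indices):
        # h = -log2 of the empirical probability of the pattern restricted to `indices`.
        target = tuple(pattern[i] for i in indices)
        count = sum(1 for s in samples if tuple(s[i] for i in indices) == target)
        if count == 0:
            raise ValueError("pattern does not occur in samples")
        return -math.log2(count / total)

    all_indices = tuple(range(n))
    value = (n - 2) * info_content(all_indices)
    for j in range(n):
        rest = tuple(i for i in all_indices if i != j)
        value += info_content((j,)) - info_content(rest)
    return value


# Three toy systems over three binary variables.
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # third bit is a XOR b
copy = [(0, 0, 0), (1, 1, 1)]                        # all bits identical

omega_independent = local_o_information(independent, (0, 1, 0))
omega_xor = local_o_information(xor, (0, 1, 1))
omega_copy = local_o_information(copy, (1, 1, 1))
```

Under the sign convention of the O-information, a positive value indicates a redundancy-dominated pattern and a negative value a synergy-dominated one, so the copy system gives a positive value, the XOR system a negative one, and independent variables give zero.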
Computing local O-information requires contrasting information across different orders, so we think it could be a good test of model expressivity, in the same spirit as the WL tests for GNNs.
Additional context
Limitations
Due to the structured nature of musical harmony, chords follow patterns and only cover a fraction of all possible combinations of the 12 pitch classes.
As a result, the number of hyperedges in the single-scale setting remains relatively modest (226).
When all scales are included, the number increases substantially (4,313), but the corresponding empirical frequencies become more variable and contain more outliers, which in turn introduces additional noise into the labels.
References
[1] "Quantifying high-order interdependencies on individual patterns via the local O-information: Theory and applications to music analysis", Scagliarini et al.
[2] "Quantifying high-order interdependencies via multivariate extensions of the mutual information", Rosas et al.
[3] "CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions", Kantarelis et al.