
Allow to_pandas to return player_id as column on tracking datasets #68

@rjtavares

On tracking datasets, to_pandas returns a DataFrame with one row per frame. For further transformations, it can be easier to work with a DataFrame that has one row per frame per player.

So to_pandas could accept a player_as_index option that performs this transformation.

This code worked on a Metrica dataset:

import pandas as pd

df = dataset.to_pandas()

# Every player id from all teams, plus the ball
players = [p.player_id for t in dataset.metadata.teams for p in t.players] + ['ball']

x_coords = [p + '_x' for p in players]
y_coords = [p + '_y' for p in players]
coord_columns = set(x_coords + y_coords)

# The remaining per-frame columns become the index.
# A list keeps the original column order; a set would not.
data_columns = [c for c in df.columns if c not in coord_columns]

x = df.set_index(data_columns)[x_coords]
y = df.set_index(data_columns)[y_coords]
x.columns = players
y.columns = players

# Stack players into rows, then put x and y side by side
df = pd.concat([x.stack(), y.stack()], axis=1)
df.columns = ['x', 'y']
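
For reference, the same reshape can be sketched as a standalone function that runs without a tracking file. This is only a sketch under assumptions: the name players_to_rows and the toy columns (frame_id, timestamp, home_1, away_1) are made up for illustration and are not part of the library. It builds the result with one concat per player instead of stack, so player_id ends up as a regular column and rows with missing coordinates are kept rather than dropped.

```python
import pandas as pd


def players_to_rows(wide: pd.DataFrame, players: list[str]) -> pd.DataFrame:
    """Reshape '<id>_x' / '<id>_y' columns into one row per frame per player.

    Hypothetical helper for illustration only.
    """
    coord_columns = {f"{p}_{axis}" for p in players for axis in ("x", "y")}
    # Every non-coordinate column describes the frame and is repeated per player.
    frame_columns = [c for c in wide.columns if c not in coord_columns]

    parts = []
    for player in players:
        part = wide[frame_columns].copy()
        part["player_id"] = player
        part["x"] = wide[f"{player}_x"]
        part["y"] = wide[f"{player}_y"]
        parts.append(part)

    # Rows come out grouped by player, in the order given in `players`.
    return pd.concat(parts, ignore_index=True)


# Toy stand-in for dataset.to_pandas(); the column names are assumptions.
wide = pd.DataFrame({
    "frame_id": [1, 2],
    "timestamp": [0.04, 0.08],
    "home_1_x": [0.10, 0.11],
    "home_1_y": [0.50, 0.51],
    "away_1_x": [0.90, 0.89],
    "away_1_y": [0.40, 0.41],
    "ball_x": [0.45, 0.46],
    "ball_y": [0.55, 0.56],
})

long = players_to_rows(wide, ["home_1", "away_1", "ball"])
print(long)
```

With 2 frames and 3 tracked objects this gives 6 rows with the columns frame_id, timestamp, player_id, x and y. Calling long.set_index(["frame_id", "player_id"]) afterwards would give the indexed variant that the option name suggests.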

If you're OK with it, I can test it further and submit a pull request.
