Flexibilize wandb config #136

@p-ferreira

Problem

As shown in #134, some members of the community rely on the wandb config for data extraction.

The change implemented in #132 sought to reduce the room for exploits of the data logged in the wandb config.

The rationale was that valuable information for filtering, such as hotkey, netuid, version and createdAt, among others, would already be available through the tags and the wandb metadata.

Using the MongoDB-style query filters exposed by the wandb API, one would ideally be able to filter runs with the following code:
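To illustrate what reading that information from tags can look like, here is a minimal sketch that recovers the netuid from a run's tag list. It assumes the `netuid_<number>` tag format seen in the examples of this issue (e.g. `netuid_1`); the helper name `netuid_from_tags` and the constant are hypothetical and not part of openvalidators or wandb.

```python
from typing import Iterable, Optional

# Assumed prefix, taken from the 'netuid_1' tag used in this issue
NETUID_TAG_PREFIX = "netuid_"


def netuid_from_tags(tags: Iterable[str]) -> Optional[int]:
    """Return the netuid encoded in a run's tags, or None if no such tag exists."""
    for tag in tags:
        suffix = tag[len(NETUID_TAG_PREFIX):]
        # Only accept tags of the form 'netuid_<digits>'
        if tag.startswith(NETUID_TAG_PREFIX) and suffix.isdigit():
            return int(suffix)
    return None
```

For a run tagged `['1.1.8', 'netuid_1']` this would yield `1`; the same idea could apply to `run.tags` on runs returned by the wandb API.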

import wandb
from datetime import datetime, timedelta

# Date that you want to filter, in this case 3 days ago
date_filter = datetime.now() - timedelta(days=3)

api = wandb.Api()
all_runs = api.runs("opentensor-dev/openvalidators", filters={
    "$and": [
        {"createdAt": {"$gt": date_filter.timestamp()}},
        {"tags": {"$all": ["1.1.8", "netuid_1"]}}
    ]})
print(len(all_runs))

Unfortunately, the wandb API returns the following internal error for this call:

wandb: Network error (HTTPError), entering retry loop.

HTTPError: 500 Server Error: Internal Server Error for url: https://api.wandb.ai/graphql

The foundation is currently in contact with the wandb team to establish a communication channel and improve the integration with their platform, whose API is currently relatively unstable.

One can work around this issue by querying all the runs and filtering them manually, preferably with a retry mechanism in place, as the wandb API occasionally throws exceptions.

Below is an example of how to filter runs by tags, username and date:

import wandb
from datetime import datetime, timedelta
import logging
from tenacity import retry, stop_after_attempt, wait_fixed, before_sleep_log

api = wandb.Api()

# Tags that you want to filter
filter_tags = ['1.1.8', 'netuid_1']

# User that you want to filter
username_filter = 'opentensor-pedro'

# Date that you want to filter, in this case 3 days ago
date_filter = datetime.now() - timedelta(days=3)

@retry(stop=stop_after_attempt(10), wait=wait_fixed(0.5), before_sleep=before_sleep_log(logging.getLogger(), logging.WARNING))
def get_filtered_runs():
    all_runs = api.runs('opentensor-dev/openvalidators', filters={'tags': {"$in": filter_tags}})
    print('Total collected runs:', len(all_runs))

    filtered_runs = []        
    for run in all_runs:
        # Check if run has all filter tags
        run_matches_filter_tags = all(filter_tag in run.tags for filter_tag in filter_tags)
        run_matches_username = run.user.username == username_filter     
        run_matches_date = datetime.strptime(run.created_at, '%Y-%m-%dT%H:%M:%S') > date_filter

        if run_matches_filter_tags and run_matches_date and run_matches_username:
            filtered_runs.append(run)        

    return filtered_runs

filtered_runs = get_filtered_runs()
print('Filtered runs:', len(filtered_runs))

The solution above is far from ideal: it is very slow, and the data retrieved from wandb is not reliable (the total number of runs does not match what is seen in the UI filter). This issue was not observed when filtering by config.netuid.

Proposed solution

With all that in mind, it would benefit everyone to bring back parts of the original config, such as:

  • config.netuid
  • config.wandb
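
If `config.netuid` were restored, a server-side query might look like the sketch below. It assumes the `config.<key>` filter syntax documented for `wandb.Api().runs`; the helper names `build_config_filter` and `fetch_runs` are hypothetical, and the query itself has not been verified against the live project.

```python
from typing import Any, Dict


def build_config_filter(netuid: int) -> Dict[str, Any]:
    """Build a filter that matches runs whose logged config.netuid equals netuid."""
    return {"config.netuid": netuid}


def fetch_runs(project: str, filters: Dict[str, Any]):
    """Query wandb with the given filters (needs network access and credentials)."""
    import wandb  # imported lazily so the filter builder works without wandb installed

    return wandb.Api().runs(project, filters=filters)


filters = build_config_filter(netuid=1)
# Uncomment to run the query against the project referenced in this issue:
# runs = fetch_runs("opentensor-dev/openvalidators", filters)
```

Tags, username and date could still be checked client-side as in the workaround above, but on a much smaller set of runs.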
