error: failed to authorize: no active session for <session-id>: context deadline exceeded #2367

@maxlaverse

Hi!
Every day we have some builds failing with the following error:

#25 exporting cache
#25 preparing build cache for export
#25 preparing build cache for export 73.8s done
#25 writing layer sha256:somesha
#25 78.76 error: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded
#25 78.76 retrying in 1s
#25 84.76 error: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded
#25 84.76 retrying in 2s
#25 91.76 error: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded
#25 91.76 retrying in 4s
#25 writing layer sha256:somesha 27.0s done
#25 100.8 error: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded
#25 ERROR: error writing layer blob: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded
#26 ** export finalization failed - continuing anyway: error writing layer blob: failed to authorize: no active session for d64hnddm9jssidwj6f9vv2w37: context deadline exceeded (that's custom code)
#26 DONE 0.0s   

It affected 36 builds out of 883 in the past 24 hours (approx. 4%). This is likely related to docker/buildx#456 but I believe it deserves an issue in this project since it’s where the error is coming from.

Timeline

The session identifiers are legit. The problem is that the session is gone when trying to fetch registry credentials, from a couple of seconds up to a couple of minutes. Here is an example timeline:

Why the session is gone is not entirely clear, but we're doing a lot of concurrent builds that share part of their stages. I hope I didn't make an error when putting the timeline together. If anything seems weird, don't hesitate to ask for verification or additional traces. We're using version 0.9.0 on linux/amd64.

Any idea what the approach should be to recover from such an error?

Workaround

For the time being, we came up with a workaround: ignore errors when exporting the cache images instead of failing our pipelines. We achieved this by adding something like the snippet below around this call:

cacheExporterResponse, err = e.Finalize(ctx)

if exportFinalizationFailure {
       inBuilderContext(ctx, j, fmt.Sprintf("** export finalization failed - continuing anyway: %v", err), "", func(_ context.Context, _ session.Group) error { return nil })
} else {
       return nil, err
}
