
Jobs failing due to GKE error 401 #124

@nuclearcat


As noticed, many jobs are failing with HTTP error 401 (Unauthorized) when calling the GKE API, for example:
https://bot.staging.kernelci.org/job/kernel-build/77628/console

.
PASS
Traceback (most recent call last):
  File "/data/workspace/bot.staging.kernelci.org/k8s-gcp-us-central1/workspace/kernel-build__5/kernelci-core/./config/k8s/wait.py", line 167, in <module>
    main(args)
  File "/data/workspace/bot.staging.kernelci.org/k8s-gcp-us-central1/workspace/kernel-build__5/kernelci-core/./config/k8s/wait.py", line 105, in main
    pod = core.list_namespaced_pod(namespace=args.namespace, watch=False,
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/rest.py", line 240, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.9/dist-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '133dc888-5588-4292-b5d9-f23f4e3dded8', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 19 Aug 2022 01:21:38 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

We are probably hitting some kind of race condition with the tokens, or the tokens are expiring. Maybe we should retry at least once on this error?
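A minimal sketch of the "retry at least once" idea, assuming the 401 is transient (for example, an expired token that reloading credentials would replace). The root cause is not confirmed, so this may only mask the problem. `retry_on_unauthorized` and `is_unauthorized` are hypothetical helpers, not part of the `kubernetes` client or `kernelci-core`; they only rely on `kubernetes.client.exceptions.ApiException` exposing the HTTP code as its `status` attribute.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def is_unauthorized(exc: Exception) -> bool:
    """Return True if the exception looks like an HTTP 401 from the API server.

    ApiException stores the HTTP status code in `.status`; duck typing keeps
    this hypothetical helper free of a hard dependency on the kubernetes package.
    """
    return getattr(exc, "status", None) == 401


def retry_on_unauthorized(
    call: Callable[[], T],
    refresh: Optional[Callable[[], None]] = None,
    retries: int = 1,
    delay: float = 1.0,
) -> T:
    """Run `call`, retrying up to `retries` times if it fails with a 401.

    Hypothetical helper: `refresh` is an optional hook run before each retry,
    e.g. to reload credentials. Any non-401 exception, or a 401 raised after
    the last retry, is re-raised unchanged.
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            if not is_unauthorized(exc) or attempt >= retries:
                raise
            attempt += 1
            if refresh is not None:
                refresh()
            # Short pause so a token refresh has a chance to take effect.
            time.sleep(delay)
```

In `config/k8s/wait.py`, the failing `core.list_namespaced_pod(...)` call could be wrapped in a `lambda` and passed as `call`. A `refresh` hook would need to call `kubernetes.config.load_kube_config()` and then rebuild the `CoreV1Api` object, because an existing client keeps the configuration it was created with. Only 401s are retried, so genuine permission errors (403) and other failures still fail fast.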
