Potential Issue: GCSFileSystem Reuses HTTP Session from a Different Event Loop #659

@janwodnicki

GCSFileSystem inherits instance caching from its superclass (fsspec.AbstractFileSystem), which avoids unnecessary reinitialization. However, if an instance is created in one event loop and later used in another, its self.session attribute remains tied to the original event loop. Requests made in the new loop can then fail (a RuntimeError in the reproduction below), because the session is no longer valid for that loop.

  • Is GCSFileSystem designed to support usage across multiple event loops?
  • Should it check if the loop has changed and create a new session accordingly?

Would you consider GCSFileSystem(skip_instance_cache=True, ...) a workaround, or is it the intended way to use the filesystem across event loops?

Code to reproduce issue:

import asyncio
import gcsfs


async def test_cat():
    # fs instance cached after first call, session loop stays the same
    fs = gcsfs.GCSFileSystem(asynchronous=True)
    try:
        await fs._cat("test-bucket/test-file.txt")
    except OSError as e:
        # 1st call will fail because the file does not exist (expected)
        print(e)
    except RuntimeError as e:
        # 2nd call will fail because the session loop is different from the event loop
        print(e)


loop = asyncio.new_event_loop()
loop.run_until_complete(test_cat())
loop = asyncio.new_event_loop()
loop.run_until_complete(test_cat())
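
To make the second question above concrete, here is a minimal, stdlib-only sketch of what "check if the loop has changed and create a new session accordingly" could look like. It is not how gcsfs is implemented: `FakeSession` and `LoopAwareClient` are hypothetical stand-ins for the HTTP session and the cached filesystem instance, and the only assumption is that a session remembers the loop it was created on.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an HTTP session bound to one event loop."""

    def __init__(self, loop):
        self.loop = loop  # the loop this session was created on


class LoopAwareClient:
    """Hypothetical client that rebuilds its session when the loop changes."""

    def __init__(self):
        self._session = None

    def get_session(self):
        loop = asyncio.get_running_loop()
        # Recreate the session if there is none yet, or if the existing one
        # belongs to a different event loop than the one currently running.
        if self._session is None or self._session.loop is not loop:
            self._session = FakeSession(loop)
        return self._session


# One shared object reused across loops, analogous to a cached fs instance.
client = LoopAwareClient()


async def use_client():
    return client.get_session()


def run_in_fresh_loop():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(use_client())
    finally:
        loop.close()


first = run_in_fresh_loop()
second = run_in_fresh_loop()
```

With this pattern the shared object survives across loops, but each loop gets its own session. Whether gcsfs should do something similar, or whether skip_instance_cache=True is the expected answer, is exactly the open question here.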
