Cross-document contamination between Python AsyncSuperDocClient instances operating on different documents #2493

@lysonnjoroge

Description

What happened?

Summary

When the Python SDK opens multiple documents, sequentially or concurrently (asyncio/multiprocessing), through different AsyncSuperDocClient instances, reads and writes do not appear to be isolated: both clients read from and write to the most recently opened document.

Impact

Concurrent workflows on different documents cannot be run safely with the Python SDK.

Steps to reproduce

  1. Create two documents, doc_a.docx and doc_b.docx.
  2. Run the script below with uv run minimal-repro.py. It opens each document with a different client, writes a different paragraph through each client, and saves.
  3. Check the logs. Both clients print only doc_b's content.
  4. Open both documents and check for changes. Only doc_b is updated; client 1's edits are lost.
"""Read both docs, write different paragraphs, save"""

import asyncio
from superdoc import AsyncSuperDocClient

# Provide 2 distinct .docx files
DOC_A = "path/to/doc_a.docx"
DOC_B = "path/to/doc_b.docx"

async def main():
    c1 = AsyncSuperDocClient(user={"name": "c1"})
    c2 = AsyncSuperDocClient(user={"name": "c2"})
    await c1.connect()
    await c2.connect()

    await c1.doc.open({"sessionId": "s1", "doc": DOC_A})
    await c2.doc.open({"sessionId": "s2", "doc": DOC_B})

    md1 = await c1.doc.get_markdown()
    md2 = await c2.doc.get_markdown()
    print(f"c1 (opened doc-1): {md1[:80]}")
    print(f"c2 (opened doc-2): {md2[:80]}")
    print(f"SAME CONTENT after open: {md1 == md2}")
    print()

    # Insert different paragraphs via each client using doc.create.paragraph
    res1 = await c1.doc.create.paragraph(
        {"at": {"kind": "documentEnd"}, "text": "INSERTED BY CLIENT ONE"},
    )
    res2 = await c2.doc.create.paragraph(
        {"at": {"kind": "documentEnd"}, "text": "INSERTED BY CLIENT TWO"},
    )
    print(f"create_paragraph res1: {res1}")
    print(f"create_paragraph res2: {res2}")

    md1 = await c1.doc.get_markdown()
    md2 = await c2.doc.get_markdown()
    print("c1 full markdown:")
    print(md1)
    print()
    print("c2 full markdown:")
    print(md2)
    print()
    print(f"SAME CONTENT after insert: {md1 == md2}")
    print(f"c1 has CLIENT ONE: {'CLIENT ONE' in md1}")
    print(f"c1 has CLIENT TWO: {'CLIENT TWO' in md1}")
    print(f"c2 has CLIENT ONE: {'CLIENT ONE' in md2}")
    print(f"c2 has CLIENT TWO: {'CLIENT TWO' in md2}")

    await c1.doc.save({})
    await c2.doc.save({})
    await c1.dispose()
    await c2.dispose()

asyncio.run(main())

Output:

c1 (opened doc-1): ***~~Test doc 2~~***

***~~With more text~~***

c2 (opened doc-2): ***~~Test doc 2~~***

***~~With more text~~***

SAME CONTENT after open: True

create_paragraph res1: {'document': {'path': 'path/to/doc_b.docx', 'source': 'path', 'byteLength': 9570, 'revision': 1}, 'result': {'success': True, 'paragraph': {'kind': 'block', 'nodeType': 'paragraph', 'nodeId': '02a3bcca-7c3b-4b68-9a7f-897494e949b2'}, 'insertionPoint': {'kind': 'text', 'blockId': '02a3bcca-7c3b-4b68-9a7f-897494e949b2', 'range': {'start': 0, 'end': 0}}}, 'changeMode': 'direct', 'dryRun': False, 'context': {'dirty': True, 'revision': 1}}
create_paragraph res2: {'document': {'path': 'path/to/doc_b.docx', 'source': 'path', 'byteLength': 9570, 'revision': 2}, 'result': {'success': True, 'paragraph': {'kind': 'block', 'nodeType': 'paragraph', 'nodeId': '50c50259-e877-492c-ae95-02b515a7911e'}, 'insertionPoint': {'kind': 'text', 'blockId': '50c50259-e877-492c-ae95-02b515a7911e', 'range': {'start': 0, 'end': 0}}}, 'changeMode': 'direct', 'dryRun': False, 'context': {'dirty': True, 'revision': 2}}
c1 full markdown:
***~~Test doc 2~~***

***~~With more text~~***


c2 full markdown:
***~~Test doc 2~~***

***~~With more text~~***

INSERTED BY CLIENT TWO


SAME CONTENT after insert: False
c1 has CLIENT ONE: False
c1 has CLIENT TWO: False
c2 has CLIENT ONE: False
c2 has CLIENT TWO: True
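
The symptoms in this output could be turned into an automated check. The following is only a sketch and is not part of the SDK: find_isolation_violations is a hypothetical helper, and it assumes that each response carries the document.path field seen in the create_paragraph results above. The input values are copied from the output in this report, with the responses trimmed to the one field the helper reads.

```python
from typing import Any


def find_isolation_violations(
    expected_path: dict[str, str],
    results: dict[str, dict[str, Any]],
    markdown: dict[str, str],
    markers: dict[str, str],
) -> list[str]:
    """Return human-readable descriptions of cross-document contamination.

    Hypothetical helper: assumes every response exposes result["document"]["path"].
    """
    violations: list[str] = []

    # 1. Each response should report the document that its own client opened.
    for client, result in results.items():
        actual = result["document"]["path"]
        if actual != expected_path[client]:
            violations.append(
                f"{client}: wrote to {actual}, expected {expected_path[client]}"
            )

    # 2. Each client should see its own marker and nobody else's.
    for client, text in markdown.items():
        for owner, marker in markers.items():
            present = marker in text
            if owner == client and not present:
                violations.append(f"{client}: own marker missing")
            elif owner != client and present:
                violations.append(f"{client}: contains marker from {owner}")

    return violations


# Values taken from the output in this report.
expected_path = {"c1": "path/to/doc_a.docx", "c2": "path/to/doc_b.docx"}
results = {
    "c1": {"document": {"path": "path/to/doc_b.docx"}},
    "c2": {"document": {"path": "path/to/doc_b.docx"}},
}
base = "***~~Test doc 2~~***\n\n***~~With more text~~***\n\n"
markdown = {"c1": base, "c2": base + "INSERTED BY CLIENT TWO\n"}
markers = {"c1": "INSERTED BY CLIENT ONE", "c2": "INSERTED BY CLIENT TWO"}

violations = find_isolation_violations(expected_path, results, markdown, markers)
for violation in violations:
    print(violation)
```

With the values from this report the helper flags two problems, both on c1: its response names doc_b.docx instead of doc_a.docx, and its own inserted paragraph is missing from the markdown it reads back. A correctly isolated run would yield an empty list.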

SuperDoc version

superdoc-sdk==1.0.0a48

Browser

None

Additional context

Setup: macOS Darwin 24.3.0, Python 3.12
