Merged
228 commits
67f6a90
refactor: update dj package to use new name
eywalker Jun 19, 2025
ac228b0
feat: add ability to skip computation in pod
eywalker Jun 19, 2025
450ec90
refactor: major change of structure and implementation of pipeline
eywalker Jun 24, 2025
09eb947
refactor: implement ContentHashableBase
eywalker Jun 24, 2025
bd3c7a8
refactor: significantly clean up label logic
eywalker Jun 25, 2025
90b9dad
optim: avoid len call by using list comprehension
eywalker Jun 25, 2025
1e61259
refactor: place Operator back in base
eywalker Jun 25, 2025
df58134
refactor: place operator in base and add additional operator methods …
eywalker Jun 25, 2025
6e4d4bd
wip: change to content identifable base
eywalker Jun 25, 2025
e8efa44
Merge branch 'main' of https://github.com/walkerlab/orcabridge into p…
eywalker Jun 26, 2025
5fb2435
style: apply ruff formatting
eywalker Jun 26, 2025
09f59cb
refactor: clean up test of name orcabridge
eywalker Jun 26, 2025
c5fcb3d
test: remove filepath specification
eywalker Jun 27, 2025
22215ca
fix: remove orcabridge reference
eywalker Jun 27, 2025
56d559a
refactor: rename module to match class
eywalker Jun 27, 2025
59ad526
refactor: move core to legacy_core
eywalker Jun 27, 2025
3e0cdf4
fix: update reference to core
eywalker Jun 27, 2025
50e0772
refactor: rename semantic arrow hasher module to generic arrow hashers
eywalker Jun 27, 2025
33103b8
refactor: rename variables to typespec
eywalker Jun 27, 2025
e35b024
feat: collect refined hashing functions
eywalker Jun 27, 2025
02412d0
feat: collect semantic type hashers into a module
eywalker Jun 27, 2025
1e90679
refactor: make file hasher return bytes
eywalker Jun 27, 2025
78fdead
feat: add new default object hasher
eywalker Jun 27, 2025
3dcaa0b
test: update ref
eywalker Jun 27, 2025
89ddd76
fix: handle type vars in process_structure
eywalker Jun 27, 2025
905f915
wip: use new schema system
eywalker Jun 30, 2025
a3ba172
feat: add field source tracking
eywalker Jul 1, 2025
d3b66de
feat: support map and join on packets with source info
eywalker Jul 1, 2025
0bafbaa
fix: keep all columns internally
eywalker Jul 1, 2025
6321467
wip: update legacy file related tests and rename to stores
eywalker Jul 1, 2025
41f1b63
test: fix legacy tests
eywalker Jul 1, 2025
fe423f7
fix: make all tests functional
eywalker Jul 1, 2025
ba1f45d
refactor: cleanup imports and use versioned object hasher
eywalker Jul 1, 2025
e689d0d
fix: failure to reset cache due to mro mixup
eywalker Jul 1, 2025
6222064
style: apply ruff format
eywalker Jul 1, 2025
cbe82ab
fix: legacy_core imports
eywalker Jul 1, 2025
caca67b
wip: arrow logical serialization
eywalker Jul 1, 2025
7bc98e1
refactor: utils renaming and relocation
eywalker Jul 1, 2025
51f3da2
fix: cleanup imports and fix issue in recursive structure processing
eywalker Jul 2, 2025
3d54067
refactor: add more robust arrow serialization strategy and use @ for …
eywalker Jul 2, 2025
1ac2be6
feat: logical serialization for arrow table
eywalker Jul 2, 2025
dab3378
feat: update versioned arrow hasher to use new serialization
eywalker Jul 2, 2025
4f07927
wip: delta table store implementation
eywalker Jul 2, 2025
1b7519e
feat: better handling of stores and add flushing to stores and pipeline
eywalker Jul 2, 2025
07fd76e
feat: integrate actual saving to parquet into simple in memory store
eywalker Jul 2, 2025
8411b40
refactor: cleanup import and comment out old packet converter for fut…
eywalker Jul 2, 2025
d90e5c6
fix: attach label on kernel invocation to the invocation object
eywalker Jul 3, 2025
fe35aba
fix: invoke superclass init
eywalker Jul 3, 2025
ef301b3
feat: expose explicit check for assigned label on content identifiabl…
eywalker Jul 3, 2025
ead6704
feat: add label on wrapped invocation
eywalker Jul 3, 2025
cbb8754
doc: add tutorial notebook
eywalker Jul 3, 2025
73b2638
refactor: clean up store package
eywalker Jul 3, 2025
555a751
feat: improve pipeline usability with typechecks and convenience attr…
eywalker Jul 3, 2025
083134b
fix: use new store name
eywalker Jul 3, 2025
7e33bae
test: update to use new package name
eywalker Jul 3, 2025
5641810
fix: wrong import
eywalker Jul 3, 2025
00b4066
Merge pull request #27 from eywalker/pipeline
brian-arnold Jul 3, 2025
c66920c
doc: handle typing corner cases
eywalker Jul 3, 2025
58d7e40
Merge branch 'pipeline' of https://github.com/walkerlab/orcabridge in…
eywalker Jul 3, 2025
7ace5a4
doc: reorganize tutorials
eywalker Jul 5, 2025
6f41f52
feat: cleaned up delta store
eywalker Jul 5, 2025
523291f
feat: add protocols
eywalker Jul 10, 2025
83118ab
refactor: use protocols in hashing package
eywalker Jul 10, 2025
5fc78f8
refactor: temporarily stop top level import while refactoring
eywalker Jul 10, 2025
93beb0f
refactor: remove protocol-relevant definitions
eywalker Jul 10, 2025
4d7761f
refactor: add concrete component implementation in data package
eywalker Jul 10, 2025
cac1855
refactor: cleanup protocols
eywalker Jul 11, 2025
5a178b5
refactor: further refinement of tracker protocols
eywalker Jul 11, 2025
53527b1
feat: refine kernel and pod interaction with tracker
eywalker Jul 11, 2025
6e2bdd7
feat: implement pure immutable datagram
eywalker Jul 12, 2025
7293749
fix: preparation of output stream in pod
eywalker Jul 12, 2025
7f49de0
feat: add feature to include content hash in arrow table
eywalker Jul 12, 2025
ff99495
doc: add comprehensive documentation to datagrams
eywalker Jul 12, 2025
5c8f85d
refactor: remove unused datagram base
eywalker Jul 14, 2025
3d3e946
refactor: combine pre-forward step into one for simplicity
eywalker Jul 14, 2025
0d8f7cb
refactor: adopt the new method signature for pre-forward step
eywalker Jul 14, 2025
a7531bf
feat: add non-zero input operator
eywalker Jul 14, 2025
730f72b
wip: major refactoring of package structure
eywalker Jul 18, 2025
29b8004
feat: implement data context to capture shared hashing and semantic c…
eywalker Jul 19, 2025
4c710db
refactor: clean up protocol around types
eywalker Jul 19, 2025
8b84c02
wip: further refinement of datagram implementations
eywalker Jul 22, 2025
8429611
fix: handling of schema when merging tables
eywalker Jul 22, 2025
228f469
refactor: clean up unused imports and move old code into renamed module
eywalker Jul 22, 2025
c871bfb
feat: add lazyloading system
eywalker Jul 22, 2025
a416c20
refactor: refine kernel and pod setup
eywalker Jul 22, 2025
af75ab7
refactor: refine tracker system
eywalker Jul 22, 2025
2854068
feat: add wrapped stream
eywalker Jul 22, 2025
15bfc4c
refactor: use hasher id consistently
eywalker Jul 22, 2025
534e810
refactor: remove fixed stream from kernel and clean up cached pod
eywalker Jul 23, 2025
c443a32
refactor: consistent copy logic and ability to specify meta info in c…
eywalker Jul 23, 2025
08fa0ef
feat: clean implementation of pipeline nodes
eywalker Jul 23, 2025
38b155b
refactor: rename pre-kernel step to be more explicit
eywalker Jul 24, 2025
3351cf9
refactor: extract node base class
eywalker Jul 24, 2025
7ff5a51
refactor: import cleanup and additional todos
eywalker Jul 24, 2025
971aed0
feat: add ability to change source info
eywalker Jul 25, 2025
532da7d
feat: add saving table with its own id column
eywalker Jul 25, 2025
1af2cf4
feat: add refined kernel id logic
eywalker Jul 25, 2025
61e170b
doc: reorganize tutorials
eywalker Jul 5, 2025
5958594
feat: cleaned up delta store
eywalker Jul 5, 2025
663082b
feat: add protocols
eywalker Jul 10, 2025
37ed8e8
refactor: use protocols in hashing package
eywalker Jul 10, 2025
b04fb61
refactor: temporarily stop top level import while refactoring
eywalker Jul 10, 2025
f47aa02
refactor: remove protocol-relevant definitions
eywalker Jul 10, 2025
1b22ee6
refactor: add concrete component implementation in data package
eywalker Jul 10, 2025
e58de64
refactor: cleanup protocols
eywalker Jul 11, 2025
d2aacad
refactor: further refinement of tracker protocols
eywalker Jul 11, 2025
e3b3d92
feat: refine kernel and pod interaction with tracker
eywalker Jul 11, 2025
2c9c33b
feat: implement pure immutable datagram
eywalker Jul 12, 2025
83e1ab8
fix: preparation of output stream in pod
eywalker Jul 12, 2025
8dc0353
feat: add feature to include content hash in arrow table
eywalker Jul 12, 2025
47726a8
doc: add comprehensive documentation to datagrams
eywalker Jul 12, 2025
251a685
refactor: remove unused datagram base
eywalker Jul 14, 2025
77b6f21
refactor: combine pre-forward step into one for simplicity
eywalker Jul 14, 2025
57d59d7
refactor: adopt the new method signature for pre-forward step
eywalker Jul 14, 2025
eb08084
feat: add non-zero input operator
eywalker Jul 14, 2025
65289bb
wip: major refactoring of package structure
eywalker Jul 18, 2025
d6de91e
feat: implement data context to capture shared hashing and semantic c…
eywalker Jul 19, 2025
1788c05
refactor: clean up protocol around types
eywalker Jul 19, 2025
6f8f996
wip: further refinement of datagram implementations
eywalker Jul 22, 2025
ebe401a
fix: handling of schema when merging tables
eywalker Jul 22, 2025
2b9e15f
refactor: clean up unused imports and move old code into renamed module
eywalker Jul 22, 2025
6f8b690
feat: add lazyloading system
eywalker Jul 22, 2025
239a458
refactor: refine kernel and pod setup
eywalker Jul 22, 2025
643de2b
refactor: refine tracker system
eywalker Jul 22, 2025
7f8e3d4
feat: add wrapped stream
eywalker Jul 22, 2025
e0916aa
refactor: use hasher id consistently
eywalker Jul 22, 2025
1d14975
refactor: remove fixed stream from kernel and clean up cached pod
eywalker Jul 23, 2025
dbb70f9
refactor: consistent copy logic and ability to specify meta info in c…
eywalker Jul 23, 2025
390e99d
feat: clean implementation of pipeline nodes
eywalker Jul 23, 2025
2f7946c
refactor: rename pre-kernel step to be more explicit
eywalker Jul 24, 2025
735299d
refactor: extract node base class
eywalker Jul 24, 2025
6780ace
refactor: import cleanup and additional todos
eywalker Jul 24, 2025
a3d3624
feat: add ability to change source info
eywalker Jul 25, 2025
2dffb92
feat: add saving table with its own id column
eywalker Jul 25, 2025
78e1b08
feat: add refined kernel id logic
eywalker Jul 25, 2025
9c9b11f
feat: add get_all_records to KernelNode
eywalker Jul 26, 2025
5a9ff9a
feat: add new delta data store capable of handling multiple entries as…
eywalker Jul 29, 2025
98a1f16
build: add pyarrow stubs
eywalker Jul 29, 2025
527cdc0
refactor: update store protocol
eywalker Jul 29, 2025
460230e
Merge branch 'dev' of https://github.com/walkerlab/orcapod-python int…
eywalker Jul 29, 2025
ea85bcf
fix: cleanup nodes
eywalker Jul 29, 2025
773d5ca
refactor: update protocol over modification time
eywalker Jul 29, 2025
598a5cb
wip: implementation of manual table source
eywalker Jul 29, 2025
76e4771
feat: working join operator
eywalker Jul 29, 2025
cd9cebb
fix: accidental trigger of computation and add tutorial
eywalker Jul 29, 2025
db9b3fe
refactor: remove unused streams
eywalker Jul 29, 2025
625204f
doc: cleanup tutorial notebook
eywalker Jul 29, 2025
d87ab53
doc: fix typo
eywalker Jul 29, 2025
e13b3f8
fix: signature of dict datagram methods
eywalker Jul 30, 2025
fc6c828
feat: add more operators and improved stream integration
eywalker Jul 30, 2025
b10669d
fix: mismatched protocol signature
eywalker Jul 30, 2025
fce4420
refactor: move kernel id logic to kernel base and add clean source ba…
eywalker Jul 30, 2025
f956fea
feat: complete implementation of manual source and adjustment of sour…
eywalker Jul 30, 2025
de93e31
refactor: introduce source as first class kernel in tracker
eywalker Jul 30, 2025
b1d0631
feat: add refined function info extractors
eywalker Jul 31, 2025
8031bf7
refactor: add new caching and pod streaming implementation
eywalker Jul 31, 2025
3248a94
fix: failure to capture source info column in table
eywalker Jul 31, 2025
84d78ab
refactor: split operators into separate modules
eywalker Jul 31, 2025
e179222
refactor: clean up imports
eywalker Jul 31, 2025
f0f621e
refactor: remove old cache pod implementation
eywalker Jul 31, 2025
ba23828
feat: add generated implementation for arrow to/from python conversion
eywalker Jul 31, 2025
1933744
wip: working implementation of arrow python conversion
eywalker Jul 31, 2025
f87d313
feat: working complete arrow python converters
eywalker Jul 31, 2025
1547295
build: add jsonschema build dependency
eywalker Jul 31, 2025
d08de3b
feat: refined universal converter between arrow and python
eywalker Aug 2, 2025
8b44eb2
feat: add convenience method for converting between pydict and pylist
eywalker Aug 2, 2025
9854d3f
feat: start implementing submodule system
eywalker Aug 2, 2025
f368db4
refactor: update context to load universal type converter
eywalker Aug 2, 2025
66939f8
refactor: major change of data context and semantic type system
eywalker Aug 3, 2025
5220829
fix: add better error message
eywalker Aug 3, 2025
40c890c
fix: error in joining tables with complex data types
eywalker Aug 3, 2025
e047613
refactor: use updated converter system and optimization
eywalker Aug 4, 2025
2914d88
doc: update tutorial notebook with explanation on operator methods an…
eywalker Aug 4, 2025
44ead2b
doc: add as_df to streams
eywalker Aug 4, 2025
fb993b7
feat: add support for system tags and clean up constants
eywalker Aug 4, 2025
cf535f9
fix: missing cache attribute in pod stream
eywalker Aug 5, 2025
7ff856e
feat: add execution engine capability
eywalker Aug 5, 2025
8e4fdd7
refactor: remove unused modules
eywalker Aug 5, 2025
83189f9
fix: add proper handling of complex return type to table conversion
eywalker Aug 5, 2025
695d1d6
feat: support execution engine in pipeline run
eywalker Aug 5, 2025
f8fd07c
fix: proper use of execution engine in pipeline
eywalker Aug 5, 2025
63a217f
fix: improper handling of same named column in input and output packe…
eywalker Aug 6, 2025
dd2d1d5
Fix join to work with complex data types
eywalker Aug 8, 2025
3e1069c
feat: add hashing of complex type with semantic type and use of ref i…
eywalker Aug 10, 2025
3c2db88
refactor: use simpler names for classes
eywalker Aug 10, 2025
e32488e
fix: ignore common data files
eywalker Aug 10, 2025
f2b70fd
fix: include system tags in propagation and hash computation
eywalker Aug 11, 2025
351aa29
feat: implement tag schema-based pipeline record separation
eywalker Aug 11, 2025
e5b2efd
build: add mkdocs and minor cleanups
eywalker Aug 12, 2025
bff569b
feat: keep track of timestamps for data addition
eywalker Aug 12, 2025
43fe9d4
test: add comprehensive tests for datagrams
eywalker Aug 12, 2025
166ef8a
fix: bugs in datagrams alternation methods
eywalker Aug 12, 2025
197f3e1
fix: corner cases of system tag handling
eywalker Aug 12, 2025
144ef6a
test: add system tag persistence tests
eywalker Aug 13, 2025
5a4227e
refactor: clean up the protocol organization and use content hash object
eywalker Aug 19, 2025
282bc3a
refactor: clean up protocols and add batch operator
eywalker Aug 23, 2025
bea864d
feat: add baseclass for context awareness
eywalker Aug 26, 2025
aff0c5f
refactor: turn sources into a subpackage
eywalker Aug 26, 2025
09561f1
refactor: enhance stream protocol with data view methods
eywalker Aug 26, 2025
81226f8
refactor: rename to delta table store
eywalker Aug 26, 2025
29f1c5d
refactor: update to match new protocols
eywalker Aug 26, 2025
5a57ae6
refactor: remove redundant properties
eywalker Aug 26, 2025
719bca4
refactor: remove old core package
eywalker Aug 26, 2025
2561b55
refactor: remove deprecated modules
eywalker Aug 26, 2025
468266d
refactor: use updated types module path
eywalker Aug 26, 2025
8c6721a
refactor: change protocol name and use PythonSchema in place of TypeSpec
eywalker Aug 27, 2025
d4c016c
test: for semantic types
eywalker Aug 27, 2025
e67631a
refactor: remove unused code in semantic types
eywalker Aug 27, 2025
c575c80
type: handle input PythonSchema case
eywalker Aug 27, 2025
4acd157
type: further fix on python schema
eywalker Aug 27, 2025
d5c5340
feat: add support for system tags and source id info
eywalker Aug 27, 2025
42308f9
build: ignore test notebooks
eywalker Aug 27, 2025
e60a6e7
refactor: rename stores to databases
eywalker Aug 27, 2025
b99844d
feat: set default hash length to 20 characters
eywalker Aug 27, 2025
56439cb
refactor: rename kernel_id to reference
eywalker Aug 27, 2025
47bcbc1
type: use mapping for input parameters
eywalker Aug 27, 2025
b54aab4
feat: fix bugs and add proper save path handling
eywalker Aug 30, 2025
0fe06b3
refactor: remove unused module
eywalker Aug 30, 2025
340cfb5
refactor: rename data to core subpackage
eywalker Aug 30, 2025
83bd86a
refactor: update data protocols to core protocols
eywalker Aug 30, 2025
6dff27b
fix: tag handling bugs and refactor to use core protocol
eywalker Aug 31, 2025
0f44d2a
feat: add data frame source, config and fix bugs pertaining to node h…
eywalker Sep 1, 2025
b476274
refactor: repackage core streams into its own subpackage
eywalker Sep 1, 2025
f2472bf
feat: add pipeline dag plotting
eywalker Sep 1, 2025
99763dc
fix: compatibility of delta table database with database protocol
eywalker Sep 1, 2025
ee0b1a1
refactor: clean up type use and add graph support
eywalker Sep 1, 2025
4c397d6
build: remove pixi and cleanup pyproject.toml
eywalker Sep 1, 2025
a158e17
feat: add more operators
eywalker Sep 1, 2025
2 changes: 1 addition & 1 deletion .devcontainer/Dockerfile
@@ -26,7 +26,7 @@ RUN \
USER vscode
ENV PATH=/home/vscode/.local/bin:$PATH
WORKDIR /home/vscode
-COPY --chown=vscode:nogroup src/orcabridge/requirements.txt /tmp/requirements.txt
+COPY --chown=vscode:nogroup src/orcapod/requirements.txt /tmp/requirements.txt
RUN \
# python setup
curl -LsSf https://astral.sh/uv/install.sh | sh && \
18 changes: 17 additions & 1 deletion .gitignore
@@ -5,9 +5,22 @@ notebooks/**/*.parquet
notebooks/**/*.pkl
notebooks/**/*.db

+# Ignore npy and npz data files by default
+*.np[yz]

-# Ignore any notebook that starts with an underscore
+# Ignore common data types by default
+*.csv
+*.parquet
+*.xls
+*.xlsx
+*.txt

+# Ignore profiler output
+*.prof

+# Ignore any notebook that starts with an underscore or test
notebooks/**/_*.ipynb
+notebooks/**/test*.ipynb

# Ignore vscode settings
.vscode/
@@ -198,3 +211,6 @@ cython_debug/
dj_*_conf.json
# directory excluded from source control e.g. trash, scratch work, etc.
.untracked
+# pixi environments
+.pixi/*
+!.pixi/config.toml
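
For reviewers unsure what the new `*.np[yz]` entry covers, here is a rough sketch using Python's `fnmatch` module. It only approximates gitignore matching for simple basename patterns like the ones added here; it does not model directory patterns such as `notebooks/**/test*.ipynb` or negation such as `!.pixi/config.toml`. The `is_ignored` helper and the sample file names are hypothetical, not part of this PR.

```python
from fnmatch import fnmatchcase

# Basename-only patterns taken from the .gitignore additions in this PR.
PATTERNS = ["*.np[yz]", "*.csv", "*.parquet", "*.xls", "*.xlsx", "*.txt", "*.prof"]


def is_ignored(filename: str) -> bool:
    """Hypothetical helper: True if the bare file name matches any pattern.

    This is an approximation of gitignore behavior for simple glob
    patterns only; real gitignore rules also consider paths and negation.
    """
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    for name in ["weights.npy", "arrays.npz", "model.npx", "report.xlsx", "demo.ipynb"]:
        print(name, is_ignored(name))
```

The character class `[yz]` matches exactly one of `y` or `z`, so `weights.npy` and `arrays.npz` are covered while a name like `model.npx` is not.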
6 changes: 3 additions & 3 deletions misc/demo_redis_mocking.py
@@ -72,10 +72,10 @@ def demonstrate_redis_mocking():

# Patch the Redis availability and exceptions
with (
-patch("orcabridge.hashing.string_cachers.REDIS_AVAILABLE", True),
-patch("orcabridge.hashing.string_cachers.redis.RedisError", MockRedisError),
+patch("orcapod.hashing.string_cachers.REDIS_AVAILABLE", True),
+patch("orcapod.hashing.string_cachers.redis.RedisError", MockRedisError),
patch(
-"orcabridge.hashing.string_cachers.redis.ConnectionError",
+"orcapod.hashing.string_cachers.redis.ConnectionError",
MockConnectionError,
),
):
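
The rename matters here because `patch` resolves its target from the dotted string at runtime, so a stale `orcabridge...` path would fail to import rather than patch anything. Below is a minimal, self-contained sketch of the same pattern. It assumes nothing about the real `orcapod.hashing.string_cachers` module: `fake_cachers`, `cache_backend`, and the attributes on the stand-in module are hypothetical, used only so the example runs without orcapod or redis installed. The parenthesized `with` form needs Python 3.10 or later.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in module; registered in sys.modules so that
# patch() can resolve the dotted target string by import.
fake = types.ModuleType("fake_cachers")
fake.REDIS_AVAILABLE = False
sys.modules["fake_cachers"] = fake


class MockRedisError(Exception):
    """Stand-in exception type, analogous to the MockRedisError in the demo."""


def cache_backend() -> str:
    """Hypothetical consumer that branches on the availability flag."""
    return "redis" if fake.REDIS_AVAILABLE else "memory"


with (
    patch("fake_cachers.REDIS_AVAILABLE", True),
    # create=True because the stand-in module has no RedisError attribute.
    patch("fake_cachers.RedisError", MockRedisError, create=True),
):
    inside = cache_backend()
    patched_error = fake.RedisError

# Both patches are undone when the with block exits.
outside = cache_backend()
print(inside, outside, patched_error is MockRedisError)
```

Patching by string keeps the demo independent of whether redis is installed, at the cost that a package rename silently invalidates every target string, which is what the three edits in this file address.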