Update seqspec check so we can run it directly in python script #58

Merged
sbooeshaghi merged 2 commits into pachterlab:devel from IGVF-DACC:CHECK-134-check
Jan 15, 2025

Conversation

@mingjiecn
Contributor

Previously, the check function needed two parameters besides the seqspec file path: schema: Draft4Validator and spec: Assay. I updated the function so those two parameters are no longer needed. Only the seqspec file path is needed to run the check function in a Python script.

@sbooeshaghi
Collaborator

Thank you for the PR! To be consistent with other functionality, can you modify the code slightly so that the already loaded spec is passed to the check function? That way run_check takes in the spec_fn, loads the spec, and then passes it to the check function (like how index works):

def run_index(

)

@mingjiecn
Contributor Author

mingjiecn commented Jan 15, 2025

If we do so, how do we run check in a Python script when only the seqspec file name is provided? The purpose of this PR is to remove the schema and spec parameters from the check function so we can run check directly with only the spec file name, since the seqspec file is the only information we have beforehand, for example: errors = check(seqspec_file_path). Let me know if there is some other way to run check from a Python script without running it in the terminal. Thanks!

@sbooeshaghi
Collaborator

I agree that removing the schema (making it a parameter that doesn't need to be passed) is a good idea. You can run check in a Python script with only the seqspec file name as follows:

from seqspec.utils import load_spec
from seqspec.seqspec_check import check

spec = load_spec(spec_fn)
errors = check(spec)

This requires making the changes that I suggested (which has the benefit of making the code consistent across seqspec functionality).
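
The split described here (a thin run_check wrapper that loads the file, and a check function that works on an already loaded spec) can be sketched in a self-contained way. This is only an illustration of the pattern, not seqspec's implementation: the load_spec and check functions below are hypothetical stand-ins that read a JSON file and test for two illustrative field names (assay_id, modalities), whereas the real functions live in seqspec.utils and seqspec.seqspec_check and work on seqspec YAML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_spec(spec_fn: str) -> Dict[str, Any]:
    """Hypothetical stand-in loader: read a JSON file into a dict."""
    with open(spec_fn) as f:
        return json.load(f)


def check(spec: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in check: works on an already loaded spec.

    Returns a list of error messages (empty if nothing is wrong).
    The required field names are illustrative assumptions.
    """
    errors = []
    for key in ("assay_id", "modalities"):
        if key not in spec:
            errors.append(f"missing required field: {key}")
    return errors


def run_check(spec_fn: str) -> List[str]:
    """Thin wrapper: load the spec from disk, then delegate to check."""
    spec = load_spec(spec_fn)
    return check(spec)


if __name__ == "__main__":
    # Write a toy spec to a temporary file and check it by file name only.
    with tempfile.TemporaryDirectory() as tmp:
        spec_fn = Path(tmp) / "toy_spec.json"
        spec_fn.write_text(json.dumps({"assay_id": "demo"}))
        print(run_check(str(spec_fn)))
```

With this layout, a caller that only has a file name uses the wrapper (or calls the loader first), while code that already holds a loaded spec can call check directly without touching the disk again.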

@mingjiecn
Contributor Author

Got it. Will do! Thanks!

@mingjiecn
Contributor Author

OK, I added spec back to the check function's parameters. I can't remove spec_fn since it is used inside the check function. Let me know if this is OK. Thank you! @sbooeshaghi

@sbooeshaghi
Collaborator

Looks great! Thank you. Will merge.

@sbooeshaghi sbooeshaghi merged commit 8e9554f into pachterlab:devel Jan 15, 2025
@mingjiecn mingjiecn deleted the CHECK-134-check branch March 20, 2025 13:48
sbooeshaghi added a commit that referenced this pull request Aug 24, 2025
…ase 0.4.0 (#73)

See docs/CHANGELOG.md for more details.

* update schema (#52)

* update file_exsits function to check file url in igvf portal (#53)

* adding seqspec spec tokenization

* allow https for remote onlist (#54)

* added regions files for some popular commercial methods

* corrected two file names

* Update seqspec check so we can run it directly in python script (#58)

* update seqspec check

* add spec parameter back to check function

* added python usage to docs

* support gzipped yaml file for function load_spec (#60)

* support gzipped yaml file for function load_spec

* fix bug in function run_check

* support gzipped yaml file for function load_spec

* added region files for additional methods

* enabled skipping checks with seqspec check

* updating seqspec-html to print read info

* CHECK-161-onlist (#63)

* ignore onlist in seqspec check when needed

* fix bug in seqspec check

* code review

* Devel fix tests (#65)

* Update tox to use newer python interpreter versions

* validate_check_args now returns a list

* Deal with ascii and png display function name changes

* protocols and kits are a controlled vocabulary now.

* Clear out the environment variables before running the tests

In case they happen to be set

* Rename files from .txt to .tsv to better match DACC conventions

* Set a specific seqspec version as the structure keeps changing

* Structure now needs a files attribute

* Update test for remote access

needs to change more text in the example, and suppress the new call to
the network

* Update for more detailed onlist structure

many calls to create Onlists needed more attributes

* update onlist test to use preferred -i argument

-r was deprecated

* Reduce code repetition by using to_dict in __repr__

They return the dictionaries in the same order, so we might as well have
fewer places to update

* Be robust to missing values for the File and Read objects

All of the attributes for the File object are retrieved with getattr,
and the Read.files attribute introduced with 0.3 is protected with
getattr.

You do need to provide a default value with getattr if you want to
avoid an attribute error

* Add files object to example seqspec

* Make plot_png work better if there's only one modality (#66)

With constrained_layout and a single modality the height of the bar
graph showing the regions collapsed into almost a line.

This plots it without constrained_layout, and adjusts the title offset
as needed.

This fixes #44

* made internal api more consistent, added -t kb-single to seqspec index to force single end reads for read with max size

* continued making internal api more consistent

* upgraded build system to pyproject.toml, removed requirements.txt and dev-requirements.txt, simplified release process in the Makefile, simplified version tracking with pyproject.toml through setuptools_scm

* cleaned up pyproject.toml

* updated to pyproject packaging, removed setup.py/cfg, requirements.txt and MANIFEST.in

* fixing python version and removing mcp requirement

* fixing pip install with pyproject.toml

* changing Assay/Region/File/Read/etc classes to be derived from pydantic Base Class. this removes the need to specify yaml tags. These now get stripped. Changed formatter from black and flake8 to ruff.

* fixed seqspec modify when sequence is empty, removed parent_id implicit in spec, it wasn't being used anywhere, removed seqspec convert from the cli (currently not implemented)

* added bead_TSO to validator

* verified check works on 10x_rna_5prime.spec.yaml

* - fixed bug in read get file by id
- updated seqspec index to initialize pydantic models with named args
- removed tox, changed to pytest as a test manager
- cleaned up internal api for seqspec onlist (todo, add subcommands list, download, join)

* added 'loose' loading of a spec file followed by conversion to a validated version (so subsequent loads of the file work). TODO consider using loose loading only for format and check commands, and strict loading for every other command.

* set loose loading only for seqspec check and seqspec format. capture loading validation errors and print to stdout when trying to load with strict mode

* complete test rewrite, currently passing

* made region.regions no longer optional, defaults to an empty list, updated associated functions accordingly, added extensive tests

* cleaned up internal api, added some error handling in assay, can now map primer id from a read to any level in the library spec. no changes made to seqspec index, but greatly expands style of specs that are compatible.

* cleaning up pyproject.toml

* updated repr for Assay/Read/Region, cleaned up print code internal to use updated functions, updated seqspec index to fix file name usage when -s file is specified, updated some tests that were previously incorrect in seqspec index which used read ids instead of region ids. seqspec index was changed to fix the behavior when asking for region indices

* removing comments from seqspec check

* added repr for file object, deprecated -r argument in seqspec index/onlist/find, updated seqspec index to use consistent internal types, expanded tests for index

* relaxed check_primer_ids_in_libspec_leaves in seqspec check since updates to index no longer require primer_id to be in leaves

* added doc regions

* updated region and assay doc examples

* added --no-overlap to seqspec index so the set of region ids contained within each read are fully unique

* add region_type: sgrna_target (#72)

* added sgrna_target as region_type

* fixed format_kallisto_bus_force_single

* updating gitignore

* added seqspec build, change internal api of seqspec format, removed spec_fn from seqspec check, made seqspec insert and seqspec modify consistent (taking in list of *Input objects), annotated *Input objects for llm usage

* added check_region_against_subregion_length and check_region_against_subregion_sequence which checks that the min/max length of a region are equal to the sum of the sub regions and that the sequence of the region is equal to the concatenation of the subregions. suggestions by Zhewei Shen and Ian Whaling based on a spec submitted by Alex Barrera to IGVF portal (https://data.igvf.org/configuration-files/IGVFFI9197UDXC/)

* made check_sequence_types checks more robust, fixed spec loading to not overwrite sequences if present in region

* added check_read_length_against_library to check that the read lengths don't exceed sequenceable range given by the library elements after or before the primer id (based on the strandedness of the read)

* updated documentation for consistency with updated api

* updated list of checks in the documentation

* seqspec plot png now layers on the sequencing reads onto the library spec

* removed unnecessary ghost primer in the spec

* attempt to fix bug #68 for seqspec onlist, fixing broken tests

* updated changelog, preparation for release

* added a dev guide (in progress)

* updated seqspec build cli help text

* updated dev docs and dev flow, prepping for release

---------

Co-authored-by: Mingjie Li <44071821+mingjiecn@users.noreply.github.com>
Co-authored-by: dbrg77@gmail.com <nixche@outlook.com>
Co-authored-by: Diane Trout <diane@caltech.edu>
Co-authored-by: Ian Whaling <78115078+ian-whaling@users.noreply.github.com>