Feature/karen bacterial wrapper #183

joshfactorial · 2025-10-16T21:13:50Z

No description provided.

…T into feature/karen_bacterial_wrapper

joshfactorial · 2025-10-16T21:18:45Z

BacterialWrapperScript/BacterialWrapperScript.py

+    run_neat(new_config_file)
+
+
+


These are great to have, but let's wrap them in a unit test. Most of our tests live in "NEAT/tests", but it's easier just to add them to your script. That way GitHub will run the tests automatically for us. Look up Python's unittest package (https://realpython.com/python-unittest/ is one description) and let me know if you have any questions.
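As a rough illustration of what that could look like, here is a minimal unittest sketch. The helper `wrap_sequence` is a hypothetical stand-in (it is not taken from the script); it just rotates a circular sequence so there is something concrete to test.

```python
import unittest


def wrap_sequence(seq: str, offset: int) -> str:
    """Hypothetical stand-in: rotate a circular sequence to start at offset."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


class TestWrapSequence(unittest.TestCase):
    def test_rotation_keeps_length_and_content(self):
        # Rotating must not add or drop any bases.
        wrapped = wrap_sequence("ACGTTGCA", 3)
        self.assertEqual(len(wrapped), 8)
        self.assertEqual(sorted(wrapped), sorted("ACGTTGCA"))

    def test_rotation_moves_start(self):
        # The first three bases should move to the end.
        self.assertEqual(wrap_sequence("ACGTTGCA", 3), "TTGCAACG")

    def test_empty_sequence(self):
        self.assertEqual(wrap_sequence("", 5), "")
```

Running `python -m unittest` on the file containing the test class should pick these up; the real tests would call the script's own functions instead of the stand-in.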

joshfactorial · 2025-10-16T21:22:24Z

BacterialWrapperScript/BacterialWrapperScript.py

+
+    f.close()
+
+    full_orig_seq = orig_seq.replace("\n", "")


This works, but it requires looping over the entire sequence again ("replace" has to check every character for '\n'), which could become an issue with larger datasets or someone running multiple bacteria at once. What you can do instead is, when you read in the line on line 73, use Python's "strip" method (line.strip()), which automatically removes line breaks and whitespace from both ends of the string.
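Something along these lines, as a sketch only: it assumes a single-record FASTA and uses a hypothetical function name, `read_fasta_sequence`, rather than the script's actual reading loop.

```python
import io
from typing import TextIO


def read_fasta_sequence(handle: TextIO) -> str:
    """Join the sequence lines of a single-record FASTA, stripping as we read."""
    parts = []
    for line in handle:
        if line.startswith(">"):
            # Header line, not part of the sequence.
            continue
        # strip() drops the trailing newline (and any surrounding whitespace),
        # so no second pass with replace() is needed afterwards.
        parts.append(line.strip())
    return "".join(parts)


# In-memory example standing in for the opened reference file.
example = io.StringIO(">chr1\nACGT\nTTGA\n")
full_orig_seq = read_fasta_sequence(example)
```

Collecting the pieces in a list and joining once at the end also avoids repeatedly rebuilding a long string.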

joshfactorial · 2025-10-16T21:35:52Z

BacterialWrapperScript/BacterialWrapperScript.py

+# Runs the NEAT read simulator using the given config file
+
+def run_neat(config_file):
+    subprocess.run(["neat", "read-simulator", "-c", config_file, "-o", config_file])


The "-o" variable should be the name of the output directory (small change since the last version). Let's try this: take "output_dir" and "prefix". Ultimately, those will just be user inputs, but we can set the output dir equal to the parent dir of the config file. One way is to use pathlib -> str(Path(config_file).parent.absolute()), or even just default to the current working directory (os.getcwd()). But we may want to use a temporary directory as well, for the intermediate outputs.
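A minimal sketch of the output-directory part, assuming a hypothetical helper `build_neat_command` that only builds the argument list (it does not run NEAT). The prefix is left out because its command-line flag is not shown in this thread.

```python
import tempfile
from pathlib import Path
from typing import Optional


def build_neat_command(config_file: str, output_dir: Optional[str] = None) -> list[str]:
    """Build the read-simulator argument list, defaulting -o to the config's parent dir."""
    if output_dir is None:
        # Default: the directory that contains the config file.
        output_dir = str(Path(config_file).parent.absolute())
    return ["neat", "read-simulator", "-c", str(config_file), "-o", output_dir]


# Default case: outputs land next to the config file.
cmd = build_neat_command("/data/run1/config.yml")

# Alternative: send intermediate outputs to a temporary directory,
# which is deleted when the "with" block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_cmd = build_neat_command("/data/run1/config.yml", output_dir=tmp_dir)
```

The resulting list can be handed to subprocess.run as before. Swapping the default for os.getcwd() is a one-line change if the working directory turns out to be the better choice.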

joshfactorial · 2025-10-16T21:37:57Z

BacterialWrapperScript/BacterialWrapperScript.py

+#                     if not line.startswith(b"#"):
+#                         out_f.write(line)
+
+def stitch_all_outputs(ofw: OutputFileWriter, output_files: list[tuple[int, dict[str, Path]]], 


Okay, I see you were updating from my latest work. I think the original version will work better here, though, because in the main code I decided to do VCFs a slightly different way. Start with the commented out code that includes all 4, and don't worry so much about the ofw code and all that, we won't need it yet.

joshfactorial · 2025-10-16T21:38:54Z

BacterialWrapperScript/BacterialWrapperScript.py

+
+# Stitching all outputs together - Keshav's script
+
+def concat_fq(input_files: List[Path], dest: BgzfWriter) -> None:


This should be correct, because NEAT will output bgzipped fastq files, so we'll need bgzf to read and write them. For fq, this should work, so we can go with this version over the commented out version.

joshfactorial · 2025-10-16T21:40:24Z

BacterialWrapperScript/BacterialWrapperScript.py

+#             with input_file.open("rb") as in_f:
+#                 shutil.copyfileobj(in_f, out_f)
+
+def merge_bam(bams: List[Path], ofw: OutputFileWriter, threads: int) -> None:


I think this pysam method should work for stitching the bams, so we can use this. But instead of OutputFileWriter, you can just pass in the file names directly.

joshfactorial · 2025-10-16T21:41:37Z

BacterialWrapperScript/BacterialWrapperScript.py

+    concat_fq(fq2_list, ofw.files_to_write[ofw.fq2])
+    merge_bam(bam_list, ofw, threads)
+
+# def stitch_all_outputs(options: Options, thread_options: list[Options]) -> None:


Options has a lot of things, but the most important for this piece is just the file names. So you can pass the filenames in directly here (maybe have them default to "None" since we want to account for any and all files).

joshfactorial · 2025-10-16T21:42:06Z

BacterialWrapperScript/BacterialWrapperScript.py

+#             vcf_list.append(local_ops.vcf)
+#         if local_ops.bam:
+#             bam_list.append(local_ops.bam)
+


Probably won't need the loop, since we know ahead of time this will only be two files (regular and wrapped)
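Putting this together with the suggestion to pass filenames directly, the interface could look roughly like the sketch below. The function name `outputs_to_stitch` and the `FilePair` alias are hypothetical; each argument is a (regular, wrapped) pair that defaults to None when that output type was not produced.

```python
from pathlib import Path
from typing import Optional

# One pair per output type: (regular run, wrapped run).
FilePair = tuple[Path, Path]


def outputs_to_stitch(
    fq1: Optional[FilePair] = None,
    fq2: Optional[FilePair] = None,
    vcf: Optional[FilePair] = None,
    bam: Optional[FilePair] = None,
) -> dict[str, FilePair]:
    """Return only the output types that were actually provided."""
    candidates = {"fq1": fq1, "fq2": fq2, "vcf": vcf, "bam": bam}
    return {name: pair for name, pair in candidates.items() if pair is not None}


# Single-ended run with a BAM: no fq2 and no VCF to stitch.
plan = outputs_to_stitch(
    fq1=(Path("regular_r1.fastq.gz"), Path("wrapped_r1.fastq.gz")),
    bam=(Path("regular.bam"), Path("wrapped.bam")),
)
```

Because there are always exactly two inputs per type, each entry in the result can go straight to the matching stitching function without building per-type lists first.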

joshfactorial · 2025-10-16T21:43:30Z

BacterialWrapperScript/BacterialWrapperScript.py

+# options = Options(reference)
+# bam_header = {}
+
+# ofw = OutputFileWriter(options, bam_header)


You can do it this way, if you get all the parts to work, or just directly import Bio.bgzf and open the files with BgzfReader and BgzfWriter. Whichever way is easiest for now is perfect.

karenhx2 added 2 commits October 4, 2025 00:38

Developed script for wrapping bacterial chromosomes

8aa2032

Cleaned up files

fbca683

joshfactorial self-assigned this Oct 16, 2025

joshfactorial and others added 4 commits October 16, 2025 16:14

Merge branch 'main' into feature/karen_bacterial_wrapper

1440b21

Worked on testing functions for stitching outputs together

edc67b8

Merge branch 'feature/karen_bacterial_wrapper' of github.com:ncsa/NEA…

ba60be7

…T into feature/karen_bacterial_wrapper

Removed reference and data files

541b3b1

karenhx2 approved these changes Oct 17, 2025

View reviewed changes

joshfactorial commented Oct 20, 2025

View reviewed changes

karenhx2 added 2 commits November 12, 2025 16:17

Updated script to include paired-ended runs

85a9c51

Implemented CLI for wrapper

acef491


3 participants

