New Features #50
Conversation
This reverts commit 9ee4823.
Important: Review skipped — review was skipped due to path filters.
⛔ Files ignored due to path filters (3)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. You can disable this status message in the CodeRabbit configuration.

Walkthrough
Version bump to 0.9.10, Python env updated to 3.11, polars added. Large migration from pandas to Polars across parsing/rendering/storage, new Polars-aware FileManager, workflow per-file error aggregation and restructured tool invocations, Streamlit UI log/status refactor, and extensive example-data feature/spectrum updates and config removals.

Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant UI as Streamlit UI
participant WM as WorkflowManager
participant W as Workflow
participant FM as FileManager
participant FD as FLASHDeconv
participant FT as FLASHTnT
participant P as Parsers
User->>UI: start workflow
UI->>WM: launch workflow_process()
WM->>W: run(workspace, inputs, params)
loop each input file (try/except)
W->>FD: run(in, out, out_spec*, out_mzml, out_feature*, threads)
W->>FM: store_data(outputs + FD_parameters.json)
alt tag workflow
W->>FT: run(in_deconv, fasta/decoy, out_tag/out_pro/out_prsm)
W->>FM: store_data(FLASHTnT outputs + params)
end
W->>P: parseDeconv (Polars)
W->>P: parseTnT (progress logs)
W-->>W: cleanup temp folder
end
alt any errors
W-->>WM: raise ExceptionGroup(errors)
WM->>UI: show error + traceback
else success
WM->>UI: write "WORKFLOW FINISHED" to all.log
UI->>UI: tail all.log, extract % and update progress/status
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parse/tnt.py (1)
178-195: KDE robustness: handle NaNs and small sample sizes; guard string ops on accession.
Avoids crashes when there are NaNs, a single point, or zero variance.
Apply:
 def fdr_density_distribution(df, logger=None):
     df = df[df['ProteoformLevelQvalue'] > 0]
-    # Find density targets
-    target_qscores = df[~df['accession'].str.startswith('DECOY_')]['ProteoformLevelQvalue'].dropna()
-    if len(target_qscores) > 0:
+    # Find density targets
+    acc = df['accession'].fillna('')
+    target_qscores = df.loc[~acc.str.startswith('DECOY_'), 'ProteoformLevelQvalue'].dropna()
+    if len(target_qscores) >= 2 and target_qscores.std() > 0:
         x_target = np.linspace(target_qscores.min(), target_qscores.max(), 200)
         kde_target = gaussian_kde(target_qscores)
         density_target = pd.DataFrame({'x': x_target, 'y': kde_target(x_target)})
     else:
         density_target = pd.DataFrame(columns=['x', 'y'])
     # Find density decoys (if present)
-    decoy_qscores = df[df['accession'].str.startswith('DECOY_')]['ProteoformLevelQvalue'].dropna()
-    if len(decoy_qscores) > 0:
+    decoy_qscores = df.loc[acc.str.startswith('DECOY_'), 'ProteoformLevelQvalue'].dropna()
+    if len(decoy_qscores) >= 2 and decoy_qscores.std() > 0:
         x_decoy = np.linspace(decoy_qscores.min(), decoy_qscores.max(), 200)
         kde_decoy = gaussian_kde(decoy_qscores)
         density_decoy = pd.DataFrame({'x': x_decoy, 'y': kde_decoy(x_decoy)})
     else:
         density_decoy = pd.DataFrame(columns=['x', 'y'])
🧹 Nitpick comments (15)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.json (1)
1-1: Parameter value includes a literal newline; confirm consumers expect this format.
If the reader splits on newline, fine; otherwise prefer a structured value to avoid parsing ambiguity.
Consider normalizing to an array:
-{"SD:tol": "5.0\n5.0"} +{"SD:tol": ["5.0", "5.0"]}src/parse/tnt.py (1)
85-95: Loop variable shadowing; use a distinct name for residue index.
Prevents accidental reuse of the outer enumerator and improves readability.
Apply:
-    for i, row in protein_df.iterrows():
+    for _, row in protein_df.iterrows():
@@
-        for i in range(len(sequence)):
-            coverage[i] = np.sum(
+        for pos in range(len(sequence)):
+            coverage[pos] = np.sum(
                 (tag_df['ProteinIndex'] == pid) &
-                (tag_df['StartPos'] <= i) &
-                (tag_df['EndPos'] >= i)
+                (tag_df['StartPos'] <= pos) &
+                (tag_df['EndPos'] >= pos)
             )

example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (1)
2-2: Numeric ranges.
Fraction_feature_intensity values reach the 1e9 range. Verify all in-memory and parquet schemas use 64-bit types to avoid overflow when aggregating or downsampling (a quick schema check is sketched below).
Also applies to: 14-14, 21-21, 26-33
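One way to run that verification is to inspect the cached parquet schema and upcast any 32-bit numeric columns before aggregating. This is only a sketch; the file path is illustrative and not a file name taken from this PR:

import polars as pl

# Illustrative path; point this at the actual cached parquet file for the feature table.
lf = pl.scan_parquet("example_fd/toppic_ms2_feature.pq")

schema = lf.collect_schema()
# Upcast 32-bit numeric columns so intensities in the 1e9 range cannot
# overflow or lose precision when summed or downsampled.
upcasts = [
    pl.col(name).cast(pl.Int64) if dtype == pl.Int32 else pl.col(name).cast(pl.Float64)
    for name, dtype in schema.items()
    if dtype in (pl.Int32, pl.Float32)
]
if upcasts:
    lf = lf.with_columns(upcasts)
print(lf.collect_schema())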
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (1)
14-31: msalign size: consider compressing or trimming example blocks.
The file is very large. If CI or app startup times suffer, consider gzipping the example and loading on demand (see the sketch below), or keeping a smaller slice for tests plus a link to the full dataset.
Also applies to: 57-106, 108-151, 316-370
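If the example were shipped gzipped, loading it on demand needs only the standard library. A minimal sketch, assuming a hypothetical .gz copy of the example file (the path is not part of this PR):

import gzip
from pathlib import Path

def read_example_msalign(path: str) -> str:
    """Read an example msalign file, transparently handling a gzipped copy."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt", encoding="utf-8") as f:
            return f.read()
    return p.read_text(encoding="utf-8")

# Hypothetical compressed example stored next to the original file name.
text = read_example_msalign("example_fd/toppic_ms2.msalign.gz")
print(text.count("BEGIN IONS"), "spectra loaded")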
src/workflow/WorkflowManager.py (2)
59-62: Don't dump full tracebacks into minimal logs.
Logger.log(level) writes to multiple files; with default level=0, the full traceback goes into minimal.log and others. Log the message at minimal level and the traceback at verbose level to keep minimal.log concise.
-        except Exception as e:
-            self.logger.log(f"ERROR: {e}")
-            self.logger.log("".join(traceback.format_exception(e)))
+        except Exception as e:
+            # short message for minimal/commands logs
+            self.logger.log(f"ERROR: {e}", level=0)
+            # full traceback only in all.log
+            self.logger.log("".join(traceback.format_exception(e)), level=2)
51-62: Version-robust traceback formatting.
traceback.format_exception(e) with a single exception argument requires Python 3.10+. If any runtime uses an older interpreter, prefer TracebackException to be safe.
-            self.logger.log("".join(traceback.format_exception(e)), level=2)
+            tb = "".join(traceback.TracebackException.from_exception(e).format())
+            self.logger.log(tb, level=2)

src/render/initialize.py (1)
17-31: DRY the heatmap loading/compression boilerplate.
The four branches are identical except for key prefixes. Factor into a small helper to reduce maintenance and accidental API drift.
+def _load_heatmap_with_levels(file_manager, dataset_id, base_key: str):
+    lf = file_manager.get_results(dataset_id, [base_key], use_polars=True)[base_key]
+    total = lf.select(pl.len()).collect(streaming=True).to_series(0).item()
+    levels = []
+    for size in compute_compression_levels(20000, total):
+        levels.append(file_manager.get_results(dataset_id, [f"{base_key}_{size}"], use_polars=True)[f"{base_key}_{size}"])
+    levels.append(lf)
+    return levels
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms1_deconv_heatmap'], use_polars=True
-    )['ms1_deconv_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms1_deconv_heatmap_{size}'], use_polars=True
-            )[f'ms1_deconv_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms1_deconv_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms2_deconv_heatmap'], use_polars=True
-    )['ms2_deconv_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms2_deconv_heatmap_{size}'], use_polars=True
-            )[f'ms2_deconv_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms2_deconv_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms1_raw_heatmap'], use_polars=True
-    )['ms1_raw_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms1_raw_heatmap_{size}'], use_polars=True
-            )[f'ms1_raw_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms1_raw_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms2_raw_heatmap'], use_polars=True
-    )['ms2_raw_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms2_raw_heatmap_{size}'], use_polars=True
-            )[f'ms2_raw_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms2_raw_heatmap")
Also applies to: 39-53, 62-76, 85-99
src/parse/masstable.py (1)
207-209: Consider using native Polars array conversion for better performance.
While map_elements works, for numpy array conversion you might get better performance with native Polars operations if the arrays are consistently structured.
-    pl.col("mzarray").map_elements(lambda x: x.tolist() if hasattr(x, 'tolist') else list(x), return_dtype=pl.List(pl.Float32)),
-    pl.col("intarray").map_elements(lambda x: x.tolist() if hasattr(x, 'tolist') else list(x), return_dtype=pl.List(pl.Float32))
+    pl.col("mzarray").list.eval(pl.element()).cast(pl.List(pl.Float32)),
+    pl.col("intarray").list.eval(pl.element()).cast(pl.List(pl.Float32))
However, if the current implementation handles edge cases better (e.g., mixed types), keep it as is.
src/render/compression.py (1)
19-19: Remove unused logger parameter or implement logging.
The logger parameter is not used in the function. Either remove it or add logging statements for debugging/monitoring purposes.
Would you like me to suggest where logging statements would be most valuable in this function, such as before/after the binning operation or when the iterative selection completes?
src/workflow/StreamlitUI.py (1)
1081-1081: Fix membership test syntax.
Use the more Pythonic not in syntax instead of not ... in.
-            if not "WORKFLOW FINISHED" in content:
+            if "WORKFLOW FINISHED" not in content:

src/parse/deconv.py (1)
51-51: Prefer streaming collect for counts (avoid materializing the LazyFrame)
src/parse/deconv.py:51 — Replace
relevant_heatmap_lazy.select(pl.len()).collect().item()
with the project's streaming-aggregate pattern (e.g. .select(pl.len()).collect(streaming=True).item() or .select(pl.count()).collect(streaming=True).item() / .collect(engine="streaming")) to avoid materializing the full LazyFrame and match render/initialize.py and render/update.py.src/workflow/FileManager.py (1)
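As a small illustration of that pattern — the LazyFrame below is a stand-in built in memory, whereas the real one comes from pl.scan_parquet, and depending on the installed polars release the flag is spelled streaming=True or engine="streaming":

import polars as pl

# Stand-in for relevant_heatmap_lazy; in the app this is a pl.scan_parquet(...) LazyFrame.
relevant_heatmap_lazy = pl.LazyFrame({"mz": [500.1, 612.4, 733.9], "rt": [10.2, 11.5, 12.8]})

# Streaming aggregate: only the 1x1 count result is materialized, not the full frame.
n_rows = relevant_heatmap_lazy.select(pl.len()).collect(streaming=True).item()
print(n_rows)  # 3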
468-470: Consider returning collected DataFrames for consistency
When use_polars=True, the function returns a LazyFrame from pl.scan_parquet, while other paths return materialized DataFrames. This inconsistency could confuse callers who might expect immediate data access.
Consider collecting the LazyFrame before returning, or document this behavior clearly in the docstring:
         elif use_polars:
             # Load as polars DataFrame
-            data = pl.scan_parquet(file_path)
+            data = pl.scan_parquet(file_path).collect()
Alternatively, if lazy evaluation is intentional for performance, update the docstring to clarify that use_polars=True returns a LazyFrame.

src/Workflow.py (2)
116-132: Remove unnecessary f-string prefixes
These strings don't contain any placeholders, so the f prefix is unnecessary.
-out_tsv = join(folder_path, f'out.tsv')
-out_deconv = join(folder_path, f'out_deconv.mzML')
-out_anno = join(folder_path, f'anno_annotated.mzML')
-out_spec1 = join(folder_path, f'spec1.tsv')
-out_spec2 = join(folder_path, f'spec2.tsv')
-out_spec3 = join(folder_path, f'spec3.tsv')
-out_spec4 = join(folder_path, f'spec4.tsv')
-out_quant = join(folder_path, f'quant.tsv')
-out_msalign1 = join(folder_path, f'toppic_ms1.msalign')
-out_msalign2 = join(folder_path, f'toppic_ms2.msalign')
-out_feature1 = join(folder_path, f'toppic_ms1.feature')
-out_feature2 = join(folder_path, f'toppic_ms2.feature')
-out_prsm = join(folder_path, f'prsms.tsv')
-out_db = join(folder_path, f'database.fasta')
-out_tag = join(folder_path, f'tags.tsv')
-out_protein = join(folder_path, f'protein.tsv')
+out_tsv = join(folder_path, 'out.tsv')
+out_deconv = join(folder_path, 'out_deconv.mzML')
+out_anno = join(folder_path, 'anno_annotated.mzML')
+out_spec1 = join(folder_path, 'spec1.tsv')
+out_spec2 = join(folder_path, 'spec2.tsv')
+out_spec3 = join(folder_path, 'spec3.tsv')
+out_spec4 = join(folder_path, 'spec4.tsv')
+out_quant = join(folder_path, 'quant.tsv')
+out_msalign1 = join(folder_path, 'toppic_ms1.msalign')
+out_msalign2 = join(folder_path, 'toppic_ms2.msalign')
+out_feature1 = join(folder_path, 'toppic_ms1.feature')
+out_feature2 = join(folder_path, 'toppic_ms2.feature')
+out_prsm = join(folder_path, 'prsms.tsv')
+out_db = join(folder_path, 'database.fasta')
+out_tag = join(folder_path, 'tags.tsv')
+out_protein = join(folder_path, 'protein.tsv')
336-347: Remove unnecessary f-string prefixes (DeconvWorkflow)
Same issue as in TagWorkflow - these strings don't need the f prefix.
-out_tsv = join(folder_path, f'out.tsv')
-out_deconv = join(folder_path, f'out_deconv.mzML')
-out_anno = join(folder_path, f'anno_annotated.mzML')
-out_spec1 = join(folder_path, f'spec1.tsv')
-out_spec2 = join(folder_path, f'spec2.tsv')
-out_spec3 = join(folder_path, f'spec3.tsv')
-out_spec4 = join(folder_path, f'spec4.tsv')
-out_quant = join(folder_path, f'quant.tsv')
-out_msalign1 = join(folder_path, f'toppic_ms1.msalign')
-out_msalign2 = join(folder_path, f'toppic_ms2.msalign')
-out_feature1 = join(folder_path, f'toppic_ms1.feature')
-out_feature2 = join(folder_path, f'toppic_ms2.feature')
+out_tsv = join(folder_path, 'out.tsv')
+out_deconv = join(folder_path, 'out_deconv.mzML')
+out_anno = join(folder_path, 'anno_annotated.mzML')
+out_spec1 = join(folder_path, 'spec1.tsv')
+out_spec2 = join(folder_path, 'spec2.tsv')
+out_spec3 = join(folder_path, 'spec3.tsv')
+out_spec4 = join(folder_path, 'spec4.tsv')
+out_quant = join(folder_path, 'quant.tsv')
+out_msalign1 = join(folder_path, 'toppic_ms1.msalign')
+out_msalign2 = join(folder_path, 'toppic_ms2.msalign')
+out_feature1 = join(folder_path, 'toppic_ms1.feature')
+out_feature2 = join(folder_path, 'toppic_ms2.feature')

src/render/update.py (1)
133-160: Consider extracting default selection values
The default selection values are duplicated. Consider extracting them to reduce duplication.
+DEFAULT_HEATMAP_SELECTION = {
+    'xRange': [-1, -1],
+    'yRange': [-1, -1]
+}
+
 elif (component in ['Deconvolved MS1 Heatmap', 'Deconvolved MS2 Heatmap']):
     selection = 'heatmap_deconv' if '1' in component else 'heatmap_deconv2'
     if selection not in selection_store:
-        selected_data = {
-            'xRange' : [-1, -1],
-            'yRange' : [-1, -1]
-        }
+        selected_data = DEFAULT_HEATMAP_SELECTION
     else:
         selected_data = selection_store[selection]
     data['deconv_heatmap_df'] = render_heatmap(
         additional_data['deconv_heatmap_df'], selected_data,
         additional_data['dataset'], component
     )
 elif (component in ['Raw MS1 Heatmap', 'Raw MS2 Heatmap']):
     selection = 'heatmap_raw' if '1' in component else 'heatmap_raw2'
     if selection not in selection_store:
-        selected_data = {
-            'xRange' : [-1, -1],
-            'yRange' : [-1, -1]
-        }
+        selected_data = DEFAULT_HEATMAP_SELECTION
     else:
         selected_data = selection_store[selection]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
example-data/workspaces/default/flashdeconv/cache/cache.db is excluded by !**/*.db
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.pkl.gz is excluded by !**/*.gz
example-data/workspaces/default/flashdeconv/cache/files/example_fd/out.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/quant.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/spec1.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/spec2.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.pkl.gz is excluded by !**/*.gz
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/out.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/quant.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/spec1.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/spec2.tsv is excluded by !**/*.tsv
📒 Files selected for processing (25)
.github/workflows/build-windows-executable-app.yaml (1 hunks)
Dockerfile (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.json (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (4 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (105 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.json (0 hunks)
example-data/workspaces/default/flashdeconv/ini/FLASHDeconv.ini (0 hunks)
example-data/workspaces/default/flashdeconv/params.json (0 hunks)
example-data/workspaces/default/flashtnt/ini/FLASHDeconv.ini (0 hunks)
example-data/workspaces/default/flashtnt/ini/FLASHTnT.ini (0 hunks)
example-data/workspaces/default/flashtnt/params.json (0 hunks)
requirements.txt (1 hunks)
settings.json (1 hunks)
src/Workflow.py (2 hunks)
src/parse/deconv.py (3 hunks)
src/parse/masstable.py (2 hunks)
src/parse/tnt.py (5 hunks)
src/render/compression.py (3 hunks)
src/render/initialize.py (5 hunks)
src/render/render.py (1 hunks)
src/render/update.py (6 hunks)
src/workflow/FileManager.py (7 hunks)
src/workflow/StreamlitUI.py (2 hunks)
src/workflow/WorkflowManager.py (2 hunks)
💤 Files with no reviewable changes (6)
- example-data/workspaces/default/flashdeconv/params.json
- example-data/workspaces/default/flashtnt/ini/FLASHTnT.ini
- example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.json
- example-data/workspaces/default/flashtnt/params.json
- example-data/workspaces/default/flashdeconv/ini/FLASHDeconv.ini
- example-data/workspaces/default/flashtnt/ini/FLASHDeconv.ini
🧰 Additional context used
🧬 Code graph analysis (8)
src/parse/deconv.py (4)
src/workflow/Logger.py (1)
log (16-42)
src/parse/masstable.py (2)
parseFLASHDeconvOutput (9-182)
getMSSignalDF (197-243)
src/workflow/FileManager.py (2)
store_data (327-345)
get_results (428-478)
src/render/compression.py (2)
compute_compression_levels (8-16)
downsample_heatmap (19-104)
src/parse/tnt.py (4)
src/workflow/Logger.py (1)
log (16-42)
src/parse/masstable.py (2)
parseFLASHDeconvOutput (9-182)
parseFLASHTaggerOutput (185-187)
src/parse/deconv.py (1)
fdr_density_distribution (218-235)
src/workflow/FileManager.py (1)
store_data(327-345)
src/workflow/WorkflowManager.py (1)
src/workflow/Logger.py (1)
log(16-42)
src/workflow/StreamlitUI.py (1)
src/workflow/CommandExecutor.py (1)
stop(254-268)
src/render/initialize.py (2)
src/render/compression.py (1)
compute_compression_levels (8-16)
src/workflow/FileManager.py (1)
get_results(428-478)
src/render/update.py (1)
src/render/compression.py (1)
downsample_heatmap(19-104)
src/Workflow.py (6)
src/workflow/Logger.py (1)
log (16-42)
src/workflow/ParameterManager.py (1)
get_parameters_from_json (82-101)
src/workflow/CommandExecutor.py (1)
run_topp (160-252)
src/workflow/FileManager.py (4)
store_file (347-386)
store_data (327-345)
get_results (428-478)
result_exists (480-498)
src/parse/deconv.py (1)
parseDeconv (9-215)
src/parse/tnt.py (1)
parseTnT(16-174)
src/render/render.py (1)
src/render/update.py (2)
update_data (85-99)
filter_data (102-184)
🪛 Ruff (0.12.2)
src/workflow/StreamlitUI.py
1081-1081: Test for membership should be not in
Convert to not in
(E713)
src/render/compression.py
19-19: Unused function argument: logger
(ARG001)
34-34: Avoid specifying long messages outside the exception class
(TRY003)
src/render/update.py
32-32: Unused lambda argument: x
(ARG005)
85-85: Unused function argument: additional_data
(ARG001)
src/Workflow.py
116-116: f-string without any placeholders
Remove extraneous f prefix
(F541)
117-117: f-string without any placeholders
Remove extraneous f prefix
(F541)
118-118: f-string without any placeholders
Remove extraneous f prefix
(F541)
119-119: f-string without any placeholders
Remove extraneous f prefix
(F541)
120-120: f-string without any placeholders
Remove extraneous f prefix
(F541)
121-121: f-string without any placeholders
Remove extraneous f prefix
(F541)
122-122: f-string without any placeholders
Remove extraneous f prefix
(F541)
123-123: f-string without any placeholders
Remove extraneous f prefix
(F541)
124-124: f-string without any placeholders
Remove extraneous f prefix
(F541)
125-125: f-string without any placeholders
Remove extraneous f prefix
(F541)
126-126: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-127: f-string without any placeholders
Remove extraneous f prefix
(F541)
129-129: f-string without any placeholders
Remove extraneous f prefix
(F541)
130-130: f-string without any placeholders
Remove extraneous f prefix
(F541)
131-131: f-string without any placeholders
Remove extraneous f prefix
(F541)
132-132: f-string without any placeholders
Remove extraneous f prefix
(F541)
162-162: f-string without any placeholders
Remove extraneous f prefix
(F541)
187-187: f-string without any placeholders
Remove extraneous f prefix
(F541)
204-204: f-string without any placeholders
Remove extraneous f prefix
(F541)
266-266: Do not catch blind exception: Exception
(BLE001)
269-269: Avoid specifying long messages outside the exception class
(TRY003)
269-269: Undefined name ExceptionGroup. Consider specifying requires-python = ">= 3.11" or tool.ruff.target-version = "py311" in your pyproject.toml file.
(F821)
336-336: f-string without any placeholders
Remove extraneous f prefix
(F541)
337-337: f-string without any placeholders
Remove extraneous f prefix
(F541)
338-338: f-string without any placeholders
Remove extraneous f prefix
(F541)
339-339: f-string without any placeholders
Remove extraneous f prefix
(F541)
340-340: f-string without any placeholders
Remove extraneous f prefix
(F541)
341-341: f-string without any placeholders
Remove extraneous f prefix
(F541)
342-342: f-string without any placeholders
Remove extraneous f prefix
(F541)
343-343: f-string without any placeholders
Remove extraneous f prefix
(F541)
344-344: f-string without any placeholders
Remove extraneous f prefix
(F541)
345-345: f-string without any placeholders
Remove extraneous f prefix
(F541)
346-346: f-string without any placeholders
Remove extraneous f prefix
(F541)
347-347: f-string without any placeholders
Remove extraneous f prefix
(F541)
349-349: f-string without any placeholders
Remove extraneous f prefix
(F541)
374-374: f-string without any placeholders
Remove extraneous f prefix
(F541)
417-417: Do not catch blind exception: Exception
(BLE001)
420-420: Avoid specifying long messages outside the exception class
(TRY003)
420-420: Undefined name ExceptionGroup. Consider specifying requires-python = ">= 3.11" or tool.ruff.target-version = "py311" in your pyproject.toml file.
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-openms
- GitHub Check: build-vue-js-component
🔇 Additional comments (32)
.github/workflows/build-windows-executable-app.yaml (1)
17-17: APP_NAME bumped to 0.9.10 — OK.
Verified: the "Set Version in settings.json" step sets settings.json.version from github.event.release.tag_name (.github/workflows/build-windows-executable-app.yaml, lines 228–236). Ensure the GitHub Release tag exactly matches FLASHApp-0.9.10 (including any "v" prefix convention) to avoid UI/version drift between release and PR artifacts.
settings.json (1)
4-4: Approve — version bumped to 0.9.10; no remaining 0.9.9 references found.
Ran the rg you provided: settings.json line 4 is "version": "0.9.10". No matches for "0.9.9" or "FLASHApp-0.9.9" were found; remaining "version:" hits are in built/dist JS assets (expected).
src/parse/tnt.py (1)
20-23: Unpack count mismatch from parseFLASHDeconvOutput (throws ValueError).
parseFLASHDeconvOutput returns 5 values; the code expects 6.
Apply:
-        deconv_df, _, tolerance, _, _, = parseFLASHDeconvOutput(
+        deconv_df, _, tolerance, _, _ = parseFLASHDeconvOutput(
             anno_mzML, deconv_mzML
         )
Likely an incorrect or invalid review comment.
Dockerfile (1)
76-76: Python version mismatch — align runtime and validate scientific-stack builds
- Observed: requirements.txt was autogenerated with Python 3.12 (requirements.txt:2); Dockerfile creates env with python=3.11 (Dockerfile:76); workflows pin PYTHON_VERSION 3.11.0 and 3.11.9 (.github/workflows/build-windows-executable-app.yaml:15, .github/workflows/test-win-exe-w-embed-py.yaml:12). polars is present (requirements.txt:141).
- Action: choose a single Python runtime (3.11 or 3.12) and align it across requirements, Dockerfile, and CI — either regenerate requirements.txt under the chosen Python or update Dockerfile/CI to the version used to compile requirements.
- Action: after alignment, run a full build/test to confirm pyOpenMS + Cython + numpy (e.g. 1.26.4) + pandas + polars build successfully on the selected Python and that required binary wheels are available.
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature (1)
102-187: MS1 feature rows updated — verify parser compatibility and refresh caches
Header detected and tab-delimited; file example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature has 187 lines (1 header + 186 rows). Columns appear to match parser: File_name, Fraction_ID, Feature_ID, Mass, Intensity, Min_time, Max_time, Min_scan, Max_scan, Min_charge, Max_charge, Apex_time, Apex_scan, Apex_intensity, Rep_charge, Rep_average_mz, Envelope_num, EC_score.
- Field-count consistency check aborted in the last run (inline script hit a Python syntax error) — re-run a simple check to ensure every row has the same number of tab-separated fields.
- Confirm Mass/Intensity units expected by downstream parsers (e.g., m/z vs Da; raw vs normalized intensity) and regenerate any cached indices/hashes used by examples/tests.
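A minimal stand-alone version of the field-count check mentioned in the first bullet could look like this (plain Python, using the example path from this PR):

from pathlib import Path

path = Path("example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature")

with path.open(encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]

header, data = rows[0], rows[1:]
# Every data row should carry exactly as many tab-separated fields as the header.
bad = [(i + 2, len(r)) for i, r in enumerate(data) if len(r) != len(header)]
print(f"{len(data)} rows checked, {len(bad)} with unexpected field counts")
for line_no, n_fields in bad[:10]:
    print(f"  line {line_no}: {n_fields} fields (expected {len(header)})")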
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (3)
2-2: Path consistency check.
The File_name column uses a relative path with "../workspaces-FLASHApp/...". Ensure all downstream loaders resolve these paths the same way as msalign FILE_NAME entries to avoid cache misses.
2-2: Column semantics sanity.
Double-check that MS_one_ID/MS_one_scans correspond to the MS1 spectrum referenced in msalign's MS_ONE_ID/MS_ONE_SCAN for the same SCANS, and that Fraction_feature_min/max/apex_time envelopes still cover the referenced MS2 RTs.
Also applies to: 5-5, 8-12, 17-18, 20-22
4-4: PRECURSOR_FEATURE_ID referential integrity verified — no missing IDs.
Cross-check: Total spectra = 167; Unique PRECURSOR_FEATURE_IDs = 109; Missing = 0.

example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (2)
16-18: TITLE rename and header changes — confirm parsers.
TITLE now uses "Scan_" (was "DScan_" per summary). Ensure all parsers/UI components key off SCANS/SPECTRUM_ID rather than string-matching TITLE, or update any TITLE-based logic.
Also applies to: 25-31
209-219: Changed MS_ONE_ID/MS_ONE_SCAN — validate MS1 linkage.
Here MS_ONE_ID/SCAN point to 3101/3102 unlike adjacent blocks. Confirm this is intentional and that upstream indexing aligns with the MS1 mzML.
src/workflow/WorkflowManager.py (1)
9-9: Good addition: traceback for richer diagnostics.
Importing traceback is appropriate for detailed failure reports.
src/render/initialize.py (1)
1-1: Import Polars — LGTM.
Switching initialization to Polars aligns with the rest of the PR.
requirements.txt (2)
53-61: Run import compatibility test in CI (numpy/pyarrow/pandas/polars/streamlit)
Local verification failed with: ModuleNotFoundError: No module named 'numpy'. Run the test in an environment with project deps installed (e.g., after pip install -r requirements.txt) and paste the output.
File: requirements.txt Lines: 53-61
numpy==1.26.4
    # via
    #   contourpy
    #   matplotlib
    #   pandas
    #   pydeck
    #   pyopenms
    #   src (pyproject.toml)
    #   streamlit
Run:
#!/usr/bin/env bash
pip install -r requirements.txt
python - <<'PY'
import numpy, pyarrow, pandas, streamlit
try:
    import polars as pl
    lf = pl.LazyFrame({"x":[1,2,3]}).select(pl.len()).collect(streaming=True)
    print("Polars OK:", lf.to_series(0).item())
except Exception as e:
    raise SystemExit(f"Polars check failed: {e}")
print("Imports OK")
PY
Also applies to: 84-89, 98-106, 114-121, 133-137
141-142: Pin polars to a major range and recompile using the runtime Python
- requirements.txt currently lists polars>=1.0.0 (requirements.txt ≈ line 141). Pin to avoid breaking changes: polars>=1.0,<2.0.
- The requirements header says pip-compile was run with Python 3.12, but the environment here is Python 3.11.2 and pip-compile is not available, so re-run pip-compile inside the actual runtime (Docker/CI image using the runtime Python) and commit the regenerated requirements.txt to avoid solver drift.
-polars>=1.0.0
+polars>=1.0,<2.0

src/render/render.py (1)
31-32: Consistent API change for filter_data function.
The filter_data function signature has also been updated to receive the full state object, maintaining consistency with the update_data change.

src/parse/masstable.py (2)
3-3: Good adoption of Polars for data processing efficiency.
The migration to Polars is a solid architectural choice for handling large datasets efficiently, particularly with its lazy evaluation capabilities.
198-243: Efficient Polars pipeline implementation with proper lazy evaluation.
The implementation correctly leverages Polars' lazy evaluation and efficient operations. The pipeline is well-structured with appropriate transformations. A few observations:
- The use of map_elements for converting numpy arrays is appropriate
- Window functions for the mass_idx calculation are efficient
- Proper filtering of null and non-positive intensities
- Returns a LazyFrame for downstream efficiency
src/render/compression.py (4)
19-32: Well-documented function with clear parameter descriptions.
The docstring properly documents the expected input types and parameters, which is essential for a function that handles multiple data formats.
33-35: Good validation of bin constraints.
The check ensures that the number of bins doesn't exceed the maximum datapoints, preventing unnecessary computation.
53-53: Strategic use of streaming engine for memory efficiency.
Using collect(engine="streaming") is appropriate here as it enables processing of large datasets that might not fit in memory all at once.
76-77: Correct adjustment for scipy's 1-based indexing.
The conversion from scipy's 1-based to Polars' 0-based indexing is properly handled with clear documentation.
src/parse/deconv.py (4)
13-15: Good addition of progress logging for user feedback.
Adding progress logging at the start of processing improves user experience by providing immediate feedback.
22-29: Smart memory management with immediate conversion to Polars.
Deleting the original DataFrames after storing and immediately reloading as Polars LazyFrames is an excellent pattern for reducing memory footprint.
48-48: Appropriate use of streaming collection for large datasets.
The pattern of collect with streaming -> lazy is correct for ensuring data materialization while maintaining memory efficiency.
76-91: Clean Polars implementation for scan_table.
The use of native Polars operations with proper column selection and aliasing is well-structured. The with_row_index and with_columns pattern is idiomatic Polars.

src/workflow/StreamlitUI.py (4)
1017-1018: Good change to use complete log for progress tracking.
Using the 'all.log' file ensures that progress information is always available regardless of the user's selected log level.
1039-1054: Robust progress parsing with regex patterns.
The regex-based parsing for percentage and progress labels is well-implemented with proper fallback handling.
1026-1028: Improved UX with shorter spinner delay.
Reducing the spinner delay from the typical default to 1 second provides faster feedback to users.
1030-1094: Well-structured status management with proper error handling.
The new status-based UI with separate handling for running, complete, and error states provides clear visual feedback. The error detection based on log content is a good safety check.
src/Workflow.py (2)
100-269: Good error aggregation pattern!
The implementation of per-file error handling with ExceptionGroup aggregation is excellent. This ensures all files are attempted before reporting failures, improving user experience and debugging.
269-269: Require Python >= 3.11 or avoid ExceptionGroup
ExceptionGroup is only available on Python 3.11+; repository has no pyproject.toml or setup.py declaring python_requires (requirements.txt header shows pip-compile used Python 3.12 but does not set runtime). Add python_requires = ">=3.11" to your packaging config or replace ExceptionGroup with a compatibility fallback. Location: src/Workflow.py:269
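One possible shape for such a fallback, sketched here rather than taken from the project's code (the exceptiongroup backport package is an assumed extra dependency for pre-3.11 runtimes, and process() is a stand-in for the per-file workflow step):

import sys

if sys.version_info < (3, 11):
    # Backport of PEP 654 ExceptionGroup for older interpreters (assumed dependency).
    from exceptiongroup import ExceptionGroup  # type: ignore[import-not-found]

def process(path: str) -> None:
    # Stand-in for running FLASHDeconv/FLASHTnT and parsing one input file.
    if "bad" in path:
        raise ValueError(f"failed to process {path}")

errors: list[Exception] = []
for input_file in ["sample_good.mzML", "sample_bad.mzML"]:  # illustrative inputs
    try:
        process(input_file)
    except Exception as e:  # collect and continue with the remaining files
        errors.append(e)

if errors:
    raise ExceptionGroup("Errors occurred while processing input files", errors)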
src/render/update.py (1)
41-72: Excellent Polars streaming implementation!
The migration to Polars with streaming collection and intelligent downsampling based on data size is well-implemented. The early size estimation using limit(20000) before deciding whether to downsample is a smart optimization.
PRECURSOR_CHARGE=7
PRECURSOR_MASS=9241.289441
PRECURSOR_MASS=9242.293272
PRECURSOR_INTENSITY=4475717.00
PRECURSOR_FEATURE_ID=101
PRECURSOR_QSCORE=0.93
PRECURSOR_FEATURE_ID=11
PRECURSOR_QSCORE=1.00
1370.253350	3304.76	1
💡 Verification agent
🧩 Analysis chain
Mass/charge consistency spot‑check.
PRECURSOR_MASS should match (mz − 1.007276466812) × z (average vs mono as defined). Run a quick validation to catch data entry/calibration slips across edited blocks.
Also applies to: 218-220, 257-260, 1685-1690, 7217-7224
🏁 Script executed:
#!/usr/bin/env bash
python - <<'PY'
import re, math, pathlib
p=pathlib.Path("example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign")
prot=1.007276466812
bad=[]
with p.open("r", encoding="utf-8") as f:
    cur=None; vals={}
    for ln,line in enumerate(f,1):
        if line.startswith("SPECTRUM_ID="): cur=int(line.split("=",1)[1])
        for k in ("PRECURSOR_MZ","PRECURSOR_CHARGE","PRECURSOR_MASS"):
            if line.startswith(k+"="): vals[k]=float(line.split("=",1)[1])
        if line.startswith("END IONS"):
            mz, z, mass = vals.get("PRECURSOR_MZ"), vals.get("PRECURSOR_CHARGE"), vals.get("PRECURSOR_MASS")
            if mz and z and mass:
                calc=(mz-prot)*z
                if abs(calc-mass) > 0.01: bad.append((cur, mass, calc))
            vals={}
print(f"Spectra checked: {cur}")
print(f"Mass mismatches > 0.01 Da: {len(bad)}")
for x in bad[:20]: print(x)
PY

Length of output: 829
Critical: PRECURSOR_MASS inconsistent with PRECURSOR_MZ/CHARGE — 157 mismatches
File: example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign — scanned 3,411 spectra; 157 entries differ from (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE by >0.01 Da.
Examples (SPECTRUM_ID, reported_mass, calculated_mass):
- (3099, 9399.279607, 9406.299892732315)
- (3100, 9242.293272, 9268.292568732315)
- (3103, 3086.435554, 3095.563618599564)
- (3104, 9046.278856, 9065.350964265503)
- (3106, 9203.327291, 9229.320692265503)
Recompute PRECURSOR_MASS = (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE for all spectra, or correct the mass-type/calibration used when writing PRECURSOR_MASS before downstream processing.
🤖 Prompt for AI Agents
In
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign
around lines 121–126, many PRECURSOR_MASS values are inconsistent with
PRECURSOR_MZ and PRECURSOR_CHARGE; recompute PRECURSOR_MASS for every spectrum
using PRECURSOR_MASS = (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE and
replace the written value (match the file's numeric precision), or if a
different mass calibration/type was intended, ensure the writer records that
mass-type explicitly and uses the correct calibration before downstream
processing; run a post-write validation to assert all entries differ by ≤0.01 Da
and fix any remaining mismatches.
| logger.log("Progress of 'processing FLASHTnT results':", level=2) | ||
| logger.log("0.0 %", level=2) |
Guard logger usage to avoid AttributeError when logger is None.
Create a local no‑op logger and replace direct calls.
Apply:
def parseTnT(file_manager, dataset_id, deconv_mzML, anno_mzML, tag_tsv, protein_tsv, logger=None):
- logger.log("Progress of 'processing FLASHTnT results':", level=2)
- logger.log("0.0 %", level=2)
+ log = (logger.log if logger is not None else (lambda *a, **k: None))
+ log("Progress of 'processing FLASHTnT results':", level=2)
+ log("0.0 %", level=2)
@@
- logger.log("10.0 %", level=2)
+ log("10.0 %", level=2)
@@
- logger.log("30.0 %", level=2)
+ log("30.0 %", level=2)
@@
- logger.log("50.0 %", level=2)
+ log("50.0 %", level=2)
@@
- logger.log("70.0 %", level=2)
+ log("70.0 %", level=2)
@@
- logger.log("90.0 %", level=2)
+ log("90.0 %", level=2)
@@
- logger.log("100.0 %", level=2)
+ log("100.0 %", level=2)Also applies to: 24-24, 40-40, 79-79, 151-151, 169-169, 174-174
🤖 Prompt for AI Agents
In src/parse/tnt.py around lines 17-18 (and also apply same change to lines 24,
40, 79, 151, 169, 174), guard against logger being None by creating a local
no-op logger and using it instead of calling methods on a possibly None object;
e.g., at the top of the scope, assign a local variable logger = logger if logger
is not None else a small no-op object implementing the same methods used
(log/info/warn/debug) that do nothing, then replace direct calls with this local
logger so AttributeError cannot occur.
# Fetch all caches
cached_compression_levels = []
for size in compute_compression_levels(20000, len(data_full)):
for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
Bug: wrong Polars collect API and invalid item() usage.
- collect(engine="streaming") is not a valid Polars API. Use collect(streaming=True).
- .select(pl.len()) returns a 1x1 DataFrame; call .to_series(0).item() to extract the count.
- for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
+ total_rows = data_full.select(pl.len()).collect(streaming=True).to_series(0).item()
+ for size in compute_compression_levels(20000, total_rows):
Apply the same fix to all four occurrences.
Also applies to: 46-46, 69-69, 92-92
🤖 Prompt for AI Agents
In src/render/initialize.py around lines 24, 46, 69, and 92, the code
incorrectly uses collect(engine="streaming") and .select(pl.len()).item();
replace those calls with collect(streaming=True) and extract the scalar count
from the single-row DataFrame by calling .select(pl.len()).to_series(0).item()
(or call .select(pl.count()).to_series(0).item() if clearer), i.e., change
collect(...) to use streaming=True and convert the 1x1 DataFrame to a Series
then .item() to get the numeric value for each of the four occurrences.
# Update data with current session state
data = update_data(data, out_components, additional_data, tool)
data = update_data(data, out_components, state, additional_data, tool)
💡 Verification agent
🧩 Analysis chain
API signature change needs verification across codebase.
The function call signature has been updated to pass the full state instead of a filtered version. This is a breaking change for the update_data function.
🏁 Script executed:
#!/bin/bash
# Description: Verify that update_data function signature has been properly updated
# Check the update_data function definition to confirm it accepts the new parameters
rg -A 3 'def update_data' --type py
# Check for any other calls to update_data that might need updating
rg 'update_data\(' --type py -C 3

Length of output: 1090
update_data called with full state but function expects selection_store — fix required
src/render/update.py defines: def update_data(data, out_components, selection_store, additional_data, tool)
src/render/render.py calls: data = update_data(data, out_components, state, additional_data, tool)
- Fix A (preferred): Change the caller to pass the filtered selection_store (extract it from state) instead of the full state (edit src/render/render.py where the call occurs).
- Fix B: Or change src/render/update.py to accept the full state (update the function signature and implementation) and update any callers accordingly.
- Confirm there are no other callers; ripgrep shows the definition in src/render/update.py and this call in src/render/render.py.
🤖 Prompt for AI Agents
In src/render/render.py around line 28, the call passes the entire state to
update_data which expects selection_store; change the caller to extract and pass
selection_store from state (e.g., selection_store = state["selection_store"] or
state.selection_store depending on the state shape) and call update_data(data,
out_components, selection_store, additional_data, tool); then run tests/grep to
confirm no other callers need changes.
hash_funcs = {pl.LazyFrame : lambda x : 1}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
Hash function for LazyFrame caching needs improvement
Using a constant hash value (1) for all LazyFrames defeats the purpose of caching and could lead to incorrect cached results being returned.
Consider implementing a proper hash based on the LazyFrame's query plan or parameters:
-hash_funcs = {pl.LazyFrame : lambda x : 1}
+# Hash based on the LazyFrame's logical plan representation
+hash_funcs = {pl.LazyFrame : lambda x : hash(str(x.explain(optimized=False)))}
Alternatively, if the LazyFrame content is truly dynamic and shouldn't be cached, consider removing the caching decorator for this function.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
hash_funcs = {pl.LazyFrame : lambda x : 1}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
# Hash based on the LazyFrame's logical plan representation
hash_funcs = {pl.LazyFrame: lambda x: hash(str(x.explain(optimized=False)))}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
🧰 Tools
🪛 Ruff (0.12.2)
32-32: Unused lambda argument: x
(ARG005)
🤖 Prompt for AI Agents
In src/render/update.py around lines 32-33, the current hash_funcs uses a
constant value for pl.LazyFrame which breaks caching correctness; replace this
with a stable hash function that derives a deterministic fingerprint from the
LazyFrame's query plan/expressions (e.g., serialize the LazyFrame's query plan
or SQL/AST representation and return a hash of that string) so different query
shapes produce different cache keys, or if the LazyFrame is inherently dynamic
and should not be cached, remove the @st.cache_data decorator from the function
instead.
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data = data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path
Fix LazyFrame sink_parquet return value handling
The sink_parquet method returns None when writing to disk, not the data itself. Line 310 incorrectly assigns this None value to data.
Apply this fix:
# Polars DataFrames and LazyFrames are stored as parquet
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
path = Path(path, f"{name_tag}.pq")
# Convert LazyFrame to DataFrame for writing
if isinstance(data, pl.LazyFrame):
- data = data.sink_parquet(path)
+ data.sink_parquet(path)
else:
data.write_parquet(path)
return path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data = data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path

if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path
🤖 Prompt for AI Agents
In src/workflow/FileManager.py around lines 306 to 313, the code assigns the
result of pl.LazyFrame.sink_parquet to `data` even though sink_parquet returns
None; change the logic to call `data.sink_parquet(path)` without assigning its
return value (i.e., just invoke to write to disk) and then return the `path` as
currently done for the eager DataFrame branch; ensure the eager branch still
calls `data.write_parquet(path)` and both branches return the path.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/workflow/StreamlitUI.py (5)
1056-1066: Tail the selected log and drop “neon” highlighting.
- Full read on every tick is costly; tail with deque.
- Some logs contain non-UTF8; add errors="replace".
- “neon” isn’t a standard lexer; render as plain text.
Apply this diff:
-        with open(log_path, "r", encoding="utf-8") as f:
-            lines = f.readlines()
-        if log_lines_count == "all":
-            display_lines = lines
-        else:
-            display_lines = lines[-st.session_state.log_lines_count:]
+        from collections import deque
+        if log_lines_count == "all":
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
+                # cap to last 50k lines to avoid OOM on huge logs
+                display_lines = list(deque(f, maxlen=50_000))
+        else:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
+                display_lines = list(deque(f, maxlen=st.session_state.log_lines_count))
         code_box = st.container(key='log')
-        code_box.code(
-            "".join(display_lines), language="neon",
-            line_numbers=False
-        )
+        code_box.code("".join(display_lines), language="text", line_numbers=False)
1074-1076: Fix timestamp timezone label.
Hardcoding "CET" is incorrect on non-CET hosts. Use local tz or UTC.
Apply this diff:
-        st.markdown(
-            f"**Workflow log file: {datetime.fromtimestamp(log_path.stat().st_ctime).strftime('%Y-%m-%d %H:%M')} CET**"
-        )
+        ts = datetime.fromtimestamp(log_path.stat().st_mtime).astimezone()
+        st.markdown(f"**Workflow log file: {ts.strftime('%Y-%m-%d %H:%M %Z')}**")
1077-1085: Use not in (E713) and improve error label.
Style fix and clearer status message when finish marker is missing.
Apply this diff:
-            with open(log_path, "r", encoding="utf-8") as f:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
                 content = f.read()
-            # Check if workflow finished successfully
-            if not "WORKFLOW FINISHED" in content:
+            # Check if workflow finished successfully
+            if "WORKFLOW FINISHED" not in content:
                 status.update(
-                    label='Workflow completed.', state='error',
+                    label='Workflow finished with errors.', state='error',
                     expanded=True
                 )
                 error_box.error("**Errors occurred, check log file.**")
1091-1092: Render final log as text, not "neon".
"neon" isn't a recognized lexer; use plain text.
Apply this diff:
-            code_box = st.container(key='log')
-            code_box.code(content, language="neon", line_numbers=False)
+            code_box = st.container(key='log')
+            code_box.code(content, language="text", line_numbers=False)
1038-1049: Precompile regexes for faster repeated matching.
Minor speedup when polling every second.
Example (outside this function):
PCT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
LBL_RE = re.compile(r"Progress of\s+'([^']+)'")
Then use PCT_RE.search(...) and LBL_RE.search(...).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/workflow/StreamlitUI.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/workflow/StreamlitUI.py (1)
src/workflow/CommandExecutor.py (1)
stop(254-268)
🪛 Ruff (0.12.2)
src/workflow/StreamlitUI.py
1080-1080: Test for membership should be not in
Convert to not in
(E713)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-full-app
- GitHub Check: build-openms
log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
log_exists = log_path.exists() and log_path_complete.exists()
Don’t gate log display on “all.log”; split existence checks.
Requiring both the selected log and all.log to exist suppresses output when users pick “minimal”/“commands and run times” and all.log isn’t present yet. Show the selected log whenever it exists; use all.log only for progress parsing.
Apply this diff:
- log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
- log_exists = log_path.exists() and log_path_complete.exists()
+ log_path_complete = Path(self.workflow_dir, "logs", "all.log")
+ selected_log_exists = log_path.exists()
+ progress_log_exists = log_path_complete.exists()
And update the later checks:
- if log_exists:
+ if progress_log_exists:
- elif log_exists and not pid_exists:
+ elif selected_log_exists and not pid_exists:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
log_exists = log_path.exists() and log_path_complete.exists()
log_path_complete = Path(self.workflow_dir, "logs", "all.log")
selected_log_exists = log_path.exists()
progress_log_exists = log_path_complete.exists()
🤖 Prompt for AI Agents
In src/workflow/StreamlitUI.py around lines 1016-1018, the code currently
requires both the selected log and "all.log" to exist before showing logs;
change this so the selected log is shown whenever its own path exists and only
use the "all.log" existence check when doing progress parsing. Concretely:
replace the joint existence boolean with separate checks (e.g., log_exists =
log_path.exists() and all_log_exists = log_path_complete.exists()), use
log_exists to gate display of the selected log, and use all_log_exists where the
code parses or extracts progress from all.log; update any later conditional
branches that relied on the old combined variable to reference the appropriate
new boolean.
with open(log_path_complete, "r", encoding="utf-8") as f:
    for line in reversed(f.readlines()):
        line = line.strip()
        match = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
        if match and percentage < 0:
            percentage = float(match.group(1))/100
        match = re.search(r"Progress of\s+'([^']+)'", line)
        if match:
            label = match.group(1)
            break
        elif "Process finished:" in line:
            break
Avoid full-file reads; tail the log and fix early-break logic.
Reading the entire all.log every second is O(n) RAM/IO and will lag/freeze on large logs. Also, breaking as soon as the label is found can drop a valid percent if it appears earlier.
Apply this diff:
- with open(log_path_complete, "r", encoding="utf-8") as f:
- for line in reversed(f.readlines()):
- line = line.strip()
- match = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
- if match and percentage < 0:
- percentage = float(match.group(1))/100
- match = re.search(r"Progress of\s+'([^']+)'", line)
- if match:
- label = match.group(1)
- break
- elif "Process finished:" in line:
- break
+ from collections import deque
+ with open(log_path_complete, "r", encoding="utf-8", errors="replace") as f:
+ tail = list(deque(f, maxlen=400)) # only keep last ~400 lines
+ for raw in reversed(tail):
+ line = raw.strip()
+ if percentage < 0:
+ m = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
+ if m:
+ percentage = float(m.group(1)) / 100.0
+ if label is None:
+ m = re.search(r"Progress of\s+'([^']+)'", line)
+ if m:
+ label = m.group(1)
+ if (percentage >= 0 or label is not None) and "Process finished:" in line:
+ break
+ if percentage >= 0 and label is not None:
+ break
Note: if you prefer, precompile the regexes once at module scope.
Additional import (outside this hunk):
from collections import deque🤖 Prompt for AI Agents
In src/workflow/StreamlitUI.py around lines 1040 to 1051, change the logic that
currently reads the entire log file into memory and breaks early upon finding a
label so it tails only the last N lines (use collections.deque with maxlen,
e.g., 500) and scans those lines for both percentage and label without stopping
as soon as one is found — precompile the two regexes at module scope (or locally
once) and, when iterating the deque in reverse order, update percentage and
label as matches are found but only break after both have been considered or
when a terminal "Process finished:" line is encountered; this avoids full-file
reads and ensures you don’t miss a percentage that appears earlier than the
label.
Summary by CodeRabbit
New Features
Refactor
Chores