New Features #50
Conversation
This reverts commit 9ee4823.
Important: Review skipped — review was skipped due to path filters.
⛔ Files ignored due to path filters (3)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. You can disable this status message in the CodeRabbit configuration.

Walkthrough
Version bump to 0.9.10, Python env updated to 3.11, polars added. Large migration from pandas to Polars across parsing/rendering/storage, new Polars-aware FileManager, workflow per-file error aggregation and restructured tool invocations, Streamlit UI log/status refactor, and extensive example-data feature/spectrum updates and config removals.

Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant UI as Streamlit UI
participant WM as WorkflowManager
participant W as Workflow
participant FM as FileManager
participant FD as FLASHDeconv
participant FT as FLASHTnT
participant P as Parsers
User->>UI: start workflow
UI->>WM: launch workflow_process()
WM->>W: run(workspace, inputs, params)
loop each input file (try/except)
W->>FD: run(in, out, out_spec*, out_mzml, out_feature*, threads)
W->>FM: store_data(outputs + FD_parameters.json)
alt tag workflow
W->>FT: run(in_deconv, fasta/decoy, out_tag/out_pro/out_prsm)
W->>FM: store_data(FLASHTnT outputs + params)
end
W->>P: parseDeconv (Polars)
W->>P: parseTnT (progress logs)
W-->>W: cleanup temp folder
end
alt any errors
W-->>WM: raise ExceptionGroup(errors)
WM->>UI: show error + traceback
else success
WM->>UI: write "WORKFLOW FINISHED" to all.log
UI->>UI: tail all.log, extract % and update progress/status
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parse/tnt.py (1)
178-195: KDE robustness: handle NaNs and small sample sizes; guard string ops on accession.
Avoids crashes when there are NaNs, a single point, or zero variance.
Apply:
 def fdr_density_distribution(df, logger=None):
     df = df[df['ProteoformLevelQvalue'] > 0]
-    # Find density targets
-    target_qscores = df[~df['accession'].str.startswith('DECOY_')]['ProteoformLevelQvalue'].dropna()
-    if len(target_qscores) > 0:
+    # Find density targets
+    acc = df['accession'].fillna('')
+    target_qscores = df.loc[~acc.str.startswith('DECOY_'), 'ProteoformLevelQvalue'].dropna()
+    if len(target_qscores) >= 2 and target_qscores.std() > 0:
         x_target = np.linspace(target_qscores.min(), target_qscores.max(), 200)
         kde_target = gaussian_kde(target_qscores)
         density_target = pd.DataFrame({'x': x_target, 'y': kde_target(x_target)})
     else:
         density_target = pd.DataFrame(columns=['x', 'y'])
     # Find density decoys (if present)
-    decoy_qscores = df[df['accession'].str.startswith('DECOY_')]['ProteoformLevelQvalue'].dropna()
-    if len(decoy_qscores) > 0:
+    decoy_qscores = df.loc[acc.str.startswith('DECOY_'), 'ProteoformLevelQvalue'].dropna()
+    if len(decoy_qscores) >= 2 and decoy_qscores.std() > 0:
         x_decoy = np.linspace(decoy_qscores.min(), decoy_qscores.max(), 200)
         kde_decoy = gaussian_kde(decoy_qscores)
         density_decoy = pd.DataFrame({'x': x_decoy, 'y': kde_decoy(x_decoy)})
     else:
         density_decoy = pd.DataFrame(columns=['x', 'y'])
🧹 Nitpick comments (15)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.json (1)
1-1: Parameter value includes a literal newline; confirm consumers expect this format.
If the reader splits on newline, fine; otherwise prefer a structured value to avoid parsing ambiguity.
Consider normalizing to an array:
-{"SD:tol": "5.0\n5.0"} +{"SD:tol": ["5.0", "5.0"]}src/parse/tnt.py (1)
85-95: Loop variable shadowing; use a distinct name for residue index.
Prevents accidental reuse of the outer enumerator and improves readability.
Apply:
-    for i, row in protein_df.iterrows():
+    for _, row in protein_df.iterrows():
@@
-        for i in range(len(sequence)):
-            coverage[i] = np.sum(
+        for pos in range(len(sequence)):
+            coverage[pos] = np.sum(
                 (tag_df['ProteinIndex'] == pid) &
-                (tag_df['StartPos'] <= i) &
-                (tag_df['EndPos'] >= i)
+                (tag_df['StartPos'] <= pos) &
+                (tag_df['EndPos'] >= pos)
             )

example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (1)
2-2: Numeric ranges.
Fraction_feature_intensity values reach the 1e9 range. Verify all in-memory and parquet schemas use 64-bit types to avoid overflow when aggregating or downsampling (a quick schema check is sketched below).
Also applies to: 14-14, 21-21, 26-33
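One way to run that verification is to inspect the cached parquet schema and upcast any 32-bit numeric columns before aggregating. This is only a sketch; the file path is illustrative and not a file name taken from this PR:

import polars as pl

# Illustrative path; point this at the actual cached parquet file for the feature table.
lf = pl.scan_parquet("example_fd/toppic_ms2_feature.pq")

schema = lf.collect_schema()
# Upcast 32-bit numeric columns so intensities in the 1e9 range cannot
# overflow or lose precision when summed or downsampled.
upcasts = [
    pl.col(name).cast(pl.Int64) if dtype == pl.Int32 else pl.col(name).cast(pl.Float64)
    for name, dtype in schema.items()
    if dtype in (pl.Int32, pl.Float32)
]
if upcasts:
    lf = lf.with_columns(upcasts)
print(lf.collect_schema())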
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (1)
14-31: msalign size: consider compressing or trimming example blocks.
The file is very large. If CI or app startup times suffer, consider gzipping the example and loading on demand (see the sketch below), or keeping a smaller slice for tests plus a link to the full dataset.
Also applies to: 57-106, 108-151, 316-370
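If the example were shipped gzipped, loading it on demand needs only the standard library. A minimal sketch, assuming a hypothetical .gz copy of the example file (the path is not part of this PR):

import gzip
from pathlib import Path

def read_example_msalign(path: str) -> str:
    """Read an example msalign file, transparently handling a gzipped copy."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt", encoding="utf-8") as f:
            return f.read()
    return p.read_text(encoding="utf-8")

# Hypothetical compressed example stored next to the original file name.
text = read_example_msalign("example_fd/toppic_ms2.msalign.gz")
print(text.count("BEGIN IONS"), "spectra loaded")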
src/workflow/WorkflowManager.py (2)
59-62: Don't dump full tracebacks into minimal logs.
Logger.log(level) writes to multiple files; with default level=0, the full traceback goes into minimal.log and others. Log the message at minimal level and the traceback at verbose level to keep minimal.log concise.
-        except Exception as e:
-            self.logger.log(f"ERROR: {e}")
-            self.logger.log("".join(traceback.format_exception(e)))
+        except Exception as e:
+            # short message for minimal/commands logs
+            self.logger.log(f"ERROR: {e}", level=0)
+            # full traceback only in all.log
+            self.logger.log("".join(traceback.format_exception(e)), level=2)
51-62: Version-robust traceback formatting.
traceback.format_exception(e) with a single exception argument requires Python 3.10+. If any runtime uses an older interpreter, prefer TracebackException to be safe.
-            self.logger.log("".join(traceback.format_exception(e)), level=2)
+            tb = "".join(traceback.TracebackException.from_exception(e).format())
+            self.logger.log(tb, level=2)

src/render/initialize.py (1)
17-31: DRY the heatmap loading/compression boilerplate.
The four branches are identical except for key prefixes. Factor into a small helper to reduce maintenance and accidental API drift.
+def _load_heatmap_with_levels(file_manager, dataset_id, base_key: str):
+    lf = file_manager.get_results(dataset_id, [base_key], use_polars=True)[base_key]
+    total = lf.select(pl.len()).collect(streaming=True).to_series(0).item()
+    levels = []
+    for size in compute_compression_levels(20000, total):
+        levels.append(file_manager.get_results(dataset_id, [f"{base_key}_{size}"], use_polars=True)[f"{base_key}_{size}"])
+    levels.append(lf)
+    return levels
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms1_deconv_heatmap'], use_polars=True
-    )['ms1_deconv_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms1_deconv_heatmap_{size}'], use_polars=True
-            )[f'ms1_deconv_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms1_deconv_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms2_deconv_heatmap'], use_polars=True
-    )['ms2_deconv_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms2_deconv_heatmap_{size}'], use_polars=True
-            )[f'ms2_deconv_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms2_deconv_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms1_raw_heatmap'], use_polars=True
-    )['ms1_raw_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms1_raw_heatmap_{size}'], use_polars=True
-            )[f'ms1_raw_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms1_raw_heatmap")
@@
-    data_full = file_manager.get_results(
-        selected_data, ['ms2_raw_heatmap'], use_polars=True
-    )['ms2_raw_heatmap']
-    cached_compression_levels = []
-    for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
-        cached_compression_levels.append(
-            file_manager.get_results(
-                selected_data, [f'ms2_raw_heatmap_{size}'], use_polars=True
-            )[f'ms2_raw_heatmap_{size}']
-        )
-    cached_compression_levels.append(data_full)
+    cached_compression_levels = _load_heatmap_with_levels(file_manager, selected_data, "ms2_raw_heatmap")
Also applies to: 39-53, 62-76, 85-99
src/parse/masstable.py (1)
207-209: Consider using native Polars array conversion for better performance.
While map_elements works, for numpy array conversion you might get better performance with native Polars operations if the arrays are consistently structured.
-    pl.col("mzarray").map_elements(lambda x: x.tolist() if hasattr(x, 'tolist') else list(x), return_dtype=pl.List(pl.Float32)),
-    pl.col("intarray").map_elements(lambda x: x.tolist() if hasattr(x, 'tolist') else list(x), return_dtype=pl.List(pl.Float32))
+    pl.col("mzarray").list.eval(pl.element()).cast(pl.List(pl.Float32)),
+    pl.col("intarray").list.eval(pl.element()).cast(pl.List(pl.Float32))
However, if the current implementation handles edge cases better (e.g., mixed types), keep it as is.
src/render/compression.py (1)
19-19: Remove unused logger parameter or implement logging.
The logger parameter is not used in the function. Either remove it or add logging statements for debugging/monitoring purposes.
Would you like me to suggest where logging statements would be most valuable in this function, such as before/after the binning operation or when the iterative selection completes?
src/workflow/StreamlitUI.py (1)
1081-1081: Fix membership test syntax.
Use the more Pythonic not in syntax instead of not ... in.
-            if not "WORKFLOW FINISHED" in content:
+            if "WORKFLOW FINISHED" not in content:

src/parse/deconv.py (1)
51-51: Prefer streaming collect for counts (avoid materializing the LazyFrame)
src/parse/deconv.py:51 — Replace
relevant_heatmap_lazy.select(pl.len()).collect().item()
with the project's streaming-aggregate pattern (e.g. .select(pl.len()).collect(streaming=True).item() or .select(pl.count()).collect(streaming=True).item() / .collect(engine="streaming")) to avoid materializing the full LazyFrame and match render/initialize.py and render/update.py.src/workflow/FileManager.py (1)
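As a small illustration of that pattern — the LazyFrame below is a stand-in built in memory, whereas the real one comes from pl.scan_parquet, and depending on the installed polars release the flag is spelled streaming=True or engine="streaming":

import polars as pl

# Stand-in for relevant_heatmap_lazy; in the app this is a pl.scan_parquet(...) LazyFrame.
relevant_heatmap_lazy = pl.LazyFrame({"mz": [500.1, 612.4, 733.9], "rt": [10.2, 11.5, 12.8]})

# Streaming aggregate: only the 1x1 count result is materialized, not the full frame.
n_rows = relevant_heatmap_lazy.select(pl.len()).collect(streaming=True).item()
print(n_rows)  # 3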
468-470: Consider returning collected DataFrames for consistency
When use_polars=True, the function returns a LazyFrame from pl.scan_parquet, while other paths return materialized DataFrames. This inconsistency could confuse callers who might expect immediate data access.
Consider collecting the LazyFrame before returning, or document this behavior clearly in the docstring:
         elif use_polars:
             # Load as polars DataFrame
-            data = pl.scan_parquet(file_path)
+            data = pl.scan_parquet(file_path).collect()
Alternatively, if lazy evaluation is intentional for performance, update the docstring to clarify that use_polars=True returns a LazyFrame.

src/Workflow.py (2)
116-132: Remove unnecessary f-string prefixes
These strings don't contain any placeholders, so the f prefix is unnecessary.
-out_tsv = join(folder_path, f'out.tsv')
-out_deconv = join(folder_path, f'out_deconv.mzML')
-out_anno = join(folder_path, f'anno_annotated.mzML')
-out_spec1 = join(folder_path, f'spec1.tsv')
-out_spec2 = join(folder_path, f'spec2.tsv')
-out_spec3 = join(folder_path, f'spec3.tsv')
-out_spec4 = join(folder_path, f'spec4.tsv')
-out_quant = join(folder_path, f'quant.tsv')
-out_msalign1 = join(folder_path, f'toppic_ms1.msalign')
-out_msalign2 = join(folder_path, f'toppic_ms2.msalign')
-out_feature1 = join(folder_path, f'toppic_ms1.feature')
-out_feature2 = join(folder_path, f'toppic_ms2.feature')
-out_prsm = join(folder_path, f'prsms.tsv')
-out_db = join(folder_path, f'database.fasta')
-out_tag = join(folder_path, f'tags.tsv')
-out_protein = join(folder_path, f'protein.tsv')
+out_tsv = join(folder_path, 'out.tsv')
+out_deconv = join(folder_path, 'out_deconv.mzML')
+out_anno = join(folder_path, 'anno_annotated.mzML')
+out_spec1 = join(folder_path, 'spec1.tsv')
+out_spec2 = join(folder_path, 'spec2.tsv')
+out_spec3 = join(folder_path, 'spec3.tsv')
+out_spec4 = join(folder_path, 'spec4.tsv')
+out_quant = join(folder_path, 'quant.tsv')
+out_msalign1 = join(folder_path, 'toppic_ms1.msalign')
+out_msalign2 = join(folder_path, 'toppic_ms2.msalign')
+out_feature1 = join(folder_path, 'toppic_ms1.feature')
+out_feature2 = join(folder_path, 'toppic_ms2.feature')
+out_prsm = join(folder_path, 'prsms.tsv')
+out_db = join(folder_path, 'database.fasta')
+out_tag = join(folder_path, 'tags.tsv')
+out_protein = join(folder_path, 'protein.tsv')
336-347: Remove unnecessary f-string prefixes (DeconvWorkflow)
Same issue as in TagWorkflow - these strings don't need the f prefix.
-out_tsv = join(folder_path, f'out.tsv')
-out_deconv = join(folder_path, f'out_deconv.mzML')
-out_anno = join(folder_path, f'anno_annotated.mzML')
-out_spec1 = join(folder_path, f'spec1.tsv')
-out_spec2 = join(folder_path, f'spec2.tsv')
-out_spec3 = join(folder_path, f'spec3.tsv')
-out_spec4 = join(folder_path, f'spec4.tsv')
-out_quant = join(folder_path, f'quant.tsv')
-out_msalign1 = join(folder_path, f'toppic_ms1.msalign')
-out_msalign2 = join(folder_path, f'toppic_ms2.msalign')
-out_feature1 = join(folder_path, f'toppic_ms1.feature')
-out_feature2 = join(folder_path, f'toppic_ms2.feature')
+out_tsv = join(folder_path, 'out.tsv')
+out_deconv = join(folder_path, 'out_deconv.mzML')
+out_anno = join(folder_path, 'anno_annotated.mzML')
+out_spec1 = join(folder_path, 'spec1.tsv')
+out_spec2 = join(folder_path, 'spec2.tsv')
+out_spec3 = join(folder_path, 'spec3.tsv')
+out_spec4 = join(folder_path, 'spec4.tsv')
+out_quant = join(folder_path, 'quant.tsv')
+out_msalign1 = join(folder_path, 'toppic_ms1.msalign')
+out_msalign2 = join(folder_path, 'toppic_ms2.msalign')
+out_feature1 = join(folder_path, 'toppic_ms1.feature')
+out_feature2 = join(folder_path, 'toppic_ms2.feature')

src/render/update.py (1)
133-160: Consider extracting default selection values
The default selection values are duplicated. Consider extracting them to reduce duplication.
+DEFAULT_HEATMAP_SELECTION = {
+    'xRange': [-1, -1],
+    'yRange': [-1, -1]
+}
+
 elif (component in ['Deconvolved MS1 Heatmap', 'Deconvolved MS2 Heatmap']):
     selection = 'heatmap_deconv' if '1' in component else 'heatmap_deconv2'
     if selection not in selection_store:
-        selected_data = {
-            'xRange' : [-1, -1],
-            'yRange' : [-1, -1]
-        }
+        selected_data = DEFAULT_HEATMAP_SELECTION
     else:
         selected_data = selection_store[selection]
     data['deconv_heatmap_df'] = render_heatmap(
         additional_data['deconv_heatmap_df'], selected_data,
         additional_data['dataset'], component
     )
 elif (component in ['Raw MS1 Heatmap', 'Raw MS2 Heatmap']):
     selection = 'heatmap_raw' if '1' in component else 'heatmap_raw2'
     if selection not in selection_store:
-        selected_data = {
-            'xRange' : [-1, -1],
-            'yRange' : [-1, -1]
-        }
+        selected_data = DEFAULT_HEATMAP_SELECTION
     else:
         selected_data = selection_store[selection]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
example-data/workspaces/default/flashdeconv/cache/cache.db is excluded by !**/*.db
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.pkl.gz is excluded by !**/*.gz
example-data/workspaces/default/flashdeconv/cache/files/example_fd/out.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/quant.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/spec1.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd/spec2.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.pkl.gz is excluded by !**/*.gz
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/out.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/quant.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/spec1.tsv is excluded by !**/*.tsv
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/spec2.tsv is excluded by !**/*.tsv
📒 Files selected for processing (25)
.github/workflows/build-windows-executable-app.yaml (1 hunks)
Dockerfile (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/FD_parameters.json (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature (1 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (4 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (105 hunks)
example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.json (0 hunks)
example-data/workspaces/default/flashdeconv/ini/FLASHDeconv.ini (0 hunks)
example-data/workspaces/default/flashdeconv/params.json (0 hunks)
example-data/workspaces/default/flashtnt/ini/FLASHDeconv.ini (0 hunks)
example-data/workspaces/default/flashtnt/ini/FLASHTnT.ini (0 hunks)
example-data/workspaces/default/flashtnt/params.json (0 hunks)
requirements.txt (1 hunks)
settings.json (1 hunks)
src/Workflow.py (2 hunks)
src/parse/deconv.py (3 hunks)
src/parse/masstable.py (2 hunks)
src/parse/tnt.py (5 hunks)
src/render/compression.py (3 hunks)
src/render/initialize.py (5 hunks)
src/render/render.py (1 hunks)
src/render/update.py (6 hunks)
src/workflow/FileManager.py (7 hunks)
src/workflow/StreamlitUI.py (2 hunks)
src/workflow/WorkflowManager.py (2 hunks)
💤 Files with no reviewable changes (6)
- example-data/workspaces/default/flashdeconv/params.json
- example-data/workspaces/default/flashtnt/ini/FLASHTnT.ini
- example-data/workspaces/default/flashdeconv/cache/files/example_fd_20250525-204033/FD_parameters.json
- example-data/workspaces/default/flashtnt/params.json
- example-data/workspaces/default/flashdeconv/ini/FLASHDeconv.ini
- example-data/workspaces/default/flashtnt/ini/FLASHDeconv.ini
🧰 Additional context used
🧬 Code graph analysis (8)
src/parse/deconv.py (4)
src/workflow/Logger.py (1)
log (16-42)
src/parse/masstable.py (2)
parseFLASHDeconvOutput (9-182)
getMSSignalDF (197-243)
src/workflow/FileManager.py (2)
store_data (327-345)
get_results (428-478)
src/render/compression.py (2)
compute_compression_levels (8-16)
downsample_heatmap (19-104)
src/parse/tnt.py (4)
src/workflow/Logger.py (1)
log (16-42)
src/parse/masstable.py (2)
parseFLASHDeconvOutput (9-182)
parseFLASHTaggerOutput (185-187)
src/parse/deconv.py (1)
fdr_density_distribution (218-235)
src/workflow/FileManager.py (1)
store_data(327-345)
src/workflow/WorkflowManager.py (1)
src/workflow/Logger.py (1)
log(16-42)
src/workflow/StreamlitUI.py (1)
src/workflow/CommandExecutor.py (1)
stop(254-268)
src/render/initialize.py (2)
src/render/compression.py (1)
compute_compression_levels (8-16)
src/workflow/FileManager.py (1)
get_results(428-478)
src/render/update.py (1)
src/render/compression.py (1)
downsample_heatmap(19-104)
src/Workflow.py (6)
src/workflow/Logger.py (1)
log (16-42)
src/workflow/ParameterManager.py (1)
get_parameters_from_json (82-101)
src/workflow/CommandExecutor.py (1)
run_topp (160-252)
src/workflow/FileManager.py (4)
store_file (347-386)
store_data (327-345)
get_results (428-478)
result_exists (480-498)
src/parse/deconv.py (1)
parseDeconv (9-215)
src/parse/tnt.py (1)
parseTnT(16-174)
src/render/render.py (1)
src/render/update.py (2)
update_data (85-99)
filter_data (102-184)
🪛 Ruff (0.12.2)
src/workflow/StreamlitUI.py
1081-1081: Test for membership should be not in
Convert to not in
(E713)
src/render/compression.py
19-19: Unused function argument: logger
(ARG001)
34-34: Avoid specifying long messages outside the exception class
(TRY003)
src/render/update.py
32-32: Unused lambda argument: x
(ARG005)
85-85: Unused function argument: additional_data
(ARG001)
src/Workflow.py
116-116: f-string without any placeholders
Remove extraneous f prefix
(F541)
117-117: f-string without any placeholders
Remove extraneous f prefix
(F541)
118-118: f-string without any placeholders
Remove extraneous f prefix
(F541)
119-119: f-string without any placeholders
Remove extraneous f prefix
(F541)
120-120: f-string without any placeholders
Remove extraneous f prefix
(F541)
121-121: f-string without any placeholders
Remove extraneous f prefix
(F541)
122-122: f-string without any placeholders
Remove extraneous f prefix
(F541)
123-123: f-string without any placeholders
Remove extraneous f prefix
(F541)
124-124: f-string without any placeholders
Remove extraneous f prefix
(F541)
125-125: f-string without any placeholders
Remove extraneous f prefix
(F541)
126-126: f-string without any placeholders
Remove extraneous f prefix
(F541)
127-127: f-string without any placeholders
Remove extraneous f prefix
(F541)
129-129: f-string without any placeholders
Remove extraneous f prefix
(F541)
130-130: f-string without any placeholders
Remove extraneous f prefix
(F541)
131-131: f-string without any placeholders
Remove extraneous f prefix
(F541)
132-132: f-string without any placeholders
Remove extraneous f prefix
(F541)
162-162: f-string without any placeholders
Remove extraneous f prefix
(F541)
187-187: f-string without any placeholders
Remove extraneous f prefix
(F541)
204-204: f-string without any placeholders
Remove extraneous f prefix
(F541)
266-266: Do not catch blind exception: Exception
(BLE001)
269-269: Avoid specifying long messages outside the exception class
(TRY003)
269-269: Undefined name ExceptionGroup. Consider specifying requires-python = ">= 3.11" or tool.ruff.target-version = "py311" in your pyproject.toml file.
(F821)
336-336: f-string without any placeholders
Remove extraneous f prefix
(F541)
337-337: f-string without any placeholders
Remove extraneous f prefix
(F541)
338-338: f-string without any placeholders
Remove extraneous f prefix
(F541)
339-339: f-string without any placeholders
Remove extraneous f prefix
(F541)
340-340: f-string without any placeholders
Remove extraneous f prefix
(F541)
341-341: f-string without any placeholders
Remove extraneous f prefix
(F541)
342-342: f-string without any placeholders
Remove extraneous f prefix
(F541)
343-343: f-string without any placeholders
Remove extraneous f prefix
(F541)
344-344: f-string without any placeholders
Remove extraneous f prefix
(F541)
345-345: f-string without any placeholders
Remove extraneous f prefix
(F541)
346-346: f-string without any placeholders
Remove extraneous f prefix
(F541)
347-347: f-string without any placeholders
Remove extraneous f prefix
(F541)
349-349: f-string without any placeholders
Remove extraneous f prefix
(F541)
374-374: f-string without any placeholders
Remove extraneous f prefix
(F541)
417-417: Do not catch blind exception: Exception
(BLE001)
420-420: Avoid specifying long messages outside the exception class
(TRY003)
420-420: Undefined name ExceptionGroup. Consider specifying requires-python = ">= 3.11" or tool.ruff.target-version = "py311" in your pyproject.toml file.
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-openms
- GitHub Check: build-vue-js-component
🔇 Additional comments (32)
.github/workflows/build-windows-executable-app.yaml (1)
17-17: APP_NAME bumped to 0.9.10 — OK.
Verified: the "Set Version in settings.json" step sets settings.json.version from github.event.release.tag_name (.github/workflows/build-windows-executable-app.yaml, lines 228–236). Ensure the GitHub Release tag exactly matches FLASHApp-0.9.10 (including any "v" prefix convention) to avoid UI/version drift between release and PR artifacts.
settings.json (1)
4-4: Approve — version bumped to 0.9.10; no remaining 0.9.9 references found.
Ran the rg you provided: settings.json line 4 is "version": "0.9.10". No matches for "0.9.9" or "FLASHApp-0.9.9" were found; remaining "version:" hits are in built/dist JS assets (expected).
src/parse/tnt.py (1)
20-23: Unpack count mismatch from parseFLASHDeconvOutput (throws ValueError).
parseFLASHDeconvOutput returns 5 values; the code expects 6.
Apply:
-        deconv_df, _, tolerance, _, _, = parseFLASHDeconvOutput(
+        deconv_df, _, tolerance, _, _ = parseFLASHDeconvOutput(
             anno_mzML, deconv_mzML
         )
Likely an incorrect or invalid review comment.
Dockerfile (1)
76-76: Python version mismatch — align runtime and validate scientific-stack builds
- Observed: requirements.txt was autogenerated with Python 3.12 (requirements.txt:2); Dockerfile creates env with python=3.11 (Dockerfile:76); workflows pin PYTHON_VERSION 3.11.0 and 3.11.9 (.github/workflows/build-windows-executable-app.yaml:15, .github/workflows/test-win-exe-w-embed-py.yaml:12). polars is present (requirements.txt:141).
- Action: choose a single Python runtime (3.11 or 3.12) and align it across requirements, Dockerfile, and CI — either regenerate requirements.txt under the chosen Python or update Dockerfile/CI to the version used to compile requirements.
- Action: after alignment, run a full build/test to confirm pyOpenMS + Cython + numpy (e.g. 1.26.4) + pandas + polars build successfully on the selected Python and that required binary wheels are available.
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature (1)
102-187: MS1 feature rows updated — verify parser compatibility and refresh caches
Header detected and tab-delimited; file example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature has 187 lines (1 header + 186 rows). Columns appear to match parser: File_name, Fraction_ID, Feature_ID, Mass, Intensity, Min_time, Max_time, Min_scan, Max_scan, Min_charge, Max_charge, Apex_time, Apex_scan, Apex_intensity, Rep_charge, Rep_average_mz, Envelope_num, EC_score.
- Field-count consistency check aborted in the last run (inline script hit a Python syntax error) — re-run a simple check to ensure every row has the same number of tab-separated fields.
- Confirm Mass/Intensity units expected by downstream parsers (e.g., m/z vs Da; raw vs normalized intensity) and regenerate any cached indices/hashes used by examples/tests.
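A minimal stand-alone version of the field-count check mentioned in the first bullet could look like this (plain Python, using the example path from this PR):

from pathlib import Path

path = Path("example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms1.feature")

with path.open(encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]

header, data = rows[0], rows[1:]
# Every data row should carry exactly as many tab-separated fields as the header.
bad = [(i + 2, len(r)) for i, r in enumerate(data) if len(r) != len(header)]
print(f"{len(data)} rows checked, {len(bad)} with unexpected field counts")
for line_no, n_fields in bad[:10]:
    print(f"  line {line_no}: {n_fields} fields (expected {len(header)})")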
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.feature (3)
2-2: Path consistency check.
The File_name column uses a relative path with "../workspaces-FLASHApp/...". Ensure all downstream loaders resolve these paths the same way as msalign FILE_NAME entries to avoid cache misses.
2-2: Column semantics sanity.
Double-check that MS_one_ID/MS_one_scans correspond to the MS1 spectrum referenced in msalign's MS_ONE_ID/MS_ONE_SCAN for the same SCANS, and that Fraction_feature_min/max/apex_time envelopes still cover the referenced MS2 RTs.
Also applies to: 5-5, 8-12, 17-18, 20-22
4-4: PRECURSOR_FEATURE_ID referential integrity verified — no missing IDs.
Cross-check: Total spectra = 167; Unique PRECURSOR_FEATURE_IDs = 109; Missing = 0.

example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign (2)
16-18: TITLE rename and header changes — confirm parsers.
TITLE now uses "Scan_" (was "DScan_" per summary). Ensure all parsers/UI components key off SCANS/SPECTRUM_ID rather than string-matching TITLE, or update any TITLE-based logic.
Also applies to: 25-31
209-219: Changed MS_ONE_ID/MS_ONE_SCAN — validate MS1 linkage.
Here MS_ONE_ID/SCAN point to 3101/3102 unlike adjacent blocks. Confirm this is intentional and that upstream indexing aligns with the MS1 mzML.
src/workflow/WorkflowManager.py (1)
9-9: Good addition: traceback for richer diagnostics.
Importing traceback is appropriate for detailed failure reports.
src/render/initialize.py (1)
1-1: Import Polars — LGTM.
Switching initialization to Polars aligns with the rest of the PR.
requirements.txt (2)
53-61: Run import compatibility test in CI (numpy/pyarrow/pandas/polars/streamlit)
Local verification failed with: ModuleNotFoundError: No module named 'numpy'. Run the test in an environment with project deps installed (e.g., after pip install -r requirements.txt) and paste the output.
File: requirements.txt Lines: 53-61
numpy==1.26.4
    # via
    #   contourpy
    #   matplotlib
    #   pandas
    #   pydeck
    #   pyopenms
    #   src (pyproject.toml)
    #   streamlit
Run:
#!/usr/bin/env bash
pip install -r requirements.txt
python - <<'PY'
import numpy, pyarrow, pandas, streamlit
try:
    import polars as pl
    lf = pl.LazyFrame({"x":[1,2,3]}).select(pl.len()).collect(streaming=True)
    print("Polars OK:", lf.to_series(0).item())
except Exception as e:
    raise SystemExit(f"Polars check failed: {e}")
print("Imports OK")
PY
Also applies to: 84-89, 98-106, 114-121, 133-137
141-142: Pin polars to a major range and recompile using the runtime Python
- requirements.txt currently lists polars>=1.0.0 (requirements.txt ≈ line 141). Pin to avoid breaking changes: polars>=1.0,<2.0.
- The requirements header says pip-compile was run with Python 3.12, but the environment here is Python 3.11.2 and pip-compile is not available, so re-run pip-compile inside the actual runtime (Docker/CI image using the runtime Python) and commit the regenerated requirements.txt to avoid solver drift.
-polars>=1.0.0
+polars>=1.0,<2.0

src/render/render.py (1)
31-32: Consistent API change for filter_data function.
The filter_data function signature has also been updated to receive the full state object, maintaining consistency with the update_data change.

src/parse/masstable.py (2)
3-3: Good adoption of Polars for data processing efficiency.
The migration to Polars is a solid architectural choice for handling large datasets efficiently, particularly with its lazy evaluation capabilities.
198-243: Efficient Polars pipeline implementation with proper lazy evaluation.
The implementation correctly leverages Polars' lazy evaluation and efficient operations. The pipeline is well-structured with appropriate transformations. A few observations:
- The use of map_elements for converting numpy arrays is appropriate
- Window functions for the mass_idx calculation are efficient
- Proper filtering of null and non-positive intensities
- Returns a LazyFrame for downstream efficiency
src/render/compression.py (4)
19-32: Well-documented function with clear parameter descriptions.
The docstring properly documents the expected input types and parameters, which is essential for a function that handles multiple data formats.
33-35: Good validation of bin constraints.
The check ensures that the number of bins doesn't exceed the maximum datapoints, preventing unnecessary computation.
53-53: Strategic use of streaming engine for memory efficiency.
Using collect(engine="streaming") is appropriate here as it enables processing of large datasets that might not fit in memory all at once.
76-77: Correct adjustment for scipy's 1-based indexing.
The conversion from scipy's 1-based to Polars' 0-based indexing is properly handled with clear documentation.
src/parse/deconv.py (4)
13-15: Good addition of progress logging for user feedback.
Adding progress logging at the start of processing improves user experience by providing immediate feedback.
22-29: Smart memory management with immediate conversion to Polars.
Deleting the original DataFrames after storing and immediately reloading as Polars LazyFrames is an excellent pattern for reducing memory footprint.
48-48: Appropriate use of streaming collection for large datasets.
The pattern of collect with streaming -> lazy is correct for ensuring data materialization while maintaining memory efficiency.
76-91: Clean Polars implementation for scan_table.
The use of native Polars operations with proper column selection and aliasing is well-structured. The with_row_index and with_columns pattern is idiomatic Polars.

src/workflow/StreamlitUI.py (4)
1017-1018: Good change to use complete log for progress tracking.
Using the 'all.log' file ensures that progress information is always available regardless of the user's selected log level.
1039-1054: Robust progress parsing with regex patterns.
The regex-based parsing for percentage and progress labels is well-implemented with proper fallback handling.
1026-1028: Improved UX with shorter spinner delay.
Reducing the spinner delay from the typical default to 1 second provides faster feedback to users.
1030-1094: Well-structured status management with proper error handling.
The new status-based UI with separate handling for running, complete, and error states provides clear visual feedback. The error detection based on log content is a good safety check.
src/Workflow.py (2)
100-269: Good error aggregation pattern!
The implementation of per-file error handling with ExceptionGroup aggregation is excellent. This ensures all files are attempted before reporting failures, improving user experience and debugging.
269-269: Require Python >= 3.11 or avoid ExceptionGroup
ExceptionGroup is only available on Python 3.11+; repository has no pyproject.toml or setup.py declaring python_requires (requirements.txt header shows pip-compile used Python 3.12 but does not set runtime). Add python_requires = ">=3.11" to your packaging config or replace ExceptionGroup with a compatibility fallback. Location: src/Workflow.py:269
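One possible shape for such a fallback, sketched here rather than taken from the project's code (the exceptiongroup backport package is an assumed extra dependency for pre-3.11 runtimes, and process() is a stand-in for the per-file workflow step):

import sys

if sys.version_info < (3, 11):
    # Backport of PEP 654 ExceptionGroup for older interpreters (assumed dependency).
    from exceptiongroup import ExceptionGroup  # type: ignore[import-not-found]

def process(path: str) -> None:
    # Stand-in for running FLASHDeconv/FLASHTnT and parsing one input file.
    if "bad" in path:
        raise ValueError(f"failed to process {path}")

errors: list[Exception] = []
for input_file in ["sample_good.mzML", "sample_bad.mzML"]:  # illustrative inputs
    try:
        process(input_file)
    except Exception as e:  # collect and continue with the remaining files
        errors.append(e)

if errors:
    raise ExceptionGroup("Errors occurred while processing input files", errors)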
src/render/update.py (1)
41-72: Excellent Polars streaming implementation!
The migration to Polars with streaming collection and intelligent downsampling based on data size is well-implemented. The early size estimation using limit(20000) before deciding whether to downsample is a smart optimization.
PRECURSOR_CHARGE=7
PRECURSOR_MASS=9241.289441
PRECURSOR_MASS=9242.293272
PRECURSOR_INTENSITY=4475717.00
PRECURSOR_FEATURE_ID=101
PRECURSOR_QSCORE=0.93
PRECURSOR_FEATURE_ID=11
PRECURSOR_QSCORE=1.00
1370.253350	3304.76	1
💡 Verification agent
🧩 Analysis chain
Mass/charge consistency spot‑check.
PRECURSOR_MASS should match (mz − 1.007276466812) × z (average vs mono as defined). Run a quick validation to catch data entry/calibration slips across edited blocks.
Also applies to: 218-220, 257-260, 1685-1690, 7217-7224
🏁 Script executed:
#!/usr/bin/env bash
python - <<'PY'
import re, math, pathlib
p=pathlib.Path("example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign")
prot=1.007276466812
bad=[]
with p.open("r", encoding="utf-8") as f:
    cur=None; vals={}
    for ln,line in enumerate(f,1):
        if line.startswith("SPECTRUM_ID="): cur=int(line.split("=",1)[1])
        for k in ("PRECURSOR_MZ","PRECURSOR_CHARGE","PRECURSOR_MASS"):
            if line.startswith(k+"="): vals[k]=float(line.split("=",1)[1])
        if line.startswith("END IONS"):
            mz, z, mass = vals.get("PRECURSOR_MZ"), vals.get("PRECURSOR_CHARGE"), vals.get("PRECURSOR_MASS")
            if mz and z and mass:
                calc=(mz-prot)*z
                if abs(calc-mass) > 0.01: bad.append((cur, mass, calc))
            vals={}
print(f"Spectra checked: {cur}")
print(f"Mass mismatches > 0.01 Da: {len(bad)}")
for x in bad[:20]: print(x)
PY

Length of output: 829
Critical: PRECURSOR_MASS inconsistent with PRECURSOR_MZ/CHARGE — 157 mismatches
File: example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign — scanned 3,411 spectra; 157 entries differ from (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE by >0.01 Da.
Examples (SPECTRUM_ID, reported_mass, calculated_mass):
- (3099, 9399.279607, 9406.299892732315)
- (3100, 9242.293272, 9268.292568732315)
- (3103, 3086.435554, 3095.563618599564)
- (3104, 9046.278856, 9065.350964265503)
- (3106, 9203.327291, 9229.320692265503)
Recompute PRECURSOR_MASS = (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE for all spectra, or correct the mass-type/calibration used when writing PRECURSOR_MASS before downstream processing.
🤖 Prompt for AI Agents
In
example-data/workspaces/default/flashdeconv/cache/files/example_fd/toppic_ms2.msalign
around lines 121–126, many PRECURSOR_MASS values are inconsistent with
PRECURSOR_MZ and PRECURSOR_CHARGE; recompute PRECURSOR_MASS for every spectrum
using PRECURSOR_MASS = (PRECURSOR_MZ − 1.007276466812) × PRECURSOR_CHARGE and
replace the written value (match the file's numeric precision), or if a
different mass calibration/type was intended, ensure the writer records that
mass-type explicitly and uses the correct calibration before downstream
processing; run a post-write validation to assert all entries differ by ≤0.01 Da
and fix any remaining mismatches.
| logger.log("Progress of 'processing FLASHTnT results':", level=2) | ||
| logger.log("0.0 %", level=2) |
Guard logger usage to avoid AttributeError when logger is None.
Create a local no‑op logger and replace direct calls.
Apply:
def parseTnT(file_manager, dataset_id, deconv_mzML, anno_mzML, tag_tsv, protein_tsv, logger=None):
- logger.log("Progress of 'processing FLASHTnT results':", level=2)
- logger.log("0.0 %", level=2)
+ log = (logger.log if logger is not None else (lambda *a, **k: None))
+ log("Progress of 'processing FLASHTnT results':", level=2)
+ log("0.0 %", level=2)
@@
- logger.log("10.0 %", level=2)
+ log("10.0 %", level=2)
@@
- logger.log("30.0 %", level=2)
+ log("30.0 %", level=2)
@@
- logger.log("50.0 %", level=2)
+ log("50.0 %", level=2)
@@
- logger.log("70.0 %", level=2)
+ log("70.0 %", level=2)
@@
- logger.log("90.0 %", level=2)
+ log("90.0 %", level=2)
@@
- logger.log("100.0 %", level=2)
+ log("100.0 %", level=2)Also applies to: 24-24, 40-40, 79-79, 151-151, 169-169, 174-174
🤖 Prompt for AI Agents
In src/parse/tnt.py around lines 17-18 (and also apply same change to lines 24,
40, 79, 151, 169, 174), guard against logger being None by creating a local
no-op logger and using it instead of calling methods on a possibly None object;
e.g., at the top of the scope, assign a local variable logger = logger if logger
is not None else a small no-op object implementing the same methods used
(log/info/warn/debug) that do nothing, then replace direct calls with this local
logger so AttributeError cannot occur.
# Fetch all caches
cached_compression_levels = []
for size in compute_compression_levels(20000, len(data_full)):
for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
Bug: wrong Polars collect API and invalid item() usage.
- collect(engine="streaming") is not a valid Polars API. Use collect(streaming=True).
- .select(pl.len()) returns a 1x1 DataFrame; call .to_series(0).item() to extract the count.
- for size in compute_compression_levels(20000, data_full.select(pl.len()).collect(engine="streaming").item()):
+ total_rows = data_full.select(pl.len()).collect(streaming=True).to_series(0).item()
+ for size in compute_compression_levels(20000, total_rows):
Apply the same fix to all four occurrences.
Also applies to: 46-46, 69-69, 92-92
🤖 Prompt for AI Agents
In src/render/initialize.py around lines 24, 46, 69, and 92, the code
incorrectly uses collect(engine="streaming") and .select(pl.len()).item();
replace those calls with collect(streaming=True) and extract the scalar count
from the single-row DataFrame by calling .select(pl.len()).to_series(0).item()
(or call .select(pl.count()).to_series(0).item() if clearer), i.e., change
collect(...) to use streaming=True and convert the 1x1 DataFrame to a Series
then .item() to get the numeric value for each of the four occurrences.
# Update data with current session state
data = update_data(data, out_components, additional_data, tool)
data = update_data(data, out_components, state, additional_data, tool)
💡 Verification agent
🧩 Analysis chain
API signature change needs verification across codebase.
The function call signature has been updated to pass the full state instead of a filtered version. This is a breaking change for the update_data function.
🏁 Script executed:
#!/bin/bash
# Description: Verify that update_data function signature has been properly updated
# Check the update_data function definition to confirm it accepts the new parameters
rg -A 3 'def update_data' --type py
# Check for any other calls to update_data that might need updating
rg 'update_data\(' --type py -C 3

Length of output: 1090
update_data called with full state but function expects selection_store — fix required
src/render/update.py defines: def update_data(data, out_components, selection_store, additional_data, tool)
src/render/render.py calls: data = update_data(data, out_components, state, additional_data, tool)
- Fix A (preferred): Change the caller to pass the filtered selection_store (extract it from state) instead of the full state (edit src/render/render.py where the call occurs).
- Fix B: Or change src/render/update.py to accept the full state (update the function signature and implementation) and update any callers accordingly.
- Confirm there are no other callers; ripgrep shows the definition in src/render/update.py and this call in src/render/render.py.
🤖 Prompt for AI Agents
In src/render/render.py around line 28, the call passes the entire state to
update_data which expects selection_store; change the caller to extract and pass
selection_store from state (e.g., selection_store = state["selection_store"] or
state.selection_store depending on the state shape) and call update_data(data,
out_components, selection_store, additional_data, tool); then run tests/grep to
confirm no other callers need changes.
hash_funcs = {pl.LazyFrame : lambda x : 1}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
Hash function for LazyFrame caching needs improvement
Using a constant hash value (1) for all LazyFrames defeats the purpose of caching and could lead to incorrect cached results being returned.
Consider implementing a proper hash based on the LazyFrame's query plan or parameters:
-hash_funcs = {pl.LazyFrame : lambda x : 1}
+# Hash based on the LazyFrame's logical plan representation
+hash_funcs = {pl.LazyFrame : lambda x : hash(str(x.explain(optimized=False)))}
Alternatively, if the LazyFrame content is truly dynamic and shouldn't be cached, consider removing the caching decorator for this function.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
hash_funcs = {pl.LazyFrame : lambda x : 1}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
# Hash based on the LazyFrame's logical plan representation
hash_funcs = {pl.LazyFrame: lambda x: hash(str(x.explain(optimized=False)))}
@st.cache_data(max_entries=4, show_spinner=False, hash_funcs=hash_funcs)
🧰 Tools
🪛 Ruff (0.12.2)
32-32: Unused lambda argument: x
(ARG005)
🤖 Prompt for AI Agents
In src/render/update.py around lines 32-33, the current hash_funcs uses a
constant value for pl.LazyFrame which breaks caching correctness; replace this
with a stable hash function that derives a deterministic fingerprint from the
LazyFrame's query plan/expressions (e.g., serialize the LazyFrame's query plan
or SQL/AST representation and return a hash of that string) so different query
shapes produce different cache keys, or if the LazyFrame is inherently dynamic
and should not be cached, remove the @st.cache_data decorator from the function
instead.
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data = data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path
Fix LazyFrame sink_parquet return value handling
The sink_parquet method returns None when writing to disk, not the data itself. Line 310 incorrectly assigns this None value to data.
Apply this fix:
# Polars DataFrames and LazyFrames are stored as parquet
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
path = Path(path, f"{name_tag}.pq")
# Convert LazyFrame to DataFrame for writing
if isinstance(data, pl.LazyFrame):
- data = data.sink_parquet(path)
+ data.sink_parquet(path)
else:
data.write_parquet(path)
return path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data = data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path

if isinstance(data, (pl.DataFrame, pl.LazyFrame)):
    path = Path(path, f"{name_tag}.pq")
    # Convert LazyFrame to DataFrame for writing
    if isinstance(data, pl.LazyFrame):
        data.sink_parquet(path)
    else:
        data.write_parquet(path)
    return path
🤖 Prompt for AI Agents
In src/workflow/FileManager.py around lines 306 to 313, the code assigns the
result of pl.LazyFrame.sink_parquet to `data` even though sink_parquet returns
None; change the logic to call `data.sink_parquet(path)` without assigning its
return value (i.e., just invoke to write to disk) and then return the `path` as
currently done for the eager DataFrame branch; ensure the eager branch still
calls `data.write_parquet(path)` and both branches return the path.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/workflow/StreamlitUI.py (5)
1056-1066: Tail the selected log and drop “neon” highlighting.
- Full read on every tick is costly; tail with deque.
- Some logs contain non-UTF8; add errors="replace".
- “neon” isn’t a standard lexer; render as plain text.
Apply this diff:
-        with open(log_path, "r", encoding="utf-8") as f:
-            lines = f.readlines()
-        if log_lines_count == "all":
-            display_lines = lines
-        else:
-            display_lines = lines[-st.session_state.log_lines_count:]
+        from collections import deque
+        if log_lines_count == "all":
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
+                # cap to last 50k lines to avoid OOM on huge logs
+                display_lines = list(deque(f, maxlen=50_000))
+        else:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
+                display_lines = list(deque(f, maxlen=st.session_state.log_lines_count))
         code_box = st.container(key='log')
-        code_box.code(
-            "".join(display_lines), language="neon",
-            line_numbers=False
-        )
+        code_box.code("".join(display_lines), language="text", line_numbers=False)
1074-1076: Fix timestamp timezone label.
Hardcoding "CET" is incorrect on non-CET hosts. Use local tz or UTC.
Apply this diff:
-        st.markdown(
-            f"**Workflow log file: {datetime.fromtimestamp(log_path.stat().st_ctime).strftime('%Y-%m-%d %H:%M')} CET**"
-        )
+        ts = datetime.fromtimestamp(log_path.stat().st_mtime).astimezone()
+        st.markdown(f"**Workflow log file: {ts.strftime('%Y-%m-%d %H:%M %Z')}**")
1077-1085: Use not in (E713) and improve error label.
Style fix and clearer status message when finish marker is missing.
Apply this diff:
-            with open(log_path, "r", encoding="utf-8") as f:
+            with open(log_path, "r", encoding="utf-8", errors="replace") as f:
                 content = f.read()
-            # Check if workflow finished successfully
-            if not "WORKFLOW FINISHED" in content:
+            # Check if workflow finished successfully
+            if "WORKFLOW FINISHED" not in content:
                 status.update(
-                    label='Workflow completed.', state='error',
+                    label='Workflow finished with errors.', state='error',
                     expanded=True
                 )
                 error_box.error("**Errors occurred, check log file.**")
1091-1092: Render final log as text, not "neon".
"neon" isn't a recognized lexer; use plain text.
Apply this diff:
-            code_box = st.container(key='log')
-            code_box.code(content, language="neon", line_numbers=False)
+            code_box = st.container(key='log')
+            code_box.code(content, language="text", line_numbers=False)
1038-1049: Precompile regexes for faster repeated matching.
Minor speedup when polling every second.
Example (outside this function):
PCT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
LBL_RE = re.compile(r"Progress of\s+'([^']+)'")
Then use PCT_RE.search(...) and LBL_RE.search(...).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/workflow/StreamlitUI.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/workflow/StreamlitUI.py (1)
src/workflow/CommandExecutor.py (1)
stop(254-268)
🪛 Ruff (0.12.2)
src/workflow/StreamlitUI.py
1080-1080: Test for membership should be not in
Convert to not in
(E713)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-full-app
- GitHub Check: build-openms
log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
log_exists = log_path.exists() and log_path_complete.exists()
Don’t gate log display on “all.log”; split existence checks.
Requiring both the selected log and all.log to exist suppresses output when users pick “minimal”/“commands and run times” and all.log isn’t present yet. Show the selected log whenever it exists; use all.log only for progress parsing.
Apply this diff:
- log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
- log_exists = log_path.exists() and log_path_complete.exists()
+ log_path_complete = Path(self.workflow_dir, "logs", "all.log")
+ selected_log_exists = log_path.exists()
+ progress_log_exists = log_path_complete.exists()
And update the later checks:
- if log_exists:
+ if progress_log_exists:
- elif log_exists and not pid_exists:
+ elif selected_log_exists and not pid_exists:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
log_path_complete = Path(self.workflow_dir, "logs", 'all' + ".log")
log_exists = log_path.exists() and log_path_complete.exists()
log_path_complete = Path(self.workflow_dir, "logs", "all.log")
selected_log_exists = log_path.exists()
progress_log_exists = log_path_complete.exists()
🤖 Prompt for AI Agents
In src/workflow/StreamlitUI.py around lines 1016-1018, the code currently
requires both the selected log and "all.log" to exist before showing logs;
change this so the selected log is shown whenever its own path exists and only
use the "all.log" existence check when doing progress parsing. Concretely:
replace the joint existence boolean with separate checks (e.g., log_exists =
log_path.exists() and all_log_exists = log_path_complete.exists()), use
log_exists to gate display of the selected log, and use all_log_exists where the
code parses or extracts progress from all.log; update any later conditional
branches that relied on the old combined variable to reference the appropriate
new boolean.
with open(log_path_complete, "r", encoding="utf-8") as f:
    for line in reversed(f.readlines()):
        line = line.strip()
        match = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
        if match and percentage < 0:
            percentage = float(match.group(1))/100
        match = re.search(r"Progress of\s+'([^']+)'", line)
        if match:
            label = match.group(1)
            break
        elif "Process finished:" in line:
            break
Avoid full-file reads; tail the log and fix early-break logic.
Reading the entire all.log every second is O(n) RAM/IO and will lag/freeze on large logs. Also, breaking as soon as the label is found can drop a valid percent if it appears earlier.
Apply this diff:
- with open(log_path_complete, "r", encoding="utf-8") as f:
- for line in reversed(f.readlines()):
- line = line.strip()
- match = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
- if match and percentage < 0:
- percentage = float(match.group(1))/100
- match = re.search(r"Progress of\s+'([^']+)'", line)
- if match:
- label = match.group(1)
- break
- elif "Process finished:" in line:
- break
+ from collections import deque
+ with open(log_path_complete, "r", encoding="utf-8", errors="replace") as f:
+ tail = list(deque(f, maxlen=400)) # only keep last ~400 lines
+ for raw in reversed(tail):
+ line = raw.strip()
+ if percentage < 0:
+ m = re.search(r"(\d+(?:\.\d+)?)\s*%", line)
+ if m:
+ percentage = float(m.group(1)) / 100.0
+ if label is None:
+ m = re.search(r"Progress of\s+'([^']+)'", line)
+ if m:
+ label = m.group(1)
+ if (percentage >= 0 or label is not None) and "Process finished:" in line:
+ break
+ if percentage >= 0 and label is not None:
+ break
Note: if you prefer, precompile the regexes once at module scope.
Additional import (outside this hunk):
from collections import deque🤖 Prompt for AI Agents
In src/workflow/StreamlitUI.py around lines 1040 to 1051, change the logic that
currently reads the entire log file into memory and breaks early upon finding a
label so it tails only the last N lines (use collections.deque with maxlen,
e.g., 500) and scans those lines for both percentage and label without stopping
as soon as one is found — precompile the two regexes at module scope (or locally
once) and, when iterating the deque in reverse order, update percentage and
label as matches are found but only break after both have been considered or
when a terminal "Process finished:" line is encountered; this avoids full-file
reads and ensures you don’t miss a percentage that appears earlier than the
label.
Summary by CodeRabbit
New Features
Refactor
Chores