This guide walks you through using sprime to analyze high-throughput screening data, from loading raw data to calculating delta S' values for comparative analysis.
- Quick Start
- Terminology reference
- S' derivation pipeline (branches)
- Data Format
- Loading Data
- Processing Data
- Calculating Delta S'
- Complete Example
- Exporting Results
- Data Quality Reporting
- Advanced Usage
- Running the Test Suite
- Troubleshooting
The basic workflow in sprime is:
- Load raw data from CSV -> RawDataset
- Process data (fit curves, calculate S') -> ScreeningDataset
- Analyze using delta S' for comparative analysis
from sprime import SPrime as sp
# Load and process data (use your CSV, or download a sample from the README Quick Start)
raw_data, _ = sp.load(
    "your_data.csv",
    response_normalization="asymptote_normalized",  # or "response_scale" per lab sheet
)
screening_data, _ = sp.process(raw_data)
# Calculate delta S' for comparative analysis
delta_results = screening_data.calculate_delta_s_prime(
    reference_cell_lines="normal_cell_line",
    test_cell_lines=["tumor_cell_line"],
)

Vocabulary: If any terms above are unfamiliar, use the Terminology reference (plain-language definitions and sprime-specific usage).
Before you format CSVs, it helps to know which conceptual path your study uses. Datasets may ship pre-calculated curve parameters (EC50, asymptotes) or raw dose–response points; raw workflows are encouraged when available so curves are fit and QC'd consistently.
The CSV column Control_Response holds the vehicle (e.g. DMSO) readout for that row.
- skip_control_response_normalization=False (default): process applies response ÷ Control_Response, then the import-time flag response_normalization:
  - "asymptote_normalized" -> sprime.response_pipeline.pipeline_asymptote_normalized: ratio, then rescale so the curve maximum is 1, then ×100.
  - "response_scale" -> pipeline_response_scale: ratio, then ×100 only.
- skip_control_response_normalization=True: process does not divide by Control_Response (responses are already on the analysis scale; empty control cells allowed on those rows). You still pass response_normalization on load to document the intended scale of the numbers.
Those steps are specified in S' derivation pipeline Sec.3.4.
Rows with only imported Hill parameters (no raw DATA*/CONC* or list curve) have no per-point vehicle ratio step. You still must pass response_normalization on load (required API); it applies only to rows that also contain raw curves in the same file.
The keyword response_normalization names only the post–control-ratio scaling pipeline above. It is not “asymptote vs response level” in the sense of the Hill CSV fields Zero / Lower and Inf / Upper—those are fitted curve endpoints, not this import flag.
- DMSO (vehicle)-relative ratios for raw readouts help control plate-to-plate variation; Control_Response is the per-row vehicle readout (DMSO is shorthand).
- Default import validates Control_Response (non-empty, non-zero) when skip_control_response_normalization=False.
- ×100 is a conventional multiplier in both major raw pipelines for readable magnitudes; it is not a biological parameter by itself.
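The two raw pipelines can be sketched in plain Python. This is illustrative only, not sprime's internal code (which lives in sprime.response_pipeline); in particular, the asymptote rescale here uses the observed maximum as a stand-in for the fitted curve maximum.

```python
# Illustrative sketch of the two documented raw pipelines; NOT sprime's
# internal implementation (see sprime.response_pipeline).
def pipeline_asymptote_normalized(responses, control):
    ratios = [r / control for r in responses]      # vehicle ratio
    peak = max(ratios)                             # stand-in for the fitted curve max
    return [100.0 * x / peak for x in ratios]      # rescale so max is 1, then x100

def pipeline_response_scale(responses, control):
    return [100.0 * r / control for r in responses]  # ratio, then x100 only

raw = [25.0, 50.0, 100.0]
print(pipeline_asymptote_normalized(raw, control=200.0))  # [25.0, 50.0, 100.0]
print(pipeline_response_scale(raw, control=200.0))        # [12.5, 25.0, 50.0]
```

Note how the two pipelines agree on shape but not scale: only the asymptote-normalized route forces the top of the curve to 100.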
Library implementation: sprime.response_pipeline also exposes ratios_to_control, normalize_to_max_value, and scale_responses. SPrime.load, RawDataset.load_from_file, get_s_primes_from_file, and get_s_prime_from_data require keyword-only response_normalization= so the lab's sheet style is fixed at first validation; process applies it after the optional test/control step.
Variation reference (tests): tests/fixtures/SPrime_variation_reference.csv encodes reference spreadsheet cases (parallel normalized vs response-scale columns). Values are human-maintained; do not assume every cell matches a strict float pipeline without checking.
Hands-on: open demonstration.ipynb (Import-time choices and the five-route table) beside the demo_*.csv files.
Canonical narrative: S' derivation pipeline · Glossary: Terminology reference.
sprime expects CSV files with the following structure:
- Compound Name: Name of the compound/drug
- Compound_ID: Unique identifier for the compound (required)
- Cell_Line: Name of the cell line being tested
- Concentration_Units: Required when using raw dose-response data (Path A). Units for concentration values (e.g. microM, nM). See Supported concentration units below.
- Control_Response: Per-row vehicle (DMSO) control readout for raw Path A, same units as DATA*/Responses. By default (skip_control_response_normalization=False), the loader requires this column and a non-zero value on each raw row. Templates use a toy value; demo_raw_vehicle_control_s_prime.csv ships raw % nucleus and uM concentrations row-aligned with tests/fixtures/SPrime_variation_reference.csv, plus 35.3 from the DMSO row. If responses are already control-normalized (e.g. ipNF-style demos), keep the column present but empty and pass skip_control_response_normalization=True when loading; see demo_precontrol_normalized_s_prime.csv and demo_data_control_response_questions.md.
- pubchem_sid: PubChem substance identifier
- SMILES: Chemical structure notation
- Cell_Line_Ref_ID: Reference identifier for the cell line
- NCGCID: Optional pass-through per compound (not used for validation)
Raw data can be in two layouts. Use columns (default) or list via values_as:
- values_as="columns" (default): One column per value -- Data0..DataN, Conc0..ConcN.
- values_as="list": Two columns -- Responses and Concentrations. Each cell holds comma-separated values (e.g. "4000,300,2", "10,3,0.1"). Order must match; same length; >=4 pairs. If your CSV is comma-delimited, quote those cells.
raw_data, _ = sp.load("your_data.csv", values_as="columns") # default
# or
raw_data, _ = sp.load("your_data_list_format.csv", values_as="list")

Templates: template_raw.csv (columns), template_raw_list.csv (list).
For raw dose-response data (columns format), use columns named:
- Data0, Data1, Data2, ... DataN: Response values at each concentration
- Conc0, Conc1, Conc2, ... ConcN: Corresponding concentration values
For list format, use Responses and Concentrations (comma-separated values in one cell each).
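If you build list-format files programmatically, Python's csv module quotes the comma-separated cells for you. A minimal sketch with toy column values (check template_raw_list.csv for the full required header set):

```python
import csv

# Toy list-format row; real files need the full header set from
# template_raw_list.csv. csv.DictWriter quotes any cell containing
# commas, which is exactly what the list layout requires.
row = {
    "Compound Name": "Drug A",
    "Compound_ID": "DRUG001",
    "Cell_Line": "Cell_Line_1",
    "Concentration_Units": "microM",
    "Responses": "4000,300,2,1",
    "Concentrations": "10,3,0.1,0.01",
}
with open("list_format_example.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
```

Opening the resulting file shows the Responses and Concentrations cells wrapped in double quotes, so the embedded commas do not split the row.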
When using raw data (Path A), Concentration_Units is required. All values are converted to microM internally. Supported units (case-insensitive), smallest to largest: fM (fm, femtom); pM (pm, picom); nM (nm, nanom); microM (uM, um, microm, micro); mM (mm, millim); M (m, mol).
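As a worked illustration of that conversion, the factor table below mirrors the documented unit list; it is not sprime's internal code, which performs the conversion automatically on load.

```python
# Multiplicative factors to microM, mirroring the documented unit list
# (illustrative only; sprime does this conversion internally).
TO_MICROM = {
    "fm": 1e-9, "pm": 1e-6, "nm": 1e-3,
    "um": 1.0, "microm": 1.0,
    "mm": 1e3, "m": 1e6,
}

def to_micromolar(value, units):
    """Convert a concentration to microM; units are case-insensitive."""
    return value * TO_MICROM[units.lower()]

print(to_micromolar(500.0, "nM"))  # 0.5 (500 nM = 0.5 microM)
print(to_micromolar(2.0, "mM"))    # 2000.0
```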
If you already have fitted Hill curve parameters:
- AC50 or ec50: Half-maximal concentration
- Zero_asymptote (or legacy Lower/Zero): Zero asymptote (response as concentration -> 0)
- Inf_asymptote (or legacy Upper/Infinity): Inf asymptote (response at saturating concentration)
- Hill_Slope (or Hill, slope): Hill coefficient
- r2 or R^2: R-squared goodness of fit
Each row is taken as-is. Empty values are treated as null; there is no forward-filling from previous or subsequent rows. If a cell is empty (e.g. MOA), that field is null for that row. Provide explicit values in every row where you want them.
Template files: template_raw.csv (raw columns), template_raw_list.csv (raw list), template_precalc.csv (pre-calculated). Templates put required columns first. Your CSV must use the same header names; column order in the file does not matter.
Use your own file path, or download a sample: demo_precontrol_normalized_delta.csv (raw, pre-control-normalized) / demo_precontrol_normalized_precalc.csv (pre-calculated), then load:
from sprime import SPrime as sp
# You can download samples from the URLs above, then load your file.
# Raw Path A: non-zero Control_Response per row by default, or
# skip_control_response_normalization=True if already control-normalized.
raw_data, _ = sp.load(
    "your_data.csv",
    response_normalization="asymptote_normalized",
)
# Or specify an assay name
raw_data, _ = sp.load(
    "your_data.csv",
    assay_name="HTS001",
    response_normalization="asymptote_normalized",
)

# Check how many profiles were loaded
print(f"Loaded {len(raw_data)} profiles")
# Iterate over profiles
for profile in raw_data.profiles:
    print(f"{profile.compound.name} vs {profile.cell_line.name}")
    if profile.concentrations:
        print(f"  Has raw data: {len(profile.concentrations)} data points")
    if profile.hill_params:
        print(f"  Has pre-calculated params: EC50={profile.hill_params.ec50}")
    if profile.metadata:
        print(f"  Metadata: {profile.metadata}")

# Get a specific profile by compound and cell line
profile = raw_data.get_profile("DRUG001", "Cell_Line_1")
if profile:
    print(f"Found profile: {profile.compound.name}")
else:
    print("Profile not found")
# You can also pass Compound and CellLine objects
from sprime import Compound, CellLine
compound = Compound(name="Drug A", drug_id="DRUG001")
cell_line = CellLine(name="Cell_Line_1")
profile = raw_data.get_profile(compound, cell_line)

# Access assay information
print(f"Assay name: {raw_data.assay.name}")
print(f"Assay description: {raw_data.assay.description}")
# Create assay with additional metadata
from sprime import Assay
assay = Assay(
    name="HTS002",
    description="High-throughput screen for neurofibroma cell lines",
    screen_id="HTS002",
    readout_type="activity",
    time_profile="48Hr",
)

Processing fits Hill curves (if needed) and calculates S' values:
from sprime import SPrime as sp
# Load data (use your CSV, or download a sample from the README Quick Start)
raw_data, _ = sp.load("your_data.csv")
# Process: fit curves and calculate S'
screening_data, _ = sp.process(raw_data)
# Access results
for profile in screening_data.profiles:
    print(f"{profile.compound.name}: S' = {profile.s_prime:.2f}")

When your CSV has both raw dose-response data (DATA*/CONC*) and pre-calculated curve parameters (AC50, Upper/Lower or Zero/Inf asymptotes, Hill_Slope, r2), sprime fits from raw and would overwrite those pre-calc values. By default it raises unless you explicitly allow overwriting:
# CSV has both raw DATA/CONC and AC50/Upper/Lower columns - allow overwrite
screening_data, _ = sp.process(raw_data, allow_overwrite_precalc_params=True)

- allow_overwrite_precalc_params=False (default): Raise if we would overwrite pre-calculated curve parameters.
- allow_overwrite_precalc_params=True: Fit from raw, overwrite pre-calc (EC50, asymptotes, steepness, r^2), and log a warning (console + report).
Use True when you intentionally want to refit from raw (e.g. demo data, legacy files with both). Same option exists for get_s_prime_from_data and get_s_primes_from_file.
You can pass curve-fitting parameters (distinct from allow_overwrite_precalc_params) to control the Hill fit:
# Curve-fitting parameters (maxfev, initial_ec50, etc.)
screening_data, _ = sp.process(
    raw_data,
    curve_direction="up",   # Force increasing curve
    maxfev=10000,           # Faster fitting
    initial_ec50=10.0,      # Better initial guess
)
# When CSV has both raw + pre-calc, allow overwrite and optionally pass fit params
screening_data, _ = sp.process(
    raw_data,
    allow_overwrite_precalc_params=True,
    maxfev=10000,
)

Common fitting parameters:
- curve_direction: "up" (increasing), "down" (decreasing), or None (auto-detect)
- maxfev: Maximum function evaluations (default: 3,000,000)
- initial_zero_asymptote, initial_inf_asymptote, initial_ec50, initial_steepness_coefficient: Initial parameter guesses
- bounds: Parameter bounds as ([lower_bounds], [upper_bounds]) tuples (per-parameter min/max lists)
- zero_replacement: Value to replace zero concentrations (default: 1e-24)
See Hill Curve Fitting Configuration for all available parameters.
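For orientation, the 4PL Hill model these parameters feed can be sketched as follows. The exact parameterization sprime uses may differ (see Understanding 4PL Dose-Response Curves), so treat this as illustrative:

```python
# Illustrative 4PL Hill curve using this guide's parameter names;
# the library's exact parameterization may differ.
def hill(conc, zero_asymptote, inf_asymptote, ec50, steepness):
    return zero_asymptote + (inf_asymptote - zero_asymptote) / (
        1.0 + (ec50 / conc) ** steepness
    )

# At conc == ec50 the response is halfway between the two asymptotes.
print(hill(10.0, 0.0, 100.0, 10.0, 1.5))  # 50.0
```

Seen this way, curve_direction decides which asymptote is higher, initial_* values seed the optimizer, and bounds constrain each of the four parameters during the fit.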
You can also process profiles individually:
# Get a profile from raw data
profile = raw_data.get_profile("DRUG001", "Cell_Line_1")
# Fit curve and calculate S' in one step
s_prime = profile.fit_and_calculate_s_prime()
# Or do it step by step
profile.fit_hill_curve()
s_prime = profile.calculate_s_prime()
# Access fitted parameters
print(f"EC50: {profile.hill_params.ec50}")
print(f"Zero asymptote: {profile.hill_params.zero_asymptote}")
print(f"Inf asymptote: {profile.hill_params.inf_asymptote}")
print(f"R^2: {profile.hill_params.r_squared}")
print(f"S': {profile.s_prime}")

If processing fails for some profiles, you can handle errors gracefully:
from sprime import SPrime as sp, ScreeningDataset, DoseResponseProfile

raw_data, _ = sp.load("your_data.csv")
screening_data = ScreeningDataset(assay=raw_data.assay)
# Process profiles one at a time to handle errors
for profile in raw_data.profiles:
    try:
        # Create a copy for processing
        processed = DoseResponseProfile(
            compound=profile.compound,
            cell_line=profile.cell_line,
            assay=profile.assay,
            concentrations=profile.concentrations,
            responses=profile.responses,
            hill_params=profile.hill_params,
            s_prime=profile.s_prime,
        )
        # Process if needed
        if processed.hill_params is None and processed.concentrations:
            processed.fit_hill_curve()
        if processed.s_prime is None:
            processed.calculate_s_prime()
        screening_data.add_profile(processed)
    except (ValueError, RuntimeError) as e:
        print(f"Failed to process {profile.compound.name} vs {profile.cell_line.name}: {e}")
        continue

Delta S' (ΔS') compares drug responses between reference and test cell lines. This is useful for identifying compounds with selective activity.
# Calculate delta S' = S'(reference) - S'(test)
delta_results = screening_data.calculate_delta_s_prime(
    reference_cell_lines="normal_tissue",
    test_cell_lines=["tumor_cell_line"],
)
# Access results
for ref_cellline, comparisons in delta_results.items():
    print(f"\nReference: {ref_cellline}")
    for comp in comparisons:
        print(f"  {comp['compound_name']}: Delta S' = {comp['delta_s_prime']:.2f}")

# Compare multiple cell lines
delta_results = screening_data.calculate_delta_s_prime(
    reference_cell_lines=["normal_tissue_1", "normal_tissue_2"],
    test_cell_lines=["tumor_line_1", "tumor_line_2", "tumor_line_3"],
)

Delta S' results are organized by reference cell line:
delta_results = {
    "reference_cell_line_1": [
        {
            "compound_name": "Drug A",
            "drug_id": "DRUG001",
            "test_cell_line": "tumor_line_1",
            "s_prime_reference": 2.5,
            "s_prime_test": 4.0,
            "delta_s_prime": -1.5,  # Negative = more effective in test
        },
        # ... more comparisons
    ],
    "reference_cell_line_2": [
        # ... comparisons for second reference
    ],
}

Interpreting Delta S':
- Negative values: More effective in test cell line (higher S' in test)
- Positive values: More effective in reference cell line (higher S' in reference)
- More negative = more selective: Compounds with more negative delta S' are more selective for the test (e.g., tumor) cell line
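The sign convention can be checked with plain arithmetic (toy numbers, independent of sprime):

```python
# Toy numbers only: delta S' = S'(reference) - S'(test).
comparisons = [
    {"compound_name": "Drug A", "s_prime_reference": 2.5, "s_prime_test": 4.0},
    {"compound_name": "Drug B", "s_prime_reference": 3.0, "s_prime_test": 2.0},
]
for c in comparisons:
    c["delta_s_prime"] = c["s_prime_reference"] - c["s_prime_test"]

# Most negative first = most selective for the test line.
ranked = sorted(comparisons, key=lambda c: c["delta_s_prime"])
print([(c["compound_name"], c["delta_s_prime"]) for c in ranked])
# [('Drug A', -1.5), ('Drug B', 1.0)]
```

Drug A scores higher S' in the test line than in the reference, so its delta is negative and it ranks first.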
# Calculate delta S'
delta_results = screening_data.calculate_delta_s_prime(
    reference_cell_lines="normal_tissue",
    test_cell_lines=["tumor_line"],
)
# Extract and sort by delta S' (most negative = most selective)
for ref_cellline, comparisons in delta_results.items():
    sorted_comps = sorted(comparisons, key=lambda x: x['delta_s_prime'])
    print(f"\nRanking for {ref_cellline} vs tumor_line:")
    for rank, comp in enumerate(sorted_comps, start=1):
        print(f"{rank}. {comp['compound_name']}: Delta S' = {comp['delta_s_prime']:.2f}")

Here's a complete example using the demo data:
from sprime import SPrime as sp
# 1. Load data (use your CSV, or download docs/usage/demo_precontrol_normalized_s_prime.csv from the repo)
print("Loading data...")
raw_data, _ = sp.load("docs/usage/demo_precontrol_normalized_s_prime.csv")
print(f"Loaded {len(raw_data)} profiles")
# 2. Process data (fit curves, calculate S')
print("\nProcessing data (fitting curves, calculating S')...")
screening_data, _ = sp.process(raw_data)
print("Processing complete!")
# 3. Display S' values
print("\n=== S' Values ===")
for profile in screening_data.profiles:
    print(f"{profile.compound.name} vs {profile.cell_line.name}: "
          f"S' = {profile.s_prime:.2f}")
# 4. Calculate delta S' for comparative analysis
print("\n=== Delta S' Analysis ===")
delta_results = screening_data.calculate_delta_s_prime(
    reference_cell_lines="Normal_Tissue",
    test_cell_lines=["Tumor_Cell_Line"],
)
# 5. Display and rank results
for ref_cellline, comparisons in delta_results.items():
    print(f"\nReference: {ref_cellline}")
    # Sort by delta S' (most negative = most selective for tumor)
    sorted_comps = sorted(comparisons, key=lambda x: x['delta_s_prime'])
    print("\nRanking (most selective for tumor first):")
    for rank, comp in enumerate(sorted_comps, start=1):
        print(f"  {rank}. {comp['compound_name']}: "
              f"Delta S' = {comp['delta_s_prime']:.2f} "
              f"(S' ref={comp['s_prime_reference']:.2f}, "
              f"S' test={comp['s_prime_test']:.2f})")

The easiest way to export results is using the built-in CSV export methods:
# Export all profiles to CSV
screening_data.export_to_csv("master_s_prime_table.csv", include_metadata=True)
# Export delta S' results to CSV
delta_results = screening_data.calculate_delta_s_prime(...)
ScreeningDataset.export_delta_s_prime_to_csv(delta_results, "delta_s_prime_table.csv")

The export_to_csv() method writes all profile information: Hill curve parameters, S' values, ranking, and optional metadata (any non-reserved columns from your CSV). Delta S' export adds the reserved compound-level columns MOA and drug targets (resolved from common header variants) plus any optional headings you specify.
For programmatic access or custom export formats:
# Export all profiles as dictionaries
results = screening_data.to_dict_list()
# Save to JSON, CSV, etc.
import json
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

If you have data as a list of dictionaries (e.g., from a database query), you can process it directly:
from sprime import get_s_prime_from_data, calculate_delta_s_prime
# List of dicts matching CSV row format
list_of_rows = [
    {
        'Compound Name': 'Drug A',
        'Compound_ID': 'DRUG001',
        'Cell_Line': 'Cell_Line_1',
        'Data0': '10', 'Data1': '20', 'Data2': '50', 'Data3': '90',
        'Conc0': '0.1', 'Conc1': '1', 'Conc2': '10', 'Conc3': '100',
    },
    # ... more rows
]
# Calculate S' values directly from list of dicts
results = get_s_prime_from_data(
    list_of_rows,
    response_normalization="asymptote_normalized",
)
# If rows have both raw DATA/CONC and pre-calc AC50/Upper/Lower, allow overwrite:
# results = get_s_prime_from_data(
#     list_of_rows,
#     allow_overwrite_precalc_params=True,
#     response_normalization="asymptote_normalized",
# )
# Calculate delta S' from list of dicts
delta_results = calculate_delta_s_prime(
    results,  # Can use list of dicts directly
    reference_cell_line_names="Reference_Cell_Line",
    test_cell_line_names="Test_Cell_Line",
)

- Skim the Terminology reference when a label or pipeline name is unclear
- Learn about Hill Curve Fitting Configuration
- Read about Understanding 4PL Dose-Response Curves
- Review Background and Concepts for qHTS context and S' metric details
- Follow the branching narrative in S' derivation pipeline
Your CSV has both raw dose-response columns (Data0..DataN, Conc0..ConcN) and pre-calc Hill params (AC50, Upper, Lower, Hill_Slope, r2). By default sprime raises to avoid silently overwriting user-supplied values.
Fix: Set allow_overwrite_precalc_params=True when you intend to refit from raw and overwrite pre-calc:
screening_data, _ = sp.process(raw_data, allow_overwrite_precalc_params=True)
# or
results = get_s_prime_from_data(
    list_of_rows,
    allow_overwrite_precalc_params=True,
    response_normalization="asymptote_normalized",
)

When you allow overwrite, sprime logs a warning (console + report) each time pre-calc Hill params are overwritten by fitted values.
This means a profile has no raw data and no pre-calculated Hill parameters. Ensure your CSV has either:
- Data0..DataN and Conc0..ConcN columns (raw data), OR
- AC50, Upper, Lower columns (pre-calculated parameters)
Common causes:
- Missing concentration or response columns
- Empty or malformed numeric values
- Insufficient data points (need at least 4 points for 4-parameter fit)
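A quick pre-flight check for the insufficient-points case. This is an illustrative helper, not part of sprime; it counts populated Data*/Conc* pairs in a row dict such as csv.DictReader produces:

```python
# Illustrative helper (not part of sprime): count populated Data*/Conc*
# pairs in a CSV row dict. The 4-parameter fit needs at least 4 points.
def count_points(row):
    n, i = 0, 0
    while f"Data{i}" in row:
        if str(row.get(f"Data{i}") or "").strip() and str(row.get(f"Conc{i}") or "").strip():
            n += 1
        i += 1
    return n

row = {"Data0": "10", "Conc0": "0.1", "Data1": "20", "Conc1": "1",
       "Data2": "", "Conc2": "10"}
print(count_points(row))  # 2 -> too few for a 4-parameter fit
```

Running this over your rows before loading makes it easy to spot which rows will fail fitting and why.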
If curve fits have low R^2 (< 0.7), consider:
- Checking data quality (outliers, missing points)
- Ensuring concentrations span the full response range
- Adjusting initial parameter guesses
- See Hill Curve Fitting Configuration for options
Note: Low R^2 values may indicate:
- Non-sigmoidal dose-response relationship
- Experimental artifacts or errors
- Insufficient concentration range
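One pragmatic response is to screen out poorly fitted profiles before ranking. A minimal sketch follows; the SimpleNamespace objects stand in for screening_data.profiles, and the attribute names (hill_params.r_squared) follow this guide:

```python
from types import SimpleNamespace

# Illustrative filter: keep only profiles whose fitted R^2 clears a
# quality bar before ranking by S'.
def well_fit(profiles, min_r2=0.7):
    return [
        p for p in profiles
        if p.hill_params is not None and p.hill_params.r_squared >= min_r2
    ]

demo = [
    SimpleNamespace(hill_params=SimpleNamespace(r_squared=0.95)),
    SimpleNamespace(hill_params=SimpleNamespace(r_squared=0.42)),
    SimpleNamespace(hill_params=None),
]
print(len(well_fit(demo)))  # 1
```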
If calculate_delta_s_prime() returns empty results:
- Verify cell line names match exactly (case-sensitive)
- Ensure you have data for both reference and test cell lines
- Check that S' values were calculated successfully
Debugging tip: Check individual profiles first:
profile = screening_data.get_profile("DRUG001", "Cell_Line_1")
if profile:
    print(f"S' = {profile.s_prime}")
else:
    print("Profile not found")

If you see ImportError: Hill curve fitting requires scipy:
- Install scipy: pip install scipy
- Note: numpy is automatically installed as a dependency of scipy
If CSV export fails:
- Ensure the output directory exists
- Check file permissions
- Verify the ScreeningDataset has profiles with S' values calculated
If metadata is not appearing in exports:
- Ensure non-reserved columns exist in your CSV (e.g. MOA, drug targets, Target, MoA); these are stored as generic metadata under the exact header.
- Check that include_metadata=True is set when calling export_to_csv().
- For delta S' tables, MOA and drug targets are reserved compound-level columns and are resolved from common variants (MOA/MoA/moa, drug targets/Target/target) at export time.
- Verify metadata was extracted during loading (check profile.metadata).
If you have data from multiple assays, process them separately:
from sprime import SPrime as sp
# Load and process each assay separately
assay1_data, _ = sp.load("assay1_data.csv", assay_name="HTS001")
assay1_processed, _ = sp.process(assay1_data)
assay2_data, _ = sp.load("assay2_data.csv", assay_name="HTS002")
assay2_processed, _ = sp.process(assay2_data)
# Compare results across assays (be cautious - different conditions)
for profile1 in assay1_processed.profiles:
    profile2 = assay2_processed.get_profile(
        profile1.compound.drug_id,
        profile1.cell_line.name,
    )
    if profile2:
        print(f"{profile1.compound.name}: S' in HTS001={profile1.s_prime:.2f}, "
              f"S' in HTS002={profile2.s_prime:.2f}")

# Filter profiles by S' threshold
high_s_prime = [
    p for p in screening_data.profiles
    if p.s_prime and p.s_prime > 2.0
]
# Filter by compound
from sprime import Compound
target_compound = Compound(name="Target Drug", drug_id="TARGET001")
target_profiles = [
    p for p in screening_data.profiles
    if p.compound.drug_id == target_compound.drug_id
]
# Filter by cell line
tumor_profiles = [
    p for p in screening_data.profiles
    if "tumor" in p.cell_line.name.lower()
]
# Sort by S' value
sorted_profiles = sorted(
    screening_data.profiles,
    key=lambda p: p.s_prime if p.s_prime else float('-inf'),
    reverse=True,
)

You can create DoseResponseProfile objects programmatically (without loading from CSV), then fit and calculate S':
from sprime import DoseResponseProfile, Compound, CellLine, Assay
compound = Compound(name="Trifluoperazine", drug_id="NCGC00013226-15")
cell_line = CellLine(name="ipNF96.11C")
assay = Assay(name="HTS002", readout_type="activity")
profile = DoseResponseProfile(
    compound=compound,
    cell_line=cell_line,
    assay=assay,
    concentrations=[1.30e-9, 3.91e-9, 1.17e-8, 3.52e-8, 1.06e-7, 3.17e-7, 9.50e-7, 2.85e-6, 8.55e-6, 2.56e-5, 7.69e-5],
    responses=[-63.23, -66.40, -67.04, -65.47, -63.59, -60.30, -47.43, 46.75, 69.12, 97.97, 85.27],
    concentration_units="microM",
)
s_prime = profile.fit_and_calculate_s_prime()
print(f"S' = {s_prime:.2f}")

If your data already has fitted parameters, you can skip curve fitting:
from sprime import SPrime as sp, HillCurveParams
# Load data with pre-calculated parameters
raw_data, _ = sp.load("data_with_params.csv")
# Profiles will have hill_params set
for profile in raw_data.profiles:
    if profile.hill_params:
        # Just calculate S' - no fitting needed
        s_prime = profile.calculate_s_prime()
        print(f"{profile.compound.name}: S' = {s_prime:.2f}")
# Or manually set parameters
profile = raw_data.get_profile("DRUG001", "Cell_Line_1")
if profile:
    profile.hill_params = HillCurveParams(
        ec50=10.0,
        zero_asymptote=0.0,
        inf_asymptote=100.0,
        steepness_coefficient=1.5,
        r_squared=0.95,
    )
    s_prime = profile.calculate_s_prime()

from pathlib import Path
from sprime import SPrime as sp
# Process all CSV files in a directory
data_dir = Path("screening_data")
results = []
for csv_file in data_dir.glob("*.csv"):
    print(f"Processing {csv_file.name}...")
    try:
        raw_data, _ = sp.load(csv_file)
        screening_data, _ = sp.process(raw_data)
        results.append({
            'file': csv_file.name,
            'assay': screening_data.assay.name,
            'num_profiles': len(screening_data),
            'data': screening_data,
        })
    except Exception as e:
        print(f"Error processing {csv_file.name}: {e}")
        continue
# Combine results or analyze separately
for result in results:
    print(f"{result['file']}: {result['num_profiles']} profiles")

sprime automatically tracks data quality issues and warnings during data loading and processing. By default, a summary report is printed to the console when processing data. You can configure reporting to disable console output, enable verbose mode, or write detailed log files.
When you load or process data, sprime automatically prints a summary report to the console:
from sprime import SPrime as sp
# Console summary is printed automatically
raw_data, load_report = sp.load("data.csv")
screening_data, process_report = sp.process(raw_data)

Example Console Output:
============================================================
DATA PROCESSING SUMMARY
============================================================
Total Rows: 247
Rows Processed: 245
Rows Skipped: 2
Compounds Loaded: 58
Profiles Created: 233
Profiles with S' Calculated: 221
Profiles Failed: 12
DATA QUALITY ISSUES:
Missing Drug IDs: 2
Missing Compound Names: 5
Insufficient Data Points: 8
Invalid Numeric Values: 3
WARNINGS: 25 total
MISSING_DATA: 10
Row 12: Missing Compound_ID, row skipped (CL:A549)
Row 23: Insufficient data points: 3 found (ID:DRUG004, CL:HeLa)
... and 8 more (see log file for details)
============================================================
Use ReportingConfig to control reporting behavior globally:
from sprime import SPrime as sp, ReportingConfig, ConsoleOutput
# Configure reporting settings
ReportingConfig.configure(
    log_to_file=True,                      # Enable log file writing
    log_filepath="processing.log",         # Optional: custom log file path
    console_output=ConsoleOutput.SUMMARY,  # Console verbosity
)

Summary Mode (Default):
from sprime import ReportingConfig, ConsoleOutput
# Default: Brief summary with counts and first 3 examples per category
ReportingConfig.configure(console_output=ConsoleOutput.SUMMARY)

Verbose Mode:
# Print all warnings to console with full details
ReportingConfig.configure(console_output=ConsoleOutput.VERBOSE)

Disable Console Output:
# No console output (useful for batch processing)
ReportingConfig.configure(console_output=ConsoleOutput.NONE)

Enable detailed log file writing:
from sprime import SPrime as sp, ReportingConfig
# Enable log file (auto-generated from input filename)
ReportingConfig.configure(log_to_file=True)
raw_data, report = sp.load("data.csv")
# -> Creates "data_processing.log" automatically

Custom Log File Path:
from sprime import SPrime as sp, ReportingConfig
# Specify custom log file path
ReportingConfig.configure(
    log_to_file=True,
    log_filepath="my_custom_log.log",
)
raw_data, report = sp.load("data.csv")
# -> Writes to "my_custom_log.log"

Log File Format: The log file contains:
- Summary metrics (rows processed, compounds loaded, profiles created)
- Data quality issue counts
- Detailed warnings grouped by category with row numbers and context
- Each warning includes: Drug ID, Compound Name, Cell Line, Field Name
Example 1: Default Usage (Console Summary Only)
from sprime import SPrime as sp
# Default: Console summary printed, no log file
raw_data, load_report = sp.load("data.csv")
screening_data, process_report = sp.process(raw_data)

Example 2: Enable Log File
from sprime import SPrime as sp, ReportingConfig
# Enable log file writing
ReportingConfig.configure(log_to_file=True)
raw_data, load_report = sp.load("data.csv")
# -> Console summary printed
# -> Log file written: "data_processing.log"
screening_data, process_report = sp.process(raw_data)
# -> Console summary printed
# -> Log file appended/updated

Example 3: Verbose Console Output
from sprime import SPrime as sp, ReportingConfig, ConsoleOutput
# Verbose console output
ReportingConfig.configure(console_output=ConsoleOutput.VERBOSE)
raw_data, report = sp.load("data.csv")
# -> All warnings printed to console with full details

Example 4: Silent Mode (Log File Only)
from sprime import SPrime as sp, ReportingConfig, ConsoleOutput
# No console output, but write log file
ReportingConfig.configure(
    log_to_file=True,
    console_output=ConsoleOutput.NONE,
)
raw_data, report = sp.load("data.csv")
# -> No console output
# -> Log file written: "data_processing.log"

Example 5: Reset to Defaults
from sprime import ReportingConfig
# Reset all settings to defaults
ReportingConfig.reset()
# -> Console summary enabled, log file disabled

Warning Categories:
- MISSING_DATA: Missing required fields (Compound_ID, Cell_Line, insufficient data points)
- DATA_QUALITY: Invalid values, non-numeric data, missing optional fields
- NUMERICAL: NaN/Inf values encountered
- CURVE_FIT: Fitting failures, poor fit quality (R^2 < 0.7)
- CALCULATION: S' calculation failures
Row Numbers:
- Row numbers start at 2 (row 1 is the CSV header)
- Row number 0 indicates warnings from processing (not from CSV loading)
- Fully blank rows are skipped silently (not logged)
Using Reports Programmatically:
from sprime import SPrime as sp
raw_data, report = sp.load("data.csv")
# Access report metrics
print(f"Compounds loaded: {report.compounds_loaded}")
print(f"Warnings: {len(report.warnings)}")
# Access individual warnings
for warning in report.warnings:
    if warning.category == "MISSING_DATA":
        print(f"Row {warning.row_number}: {warning.message}")

- Development/Interactive Use: Use default settings (console summary) to see issues immediately
- Batch Processing: Disable console output (ConsoleOutput.NONE) and enable log files
- Debugging: Use verbose mode (ConsoleOutput.VERBOSE) to see all warnings in console
- Production: Enable log files for audit trails and quality tracking
sprime includes a comprehensive test suite to verify functionality and catch regressions. Running the tests ensures that the library is working correctly in your environment.
Install the development dependencies:
pip install -e ".[dev]"

Or install pytest directly:

pip install pytest pytest-cov

Run the complete test suite:

# From the project root directory
pytest tests/

This will run all tests in the tests/ directory:
- test_hill_fitting.py - Tests for Hill curve fitting functionality
- test_sprime.py - Tests for core sprime module functionality
- test_integration.py - End-to-end integration tests
Run tests for a specific module:
# Test only Hill curve fitting
pytest tests/test_hill_fitting.py
# Test only core sprime functionality
pytest tests/test_sprime.py
# Test only integration workflows
pytest tests/test_integration.py

Run a specific test class or function:
# Run a specific test class
pytest tests/test_sprime.py::TestDoseResponseProfile
# Run a specific test function
pytest tests/test_sprime.py::TestDoseResponseProfile::test_fit_hill_curve
# Run tests matching a pattern
pytest tests/ -k "delta"  # Runs all tests with "delta" in the name

Get more detailed output:
# Verbose output showing each test
pytest tests/ -v
# Very verbose output with print statements
pytest tests/ -v -s
# Show local variables on failure
pytest tests/ -v -l

Generate a coverage report to see which code is tested:
# Run tests with coverage
pytest tests/ --cov=src/sprime --cov-report=html
# View coverage report
# Open htmlcov/index.html in your browser
# Or get a terminal report
pytest tests/ --cov=src/sprime --cov-report=term

The test suite covers:
Core Functionality:
- Value object creation (Compound, CellLine, Assay, HillCurveParams)
- Dose-response profile operations
- RawDataset loading from CSV
- ScreeningDataset processing and analysis
- S' value calculations
- Delta S' calculations
Edge Cases:
- Missing or malformed data
- Insufficient data points
- Failed curve fits
- Missing profiles in delta S' calculations
- CSV parsing with various formats
Integration Tests:
- Complete workflows (Load -> Process -> Analyze)
- Multiple compounds and cell lines
- Metadata extraction and preservation
- CSV export functionality
- In-memory data processing
If tests fail:
- Check dependencies: Ensure scipy and numpy are installed (pip install scipy numpy)
- Check Python version: sprime requires Python 3.8+ (python --version)
- Run with verbose output to see detailed error messages: pytest tests/ -v -s
- Check for import errors: Ensure you're running from the project root (cd /path/to/sprime, then pytest tests/)
Common test failures:
- ImportError: Make sure you're in the project root directory
- FileNotFoundError: Some tests create temporary files - ensure write permissions
- AssertionError: Check that your data matches expected formats
If contributing to sprime, ensure all tests pass before submitting:
# Run full test suite
pytest tests/
# Lint (Ruff -- see docs/background/development.md); use repo root so docs/notebooks are included
ruff check .
# Run with coverage
pytest tests/ --cov=src/sprime --cov-report=term-missing

For more information on testing, see the pytest documentation.