
MALCA: Multi-timescale ASAS-SN Light Curve Analysis

MALCA is a Bayesian event-detection pipeline for finding dimming and dipping events in ASAS-SN photometric light curves. It fits per-camera Gaussian process baselines, scores candidate events via marginal log-likelihood grids and leave-one-out posterior probabilities, and applies multi-stage quality filters to produce a catalog of dipper candidates. Post-detection modules add multi-wavelength characterization (Gaia, WISE, dust maps) and astrophysical classification.
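
The scoring idea can be illustrated with a toy model: fit a flat baseline, fit a windowed dip model, and convert the fit improvement into an approximate log Bayes factor. This is a deliberately simplified numpy sketch, not MALCA's implementation — the actual pipeline uses per-camera GP baselines and full marginal-likelihood grids:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated light curve: flat 13.2-mag baseline with a 0.3-mag dip
t = np.linspace(0.0, 200.0, 400)
yerr = np.full_like(t, 0.02)
y = 13.2 + rng.normal(0.0, 0.02, t.size)
in_dip = (t > 90.0) & (t < 110.0)
y[in_dip] += 0.3  # dimming = magnitudes increase

def chi2(resid, err):
    return np.sum((resid / err) ** 2)

# Null model: constant baseline at the median magnitude
baseline = np.median(y)
chi2_null = chi2(y - baseline, yerr)

# Dip model: baseline plus a boxcar of fitted depth inside the window
depth = np.median(y[in_dip]) - baseline
model = np.where(in_dip, baseline + depth, baseline)
chi2_dip = chi2(y - model, yerr)

# Approximate log Bayes factor for Gaussian likelihoods with fixed errors
log_bf = 0.5 * (chi2_null - chi2_dip)
```

A real event yields a large positive log_bf; pure noise hovers near zero, which is what the trigger thresholds exploit.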

Install

# Requires Python >= 3.9
git clone https://github.com/calderlen/malca.git && cd malca
pip install -e "."          # installs all runtime + test dependencies

Conda option:

conda env create -f environment.yml
conda activate malca

Input Files

  • Per-mag-bin directories: <lcsv2_root>/<mag_bin>/
    • Index CSVs: index*.csv with columns like asas_sn_id, ra_deg, dec_deg, pm_ra, pm_dec, ...
    • Light curves: lc<num>_cal/ folders containing <asas_sn_id>.dat2
  • Optional catalogs:
    • VSX crossmatch: input/vsx/asassn_x_vsx_matches_20250919_2252.csv (pre-crossmatched with columns: asas_sn_id, sep_arcsec, class)
    • Raw VSX: input/vsx/vsxcat.090525.csv (used by vsx/filter.py to generate crossmatch)
    • Note: Bright nearby star (BNS) filtering is handled upstream by ASAS-SN during LC generation
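
Using only the layout above, a minimal manifest builder can be sketched as follows (illustrative only — the real manifest.py also joins index-CSV coordinates, checks file existence, and parallelizes across workers):

```python
from pathlib import Path

import pandas as pd

def build_manifest(lcsv2_root: str, mag_bin: str) -> pd.DataFrame:
    """Map asas_sn_id -> .dat2 path for one magnitude bin."""
    rows = []
    for dat in Path(lcsv2_root, mag_bin).glob("lc*_cal/*.dat2"):
        rows.append({
            "asas_sn_id": dat.stem,   # filename is <asas_sn_id>.dat2
            "lc_dir": str(dat.parent),
            "dat_path": str(dat),
        })
    return pd.DataFrame(rows, columns=["asas_sn_id", "lc_dir", "dat_path"])
```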

Dependencies

  • Core + runtime modules: numpy, pandas, scipy, numba, astropy, celerite2, matplotlib, tqdm, pyarrow
  • Review + plotting: dash, dash-bootstrap-components, plotly
  • Characterization + catalog access: astroquery, dustmaps3d, pyvo, banyan-sigma, requests
  • ML utilities: lightgbm, joblib

Quick Start

# Build manifest (source_id → path index)
malca manifest --index-root /path/to/lcsv2 --lc-root /path/to/lcsv2 --mag-bin 13_13.5 --out output/manifest.parquet --workers 10

# Run event detection pipeline
malca pipeline --mag-bin 13_13.5 --workers 10 --lc-root /path/to/lcsv2 --index-root /path/to/lcsv2 --output output/results.parquet --min-mag-offset 0.1

# Validate results against known candidates (no raw data needed)
malca validate --results output/results.parquet

# Plot light curves
malca plot --input /path/to/lc123.dat2 --out-dir output/plots

# Apply quality filters
malca filter --input output/results.parquet --output output/filtered.parquet

# Multi-wavelength characterization (post-detection)
malca characterize --input output/filtered.parquet --output output/characterized.parquet --dust --starhorse input/starhorse/starhorse2021.parquet

# Get help for any command
malca --help
malca pipeline --help

Minimal split workflow (cluster -> home):

# On cluster: run upstream/raw-dependent steps and export transfer bundle
malca pipeline --stage cluster --mag-bin 13_13.5 --out-dir output/run_001 --export-bundle output/run_001_bundle.zip

# On home machine: import bundle and run downstream/catalog steps only
malca pipeline --stage home --out-dir output/run_001 --import-bundle ~/Downloads/run_001_bundle.zip

Pipeline Architecture

flowchart TB

    %% ── Data Sources ─────────────────────────────────────────
    subgraph sources["Data Sources"]
        RAW["ASAS-SN .dat2 Light Curves"]
        IDX["Index CSVs<br/>(per mag bin)"]
        SKY["SkyPatrol CSVs"]
        VSX_RAW["VSX Catalog"]
        GAIA_SRC["Gaia DR3"]
        SH_SRC["StarHorse Catalog"]
        DUST_SRC["3D Dust Maps<br/>(Wang+ 2025)"]
    end

    %% ── Data Preparation ─────────────────────────────────────
    subgraph prep["Data Preparation"]
        MAN["manifest.py<br/>Build source_id-to-path index"]
        MAN_OUT[("Manifest .parquet")]
        MAN --> MAN_OUT

        subgraph vsxtools["VSX Preprocessing (vsx/)"]
            VFILT["filter.py<br/>Clean variable classes"]
            VCROSS["crossmatch.py<br/>PM-corrected positional match"]
            VFILT --> VCROSS
        end
        VCROSS --> VSX_MATCH[("VSX Crossmatch")]
    end

    RAW --> MAN
    IDX --> MAN
    VSX_RAW --> VFILT

    %% ── Discovery Pipeline ───────────────────────────────────
    subgraph discovery["Discovery Pipeline (detect.py orchestrator)"]
        TAG["tag.py<br/>Sparse-LC, multi-camera,<br/>VSX quality tags"]
        EVENTS["events.py<br/>Bayesian detection, morphology fits,<br/>recurrence analysis, Bayes factors"]
        FILT["filter.py<br/>Evidence strength, run robustness,<br/>morphology, periodicity,<br/>Gaia RUWE/PM, periodic catalogs"]
        TAG --> EVENTS --> FILT
    end

    MAN_OUT --> TAG
    VSX_MATCH -.-> TAG
    GAIA_SRC -.-> FILT
    FILT --> CAND[("Candidates .parquet")]

    %% ── Post-Detection Characterization ──────────────────────
    subgraph postdet["Post-Detection"]
        CHAR["characterize.py<br/>Gaia astrometry/photometry, 3D dust,<br/>YSO classes, galactic coords,<br/>BANYAN, IPHAS, SFR, clusters, unWISE"]
        VET["vetting.py<br/>SIMBAD, Gaia variability/EB,<br/>ASAS-SN Var, ZTF, TNS, ALeRCE,<br/>eROSITA, ATLAS, NEOWISE"]
        CLASS["classify.py<br/>EB/CV/starspot/disk/YSO"]

        subgraph enrichgrp["Enrichment (enrich/)"]
            NEIGH["neighbor.py<br/>Gaia, 2MASS, AllWISE, VSX"]
            SPECTRA["spectra.py<br/>SDSS, LAMOST, GALAH, RAVE"]
        end

        CHAR --> VET --> CLASS --> enrichgrp
    end

    CAND --> CHAR
    GAIA_SRC -.-> CHAR
    DUST_SRC -.-> CHAR
    SH_SRC -.-> CHAR
    enrichgrp --> ENRICHED[("Enriched .parquet")]

    %% ── Visualization ────────────────────────────────────────
    PLOT["plot.py<br/>Light curve + event visualization"]
    CAND --> PLOT
    RAW -.-> PLOT
    SKY -.-> PLOT

    %% ── Review App ───────────────────────────────────────────
    subgraph reviewgrp["Review App (review/)"]
        STORE["store.py<br/>SQLite candidate DB"]
        APP["app.py<br/>Dash GUI: scoring, event classes,<br/>vetting cards, diagnostic plots"]
        RPIPE["pipeline.py<br/>Run missing stages on demand"]
        RMERGE["merge.py<br/>Merge review DBs"]
        RDIAG["diagnostic_plots.py<br/>CMD, Kiel, NEOWISE, Gaia epoch"]
        REXPLORE["explorer.py<br/>EDA + LC explorer"]
        STORE --> APP
        RPIPE -.-> APP
        RDIAG -.-> APP
    end

    CAND --> STORE
    ENRICHED -.-> STORE
    APP --> LABELS[("Labeled Reviews<br/>score + event_class")]

    %% ── Machine Learning ─────────────────────────────────────
    subgraph mlgrp["Machine Learning (ml/)"]
        FEAT["features.py<br/>107 curated features"]
        TRAIN["train.py<br/>LightGBM classifier"]
        PRED["predict.py<br/>Score new candidates"]
        FEAT --> TRAIN --> MODEL[("Model + schema")]
        MODEL --> PRED
    end

    LABELS -.-> TRAIN
    ENRICHED -.-> FEAT

    %% ── LTV Pipeline ─────────────────────────────────────────
    subgraph ltvpipe["LTV Pipeline - Long-Term Variability (ltv/)"]
        LTV_PIPE["pipeline.py<br/>Orchestrator"]
        LTV_CORE["core.py<br/>Season medians, linear/quad fits,<br/>slopes, Lomb-Scargle"]
        LTV_FILT["filter.py<br/>Slope, max diff, dec, PM cuts"]
        LTV_CROSS["crossmatch.py<br/>Gaia, VSX, OGLE, ZTF,<br/>Gaia Alerts, MilliQuas, SIMBAD"]
        LTV_STOCH["stochastic.py<br/>Structure function, IAR,<br/>MHPS, DRW"]
        LTV_NEO["neowise.py<br/>IRSA TAP IR light curves"]
        LTV_DUST["dust.py<br/>Dust excess flags"]
        LTV_CMD["cmd.py<br/>MIST grid, Bailer-Jones distances"]
        LTV_BUNDLE["bundle.py<br/>Package .dat2 files"]
        LTV_INGEST["review.py<br/>Ingest into review DB"]
        LTV_PIPE --> LTV_CORE --> LTV_FILT
        LTV_FILT --> LTV_CROSS --> LTV_STOCH
        LTV_STOCH --> LTV_NEO --> LTV_DUST --> LTV_CMD
        LTV_CMD --> LTV_BUNDLE --> LTV_INGEST
    end

    RAW --> LTV_PIPE
    IDX --> LTV_PIPE
    GAIA_SRC -.-> LTV_CROSS
    LTV_INGEST --> STORE

    %% ── Evaluation ───────────────────────────────────────────
    subgraph evalgrp["Evaluation (evaluation/)"]
        INJ["injection.py<br/>Synthetic dip injection-recovery"]
        DET_RATE["detection_rate.py<br/>Baseline detection rate"]
        VALID["validation.py<br/>Precision/recall vs known targets"]
        REPRO["reproduce.py<br/>Re-run detection on known objects"]
        ATTR["attrition.py<br/>Filter attrition summary"]
        FP_EVAL["false_positive.py<br/>FP contaminant benchmark"]
    end

    MAN_OUT -.-> INJ
    MAN_OUT -.-> DET_RATE
    MAN_OUT -.-> REPRO
    CAND -.-> VALID
    CAND -.-> REPRO
    CAND -.-> ATTR

    %% ── Core Libraries ───────────────────────────────────────
    subgraph corelibs["Core Libraries"]
        UTILS["utils.py<br/>LC I/O, cleaning, kernels"]
        LCIO["lightcurve_io.py<br/>.dat2 / .csv readers"]
        BASE["baseline.py<br/>GP + median baselines"]
        TRIG["triggering.py<br/>logBF / posterior trigger resolution"]
        SCORE_LIB["score.py<br/>Dip/jump/microlensing scoring"]
        STATS_LIB["stats.py<br/>Stetson, von Neumann, RoMS, LS"]
        PERIOD_LIB["periodogram.py<br/>Lomb-Scargle, PDM,<br/>Conditional Entropy"]
        PCA_LIB["pca.py<br/>Variability PCA"]
        FETCH_LIB["fetch.py<br/>SkyPatrol V1/V2 download"]
        GAIA_FETCH["gaia_fetch.py<br/>Bulk Gaia DR3 via AIP TAP"]
    end

    UTILS -.-> EVENTS
    BASE -.-> EVENTS
    TRIG -.-> EVENTS
    SCORE_LIB -.-> EVENTS
    STATS_LIB -.-> SCORE_LIB
    PERIOD_LIB -.-> FILT
    UTILS -.-> REPRO
    BASE -.-> REPRO

    %% ── Configuration ────────────────────────────────────────
    subgraph configgrp["Configuration (config/)"]
        direction LR
        CONF["config_paths, config_pipeline, config_filters,<br/>config_io, config_characterize, config_classify,<br/>config_ltv, config_stats, config_ml, config_vetting"]
    end

    %% ── CLI Entry Point ──────────────────────────────────────
    CLI["__main__.py — malca CLI<br/>manifest, pipeline, filter, tag, events, plot, characterize, classify,<br/>vetting, review, ml_train, ml_predict, injection, validate, reproduce,<br/>ltv-pipeline, ltv-core, ltv-build, ltv-ingest, attrition, stats, ..."]
    CLI -.-> discovery
    CLI -.-> postdet
    CLI -.-> reviewgrp
    CLI -.-> mlgrp
    CLI -.-> ltvpipe
    CLI -.-> evalgrp
    CLI -.-> PLOT

Key Components:

  • Discovery pipeline: manifest.py → tag.py → events.py → filter.py (orchestrated by detect.py)
  • Post-detection: characterize.py (Gaia, dust, YSO, galactic coords, auxiliary catalogs) → vetting.py (SIMBAD, ZTF, TNS, eROSITA, ALeRCE, ATLAS, NEOWISE, ...) → classify.py (EB/CV/starspot/disk/YSO) → enrich/ (neighbor catalogs, spectra availability)
  • LTV pipeline: ltv/pipeline.py → core.py → filter.py → crossmatch.py → stochastic.py → neowise.py → dust.py → cmd.py → bundle.py → review.py (ingest to review DB)
  • Review: review/app.py (Dash GUI with scoring, event classes, diagnostic plots, vetting cards) → labeled training set
  • ML: ml/features.py (107 curated features) → ml/train.py (LightGBM classifier) → ml/predict.py (score candidates)
  • Evaluation: injection.py (synthetic dips), detection_rate.py, validation.py, reproduce.py, attrition.py, false_positive.py
  • Core libraries: utils.py, lightcurve_io.py, baseline.py, triggering.py, score.py, stats.py, periodogram.py, pca.py, fetch.py, gaia_fetch.py
  • Configuration: 10 modules in config/ centralizing all pipeline parameters
  • CLI: Unified interface via malca [command] (__main__.py)

See docs/architecture.md for detailed documentation.

Usage Guide

Detection Pipeline

The full detection workflow has three core steps: build a manifest, run detection with batching/resume, then filter. A fourth, optional step tunes filter behavior from the pipeline command.

  1. Build a manifest (map IDs -> light-curve directories):

    malca manifest --index-root /path/to/lcsv2 --lc-root /path/to/lcsv2 --mag-bin 13_13.5 --out output/lc_manifest_13_13.5.parquet --workers 10
  2. Tag and run events in batches with resume support:

    malca pipeline --mag-bin 13_13.5 --workers 10 --min-time-span 100 --min-points-per-day 0.05 --min-cameras 2 --vsx-crossmatch input/vsx/asassn_x_vsx_matches_20250919_2252.csv --batch-size 2000 --lc-root /path/to/lcsv2 --index-root /path/to/lcsv2 --output output/lc_events_results_13_13.5.parquet --trigger-mode posterior_prob --baseline-func gp --min-mag-offset 0.1
    • The pipeline command builds/loads the manifest, runs tag checks, then calls events.py in batches.
    • Resume: if interrupted, skips already-processed paths using the checkpoint file.
    • VSX tags are saved to tags/vsx_tags/ and merged into results.
    • To disable VSX handling: --skip-vsx. To tag instead of filter: --vsx-mode tag.
  3. Filter events:

    malca filter --input output/lc_events_results_13_13.5.parquet --output output/lc_events_results_13_13.5_filtered.parquet
    
    # With custom thresholds
    malca filter --input results.parquet --output filtered.parquet --min-bayes-factor 20 --min-run-points 3 --apply-morphology
    • Implemented filters: posterior strength, run robustness, score, morphology, periodicity, Gaia RUWE, Gaia PM, multi-catalog periodic consensus
  4. Optional: tune filter behavior directly from malca pipeline / malca detect.

    # Keep pipeline defaults but disable score-based rejection
    malca pipeline --mag-bin 13_13.5 --skip-score-filter
    
    # Enable stricter optional validators
    malca pipeline --mag-bin 13_13.5 --apply-morphology --min-delta-bic 12 --apply-periodicity-validation --periodicity-n-bootstrap 2000 --gaia-reject --periodic-catalog-reject
    • Defaults in pipeline: evidence strength, run robustness, score, Gaia RUWE, Gaia PM, and periodic-catalog consensus validation are on; morphology and periodicity-validation are off.
    • Control flags now available in pipeline:
      • Evidence/run: --skip-evidence-strength, --allow-infinite-local-bf, --skip-run-robustness, --min-run-count, --filter-min-run-points, --filter-min-run-cameras
      • Morphology/score: --apply-morphology, --dip-morphology, --jump-morphology, --min-delta-bic, --skip-score-filter, --min-score
      • Validators: --apply-periodicity-validation (+ periodicity knobs), --skip-gaia-ruwe-validation|--gaia-reject, --skip-gaia-pm-validation|--gaia-pm-reject, --skip-periodic-catalog-validation|--periodic-catalog-reject

Detect options:

# logBF triggering (faster)
malca pipeline --mag-bin 13_13.5 --workers 8 --lc-root /path/to/lcsv2 --index-root /path/to/lcsv2 --output output/events_logbf.parquet --trigger-mode logbf --baseline-func gp_masked --min-mag-offset 0.1

# Multiple mag bins (writes one output per bin)
malca pipeline --mag-bin 12_12.5 12.5_13 13_13.5 --lc-root /path/to/lcsv2 --index-root /path/to/lcsv2 --output output/lc_events_results.parquet --trigger-mode logbf

Individual Commands

malca manifest

malca manifest --index-root <index_dir> --lc-root <lc_dir> --mag-bin 12_12.5 --out output/lc_manifest.parquet

malca events

Run event detection directly (without the pipeline orchestrator):

malca events --input /path/to/lc*_cal/*.dat2 --output output/results.parquet --workers 10

# With signal amplitude filtering (requires |event_mag - baseline_mag| > 0.1)
malca events --input /path/to/lc*_cal/*.dat2 --output output/results.parquet --workers 10 --min-mag-offset 0.1
  • Default Bayesian grid is 12x12. Change p-grid with --p-points.
  • Output includes per-event morphology fit parameters (best_amp, best_t0, best_alpha, best_tau, best_morph, delta_bic, width_param, symmetry_score) and recurrence statistics (is_single_event, inter_event_spacing_median/std, amplitude_consistency, duration_consistency) for both dips and jumps.
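
Downstream, the per-event columns listed above can be sliced directly with pandas. A minimal sketch (column names from the list above; the thresholds are illustrative, not pipeline defaults):

```python
import pandas as pd

def select_clean_dips(df: pd.DataFrame, min_delta_bic: float = 10.0) -> pd.DataFrame:
    """Keep single, morphologically well-fit events (illustrative cuts)."""
    return df[(df["is_single_event"]) & (df["delta_bic"] > min_delta_bic)]

# e.g. select_clean_dips(pd.read_parquet("output/results.parquet"))
```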

malca tag

malca tag --help
  • Expects columns asas_sn_id and path pointing to lc_dir.
  • VSX handling: default is tag (keeps all rows and attaches vsx_sep_arcsec/vsx_class). Use --vsx-mode filter only when you explicitly want VSX-based rejection.

malca filter

malca filter --input output/results.parquet --output output/results_filtered.parquet

malca plot

# Single file
malca plot --input /path/to/lc123.dat2 --out-dir output/plots --format png

# Multiple files (glob patterns supported)
malca plot --input input/skypatrol2/*.csv --out-dir output/plots --skip-events

# All files from events.py results
malca plot --events output/lc_events_results_13_13.5_filtered.parquet --out-dir output/plots

Note: Event scores are computed automatically during detection and included in the results table (dipper_score, dipper_n_dips, dipper_n_valid_dips columns).

Legacy batch plotting: malca old.plot_results_bayes /path/to/*.csv --results-csv output/lc_events_results_13_13.5.csv --out-dir output/plots

malca injection

# Full run
malca injection --workers 10

# Quick test with limited trials
malca injection --max-trials 1000 --workers 10

# Custom manifest and output directory
malca injection --manifest /path/to/manifest.parquet --out-dir output/injection

See Injection Testing output for the directory layout.

  • Injects synthetic dips with skew-normal profiles onto real observed light curves
  • Preserves real cadence, systematics, and noise characteristics
  • Supports resume for long-running parameter sweeps
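
The skew-normal injection idea can be sketched with scipy (illustrative only — injection.py's actual parameterization may differ; `alpha` here is the skewness of the profile):

```python
import numpy as np
from scipy.stats import skewnorm

def inject_dip(t, mag, t0, depth, scale, alpha=4.0):
    """Add a skew-normal dimming profile to observed magnitudes.

    Real cadence, systematics, and noise are preserved because the
    model dip is simply added to the measured values.
    """
    profile = skewnorm.pdf(t, alpha, loc=t0, scale=scale)
    profile /= profile.max()       # normalize so the peak depth == `depth`
    return mag + depth * profile   # dimming = magnitudes increase
```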

Python API:

from malca.evaluation.injection import (
    load_efficiency_cube,
    plot_efficiency_all,
    plot_efficiency_mag_slices,
    plot_efficiency_marginalized,
    plot_efficiency_threshold_contour,
    plot_efficiency_3d,
)

cube = load_efficiency_cube("output/injection/cubes/efficiency_cube.npz")
plot_efficiency_marginalized(cube, axis="mag", output_path="avg_over_mag.png")
plot_efficiency_threshold_contour(cube, threshold=0.5, output_path="depth_at_50pct.png")

malca reproduce

# Re-run detection on raw data (requires manifest and .dat2 files)
malca reproduce --manifest output/lc_manifest.parquet --candidates my_targets.csv --out-dir output/results_repro --workers 10

Note: Reproduction uses Bayesian detection.

malca validate

# Auto-discover and validate ALL results for LOO method
malca validate --method loo

# Auto-discover for Bayes Factor method
malca validate --method bf

# Filter to specific magnitude bin
malca validate --method loo --mag-bin 13_13.5

# Direct file specification
malca validate --results output/results.parquet

# Validate latest detect run output (output/runs/<timestamp>/results)
malca validate --latest-run

# Validate a specific detect run directory
malca validate --run-dir output/runs/20250119_1349

# With custom candidates
malca validate --method loo --candidates my_targets.csv -v

# Reproduce on built-in candidates using local SkyPatrol CSVs
malca validate --candidates brayden_candidates --skypatrol-dir input/skypatrol2 --method bf --workers 4

# Validate using a direct results file path
malca validate --results output/events_logbf.parquet

malca characterize

After detecting dipper candidates, characterize them using multi-wavelength data:

malca characterize --input output/filtered.parquet --output output/characterized.parquet --dust --starhorse input/starhorse/starhorse2021.parquet

Features:

  • Gaia DR3 Queries: Astrometry, astrophysics (Teff, logg, metallicity, distance), 2MASS/AllWISE photometry
  • 3D Dust Extinction: All-sky coverage via dustmaps3d (Wang et al. 2025, ~350MB)
  • YSO Classification: Koenig & Leisawitz (2014) IR color-color diagram with dust correction
  • Galactic Coordinates: Galactic longitude/latitude (l, b) from ra/dec
  • Galactic Population: Thin/thick disk classification using metallicity or StarHorse ages
  • StarHorse (if provided): Stellar ages, masses, distances from local catalog join
  • Auxiliary Catalog Crossmatches (Tzanidakis+2025):
    • BANYAN Σ: Young stellar association membership probabilities
    • IPHAS DR2: Hα emission detection for Galactic plane sources
    • Star-forming regions: Proximity check to known SFRs (Prisinzano+2022)
    • Open clusters: Cantat-Gaudin+2020 membership crossmatch
    • unWISE/unTimely: Mid-IR variability z-scores
  • Caching: Gaia results cached locally to speed up repeated analyses
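
The caching step can be sketched as a deterministic key over the query inputs, matching the gaia_results_{hash}.parquet naming shown in the output layout below (illustrative — characterize.py's actual hashing scheme may differ):

```python
import hashlib
from pathlib import Path

def gaia_cache_path(source_ids, cache_dir="output/gaia_cache"):
    """Deterministic cache file for a set of Gaia source IDs.

    Sorting makes the key order-independent, so repeated queries for
    the same sources hit the same cache file.
    """
    key = ",".join(sorted(map(str, source_ids)))
    digest = hashlib.md5(key.encode()).hexdigest()[:12]
    return Path(cache_dir) / f"gaia_results_{digest}.parquet"
```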

Setup:

# Dust maps auto-download on first use (~350MB)
# For StarHorse, download catalog manually:
# https://cdsarc.cds.unistra.fr/viz-bin/cat/I/354

Output columns:

  • source_id, ra, dec, parallax, distance_gspphot
  • tmass_j, tmass_h, tmass_k, unwise_w1, unwise_w2
  • A_v_3d, ebv_3d (3D dust extinction)
  • H_K, W1_W2, yso_class (Class I/II/Transition Disk/Main Sequence)
  • population (thin_disk/thick_disk from metallicity or age)
  • age50, mass50 (if StarHorse provided)
  • gal_l, gal_b (Galactic coordinates)
  • Auxiliary crossmatches (Tzanidakis+2025):
    • banyan_field_prob, banyan_best_assoc (BANYAN Σ membership)
    • iphas_r_ha, iphas_ha_excess (IPHAS Hα)
    • near_sfr, sfr_name (star-forming region proximity)
    • cluster_name, cluster_age_myr (open cluster membership)
    • unwise_w1_zscore, unwise_w2_zscore, unwise_w1_var (IR variability)

malca vetting

Run post-review vetting against external catalogs:

# Vet all candidates in a characterized parquet
malca vetting output/characterized.parquet -o output/vetted.parquet

# Skip slow modules
malca vetting output/characterized.parquet --no-simbad --no-alerce

# Only vet high-scoring candidates
malca vetting output/characterized.parquet --min-score 3.0

# With crash-resume checkpoint
malca vetting output/characterized.parquet --checkpoint output/vetting_checkpoint.parquet

Modules (all on by default, disable with --no-*):

  • SIMBAD: Object type, bibliography, cross-IDs
  • Gaia DR3 variability: Variable flag, classification, score
  • Gaia DR3 eclipsing binaries: Period, morphology, global ranking
  • Gaia epoch photometry: Availability, observation count, G-band range
  • ASAS-SN variables: Variable star catalog crossmatch
  • ZTF variables: Chen+ 2020 periodic variables (type, period, amplitude)
  • TNS: Transient Name Server (name, type, redshift, discovery date)
  • ALeRCE: ZTF broker classifications and stamp probabilities
  • eROSITA: X-ray detection, flux, separation
  • PM consistency: Proper motion agreement with host cluster
  • ATLAS (opt-in, --atlas-token): Forced photometry light curves
  • NEOWISE (opt-in, --neowise-lc): Full NEOWISE light curves

Pipeline default: vetting runs as part of malca pipeline; use --no-run-vetting to opt out.

Vetting is also available during import in the review GUI ("Vet on import" toggle). Results are cached per input file so re-imports skip already-vetted candidates.

malca classify

malca classify --input output/characterized.parquet --output output/classified.parquet

malca stats

malca stats /path/to/lc123.dat2

malca attrition

malca attrition --pre output/pre.parquet --post output/post.parquet

Candidate Review

# Launch Dash review GUI against an existing run bundle
malca review --plot-dir output/runs/YOUR_RUN/plots

# Standalone mode (no plot directory required)
malca review

Dash GUI features:

  • Native Plotly light-curve viewer with PNG fallback, camera filtering, and plot presets/overlays (raw points, dip/jump markers, residuals, phase-fold, diagnostics)
  • Confidence scoring (1-4) via number keys or clickable buttons
  • Event class labeling (single-select) with direct key shortcuts and clickable badges: dipper, microlensing, flare, yso, unknown_interesting, instrumental, other (toggle off to unclassified)
  • Collapsible candidate panels with metadata health, vetting banner, external follow-up cards, diagnostic plots, and run-config provenance
  • Sidebar queue controls: unreviewed/failed filters, grouped numeric/text/select filters, multi-column sort, open-existing jump, and native camera selection
  • Import/fetch workflows: import tables or raw LC files (optional characterize + vet on import), or fetch by ASAS-SN ID, Gaia DR3 ID, or coordinates
  • Per-candidate pipeline stage chips with "Run All Missing" / "Re-run Current", plus notes/followup/review-pass tracking and CSV/Parquet export

malca ml_train

Train a baseline classifier on reviewed labels:

malca ml_train --input output/review/reviewed.parquet --out-dir output/ml --cv-folds 5
  • Uses curated physics/context features from malca/ml/features.py
  • Trains a LightGBM classifier on labeled event_class values (dropping unclassified by default)
  • Saves model artifacts to output/ml/ (candidate_classifier.joblib, feature_schema.json, metrics.json)

malca ml_predict

Score candidates with a trained classifier:

malca ml_predict --model-dir output/ml --input output/review/reviewed.parquet --output output/review/scored.parquet
  • Loads candidate_classifier.joblib + feature_schema.json
  • Applies the same feature transforms used during training
  • Appends ml_predicted_class and ml_prob_<class> columns to the output table
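
The schema round-trip that keeps training and prediction consistent can be sketched with the stdlib plus pandas (illustrative — the real feature_schema.json written by ml/train.py may carry more metadata):

```python
import json
from pathlib import Path

import pandas as pd

def save_schema(feature_names, path):
    """Persist the training-time feature order."""
    Path(path).write_text(json.dumps({"features": list(feature_names)}))

def align_features(df: pd.DataFrame, path) -> pd.DataFrame:
    """Subset and reorder prediction-time columns to match the schema."""
    schema = json.loads(Path(path).read_text())["features"]
    missing = [c for c in schema if c not in df.columns]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return df[schema]
```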

malca vsx-filter

Build the cleaned ASAS-SN index and filtered VSX catalog:

malca vsx-filter --help
malca vsx-filter --vsx-file input/vsx/vsxcat.090525.csv --masked-dir /path/to/lcsv2_masked --output-dir input/vsx
malca vsx-filter --stamp 20260213_120000   # timestamped output filenames
  • Reads the raw fixed-width VSX catalog and filters out unwanted variability classes (eclipsing binaries, supernovae, AGN, etc.)
  • Concatenates masked ASAS-SN index CSVs from all magnitude bins
  • Outputs asassn_catalog.csv and vsx_cleaned.csv (or timestamped variants with --stamp)

malca vsx-crossmatch

Crossmatch ASAS-SN sources with VSX by position (with proper-motion correction):

malca vsx-crossmatch --help
malca vsx-crossmatch --asassn-csv input/vsx/asassn_catalog.csv --vsx-csv input/vsx/vsx_cleaned.csv
malca vsx-crossmatch --radius 5.0 --stamp 20260213_120000
  • Propagates ASAS-SN coordinates from epoch 2016.0 to 2000.0 using proper motions
  • Default match radius is 3 arcseconds
  • Outputs asassn_x_vsx_matches_{stamp}.csv to input/vsx/

Output Directory Structure

Integrated Pipeline

When running malca pipeline, the following directory structure is created for complete provenance tracking:

output/runs/20250121_143052/          # Timestamp-based run directory
├── run_params.json                   # Detection parameters (detect.py)
├── run_summary.json                  # Detection results stats (detect.py)
├── filter_log.json                   # Filtering parameters & stats (filter.py)
├── plot_log.json                     # Plotting parameters (plot.py)
├── run.log                           # Simple text log with paths
│
├── manifests/                        # Manifest files
│   └── lc_manifest_{mag_bin}.parquet
│
├── tags/                             # Tagging results
│   ├── lc_filtered_{mag_bin}.parquet
│   ├── lc_stats_checkpoint_{mag_bin}.parquet
│   ├── rejected_tag_{mag_bin}.csv
│   └── vsx_tags/
│       └── vsx_tags_{mag_bin}.csv
│
├── paths/                            # Input paths
│   └── filtered_paths_{mag_bin}.txt
│
├── results/                          # Detection results
│   ├── lc_events_results.parquet     # Raw detection output (includes dipper_score)
│   ├── lc_events_results_PROCESSED.txt  # Checkpoint log
│   ├── lc_events_results_filtered.parquet   # After filter.py
│   └── rejected_filter.csv           # Filter rejections
│
└── plots/                            # Visualizations (plot.py)
    ├── {source_id}_dips.png
    └── ...

Key Features:

  • JSON logs track full provenance: Every parameter and result is logged for reproducibility
  • Self-contained runs: Each timestamped directory contains everything needed to reproduce the analysis
  • Checkpoint support: Detection runs can be interrupted and resumed using *_PROCESSED.txt files
  • Rejection tracking: Both tagging and filter rejections are logged with reasons
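
The checkpoint pattern behind *_PROCESSED.txt can be approximated as follows (a sketch of the idea, assuming the checkpoint is a plain list of processed paths, one per line):

```python
from pathlib import Path

def pending_paths(all_paths, checkpoint_file):
    """Return paths not yet recorded in the checkpoint file."""
    ckpt = Path(checkpoint_file)
    done = set(ckpt.read_text().splitlines()) if ckpt.exists() else set()
    return [p for p in all_paths if p not in done]

def mark_done(path, checkpoint_file):
    """Append a processed path so an interrupted run can resume."""
    with open(checkpoint_file, "a") as fh:
        fh.write(path + "\n")
```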

JSON Log Contents:

  • run_params.json: All tagging and detection parameters (thresholds, workers, baseline settings)
  • run_summary.json: Manifest statistics, tag rejection breakdown, detection results
  • filter_log.json: Filter toggles, thresholds, input/output counts, rejection breakdown
  • plot_log.json: Plotting parameters, GP settings, number of plots generated

Note: Event scores (dipper_score, dipper_n_dips, dipper_n_valid_dips) are automatically computed during detection for significant events and included in the results table.

Standalone Module Outputs

Injection Testing

output/injection/                     # Default output directory
├── results/
│   ├── injection_results.parquet     # Trial-by-trial injection results
│   └── injection_results_PROCESSED.txt  # Checkpoint for resume
│
├── cubes/
│   └── efficiency_cube.npz           # 3D efficiency cube (depth × duration × mag)
│
└── plots/
    ├── mag_slices/                   # Per-magnitude 2D heatmaps
    │   ├── mag_12.0_efficiency.png
    │   ├── mag_13.0_efficiency.png
    │   └── ...
    ├── efficiency_marginalized_*.png  # Averaged over one axis
    ├── depth_at_*pct_efficiency.png   # Threshold contour maps
    └── efficiency_3d_volume.html      # Interactive 3D (if plotly installed)
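
The efficiency cube is a 3D array over (depth, duration, mag), so marginalization reduces to an axis mean. A minimal numpy sketch of what plot_efficiency_marginalized computes (the real cube in efficiency_cube.npz also carries axis grids):

```python
import numpy as np

def marginalize(cube: np.ndarray,
                axis_names=("depth", "duration", "mag"),
                axis: str = "mag") -> np.ndarray:
    """Average detection efficiency over one named axis of the cube."""
    return np.nanmean(cube, axis=axis_names.index(axis))
```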

Detection Rate

output/detection_rate/                # Default base directory
├── 20250121_143052/                  # Timestamped run directory
│   ├── run_params.json                # Full parameter dump
│   ├── results/
│   │   ├── detection_rate_results.parquet
│   │   ├── detection_rate_results_PROCESSED.txt  # Checkpoint
│   │   └── detection_summary.json     # Detection rate summary
│   └── plots/
│       ├── detection_rate_vs_mag.png
│       ├── detection_duration_dist.png
│       └── detection_depth_dist.png
│
├── 20250121_150318_custom_tag/       # Optional --run-tag appended
│   └── ...
│
└── latest -> 20250121_150318_custom_tag/  # Symlink to latest run

Multi-Wavelength Characterization

output/
├── characterized.parquet             # Single output file with added columns:
                                      #   - Gaia astrometry & photometry
                                      #   - 3D dust extinction (A_v_3d, ebv_3d)
                                      #   - YSO classification (yso_class)
                                      #   - Galactic population (thin_disk/thick_disk)
                                      #   - StarHorse ages/masses (if provided)
                                      #   - Auxiliary crossmatches (BANYAN Σ, IPHAS, etc.)
└── gaia_cache/                       # Gaia query cache (created when cache is used)
    └── gaia_results_{hash}.parquet

Dipper Classification

output/
└── classified.parquet                # Single output file with added columns:
                                      #   - P_eb, P_cv, P_starspot, P_disk
                                      #   - yso_class
                                      #   - a_circ_au, transit_prob
                                      #   - final_class (EB/CV/Starspot/Disk/YSO/Unknown)

Manifest Building

output/
└── lc_manifest_{mag_bin}.parquet     # Single parquet file with:
                                      #   - asas_sn_id
                                      #   - ra_deg, dec_deg
                                      #   - lc_dir (directory path)
                                      #   - dat_path (full .dat2 path)
                                      #   - dat_exists (bool)

Citation

If you use MALCA or any part of its codebase in published research, please cite this repository:

Lenhart, C. (2025). MALCA: Multi-timescale ASAS-SN Light Curve Analysis [Software].
https://github.com/calderlen/malca

License

This project is licensed under the GNU General Public License v3.0. See LICENSE for details.
