ULTRA l1b statistical outlier culling #2781
lacoak21 wants to merge 6 commits into IMAP-Science-Operations-Center:dev
Pull request overview
This PR implements statistical outlier culling for ULTRA l1b data processing, as specified by the ULTRA IT team. The algorithm identifies time-dependent variations in signal counts by comparing observed standard deviations against expected Poisson statistics, iteratively flagging outlier spin bins that exceed 3 standard deviations from the mean.
Changes:
- Added statistical outlier culling algorithm with configurable iteration count and standard deviation threshold
- Refactored low voltage and high energy culling functions to return boolean arrays instead of bit flags (converted to bit flags later in the pipeline)
- Integrated all three culling algorithms (low voltage, high energy, and statistical) in the extended spin calculation workflow
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| imap_processing/ultra/constants.py | Added constants for statistical culling iterations (5) and standard deviation threshold (0.05) |
| imap_processing/ultra/l1b/ultra_l1b_culling.py | Refactored flag_low_voltage and flag_high_energy to return boolean arrays; added flag_statistical_outliers, get_poisson_stats, and get_valid_de_count_summary functions |
| imap_processing/ultra/l1b/extendedspin.py | Integrated statistical outlier culling into the extended spin workflow; added logic to convert boolean flags to bitwise flags |
| imap_processing/tests/ultra/unit/test_ultra_l1b_culling.py | Updated tests for refactored functions; added tests for statistical outlier culling and Poisson statistics calculation |
laspsandoval left a comment
Some comments and questions. Nicely done! It looks good.
    """
    std = np.std(counts)
    std_ratio = std / np.sqrt(np.mean(counts)) - 1
    sub_mask = np.abs((counts - np.mean(counts)) / std) > 3
Will std ever be zero here?
Good point. It might.
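One possible guard for the zero-`std` case, as a minimal sketch. The helper name `outlier_mask` and its `n_sigma` parameter are hypothetical and not part of this PR; the idea is simply to return "no outliers" when every bin has the same count, instead of dividing by zero.

```python
import numpy as np


def outlier_mask(counts, n_sigma=3.0):
    """Flag bins more than n_sigma standard deviations from the mean.

    Hypothetical helper: returns an all-False mask when the input is
    empty or has zero standard deviation, so no division by zero occurs.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.size == 0:
        return np.zeros(counts.shape, dtype=bool)
    std = np.std(counts)
    if std == 0:
        # All bins are identical, so nothing can be an outlier.
        return np.zeros(counts.shape, dtype=bool)
    return np.abs((counts - np.mean(counts)) / std) > n_sigma
```

With this guard, a constant array such as `[4, 4, 4, 4]` yields an all-False mask rather than NaNs and a runtime warning.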
Change Summary
closes #2710
Overview
This PR implements the statistical culling algorithm provided by the ULTRA IT team.
Algorithm explained:
After removing the voltage drops and the high energy events, there still appears to be some time dependence in the signal during a pointing.
- For each energy range in a pointing, we can calculate a mean and a standard deviation.
- If the counts come from a source with no time dependence, then the calculated standard deviation should be very close to the square root of the mean (i.e., they should obey Poisson statistics).
- If there is time variation in the signal, then the standard deviation will be larger than the square root of the mean.
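The idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function name `iterative_poisson_cull` is hypothetical, the defaults (5 iterations, 0.05 ratio threshold, 3 sigma) are taken from the review summary, and the stopping rule (stop once `std / sqrt(mean) - 1` drops to the threshold or below) is my reading of the description.

```python
import numpy as np


def iterative_poisson_cull(counts, max_iter=5, ratio_threshold=0.05, n_sigma=3.0):
    """Return a boolean mask of bins flagged as statistical outliers.

    Hypothetical sketch: repeatedly compare the observed standard
    deviation of the unflagged bins with the Poisson expectation
    sqrt(mean), and flag bins beyond n_sigma while the excess remains.
    """
    counts = np.asarray(counts, dtype=float)
    flagged = np.zeros(counts.shape, dtype=bool)
    for _ in range(max_iter):
        kept = counts[~flagged]
        if kept.size == 0:
            break
        mean, std = kept.mean(), kept.std()
        if mean == 0 or std == 0:
            # Nothing to compare against; avoid dividing by zero.
            break
        std_ratio = std / np.sqrt(mean) - 1
        if std_ratio <= ratio_threshold:
            # Remaining bins are consistent with Poisson statistics.
            break
        new = ~flagged & (np.abs(counts - mean) / std > n_sigma)
        if not new.any():
            break
        flagged |= new
    return flagged
```

For example, a series alternating between 90 and 110 counts has a standard deviation of 10, equal to the square root of its mean of 100, so nothing is flagged. Replacing one bin with 400 pushes the ratio well above the threshold, and only that bin is flagged before the loop stops.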
This PR also changes the quality flag arrays returned by the high energy and low voltage culling algorithms. Originally they were bit flag arrays, but since I needed to use those masks in the statistical culling algorithm, I decided it would be far easier to keep them as booleans and convert them to flag arrays later in the code.
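As a rough illustration of that design choice (the bit values and the function name `to_bit_flags` below are hypothetical and do not reflect the flag definitions used in imap_processing), boolean masks combine directly with `|` and `~` and can be packed into a bit flag array in one final step:

```python
import numpy as np

# Hypothetical bit assignments, for illustration only.
LOW_VOLTAGE_BIT = 1
HIGH_ENERGY_BIT = 2
STAT_OUTLIER_BIT = 4


def to_bit_flags(low_voltage, high_energy, stat_outlier):
    """Pack three boolean masks into one uint16 bit flag array."""
    low_voltage = np.asarray(low_voltage, dtype=bool)
    high_energy = np.asarray(high_energy, dtype=bool)
    stat_outlier = np.asarray(stat_outlier, dtype=bool)
    return (
        low_voltage.astype(np.uint16) * LOW_VOLTAGE_BIT
        | high_energy.astype(np.uint16) * HIGH_ENERGY_BIT
        | stat_outlier.astype(np.uint16) * STAT_OUTLIER_BIT
    )


low = np.array([True, False, False, True])
high = np.array([False, True, False, True])
# Bins that survive the first two culls feed the statistical cull.
usable = ~(low | high)
stat = np.array([False, False, True, False]) & usable
flags = to_bit_flags(low, high, stat)
```

Keeping the masks as booleans means the statistical cull can select its input with a plain `~(low | high)` instead of decoding bits first.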
NOTE
This code is basically the same as Bob's culling code that he provided for reference here. I didn't find many areas to improve, since it is an iterative process and cannot be vectorized, so I followed his code very closely. Again, there will be a future PR with the validation test that compares with his outputs directly.
File changes
Testing
Added tests for the statistical culling. I will be adding the validation test in a separate PR because this one started getting pretty big.