fix: replace manual anomalies with a hampel filter by matthewp · Pull Request #1997 · npmx-dev/npmx.dev

matthewp · 2026-03-08T21:36:55Z

🔗 Linked issue

Previous issue: #1707

🧭 Context

The current anomaly removal is biased because the anomalies are picked by hand.

This replaces it with an implementation that uses a Hampel filter to detect and remove deviations automatically.

📚 Description

The important part here is applyHampelCorrection.

It visits each data point and builds a sliding window around it, by default 3 points on each side.

It takes the median of that window, since a single spike can't pull the median off center.

It then measures how spread out the neighbors are by calculating the MAD (median absolute deviation) of the window.

Finally it scores the data point by its distance from the median relative to the MAD. If the score is above a threshold, the point is replaced with the median of that window and marked as an anomaly.
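
As a rough illustration of the steps described above (not the code from this PR), a minimal TypeScript sketch could look like the following. The name hampelSketch, the default threshold of 3, the 1.4826 scale factor (the usual constant that makes the MAD comparable to a standard deviation for normally distributed data), and the truncated windows at the series edges are all assumptions made for the example.

```typescript
interface HampelPoint {
  value: number // original value, or the window median if flagged
  isAnomaly: boolean
}

// Median of a non-empty list of numbers.
function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Hypothetical sketch: windows are simply truncated at the edges (an assumption).
function hampelSketch(values: number[], halfWindow = 3, threshold = 3): HampelPoint[] {
  return values.map((value, i) => {
    // Up to `halfWindow` neighbours on each side, plus the point itself.
    const window = values.slice(Math.max(0, i - halfWindow), i + halfWindow + 1)
    const med = median(window)
    // MAD: median of the absolute deviations from the window median.
    const mad = median(window.map(v => Math.abs(v - med)))
    // Flag the point when it sits more than `threshold` scaled MADs from the median.
    const isAnomaly = Math.abs(value - med) > threshold * 1.4826 * mad
    return { value: isAnomaly ? med : value, isAnomaly }
  })
}
```

On [100, 102, 98, 101, 5000, 99, 103, 100, 97] this flags only the 5000 and replaces it with its window median, 101. Note that with this plain comparison a window whose MAD is 0 flags any deviation at all, which is the weakness discussed later in the thread.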

Here are two articles about how hampel filters work:

Here's a screenshot of Vite which still shows its spike removed after this change.

codecov · 2026-03-08T21:38:52Z

Codecov Report

❌ Patch coverage is 31.11111% with 31 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
app/utils/download-anomalies.ts	20.00%	20 Missing and 4 partials ⚠️
app/components/Package/TrendsChart.vue	46.15%	1 Missing and 6 partials ⚠️

github-actions · 2026-03-08T21:43:32Z

Lunaria Status Overview

🌕 This pull request will trigger status changes.

Tracked Files

File	Note
i18n/locales/az-AZ.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/bg-BG.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/cs-CZ.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/de-DE.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/en.json	Source changed, localizations will be marked as outdated.
i18n/locales/es.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/fr-FR.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/hu-HU.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/id-ID.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/ja-JP.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/pl-PL.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/ru-RU.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/tr-TR.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/uk-UA.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/zh-CN.json	Localization changed, will be marked as complete. 🔄️
i18n/locales/zh-TW.json	Localization changed, will be marked as complete. 🔄️

Warnings reference

Icon	Description
🔄️	The source for this localization has been updated since the creation of this pull request, make sure all changes in the source have been applied.

coderabbitai · 2026-03-08T21:47:28Z

Walkthrough

This PR replaces blocklist-based anomaly correction with a Hampel-filter implementation: adds applyHampelCorrection and removes blocklist helpers and the DOWNLOAD_ANOMALIES dataset. Chart components (TrendsChart.vue, WeeklyDownloadStats.vue) now call the Hampel correction and per-package anomaly UI/state (per-package lists, range formatting, contribute links, interactive anomaly tooltips) was removed or simplified. New settings hampelWindow and hampelThreshold were added and i18n keys for known anomaly ranges were deleted. Tests updated for applyHampelCorrection.

Possibly related PRs

feat: fix known download anomalies with interpolation #1636 — Touches the same anomaly-correction pipeline and chart UI; overlaps on replacing blocklist logic with Hampel-style changes.
feat: add Svelte CI anomalies to download-anomalies.data.ts #1934 — Adds entries to DOWNLOAD_ANOMALIES in app/utils/download-anomalies.data.ts, conflicting with this PR’s removal of that export.
fix: the weekly data anomaly detection was broken for the Svelte anomalies #1983 — Modifies app/utils/download-anomalies.ts and anomaly-detection/correction logic; overlaps on replacement/refactor of blocklist-based logic.

Suggested reviewers

danielroe

🚥 Pre-merge checks | ✅ 1

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description check	✅ Passed	The PR description clearly relates to the changeset, explaining the shift from manual anomaly removal to Hampel filter-based anomaly detection with technical details and linked references.

coderabbitai

Actionable comments posted: 3

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: beb9630b-e84a-4279-a4f8-362d18f451b4

📥 Commits

Reviewing files that changed from the base of the PR and between d26e250 and 1b0a0c7.

📒 Files selected for processing (5)

app/components/Package/TrendsChart.vue
app/components/Package/WeeklyDownloadStats.vue
app/utils/download-anomalies.data.ts
app/utils/download-anomalies.ts
test/unit/app/utils/download-anomalies.spec.ts

💤 Files with no reviewable changes (1)

app/utils/download-anomalies.data.ts

app/utils/download-anomalies.ts

test/unit/app/utils/download-anomalies.spec.ts

coderabbitai

🧹 Nitpick comments (1)

app/components/Package/TrendsChart.vue (1)
1831-1838: Consider using v-model for simpler checkbox binding.

The explicit :checked + @change pattern is functionally correct, but v-model provides equivalent behaviour with less boilerplate.
♻️ Optional simplification
 <input
-  :checked="settings.chartFilter.anomaliesFixed"
-  @change="
-    settings.chartFilter.anomaliesFixed = ($event.target as HTMLInputElement).checked
-  "
+  v-model="settings.chartFilter.anomaliesFixed"
   type="checkbox"
   class="accent-[var(--accent-color,var(--fg-subtle))]"
 />

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0a0c7 and 17e63ce.

📒 Files selected for processing (17)

app/components/Package/TrendsChart.vue
i18n/locales/bg-BG.json
i18n/locales/cs-CZ.json
i18n/locales/de-DE.json
i18n/locales/en.json
i18n/locales/es.json
i18n/locales/fr-FR.json
i18n/locales/hu-HU.json
i18n/locales/id-ID.json
i18n/locales/ja-JP.json
i18n/locales/pl-PL.json
i18n/locales/ru-RU.json
i18n/locales/tr-TR.json
i18n/locales/uk-UA.json
i18n/locales/zh-CN.json
i18n/locales/zh-TW.json
i18n/schema.json

💤 Files with no reviewable changes (16)

i18n/locales/en.json
i18n/locales/bg-BG.json
i18n/locales/es.json
i18n/locales/de-DE.json
i18n/locales/zh-TW.json
i18n/locales/fr-FR.json
i18n/locales/tr-TR.json
i18n/locales/cs-CZ.json
i18n/locales/hu-HU.json
i18n/schema.json
i18n/locales/zh-CN.json
i18n/locales/uk-UA.json
i18n/locales/ru-RU.json
i18n/locales/ja-JP.json
i18n/locales/pl-PL.json
i18n/locales/id-ID.json

danielroe · 2026-03-08T23:28:11Z

this looks very promising! tagging @jycouet who may have some thoughts on this 🙏

jycouet · 2026-03-09T06:37:50Z

Great to have a second look into this.

You probably checked my initial PR, #1636, where I started with a Hampel implementation 👍

If I remember correctly, the sweet spot was 2.5 or 3 for vite. But this setting would hide the "great start" of a lib (e.g. 0 0 0 0 0 0 20000).
I will pull the branch later (I'm on my phone atm) as I'm curious to see it, maybe you have a more robust implementation 👍

Another note: when the start or end of the chart falls inside an anomaly period, no algorithm can fix it. A good example: starting the chart in the middle of the vite spike.

Maybe we should add some tests around all this ? (to keep the intent)

Another note 2: I would love to have more than just vite there today ! 😅

graphieros · 2026-03-09T06:46:23Z

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

jycouet · 2026-03-09T06:49:13Z

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

That's the drawback of answering on phone
It shows my age 😅

jycouet · 2026-03-09T07:59:35Z

Thanks for this! A few thoughts after pulling the branch:

Reproduction URL that matters:
http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-09&end=2026-03-07
(because I suspect there's also a day-of-the-week issue atm, see #2005 let's not speak about this here)

"Great start" test cases:

Packages like @bramus/specificity or @sveltejs/sv-utils are good real-world examples of the 0 0 0 0 0 → lots of downloads pattern.
Would be great to add these as test cases (or manual verification) to see how Hampel manages them (or to find good defaults)

PR suggestion

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

// Happy coding

matthewp · 2026-03-09T12:23:47Z

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

I don't follow, as its own what?

jycouet · 2026-03-09T12:28:24Z

own

Something like

Its own controls. (It's mostly to be able to test different scenarios.)

matthewp · 2026-03-09T15:20:25Z

Yes I can do that.

Skip points without a full symmetric window to avoid flattening real growth at series edges ("great start" problem). Use a relative check when MAD is 0 instead of flagging any deviation, so sparse packages like [0,0,0,1,0,0,0] keep their real activity.

Expose halfWindow and threshold as sliders on a second row in the data correction panel so users can tune the filter interactively.
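
The edge rule and the zero-MAD rule described above might look roughly like this. It is a sketch, not the code on the branch: the name hampelFullWindow, the defaults, and in particular the zero-MAD rule (flag only when the deviation exceeds threshold times max(1, |median|)) are assumptions chosen for illustration.

```typescript
// Median of a non-empty list of numbers.
function medianOf(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Hypothetical variant: only points with a full symmetric window are scored.
function hampelFullWindow(values: number[], halfWindow = 3, threshold = 3): number[] {
  const out = [...values]
  // The first and last `halfWindow` points are never evaluated, so a
  // "great start" at the end of a series is left untouched.
  for (let i = halfWindow; i < values.length - halfWindow; i++) {
    const window = values.slice(i - halfWindow, i + halfWindow + 1)
    const med = medianOf(window)
    const mad = medianOf(window.map(v => Math.abs(v - med)))
    const deviation = Math.abs(values[i] - med)
    const isAnomaly =
      mad === 0
        ? // Assumed relative check: small blips around a flat baseline survive.
          deviation > threshold * Math.max(1, Math.abs(med))
        : deviation > threshold * 1.4826 * mad
    if (isAnomaly) out[i] = med
  }
  return out
}
```

Under these assumptions, [0,0,0,1,0,0,0] and [0,0,0,0,0,0,20000] come back unchanged, while the spike in [100,100,100,5000,100,100,100] is replaced with 100.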

matthewp · 2026-03-11T22:43:59Z

@jycouet I added sliders and addressed your concerns.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

i18n/locales/en.json (1)

469-477: ⚠️ Potential issue | 🟡 Minor

Rename this section away from “Known anomalies”.

These strings now sit beside Hampel window/threshold controls, but they still describe a curated set of bot/CI spikes. The PR now applies automatic statistical filtering instead, so this wording misrepresents what the toggle does and how the values are changed.

📥 Commits

Reviewing files that changed from the base of the PR and between 17e63ce and bfc1bcb.

📒 Files selected for processing (23)

app/components/Package/TrendsChart.vue
app/components/Package/WeeklyDownloadStats.vue
app/composables/useSettings.ts
app/utils/download-anomalies.data.ts
app/utils/download-anomalies.ts
i18n/locales/az-AZ.json
i18n/locales/bg-BG.json
i18n/locales/cs-CZ.json
i18n/locales/de-DE.json
i18n/locales/en.json
i18n/locales/es.json
i18n/locales/fr-FR.json
i18n/locales/hu-HU.json
i18n/locales/id-ID.json
i18n/locales/ja-JP.json
i18n/locales/pl-PL.json
i18n/locales/ru-RU.json
i18n/locales/tr-TR.json
i18n/locales/uk-UA.json
i18n/locales/zh-CN.json
i18n/locales/zh-TW.json
i18n/schema.json
test/unit/app/utils/download-anomalies.spec.ts

💤 Files with no reviewable changes (15)

i18n/locales/ja-JP.json
i18n/locales/zh-TW.json
i18n/locales/fr-FR.json
i18n/locales/hu-HU.json
i18n/locales/es.json
i18n/locales/cs-CZ.json
i18n/locales/pl-PL.json
i18n/locales/de-DE.json
i18n/locales/tr-TR.json
i18n/locales/ru-RU.json
i18n/locales/bg-BG.json
i18n/locales/zh-CN.json
i18n/locales/id-ID.json
i18n/locales/uk-UA.json
app/utils/download-anomalies.data.ts

✅ Files skipped from review due to trivial changes (1)

i18n/locales/az-AZ.json

coderabbitai · 2026-03-11T22:59:03Z

app/composables/useSettings.ts

+    hampelWindow: number
+    hampelThreshold: number


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

VueUse useLocalStorage mergeDefaults shallow merge documentation

💡 Result:

In VueUse, useLocalStorage() uses the same options as useStorage(), and the mergeDefaults behavior is documented on the useStorage page. (vueuse.org)

mergeDefaults: true merges the default value with the value read from storage, and for objects it does a shallow merge (top-level keys only). (vueuse.org)

For a deep merge, pass a custom function to mergeDefaults (the docs explicitly show this pattern). (vueuse.org)

Example (custom merge = deep merge):

import { useLocalStorage } from '@vueuse/core'
// import deepMerge from '...'

const state = useLocalStorage('key', { a: { b: 1 } }, {
  mergeDefaults: (storageValue, defaults) => deepMerge(defaults, storageValue),
})

Citations:

1: https://vueuse.org/core/usestorage/?utm_source=openai

2: https://vueuse.org/core/usestorage/?utm_source=openai

3: https://vueuse.org/core/useStorage/?utm_source=openai

Deep-merge nested chartFilter settings on hydration.

VueUse's useLocalStorage with mergeDefaults: true only performs a shallow merge at the top level. Returning users with an existing stored chartFilter object will not receive the new hampelWindow and hampelThreshold fields—the entire default chartFilter is replaced, leaving the new settings unset until explicitly written. VueUse documentation recommends a custom merge function for nested defaults.

Suggested fix

 if (!settingsRef) {
   settingsRef = useLocalStorage<AppSettings>(STORAGE_KEY, DEFAULT_SETTINGS, {
-    mergeDefaults: true,
+    mergeDefaults: (storageValue, defaults) => ({
+      ...defaults,
+      ...storageValue,
+      connector: { ...defaults.connector, ...(storageValue.connector ?? {}) },
+      sidebar: { ...defaults.sidebar, ...(storageValue.sidebar ?? {}) },
+      chartFilter: {
+        ...defaults.chartFilter,
+        ...(storageValue.chartFilter ?? {}),
+      },
+    }),
   })
 }
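
To make the shallow-versus-nested difference concrete, here is a small standalone sketch using plain object spread. It does not call VueUse; the DemoSettings shape, the theme field, and the default values are hypothetical, and only the chartFilter field names come from this PR.

```typescript
interface ChartFilter {
  anomaliesFixed: boolean
  hampelWindow: number
  hampelThreshold: number
}

// Hypothetical, trimmed-down settings shape for the example.
interface DemoSettings {
  theme: string
  chartFilter: ChartFilter
}

// What an older client may have written to storage before the new fields existed.
type StoredSettings = { theme?: string; chartFilter?: Partial<ChartFilter> }

// Top-level merge only: the stored chartFilter object replaces the default one,
// so fields missing from storage stay missing (the cast hides this from the compiler).
function shallowMerge(defaults: DemoSettings, stored: StoredSettings): DemoSettings {
  return { ...defaults, ...stored } as DemoSettings
}

// Merging chartFilter one level deeper fills in the new defaults.
function nestedMerge(defaults: DemoSettings, stored: StoredSettings): DemoSettings {
  return {
    ...defaults,
    ...stored,
    chartFilter: { ...defaults.chartFilter, ...stored.chartFilter },
  }
}

const demoDefaults: DemoSettings = {
  theme: 'system',
  chartFilter: { anomaliesFixed: true, hampelWindow: 3, hampelThreshold: 3 },
}
const demoStored: StoredSettings = { chartFilter: { anomaliesFixed: false } }

const shallowResult = shallowMerge(demoDefaults, demoStored)
const nestedResult = nestedMerge(demoDefaults, demoStored)
```

Here shallowResult.chartFilter.hampelWindow is undefined at runtime, while nestedResult.chartFilter.hampelWindow is 3 and the stored anomaliesFixed: false is preserved in both.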

coderabbitai · 2026-03-11T22:59:04Z

app/utils/download-anomalies.ts

+  // Only evaluate points that have a full symmetric window.
+  // Boundary points lack enough context on one side, making them
+  // prone to false positives (e.g., a "great start" at the end of a series).
+  for (let i = halfWindow; i < values.length - halfWindow; i++) {
+    const start = i - halfWindow
+    const end = i + halfWindow
+    const window = values.slice(start, end + 1)


⚠️ Potential issue | 🟠 Major

Boundary spikes still leak back in through date-range trimming.

This loop never scores the first or last halfWindow buckets. Because Package/TrendsChart.vue applies Hampel after fetching the user-selected range, any anomaly that becomes a boundary point after cropping is guaranteed to survive, so narrowing the chart around a spike can reintroduce the very point the default view removed. Please run the filter on padded history and trim afterwards, and add a regression for a cropped-range edge spike.
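
One way to read the "filter on padded history, trim afterwards" suggestion, sketched with index-based ranges. The helper names (correctRange, fullWindowHampel), the half-open [from, to) range convention, and the defaults are assumptions for illustration; the chart code in the PR presumably works with dated buckets rather than bare indices.

```typescript
// Median of a non-empty list of numbers.
function windowMedian(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Full-window Hampel pass: points without `halfWindow` neighbours on both
// sides are left as they are.
function fullWindowHampel(values: number[], halfWindow: number, threshold: number): number[] {
  const out = [...values]
  for (let i = halfWindow; i < values.length - halfWindow; i++) {
    const window = values.slice(i - halfWindow, i + halfWindow + 1)
    const med = windowMedian(window)
    const mad = windowMedian(window.map(v => Math.abs(v - med)))
    if (Math.abs(values[i] - med) > threshold * 1.4826 * mad) out[i] = med
  }
  return out
}

// Hypothetical helper: correct the range [from, to) of `history`, using up to
// `halfWindow` extra points of context on each side, then drop the padding.
function correctRange(
  history: number[],
  from: number,
  to: number,
  halfWindow = 3,
  threshold = 3,
): number[] {
  const paddedStart = Math.max(0, from - halfWindow)
  const paddedEnd = Math.min(history.length, to + halfWindow)
  const corrected = fullWindowHampel(history.slice(paddedStart, paddedEnd), halfWindow, threshold)
  // Trim back to the range the user actually asked for.
  return corrected.slice(from - paddedStart, to - paddedStart)
}
```

With a spike sitting at the first index of the requested range, filtering only the cropped slice leaves it in place because it is a boundary point, whereas correctRange sees enough history on the left to score and replace it. Padding cannot help when the spike is at the very start or end of the available history.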

jycouet · 2026-03-12T06:15:47Z

@matthewp I pulled your branch, and see the 2 controls

And with http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-13&end=2026-03-11 I get:

I tried also, removing my local storage to start from "fresh", but I still see the big spike.
You have the same on your end? I'm missing something?

I didn't look at the code at all, let me know how you want to proceed.

jycouet · 2026-03-14T18:00:19Z

@graphieros I think that the merge broke the test

@matthewp I tried with your fe605db. If I understand your branch, you need to set Hampel window (4) + Hampel threshold (4) + Apply correction, right? And the Apply correction toggle is disconnected from the actual behavior.

To be able to move forward, I would suggest to:
1/ start from main
2/ add just these 2 controls Hampel window and Hampel threshold. If it's 0, it's not used, if it's x, it's used by x. Like this, Apply correction stays like main is working.
Like this we are able to play with all values and option to see what to do & understand better.

My thinking is that without manual data, I don't see how to avoid this:
http://127.0.0.1:3000/package/vite?modal=chart&start=2025-08-15&end=2026-03-13

Remove reference to unused variable

graphieros · 2026-03-14T18:07:37Z

@graphieros I think that the merge broke the test

Sorry for that, removed the culprit line

matthewp · 2026-03-14T18:12:21Z

@jycouet Removing the biased manual data manipulation is the primary motivation of this pull request. So I'll keep this one until we can resolve whatever issues get us there.

Is "Vite must look perfect at all times" really the primary requirement of this feature? Vs. correcting the unknown (hundreds? thousands?) of packages that likely have anomalies that are not corrected on npmx.dev currently? Currently Vite only looks bad if you enter a date range that no one is going to look at unless they are attempting to play with the controls.

Tell me what the requirements are and I'll figure out a solution.

jycouet · 2026-03-14T18:46:08Z

@matthewp I understand that it's the motivation, and if we manage to have a better result than manual without manual, I'm all for it. I'm just trying to see how to test the feature! So my idea was to keep everything as is (where we can disable everything), and ADD this feature to be able to see behavior on multiple packages.
If we don't do this, I don't see how we can compare data... We have to play with this branch + live site? I feel that it's less practical. but if you have a way, let me know, I'll follow your advice.

It all started with vite, because it's obvious that there is an issue for a few weeks... But we now have a bit more: https://github.com/npmx-dev/npmx.dev/edit/main/app/utils/download-anomalies.data.ts

Is "Vite must look perfect at all times" really the primary requirement of this feature?

I think it's not only vite, and what I like with pnpm community, is that anyone can come and improve it by adding a manual data. Or by having a cool algo that could help everyone 🎉

Currently Vite only looks bad if you enter a date range

Also when we look at the daily view

the default half window to 4 fixes

That's also a "big" decision, what default value to set, and the bigger the more we affect other parts.

correcting the unknown (hundreds? thousands?) of packages that likely have anomalies that are not corrected on npmx.dev currently?

Manual data is by package right now, but if one day we know something affecting ALL packages, we could also add this one info and all packages will have it 💪

Tell me what the requirements are and I'll figure out a solution.

I don't know what to answer here, you opened the issue & the PR ^^. I think that we are trying to do better things all together :)
Today I have to say that I like the manual data (I'm biased, I did it ^^)
And as I said as first reply here:

You probably checked my initial PR, #1636 I started with hampel implementation. But it didn't cover all situations. I'm happy to see a better implementation.

graphieros · 2026-03-14T19:02:56Z

I think acceptance criteria could be summed up as follows:

obvious isolated spikes should be corrected
genuine growth ramps should not be flattened
low volume packages should not get false positives
changing the selected date range should not reintroduce known anomalies
daily and aggregated views should behave consistently enough

matthewp · 2026-03-14T19:04:43Z

@jycouet I don't agree that the manual data manipulation is better, it misses an unknown number of packages and introduces bias into the entire site's data set. I didn't review or know about your PR when I submitted this one, to be clear, but I would assume that it did it better than the current state as well.

I think looking at this by wanting specific packages to have nice looking charts is the wrong approach. You are devaluing having a consistent, unbiased method for detecting anomalies. Not having false positives is what is most important, some false negatives will happen, but it will be universally applied.

I think the way to review a change like this is not focusing on specific desired results, but by looking 1) at the documentation on hampel detection and whether it's a good algorithm for this sort of problem. It's used within Matlab which I think is a good indicator that it is. And then 2) is the algorithm implemented correctly, is the code good, etc.

I know that the "everyone chips in with what they know" feels warm and good, and for some things that's true, but when it comes to data it's not the case, everyone chipping in with their individual biases results in a distorted data set. Not maliciously, but biased nevertheless through selection bias, recall bias, survivorship bias, etc.

As for the individual recent changes, I don't know what the correct default values should be, the original values seemed correct to me, I adjusted them based on the view you were looking at, but it wasn't completely clear to me what you were looking at as I do not get the same results locally. So happy to roll that back to previous values.

But first I will reiterate that I am not sure what the requirements are here. I again ask, is it that Vite must never show this spike, regardless of what controls and date ranges are used?

matthewp · 2026-03-14T19:06:39Z

I would like some maintainer feedback please, cc @danielroe do you have views here, thanks.

matthewp · 2026-03-14T19:12:11Z

@graphieros Thank you, which of those criteria do you think are not passing currently? I think it's just (4), and for the date range in @jycouet's last screenshot, only for Vite. Is that correct? Or is there another?

graphieros · 2026-03-14T19:32:52Z

@matthewp it does appear like 4 is the current blocker, from what I see. Is the correction applied at the right stage of the data pipeline ?

I think we need to define a set of packages to test against, like (probably non exhaustive):

isolated mature-package spike
great start new package
low vol package
cropped range edge anomaly
daily vs weekly etc

jycouet · 2026-03-14T20:44:04Z

Thank you @matthewp for sharing, I understand better your view.
Not easy to satisfy everyone. One cool thing though is that you can play with the controls to see the view you prefer (and it will remain).

I ran some benchmark on: @bramus/specificity, knip, remult, vite, sv that represent some different styles.
The green line is the hampel one, the other one is the raw value

sorry for the flashbang 🤪

ghostdevv · 2026-04-05T03:11:26Z

what's needing to be done still? minus dealing with the merge conflicts :p

cc: @graphieros @jycouet

graphieros · 2026-04-06T08:36:12Z

what's needing to be done still? minus dealing with the merge conflicts :p

This PR looks like it is still pending.

jycouet · 2026-04-07T08:55:19Z

Hey @matthewp, thanks for pushing on this, it made me dig deeper into Hampel again.

I genuinely think a pure statistical approach is elegant, and in a perfect world I'd love to get there. But as the benchmarks showed, Hampel still struggles with real-world edge cases (great-start packages, boundary spikes on date range changes, low-volume data). These aren't minor => they affect how people read the charts.

As the correction data is open, anyone can see what was corrected and why. Transparency seems like a better answer to the bias concern than removing human judgment entirely.

I think that we can close this PR for now (or move as draft so that it's not in list of "to look").
The conversation was valuable and I'm open to revisiting a hybrid approach down the road.

matthewp · 2026-04-07T12:21:26Z

As the correction data is open, anyone can see what was corrected and why. Transparency seems like a better answer to the bias concern than removing human judgment entirely.

Transparency in this case would mean explicitly labeling the correction data as a biased sample. Are you willing to do that?

I'd still like to work on this PR, happy to move into a draft or close, whatever you prefer.

Also open to a separate PR that removes the correction data entirely, so all packages are equally flawed, if you are open to that.

jycouet · 2026-04-07T13:10:13Z

The corrections are primarily based on known npm reporting events (outages, API bugs), not aesthetic preferences for specific packages. I wouldn't call that a biased sample -> it's closer to an errata log.

I was thinking that it's a good thing for everyone, apparently not. I didn't mean to corrupt data, just to make charts more accurate when we know something went wrong on npm's side. When a known anomaly adds 100M+ downloads/week for over a month, it's not exactly a rounding error. Leaving that uncorrected feels like the bigger data integrity issue to me.

Adding a Hampel option alongside the current corrections sounds great to me. Removing "Known anomalies" that I believe aren't destroying data, I'm less sure about, but I'm just an enthusiast who likes coding and learning here, nothing else.

I'm sure the project will lean toward the best, no matter the direction, all good.
I'll go spend my time and energy on positive stuff. Good luck and happy coding 👍

graphieros · 2026-04-07T13:30:05Z

I suggest we:

Keep the Hampel implementation and UI controls
Do not remove manual corrections in this PR.

If Hampel (or another method) proves reliable across all scenarios, we can then later reduce reliance on manual corrections, or keep them only for extreme known events (?), but we should earn that transition with data, not assume it upfront.

In the meantime, this PR can be converted to a draft, if this works for you.

matthewp · 2026-04-07T13:48:16Z

@jycouet

The corrections are primarily based on known npm reporting events (outages, API bugs), not aesthetic preferences for specific packages. I wouldn't call that a biased sample -> it's closer to an errata log.

Known to a subset of people who contribute to this project, that's known as ascertainment bias. The act of cleaning the data is not neutral, it encodes the cleaner's subjective knowledge into the dataset. Also related is detection bias.

I was thinking that it's a good thing for everyone, apparently not. I didn't mean to corrupt data, just to make charts more accurate when we know something went wrong on npm's side. When a known anomaly adds 100M+ downloads/week for over a month, it's not exactly a rounding error. Leaving that uncorrected feels like the bigger data integrity issue to me.

It's not, because the unknown anomalies are not corrected. When users visit 2 package pages, one corrected and one not, the uncorrected package looks less trustworthy, and the user is more likely to favor the package with normal-looking charts.

So, unintentionally or not, putting bias into this product leads to the maintainers' own packages (the maintainers are involved in many high-profile projects) having a higher conversion rate than competing packages which have not been corrected. This is the alarming thing and why I originally signaled that the feature should just be removed.

matthewp · 2026-04-07T13:53:06Z

@graphieros

If Hampel (or another method) proves reliable across all scenarios, we can then later reduce reliance on manual corrections, or keep them only for extreme known events (?), but we should earn that transition with data, not assume it upfront.

The manual correction is not reliable across all scenarios. One of the reasons I haven't been enthusiastic about coming back to this PR is that there are very high standards for a known anomaly correction mechanism used by MATLAB among other places (Hampel), and there are no standards for the known biased correction mechanism (manual). Can we apply some requirements for manual corrections in order to balance things out better?

graphieros · 2026-04-07T14:14:35Z

@matthewp

Can we apply some requirements for manual corrections in order to balance things out better?

Sounds fair enough to me.
What are you thinking about ? Documenting sources for manual corrections ?

matthewp · 2026-04-07T14:21:20Z

Yeah something like that, also:

Official and journalistic sources only (no self-reporting or tweets as reports)
Applies equally to all packages named in the report. If no packages named, either apply universally or don't apply that event to any packages.

graphieros · 2026-04-07T14:44:14Z

Official and journalistic sources only (no self-reporting or tweets as reports)

As decent as this would be for clinical data, it does not seem to be the case for npm downloads. Any manual correction is curated by npmx maintainers. Note that the few corrections applied correct obvious bursts, and the correction can be turned off to view the raw data, but we have been through this already. The source of truth remains available which is essential.

This PR should be focused on improving the correction algorithm, and its coexistence with the manual correction is a requirement for now.

matthewp · 2026-04-07T14:49:27Z

@graphieros Happy to table that discussion. I'm a little confused though; you previously said:

I think acceptance criteria could be summed up as follows:

obvious isolated spikes should be corrected

genuine growth ramps should not be flattened

low volume packages should not get false positives

changing the selected date range should not reintroduce known anomalies

daily and aggregated views should behave consistently enough

Are you now saying these aren't the acceptance criteria for this PR, and that it must also bring back the biased manual corrections?
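
To make the first two criteria concrete, here is a minimal TypeScript sketch of a Hampel-style filter with two sample series run through it. This is not the PR's `applyHampelCorrection`: the function name `hampelFilter`, the result shape, and the sample data are assumptions for illustration. The `halfWindow` and `threshold` names follow the sliders mentioned in this PR, with the half window defaulting to 4; the threshold default of 3 and the 1.4826 scale factor are the conventional Hampel choices and are assumed here.

```typescript
interface HampelOptions {
  halfWindow?: number; // neighbours considered on each side of a point
  threshold?: number;  // how many scaled MADs count as an anomaly
}

interface HampelResult {
  values: number[];     // series with anomalies replaced by the window median
  anomalies: boolean[]; // true where a point was replaced
}

// Scale factor that makes the MAD comparable to a standard deviation
// for normally distributed data.
const MAD_SCALE = 1.4826;

function median(xs: number[]): number {
  const sorted = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function hampelFilter(
  series: number[],
  { halfWindow = 4, threshold = 3 }: HampelOptions = {},
): HampelResult {
  const values = series.slice();
  const anomalies = series.map(() => false);

  for (let i = 0; i < series.length; i++) {
    // Window is truncated at the edges of the series.
    const start = Math.max(0, i - halfWindow);
    const end = Math.min(series.length, i + halfWindow + 1);
    const window = series.slice(start, end);

    const med = median(window);
    const mad = median(window.map((v) => Math.abs(v - med)));

    // Always read from the original series so one replacement
    // cannot influence the windows of later points.
    if (Math.abs(series[i] - med) > threshold * MAD_SCALE * mad) {
      values[i] = med;
      anomalies[i] = true;
    }
  }
  return { values, anomalies };
}

// Criterion: an obvious isolated spike should be corrected.
const spiky = [100, 102, 98, 101, 99, 5000, 100, 103, 97, 101, 99];
const spikeResult = hampelFilter(spiky);
// Only index 5 is flagged; it is replaced by its window median, 101.

// Criterion: a genuine growth ramp should not be flattened.
const ramp = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200];
const rampResult = hampelFilter(ramp);
// No point is flagged, so rampResult.values equals the input.
```

One property of this sketch bears on the low-volume criterion: when more than half of a window holds the same value, the MAD is zero, so any point that differs from the median is flagged. Sparse series with many repeated counts would therefore need extra handling.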

graphieros · 2026-04-07T15:00:38Z

If the algorithm checks all the boxes of the listed requirements, it would make manual corrections essentially redundant. I would applaud such a feat.

Until then, manual correction will remain featured.

matthewp changed the title ~~Replace manual anomalies with a hampel filter~~ fix: replace manual anomalies with a hampel filter Mar 8, 2026

vercel bot deployed to Preview – npmx.dev March 8, 2026 21:39 View deployment

vercel bot deployed to Preview – npmx.dev March 8, 2026 21:42 View deployment

vercel bot deployed to Preview – npmx.dev March 8, 2026 21:45 View deployment

coderabbitai bot reviewed Mar 8, 2026

app/utils/download-anomalies.ts (outdated, resolved)

app/utils/download-anomalies.ts (resolved)

test/unit/app/utils/download-anomalies.spec.ts (resolved)

coderabbitai bot reviewed Mar 8, 2026

matthewp added 5 commits March 11, 2026 18:06

Replace manual anomalies with a hampel filter

a29c7d3

Fix typechecking

9cf5d8b

Remove i18n keys

55388db

Add tweakable Hampel filter sliders to chart UI

88334f4

Expose halfWindow and threshold as sliders on a second row in the data correction panel so users can tune the filter interactively.

matthewp force-pushed the feat/hampel-anomaly-detection branch from 17e63ce to 88334f4 Compare March 11, 2026 22:41

[autofix.ci] apply automated fixes

bfc1bcb

vercel bot deployed to Preview – npmx.dev March 11, 2026 22:45 View deployment

coderabbitai bot reviewed Mar 11, 2026

Bump default half window to 4

fe605db

Merge branch 'main' into feat/hampel-anomaly-detection

88e0ab6

vercel bot deployed to Preview – docs.npmx.dev March 14, 2026 17:45 View deployment

vercel bot deployed to Preview – npmx.dev March 14, 2026 17:46 View deployment

fix: post merge

9a71cd8

Remove reference to unused variable

vercel bot deployed to Preview – npmx.dev March 14, 2026 18:09 View deployment

serhalp assigned jycouet Apr 6, 2026

matthewp marked this pull request as draft April 7, 2026 14:30

