CzechFOI-DRATE: Exploring ways to minimize bias when dividing real-world data into two groups (vaccinated, vx / unvaccinated, uvx)
Hypothesis 1 - CzechFOI-DRATE-NOBIAS repository:
It is impossible to perfectly and fairly compare vaccinated (VX) and unvaccinated (UVX) groups — either by measurement or mathematically — when vaccination is time-dependent and not random.
This remains true even if both groups have the same homogeneous, constant individual death rate.
Hypothesis 2 - see CzechFOI-SIM repository:
There is currently no reliable statistical method to determine the rate of death-related Adverse Events Following Immunisation (dAEFIs) at a frequency of approximately one additional death per 10,000 doses when the baseline mortality is unknown in real-world settings.
To the best of my knowledge, this vital problem is still waiting for someone who can solve it, to the benefit of mankind.
The same applies in the opposite direction (one death prevented per 10,000 doses).
Project GOAL
The aim is to find a method that compensates for biases introduced by the non-random assignment of individuals to vaccinated (vx) and unvaccinated (uvx) groups based on the timing of vaccination. This type of bias is unavoidable in real-world datasets, but it must be corrected in order to enable a fair comparison between the two groups.
Simulated Test Dataset with Minimal Bias
A synthetic dataset was generated in which individuals within each age group share a constant and homogeneous risk of death, estimated from real-world age-specific death rates. Death dates were simulated independently of vaccination status.
Real-world vaccination schedules (dose sets) were then reassigned randomly to individuals within the same age group, ensuring that each individual's entire dose schedule occurred on or before their simulated death date. No actual death dates were removed or altered—only the assignment of dose dates was adjusted to maintain this temporal consistency. This approach minimizes bias while preserving realistic dose timing patterns.
sim_MINBIAS_Vesely_106_SIMULATED.csv was created by NK) generate csv simulate deaths minimal bias.py
30.06.2025: changed the constraint from death day >= last dose day to death day > last dose day, because Cox regression can't handle zero-length intervals where start = stop.
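The generation steps described above can be sketched roughly as follows. This is a minimal illustration and not the code of `NK) generate csv simulate deaths minimal bias.py`: the function names, the daily hazard of 0.001, the 1000-day window, the two-dose schedules and the "most constrained schedule first" placement order are all assumptions made for the example.

```python
import random

def simulate_death_day(daily_hazard, window_days, rng):
    """Constant-hazard death day in 1..window_days, or None if the person survives."""
    day = int(rng.expovariate(daily_hazard)) + 1
    return day if day <= window_days else None

def assign_schedules(death_days, schedules, rng):
    """Give each dose schedule to a random person who dies strictly after its last dose.

    Schedules with the latest last dose are placed first because they have the
    fewest eligible people. Everyone left without a schedule stays unvaccinated.
    """
    unassigned = set(range(len(death_days)))
    result = {}
    for schedule in sorted(schedules, key=max, reverse=True):
        last_dose = max(schedule)
        eligible = [p for p in sorted(unassigned)
                    if death_days[p] is None or death_days[p] > last_dose]
        if not eligible:
            raise ValueError("no person left who dies after this schedule")
        person = rng.choice(eligible)
        result[person] = schedule
        unassigned.remove(person)
    return result

# Hypothetical toy population: 300 people, 100 two-dose schedules.
rng = random.Random(42)
death_days = [simulate_death_day(0.001, 1000, rng) for _ in range(300)]
schedules = [(rng.randint(1, 300), rng.randint(301, 600)) for _ in range(100)]
assignment = assign_schedules(death_days, schedules, rng)
```

Death days are drawn before any dose is assigned, so no simulated death is moved or removed. Only the pairing of schedules and people is constrained, which is the property the MINBIAS dataset relies on.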
CzechFOI-DRATE_EXAM project for further investigations
Impact of Dose Assignment Strategy on bias correction and Estimated Mortality Risk
Objective: To assess how different vaccine dose assignment strategies affect estimated hazard ratios (HRs) for mortality, and to test bias adjustment.
Methods: Time-varying Cox regression was used to compare mortality risk between vaccinated and unvaccinated individuals under four scenarios. Except for Case 1A (real world), all were simulated assuming a random, homogeneous, and constant death risk across the whole population.
Case 1A: Real-world Czech FOI data (death must follow dose - real world)
Uses the Czech freedom of information request raw dataset Vesely_106_202403141131.csv
Case 1B: Simulated doses with the same distribution as the real-world Czech FOI data (death must follow dose).
The csv dataset was created by "NK) generate csv simulate deaths minimal bias.py" (see Project GOAL).
It uses the same dose schedule distribution and approximately the same, but constant, death rate as the Czech FOI real-world data (AGE 70).
Case 2: Simulated doses with flat random assignment (death must follow dose).
Case 3: Simulated doses with a bell-curve distribution (death must follow dose).
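Time-varying Cox regression needs each person's follow-up split into start/stop intervals with the vaccination status that applied during each interval. The sketch below shows one way to build such rows; it is an assumption about the input layout, not the repository's code, and `to_intervals` and its arguments are hypothetical names. Libraries such as lifelines (`CoxTimeVaryingFitter`) accept this long format.

```python
def to_intervals(first_dose_day, death_day, end_day):
    """Split one person's follow-up into (start, stop, vaccinated, died) rows.

    Days are counted from study start; None means "never happened".
    """
    stop_day = death_day if death_day is not None else end_day
    died = death_day is not None
    if first_dose_day is not None and first_dose_day == stop_day:
        # A dose on the final day would create an interval with start == stop.
        raise ValueError("dose on the final day gives a zero-length interval")
    if first_dose_day is None or first_dose_day > stop_day:
        # Never vaccinated during follow-up: one unvaccinated interval.
        return [(0, stop_day, 0, died)]
    rows = []
    if first_dose_day > 0:
        rows.append((0, first_dose_day, 0, False))   # time before the dose
    rows.append((first_dose_day, stop_day, 1, died))  # time after the dose
    return rows

# Dose on day 100, death on day 250, study ends on day 1000.
rows = to_intervals(100, 250, 1000)
```

The `ValueError` branch mirrors the constraint change noted on 30.06.2025: a death on the dose day yields an interval where start = stop, which is why the simulation requires death day > last dose day.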
Results:
Cases 1B, 2, and 3 were simulated with uniform risk and should theoretically yield HR ≈ 1, reflecting no vaccine effect.
This table summarizes Cox regression hazard ratios (HR) across four datasets:
| Case | Description | β (coef) | HR = exp(β) | Risk Reduction (%) | 95% CI (HR) | z | p-value | −log₂(p) | Expected HR | Interpretation |
|---|---|---|---|---|---|---|---|---|---|---|
| 1A | Real data – Czech FOI dataset | -0.34 | 0.71 | 29% | 0.70–0.73 | -27.36 | < 0.005 | 514.18 | <1 | Strong protective effect observed |
| 1B | Simulated deaths + real dose distribution | -0.28 | 0.75 | 25% | 0.73–0.77 | -19.84 | < 0.005 | 287.13 | ≈1 | False protective effect observed; inflated by bias |
| 2 (Flat) | Simulated deaths + flat dose distribution | -0.04 | 0.96 | 4% | 0.93–0.99 | -2.82 | 0.005 | 8.39 | ≈1 | Minimal bias – HR near 1, as expected |
| 3 (Bell) | Simulated deaths + bell curve dose distribution | +0.28 | 1.33 | -33% | 1.30–1.37 | +17.94 | < 0.005 | 272.61 | ≈1 | Artificial harm due to reversed timing bias |
Notes:
- Risk Reduction (%) = (1 - HR) × 100%.
- Values > 0 indicate reduced risk (protective effect).
- Negative values (like Case 3) indicate increased risk (harm).
- Cases 1B, 2, and 3 were simulated with constant death risk; HR≈1 is expected in absence of real effect.
- Case 1B shows bias due to the real dose distribution skew causing a false appearance of protection.
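The relationship between the table columns can be checked with a few lines of arithmetic. This is only a sketch with hypothetical helper names. Because the β values in the table are rounded to two decimals, recomputed HRs can differ from the tabulated ones in the last digit.

```python
import math

def hazard_ratio(beta):
    """HR = exp(beta), where beta is the Cox coefficient."""
    return math.exp(beta)

def risk_reduction_pct(hr):
    """Risk Reduction (%) = (1 - HR) x 100; negative values mean increased risk."""
    return (1.0 - hr) * 100.0

# Rounded coefficients taken from the table above.
for case, beta in [("1A", -0.34), ("1B", -0.28), ("2", -0.04), ("3", 0.28)]:
    hr = hazard_ratio(beta)
    print(f"Case {case}: HR = {hr:.2f}, risk reduction = {risk_reduction_pct(hr):.0f}%")
```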
Conclusion
Dose classification strategies strongly influence observed vaccine effectiveness. Careful control of timing and classification is essential to avoid bias in survival analyses.
The result of Case 1B requires further investigation!
Python script AB) Cox fair compare vx uvx.py -> Detailed Results
Case 1A real data
Case 1B simulated deaths with real dose schedule
Case 2 simulated deaths flat
Case 3 simulated deaths bell curve
Python script ZF) vx uvx norm.py
Download interactive html
Czech freedom of information real world data
To test for bias, the plot below shows a simulated dataset where all individuals have the same constant mortality risk — uniform across age groups and over time, at approximately real-world levels. Individuals were then randomly assigned to vaccinated or unvaccinated groups, using real-world vaccination schedules. Critically, deaths were only allowed to occur after vaccination — reflecting a real-world constraint. This introduces immortal time bias, which can create the false impression that vaccination offers protection, even when mortality risk is identical for all. As a reminder: in this simulation, everyone has the same baseline risk of death. But if group assignment is not random — as in real-world data — it introduces bias.
The bias makes the normalized mortality rate of the unvaccinated (UVX) group appear artificially much worse!
The result needs further investigation to verify!
Uses the simulated dataset sim_MINBIAS_Vesely_106_SIMULATED.csv created by NK) generate csv simulate deaths minimal bias.py
Download interactive html
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, this time ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Tried again to reimplement the method used in @henjin256's R code:
https://sars2.net/czech2.html#Excess_mortality_by_weeks_after_vaccination
https://sars2.net/czech.html#Bucket_analysis
Creates the bucket csv file for AGE 70: AA) generate bucket csv.py (takes 1.5 hours for AG 70 alone)
Creates the plot file: AA) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx).
If the method correctly adjusts for bias, the vaccinated excess mortality curve should be flat, at or near 0%.
Since that is not observed, at least one of the following applies:
- The MINBIAS test data or the underlying assumptions are incorrect.
- The method was not reproduced correctly.
- There are errors in my code.
Needs further investigation!
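The quantity plotted by the bucket method, excess mortality per week since vaccination, can be illustrated with a simplified sketch. It assumes a single baseline rate; the referenced analysis presumably stratifies the baseline (for example by age and calendar period), which is omitted here. The bucket numbers and names are hypothetical.

```python
def excess_mortality_pct(deaths, person_days, baseline_rate):
    """Percent excess of observed deaths over baseline_rate * person_days."""
    expected = baseline_rate * person_days
    return (deaths / expected - 1.0) * 100.0

# Hypothetical buckets: week since vaccination -> (deaths, person-days).
buckets = {0: (10, 70_000), 1: (10, 70_000), 2: (14, 70_000)}
baseline_rate = 10 / 70_000  # deaths per person-day in the baseline group (assumed)

curve = {week: excess_mortality_pct(d, days, baseline_rate)
         for week, (d, days) in buckets.items()}
```

With identical risk in both groups and an unbiased method, every entry of `curve` should scatter around 0%, which is the flat line expected for the MINBIAS data.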
Disadvantage of this method: creating the bucket files is slow and demands a lot of memory; in R this is not a problem.
The plot below uses only the unvaccinated group as baseline!
Creates the plot file: AC) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx+vx).
The plot of the simulated MINBIAS dataset for AGE 70 should be a horizontal line at 0%. It uses the combined baseline (unvaccinated + vaccinated groups)!
Plot of the real-world Czech-FOI dataset. It uses the combined baseline (unvaccinated + vaccinated groups)!
Python script CA) Landmark adjust resampling truncation bias.py
Person days real world dataset Vesely_106_202403141131.csv
Uses a simulated minbias dataset to test whether the method compensates for mortality bias (forced restriction death_day >= last_dose_day). It seems to correct the bias only partially. The RR should theoretically run horizontally at about 1.
Uses the simulated nobias dataset to test the method (no constraint death_day >= last_dose_day). The RR should theoretically run horizontally at about 1.
Python script Y) vx uvx persondays immortal time adjusted.py
Person days real world dataset Vesely_106_202403141131.csv
test dataset sim_NOBIAS_Vesely_106_202403141131.csv
To test for bias, I ran the same code on simulated data with a homogeneous, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.
test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The rates should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method doesn't adjust for the death day >= last dose day bias.
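For reference, the basic person-days idea can be sketched as follows: each person contributes unvaccinated days until the first dose and vaccinated days afterwards, and a death is counted in the group the person belonged to on the day of death. This is a minimal sketch under those assumptions, not the code of the Y) scripts; the function name and the toy records are hypothetical.

```python
def person_day_rates(people, end_day):
    """people: iterable of (first_dose_day, death_day); None = never.

    Returns deaths per 100,000 person-days for (unvaccinated, vaccinated).
    """
    days = {"uvx": 0, "vx": 0}
    deaths = {"uvx": 0, "vx": 0}
    for first_dose_day, death_day in people:
        stop = death_day if death_day is not None else end_day
        dose = first_dose_day if first_dose_day is not None else stop
        dose = min(dose, stop)
        days["uvx"] += dose          # days before the first dose
        days["vx"] += stop - dose    # days after the first dose
        if death_day is not None:
            vaccinated = first_dose_day is not None and first_dose_day < death_day
            deaths["vx" if vaccinated else "uvx"] += 1
    return tuple(1e5 * deaths[g] / days[g] for g in ("uvx", "vx"))

# Four hypothetical people followed for 400 days.
people = [(100, 300), (None, 200), (50, None), (None, None)]
uvx_rate, vx_rate = person_day_rates(people, end_day=400)
```

Counting pre-dose days as unvaccinated removes the classic immortal time bias of a static split. It does not by itself undo a dataset in which the dose assignment was conditioned on the later death date, which may be one reason the MINBIAS result differs between vx and uvx.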
Python script Y) vx uvx persondays.py
Person days real world dataset Vesely_106_202403141131.csv
simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv
To test for bias, I ran the same code on simulated data with a homogeneous, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.
simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The rates should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method doesn't adjust for the death day >= last dose day bias.
Python script Y) vx uvx persondays baslinemort.py
Person days real world dataset Vesely_106_202403141131.csv
To test for bias, I ran the same code on simulated data with a homogeneous, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.
The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The rates should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method doesn't adjust for the death day >= last dose day bias.
W) When comparing different methods, Cox PH seemed to give the best approximation for Hypothesis 1
Python script W) coxph real deaths real vax dates by age. Here you can download the interactive html.
Cox PH analysis using Czech-FOI real world dataset Vesely_106_202403141131.csv
test dataset sim_NOBIAS_Vesely_106_202403141131.csv
To test for bias, I ran the same code on simulated data with a homogeneous, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.
If the code below is correct, this might explain why scientists endlessly debate the results of their comparisons.
simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
Python script ZA) dowhy doses vs sim_total_death individual.py
Download interactive html
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Plots generated by Python script X) event_stacking.py. Here you can download the interactive htmls.
The plot of the simulated dataset below assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (the real-world scenario) inherently introduces immortal time bias (Case 1B above), as illustrated below.
As a reminder, every individual in the homogeneous population below has the same constant mortality risk. If group assignment is non-random (as occurs in the real world), this introduces bias, making the normalized mortality rate of the UVX group look much worse.
Test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv
The stacked events for the MINBIAS simulation AG 70
Normalized Vx/Uvx death rate plot of Czech real world data AG 70
Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request
The stacked events for Czech real world data AG 70
The simulated dataset for this plot uses estimated age-specific death rates and simulates constant, uniformly random death dates across the observation window. It preserves vaccination timings but ignores cases where death precedes vaccination, to prevent any bias; this does not match the real-world scenario.
Test dataset sim_NOBIAS_Vesely_106_202403141131.csv created by NC) generate csv simulate deaths no bias.py
The stacked events AG 70 for the NOBIAS simulation should theoretically run horizontally at about the same level — could be a bug in the code.
Test using the simulated dataset based on real-world parameters, created by "NK) generate csv simulate deaths minimal bias.py".
Tried to evaluate whether the G-estimation (psi) method could correct for bias, but struggled with error messages.
Using the parameters of the "Cox time-varying method" below, it should probably work.
Python script G) generate interval data per person.py
Python script G) G-estimation on interval data per person.py
Python script GA) G-estimation on interval data per person all age.py
Python script G) cox on interval data per person.py
This produces the same results as "ZF) vx uvx norm" above, but uses aggregated CSV files for the calculations.
The aggregated CSV files were generated using the Python script: A) generate aggregated csv files from CzechFOI.py
The data is then plotted with the Python script: B) DeathRatesByAge from aggregated.py
Download interactive html
Python script E) death risk by age.py
Download interactive html
Czech-FOI real world dataset Vesely_106_202403141131.csv
Simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv.
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
Python script F) rolling daily crude HR by age.py
Download interactive html
Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death could only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
Python script J) Bias Study ratio vx_uvx.py
Download interactive html
This script simulates uniform death risk over time to test bias in survival analysis. It compares static vs. time-dependent vaccinated/unvaccinated classification, computes death rates and 1st derivatives, Kaplan-Meier curves, and Cox models, and visualizes the results in an interactive Plotly HTML plot.
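The difference between static and time-dependent classification can be reproduced with a small stand-alone simulation. The sketch below is not the code of script J; the 70% coverage, the uniform dose day in the first 500 days, the hazard and the seed are all assumptions chosen for illustration. Everyone has the same constant death risk, and a dose only counts if it happens before death.

```python
import random

def simulate(n=20_000, hazard=0.001, window=1000.0, seed=1):
    """Return (static risk ratio, person-time rate ratio) for vx versus uvx."""
    rng = random.Random(seed)
    static = {"uvx": [0, 0], "vx": [0, 0]}     # [deaths, persons]
    timed = {"uvx": [0, 0.0], "vx": [0, 0.0]}  # [deaths, person-days]
    for _ in range(n):
        death = rng.expovariate(hazard)        # same risk for everyone
        dose = rng.uniform(0, 500) if rng.random() < 0.7 else None
        stop = min(death, window)
        died = death <= window
        vaccinated = dose is not None and dose < stop  # dose must precede death
        group = "vx" if vaccinated else "uvx"
        # Static: the final status is applied to the whole follow-up.
        static[group][0] += died
        static[group][1] += 1
        # Time-dependent: days before the dose count as unvaccinated.
        switch = dose if vaccinated else stop
        timed["uvx"][1] += switch
        timed["vx"][1] += stop - switch
        timed[group][0] += died
    static_rr = (static["vx"][0] / static["vx"][1]) / (static["uvx"][0] / static["uvx"][1])
    timed_rr = (timed["vx"][0] / timed["vx"][1]) / (timed["uvx"][0] / timed["uvx"][1])
    return static_rr, timed_rr

static_rr, timed_rr = simulate()
print(f"static risk ratio: {static_rr:.2f}, person-time rate ratio: {timed_rr:.2f}")
```

Under these assumptions the static ratio falls well below 1 although no effect was simulated, while the person-time ratio stays close to 1. People who die before their scheduled dose end up in the unvaccinated group, which is what creates the apparent protection in the static split.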
ZB) Hypergeometric (used by Charles Sanders Peirce) Vaccine Effectiveness Analysis with Confidence Intervals
Python script ZB) CS-Pierce Hypergeometric VaxCodes.py
Download interactive html
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
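The hypergeometric idea behind script ZB can be sketched as follows: if deaths were unrelated to vaccination status, the number of vaccinated people among the deaths would follow a hypergeometric distribution. The helper names and the toy numbers are hypothetical, and how the script derives its confidence intervals is not covered here.

```python
from math import comb

def hypergeom_pmf(k, population, vaccinated, deaths):
    """P(exactly k of the deaths are vaccinated) if deaths ignore vaccination status."""
    return (comb(vaccinated, k) * comb(population - vaccinated, deaths - k)
            / comb(population, deaths))

def lower_tail(k, population, vaccinated, deaths):
    """P(k or fewer vaccinated deaths) under the same null hypothesis."""
    return sum(hypergeom_pmf(i, population, vaccinated, deaths) for i in range(k + 1))

# Toy example: 10 people, 5 vaccinated, 4 deaths, none of them vaccinated.
p_value = lower_tail(0, population=10, vaccinated=5, deaths=4)
```

Exact binomial coefficients become slow for millions of rows; a library implementation such as `scipy.stats.hypergeom` is the usual choice at that scale.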
Python script ZC) dowhy vaxcode doses vs total_death individual.py
Download interactive html
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Python script AC) age_mean.py
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Python script EA) batches vs death.py
Python script UA) diff death dose agebin.py
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Rolling correlation between the daily vaccine dose curve and the difference in normalized death rates between uvx and vx individuals (uvx - vx). The difference in deaths (uvx - vx) compensates for external influences, so only the vaccine effect should remain!
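A rolling correlation of this kind can be sketched in plain Python. The window length, the function names and the toy series are assumptions; the UC) script may compute it differently (for example with pandas).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (None if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)

def rolling_correlation(doses, uvx_rate, vx_rate, window):
    """Correlate daily doses with (uvx - vx) over each sliding window."""
    diff = [u - v for u, v in zip(uvx_rate, vx_rate)]
    return [pearson(doses[i:i + window], diff[i:i + window])
            for i in range(len(diff) - window + 1)]

# Toy series: the difference uvx - vx rises exactly with the dose curve.
doses = [0, 1, 2, 3, 4, 5]
uvx = [1, 2, 3, 4, 5, 6]
vx = [1, 1, 1, 1, 1, 1]
corr = rolling_correlation(doses, uvx, vx, window=3)
```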
Python script UC) diff norm death dose agebin.py
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
**As the simulated death risk for the whole homogeneous population is constant over time, the difference uvx - vx should fluctuate horizontally around 0!**
Python script ZG) doses_vs_deaths_dowhy.png
Zoomed in
DoWhy does not seem to be correct here
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Zoomed in: this should be a horizontal line (same simulated death rate for all three traces)
S) By calculating the difference between unvaccinated and vaccinated groups (uvx - vx), external influences should largely cancel out, isolating the effect of the vaccine.
Python script S) diff death dose agebin.py
Zoomed in
DoWhy does not seem to be correct here
To test for bias, I ran the same code on simulated data with a uniform, constant death rate across ages and time. The population was then split into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
Python script UB) diff death dose agebin.py
These scripts don't require SQLite queries to aggregate the 11 million individual data rows. Instead, the aggregation is handled directly by Python scripts, which can generate aggregated CSV files very quickly. For coding questions or help, visit https://chatgpt.com.
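As an illustration of aggregating individual rows directly in Python, the sketch below counts deaths per age group and day with the standard library only. The column names `age_group` and `death_date` and the sample rows are hypothetical; the real FOI file uses its own column names, and the repository scripts may aggregate differently.

```python
import csv
import io
from collections import Counter

def aggregate_daily_deaths(rows, age_col="age_group", death_col="death_date"):
    """Count deaths per (age group, death date); rows without a death date are skipped."""
    counts = Counter()
    for row in rows:
        if row[death_col]:
            counts[(row[age_col], row[death_col])] += 1
    return counts

# Small in-memory stand-in for the individual-level csv file.
sample = io.StringIO(
    "age_group,death_date\n"
    "70,2021-03-01\n"
    "70,2021-03-01\n"
    "70,\n"
    "80,2021-03-02\n"
)
daily = aggregate_daily_deaths(csv.DictReader(sample))
```

Reading the file row by row keeps memory use low, since only the counters are held in memory rather than all individual records.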
- Python 3.12.5 to run the scripts.
- Visual Studio Code 1.92.2 to edit and run scripts.
The results have not been checked for errors. Neither methodological nor technical checks or data cleansing have been performed.



