CzechFOI-DRATE

CzechFOI-DRATE: Exploring ways to minimize bias when dividing real-world data into two groups (vaccinated, vx / unvaccinated, uvx)

Hypothesis 1 - CzechFOI-DRATE-NOBIAS repository:

It is impossible to perfectly and fairly compare vaccinated (VX) and unvaccinated (UVX) groups — either by measurement or mathematically — when vaccination is time-dependent and not random.
This remains true even if every individual in both groups has the same homogeneous, constant death rate.

Hypothesis 2 - see CzechFOI-SIM repository:

There is currently no reliable statistical method to determine the rate of death-related Adverse Events Following Immunisation (dAEFIs) at a frequency of approximately one additional death per 10,000 doses when the baseline mortality is unknown in real-world settings.

To the best of my knowledge, this vital problem is still unsolved; whoever solves it would be a benefactor of mankind. The same applies in reverse (one death prevented per 10,000 doses).

Project GOAL

The aim is to find a method that compensates for biases introduced by the non-random assignment of individuals to vaccinated (vx) and unvaccinated (uvx) groups based on the timing of vaccination. This type of bias is unavoidable in real-world datasets, but it must be corrected in order to enable a fair comparison between the two groups.

Simulated Test Dataset with Minimal Bias

A synthetic dataset was generated in which individuals within each age group share a constant and homogeneous risk of death, estimated from real-world age-specific death rates. Death dates were simulated independently of vaccination status. Real-world vaccination schedules (dose sets) were then reassigned randomly to individuals within the same age group, ensuring that each individual's entire dose schedule occurred on or before their simulated death date. No actual death dates were removed or altered—only the assignment of dose dates was adjusted to maintain this temporal consistency. This approach minimizes bias while preserving realistic dose timing patterns.
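The generation steps described above can be sketched as follows. This is a minimal illustration for a single age group, not the NK) script itself: the function name, the hazard value, and the toy schedules are assumptions, and death days are drawn from an exponential waiting time to represent a constant daily risk.

```python
import random

def simulate_minbias(schedules, daily_hazard, n_days, seed=0):
    """Draw death days independently of vaccination, then reassign schedules.

    schedules: list of dose-day lists (an empty list means never vaccinated).
    Returns a list of (death_day or None, dose_days), one entry per person.
    """
    rng = random.Random(seed)

    # Constant hazard: exponential waiting time, rounded up to whole days.
    deaths = []
    for _ in schedules:
        day = int(rng.expovariate(daily_hazard)) + 1
        deaths.append(day if day <= n_days else None)  # None = survived the window

    # Reassign schedules at random, most constrained (latest last dose) first,
    # so that every death day lies strictly after the person's last dose.
    free = list(range(len(deaths)))
    assigned = [None] * len(deaths)
    for sched in sorted(schedules, key=lambda s: max(s, default=-1), reverse=True):
        last = max(sched, default=-1)
        candidates = [i for i in free if deaths[i] is None or deaths[i] > last]
        if not candidates:
            raise ValueError("no compatible person left for this schedule")
        pick = rng.choice(candidates)
        assigned[pick] = sched
        free.remove(pick)
    return list(zip(deaths, assigned))

# Hypothetical toy schedules (dose days), repeated to get 100 people.
toy_schedules = [[10, 40], [100], [], [5, 30, 200], []] * 20
people = simulate_minbias(toy_schedules, daily_hazard=0.001, n_days=365, seed=1)
```

Death days are never changed in this sketch; only the assignment of schedules to people is adjusted, which matches the description above.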

The dataset sim_MINBIAS_Vesely_106_SIMULATED.csv was created by NK) generate csv simulate deaths minimal bias.py
30.06.2025: changed the constraint from death day >= last dose day to death day > last dose day, because Cox regression cannot handle zero-length intervals where start = stop.
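To illustrate why the constraint matters, here is a minimal sketch of how one person's follow-up might be split into (start, stop, vaccinated, died) rows for a time-varying Cox model. The function names and the row layout are assumptions for illustration; follow-up is assumed to start at day 0 and the dose day is assumed to be greater than 0.

```python
def to_intervals(dose_day, death_day, end_day):
    """Split one person's follow-up into (start, stop, vaccinated, died) rows."""
    stop = end_day if death_day is None else death_day
    died = 0 if death_day is None else 1
    if dose_day is None or dose_day > stop:
        return [(0, stop, 0, died)]        # unvaccinated for the whole follow-up
    return [
        (0, dose_day, 0, 0),               # unvaccinated person-time
        (dose_day, stop, 1, died),         # vaccinated person-time
    ]

def zero_length(rows):
    """Rows with start == stop, which a Cox fit cannot use."""
    return [row for row in rows if row[0] == row[1]]
```

With death day == dose day the vaccinated row collapses to start = stop; with death day > dose day every row has positive length.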

CzechFOI-DRATE_EXAM project for further investigations

AB) Test script for Cox time-varying method bias adjustment, AGE 70 - Hypothesis 1

Impact of Dose Assignment Strategy on bias correction and Estimated Mortality Risk

Objective: To assess how different vaccine dose assignment strategies affect estimated hazard ratios (HRs) for mortality, and to test bias adjustment.

Methods: Time-varying Cox regression was used to compare mortality risk between vaccinated and unvaccinated individuals under four scenarios. Except for Case 1A (real world), all were simulated assuming a random, homogeneous, and constant death risk across the whole population.

Case 1A: Real-world Czech FOI data (death must follow dose - real world)
         Uses the Czech Freedom of Information request raw dataset Vesely_106_202403141131.csv

Case 1B: Simulated doses with the same distribution as the real-world Czech FOI data (death must follow dose)
         The CSV dataset was created by "NK) generate csv simulate deaths minimal bias.py" (see Project GOAL).
         It uses the same dose schedule distribution and approximately the same, but constant, death rate as the Czech FOI real-world data for AGE 70.

Case 2:  Simulated doses with flat random assignment (death must follow dose).

Case 3:  Simulated doses with a bell-curve distribution (death must follow dose).

Results:

Cases 1B, 2, and 3 were simulated with uniform risk and should theoretically yield HR ≈ 1, reflecting no vaccine effect.

Cox Proportional Hazards Results – Real and Simulated Data

This table summarizes Cox regression hazard ratios (HR) across four datasets:

Case	Description	β (coef)	HR = exp(β)	Risk Reduction (%)	95% CI (HR)	z	p-value	−log₂(p)	Expected HR	Interpretation
1A	Real data – Czech FOI dataset	-0.34	0.71	29%	0.70–0.73	-27.36	< 0.005	514.18	<1	Strong protective effect observed
1B	Simulated deaths + real dose distribution	-0.28	0.75	25%	0.73–0.77	-19.84	< 0.005	287.13	≈1	False protective effect observed (bias-inflated)
2 (Flat)	Simulated deaths + flat dose distribution	-0.04	0.96	4%	0.93–0.99	-2.82	0.005	8.39	≈1	Minimal bias – HR near 1, as expected
3 (Bell)	Simulated deaths + bell curve dose distribution	+0.28	1.33	-33%	1.30–1.37	+17.94	< 0.005	272.61	≈1	Artificial harm due to reversed timing bias

Notes:

Risk Reduction (%) = (1 - HR) × 100%.
Values > 0 indicate reduced risk (protective effect).
Negative values (like Case 3) indicate increased risk (harm).
Cases 1B, 2, and 3 were simulated with constant death risk; HR≈1 is expected in absence of real effect.
Case 1B shows bias due to the real dose distribution skew causing a false appearance of protection.
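The two formulas used in the table, HR = exp(β) and Risk Reduction (%) = (1 - HR) × 100%, can be checked with a few lines. The function names are only illustrative.

```python
import math

def hazard_ratio(beta):
    """HR = exp(beta), where beta is the Cox coefficient."""
    return math.exp(beta)

def risk_reduction_pct(hr):
    """Positive values mean reduced risk, negative values increased risk."""
    return (1.0 - hr) * 100.0

hr_1a = hazard_ratio(-0.34)          # Case 1A coefficient from the table
rr_1a = risk_reduction_pct(hr_1a)
```

The coefficients in the table are rounded to two decimals, so HRs recomputed from them can differ from the listed HR in the last digit.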

Conclusion: Dose classification strategies strongly influence observed vaccine effectiveness. Careful control of timing and classification is essential to avoid bias in survival analyses. The result of Case 1B requires further investigation!

Python script AB) Cox fair compare vx uvx.py -> Detailed Results

Histogram death distribution

Case 1A real data

Case 1B simulated deaths with real dose schedule

Case 2 simulated deaths flat

Case 3 simulated deaths bell curve

ZF) vx uvx norm AGE 70

Python script ZF) vx uvx norm.py
Download interactive html

Czech freedom of information real world data

To test for bias, the plot below shows a simulated dataset where all individuals have the same constant mortality risk — uniform across age groups and over time, at approximately real-world levels. Individuals were then randomly assigned to vaccinated or unvaccinated groups, using real-world vaccination schedules. Critically, deaths were only allowed to occur after vaccination — reflecting a real-world constraint. This introduces immortal time bias, which can create the false impression that vaccination offers protection, even when mortality risk is identical for all. As a reminder: in this simulation, everyone has the same baseline risk of death. But if group assignment is not random — as in real-world data — it introduces bias.

The bias makes the normalized mortality rate of the unvaccinated (UVX) group appear artificially much worse!
The result needs further investigation to verify!
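The mechanism can be reproduced with a small self-contained simulation. This is a sketch under simplifying assumptions, not the ZF) script: everyone has the same constant daily hazard, a hypothetical share of people gets a planned dose day drawn uniformly over the window, and a planned dose only takes place if the person is still alive. People are then classified naively as "ever vaccinated" versus "never vaccinated".

```python
import random

def naive_risks(n_people, daily_hazard, n_days, vax_share, seed=0):
    """Return (risk_vx, risk_uvx) under a naive ever/never-vaccinated split."""
    rng = random.Random(seed)
    vx_n = vx_deaths = uvx_n = uvx_deaths = 0
    for _ in range(n_people):
        death_time = rng.expovariate(daily_hazard)   # same hazard for everyone
        died = death_time <= n_days
        planned = rng.uniform(0, n_days) if rng.random() < vax_share else None
        # The dose only happens if the person is alive on the planned day.
        vaccinated = planned is not None and death_time > planned
        if vaccinated:
            vx_n += 1
            vx_deaths += died
        else:
            uvx_n += 1
            uvx_deaths += died
    return vx_deaths / vx_n, uvx_deaths / uvx_n

# All parameter values below are hypothetical.
risk_vx, risk_uvx = naive_risks(20000, 0.0005, 365, 0.7, seed=1)
```

Although the hazard is identical for everyone, the vaccinated group shows a much lower death risk, because people who die before their planned dose are counted as unvaccinated.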

Uses the simulated dataset sim_MINBIAS_Vesely_106_SIMULATED.csv created by NK) generate csv simulate deaths minimal bias.py
Download interactive html

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, this time ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

AA) Time-since-first-dose person-time stratification

Another attempt to reimplement the method used in @henjin256's R code:
https://sars2.net/czech2.html#Excess_mortality_by_weeks_after_vaccination
https://sars2.net/czech.html#Bucket_analysis

Creates the bucket CSV file for AGE 70: AA) generate bucket csv.py (takes about 1.5 hours for AG 70 alone)
Creates the plot file: AA) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx)

If the method correctly adjusts for bias, the vaccinated excess mortality curve should be flat, at or near 0%.
Since that is not observed, at least one of the following applies:
- The MINBIAS test data or the underlying assumptions are incorrect.
- The method was not reproduced correctly.
- There are errors in my code.
Needs further investigation!
Disadvantage of this method: creating the bucket files in Python is slow and demands a lot of memory; in R this is not a problem.
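The bucket idea can be sketched for a single vaccinated person as follows. This is a simplified illustration, not the AA) script or the original R code: the weekly bucket length, the function names, and the externally supplied baseline rate are assumptions.

```python
from collections import defaultdict

def bucket_person_days(first_dose_day, exit_day, died, bucket_len=7):
    """Return {bucket_index: [person_days, deaths]} by time since first dose.

    exit_day is the death day or the end of follow-up and must lie after the
    first dose. A death is counted in the bucket of the last day at risk.
    """
    n_days = exit_day - first_dose_day
    if n_days <= 0:
        raise ValueError("exit day must be after the first dose day")
    buckets = defaultdict(lambda: [0, 0])
    full, rest = divmod(n_days, bucket_len)
    for b in range(full):
        buckets[b][0] += bucket_len
    if rest:
        buckets[full][0] += rest
    if died:
        buckets[(n_days - 1) // bucket_len][1] += 1
    return dict(buckets)

def excess_mortality_pct(person_days, deaths, baseline_daily_rate):
    """Observed deaths relative to deaths expected at the baseline rate."""
    expected = person_days * baseline_daily_rate
    return (deaths / expected - 1.0) * 100.0
```

Summing these per-person buckets over all vaccinated people and comparing each bucket with the baseline gives one excess mortality value per week since the first dose.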

The plot below uses only the unvaccinated group as baseline!

Creates the plot file: AC) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx+vx)

The plot of the simulated minbias dataset AGE 70 should be a horizontal line at 0%. It uses the combined baseline (unvaccinated + vaccinated groups)!

Plot of the real-world Czech-FOI dataset. It uses the combined baseline (unvaccinated + vaccinated groups)!

CA) Person-days landmark method for Hypothesis 1

Python script CA) Landmark adjust resampling truncation bias.py

Person days real world dataset Vesely_106_202403141131.csv

Uses a simulated minbias dataset to test whether the method compensates for mortality bias (forced restriction death_day >= last_dose_day). It seems to correct the bias only partially. RR should theoretically run horizontally at about 1.

Uses the simulated nobias dataset to test the method (no constraint death_day >= last_dose_day). RR should theoretically run horizontally at about 1.
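For readers unfamiliar with the landmark idea, a minimal sketch follows. It is a generic illustration and not the CA) script: people who died on or before the landmark day are excluded, the rest are classified by their vaccination status on that day, and only deaths after the landmark are counted. All names are assumptions.

```python
def landmark_split(people, landmark_day):
    """people: iterable of (first_dose_day or None, death_day or None)."""
    vx, uvx = [], []
    for dose_day, death_day in people:
        if death_day is not None and death_day <= landmark_day:
            continue  # died before the landmark: excluded from both groups
        # Status is frozen at the landmark; later doses do not change it.
        if dose_day is not None and dose_day <= landmark_day:
            vx.append((dose_day, death_day))
        else:
            uvx.append((dose_day, death_day))
    return vx, uvx

def risk_after_landmark(group, landmark_day, horizon):
    """Share of the group dying within `horizon` days after the landmark."""
    deaths = sum(1 for _, d in group if d is not None and d <= landmark_day + horizon)
    return deaths / len(group)

# Hypothetical toy data: (first_dose_day, death_day)
toy = [(5, 50), (5, None), (None, 8), (20, 40), (None, None)]
vx_group, uvx_group = landmark_split(toy, landmark_day=10)
```

The risk ratio (RR) at a given landmark is then the ratio of the two group risks; repeating this for a series of landmark days gives a curve over time.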

Y) Person-days method for Hypothesis 1

Python script Y) vx uvx persondays immortal time adjusted.py

Person days real world dataset Vesely_106_202403141131.csv

test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.

test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The result should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method does not adjust for the death day >= last dose day bias.
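As a reference for what a person-days calculation is expected to do, here is a minimal sketch (not the Y) scripts). Each person contributes unvaccinated days up to the first dose and vaccinated days afterwards, and a death is attributed to the status held on the day of death. Names and the toy data are assumptions.

```python
def person_days(people, end_day):
    """people: iterable of (first_dose_day or None, death_day or None).

    Returns {"uvx": [person_days, deaths], "vx": [person_days, deaths]}.
    """
    totals = {"uvx": [0, 0], "vx": [0, 0]}
    for dose_day, death_day in people:
        exit_day = end_day if death_day is None else min(death_day, end_day)
        died = int(death_day is not None and death_day <= end_day)
        if dose_day is None or dose_day >= exit_day:
            # Never vaccinated during follow-up (or dose on the exit day).
            totals["uvx"][0] += exit_day
            totals["uvx"][1] += died
        else:
            totals["uvx"][0] += dose_day              # days before the dose
            totals["vx"][0] += exit_day - dose_day    # days after the dose
            totals["vx"][1] += died
    return totals

def rate_per_100k_days(days, deaths):
    return deaths / days * 100_000

# Hypothetical toy data: (first_dose_day, death_day)
toy_totals = person_days([(10, 30), (None, 20), (50, None), (None, None)], end_day=100)
```

With a constant, identical hazard and no constraint linking dose dates to death dates, both rates should agree up to random noise.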

Python script Y) vx uvx persondays.py

Person days real world dataset Vesely_106_202403141131.csv

simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.

simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The result should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method does not adjust for the death day >= last dose day bias.

Python script Y) vx uvx persondays baslinemort.py

Person days real world dataset Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The result should be the same for vx and uvx. Possibly there is a bug in the code, the method was not applied correctly, or the person-days method does not adjust for the death day >= last dose day bias.

W) When comparing different methods, Cox PH seemed to calculate the best approximation for Hypothesis 1

Python script W) coxph real deaths real vax dates by age. Download interactive html

Cox PH analysis using Czech-FOI real world dataset Vesely_106_202403141131.csv

test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.

If the code below is correct, this might explain why scientists endlessly debate the results of their comparisons.

simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.

ZA) DoWhy causal impact estimation

Python script ZA) dowhy doses vs sim_total_death individual.py
Download interactive html

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

Many further analyses follow

X) Normalized UVx/Vx comparison and stacked events plot - Hypothesis 1

Plots generated by Python script X) event_stacking.py. Here you can download the interactive htmls

The plot of the simulated dataset below assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (real-world scenario) inherently introduces immortal time bias (Case 1B above), as illustrated below.

As a reminder, every individual in the homogeneous population below has the same constant mortality risk. If group assignment is non-random (as occurs in the real world), this introduces bias, making the normalized mortality rate of the UVX group look much worse.

Test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv

The stacked events for the MINBIAS simulation AG 70

Normalized Vx/Uvx death rate plot of Czech real world data AG 70
Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request

The stacked events for Czech real-world data AG 70

The plot below uses a simulated dataset: age-specific death rates are estimated, and death dates are drawn at a constant rate, uniformly at random across the observation window. Vaccination timings are preserved, and the requirement that death must follow vaccination is not enforced, to prevent any bias. This is not the case in a real-world scenario.

Test dataset sim_NOBIAS_Vesely_106_202403141131.csv created by NC) generate csv simulate deaths no bias.py

The stacked events AG 70 for the NOBIAS simulation should theoretically run horizontally at about the same level — could be a bug in the code.

G) G-estimate and Cox time-varying method to compensate for bias - Hypothesis 1

Test using a simulated dataset based on real-world parameters, created by "NK) generate csv simulate deaths minimal bias.py".
I tried to evaluate whether the G-estimation (psi) method could correct for bias, but struggled with error messages.
Using the parameters of the "Cox time-varying method" below, it should probably work.

Python script G) generate interval data per person.py
Python script G) G-estimation on interval data per person.py
Python script GA) G-estimation on interval data per person all age.py
Python script G) cox on interval data per person.py

B) DeathRatesBy Age from aggregated

This produces the same results as "ZF) vx uvx norm" above, but uses aggregated CSV files for the calculations.

The aggregated CSV files were generated using the Python script: A) generate aggregated csv files from CzechFOI.py

The data is then plotted with the Python script: B) DeathRatesByAge from aggregated.py
Download interactive html
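The aggregation step can be pictured as a simple count per (age, day). The sketch below uses only the standard library; the column names `age` and `death_day` and the inline sample rows are hypothetical and do not reflect the actual layout of the Czech FOI file or of the A) script.

```python
import csv
import io
from collections import Counter

def aggregate_deaths(rows, age_field="age", day_field="death_day"):
    """Count deaths per (age, day); rows without a death day are skipped."""
    counts = Counter()
    for row in rows:
        if row[day_field]:
            counts[(int(row[age_field]), int(row[day_field]))] += 1
    return counts

# Hypothetical individual-level sample: one row per person.
sample = io.StringIO("age,death_day\n70,12\n70,12\n71,\n70,30\n")
daily_deaths = aggregate_deaths(csv.DictReader(sample))
```

Reading the file row by row keeps memory use low, because only the aggregated counts are held in memory.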

E) Death risk by age over time

Python script E) death risk by age.py
Download interactive html

Czech-FOI real world dataset Vesely_106_202403141131.csv

Simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.

F) daily crude HR vx/uvx

Python script F) rolling daily crude HR by age.py
Download interactive html

Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.

J) Bias Study ratio vx_uvx.py

Python script J) Bias Study ratio vx_uvx.py
Download interactive html

This script simulates uniform death risk over time to test bias in survival analysis. It compares static vs. time-dependent vaccinated/unvaccinated classification, computes death rates and 1st derivatives, Kaplan-Meier curves, and Cox models, and visualizes the results in an interactive Plotly HTML plot.
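Of the quantities listed above, the Kaplan-Meier curve is the easiest to show in a few lines. The following is a generic textbook-style sketch, not code taken from the J) script; the function name and input layout are assumptions.

```python
def kaplan_meier(observations):
    """observations: list of (time, event), event=1 for death, 0 for censoring.

    Returns [(time, survival)] at each distinct death time.
    """
    by_time = {}
    for t, event in observations:
        deaths, censored = by_time.get(t, (0, 0))
        by_time[t] = (deaths + event, censored + (1 - event))

    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(by_time):
        deaths, censored = by_time[t]
        if deaths:
            survival *= 1.0 - deaths / at_risk   # conditional survival at time t
            curve.append((t, survival))
        at_risk -= deaths + censored             # leave the risk set after t
    return curve

km_curve = kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 0)])
```

With a static classification, each person stays in one group for the whole curve; with a time-dependent classification, a person would leave the unvaccinated risk set (as censored) on the dose day and enter the vaccinated one.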

ZB) Hypergeometric (used by Charles Sanders Peirce) Vaccine Effectiveness Analysis with Confidence Intervals

Python script ZB) CS-Pierce Hypergeometric VaxCodes.py
Download interactive html

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
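As background for the hypergeometric approach, the sketch below computes the probability of seeing k deaths among the vaccinated if deaths were distributed without regard to vaccination status. It is a generic illustration; how the ZB) script derives effectiveness and confidence intervals from this is not shown here, and the function names are assumptions.

```python
from math import comb

def hypergeom_pmf(k, population, vaccinated, deaths):
    """P(exactly k of the deaths fall in the vaccinated group)."""
    return (comb(vaccinated, k) * comb(population - vaccinated, deaths - k)
            / comb(population, deaths))

def lower_tail(k, population, vaccinated, deaths):
    """P(at most k of the deaths fall in the vaccinated group)."""
    lowest = max(0, deaths - (population - vaccinated))
    return sum(hypergeom_pmf(i, population, vaccinated, deaths)
               for i in range(lowest, k + 1))

p_two = hypergeom_pmf(2, population=10, vaccinated=5, deaths=4)
```

A small lower-tail probability would indicate fewer vaccinated deaths than expected by chance alone, but it cannot by itself tell a real effect apart from the timing bias discussed in this README.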

ZC)

Python script ZC) dowhy vaxcode doses vs total_death individual.py
Download interactive html

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

More plots added:

AC) Mean age at death before and after the start of vaccination czech real world data

Python script AC) age_mean.py

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

EA) Days since Doses

Python script EA) batches vs death.py

UA)

Python script UA) diff death dose agebin.py

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

UC)

Rolling correlation between the daily vaccine dose curve and the difference in normalized death rates between uvx and vx individuals (uvx - vx). The difference (uvx - vx) compensates for external influences that affect both groups, so what remains should be the vaccine effect.

Python script UC) diff norm death dose agebin.py

Download interactive html

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias. As the simulated death risk for the whole homogeneous population is constant over time, the difference uvx - vx should fluctuate horizontally around level 0!

ZG)

Python script ZG) doses_vs_deaths_dowhy.png

Zoomed in

DoWhy seems not to be correct here.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

Zoomed in: this should be a horizontal line (same simulated death rate for all three traces).

S) By calculating the difference between unvaccinated and vaccinated groups (uvx - vx), external influences should largely cancel out, isolating the effect of the vaccine.

Python script S) diff death dose agebin.py

Zoomed in

DoWhy seems not to be correct here.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the population into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.

UB)

Python script UB) diff death dose agebin.py

Software Requirements:

These scripts don't require SQLite queries to aggregate the 11 million individual data rows. Instead, the aggregation is handled directly by Python scripts, which can generate aggregated CSV files very quickly. For coding questions or help, visit https://chatgpt.com.

Python 3.12.5 to run the scripts.
Visual Studio Code 1.92.2 to edit and run scripts.

Disclaimer:

The results have not been checked for errors. Neither methodological nor technical checks or data cleansing have been performed.


