
gitfrid/CzechFOI-DRATE


CzechFOI-DRATE

CzechFOI-DRATE: Exploring ways to minimize bias when dividing real-world data into two groups (vaccinated vx / unvaccinated uvx)


Hypothesis 1 - CzechFOI-DRATE-NOBIAS repository:

It is impossible to compare vaccinated (VX) and unvaccinated (UVX) groups perfectly and fairly, either by measurement or mathematically, when vaccination is time-dependent and not random.
This remains true even if both groups have the same homogeneous, constant individual death rate.


Hypothesis 2 - see CzechFOI-SIM repository:

There is currently no reliable statistical method to determine the rate of death-related Adverse Events Following Immunisation (dAEFIs) at a frequency of approximately one additional death per 10,000 doses when the baseline mortality is unknown in real-world settings.


To the best of my knowledge, this (vital) problem is still waiting for someone who can solve it, a true benefactor of mankind. The same applies in the other direction (one death per 10,000 doses removed/saved).


Project GOAL

The aim is to find a method that compensates for biases introduced by the non-random assignment of individuals to vaccinated (vx) and unvaccinated (uvx) groups based on the timing of vaccination. This type of bias is unavoidable in real-world datasets, but it must be corrected in order to enable a fair comparison between the two groups.

Simulated Test Dataset with Minimal Bias

A synthetic dataset was generated in which individuals within each age group share a constant and homogeneous risk of death, estimated from real-world age-specific death rates. Death dates were simulated independently of vaccination status. Real-world vaccination schedules (dose sets) were then reassigned randomly to individuals within the same age group, ensuring that each individual's entire dose schedule occurred on or before their simulated death date. No actual death dates were removed or altered—only the assignment of dose dates was adjusted to maintain this temporal consistency. This approach minimizes bias while preserving realistic dose timing patterns.
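A minimal sketch of this generation idea for a single age group, assuming a constant daily hazard, a fixed observation window and dose schedules given as lists of dose days. The function and parameter names (`simulate_death_days`, `assign_schedules`, `daily_hazard`) and the example schedules are hypothetical; this is not the code of the generator script. Death days are drawn first, independently of vaccination, and each schedule is then given to a randomly chosen person who is still alive after its last dose (strict `>`).

```python
import math
import random


def simulate_death_days(n_people, daily_hazard, n_days, rng):
    """Death day (0-based) per person under a constant daily hazard.

    Inverse-transform sampling of a geometric distribution: P(day = k) =
    (1 - h)^k * h. Returns None for people who survive the whole window.
    """
    log_survive = math.log(1.0 - daily_hazard)
    deaths = []
    for _ in range(n_people):
        u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
        day = int(math.log(u) / log_survive)
        deaths.append(day if day < n_days else None)
    return deaths


def assign_schedules(death_days, schedules, rng):
    """Give each dose schedule (list of dose days) to a random eligible person.

    Eligible: no schedule yet, and death strictly after the last dose (or no
    death inside the window). Schedules with no eligible person are dropped.
    """
    assignment = [None] * len(death_days)
    free = list(range(len(death_days)))
    for schedule in schedules:
        last_dose = max(schedule)
        eligible = [p for p in free
                    if death_days[p] is None or death_days[p] > last_dose]
        if not eligible:
            continue
        person = rng.choice(eligible)  # uniform choice among eligible people
        assignment[person] = schedule
        free.remove(person)
    return assignment


rng = random.Random(42)
deaths = simulate_death_days(n_people=2000, daily_hazard=0.0002, n_days=700, rng=rng)
# Hypothetical two-dose schedules (dose-1 day, dose-2 day), for illustration only.
schedules = [[rng.randrange(0, 300), rng.randrange(300, 500)] for _ in range(800)]
assignment = assign_schedules(deaths, schedules, rng)
```

Death dates never depend on the schedules; only who receives which schedule is constrained, which mirrors the "no death dates were altered" property described above.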

sim_MINBIAS_Vesely_106_SIMULATED.csv, created by NK) generate csv simulate deaths minimal bias.py
30.06.2025: changed the constraint death day >= last dose day to death day > last dose day, because Cox regression cannot handle zero-length intervals where start = stop.
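To illustrate why zero-length intervals matter, here is a sketch of how one person's follow-up could be split into counting-process rows (start, stop] for a time-varying Cox model. The row layout (`id`, `start`, `stop`, `vx`, `event`) and the helper name `to_intervals` are assumptions for illustration; only the first dose is used as the switch point.

```python
def to_intervals(person_id, first_dose_day, end_day, died):
    """Split one person's follow-up into (start, stop] rows for a time-varying Cox model.

    first_dose_day: day of first dose, or None if never vaccinated.
    end_day: death day or end of follow-up (days since study start).
    died: True if end_day is a death.
    A dose on or after end_day is ignored (the person stays unvaccinated), so no
    row with start == stop is produced; a person with end_day == 0 yields no rows.
    """
    if first_dose_day is None or first_dose_day >= end_day:
        switch = end_day  # unvaccinated for the whole follow-up
    else:
        switch = first_dose_day
    rows = []
    if switch > 0:  # unvaccinated time before the first dose
        rows.append({"id": person_id, "start": 0, "stop": switch, "vx": 0,
                     "event": int(died and switch == end_day)})
    if switch < end_day:  # vaccinated time from the first dose onwards
        rows.append({"id": person_id, "start": switch, "stop": end_day, "vx": 1,
                     "event": int(died)})
    return rows


rows = (to_intervals(1, None, 700, False)   # never vaccinated, survived
        + to_intervals(2, 120, 300, True)   # dose on day 120, died on day 300
        + to_intervals(3, 0, 50, True))     # dose on day 0, died on day 50
for row in rows:
    print(row)
```

The rows could then be put into a pandas DataFrame and passed to a counting-process Cox fitter, for example lifelines' `CoxTimeVaryingFitter().fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")`.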

See the CzechFOI-DRATE_EXAM project for further investigations.


AB) Test script for Cox time-varying method bias adjustment, AGE 70 - Hypothesis 1


Impact of Dose Assignment Strategy on bias correction and Estimated Mortality Risk

Objective: To assess how different vaccine dose assignment strategies affect estimated hazard ratios (HRs) for mortality, and to test bias adjustment.

Methods: Time-varying Cox regression was used to compare mortality risk between vaccinated and unvaccinated individuals under four scenarios. Except for Case 1A (real world), all were simulated assuming a random, homogeneous, and constant death risk across the whole population.

Case 1A: Real-world Czech FOI data (death must follow dose, as in the real world).
         Uses the raw Czech freedom of information request dataset Vesely_106_202403141131.csv.

Case 1B: Simulated doses with the same distribution as the real-world Czech FOI data (death must follow dose).
         The CSV dataset was created by "NK) generate csv simulate deaths minimal bias.py" (see Project GOAL).
         It uses the same dose schedule distribution and approximately the same, but constant, death rate as the Czech FOI real-world data, AGE 70.

Case 2:  Simulated doses with flat random assignment (death must follow dose).

Case 3:  Simulated doses with a bell-curve distribution (death must follow dose).

Results:

Cases 1B, 2, and 3 were simulated with uniform risk and should theoretically yield HR ≈ 1, reflecting no vaccine effect

Cox Proportional Hazards Results – Real and Simulated Data

This table summarizes Cox regression hazard ratios (HR) across four datasets:

| Case | Description | β (coef) | HR = exp(β) | Risk reduction (%) | 95% CI (HR) | z | p-value | −log₂(p) | Expected HR | Interpretation |
|---|---|---|---|---|---|---|---|---|---|---|
| 1A | Real data, Czech FOI dataset | -0.34 | 0.71 | 29% | 0.70–0.73 | -27.36 | < 0.005 | 514.18 | < 1 | Strong protective effect observed |
| 1B | Simulated deaths + real dose distribution | -0.28 | 0.75 | 25% | 0.73–0.77 | -19.84 | < 0.005 | 287.13 | ≈ 1 | False protective effect observed (bias-inflated) |
| 2 (Flat) | Simulated deaths + flat dose distribution | -0.04 | 0.96 | 4% | 0.93–0.99 | -2.82 | 0.005 | 8.39 | ≈ 1 | Minimal bias, HR near 1 as expected |
| 3 (Bell) | Simulated deaths + bell-curve dose distribution | +0.28 | 1.33 | -33% | 1.30–1.37 | +17.94 | < 0.005 | 272.61 | ≈ 1 | Artificial harm due to reversed timing bias |

Notes:

  • Risk Reduction (%) = (1 - HR) × 100%.
  • Values > 0 indicate reduced risk (protective effect).
  • Negative values (like Case 3) indicate increased risk (harm).
  • Cases 1B, 2, and 3 were simulated with constant death risk; HR≈1 is expected in absence of real effect.
  • Case 1B shows bias due to the real dose distribution skew causing a false appearance of protection.
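The arithmetic behind the table columns can be reproduced with a short calculation, assuming Wald-type 95% intervals (β ± 1.96·SE) and a two-sided normal p-value. The table does not report standard errors, so the example back-calculates one from β and z (SE = β / z) for Case 2; results therefore match the table only up to rounding. `summarize_cox` is a hypothetical helper name.

```python
import math


def summarize_cox(beta, se, z_crit=1.96):
    """HR, Wald 95% CI, z, two-sided p and 'risk reduction' from a Cox coefficient."""
    hr = math.exp(beta)                      # HR = exp(beta)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value, standard normal
    return {"HR": hr, "CI": ci, "z": z, "p": p,
            "risk_reduction_pct": (1 - hr) * 100}  # (1 - HR) x 100%


# Case 2 (Flat): beta = -0.04, z = -2.82, so SE is back-calculated as beta / z.
case2 = summarize_cox(-0.04, -0.04 / -2.82)
print({k: (round(v, 3) if isinstance(v, float) else tuple(round(x, 3) for x in v))
       for k, v in case2.items()})
```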


Conclusion: Dose classification strategies strongly influence the observed vaccine effectiveness. Careful control of timing and classification is essential to avoid bias in survival analyses. The result of Case 1B requires further investigation!

Python script AB) Cox fair compare vx uvx.py -> Detailed Results


Histogram death distribution



Case 1A real data



Case 1B simulated deaths with real dose schedule



Case 2 simulated deaths flat



Case 3 simulated deaths bell curve



ZF) vx uvx norm AGE 70


Python script ZF) vx uvx norm.py
Download interactive html

Czech freedom of information real world data



To test for bias, the plot below shows a simulated dataset where all individuals have the same constant mortality risk — uniform across age groups and over time, at approximately real-world levels. Individuals were then randomly assigned to vaccinated or unvaccinated groups, using real-world vaccination schedules. Critically, deaths were only allowed to occur after vaccination — reflecting a real-world constraint. This introduces immortal time bias, which can create the false impression that vaccination offers protection, even when mortality risk is identical for all. As a reminder: in this simulation, everyone has the same baseline risk of death. But if group assignment is not random — as in real-world data — it introduces bias.

The bias makes the normalized mortality rate of the unvaccinated (UVX) group appear artificially much worse!
This result needs further investigation to be verified!

Uses the simulated dataset sim_MINBIAS_Vesely_106_SIMULATED.csv, created by NK) generate csv simulate deaths minimal bias.py
Download interactive html
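A minimal sketch of a normalized daily death rate per group, assuming "normalized" means deaths per 100,000 people alive in that group on that day, and that a person counts as vaccinated from the first-dose day onwards. This is an assumption about the normalization, not the exact logic of ZF) vx uvx norm.py; the double loop is O(days × people) and is only meant for small examples.

```python
def normalized_daily_rates(people, n_days, per=100_000):
    """Daily deaths per `per` people alive, split by vaccination status on that day.

    people: list of (first_dose_day or None, death_day or None), days since start.
    Someone is 'alive' on a day if they have not died on an earlier day, and
    counts as vaccinated from their first-dose day onwards.
    """
    vx_rate, uvx_rate = [], []
    for day in range(n_days):
        alive = {"vx": 0, "uvx": 0}
        dead = {"vx": 0, "uvx": 0}
        for dose, death in people:
            if death is not None and death < day:
                continue  # died before this day
            group = "vx" if dose is not None and dose <= day else "uvx"
            alive[group] += 1
            if death == day:
                dead[group] += 1
        for group, out in (("vx", vx_rate), ("uvx", uvx_rate)):
            out.append(per * dead[group] / alive[group] if alive[group] else float("nan"))
    return vx_rate, uvx_rate


people = [(None, 1), (0, None), (None, None), (2, 2)]
vx, uvx = normalized_daily_rates(people, n_days=3)
```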

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, this time ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.


AA) Time-since-first-dose person-time stratification

Another attempt to recode the method used in @henjin256's R code:
https://sars2.net/czech2.html#Excess_mortality_by_weeks_after_vaccination
https://sars2.net/czech.html#Bucket_analysis


Create the bucket CSV file for AGE 70: AA) generate bucket csv.py (takes about 1.5 hours for AG 70 alone)
Create the plot file: AA) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx)

If the method correctly adjusts for bias, the vaccinated excess mortality curve should be flat, at or near 0%.
Since that is not observed, one of the following may apply:
- The MINBIAS test data or the underlying assumptions are incorrect.
- The method was not reproduced correctly.
- There are errors in my code.
This needs further investigation!
Disadvantage of this method: in Python, creating the bucket files is slow and demands a lot of memory; in R this is not a problem.
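For reference, a minimal sketch of time-since-first-dose person-time stratification, assuming weekly buckets counted from the first dose and an "uvx" bucket for all time before it. It aggregates person-days and deaths directly into counters instead of writing a bucket file per record. This is a generic illustration, not a reproduction of @henjin256's R code or of AA) generate bucket csv.py; `bucket_person_time` and `excess_vs_baseline` are hypothetical names.

```python
from collections import defaultdict


def bucket_person_time(people, study_end, bucket_days=7):
    """Person-days and deaths by time since first dose, plus an 'uvx' bucket.

    people: list of (first_dose_day or None, death_day or None), days since start.
    Each person is at risk from day 0 up to and including the death day, or up to
    study_end (exclusive). Days before the first dose go to 'uvx'; later days go to
    bucket (day - first_dose_day) // bucket_days, i.e. 0 = first week after dose 1.
    """
    person_days = defaultdict(int)
    deaths = defaultdict(int)
    for dose, death in people:
        end = study_end if death is None else min(death + 1, study_end)
        for day in range(end):
            key = "uvx" if dose is None or day < dose else (day - dose) // bucket_days
            person_days[key] += 1
            if day == death:
                deaths[key] += 1
    return dict(person_days), dict(deaths)


def excess_vs_baseline(person_days, deaths, baseline="uvx"):
    """Relative excess mortality rate of each bucket versus the baseline bucket."""
    base_rate = deaths.get(baseline, 0) / person_days[baseline]  # must be > 0
    return {k: deaths.get(k, 0) / days / base_rate - 1
            for k, days in person_days.items() if k != baseline}


people = [(None, 3), (2, None), (0, 9)]
pdays, dths = bucket_person_time(people, study_end=10)
excess = excess_vs_baseline(pdays, dths)
```

In a bias-free setting every bucket would ideally show an excess of about 0 (0%). The per-day loop is still O(person-days); splitting each person's follow-up arithmetically into bucket intervals would be faster on the full dataset.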


The plot below uses only the unvaccinated group as baseline!

Create the plot file: AC) AG70 sim MINBIAS record_level_mort_vx uvx.py (baseline: uvx + vx)

Plot of the simulated MINBIAS dataset, AGE 70: it should be a horizontal line at 0%. Uses the combined baseline (unvaccinated + vaccinated group)!

Plot of the real-world Czech FOI dataset. Uses the combined baseline (unvaccinated + vaccinated group)!


CA) Person-days landmark method for Hypothesis 1


Python script CA) Landmark adjust resampling truncation bias.py

Person days real world dataset Vesely_106_202403141131.csv


Uses a simulated MINBIAS dataset to test whether the method compensates for the mortality bias (forced restriction death_day >= last_dose_day). It seems to correct the bias only partially. The RR should theoretically run horizontally at about 1.


Uses the simulated NOBIAS dataset to test the method (no constraint death_day >= last_dose_day). The RR should theoretically run horizontally at about 1.
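A generic landmark analysis can be sketched as follows, assuming vaccination status is frozen at the landmark day for everyone still alive and deaths are counted over a fixed horizon afterwards. It does not include the resampling or truncation adjustments the CA) script name refers to; `landmark_rr`, `landmark` and `horizon` are hypothetical names. Repeating it over a range of landmark days gives an RR curve that, in a bias-free dataset, should stay near 1.

```python
def landmark_rr(people, landmark, horizon):
    """Crude mortality risk ratio vx/uvx from one landmark day.

    people: list of (first_dose_day or None, death_day or None).
    Only people still alive after the landmark day are included; status is frozen
    at the landmark (vx if first dose <= landmark, even if more doses follow), and
    deaths are counted in the window (landmark, landmark + horizon].
    """
    counts = {"vx": [0, 0], "uvx": [0, 0]}  # [number at risk, deaths]
    for dose, death in people:
        if death is not None and death <= landmark:
            continue  # not at risk after the landmark
        group = "vx" if dose is not None and dose <= landmark else "uvx"
        counts[group][0] += 1
        if death is not None and death <= landmark + horizon:
            counts[group][1] += 1
    (n_vx, d_vx), (n_uvx, d_uvx) = counts["vx"], counts["uvx"]
    return (d_vx / n_vx) / (d_uvx / n_uvx)  # raises ZeroDivisionError if undefined


people = [(0, 5), (0, None), (None, 3), (None, 7), (None, None), (20, None)]
rr = landmark_rr(people, landmark=4, horizon=10)
```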


Y) Person-days method for Hypothesis 1


Python script Y) vx uvx persondays immortal time adjusted.py

Person days real world dataset Vesely_106_202403141131.csv



Test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.



Test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The results should be the same for vx and uvx. Possible causes: a bug in the code, the method not being applied correctly, or the person-day method not adjusting for the death day >= last dose day bias.
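A minimal sketch of the difference between a static ("ever vaccinated") split and a time-varying person-day split, assuming person-days are counted from day 0 up to and including the death day. It is a generic illustration, not the code of the Y) scripts, and `person_day_rates` is a hypothetical name. Splitting at the first dose removes the pre-dose immortal time of the static split, but it does not obviously remove the guaranteed survival between the first and last dose that the MINBIAS constraint introduces.

```python
def person_day_rates(people, study_end, per=100_000):
    """Deaths per `per` person-days for vx and uvx, with two classifications.

    static:       anyone vaccinated during follow-up is 'vx' for ALL their days
                  (the classic source of immortal time bias).
    time_varying: days before the first dose count as 'uvx', days from the first
                  dose onwards as 'vx'.
    people: list of (first_dose_day or None, death_day or None), days since start.
    """
    static = {"vx": [0, 0], "uvx": [0, 0]}   # [person-days, deaths]
    varying = {"vx": [0, 0], "uvx": [0, 0]}
    for dose, death in people:
        died = death is not None and death < study_end
        end = death + 1 if died else study_end  # at risk on days 0 .. end-1
        ever = "vx" if dose is not None and dose < end else "uvx"
        pre = end if dose is None else min(dose, end)  # days before first dose
        static[ever][0] += end
        varying["uvx"][0] += pre
        varying["vx"][0] += end - pre
        if died:
            static[ever][1] += 1
            # the death day is vx time exactly when dose <= death, i.e. ever == "vx"
            varying[ever][1] += 1

    def rates(counts):
        return {g: per * d / days if days else float("nan")
                for g, (days, d) in counts.items()}

    return rates(static), rates(varying)


static_rates, varying_rates = person_day_rates(
    [(None, 4), (2, 9), (None, None)], study_end=10)
```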




Python script Y) vx uvx persondays.py

Person days real world dataset Vesely_106_202403141131.csv


Simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.


Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The results should be the same for vx and uvx. Possible causes: a bug in the code, the method not being applied correctly, or the person-day method not adjusting for the death day >= last dose day bias.



Python script Y) vx uvx persondays baslinemort.py

Person days real world dataset Vesely_106_202403141131.csv



To test for bias, I run the same code on simulated data with a homogeneous, uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.




The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
The results should be the same for vx and uvx. Possible causes: a bug in the code, the method not being applied correctly, or the person-day method not adjusting for the death day >= last dose day bias.




W) Of the methods compared, Cox PH seemed to give the best approximation for Hypothesis 1


Python script W) coxph real deaths real vax dates by age. Here you can download the interactive html.

Cox PH analysis using Czech-FOI real world dataset Vesely_106_202403141131.csv


Test dataset sim_NOBIAS_Vesely_106_202403141131.csv

To test for bias, I run the same code on simulated data with a homogeneous, uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, to avoid any selection bias.


If the code below is correct, this might explain why scientists endlessly debate the results of such comparisons.

Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.

The test dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.



ZA) DoWhy causal impact estimation


Python script ZA) dowhy doses vs sim_total_death individual.py
Download interactive html



To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



A number of other analyses follow.



X) Normalized UVx/Vx comparison and stacked events plot - Hypothesis 1


Plots generated by the Python script X) event_stacking.py. Here you can download the interactive HTML files.


The plot of the simulated dataset below assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (real-world scenario) inherently introduces immortal time bias (see Case 1B above), as illustrated below.

As a reminder, every individual in the homogeneous population below has the same constant mortality risk. If group assignment is non-random (as occurs in the real world), this introduces bias that makes the normalized mortality rate of the UVX group look much worse.


Test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv

The stacked events for the MINBIAS simulation AG 70



Normalized Vx/Uvx death rate plot of Czech real world data AG 70
Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request

The stacked events for Czech real-world data AG 70



The simulated dataset for the plot below is based on estimated age-specific death rates and uses constant, uniformly random death dates across the observation window. It preserves the vaccination timings but ignores the real-world constraint that death must follow vaccination, to prevent any bias (which is not possible in a real-world scenario).

Test dataset sim_NOBIAS_Vesely_106_202403141131.csv created by NC) generate csv simulate deaths no bias.py


The stacked events AG 70 for the NOBIAS simulation should theoretically run horizontally at about the same level; the observed deviation could be due to a bug in the code.


G) G-estimation and Cox time-varying method to compensate for bias - Hypothesis 1


Test using a simulated dataset based on real-world parameters, created by "NK) generate csv simulate deaths minimal bias.py".
I tried to evaluate whether the G-estimation (psi) method could correct for the bias, but struggled with error messages.
Using the parameters of the "Cox time-varying method" below, it should probably work.

Python script G) generate interval data per person.py
Python script G) G-estimation on interval data per person.py
Python script GA) G-estimation on interval data per person all age.py
Python script G) cox on interval data per person.py


B) DeathRatesByAge from aggregated



This produces the same results as "ZF) vx uvx norm" above, but uses aggregated CSV files for the calculations.

The aggregated CSV files were generated using the Python script: A) generate aggregated csv files from CzechFOI.py

The data is then plotted with the Python script: B) DeathRatesByAge from aggregated.py
Download interactive html




E) Death risk by age over time


Python script E) death risk by age.py
Download interactive html

Czech-FOI real world dataset Vesely_106_202403141131.csv


Simulated test dataset sim_NOBIAS_Vesely_106_202403141131.csv.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.



F) daily crude HR vx/uvx


Python script F) rolling daily crude HR by age.py
Download interactive html


Czech real world dataset Vesely_106_202403141131.csv Download Freedom of Information Request

To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



Simulated test dataset sim_MINBIAS_Vesely_106_SIMULATED.csv.
The dataset assumes a homogeneous, uniform, and time-invariant mortality rate across age groups (at about the real-world level). Individuals were then randomly assigned to vaccinated or unvaccinated cohorts, with real-world dosing schedules applied. Enforcing that death can only occur after vaccination (as in the real world) inherently introduces immortal time bias, as illustrated below.
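A minimal sketch of a rolling crude HR, assuming daily death counts and daily numbers alive per group are already available. "Crude" here means no adjustment within the age group or for other covariates; the window length and the function name `rolling_crude_hr` are hypothetical, and the ratio is left undefined (None) when a denominator is zero.

```python
def rolling_crude_hr(d_vx, n_vx, d_uvx, n_uvx, window=7):
    """Rolling crude hazard ratio vx/uvx from daily counts.

    d_*: deaths per day; n_*: people alive (at risk) per day in each group.
    Summing n_* over the window gives person-days, so within each trailing window
    HR = (deaths_vx / person_days_vx) / (deaths_uvx / person_days_uvx).
    Returns None where the ratio is undefined.
    """
    out = []
    for end in range(len(d_vx)):
        lo = max(0, end - window + 1)
        dv, pv = sum(d_vx[lo:end + 1]), sum(n_vx[lo:end + 1])
        du, pu = sum(d_uvx[lo:end + 1]), sum(n_uvx[lo:end + 1])
        out.append((dv / pv) / (du / pu) if pv and pu and du else None)
    return out


hr = rolling_crude_hr([1, 1, 1], [100, 100, 100], [2, 2, 0], [100, 100, 100], window=2)
```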



J) Bias Study ratio vx_uvx.py


Python script J) Bias Study ratio vx_uvx.py
Download interactive html

This script simulates uniform death risk over time to test bias in survival analysis. It compares static vs. time-dependent vaccinated/unvaccinated classification, computes death rates and 1st derivatives, Kaplan-Meier curves, and Cox models, and visualizes the results in an interactive Plotly HTML plot.
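The static-classification problem can be illustrated with a minimal sketch, assuming a constant daily hazard for everyone, a planned dose day for a random share of people, and a dose that only happens if the person is still alive on that day. People are then split statically into "ever vaccinated" and "never vaccinated", and a Kaplan-Meier curve is computed per group. With identical risk, a fair split would give matching curves; the static split favours the vx group because its members must survive until their dose. All names and parameter values are hypothetical; this is not the code of the J) script.

```python
import math
import random


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as (time, survival) at each distinct death time.

    times: follow-up time per person; events: 1 = died at that time, 0 = censored.
    Censored people at a tied time are treated as still at risk for deaths at that time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def simulate_static_split(n_people, daily_hazard, n_days, p_planned, rng):
    """Constant-hazard cohort split by the static 'ever vaccinated' rule.

    With probability p_planned a person gets a planned dose day; the dose only
    happens if the person is still alive on that day (real-world constraint).
    Returns {"vx": (times, events), "uvx": (times, events)}.
    """
    groups = {"vx": ([], []), "uvx": ([], [])}
    log_survive = math.log(1.0 - daily_hazard)
    for _ in range(n_people):
        death = int(math.log(1.0 - rng.random()) / log_survive)
        died = death < n_days
        end = death if died else n_days
        planned = rng.randrange(n_days) if rng.random() < p_planned else None
        group = "vx" if planned is not None and planned < end else "uvx"
        groups[group][0].append(end)
        groups[group][1].append(int(died))
    return groups


groups = simulate_static_split(20_000, 0.001, 365, 0.5, random.Random(7))
final = {g: kaplan_meier(t, e)[-1][1] for g, (t, e) in groups.items()}
print(final)  # final Kaplan-Meier survival per statically defined group
```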


ZB) Hypergeometric (used by Charles Sanders Peirce) Vaccine Effectiveness Analysis with Confidence Intervals


Python script ZB) CS-Pierce Hypergeometric VaxCodes.py
Download interactive html



To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.
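One way to frame a hypergeometric test on a static snapshot, stated as an assumption rather than the exact method of the ZB) script: out of N people, K are vaccinated and n died; if deaths were unrelated to vaccination status, the number of vaccinated deaths k would follow a hypergeometric distribution. The sketch returns the expected k, the lower-tail probability P(X ≤ k) and a simple VE point estimate. It computes no confidence interval and, being static, does not address the timing bias discussed above. The function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) for X ~ Hypergeometric(population N, K vaccinated, n deaths drawn)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def vx_death_test(N, K, n, k):
    """Observed vaccinated deaths k versus a 'no association' hypergeometric model.

    Returns (expected vaccinated deaths, P(X <= k), crude VE point estimate).
    VE = 1 - (k / K) / ((n - k) / (N - K)); undefined if there are no uvx deaths.
    """
    expected = n * K / N
    p_low = sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))
    ve = 1 - (k / K) / ((n - k) / (N - K))
    return expected, p_low, ve


# Hypothetical counts, for illustration only.
print(vx_death_test(N=1000, K=600, n=50, k=20))
```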



ZC)


Python script ZC) dowhy vaxcode doses vs total_death individual.py
Download interactive html



To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



More plots added:


AC) Mean age at death before and after the start of vaccination, Czech real-world data


Python script AC) age_mean.py



To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



EA) Days since Doses


Python script EA) batches vs death.py





UA)


Python script UA) diff death dose agebin.py



To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



UC)


Rolling correlation between the daily vaccine dose curve and the difference in normalized death rates between uvx and vx individuals (uvx - vx)! The difference (uvx - vx) should compensate for external influences, so only the vaccine effect should be left!


Python script UC) diff norm death dose agebin.py

Download interactive html


To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias. As the simulated death risk for the whole homogeneous population is constant over time, the difference uvx - vx should fluctuate horizontally around 0!
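A minimal sketch of a trailing-window rolling Pearson correlation between two daily series, assuming both are already aligned by day (daily doses and the uvx - vx normalized death-rate difference). The window length of 28 days and the function names are hypothetical, not taken from the UC) script. Shared trends in both series also produce high correlations.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if a series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def rolling_correlation(doses, diff_rate, window=28):
    """Trailing-window correlation of daily doses with the (uvx - vx) rate difference.

    Entry i uses days i - window + 1 .. i; the first window - 1 entries are None.
    """
    out = [None] * len(doses)
    for end in range(window - 1, len(doses)):
        lo = end - window + 1
        out[end] = pearson(doses[lo:end + 1], diff_rate[lo:end + 1])
    return out


doses = [1, 2, 3, 4, 5]
diff = [2, 4, 6, 8, 10]
print(rolling_correlation(doses, diff, window=3))
```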


ZG)


Python script ZG) doses_vs_deaths_dowhy.png



Zoomed in

DoWhy does not seem to be correct here.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



Zoomed in: this should be a horizontal line (same simulated death rate for all three traces).


S) By calculating the difference between unvaccinated and vaccinated groups (uvx - vx), external influences should largely cancel out, isolating the effect of the vaccine.


Python script S) diff death dose agebin.py



Zoomed in

DoWhy does not seem to be correct here.
To test for bias, I run the same code on simulated data with a uniform, constant death rate across ages and time. I then split the individuals into vaccinated and unvaccinated groups, ignoring real-world constraints such as requiring death to occur after vaccination, which would introduce selection bias.



UB)


Python script UB) diff death dose agebin.py



Software Requirements:

These scripts don't require SQLite queries to aggregate the 11 million individual data rows. Instead, the aggregation is handled directly by Python scripts, which can generate aggregated CSV files very quickly. For coding questions or help, visit https://chatgpt.com.

Disclaimer:

The results have not been checked for errors. Neither methodological nor technical checks or data cleansing have been performed.
