Long story short, we should remove the winsorizing in this do file. But I spent way too much time tracking this down, because of OCD, and thought we should document how OCD this repo is.
So there's a bug in
https://github.com/OpenSourceAP/CrossSection/blob/d81c696d283d62b61260f223eeac0e90511a4e77/Signals/Code/Predictors/ZZ2_PriceDelaySlope_PriceDelayRsq_PriceDelayTstat.do
line 84 has
gstats winsor PriceDelayTstat, by(time_avail_m) trim cuts(10 90) replace // Trim very aggressively because coefficient/se not very well-behaved
which I expected to clamp every extreme value to the 10th or 90th percentile cutoff within each time_avail_m. Instead, we get these weird missing values if I run
gstats winsor PriceDelayTstat, by(time_avail_m) trim cuts(10 90) gen(TstatWin)
list permno time_avail_m PriceDelayTstat TstatWin if time_avail_m == tm(1954m7) & permno >= 20677 & permno <= 20800
The value of 14.03342 should be winsorized down to a smaller value of around 6, based on the summary stats:

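but instead it's set to missing. The missing value is then filled later in the code, in the "Fill to Monthly" step. This weirdness might happen because the underlying data is actually daily and we don't sort by the daily date, but I'm honestly not sure. Another possibility is the trim option itself: as far as I can tell from the gtools documentation, trim replaces outliers with missing values instead of winsorizing them, which would make this the documented behavior rather than a sorting problem.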
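To make the difference concrete, here is a minimal Python sketch of winsorizing versus trimming within a group. Everything in it is an assumption for illustration: the toy data, the function names `percentile` and `cut_by_group`, and the linear-interpolation percentile rule (which is not necessarily the rule gtools uses). It is not the repo's code, just the two behaviors side by side.

```python
import math
from collections import defaultdict

def percentile(sorted_vals, p):
    # Linear-interpolation percentile. ASSUMPTION: Stata/gtools may use a
    # different percentile definition, so cutoffs can differ slightly.
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def cut_by_group(rows, lower=10, upper=90, trim=False):
    """rows is a list of (group, value) pairs; value may be None (missing).

    trim=False: clamp values outside the cutoffs to the cutoffs (winsorize).
    trim=True:  set values outside the cutoffs to None (trim).
    """
    by_group = defaultdict(list)
    for group, value in rows:
        if value is not None:
            by_group[group].append(value)

    # Cutoffs are computed separately for each group (e.g. each month).
    bounds = {}
    for group, vals in by_group.items():
        vals.sort()
        bounds[group] = (percentile(vals, lower), percentile(vals, upper))

    out = []
    for group, value in rows:
        if value is None:
            out.append(None)
            continue
        lo, hi = bounds[group]
        if lo <= value <= hi:
            out.append(value)
        elif trim:
            out.append(None)                      # outlier becomes missing
        else:
            out.append(lo if value < lo else hi)  # outlier is clamped
    return out

# Hypothetical toy month with one large t-stat at the end.
rows = [("1954m7", float(v)) for v in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 14]]
winsorized = cut_by_group(rows, trim=False)
trimmed = cut_by_group(rows, trim=True)
```

In this toy month the last value is clamped to the upper cutoff in `winsorized` but comes back as `None` in `trimmed`, which is the pattern the list output above shows.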
It doesn't matter much, because the original paper (Hou and Moskowitz) doesn't mention any winsorizing. It also doesn't make sense to me that we winsorize the t-stats but not the slopes (if anything, the slopes would be noisier). Last, the winsorizing should not affect any portfolio sorts anyway.
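The point about portfolio sorts can be checked with a small sketch. Clamping is monotone, so it never reorders stocks, and as long as the sort breakpoints sit strictly inside the cutoffs the portfolio assignments should not move. The data and the helper names (`percentile`, `winsorize`, `quintiles`) are made up for illustration, and the percentile rule is again an assumption. Note that this only covers true winsorizing: a decile sort with 10/90 cuts sits right on the boundary and depends on tie handling, and trimming to missing drops stocks from the sort until they are filled.

```python
import math

def percentile(sorted_vals, p):
    # Linear-interpolation percentile (ASSUMPTION: not necessarily Stata's rule).
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def winsorize(vals, lower=10, upper=90):
    # Clamp everything outside the [lower, upper] percentile cutoffs.
    s = sorted(vals)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in vals]

def quintiles(vals):
    # Quintile = 1 + number of breakpoints the value strictly exceeds.
    s = sorted(vals)
    breaks = [percentile(s, p) for p in (20, 40, 60, 80)]
    return [1 + sum(v > b for b in breaks) for v in vals]

# Hypothetical cross-section of 21 signal values with two wild outliers.
raw = [-50.0] + [float(v) for v in range(1, 20)] + [100.0]
clamped = winsorize(raw)

# True when every stock lands in the same quintile before and after.
same_sorts = quintiles(raw) == quintiles(clamped)
```

Here the two outliers get pulled in to the cutoffs, but `same_sorts` is true because the quintile breakpoints are computed from order statistics that the clamping never touches.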
I found this bug while comparing the Python and Stata outputs. For the Stata liberation.