Many thanks to @junkyungauh for pointing this out.
`SignalMasterTable.do` has:

These filters should not be applied until the portfolio code, via the "Filter" column in `SignalDoc.csv`. As described in the paper, we try to defer filtering to the portfolio generation step so that users of the data have the most flexibility.
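Here is a minimal sketch of what "filter only at the portfolio step" means in practice. It assumes the standard filter is the usual CRSP screen (`exchcd` in 1/2/3, `shrcd` in 10/11); the actual lines from `SignalMasterTable.do` aren't reproduced here. `FILTERS` and `portfolio_sample` are hypothetical stand-ins for the "Filter" column and the portfolio code, not the repo's real implementation.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def standard_filter(row: Row) -> bool:
    # ASSUMPTION: common shares (shrcd 10/11) on NYSE/AMEX/Nasdaq (exchcd 1/2/3).
    return row["exchcd"] in (1, 2, 3) and row["shrcd"] in (10, 11)


# HYPOTHETICAL stand-in for the "Filter" column of SignalDoc.csv:
# signal name -> filter applied at portfolio time (None means no filter).
FILTERS: Dict[str, Optional[Callable[[Row], bool]]] = {
    "Mom6m": standard_filter,
    "MaxRet": standard_filter,
}


def portfolio_sample(signal: str, panel: List[Row]) -> List[Row]:
    """Return the rows used to form portfolios for `signal`.

    The stored signal panel itself is never filtered, so data users keep
    every stock-month. Signals without an entry get no filter.
    """
    keep = FILTERS.get(signal)
    if keep is None:
        return list(panel)
    return [row for row in panel if keep(row)]


panel = [
    {"permno": 1, "exchcd": 1, "shrcd": 10, "Mom6m": 0.05},
    {"permno": 2, "exchcd": 4, "shrcd": 73, "Mom6m": 0.12},  # unusual combination
]
print([row["permno"] for row in portfolio_sample("Mom6m", panel)])  # [1]
print(len(panel))  # 2: the underlying panel is untouched
```

The point of the design is that the filter lives next to the signal's documentation rather than inside the signal's construction code, so every signal goes through the same step.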
Should this standard filter be applied everywhere in the portfolio generation step? I'm not sure of the answer to this. We should at least review a few of the original papers before we decide.
Currently, we apply these filters inconsistently: sometimes we use `SignalMasterTable.dta` as the "backbone" of the signal (e.g. `Mom6m.do`), and other times we use `dailyCRSP.dta` (`MaxRet.do`) or some other basic dataset. As a result, `Mom6m` will have more missing values than `MaxRet`.
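A toy illustration of why the backbone choice matters. The rows are made up, the `exchcd`/`shrcd` screen is an assumed form of the standard filter, and `signal_from` is a hypothetical placeholder for a real signal construction. Building on the already-filtered master table leaves the unusual stock missing, while building from the raw data does not.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Toy stock-month rows (made up, not CRSP data); permno 3 has an unusual
# exchcd/shrcd combination.
raw: List[Row] = [
    {"permno": 1, "exchcd": 1, "shrcd": 10, "ret": 0.02},
    {"permno": 2, "exchcd": 3, "shrcd": 11, "ret": -0.01},
    {"permno": 3, "exchcd": 4, "shrcd": 73, "ret": 0.05},
]


def standard_filter(row: Row) -> bool:
    # ASSUMPTION: the same exchcd/shrcd screen as in SignalMasterTable.do.
    return row["exchcd"] in (1, 2, 3) and row["shrcd"] in (10, 11)


def signal_from(rows: Iterable[Row]) -> Dict[int, float]:
    # HYPOTHETICAL toy "signal": just the return, keyed by permno.
    return {row["permno"]: row["ret"] for row in rows}


def n_missing(signal: Dict[int, float], universe: List[int]) -> int:
    # Count stocks in the universe with no signal value.
    return sum(1 for permno in universe if permno not in signal)


universe = [row["permno"] for row in raw]
# Like Mom6m.do: built on the (already filtered) master-table backbone.
backbone_signal = signal_from(row for row in raw if standard_filter(row))
# Like MaxRet.do: built directly from the raw data.
raw_signal = signal_from(raw)

print(n_missing(backbone_signal, universe))  # 1
print(n_missing(raw_signal, universe))  # 0
```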
I don't think this change will have a huge effect. Most stocks with weird `exchcd`-`shrcd` combinations are missing data for everything except historical market prices. For example, about 80% of these weird stocks are missing `ceq`.
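A rough way to compute a share like this, sketched on made-up rows rather than the actual data. "Weird" here means any row failing the assumed `exchcd`/`shrcd` screen, and `None` stands in for a missing `ceq`; `share_missing_among_weird` is a hypothetical helper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def standard_filter(row: Row) -> bool:
    # ASSUMPTION: common shares (shrcd 10/11) on NYSE/AMEX/Nasdaq (exchcd 1/2/3).
    return row["exchcd"] in (1, 2, 3) and row["shrcd"] in (10, 11)


def share_missing_among_weird(rows: List[Row], field: str) -> float:
    """Fraction of rows failing the standard filter whose `field` is missing (None)."""
    weird = [row for row in rows if not standard_filter(row)]
    if not weird:
        return float("nan")
    return sum(1 for row in weird if row.get(field) is None) / len(weird)


# Toy rows (made up); None marks a missing ceq value.
rows = [
    {"permno": 1, "exchcd": 1, "shrcd": 10, "ceq": 120.0},
    {"permno": 2, "exchcd": 4, "shrcd": 73, "ceq": None},
    {"permno": 3, "exchcd": 1, "shrcd": 31, "ceq": None},
    {"permno": 4, "exchcd": 2, "shrcd": 44, "ceq": 8.5},
    {"permno": 5, "exchcd": 3, "shrcd": 12, "ceq": None},
]
print(share_missing_among_weird(rows, "ceq"))  # 0.75
```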
