The Kill-Log
136 systems tested and rejected.
Every strategy below cleared an initial hurdle. Returns looked real. The thesis made sense. We tested anyway.
Our standard is not whether these moves happen. They do. Our question is whether they happen at better than random odds, consistently, across different market regimes, in out-of-sample data. Most did not survive that test.
Every strategy must pass 5-fold walk-forward validation before it reaches subscribers. Six entire categories failed with 100% kill rates across all variants tested.
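For readers unfamiliar with the term, here is a minimal sketch of one way walk-forward folds can be laid out. It assumes an expanding training window over events sorted by date, which is only one common variant. The function name `walk_forward_folds` and the event count are illustrative and do not describe our production setup.

```python
from typing import Iterator, Tuple


def walk_forward_folds(n_events: int, n_folds: int = 5) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over chronologically sorted events.

    The events are cut into n_folds + 1 equal blocks. Fold k trains on
    blocks 0..k-1 and tests on block k, so every test block comes strictly
    after the data the strategy was fitted on.
    """
    if n_events < n_folds + 1:
        raise ValueError("need at least n_folds + 1 events")
    block = n_events // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        # The last fold absorbs any leftover events.
        test_end = n_events if k == n_folds else (k + 1) * block
        test = range(k * block, test_end)
        yield train, test


# Hypothetical example: 120 events in date order.
folds = list(walk_forward_folds(n_events=120, n_folds=5))
for train, test in folds:
    print(f"train 0..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

The point of the layout is that no fold is ever scored on data from before its training window ends, which is what separates out-of-sample results from curve-fitting.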
Kill 01
The Short Squeeze
GME changed how a lot of people think about short interest. Melvin Capital lost 53% in January 2021. The trade felt obvious after that. High short interest plus an earnings beat equals covering pressure plus positive momentum. We had to test it.
We ran it on thousands of post-earnings events across 21 years of data.
High short interest turned out to predict worse outcomes, not better ones.
| Metric | Result |
|---|---|
| Direction | Opposite of hypothesis |
| Confidence | p < 0.001 |
| Events tested | 2,000+ |
| Verdict | Killed |
What actually happens is this. The investors who shorted the stock are watching the same earnings report you are. When the beat comes in, they cover immediately. The stock gaps up hard. That covering pressure is genuine. It just plays out overnight, before the open, before you can do anything about it. What you see the next morning is a stock that sophisticated money just used as an exit. What follows is distribution.
The squeeze is real. Timing it consistently at better than random odds is not.
Kill 02
Analyst Upgrades
If Goldman upgrades a stock the morning after a strong earnings beat, that feels like confirmation. An expert looked at the same data and agreed. We tested whether that agreement produced better outcomes than beats without upgrades.
We tested 11 versions of this idea: consensus shifts, grade changes, fast upgrades, five forms of revision momentum, and upgrades combined with a beat.
None of them worked.
| Variant | p-value |
|---|---|
| Consensus upgrade | 0.779 |
| Fast upgrade | 0.870 |
| Revision momentum (5 forms) | 0.436 to 0.994 |
| Upgrade combined with beat | 0.240 to 0.999 |
| Verdict | Killed — 11 of 11 |
The reason is straightforward once you see it. Analysts watch the stock price move just like everyone else. The upgrade reflects the gap that already happened. By the time the note publishes, the market has already repriced. The analyst is documenting yesterday's move, not predicting tomorrow's.
Eleven tests. Eleven kills. We closed this category permanently.
Kill 03
Confirming Signals Across Strategies
This one felt like common sense. When two separate strategies fire on the same stock at the same time, that should mean higher conviction. Two independent mechanisms agreeing ought to be stronger than one.
We built the test expecting to find a boost. We found the opposite.
| Metric | Result |
|---|---|
| Win rate change from dual agreement | Down 6.3 points |
| Confidence | p < 0.0001 |
| Direction | Consistently negative |
| Verdict | Killed |
The reason took some time to understand. When two strategies fire on the same stock simultaneously, it usually means both are responding to the same underlying event. They are not providing independent confirmation. They are measuring the same thing twice. The overlap produces contamination, not conviction.
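A win-rate gap like the one in the table can be checked with a two-proportion z-test. The sketch below shows one standard way to do that. It is not necessarily the exact test we ran, and the counts in the example are made up to produce a 6.3-point gap. It also assumes trades are independent, which overlapping signals can violate.

```python
import math


def two_proportion_z_test(wins_a: int, n_a: int, wins_b: int, n_b: int):
    """Return (win-rate difference A minus B, z statistic, two-sided p-value)."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Pooled win rate under the null hypothesis that both groups are the same.
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical counts: trades where two strategies fired together (A)
# versus trades where only one strategy fired (B).
diff, z, p = two_proportion_z_test(wins_a=520, n_a=1000, wins_b=2915, n_b=5000)
print(f"win-rate change: {diff * 100:+.1f} points, z = {z:.2f}, p = {p:.5f}")
```

A negative difference with a small p-value is the pattern described above: agreement between strategies went with lower win rates, not higher ones.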
We stopped treating multi-strategy agreement as a positive signal entirely.
Kill 04
Quality Screens
Academic finance has spent decades building quality factors. Profitability, earnings consistency, balance sheet strength. The idea is that high quality companies outperform over time. There is genuine evidence for this in long-horizon studies.
We applied a respected multi-factor quality composite to filter our deep value signals. We expected it to improve returns by removing weaker candidates.
It made things worse. Substantially worse.
| Quality Tier | Avg Return |
|---|---|
| Low quality (bottom tier) | +21.07% |
| High quality (top tier) | +8.99% |
| Spread | 12 points against quality |
| Verdict | Killed — removed from scoring |
The deep value setup already screens for cash generation as its primary condition. Adding a broad quality filter on top of that eliminated the most stressed situations, which turned out to be exactly where the largest recoveries happened. The academically validated quality screen was removing the best trades.
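The effect of a filter like this can be measured by comparing average returns per tier, and then comparing the full signal set against the set that survives the filter. The sketch below uses invented numbers, not the data behind the table, and the tier labels and the choice to drop only the bottom tier are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical (quality_tier, return_pct) pairs for a handful of signals.
signals = [("low", 35.0), ("low", 12.0), ("mid", 10.0), ("high", 9.0), ("high", 7.0)]


def mean_return_by_tier(rows):
    """Group returns by quality tier and average each group."""
    tiers = {}
    for tier, ret in rows:
        tiers.setdefault(tier, []).append(ret)
    return {tier: mean(rets) for tier, rets in tiers.items()}


by_tier = mean_return_by_tier(signals)
# A negative spread means the high-quality tier returned less than the low tier.
spread = by_tier["high"] - by_tier["low"]

# Average return with and without the filter that drops the bottom tier.
unfiltered = mean(ret for _, ret in signals)
filtered = mean(ret for tier, ret in signals if tier != "low")

print(by_tier, spread, unfiltered, filtered)
```

If the filtered average comes out below the unfiltered one, the screen is removing winners rather than losers.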
We pulled it from scoring entirely.
Kill 05
The Strategy That Almost Launched
This kill is the most important one to explain, because our primary strategy also trades post-earnings stocks. There is an important distinction, and this kill is where it became clear.
We built and backtested a strategy that entered positions in the days before earnings, intending to capture the reaction when the report came out. The backtest looked reasonable. Returns were positive. Win rate was acceptable.
Then we decomposed where the returns were actually coming from.
| Component | Share of measured return |
|---|---|
| Overnight gap on earnings night | 98.8% |
| Everything after that | 1.2% |
| Verdict | Killed — entry artifact |
The strategy was not predicting anything. It was holding stocks through a known binary event and measuring the gap that resulted. Strip that single overnight move out and there was nothing left.
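A decomposition of this kind can be sketched as follows. It assumes each trade records the last close before the report, the first open after it, and the exit price, and it uses log returns so the two pieces add up exactly. The `Trade` fields and the example prices are hypothetical, and moves before the last pre-earnings close are left out for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Trade:
    close_before: float  # last close before the earnings release
    open_after: float    # first open after the release
    exit_price: float    # price at which the position was closed


def decompose(trades):
    """Split the summed log return into the overnight gap and everything after it."""
    gap = sum(math.log(t.open_after / t.close_before) for t in trades)
    after = sum(math.log(t.exit_price / t.open_after) for t in trades)
    total = gap + after
    if total == 0:
        raise ValueError("total return is zero; shares are undefined")
    return {"gap_share": gap / total, "after_share": after / total}


# Hypothetical trades, for illustration only.
example_trades = [
    Trade(close_before=100.0, open_after=108.0, exit_price=108.5),
    Trade(close_before=50.0, open_after=47.0, exit_price=47.2),
    Trade(close_before=20.0, open_after=22.0, exit_price=21.9),
]
print(decompose(example_trades))
```

When the gap share sits close to 100%, the holding period after the open is contributing almost nothing. Shares can fall outside the 0 to 1 range if the two pieces have opposite signs, so they are best read alongside the raw totals.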
This matters for how we think about post-earnings trading generally. Our primary strategy does the opposite. It enters the morning after earnings, after the gap has already happened, and specifically skips the overnight move. What it targets is the institutional repricing that plays out over the following weeks as analysts revise models, funds adjust positions, and the market slowly catches up to what the earnings report actually meant. That process takes days to weeks. It has nothing to do with the gap itself.
Gap capture and post-earnings drift are different markets. This kill is what taught us that clearly.
Why we publish this
Most services show you their winners. The kill rate tells you more. A system that has killed 136 strategies is a system that has been tested seriously. Anything still standing has survived real scrutiny, not just curve-fitting.
We will keep adding to this list. Every new hypothesis we test and reject gets documented here. The log grows. The active strategies stay small. That is how it should work.