The Kill-Log

152 systems tested and rejected.

Every strategy below cleared an initial hurdle. Returns looked real. The thesis made sense. We tested anyway.

Our standard is not whether these moves happen. They do. Our question is whether they happen at better than random odds, consistently, across different market regimes, in out-of-sample data. Most did not survive that test.

Metric	Value
Systems tested	218
Confirmed killed	152+
Kill rate	69.7%
Data tested	21 years

Five-fold walk-forward validation is required before any strategy reaches subscribers. Six entire categories failed, with a 100% kill rate across every variant tested.
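For readers unfamiliar with the term, here is a minimal sketch of how walk-forward folds can be laid out. It assumes an expanding training window and 21 yearly blocks of data; the function name `walk_forward_splits` and the window scheme are illustrative assumptions, not a description of our production pipeline.

```python
from typing import List, Tuple


def walk_forward_splits(n_obs: int, n_folds: int = 5) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges for an expanding-window walk-forward.

    Each test block sits strictly after its training block in time,
    so no fold is ever evaluated on data that precedes its training data.
    """
    if n_folds < 1 or n_obs < n_folds + 1:
        raise ValueError("need at least n_folds + 1 observations")
    block = n_obs // (n_folds + 1)  # size of one time block
    splits = []
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        # The final fold absorbs any leftover observations.
        test_end = n_obs if k == n_folds else (k + 1) * block
        test = range(k * block, test_end)
        splits.append((train, test))
    return splits


# Hypothetical example: 21 yearly blocks, 5 folds.
splits = walk_forward_splits(21, 5)
for train, test in splits:
    print(f"train years {train.start}-{train.stop - 1}, "
          f"test years {test.start}-{test.stop - 1}")
```

The point of the layout is that a strategy has to hold up on each later block using only what was known before it.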

Kill 01

The Short Squeeze

Killed

GME changed how a lot of people think about short interest. Melvin Capital lost 53% in January 2021. The trade felt obvious after that. High short interest plus an earnings beat equals covering pressure plus positive momentum. We had to test it.

We ran it on thousands of post-earnings events across 21 years of data.

High short interest turned out to predict worse outcomes, not better ones.

Metric	Result
Direction	Opposite of hypothesis
Confidence	p < 0.001
Events tested	2,000+
Verdict	Killed
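A comparison like the one summarized in this table can be sketched with a permutation test on the difference in mean post-earnings return between high and low short interest groups. This is one reasonable way to run such a test, not necessarily the one behind the figures above; the return values below are made up for illustration and `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns (observed_difference, p_value), where the difference is
    mean(group_a) - mean(group_b).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical post-earnings returns (illustrative numbers, not real data).
high_short_interest = [-0.04, -0.02, -0.05, -0.01, -0.03, -0.06, -0.02, -0.04]
low_short_interest = [0.02, 0.03, 0.01, 0.04, 0.00, 0.05, 0.02, 0.03]

observed, p_value = permutation_p_value(high_short_interest, low_short_interest)
print(f"mean difference: {observed:.4f}, p-value: {p_value:.4f}")
```

A negative difference with a small p-value is what "opposite of hypothesis" looks like in this setup: the high short interest group does worse, and the gap is unlikely to be a labeling accident.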

What actually happens is this. The investors who shorted the stock are watching the same earnings report you are. When the beat comes in, they cover immediately. The stock gaps up hard. That covering pressure is genuine. It just plays out overnight, before the open, before you can do anything about it. What you see the next morning is a stock that sophisticated money just used as an exit. What follows is distribution.

The squeeze is real. Timing it consistently at better than random odds is not.

Kill 02

Analyst Upgrades

Killed

If Goldman upgrades a stock the morning after a strong earnings beat, that feels like confirmation. An expert looked at the same data and agreed. We tested whether that agreement produced better outcomes than beats without upgrades.

We tested 11 versions of this idea: consensus shifts, grade changes, fast upgrades, five forms of revision momentum, and upgrades combined with a beat.

None of them worked.

Variant	p-value
Consensus upgrade	0.779
Fast upgrade	0.870
Revision momentum (5 forms)	0.436 to 0.994
Upgrade combined with beat	0.240 to 0.999
Verdict	Killed — 11 of 11

The reason is straightforward once you see it. Analysts are watching the stock price move, just like everyone else. The upgrade reflects the gap that already happened. By the time the note publishes, the market has already repriced. The analyst is documenting yesterday's move, not predicting tomorrow's.

Eleven tests. Eleven kills. We closed this category permanently.

Kill 03

Confirming Signals Across Strategies

Killed

This one felt like common sense. When two separate strategies fire on the same stock at the same time, that should mean higher conviction. Two independent mechanisms agreeing ought to be stronger than one.

We built the test expecting to find a boost. We found the opposite.

Metric	Result
Win rate change from dual agreement	Down 6.3 points
Confidence	p < 0.0001
Direction	Consistently negative
Verdict	Killed

The reason took some time to understand. When two strategies fire on the same stock simultaneously, it usually means both are responding to the same underlying event. They are not providing independent confirmation. They are measuring the same thing twice. The overlap produces contamination, not conviction.
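A win-rate comparison of this kind can be sketched as a two-proportion z-test. The counts below are hypothetical, chosen only so the difference comes out at 6.3 points; they are not our actual sample sizes, and the z-test is an assumed stand-in for whatever test a given study uses.

```python
import math


def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test for a difference between two win rates.

    Returns (rate_a - rate_b, z_statistic, p_value).
    """
    rate_a = wins_a / n_a
    rate_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)  # win rate if groups were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value


# Hypothetical counts: trades where two strategies agreed vs. single-strategy trades.
diff, z, p_value = two_proportion_z_test(wins_a=517, n_a=1000, wins_b=5800, n_b=10000)
print(f"win rate change: {diff * 100:.1f} points, z = {z:.2f}, p = {p_value:.5f}")
```

Note that this test treats every trade as independent. If agreeing signals cluster around the same events, as described above, the true uncertainty is larger than the formula suggests.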

We stopped treating multi-strategy agreement as a positive signal entirely.

Kill 04

Quality Screens

Killed

Academic finance has spent decades building quality factors: profitability, earnings consistency, balance sheet strength. The idea is that high-quality companies outperform over time. There is genuine evidence for this in long-horizon studies.

We applied a respected multi-factor quality composite to filter our deep value signals. We expected it to improve returns by removing weaker candidates.

It made things worse. Substantially worse.

Quality Tier	Avg Return
Low quality (bottom tier)	+21.07%
High quality (top tier)	+8.99%
Spread	12 points against quality
Verdict	Killed — removed from scoring

The deep value setup already screens for cash generation as its primary condition. Adding a broad quality filter on top of that eliminated the most stressed situations, which turned out to be exactly where the largest recoveries happened. The academically validated quality screen was removing the best trades.
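The tier comparison itself is simple arithmetic: group trades by quality tier, average the returns, and take the difference. The sketch below uses a handful of invented trades to show the mechanics, then checks the spread implied by the two averages reported in the table; `average_return_by_tier` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def average_return_by_tier(trades):
    """Average percentage return per quality tier.

    `trades` is an iterable of (tier_label, return_pct) pairs.
    """
    by_tier = defaultdict(list)
    for tier, return_pct in trades:
        by_tier[tier].append(return_pct)
    return {tier: mean(returns) for tier, returns in by_tier.items()}


# Hypothetical trades (illustrative only): note the wide dispersion in the low tier.
trades = [
    ("low", 35.0), ("low", -10.0), ("low", 38.0),
    ("high", 12.0), ("high", 5.0), ("high", 10.0),
]
avg = average_return_by_tier(trades)
spread = avg["low"] - avg["high"]
print(f"low: {avg['low']:.2f}%, high: {avg['high']:.2f}%, spread: {spread:.2f} points")

# Spread implied by the two averages reported in the table above.
reported_spread = round(21.07 - 8.99, 2)
print(f"reported spread: {reported_spread} points")
```

A filter that removes the low tier throws away both the large winners and the losers in it, so the average has to be checked before assuming the filter helps.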

We pulled it from scoring entirely.

Kill 05

The Strategy That Almost Launched

Killed

This is the most important kill to explain, because our primary strategy also trades post-earnings stocks. There is an important distinction, and this kill is where it became clear.

We built and backtested a strategy that entered positions in the days before earnings, intending to capture the reaction when the report came out. The backtest looked reasonable. Returns were positive. Win rate was acceptable.

Then we decomposed where the returns were actually coming from.

Component	Share of measured return
Overnight gap on earnings night	98.8%
Everything after that	1.2%
Verdict	Killed — entry artifact

The strategy was not predicting anything. It was holding stocks through a known binary event and measuring the gap that resulted. Strip that single overnight move out and there was nothing left.
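A decomposition like this can be sketched with log returns, which add across holding segments. The example assumes, for simplicity, that each trade is measured from the last close before the report, and the prices are hypothetical; `gap_share` is an illustrative helper, not our production code.

```python
import math


def gap_share(trades):
    """Share of total log return that comes from the overnight earnings gap.

    `trades` is an iterable of (close_before, open_after, exit_price) tuples:
    the last close before the report, the first open after it, and the exit.
    """
    gap_sum = 0.0
    total_sum = 0.0
    for close_before, open_after, exit_price in trades:
        gap_sum += math.log(open_after / close_before)    # overnight move only
        total_sum += math.log(exit_price / close_before)  # gap plus everything after
    if total_sum == 0:
        raise ValueError("total return is zero; share is undefined")
    return gap_sum / total_sum


# Hypothetical trades where nothing happens after the gap: share is 100%.
gap_only = [(100.0, 110.0, 110.0), (50.0, 52.0, 52.0)]
print(f"gap share: {gap_share(gap_only):.1%}")

# Hypothetical trade where the stock drifts as far after the gap as it gapped.
gap_then_drift = [(100.0, 110.0, 121.0)]
print(f"gap share: {gap_share(gap_then_drift):.1%}")
```

When the gap and the later move point in opposite directions, the share can fall outside the 0 to 100% range, so it is best read alongside the raw component returns.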

This matters for how we think about post-earnings trading generally. Our primary strategy does the opposite. It enters the morning after earnings, after the gap has already happened, and specifically skips the overnight move. What it targets is the institutional repricing that plays out over the following weeks as analysts revise models, funds adjust positions, and the market slowly catches up to what the earnings report actually meant. That process takes days to weeks. It has nothing to do with the gap itself.

Gap capture and post-earnings drift are different markets. This kill is what taught us that clearly.

Why we publish this

Most services show you their winners. The kill rate tells you more. A process that has killed 152 strategies is a process that tests seriously. Anything still standing has survived real scrutiny, not just curve-fitting.

We will keep adding to this list. Every new hypothesis we test and reject gets documented here. The log grows. The active strategies stay small. That is how it should work.