What is CoinClaw's three-gate validation framework?

CoinClaw validates strategies through three sequential gates. Gate 1 (Monte Carlo simulation, p < 0.05) tests whether the edge is statistically distinguishable from random entry timing. Gate 2 (Walk-Forward Efficiency, WFE > 0) tests whether the edge survives out-of-sample data. Gate 3 (Regime Analysis, positive Sharpe in current regime) tests whether the edge works in current market conditions. A strategy must pass all three gates to be approved for live trading.

Why did 4 out of 5 CoinClaw experiments fail validation?

BTC Grid Regime-Filtered failed Gate 1 (p=0.178) — the regime filter actually worsened BTC Grid performance. SOL Breakout Regime-Filtered failed Gate 1 (p=0.251). BTC Mean Reversion failed Gate 1 (p=1.000) — RSI mean-reversion has zero edge on BTC. SOL Grid Config B passed Gate 1 (p=0.000) but failed Gate 2 (WFE=-0.842) — classic overfitting. ETH FG-Gated-40 failed Gate 1 (p=0.056) — a near-miss that shows F&G gating is weaker than regime filtering.

What is the V3.5 paradox?

V3.5 Grid has a Monte Carlo p-value of 0.938 — meaning its performance is statistically indistinguishable from random entry timing. Yet V3.5 has generated +$33.93 in real live P&L. The paradox: a strategy with no validated edge is making money. The explanation is that grid strategies in sideways markets tend to break even or profit slightly from bid-ask oscillation. V3.5 isn't exploiting a real edge — it's harvesting market microstructure noise. This works until a sustained directional move wipes out the accumulated small gains.

What makes ETH Grid Config B different from the failed experiments?

ETH Grid Config B is the only strategy that combines three properties: (1) a statistically significant edge (p=0.003), (2) out-of-sample robustness (WFE=2.559 — it actually performs better out-of-sample than in-sample), and (3) regime awareness (the regime filter blocks entries during bear markets, eliminating the bear Sharpe=-0.045 that the unfiltered version suffers from). The regime filter was the key innovation — it turned a strategy that loses money in bear markets into one that only trades when conditions favor it.

Why do SOL strategies consistently overfit?

The two SOL experiments failed at different gates. SOL Breakout Regime-Filtered never cleared Gate 1 (p=0.251). SOL Grid Config B passed Gate 1 (p=0.000) but failed Gate 2 on out-of-sample data. SOL's higher volatility creates stronger apparent patterns in historical data. These patterns look like edges but are actually artifacts of SOL's regime shifts, and they disappear when tested on new data. SOL Grid Config B had 7 of 14 walk-forward windows with negative WFE, with extreme outliers at WFE=-6.59 and WFE=-19.89. The in-sample performance was an illusion.

Published April 8, 2026

The Strategy Research Roundup — 5 Experiments, 1 Winner, and What CoinClaw Learned About Finding a Real Edge

CoinClaw has now run five strategy experiments across BTC, ETH, and SOL. Four failed. One passed — and it passed with the strongest validation results in CoinClaw history. This is the full analysis: what was tested, what the numbers say, and what the V3.5 paradox means for everything we think we know about strategy validation.

Key Takeaways

Grid trading places buy and sell orders at fixed intervals around a price
Strategy validation requires passing Monte Carlo, walk-forward, and regime analysis gates
All results shown are from real exchange execution, not backtests
Failed experiments are documented honestly — most strategies do not survive validation

The Three-Gate Validation Framework

Before we look at the experiments, you need to understand what they were tested against. CoinClaw uses a three-gate validation framework that every strategy must pass before it's approved for live trading with real money. Each gate tests a different dimension of strategy quality.

Gate 1: Monte Carlo Simulation (Is the Edge Real?)

Gate 1 answers the most fundamental question in quantitative trading: is this strategy's performance due to a real edge, or could random entry timing have produced the same result?

The test works by running thousands of Monte Carlo simulations with randomized entry timing against the same historical data. If the strategy's actual performance ranks in the top 5% of random simulations (p < 0.05), the edge is treated as statistically significant: fewer than 5% of random-entry runs did as well or better.

A p-value of 0.003 means only 0.3% of random simulations performed equally well or better. A p-value of 0.938 means 93.8% of random simulations performed equally well or better, so the strategy shows no detectable edge.

Gate 1 is the first filter. Most strategies fail here. If a strategy can't demonstrate statistical significance against random entry timing, there's no point testing it further.
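The random-entry test described above can be sketched in a few lines. This is a minimal illustration, not CoinClaw's actual implementation: the function names, the fixed holding period `hold`, and the one-unit-per-trade P&L model are all assumptions made for the example. The p-value is computed as the article defines it, the fraction of random simulations that performed equally well or better.

```python
import random

def trade_pnl(prices, entry, hold):
    """P&L of buying one unit at bar `entry` and selling `hold` bars later."""
    return prices[entry + hold] - prices[entry]

def monte_carlo_p_value(prices, entries, hold, n_sims=2000, seed=0):
    """Fraction of random-entry simulations that match or beat the strategy.

    Hypothetical sketch: each simulation draws the same number of entries
    as the strategy, at uniformly random bars, and uses the same holding period.
    """
    rng = random.Random(seed)
    actual = sum(trade_pnl(prices, e, hold) for e in entries)
    last_valid = len(prices) - hold - 1  # latest bar that still leaves room to exit
    at_least_as_good = 0
    for _ in range(n_sims):
        random_entries = [rng.randint(0, last_valid) for _ in entries]
        simulated = sum(trade_pnl(prices, e, hold) for e in random_entries)
        if simulated >= actual:
            at_least_as_good += 1
    return at_least_as_good / n_sims

# Toy zigzag series: buying on even bars always gains 1, on odd bars always loses 1.
prices = [100, 101] * 50
well_timed = list(range(0, 20, 2))   # entries only at the lows
badly_timed = list(range(1, 21, 2))  # entries only at the highs
print(monte_carlo_p_value(prices, well_timed, hold=1))
print(monte_carlo_p_value(prices, badly_timed, hold=1))
```

On this toy series the well-timed entries should land far below the 0.05 threshold, while the badly timed entries are matched or beaten by every random run, which is the p = 1.000 situation.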

Gate 2: Walk-Forward Efficiency (Does the Edge Survive New Data?)

Gate 2 catches the most dangerous failure mode in quantitative trading: overfitting. A strategy can look spectacular on historical data because it's been (consciously or unconsciously) tuned to fit the specific patterns in that data. Gate 2 tests whether the edge persists on data the strategy has never seen.

Walk-Forward Efficiency (WFE) divides the historical data into multiple windows. For each window, the strategy is optimized on the in-sample portion and then tested on the out-of-sample portion. WFE measures the ratio of out-of-sample performance to in-sample performance.

Assuming in-sample performance is positive, a WFE above 0 means the strategy still makes money on data it was not optimized on, so it generalizes. A WFE of 2.559 means out-of-sample performance was about 2.6 times the in-sample figure, which is the opposite of overfitting. A WFE of -0.842 means the sign flipped: a strategy that was profitable in-sample lost money out-of-sample, which is classic overfitting.

Gate 2 is where SOL strategies die. They look great in-sample and collapse out-of-sample.
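To make the ratio concrete, here is a small sketch of the window split and the WFE calculation. The 75/25 in-sample/out-of-sample split, the function names, and the choice to average per-window ratios are assumptions for illustration; the article does not say how CoinClaw aggregates windows.

```python
from statistics import mean

def split_windows(n_bars, n_windows, oos_fraction=0.25):
    """Cut `n_bars` into consecutive windows, each with an in-sample and
    an out-of-sample index range. The 25% OOS share is an assumed default."""
    size = n_bars // n_windows
    windows = []
    for k in range(n_windows):
        start = k * size
        split = start + int(size * (1 - oos_fraction))
        windows.append((range(start, split), range(split, start + size)))
    return windows

def window_wfe(in_sample_perf, out_of_sample_perf):
    """WFE for one window: out-of-sample performance over in-sample performance."""
    if in_sample_perf == 0:
        raise ValueError("in-sample performance is zero; the ratio is undefined")
    return out_of_sample_perf / in_sample_perf

def walk_forward_efficiency(window_results):
    """window_results: list of (in_sample_perf, out_of_sample_perf) pairs.
    Returns (mean WFE, per-window WFE list). Averaging is an assumption."""
    per_window = [window_wfe(i, o) for i, o in window_results]
    return mean(per_window), per_window

# Illustrative numbers only: one window that generalizes, one that flips sign.
overall, per_window = walk_forward_efficiency([(2.0, 4.0), (2.0, -1.0)])
print(overall, per_window)
```

A per-window view matters because a single aggregate can hide extreme windows, and a ratio becomes unstable when the in-sample figure is close to zero.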

Gate 3: Regime Analysis (Does It Work Now?)

Gate 3 tests whether the strategy is appropriate for current market conditions. A strategy might have a real, robust edge — but only in bull markets. If the current market is ranging or bearish, deploying that strategy would be a mistake.

The test decomposes historical performance by market regime (bull, bear, range) and checks whether the strategy is profitable in the current regime. For CoinClaw, the threshold is a positive Sharpe ratio in the bull regime (the current market classification).

Gate 3 is where the unfiltered ETH Grid failed. It had a real edge (Gate 1 pass) that survived out-of-sample (Gate 2 pass), but it lost money in bear markets (bear Sharpe = -0.045). The regime-filtered version solved this by simply not trading during bear regimes.
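The regime decomposition can be sketched as a group-by over labeled returns. This is an illustrative version under stated assumptions: the Sharpe ratio here is per-period, not annualized, with a zero risk-free rate, and the regime labels are supplied by the caller. None of these details come from CoinClaw's code.

```python
from statistics import mean, stdev

def sharpe(returns):
    """Per-period Sharpe ratio (mean / sample standard deviation).
    Assumes a zero risk-free rate and no annualization."""
    if len(returns) < 2:
        return 0.0
    spread = stdev(returns)
    return mean(returns) / spread if spread > 0 else 0.0

def sharpe_by_regime(returns, regimes):
    """Group per-period returns by regime label and compute Sharpe for each."""
    grouped = {}
    for value, regime in zip(returns, regimes):
        grouped.setdefault(regime, []).append(value)
    return {regime: sharpe(values) for regime, values in grouped.items()}

def passes_gate3(returns, regimes, current_regime="bull"):
    """Gate 3 as described above: positive Sharpe in the current regime."""
    return sharpe_by_regime(returns, regimes).get(current_regime, 0.0) > 0

# Made-up returns: gains in bull periods, losses in bear periods.
returns = [0.02, 0.01, 0.03, -0.02, -0.01, -0.03]
regimes = ["bull", "bull", "bull", "bear", "bear", "bear"]
print(sharpe_by_regime(returns, regimes))
print(passes_gate3(returns, regimes, current_regime="bull"))
```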

The Five Experiments

CoinClaw's strategy research team (Kai Tanaka, with validation framework built by Riley Nakamura) ran five experiments between March and April 2026. Here are the results.

#	Strategy	Gate 1 (p-value)	Gate 2 (WFE)	Result
1	BTC Grid Regime-Filtered	p = 0.178 ❌	—	Gate 1 fail
2	SOL Breakout Regime-Filtered	p = 0.251 ❌	—	Gate 1 fail
3	BTC Mean Reversion (RSI)	p = 1.000 ❌	—	Gate 1 fail — no edge
4	SOL Grid Config B	p = 0.000 ✅	WFE = -0.842 ❌	Gate 2 fail — overfit
5	ETH Grid FG-Gated-40	p = 0.056 ❌	—	Gate 1 near-miss

Experiment 1: BTC Grid Regime-Filtered (p = 0.178)

The hypothesis was straightforward: if regime filtering transformed ETH Grid from a near-miss (unfiltered) to a strong pass (p=0.003), maybe it would do the same for BTC Grid.

It didn't. The unfiltered BTC Grid had a p-value of 0.052 — a near-miss that suggested a real but marginal edge. Adding the regime filter made it worse: p=0.178. The regime filter blocked too many profitable trades on BTC, where the grid strategy's edge comes from a different volatility pattern than ETH.

The lesson: regime filtering isn't universally beneficial. It works on ETH because ETH Grid's losses are concentrated in bear regimes. BTC Grid's losses are more evenly distributed across regimes, so filtering out bear periods removes profitable trades along with unprofitable ones.

Experiment 2: SOL Breakout Regime-Filtered (p = 0.251)

SOL Breakout was already tested by Riley (p=0.192 unfiltered). Adding a regime filter didn't help — p moved to 0.251. SOL's breakout patterns don't align with the bull/bear/range regime classification that CoinClaw uses. SOL breaks out in all regimes, and the breakouts are equally likely to be false signals in all regimes.

Experiment 3: BTC Mean Reversion (p = 1.000)

This was the most definitive result in the entire research program. BTC Mean Reversion uses RSI (Relative Strength Index) to identify overbought and oversold conditions, then trades the expected reversion to the mean.

p = 1.000. Every single random simulation outperformed the strategy. RSI mean-reversion on BTC doesn't just lack an edge — it's actively worse than random. BTC's price action doesn't mean-revert on the timeframes RSI measures. When BTC is "oversold" by RSI standards, it's more likely to continue falling than to bounce.

This is a useful negative result. It eliminates an entire class of strategies (RSI-based mean reversion) from consideration for BTC. The research team doesn't need to test RSI variants, different RSI periods, or RSI combined with other indicators on BTC — the fundamental approach has no edge.

Experiment 4: SOL Grid Config B (p = 0.000, WFE = -0.842)

This is the most instructive failure. SOL Grid Config B had a perfect Gate 1 result: p=0.000, Sharpe=0.1025, 3,519 trades, net P&L of +$3,723.89. In-sample, it looked like a money machine.

Then Gate 2 destroyed it. Walk-Forward Efficiency: -0.842. Seven of fourteen walk-forward windows had negative WFE. Windows 1 and 13 were extreme outliers at WFE=-6.59 and WFE=-19.89 respectively.

What happened? SOL's high volatility creates strong apparent patterns in historical data. The grid strategy's parameters (grid_pct=0.12, 15 levels/side, 100 USDT/level) were perfectly tuned to capture SOL's specific historical oscillation patterns. But those patterns were artifacts of specific market conditions that didn't repeat. On new data, performance flipped negative in half of the walk-forward windows.

This is textbook overfitting. The strategy memorized the training data instead of learning a generalizable pattern. Gate 2 exists specifically to catch this failure mode, and it caught it cleanly.

Experiment 5: ETH Grid FG-Gated-40 (p = 0.056)

The hypothesis: the original ETH Grid with Fear & Greed gating (threshold < 25) produced p=0.064 — a near-miss. Relaxing the threshold to F&G < 40 should increase trade count and push the p-value below 0.05.

It helped, but not enough. p moved from 0.064 to 0.056. Trade count increased from ~500 to 863 as the bot traded in moderate-fear periods (F&G 25-40) in addition to extreme-fear periods. But the additional trades in moderate-fear conditions diluted the edge — the strongest signal comes from extreme fear only.

The conclusion: F&G gating is a weaker signal than regime filtering for ETH Grid. The regime filter (which uses price action and volatility, not sentiment) produces p=0.003. The F&G gate (which uses a sentiment index) produces p=0.056 at best. Regime filtering captures the same information more precisely.

The Winner: ETH Grid Config B (Regime-Filtered)

Out of five experiments, one strategy passed all three gates:

Gate	Threshold	ETH Grid Config B Result
Gate 1: Monte Carlo	p < 0.05	p = 0.003
Gate 2: Walk-Forward	WFE > 0	WFE = 2.559 (OOS Sharpe = 0.279)
Gate 3: Regime	Positive bull Sharpe	Bull Sharpe = +0.218
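The thresholds in the table can be expressed as a sequential check that stops at the first failed gate. The class and function names below are hypothetical; only the thresholds (p < 0.05, WFE > 0, positive Sharpe in the current regime) and the example numbers come from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    p_value: float                                 # Gate 1: Monte Carlo p-value
    wfe: Optional[float] = None                    # Gate 2: None if never run
    current_regime_sharpe: Optional[float] = None  # Gate 3: None if never run

def first_failed_gate(result: ValidationResult) -> Optional[int]:
    """Return 1, 2, or 3 for the first gate that fails, or None if all pass.
    Gates are sequential, so a missing later result counts as not passed."""
    if result.p_value >= 0.05:
        return 1
    if result.wfe is None or result.wfe <= 0:
        return 2
    if result.current_regime_sharpe is None or result.current_regime_sharpe <= 0:
        return 3
    return None

eth_grid_config_b = ValidationResult(p_value=0.003, wfe=2.559, current_regime_sharpe=0.218)
sol_grid_config_b = ValidationResult(p_value=0.000, wfe=-0.842)
btc_grid_filtered = ValidationResult(p_value=0.178)

print(first_failed_gate(eth_grid_config_b))
print(first_failed_gate(sol_grid_config_b))
print(first_failed_gate(btc_grid_filtered))
```

Returning the failing gate number, not a plain boolean, keeps the reason for rejection visible, which matches how the results table reports each experiment.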

What makes ETH Grid Config B different from the four failures?

The right asset. ETH's volatility profile is better suited to grid trading than BTC or SOL. ETH oscillates within ranges more consistently than BTC (which trends) or SOL (which has extreme regime shifts). The grid captures these oscillations as profit.

The regime filter. This was the key innovation. The unfiltered ETH Grid Config B had a bear regime Sharpe of -0.045 — it lost money in bear markets. The regime filter blocks all entries during bear regimes, eliminating those losses entirely. The strategy only trades when market conditions favor grid trading (bull and range regimes).
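A regime filter of this kind amounts to a gate in front of the entry logic. The sketch below is purely illustrative: the article says the filter uses price action and volatility but does not give the rules, so the moving-average classifier, the `lookback` and `band` parameters, and the function names are all invented for the example.

```python
ALLOWED_REGIMES = {"bull", "range"}  # from the article: bear entries are blocked

def classify_regime(closes, lookback=50, band=0.05):
    """Hypothetical classifier: compare the latest close to its moving average.
    More than `band` above is 'bull', more than `band` below is 'bear',
    anything in between is 'range'."""
    window = closes[-lookback:]
    average = sum(window) / len(window)
    deviation = closes[-1] / average - 1
    if deviation > band:
        return "bull"
    if deviation < -band:
        return "bear"
    return "range"

def entries_allowed(closes):
    """Only allow new grid entries when the current regime is bull or range."""
    return classify_regime(closes) in ALLOWED_REGIMES

flat = [100.0] * 50
selloff = [100.0] * 49 + [80.0]
print(classify_regime(flat), entries_allowed(flat))
print(classify_regime(selloff), entries_allowed(selloff))
```

Keeping the classifier separate from the gate makes it possible to swap in a different regime definition without touching the trading logic.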

Genuine out-of-sample robustness. A WFE of 2.559 is remarkable. It means out-of-sample performance was roughly 2.6 times the in-sample figure, on data the strategy was never optimized for. This is the opposite of overfitting: it suggests the strategy has captured a real, persistent pattern in ETH's price behavior rather than memorizing historical noise.

ETH Grid Config B is now live as V3.8, trading ETH/USDC on Binance (after a pivot from ETH/USDT due to an account-level trading pair restriction).

Patterns Across the Failures

Looking at all five experiments together, three patterns emerge:

1. SOL's apparent edges don't hold up. SOL Grid Config B showed strong in-sample performance that collapsed out-of-sample, and SOL Breakout never showed a significant edge in the first place (p=0.192 unfiltered, p=0.251 filtered). SOL's higher volatility creates more apparent patterns in historical data, but those patterns are less persistent than ETH's. For CoinClaw's current framework, SOL is not a viable asset for grid or breakout strategies.

2. Regime filtering helps ETH but hurts BTC. The regime filter improved ETH Grid from near-miss to strong pass, but worsened BTC Grid from near-miss to clear fail. The difference is in where each asset's grid losses are concentrated. ETH Grid losses cluster in bear regimes (filterable). BTC Grid losses are distributed across all regimes (not filterable without removing profitable trades).

3. RSI mean-reversion has no edge on BTC. The p=1.000 result for BTC Mean Reversion is definitive for that asset, the only one where it was tested. RSI was designed for traditional markets with different microstructure. On the timeframes RSI measures, BTC trends rather than mean-reverts. This eliminates a large class of potential strategies from consideration.

The V3.5 Paradox: No Edge, Real Profit

Here's the number that challenges the entire validation framework: V3.5 Grid has a Monte Carlo p-value of 0.938. That means 93.8% of random entry timing simulations performed equally well or better. By every statistical measure, V3.5 has no edge.

And yet V3.5 has generated +$33.93 in real live P&L on $607 of capital — a 5.59% return.

How can a strategy with no validated edge make money?

The answer lies in understanding what grid strategies actually do at the microstructure level. A grid strategy places buy orders below the current price and sell orders above it. When price oscillates — even randomly — the grid captures small profits from each oscillation. In a sideways market, this produces a steady stream of small gains.

The key insight: V3.5 isn't exploiting a directional edge. It's harvesting market microstructure noise. The bid-ask bounce, the small random oscillations that happen in any liquid market — these produce grid fills that net small positive P&L. The Monte Carlo simulation correctly identifies this as "no edge" because random entry timing produces the same result. The grid doesn't need to be smart about when to enter — it just needs price to oscillate, which it always does.

This works until it doesn't. The risk is a sustained directional move. If BTC drops 10% without oscillating back through the grid levels, V3.5 accumulates unrealized losses that dwarf the accumulated small gains. The +$33.93 profit exists because BTC has been ranging. A trend would erase it.
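The two outcomes can be reproduced with a toy simulation. This is a deliberately simplified long-only grid, not V3.5's logic: one unit per level, fills exactly at the level price, no fees, and each filled buy is closed by a sell one step above it. The function name and both price series are made up for illustration.

```python
def run_long_grid(prices, step, n_levels):
    """Simulate a simplified long-only grid; returns (realized, unrealized) P&L.

    Assumptions (not from the article): one unit per level, fills happen
    exactly at the level price, no fees, and each filled buy is paired with
    a sell order one `step` above it.
    """
    start = prices[0]
    buy_levels = [start - step * k for k in range(1, n_levels + 1)]
    open_levels = set()  # buy levels currently holding one unit
    realized = 0.0
    for price in prices[1:]:
        for level in buy_levels:
            if level not in open_levels and price <= level:
                open_levels.add(level)        # buy order filled
            elif level in open_levels and price >= level + step:
                open_levels.remove(level)     # paired sell order filled
                realized += step
    # Mark any still-open units to the final price.
    unrealized = sum(prices[-1] - level for level in open_levels)
    return realized, unrealized

ranging = [100, 99, 100, 99, 100, 99, 100]   # oscillates through one grid level
trending = [100, 99, 98, 97, 96, 95, 90]     # falls through the grid, never returns

print(run_long_grid(ranging, step=1, n_levels=3))
print(run_long_grid(trending, step=1, n_levels=3))
```

In the ranging series every dip is bought and sold back one step higher, so small gains accumulate with no timing skill involved. In the trending series every level fills on the way down and none is sold, so the unrealized loss is several times larger than the gains from the ranging case.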

The V3.5 paradox doesn't invalidate the validation framework — it illuminates its purpose. The framework correctly identifies that V3.5 has no statistical edge. The fact that V3.5 is currently profitable doesn't mean the framework is wrong. It means V3.5 is in a favorable market regime (sideways) that won't last forever. When the regime changes, V3.5's lack of edge will become a lack of profit.

This is exactly why V3.8 (with its validated edge and regime filter) exists. V3.8 doesn't just harvest noise — it has a real, persistent, regime-aware edge that the validation framework confirmed. When the market regime changes, V3.8 adapts (by stopping trading in unfavorable regimes). V3.5 doesn't.

The paradox is a reminder: short-term P&L is not evidence of edge. Only rigorous statistical testing can distinguish a real edge from favorable market conditions. V3.5 has the latter. V3.8 has both.

What This Means for CoinClaw

Five experiments. One winner. An 80% failure rate. Is that good or bad?

In quantitative trading research, it's normal. Most strategy ideas don't work. The value of a rigorous validation framework isn't that it approves strategies. Its value is that it rejects bad ones before they lose real money. Without the three-gate framework, each of the four failed experiments could have gone to live trading, and each would likely have lost money eventually.

The research program has produced three actionable conclusions:

1. ETH is CoinClaw's best asset for grid strategies. ETH Grid Config B is the only validated strategy. BTC Grid is a near-miss that regime filtering can't fix. SOL strategies overfit. Future grid strategy research should focus on ETH variants.

2. Regime filtering is the single most impactful technique. It transformed ETH Grid from a strategy that loses money in bear markets to one that only trades in favorable conditions. Every future strategy should be tested with and without regime filtering.

3. The live bots (V3.5, V3.6) have no validated edge. V3.5 (p=0.938) and V3.6 (p=0.114) are running with real money on strategies that fail Gate 1. They're currently profitable because market conditions happen to favor their approach. This is the definition of uncompensated risk — returns that could disappear when conditions change.

The strategy research continues. Kai's next experiments will test tighter grid spacing on the validated Key BTC Grid Range strategy and explore whether the regime filter can rescue BTC Grid Config B. But the core finding is clear: finding a real, validated, deployable edge in crypto trading is hard. CoinClaw has found exactly one in five attempts. That one — ETH Grid Config B, now live as V3.8 — is the foundation everything else builds on.