The 6th Experiment — Why More Trades Didn't Save ETH Grid on the 1-Hour Timeframe
CoinClaw has now run six strategy experiments through its three-gate validation framework. Only one has passed. The latest — ETH Grid on a 1-hour timeframe — looked promising at Gate 1 with a p-value of 0.002 and nearly three times more trades than the validated 4-hour version. Then it hit Gate 2 and collapsed. Here's what happened, and why more trades don't automatically mean a better strategy.
Key Takeaways
- Grid trading places buy and sell orders at fixed intervals above and below a reference price
- Strategy validation requires passing Monte Carlo, walk-forward, and live paper trading gates
- Failed experiments are documented honestly — most strategies do not survive validation
The Hypothesis: Faster Timeframe, More Trades, Better Signal
ETH Grid Config B on the 4-hour timeframe is the only validated strategy in CoinClaw history. It passed all three gates — Monte Carlo p=0.003, Walk-Forward Efficiency of 2.559, and a bull-regime Sharpe of +0.218. It's the benchmark.
The natural question: what happens if you run the same strategy on a faster timeframe?
The 1-hour timeframe should produce roughly 4× more trading opportunities. More trades means more data points. More data points means better statistical significance. Better significance means a more robust strategy. Right?
That was the hypothesis behind CoinClaw's 6th experiment (TASK-35). Take the validated ETH Grid Config B parameters and run them on 1-hour candles instead of 4-hour candles. Same grid spacing. Same regime filter. Just faster.
Gate 1: Monte Carlo — Pass (p=0.002)
The results at Gate 1 looked excellent:
| Metric | 4H (Validated) | 1H (Experiment) |
|---|---|---|
| p-value | 0.003 | 0.002 |
| Total Trades | ~500 | 1,357 |
| Net P&L | — | +$1,061.35 |
| Sharpe Ratio | — | 0.0705 |
A p-value of 0.002 is even stronger than the 4-hour version's 0.003. The strategy generated 1,357 trades — nearly three times the 4-hour's ~500. The net P&L was positive at +$1,061.35. On paper, this looked like a clear improvement.
Gate 1 passed. On to Gate 2.
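To make the Gate 1 idea concrete, here is a minimal sketch of a Monte Carlo permutation test on a sequence of trade P&Ls. This is an illustration of the general technique, not CoinClaw's actual Gate 1 implementation; the function name, the sign-randomization null, and the toy trade distribution are all assumptions for the example.

```python
import numpy as np

def monte_carlo_p_value(trade_pnls, n_sims=10_000, seed=0):
    """Estimate how often a sign-randomized version of the trade
    sequence matches or beats the observed total P&L.

    A small p-value suggests the observed profit is unlikely under
    a "no directional edge" null. It says nothing about whether the
    edge persists out-of-sample -- that is Gate 2's job.
    """
    rng = np.random.default_rng(seed)
    trade_pnls = np.asarray(trade_pnls, dtype=float)
    observed = trade_pnls.sum()
    magnitudes = np.abs(trade_pnls)
    # Randomize the sign of each trade to simulate an edge-free strategy
    signs = rng.choice([-1.0, 1.0], size=(n_sims, len(trade_pnls)))
    simulated = (signs * magnitudes).sum(axis=1)
    return float(np.mean(simulated >= observed))

# Toy example: 1,357 trades with a small positive drift
# (hypothetical per-trade P&L, not CoinClaw's actual trades)
rng = np.random.default_rng(42)
pnls = rng.normal(loc=0.8, scale=10.0, size=1357)
print(monte_carlo_p_value(pnls))
```

The key point the article makes next is that a low number here is necessary but nowhere near sufficient: the same test would also bless a strategy whose in-sample profit came from memorized noise.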
Gate 2: Walk-Forward — Catastrophic Failure (WFE=0.021)
This is where the 1-hour version fell apart.
Walk-Forward Efficiency measures whether a strategy's in-sample performance persists out-of-sample. A WFE above 0.5 means the strategy retains at least half its edge when tested on unseen data. The 4-hour version scored 2.559 — meaning it actually performed better out-of-sample than in-sample. That's rare and extremely encouraging.
The 1-hour version scored 0.021.
That's not a typo. The Walk-Forward Efficiency dropped from 2.559 to 0.021 — a 99.2% collapse. Of the 14 walk-forward windows tested, 7 showed negative WFE: in half the windows, the strategy was profitable in-sample but actually lost money out-of-sample.
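The walk-forward mechanics can be sketched in a few lines: optimize a parameter on one window, evaluate the same parameter on the next unseen window, and compare out-of-sample performance to in-sample performance. This is a simplified sketch under stated assumptions — the `backtest` callable, the parameter grid, and the window split are illustrative stand-ins, not CoinClaw's actual pipeline.

```python
import numpy as np

def walk_forward(series, param_grid, backtest, n_windows=14):
    """Rolling walk-forward: for each window, pick the best parameter
    in-sample, then measure that same parameter on the next window.
    Each ratio approximates per-window WFE (OOS edge / IS edge).
    """
    windows = np.array_split(np.asarray(series, dtype=float), n_windows + 1)
    ratios = []
    for i in range(n_windows):
        train, test = windows[i], windows[i + 1]
        # Select the parameter that looked best on the training slice
        best = max(param_grid, key=lambda p: backtest(train, p))
        is_pnl = backtest(train, best)
        oos_pnl = backtest(test, best)
        if is_pnl > 0:  # the ratio is only meaningful given an in-sample edge
            ratios.append(oos_pnl / is_pnl)
    return ratios, float(np.mean(ratios))

# Toy backtest: P&L is just param * sum of returns (illustrative only)
toy_returns = np.full(150, 0.1)
bt = lambda data, p: p * float(np.sum(data))
ratios, wfe = walk_forward(toy_returns, [0.5, 1.0, 2.0], bt)
print(f"mean WFE: {wfe:.3f}")
```

A genuinely stable edge keeps the ratios near (or above) 1, as the 4-hour version did; a memorized-noise edge sends them toward zero or negative, which is exactly the 1-hour failure mode.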
| Gate 2 Metric | 4H (Validated) | 1H (Experiment) |
|---|---|---|
| WFE | 2.559 | 0.021 |
| OOS Sharpe | 0.279 | 0.207 |
| Overfit Windows | — | 7/14 (50%) |
| DSR p-value | — | 1.000 |
| Result | PASS ✅ | FAIL ❌ |
The Deflated Sharpe Ratio p-value of 1.000 is the final nail. It means the out-of-sample Sharpe ratio is entirely explained by multiple testing bias. There is no real edge.
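The multiple-testing bias that the Deflated Sharpe Ratio penalizes is easy to demonstrate: generate many strategies with zero edge and report the best one, and the winner looks impressive purely by selection. The sketch below illustrates that selection effect; it is not the DSR computation itself, and the sample sizes are arbitrary.

```python
import numpy as np

def best_sharpe_of_n(n_strategies, n_periods, seed=0):
    """Generate n_strategies i.i.d. zero-mean return series and return
    the best per-period Sharpe ratio (mean / std, not annualized).
    With enough candidates, the "winner" is positive purely by chance.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.01, size=(n_strategies, n_periods))
    sharpes = returns.mean(axis=1) / returns.std(axis=1)
    return float(sharpes.max())

print(best_sharpe_of_n(1, 1000))    # a single random strategy: Sharpe near 0
print(best_sharpe_of_n(500, 1000))  # best of 500: clearly positive, yet zero edge
```

A DSR p-value of 1.000 is the statistical way of saying the 1-hour strategy's out-of-sample Sharpe is no better than what this kind of best-of-many selection produces on its own.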
What Went Wrong: The Noise Problem
The 1-hour timeframe didn't add signal. It added noise.
Here's the intuition. A grid trading bot places buy orders below the current price and sell orders above it. When price oscillates within the grid, the bot captures small profits on each round trip. The strategy works when price movements are meaningful — when a drop to a grid level represents a genuine shift in supply/demand that's likely to reverse.
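The grid placement described above can be sketched directly. All names and parameter values here are illustrative — the actual Config B spacing and level count are not shown in this article.

```python
from dataclasses import dataclass

@dataclass
class GridOrder:
    side: str    # "buy" or "sell"
    price: float

def build_grid(center_price, spacing_pct, levels_per_side):
    """Place buy orders below and sell orders above a center price,
    spaced at a fixed percentage interval of the center price."""
    orders = []
    for i in range(1, levels_per_side + 1):
        step = center_price * spacing_pct * i
        orders.append(GridOrder("buy", round(center_price - step, 2)))
        orders.append(GridOrder("sell", round(center_price + step, 2)))
    return orders

# Example: ETH at $3,000 with 1% spacing and 3 levels per side
for o in build_grid(3000.0, 0.01, 3):
    print(o.side, o.price)
```

Each buy filled at a lower level pairs with a sell one level up; the round trip captures the spacing as profit. Whether that fill came from a real supply/demand shift or from intraday noise is exactly what the timeframe determines.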
On a 4-hour timeframe, each candle represents four hours of market activity. The price movements captured by the grid are substantial enough to reflect real market dynamics. When the 4-hour chart shows ETH dropping to a grid level, that's a meaningful move that the regime filter can contextualize.
On a 1-hour timeframe, the same grid captures much smaller price fluctuations. Many of these are just intraday noise — random walks that happen to touch a grid level. The bot fills orders on these noise-driven touches, and the fills look profitable in-sample because the grid parameters were optimized for that specific noise pattern.
But noise doesn't persist. The specific pattern of intraday fluctuations in the training data won't repeat in the test data. That's why 7 out of 14 walk-forward windows showed negative WFE — the "edge" was just memorized noise.
The 4-hour version avoids this trap. Fewer trades, but each trade is driven by a price movement large enough to carry real information. The regime filter adds another layer of signal — it only trades when the broader market context supports the grid strategy. On the 1-hour timeframe, the regime filter can't distinguish signal from noise at that granularity.
The Scorecard: 6 Experiments, 1 Winner
With this result, CoinClaw's strategy research scorecard now stands at:
| # | Strategy | Gate 1 (p-value) | Gate 2 (WFE) | Result |
|---|---|---|---|---|
| 1 | BTC Grid Regime-Filtered | 0.178 ❌ | — | Gate 1 Fail |
| 2 | SOL Breakout Regime-Filtered | 0.251 ❌ | — | Gate 1 Fail |
| 3 | BTC Mean Reversion | 1.000 ❌ | — | Gate 1 Fail |
| 4 | SOL Grid Config B | 0.000 ✅ | -0.842 ❌ | Gate 2 Fail |
| 5 | ETH FG-Gated-40 | 0.056 ❌ | — | Gate 1 Fail |
| 6 | ETH Grid 1H | 0.002 ✅ | 0.021 ❌ | Gate 2 Fail |
| ✅ | ETH Grid Config B (4H) | 0.003 ✅ | 2.559 ✅ | Validated |
A pattern is emerging. Two experiments have now passed Gate 1 but failed Gate 2 — SOL Grid Config B (WFE=-0.842) and ETH Grid 1H (WFE=0.021). Both had strong in-sample performance that didn't survive out-of-sample testing. Gate 2 is doing exactly what it's designed to do: catching strategies that look good on historical data but won't work in live trading.
The three Gate 1 failures (BTC Grid, SOL Breakout, BTC Mean Reversion) didn't even have enough in-sample edge to pass the Monte Carlo test. ETH FG-Gated-40 came close at p=0.056 — just above the 0.05 threshold — but close doesn't count in statistical validation.
The Lesson: More Data Points ≠ Better Strategy
This experiment challenges a common assumption in quantitative trading: that more trades automatically produce better results. The logic seems sound — more data points should reduce variance and improve statistical confidence. And at Gate 1, that's exactly what happened. The 1-hour version had a slightly better p-value (0.002 vs 0.003) thanks to its larger sample size.
But statistical significance at Gate 1 only tells you that the strategy beat random in the training data. It doesn't tell you why. If the "why" is genuine market structure — price oscillations driven by real supply and demand dynamics — then the edge persists out-of-sample. If the "why" is noise — random intraday fluctuations that happened to align with your grid — then the edge evaporates.
The 4-hour timeframe works because it operates at a scale where price movements carry information. The 1-hour timeframe operates at a scale where price movements are dominated by noise. More trades at the noise scale just means more noise-driven fills that look profitable in hindsight but don't repeat.
This is why CoinClaw's three-gate framework exists. Gate 1 alone would have approved the 1-hour version. Gate 2 caught the overfitting. Without walk-forward analysis, this strategy would have been deployed with real money — and it would have lost.
What's Next
ETH Grid Config B on the 4-hour timeframe remains the only validated CoinClaw strategy. It's currently paper trading at +$18.31 (+1.83%) while its live deployment awaits a capital allocation decision. The live bot was installed but hit an "insufficient balance" error — the Binance account's USDC is already deployed across V3.5, V3.6, and V3.7.
Meanwhile, the unvalidated bots continue their own stories. V3.7 Scalper is up +$4.41 with zero errors across 181 cycles. V3.6 FG is accumulating positions in extreme fear territory (Fear & Greed Index at 17), buying the dip with dynamic position sizing. And V3.5 — the paradox bot that failed validation with p=0.938 but accumulated +$33.93 in live P&L — remains intentionally paused.
Six experiments. One winner. The search for a second validated strategy continues.