60 days running poly5m-v4 on BTC — what the numbers actually look like

A first-person post-mortem on two months of live trading with the multi-exchange median scalper. Real ranges, real drawdowns, no cherry-picking.

2026-05-20 by Arnaud Knobloch

Two months ago I switched my BTC scalper from v3 to v4 on live capital. v4 introduced a multi-exchange median price signal (Binance + Coinbase + Kraken) to replace the single-feed Binance signal, added a composite momentum score, and hardened the chop detector. This is a post-mortem on what happened.

I'm going to give ranges rather than exact dollar figures because PnL depends heavily on sizing, and I don't want anyone reverse-engineering my position sizes or mistaking my specific numbers for a guarantee. The percentages are what matter.

The setup

Capital: $30 bank, fixed $3 per trade
Bot: poly5m-v4 on BTC/USD 5-minute binary markets
Window: 60 trading days, roughly 3–5 trades per day depending on chop suppression
Resolution: Chainlink oracle via Gamma API (not Binance price — critical distinction)

Win rate

Over the 60-day window, v4 won roughly 53–55% of resolved trades. That sounds modest. For a binary bet at roughly $0.50 average entry, you need a win rate above ~53% just to break even after taker fees. So sitting at 53–55% means the margin is thin — but it's real margin.
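The break-even arithmetic can be sketched in a few lines. This is a simplified model, not the exchange's actual fee schedule: it assumes the fee is a flat fraction of the stake, paid whether the trade wins or loses. The `fee_rate` parameter and both function names are hypothetical. Under that assumption, a fee of about 6% of stake is what reproduces a ~53% break-even at a $0.50 entry.

```python
def break_even_win_rate(entry_price: float, fee_rate: float) -> float:
    """Win rate at which expected PnL is zero for a binary share bought at entry_price.

    Assumption (not from the fee schedule): the fee is a flat fraction of the
    stake and is paid on every trade, win or lose.
    """
    if not 0.0 < entry_price < 1.0:
        raise ValueError("entry_price must be strictly between 0 and 1")
    # Per $1 staked: a win pays out 1 / entry_price, a loss pays out 0.
    # Zero expected PnL means: win_rate * (1 / entry_price) = 1 + fee_rate
    return entry_price * (1.0 + fee_rate)


def expected_pnl(stake: float, entry_price: float, win_rate: float, fee_rate: float) -> float:
    """Expected dollar PnL of one trade under the same flat-fee assumption."""
    payout_if_win = stake / entry_price  # each share settles at $1 on a win
    return win_rate * payout_if_win - stake * (1.0 + fee_rate)


# With a hypothetical 6% flat fee, a $0.50 entry breaks even at about 0.53.
rate = break_even_win_rate(0.50, 0.06)
```

Without fees the break-even at a $0.50 entry is exactly 50%, so in this model the whole gap between 50% and ~53% is fee drag.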

v3 on the same asset clocked about 50–51% over a comparable window before I switched. The multi-exchange median signal appears to add a small but measurable edge.

Net PnL

Approximate net PnL over the 60 days: somewhere in the +$8–$12 range on a $30 starting bank after fees. Peak drawdown hit roughly -$9 in a stretch where the chop detector failed to fire in time during a sideways accumulation phase. That drawdown window lasted about 8 days and was genuinely uncomfortable.

What worked

The multi-exchange median signal. Binance alone leads the Chainlink oracle. Binance + Coinbase + Kraken median leads it by slightly more and produces fewer false signals. On days where Binance had a brief spike that didn't confirm on the other two feeds, v4 correctly sat out. v3 would have traded those — and many of them resolved against the Binance direction.
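A minimal sketch of why a median filters out a single-feed spike. This is not the bot's actual signal code: the function names, the `min_move` threshold, and the sample prices are all made up for illustration.

```python
from statistics import median


def median_price(prices: dict[str, float]) -> float:
    """Median of the latest price from each exchange feed."""
    return median(prices.values())


def confirmed_move(prev: dict[str, float], curr: dict[str, float], min_move: float) -> int:
    """Return +1 or -1 if the median moved by at least min_move (a fraction), else 0.

    min_move is a hypothetical threshold, not a value from the bot.
    """
    before = median_price(prev)
    after = median_price(curr)
    change = (after - before) / before
    if change >= min_move:
        return 1
    if change <= -min_move:
        return -1
    return 0


# Illustrative prices only.
prev = {"binance": 100000.0, "coinbase": 100010.0, "kraken": 99990.0}
# Binance spikes 0.5% but the other two feeds do not move.
spike = {"binance": 100500.0, "coinbase": 100010.0, "kraken": 99990.0}
signal = confirmed_move(prev, spike, min_move=0.001)  # 0: the median barely moves
```

With three feeds, the median only shifts meaningfully when at least two of them move, which is the "sit out unconfirmed spikes" behavior described above.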

Chop detection. The ATR-based chop detector reduced trade frequency by roughly 30% compared to v3. That sounds like a loss of opportunity, but the trades it suppressed had a win rate around 48% when I back-checked the logs. Skipping them improved overall win rate by about 1.5–2 percentage points.
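For readers unfamiliar with ATR-based filters, here is one way such a detector could look. It is a sketch under stated assumptions, not v4's implementation: it uses a simple average of true ranges (not Wilder's smoothing), and the rule "flag chop when ATR falls below a fraction of price" plus the `min_atr_pct` value are hypothetical.

```python
from typing import NamedTuple, Sequence


class Bar(NamedTuple):
    high: float
    low: float
    close: float


def average_true_range(bars: Sequence[Bar], lookback: int) -> float:
    """Simple-average ATR over the last `lookback` bars (needs lookback + 1 bars)."""
    if lookback < 1 or len(bars) < lookback + 1:
        raise ValueError("need at least lookback + 1 bars")
    recent = bars[-(lookback + 1):]
    true_ranges = []
    for prev, bar in zip(recent, recent[1:]):
        # True range: the largest of the bar's own range and the gaps from the prior close.
        true_ranges.append(
            max(bar.high - bar.low, abs(bar.high - prev.close), abs(bar.low - prev.close))
        )
    return sum(true_ranges) / lookback


def is_choppy(bars: Sequence[Bar], lookback: int = 14, min_atr_pct: float = 0.0005) -> bool:
    """Flag chop when ATR is below a fraction of the last close.

    min_atr_pct is a made-up threshold for illustration only.
    """
    atr = average_true_range(bars, lookback)
    return atr / bars[-1].close < min_atr_pct
```

A shorter `lookback` reacts faster to a regime change but averages over fewer bars, so it is noisier.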

Hardened resolution logic. resolveViaGamma() polling until outcome >= $0.99 rather than inferring from Binance price at expiry saved me from several phantom wins where Binance was above strike at T+0 but Chainlink settled below it. If you're using Binance price as your proxy for oracle settlement, you will miscount wins.
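The polling pattern can be sketched as follows. This is not the real resolveViaGamma() and it does not call the Gamma API: `fetch_outcome_price` is a hypothetical callable you would supply yourself, and the symmetric "settled against us at or below 1 - threshold" check, the poll interval, and the timeout are assumptions added for illustration.

```python
import time
from typing import Callable, Optional


def resolve_by_polling(
    fetch_outcome_price: Callable[[], float],
    threshold: float = 0.99,
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[bool]:
    """Poll until the price of our side of the market is decisive.

    Returns True if our side settled at or above threshold, False if it
    settled at or below 1 - threshold, and None if the timeout expires first.
    fetch_outcome_price is a caller-supplied stand-in for the real API call.
    """
    deadline = clock() + timeout_s
    while True:
        price = fetch_outcome_price()
        if price >= threshold:
            return True
        if price <= 1.0 - threshold:
            return False
        if clock() >= deadline:
            return None  # still unresolved; the caller decides what to do
        sleep(poll_interval_s)
```

The point of the design is that a trade is only counted as a win or a loss once the market itself says so, never from an exchange price at expiry.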

What surprised me

How narrow the edge is. Even on a good 2-week stretch with a 57% win rate, the per-trade edge after fees is roughly $0.03 on a $3 bet. A three-trade losing streak at that sizing wipes a week of gains. This is not a "set and forget" strategy — it requires monitoring and the discipline to stop when the signal regime changes.

Volume noise at market open. The first 2–3 markets of each UTC day have thin books. I got filled at worse prices than the model expected, which compressed actual returns relative to back-test. The fix was adding a minimum book depth check before entering, which I rolled in around day 40.
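A book depth check of this kind might look like the sketch below. It is an illustration, not the check I shipped: the order book is assumed to be a list of (price, size) ask levels, and `max_slippage` and `depth_multiple` are hypothetical parameters.

```python
from typing import Sequence, Tuple

AskLevel = Tuple[float, float]  # (price per share, shares available)


def fillable_size_within(asks: Sequence[AskLevel], max_price: float) -> float:
    """Total shares offered at or below max_price."""
    return sum(size for price, size in asks if price <= max_price)


def has_min_depth(
    asks: Sequence[AskLevel],
    model_price: float,
    stake_usd: float,
    max_slippage: float = 0.01,
    depth_multiple: float = 2.0,
) -> bool:
    """Require depth_multiple times the needed shares within max_slippage of the model price.

    Both max_slippage and depth_multiple are made-up defaults for illustration.
    """
    shares_needed = stake_usd / model_price
    available = fillable_size_within(asks, model_price + max_slippage)
    return available >= depth_multiple * shares_needed


# Illustrative thin book: only 9 shares near the model price, 12 required.
thin_book = [(0.50, 4.0), (0.505, 5.0), (0.55, 100.0)]
ok_to_enter = has_min_depth(thin_book, model_price=0.50, stake_usd=3.0)  # False
```

Requiring more than the bare minimum of shares leaves room for other takers hitting the same levels before the order lands.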

Chainlink settlement lag. There's a window of a few seconds to a few minutes between the binary market's nominal expiry and when Gamma marks the outcome. During that window, the position is in limbo. This is expected behavior, but watching $3 in limbo is psychologically harder than the math suggests it should be.

What almost killed the run

Days 22–30 were brutal. I lost 6 of 8 trades in a stretch where BTC price was grinding sideways with 0.3% hourly moves — exactly the environment where a 5-minute scalper has no edge. The chop detector was slower to adapt than I wanted because it uses a trailing ATR window. I tightened the lookback from 14 periods to 10, which improved responsiveness but introduced more noise on trending days.

The tradeoff between fast and slow chop detection is still not fully solved. If ATR lookback is the main knob, the right setting is regime-dependent. I don't have an automatic regime classifier yet.

Bottom line

v4 is better than v3 in the ways I expected: fewer bad trades, slightly higher win rate, more robust signal. It is not a money printer. On a $30 bank it generates beer money in good months and stress in bad ones. The interesting question is whether the edge scales — whether the same win rate holds at $10/trade or $20/trade. I haven't tested that yet.

If you're evaluating poly5m-v4, the honest benchmark is: can you sustain a 53–55% win rate for 200+ trades before trusting the edge is real? My 60-day window is barely enough to be statistically meaningful. Two standard errors on a 53% win rate over 250 trades is about ±6 percentage points, which puts the confidence interval at roughly 47–59%. That range includes both a coin flip and the ~53% break-even, so you can't tell from 60 days whether you have edge or variance.
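The interval above comes from the normal approximation to a binomial proportion. A short sketch, with a hypothetical function name, for anyone who wants to plug in their own trade count:

```python
import math


def win_rate_interval(win_rate: float, n_trades: int, z: float = 2.0) -> tuple[float, float]:
    """Normal-approximation interval for an observed win rate.

    z = 2.0 corresponds to the "two standard deviations" rule of thumb.
    """
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    standard_error = math.sqrt(win_rate * (1.0 - win_rate) / n_trades)
    return win_rate - z * standard_error, win_rate + z * standard_error


low, high = win_rate_interval(0.53, 250)  # roughly (0.467, 0.593)
```

The standard error shrinks with the square root of the trade count, so halving the width of the interval takes about four times as many trades.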

I'm continuing to run it. The multi-exchange median architecture is the right direction. The numbers don't lie — they just don't say much yet.