Machine Learning Feature Engineering for Markets

June 2, 2025 · 12 min read · by QuantArtisan
feature engineering · machine learning · signal processing


The quality of your features determines the ceiling of your model's performance. No amount of architectural sophistication can compensate for poor features. This is the practitioner's guide to building predictive features for ML trading models.

The Feature Engineering Mindset

Every feature should encode a hypothesis about market behavior. Don't add features because they're available — add them because you have a reason to believe they contain predictive information. Features without economic rationale are noise that will hurt your model's out-of-sample performance.

Price-Based Features

Returns at multiple horizons: 1-day, 5-day, 21-day, 63-day, 252-day returns capture momentum at different timescales.

Volatility features: Realized volatility (rolling standard deviation of returns), Parkinson volatility (uses high-low range), and GARCH-estimated conditional volatility.

Technical indicators: RSI, MACD, Bollinger Band position, ATR. Normalize these to be stationary before feeding to ML models.

```python
import pandas as pd


def build_price_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Build stationary price-based features from an OHLC frame."""
    features = pd.DataFrame(index=prices.index)
    returns = prices['close'].pct_change()

    # Multi-horizon returns capture momentum at different timescales
    for horizon in [1, 5, 21, 63, 252]:
        features[f'ret_{horizon}d'] = prices['close'].pct_change(horizon)

    # Realized volatility and short-vs-long volatility ratio
    features['vol_21d'] = returns.rolling(21).std()
    features['vol_ratio'] = returns.rolling(5).std() / returns.rolling(21).std()

    # 14-day RSI (simple-moving-average variant)
    delta = prices['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    features['rsi_14'] = 100 - (100 / (1 + gain / loss))

    return features.dropna()
```
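The Parkinson estimator mentioned above is not in the block, so here is a minimal sketch of it. The function name and the choice to skip annualization are assumptions; only the high and low columns are needed:

```python
import numpy as np
import pandas as pd


def parkinson_vol(high: pd.Series, low: pd.Series, window: int = 21) -> pd.Series:
    """Rolling Parkinson volatility from the intraday high-low range.

    Using the range instead of close-to-close returns extracts more
    information per bar, making the estimator more efficient than a
    plain rolling standard deviation of returns.
    """
    hl_sq = np.log(high / low) ** 2
    return np.sqrt(hl_sq.rolling(window).mean() / (4 * np.log(2)))
```

To annualize, multiply by the square root of the number of bars per year (e.g. `np.sqrt(252)` for daily data).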

The Stationarity Requirement

Most ML models assume features are stationary (constant mean and variance over time). Raw prices are non-stationary. Always transform prices to returns, log-returns, or z-scored values before feeding them to ML models. Skipping this step is one of the most common reasons ML trading models fail out of sample: the model fits the prevailing price level rather than a repeatable relationship, and that level will not recur.
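One common transform is a rolling z-score. The sketch below (function name assumed) uses only a trailing window, so the normalization itself cannot leak future information into the feature:

```python
import pandas as pd


def rolling_zscore(series: pd.Series, window: int = 252) -> pd.Series:
    """Z-score each value against its trailing window only.

    A trailing (never centered) window ensures the statistics at time t
    are computed exclusively from data available at time t.
    """
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    return (series - mean) / std
```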

Feature Importance and Selection

After training, use feature importance (for tree models) or permutation importance (model-agnostic) to identify which features are actually contributing to predictions. Prune features with near-zero importance — they add noise and slow inference.
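A minimal sketch of the pruning step using scikit-learn's `permutation_importance`. The synthetic data and the two-standard-error threshold are illustrative assumptions, not a prescription:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Hypothetical feature matrix X and forward-return target y:
# only feature 0 carries signal, the rest are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Keep only features whose importance is distinguishable from zero
keep = result.importances_mean > 2 * result.importances_std
```

Permuting a feature breaks its relationship with the target; the resulting drop in model score is its importance, which works for any fitted estimator, not just trees.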

Applied Ideas

The frameworks discussed above translate directly into deployable trading logic. Here are concrete next steps for practitioners:

  • Backtest first: Validate any signal-generation or risk-management approach with walk-forward analysis before committing capital.
  • Start small: Deploy with fractional position sizing and paper-trade for at least one full market cycle.
  • Monitor regime shifts: Set automated alerts for when your model detects a regime change — manual review before large rebalances is prudent.
  • Iterate on KPIs: Track Sharpe, Sortino, max drawdown, and win rate weekly. If any metric degrades beyond your predefined threshold, pause and re-evaluate.
  • Combine signals: The strongest edges come from combining uncorrelated signals — pair the ideas in this post with your existing alpha sources.
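The "backtest first" step above relies on walk-forward splits. A minimal sketch of an expanding-window split generator (the function name and window sizes are hypothetical):

```python
def walk_forward_splits(n_samples: int, train_size: int, test_size: int):
    """Yield (train_indices, test_indices) for expanding-window walk-forward.

    Each test window strictly follows its training window in time, so
    no future data leaks into model fitting.
    """
    start = train_size
    while start + test_size <= n_samples:
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size
```

Fit on each training window, evaluate on the test window that follows it, and aggregate the out-of-sample results; never evaluate on data the model has already seen.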
