研究指南¶

tradelearn.research 用来组织机器学习策略和指数增强研究里的固定流程。

典型流水线¶

from tradelearn import research

feature_set = research.FeatureSet(
    {
        "alpha": lambda p: p.close.pct_change(20)
        / p.close.pct_change().rolling(20).std(),
        "size": lambda p: p.close,
    },
    target={"future_return": lambda p: p.close.pct_change().shift(-1)},
)
features = feature_set.fit_transform(bars, include_target=True).dropna()
train, test = research.time_split(features, split="2023-09-01", level="timestamp")

pipeline = research.Pipeline([
    research.preprocess.Winsorizer(columns=["alpha"]),
    research.preprocess.Neutralizer(columns=["alpha"], exposures=["size"]),
    research.preprocess.StandardScaler(columns=["alpha"]),
])

train = pipeline.fit_transform(train)
test = pipeline.transform(test)
scores = scorer.predict(test)

allocator = research.portfolio.Allocator.topk_equal(k=50, gross=0.95, max_weight=0.03)
weights = allocator.build(scores)

自定义 Pipeline 工具¶

research.Pipeline 接受 sklearn-like transformer。只要对象提供 fit() / transform() / fit_transform()，就可以放进流水线。约定很简单：

fit(train) 只学习训练集状态，例如分位数、均值、行业暴露系数。
transform(data) 只应用已学习状态，不能偷看测试集未来数据。
返回值保持原来的 index，方便后续按 timestamp / symbol 对齐权重。

import pandas as pd
from tradelearn import research


class RankNormalizer:
    def __init__(self, column: str):
        self.column = column

    def fit(self, data: pd.DataFrame):
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        out = data.copy()
        out[self.column] = out.groupby(level="timestamp")[self.column].rank(pct=True)
        return out

    def fit_transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return self.fit(data).transform(data)


pipeline = research.Pipeline([
    RankNormalizer("alpha"),
    research.preprocess.StandardScaler(columns=["alpha"]),
])

train = pipeline.fit_transform(train)
test = pipeline.transform(test)

如果自定义工具还要写入实验记录，可以在 ResearchRun 里记录步骤参数：

run = research.ResearchRun("jp_us_factor_study")
run.add_step("rank_normalize", category="preprocess", params={"column": "alpha"})

投研语义 vs 实盘语义¶

语义	适合场景	计算位置	回测执行
投研语义	离线因子检验、指数增强、模型评估	策略外提前算好 features / scores / weights	策略读取 `research_result.weights` 下单
实盘语义	paper/live、接近真实交易的逐 bar 推理	策略内用 `history_panel()` 取当前可见窗口	策略当场生成目标权重并下单

投研语义更快、更适合复盘；实盘语义更接近真实交易。

避免训练期进入评估¶

test_bars = research.split_bars(bars, split="2023-09-01")
stats = tl.Backtest(test_bars, Strategy).run(research_result=research_result)