What System Optimization and Least Squares Regression Have in Common

How is optimizing a trading system analogous to fitting a curve to a set of data points? With the linear equation y = m*x + b, we plug in a value of x and get out a value of y. The value of y depends not only on x but also on the form of the equation (linear in this case) and on the equation’s parameters, m and b. By analogy, with a trading system, we plug in a series of prices (we’ll neglect volume and open interest for the sake of simplicity) and get out a profit or loss. The profit or loss depends not only on the prices but also on the system’s rules and parameter values. In this analogy, the system’s rules correspond to the form of the equation fit to the points, and the system’s parameters correspond to the equation’s parameters, m and b. The prices we feed into the system correspond to x, and the resulting profit or loss corresponds to y. In other words, our system’s trades are the data points (x, y).

The key to understanding how tightly an equation is fit to a set of data points, or how tightly a trading system is fit to the price data, is the number of “degrees of freedom” (dof). The number of dof is equal to the number of data points minus the number of restrictions or constraints. In linear least squares regression, the number of restrictions is equal to the number of adjustable parameters. For example, the linear equation above has two adjustable parameters, m and b. If we have two or fewer data points, we will have no dof.

This would be the “tightest” fit possible, analogous to overfitting a trading system. For example, if we have exactly two points, (x1, y1) and (x2, y2), we can determine m and b to fit the data points exactly. If we have more than two data points, we can determine m and b using the least squares method, which minimizes the deviation between the line and the data points. If we imagine the data points as being drawn from a probability distribution of such points, then the more data points we include in our curve fit, the better the fitted equation will represent that distribution. In other words, the more data points we use, the more robust our fit will be. In terms of dof, we want as many as possible.
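To make the two cases concrete, here is a minimal sketch of a least squares line fit in plain Python. The function names and the sample points are hypothetical and chosen only for illustration. With exactly two points the fit is exact and there are no dof; with five points the line no longer passes through every point and three dof remain.

```python
def fit_line(points):
    """Least squares fit of y = m*x + b. Returns (m, b, dof)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to determine m and b")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    b = mean_y - m * mean_x
    dof = n - 2  # data points minus the two adjustable parameters, m and b
    return m, b, dof


def sum_squared_residuals(points, m, b):
    """Total squared deviation between the line and the data points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical data: two points give an exact fit with zero dof.
two_points = [(1.0, 2.0), (3.0, 8.0)]
m2, b2, dof2 = fit_line(two_points)
print(m2, b2, dof2, sum_squared_residuals(two_points, m2, b2))

# Hypothetical data: five points leave three dof and a nonzero residual.
five_points = [(1.0, 2.0), (2.0, 4.5), (3.0, 8.0), (4.0, 9.5), (5.0, 14.0)]
m5, b5, dof5 = fit_line(five_points)
print(m5, b5, dof5, sum_squared_residuals(five_points, m5, b5))
```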

We’ve already noted that the data points in system optimization are the system’s trades. To make sure our system is not overfit to the market, then, we need to have a sufficient number of trades. By “sufficient” we mean more trades than the number of restrictions, conditions, and rules of our system.

To count the number of restrictions, Thomas Hoffman (1) suggests scanning a trading system’s rules and counting any condition that would change the resulting trades. For example, suppose you have a trading system that buys when today’s close is less than yesterday’s close in an up trend. It defines an up trend as a shorter moving average being greater than a longer moving average. For simplicity, let’s assume the sell side is the reverse, and there are no stops. It’s a simple stop-and-reverse system.

We would probably count the moving average crossover condition as three restrictions: one for the condition itself, and one for each moving average period. The price pattern would be another restriction, for a total of four restrictions on the long side. We would then count four more for the short side. This would give us eight restrictions in total. If we wanted to avoid overfitting this simple system to the market, we should have more than eight trades. With eight or fewer trades, there are no degrees of freedom, and any optimization is likely to result in an overfit system. The next question is: how many more trades than eight would be enough to avoid overfitting?
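The example system and its restriction count can be sketched as follows. This is only one reading of the rules described above: the function names, the use of simple moving averages, and the sample closes are assumptions made for illustration, and how to count restrictions remains a judgment call.

```python
def moving_average(closes, period):
    """Simple average of the most recent `period` closes."""
    return sum(closes[-period:]) / period


def signal(closes, short_period, long_period):
    """Return +1 (be long), -1 (be short), or 0 (no change) for the latest bar."""
    if len(closes) < max(long_period, 2):
        return 0  # not enough data to evaluate the rules
    short_ma = moving_average(closes, short_period)
    long_ma = moving_average(closes, long_period)
    today, yesterday = closes[-1], closes[-2]
    if short_ma > long_ma and today < yesterday:
        return 1   # down close in an up trend: buy
    if short_ma < long_ma and today > yesterday:
        return -1  # up close in a down trend: sell (the reverse of the buy rule)
    return 0


# One restriction per item, counted separately for the long and short sides.
RESTRICTIONS_PER_SIDE = {
    "moving average crossover condition": 1,
    "short moving average period": 1,
    "long moving average period": 1,
    "price pattern (close versus previous close)": 1,
}
TOTAL_RESTRICTIONS = 2 * sum(RESTRICTIONS_PER_SIDE.values())


def degrees_of_freedom(num_trades, num_restrictions):
    """Trades minus restrictions, with no dof left at or below the restriction count."""
    return max(num_trades - num_restrictions, 0)


print(TOTAL_RESTRICTIONS)
print(degrees_of_freedom(18, TOTAL_RESTRICTIONS))
print(signal([10, 11, 12, 13, 12.5], short_period=2, long_period=4))
```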

It turns out we can address this question using the same equation I presented last month; namely, the equation for the confidence interval for the average trade:

CI = t * SD/sqrt(N)

where t is the Student’s t statistic, SD is the standard deviation of the trades, N is the number of trades, and sqrt represents “square root.” The average trade, T, is likely to lie between T - CI and T + CI. For the system to be profitable at the specified confidence level, the average trade must still be greater than zero at the lower limit, T - CI; that is, we want T > CI.

The part that I didn’t explain last month involves the number of degrees of freedom. In last month’s newsletter, I glossed over the choice of the t statistic, saying it was dependent on the number of trades and the confidence level. More precisely, t depends on the dof and the confidence level. As long as the number of dof is large enough, the analysis I presented last month will work fine (although I incorrectly listed the confidence level for t = 2 as 95%; it’s actually about 97.5% for a one-tailed test, such as we have here; see below).

So, to see if our trading system is overfit to the market, we calculate the number of dof, look up the t statistic for our chosen confidence level and dof, and calculate the confidence interval as shown above. If the average trade is greater than CI, then we have some confidence that the system has a sufficient number of dof to avoid overfitting. When looking up the t statistic or calculating it with a function, such as the TINV function in Excel, use the one-tailed values since we are only concerned with whether the average trade is greater than zero. (Excel’s TINV takes a two-tailed probability, so the one-tailed 95% value corresponds to TINV(0.10, dof).)

Here are some one-tailed t values to illustrate the idea:

         Confidence Level
dof      95%      99%
10       1.81     2.76
20       1.72     2.53
60       1.67     2.39
120      1.66     2.36
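For readers without a t table or a spreadsheet at hand, the one-tailed values can also be approximated numerically. The sketch below is not the method used in the article; it is a plain-Python approximation that integrates the Student’s t density with Simpson’s rule and inverts the result by bisection. The function names are hypothetical, and the sketch assumes a confidence level strictly between 0.5 and 1.

```python
import math


def t_cdf(x, dof, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus 0.5 for x < 0."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    # Log of the normalizing constant of the Student's t density.
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))

    def pdf(u):
        return math.exp(log_norm - (dof + 1) / 2 * math.log1p(u * u / dof))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_one_tailed(confidence, dof):
    """Approximate t such that P(T <= t) = confidence, found by bisection."""
    if not 0.5 < confidence < 1.0:
        raise ValueError("confidence must be between 0.5 and 1")
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for dof in (10, 20, 60, 120):
    print(dof, round(t_one_tailed(0.95, dof), 2), round(t_one_tailed(0.99, dof), 2))
```

A statistics library or spreadsheet function will be faster and more accurate; the point here is only that t depends on both the dof and the confidence level.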

As an example, consider the simple system described above, which has eight conditions. Let’s say the average trade is $250 with a standard deviation of $1000. If these numbers are based on a sample of 18 trades, then we have 18 - 8 = 10 dof. At 95% confidence, using the table above, the confidence interval is:

CI = 1.81 * 1000/sqrt(18)
= 427.

Since this is greater than the average trade of $250, we cannot say that the system will be profitable in this case, and any optimization (no matter how good it looks) is probably just overfitting the system to the trades. Even with 20 dof (i.e., 28 trades), you would find that the system does not pass this test at 95% confidence. However, if we have 68 trades and therefore 60 dof, we get:

CI = 1.67 * 1000/sqrt(68)
= 203.

Since this value is less than the average trade of $250, we can have some confidence that if we were to optimize the parameters of this system, we would not overfit the system to the 68 trades in question.
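The arithmetic in this example can be reproduced with a few lines of Python. The function and variable names are hypothetical; the t values are the standard one-tailed 95% values for 10, 20, and 60 dof, written to three decimals rather than the two shown in the table.

```python
import math


def confidence_interval(t, sd, n):
    """CI = t * SD / sqrt(N)."""
    return t * sd / math.sqrt(n)


def passes_test(avg_trade, t, sd, n):
    """True when the average trade exceeds the confidence interval (T > CI)."""
    return avg_trade > confidence_interval(t, sd, n)


AVG_TRADE = 250.0
SD = 1000.0
RESTRICTIONS = 8
T_95 = {10: 1.812, 20: 1.725, 60: 1.671}  # one-tailed, 95% confidence

results = {}
for n_trades in (18, 28, 68):
    dof = n_trades - RESTRICTIONS
    ci = confidence_interval(T_95[dof], SD, n_trades)
    results[n_trades] = (ci, passes_test(AVG_TRADE, T_95[dof], SD, n_trades))
    print(n_trades, dof, round(ci), results[n_trades][1])
```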

For a long-term trend-following system, 68 trades might span 10 years or more of daily data, depending on the system. Whether the actual minimum number of trades is 68 or 30 or 200 depends on the average trade, the standard deviation of the trades, and the number of rules and conditions of the system. Note that with this approach we’re concerned with the number of trades and not the number of bars of data.

As I demonstrated last month, we can rewrite the CI equation to tell us how large N needs to be in order to demonstrate profitability:

N > (t * SD/T)^2

where the ^2 indicates “square” and T is the average trade. This assumes we have a good estimate for the standard deviation and average trade. This differs from the equation I presented last month in that t is explicitly included, rather than approximated. Again, t will depend on the number of dof, which depends on the number of conditions in the system and the number of trades. This means this equation must be solved iteratively, rather than explicitly, because t depends on N. For example, you could start with a small value of N, calculate the number of dof, look up t, calculate the right-hand side of the equation, and see if it’s less than N. If not, you increment N and try again. The first value of N that satisfies the equation tells you how large N needs to be.
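The iteration just described might look like the sketch below. It makes two simplifying assumptions that are not in the article: t is taken from a small table of one-tailed 95% values, using the entry for the largest tabulated dof that does not exceed the actual dof (which slightly overstates t and so errs on the side of more trades), and the search starts at the smallest N the table covers rather than at an arbitrarily small N. The names are hypothetical.

```python
# One-tailed 95% t values as (dof, t) pairs, in increasing order of dof.
T_95_TABLE = [(10, 1.812), (20, 1.725), (60, 1.671), (120, 1.658)]


def lookup_t(dof, table=T_95_TABLE):
    """t for the largest tabulated dof not exceeding `dof` (a conservative choice)."""
    usable = [t for table_dof, t in table if table_dof <= dof]
    if not usable:
        raise ValueError("dof is below the smallest tabulated value")
    return usable[-1]


def min_trades(avg_trade, sd, restrictions, max_trades=100000):
    """Smallest N found that satisfies N > (t * SD / T)^2, with t depending on N."""
    if avg_trade <= 0:
        raise ValueError("the average trade must be positive")
    n = restrictions + T_95_TABLE[0][0]  # smallest N the table can handle
    while n <= max_trades:
        t = lookup_t(n - restrictions)
        if n > (t * sd / avg_trade) ** 2:
            return n
        n += 1
    raise ValueError("no solution found up to max_trades")


print(min_trades(avg_trade=250.0, sd=1000.0, restrictions=8))
```

Because the table lookup rounds t upward between tabulated dof values, using exact t values for every dof would give a somewhat smaller N.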

As I mentioned last month, the primary concern with this approach is that the accuracy of the confidence intervals depends on the distribution of trades remaining the same. In statistics, this is called “stationarity.” If the true average and standard deviation change over time, the confidence intervals will change. As all markets tend to change to some degree over time, this is a concern. However, even this problem can be mitigated to some extent by taking trades over a long period of time covering different market conditions. If this is done, the long-term average trade and its standard deviation are more likely to be stable in the future.
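One rough way to look for a change in the trade distribution, which is not part of the analysis above and is offered only as an illustration, is to compare the average trade and standard deviation of the first half of the trade history with those of the second half. The function name and the sample trades below are hypothetical.

```python
import statistics


def split_half_stats(trades):
    """Return [(mean, stdev) of first half, (mean, stdev) of second half]."""
    if len(trades) < 4:
        raise ValueError("need at least four trades (two per half)")
    middle = len(trades) // 2
    halves = (trades[:middle], trades[middle:])
    return [(statistics.mean(half), statistics.stdev(half)) for half in halves]


# Hypothetical trade results in chronological order.
trades = [100, -50, 200, -100, 300, -150, 400, -200]
first, second = split_half_stats(trades)
print(first)
print(second)
```

Large differences between the two halves would suggest that the average trade and standard deviation used in the confidence interval may not hold in the future.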

End

Reference
(1) Babcock, Bruce. The Business One Irwin Guide to Trading Systems. Richard D. Irwin, Inc. 1989, p. 89.

This article is based on the following page, translated and presented with the author’s permission: http://www.breakoutfutures.com/Newsletters/Newsletter0503.htm