Task 3:VaR 回测与基于 EWMA 波动率预测的 VaR 回测

VaR 回测

  • 在大多数情况下,我们假设日收益率服从正态分布,并且按照题目要求,VaR 取 99% 置信水平。

导入库

1
await __import__("piplite").install('numpy', 'scipy', 'matplotlib', 'pandas', 'tabulate')
1
2
# hyy:fix the import failed issue
await __import__("piplite").install('tabulate')
1
2
# hyy:fix excel issue, miss the excel read library, install here.
await __import__("piplite").install('openpyxl')
1
2
3
4
5
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from tabulate import tabulate
1
2
# set precision
pd.set_option('display.precision', 4)

加载数据

1
2
3
4
5
6
7
8
# hyy: in order to save time, load from excel via pandas is easier.
df = pd.read_excel('./res/Indices_Download_2026.xlsx')
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df_filtered = df.loc['2025-01-01':'2026-01-15'].copy()
sp500 = df_filtered['^GSPC'].dropna()
# take a look at the data is right or not.
sp500.head()
1
2
3
4
5
6
7
Date
2025-01-02    5868.5498
2025-01-03    5942.4702
2025-01-06    5975.3799
2025-01-07    5909.0298
2025-01-08    5918.2500
Name: ^GSPC, dtype: float64

数据处理

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# hyy: caculate the daily return first
d_rtns = np.log(sp500 / sp500.shift(1)).dropna()

# rolling standard deviation
r_std = d_rtns.rolling(window=21).std()
# the 10D VaR

# hyy: just take a test for different cofidence_level and check the result.
# set CONFIDENCE_LEVEL 0.99, FACTOR = 2.33
# but we also test 0.95 and 0.9, FACTOR should be 1.645 and 1.28 in below.
# i take a test, when change to 0.95, bleach number 15, zone is GREEN, when change to 0.9, bleach number 16, zone is GREEN too.
# lower confidence level -> higher quantile -> more breaches
# this seems like if the confidence_level is 0.9,which is smaller than 0.99,it should have more breach point, make sense!
var_10d = -2.33 * r_std * np.sqrt(10)

# forward 10D returns
f_10_ret = np.log(sp500.shift(-10) / sp500)

# Create analysis dataframe
a_df = pd.DataFrame({
    'Close': sp500,
    'Rolling_Std': r_std,
    'VaR_10D': var_10d,
    'Forward_10D': f_10_ret
})

# Remove NaN values
a_df = a_df.dropna()

# check is breaches or not
a_df['Breach'] = (a_df['Forward_10D'] < a_df['VaR_10D']).astype(int)

a_df.head()
1
2
3
4
5
6
7
                Close  Rolling_Std  VaR_10D  Forward_10D  Breach
Date                                                            
2025-02-04  6037.8799       0.0089  -0.0657       0.0174       0
2025-02-05  6061.4800       0.0086  -0.0631       0.0092       0
2025-02-06  6083.5698       0.0085  -0.0628      -0.0116       0
2025-02-07  6025.9902       0.0084  -0.0620      -0.0071       0
2025-02-10  6066.4399       0.0085  -0.0627      -0.0185       0

(a) VaR 突破统计

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Calculate breach statistics
breach_count = a_df['Breach'].sum()
total_comparisons = len(a_df)
breach_percentage = breach_count / total_comparisons * 100
# hyy: just take a test for different cofidence_level and check the result.
# try 0.9 and 0.95 for a quick check
# expected_breaches = total_comparisons * (1 - 0.9)
# expected_breaches = total_comparisons * (1 - 0.95)
expected_breaches = total_comparisons * (1 - 0.99)
print("Total Comparisons:", total_comparisons, "Number of Breaches:", breach_count, "Breach Percentage:", breach_percentage, "%, Expected Breaches:", expected_breaches)
1
Total Comparisons: 229 Number of Breaches: 10 Breach Percentage: 4.366812227074235 %, Expected Breaches: 2.290000000000002

(b) 可视化

图中展示 VaR 突破点(用叉号标记)。

Task 3 图

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Create the plot one by one
# fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)
# fig.suptitle('S&P500 Analysis (2025-01-01 to 2026-01-15)', fontsize=14, fontweight='bold')

# S&P Price
# ax = axes[0]
# ax.plot(a_df.index, a_df['Close'], 'b-', linewidth=1)
# ax.set_ylabel('Price')
# ax.set_title('S&P500 Price')
# ax.legend(loc='upper left')
# ax.grid(True, alpha=0.3)

# make them together seems better.
fig, (apx, ret_val, vol_val) = plt.subplots(3, 1, figsize=(12, 8))

# price
apx.plot(a_df.index, a_df['Close'])
apx.set_title('S&P500')

# return and VaR
breach_dates = a_df[a_df['Breach'] == 1].index
breach_values = a_df.loc[breach_dates, 'Forward_10D']
ret_val.plot(a_df.index, a_df['Forward_10D'], label='Forward 10D')
ret_val.plot(a_df.index, a_df['VaR_10D'], 'r--', label='VaR 99%')
ret_val.scatter(breach_dates, breach_values, c='red', marker='x', s=50)
ret_val.legend()

# volatility
vol_val.plot(a_df.index, a_df['Rolling_Std'] * 100, 'g')
vol_val.set_xlabel('Date')

plt.tight_layout()
plt.show()
1
<Figure size 1200x800 with 3 Axes>

[Notebook 原始输出包含图像,纯 Markdown 导出中省略。]

(c) 交通灯区间

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Calculate binomial percentiles
n_trials = total_comparisons

# hyy: just take a test for different cofidence_level and check the result.
# when i change confidence to 0.95 and 0.9, p_breach also need to change to 0.05 and 0.1 as well
# p_breach = 0.05 
# p_breach = 0.1
p_breach = 0.01

q_pG = stats.binom.ppf(0.95, n_trials, p_breach)
q_pY = stats.binom.ppf(0.9999, n_trials, p_breach)
print(q_pG)
print(q_pY)

# Determine zone
if breach_count <= q_pG:
    zone = "GREEN"
elif breach_count <= q_pY:
    zone = "YELLOW"
else:
    zone = "RED"

# Output results
print("Breaches:", breach_count, "Zone:", zone)
1
2
3
5.0
10.0
Breaches: 10 Zone: YELLOW
1
print(f"Period: 2025-02-04 to 2025-12-31, Total Comparisons: {total_comparisons}, Breach Count: {breach_count}, Breach Rate: {breach_percentage:}%, Traffic Light Zone: {zone}")
1
Period: 2025-02-04 to 2025-12-31, Total Comparisons: 229, Breach Count: 10, Breach Rate: 4.366812227074235%, Traffic Light Zone: YELLOW

Task 4:基于 EWMA 波动率预测的 VaR 回测

Task 4:基于 EWMA 波动率预测的 VaR 回测

  • 99% VaR、10 日期限、正态分布假设。

导入库

1
2
3
4
5
6
await __import__("piplite").install('numpy')
await __import__("piplite").install('scipy')
await __import__("piplite").install('matplotlib')
await __import__("piplite").install('pandas')
await __import__("piplite").install('tabulate')
await __import__("piplite").install('openpyxl')
1
2
3
4
5
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from tabulate import tabulate
1
pd.set_option('display.precision', 4)

加载数据

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# hyy: Using pandas to load Excel directly
df = pd.read_excel('./res/Indices_Download_2026.xlsx')
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# filter data
df_filtered = df.loc['2025-01-01':'2026-01-15'].copy()
sp500 = df_filtered['^GSPC'].dropna()
print(f"Data loaded: {len(sp500)} observations from {sp500.index[0]} to {sp500.index[-1]}")
# take a look at the data
sp500.head()
1
2
3
4
5
6
7
8
9
Data loaded: 260 observations from 2025-01-02 00:00:00 to 2026-01-15 00:00:00

Date
2025-01-02    5868.5498
2025-01-03    5942.4702
2025-01-06    5975.3799
2025-01-07    5909.0298
2025-01-08    5918.2500
Name: ^GSPC, dtype: float64

数据预处理

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# daily log returns
d_ret = np.log(sp500 / sp500.shift(1)).dropna()

# Calculate EWMA volatility
init_var = d_ret.var()

ewma_var = np.zeros(len(d_ret))

ewma_var[0] = init_var

for t in range(1, len(d_ret)):
    ewma_var[t] = 0.72 * ewma_var[t-1] + (1 - 0.72) * (d_ret.iloc[t] ** 2)

# Convert to volatility
ewma_vol = np.sqrt(ewma_var)

# Create series
ewma_ser = pd.Series(ewma_vol, index=d_ret.index, name='EWMA_Volatility')

# Calculate 10D VaR
var_10d = -2.33 * ewma_ser * np.sqrt(10)
var_10d.name = 'VaR_10D'

# forward 10D returns
for_rets = np.log(sp500.shift(-10) / sp500)
for_rets.name = 'Forward_10D'

a_df = pd.DataFrame({
    'Close': sp500,
    'EWMA_Volatility': ewma_ser,
    'VaR_10D': var_10d,
    'Forward_10D': for_rets
})

# NaN filter
a_df = a_df.dropna()

# Identify breaches
a_df['Breach'] = (a_df['Forward_10D'] < a_df['VaR_10D']).astype(int)

print(f"Analysis period: {a_df.index[0]} to {a_df.index[-1]}")
print(f"Total observations: {len(a_df)}")
# hyy: check some data to verify
# a_df.head()
1
2
Analysis period: 2025-01-03 00:00:00 to 2025-12-31 00:00:00
Total observations: 249

(a) VaR 突破统计

1
2
3
4
5
6
breach_count = int(a_df['Breach'].sum())
total_comparisons = len(a_df)
breach_percentage = breach_count / total_comparisons * 100
expected_breaches = total_comparisons * (1 - 0.99)

print(f"Total Comparisons: {total_comparisons}, Number of Breaches: {breach_count}, Breach Percentage: {breach_percentage}%, Expected Breaches: {expected_breaches}")
1
Total Comparisons: 249, Number of Breaches: 11, Breach Percentage: 4.417670682730924%, Expected Breaches: 2.490000000000002

交通灯区间

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
n_trials = total_comparisons
p_breach = 0.01

q_pG = stats.binom.ppf(0.95, n_trials, p_breach)
q_pY = stats.binom.ppf(0.9999, n_trials, p_breach)

if breach_count <= q_pG:
    zone = "GREEN"
elif breach_count <= q_pY:
    zone = "YELLOW"
else:
    zone = "RED"

print("Breaches:", breach_count)
print("Zone:", zone)
1
2
Breaches: 11
Zone: RED

(b) 可视化

图中展示 VaR 突破点(用叉号标记)。

Task 4 图

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
fig, axes = plt.subplots(3, 1, figsize=(12, 9), sharex=True)

# ---hyy: price ---
ax = axes[0]
ax.plot(a_df.index, a_df['Close'], label='price')
ax.set_title('S&P500')
ax.legend()
ax.grid(True)

# --- hyy: returns vs var ---
ax = axes[1]
ax.plot(a_df.index, a_df['Forward_10D'], label='fwd ret', alpha=0.7)
ax.plot(a_df.index, a_df['VaR_10D'], '--', label='VaR')
ax.axhline(0)

# breach points
bd = a_df[a_df['Breach'] == 1]
ax.scatter(bd.index, bd['Forward_10D'], marker='x', label='breach')

ax.set_title('returns vs var')
ax.legend()
ax.grid(True)

# --- ewma vol ---
ax = axes[2]
vol = a_df['EWMA_Volatility'] * 100
ax.plot(a_df.index, vol, label='ewma vol')

# hyy: mark breach
ax.scatter(a_df.index, a_df['Breach'] * 2, marker='x', label='breach')

ax.set_title('vol')
ax.legend()
ax.grid(True)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
1
<Figure size 1200x900 with 3 Axes>

[Notebook 原始输出包含图像,纯 Markdown 导出中省略。]

(c) 三种方法对比

结合课程讨论,可总结三种方法的优缺点:

  • 历史模拟法:不依赖分布假设,可体现厚尾;但滚动窗口长度非常关键。
  • 滚动标准差法:估计相对稳定,但对波动率结构变化响应较慢
  • EWMA:对新信息响应更快,但结果依赖衰减参数 λ

从上图可以看到:

  • 共同点:图 2 的突破事件在图 1 所示市场下跌阶段(约 2025 年 3-4 月)明显聚集,说明三种方法都能响应风险上升,但速度不同。
  • 滚动标准差法:Task 3 中实现的 Rolling SD 响应更慢。观察到当极端收益在 3 月下旬逐步移出 21 日窗口后,波动率估计会突然下降,可能产生“风险缓和”的错觉。
  • 历史模拟法 也存在类似滞后:需要极端收益进入并填满窗口后,风险估计才会充分调整,通常需要 2-3 周。
  • EWMA:我的波动率图(Figure 3)显示在 2025 年 4 月压力期,EWMA 波动率可从约 1% 快速升至 5% 以上。尽管仍有正态分布假设,它在压力期风险管理中的适应性最好。Rolling SD 处于中间位置:比 EWMA 慢,又没有历史模拟法“分布无关”的优势。

个人结论:

风险管理方法的选择应取决于具体场景:

  • 对于波动期的实时风险监控,EWMA 因响应快更合适。
  • 对于强调稳定性的监管报表场景,历史模拟法可能更合适。
  • Rolling SD 虽然概念简单,但在上述两类场景中优势都比较有限。