
Volatility Forecasting for the Chinese Stock Market in a High-Frequency Data Environment: A Fusion Study Based on Machine Learning and the HAR Model
Accurate forecasts of stock market volatility matter both in theory and in practice: they help investors anticipate market trends, optimize asset allocation, and avoid risk, and they help regulators issue early risk warnings and keep markets orderly. Building on the HAR model for high-frequency data, this paper uses two machine learning methods, Lasso and random forest, for feature selection, and a neural network to capture nonlinear relationships among the variables, yielding several new realized volatility models. We then evaluate and compare how well each model forecasts the realized volatility of the Shanghai stock index. The empirical results show that: (1) introducing the jump component improves the out-of-sample forecasting accuracy of realized volatility; (2) HAR extensions that use Lasso or random forest for feature selection clearly outperform the traditional HAR model and GARCH-type models out of sample; (3) modeling the nonlinear characteristics of volatility with a neural network further improves out-of-sample accuracy; and (4) among all models considered, the Lasso-NN-J model has the best in-sample and out-of-sample forecasting performance, and its out-of-sample performance is robust to different rolling window widths, to high-frequency data from different individual stocks, and to a random-sampling simulation test.
Keywords: feature selection / HAR model / realized volatility / neural network
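As a rough illustration of the HAR structure the abstract builds on, the sketch below constructs the three regressors of a standard HAR specification (daily, weekly, and monthly averages of realized volatility) together with the one-step-ahead target. The 5-day and 22-day horizons are the conventional choice and are an assumption here, as are the function name `har_features` and the toy series; none of this is taken from the paper's own code or data.

```python
from statistics import fmean


def har_features(rv, weekly=5, monthly=22):
    """Build HAR regressors and one-step-ahead targets from a daily RV series.

    For each day t with at least `monthly` observations of history, the row is
    (RV_t, mean of the last `weekly` RVs, mean of the last `monthly` RVs)
    and the target is RV_{t+1}. The horizons 5 and 22 are assumed defaults.
    """
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = fmean(rv[t - weekly + 1 : t + 1])
        month = fmean(rv[t - monthly + 1 : t + 1])
        rows.append((daily, week, month))
        targets.append(rv[t + 1])
    return rows, targets


# Toy series with made-up numbers (not the paper's data).
rv = [0.5 + 0.01 * i for i in range(30)]
X, y = har_features(rv)
```

Each row of `X` could then be passed to a linear regression, a Lasso fit, or a neural network; extra columns such as jump terms would be appended to the same rows in the extended models.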
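Table 1 Descriptive statistics of the Shanghai Composite Index RV and its jump component
Series | Mean | Std. dev. | Min | Max | Skewness | Kurtosis
RV | 0.8325 | 0.4574 | 0.2317 | 4.7522 | 2.4918 | 10.9068
Jump | 0.0063 | 0.0246 | 0.0000 | 0.7225 | 16.3752 | 396.9947
Note: RV denotes the current-period realized volatility; Jump denotes the current-period jump, where 0 means that no jump occurred.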
Table 2 In-sample forecasting accuracy
Model | No. of features | MSE | MAE | EV | |
HAR | 3 | 0.0634 | 0.1503 | 0.5651 | 0.5651
HAR-J | 6 | 0.0620 | 0.1498 | 0.5786 | 0.5786
Lasso-HAR | 6 | 0.0619 | 0.1498 | 0.5601 | 0.5601
Lasso-HAR-J | 6 | 0.0619 | 0.1498 | 0.5601 | 0.5601
Lasso-NN | 6 | 0.0583 | 0.1467 | 0.4057 | 0.4062
Lasso-NN-J | 6 | 0.0582 | 0.1460 | 0.3918 | 0.3941
RF-HAR | 6 | 0.0638 | 0.1516 | 0.5609 | 0.5609
RF-HAR-J | 6 | 0.0639 | 0.1518 | 0.5603 | 0.5603
RF-NN | 6 | 0.0683 | 0.1561 | 0.4095 | 0.4099
RF-NN-J | 6 | 0.0684 | 0.1523 | 0.3923 | 0.3939
Note: Bold figures mark the smallest value of each evaluation metric across all forecasting models.
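For readers who want to reproduce the MSE and MAE columns on their own forecasts, a minimal sketch of the two loss functions follows. The function names and the sample numbers are illustrative assumptions; the definition of the EV column is not given in this excerpt, so it is deliberately left out.

```python
def mse(actual, predicted):
    """Mean squared error between realized and forecast volatility."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error between realized and forecast volatility."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for illustration only (not the paper's data).
actual = [0.8, 1.0, 0.6, 1.2]
predicted = [0.7, 1.1, 0.6, 1.0]
mse_value = mse(actual, predicted)
mae_value = mae(actual, predicted)
```

Both losses are "smaller is better", which matches the table notes that highlight the minimum value in each column.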
Table 3 Out-of-sample forecasting accuracy
Model | MSE | MAE | EV | |
HAR | 0.1031 | 0.1781 | 0.5928 | 0.5929
HAR-J | 0.1032 | 0.1802 | 0.5988 | 0.5993
Lasso-HAR | 0.1012 | 0.1766 | 0.6018 | 0.6019
Lasso-HAR-J | 0.1012 | 0.1766 | 0.6018 | 0.6019
Lasso-NN | 0.0950 | 0.1698 | 0.4444 | 0.4448
Lasso-NN-J | 0.0935 | 0.1616 | 0.4398 | 0.4403
RF-HAR | 0.1016 | 0.1774 | 0.5927 | 0.5923
RF-HAR-J | 0.1022 | 0.1777 | 0.5905 | 0.5907
RF-NN | 0.0934 | 0.1650 | 0.4575 | 0.4575
RF-NN-J | 0.0936 | 0.1609 | 0.4404 | 0.4420
HARQ-RV-SJ | 0.0956 | 0.1724 | 0.4572 | 0.4896
HARQF-RV-CJ | 0.0960 | 0.1736 | 0.4605 | 0.4899
TVS-HAR | 0.0938 | 0.1619 | 0.4401 | 0.4439
GARCH | 0.1048 | 0.1885 | 0.6132 | 0.6005
TARCH | 0.1033 | 0.1814 | 0.6105 | 0.6004
GJR | 0.1032 | 0.1811 | 0.6106 | 0.6000
EGARCH | 0.1033 | 0.1817 | 0.6108 | 0.6007
Note: Bold figures mark the smallest value of each evaluation metric across all forecasting models.
Table 4 MCS test results
Model | MSE | MAE
HAR | 0.003 | 0.001
HAR-J | 0.002 | 0.000
Lasso-HAR | 0.003 | 0.001
Lasso-HAR-J | 0.000 | 0.000
Lasso-NN | 0.002 | 0.001
Lasso-NN-J | 1.000** | 1.000**
RF-HAR | 0.002 | 0.000
RF-HAR-J | 0.002 | 0.000
RF-NN | 0.265** | 0.240*
RF-NN-J | 0.860** | 0.851**
HARQ-RV-SJ | 0.228* | 0.219*
HARQF-RV-CJ | 0.228* | 0.211*
TVS-HAR | 0.711** | 0.718**
GARCH | 0.001 | 0.000
TARCH | 0.002 | 0.000
GJR | 0.000 | 0.000
EGARCH | 0.000 | 0.000
Note: The values in the table denote the MCS test …
Table 5 MCS test results (rolling forecast windows of 500 and 1500)
Model | MSE (window 500) | MAE (window 500) | MSE (window 1500) | MAE (window 1500)
HAR | 0.000 | 0.001 | 0.000 | 0.006
HAR-J | 0.001 | 0.000 | 0.004 | 0.000
Lasso-HAR | 0.005 | 0.001 | 0.000 | 0.000
Lasso-HAR-J | 0.000 | 0.000 | 0.000 | 0.000
Lasso-NN | 0.000 | 0.002 | 0.002 | 0.002
Lasso-NN-J | 0.897** | 0.901** | 1.000** | 1.000**
RF-HAR | 0.000 | 0.000 | 0.004 | 0.000
RF-HAR-J | 0.000 | 0.000 | 0.003 | 0.001
RF-NN | 0.314** | 0.318** | 0.289** | 0.280**
RF-NN-J | 0.749** | 0.718** | 0.904** | 0.896**
HARQ-RV-SJ | 0.305** | 0.302** | 0.253** | 0.220*
HARQF-RV-CJ | 0.305** | 0.301** | 0.252** | 0.221*
TVS-HAR | 0.647** | 0.652** | 0.711** | 0.718**
GARCH | 0.000 | 0.000 | 0.003 | 0.000
TARCH | 0.001 | 0.000 | 0.000 | 0.001
GJR | 0.000 | 0.000 | 0.000 | 0.000
EGARCH | 0.001 | 0.000 | 0.000 | 0.001
Note: The values in the table denote the MCS test …
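The rolling forecast windows of 500 and 1500 can be pictured with the sketch below, which assumes a fixed-width window that slides forward one observation at a time and produces a one-step-ahead forecast at each step. That scheme, the helper names, and the window-mean placeholder forecaster are all assumptions made for illustration; the excerpt does not spell out the paper's exact procedure.

```python
def rolling_windows(n_obs, width):
    """Yield (start, end, target) indices; the window is series[start:end]."""
    for end in range(width, n_obs):
        yield end - width, end, end


def rolling_forecast(series, width, fit_predict):
    """Re-fit on each fixed-width window and forecast the next observation."""
    forecasts = []
    for start, end, target in rolling_windows(len(series), width):
        forecasts.append((target, fit_predict(series[start:end])))
    return forecasts


def window_mean(window):
    """Placeholder forecaster standing in for any model in the tables."""
    return sum(window) / len(window)


# Toy series with made-up numbers; a real run would use width=500 or 1500.
series = [float(i) for i in range(10)]
out = rolling_forecast(series, width=4, fit_predict=window_mean)
```

Swapping `window_mean` for a function that fits a model on the window and returns its next-step prediction gives the out-of-sample forecasts that loss functions and the MCS test would then compare.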
Table 6 MCS test results (individual stock samples)
Model | PFYH | SHJC | MSYH | CPCC | ZXZQ | ZSYH | BLDC | ZGLT | SQJT | FXYY | Mean rank
HAR | 0.000 | 0.000 | 0.000 | 0.006 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.006 | 12.6
HAR-J | 0.002 | 0.000 | 0.002 | 0.000 | 0.001 | 0.000 | 0.007 | 0.000 | 0.002 | 0.000 | 12.3
Lasso-HAR | 0.008 | 0.003 | 0.000 | 0.000 | 0.005 | 0.001 | 0.003 | 0.001 | 0.000 | 0.000 | 10.4
Lasso-HAR-J | 0.001 | 0.004 | 0.002 | 0.000 | 0.002 | 0.010 | 0.008 | 0.000 | 0.004 | 0.000 | 7.8
Lasso-NN | 0.000 | 0.002 | 0.002 | 0.002 | 0.000 | 0.004 | 0.002 | 0.001 | 0.002 | 0.001 | 8.5
Lasso-NN-J | 0.882** | 0.912** | 1.000** | 0.940** | 0.992** | 0.835** | 1.000** | 0.794** | 1.000** | 0.922** | 1.1
RF-HAR | 0.001 | 0.002 | 0.004 | 0.002 | 0.000 | 0.000 | 0.002 | 0.000 | 0.009 | 0.004 | 10.6
RF-HAR-J | 0.002 | 0.001 | 0.003 | 0.008 | 0.003 | 0.002 | 0.008 | 0.002 | 0.001 | 0.000 | 9.4
RF-NN | 0.414** | 0.354** | 0.217* | 0.272** | 0.300** | 0.353** | 0.286** | 0.251** | 0.301** | 0.296** | 4.6
RF-NN-J | 0.701** | 0.676** | 0.822** | 0.830** | 0.765** | 0.704** | 0.765** | 0.702** | 0.883** | 0.853** | 2.6
HARQ-RV-SJ | 0.312** | 0.384** | 0.238* | 0.231* | 0.274** | 0.314** | 0.324** | 0.316** | 0.230** | 0.249* | 5.2
HARQF-RV-CJ | 0.321** | 0.275** | 0.260** | 0.272** | 0.354** | 0.311** | 0.276** | 0.240* | 0.241** | 0.206* | 5.2
TVS-HAR | 0.710** | 0.677** | 0.711** | 0.725** | 0.848** | 0.791** | 0.812** | 0.824** | 0.760** | 0.698** | 2.4
GARCH | 0.000 | 0.001 | 0.003 | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 16.8
TARCH | 0.001 | 0.002 | 0.000 | 0.001 | 0.002 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 | 16.2
GJR | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 14.7
EGARCH | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.003 | 14.3
Note: The values in the table denote the MCS test …
Table 7 Results of the random-sampling simulation test
Model | MSE | MAE | EV | |
HAR | 103 | 154 | 208 | 246
 | (0.000) | (0.000) | (0.001) | (0.000)
HAR-J | 335 | 392 | 371 | 287
 | (0.001) | (0.001) | (0.001) | (0.001)
Lasso-HAR | 796 | 874 | 857 | 904
 | (0.001) | (0.001) | (0.001) | (0.002)
Lasso-HAR-J | 1359 | 1281 | 1302 | 1115
 | (0.003) | (0.008) | (0.005) | (0.005)
Lasso-NN | 1024 | 1105 | 986 | 929
 | (0.001) | (0.002) | (0.002) | (0.002)
Lasso-NN-J | 4592 | 4124 | 4035 | 4033
 | (0.941) | (0.934) | (0.901) | (0.900)
RF-HAR | 788 | 892 | 905 | 893
 | (0.001) | (0.001) | (0.001) | (0.002)
RF-HAR-J | 957 | 1032 | 1237 | 1189
 | (0.003) | (0.002) | (0.003) | (0.001)
RF-NN | 2411 | 2856 | 2903 | 2844
 | (0.485) | (0.405) | (0.358) | (0.372)
RF-NN-J | 3281 | 3397 | 2679 | 3048
 | (0.794) | (0.816) | (0.727) | (0.715)
HARQ-RV-SJ | 2015 | 2189 | 2614 | 2149
 | (0.385) | (0.326) | (0.318) | (0.283)
HARQF-RV-CJ | 2024 | 2207 | 2960 | 2367
 | (0.396) | (0.310) | (0.289) | (0.307)
TVS-HAR | 3495 | 3102 | 3369 | 3206
 | (0.873) | (0.815) | (0.796) | (0.810)
GARCH | 0 | 0 | 0 | 0
 | (0.000) | (0.000) | (0.000) | (0.000)
TARCH | 0 | 0 | 0 | 0
 | (0.000) | (0.000) | (0.000) | (0.000)
GJR | 10 | 0 | 0 | 21
 | (0.000) | (0.000) | (0.000) | (0.000)
EGARCH | 0 | 0 | 0 | 0
 | (0.000) | (0.000) | (0.000) | (0.000)