
Study on the Application of Group LASSO and Non-Parametric Estimation Approaches in Multi-Factor Models for Stock Selection: Evidence From the Chinese A-share Market
Selecting effective pricing factors is the key step in a multi-factor quantitative stock-selection strategy. Using data from the Chinese A-share market, this paper combines the Group LASSO algorithm with non-parametric spline estimation to select factors and to estimate, non-parametrically, their effect on future returns. A few of the selected factors (e.g., the moving average of trading volume) coincide with those chosen by traditional selection methods, but many (e.g., current ratio, price-to-earnings ratio, de-trended turnover, operating profit growth rate) are unique to this approach. We then use the selected factors to predict 1-month-ahead returns and construct a portfolio that goes long the 20 stocks with the highest predicted returns. Out of sample, portfolios based on Group LASSO and non-parametric estimation outperform those generated by traditional models, with higher abnormal returns, lower return volatility, and higher Sharpe ratios. Finally, compared with the factors that related studies report for the US stock market, the selected factors differ considerably between the two markets: the momentum (or reversal) and return-volatility factors selected in the US market are not related to future stock returns in the Chinese market, while the price-to-earnings ratio and current ratio selected in the Chinese market are not significant in the US market.
Keywords: LASSO / non-parametric estimation / multi-factor model / stock selection / China
Table 1 Candidate factors and their definitions |
Variable | Meaning | Description |
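Panel A: Growth factors | ||
NPGR | Net profit growth rate | NPGR = (net profit (TTM) this year / net profit (TTM) last year) - 1 |
EGRO | 5-year earnings growth rate | EGRO = slope coefficient from a linear regression of annual net profit on time (in years) over the past 5 years / absolute value of the mean net profit over the past 5 years |
NAGR | Net asset growth rate | NAGR = (shareholders' equity this year / shareholders' equity last year) - 1 |
OPGR | Operating profit growth rate | OPGR = (operating profit (TTM) this year / operating profit (TTM) last year) - 1 |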
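Panel B: Value factors | ||
ETOP | Earnings-to-market-capitalization ratio | ETOP = net profit (TTM) / total market capitalization |
PB | Price-to-book ratio | PB = total market capitalization / total equity attributable to owners of the parent company |
PE | Price-to-earnings ratio | PE = total market capitalization / net profit attributable to owners of the parent company (TTM) |
LFLO | Log float market capitalization | LFLO = logarithm of float market capitalization |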
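Panel C: Financial quality factors | ||
APTR | Accounts payable turnover | APTR = operating cost (TTM) / (accounts payable + notes payable + prepayments) |
CR | Current ratio | CR = total current assets / total current liabilities |
EBtoToR | EBIT to total operating revenue | EBtoToR = (total profit + interest expense - interest income) / total operating revenue. If interest expense is unavailable, financial expenses are used instead. All items use TTM values. |
EqToAs | Equity-to-assets ratio | EqToAs = shareholders' equity / total assets |
InvTRate | Inventory turnover | InvTRate = operating cost (TTM) / inventory |
MLEV | Market leverage | MLEV = total non-current liabilities / (total non-current liabilities + total market capitalization) |
NPR | Net profit margin | NPR = net profit (TTM) / operating revenue (TTM) |
OCToCL | Operating cash flow to current liabilities | OCToCL = net cash flow from operating activities (TTM) / total current liabilities |
GroInRa | Gross profit margin | GroInRa = (operating revenue (TTM) - operating cost (TTM)) / operating revenue (TTM) |
BLEV | Book leverage | BLEV = total non-current liabilities / shareholders' equity |
FixAsRa | Fixed asset ratio | FixAsRa = (fixed assets + construction materials + construction in progress) / total assets |
ROA | Return on assets | ROA = net profit (TTM) / total assets |
ROE | Return on equity | ROE = net profit (TTM) / shareholders' equity |
DilEPS | Diluted earnings per share | Earnings per share adjusted under the assumption that all outstanding dilutive potential ordinary shares have been converted into ordinary shares |
EPS | Basic earnings per share | EPS = net profit for the period attributable to ordinary shareholders / weighted average number of ordinary shares outstanding during the period |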
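Panel D: Sentiment factors | ||
DTO10 | Turnover change ratio (de-trended turnover) | Ratio of the 10-day average turnover rate to the 120-day average turnover rate |
TO10 | 10-day average turnover rate | TO10 = 10-day average daily trading volume / total floating shares × 100% |
VolMA10 | 10-day moving average of trading volume | 10-day average trading volume |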
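Panel E: Momentum factors | ||
REVS10 | 10-day stock return | Cumulative return of the stock over the previous 10 trading days |
High52Week | Price position | High52Week = (current price - lowest price within the year) / (highest price within the year - lowest price within the year) |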
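Panel F: Other factors | ||
HBETA | Historical beta | β coefficient of the CAPM estimated from the previous 12 months of data |
Skew | Price skewness | Skewness of the stock price over the past 20 trading days |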
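Several of the definitions in Table 1 are simple enough to state as code. The following is a minimal sketch of three of them (High52Week, DTO10 and EGRO), assuming plain Python lists as inputs. The function names, the input layout and the handling of a flat price window are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean

def high_52_week(prices):
    """Price position: (current - low) / (high - low) over the supplied window."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        return 0.5  # assumption: a flat window is mapped to the midpoint
    return (prices[-1] - lo) / (hi - lo)

def dto10(turnover):
    """Ratio of the 10-day average turnover rate to the 120-day average."""
    return mean(turnover[-10:]) / mean(turnover[-120:])

def egro(annual_net_profit):
    """OLS slope of annual net profit on time, scaled by |mean net profit|."""
    n = len(annual_net_profit)
    t_bar = (n - 1) / 2
    y_bar = mean(annual_net_profit)
    cov = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(annual_net_profit))
    var = sum((t - t_bar) ** 2 for t in range(n))
    return (cov / var) / abs(y_bar)

# Toy inputs (hypothetical numbers, for illustration only).
position = high_52_week([10.0, 12.0, 8.0, 11.0])   # (11 - 8) / (12 - 8) = 0.75
turnover_ratio = dto10([0.01] * 110 + [0.02] * 10)
growth = egro([1.0, 2.0, 3.0, 4.0, 5.0])           # slope 1 divided by mean 3
print(position, turnover_ratio, growth)
```

Note that EGRO divides by the absolute value of the mean, so a firm with shrinking losses still receives a positive growth score.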
Table 2 Descriptive statistics of the variables |
Variable | Mean | Std. dev. | Kurtosis | Skewness | With … |
APTR | 508.089 | 979.620 | 0.010 | ||
BLEV | 1.487 | 1.273 | 1153.054 | 27.117 | |
CR | 1.989 | 2.206 | 30.636 | 3.359 | 0.014 |
DTO10 | 0.011 | 21.567 | 1.885 | ||
DilEPS | 0.392 | 0.628 | 81.900 | 6.757 | |
EBtoToR | 0.164 | 0.167 | 7.620 | 1.512 | |
EGRO | 0.332 | 8.438 | 511.975 | 18.614 | 0.003 |
EPS | 0.666 | 1.004 | 65.737 | 6.493 | |
ETOP | 0.044 | 0.049 | 6.220 | ||
EqToAs | 0.500 | 0.202 | 0.259 | 0.028 | |
High52Week | 0.477 | 0.311 | 0.088 | ||
FixAsRa | 0.305 | 0.223 | 0.820 | ||
GroInRa | 0.282 | 0.240 | 376.412 | ||
HBETA | 0.976 | 0.261 | 0.028 | 0.013 | |
InvTRate | 12.738 | 32.539 | 74.868 | 7.404 | |
LFLO | 23.227 | 1.300 | 0.425 | 0.067 | |
MLEV | 1.227 | 0.488 | 32.786 | 4.940 | |
NAGR | 0.960 | 30.929 | 2459.325 | 49.546 | |
NPGR | 0.181 | 12.479 | 899.148 | 0.005 | |
NPR | 0.139 | 0.980 | 2997.973 | 53.640 | |
OCToCL | 0.303 | 0.514 | 21.297 | 3.290 | 0.008 |
OPGR | 0.610 | 13.428 | 1344.328 | 31.021 | |
PB | 4.445 | 4.421 | 237.198 | 9.740 | |
PE | 101.429 | 2275.018 | 1789.320 | 38.826 | 0.002 |
REVS10 | 0.994 | 0.081 | 5.921 | 1.039 | |
ROA | 0.077 | 0.078 | 4.922 | 1.280 | |
ROE | 0.150 | 0.148 | 10.439 | 0.811 | |
Skew | 0.598 | 0.802 | 0.142 | ||
VolMA10 | 34.471 | 4.604 | |||
TO10 | 0.016 | 0.018 | 20.289 | 3.382 | 0.024 |
0.007 | 0.130 | 4.168 | 0.216 |
Table 3 Factor selection results for "Group LASSO + quadratic spline" |
 | (1) | (2) | (3) |
 | 0.20 | 0.18 | 0.16 |
Number of selected factors | 2 | 5 | 10 |
Selected factors | CR | CR | CR |
 | VolMA10 | VolMA10 | VolMA10 |
 | | DTO10 | DTO10 |
 | | OPGR | OPGR |
 | | TO10 | TO10 |
 | | | EGRO |
 | | | ETOP |
 | | | OCToCL |
 | | | PE |
 | | | ROE |
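The "Group LASSO + quadratic spline" idea behind Table 3 can be sketched as follows: each factor is expanded into a quadratic spline basis, the basis columns of one factor form a group, and the group penalty keeps or drops a factor as a whole. This is a minimal illustration under stated assumptions, not the paper's estimator: the truncated-power basis, the three quantile knots, the within-group orthonormalization, the sqrt(group size) penalty weights, the proximal-gradient solver and the synthetic data are all choices made for this example.

```python
import numpy as np

def quadratic_spline_basis(x, n_knots=3):
    """Truncated-power quadratic spline basis: x, x^2, (x - k)_+^2 at quantile knots."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [x, x ** 2] + [np.clip(x - k, 0.0, None) ** 2 for k in knots]
    B = np.column_stack(cols)
    B = B - B.mean(axis=0)            # center, so no intercept is needed
    Q, _ = np.linalg.qr(B)            # orthonormalize within the group
    return Q * np.sqrt(len(x))        # scale so that Q'Q / n = I

def group_lasso(groups, y, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - sum_j B_j b_j||^2 + lam * sum_j sqrt(p_j) ||b_j||."""
    n = len(y)
    X = np.hstack(groups)
    sizes = [g.shape[1] for g in groups]
    bounds = np.cumsum([0] + sizes)
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    yc = y - y.mean()
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - yc) / n)
        for j, p in enumerate(sizes):
            s = slice(bounds[j], bounds[j + 1])
            norm = np.linalg.norm(z[s])
            # Block soft-thresholding: the whole group is shrunk, possibly to zero.
            shrink = max(0.0, 1.0 - step * lam * np.sqrt(p) / norm) if norm > 0 else 0.0
            beta[s] = shrink * z[s]
    return [beta[bounds[j]:bounds[j + 1]] for j in range(len(groups))]

# Synthetic example: factors 0 and 1 drive returns (one non-linearly), 2 and 3 do not.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(size=(n, 4))
y = 4.0 * (X[:, 0] - 0.5) ** 2 + 1.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
groups = [quadratic_spline_basis(X[:, j]) for j in range(4)]
coefs = group_lasso(groups, y, lam=0.06)
selected = [j for j, b in enumerate(coefs) if np.linalg.norm(b) > 0]
print(selected)
```

Lowering `lam` admits more groups, which mirrors how the number of selected factors grows from 2 to 10 across the columns of Table 3 as the value in the unlabeled row falls from 0.20 to 0.16.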
Table 4 Factor selection results for "LASSO + linear regression" |
 | (1) | (2) | (3) |
 | 0.090 | 0.070 | 0.0345 |
Number of selected factors | 2 | 5 | 10 |
EGRO | |||
ROA | |||
REVS10 | |||
ROE | |||
TO10 | |||
EPS | |||
FixAsRa | |||
NPGR | |||
Skew | |||
VolMA10 | 0.411 |
Table 5 Factor selection results for "linear stepwise regression" |
 | (1) | (2) | (3) |
 | 0.05 | 0.17 | 0.3 |
Number of selected factors | 2 | 5 | 10 |
EGRO | |||
REVS10 | |||
NPGR | |||
VolMA10 | 1.034 | 0.897 | |
TO10 | |||
BLEV | |||
REVS10 | |||
InvTRate | 0.562 | ||
MLEV | 1.162 | ||
ROA |
Table 6 Out-of-sample market performance of the three multi-factor stock-selection strategies |
Number of selected factors | Annualized return | Return volatility | Sharpe ratio | Maximum drawdown | Annualized turnover |
Panel A: "Group LASSO + quadratic spline" stock-selection strategy | |||||
2 | 24.80% | 25.70% | 0.83 | 41.60% | 15.88 |
5 | 27.70% | 24.20% | 1.00 | 41.50% | 14.17 |
10 | 30.60% | 26.90% | 1.00 | 44.40% | 11.25 |
Panel B: "LASSO + linear regression" stock-selection strategy | |||||
2 | 22.20% | 28.80% | 0.66 | 49.80% | 4.71 |
5 | 21.10% | 27.80% | 0.63 | 47.20% | 7.93 |
10 | 23.70% | 26.80% | 0.75 | 42.60% | 10.00 |
Panel C: "Linear stepwise regression" stock-selection strategy | |||||
2 | 22.80% | 28.70% | 0.67 | 47.80% | 15.13 |
5 | 22.80% | 27.90% | 0.69 | 48.20% | 10.92 |
10 | 24.30% | 28.20% | 0.74 | 46.50% | 8.95 |
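The performance columns of Table 6 can be computed from a series of monthly portfolio returns roughly as sketched below. The compounding convention, the sample variance, the sqrt(12) volatility scaling and the `risk_free_annual` parameter are assumptions of this example, since the conventions used in the paper are not stated here. As a rough check, the first row of Panel A satisfies (24.80% - r_f) / 25.70% = 0.83 only for r_f of about 3.5%, so the reported ratios appear to subtract a risk-free rate.

```python
import math

def performance(monthly_returns, risk_free_annual=0.0):
    """Annualized return, volatility, Sharpe ratio and maximum drawdown."""
    n = len(monthly_returns)
    wealth, peak, max_dd = 1.0, 1.0, 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r                       # compound the monthly return
        peak = max(peak, wealth)                # running high-water mark
        max_dd = max(max_dd, 1.0 - wealth / peak)
    ann_ret = wealth ** (12.0 / n) - 1.0        # geometric annualization
    mean_r = sum(monthly_returns) / n
    var = sum((r - mean_r) ** 2 for r in monthly_returns) / (n - 1)
    ann_vol = math.sqrt(12.0 * var)             # assumes uncorrelated months
    sharpe = (ann_ret - risk_free_annual) / ann_vol if ann_vol > 0 else float("nan")
    return {"ann_ret": ann_ret, "ann_vol": ann_vol, "sharpe": sharpe, "max_dd": max_dd}

# Hypothetical return series, for illustration only.
m = performance([0.02, 0.00] * 6, risk_free_annual=0.035)
dd = performance([0.10, -0.20, 0.05])["max_dd"]
print(m, dd)
```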
Table 7 Rolling test results for the "Group LASSO + quadratic spline" stock-selection strategy |
 | Average number of selected factors | Annualized return | Return volatility | Sharpe ratio | Maximum drawdown | Annualized turnover |
Panel A: Training window = 48 months | ||||||
0.15 | 11.2 | 39.20% | 26.00% | 1.37 | 40.40% | 12.98 |
0.17 | 6.7 | 33.10% | 26.70% | 1.11 | 40.20% | 12.83 |
Panel B: Training window = 36 months | ||||||
0.15 | 14.6 | 41.80% | 25.90% | 1.48 | 39.50% | 13.57 |
0.17 | 10.9 | 43.80% | 29.70% | 1.36 | 43.40% | 12.64 |
Panel C: Training window = 24 months | ||||||
0.15 | 19.0 | 41.00% | 27.70% | 1.35 | 40.70% | 14.08 |
0.17 | 15.7 | 40.00% | 27.70% | 1.32 | 42.00% | 14.34 |