
Is Alternative Data Useful in China Share Market Investment?—Empirical Study Based on Financial Short Video, Image and Text Data
Yong HE, Qiqi LI, Li JIAO, Wenxuan HUANG
China Journal of Econometrics, 2023, Vol. 3, Issue 4: 1008-1031.
The application of alternative data offers scholars and practitioners a new perspective on financial investment. This paper builds an analysis platform based on the FarmPredict (factor-augmented regularized model for prediction) framework and a deep neural network model. The platform learns trading signals from alternative data, such as financial short videos and financial news, and uses them to construct trading strategies for the China share market. First, the captured financial news items are matched with their corresponding stock codes and decomposed into text data and image data. Second, the text data are fed into the FarmPredict learning framework: we construct and screen a bag of words, decompose the phrases into common factors and specific factors, and then compute the score of each news text by factor regression. The image data are fed into an image recognition deep neural network, the Google Inception v3 model, adapted by transfer learning, which outputs the probability that an image conveys positive or negative emotion, together with the image sentiment index and the image score. Third, each captured financial short video is processed in two steps. The first step strips the audio track, converts it to text, and uses the trained FarmPredict framework to compute the text score of the short video. The second step extracts the key frames of the video and uses the trained image model to compute the video image score. The text score and the image score are then summed to give the short video score. Finally, the financial short video score and the text score and image score of the news reports are summed to obtain the stock investment signal, which serves as the basis for constructing the China share stock portfolio and formulating an appropriate investment strategy.
The results show that financial videos and financial news contain information related to stock prices that can effectively predict market changes and bring excess returns to investors. The empirical study confirms the importance of alternative data in the Chinese market. By analyzing alternative data comprehensively, this paper gives investors an effective method for extracting trading signals, which can help optimize investment strategies and achieve higher realized returns.
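The score aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that the scores are summed, so the function names (`video_score`, `stock_signal`, `top_k_portfolio`), the example scores, and the rule of holding the top-ranked stocks with equal weights are all assumptions made for the example.

```python
from typing import Dict, List


def video_score(audio_text_score: float, key_frame_score: float) -> float:
    """Short video score: audio-text score plus key-frame image score."""
    return audio_text_score + key_frame_score


def stock_signal(news_text: float, news_image: float, video: float) -> float:
    """Investment signal: sum of news text, news image and short video scores."""
    return news_text + news_image + video


def top_k_portfolio(signals: Dict[str, float], k: int) -> Dict[str, float]:
    """Hypothetical rule: equal-weight the k stocks with the highest signal."""
    ranked: List[str] = sorted(signals, key=lambda code: signals[code], reverse=True)[:k]
    return {code: 1.0 / len(ranked) for code in ranked} if ranked else {}


# Made-up scores for three stock codes (illustrative values only).
signals = {
    "000998": stock_signal(0.6, 0.1, video_score(0.2, 0.1)),
    "301036": stock_signal(0.3, 0.0, video_score(0.0, 0.0)),
    "600000": stock_signal(-0.4, -0.1, video_score(-0.2, 0.0)),
}
weights = top_k_portfolio(signals, k=2)
```

Keeping the three component scores as separate arguments makes it easy to rerun the same portfolio rule on text only, images only, or videos only, which is the comparison the paper reports across data types.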
China share market / financial news analysis / financial short video / quantitative investment
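Table 1 Overall ranking of authoritative financial websites
Website | Alexa rank | Baidu weight | PR value | Backlinks | Share of valid data |
东方财富 (Eastmoney) | 269 | 5 | 7 | 11,822 | 27% |
新浪财经 (Sina Finance) | 37 | 8 | 7 | 51,880 | 16% |
云财经 (Yuncaijing) | 110,618 | 5 | 0 | 1,630 | 32% |
同花顺 (Tonghuashun) | 2,055 | 4 | 0 | 78,854 | 26% |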
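Table 2 Example of the financial news text format before preprocessing
Publication date | News content | News title |
2022/1/3 20:29:02 | 【Hengrui Medicine wins approval for 2 innovative drugs, greeting the new year with a "good start" on the strength of its innovation】 At the turn of the year, Hengrui Medicine's R&D bore fruit: the National Medical Products Administration approved two of the company's innovative drugs for marketing on the same day. Hengrui Medicine now has 10 innovative drugs on the market. (Economic Observer Online) | Hengrui Medicine wins approval for 2 innovative drugs, greeting the new year with a "good start" on the strength of its innovation |
2022/1/3 20:23:27 | 【】 | Tencent Games: relevant products have been relisted in the Huawei Game Center |
2022/1/3 20:14:16 | 【OPEC lowers its forecast of the first-quarter global oil supply surplus as it weighs the next output increase】 One day before OPEC+ discusses whether to raise output again, OPEC lowered its forecast of the oversupply in the global oil market this quarter. OPEC+ delegates said they expect Tuesday's meeting to go ahead with a modest output increase, and the latest forecast may encourage them to make that decision. | OPEC lowers its forecast of the first-quarter global oil supply surplus as it weighs the next output increase |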
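Table 3 Example of the financial news text format after preprocessing
Publication date | News content and title | Return |
2022/10/3 | Longping High-Tech: new rice, corn and cotton varieties pass national approval 【Longping High-Tech: new rice, corn and cotton varieties pass national approval】 Longping High-Tech (000998) announced on the evening of January 3 that, according to Announcement No. 500 of the Ministry of Agriculture and Rural Affairs of the People's Republic of China, the eighth meeting of the Fourth National Crop Variety Approval Committee approved 677 rice varieties, 919 corn varieties, 39 cotton varieties and 86 soybean varieties, including 88 new rice varieties, 50 new corn varieties and 1 new cotton variety bred by the company and its subsidiaries independently or jointly with other parties. (Securities Times) | 3.1814 |
2022/10/3 | Sichuan Jiuzhou: subsidiary plans to transfer its 5% stake in Jieneng Technology by public listing 【Sichuan Jiuzhou: subsidiary plans to transfer its 5% stake in Jieneng Technology by public listing】 Sichuan Jiuzhou announced that its controlled subsidiary Jiuzhou Technology plans to transfer its 5% stake in Jieneng Technology by public listing on a property rights exchange. The transaction is based on the appraisal report, with a listing floor price of 171,980 yuan; once the listed transaction is completed, the company will no longer hold any equity in Jieneng Technology. (Cailian Press) | |
2022/10/3 | Shuangle: selected for Jiangsu Province's list of specialized and innovative "Little Giant" enterprises 【Shuangle: selected for Jiangsu Province's list of specialized and innovative "Little Giant" enterprises】 Shuangle (301036) announced on the evening of January 3 that the company has been selected for Jiangsu Province's list of specialized and innovative "Little Giant" enterprises. (Securities Times) | 3.1403 |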
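Comparing the raw and preprocessed news formats above suggests that preprocessing merges the title with the body, reduces the timestamp to a date, and attaches a return. The sketch below illustrates such a step under stated assumptions: the field names (`published`, `title`, `content`), the `preprocess` helper, and passing the return in as an optional argument are hypothetical, and the paper's actual pipeline (including the stock-code matching) is not shown in this section.

```python
from datetime import datetime
from typing import Dict, Optional


def preprocess(raw: Dict[str, str], ret: Optional[float] = None) -> Dict[str, object]:
    """Turn one raw news record into a (date, text, return) record."""
    # Parse the full timestamp, then keep only the calendar date.
    stamp = datetime.strptime(raw["published"], "%Y/%m/%d %H:%M:%S")
    date = f"{stamp.year}/{stamp.month}/{stamp.day}"
    # Concatenate the title and the body into a single text field.
    text = raw["title"].strip() + raw["content"].strip()
    # The return is left as None when no value is available for the item.
    return {"date": date, "text": text, "return": ret}


# Illustrative record with placeholder text (not taken from the data set).
raw_item = {
    "published": "2022/1/3 20:29:02",
    "title": "Title A",
    "content": "【Title A】Body text.",
}
record = preprocess(raw_item, ret=3.18)
```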
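Table 4 Characteristics of mainstream financial websites
Website | Section | Includes images | Time range | Precision of publication time |
新浪财经 (Sina Finance) | Economic news | Yes | 2004.01 to present | year-month-day-hour-minute-second |
腾讯财经 (Tencent Finance) | Financial markets | Yes | 2022.09 to present | year-month-day-hour-minute-second |
网易财经 (NetEase Finance) | Individual stock news | Yes | 2022.07 to present | year-month-day-hour-minute-second |
财联社 (Cailian Press) | Financial news | Yes | Past two weeks | year-month-day-hour-minute-second |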
Table 5 High-quality finance bloggers and their follower counts
Data source | 直男财经 | 韩秀云讲经济 | 珍大户 | 叶檀财经 | 暴躁财经 |
Kuaishou followers (10,000s) | 205.6 | 456.8 | 165.5 | 207.5 | 263.4 |
Douyin followers (10,000s) | 1552.2 | 1148.5 | 783.2 | 552.0 | 303.0 |
Total followers (10,000s) | 1757.8 | 1605.3 | 948.7 | 759.5 | 566.4 |
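Table 6 Performance metrics of the investment strategies
Data type | Sharpe ratio | Annualized return | Average daily return | Maximum drawdown |
Video + image + text | 1.41 | 23.21% | 9 bps | 7.31% |
Text | 1.22 | 18.69% | 7 bps | 7.76% |
Image | -19.28 | 0.13% | < 1 bps | - |
Video | -3.35 | 0.48% | < 1 bps | 0.82% |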
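The four metrics reported in the table above have standard textbook definitions, sketched below. This section does not state the paper's exact conventions, so the 252-day trading year, the optional daily risk-free rate (default 0), the use of simple daily returns, and the function name `performance` are assumptions, and the input series is made up.

```python
import math
from statistics import mean, stdev
from typing import Dict, List

TRADING_DAYS = 252  # assumed number of trading days per year


def performance(daily_returns: List[float],
                risk_free_daily: float = 0.0) -> Dict[str, float]:
    """Summarize a series of daily simple returns (0.001 means 0.1%)."""
    excess = [r - risk_free_daily for r in daily_returns]
    wealth, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r                       # compound the daily return
        peak = max(peak, wealth)                # running high-water mark
        max_dd = max(max_dd, 1.0 - wealth / peak)
    years = len(daily_returns) / TRADING_DAYS
    return {
        "sharpe": math.sqrt(TRADING_DAYS) * mean(excess) / stdev(excess),
        "annualized_return": wealth ** (1.0 / years) - 1.0,
        "daily_bps": mean(daily_returns) * 1e4,  # 1 bp = 0.01%
        "max_drawdown": max_dd,
    }


# Four made-up daily returns, for illustration only.
stats = performance([0.01, -0.02, 0.015, 0.005])
```

The Sharpe ratio is sensitive to the benchmark rate and the annualization factor, so values computed this way are comparable with the table only if the same conventions were used.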
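Table 7 FarmPredict framework test
Metric | Sharpe ratio | Annualized return | Average daily return | Maximum drawdown |
| | 1.326 | 7.560% | 3 bps | 1.119% |
| | 1.220 | 18.690% | 7 bps | 7.760% |
| | 0.840 | 11.811% | 5 bps | 7.538% |
Note: The data source for the FarmPredict framework test is plain-text data.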
Table 8 Google Inception v3 Model framework test
Predicted \ Actual | Positive sentiment | Negative sentiment |
Predicted positive sentiment | 48 | 8 |
Predicted negative sentiment | 2 | 42 |
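The confusion matrix above can be summarized with the usual classification metrics. The sketch below treats "positive sentiment" as the positive class and assumes that rows are predictions and columns are true labels; the helper name `classification_metrics` is hypothetical. Accuracy does not depend on that orientation: (48 + 42) / 100 = 0.90.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all images classified correctly
        "precision": tp / (tp + fp),     # predicted positives that are truly positive
        "recall": tp / (tp + fn),        # true positives that were detected
    }


# Counts read from the table under the orientation assumed above.
metrics = classification_metrics(tp=48, fp=8, fn=2, tn=42)
```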
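Table 9 Controlling for the impact of transaction costs
| Sharpe ratio | Annualized return | Average daily return | Maximum drawdown |
Before deducting transaction costs | 2.532 | 40.252% | 16 bps | 6.235% |
After deducting transaction costs | 1.410 | 23.210% | 9 bps | 7.310% |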