# Detecting Short-selling Using Ensemble Learning

**Repository Path**: likelihoodlab/detecting-short-selling-using-ensemble-learning

## Basic Information

- **Project Name**: Detecting Short-selling Using Ensemble Learning
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2020-11-14
- **Last Updated**: 2023-09-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Detecting Short-selling Using Ensemble Learning

#### Software Architecture

- model code
  - CART-AdaBoost
  - LSTM-AdaBoost
- feature construction
  - textual feature
    - Scraping-SEC-filings
    - FReader
    - features
- data

#### Usage

- model code: the model implementations
  - CART-AdaBoost
  - LSTM-AdaBoost
- feature construction: the feature-construction code
  - textual feature: textual features (readability, information content, and textual sentiment)
    - Scraping-SEC-filings: scrapes annual reports from the SEC EDGAR FTP site
    - FReader: parses HTML annual reports, extracts all of their text, and computes readability
    - features
      - Textual sentiment:
        - `Generic_Parser.py`: builds textual sentiment and readability measures from the LM (2011) dictionary
        - `LoughranMcDonald_MasterDictionary_2018.csv`: the LM dictionary
      - Information content: `tf-idf_LSA_cos.py` computes information content using TF-IDF, cosine similarity, and LSA
      - Topic distribution: `LDA.py`
- data
  - `2009-2019 Chinese concept stocks list.csv`: Chinese concept stocks listed on US stock markets from 2009 to 2019
  - `yearly dataset.csv`: annual-frequency dataset containing textual and financial features
  - `monthly dataset.csv`: monthly-frequency dataset containing stock-price features
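The `CART-AdaBoost` model under model code combines AdaBoost with CART decision trees. The repository's own implementation is not shown in this README, so what follows is only a minimal sketch of the technique, assuming scikit-learn. `AdaBoostClassifier` boosts shallow `DecisionTreeClassifier` learners, which are scikit-learn's CART variant. The helper `fit_cart_adaboost`, the hyperparameters, and the synthetic imbalanced data from `make_classification` are illustrative assumptions. They are not the authors' settings and do not come from the yearly or monthly datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_cart_adaboost(X_train, y_train, n_estimators=100, max_depth=3, seed=0):
    """Hypothetical helper: boost shallow CART trees with AdaBoost."""
    cart = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    # The base learner is passed positionally because the keyword is
    # `estimator` in scikit-learn >= 1.2 and `base_estimator` in older releases.
    model = AdaBoostClassifier(cart, n_estimators=n_estimators,
                               learning_rate=0.5, random_state=seed)
    return model.fit(X_train, y_train)


# Synthetic stand-in data with about 10% positives (an assumption, not the real labels).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = fit_cart_adaboost(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
auc = roc_auc_score(y_test, scores)
print(f"test AUC: {auc:.3f}")
```

Shallow trees keep each learner weak, which is the setting boosting is designed for. `max_depth`, `n_estimators`, and `learning_rate` together control the bias-variance trade-off and would need tuning on real data.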
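FReader is described as parsing HTML annual reports, extracting their text, and computing readability. The README does not say which readability metric it uses, so this sketch assumes the Gunning Fog index: 0.4 * (words per sentence + 100 * share of words with three or more syllables). It uses only the standard library's `html.parser`. `html_to_text`, `count_syllables`, and `gunning_fog` are hypothetical helpers, and the vowel-group syllable counter is a rough heuristic.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def count_syllables(word):
    """Heuristic: count vowel groups, dropping a likely silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)


def gunning_fog(text):
    """Gunning Fog index, assumed here as the readability measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100.0 * complex_words / len(words))


sample = ("<html><head><style>p {color: red}</style></head><body>"
          "<p>The company reported revenue.</p>"
          "<p>Management expects profitability.</p></body></html>")
text = html_to_text(sample)
print(text)                         # → The company reported revenue. Management expects profitability.
print(round(gunning_fog(text), 2))  # → 24.26
```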
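`Generic_Parser.py` builds sentiment measures from the LM (2011) dictionary. The sketch below is a simplified illustration of dictionary-based tone, not a reproduction of that script. It counts the tokens that fall in each LM category and reports each count as a percentage of all tokens. `load_lm_categories` and `lm_tone` are hypothetical helpers. The loader assumes the 2018 master-dictionary CSV has a `Word` column, plus category columns such as `Negative` and `Positive` whose value is non-zero (the year the word was added) for member words.

```python
import csv
import re
from collections import Counter


def load_lm_categories(path, categories=("Negative", "Positive", "Uncertainty")):
    """Hypothetical loader: read the LM master dictionary into {category: set of words}.

    Assumes a positive value in a category column marks membership; LM words are uppercase.
    """
    sets = {c: set() for c in categories}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            for c in categories:
                if float(row[c]) > 0:
                    sets[c].add(row["Word"].upper())
    return sets


def lm_tone(text, lm_sets):
    """Hypothetical helper: percentage of tokens falling in each LM category."""
    tokens = re.findall(r"[A-Z]+", text.upper())  # crude tokenizer: runs of letters
    total = len(tokens)
    counts = Counter(tokens)
    result = {"n_words": total}
    for category, words in lm_sets.items():
        hits = sum(n for w, n in counts.items() if w in words)
        result[category] = 100.0 * hits / total if total else 0.0
    return result


# Toy word lists standing in for LoughranMcDonald_MasterDictionary_2018.csv.
toy_lm = {"Negative": {"LOSS", "DECLINE"}, "Positive": {"GAIN"}}
tone = lm_tone("Revenue saw a decline and a loss, offset by a gain.", toy_lm)
print(tone)
```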
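`tf-idf_LSA_cos.py` combines TF-IDF, LSA, and cosine similarity to measure information content. This sketch assumes one plausible pipeline built with scikit-learn. It weights terms with TF-IDF, projects documents into a low-dimensional LSA space with `TruncatedSVD`, and compares them with cosine similarity. Reading `1 - similarity` to the same firm's prior-year filing as "new information" is also an assumption. `lsa_similarity` and the example texts are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def lsa_similarity(docs, n_components=2, seed=0):
    """Hypothetical helper: pairwise cosine similarity in a TF-IDF -> LSA space."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # LSA = truncated SVD of the TF-IDF matrix; n_components must be below the vocabulary size.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    latent = svd.fit_transform(tfidf)
    return cosine_similarity(latent)


docs = [
    "revenue grew as demand for online advertising increased",           # firm A, year t-1
    "revenue grew as demand for online advertising increased strongly",  # firm A, year t
    "the company faces litigation and regulatory investigation risk",    # firm B, year t
]
sim = lsa_similarity(docs)
print("similarity to prior year:", round(float(sim[0, 1]), 3))
print("new information (1 - similarity):", round(float(1 - sim[0, 1]), 3))
```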