# kaggle-exercise

**Repository Path**: gethug/kaggle-exercise

## Basic Information

- **Project Name**: kaggle-exercise
- **Description**: ML and AI exercises from the https://www.kaggle.com website
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-02-18
- **Last Updated**: 2024-03-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

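# Kaggle Course Exercises

## Purpose
A record of the exercises from the ML and AI courses on [https://www.kaggle.com/](https://www.kaggle.com/), kept so they can be reviewed and used as a reference in later programming work.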

## Collected Resources

+ [An explanation of the differences between LabelEncoder, OrdinalEncoder, and OneHotEncoder](https://www.zhihu.com/question/421194789)
+ How fit_transform() and transform() are used: [What and why behind fit_transform() and transform() in scikit-learn!](https://towardsdatascience.com/what-and-why-behind-fit-transform-vs-transform-in-scikit-learn-78f915cf96fe)
+ Entropy, cross-entropy, and KL divergence explained: [A Short Introduction to Entropy, Cross-Entropy and KL-Divergence](https://www.youtube.com/watch?v=ErfnhcEV1O8)
+ Using SimpleImputer: [How To Use Sklearn SimpleImputer for Filling Missing Values in Dataset](https://machinelearningknowledge.ai/how-to-use-sklearn-simple-imputer-simpleimputer-for-filling-missing-values-in-dataset/)
+ How to create a good validation set: [How (and why) to create a good validation set](https://www.fast.ai/posts/2017-11-13-validation-sets.html)
+ Why the problem with metrics is a big problem for AI: [The problem with metrics is a big problem for AI](https://www.fast.ai/posts/2019-09-24-metrics.html)
+ The Pearson correlation coefficient, a statistic that measures the strength of the linear relationship between two continuous variables: [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)
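
As a quick reference for several of the topics linked above, here is a minimal sketch. It is not taken from the Kaggle courses or from the linked articles: the toy data and all variable names are invented for illustration. It assumes NumPy and scikit-learn are installed and shows `fit_transform()` versus `transform()` with `SimpleImputer`, the output shapes of `OrdinalEncoder` and `OneHotEncoder`, the Pearson correlation coefficient, and the relation between entropy, cross-entropy, and KL divergence.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy data invented for this sketch (not from any course notebook).
X_train_num = np.array([[1.0], [np.nan], [3.0]])
X_valid_num = np.array([[np.nan], [10.0]])

# fit_transform() learns the fill value (the training mean) and applies it;
# transform() reuses that learned value on new data without refitting.
imputer = SimpleImputer(strategy="mean")
train_imputed = imputer.fit_transform(X_train_num)  # NaN -> mean of 1 and 3
valid_imputed = imputer.transform(X_valid_num)      # NaN -> same training mean

colors_train = np.array([["red"], ["green"], ["blue"]])
colors_valid = np.array([["green"], ["red"]])

# OrdinalEncoder maps each category to one integer code
# (categories are sorted, so blue=0, green=1, red=2).
ordinal = OrdinalEncoder()
ordinal.fit(colors_train)
valid_ordinal = ordinal.transform(colors_valid)

# OneHotEncoder produces one 0/1 column per category instead.
onehot = OneHotEncoder(handle_unknown="ignore")
onehot.fit(colors_train)
valid_onehot = onehot.transform(colors_valid).toarray()

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # perfectly linear relationship
r = np.corrcoef(x, y)[0, 1]

# Entropy H(p), cross-entropy H(p, q), and KL(p || q) = H(p, q) - H(p), in bits.
p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])
entropy = -np.sum(p * np.log2(p))
cross_entropy = -np.sum(p * np.log2(q))
kl = cross_entropy - entropy

print(train_imputed.ravel(), valid_imputed.ravel())
print(valid_ordinal.ravel())
print(valid_onehot)
print(r, entropy, cross_entropy, kl)
```

The point to note is that the imputer and both encoders are fitted on the training data only. Calling `fit_transform()` on the validation data as well would let validation statistics leak into preprocessing.
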
## Collected Blogs
+ [Sukanya Bag](https://github.com/sukanyabag)
+ [Chetna Khanna's blog](https://chetnakhanna.medium.com/)
+ [scikit-learn Chinese community](https://scikit-learn.org.cn/)