# TablePilot

- **Description**: Code and data for the ACL'25 paper "TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models"
- **License**: MIT
- **Default Branch**: main

# TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models

[![Paper](http://img.shields.io/badge/Paper-arxiv.2503.13262-99D4C8.svg)](https://www.arxiv.org/pdf/2503.13262)

We propose TablePilot, a pioneering tabular data analysis framework that leverages large language models to autonomously generate comprehensive, high-quality analytical results without relying on user profiles or prior interactions. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. Additionally, we construct DART, a benchmark tailored for comprehensive tabular data analysis recommendation.
## Quick Start 🚀

### Step 1: Build Environment

```bash
conda create -n tablepilot
conda activate tablepilot
pip install -r requirements.txt
```

### Step 2: Tabular Data Processing

```bash
cd data_process
bash table_txt_fmt.sh
```

### Step 3: Analysis Generation

This step is the core generation component of TablePilot and consists of two main phases:

1. **Table Explanation Generation**
2. **Module-based Analysis Generation**, which includes three parts:
   - Basic Analysis
   - Visualization
   - Modeling

Replace the corresponding `.py` files as needed to generate specific content, then run:

```bash
bash run_generation.sh
```

### Step 4: Analysis Optimization

We employ a multimodal revision approach to refine the generated data analysis operations.

- Before revision, first obtain the execution results of the initial round of generated data analysis operations:

```bash
cd execution/run
bash run_code_exec_error.sh
```

- Then perform optimization based on these results:

```bash
cd generation/run
bash run_revision.sh
```

- We perform only a single round of revision to obtain the final optimized results:

```bash
cd execution/run
bash run_code_exec_revision.sh
```

### Step 5: Analysis Ranking

After optimization, the ranking module is used to return the highest-quality recommendations.
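Conceptually, this ranking step gathers the candidate analyses produced by the three modules and keeps only the top-k by a preference score. The sketch below illustrates that aggregate-then-rank idea only; the candidate structure, the precomputed `score` field, and the example operations are illustrative assumptions, not the repository's actual interface.

```python
# Hypothetical sketch of module aggregation + top-k ranking.
# Candidate shapes and scores are illustrative assumptions only.

def rank_candidates(candidates, k=5):
    """Return the k highest-scored analysis candidates."""
    # Sort by a precomputed preference score, highest first.
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ranked[:k]

# Aggregate results across the three module-based generators,
# mirroring the aggregation step at a conceptual level.
modules = {
    "basic_analysis": [{"op": "groupby mean", "score": 0.91}],
    "visualization": [{"op": "bar chart", "score": 0.78}],
    "modeling": [{"op": "linear regression", "score": 0.85}],
}
all_candidates = [c for results in modules.values() for c in results]

top = rank_candidates(all_candidates, k=2)
print([c["op"] for c in top])  # → ['groupby mean', 'linear regression']
```

In the actual pipeline, the scoring is performed by the LLM-based ranking module invoked by the scripts below rather than by a precomputed field.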
- First, aggregate all the results from the module-based analysis:

```bash
cd evaluation/run
bash run_process_module_res.sh
```

- Then apply the ranking module to return the highest-quality recommendations:

```bash
cd generation/run
bash run_rank.sh
```

### Step 6: Evaluation

- **Execution Rate**

```bash
cd evaluation/run
bash run_exec_rate.sh
```

- **Recall**
  - Total Recall: the overall recall of all results generated by the framework

```bash
bash run_recall_all_results.sh
```

  - Recall@k, where k is the number of recommended data analysis operations the user wishes to receive

```bash
bash run_sum_ranking_res.sh
bash run_recall_ranked_res.sh
```

## Citation

If you find this repository useful, please consider giving it a ⭐ or citing:

```
@article{yi2025tablepilot,
  title   = {TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models},
  author  = {Yi, Deyin and Liu, Yihao and Cao, Lang and Zhou, Mengyu and Dong, Haoyu and Han, Shi and Zhang, Dongmei},
  journal = {arXiv preprint arXiv:2503.13262},
  year    = {2025}
}

@inproceedings{yi-etal-2025-tablepilot,
  title     = "{T}able{P}ilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models",
  author    = {Yi, Deyin and Liu, Yihao and Cao, Lang and Zhou, Mengyu and Dong, Haoyu and Han, Shi and Zhang, Dongmei},
  editor    = "Rehm, Georg and Li, Yunyao",
  booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)",
  month     = jul,
  year      = "2025",
  address   = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2025.acl-industry.28/",
  pages     = "355--410",
  ISBN      = "979-8-89176-288-6",
}
```

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.
For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.