# mcp-excel-db

**Repository Path**: mirrors/mcp-excel-db

## Basic Information

- **Project Name**: mcp-excel-db
- **Description**: 数据分析的痛点与机遇 对于数据开发，很多BA（业务分析师）都是使用Excel文件来进行数据分析，但是缺少AI能力的介入
- **Primary Language**: JavaScript
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/mcp-excel-db
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-18
- **Last Updated**: 2025-12-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 
<h3 align="center">MCP - Excel Database</h3>

<p align="center">
  Make your excel as a database to query with SQL.
  <br>
  <a href="doc/zh-cn/README.md">[<strong>中文</strong>]</a> 
  <br>
  
</p>

[![npm version](https://badge.fury.io/js/%40zhangzichao2008%2Fmcp-excel-db.svg)](https://badge.fury.io/js/%40zhangzichao2008%2Fmcp-excel-db)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://img.shields.io/badge/Tests-18%2F148%20failed-red?style=flat-square&logo=jest)](https://github.com/steven0lisa/mcp-excel-db/actions)
[![Coverage](https://img.shields.io/badge/Coverage-21.58%25-red?style=flat-square&logo=codecov)](https://github.com/steven0lisa/mcp-excel-db/actions)


A Model Context Protocol (MCP) server that enables SQL querying of Excel files. Transform your Excel spreadsheets into queryable databases using familiar SQL syntax.

## 🚀 Features

- **SQL Query Support**: Execute SELECT queries on Excel worksheets using standard SQL syntax
- **Multiple Worksheet Support**: Query different worksheets within the same Excel file
- **Advanced SQL Operations**: Support for WHERE clauses, ORDER BY, GROUP BY, aggregation functions, and more
- **Field Validation**: Clear error messages when SQL queries reference non-existent fields
- **MCP Protocol**: Seamlessly integrates with MCP-compatible clients
- **Easy Installation**: Install and run with a single npx command

## ⚡ Performance Optimizations

For aggregate-only SELECT queries (where all selected expressions are aggregates like SUM/AVG/COUNT or built from them via arithmetic or CASE), the engine now uses a streaming aggregation fast path. Instead of loading the entire worksheet into memory and evaluating expressions per row, it streams through the file once, computes the needed aggregate counters, and then composes the final single-row result.

Benefits:
- Significantly lower memory usage on large files
- Faster execution for aggregate-only queries without GROUP BY

Supported patterns include:
- Direct aggregates: `SELECT SUM(amount) AS total FROM Sheet1`
- Derived expressions built from aggregates: `SELECT SUM(a*b)/SUM(w) AS weighted_avg FROM Sheet1`
- CASE built on aggregates: `SELECT CASE WHEN SUM(w)=0 THEN 0 ELSE SUM(a*b*w)/SUM(w) END AS weighted_avg FROM Sheet1 WHERE category='A'`

Notes:
- The fast path applies to single-table queries without JOIN, GROUP BY, or DISTINCT.
- You can disable the fast path for testing by constructing the query engine with `{ disableStreamingAggregate: true }`.
- The default maximum supported rows was increased to 1,000,000 for CSV and stream-loaded Excel paths.

## 📦 Installation

### Quick Start with npx

```bash
npx @zhangzichao2008/mcp-excel-db@latest
```

### Global Installation

```bash
npm install -g @zhangzichao2008/mcp-excel-db
mcp-excel-db
```

### Local Installation

```bash
npm install @zhangzichao2008/mcp-excel-db
```

## 🛠️ Usage

### Using in MCP Client

- TRAE/CURSOR/ClaudeCode MCP configuration

```json
{
  "mcpServers": {
    "excel-db": {
      "command": "npx",
      "args": [
        "-y",
        "@zhangzichao2008/mcp-excel-db@latest"
      ]
    }
  }
}
```

### Starting the MCP Server

The server runs on stdio and communicates using the Model Context Protocol:

```bash
npx @zhangzichao2008/mcp-excel-db@latest
```

### Available Tools

The MCP server provides the following tools:

#### 1. `execute_sql_query`
Execute SQL queries on Excel files. Each query requires specifying the file path.

**Parameters:**
- `sql` (string): SQL SELECT statement
- `filePath` (string): Path to the Excel file (.xlsx or .xls)

**Example:**
```json
{
  "sql": "SELECT * FROM Sheet1 WHERE age > 25 ORDER BY name",
  "filePath": "/path/to/your/spreadsheet.xlsx"
}
```

**Special SQL Commands:**
- Use `DESCRIBE tablename` to get column information for a worksheet
- Example: `DESCRIBE Sheet1` returns column names and data types

#### 2. `get_worksheet_info`
Get basic information about all worksheets in an Excel file (lightweight operation).

**Parameters:**
- `filePath` (string): Path to the Excel file (.xlsx or .xls)

**Returns:**
- List of worksheet names

**Example:**
```json
{
  "filePath": "/path/to/your/spreadsheet.xlsx"
}
```

**Note:** For performance reasons, this method does not return row count information. To get the specific row count, use SQL query: `SELECT COUNT(*) FROM SheetName`

#### 3. `get_worksheet_columns`
Get column information for worksheets in an Excel file (lightweight operation).

**Parameters:**
- `filePath` (string): Path to the Excel file (.xlsx or .xls)
- `worksheetName` (string, optional): Specific worksheet name to get columns for

**Returns:**
- Worksheet names and their column lists

**Example:**
```json
{
  "filePath": "/path/to/your/spreadsheet.xlsx",
  "worksheetName": "Sheet1"
}
```

## 📊 SQL Query Examples

### Basic SELECT
```sql
SELECT * FROM Sheet1;
```

### Filtering with WHERE
```sql
SELECT name, age, salary FROM employees WHERE age > 30;
```

### Sorting
```sql
SELECT * FROM products ORDER BY price DESC;
```

### Aggregation
```sql
SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary 
FROM employees 
GROUP BY department;
```

### Complex Queries
```sql
SELECT category, SUM(quantity) as total_quantity 
FROM inventory 
WHERE status = 'active' 
GROUP BY category 
ORDER BY total_quantity DESC;
```

### IN and NOT IN Operations
```sql
-- Filter by multiple values
SELECT * FROM products WHERE category IN ('Electronics', 'Books', 'Clothing');

-- Exclude multiple values
SELECT * FROM employees WHERE department NOT IN ('HR', 'Finance');

-- Complex conditions with IN
SELECT name, price FROM products 
WHERE category IN ('A', 'B') AND price > 100;
```

### Table Aliases
```sql
SELECT a.name, a.price FROM Sheet1 AS a WHERE a.price > 10;
```

### JOIN Operations
```sql
-- LEFT JOIN between worksheets
SELECT a.name, a.price, b.supplier 
FROM Sheet1 AS a 
LEFT JOIN Sheet2 AS b ON a.id = b.sheet1_id;

-- INNER JOIN between worksheets
SELECT a.*, b.rating 
FROM products AS a 
INNER JOIN suppliers AS b ON a.supplier_id = b.id;

-- CROSS JOIN between worksheets (Cartesian product)
SELECT a.name, b.category
FROM Sheet1 AS a
CROSS JOIN Sheet2 AS b;
```

### UNION Operations
```sql
-- UNION: Combine results and remove duplicates
SELECT name FROM employees_2023
UNION
SELECT name FROM employees_2024;

-- UNION ALL: Combine results keeping duplicates
SELECT product_name FROM orders_q1
UNION ALL
SELECT product_name FROM orders_q2;

-- Multiple UNION operations
SELECT customer_id, amount FROM orders_2023
UNION
SELECT customer_id, amount FROM orders_2024
UNION ALL
SELECT customer_id, amount FROM orders_2025;
```

### String Functions
```sql
-- String manipulation functions
SELECT 
  LENGTH(name) as name_length,
  UPPER(category) as upper_category,
  LOWER(description) as lower_desc,
  TRIM(notes) as clean_notes,
  SUBSTR(name, 1, 3) as name_prefix,
  REPLACE(description, 'old', 'new') as updated_desc
FROM products;
```

### Field Validation
```sql
-- The system now validates field existence and returns clear error messages
SELECT non_existent_field FROM Sheet1;
-- Error: Field "non_existent_field" does not exist in table "Sheet1"

SELECT name, typo_field FROM Sheet1 WHERE non_existent_column = 'value';
-- Error: Field "typo_field" does not exist in table "Sheet1"

SELECT * FROM Sheet1 ORDER BY wrong_column_name;
-- Error: Field "wrong_column_name" does not exist in table "Sheet1"
```

### Math Functions
```sql
-- Mathematical operations
SELECT
  ABS(profit) as absolute_profit,
  ROUND(price, 2) as rounded_price,
  CEIL(rating) as ceiling_rating,
  FLOOR(discount) as floor_discount,
  RANDOM() as random_number
FROM products;
```

### Getting Table Structure
```sql
DESCRIBE Sheet1;
```

**Note:** All SQL queries now require specifying the `filePath` parameter when using the MCP tools. The Excel file is loaded and processed for each query, ensuring you always work with the latest data.

## 🔧 Supported SQL Features

### SELECT Operations
- Column selection (`SELECT col1, col2`)
- Wildcard selection (`SELECT *`)
- Column aliases (`SELECT col1 AS alias`)

### WHERE Clauses
- Comparison operators (`=`, `>`, `<`, `>=`, `<=`, `!=`, `<>`)
- Pattern matching (`LIKE` with `%` wildcards)
- Logical operators (`AND`, `OR`, `NOT`)
- NULL checks (`IS NULL`, `IS NOT NULL`)
- List membership (`IN`, `NOT IN`) - Check if value exists in a list of values

### Table Aliases
- Table aliases (`FROM Sheet1 AS a`)
- Column references with aliases (`a.column_name`)

### JOIN Operations
- `LEFT JOIN` - Left outer join between worksheets
- `INNER JOIN` - Inner join between worksheets
- `RIGHT JOIN` - Right outer join between worksheets
- `FULL OUTER JOIN` - Full outer join between worksheets
- `CROSS JOIN` - Cartesian product between worksheets
- Join conditions with `ON` clause

### String Functions
- `LENGTH(str)` - Get string length
- `UPPER(str)` - Convert to uppercase
- `LOWER(str)` - Convert to lowercase
- `TRIM(str)` - Remove leading and trailing spaces
- `LTRIM(str)` - Remove leading spaces
- `RTRIM(str)` - Remove trailing spaces
- `SUBSTR(str, start, length)` - Extract substring (1-based indexing)
- `INSTR(str, substr)` - Find substring position (1-based, returns 0 if not found)
- `REPLACE(str, from_str, to_str)` - Replace substring

### Math Functions
- `ABS(x)` - Absolute value
- `ROUND(x, d)` - Round to d decimal places
- `CEIL(x)` / `CEILING(x)` - Round up to nearest integer
- `FLOOR(x)` - Round down to nearest integer
- `RANDOM()` - Generate random integer

### Aggregation Functions
- `COUNT(*)` - Count all rows
- `COUNT(column)` - Count non-null values
- `SUM(column)` - Sum numeric values
- `MAX(column)` - Find maximum value
- `MIN(column)` - Find minimum value
- `AVG(column)` - Calculate average value

### Other Features
- `ORDER BY` with `ASC`/`DESC`
- `GROUP BY` for aggregation
- `DISTINCT` for unique values
- `LIMIT` for result limiting
- `UNION` and `UNION ALL` for combining query results

## 🚫 Limitations

- Only SELECT queries are supported (no INSERT, UPDATE, DELETE)
- No subqueries
- No HAVING clauses
- No UNION operations
- Limited comparison operators (supports `=`, `!=`, `<`, `<=`, but not `>`, `>=`, `IS NOT`)
- Each query requires specifying the file path (no persistent file loading)
- For large files (>5MB), row counts are estimated using sampling for performance

## 🏗️ Development

### Building from Source

```bash
git clone https://github.com/steven0lisa/mcp-excel-db.git
cd mcp-excel-db
npm install
npm run build
```

### Running Tests

This project has two types of tests:

#### Unit Tests (Jest)
Run Jest unit tests for core functionality:
```bash
npm test
```

#### Feature Tests
Run comprehensive feature tests for all SQL functionality:
```bash
npm run test:features
```

The feature tests validate all implemented SQL features including WHERE conditions, JOIN operations, string functions, math functions, and more. Each feature has its own test suite in the `test/test-case/` directory.

**Latest Test Features (F-15 to F-18):**
- **F-15**: Worksheet Naming and Column Information Consistency Test - Validates worksheet name handling and column information retrieval
- **F-16**: Column Information Retrieval and Case Sensitivity Test - Tests column name case preservation and special character support
- **F-17**: Error Handling and Message Consistency Test - Comprehensive error handling validation with user-friendly messages
- **F-18**: SQL Syntax Compatibility and Data Type Inference Test - SQL syntax compatibility and automatic data type inference

### Development Mode

```bash
npm run dev
```

## 📁 Project Structure

```
mcp-excel-db/
├── src/                    # Source code
│   ├── excel-sql-query.ts  # Excel SQL query engine
│   └── mcp-server.ts       # MCP server implementation
├── test/                   # Test files
├── doc/                    # Documentation
│   └── zh-cn/             # Chinese documentation
├── dist/                   # Compiled output
├── package.json
├── tsconfig.json
└── README.md
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- [GitHub Repository](https://github.com/steven0lisa/mcp-excel-db)
- [npm Package](https://www.npmjs.com/package/@zhangzichao2008/mcp-excel-db)
- [Model Context Protocol](https://modelcontextprotocol.io/)

## 📞 Support

If you encounter any issues or have questions, please [open an issue](https://github.com/steven0lisa/mcp-excel-db/issues) on GitHub.

---

Made with ❤️ by [Zhang Zichao](https://github.com/steven0lisa)