# COMP9315-A2-25T1

**Repository Path**: wangqiyuejava63/comp9315-a2-25-t1

## Basic Information

- **Project Name**: COMP9315-A2-25T1
- **Description**: MALH: Multi-Attribute Linear Hashing
Based on: https://github.com/YunruiZhang/Multi-attribute-linear-hashing
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-17
- **Last Updated**: 2025-04-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


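## Insert Command

The insert command inserts tuples into a given relation. It reads tuples from standard input and adds them to the specified table.

Assuming a relation R exists, a single tuple can be inserted with:

`echo "100,abc,xyz" | ./insert R`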


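## Stats Command
The stats command shows the current state of a table, including the contents of each data page. For example:

`./stats R`

This displays the global information for table R and the contents of each data page. After a single tuple has been inserted, stats shows 1 tuple in page 0.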

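## gendata Command
To test and populate tables more efficiently, a gendata command is provided for generating large numbers of tuples. Its usage is:

`./gendata <num_tuples> <num_attributes> [start_id] [seed]`

- num_tuples: the number of tuples to generate.
- num_attributes: the number of attributes in each tuple.
- start_id (optional): the starting ID value; defaults to 1.
- seed (optional): the seed for the random number generator; defaults to 0.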

You can use gendata to generate large numbers of tuples and insert them as follows:

`./gendata 250 3 101 | ./insert R`

This inserts 250 tuples into the table, with ID values starting at 101.

## todo 1

Expand the data file using the linear hashing algorithm.

Note that the file does not expand as tuples are inserted. It still has its 4 original
data pages, and it would still have only 4 data pages even if you added thousands of tuples.
This is because linear hashing is not yet implemented. Implementing it is one of your
tasks.


## todo 2

Implement the query scanning function.

You can then use the query command to search for tuples with a command like:

`./query '2,1' from R where '101,?,?'`

This looks for any tuple with 101 as its ID value (the first attribute) and projects the
result on attributes 2 and 1 (in that order). Since ID values are unique, exactly one
tuple should match. At present the command returns no solutions because query scanning
is not yet implemented. Implementing it is another of your tasks.

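## Task 1: Multi-attribute Hashing

Modify the tupleHash() function so that it builds a combined hash value by extracting specific bits from each attribute's hash value, as directed by the choice vector.

The current implementation uses only the hash of the first attribute (the ID value) to produce the tuple's hash value.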
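
The bit-extraction step can be sketched as follows. This is a minimal illustration, not the project's code: the `CVItem` layout, the `MAXBITS` constant, and the `attrHash()` and `combineHash()` names are all assumptions, and FNV-1a merely stands in for whatever hash function the code base provides. Each choice-vector entry names an attribute and a bit position; bit `i` of the combined hash is copied from that bit of that attribute's hash.

```c
#include <assert.h>
#include <stdint.h>

#define MAXBITS 32  /* assumed width of the combined hash */

/* Hypothetical choice-vector entry: use bit `bit` of attribute `att`'s hash. */
typedef struct { uint8_t att; uint8_t bit; } CVItem;

/* Stand-in string hash (FNV-1a); the real project supplies its own. */
static uint32_t attrHash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Bit i of the result is bit cv[i].bit of attrHashes[cv[i].att]. */
static uint32_t combineHash(const CVItem cv[MAXBITS],
                            const uint32_t *attrHashes, int nattrs)
{
    uint32_t result = 0;
    for (int i = 0; i < MAXBITS; i++) {
        assert(cv[i].att < nattrs && cv[i].bit < 32);
        uint32_t bit = (attrHashes[cv[i].att] >> cv[i].bit) & 1u;
        result |= bit << i;
    }
    return result;
}
```

A caller would hash every attribute once with `attrHash()`, store the results in an array, and pass that array to `combineHash()` together with the relation's choice vector.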

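## Task 2: Queries (Selection and Projection)

Goal

Design and implement the Selection and Projection data structures, along with the operations they need.

Specifically, select.c must implement n-dimensional partial-match retrieval (n-d pmr),
including support for pattern matching.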
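
One way to picture partial-match retrieval is sketched below, under simplifying assumptions: the `CVItem` type and the `queryBits()` and `candidatePages()` names are hypothetical, pattern matching is left out, and the split pointer is ignored, so every page is addressed with exactly `depth` bits. Attributes given as `?` contribute unknown bits; every combination of those bits, merged with the known bits, names a page that may hold matching tuples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical choice-vector entry: bit `bit` of attribute `att`'s hash. */
typedef struct { uint8_t att; uint8_t bit; } CVItem;

/* Split the low `depth` bits of the query hash into known bit values and a
   mask of unknown positions. isKnown[a] is 0 when attribute a is '?'. */
static void queryBits(const CVItem *cv, int depth,
                      const uint32_t *attrHashes, const int *isKnown,
                      uint32_t *known, uint32_t *unknown)
{
    assert(depth >= 0 && depth < 32);
    *known = 0;
    *unknown = 0;
    for (int i = 0; i < depth; i++) {
        if (isKnown[cv[i].att]) {
            uint32_t bit = (attrHashes[cv[i].att] >> cv[i].bit) & 1u;
            *known |= bit << i;
        } else {
            *unknown |= 1u << i;
        }
    }
}

/* Write candidate page ids into out[] (at most max of them), enumerating
   every subset of the unknown mask; returns the number written. */
static int candidatePages(uint32_t known, uint32_t unknown,
                          uint32_t *out, int max)
{
    int n = 0;
    uint32_t sub = unknown;
    for (;;) {
        if (n == max) break;
        out[n++] = known | sub;
        if (sub == 0) break;
        sub = (sub - 1) & unknown;   /* next smaller subset of the mask */
    }
    return n;
}
```

With k unknown bits the scan visits 2^k pages instead of the whole file, which is the reason for building the hash from several attributes in the first place.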

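project.c must implement the projection operation, without duplicate elimination (distinct); this involves selecting attributes from a tuple and possibly reordering them.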
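
As a rough sketch of projection, assuming tuples are comma-separated strings such as `101,abc,xyz` and attribute numbers are 1-based as in the query example, the hypothetical helper below copies the requested attributes into an output buffer in the requested order. The name `projectTuple()` and its signature are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Copy the attributes listed in attrs[] (1-based, in that order) from a
   comma-separated tuple into out. Returns 0 on success, or -1 if an
   attribute does not exist or out is too small. */
static int projectTuple(const char *tuple, const int *attrs, int nattrs,
                        char *out, size_t outSize)
{
    assert(tuple != NULL && attrs != NULL && out != NULL && outSize > 0);
    size_t used = 0;
    out[0] = '\0';
    for (int k = 0; k < nattrs; k++) {
        if (attrs[k] < 1) return -1;
        /* Find the start of the field by skipping attrs[k]-1 commas. */
        const char *start = tuple;
        for (int skip = attrs[k] - 1; skip > 0; skip--) {
            start = strchr(start, ',');
            if (start == NULL) return -1;   /* tuple has too few attributes */
            start++;
        }
        size_t len = strcspn(start, ",");   /* field length up to next comma */
        size_t need = len + (k > 0 ? 1 : 0);
        if (used + need + 1 > outSize) return -1;
        if (k > 0) out[used++] = ',';
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

For the tuple `101,abc,xyz` and the attribute list `{2, 1}`, this writes `abc,101` to the output buffer.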

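## Task 3: Linear Hashing
Goal

The current implementation is essentially a static, single-attribute hashing scheme.

You need to add functionality so that the file expands after every c insertions, where c is the page capacity, c = floor(B/R) ≈ 1024/(10*n), B is the page size, R is the average tuple size, and n is the number of attributes.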
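
The expansion trigger can be sketched as below. The constants are read off the formula above (a 1024-byte page and roughly 10 bytes per attribute); the names `pageCapacity()` and `splitDue()` are hypothetical, and the tuple counter is assumed to be kept by the relation.

```c
#include <assert.h>

#define PAGE_SIZE 1024       /* B, from the formula above */
#define AVG_ATTR_BYTES 10    /* assumed average attribute size, so R is about 10*n */

/* c = floor(B/R); integer division truncates, which is floor for positives. */
static int pageCapacity(int nattrs)
{
    assert(nattrs > 0);
    return PAGE_SIZE / (AVG_ATTR_BYTES * nattrs);
}

/* Nonzero when the ntuples-th insertion completes a batch of c insertions. */
static int splitDue(long ntuples, int nattrs)
{
    int c = pageCapacity(nattrs);
    return c > 0 && ntuples > 0 && ntuples % c == 0;
}
```

With 3 attributes the capacity is floor(1024/30) = 34, so a split would be due after the 34th, 68th, 102nd insertion, and so on.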

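Each expansion adds a new page at the end of the file and redistributes the tuples in its "buddy" page (the page whose index is 2^d lower than the new page's) between the old page and the new page.

Where each tuple goes is determined by considering d+1 bits of its hash value.

This will involve modifying the addToRelation() function, and may require adding new functions to reln.c (and to other files).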
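
The bookkeeping behind a split can be sketched as follows, using the textbook linear-hashing scheme with a depth d and a split pointer sp. The `LHState` struct and the function names are hypothetical and do not reflect the actual layout in reln.c; reading and writing pages and overflow chains are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical linear-hashing state: depth d, split pointer sp, page count. */
typedef struct { uint32_t depth; uint32_t sp; uint32_t npages; } LHState;

/* Page that a tuple with this hash belongs to under the current state. */
static uint32_t pageFor(const LHState *st, uint32_t hash)
{
    assert(st->depth < 31);
    uint32_t p = hash & ((1u << st->depth) - 1);         /* low d bits */
    if (p < st->sp)                                      /* already split this round */
        p = hash & ((1u << (st->depth + 1)) - 1);        /* low d+1 bits */
    return p;
}

/* During a split, a tuple from page sp either stays at sp or moves to the
   new page sp + 2^d, depending on bit d of its hash. */
static uint32_t splitTarget(const LHState *st, uint32_t hash)
{
    assert(st->depth < 31);
    uint32_t p = hash & ((1u << (st->depth + 1)) - 1);
    assert(p == st->sp || p == st->sp + (1u << st->depth));
    return p;
}

/* Update the state once page sp has been split into sp and sp + 2^d. */
static void advanceSplit(LHState *st)
{
    st->npages++;
    st->sp++;
    if (st->sp == (1u << st->depth)) {   /* every page of this round is split */
        st->depth++;
        st->sp = 0;
    }
}
```

Starting from 4 pages (d = 2, sp = 0), four splits bring the file to 8 pages with d = 3 and sp = 0 again.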