# CodeBERT4JIT

#### Introduction

Code replication for the just-in-time defect prediction (JIT-DP) experiments in the ICSME 2021 paper *Assessing Generalizability of CodeBERT*.

#### System Environment

| Machine | HPC platform application | CPU | RAM | GPU | OS |
| --- | --- | --- | --- | --- | --- |
| P40-1 | gym | 10 cores | 100 GB | NVIDIA Tesla P40 (24 GB) | Ubuntu 18.04 |
| P40-2 | Tensorflow_GPU | 10 cores | 64 GB | NVIDIA Tesla P40 (24 GB) | CentOS 7.7 |
| 3090 | Desktop_GPU | 10 cores | 20 GB | NVIDIA GeForce RTX 3090 | CentOS 7.8 |

#### Environment Setup

```
git clone https://gitee.com/ecust-dp/code-bert4-jit.git
cd code-bert4-jit
conda create -n CodeBERT python=3.6.9
conda activate CodeBERT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/cu110/torch_stable.html
pip install transformers==4.18.0
pip install scikit-learn==0.24.2
```

**Note:** A plain `pip install torch` may fail with `capability sm_86 is not compatible with the current PyTorch installation` on the RTX 3090 machine; the pinned cu110 wheels above avoid this issue (see [this reference](https://blog.csdn.net/a563562675/article/details/121656894)).

Manually download **config.json, merges.txt, pytorch_model.bin, special_tokens_map.json, tokenizer_config.json, and vocab.json** from [Hugging Face](https://huggingface.co/microsoft/codebert-base/tree/main) and upload them to your `Path/to/code-bert4-jit/pretrained_models/codebert_base/`. Alternatively, download **codebert_base.tar.gz** from the HPC platform (path: `~/ECUST-SE/HuggingFace/`), upload it to your `Path/to/code-bert4-jit/pretrained_models/`, and extract the **codebert_base** folder containing the files above via

```
tar -zxvf codebert_base.tar.gz
```

Manually download the **data** folder from [Google Drive](https://drive.google.com/drive/folders/199qFIMUg1rm53jnKejCDKxqMFY89wzNd) and upload it to your `Path/to/code-bert4-jit/`. Alternatively, download **data.tar.gz** from the HPC platform (path: `~/ECUST-SE/DP/CodeBERT-JIT/`), upload it to your `Path/to/code-bert4-jit/`, and extract the **data** folder via

```
tar -zxvf data.tar.gz
```

#### Usage

**Train**

```
python main.py -train -train_data './data/op_data/openstack_train_changed.pkl' -save-dir './trained_model_op' -dictionary_data './data/op_dict.pkl'
```

```
python main.py -train -train_data './data/qt_data/qt_train_changed.pkl' -save-dir './trained_model_qt' -dictionary_data './data/qt_dict.pkl'
```

**Valid**

```
python main.py -train -train_data './data/op_data/openstack_train_changed.pkl' -save-dir './trained_model_op' -dictionary_data './data/op_dict.pkl' -valid -load_model_dir './trained_model_op/2024-01-20_22-26-21/'
```

**Note:** Change the value passed to the ***load_model_dir*** parameter to the timestamped directory produced by your own training run; the same applies to the following commands.

```
python main.py -train -train_data './data/qt_data/qt_train_changed.pkl' -save-dir './trained_model_qt' -dictionary_data './data/qt_dict.pkl' -valid -load_model_dir './trained_model_qt/2024-01-21_11-19-47/'
```
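Since each training run writes its checkpoints into a new timestamped folder under the save directory, a small helper can look up the directory to pass to ***load_model_dir***. The sketch below is not part of this repository; it only assumes the `YYYY-MM-DD_HH-MM-SS` naming visible in the example paths above, which sorts chronologically as plain strings.

```
from pathlib import Path

def latest_run_dir(save_dir: str) -> str:
    """Return the newest timestamped run directory under save_dir.

    Assumes main.py names each run 'YYYY-MM-DD_HH-MM-SS' (an assumption
    based on the example paths in this README).
    """
    runs = sorted(p for p in Path(save_dir).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no run directories found under {save_dir}")
    return str(runs[-1]) + "/"

# e.g. prints './trained_model_op/2024-01-20_22-26-21/'
print(latest_run_dir("./trained_model_op"))
```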
**Test**

```
python main.py -predict -pred_data './data/op_data/openstack_test_changed.pkl' -dictionary_data './data/op_dict.pkl' -load_model_dir './trained_model_op/2024-01-20_22-26-21/'
```

```
python main.py -predict -pred_data './data/qt_data/qt_test_changed.pkl' -dictionary_data './data/qt_dict.pkl' -load_model_dir './trained_model_qt/2024-01-21_11-19-47/'
```

#### Results Comparison and Analysis

**TABLE IV: AUC RESULTS OF JUST-IN-TIME DEFECT PREDICTION**

![Paper results](Paper_results.png)

**Replication results**

***op***

![Replication results on OpenStack](Replication_results_op.png)

***Boldface values show the best results.***

***qt***

![Replication results on Qt](Replication_results_qt.png)

***Boldface values show the best results.***

**Findings**

1. Overall, the performance of CodeBERT4JIT can be replicated across different machines, and the small performance discrepancies are negligible.
2. Even though the default number of training (fine-tuning) epochs is 3, the best checkpoint on each dataset is already obtained within the first epoch.

[//]: # (3. For each dataset, the checkpoint that achieved the best AUC score during validation may not perform best on the test set; the difference between the validation set and the test set is unavoidable.)
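The tables above compare AUC scores. For reference, here is a minimal sketch of how an AUC score is computed with the scikit-learn version pinned in the setup; the two arrays are placeholders, since the exact output format of `main.py -predict` is not documented in this README. In practice the labels come from the test pickle and the scores are the model's predicted defect probabilities.

```
from sklearn.metrics import roc_auc_score

# Placeholder values: replace with the true labels of the test commits
# and the predicted defect probabilities produced by `main.py -predict`.
y_true = [0, 1, 0, 1, 1]
y_score = [0.12, 0.83, 0.34, 0.65, 0.91]

print(f"AUC: {roc_auc_score(y_true, y_score):.4f}")
```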