# CodeBERT4JIT

#### Introduction

Code replication for the just-in-time defect prediction (JIT-DP) experiments in the ICSME 2021 paper *Assessing Generalizability of CodeBERT*.

#### System Environment

| Machine | HPC platform application | CPU | RAM | GPU | OS |
| --- | --- | --- | --- | --- | --- |
| P40-1 | gym | 10 cores | 100 GB | NVIDIA Tesla P40 (24 GB) | Ubuntu 18.04 |
| P40-2 | Tensorflow_GPU | 10 cores | 64 GB | NVIDIA Tesla P40 (24 GB) | CentOS 7.7 |
| 3090 | Desktop_GPU | 10 cores | 20 GB | NVIDIA GeForce RTX 3090 | CentOS 7.8 |

#### Environment Setup

```
git clone https://gitee.com/ecust-dp/code-bert4-jit.git
cd code-bert4-jit
conda create -n CodeBERT python=3.6.9
conda activate CodeBERT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/cu110/torch_stable.html
pip install transformers==4.18.0
pip install scikit-learn==0.24.2
```

**Note:** A plain `pip install torch` may fail with `capability sm_86 is not compatible with the current PyTorch installation` on the RTX 3090 machine; the pinned cu110 wheels above avoid this issue (see [this reference](https://blog.csdn.net/a563562675/article/details/121656894)).

Manually download **config.json, merges.txt, pytorch_model.bin, special_tokens_map.json, tokenizer_config.json, and vocab.json** from [Hugging Face](https://huggingface.co/microsoft/codebert-base/tree/main) and upload them to your `Path/to/code-bert4-jit/pretrained_models/codebert_base/`. Alternatively, download **codebert_base.tar.gz** from the HPC platform (path: `~/ECUST-SE/HuggingFace/`), upload it to your `Path/to/code-bert4-jit/pretrained_models/`, and extract the **codebert_base** folder containing the files above via

```
tar -zxvf codebert_base.tar.gz
```

Manually download the **data** folder from [Google Drive](https://drive.google.com/drive/folders/199qFIMUg1rm53jnKejCDKxqMFY89wzNd) and upload it to your `Path/to/code-bert4-jit/`. Alternatively, download **data.tar.gz** from the HPC platform (path: `~/ECUST-SE/DP/CodeBERT-JIT/`), upload it to your `Path/to/code-bert4-jit/`, and extract the **data** folder via

```
tar -zxvf data.tar.gz
```

#### Usage

**Train**

```
python main.py -train -train_data './data/op_data/openstack_train_changed.pkl' -save-dir './trained_model_op' -dictionary_data './data/op_dict.pkl'
```

```
python main.py -train -train_data './data/qt_data/qt_train_changed.pkl' -save-dir './trained_model_qt' -dictionary_data './data/qt_dict.pkl'
```

**Valid**

```
python main.py -train -train_data './data/op_data/openstack_train_changed.pkl' -save-dir './trained_model_op' -dictionary_data './data/op_dict.pkl' -valid -load_model_dir './trained_model_op/2024-01-20_22-26-21/'
```

**Note:** Change the value passed to the ***load_model_dir*** parameter to the timestamped directory produced by your own training run; the same applies to the following commands.

```
python main.py -train -train_data './data/qt_data/qt_train_changed.pkl' -save-dir './trained_model_qt' -dictionary_data './data/qt_dict.pkl' -valid -load_model_dir './trained_model_qt/2024-01-21_11-19-47/'
```
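Since each training run writes its checkpoints into a new timestamped folder under the save directory, a small helper can look up the directory to pass to ***load_model_dir***. The sketch below is not part of this repository; it only assumes the `YYYY-MM-DD_HH-MM-SS` naming visible in the example paths above, which sorts chronologically as plain strings.

```
from pathlib import Path

def latest_run_dir(save_dir: str) -> str:
    """Return the newest timestamped run directory under save_dir.

    Assumes main.py names each run 'YYYY-MM-DD_HH-MM-SS' (an assumption
    based on the example paths in this README).
    """
    runs = sorted(p for p in Path(save_dir).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no run directories found under {save_dir}")
    return str(runs[-1]) + "/"

# e.g. prints './trained_model_op/2024-01-20_22-26-21/'
print(latest_run_dir("./trained_model_op"))
```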
**Test**

```
python main.py -predict -pred_data './data/op_data/openstack_test_changed.pkl' -dictionary_data './data/op_dict.pkl' -load_model_dir './trained_model_op/2024-01-20_22-26-21/'
```

```
python main.py -predict -pred_data './data/qt_data/qt_test_changed.pkl' -dictionary_data './data/qt_dict.pkl' -load_model_dir './trained_model_qt/2024-01-21_11-19-47/'
```

#### Results Comparison and Analysis

**TABLE IV: AUC RESULTS OF JUST-IN-TIME DEFECT PREDICTION**

![Paper results](Paper_results.png)

**Replication results**

***op***

![Replication results on OpenStack](Replication_results_op.png)

***Boldface values show the best results.***

***qt***

![Replication results on Qt](Replication_results_qt.png)

***Boldface values show the best results.***

**Findings**

1. Overall, the performance of CodeBERT4JIT can be replicated across different machines, and the small performance discrepancies are negligible.
2. Even though the default number of training (fine-tuning) epochs is 3, the best checkpoint on each dataset is already obtained within the first epoch.

[//]: # (3. For each dataset, the checkpoint that achieved the best AUC score during validation may not perform best on the test set; the difference between the validation set and the test set is unavoidable.)
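The tables above compare AUC scores. For reference, here is a minimal sketch of how an AUC score is computed with the scikit-learn version pinned in the setup; the two arrays are placeholders, since the exact output format of `main.py -predict` is not documented in this README. In practice the labels come from the test pickle and the scores are the model's predicted defect probabilities.

```
from sklearn.metrics import roc_auc_score

# Placeholder values: replace with the true labels of the test commits
# and the predicted defect probabilities produced by `main.py -predict`.
y_true = [0, 1, 0, 1, 1]
y_score = [0.12, 0.83, 0.34, 0.65, 0.91]

print(f"AUC: {roc_auc_score(y_true, y_score):.4f}")
```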