# huaweicloud-regionless-dataset

**Repository Path**: HuaweiCloudDeveloper/huaweicloud-regionless-dataset

## Basic Information

- **Project Name**: huaweicloud-regionless-dataset
- **Description**: huaweicloud-regionless-dataset

- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master-dev
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 4
- **Forks**: 1
- **Created**: 2022-10-09
- **Last Updated**: 2025-05-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Regionless orchestration datasets
## Overview
The Regionless orchestration dataset is published by Huawei Group. It provides standard datasets for the orchestration and scheduling of computing resources, to help researchers study and train scheduling algorithms. The dataset is open-sourced under the Apache License V2.0; please follow the license terms when using it.
The dataset contains three months of resource usage data for 20 tenants across 17 regions. The regions are grouped into 5 zones by geographic location and are connected by 7 backbone network lines.

For computing resource usage, each tenant has a separate table. Each table has four fields: Date, Region id, VM type id used by the tenant, and the number of VM cores the tenant is using at that time. For example:

| Date | Region id | VM type id | Using VM cores |
| ---- | ---- | ---- | ---- |
| 1 00:00:00  | r8 | v3 | 272 |
| 1 00:00:00  | r1 | v1 | 30 |
| 1 00:10:00  | r1 | v3 | 32 |
| 1 00:10:00  | r4 | v5 | 1568 |
| ......  | ...... | ...... | ...... |
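
The row layout above can be loaded with a few lines of Python. This is a minimal sketch, not the repository's own tooling. It assumes the per-tenant tables are stored as comma-separated text without a header row, and that the Date field is a 1-based day index followed by a clock time, as the sample rows suggest. The names `UsageRecord`, `parse_date`, and `read_usage` are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class UsageRecord(NamedTuple):
    minute: int    # minutes elapsed since day 1, 00:00:00 (assumed epoch)
    region: str    # region id, e.g. "r8"
    vm_type: str   # VM type id, e.g. "v3"
    cores: int     # VM cores in use at that sample time


def parse_date(date: str) -> int:
    """Convert a 'D HH:MM:SS' stamp into minutes since day 1, 00:00:00."""
    day, clock = date.split()
    hours, minutes, _seconds = (int(part) for part in clock.split(":"))
    return (int(day) - 1) * 24 * 60 + hours * 60 + minutes


def read_usage(text: str) -> list[UsageRecord]:
    """Parse comma-separated rows of (date, region id, VM type id, cores)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        date, region, vm_type, cores = (field.strip() for field in row)
        records.append(UsageRecord(parse_date(date), region, vm_type, int(cores)))
    return records


# The four sample rows from the table above, in the assumed CSV layout.
SAMPLE = """\
1 00:00:00,r8,v3,272
1 00:00:00,r1,v1,30
1 00:10:00,r1,v3,32
1 00:10:00,r4,v5,1568
"""

records = read_usage(SAMPLE)
```

If the actual files use a different delimiter or include a header line, only `read_usage` needs to change.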

For network bandwidth usage, we provide the data sources of the top 20 tenants and their average network bandwidth usage per unit VM (configured with 1 vCPU and 2 GB memory). These data represent how each tenant's data storage is distributed and how intensively the tenant accesses remote data sources across zones. The table has three fields: Tenant_id, Data_source, and Band/VM. It is shown below, with tenants 1-10 in the left three columns and tenants 11-20 in the right three.

| Tenant_id | Data_source | Band/VM | Tenant_id | Data_source | Band/VM |
| ---- | ---- | ---- | ---- | ---- | ---- |
| 1 | Zone1+Zone3+Zone4 | 1Mbps | 11 | Zone1 | 2Mbps |
| 2 | Zone1 | 1Mbps | 12 | Zone1+Zone5 | 2Mbps |
| 3 | Zone2 | 4Mbps | 13 | Zone3+Zone4 | 1Mbps |
| 4 | Zone1 | 1Mbps | 14 | Zone2 | 4Mbps |
| 5 | Zone2 | 1Mbps | 15 | Zone1+Zone3+Zone4 | 4Mbps |
| 6 | Zone2 | 4Mbps | 16 | Zone3 | 2Mbps |
| 7 | Zone1 | 1Mbps | 17 | Zone2 | 8Mbps |
| 8 | Zone1+Zone3+Zone4 | 2Mbps | 18 | Zone5 | 2Mbps |
| 9 | Zone1 | 2Mbps | 19 | Zone1+Zone3+Zone4 | 8Mbps |
| 10 | Zone3+Zone4 | 1Mbps | 20 | Zone1 | 1Mbps |
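
As an illustration of how the Band/VM column might be combined with core counts, the sketch below parses one table row and scales the per-VM bandwidth by the number of cores in use. It assumes that demand grows linearly with cores, treating each core as one unit VM (1 vCPU); the dataset description does not state this, nor how traffic is split among several data source zones, so the sketch does not split it. The function names are hypothetical.

```python
def parse_sources(data_source: str) -> list[str]:
    """Split a Data_source value such as 'Zone1+Zone3+Zone4' into zone names."""
    return data_source.split("+")


def parse_band_mbps(band: str) -> int:
    """Convert a Band/VM value such as '4Mbps' into an integer number of Mbps."""
    suffix = "Mbps"
    if not band.endswith(suffix):
        raise ValueError(f"unexpected Band/VM value: {band!r}")
    return int(band[: -len(suffix)])


def estimate_bandwidth_mbps(cores: int, band_per_vm: str) -> int:
    """Estimate total bandwidth, assuming one unit VM (1 vCPU) per core."""
    return cores * parse_band_mbps(band_per_vm)


# Tenant 3 in the table: Data_source 'Zone2', Band/VM '4Mbps'.
# The core count of 100 is a made-up value for illustration only.
sources = parse_sources("Zone2")
demand = estimate_bandwidth_mbps(100, "4Mbps")
```

Because the Band/VM values have their decimals dropped, any estimate built this way is coarse.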

Due to limited storage for sampled data in our backend, the decimal part of each Band/VM value is omitted. We also plan to open-source the original network bandwidth usage over time for each backbone network line, generated by tenants outside the top 20.

The following notes explain some aspects of the dataset to avoid confusion.
1. A tenant may also use regions or VM types that are not included in the trace.
2. For a given tenant, a region or VM type with no record at a specific date means the tenant was not using it at that date.
3. For a given tenant, if there is no record for an entire day, the tenant was not using any VMs on that day.
4. When joining the computing and network data, we recommend grouping the regions as: Zone1 (Regions 1-4), Zone2 (Regions 5-7), Zone3 (Regions 8-9), Zone4 (Regions 10-12), Zone5 (Regions 13-17).
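
The notes above can be sketched in code as follows. The zone boundaries come from note 4; everything else is an assumption made for illustration. In particular, the sketch assumes that region ids in the tenant tables have the form `r<number>` and that `r1` corresponds to Region 1. The names `ZONE_RANGES`, `region_to_zone`, and `cores_per_zone` are hypothetical.

```python
from collections import defaultdict

# Zone boundaries as recommended in note 4.
ZONE_RANGES = {
    "Zone1": range(1, 5),    # Regions 1-4
    "Zone2": range(5, 8),    # Regions 5-7
    "Zone3": range(8, 10),   # Regions 8-9
    "Zone4": range(10, 13),  # Regions 10-12
    "Zone5": range(13, 18),  # Regions 13-17
}


def region_to_zone(region_id: str) -> str:
    """Map a region id such as 'r8' to its zone name (assumes 'r<number>' ids)."""
    number = int(region_id.lstrip("r"))
    for zone, regions in ZONE_RANGES.items():
        if number in regions:
            return zone
    raise ValueError(f"region {region_id!r} is outside the 17 known regions")


def cores_per_zone(rows):
    """Sum cores per (date, zone) over rows of (date, region id, VM type id, cores)."""
    totals = defaultdict(int)
    for date, region_id, _vm_type, cores in rows:
        totals[(date, region_to_zone(region_id))] += cores
    return dict(totals)


# Sample rows from the tenant table in this README.
rows = [
    ("1 00:00:00", "r8", "v3", 272),
    ("1 00:00:00", "r1", "v1", 30),
    ("1 00:10:00", "r1", "v3", 32),
    ("1 00:10:00", "r4", "v5", 1568),
]
totals = cores_per_zone(rows)

# Notes 2 and 3: a missing record means no usage, so default to 0 on lookup.
zone2_usage = totals.get(("1 00:00:00", "Zone2"), 0)
```

Looking up missing keys with a default of 0, rather than treating them as unknown, follows notes 2 and 3 for the regions and VM types that the trace covers.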

We encourage anyone to use the datasets for study or research purposes. If you have any questions when using the datasets, please file an issue on GitHub. Filing an issue is recommended because the discussion helps the whole community. Note that the more clearly you ask a question, the more likely you are to get a clear answer.

The datasets are open-sourced under the Apache License V2.0. When using them, please follow the license terms and cite our paper.

```BibTeX
@inproceedings{shi2022characterizing,
  title={Characterizing and orchestrating VM reservation in geo-distributed clouds to improve the resource efficiency},
  author={Shi, Jiuchen and Fu, Kaihua and Chen, Quan and Yang, Changpeng and Huang, Pengfei and Zhou, Mosong and Zhao, Jieru and Chen, Chen and Guo, Minyi},
  booktitle={Proceedings of the 13th Symposium on Cloud Computing},
  pages={94--109},
  year={2022}
}
```
## Papers using regionless orchestration datasets
Jiuchen Shi, Kaihua Fu, Quan Chen, Changpeng Yang, Pengfei Huang, Mosong Zhou, Jieru Zhao, Chen Chen and Minyi Guo. Characterizing and orchestrating VM reservation in geo-distributed clouds to improve the resource efficiency[C]//Proceedings of the 13th Symposium on Cloud Computing. 2022: 94-109.

## License
Licensed under the Apache License V2.0.

## Future work
More datasets will be released in the future.

## Download dataset
**[Download links](https://gitee.com/HuaweiCloudDeveloper/huaweicloud-regionless-dataset)**