To improve the throughput of training and inference applications without adding extra GPUs, we share a single physical GPU among multiple deep learning workloads in a Kubernetes cluster using container-level virtual GPU technology. This approach is better suited to production environments because its performance overhead is lower than that of virtual-machine-level GPU virtualization.
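As one concrete illustration of container-level GPU sharing (an assumption for illustration only, not necessarily the mechanism described here), a device plugin such as Aliyun's GPUShare exposes GPU memory as a Kubernetes extended resource, so several Pods can each request a slice of the same physical GPU rather than a whole device:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer-a            # hypothetical workload name
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu   # example image
    resources:
      limits:
        aliyun.com/gpu-mem: 4   # request 4 GiB of GPU memory, not an entire GPU
```

A second Pod with a similar `aliyun.com/gpu-mem` limit can then be scheduled onto the same GPU, which is the sharing behavior this work relies on.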