Abstract: Graphics processing units (GPUs) are widely used to enhance deep learning inference throughput through batch processing, where multiple inference requests are processed simultaneously. When ...