Accelerating the tuning process for optimizing DNN operators by ROFT model


Abstract

Deep neural networks (DNNs) are computationally intensive and are optimized in a variety of ways. Some compiler optimizations for DNNs achieve performance comparable to, or even better than, manual optimization; however, these mechanisms usually require an unbearably long tuning process. In this paper, we propose a new method that significantly accelerates the tuning process without performance penalties. In particular, we use a Roofline-like cost model, namely ROFT (Roofline for Fast AutoTune), to evaluate the performance of schedules. The ROFT model can be easily implemented on different microarchitectures, e.g., NVIDIA GPUs and Huawei Ascend NPUs. Based on this cost model, we implement a flexible two-stage search algorithm that significantly shortens the tuning process. Experiments on some typical DNNs show that the ROFT method speeds up the tuning process by about 4X compared with AutoTVM on NVIDIA GPUs and by about 10X compared with the AutoTune of Huawei's Tensor Boost Engine (TBE) on Huawei Ascend 310 NPUs. It also improves the inference time of some DNNs by up to 7%.
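For readers unfamiliar with the underlying idea, the classic Roofline model that ROFT builds on bounds attainable performance by the lesser of peak compute throughput and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A minimal sketch of this bound is shown below; the function name and the example hardware numbers are illustrative assumptions, not the paper's actual ROFT implementation.

```python
def roofline_attainable_gflops(peak_gflops: float,
                               bandwidth_gbs: float,
                               flops: float,
                               bytes_moved: float) -> float:
    """Classic Roofline bound on attainable performance.

    Performance is capped either by the machine's peak compute rate
    or by memory bandwidth times arithmetic intensity (FLOPs/byte).
    A Roofline-like cost model can rank candidate schedules with such
    a bound instead of measuring each one on real hardware.
    """
    intensity = flops / bytes_moved  # arithmetic intensity (FLOPs/byte)
    return min(peak_gflops, bandwidth_gbs * intensity)

# Hypothetical device: 100 GFLOP/s peak, 10 GB/s memory bandwidth.
# A kernel doing 5 FLOPs per byte is memory-bound: min(100, 10*5) = 50.
print(roofline_attainable_gflops(100.0, 10.0, 5e9, 1e9))  # → 50.0
```

A tuner can use such an analytical estimate to discard obviously poor schedules cheaply, leaving only promising candidates for real on-device measurement, which is the general motivation for replacing measurement-heavy search with a model-guided one.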
