Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4×-5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
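The abstract describes solving Ax = b by factorizing in low precision and then recovering FP64 accuracy through iterative refinement. The following Python/NumPy sketch illustrates only the basic refinement idea, not the authors' GPU implementation: float32 stands in for the FP16/FP32 Tensor Core factorization, the correction solve reuses the low-precision LU factors directly rather than the paper's preconditioned GMRES, and the scaling/auto-adaptive rounding safeguards are omitted. The function name `mixed_precision_solve` and its parameters are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iter=20):
    """Iterative refinement: low-precision LU factors, residuals in float64."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Low-precision LU factorization (surrogate for the FP16/FP32 Tensor Core step).
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial solve in low precision, promoted back to float64.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iter):
        # Residual computed in full FP64; this is what restores FP64-level accuracy.
        r = b64 - A64 @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solve reuses the cheap low-precision factors.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
    b = rng.standard_normal(n)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The key design point carried over from the paper is that the expensive O(n^3) factorization runs in reduced precision while only the O(n^2) residual and update steps run in FP64, which is what yields the speedup without losing FP64-level backward stability for reasonably conditioned systems.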
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems.
Authors: Haidar Azzam, Bayraktar Harun, Tomov Stanimire, Dongarra Jack, Higham Nicholas J
| Journal: | Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences | Impact factor: | 3.000 |
| Year: | 2020 | Issue/Pages: | 2020 Nov;476(2243):20200110 |
| DOI: | 10.1098/rspa.2020.0110 | | |
