Column: 机器学习研究会 (Machine Learning Research Society)
机器学习研究会 is a student organization under the Innovation Center for Big Data and Machine Learning at Peking University, aiming to build a platform where machine learning practitioners can exchange ideas. In addition to sharing field news in a timely manner, the society hosts lectures by industry and academic leaders, salon-style sharing sessions with senior researchers, real-data innovation competitions, and other activities.

[Paper] TVM: an end-to-end deep learning optimization stack that provides performance portability for deep learning workloads across diverse hardware back-ends

机器学习研究会 · Official Account · AI · 2018-02-18 22:02

Body

Abstract
Reposted from: 爱可可-爱生活

Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch, drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs, and deploying workloads to other platforms such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs) requires laborious manual effort. We propose TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. We discuss the optimization challenges specific to deep learning that TVM solves: high-level operator fusion, low-level memory reuse across threads, mapping to arbitrary hardware primitives, and memory latency hiding. Experimental results demonstrate that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art libraries for low-power CPUs and server-class GPUs. We also demonstrate TVM's ability to target new hardware accelerator back-ends by targeting an FPGA-based generic deep learning accelerator. The compiler infrastructure is open sourced.
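The "high-level operator fusion" the abstract lists can be illustrated with a minimal Python sketch (a conceptual illustration only, not TVM's actual API): fusing an elementwise add with a ReLU into one kernel eliminates the intermediate buffer and one full traversal of memory, while producing the same result.

```python
def add_then_relu_unfused(a, b):
    """Two separate 'operators': the intermediate list tmp is fully materialized."""
    tmp = [x + y for x, y in zip(a, b)]   # operator 1: elementwise add
    return [max(0.0, t) for t in tmp]     # operator 2: ReLU (second pass over memory)

def add_then_relu_fused(a, b):
    """One fused 'operator': no intermediate buffer, a single traversal."""
    return [max(0.0, x + y) for x, y in zip(a, b)]

a = [1.0, -2.0, 3.0]
b = [-0.5, 0.5, 1.0]
assert add_then_relu_unfused(a, b) == add_then_relu_fused(a, b)  # both give [0.5, 0.0, 4.0]
```

On real hardware the fused form matters because the intermediate tensor never round-trips through memory; TVM performs this kind of fusion automatically at the graph level across many operator patterns.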




