2024 OpenFabrics Alliance Virtual Workshop: Shared Materials



As for the history of the OFA itself, an earlier article by Tang Jie, "SoC之三:AWS Elastic Fabric Adapter", covers it better:

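RDMA technology is driven by OFA[7], an industry organization led by Mellanox... OFA is an industry organization founded in 2004, when the HPC industry as a whole was moving from Myrinet[8] to IB. In 2005, Myrinet held a 28% share of the TOP500; after that its share fell steadily until IB replaced it. For a technology born in the specialized HPC field, usability has always been a major problem: HPC does everything for performance, with no virtualization and no general-purpose operating system or architecture, and every supercomputer would practically like to be a system of its own. A look at the family of Mellanox Linux drivers shows how complicated this gets.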

BTW, Intel has been one of the more active members of the OFA in recent years; for example, CXL 3.1 is also among the topics of this workshop.

2024 OFA Virtual Workshop materials (cloud drive download)

Link: https://pan.baidu.com/s/1WPIT1LqEegAlAEjcoNZTuQ?pwd=4dfy


Official site: https://www.openfabrics.org/2024-ofa-virtual-workshop-agenda/ (the videos are also available there; the site is hosted outside the Great Firewall)


Session 1

“OFI 2.0 Update”
Jianxin Xiong, Intel

Session 2
“Status of OpenFabrics Interfaces (OFI) Support in MPICH”

Yanfei Guo, Argonne National Laboratory

Session 3
"Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters"

Hari Subramoni and Qinghua Zhou, The Ohio State University

Session 4
"High Performance & Scalable MPI library over Broadcom RoCE"

Mustafa Abduljabbar, The Ohio State University; Hemal Shah, Broadcom Inc; and Shulei Xu, The Ohio State University

Session 5
"Scaling Large Language Model Training using Hybrid GPU-based Compression in MVAPICH"

Speakers: Aamir Shafi and Lang Xu, The Ohio State University

Session 6
"OFI Integrated Shared Memory Offload"

Speakers: Alexia Ingerson, Intel; Shi Jin, Amazon; and Amir Shehata, Oak Ridge National Laboratory

Session 7
"Managing Composable Disaggregated Infrastructure With OFA Sunfish"

Christian Pinto, IBM Research Europe; Michael Aguilar, Sandia National Laboratories; Phil Cayton, Intel; Russ Herrell, Hewlett Packard Enterprise; and Brian Pan, H3 Platform


