Column: GitChat技术杂谈 (GitChat Tech Talk)

10 billion operations in 3 ms? How to do high-performance computing on the front end

GitChat技术杂谈 · WeChat official account · Programmers · 2017-11-02 07:15

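This article is based on the GitChat session 「如何实现前端高性能计算？」 ("How to implement high-performance computing on the front end?") shared by the author 谦谦君子; see "Read the original" for the transcript of the Q&A.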

Editor | 伊健

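In a recent project I had to do a lot of computation on the front end. Running it directly in JS took about 15 s, which means the user's browser would freeze for 15 s. That is completely unacceptable.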

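Even with an engine as powerful as V8, JS is, as everyone knows, not well suited to CPU-intensive computation. There are two reasons: it is single-threaded, and it is a dynamic language.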

So we attack these two weak points. First we deal with the single-thread limitation by trying to speed up the computation with WebWorkers.

1. High-performance front-end computing, part 1: WebWorkers

What are WebWorkers

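Simply put, WebWorkers is an HTML5 API that lets web developers run a script in the background without blocking the UI. It can be used for computation-heavy work and makes full use of multiple CPU cores.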

For an introduction, see https://www.html5rocks.com/en/tutorials/workers/basics/ or the corresponding Chinese version ( https://www.html5rocks.com/zh/tutorials/workers/basics/ ).

The Web Workers specification defines an API for spawning background scripts in your web application.

Web Workers allow you to do things like fire up long-running scripts to handle computationally intensive tasks, but without blocking the UI or other scripts to handle user interactions.

You can open this link to see the speed-up from WebWorkers for yourself.

Almost all current browsers support WebWorkers.
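For reference, here is a minimal sketch of the raw WebWorkers API. It is our own illustration, not code from the original project. To keep the example in one file, the worker script is built from a string through a Blob URL. The helper names `sumRange` and `sumInWorker` are hypothetical. The part that spawns the worker only runs where a global `Worker` exists, which means a browser.

```javascript
// Plain function that will run inside the worker: sum of the integers start..end.
function sumRange(start, end) {
  let total = 0;
  for (let i = start; i <= end; i += 1) {
    total += i;
  }
  return total;
}

// Spawn a worker whose script is built from sumRange's source text,
// send it a range, and resolve with the value it posts back.
function sumInWorker(start, end) {
  const source = `${sumRange.toString()}
onmessage = (event) => {
  postMessage(sumRange(event.data.start, event.data.end));
};`;
  const url = URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
  const worker = new Worker(url);
  const cleanUp = () => {
    worker.terminate();
    URL.revokeObjectURL(url);
  };
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => {
      cleanUp();
      resolve(event.data);
    };
    worker.onerror = (error) => {
      cleanUp();
      reject(error);
    };
    // The UI thread stays free while the worker computes.
    worker.postMessage({ start, end });
  });
}

// Only spawn a worker where the browser Worker API exists.
if (typeof Worker !== "undefined") {
  sumInWorker(1, 100).then((result) => console.log("worker result:", result)); // 5050
}
```

The page and the worker share no variables. They only exchange copies of messages through `postMessage`, which is why the function is sent over as source text.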

Parallel.js

Using the WebWorkers interface directly is still rather tedious. Fortunately, someone has already wrapped it: Parallel.js.

Note that Parallel.js can be installed with npm:

$ npm install paralleljs

However, that package is meant for Node.js, where it uses Node's cluster module. To use it in the browser, include the JS file directly:

<script src="parallel.js"></script>

This gives you a global variable, Parallel. Parallel provides two functional-programming interfaces, map and reduce, which make concurrent operations very convenient.
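To make the map/reduce model concrete, here is a sequential sketch of the kind of decomposition used in this article. It uses no workers, so it runs anywhere. It splits 1..N into chunks, maps each chunk to a partial sum, and reduces the partial sums. `chunkRanges`, `sumChunk` and `mapReduceSum` are hypothetical helpers. Parallel.js would run the map step inside workers. Its reduce callback also receives a two-element array instead of two arguments, as in the paraSum code in this article.

```javascript
// Split the range 1..n into at most `parts` contiguous [start, end] chunks.
function chunkRanges(n, parts) {
  const size = Math.ceil(n / parts);
  const ranges = [];
  for (let k = 0; k < parts; k += 1) {
    const start = k * size + 1;
    const end = Math.min((k + 1) * size, n);
    if (start <= end) ranges.push([start, end]);
  }
  return ranges;
}

// Work done for one chunk (with Parallel.js this would run inside a worker).
function sumChunk([start, end]) {
  let total = 0;
  for (let i = start; i <= end; i += 1) total += i;
  return total;
}

// map: one partial result per chunk; reduce: combine the partial results.
function mapReduceSum(n, parts) {
  return chunkRanges(n, parts)
    .map(sumChunk)
    .reduce((acc, partial) => acc + partial, 0);
}

console.log(chunkRanges(100, 10)[9]); // the last chunk is [91, 100]
console.log(mapReduceSum(1000, 10)); // 500500
```

Because each chunk only needs its own bounds, the chunks are independent. That independence is what lets them be handed to separate workers.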

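Let's first define the problem. The real business logic is complicated, so I simplified it to summing 1 through 100,000,000 and then subtracting 1 through 100,000,000 again. The answer is obviously 0!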

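I did it this way because very large numbers cause precision problems. The results of the two approaches would then differ slightly, which would make the parallel approach look unreliable.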
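The precision problem comes from the fact that JavaScript numbers are IEEE 754 doubles. Double addition is not associative, so summing the same values in a different order, as a parallel split does, can change the result. In addition, from 2^52 upward adjacent doubles are 1 apart, so the `i / 2` halves in the test function can be rounded away. A minimal demonstration:

```javascript
// Floating-point addition is not associative: grouping changes the rounding.
const leftToRight = (0.1 + 0.2) + 0.3;
const regrouped = 0.1 + (0.2 + 0.3);
console.log(leftToRight, regrouped); // 0.6000000000000001 0.6

// From 2^52 upward, adjacent doubles are 1 apart, so adding a half is lost.
const twoTo52 = 2 ** 52;
console.log(twoTo52 + 0.5 === twoTo52); // true

// From 2^53 upward, not every integer is representable any more.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(2 ** 53 + 1 === 2 ** 53); // true
```

Adding everything and then subtracting it again avoids comparing two large, differently rounded totals. Both approaches end at 0, so the check stays easy to read.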

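On my Mac Pro with Chrome 61, running this directly as plain JS takes about 1.5 s. Our real problem takes 15 s; it is simplified here so that readers don't freeze their browsers while testing.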

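const N = 100000000; // 100 million iterations in total
// Updated 2017-10-24 16:47:00
// The code has no real meaning; it only simulates a time-consuming computation. Simply using
//   for (let i = start; i <= end; i += 1) total += i;
// has a few problems. First, the code is too simple, with no even slightly complex operations,
// so the C version later gets optimized so aggressively that no fair comparison is possible.
// Second, there is overflow, which I didn't bother to handle: the code below simply adds
// everything up and then subtracts it again, so the answer is obviously 0, which makes testing easy.
function sum(start, end) {
  let total = 0;
  for (let i = start; i <= end; i += 1) {
    if (i % 2 === 0 || i % 3 === 1) {
      total += i;
    } else if (i % 5 === 0 || i % 7 === 1) {
      total += i / 2;
    }
  }
  for (let i = start; i <= end; i += 1) {
    if (i % 2 === 0 || i % 3 === 1) {
      total -= i;
    } else if (i % 5 === 0 || i % 7 === 1) {
      total -= i / 2;
    }
  }
  return total;
}

function paraSum(N) {
  // Split into 10 parts, each handed to a web worker; parallel.js creates
  // a suitable number of workers based on the number of CPU cores.
  const N1 = N / 10;
  const p = new Parallel([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    .require(sum);
  // The outer variable N1 cannot be used inside parallel.js, hence the literal 10000000.
  // The return value is the Parallel job, not a number; read the result with .then(...).
  return p.map(n => sum((n - 1) * 10000000 + 1, n * 10000000))
    .reduce(data => {
      const acc = data[0];
      const e = data[1];
      return acc + e;
    });
}

export { N, sum, paraSum }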

The code is fairly simple. Here are a few pitfalls I ran into when I first used it.

require every function you need

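For example, the code above uses sum, so you have to call require(sum) beforehand. If sum uses another function f, you also need require(f). Likewise, if f uses g, you need require(g), and so on until you have required every function you defined and use.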

Variables cannot be required

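In the code above I originally defined N1, but it cannot be used inside the mapped function.

Problems after compiling ES6 to ES5, and Chrome reporting no error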

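In the real project we initially used an ES6 feature: array destructuring. It is a very simple feature that most browsers already support. However, the Babel configuration I had at the time compiled to ES5, which generates a helper called _slicedToArray (you can check this in the online Babel REPL). Under Chrome it simply never worked and printed no error message at all. After searching for a long time, I opened the page in Firefox, which did report an error: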

ReferenceError: _slicedToArray is not defined

So Chrome isn't perfect either...
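These pitfalls have a common cause. A function handed to a worker travels as source text and is re-created on the other side. Anything it referred to in its original scope therefore does not come along: a helper function, a variable like N1, or a Babel helper such as _slicedToArray. The sketch below imitates this with `new Function`. `rebuildFromSource`, `makeScaler` and `scaleItem` are hypothetical names, and the sketch approximates what a library like Parallel.js has to do; it is not its actual code.

```javascript
// Re-create a function from its source text, roughly what happens when a
// function is sent to a worker: the new copy only sees global scope.
function rebuildFromSource(fn) {
  return new Function(`return (${fn.toString()});`)();
}

// A function that depends on a variable from its enclosing scope.
function makeScaler() {
  const N1 = 10;
  return function scale(n) {
    return n * N1;
  };
}

// Worker-friendly version: the value travels with the data instead.
function scaleItem(item) {
  return item.n * item.factor;
}

const local = makeScaler();
console.log(local(3)); // 30: the closure still sees N1

try {
  rebuildFromSource(local)(3);
} catch (error) {
  console.log(error instanceof ReferenceError); // true: N1 is not defined here
}

console.log(rebuildFromSource(scaleItem)({ n: 3, factor: 10 })); // 30
```

The same reasoning explains why sum has to be passed to require(...), and why any helper it calls, directly or indirectly, must be passed too. Values such as N1 are easiest to ship as part of the mapped data.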

You can try it on this demo page ( ./parallel-test ). The speed-up is about 4x, though of course it depends on how many CPU cores your machine has.

Later I tested on the same computer with Firefox 55.0.3 (64-bit), and the code above took only 190 ms!!! Safari 9.1.1 was also around 190 ms...

References

https://developer.mozilla.org/zh-CN/docs/Web/API/Web_Workers_API/Using_web_workers
https://www.html5rocks.com/en/tutorials/workers/basics/
https://parallel.js.org/
https://johnresig.com/blog/web-workers/
http://javascript.ruanyifeng.com/htmlapi/webworker.html
http://blog.teamtreehouse.com/using-web-workers-to-speed-up-your-javascript-applications

2. High-performance front-end computing, part 2: asm.js & WebAssembly

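Earlier we said there are two ways to tackle high-performance computing. One is concurrency with WebWorkers; the other is a lower-level, statically typed language.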

In 2012, Mozilla engineer Alon Zakai had an idea while working on the LLVM compiler: could C/C++ be compiled to JavaScript while coming as close as possible to native speed?

So he built the Emscripten compiler, which compiles C/C++ code into asm.js, a subset of JavaScript, with roughly 50% of native performance. See this slide deck ( http://kripken.github.io/mloc_emscripten_talk/ ).
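To give a feel for asm.js, here is a tiny hand-written module. Emscripten output is far larger but follows the same conventions: a "use asm" directive, plus `|0` coercions that tell the engine a value is a 32-bit integer. Because asm.js is a subset of JavaScript, the code also runs as ordinary JS in engines that don't validate it. `AsmAdd` is a hypothetical name.

```javascript
// An asm.js module takes (stdlib, foreign, heap) and returns its exports.
function AsmAdd(stdlib, foreign, heap) {
  "use asm";

  function add(a, b) {
    // Parameter annotations: `x | 0` declares x as a 32-bit int.
    a = a | 0;
    b = b | 0;
    // Result annotation: the return value is a 32-bit int as well.
    return (a + b) | 0;
  }

  return { add: add };
}

const asmModule = AsmAdd(globalThis, {}, new ArrayBuffer(0x10000));
console.log(asmModule.add(2, 3)); // 5
console.log(asmModule.add(2147483647, 1)); // -2147483648 (int32 wrap-around)
```

The type annotations are what let an engine compile such code ahead of time, which is where asm.js gets its speed over ordinary dynamic JS.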

Later Google developed Portable Native Client (PNaCl), another technology that lets browsers run C/C++ code.

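Presumably everyone then realized that going separate ways wouldn't work. Google, Microsoft, Mozilla, Apple and other large companies teamed up on a project for a common binary and text format for the Web: WebAssembly. The official site describes it like this: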

WebAssembly or wasm is a new portable, size- and load-time-efficient format suitable for compilation to the web.

WebAssembly is currently being designed as an open standard by a W3C Community Group that includes representatives from all major browsers.

So WebAssembly looks like a very promising project. It is also worth checking how well current browsers support it.
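A wasm module is just a compact binary that JavaScript loads through the standard WebAssembly API. To show this without Emscripten, here is a hand-assembled module that exports add(i32, i32) -> i32, the same function as add.c later in this article. The byte array is our own encoding for illustration; real projects get it from a compiler. The synchronous `WebAssembly.Module` / `WebAssembly.Instance` constructors are used only for brevity. For anything non-trivial, browsers prefer the asynchronous `WebAssembly.instantiate`.

```javascript
// Binary encoding of a module equivalent to this text format:
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const addWasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // type section: one function type (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // function section: one function, using type 0
  0x03, 0x02, 0x01, 0x00,
  // export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // code section: one body, no locals, local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

const wasmModule = new WebAssembly.Module(addWasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
console.log(wasmInstance.exports.add(2, 3)); // 5
```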

Installing Emscripten

Visit https://kripken.github.io/emscripten-site/docs/getting_started/downloads.html

1. Download the SDK for your platform

2. Use emsdk to get the latest tools

# Fetch the latest registry of available tools.
./emsdk update
# Download and install the latest SDK tools.
./emsdk install latest
# Make the "latest" SDK "active" for the current user. (writes ~/.emscripten file)
./emsdk activate latest
# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh

3. Add the following directories to the PATH environment variable

~/emsdk-portable
~/emsdk-portable/clang/fastcomp/build_incoming_64/bin
~/emsdk-portable/emscripten/incoming

4. Other

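When I ran it, I got an error saying the LLVM version was wrong. Following the documentation, I set the LLVM_ROOT variable in the ~/.emscripten config file and that fixed it. If you don't hit this problem, you can skip this step.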

LLVM_ROOT = os.path.expanduser(os.getenv('LLVM', '/home/ubuntu/a-path/emscripten-fastcomp/build/bin'))

5. Verify the installation

Run emcc -v. If everything is installed, you will see output like this:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 1.37.21
clang version 4.0.0 (https://github.com/kripken/emscripten-fastcomp-clang.git 974b55fd84ca447c4297fc3b00cefb6394571d18) (https://github.com/kripken/emscripten-fastcomp.git 9e4ee9a67c3b67239bd1438e31263e2e86653db5) (emscripten 1.37.21 : 1.37.21)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Users/magicly/emsdk-portable/clang/fastcomp/build_incoming_64/bin
INFO:root:(Emscripten: Running sanity checks)

Hello, WebAssembly!

Create a file hello.c:

#include <stdio.h>

int main() {
  printf("Hello, WebAssembly!\n");
  return 0;
}

Compile the C/C++ code:

emcc hello.c

This command produces a file a.out.js, which we can run directly with Node.js:

node a.out.js

Output:

Hello, WebAssembly!

To run the code in a web page, the following command generates two files, hello.html and hello.js. hello.js is byte-for-byte identical to a.out.js:

emcc hello.c -o hello.html

➜  webasm-study md5 a.out.js
MD5 (a.out.js) = d7397f44f817526a4d0f94bc85e46429
➜  webasm-study md5 hello.js
MD5 (hello.js) = d7397f44f817526a4d0f94bc85e46429

Then open hello.html in the browser to see the page.

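Everything generated so far is asm.js. Alon Zakai originally built Emscripten to generate asm.js, so it is no surprise that asm.js is the default output (at least in the 1.37 version used here).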

Of course, an option makes it generate wasm instead, producing three files: hello-wasm.html, hello-wasm.js and hello-wasm.wasm.

emcc hello.c -s WASM=1 -o hello-wasm.html

Opening hello-wasm.html in the browser then fails with TypeError: Failed to fetch. The wasm file is fetched asynchronously, and that fails when the page is opened through file://, so we need to start a server.

npm install -g serve
serve .

Then visit http://localhost:5000/hello-wasm.html and the page works as expected.
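If you would rather not install serve, a small Node.js static file server works too. The detail that matters for wasm is the Content-Type header: `WebAssembly.instantiateStreaming`, which newer loader code may use, rejects .wasm responses not served as `application/wasm`. `createStaticServer` and `contentTypeFor` are hypothetical helpers, and this sketch is meant for local testing only.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// File extension -> Content-Type; .wasm must be served as application/wasm.
const MIME_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
};

function contentTypeFor(filePath) {
  return MIME_TYPES[path.extname(filePath).toLowerCase()] || "application/octet-stream";
}

// Minimal static file server rooted at `root` (local testing only).
function createStaticServer(root) {
  const rootDir = path.resolve(root);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, "http://localhost").pathname);
    } catch {
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    const filePath = path.join(rootDir, urlPath);
    // Refuse anything that resolves outside the served directory.
    if (!filePath.startsWith(rootDir + path.sep)) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
      res.end(data);
    });
  });
}
```

Usage: `createStaticServer(".").listen(5000);`, then open http://localhost:5000/hello-wasm.html as before.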

Calling C/C++ functions

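So far, "Hello, WebAssembly!" has been printed directly by main. But we use WebAssembly for high-performance computing, so the usual approach is to implement a time-consuming function in C/C++, compile it to wasm, and expose it for JS to call.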

Put the following code in add.c:

#include <stdio.h>

int add(int a, int b) {
  return a + b;
}

int main() {
  printf("a + b: %d\n", add(1, 2));
  return 0;
}

There are two ways to expose add to JS.

Exposing the API through a command-line option

emcc -s EXPORTED_FUNCTIONS="['_add']" add.c -o add.js

Note that the function name add must be prefixed with _.
Then we can use it in Node.js like this:

// file node-add.js
const add_module = require('./add.js');
console.log(add_module.ccall('add', 'number', ['number', 'number'], [2, 3]));

Running node node-add.js prints 5. To use it in a web page, run:

emcc -s EXPORTED_FUNCTIONS="['_add']" add.c -o add.html

Then add the following code to the generated add.html:

  <button onclick="nativeAdd()">click</button>
  <script type='text/javascript'>
    function nativeAdd() {
      const result = Module.ccall('add', 'number', ['number', 'number'], [2, 3]);
      alert(result);
    }
  </script>

Click the button and the result appears.

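Module.ccall calls the C/C++ function directly. More often you want a wrapped function that you can call repeatedly from JS, and that is what Module.cwrap provides; see the documentation for details.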

const cAdd = add_module.cwrap('add', 'number', ['number', 'number']);
console.log(cAdd(2, 3));
console.log(cAdd(2, 4));
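One way to picture the difference: ccall resolves the exported function and converts the arguments on every call, while cwrap does the lookup once and hands back an ordinary JS function. Below is a simplified model that handles only all-number signatures, with a plain object standing in for the Emscripten Module. `wrapNumeric` and `fakeModule` are hypothetical. The real cwrap also handles strings, arrays and other types.

```javascript
// Stand-in for an Emscripten Module: exported C functions appear with a leading "_".
const fakeModule = {
  _add: (a, b) => (a + b) | 0,
};

// Simplified cwrap for functions whose arguments and result are all numbers:
// resolve the export once, then return a reusable JS function.
function wrapNumeric(module, name) {
  const cFunction = module["_" + name];
  if (typeof cFunction !== "function") {
    throw new Error(`function ${name} is not exported`);
  }
  return (...args) => cFunction(...args.map(Number));
}

const cAdd = wrapNumeric(fakeModule, "add");
console.log(cAdd(2, 3)); // 5
console.log(cAdd(2, 4)); // 6
```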

Adding EMSCRIPTEN_KEEPALIVE to the function definition

Create a file add2.c:

#include <stdio.h>
#include <emscripten.h>

int EMSCRIPTEN_KEEPALIVE add(int a, int b) {
  return a + b;
}

int main() {
  printf("a + b: %d\n", add(1, 2));
  return 0;
}

Run:

emcc add2.c -o add2.html

As before, add this code to add2.html:

  <button onclick="nativeAdd()">click</button>
  <script type='text/javascript'>
    function nativeAdd() {
      const result = Module.ccall('add', 'number', ['number', 'number'], [2, 3]);
      alert(result);
    }
  </script>

However, clicking the button produces this error:

Assertion failed: the runtime was exited (use NO_EXIT_RUNTIME to keep it alive after main() exits)

You can fix this by calling emscripten_exit_with_live_runtime() in main():

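#include <stdio.h>
#include <emscripten.h>

int EMSCRIPTEN_KEEPALIVE add(int a, int b) {
  return a + b;
}

int main() {
  printf("a + b: %d\n", add(1, 2));
  // Leave main() without shutting down the runtime, so add() stays callable from JS.
  emscripten_exit_with_live_runtime();
  return 0;
}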

Alternatively, pass -s NO_EXIT_RUNTIME=1 on the command line:

emcc add2.c -o add2.js -s NO_EXIT_RUNTIME=1

but that produces a warning:

exit(0) implicitly called by end of main(), but noExitRuntime, so not exiting the runtime (you can use emscripten_force_exit, if you want to force a true shutdown)

So I recommend the first approach.

All the code generated above is asm.js. Adding -s WASM=1 to the compile options generates wasm instead, and it is used in exactly the same way.

Running the time-consuming computation with asm.js and WebAssembly

With the preparation done, let's try using C to speed up the problem from part 1. The code is simple:

// file sum.c
#include <stdio.h>
// #include <emscripten.h>

long sum(long start, long end) {
  long total = 0;
  for (long i = start; i <= end; i += 3) {
    total += i;
  }
  for (long i = start; i <= end; i += 3) {
    total -= i;
  }
  return total;
}

int main() {
  printf("sum(0, 1000000000): %ld\n", sum(0, 1000000000));
  // emscripten_exit_with_live_runtime();
  return 0;
}

Note that when compiling with gcc, the two Emscripten-related lines must be commented out (as above); otherwise it won't compile.

First, let's compile it straight to native code with gcc and see how fast it runs:

➜ webasm-study gcc sum.c
➜ webasm-study time ./a.out
sum(0, 1000000000): 0
./a.out  5.70s user 0.02s system 99% cpu 5.746 total
➜ webasm-study gcc -O1 sum.c
➜ webasm-study time ./a.out
sum(0, 1000000000): 0
./a.out  0.00s user 0.00s system 64% cpu 0.003 total
➜ webasm-study gcc -O2 sum.c
➜ webasm-study time ./a.out
sum(0, 1000000000): 0
./a.out  0.00s user 0.00s system 64% cpu 0.003 total

Optimization clearly makes a huge difference: the optimized code runs in 3 ms!

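Really? Think about it: my for loop runs 1 billion times, each iteration does roughly two additions, two assignments and one comparison, and there are two loops, so that is at least 10 billion operations. (Strictly, with the step of 3 in the code above, each loop runs about 333 million times, so it is closer to 3.3 billion operations; that is still far too many for 3 ms.)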

My Mac Pro has a 2.5 GHz Intel Core i7, so it runs about 2.5 billion clock cycles per second. How could it possibly be this fast? Something must be wrong!?
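Here is a rough estimate that takes sum.c exactly as written: step 3, two loops, about five operations per iteration as counted above. It also assumes, generously and without measurement, that the 2.5 GHz core completes 4 such operations per cycle:

```latex
\begin{aligned}
\text{iterations per loop} &= \left\lfloor \frac{10^9}{3} \right\rfloor + 1 = 333{,}333{,}334 \\
\text{operations} &\approx 2 \times 333{,}333{,}334 \times 5 \approx 3.3 \times 10^{9} \\
\text{time} &\gtrsim \frac{3.3 \times 10^{9}}{4 \times 2.5 \times 10^{9}\,\text{s}^{-1}} \approx 0.33\,\text{s} \gg 3\,\text{ms}
\end{aligned}
```

Even under these optimistic assumptions the work would take about 100 times longer than the measured 3 ms, so the loops cannot be running as written.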

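This reminded me of an article testing Rust performance ( http://ling0322.info/2014/01/20/rust-vs-go-in-code-optimization.html ). It said Rust computed the answer at compile time and wrote the result directly into the compiled code. I wondered whether gcc was doing something similar.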
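Here is one way to see why an optimizer could get away with this. Each loop in sum.c adds up an arithmetic series, which has a closed form. The second loop also subtracts exactly what the first one added, so the result is provably 0 without iterating. We have not inspected what gcc actually emits; if you want to check, `gcc -O2 -S sum.c` writes the assembly to sum.s. The sketch below only demonstrates the arithmetic, and `loopSum` and `closedFormSum` are hypothetical names.

```javascript
// Iterative version of one loop in sum.c: start + (start + step) + ... up to end.
function loopSum(start, end, step) {
  let total = 0;
  for (let i = start; i <= end; i += step) {
    total += i;
  }
  return total;
}

// Closed form of the same arithmetic series: count * (first + last) / 2.
function closedFormSum(start, end, step) {
  if (end < start) return 0;
  const count = Math.floor((end - start) / step) + 1;
  const last = start + (count - 1) * step;
  return (count * (start + last)) / 2;
}

console.log(loopSum(0, 1000, 3), closedFormSum(0, 1000, 3)); // 166833 166833
```

The loop costs time proportional to the range, while the closed form is constant time. A compiler that recognizes the pattern can replace one with the other, or drop both loops once it sees that they cancel.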

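In the Zhihu thread "What are the principles behind GCC's -O1, -O2 and -O3 optimizations?", there is indeed an optimization aimed at for loops, loop-invariant code motion (LICM). So I added some if checks to the code, hoping to "fool" gcc's optimizer.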

#include <stdio.h>
// #include <emscripten.h>

// long EMSCRIPTEN_KEEPALIVE sum(long start, long end) {
long sum(long start, long end) {
  long total = 0;
  for (long i = start; i <= end; i += 1) {
    if (i % 2 == 0 || i % 3 == 1) {
      total += i;
    } else if (i % 5 == 0 || i % 7 == 1) {
      total += i / 2;
    }
  }
  for (long i = start; i <= end; i += 1) {
    if (i % 2 == 0 || i % 3 == 1) {
      total -= i;
    } else if (i % 5 == 0 || i % 7 == 1) {
      total -= i / 2;
    }
  }
  return total;
}

int main() {
  // Note: the label says 1000000000, but the call sums 0..100000000.
  printf("sum(0, 1000000000): %ld\n", sum(0, 100000000));
  // emscripten_exit_with_live_runtime();
  return 0;
}

The timing now looks much more plausible:

➜ webasm-study gcc -O2 sum.c
➜ webasm-study time ./a.out
sum(0, 1000000000): 0
./a.out  0.32s user 0.00s system 99% cpu 0.324 total

OK, now let's compile it to asm.js.

#include <stdio.h>
#include <emscripten.h>

// Note: under Emscripten (asm.js/wasm32) `long` is 32 bits, unlike native 64-bit gcc on macOS.
long EMSCRIPTEN_KEEPALIVE sum(long start, long end) {
// long sum(long start, long end) {
  long total = 0;
  for (long i = start; i <= end; i += 1) {
    if (i % 2 == 0 || i % 3 == 1) {
      total += i;
    } else if (i % 5 == 0 || i % 7 == 1) {
      total += i / 2;
    }
  }
  for (long i = start; i <= end; i += 1) {
    if (i % 2 == 0 || i % 3 == 1) {
      total -= i;
    } else if (i % 5 == 0 || i % 7 == 1) {
      total -= i / 2;
    }
  }
  return total;
}

int main() {
  // Note: the label says 1000000000, but the call sums 0..100000000.
  printf("sum(0, 1000000000): %ld\n", sum(0, 100000000));
  // Keep the runtime alive so sum() can still be called from JS afterwards.
  emscripten_exit_with_live_runtime();
  return 0;
}

Run:

emcc sum.c -o sum.html

Then add this code to sum.html:

  <button onclick="nativeSum()">NativeSum</button>
  <button onclick="jsSumCalc()">JSSum</button>
  <script type='text/javascript'>
    function nativeSum() {
      const t1 = Date.now();
      const result = Module.ccall('sum', 'number', ['number', 'number'], [0, 100000000]);
      const t2 = Date.now();
      console.log(`result: ${result}, cost time: ${t2 - t1}`);
    }
  </script>
  <script type='text/javascript'>
    function jsSum(start, end) {
      let total = 0;
      for (let i = start; i <= end; i += 1) {
        if (i % 2 == 0 || i % 3 == 1) {
          total += i;
        } else if (i % 5 == 0 || i % 7 == 1) {
          total += i / 2;
        }
      }
      for (let i = start; i <= end; i += 1) {
        if (i % 2 == 0 || i % 3 == 1) {
          total -= i;
        } else if (i % 5 == 0 || i % 7 == 1) {
          total -= i / 2;
        }
      }
      return total;
    }
    function jsSumCalc() {
      const N = 100000000; // 100 million iterations in total
      const t1 = Date.now();
      const result = jsSum(0, N);
      const t2 = Date.now();
      console.log(`result: ${result}, cost time: ${t2 - t1}`);
    }
  </script>
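Date.now() has only millisecond resolution, and a single run also includes JIT warm-up. When comparing the two buttons above, it can therefore help to warm up first and take the median of several runs. Below is a minimal sketch; `benchmark` is a hypothetical helper. performance.now() is the standard high-resolution timer in browsers and recent Node.js versions.

```javascript
// High-resolution clock: browsers and Node 16+ expose a global `performance`.
const clock = typeof performance !== "undefined" ? performance : require("perf_hooks").performance;

// Call fn(...args) `warmups` times so the JIT can optimize it, then time `runs` calls.
// Returns the last result and the median duration in ms (upper middle for even counts).
function benchmark(fn, args, runs = 5, warmups = 2) {
  for (let i = 0; i < warmups; i += 1) fn(...args);
  const times = [];
  let result;
  for (let i = 0; i < runs; i += 1) {
    const t0 = clock.now();
    result = fn(...args);
    times.push(clock.now() - t0);
  }
  times.sort((a, b) => a - b);
  return { result, medianMs: times[Math.floor(times.length / 2)] };
}

// In sum.html this could be benchmark(jsSum, [0, 100000000]), or
// benchmark((s, e) => Module.ccall('sum', 'number', ['number', 'number'], [s, e]), [0, 100000000]).
const demo = benchmark((a, b) => a + b, [2, 3]);
console.log(`result: ${demo.result}, median: ${demo.medianMs.toFixed(3)} ms`);
```

The median is less sensitive than the mean to a single slow run caused by garbage collection or a background tab.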

Next, what happens if we compile to WebAssembly instead?

emcc sum.c -o sum.js -s WASM=1

Browser	webassembly	asm.js	js
Chrome61	1300ms	600ms	3300ms
Firefox55	600ms	800ms	700ms
Safari9.1	not supported	2800ms	not tested (Safari 9.1 doesn't support ES6 and I didn't bother rewriting the JS)

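Firefox looks a bit odd: its plain JS is remarkably fast. WebAssembly doesn't look especially fast either. Then I noticed that I hadn't passed the optimization flag -O2 to emcc. Let's try again: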

emcc -O2 sum.c -o sum.js # for asm.js
emcc -O2 sum.c -o sum.js -s WASM=1 # for webassembly

Browser	webassembly -O2	asm.js -O2
Chrome61
