Column: 狗厂

[Translation] A Deep Dive into Node Modules: querystring - jsernews

狗厂 · 掘金 · 2018-04-16 02:53

Article

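For a while now, I have wanted to do code walkthroughs of the standard library and commonly used packages in the Node ecosystem. I figure it is time to turn that intention into action and actually write one. So here it is: my first annotated code walkthrough.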

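I want to start with one of the most basic modules in the Node standard library: querystring. querystring is a module that lets users extract values from the query portion of a URL and build query strings from objects of key-value associations. Here is a quick snippet showing the four API functions that querystring exposes: escape, parse, stringify, and unescape.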

> const querystring = require("querystring");
> querystring.escape("key=It's the final countdown");
'key%3DIt\'s%20the%20final%20countdown'
> querystring.parse("foo=bar&abc=xyz&abc=123");
{ foo: 'bar', abc: [ 'xyz', '123' ] }
> querystring.stringify({ foo: 'bar', baz: ['qux', 'quux'], corge: 'i' });
'foo=bar&baz=qux&baz=quux&corge=i'
> querystring.unescape("key%3DIt\'s%20the%20final%20countdown");
'key=It\'s the final countdown'
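The same four functions can also be exercised from a script rather than the REPL. The following is a minimal sketch of a round trip through stringify and parse; the object contents (including the `note` key) are made up for illustration, and the custom `;` and `:` delimiters are only there to show the optional separator arguments.

```javascript
const querystring = require("querystring");

// Build a query string from an object, then parse it back.
const original = { foo: "bar", baz: ["qux", "quux"], note: "a b&c" };
const encoded = querystring.stringify(original);
const decoded = querystring.parse(encoded);

console.log(encoded);      // reserved characters in values are percent-encoded
console.log(decoded.note); // the original value comes back after parsing
console.log(decoded.baz);  // repeated keys are collected into an array

// parse also accepts a custom pair separator and a custom assignment string.
const custom = querystring.parse("foo:bar;abc:xyz", ";", ":");
console.log(custom.foo, custom.abc);
```

Values that contain `&` or spaces survive the round trip because stringify escapes them before joining the pairs.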

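Alright! Let's dive into the interesting part. I will be looking at the code of the querystring module as it stood when I wrote this post. You can find a copy of that version here.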

The first thing that caught my eye was this block of code on lines 47-64.

const unhexTable = [
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 0 - 15
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 16 - 31
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 32 - 47
  +0, +1, +2, +3, +4, +5, +6, +7, +8, +9, -1, -1, -1, -1, -1, -1, // 48 - 63
  -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 64 - 79
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 80 - 95
  -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 96 - 111
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 112 - 127
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 128 ...
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1  // ... 255
];

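What is this gibberish? I searched the whole codebase for the term unhexTable to find out where it is used. Besides the definition, the search returned two other results, on lines 86 and 91 of the codebase. Here is the block of code that contains those references.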

    if (currentChar === 37 /*'%'*/ && index < maxLength) {
      currentChar = s.charCodeAt(++index);
      hexHigh = unhexTable[currentChar];
      if (!(hexHigh >= 0)) {
        out[outIndex++] = 37; // '%'
      } else {
        nextChar = s.charCodeAt(++index);
        hexLow = unhexTable[nextChar];
        if (!(hexLow >= 0)) {
          out[outIndex++] = 37; // '%'
          out[outIndex++] = currentChar;
          currentChar = nextChar;
        } else {
          hasHex = true;
          currentChar = hexHigh * 16 + hexLow;
        }
      }
    }

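All of this happens inside the unescapeBuffer function. After a quick search, I found that unescapeBuffer is called by the unescape function that the module exposes (see line 113). So this is where the unescape action happens in querystring.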

Alright! So what is all the logic around unhexTable? I started reading through the unescapeBuffer function to figure out what it does, beginning at line 67.

var out = Buffer.allocUnsafe(s.length);

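So the function first initializes a Buffer whose length is that of the string passed to the function. At this point I could dive into what allocUnsafe in the Buffer class is doing, but I will save that for another blog post. After that, a few statements initialize the variables that are used later in the function.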

  var index = 0;
  var outIndex = 0;
  var currentChar;
  var nextChar;
  var hexHigh;
  var hexLow;
  var maxLength = s.length - 2;
  // Flag to know if some hex chars have been decoded
  var hasHex = false;

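The next block of code is a while loop that iterates over each character in the string. If the character is a + and the function has been asked to turn + into spaces (decodeSpaces), it writes a space to the output buffer in place of that character.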

  while (index < s.length) {
    currentChar = s.charCodeAt(index);
    if (currentChar === 43 /*'+'*/ && decodeSpaces) {
      out[outIndex++] = 32; // ' '
      index++;
      continue;
    }
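From the outside, the difference between decoding with and without the + conversion can be observed through the public API. This is only a sketch of the observable behavior: it does not show which internal code path handles the + in each call, and the sample strings are invented.

```javascript
const querystring = require("querystring");

// unescape on its own decodes %XX sequences but leaves "+" untouched.
const plain = querystring.unescape("rock+%26+roll");

// parse treats "+" in a value as an encoded space.
const parsed = querystring.parse("q=rock+%26+roll");

console.log(plain);
console.log(parsed.q);
```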

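The second set of if statements checks whether the iterator is at a character sequence that starts with %, which means the characters that follow should represent a hex code. The program fetches the character code of the next character and uses it as the index for a lookup in the unhexTable list. If the lookup returns -1, the function writes a percent sign to the output. If the lookup returns a value greater than -1, the function goes on to parse the following characters as a hex character code. Note that the check is written as !(hexHigh >= 0) rather than a comparison with -1, so it also rejects the undefined that comes back when the character code lies beyond the end of the 256-entry table.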

    if (currentChar === 37 /*'%'*/ && index < maxLength) {
      currentChar = s.charCodeAt(++index);
      hexHigh = unhexTable[currentChar];
      if (!(hexHigh >= 0)) {
        out[outIndex++] = 37; // '%'
      } else {
        nextChar = s.charCodeAt(++index);
        hexLow = unhexTable[nextChar];
        if (!(hexLow >= 0)) {
          out[outIndex++] = 37; // '%'
          out[outIndex++] = currentChar;
          currentChar = nextChar;
        } else {
          hasHex = true;
          currentChar = hexHigh * 16 + hexLow;
        }
      }
    }
    out[outIndex++] = currentChar;
    index++;
  }
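To make the table lookup easier to follow, here is a simplified sketch of the same idea. It is not a line-for-line port of Node's unescapeBuffer: the names `hexTable` and `decodePercent` are hypothetical, the table is built with loops instead of being written out, malformed sequences are handled in a simpler way, and the input is assumed to be ASCII.

```javascript
// Lookup table: index = character code, value = hex digit value, or -1.
const hexTable = new Array(256).fill(-1);
for (let d = 0; d < 10; d++) hexTable[48 + d] = d; // '0'-'9'
for (let d = 0; d < 6; d++) {
  hexTable[65 + d] = 10 + d; // 'A'-'F'
  hexTable[97 + d] = 10 + d; // 'a'-'f'
}

// Hypothetical helper: decode %XX sequences one byte at a time and
// copy anything that is not a well-formed sequence through unchanged.
function decodePercent(s) {
  const out = Buffer.alloc(s.length);
  let outIndex = 0;
  for (let i = 0; i < s.length; i++) {
    let code = s.charCodeAt(i);
    // A '%' can only start a sequence if two more characters follow it.
    if (code === 37 /* '%' */ && i + 2 < s.length) {
      const high = hexTable[s.charCodeAt(i + 1)];
      const low = hexTable[s.charCodeAt(i + 2)];
      // undefined >= 0 is false, so codes above 255 are rejected too.
      if (high >= 0 && low >= 0) {
        code = high * 16 + low;
        i += 2; // skip the two hex digits that were just consumed
      }
    }
    out[outIndex++] = code;
  }
  // Only the first outIndex bytes were written.
  return out.slice(0, outIndex).toString();
}

console.log(decodePercent("key%3Dvalue"));
console.log(decodePercent("100%zz"));
console.log(decodePercent("%E4%B8%AD"));
```

Writing bytes into a Buffer and converting to a string only at the end is what lets a multi-byte UTF-8 character such as the one in the last call be reassembled from three separate %XX sequences.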

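Let's dig a little deeper into this code. If the first character is a valid hex digit, the function uses the character code of the next character as the lookup index into unhexTable. The result is stored in the hexLow variable. If that variable is -1, the sequence is not parsed as a hex character sequence. If it is not -1, the two characters are parsed as a hex character code. The function takes the value of the high (first) digit of the hex code (hexHigh), multiplies it by 16, and adds the value of the low (second) digit (hexLow).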

      } else {
        nextChar = s.charCodeAt(++index);
        hexLow = unhexTable[nextChar];
        if (!(hexLow >= 0)) {
          out[outIndex++] = 37; // '%'
          out[outIndex++] = currentChar;
          currentChar = nextChar;
        } else {
          hasHex = true;
          currentChar = hexHigh * 16 + hexLow;
        }
      }
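As a worked example of that arithmetic, take the sequence %3D. In this sketch parseInt stands in for the unhexTable lookup, which is an assumption made purely to keep the snippet short.

```javascript
// Worked example for the sequence "%3D".
const hexHigh = parseInt("3", 16); // value of the first hex digit
const hexLow = parseInt("D", 16);  // value of the second hex digit
const charCode = hexHigh * 16 + hexLow;

console.log(hexHigh, hexLow, charCode);
console.log(String.fromCharCode(charCode));
```

With hexHigh equal to 3 and hexLow equal to 13, the result is 3 * 16 + 13 = 61, which is the character code of "=".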

The last line of the function confused me for a while.

return hasHex ? out.slice(0, outIndex) : out;

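If we detected a hex sequence in the query, the function returns the slice of the output from 0 to outIndex; otherwise it returns the output as is. This confused me because I assumed that the value of outIndex would equal the length of the output by the end of the function. I could have taken the time to figure out whether that assumption holds, but honestly it is almost midnight and I do not have the energy for that kind of late-night folly. So I ran git blame on the codebase and tried to find out which commits were associated with this particular change. That turned out not to help much. I was hoping for an isolated commit that described why that particular line is the way it is, but the most recent change was part of a larger refactoring of the escape function. The more I look at it, the more certain I am that the ternary operator is not needed here, but I have not yet found reproducible evidence.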
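One way to probe that assumption without tracing the whole function is to compare lengths. Each %XX sequence takes up three characters of input but produces a single byte of output, so a buffer allocated with s.length ends up larger than what is written to it as soon as one sequence is decoded. The sketch below only compares lengths through the public unescape function; it does not call unescapeBuffer, and the input string is made up.

```javascript
const querystring = require("querystring");

const input = "a%20b%26c"; // two %XX sequences
const decoded = querystring.unescape(input);
const decodedBytes = Buffer.byteLength(decoded);

// Bytes that a buffer of input.length would have left unwritten.
const unusedBytes = input.length - decodedBytes;

console.log(input.length, decodedBytes, unusedBytes);
```

This suggests that the slice matters exactly when hasHex is true. When no sequence is decoded, every input character yields one output byte, so outIndex ends up equal to s.length and slicing would return the same contents. Read that way, the ternary looks like a shortcut that skips a redundant slice rather than something needed for correctness.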

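The next function I looked at was parse. The first part of the function does some basic setup. By default the function parses at most 1000 key-value pairs from the query string, but the user can pass a maxKeys value in the options object to change this. The function also uses the unescape function we covered earlier, unless the user supplies a different decoder through the decodeURIComponent option.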

function parse(qs, sep, eq, options) {
  const obj = Object.create(null);

  if (typeof qs !== 'string' || qs.length === 0) {
    return obj;
  }

  var sepCodes = (!sep ? defSepCodes : charCodes(sep + ''));
  var eqCodes = (!eq ? defEqCodes : charCodes(eq + ''));
  const sepLen = sepCodes.length;
  const eqLen = eqCodes.length;

  var pairs = 1000;
  if (options && typeof options.maxKeys === 'number') {
    // -1 is used in place of a value like Infinity for meaning
    // "unlimited pairs" because of additional checks V8 (at least as of v5.4)
    // has to do when using variables that contain values like Infinity. Since
    // `pairs` is always decremented and checked explicitly for 0, -1 works
    // effectively the same as Infinity, while providing a significant
    // performance boost.
    pairs = (options.maxKeys > 0 ? options.maxKeys : -1);
  }

  var decode = QueryString.unescape;
  if (options && typeof options.decodeURIComponent === 'function') {
    decode = options.decodeURIComponent;
  }
  const customDecode = (decode !== qsUnescape);
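Both options can be tried from the public API. In this sketch the query strings and the `underscoreToSpace` decoder are invented for illustration; a real custom decoder would more likely handle a non-UTF-8 encoding.

```javascript
const querystring = require("querystring");

// maxKeys caps how many keys are parsed; the third pair is ignored here.
const limited = querystring.parse("a=1&b=2&c=3", null, null, { maxKeys: 2 });
console.log(Object.keys(limited));

// A maxKeys of 0 is mapped to -1 in the setup code, meaning "no limit".
const unlimited = querystring.parse("a=1&b=2&c=3", null, null, { maxKeys: 0 });
console.log(Object.keys(unlimited));

// Hypothetical decoder that replaces the default unescape.
const underscoreToSpace = (s) => s.replace(/_/g, " ");
const custom = querystring.parse("name=final_countdown", null, null, {
  decodeURIComponent: underscoreToSpace,
});
console.log(custom.name);
```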

The function then iterates over each character in the query string and gets that character's character code.

  var lastPos = 0;
  var sepIdx = 0;
  var eqIdx = 0;
  var key = '';
  var value = '';
  var keyEncoded = customDecode;
  var valEncoded = customDecode;
  const plusChar = (customDecode ? '%20' : ' ');
  var encodeCheck = 0;
  for (var i = 0; i < qs.length; ++i) {
    const code = qs.charCodeAt(i);

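The function then checks whether the character being examined corresponds to the separator between key-value pairs (for example, the "&" character in a query string) and performs some special logic. It checks whether there is a "key=value" segment next to the "&" and tries to extract the corresponding key and value from it (lines 304-347).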
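The overall effect of that scan can be approximated with a much shorter split-based version. This is a sketch under stated assumptions, not how Node implements it: `simpleParse` is a hypothetical helper, it relies on split and indexOf instead of walking character codes, it ignores maxKeys, and it uses the global decodeURIComponent, which throws on malformed input.

```javascript
// Hypothetical helper: a split-based stand-in for the character scan.
function simpleParse(qs, sep = "&", eq = "=") {
  const obj = Object.create(null);
  if (typeof qs !== "string" || qs.length === 0) return obj;

  for (const segment of qs.split(sep)) {
    if (segment.length === 0) continue; // skip empty segments, e.g. "a=1&&b=2"

    // Split on the first assignment string only; a missing one means an empty value.
    const eqIdx = segment.indexOf(eq);
    const rawKey = eqIdx === -1 ? segment : segment.slice(0, eqIdx);
    const rawValue = eqIdx === -1 ? "" : segment.slice(eqIdx + eq.length);

    const key = decodeURIComponent(rawKey.replace(/\+/g, " "));
    const value = decodeURIComponent(rawValue.replace(/\+/g, " "));

    if (!(key in obj)) {
      obj[key] = value; // first occurrence: plain string
    } else if (Array.isArray(obj[key])) {
      obj[key].push(value); // third and later occurrences
    } else {
      obj[key] = [obj[key], value]; // second occurrence: promote to an array
    }
  }
  return obj;
}

const result = simpleParse("foo=bar&abc=xyz&abc=123");
console.log(result.foo, result.abc);
```

Working on character codes, as the real function does, avoids creating intermediate arrays and substrings for every pair, which is presumably why the original takes the longer route.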
