[The Economist] Do you really understand proteins? | 2017.02.11 | Issue 814

考研英语时事阅读 · WeChat official account · Postgraduate entrance exam · 2017-03-14 06:05

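Introduction

Protein structure refers to the three-dimensional arrangement of a protein molecule. Proteins are an important class of biological macromolecule, composed mainly of carbon, hydrogen, oxygen and nitrogen. All proteins are polymers built by linking together 20 different kinds of amino acid; once incorporated into a protein, these amino acids are called residues. The boundary between proteins and polypeptides is not sharp. Some people, going by the number of residues a functional domain needs, call a chain with fewer than 40 residues a polypeptide or peptide. To carry out its biological function, a protein must fold correctly into a specific conformation, mainly through a large number of non-covalent interactions (hydrogen bonds, ionic bonds, van der Waals forces and hydrophobic interactions). In some proteins, especially secreted ones, disulfide bonds also play a key role in folding. Understanding how a protein works at the molecular level often requires determining its three-dimensional structure. The study of protein structure gave rise to structural biology, which uses techniques such as X-ray crystallography and nuclear magnetic resonance to solve protein structures.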

Molecular biology

How to determine a protein’s shape

Only a quarter of known protein structures are human

Feb 11th 2017

ABOUT 120,000 types of protein molecule have yielded up their structures to science. That sounds a lot, but it isn’t. The techniques, such as X-ray crystallography and nuclear-magnetic resonance (NMR), which are used to elucidate such structures do not work on all proteins. Some types are hard to produce or purify in the volumes required. Others do not seem to crystallise at all—a prerequisite for probing them with X-rays. As a consequence, those structures that have been determined include representatives of less than a third of the 16,000 known protein families. Researchers can build reasonable computer models for around another third, because the structures of these resemble ones already known. For the remainder, however, there is nothing to go on.

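Scientists have worked out the structures of about 120,000 types of protein molecule. That sounds like a lot, but it is not. Techniques for analysing protein structure, such as X-ray crystallography and nuclear magnetic resonance (NMR), do not work on every protein. Some proteins are hard to produce or purify in the quantities required; others will not crystallise at all, which is a precondition for probing them with X-rays. As a result, the proteins whose structures are known include representatives of fewer than a third of the 16,000 known protein families. For roughly another third, researchers can build reasonable computer models because those structures resemble known ones. For the rest, there is nothing to work from.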

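X-ray crystallography [crystallography] a method of determining a molecule's structure from the way its crystal diffracts X-rays
nuclear-magnetic resonance (NMR) [nuclear physics] a technique that probes molecular structure through the response of atomic nuclei in a magnetic field
elucidate v. to make clear; to explain
prerequisite n. something required in advance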

In addition to this lack of information about protein families, there is a lack of information about those from the species of most interest to researchers: Homo sapiens. Only a quarter of known protein structures are human. A majority of the rest come from bacteria. This paucity is a problem, for in proteins form and function are intimately related. A protein is a chain of smaller molecules, called amino acids, that is often hundreds or thousands of links long. By a process not well understood, this chain folds up, after it has been made, into a specific and complex three-dimensional shape. That shape determines what the protein does: acting as a channel, say, to admit a chemical into a cell; or as an enzyme to accelerate a chemical reaction; or as a receptor, to receive chemical signals and pass them on to a cell’s molecular machinery.

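Besides the lack of information about protein families, researchers also know little about the proteins of the species that interests them most: Homo sapiens. Only a quarter of the proteins with known structures are human; most of the rest come from bacteria. Because a protein's form and function are closely linked, this gap is a problem. A protein is a chain of smaller molecules called amino acids, often hundreds or thousands of links long. Through a process that is not yet well understood, the chain folds after it is made into a specific, complex three-dimensional shape. That shape determines the protein's job: serving as a channel that lets a chemical into a cell, as an enzyme that speeds up a chemical reaction, or as a receptor that receives chemical signals and relays them to the cell's molecular machinery.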

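Homo sapiens n. the human species
paucity n. scarcity; lack
amino acid [biochemistry] one of the small molecules that link together to form a protein chain
three-dimensional adj. having length, width and depth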

Almost all drugs work by binding to a particular protein in a particular place, thereby altering or disabling that protein’s function. Designing new drugs is easier if binding sites can be identified in advance. But that means knowing the protein’s structure. To be able to predict this from the order of the amino acids in the chain would thus be of enormous value. That is a hard task, but it is starting to be cracked.

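Almost all drugs work by binding to a particular protein at a particular site, thereby altering or disabling that protein's function. If binding sites can be identified in advance, designing new drugs becomes easier. But that requires knowing the protein's structure. Being able to predict the structure from the order of amino acids in the chain would therefore be enormously valuable. It is a hard task, but progress is being made.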

Chain gang

One of the leading researchers in the field of protein folding is David Baker of the University of Washington, in Seattle. For the past 20 years he and his colleagues have used increasingly sophisticated versions of a program they call Rosetta to generate various possible shapes for a given protein, and then work out which is most stable and thus most likely to be the real one. In 2015 they predicted the structures of representative members of 58 of the missing protein families. Last month they followed that up by predicting 614 more.

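David Baker of the University of Washington, in Seattle, is one of the leading researchers in protein folding. For the past 20 years he and his colleagues have used increasingly sophisticated versions of a program they call Rosetta to generate the possible shapes of a given protein and then work out which is the most stable, and therefore the most likely to be the real one. In 2015 they predicted the structures of representative members of 58 of the missing protein families. In January they followed up by predicting 614 more.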
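
The generate-and-score idea described here can be illustrated with a toy model. The sketch below is not Rosetta and makes no claim about how Rosetta works internally. It assumes a simplified two-dimensional lattice "HP" model, in which each residue is either hydrophobic (H) or polar (P), a fold is a self-avoiding path on a grid, and stability is approximated by counting H-H contacts. The function names, the example sequence and the number of attempts are all illustrative choices.

```python
import random

# The four unit steps allowed on a square grid.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_fold(length, rng):
    """Grow a random self-avoiding path of `length` grid points.

    Returns None if the path walks into a dead end before it is complete.
    """
    path = [(0, 0)]
    occupied = {(0, 0)}
    for _ in range(length - 1):
        x, y = path[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return None
        step = rng.choice(free)
        path.append(step)
        occupied.add(step)
    return path


def energy(sequence, path):
    """Toy energy: -1 for each pair of H residues that sit on neighbouring
    grid points without being neighbours along the chain."""
    index_at = {pos: i for i, pos in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                score -= 1
    return score


def best_fold(sequence, attempts, seed=0):
    """Sample random folds and keep the one with the lowest toy energy."""
    rng = random.Random(seed)
    best_path, best_energy = None, None
    for _ in range(attempts):
        path = random_fold(len(sequence), rng)
        if path is None:
            continue
        e = energy(sequence, path)
        if best_energy is None or e < best_energy:
            best_path, best_energy = path, e
    return best_path, best_energy


path, e = best_fold("HPHPPHHPHH", attempts=2000)
print("lowest toy energy found:", e)
print("fold:", path)
```

Random sampling is the crudest possible search strategy. It is used here only because it keeps the two steps, proposing shapes and ranking them by stability, easy to see.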

Even a small protein can fold up into tens of thousands of shapes that are more or less stable. According to Dr Baker, a chain a mere 70 amino acids long—a tiddler in biological terms—has to be folded virtually inside a computer about 100,000 times in order to cover all the possibilities and thus find the optimum. Since it takes a standard microprocessor ten minutes to do the computations needed for a single one of these virtual foldings, even for a protein this small, the project has, for more than a decade, relied on cadging processing power from thousands of privately owned PCs. Volunteers download a version of Dr Baker’s program, called rosetta@home, that runs in the background when a computer is otherwise idle.

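Even a small protein can fold into tens of thousands of more-or-less stable shapes. According to Dr Baker, a chain only 70 amino acids long, tiny by biological standards, must be folded virtually in a computer about 100,000 times to cover every possibility and find the optimum. A standard microprocessor needs ten minutes for the computations behind a single virtual folding, even for a protein this small, so for more than a decade the project has relied on borrowed processing power from thousands of privately owned PCs. Volunteers download a version of Dr Baker's program, called rosetta@home, which runs in the background when the computer is otherwise idle.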

microprocessor n. [computing] the chip that carries out a computer's central processing
computation n. the act or process of calculating
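
The figures in this paragraph make the case for borrowed computing power easy to check. The sketch below uses only the two numbers the article quotes (about 100,000 virtual foldings, ten minutes each). The function names are illustrative, and the calculation assumes that work splits perfectly across machines that are available around the clock, which real volunteer PCs are not.

```python
FOLDINGS = 100_000        # virtual foldings for a 70-amino-acid chain (from the article)
MINUTES_PER_FOLDING = 10  # time per folding on a standard microprocessor (from the article)


def total_cpu_minutes():
    """Total processor time needed for one small protein."""
    return FOLDINGS * MINUTES_PER_FOLDING


def wall_clock_days(machines):
    """Elapsed days if the work is split evenly over `machines` processors.

    Assumes perfect parallelism and machines that never stop computing.
    """
    return total_cpu_minutes() / machines / (60 * 24)


for machines in (1, 100, 1000):
    print(f"{machines:>5} machine(s): {wall_clock_days(machines):8.2f} days")
```

On these assumptions a single processor would need close to two years for one small protein, while a thousand machines would finish in well under a day.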

This “citizen science” has helped a lot. But the real breakthrough, which led to those 672 novel structures, is a shortcut known as protein-contact prediction. This relies on the observation that chain-folding patterns seen in nature bring certain pairs of amino acids close together predictably enough for the fact to be used in the virtual-folding process.

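This "citizen science" has helped a great deal. But the real breakthrough, which produced those 672 new structures, is a shortcut called protein-contact prediction. It rests on the observation that the folding patterns found in nature bring certain pairs of amino acids close together predictably enough for that fact to be used in virtual folding.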
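
For a rough sense of scale, the 672 new structures can be set against the number of protein families quoted earlier in the article. The sketch below assumes that the families with "nothing to go on" amount to about a third of the 16,000 known families, and that each predicted structure represents a different family. Both are simplifications of the article's wording ("less than a third", "around another third"), so the result is an order-of-magnitude estimate only.

```python
KNOWN_FAMILIES = 16_000           # known protein families (from the article)
remaining = KNOWN_FAMILIES / 3    # assumption: roughly a third have no structure or model

predicted = 58 + 614              # 2015 predictions plus the January batch
share = predicted / remaining     # fraction of the remaining families now covered

print(f"predicted structures: {predicted}")
print(f"share of the unmodelled families: {share:.1%}")
```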

An amino acid has four arms, each connected to a central carbon atom. Two arms are the amine group and the acid group that give the molecule its name. Protein chains form because amine groups and acid groups like to react together and link up. The third is a single hydrogen atom. But the fourth can be any combination of atoms able to bond with the central carbon atom. It is this fourth arm, called the side chain, which gives each type of amino acid its individual characteristics.

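An amino acid has four arms, each attached to a central carbon atom. Two of them are the amine group and the acid group that give the molecule its name. Protein chains form because amine groups and acid groups readily react and link together. The third arm is a single hydrogen atom. The fourth can be any combination of atoms able to bond to the central carbon. This fourth arm, called the side chain, gives each type of amino acid its individual properties.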

One common protein-contact prediction is that, if the side chain of one member of a pair of amino acids brought close together by folding is long, then that of the other member will be short, and vice versa. (Readers: try translating this sentence.) In other words, the sum of the two lengths is constant. If you have but a single protein sequence available, knowing this is not much use. Recent developments in genomics, however, mean that the DNA sequences of lots of different species are now available. Since DNA encodes the amino-acid sequences of an organism’s proteins, the composition of those species’ proteins is now known, too. That means slightly different versions, from related species, of what is essentially the same protein can be compared. The latest version of Rosetta does so, looking for co-variation (eg, in this case, two places along the length of the proteins’ chains where a shortening of an amino acid’s side chain in one is always accompanied by a lengthening of it in the other). In this way, it can identify parts of the folded structure that are close together.

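(The first sentence is left as a translation exercise.) In other words, the combined length of the two side chains is constant. If only a single protein sequence is available, knowing this is of little use. Recent advances in genomics, however, mean that the DNA sequences of many different species are now available. Because DNA encodes the amino-acid sequences of an organism's proteins, the composition of those species' proteins is now known as well. That means slightly different versions of what is essentially the same protein, taken from related species, can be compared. The latest version of Rosetta does this, looking for co-variation (in this case, two positions along the protein chains where a shortening of the side chain at one is always accompanied by a lengthening at the other). In this way it can identify parts of the folded structure that lie close together.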

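genomics n. the study of genomes, the complete DNA sequences of organisms
amino-acid adj. relating to amino acids
organism n. a living thing
Rosetta n. the protein-structure prediction program used by Dr Baker and his colleagues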
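
The co-variation search described in this passage can be sketched on a toy alignment. The code below is an illustration of the idea and not Rosetta's method. It assumes that side-chain size can be stood in for by the number of non-hydrogen atoms in the side chain, uses a made-up alignment of five short sequences, and flags pairs of positions whose sizes are strongly negatively correlated across the sequences. The alignment, the threshold and the function names are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Non-hydrogen atoms in each side chain, used here as a crude proxy for length.
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "T": 3, "V": 3, "P": 3,
    "L": 4, "M": 4, "E": 5, "K": 5, "F": 7, "Y": 8,
}

# Made-up versions of "the same protein" from five related species.
ALIGNMENT = ["MGSPYE", "MASPFE", "MVTPKE", "MLSPLE", "MFTPAE"]


def pearson(xs, ys):
    """Pearson correlation, or None when either column never varies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def compensating_pairs(alignment, threshold=-0.9):
    """Return position pairs where one side chain grows as the other shrinks."""
    columns = [[SIDE_CHAIN_HEAVY_ATOMS[res] for res in col]
               for col in zip(*alignment)]
    hits = []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if r is not None and r <= threshold:
            hits.append((i, j))
    return hits


print(compensating_pairs(ALIGNMENT))
```

In this toy alignment the residues at positions 1 and 4 (counting from zero) always have side-chain sizes that add up to the same total, so that pair is the one a contact predictor of this kind would propose as lying close together in the folded structure. Positions that never change across species carry no co-variation signal and are skipped.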

Though it is still early days, the method seems to work. None of the 614 structures Dr Baker modelled most recently has yet been elucidated by crystallography or NMR, but six of the previous 58 have. In each case the prediction closely matched reality. Moreover, when used to “hindcast” the shapes of 81 proteins with known structures, the protein-contact-prediction version of Rosetta got them all right.

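Although it is still early days, the method appears to work. None of the 614 structures Dr Baker modelled most recently has yet been determined by crystallography or NMR, but six of the earlier 58 have, and in each of those cases the prediction closely matched reality. Moreover, when the protein-contact-prediction version of Rosetta was used to "hindcast" the shapes of 81 proteins with known structures, it got them all right.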

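elucidated v. (past participle) made clear; explained
crystallography n. the study of crystal structure
hindcast v. to test a predictive method on cases whose outcome is already known
protein-contact prediction n. predicting which amino acids end up close together in the folded protein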

There is a limitation, though. Of the genomes well-enough known to use for this trick, 88,000 belong to bacteria, the most speciose type of life on Earth. Only 4,000 belong to eukaryotes—the branch of life, made of complex cells, which includes plants, fungi and animals. There are, then, not yet enough relatives of human beings in the mix to look for the co-variation Dr Baker’s method relies on.

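There is a limitation, though. Of the genomes known well enough to be used for this trick, 88,000 belong to bacteria, the most species-rich kind of life on Earth. Only 4,000 belong to eukaryotes, the branch of life made of complex cells, which includes plants, fungi and animals. So the mix does not yet contain enough relatives of human beings to search for the co-variation on which Dr Baker's method depends.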

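bacteria n. (plural of bacterium) single-celled organisms whose cells have no nucleus
eukaryotes n. organisms whose cells have a nucleus
fungi n. (plural of fungus) the group that includes yeasts, moulds and mushrooms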

Others think they have an answer to that problem. They are trying to extend protein-contact prediction to look for relationships between more than two amino acids in a chain. This would reduce the number of related proteins needed to draw structural inferences and might thus bring human proteins within range of the technique. But to do so, you need a different computational approach. Those attempting it are testing out the branch of artificial intelligence known as deep learning.

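Others think they have an answer to this problem. They are trying to extend protein-contact prediction to look for relationships among more than two amino acids in a chain. That would reduce the number of related proteins needed to infer structure, and might therefore bring human proteins within the technique's reach. Doing so, however, requires a different computational approach. Those attempting it are trying out the branch of artificial intelligence known as deep learning.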

Linking the links

Deep learning employs pieces of software called artificial neural networks to fossick out otherwise-abstruse patterns. It is the basis of image- and speech-recognition programs, and also of the game-playing programs that have recently beaten human champions at Go and poker.

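Deep learning uses pieces of software called artificial neural networks to dig out patterns that would otherwise be obscure. It underlies image- and speech-recognition programs, and also the game-playing programs that recently beat human champions at Go and poker.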

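image- and speech-recognition programs n. software that identifies the content of pictures and of spoken language
fossick out v. to search out; to dig up
abstruse adj. hard to understand; obscure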

Jianlin Cheng, of the University of Missouri, in Columbia, who was one of the first to apply deep learning in this way, says such programs should be able to spot correlations between three, four or more amino acids, and thus need fewer related proteins to predict structures. Jinbo Xu, of the Toyota Technological Institute in Chicago, claims to have achieved this already. He and his colleagues published their method in PLOS Computational Biology, in January, and it is now being tested.

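Jianlin Cheng of the University of Missouri, in Columbia, was one of the first to apply deep learning in this way. He says such programs should be able to spot correlations among three, four or more amino acids, and so need fewer related proteins to predict structures. Jinbo Xu of the Toyota Technological Institute in Chicago claims to have achieved this already. He and his colleagues published their method in PLOS Computational Biology in January, and it is now being tested.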

If the deep-learning approach to protein folding lives up to its promise, the number of known protein structures should multiply rapidly. More importantly, so should the number that belong to human proteins. That will be of immediate value to drugmakers. It will also help biologists understand better the fundamental workings of cells—and thus what, at a molecular level, it truly means to be alive.

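If the deep-learning approach to protein folding delivers on its promise, the number of known protein structures should grow rapidly. More importantly, so should the number that belong to human proteins. That will be of immediate value to drugmakers. It will also help biologists better understand the fundamental workings of cells and, with them, what being alive truly means at the molecular level.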

Translation ▍hua外音, Sopia.yao

Review ▍云图

Editing ▍璃儿

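Audio and English original text are from The Economist.

For the original, please subscribe to the official edition of The Economist.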
