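Column: 考研英语时事阅读 (Current-Affairs Reading for Kaoyan English)
According to statistics, 90% of the passages in the kaoyan (postgraduate entrance exam) English paper come from foreign magazines such as The Economist, Times and Science. Updated continuously.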


考研英语时事阅读 · WeChat official account · Kaoyan (postgraduate entrance exam) · 2017-03-14 06:05







Molecular biology


How to determine a protein’s shape


Only a third of known protein structures are human


Feb 11th 2017

ABOUT 120,000 types of protein molecule have yielded up their structures to science. That sounds a lot, but it isn’t. The techniques, such as X-ray crystallography and nuclear-magnetic resonance (NMR), which are used to elucidate such structures do not work on all proteins. Some types are hard to produce or purify in the volumes required. Others do not seem to crystallise at all—a prerequisite for probing them with X-rays. As a consequence, those structures that have been determined include representatives of less than a third of the 16,000 known protein families. Researchers can build reasonable computer models for around another third, because the structures of these resemble ones already known. For the remainder, however, there is nothing to go on.


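  • X-ray crystallography  n. working out a molecule's structure from the way its crystals diffract X-rays

  • nuclear-magnetic resonance (NMR)  n. a technique that probes structure through the magnetic behaviour of atomic nuclei

  • elucidate  v. to make clear; to explain

  • prerequisite  n. something required beforehand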

In addition to this lack of information about protein families, there is a lack of information about those from the species of most interest to researchers: Homo sapiens. Only a quarter of known protein structures are human. A majority of the rest come from bacteria. This paucity is a problem, for in proteins form and function are intimately related. A protein is a chain of smaller molecules, called amino acids, that is often hundreds or thousands of links long. By a process not well understood, this chain folds up, after it has been made, into a specific and complex three-dimensional shape. That shape determines what the protein does: acting as a channel, say, to admit a chemical into a cell; or as an enzyme to accelerate a chemical reaction; or as a receptor, to receive chemical signals and pass them on to a cell’s molecular machinery. 


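  • Homo sapiens  n. the human species

  • paucity  n. scarcity; lack

  • amino acid  n. [biochemistry] one of the small molecules that link together to form a protein chain

  • three-dimensional  adj. having length, width and depth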

Almost all drugs work by binding to a particular protein in a particular place, thereby altering or disabling that protein’s function. Designing new drugs is easier if binding sites can be identified in advance. But that means knowing the protein’s structure. To be able to predict this from the order of the amino acids in the chain would thus be of enormous value. That is a hard task, but it is starting to be cracked.


Chain gang


One of the leading researchers in the field of protein folding is David Baker of the University of Washington, in Seattle. For the past 20 years he and his colleagues have used increasingly sophisticated versions of a program they call Rosetta to generate various possible shapes for a given protein, and then work out which is most stable and thus most likely to be the real one. In 2015 they predicted the structures of representative members of 58 of the missing protein families. Last month they followed that up by predicting 614 more.

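David Baker of the University of Washington in Seattle is one of the leading figures in protein-folding research. Over the past 20 years his team has used ever more sophisticated versions of the program "Rosetta" to simulate the possible structures of a given protein, and then worked out which is the most stable and therefore closest to the real one. In 2015 they predicted representative structures for members of 58 protein families whose structures were unknown. This January they predicted 614 more.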
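The generate-then-rank idea described above can be illustrated with a deliberately tiny sketch. It is not Rosetta. It swaps in the textbook "HP lattice model", in which each residue is either hydrophobic (H) or polar (P), the chain is laid out on a 2D grid, and a fold counts as more stable the more H residues touch without being neighbours in the chain. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
# Toy "generate every shape, keep the most stable" search on a 2D lattice.
# Assumption: the HP lattice model stands in for a real energy function.

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(sequence, coords):
    """Return minus the number of H-H lattice contacts between residues
    that are not adjacent in the chain (lower means more stable)."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


def enumerate_folds(length):
    """Yield every self-avoiding layout of a chain with `length` residues."""
    def extend(coords):
        if len(coords) == length:
            yield list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # a residue cannot reuse an occupied site
                coords.append(nxt)
                yield from extend(coords)
                coords.pop()

    yield from extend([(0, 0)])


def most_stable_fold(sequence):
    """Score every candidate layout and return the lowest-energy one."""
    return min(enumerate_folds(len(sequence)),
               key=lambda coords: energy(sequence, coords))


if __name__ == "__main__":
    best = most_stable_fold("HPPH")
    print(best, energy("HPPH", best))
```

Even this toy hints at why the real search is expensive: a four-residue chain already has 36 possible layouts, and the count more than doubles with every residue added.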

Even a small protein can fold up into tens of thousands of shapes that are more or less stable. According to Dr Baker, a chain a mere 70 amino acids long—a tiddler in biological terms—has to be folded virtually inside a computer about 100,000 times in order to cover all the possibilities and thus find the optimum. Since it takes a standard microprocessor ten minutes to do the computations needed for a single one of these virtual foldings, even for a protein this small, the project has, for more than a decade, relied on cadging processing power from thousands of privately owned PCs. Volunteers download a version of Dr Baker’s program, called rosetta@home, that runs in the background when a computer is otherwise idle.


  • microprocessor   n.[计]微处理器

  • computation  n.计算
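A quick back-of-the-envelope calculation, using only the two figures quoted above (about 100,000 virtual foldings, ten minutes each), shows why borrowed processing power matters. The machine counts and the assumption that the work divides perfectly across always-available computers are illustrative simplifications, not figures from the article.

```python
# Figures quoted in the article for a 70-amino-acid chain.
FOLDINGS_NEEDED = 100_000
MINUTES_PER_FOLDING = 10


def wall_clock_days(machines: int) -> float:
    """Days needed if the foldings are shared evenly across `machines`.

    Assumption: perfect parallelism and machines that are always available.
    """
    total_minutes = FOLDINGS_NEEDED * MINUTES_PER_FOLDING
    return total_minutes / machines / (60 * 24)


if __name__ == "__main__":
    # Hypothetical pool sizes, chosen only to show the scaling.
    for machines in (1, 1_000, 10_000):
        print(f"{machines:>6} machine(s): {wall_clock_days(machines):8.2f} days")
```

On a single microprocessor the job takes roughly 694 days, or nearly two years. Spread across a thousand idle PCs under these idealised assumptions, it drops to well under a day.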

This “citizen science” has helped a lot. But the real breakthrough, which led to those 672 novel structures, is a shortcut known as protein-contact prediction. This relies on the observation that chain-folding patterns seen in nature bring certain pairs of amino acids close together predictably enough for the fact to be used in the virtual-folding process.


An amino acid has four arms, each connected to a central carbon atom. Two arms are the amine group and the acid group that give the molecule its name. Protein chains form because amine groups and acid groups like to react together and link up. The third is a single hydrogen atom. But the fourth can be any combination of atoms able to bond with the central carbon atom. It is this fourth arm, called the side chain, which gives each type of amino acid its individual characteristics.


One common protein-contact prediction is that, if the side chain of one member of a pair of amino acids brought close together by folding is long, then that of the other member will be short, and vice versa. (Exercise for readers: try translating this sentence.)  In other words, the sum of the two lengths is constant. If you have but a single protein sequence available, knowing this is not much use. Recent developments in genomics, however, mean that the DNA sequences of lots of different species are now available. Since DNA encodes the amino-acid sequences of an organism’s proteins, the composition of those species’ proteins is now known, too. That means slightly different versions, from related species, of what is essentially the same protein can be compared. The latest version of Rosetta does so, looking for co-variation (eg, in this case, two places along the length of the proteins’ chains where a shortening of an amino acid’s side chain in one is always accompanied by a lengthening of it in the other). In this way, it can identify parts of the folded structure that are close together.

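(We look forward to your translations; tomorrow we will post a breakdown of this long, difficult sentence.) In other words, the combined length of the two side chains is constant. Knowing the amino-acid sequence of just one protein is not much help. Recent advances in genomics, however, mean that the DNA sequences of many different species are now available. Because DNA encodes the amino-acid sequences of an organism's proteins, the composition of those species' proteins is known too. That means slightly different versions of what is essentially the same protein, taken from related species, can be compared. The latest version of Rosetta does this, looking for co-variation (in this case, two positions along the chain where a shorter side chain at one is always accompanied by a longer side chain at the other). In this way it can identify parts of the folded structure that lie close together.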

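  • genomics  n. the study of genomes, an organism's complete set of DNA

  • amino-acid  adj. relating to amino acids, the building blocks of proteins

  • organism  n. a living thing

  • Rosetta  n. here, the protein-structure prediction program developed by Dr Baker's group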
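The co-variation search described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Rosetta's actual method: the alignment is invented, side-chain "length" is approximated by the number of non-hydrogen atoms in each side chain, and a plain Pearson correlation stands in for whatever statistic the real program uses. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Assumption: non-hydrogen atom count as a crude proxy for side-chain length.
SIDE_CHAIN_SIZE = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "L": 4, "I": 4, "N": 4, "D": 4, "M": 4, "Q": 5, "E": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}

# Invented alignment: the "same" protein from six related species.
# Positions 1 and 3 are built so that their side-chain sizes always sum to 8.
TOY_ALIGNMENT = ["MGSYL", "MATFL", "MVSKI", "MLTLL", "MFSAI", "MYTGL"]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column never varies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def covarying_pairs(alignment, threshold=-0.9):
    """Return (i, j, r) for position pairs whose side-chain sizes are
    strongly anti-correlated across species (one shrinks as the other grows)."""
    columns = [[SIDE_CHAIN_SIZE[seq[i]] for seq in alignment]
               for i in range(len(alignment[0]))]
    hits = []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if r <= threshold:
            hits.append((i, j, round(r, 3)))
    return hits


if __name__ == "__main__":
    print(covarying_pairs(TOY_ALIGNMENT))
```

With a single sequence every column has one value and no correlation can be measured, which is why the approach depends on having many related species to compare.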

Though it is still early days, the method seems to work. None of the 614 structures Dr Baker modelled most recently has yet been elucidated by crystallography or NMR, but six of the previous 58 have. In each case the prediction closely matched reality. Moreover, when used to “hindcast” the shapes of 81 proteins with known structures, the protein-contact-prediction version of Rosetta got them all right.

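It is still early days, but the method appears to work. None of the 614 structures Dr Baker modelled most recently has yet been confirmed by crystallography or NMR, but six of the earlier 58 have. In each of those cases the predicted structure closely matched the real one. Moreover, when the protein-contact-prediction version of Rosetta was used to "hindcast" 81 proteins whose structures were already known, it got every one right.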

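  • elucidate  v. to make clear; to explain

  • crystallography  n. the study of crystal structure

  • hindcast  v. to test a predictive method on cases whose outcomes are already known

  • protein-contact prediction  n. predicting which amino acids in a chain end up close together once the protein folds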

There is a limitation, though. Of the genomes well-enough known to use for this trick, 88,000 belong to bacteria, the most speciose type of life on Earth. Only 4,000 belong to eukaryotes—the branch of life, made of complex cells, which includes plants, fungi and animals. There are, then, not yet enough relatives of human beings in the mix to look for the co-variation Dr Baker’s method relies on.

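There is a limitation, however. Of the genomes known well enough to be used for this method, 88,000 belong to bacteria, the most species-rich form of life on Earth. Only 4,000 belong to eukaryotes, the branch of life built from complex cells, which includes animals, plants and fungi. So there are not yet enough relatives of humans in the mix to search for the co-variation on which Dr Baker's method depends.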

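  • bacteria  n. (plural of bacterium) single-celled micro-organisms that lack a cell nucleus

  • eukaryotes  n. organisms whose cells have a nucleus

  • fungi  n. (plural of fungus) organisms such as moulds, yeasts and mushrooms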

Others think they have an answer to that problem. They are trying to extend protein-contact prediction to look for relationships between more than two amino acids in a chain. This would reduce the number of related proteins needed to draw structural inferences and might thus bring human proteins within range of the technique. But to do so, you need a different computational approach. Those attempting it are testing out the branch of artificial intelligence known as deep learning.


Linking the links


Deep learning employs pieces of software called artificial neural networks to fossick out otherwise-abstruse patterns. It is the basis of image- and speech-recognition programs, and also of the game-playing programs that have recently beaten human champions at Go and poker.


  • image- and speech-recognition programs  n. software that identifies images and spoken words

  • fossick out  v. to search out; to dig up

  • abstruse  adj. obscure; hard to understand

Jianlin Cheng, of the University of Missouri, in Columbia, who was one of the first to apply deep learning in this way, says such programs should be able to spot correlations between three, four or more amino acids, and thus need fewer related proteins to predict structures. Jinbo Xu, of the Toyota Technological Institute in Chicago, claims to have achieved this already. He and his colleagues published their method in PLOS Computational Biology, in January, and it is now being tested.


If the deep-learning approach to protein folding lives up to its promise, the number of known protein structures should multiply rapidly. More importantly, so should the number that belong to human proteins. That will be of immediate value to drugmakers. It will also help biologists understand better the fundamental workings of cells—and thus what, at a molecular level, it truly means to be alive.


Translation ▍hua外音, Sopia.yao

Review ▍云图

Editing ▍璃儿

Try translating this sentence:

One common protein-contact prediction is that, if the side chain of one member of a pair of amino acids brought close together by folding is long, then that of the other member will be short, and vice versa.

Write your Chinese translation below.




