Introduction to PyPy

Python Developers (Python开发者) · WeChat public account · Python · 2016-11-28 22:11


Source: Uche Ogbuji

Link: www.ibm.com/developerworks/cn/opensource/os-pypy-intro/

Overview

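The Python programming language, which reached its 1.0 release in 1994, has become enormously successful since the turn of the millennium. One measure of a language's success is the number of its implementations. The best-known and most widely used Python implementation is called CPython. There are also other successful projects, such as Jython (Python running on the Java™ runtime) and IronPython (Python running on the .NET platform). All of these projects are open source, and Python has always had a large presence in the open source software world.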

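A long-standing goal for Python implementations has been to support pure language design: to "bootstrap" the definition of Python by specifying the language in its own terms rather than in the terms of other languages such as C and Java. The PyPy project is a Python implementation that answers this need. PyPy stands for "Python implemented in Python", although it is actually implemented in a subset of Python called RPython. More precisely, PyPy is a runtime in its own right, into which you can plug any language.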

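PyPy's clean language design makes it well suited to building in low-level optimizers, with significant performance benefits. In particular, PyPy integrates a just-in-time (JIT) compiler. This is a different form of the same technique that famously transformed Java performance as HotSpot, which Sun Microsystems acquired from Animorphic in the late 1990s and incorporated into its own Java implementation, making the language practical for most uses. Python was already practical for many uses, but performance is the most frequent complaint about it. PyPy's tracing JIT compiler has already shown that it can transform the performance of Python programs. Although I would characterize the project as still being in a late beta phase, it is already an important tool for Python programmers and a useful addition to any developer's toolbox.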

In this article I introduce PyPy without assuming that you have an extensive background in Python.

Getting started

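First of all, do not confuse PyPy with PyPI. They are entirely different projects. PyPI, the Python Package Index, is a site and system for obtaining third-party Python packages that supplement the standard library. Once you reach the right PyPy site (see Resources), you will find that the developers have made it easy for most users to start trying PyPy. If you run Linux®, Mac, or Windows® on recent hardware (excluding 64-bit Windows, which is not yet supported), you should be able to download and run a binary package directly.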

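PyPy 1.8 is the latest release at the time of writing. It fully implements Python 2.7.2, meaning that it is compatible with the language features and behavior of that version of CPython. What really draws attention, however, is that it is already much faster than CPython 2.7.2 in many benchmark uses. The following session shows how I installed PyPy on my Ubuntu 11.04 machine. The session was captured with an older PyPy release (1.6), but PyPy 1.8 gives similar results.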

$ cd Downloads/
$ wget https://bitbucket.org/pypy/pypy/downloads/pypy-1.6-linux.tar.bz2
$ cd ../.local
$ tar jxvf ~/Downloads/pypy-1.6-linux.tar.bz2
$ ln -s ~/.local/pypy-1.6/bin/pypy ~/.local/bin/

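Now you need to update your $PATH to include ~/.local/bin/. After installing PyPy, I recommend installing Distribute and Pip as well, to simplify the installation of additional packages. (Although it is not covered in this article, you may also want to use Virtualenv, which is a way of keeping separate, clean Python environments.) The following session shows the Distribute and Pip setup.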

$ wget http://python-distribute.org/distribute_setup.py
$ wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ pypy distribute_setup.py
$ pypy get-pip.py

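You should find the library files installed in ~/.local/pypy-1.8/site-packages/ and the executables in ~/.local/pypy-1.8/bin, so you may want to add the latter to your $PATH (the directory is named pypy-1.6 instead if you installed the older release used in the session above). Also, make sure that you use the pip you just installed rather than a system-wide pip. You can then install the third-party packages used later in this article.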

$ pip install html5lib
$ pip install pyparsing
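
Because a system-wide CPython and pip may sit alongside the PyPy install, it can help to confirm which interpreter actually runs a given script. The following is a minimal sketch that uses only standard-library calls; the helper name describe_interpreter is my own choice for illustration and is not part of PyPy or pip.

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation, version, executable) for the running interpreter."""
    return (
        platform.python_implementation(),  # e.g. 'PyPy' or 'CPython'
        platform.python_version(),         # language version string
        sys.executable,                    # path of the interpreter binary
    )


if __name__ == "__main__":
    impl, version, executable = describe_interpreter()
    print("%s %s at %s" % (impl, version, executable))
```

When the script is started with the pypy command, the first field should read PyPy and the path should point into the directory where you unpacked it. If it reads CPython, the script is being run by a different interpreter than the one installed above.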

Listing 1 shows the output of the PyPy interpreter after invoking the Python "Easter egg" import this.

Listing 1. Sample PyPy output

uche@malatesta:~$ pypy
Python 2.7.1 (d8ac7d23d3ec, Aug 17 2011, 11:51:18)
[PyPy 1.6.0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``__xxx__ and __rxxx__ vs operation
slots: particle quantum superposition kind of fun''
>>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>>

Link tally

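As a simple example of PyPy in use, I will show a program that parses a web page and prints a list of the links found in it. This is the basic idea behind spidering software, which follows the web of links between pages for some purpose.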

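For parsing I chose html5lib, a pure-Python parsing library designed to implement the parsing algorithm of the WHATWG, the group that defines the HTML5 specification. HTML5 is designed for backward compatibility, even with broken web pages, so html5lib also serves as a good general-purpose HTML parsing toolkit. It has been benchmarked on both CPython and PyPy, and it is noticeably faster on PyPy.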
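
A comparison of this kind can be approximated by running the same timing script under each interpreter. The sketch below is an assumption-laden stand-in: count_primes is a hypothetical loop-heavy workload of my own, not the html5lib benchmark mentioned above, and time_workload is simply a thin wrapper around the standard timeit module. It reports no numbers of its own; any figures depend on your machine and interpreter versions.

```python
import platform
import timeit


def count_primes(limit):
    """Count primes below limit by trial division (deliberately loop-heavy)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def time_workload(limit=20000, repeats=3):
    """Return the best wall-clock time in seconds over several single runs."""
    timer = timeit.Timer(lambda: count_primes(limit))
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    best = time_workload()
    print("%s: best of 3 runs %.3f s" % (platform.python_implementation(), best))
```

Save the script once and run it twice, first with python and then with pypy, and compare the two printed lines. Taking the minimum over several runs reduces noise from other processes, and a tracing JIT also needs some warm-up before a hot loop is compiled, so very short workloads may understate any difference.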

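Listing 2 parses a specified web page and prints the links in that page, one per line. You specify the URL of the target page on the command line, for example: pypy listing1.py http://www.ibm.com/developerworks/opensource/

Listing 2. Listing the links in a page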

#!/usr/bin/env pypy

#Import the needed libraries for use
import sys
import urllib2
import html5lib

#List of tuples, each an element/attribute pair to check for links
link_attrs = [
    ('a', 'href'),
    ('link', 'href'),
]

#This function is a generator, a Python construct that can be used as a sequence.
def list_links(url):
    '''
    Given a URL parse the HTML and yield a sequence of link strings
    as they are found on the page.
    '''
    #Open the URL and get back a stream of the content
    stream = urllib2.urlopen(url)
    #Parse the HTML content according to html5lib conventions
    tree_builder = html5lib.treebuilders.getTreeBuilder('dom')
    parser = html5lib.html5parser.HTMLParser(tree=tree_builder)
    doc = parser.parse(stream)

    #In the outer loop, go over each element/attribute set
    for elemname, attr in link_attrs:
        #In the inner loop, go over the matches of the current element name
        for elem in doc.getElementsByTagName(elemname):
            #If the corresponding attribute is found, yield it in sequence
            attrvalue = elem.getAttribute(attr)
            if attrvalue:
                yield attrvalue
    return

#Read the URL to parse from the first command line argument
#Note: Python lists start at index 0, but as in UNIX convention the 0th
#Command line argument is the program name itself
input_url = sys.argv[1]

#Set up the generator by calling it with the URL argument, then iterate
#Over the yielded link strings, printing each
for link in list_links(input_url):
    print link
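
The generator pattern used in the listing can also be tried without network access or third-party packages. The sketch below is a swapped-in variant, not the author's code: it replaces html5lib with the standard library's HTMLParser, which does not implement the HTML5 parsing algorithm and so may cope less well with broken pages. The names LINK_ATTRS, LinkCollector, and list_links_in_text are hypothetical, chosen only for this illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

# Element name -> attribute to check, mirroring the pairs in the listing
LINK_ATTRS = {'a': 'href', 'link': 'href'}


class LinkCollector(HTMLParser):
    """Collect link attribute values in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Skip missing or empty attribute values, as the listing does
            if name == wanted and value:
                self.links.append(value)


def list_links_in_text(html_text):
    """Generator yielding each link string found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    for link in collector.links:
        yield link


if __name__ == "__main__":
    sample = ('<html><head><link href="style.css"></head>'
              '<body><a href="http://example.com/">x</a><a>no href</a></body></html>')
    for link in list_links_in_text(sample):
        print(link)
```

One behavioral difference is worth noting: this variant yields links in the order they appear in the document, whereas the listing groups them by element name because its outer loop runs over the element/attribute pairs.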
