Python 内存数据库/引擎

Python开发者 · 公众号 · Python · 2016-12-20 21:43

正文

（点击上方公众号，可快速关注）

来源：伯乐在线专栏作者 - Prodesire

点击 → 了解如何加入专栏作者

如需转载，发送「转载」二字查看说明

1 初探

在平时的开发工作中，我们可能会有这样的需求：我们希望有一个内存数据库或者数据引擎，用比较Pythonic的方式进行数据库的操作（比如说插入和查询）。

举个具体的例子，分别向数据库db中插入两条数据，”a=1, b=1″ 和 “a=1, b=2”, 然后想查询a=1的数据可能会使用这样的语句db.query(a=1)，结果就是返回前面插入的两条数据；如果想查询a=1, b=2的数据，就使用这样的语句db.query(a=1, b=2)，结果就返回前面的第二条数据。

那么是否拥有实现上述需求的现成的第三方库呢？几经查找，发现PyDbLite能够满足这样的需求。其实，PyDbLite和Python自带的SQLite均支持内存数据库模式，只是前者是Pythonic的用法，而后者则是典型的SQL用法。

他们具体的用法是这样的：

PyDbLite

import pydblite

# 使用内存数据库

pydb = pydblite . Base ( ':memory:' )

# 创建a,b,c三个字段

pydb . create ( 'a' , 'b' , 'c' )

# 为字段a,b创建索引

pydb . create_index ( 'a' , 'b' )

# 插入一条数据

pydb . insert ( a =- 1 , b = 0 , c = 1 )

# 查询符合特定要求的数据

results = pydb ( a =- 1 , b = 0 )

SQLite

import sqlite3

# 使用内存数据库

con = sqlite3 . connect ( ':memory:' )

# 创建a,b,c三个字段

cur = con . cursor ()

cur . execute ( 'create table test (a char(256), b char(256), c char(256));' )

# 为字段a,b创建索引

cur . execute ( 'create index a_index on test(a)' )

cur . execute ( 'create index b_index on test(b)' )

# 插入一条数据

cur . execute ( 'insert into test values(?, ?, ?)' , ( - 1 , 0 , 1 ))

# 查询符合特定要求的数据

cur . execute ( 'select * from test where a=? and b=?' ,( - 1 , 0 ))

2 pydblite和sqlite的性能

毫无疑问，pydblite的使用方式非常地Pythonic，但是它的效率如何呢？由于我们主要关心的是数据插入和查询速度，所以不妨仅对这两项做一个对比。写一个简单的测试脚本：

import time

count = 100000

def timeit ( func ) :

def wrapper ( * args , ** kws ) :

t = time . time ()

func ( * args )

print time . time () - t , kws [ 'des' ]

return wrapper

@ timeit

def test_insert ( mdb , des = '' ) :

for i in xrange ( count ) :

mdb . insert ( a = i - 1 , b = i , c = i + 1 )

@ timeit

def test_query_object ( mdb , des = '' ) :

for i in xrange ( count ) :

c = mdb ( a = i - 1 , b = i )

@ timeit

def test_sqlite_insert ( cur , des = '' ) :

for i in xrange ( count ) :

cur . execute ( 'insert into test values(?, ?, ?)' , ( i - 1 , i , i + 1 ))

@ timeit

def test_sqlite_query ( cur , des = '' ) :

for i in xrange ( count ) :

cur . execute ( 'select * from test where a=? and b=?' , ( i - 1 , i ))

print '-------pydblite--------'

import pydblite

pydb = pydblite . Base ( ':memory:' )

pydb . create ( 'a' , 'b' , 'c' )

pydb . create_index ( 'a' , 'b' )

test_insert ( pydb , des = 'insert' )

test_query_object ( pydb , des = 'query, object call' )

print '-------sqlite3--------'

import sqlite3

con = sqlite3 . connect ( ':memory:' )

cur = con . cursor ()

cur . execute ( 'create table test (a char(256), b char(256), c char(256));' )

cur . execute ( 'create index a_index on test(a)' )

cur . execute ( 'create index b_index on test(b)' )

test_sqlite_insert ( cur , des = 'insert' )

test_sqlite_query ( cur , des = 'query' )

在创建索引的情况下，10w次的插入和查询的时间如下：

------- pydblite --------

1.14199995995 insert

0.308000087738 query , object call

------- sqlite3 --------

0.411999940872 insert

0.30999994278 query

在未创建索引的情况（把创建索引的测试语句注释掉）下，1w次的插入和查询时间如下：

------- pydblite --------

0.0989999771118 insert

5.15300011635 query , object call

------- sqlite3 --------

0.0169999599457 insert

7.43400001526 query

我们不难得出如下结论：

sqlite的插入速度是pydblite的3-5倍；而在建立索引的情况下，sqlite的查询速度和pydblite相当；在未建立索引的情况下，sqlite的查询速度比pydblite慢1.5倍左右。

3 优化

我们的目标非常明确，使用Pythonic的内存数据库，提高插入和查询效率，而不考虑持久化。那么能否既拥有pydblite的pythonic的使用方式，又同时具备pydblite和sqlite中插入和查询速度快的那一方的速度？针对我们的目标，看看能否对pydblite做一些优化。

阅读pydblite的源码，首先映入眼帘的是对python2和3做了一个简单的区分。给外部调用的Base基于_BasePy2或者_BasePy3，它们仅仅是在__iter__上有细微差异，最终调用的是_Base这个类。

class _BasePy2 ( _Base ) :

def __iter__ ( self ) :

"""Iteration on the records"""

return iter ( self . records . itervalues ())

class _BasePy3 ( _Base ) :

def __iter__ ( self ) :

"""Iteration on the records"""

return iter ( self . records . values ())

if sys . version_info [ 0 ] == 2 :

Base = _BasePy2

else :

Base = _BasePy3

然后看下_Base的构造函数，做了简单的初始化文件的操作，由于我们就是使用内存数据库，所以文件相关的内容完全可以抛弃。

class _Base ( object ) :

def __init__ ( self , path , protocol = pickle . HIGHEST_PROTOCOL , save_to_file = True ,

sqlite_compat = False ) :

"""protocol as defined in pickle / pickle.

Defaults to the highest protocol available.

For maximum compatibility use protocol = 0

"""

self . path = path

"""The path of the database in the file system"""

self . name = os . path . splitext ( os . path . basename ( path ))[ 0 ]

"""The basename of the path, stripped of its extension"""

self . protocol = protocol

self . mode = None

if path == ":memory:" :

save_to_file = False

self . save_to_file = save_to_file

self . sqlite_compat = sqlite_compat

self . fields = []

"""The list of the fields (does not include the internal

fields __id__ and __version__)"""

# if base exists, get field names

if save_to_file and self . exists () :

if protocol == 0 :

_in = open ( self . path ) # don't specify binary mode !

else :

Python 内存数据库/引擎

正文

请到「今天看啥」查看全文