MongoDB Compound Indexes Explained

Fundebug · WeChat Official Account · Front-end · 2018-03-15 09:10


Summary: For MongoDB queries that match on multiple keys, creating a compound index can improve performance significantly.

What is a compound index?

A compound index is a single index built over several keys combined together, which speeds up queries that match on more than one key. A simple example makes this concrete.

The students collection looks like this:

db.students.find().pretty()
{
	"_id" : ObjectId("5aa7390ca5be7272a99b042a"),
	"name" : "zhang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042b"),
	"name" : "wang",
	"age" : "15"
}
{
	"_id" : ObjectId("5aa7393ba5be7272a99b042c"),
	"name" : "zhang",
	"age" : "14"
}

Separate indexes have been created on name and on age (_id is indexed automatically):

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1
		},
		"name" : "name_1",
		"ns" : "test.students"
	},
	{
		"v" : 1,
		"key" : {
			"age" : 1
		},
		"name" : "age_1",
		"ns" : "test.students"
	}
]

For a multi-key query, explain() shows how the query is executed (only winningPlan is kept here):

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "filter":
    {
        "name":
        {
            "$eq": "zhang"
        }
    },
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "age": 1
        },
        "indexName": "age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

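According to winningPlan, the query runs in two stages, IXSCAN followed by FETCH. IXSCAN is the index scan, which here uses the age index. FETCH retrieves the documents that the index entries point to, and it has to filter them by name as it does so.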

Now create a compound index on name and age:

db.students.createIndex({name:1,age:1})

db.students.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"name" : 1,
			"age" : 1
		},
		"name" : "name_1_age_1",
		"ns" : "test.students"
	}
]

With the compound index in place, the same query is executed differently:

db.students.find({name:"zhang",age:"14"}).explain()
"winningPlan":
{
    "stage": "FETCH",
    "inputStage":
    {
        "stage": "IXSCAN",
        "keyPattern":
        {
            "name": 1,
            "age": 1
        },
        "indexName": "name_1_age_1",
        "isMultiKey": false,
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 1,
        "direction": "forward",
        "indexBounds":
        {
            "name": [
                "[\"zhang\", \"zhang\"]"
            ],
            "age": [
                "[\"14\", \"14\"]"
            ]
        }
    }
}

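According to winningPlan, the order of the stages is unchanged: IXSCAN, then FETCH. This time, however, IXSCAN uses the compound index on name and age, so FETCH only retrieves the documents the index points to and no longer needs a filter.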

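The data set in this example is too small to show any difference. With a large data set, though, when IXSCAN returns many index entries, filtering during FETCH becomes very expensive. The next section looks at a real case.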
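To make the cost difference visible without a large database, here is a minimal sketch in plain JavaScript. It is a toy model, not how MongoDB works internally: a Map stands in for the B-tree index, and the helper names (buildStudents, buildIndex, runQuery) and the 1000 synthetic documents are assumptions made up for illustration. It counts how many documents the FETCH step has to examine when only age is indexed, compared with when name and age are indexed together.

```javascript
// Toy model only: a Map stands in for MongoDB's B-tree index.
// All helper names and the synthetic data are hypothetical.

// Generate synthetic students: names cycle over 4 values, ages over "10".."19".
function buildStudents(count) {
  const names = ["zhang", "wang", "li", "zhao"];
  const docs = [];
  for (let i = 0; i < count; i++) {
    docs.push({ _id: i, name: names[i % names.length], age: String(10 + (i % 10)) });
  }
  return docs;
}

// Build an index as Map<serialized key, array of _id>.
// keyFields decides which fields make up the index key.
function buildIndex(docs, keyFields) {
  const index = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keyFields.map((f) => doc[f]));
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

// IXSCAN: look up matching ids by index key.
// FETCH: load each document and apply any condition the index did not cover.
function runQuery(docs, index, keyFields, query) {
  const key = JSON.stringify(keyFields.map((f) => query[f]));
  const ids = index.get(key) || [];
  const residual = Object.keys(query).filter((f) => !keyFields.includes(f));
  let docsExamined = 0;
  let nReturned = 0;
  for (const id of ids) {
    const doc = docs[id]; // in this toy model _id equals the array position
    docsExamined++;
    if (residual.every((f) => doc[f] === query[f])) nReturned++;
  }
  return { keysExamined: ids.length, docsExamined, nReturned };
}

const docs = buildStudents(1000);
const query = { name: "zhang", age: "14" };

const single = runQuery(docs, buildIndex(docs, ["age"]), ["age"], query);
const compound = runQuery(docs, buildIndex(docs, ["name", "age"]), ["name", "age"], query);

console.log(single);   // { keysExamined: 100, docsExamined: 100, nReturned: 50 }
console.log(compound); // { keysExamined: 50, docsExamined: 50, nReturned: 50 }
```

In this model the age-only index makes FETCH load 100 documents and throw half of them away, while the compound index loads exactly the 50 that match. On a real collection each of those extra loads may be a disk read, which is where the time goes.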

Locating a MongoDB performance problem

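As the volume of incoming error data keeps growing, Fundebug has now processed 350 million error events in total. This puts constant performance pressure on our services, and on the MongoDB cluster in particular.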

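On a production database, enabling the profiler records MongoDB performance data. After running the following command, every database read or write that takes longer than 1s is recorded.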

db.setProfilingLevel(1,1000)

Querying the data recorded by the profiler shows that one query on the events collection is very slow:

db.system.profile.find().pretty()
{
	"op" : "command",
	"ns" : "fundebug.events",
	"command" : {
		"count" : "events",
		"query" : {
			"createAt" : {
				"$lt" : ISODate("2018-02-05T20:30:00.073Z")
			},
			"projectId" : ObjectId("58211791ea2640000c7a3fe6")
		}
	},
	"keyUpdates" : 0,
	"writeConflicts" : 0,
	"numYield" : 1414,
	"locks" : {
		"Global" : {
			"acquireCount" : {
				"r" : NumberLong(2830)
			}
		},
		"Database" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		},
		"Collection" : {
			"acquireCount" : {
				"r" : NumberLong(1415)
			}
		}
	},
	"responseLength" : 62,
	"protocol" : "op_query",
	"millis" : 28521,
	"execStats" : {

	},
	"ts" : ISODate("2018-03-07T20:30:59.440Z"),
	"client" : "192.168.59.226",
	"allUsers" : [ ],
	"user" : ""
}

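The events collection holds hundreds of millions of documents, so a slow count is not entirely surprising. Still, the profile data shows that this query took 28.5s, which is unreasonably long. In addition, numYield is as high as 1414, which is probably the direct cause of the slowness. The MongoDB documentation defines numYield as follows: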

The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation.

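This suggests that much of the time went to reading from disk, and that it did so many times. A reasonable guess is that the problem lies with the indexes.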

Use explain() to analyze the query (only executionStats is kept):

db.events.explain("executionStats").count({"projectId" : ObjectId("58211791ea2640000c7a3fe6"),createAt:{"$lt" : ISODate("2018-02-05T20:30:00.073Z")}})
"executionStats":
{
    "executionSuccess": true,
    "nReturned": 20853,
    "executionTimeMillis": 28055,
    "totalKeysExamined": 28338,
    "totalDocsExamined": 28338,
    "executionStages":
    {
        "stage": "FETCH",
        "filter":
        {
            "createAt":
            {
                "$lt": ISODate("2018-02-05T20:30:00.073Z")
            }
        },
        "nReturned": 20853,
        "executionTimeMillisEstimate": 27815,
        "works": 28339,
        "advanced": 20853,
        "needTime": 7485,
        "needYield": 0,
        "saveState": 1387,
        "restoreState": 1387,
        "isEOF": 1,
        "invalidates": 0,
        "docsExamined": 28338,
        "alreadyHasObj": 0,
        "inputStage":
        {
            "stage": "IXSCAN",
            "nReturned": 28338,
            "executionTimeMillisEstimate": 30,
            "works": 28339,
            "advanced": 28338,
            "needTime": 0,
            "needYield": 0,
            "saveState": 1387,
            "restoreState"
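
The output breaks off here, but the figures it does show already say a lot. As a rough sketch, the plain JavaScript below copies those figures into an object and derives three numbers from them. The object layout and the names fetchMillisEstimate, ixscanMillisEstimate and summarize are hypothetical labels chosen for this example, not fields or functions that MongoDB provides.

```javascript
// Figures copied from the executionStats excerpt above; nothing is measured here.
// The property names for the two stage estimates are hypothetical labels.
const stats = {
  nReturned: 20853,
  totalKeysExamined: 28338,
  totalDocsExamined: 28338,
  fetchMillisEstimate: 27815, // executionTimeMillisEstimate reported for FETCH
  ixscanMillisEstimate: 30,   // executionTimeMillisEstimate reported for IXSCAN
};

// Derive how many fetched documents the createAt filter discarded,
// and how much reported FETCH time each examined document accounts for.
function summarize(s) {
  const discarded = s.totalDocsExamined - s.nReturned;
  return {
    discarded,
    discardedShare: discarded / s.totalDocsExamined,
    fetchMillisPerDoc: s.fetchMillisEstimate / s.totalDocsExamined,
  };
}

const summary = summarize(stats);
console.log(summary.discarded);                             // 7485
console.log((summary.discardedShare * 100).toFixed(1) + "%"); // 26.4%
console.log(summary.fetchMillisPerDoc.toFixed(2) + " ms");    // 0.98 ms
```

Read this way, the index scan is reported at about 30 ms, while FETCH is reported at roughly 1 ms per examined document, and about a quarter of the documents it loads are then rejected by the createAt filter. That is consistent with the earlier suspicion that the time is going into reading documents from disk rather than into the index itself.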
