Can a few hundred lines of code really build a Baidu-style search engine?

Java极客技术 · WeChat official account · 2020-10-20 07:30

Main text


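Hello everyone, I'm 鸭血粉丝, though everybody calls me 阿粉. Search engines are surely nothing new to you: ElasticSearch, which we use all the time in our projects, is one of them, and it is indispensable in our logging systems. ELK as a whole is pretty much standard kit for operations teams. Under the hood, ElasticSearch, like many of today's open-source search engines, is built on Lucene.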

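I recently ran into a requirement where the data volume was not large enough to justify ElasticSearch, and I did not want to deploy a separate cluster either, so I decided to build a simple search service directly on Lucene. Let's walk through it together.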

Background

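**Lucene** is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation. It offers a simple yet powerful API for full-text indexing and searching, and it is currently the most popular free information-retrieval library for Java.

The explanation above comes from Wikipedia. All we need to know is that Lucene can do full-text indexing and searching. "Index" here is a verb: we index documents, articles, files, or other data so that they are recorded in a searchable form, and once that is done, queries become fast.

The word "index" is sometimes a verb, meaning that we index data, and sometimes a noun; which one is meant depends on the context. The alphabetical table at the front of the Xinhua Dictionary and the table of contents at the front of a book are both, in essence, indexes.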
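
To make the verb/noun distinction concrete, here is a toy inverted index in plain Java. It is only a sketch of the idea and not how Lucene is implemented: the class name `InvertedIndexSketch` and its methods are made up for illustration, and the tokenizer simply splits on non-word characters, so it only handles ASCII words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, for illustration): word -> IDs of documents containing it. */
public class InvertedIndexSketch {

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** "Index" as a verb: record which words occur in the text. Returns the new document ID. */
    public int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** "Index" as a noun: one map lookup per query word, no scan over the stored texts. */
    public Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is a full-text search library");
        index.add("ElasticSearch is built on Lucene");
        index.add("A dictionary has an index of letters");
        System.out.println(index.search("lucene")); // [0, 1]
        System.out.println(index.search("index"));  // [2]
    }
}
```

The work is done once, at indexing time; a query afterwards is a single hash lookup, which is the same trade-off the dictionary's letter table makes.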

Integration

Adding the dependencies

First we create a Spring Boot project and add the following to the pom file. The Lucene version I use here is 7.2.1.

<properties>
    <lucene.version>7.2.1</lucene.version>
</properties>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>${lucene.version}</version>
</dependency>

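Indexing data

Before we can search with Lucene we need to index some files, and then look them up by keyword. Below we simulate the whole process. For convenience we generate some mock data here; real data would normally be loaded from a database or from files. The approach is as follows:

Generate several entity records;
Map the entity records to Lucene documents;
Index the documents;
Query the documents by keyword.

As the first step, we create an entity like this: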

import lombok.Data;

@Data
public class ArticleModel {
    private String title;
    private String author;
    private String content;
}
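
The indexing utility that follows maps an entity to Lucene fields through a `ReflectionUtils` helper that this article does not show. As a rough idea of what such a helper might look like, here is a minimal sketch in plain Java. The class `ReflectionSketch`, the method bodies and the return types are assumptions, not the author's code; only the two method names are taken from the calls in the utility class. The nested `Article` class is a Lombok-free copy of the entity so that the sketch runs on its own.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the ReflectionUtils helper; not the author's implementation. */
public class ReflectionSketch {

    /** Field name -> declared type, for the instance fields declared directly on the class. */
    public static Map<String, Class<?>> getClassFields(Class<?> type) {
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (isInstanceField(field)) {
                result.put(field.getName(), field.getType());
            }
        }
        return result;
    }

    /** Field name -> current value of that field on the given instance. */
    public static Map<String, Object> getClassFieldsValues(Object model) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field field : model.getClass().getDeclaredFields()) {
            if (!isInstanceField(field)) {
                continue;
            }
            field.setAccessible(true); // the entity's fields are private
            try {
                result.put(field.getName(), field.get(model));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return result;
    }

    /** Skips static fields and compiler-generated ones. */
    private static boolean isInstanceField(Field field) {
        return !Modifier.isStatic(field.getModifiers()) && !field.isSynthetic();
    }

    /** Plain-Java copy of the entity, used only by this sketch. */
    public static class Article {
        private final String title;
        private final String author;
        private final String content;

        public Article(String title, String author, String content) {
            this.title = title;
            this.author = author;
            this.content = content;
        }
    }

    public static void main(String[] args) {
        Article article = new Article("Lucene in practice", "afen", "Indexing and searching with Lucene");
        System.out.println(getClassFields(Article.class));
        System.out.println(getClassFieldsValues(article));
    }
}
```

Note that `getDeclaredFields()` does not return inherited fields, so an entity with a superclass would need an extra loop over `getSuperclass()`. Calling `toString()` on `String.class` yields `class java.lang.String`, which is why taking the part after the last dot gives a simple type name such as `String`.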

Next we write a utility class for indexing data. The code is as follows:

// Besides Lucene, this class needs Lombok, commons-collections and commons-lang on the classpath.
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// @Slf4j generates the "log" field used by the methods below.
@Slf4j
public class LuceneIndexUtil {

    private static String INDEX_PATH = "/opt/lucene/demo";
    private static IndexWriter writer;

    public static LuceneIndexUtil getInstance() {
        return SingletonHolder.luceneUtil;
    }

    private static class SingletonHolder {
        public final static LuceneIndexUtil luceneUtil = new LuceneIndexUtil();
    }

    private LuceneIndexUtil() {
        this.initLuceneUtil();
    }

    private void initLuceneUtil() {
        try {
            Directory dir = FSDirectory.open(Paths.get(INDEX_PATH));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            writer = new IndexWriter(dir, iwc);
        } catch (IOException e) {
            // Pass the exception along so the cause shows up in the log.
            log.error("create luceneUtil error", e);
            if (null != writer) {
                try {
                    writer.close();
                } catch (IOException ioException) {
                    ioException.printStackTrace();
                } finally {
                    writer = null;
                }
            }
        }
    }

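    /**
     * Index a single document.
     *
     * @param doc the document to index
     * @throws IOException on I/O failure
     */
    public void addDoc(Document doc) throws IOException {
        if (null != doc) {
            writer.addDocument(doc);
            // Commit only; the shared static writer stays open so later calls still work.
            writer.commit();
        }
    }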

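    /**
     * Index a single entity.
     *
     * @param model the entity to index
     * @throws IOException on I/O failure
     */
    public void addModelDoc(Object model) throws IOException {
        Document document = new Document();
        // Pass the instance itself (not its Class) so the field values are read from the model.
        List<Field> fields = luceneField(model);
        fields.forEach(document::add);
        writer.addDocument(document);
        // Commit only; the shared static writer stays open so later calls still work.
        writer.commit();
    }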

    /**
     * Index a list of entities.
     *
     * @param objects the entities to index
     * @throws IOException on I/O failure
     */
    public void addModelDocs(List<?> objects) throws IOException {
        if (CollectionUtils.isNotEmpty(objects)) {
            List<Document> docs = new ArrayList<>();
            objects.forEach(o -> {
                Document document = new Document();
                List<Field> fields = luceneField(o);
                fields.forEach(document::add);
                docs.add(document);
            });
            writer.addDocuments(docs);
            // Commit so the new documents are visible to readers opened afterwards.
            writer.commit();
        }
    }

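    /**
     * Delete all documents. The deletion only becomes visible to newly opened
     * readers after the next commit.
     *
     * @throws IOException on I/O failure
     */
    public void delAllDocs() throws IOException {
        writer.deleteAll();
    }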

    /**
     * Index a list of documents.
     *
     * @param docs the documents to index
     * @throws IOException on I/O failure
     */
    public void addDocs(List<Document> docs) throws IOException {
        if (CollectionUtils.isNotEmpty(docs)) {
            long startTime = System.currentTimeMillis();
            writer.addDocuments(docs);
            writer.commit();
            log.info("Indexed {} documents in {} ms", docs.size(), (System.currentTimeMillis() - startTime));
        } else {
            log.warn("The list of documents to index is empty");
        }
    }

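    /**
     * Read the entity's field types via its class object and map each field
     * to a Lucene Field.
     *
     * @param modelObj the entity instance
     * @return the list of mapped Lucene fields
     */
    public List<Field> luceneField(Object modelObj) {
        // ReflectionUtils is a helper that is not shown in this article; the wildcard
        // map types are an assumption that works for any Map keyed by field name.
        Map<String, ?> classFields = ReflectionUtils.getClassFields(modelObj.getClass());
        Map<String, ?> classFieldsValues = ReflectionUtils.getClassFieldsValues(modelObj);

        List<Field> fields = new ArrayList<>();
        for (String key : classFields.keySet()) {
            Field field;
            // Simple type name, e.g. "String" from "class java.lang.String".
            String dataType = StringUtils.substringAfterLast(classFields.get(key).toString(), ".");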
            switch (dataType) {
                case "Integer":
                    field = new IntPoint(key, (Integer) classFieldsValues.get(key));
                    break;
                case "Long":
                    field = new LongPoint(key, (Long) classFieldsValues.get(key));
                    break;
                case "Float":
                    field = new FloatPoint(key, (Float) classFieldsValues.get(key));
                    break;
                case "Double":
                    field = new DoublePoint(key, (Double) classFieldsValues.get(key));
                    break;
                case "String":
                    String string = (String) classFieldsValues.get(key);
                    if (StringUtils.isNotBlank(string)) {
                        if (string.length() <= 1024) {
                            field = new StringField(key, (String) classFieldsValues.get(key), Field.Store.YES);
                        } else
