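Column: Machine Learning Research Society (机器学习研究会)

The Machine Learning Research Society is a student organization under Peking University's Center for Big Data and Machine Learning Innovation. It aims to build a platform where machine learning practitioners can exchange ideas. Besides promptly sharing news from the field, the society hosts talks by industry leaders and top academics, salon-style sharing sessions with leading researchers, real-data innovation competitions, and other events.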

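[Learning] A Guide to Building an FAQ Chatbot: A Worked Example of a Text Content Prediction Task

Machine Learning Research Society · WeChat Official Account · AI · 2017-03-04 20:26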

Summary

Reposted from: ArnetMiner

In our previous tutorial on customer support bots, we trained a bot using the Custom Collection API to direct customers to the team member who is best suited to assist them with their problem or query. The bot improved our team’s response times as we no longer had to rely on a human facilitator (who also plays many other roles in our company #startuplife) to do the job. However, we’re generally only able to respond during our office hours of 11am-7pm EST, so there’s still lag for inquiries outside of that period. How can we improve this? Build a bot to answer frequently asked questions, reducing lag time for more customers and ensuring our engineers don’t need to spend more time than necessary away from the products we’re building for you :).

The Task

We’ll conduct a nearest neighbour search in Python, comparing a user’s input question to a list of FAQs. To do this, we’ll use indico’s Text Features API to compute a feature vector for each piece of text, then calculate the distance between each FAQ vector and the vector for the user’s question in 300-dimensional space. Finally, we’ll return the answer to the FAQ that the user’s question is most similar to (if it meets a certain confidence threshold).
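
To make the idea concrete, here is a minimal sketch of the nearest neighbour step, using toy 3-dimensional vectors in place of real 300-dimensional text features. The function names, the choice of cosine similarity, and the 0.8 threshold are assumptions made for illustration; the original tutorial may use a different distance measure and cutoff.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_faq(query_vec, faq_vecs, threshold=0.8):
    """Return (index, similarity) of the closest FAQ vector.

    The index is None when the best similarity is below the threshold.
    """
    best_index, best_sim = None, -1.0
    for i, vec in enumerate(faq_vecs):
        sim = cosine_similarity(query_vec, vec)
        if sim > best_sim:
            best_index, best_sim = i, sim
    if best_sim < threshold:
        return None, best_sim
    return best_index, best_sim


# Toy vectors standing in for Text Features output (hypothetical values).
faq_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
match, score = nearest_faq([0.9, 0.1, 0.0], faq_vecs)
no_match, low_score = nearest_faq([1.0, 1.0, 1.0], faq_vecs)
```

Here the first query lies close to the first FAQ vector, so it is matched; the second is equally far from all three and falls below the threshold, so no answer would be returned.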

Getting Started

First, get the skeleton code from our SuperCell GitHub repo.

You’ll need to install all necessary packages if you don’t have them — texttable and, of course, indicoio.

If you haven’t already set up your indico account, follow our Quickstart Guide. It will walk you through the process of getting your API key and installing the indicoio Python library. If you run into any problems, check the Installation section of the docs. If all else fails, you can also reach out to us through that little chat bubble. Assuming your account is all set up and you’ve installed everything, let’s get started!

Go to the top of your file and import indicoio. Don’t forget to set your API key. There are a number of ways you can do it; I like to put mine in a configuration file.

import indicoio
indicoio.config.api_key = 'YOUR_API_KEY'
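
If you would rather keep the key out of the script, one option is to read it from a small INI file. The sketch below uses Python's standard configparser module; the file name indico.ini, the [auth] section, and the load_api_key helper are hypothetical choices for this example, not part of the indicoio library.

```python
import configparser


def load_api_key(path, section="auth", option="api_key"):
    """Read an API key from an INI-style file.

    Raises FileNotFoundError if the file cannot be read, and a
    configparser error if the section or option is missing.
    """
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError("config file not found: %s" % path)
    return parser.get(section, option)


# Usage (hypothetical file name):
# indicoio.config.api_key = load_api_key("indico.ini")
```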

Using indico’s Text Features API

You’ll need to store your FAQs and their respective answers in a dictionary. For simplicity’s sake, I’ve created a dictionary, faqs, of five questions and answers in the script itself. This will be our starting dataset. We only need to find the text features for the questions and not the answers, so we extract faqs.keys() and then feed that data into our make_feats() function.

from tqdm import tqdm

def make_feats(data):
    """
    Send our text data through the indico API and return each text example's text vector representation
    """
    # split the data into batches of at most 100 examples per API call
    chunks = [data[x:x+100] for x in range(0, len(data), 100)]
    feats = []

    # just a progress bar to show us how much we have left
    for chunk in tqdm(chunks):
        feats.extend(indicoio.text_features(chunk))

    return feats
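
The slicing in make_feats() splits the input into batches of at most 100 items. The standalone sketch below shows the same pattern on its own; chunked is a hypothetical helper name, the questions are placeholder strings, and no API call is made.

```python
def chunked(items, size=100):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 250 placeholder questions split into batches of 100, 100 and 50.
questions = ["question %d" % i for i in range(250)]
batches = chunked(questions)
batch_sizes = [len(batch) for batch in batches]
```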

Next, let’s update the run() function. Save out feats to a Pickle file so you don’t have to keep re-running the Text Features API on the static list of FAQs every time you want to compare a user’s question to it.

import pickle

def run():
    # list() so the questions can be sliced into chunks on Python 3 as well
    data = list(faqs.keys())
    print("FAQ data received. Finding features.")
    feats = make_feats(data)
    with open('faq_feats.pkl', 'wb') as f:
        pickle.dump(feats, f)
    print("FAQ features found!")
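
Once the features are saved, later runs can load them instead of calling the API again. Here is a minimal sketch of that load-or-compute pattern, written for Python 3. The load_or_compute helper and its compute argument are hypothetical names; compute stands in for a call such as make_feats(data).

```python
import os
import pickle


def load_or_compute(path, compute):
    """Return cached features from `path`, computing and saving them if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    feats = compute()
    with open(path, "wb") as f:
        pickle.dump(feats, f)
    return feats


# Usage (assumes make_feats and data are defined as in the tutorial):
# feats = load_or_compute("faq_feats.pkl", lambda: make_feats(data))
```

Note that the cache is never refreshed automatically, so delete the pickle file whenever the list of FAQs changes.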

Link:

https://indico.io/blog/faqs-bot-text-features-api/

Original post:

http://weibo.com/1870858943/Ey6L42E8r?type=comment#_rnd1488630027390
