Column name: 机器学习研究会 (Machine Learning Research Society)
The Machine Learning Research Society is a student organization under Peking University's Innovation Center for Big Data and Machine Learning, aiming to build a platform where machine learning practitioners can exchange ideas. Besides sharing timely news from the field, the society also hosts talks by industry leaders and top scholars, academic salon sharing sessions, real-data innovation competitions, and other activities.

[Learning] An Intuitive Understanding of Word Vectors

机器学习研究会 · WeChat Official Account · AI · 2017-06-07 21:36

Main text



Reposted from: 爱可可-爱生活

Introduction

Before we start, have a look at the below examples.

  1. You open Google and search for a news article on the ongoing Champions Trophy, and get hundreds of relevant search results in return.

  2. Nate Silver analysed millions of tweets and correctly predicted the results in 49 of the 50 states in the 2008 U.S. Presidential Election.

  3. You type an English sentence into Google Translate and get an equivalent Chinese translation.

So what do the above examples have in common?

You probably guessed it right – TEXT processing. All three scenarios above deal with a humongous amount of text to perform a range of tasks such as clustering in the Google search example, classification in the second, and machine translation in the third.

Humans can deal with text quite intuitively, but given that millions of documents are generated in a single day, we cannot have humans perform the above three tasks. It is neither scalable nor effective.


So, how do we make today's computers perform clustering, classification, etc. on text data, when we know that they are generally inefficient at handling and processing raw strings or text for any fruitful output?

Sure, a computer can match two strings and tell you whether they are the same or not. But how do we make computers tell you about football or Ronaldo when you search for Messi? How do you make a computer understand that "Apple" in "Apple is a tasty fruit" is a fruit that can be eaten and not a company?

The answer to the above questions lies in creating a representation for words that captures their meanings, semantic relationships, and the different types of contexts they are used in.

And all of these are implemented by using Word Embeddings, i.e. numerical representations of text, so that computers can handle them.
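
To make this concrete, here is a minimal sketch (not part of the original article) of how pre-trained word embeddings let a computer connect "Messi" with "Ronaldo" and football, as in the example above. It assumes Python with gensim installed; the model name "glove-wiki-gigaword-100" and the query words are illustrative assumptions, and any pre-trained vectors exposed through gensim's downloader would behave similarly.

  # Minimal sketch: querying pre-trained word vectors for semantic neighbours.
  # The chosen model is an assumption; any pre-trained embedding available
  # through gensim's downloader can be substituted.
  import gensim.downloader as api

  # Load 100-dimensional GloVe vectors trained on Wikipedia + Gigaword.
  vectors = api.load("glove-wiki-gigaword-100")

  # Each word maps to a dense numeric vector that the computer can compare.
  print(vectors["messi"].shape)                # (100,)

  # Cosine similarity over these vectors surfaces related words, so a
  # search for "messi" can also suggest "ronaldo" or "football".
  print(vectors.most_similar("messi", topn=5))
  print(vectors.similarity("messi", "ronaldo"))

Because similarity here is just arithmetic on vectors, the same machinery scales to millions of documents where manual inspection cannot.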

Below, we will formally see what Word Embeddings are, their different types, and how we can actually implement them to perform tasks like returning efficient Google search results.

 

Table of Contents

  1. What are Word Embeddings?

  2. Different types of Word Embeddings
    2.1  Frequency-based Embedding
    2.1.1 Count Vectors
    2.1.2 TF-IDF
    2.1.3 Co-Occurrence Matrix
    2.2  Prediction-based Embedding
    2.2.1 CBOW
    2.2.2 Skip-Gram

  3. Word Embedding use-case scenarios (what can be done using word embeddings? e.g. similarity, odd one out, etc.)

  4. Using pre-trained Word Vectors

  5. Training your own Word Vectors

  6. End Notes


Link:

https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/


Original post link:

http://m.weibo.cn/1402400261/4115731496708803