Introduction
Before we start, have a look at the examples below.
You open Google and search for a news article on the ongoing Champions Trophy, and you get hundreds of relevant search results in return.
Nate Silver analysed millions of tweets and correctly predicted the results of 49 out of 50 states in the 2008 U.S. Presidential Elections.
You type a sentence in English into Google Translate and get an equivalent Chinese translation.
So what do the above examples have in common?
You possibly guessed it right – TEXT processing. All three scenarios deal with huge amounts of text to perform different tasks: clustering in the Google search example, classification in the second, and machine translation in the third.
Humans can deal with text quite intuitively, but given that millions of documents are generated every single day, we cannot have humans performing the above three tasks. It is neither scalable nor effective.
So, how do we make today's computers perform clustering, classification, etc. on text data, when we know that they are generally inefficient at handling and processing strings or text for any fruitful output?
Sure, a computer can match two strings and tell you whether they are the same or not. But how do we make a computer tell you about football or Ronaldo when you search for Messi? How do you make a computer understand that “Apple” in “Apple is a tasty fruit” is a fruit that can be eaten and not a company?
The answer to the above questions lies in creating a representation for words that captures their meanings, their semantic relationships, and the different types of contexts they are used in.
All of this is achieved using Word Embeddings – numerical representations of text that computers can handle.
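To get an intuition before the formal treatment, here is a minimal sketch of the idea, assuming the gensim library and its downloadable "glove-wiki-gigaword-50" pre-trained vectors are available (the specific model name and words queried are illustrative assumptions, not part of the original article):

```python
# Sketch: numerical word representations capture relatedness that plain
# string matching cannot. Assumes gensim and internet access to download
# the pre-trained GloVe vectors bundled with gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # pre-trained 50-d GloVe word vectors

# "messi" and "ronaldo" share almost no characters, yet their vectors are
# close because both words appear in similar (football) contexts.
print(vectors.similarity("messi", "ronaldo"))    # relatively high cosine similarity
print(vectors.similarity("messi", "microwave"))  # much lower similarity

# Nearest neighbours of "messi" tend to be other footballers and football terms.
print(vectors.most_similar("messi", topn=5))
```

This is exactly the kind of behaviour that lets a search engine surface football or Ronaldo when you search for Messi.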
Below, we will see formally what Word Embeddings are, look at their different types, and see how we can actually implement them to perform tasks like returning efficient Google search results.
Table of Contents
What are Word Embeddings?
Different types of Word Embeddings
2.1 Frequency based Embedding
2.1.1 Count Vectors
2.1.2 TF-IDF
2.1.3 Co-Occurrence Matrix
2.2 Prediction based Embedding
2.2.1 CBOW
2.2.2 Skip-Gram
Word Embedding use case scenarios (what can be done using word embeddings? e.g. similarity, odd one out, etc.)
Using pre-trained Word Vectors
Training your own Word Vectors
End Notes
Link:
https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
Original post:
http://m.weibo.cn/1402400261/4115731496708803