Introduction
Imagine you get a dataset with hundreds
of features (variables) and have little understanding about the domain
the data belongs to. You are expected to identify hidden patterns in the
data, explore and analyze the dataset. And not just that, you have to
find out if there is a pattern in the data – is it signal or is it just
noise?
Does that thought make you uncomfortable?
It made my hands sweat when I came across this situation for the first
time. Do you wonder how to explore a multidimensional dataset? It is one
of the frequently asked question by many data scientists. In this
article, I will take you through a very powerful way to exactly do this.
What about PCA?
By now, some of you would be
screaming “I’ll use PCA for dimensionality reduction and visualization”.
Well, you are right! PCA is definitely a good choice for dimensionality
reduction and visualization for datasets with a large number of
features. But, what if you could use something more advanced than PCA? (If you don’t know PCA, I would strongly recommend to read this article first)
What if you could easily search for a pattern in non-linear style? In this article, I will tell you about a new algorithm called t-SNE (2008), which is much more effective than PCA (1933). I
will take you through the basics of t-SNE algorithm first and then will
walk you through why t-SNE is a good fit for dimensionality reduction
algorithms.
You will also, get hands-on knowledge for using t-SNE in both R and Python.
Read on!
Table of Content
What is t-SNE?
What is dimensionality reduction?
How does t-SNE fit in the dimensionality reduction algorithm space
Algorithmic details of t-SNE
What does t-SNE actually do?
Use cases
t-SNE compared to other dimensionality reduction algorithm
Example Implementations
Hyper parameter tuning
Code
Implementation Time
Hyper parameter tuning
Code
Implementation Time
Interpreting Results
Where and when to use
Common fallacies
1. What is t-SNE?
(t-SNE) t-Distributed Stochastic Neighbor Embedding is
a non-linear dimensionality reduction algorithm used for exploring
high-dimensional data. It maps multi-dimensional data to two or more
dimensions suitable for human observation. With help of the t-SNE
algorithms, you may have to plot fewer exploratory data analysis plots
next time you work with high dimensional data.
链接:
https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/
原文链接:
http://weibo.com/1402400261/Es6dIkb9i?type=comment#_rnd1485171495261