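Column: Machine Learning Research Society (机器学习研究会)
The Machine Learning Research Society is a student organization under the Peking University Center for Big Data and Machine Learning Innovation. It aims to build a platform where machine learning practitioners can exchange ideas. Besides sharing timely news from the field, the society hosts talks by industry leaders and top academics, salons with senior researchers, the "real data" innovation competition, and other events.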

[Recommended] (R/Python) A Practical Guide to the t-SNE Clustering Algorithm

Machine Learning Research Society · WeChat Official Account · AI · 2017-01-23 20:11



 

Reposted from: 爱可可-爱生活

Introduction

Imagine you get a dataset with hundreds of features (variables) and have little understanding of the domain the data belongs to. You are expected to identify hidden patterns in the data, and to explore and analyze the dataset. And not just that: you have to find out whether there is a pattern in the data at all. Is it signal, or is it just noise?

Does that thought make you uncomfortable? It made my hands sweat when I came across this situation for the first time. Do you wonder how to explore a multidimensional dataset? It is one of the questions data scientists ask most often. In this article, I will take you through a very powerful way to do exactly this.

What about PCA?

By now, some of you would be screaming, "I'll use PCA for dimensionality reduction and visualization." Well, you are right! PCA is definitely a good choice for dimensionality reduction and visualization of datasets with a large number of features. But what if you could use something more advanced than PCA? (If you don't know PCA, I strongly recommend reading this article first.)
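As a quick refresher, PCA needs only a few lines: center the data, take its singular value decomposition, and project onto the leading right-singular vectors. The sketch below is an illustration, not code from the original article; the function name `pca` and the toy data are made up for the example. Note that the projection is a single matrix product, so PCA can only recover linear structure.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centred features
    # Rows of Vt are the principal axes, sorted by decreasing singular value,
    # i.e. by decreasing variance explained.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T  # linear projection
    variance = S ** 2 / (X.shape[0] - 1)
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, explained_ratio


# Toy data: 200 points in 10 dimensions whose variance is dominated by one axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] *= 5.0
scores, ratio = pca(X, n_components=2)
print(scores.shape, ratio.round(3))
```

`explained_ratio` reports the share of total variance each kept component retains, which is a quick check of how much a 2-D PCA plot can actually show.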

What if you could easily search for non-linear patterns? In this article, I will tell you about a newer algorithm called t-SNE (2008), which, unlike the linear PCA (1933), can capture non-linear structure in the data. I will first take you through the basics of the t-SNE algorithm and then walk you through why t-SNE is a good fit for dimensionality reduction.

You will also get hands-on experience using t-SNE in both R and Python.

Read on!

 

Table of Contents

  1. What is t-SNE?
  2. What is dimensionality reduction?
  3. How does t-SNE fit in the dimensionality reduction algorithm space?
  4. Algorithmic details of t-SNE
    • Algorithm
    • Time and space complexity
  5. What does t-SNE actually do?
  6. Use cases
  7. t-SNE compared to other dimensionality reduction algorithms
  8. Example implementations
    • In R
      • Hyperparameter tuning
      • Code
      • Implementation time
      • Interpreting results
    • In Python
      • Hyperparameter tuning
      • Code
      • Implementation time
  9. Where and when to use
    • Data Scientist
    • Machine Learning Competition Enthusiast
    • Student
  10. Common fallacies

1. What is t-SNE?

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction algorithm used for exploring high-dimensional data. It maps multi-dimensional data to a low-dimensional space, typically two or three dimensions, that is suitable for visual inspection. With the help of t-SNE, you may need to draw fewer exploratory data analysis plots the next time you work with high-dimensional data.
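To make the definition concrete, here is a minimal NumPy sketch of exact t-SNE as described by van der Maaten and Hinton (2008): Gaussian affinities in the input space, with each point's bandwidth found by binary search so that its neighborhood has the requested perplexity; a heavy-tailed Student-t kernel in the low-dimensional map; and gradient descent on the KL divergence between the two. It is not the code used in the original article, and it simplifies the published optimizer (no adaptive per-parameter gains, exact O(n²) computation). The helper names `joint_probabilities` and `tsne`, the learning-rate default and the iteration schedule are assumptions, chosen so that plain gradient descent stays stable on small toy data. Perplexity should be well below the number of points. For real datasets, use an established implementation such as `sklearn.manifold.TSNE` in Python or the `Rtsne` package in R.

```python
import numpy as np


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_tries=50):
    """Symmetric input-space affinities P (hypothetical helper name)."""
    n = X.shape[0]
    sum_x = np.sum(X ** 2, axis=1)
    dist_sq = np.maximum(sum_x[:, None] + sum_x[None, :] - 2.0 * X @ X.T, 0.0)
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    P = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        d = dist_sq[i, mask]
        d = d - d.min()  # shift for numerical stability; cancels after normalising
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_tries):
            p = np.exp(-d * beta)
            p /= p.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, mask] = p  # conditional probabilities p_{j|i}
    P = (P + P.T) / (2.0 * n)  # symmetrise so that P sums to 1
    return np.maximum(P, 1e-12)


def tsne(X, n_components=2, perplexity=30.0, n_iter=500, learning_rate=None, seed=0):
    """Exact-gradient t-SNE with early exaggeration and momentum (sketch)."""
    n = X.shape[0]
    if learning_rate is None:
        learning_rate = max(n / 20.0, 1.0)  # assumption: keeps plain GD stable here
    rng = np.random.default_rng(seed)
    P = joint_probabilities(X, perplexity)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        sum_y = np.sum(Y ** 2, axis=1)
        sq = np.maximum(sum_y[:, None] + sum_y[None, :] - 2.0 * Y @ Y.T, 0.0)
        num = 1.0 / (1.0 + sq)  # Student-t kernel with one degree of freedom
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
        Y = Y - Y.mean(axis=0)  # cost is translation-invariant; keep Y centred
    return Y


# Toy usage: two well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(10.0, 1.0, (20, 5))])
labels = np.repeat([0, 1], 20)
Y = tsne(X, perplexity=5.0)
print(Y.shape)
```

On these toy blobs the map is expected to place each blob in its own region. The O(n²) time and memory per iteration is why practical implementations, including `Rtsne` and scikit-learn's default, use the Barnes-Hut approximation for larger datasets.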


Link:

https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/


Original link:

http://weibo.com/1402400261/Es6dIkb9i?type=comment#_rnd1485171495261
