Column: 机器学习研究会
机器学习研究会 (Machine Learning Research Society) is a student organization under the Innovation Center for Big Data and Machine Learning at Peking University, aiming to build a platform where machine learning practitioners can exchange ideas. Besides sharing timely news from the field, the society hosts talks by industry and academic leaders, salon-style sharing sessions with senior researchers, real-data innovation competitions, and other events.

[Recommended] Dimensionality Reduction Algorithms: Strengths and Weaknesses

机器学习研究会 · WeChat Official Account · AI · 2017-06-01 22:36

Main text




Reposted from: 爱可可-爱生活

Welcome to Part 2 of our tour through modern machine learning algorithms. In this part, we’ll cover methods for Dimensionality Reduction, further broken into Feature Selection and Feature Extraction. In general, these tasks are rarely performed in isolation. Instead, they’re often preprocessing steps to support other tasks.

If you missed Part 1, you can check it out here. It explains our methodology for categorizing algorithms, and it covers the "Big 3" machine learning tasks:

  1. Regression

  2. Classification

  3. Clustering

In this part, we’ll cover:

  1. Feature Selection

  2. Feature Extraction

We will also cover other tasks, such as Density Estimation and Anomaly Detection, in dedicated guides in the future.

The Curse of Dimensionality

In machine learning, “dimensionality” simply refers to the number of features (i.e. input variables) in your dataset.
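In tabular data, dimensionality is simply the number of columns. A tiny sketch with a hypothetical 3-feature dataset:

```python
import numpy as np

# Hypothetical dataset: 5 observations, each described by 3 features
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
    [7.1, 3.0, 5.9],
])

n_observations, dimensionality = X.shape
print(n_observations, dimensionality)  # → 5 3
```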

When the number of features is very large relative to the number of observations in your dataset, certain algorithms struggle to train effective models. This is called the “Curse of Dimensionality,” and it’s especially relevant for clustering algorithms that rely on distance calculations.

A Quora user has provided an excellent analogy for the Curse of Dimensionality, which we'll borrow here:

Let's say you have a straight line 100 yards long and you dropped a penny somewhere on it. It wouldn't be too hard to find. You walk along the line and it takes two minutes.

Now let's say you have a square 100 yards on each side and you dropped a penny somewhere on it. It would be pretty hard, like searching across two football fields stuck together. It could take days.

Now a cube 100 yards across. That's like searching a 30-story building the size of a football stadium. Ugh.

The difficulty of searching through the space gets a lot harder as you have more dimensions.
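The same effect can be seen numerically. In the illustrative sketch below (random uniform points, dimensions chosen arbitrarily), the gap between the nearest and farthest point shrinks relative to the distances themselves as dimensionality grows, which is exactly what hurts distance-based clustering:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_points, n_dims):
    """Relative gap between the farthest and nearest neighbor of one point.

    As n_dims grows, pairwise distances concentrate and this ratio
    shrinks toward zero -- distances become less informative.
    """
    points = rng.random((n_points, n_dims))
    # Euclidean distances from the first point to all the others
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    print(f"{d:>4} dims: contrast = {relative_contrast(500, d):.2f}")
```

Running this shows the contrast dropping by orders of magnitude between 2 and 1000 dimensions.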

In this guide, we'll look at the two primary methods for reducing dimensionality: Feature Selection and Feature Extraction.
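The distinction can be sketched in a few lines. In this minimal, numpy-only illustration (the data, the variance criterion, and k=2 are all assumptions for demonstration), selection keeps a subset of the original columns, while extraction derives entirely new features:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 100 observations, 5 features with varying scales
X = rng.normal(size=(100, 5)) * np.array([0.1, 2.0, 0.5, 3.0, 0.2])

def select_top_variance(X, k):
    """Feature Selection: keep the k ORIGINAL columns with the
    highest variance (a simple filter-style criterion)."""
    top = np.argsort(X.var(axis=0))[-k:]
    return X[:, np.sort(top)]

def pca_transform(X, k):
    """Feature Extraction: derive k NEW features as projections onto
    the top principal components (PCA via SVD on centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

X_sel = select_top_variance(X, 2)   # subset of the original columns
X_ext = pca_transform(X, 2)         # new, combined features
print(X_sel.shape, X_ext.shape)     # → (100, 2) (100, 2)
```

Both reduce the data to 2 dimensions, but only the selected features remain interpretable as the original variables.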



Link:

https://elitedatascience.com/dimensionality-reduction-algorithms


Original post link:

http://weibo.com/1402400261/F5OPo9EmX?from=page_1005051402400261_profile&wvr=6&mod=weibotime&type=comment
