[Learning] Deep Learning Project Workflow

机器学习研究会 (Machine Learning Research Society)  · WeChat official account  · AI  · 2017-03-06 19:04


Reposted from: 视觉机器人

A workflow for getting started when you take over a deep learning project (Deep Learning Project Workflow).


This document attempts to summarize Andrew Ng's recommended machine learning workflow from his "Nuts and Bolts of Applying Deep Learning" talk at Deep Learning Summer School 2016. Any errors or misinterpretations are my own.

Start Here

  1. Measure human-level performance on your task

  2. Do your training and test data come from the same distribution?

Measuring Human Level Performance

The real goal of measuring human-level performance is to estimate the Bayes Error Rate, the lowest error rate any model could achieve on the task. Knowing your Bayes Error Rate helps you figure out whether your model is underfitting or overfitting your training data. More specifically, it lets us measure 'Bias' (as Ng defines it), which we use later in the workflow.


If Your Training and Test Data Are From the Same Distribution

1. Shuffle and split your data into Train / Dev / Test Sets

Ng recommends a Train / Dev / Test split of approximately 70% / 15% / 15%.
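
The shuffle-and-split step can be sketched as below. This is a minimal illustration, assuming the dataset fits in a Python list; the function name `split_train_dev_test`, its default fractions, and the fixed seed are illustrative choices and do not come from the talk.

```python
import random


def split_train_dev_test(examples, train_frac=0.70, dev_frac=0.15, seed=0):
    """Shuffle `examples` and split them into train / dev / test lists.

    The test set receives whatever remains after train and dev are taken,
    so the three parts cover the whole dataset exactly once.
    """
    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split

    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_train_dev_test(range(100))
print(len(train), len(dev), len(test))
```

Shuffling before splitting matters here: it is what makes the three sets samples of the same distribution.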

2. Measure Your Training Error and Dev Set Error, and Calculate Bias and Variance

Calculate your bias and variance as:

  • Bias = (Training Set Error) - (Human Error)

  • Variance = (Dev Set Error) - (Training Set Error)
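
As a small sketch, the two formulas above translate directly into code. The function name `bias_and_variance` is hypothetical, and the input numbers are illustrative error rates in percent.

```python
def bias_and_variance(human_error, train_error, dev_error):
    """Return (bias, variance) using the definitions above.

    All arguments are error rates on the same scale, e.g. percent.
    """
    bias = train_error - human_error
    variance = dev_error - train_error
    return bias, variance


# Illustrative inputs: human error 1%, training error 5%, dev error 6%.
bias, variance = bias_and_variance(human_error=1.0, train_error=5.0, dev_error=6.0)
print(f"bias={bias:.1f}  variance={variance:.1f}")  # bias=4.0  variance=1.0
```

Here the bias (4 points) is much larger than the variance (1 point), so bias is the problem to address first.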

3. Do You Have High Bias? Fix This First.

An example of high bias:

| Error Type         | Error Rate |
|--------------------|------------|
| Human Error        | 1%         |
| Training Set Error | 5%         |
| Dev Set Error      | 6%         |

Fix high bias before going on to the next step.

4. Do You Have High Variance? Fix High Variance.

An example of high variance:

| Error Type         | Error Rate |
|--------------------|------------|
| Human Error        | 1%         |
| Training Set Error | 2%         |
| Dev Set Error      | 6%         |

Once you fix your high variance then you're done!

If Your Training and Test Data Are Not From the Same Distribution

1. Split Your Data

If your train and test data come from different distributions, make sure at least your dev and test sets are from the same distribution. You can do this by taking your test set and using half as dev and half as test.

Carve out a small portion of your training set (call this Train-Dev) and split your Test data into Dev and Test:

|---------------------------------|-----------------------|
|     Train (Distribution 1)      | Test (Distribution 2) |
|---------------------------------|-----------------------|
|  Train              | Train-Dev |  Dev      |    Test   |
|---------------------------------|-----------------------|
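
The split in the diagram above can be sketched as follows. This is an assumption-laden illustration: the function name `split_mismatched` is hypothetical, and the 10% Train-Dev fraction is a placeholder, since the source only says "a small portion".

```python
import random


def split_mismatched(train_pool, test_pool, train_dev_frac=0.1, seed=0):
    """Split data from two distributions into train / train-dev / dev / test.

    train_pool: examples from the training distribution (Distribution 1)
    test_pool:  examples from the test distribution (Distribution 2)
    train_dev_frac: share of Distribution 1 held out as Train-Dev
                    (a hypothetical default, not taken from the source)
    """
    rng = random.Random(seed)

    d1 = list(train_pool)
    rng.shuffle(d1)
    n_train_dev = int(len(d1) * train_dev_frac)
    train_dev = d1[:n_train_dev]   # small held-out slice of Distribution 1
    train = d1[n_train_dev:]

    d2 = list(test_pool)
    rng.shuffle(d2)
    half = len(d2) // 2
    dev = d2[:half]                # half of Distribution 2 becomes Dev
    test = d2[half:]               # the other half stays as Test
    return train, train_dev, dev, test


# Tag each example with its distribution so the split is easy to inspect.
pool_1 = [("dist1", i) for i in range(1000)]
pool_2 = [("dist2", i) for i in range(200)]
train, train_dev, dev, test = split_mismatched(pool_1, pool_2)
print(len(train), len(train_dev), len(dev), len(test))
```

Train-Dev is never used for training; it exists only so that the error on unseen Distribution 1 data can be compared with the error on Dev.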

2. Measure Your Errors, and Calculate the Relevant Metrics

Calculate these metrics to help know where to focus your efforts:

| Error Type          | Formula                            |
|---------------------|------------------------------------|
| Bias                | (Training Error) - (Human Error)   |
| Variance            | (Train-Dev Error) - (Training Error) |
| Train/Test Mismatch | (Dev Error) - (Train-Dev Error)    |
| Overfitting of Dev  | (Test Error) - (Dev Error)         |
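
A minimal sketch of these four metrics in code. The function name `error_gaps` and the dictionary keys are hypothetical, and the input numbers are illustrative error rates in percent.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Compute the four diagnostic gaps listed above.

    All inputs are error rates on the same scale (e.g. percent).
    """
    return {
        "bias": train - human,
        "variance": train_dev - train,
        "train_test_mismatch": dev - train_dev,
        "dev_overfitting": test - dev,
    }


# Illustrative error rates, in percent.
gaps = error_gaps(human=1.0, train=2.0, train_dev=2.1, dev=2.2, test=10.0)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}")
```

Each gap is the difference between two adjacent rows of the error list, so the largest gap points at the stage where performance is being lost.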

3. Do You Have High Bias? Fix Your High Bias.

An example of high bias:

| Error Type          | Error Rate |
|---------------------|------------|
| Human Error         | 1%         |
| Training Set Error  | 10%        |
| Train-Dev Set Error | 10.1%      |
| Dev Set Error       | 10.2%      |

Fix high bias before going on to the next step.

4. Do You Have High Variance? Fix Your High Variance.

An example of high variance:

| Error Type          | Error Rate |
|---------------------|------------|
| Human Error         | 1%         |
| Training Set Error  | 2%         |
| Train-Dev Set Error | 10.1%      |
| Dev Set Error       | 10.2%      |

Fix your high variance before going on to the next step.

5. Do You Have Train/Test Mismatch? Fix Your Train/Test Mismatch.

An example of train/test mismatch:

| Error Type          | Error Rate |
|---------------------|------------|
| Human Error         | 1%         |
| Training Set Error  | 2%         |
| Train-Dev Set Error | 2.1%       |
| Dev Set Error       | 10%        |

Fix your train/test mismatch before going on to the next step.

6. Are You Overfitting Your Dev Set? Fix Your Overfitting.

An example of overfitting your dev set:

| Error Type          | Error Rate |
|---------------------|------------|
| Human Error         | 1%         |
| Training Set Error  | 2%         |
| Train-Dev Set Error | 2.1%       |
| Dev Set Error       | 2.2%       |
| Test Error          | 10%        |

Once you fix your dev set overfitting, you're done!
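
The order of the checks above (bias first, then variance, then train/test mismatch, then dev set overfitting) can be sketched as a small decision function. The name `next_fix` and the `threshold` of 1 percentage point are assumptions made for illustration; the source does not say how large a gap counts as "high".

```python
def next_fix(human, train, train_dev, dev, test=None, threshold=1.0):
    """Return the first problem to work on, in the order of the steps above.

    All errors are in the same units (e.g. percent). `threshold` is a
    hypothetical cutoff for calling a gap "high"; it is not from the source.
    Pass test=None to skip the dev set overfitting check.
    """
    checks = [
        ("high bias", train - human),
        ("high variance", train_dev - train),
        ("train/test mismatch", dev - train_dev),
    ]
    if test is not None:
        checks.append(("dev set overfitting", test - dev))

    for name, gap in checks:
        if gap > threshold:
            return name  # fix this before looking at later checks
    return "done"


# The four example tables from the steps above, in order.
print(next_fix(1, 10, 10.1, 10.2))
print(next_fix(1, 2, 10.1, 10.2))
print(next_fix(1, 2, 2.1, 10))
print(next_fix(1, 2, 2.1, 2.2, test=10))
```

Returning at the first large gap mirrors the "fix this before going on to the next step" rule: later gaps are not meaningful until the earlier ones are under control.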

How to Fix High Bias

Ng suggests these ways for fixing a model with high bias:

  • Try a bigger model

  • Try training longer

  • Try a new model architecture (this can be hard)

How to Fix High Variance

Ng suggests these ways for fixing a model with high variance:

  • Get more data

    • This includes data synthesis and data augmentation

  • Try adding regularization

  • Try early stopping

  • Try a new model architecture (this can be hard)
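
Of the options above, early stopping is the simplest to show in a few lines. The sketch below is one common patience-based variant, not a method described in the talk; the function name `early_stopping_epoch`, the `patience` default, and the dev error history are all illustrative.

```python
def early_stopping_epoch(dev_errors, patience=2):
    """Return the index of the epoch whose weights should be kept.

    dev_errors: dev set error measured after each epoch, in order.
    patience:   number of epochs without improvement to tolerate before
                stopping (a hypothetical default; tune it for your task).
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(dev_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Illustrative dev errors (percent) per epoch: they improve, then start to rise.
history = [12.0, 9.0, 7.5, 7.8, 8.1, 8.4]
print(early_stopping_epoch(history))  # 2
```

In a real training loop the dev error would be computed after each epoch and the weights from the best epoch saved, rather than reading from a precomputed list.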

How to Fix Train/Test Mismatch

Ng suggests these ways for fixing a model with high train/test mismatch:

  • Try to get more data similar to your test data

  • Try data synthesis and data augmentation

  • Try a new model architecture (this can be hard)

How to Fix Overfitting of Your Dev Set

Ng suggests only one way of fixing dev set overfitting:

  • Get more dev data

Presumably this would include data synthesis and data augmentation as well.


Link:

https://github.com/thomasj02/DeepLearningProjectWorkflow


Original link:

http://weibo.com/5501429448/EywUvs339?ref=home&rid=13_0_8_2676189011275573092&type=comment#_rnd1488791032982
