Daily Practice | Data Scientist & Business Analyst Interview Questions 152

Big Data Applications · WeChat Official Account · Big Data · 2017-08-06 08:23


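Starting June 15, Data Application Lab will review common interview questions in data science (DS) and business analytics (BA) with you. If you are actively looking for a job in these fields, follow our daily questions and think them through with us. We will post the answers the next day.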

Day 52

DS Interview Questions

How does a tree decide where to split?

BA Interview Questions

Using the following variables:
x=c(2,4)
# Write a while() loop that adds even numbers to x
# while the length of x is less than 12.
# For example, after the first iteration x = 2,4,6, and after the second x = 2,4,6,8.

Want the answers? See the next issue!

Day 51 Answers

DS Interview Questions

What are the primary differences and similarities between classification and regression trees?

Regression trees are used when the dependent variable is continuous. Classification trees are used when the dependent variable is categorical.
In a regression tree, the value assigned to a terminal node is the mean response of the training observations that fall in that region. If an unseen observation falls in that region, we predict it with that mean.
In a classification tree, the value (class) assigned to a terminal node is the mode of the training observations that fall in that region. If an unseen observation falls in that region, we predict it with that mode.
Both kinds of tree divide the predictor space (the independent variables) into distinct, non-overlapping regions. For simplicity, you can think of these regions as high-dimensional boxes.
Both kinds of tree are built with a top-down, greedy approach known as recursive binary splitting. It is called "top-down" because it starts at the top of the tree, where all observations belong to a single region, and successively splits the predictor space into two new branches further down the tree. It is called "greedy" because at each step the algorithm chooses only the best split for the current node, without looking ahead to splits that might lead to a better tree later.
The splitting process continues until a user-defined stopping criterion is reached, for example a minimum number of observations per node.
In both cases, splitting until the stopping criterion is reached produces a fully grown tree. However, a fully grown tree is likely to overfit the training data, leading to poor accuracy on unseen data.
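The shared mechanics above can be made concrete with a small base-R sketch. It is an illustration under simplifying assumptions, not how rpart or any particular package works internally: one numeric predictor, the residual sum of squares (RSS) as the regression split criterion, and the Gini index as the classification criterion (entropy is another common choice). The helper names rss(), gini(), best_split() and leaf_prediction() are hypothetical.

```r
# Illustrative sketch only; the helper names are hypothetical, not from any package.

# Regression node impurity: residual sum of squares around the node mean.
rss <- function(y) sum((y - mean(y))^2)

# Classification node impurity: Gini index.
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# One greedy step of recursive binary splitting on a single numeric predictor:
# try every midpoint between consecutive distinct x values and keep the split
# with the lowest impurity for the two child regions. Future splits are ignored.
best_split <- function(x, y, type = c("regression", "classification")) {
  type <- match.arg(type)
  cost <- if (type == "regression") {
    function(left, right) rss(left) + rss(right)
  } else {
    # Gini of each child, weighted by its share of the observations
    function(left, right) {
      n <- length(left) + length(right)
      length(left) / n * gini(left) + length(right) / n * gini(right)
    }
  }
  xs <- sort(unique(x))
  stopifnot(length(xs) > 1)  # assumes at least two distinct x values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  costs <- sapply(thresholds, function(t) cost(y[x < t], y[x >= t]))
  list(threshold = thresholds[which.min(costs)], cost = min(costs))
}

# Terminal-node prediction: mean for a numeric response (regression tree),
# most frequent class for a categorical response (classification tree).
leaf_prediction <- function(y) {
  if (is.numeric(y)) mean(y) else names(which.max(table(y)))
}

x <- c(1, 2, 3, 4, 5, 6)
best_split(x, c(1.0, 1.2, 0.9, 5.1, 4.8, 5.3), "regression")$threshold   # 3.5
best_split(x, c("a", "a", "a", "b", "b", "b"), "classification")$cost    # 0
leaf_prediction(c("a", "b", "b"))                                        # "b"
```

Only the split criterion and the leaf summary change between the two tree types. A full tree builder would call best_split() recursively on each child region until a stopping rule such as a minimum node size is met.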

BA Interview Questions

R language:

Using the following variables:

x=1
y=40
i=c(1:10)

## Write a for() loop that increments x by three and decreases y by two for each element of i.

for (j in i) {
  x = x + 3
  y = y - 2
  print(c(x, y))  # print the updated x and y after this iteration
}
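As a quick sanity check on the loop (a hypothetical addition, not part of the original answer): after the k-th iteration x has grown by 3k and y has shrunk by 2k, so the same sequence of values can be computed without a loop. The names x_path and y_path are illustrative only.

```r
x <- 1
y <- 40
i <- c(1:10)

# Value of x and y after each of the length(i) iterations, computed directly
x_path <- x + 3 * seq_along(i)
y_path <- y - 2 * seq_along(i)
print(cbind(x = x_path, y = y_path))

# Values left in x and y when the for() loop finishes
tail(x_path, 1)  # 31
tail(y_path, 1)  # 20
```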

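Data Application Lab

Data Application Lab is North America's first one-stop professional data talent provider, offering training, project internships, career coaching, and internal referrals. It provides big data and data science training as well as project solutions for companies, and was co-founded by senior data scientists and data engineers from Southern California and Silicon Valley. It is dedicated to spreading the latest applications and knowledge in the data industry and to training and placing outstanding big data talent, in order to fill the talent gap and bring the full power of big data to business. In 2016 it was named a Top Data Camp by the well-known North American tech magazine Tech Beacon.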

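Ongoing Recruitment

TECHNICAL WRITER/Translation Volunteer

Responsibilities:

In-depth discussion of data applications
Research on industry developments

Requirements:

Strong interest in data applications
A foundation in data analysis
Some BUSINESS INSIGHT
Strong writing skills

If you are interested, send your résumé and a writing sample to [email protected] with the subject line "申请翻译/Technical Writer" (Application: Translator/Technical Writer).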
