专栏名称: 大数据应用

数据应用学院被评为2016北美Top Data Camp, 是最专业一站式数据科学咨询服务机构，你的数据科学求职咨询专家！

每日一练 | Data Scientist & Business Analyst & Leetcode 面试题 304

大数据应用 · 公众号 · 大数据 · 2018-03-02 10:00

正文

自2017年6月15日起，数据应用学院与你一起温习数据科学（DS）和商业分析（BA）领域常见的面试问题。从2017年10月4号起，每天再为大家分享一道Leetcode算法题。

希望积极寻求相关领域工作的你每天关注我们的问题并且与我们一起思考，我们将会在第二天给出答案。

Day 204

DS Interview Questions

Is rotation necessary in PCA? If yes, Why? What will happen if you don't rotate the components?

BA Interview Questions

R language:

Using the following variable:

i=10

x=10

# type a while() or repeat () loop that decreasing i computes x=x/i until i=0.

LeetCode Questions

Description:

Given a 2D board and a word, find if the word exists in the grid.
The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

Input: board = [ ['A','B','C','E'], ['S','F','C','S'], ['A','D','E','E'] ] word = "ABCCED"
Output: true

欲知答案如何？请见下期分解！

Day 203 答案揭晓

DS Interview Questions

You are given a train data set having 1000 columns and 1 million rows. The data set is based on a classification problem. Your manager has asked you to reduce the dimension of this data so that model computation time can be reduced. Your machine has memory constraints. What would you do?

Processing a high dimensional data on a limited memory machine is a strenuous task, your interviewer would be fully aware of that. Following are the methods you can use to tackle such situation:

Since we have lower RAM, we should close all other applications in our machine, including the web browser, so that most of the memory can be put to use.
We can randomly sample the data set. This means, we can create a smaller data set, let’s say, having 1000 variables and 300000 rows and do the computations.
To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated variables. For numerical variables, we’ll use correlation. For categorical variables, we’ll use chi-square test.
Also, we can use PCA and pick the components which can explain the maximum variance in the data set.
Using online learning algorithms like Vowpal Wabbit (available in Python) is a possible option.
Building a linear model using Stochastic Gradient Descent is also helpful.
We can also apply our business understanding to estimate which all predictors can impact the response variable. But, this is an intuitive approach, failing to identify useful predictors might result in significant loss of information.

BA Interview Questions

R language:

Using the following variable:

a=1:10

# type a while () loop that computes a vector x=1 3 6 10 15 21 28 36 45 55

# such that:

# x[1]=a[1]

# x[2]=a[1]+a[2]

# x[3]=a[1]+a[2]+a[3]

# .

a=1:10

i=0

while(i<11){

if(i==1){x[i]=a[i]}

else {x[i]=a[i]+x[i-1]}

i=i+1

}

Leetcode Questions

Description:

Given a set of distinct integers, nums, return all possible subsets.

Input: [1,2,3]
Output: [[],[1],[1,2],[1,2,3],[1,3],[2],[2,3],[3]]
Assumptions:

The solution set must not contain duplicate subsets.

每日一练 | Data Scientist &amp; Business Analyst &amp; Leetcode 面试题 304

正文

请到「今天看啥」查看全文

每日一练 | Data Scientist & Business Analyst & Leetcode 面试题 304