Daily Practice | Data Scientist & Business Analyst & LeetCode Interview Questions 287

大数据应用 · WeChat Official Account · Big Data · 2018-02-02 10:00


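Since June 15, 2017, 数据应用学院 has been reviewing common interview questions in data science (DS) and business analytics (BA) with you. Since October 4, 2017, we have also been sharing one LeetCode algorithm problem every day.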

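If you are actively looking for work in these fields, we hope you will follow our questions every day and think them through with us. We will post the answers the following day.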

Day 187

DS Interview Questions

How can you ensure that you don’t analyse something that ends up producing meaningless results?

BA Interview Questions

How to check the frequency distribution of a categorical variable in R?

LeetCode Questions

Description:

Given a string, find the length of the longest substring without repeating characters.

Input: “abcabcbb”

Output: 3

Want to know the answers? See the next issue!

Day 186 Answers Revealed

DS Interview Questions

How do data management procedures like missing data handling make selection bias worse?

Missing value treatment is one of the primary tasks a data scientist performs before starting data analysis. There are multiple methods for treating missing values, and if they are applied carelessly they can introduce selection bias. Let's look at a few missing value treatment examples and their impact on selection:

Complete Case Treatment: Complete case treatment removes an entire row from the data even if only one value in it is missing. This can introduce selection bias if the values are not missing at random but follow some pattern. Suppose you are conducting a survey and a few people didn't specify their gender. Would you remove all those people? Couldn't their responses tell a different story?

Available case analysis: Suppose you are computing a correlation matrix. For each correlation coefficient, you remove only the rows with missing values in the variables needed for that particular coefficient. The resulting values are then not fully comparable, because each coefficient is computed from a different subset of the observations.

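Mean Substitution: In this method, missing values are replaced with the mean of the other available values. This can bias the distribution: statistics such as the standard deviation, correlation, and regression coefficients depend on how the values spread around the mean, so filling the gaps with the mean distorts them (for example, it shrinks the standard deviation).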

Hence, data management procedures can introduce selection bias into your data if they are not chosen carefully.
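The effects described above can be checked on a toy data set. The following is a minimal Python sketch using only the standard library; the survey rows, the income figures, and the assumption that high earners skipped the gender question are all made up for illustration.

```python
import statistics

# Hypothetical survey rows: (income, gender). None marks a missing gender.
# Assumption for illustration: the two highest earners skipped the question,
# so the values are NOT missing at random.
rows = [
    (30, "F"), (35, "M"), (40, "F"), (45, "M"),
    (90, None), (95, None),
]

incomes_all = [income for income, _ in rows]

# Complete case treatment: drop every row that has any missing field.
incomes_complete = [income for income, gender in rows if gender is not None]

mean_all = statistics.mean(incomes_all)
mean_complete = statistics.mean(incomes_complete)
print("mean income, all rows:      ", mean_all)
print("mean income, complete cases:", mean_complete)

# Mean substitution on a numeric column with gaps.
scores = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [s for s in scores if s is not None]
fill_value = statistics.mean(observed)
imputed = [fill_value if s is None else s for s in scores]

sd_observed = statistics.stdev(observed)
sd_imputed = statistics.stdev(imputed)
print("std dev, observed values only:", sd_observed)
print("std dev, after mean filling:  ", sd_imputed)
```

In this toy data, dropping the incomplete rows removes exactly the high earners, so the complete-case mean understates income. Mean filling leaves the mean unchanged but adds points with zero deviation, so the standard deviation comes out smaller than that of the observed values.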

BA Interview Questions

How will you merge two dataframes in R programming language?

The merge() function is used to combine two data frames by identifying common rows or columns between them. By default, merge() returns the intersection of the two data sets.

The merge() function in R takes a long list of arguments. The main ones are as follows:

Syntax for using the merge() function in R:

merge(x, y, by.x, by.y, all.x, all.y, all)

x - The first data frame.
y - The second data frame.
by.x - The name of the variable in data frame x that is shared with y.
by.y - The name of the variable in data frame y that is shared with x.
all.x - A logical value that specifies the type of merge. Set all.x to TRUE to keep all the observations from data frame x. This results in a left join.
all.y - A logical value that specifies the type of merge. Set all.y to TRUE to keep all the observations from data frame y. This results in a right join.
all - Defaults to FALSE, which means that only matching rows are returned, resulting in an inner join. Set all to TRUE to keep all the observations from both x and y, resulting in a full outer join.
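To make the four join types concrete, here is a small sketch of the same logic in plain Python rather than R. The helper merge_rows and the sample tables are hypothetical and exist only for illustration. The sketch is not R's merge(): it ignores sorting and column suffixes, and it assumes the two tables share no column names other than the key.

```python
def merge_rows(x, y, by_x, by_y, all_x=False, all_y=False):
    """Join two lists of dicts, mimicking the all.x / all.y flags.

    Hypothetical helper for illustration only, not R's merge().
    Assumes x and y have no non-key column names in common.
    """
    x_cols = [c for c in (x[0] if x else {}) if c != by_x]
    y_cols = [c for c in (y[0] if y else {}) if c != by_y]
    result = []
    matched_y = set()
    for x_row in x:
        hits = [i for i, y_row in enumerate(y) if y_row[by_y] == x_row[by_x]]
        for i in hits:
            matched_y.add(i)
            result.append({**x_row, **{c: y[i][c] for c in y_cols}})
        if not hits and all_x:
            # Unmatched row of x is kept, with y's columns left empty.
            result.append({**x_row, **{c: None for c in y_cols}})
    if all_y:
        for i, y_row in enumerate(y):
            if i not in matched_y:
                # Unmatched row of y is kept, with x's columns left empty.
                row = {by_x: y_row[by_y], **{c: None for c in x_cols}}
                row.update({c: y_row[c] for c in y_cols})
                result.append(row)
    return result


employees = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
salaries = [
    {"emp_id": 2, "salary": 50},
    {"emp_id": 3, "salary": 60},
    {"emp_id": 4, "salary": 70},
]

inner = merge_rows(employees, salaries, "id", "emp_id")
left = merge_rows(employees, salaries, "id", "emp_id", all_x=True)
right = merge_rows(employees, salaries, "id", "emp_id", all_y=True)
outer = merge_rows(employees, salaries, "id", "emp_id", all_x=True, all_y=True)

for label, rows in [("inner", inner), ("left", left),
                    ("right", right), ("outer", outer)]:
    print(label, [r["id"] for r in rows])
```

Passing both all_x=True and all_y=True plays the role of all = TRUE in the description above: every key from either table appears in the result, with None standing in for the values R would report as NA.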

LeetCode Questions

Description:

You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order, and each node contains a single digit. Add the two numbers and return the sum as a linked list.
You may assume the two numbers do not contain any leading zero, except the number 0 itself.
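One common way to approach this problem, not necessarily the solution this column published, is to walk both lists at once and add digit by digit while tracking a carry, just like addition on paper. Because the digits are stored in reverse order, the walk starts at the ones place. The sketch below is in Python; the ListNode class and the helpers from_digits and to_digits are assumptions introduced here for illustration.

```python
class ListNode:
    """Singly linked list node holding one decimal digit."""

    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def add_two_numbers(l1, l2):
    """Add two numbers stored as reversed-digit linked lists."""
    dummy = ListNode()  # placeholder head so appending needs no special case
    tail = dummy
    carry = 0
    # Continue while either list has digits left or a carry remains.
    while l1 or l2 or carry:
        total = carry
        if l1:
            total += l1.val
            l1 = l1.next
        if l2:
            total += l2.val
            l2 = l2.next
        carry, digit = divmod(total, 10)
        tail.next = ListNode(digit)
        tail = tail.next
    return dummy.next


def from_digits(digits):
    """Build a linked list from a Python list of digits (helper for the demo)."""
    dummy = ListNode()
    tail = dummy
    for d in digits:
        tail.next = ListNode(d)
        tail = tail.next
    return dummy.next


def to_digits(node):
    """Convert a linked list back to a Python list of digits (helper for the demo)."""
    digits = []
    while node:
        digits.append(node.val)
        node = node.next
    return digits


# 342 + 465 = 807, stored in reverse order as 2->4->3 and 5->6->4.
print(to_digits(add_two_numbers(from_digits([2, 4, 3]), from_digits([5, 6, 4]))))
```

The loop condition also checks the carry, so a sum that gains an extra digit (for example 99 + 1) still produces the final node. The running time is linear in the length of the longer list.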
