Daily Practice | Data Scientist & Business Analyst & Leetcode Interview Questions 307

大数据应用  · WeChat official account  · Big Data  · 2018-03-08 11:01


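Since June 15, 2017, 数据应用学院 has been reviewing common interview questions in data science (DS) and business analytics (BA) with you. Since October 4, 2017, we have also shared one Leetcode algorithm problem every day.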

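If you are actively looking for work in these fields, we hope you will follow our questions every day and think them through with us. We publish the answers the following day.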

Day 207

DS Interview Questions

Why is naive Bayes so ‘naive’?

BA Interview Questions

R language:

Using the following variable:

x=cbind(c(1,2,3,4,9,7,4,3),c(3,1,2,5,3,6,5,3))

x

Write a for() loop that calculates y = 3 8 18 44 126 140 100 84, such that:

y[1]=x[1,1]*x[1,2]

y[2]=x[2,1]*sum(x[1:2,2])

y[3]=x[3,1]*sum(x[1:3,2])

.

.

.

y[8]=x[8,1]*sum(x[1:8,2])

LeetCode Questions

Description:

  • Given numRows, generate the first numRows of Pascal’s triangle.

    • Input: 5

    • Output: [[1], [1,1], [1,2,1], [1,3,3,1], [1,4,6,4,1]]

Want to know the answers? See the next issue!

Day 206 Answers Revealed

DS Interview Questions

You are given a data set on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?

If you have worked on enough data sets, you should recognize that cancer detection produces imbalanced data. On an imbalanced data set, accuracy should not be used as the measure of performance: the 96% (as given) might come from predicting only the majority class correctly, while the class of interest is the minority class (4%), the people who were actually diagnosed with cancer. To evaluate the model, we should therefore use sensitivity (true positive rate), specificity (true negative rate), and the F-measure to determine the class-wise performance of the classifier. If minority-class performance turns out to be poor, we can take the following steps:

  1. We can use undersampling, oversampling, or SMOTE to balance the data.

  2. We can alter the prediction threshold by calibrating the predicted probabilities and finding an optimal threshold using the AUC-ROC curve.

  3. We can assign class weights so that the minority class gets a larger weight.

  4. We can also treat the problem as anomaly detection.
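To make the accuracy pitfall concrete, here is a minimal sketch in plain Python. The data is hypothetical (96 healthy patients and 4 cancer patients, matching the 96%/4% split in the answer), and the helper name `class_metrics` is our own. The "model" simply predicts the majority class for everyone.

```python
def class_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never present or never predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical data: label 1 = cancer (minority class), label 0 = healthy.
y_true = [0] * 96 + [1] * 4
# A useless model that always predicts "healthy".
y_pred = [0] * 100

metrics = class_metrics(y_true, y_pred)
print(metrics)
```

In this sketch the accuracy is 0.96 even though the sensitivity and F1 for the cancer class are both 0, which is why the class-wise metrics matter more than accuracy here.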

BA Interview Questions

R language:

Using the following variable:

x = as.Date("10/11/2017", "%d/%m/%Y")

# Write a repeat loop that increments x until x equals 31/12/2017.

repeat {
  x = x + 1    # advance by one day
  print(x)
  if (x == as.Date("2017-12-31")) break    # stop once the target date is reached
}


Leetcode Questions

  • Description:

    • Given two binary trees, write a function to check if they are equal or not.

    • Two binary trees are considered equal if they are structurally identical and the nodes have the same value.

  • Input: two identical trees

  • Output: true
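One common way to approach this problem is a recursive comparison. The sketch below is not necessarily the solution the original post had in mind; the `TreeNode` class and the function name `is_same_tree` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    """Hypothetical binary tree node used only for this sketch."""
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def is_same_tree(p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
    # Two empty subtrees are equal.
    if p is None and q is None:
        return True
    # Exactly one empty subtree means the structures differ.
    if p is None or q is None:
        return False
    # Values must match, and so must both pairs of subtrees.
    return (p.val == q.val
            and is_same_tree(p.left, q.left)
            and is_same_tree(p.right, q.right))


a = TreeNode(1, TreeNode(2), TreeNode(3))
b = TreeNode(1, TreeNode(2), TreeNode(3))
c = TreeNode(1, TreeNode(2))  # same root value, different structure

print(is_same_tree(a, b))
print(is_same_tree(a, c))
```

The first call compares two structurally identical trees with equal values, and the second compares trees whose structures differ. Each node is visited at most once, so the running time is linear in the size of the smaller tree.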






