
Daily Practice | Data Scientist & Business Analyst & Leetcode Interview Questions 228

大数据应用 (Big Data Applications)  · WeChat official account  · Big Data  · 2017-11-05 09:34


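Since June 15, Data Application Lab (数据应用学院) has been reviewing common interview questions in data science (DS) and business analytics (BA) with you. Since October 4, we have also been sharing one Leetcode algorithm problem every day.

If you are actively looking for work in these fields, we hope you will follow our questions each day and think them through with us. We post the answers the following day.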

Day 128

DS Interview Questions

Why is naive Bayes so ‘naive’?

BA Interview Questions

What can the ‘R’ language not do?

Leetcode Questions

Intersection of Two Linked Lists

Description:

  • Write a program to find the node at which the intersection of two singly linked lists begins.

Input:

A:          a1 → a2
                      ↘
                        c1 → c2 → c3
                      ↗
B:     b1 → b2 → b3

Output: c1

Assumptions:

  • The two lists have no cycles; if they do not intersect at all, return null.

Want to know the answers? See the next installment!

Day 127 Answers Revealed

DS Interview Questions

You are given a data set on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?

If you have worked with enough data sets, you will recognize that cancer detection produces imbalanced data. On an imbalanced data set, accuracy should not be used as the performance measure: 96% accuracy may only mean that the majority class is predicted correctly, while the class of interest is the minority class (4%), the people who were actually diagnosed with cancer. To evaluate the model, we should therefore use sensitivity (true positive rate), specificity (true negative rate), and the F-measure to determine the class-wise performance of the classifier. If performance on the minority class turns out to be poor, we can take the following steps:

  1. We can use undersampling, oversampling or SMOTE to make the data balanced.

  2. We can alter the prediction threshold by calibrating the predicted probabilities and finding an optimal threshold using the ROC curve.

  3. We can assign weights to the classes so that the minority class gets a larger weight.

  4. We can also use anomaly detection.
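To make the point about accuracy concrete, here is a minimal sketch in plain Python. The data is hypothetical (96 healthy patients and 4 cancer patients, matching the 96%/4% split in the answer), and `class_metrics` is an illustrative helper, not part of any library. It shows that a model which always predicts "healthy" still reaches 96% accuracy while its sensitivity and F-measure are zero.

```python
def class_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, len(pairs))
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical labels: 0 = healthy (majority), 1 = cancer (minority).
y_true = [0] * 96 + [1] * 4
# A useless model that always predicts the majority class.
y_pred = [0] * 100

metrics = class_metrics(y_true, y_pred)
print(metrics)
```

The high accuracy here comes entirely from the majority class, which is why the class-wise measures are the ones to watch before and after applying any of the steps above.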

BA Interview Questions

Which R packages do you use the most and which ones are your favorites?

I use R Commander and Rattle a lot, along with the packages they depend on. I use car for regression, forecast for time series, and many other packages for specific graphs. I have not mastered ggplot, but I do use it sometimes. Overall, I am waiting for Hadley Wickham to come up with an updated book on his ecosystem of packages. In my opinion they are formidable, comprehensive, and easy to use, so much so that I can get by with the occasional copy-and-paste code.

Leetcode Questions

Find Minimum in Rotated Sorted Array II

Description:

  • Follow up for “Find Minimum in Rotated Sorted Array”:

  • What if duplicates are allowed?

  • Would this affect the run-time complexity? How and why?

  • Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand. (i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2).

  • Find the minimum element.

Input: [4 5 6 7 0 0 1 2]

Output: 0
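One common way to approach this problem is a modified binary search. The sketch below is an illustration under stated assumptions, not necessarily the intended answer: the function name `find_min` is hypothetical and the input is assumed to be a non-empty list. When the middle element equals the right end, the search cannot tell which half holds the minimum, so it shrinks the range by one element. This is where duplicates affect the run time: the worst case (for example, all elements equal) degrades from O(log n) to O(n).

```python
def find_min(nums):
    """Return the minimum of a rotated sorted list that may contain duplicates.

    Assumes nums is non-empty.
    """
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:
            # The minimum lies strictly to the right of mid.
            lo = mid + 1
        elif nums[mid] < nums[hi]:
            # The minimum is at mid or to its left.
            hi = mid
        else:
            # nums[mid] == nums[hi]: ambiguous, so drop one duplicate.
            hi -= 1
    return nums[lo]


result = find_min([4, 5, 6, 7, 0, 0, 1, 2])
print(result)
```

For the sample input the search narrows to the first 0, so the minimum element returned is 0.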






