
Daily Practice | Data Scientist & Business Analyst & LeetCode Interview Questions 357

大数据应用  · WeChat official account  · Big Data  · 2018-05-17 08:53

MAY 16

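Since June 15, 2017, Data Application Lab has shared and discussed one common interview question a day from the fields of data science (DS) and business analytics (BA).

Since October 4, 2017, we have also shared one LeetCode algorithm problem each day.


If you are actively looking for work in these fields, we hope you will follow our questions each day and think them through with us. We will post the answers the following day.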

Day 257

DS Interview Question

What is the difference between PCA and kernel PCA?

BA Interview Question

R Programming: Imagine you have two columns in a data frame, 'Gender' and 'Loan_Status', both categorical variables. 'Gender' takes the values 'Male', 'Female', and NA; 'Loan_Status' takes the values 'Yes' and 'No'. How would you use the dplyr package to calculate the percentages of males and females among the rows where Loan_Status equals 'No'? Remember to ignore NA values.


Sample Output:

LeetCode Question

Remove Duplicates from Sorted Array II


Description:

Follow-up to "Remove Duplicates":

What if each value is allowed to appear at most twice?


Input: [1,1,2,2,2,3,3,3]

Output: [1,1,2,2,3,3]
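
One common way to approach this problem is a two-pointer pass over the sorted array. The sketch below is only an illustration, not the official answer for this question; the function name `remove_duplicates` and the choice to return the new length are assumptions, following the usual in-place convention for this kind of problem.

```python
def remove_duplicates(nums):
    """Keep at most two copies of each value in a sorted list, in place.

    Returns the length of the kept prefix (hypothetical interface).
    """
    write = 0  # next position to write a kept element
    for value in nums:
        # In a sorted list, value would be a third copy exactly when it
        # equals the element two slots behind the write position.
        if write < 2 or value != nums[write - 2]:
            nums[write] = value
            write += 1
    return write


nums = [1, 1, 2, 2, 2, 3, 3, 3]
length = remove_duplicates(nums)
print(nums[:length])  # [1, 1, 2, 2, 3, 3]
```

The pass runs in O(n) time with O(1) extra space, because the write position never overtakes the read position.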


Day 256

Answers Revealed

DS Interview Question & Answer

What are the advantages of random projection compared to PCA?

Answer:

* With very high-dimensional data where speed is an issue, note that on a matrix of size n×k, PCA takes O(k^2×n+k^3) time, whereas a random projection onto a subspace of dimension d takes O(nkd).

* With a sparse matrix, random projection is even faster.

* The data may well be low-dimensional without lying in a linear subspace. PCA assumes a linear subspace.

* Random projections are also quite fast for reducing the dimension of a mixture of Gaussians.

* If the data set is very large, a random projection does not require holding it all in memory, whereas PCA does.

* In general, PCA works well on relatively low-dimensional data. PCA does, of course, give the best possible linear projection in the least-squares sense, which a random projection does not.
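
To make the O(nkd) cost and the memory point above concrete, here is a minimal, dependency-free sketch of a Gaussian random projection. The names `random_projection` and `distance`, the sizes (n=4, k=500, d=100), and the seeds are all illustrative assumptions; in practice one would more likely use a numerical library.

```python
import math
import random


def random_projection(rows, d, seed=0):
    """Project each k-dimensional row onto d dimensions with a Gaussian matrix."""
    rng = random.Random(seed)
    k = len(rows[0])
    scale = 1.0 / math.sqrt(d)
    # k x d matrix with i.i.d. N(0, 1/d) entries; it is built without
    # looking at the data, unlike the principal components in PCA.
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    # Each row is projected independently, so rows could be streamed one
    # at a time; the total work is O(n * k * d) multiplications.
    return [
        [sum(x * proj[i][j] for i, x in enumerate(row)) for j in range(d)]
        for row in rows
    ]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(500)] for _ in range(4)]  # n=4, k=500
reduced = random_projection(data, d=100)

# Pairwise distances are preserved only approximately, so this ratio
# should be near 1 but not exactly 1.
ratio = distance(reduced[0], reduced[1]) / distance(data[0], data[1])
print(len(reduced), len(reduced[0]), round(ratio, 2))
```

The 1/sqrt(d) scaling keeps squared distances unbiased in expectation. The projection is not optimal in the way PCA's is, which is the trade-off the last bullet describes.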

BA Interview Question & Answer

R Programming: How to check the frequency distribution of a categorical variable?


Imagine we have a factor named Gender:

Gender = factor(c('M','F','M','F','F','F'))

table(Gender)


Sample Table:


Gender
F M
4 2







