Daily Practice | Data Scientist & Business Analyst & Leetcode Interview Questions 283

Big Data Applications (大数据应用) · WeChat Official Account · Big Data · 2018-01-27 10:00

Main Text

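Since June 15, 2017, Data Application Lab (数据应用学院) has been reviewing common interview questions in data science (DS) and business analytics (BA) together with you. Since October 4, 2017, we have also shared one Leetcode algorithm problem every day.

If you are actively looking for work in these fields, we hope you will follow our questions every day and think them through with us. We post the answers the following day.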

Day 183

DS Interview Questions

Are there any limitations of R²?

BA Interview Questions

Which function in R is used to test whether the means of two groups are equal?

LeetCode Questions

Description:

Follow up for "Remove Duplicates":
What if duplicates are allowed at most twice?

Input: [1,1,2,2,2,3,3,3]

Output: [1,1,2,2,3,3]

Want to know the answers? See the next issue!

Day 182 Answers Revealed

DS Interview Questions

Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?

In a skewed distribution with a long tail, a high-frequency population is followed by a low-frequency population, which gradually tails off asymptotically.

Rule of thumb: in many distributions, the majority of occurrences (more than half, and 80% when the Pareto principle applies) are accounted for by the first 20% of items. In a long-tailed distribution, by contrast, the least frequently occurring 80% of items are more important as a proportion of the total population.

Example: Natural language

- Given a corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table (Zipf's law). The most frequent word occurs about twice as often as the second most frequent, three times as often as the third most frequent, and so on. “The” accounts for 7% of all word occurrences (70,000 out of 1 million), “of” accounts for 3.5%, followed by “and”… Only 135 vocabulary items are needed to account for half of the English corpus.

Other examples: the allocation of wealth among individuals (the larger portion of a society's wealth is controlled by a smaller percentage of its people), the file size distribution of Internet traffic, hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, and sizes of meteorites.
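As a rough illustration of the inverse-rank relationship described above, the Python sketch below computes idealized word shares. The function name `zipf_shares` is hypothetical, and the 7% starting share is simply the figure quoted for “the”. Real corpora follow this pattern only approximately.

```python
def zipf_shares(top_share, max_rank):
    """Idealized share of occurrences for ranks 1..max_rank.

    Assumes the share at rank r is exactly top_share / r, which is a
    simplification of what real text looks like.
    """
    return [top_share / rank for rank in range(1, max_rank + 1)]


# 0.07 is the share quoted for the most frequent word, "the".
shares = zipf_shares(0.07, 3)
for word, share in zip(["the", "of", "and"], shares):
    print(f"{word}: {share:.2%}")
# the: 7.00%
# of: 3.50%
# and: 2.33%
```

The second value reproduces the 3.5% quoted for “of”. The third is only what the idealized rule predicts for the third-ranked word.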

In classification and regression problems, long tails are an issue for models that assume linearity: a monotone transformation (such as a logarithm) may need to be applied to the data first. When sampling, the data can also become even more unbalanced.
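To make the effect of a monotone transformation concrete, the sketch below draws a synthetic long-tailed sample and compares its skewness before and after taking logarithms. The log-normal data, the sample size, and the helper name `skewness` are assumptions chosen for illustration, not part of the original answer.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)  # fixed seed so the run is repeatable

# Synthetic long-tailed data: log-normal values are strictly positive,
# so the logarithm is always defined.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

The raw sample should be strongly right-skewed, while the log-transformed values are close to symmetric. This is why a log transform is a common first step before fitting a linear model to data like this.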

BA Interview Questions

What are Gross Rating Points (GRPs), Cost per Point (CPP), and Impressions?

Gross Rating Points (GRPs)

Gross Rating Point (GRP) is a measure of the size of an advertising campaign by a specific medium or schedule. GRP is calculated by multiplying the number of Spots by Rating.

Cost per Point (CPP)

Cost per Point (CPP) is a measure of cost efficiency that lets you compare the cost of this advertisement with that of other advertisements. CPP is calculated as Media Cost divided by Gross Rating Points (GRPs).

Impressions

Impressions are the total number of exposures to your advertisement. One person can receive multiple exposures over time. If one person was exposed to an advertisement five times, this would count as five impressions. Impressions are calculated by multiplying the number of Spots by Average Persons.
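The three formulas above can be sketched in a few lines of Python. The function names and all campaign figures below are hypothetical and only show how the quantities relate.

```python
def gross_rating_points(spots, rating):
    """GRPs = number of Spots x Rating."""
    return spots * rating


def cost_per_point(media_cost, grps):
    """CPP = Media Cost / GRPs."""
    return media_cost / grps


def total_impressions(spots, average_persons):
    """Impressions = number of Spots x Average Persons."""
    return spots * average_persons


# Hypothetical campaign figures, for illustration only.
spots = 20                # number of times the ad airs
rating = 2.5              # average rating per spot
media_cost = 10_000.0     # total media cost
average_persons = 30_000  # average persons exposed per spot

grps = gross_rating_points(spots, rating)
cpp = cost_per_point(media_cost, grps)
impressions = total_impressions(spots, average_persons)

print(grps, cpp, impressions)  # 50.0 200.0 600000
```

With these made-up numbers, the campaign delivers 50 GRPs at a cost of 200 per point and 600,000 impressions in total.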

Reference:

http://www.bionic-ads.com/2016/03/reach-frequency-ratings-grps-impressions-cpp-and-cpm-in-advertising/

Leetcode Questions

Description

Given a sorted linked list, delete all duplicates such that each element appears only once.
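One possible approach, sketched in Python, is shown below. It is not the solution published with this post. The `ListNode` class and the helpers `from_list` and `to_list` are assumptions added so the example is self-contained. Because the list is sorted, duplicates are always adjacent, so a single pass that skips any node equal to its predecessor is enough.

```python
class ListNode:
    """Minimal singly linked list node (assumed definition)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def delete_duplicates(head):
    """Remove duplicates from a sorted linked list in place."""
    node = head
    while node is not None and node.next is not None:
        if node.next.val == node.val:
            # Same value as the current node: unlink the duplicate.
            node.next = node.next.next
        else:
            node = node.next
    return head


def from_list(values):
    """Build a linked list from a Python list (helper for the demo)."""
    head = None
    for val in reversed(values):
        head = ListNode(val, head)
    return head


def to_list(head):
    """Collect linked list values into a Python list (helper for the demo)."""
    values = []
    while head is not None:
        values.append(head.val)
        head = head.next
    return values


print(to_list(delete_duplicates(from_list([1, 1, 2, 3, 3]))))  # [1, 2, 3]
```

This runs in O(n) time and uses O(1) extra space, since nodes are unlinked in place rather than copied.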
