
Daily Practice | Data Scientist & Business Analyst & Leetcode Interview Questions 720

大数据应用  · Official Account  · Big Data  · 2019-10-16 08:27

Sep. 15

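Since June 15, 2017, Data Application Lab has been sharing and discussing one common interview question from data science (DS) and business analytics (BA) with you every day.

Since October 4, 2017, we have also been sharing one Leetcode algorithm problem every day.

If you are actively looking for a job in these fields, we hope you will follow our questions every day and think them through with us. We will post the answers the next day.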

Day 620

DS Interview Question

Give some examples of applications of Naive Bayes algorithms.

BA Interview Question

Combine two tables


Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| PersonId    | int     |
| FirstName   | varchar |
| LastName    | varchar |
+-------------+---------+

PersonId is the primary key column for this table.


Table: Address
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| AddressId   | int     |
| PersonId    | int     |
| City        | varchar |
| State       | varchar |
+-------------+---------+


AddressId is the primary key column for this table.

Write a SQL query for a report that provides the following information for each person in the Person table, regardless of whether there is an address for each of those people:

FirstName, LastName, City, State

LeetCode Question

Group Anagrams


Description:

Given an array of strings, group anagrams together.

Input: ["eat","tea","tan","ate","nat","bat"]

Output: [["bat"],["ate","eat","tea"],["nat","tan"]]

Day 619

Answers Revealed

DS Interview Question & Answer

What are the Pros and Cons of Naive Bayes?

Pros:

It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.

When the independence assumption holds, a Naive Bayes classifier performs better than other models such as logistic regression, and it needs less training data.

It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution (bell curve) is assumed, which is a strong assumption.
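The normality assumption for numerical variables can be illustrated with a minimal sketch. The training data, the class labels, and the `fit` and `predict` helpers below are hypothetical; they only show how a Gaussian likelihood could enter the Naive Bayes score and are not the implementation of any particular library.

```python
import math
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical training data: one numeric feature (height in cm) per class.
heights = {
    "adult": [165.0, 170.0, 175.0, 180.0],
    "child": [110.0, 120.0, 125.0, 135.0],
}

def fit(data):
    """Estimate the prior, mean, and standard deviation for each class."""
    total = sum(len(values) for values in data.values())
    return {
        label: (len(values) / total, mean(values), pstdev(values))
        for label, values in data.items()
    }

def predict(params, x):
    """Pick the class with the largest prior * Gaussian likelihood."""
    scores = {
        label: prior * gaussian_pdf(x, mu, sigma)
        for label, (prior, mu, sigma) in params.items()
    }
    return max(scores, key=scores.get)

params = fit(heights)
prediction = predict(params, 160.0)
```

If the real feature is far from bell-shaped (for example, heavily skewed), these likelihoods can be misleading, which is why the assumption is called strong.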


Cons:

If a categorical variable has a category in the test data set that was not observed in the training data set, the model assigns it a probability of 0 (zero) and cannot make a prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique. One of the simplest smoothing techniques is Laplace estimation.

On the other hand, Naive Bayes is also known to be a poor estimator, so the probability outputs from predict_proba should not be taken too seriously.

Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
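The zero-frequency problem and its Laplace fix, from the first point in this list, can be sketched in a few lines. The function name `category_likelihoods`, the `alpha` parameter, and the color data are assumptions made for illustration only.

```python
from collections import Counter

def category_likelihoods(observed, categories, alpha=1.0):
    """Estimate P(category | class) with additive smoothing.

    alpha=0 gives raw relative frequencies; alpha=1 is Laplace smoothing.
    """
    counts = Counter(observed)
    total = len(observed) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical feature values seen in training for one class.
train_colors = ["red", "red", "blue", "red"]
# "green" never appears in training but may show up in test data.
all_colors = ["red", "blue", "green"]

unsmoothed = category_likelihoods(train_colors, all_colors, alpha=0.0)
smoothed = category_likelihoods(train_colors, all_colors, alpha=1.0)
```

Without smoothing, "green" gets probability 0, which zeroes out the whole product of likelihoods for that class. With alpha=1 it gets 1/7, and the three probabilities still sum to 1.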

BA Interview Question & Answer

Write a SQL query to obtain the names of all patients whose primary care physician is not the head of any department, along with the name of that physician.


Table: patient (pt)

ssn       | name              | address            | phone    | insuranceid | pcp
----------+-------------------+--------------------+----------+-------------+-----
100000001 | John Smith        | 42 Foobar Lane     | 555-0256 | 68476213    | 1
100000002 | Grace Ritchie     | 37 Snafu Drive     | 555-0512 | 36546321    | 2
100000003 | Random J. Patient | 101 Omgbbq Street  | 555-1204 | 65465421    | 2
100000004 | Dennis Doe        | 1100 Foobaz Avenue | 555-2048 | 68421879    | 3

Table: physician (p)

employeeid | name              | position                     | ssn
-----------+-------------------+------------------------------+-----------
1          | John Dorian       | Staff Internist              | 111111111
2          | Elliot Reid       | Attending Physician          | 222222222
3          | Christopher Turk  | Surgical Attending Physician | 333333333
4          | Percival Cox      | Senior Attending Physician   | 444444444
5          | Bob Kelso         | Head Chief of Medicine       | 555555555
6          | Todd Quinlan      | Surgical Attending Physician | 666666666
7          | John Wen          | Surgical Attending Physician | 777777777
8          | Keith Dudemeister | MD Resident                  | 888888888
9          | Molly Clock       | Attending Psychiatrist       | 999999999

Answer:

SELECT pt.name AS "Patient",
       p.name  AS "Primary care Physician"
FROM patient pt
JOIN physician p ON pt.pcp = p.employeeid
WHERE pt.pcp NOT IN
    (SELECT head
     FROM department);

https://www.w3resource.com/sql-exercises/hospital-database-exercise/sql-exercise-hospital-database-39.php
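To try the answer query locally, here is a minimal sketch using Python's built-in sqlite3 module. The post never shows the department table, so its schema and rows below (heads 4, 7, and 9) are assumptions made purely for illustration. Only the columns the query touches are created, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE physician (employeeid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE patient (ssn INTEGER PRIMARY KEY, name TEXT, pcp INTEGER);
-- Assumed schema: the department table does not appear in the post.
CREATE TABLE department (departmentid INTEGER PRIMARY KEY, name TEXT, head INTEGER);
""")

conn.executemany("INSERT INTO physician VALUES (?, ?)", [
    (1, "John Dorian"), (2, "Elliot Reid"), (3, "Christopher Turk"),
    (4, "Percival Cox"), (5, "Bob Kelso"), (6, "Todd Quinlan"),
    (7, "John Wen"), (8, "Keith Dudemeister"), (9, "Molly Clock"),
])
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", [
    (100000001, "John Smith", 1),
    (100000002, "Grace Ritchie", 2),
    (100000003, "Random J. Patient", 2),
    (100000004, "Dennis Doe", 3),
])
# Hypothetical department rows, labeled as an assumption.
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    (1, "General Medicine", 4),
    (2, "Surgery", 7),
    (3, "Psychiatry", 9),
])

query = """
SELECT pt.name AS "Patient",
       p.name  AS "Primary care Physician"
FROM patient pt
JOIN physician p ON pt.pcp = p.employeeid
WHERE pt.pcp NOT IN (SELECT head FROM department)
ORDER BY pt.ssn;
"""
rows = conn.execute(query).fetchall()
for patient_name, physician_name in rows:
    print(patient_name, "|", physician_name)
```

With these assumed department heads, none of the primary care physicians (1, 2, 3) heads a department, so all four patients are returned. One caveat of NOT IN: if any `head` value were NULL, the condition would never evaluate to true and the query would return no rows. A NOT EXISTS subquery avoids that.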






