Daily Practice | Data Scientist & Business Analyst & LeetCode Interview Questions 318

大数据应用 (Big Data Applications) · WeChat Official Account · Big Data · 2018-03-22 09:11

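Since June 15, 2017, 数据应用学院 (Data Application Lab) has been reviewing common interview questions in data science (DS) and business analytics (BA) with you. Since October 4, 2017, we have also shared one LeetCode algorithm problem every day.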

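If you are actively looking for work in these fields, we hope you will follow our questions each day and think them through with us. We post the answers the following day.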

Day 218

DS Interview Questions

How is KNN different from k-means clustering?

BA Interview Questions

SQL: Write a query identifying the type of each record in the TRIANGLES table using its three side lengths. Output one of the following statements for each record in the table:


  • Equilateral: It's a triangle with 3 sides of equal length.

  • Isosceles: It's a triangle with 2 sides of equal length.

  • Scalene: It's a triangle with 3 sides of differing lengths.

  • Not A Triangle: The given values of A, B, and C don't form a triangle.

LeetCode Questions

Description:

Given a 2D board and a word, find if the word exists in the grid.

The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

Input:
board = [
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]
word = "ABCCED"

Output: true

Want to know the answers? See the next issue!

Day 217 Answers Revealed

DS Interview Questions

What is latent semantic indexing? What is it used for? What are the specific limitations of the method?

Latent semantic indexing:

  • Latent semantic indexing (LSI), also called latent semantic analysis (LSA), is closely related to principal component analysis (PCA) applied to documents: it takes a truncated singular value decomposition of the term-document matrix X, and the resulting principal directions (singular vectors) define topics.

  • It uses a term-document matrix X that describes the occurrences of terms in documents. Rows correspond to terms (the vocabulary) and columns correspond to documents. Elements of X are typically weights proportional to the number of times a term appears in a document, with rare terms upweighted to reflect their relative importance. The matrix X is usually large and sparse.

  • LSA finds a low-rank approximation of the original term-document matrix, which merges the dimensions of terms that have similar meanings.
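The low-rank idea in the points above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy matrix `X`, the term list, and the helper names are invented, raw counts stand in for the weighted entries described above, and a simple power iteration with deflation stands in for a library SVD routine (in practice one would more likely call something like `numpy.linalg.svd`).

```python
import math

# Toy term-document count matrix, invented for this sketch.
# Rows are terms, columns are documents.
TERMS = ["car", "auto", "engine", "flower", "petal"]
X = [
    [1.0, 0.0, 0.0, 0.0],  # car    appears only in document 0
    [0.0, 1.0, 0.0, 0.0],  # auto   appears only in document 1
    [1.0, 1.0, 0.0, 0.0],  # engine appears in documents 0 and 1
    [0.0, 0.0, 1.0, 1.0],  # flower appears in documents 2 and 3
    [0.0, 0.0, 1.0, 0.0],  # petal  appears only in document 2
]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(M, iters=500):
    """Largest singular value and vectors of M via power iteration on M^T M."""
    Mt = [list(col) for col in zip(*M)]
    v = [1.0] * len(M[0])
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))
        n = norm(w)
        v = [x / n for x in w]
    Mv = matvec(M, v)
    sigma = norm(Mv)
    u = [x / sigma for x in Mv]
    return sigma, u, v


def truncated_svd(M, k):
    """Top-k singular triplets, found one at a time with deflation."""
    residual = [row[:] for row in M]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        triplets.append((sigma, u, v))
        # Subtract the rank-1 component that was just found.
        residual = [
            [residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))
        ]
    return triplets


def term_vectors(M, k):
    """Coordinates of each term in the k-dimensional latent space."""
    triplets = truncated_svd(M, k)
    return [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(M))]


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


latent = term_vectors(X, k=2)
raw_sim = cosine(X[0], X[1])               # "car" vs "auto" on raw counts
latent_sim = cosine(latent[0], latent[1])  # the same pair in the latent space
```

In this toy data `raw_sim` is 0 because "car" and "auto" never share a document, while `latent_sim` comes out close to 1 because both terms co-occur with "engine" and so load on the same latent direction.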

What is it used for:

  • LSA can be used to compare documents in the low-dimensional space (document classification), find relations between terms (synonym identification), and find matching documents by translating a query of terms into the low-dimensional space (information retrieval), among other tasks.


Limitations include:

  • The resulting dimensions can be difficult to interpret

  • LSA cannot capture multiple meanings of a word

  • The terms of a document are represented unordered

  • Eigenvectors can have negative components

Reference: https://en.wikipedia.org/wiki/Latent_semantic_analysis

BA Interview Questions

SQL: Query all columns for all American cities in CITY with populations larger than 100000. The CountryCode for America is USA.

The CITY table is described as follows:


SELECT *
FROM CITY
WHERE COUNTRYCODE = 'USA'
  AND POPULATION > 100000;


LeetCode Questions

    Description:

    • Given a linked list, swap every two adjacent nodes and return its head.

    Input: 1->2->3->4

    Output: 2->1->4->3

    Assumptions:

    Your algorithm should use only constant space.

    You may not modify the values in the list; only the nodes themselves can be changed.
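One common way to meet both assumptions is to relink the nodes pair by pair behind a dummy head. The following is a minimal sketch, not the post's own answer; `ListNode`, `swap_pairs`, `from_list`, and `to_list` are illustrative names that do not come from the original.

```python
class ListNode:
    """Singly linked list node (illustrative definition)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def swap_pairs(head):
    """Swap every two adjacent nodes by relinking; node values are never modified."""
    dummy = ListNode(None, head)  # sentinel placed before the real head
    prev = dummy
    while prev.next and prev.next.next:
        first = prev.next
        second = first.next
        # Relink prev -> first -> second -> rest as prev -> second -> first -> rest.
        first.next = second.next
        second.next = first
        prev.next = second
        prev = first  # first is now the tail of the swapped pair
    return dummy.next


def from_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Collect node values into a Python list (helper for the example)."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


result = to_list(swap_pairs(from_list([1, 2, 3, 4])))  # [2, 1, 4, 3]
```

The loop keeps only a fixed number of pointers (`prev`, `first`, `second`), so the extra space is constant. With an odd number of nodes, the last node is left where it is.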







