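Column: 生信圈
Covers biomedical big data and progress in applying data analysis methods to translational medicine research, and discusses all topics related to bioinformatics.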

Machine Learning | Recommendation System Algorithms

生信圈 · WeChat official account · 2017-07-24 21:00


*This post is only a partial excerpt; see the original article for the full text.


Today, many companies use big data to make highly relevant recommendations and grow revenue. Among the variety of recommendation algorithms, data scientists need to choose the best one according to a business’s limitations and requirements.


To simplify this task, the Statsbot team has prepared an overview of the main existing recommendation system algorithms.


Collaborative filtering

Collaborative filtering (CF) and its modifications are among the most commonly used recommendation algorithms. Even beginner data scientists can use it to build a personal movie recommender system, for example, as a resume project.


When we want to recommend something to a user, the most logical thing to do is to find people with similar interests, analyze their behavior, and recommend the same items to our user. Alternatively, we can look at items similar to those the user bought earlier and recommend products like them.


These are two basic approaches in CF: user-based collaborative filtering and item-based collaborative filtering, respectively.

In both cases this recommendation engine has two steps:


  1. Find out how many users/items in the database are similar to the given user/item.

  2. Use the ratings of those other users/items to predict the rating the user would give this product, giving more weight to the users/items that are more similar to the given one.


What does “most similar” mean in this algorithm?


All we have is a vector of preferences for each user (a row of the matrix R) and a vector of user ratings for each product (a column of the matrix R).
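As a minimal sketch of this representation, the ratings matrix R can be stored as a NumPy array with `np.nan` marking unknown ratings. The users, movie titles, and rating values below are made up for illustration, and the helper names `user_vector` and `item_vector` are hypothetical.

```python
import numpy as np

# Hypothetical ratings matrix R: rows are users, columns are items.
# np.nan marks a rating we do not know. All values are illustrative.
users = ["Bill", "Jane", "Tom"]
items = ["Star Wars", "Titanic", "Batman"]
R = np.array([
    [5.0, np.nan, 4.0],   # Bill has not rated Titanic
    [4.0, 2.0, np.nan],   # Jane has not rated Batman
    [1.0, 5.0, 2.0],      # Tom has rated everything
])

def user_vector(R, u):
    """Preference vector of user u (a row of R)."""
    return R[u, :]

def item_vector(R, i):
    """Ratings that item i received from all users (a column of R)."""
    return R[:, i]

bill_prefs = user_vector(R, users.index("Bill"))
titanic_ratings = item_vector(R, items.index("Titanic"))
```

User-based CF compares rows of this array, while item-based CF compares columns.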

First of all, let’s leave only the elements for which we know the values in both vectors.


For example, if we want to compare Bill and Jane, we note that Bill hasn’t watched Titanic and Jane hasn’t watched Batman yet, so we can measure their similarity only by Star Wars. How could anyone not watch Star Wars, right? :)
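The filtering step in the Bill and Jane example can be sketched as follows. The rating values are invented for illustration, with the columns assumed to be ordered Star Wars, Titanic, Batman, and `co_rated` is a hypothetical helper name.

```python
import numpy as np

def co_rated(a, b):
    """Return the entries of a and b at positions where both ratings are known."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = ~np.isnan(a) & ~np.isnan(b)  # True only where both users rated the item
    return a[mask], b[mask]

# Assumed column order: Star Wars, Titanic, Batman (ratings are made up).
bill = [5.0, np.nan, 4.0]   # Bill has not watched Titanic
jane = [4.0, 2.0, np.nan]   # Jane has not watched Batman

bill_common, jane_common = co_rated(bill, jane)
# Only the Star Wars ratings survive the filter.
```

Note that when only one common item remains, the cosine similarity of two positive ratings is always 1, so a similarity computed from such a small overlap carries little information.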


The most popular techniques to measure similarity are cosine similarity and correlation between the vectors of users/items. The final step is to fill the empty cells in the table with the arithmetic mean of the known ratings, weighted by the degree of similarity.
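The two steps can be sketched for the user-based case as below. This is one possible reading of the description, not a definitive implementation: it assumes cosine similarity computed only over co-rated items, a plain similarity-weighted mean with no mean-centering or neighbor cutoff, and an invented ratings matrix. The function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated (0.0 if none)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_rating(R, user, item):
    """Similarity-weighted mean of other users' known ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(R.shape[0]):
        if other == user or np.isnan(R[other, item]):
            continue  # skip the user themself and users who have not rated the item
        w = cosine_similarity(R[user], R[other])
        num += w * R[other, item]
        den += abs(w)
    return num / den if den > 0 else np.nan

# Illustrative data: rows are Bill, Jane, Tom; columns are Star Wars, Titanic, Batman.
R = np.array([
    [5.0, np.nan, 4.0],
    [4.0, 2.0, np.nan],
    [1.0, 5.0, 2.0],
])

# Fill Bill's empty Titanic cell from Jane's and Tom's ratings.
bill_titanic = predict_rating(R, user=0, item=1)
```

The item-based variant follows the same pattern applied to the transpose of R, so that similarities are computed between columns instead of rows.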




This article is reposted from 爱可可-爱生活; if it infringes any rights, please contact us and it will be removed.
