Anyone who actively uses the internet has encountered recommendation systems. For example, news sites almost always include a section of recommended items to read and watch.

wsj

A recommender system makes this section on my Twitter account possible:

twitter

Even Google searches are powered by recommender systems:

google1

And the area indicated by the red lines in the picture below of a Google search results page also uses a recommender system:

google2

Music services such as iTunes also take advantage of recommender systems:

iTunes

Of course, the best-known examples of recommender systems are the ones used by Amazon and Netflix:

amazon

The picture above is from my Amazon homepage. I no longer have a Netflix account, so I can’t show you a picture of theirs. But recommender systems were popularized even further a few years ago when Netflix sponsored a competition, the Netflix Prize, to improve the recommendation system for the movies offered on its service.

Many more websites than the ones I’ve mentioned here take advantage of recommendation systems.

Why use recommendation systems?

Recommendation systems came about because of the tremendous number of products available to consumers on the web. In comparison, the range of products in a traditional, physical store is limited by the space the store has, so only the most popular products are stocked. Online stores do not have this constraint: virtually every product can be made available. This, however, brings another problem, one that pertains to the consumer experience. Because so many products are available, consumers can experience cognitive overload and may not be able to find a product that they really like. This is known as the “long-tail” phenomenon. It can be described by a line plot with products ordered along the x axis from most to least popular and product consumption on the y axis. The curve drops off steeply and then stretches into a long tail of less popular items, and the area under that tail is sometimes even bigger than the area under the head of the curve, which covers the most popular items available in physical stores. Recommender systems are used to give consumers efficient access to the tail.

If you want to know more about the long-tail phenomenon, it is explained succinctly in this chapter of the book Mining of Massive Datasets. The image above was obtained from the handout for the MOOC that uses this book.
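To get a feel for how large the tail can be, here is a rough sketch in Python. It assumes, purely for illustration, that the item at popularity rank r is consumed in proportion to 1/r, and that a physical store can stock only the top 50 items of a 10,000-item catalogue. None of these numbers come from real data.

```python
# Hypothetical catalogue: the item at rank r is consumed in proportion to 1 / r.
# The 1/r shape, the catalogue size, and the shelf capacity are all assumptions
# chosen only to illustrate the long tail.
CATALOGUE_SIZE = 10_000
SHELF_CAPACITY = 50  # number of items a physical store could stock (assumed)

# Consumption per item, ordered from most popular to least popular.
consumption = [1.0 / rank for rank in range(1, CATALOGUE_SIZE + 1)]
total = sum(consumption)

# Area under the head (items the store stocks) versus the tail (everything else).
head_share = sum(consumption[:SHELF_CAPACITY]) / total
tail_share = sum(consumption[SHELF_CAPACITY:]) / total

print(f"head (top {SHELF_CAPACITY} items): {head_share:.0%} of consumption")
print(f"tail (remaining items): {tail_share:.0%} of consumption")
```

Under these assumptions the tail accounts for a little more consumption than the head, even though each item in it is rarely consumed. Different assumptions would move the split in either direction.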

But what really is a recommendation system?

Machine learning is the technology that powers recommender systems, also called recommendation systems or recommendation engines. Recommender systems are algorithms that determine whether an item or product will be useful to a user or consumer.

That said, three things are important in understanding the model behind recommender systems:

  • The user is the consumer.
  • The item is the product used by the consumer.
  • The value, which can be boolean or numerical, indicates how useful the item is to the user. Values may be known (provided by the users) or unknown.

A user-item matrix, also called a utility matrix, maps a value to each user-item pair. For example, suppose the items are the movies “Harry Potter”, “Chronicles of Narnia”, and “Star Wars”, the users are “Sally”, “Joe”, and “Bill”, and the values are ratings from 1 (lowest) to 5 (highest). The matrix might look like this:

Sally Joe Bill
Harry Potter 5 2
Chronicles of Narnia 4 3
Star Wars 5

In reality, the matrix is very large and sparse: there are a lot of missing values! A missing value indicates either that the user has not consumed the item or that the user has consumed it but not rated it. The objective of the recommender system is to predict the missing values, specifically the high ones (the items the user will like the most).
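A sparse matrix like this can be sketched as a dictionary of dictionaries that stores only the known values. The ratings below, and which user gave which rating, are assumptions made up for illustration. The helper names `get_rating` and `missing_pairs` are hypothetical as well.

```python
# Sparse user-item matrix stored as {user: {item: rating}}.
# Only known values are stored; the ratings and their placement are
# illustrative assumptions, not data from a real system.
ratings = {
    "Sally": {"Harry Potter": 5, "Chronicles of Narnia": 4},
    "Joe": {"Harry Potter": 2, "Star Wars": 5},
    "Bill": {"Chronicles of Narnia": 3},
}
items = ["Harry Potter", "Chronicles of Narnia", "Star Wars"]


def get_rating(user, item):
    """Return the known rating, or None when the value is missing."""
    return ratings.get(user, {}).get(item)


def missing_pairs():
    """List every (user, item) pair the recommender would need to predict."""
    return [(u, i) for u in ratings for i in items if get_rating(u, i) is None]


# Sparsity: the fraction of user-item pairs that have no value.
known = sum(len(user_ratings) for user_ratings in ratings.values())
sparsity = 1 - known / (len(ratings) * len(items))

print(missing_pairs())
print(f"sparsity: {sparsity:.2f}")
```

Storing only the known entries keeps memory use proportional to the number of ratings rather than to users × items, which matters once the matrix gets large.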

Populating the Utility Matrix

To populate the utility matrix, experts in the field distinguish between implicit and explicit data gathering. Implicit data are inferred from a user’s online activity, such as clicks, views and purchases. Explicit data are obtained by directly asking users for their ratings, likes and dislikes.
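As a minimal sketch of how the two sources might be combined, the Python below merges a list of explicit ratings with a log of implicit events. The event format, the sample data, and the function name `build_utility_matrix` are all hypothetical. Recording implicit activity as a plain boolean is one simple choice among many.

```python
# Hypothetical inputs: explicit ratings entered by users, and an implicit
# event log of (user, item, action) tuples captured from site activity.
explicit_ratings = [("Sally", "Harry Potter", 5), ("Joe", "Harry Potter", 2)]
event_log = [
    ("Bill", "Star Wars", "view"),
    ("Bill", "Star Wars", "purchase"),
    ("Sally", "Star Wars", "click"),
]


def build_utility_matrix(explicit, events):
    """Merge both sources into {user: {item: value}}.

    Explicit ratings are stored as numbers. Implicit activity is stored as
    the boolean True ("the user interacted with this item") and never
    overwrites an explicit rating.
    """
    matrix = {}
    for user, item, rating in explicit:
        matrix.setdefault(user, {})[item] = rating
    for user, item, _action in events:
        # setdefault keeps an existing explicit rating if there is one.
        matrix.setdefault(user, {}).setdefault(item, True)
    return matrix


utility = build_utility_matrix(explicit_ratings, event_log)
print(utility)
```

A real system would probably weight the actions differently (a purchase says more than a click), but that weighting is a design decision this sketch leaves out.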

Types of Recommender Systems

Different articles on recommendation systems vary in how they classify the types of recommender systems, but in general, there are two:

  • Content-based systems
  • Collaborative filtering systems

Content-based systems, sometimes also called content-based filtering, use information about the features of the items. Knowledge about other users is not needed. Here, determining the similarities between items is the key. Say a user likes an item, A. If another item, B, is found to be similar to item A, item B can be recommended to the user. The problem then becomes how to determine the similarity between items.
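One simple way to measure item similarity, shown here only as a sketch, is to describe each item by a set of features and compare the sets with the Jaccard index (shared features divided by all features). The genre tags below are made up for illustration, and `similar_items` is a hypothetical helper name.

```python
# Hypothetical item features (genre tags). A real system would use richer
# descriptions such as cast, keywords, or the text of a synopsis.
item_features = {
    "Harry Potter": {"fantasy", "magic", "adventure"},
    "Chronicles of Narnia": {"fantasy", "magic", "adventure", "family"},
    "Star Wars": {"sci-fi", "adventure", "space"},
}


def jaccard(a, b):
    """Share of features two items have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(liked_item, features=item_features):
    """Rank every other item by its similarity to the liked item."""
    liked = features[liked_item]
    scores = {
        other: jaccard(liked, feats)
        for other, feats in features.items()
        if other != liked_item
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# A user who liked "Harry Potter" gets the most similar items first.
recommendations = similar_items("Harry Potter")
print(recommendations)
```

With these tags, “Chronicles of Narnia” ranks above “Star Wars” for someone who liked “Harry Potter”. Note that no other user's ratings are involved, only the item features.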

Collaborative filtering systems use information from the users’ relationships with the items, such as their ratings. These systems are further classified into item-based and user-based. The underlying hypothesis is that similar users tend to like similar items.
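The user-based variant can be sketched as follows: to predict a missing rating, find users whose known ratings resemble the target user's and average their ratings for the item, weighted by similarity. The ratings are invented for illustration. The similarity used here (the inverse of the mean absolute rating difference) is a deliberately simple stand-in; cosine similarity and Pearson correlation are common alternatives.

```python
# Hypothetical ratings as {user: {item: rating}}; Bill has not rated "Star Wars".
ratings = {
    "Sally": {"Harry Potter": 5, "Chronicles of Narnia": 4, "Star Wars": 2},
    "Joe": {"Harry Potter": 2, "Chronicles of Narnia": 1, "Star Wars": 5},
    "Bill": {"Harry Potter": 5, "Chronicles of Narnia": 4},
}


def similarity(user_a, user_b):
    """1 / (1 + mean absolute rating difference) over co-rated items."""
    shared = ratings[user_a].keys() & ratings[user_b].keys()
    if not shared:
        return 0.0
    mean_diff = sum(
        abs(ratings[user_a][item] - ratings[user_b][item]) for item in shared
    ) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(user, other)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    # None means nobody comparable has rated the item.
    return weighted_sum / weight_total if weight_total else None


prediction = predict("Bill", "Star Wars")
print(f"predicted rating for Bill on Star Wars: {prediction:.1f}")
```

Because Bill's ratings match Sally's much more closely than Joe's, the prediction is pulled toward Sally's low rating for “Star Wars”. An item-based version would compare columns of the matrix (items) instead of rows (users).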

I find the difference between content-based systems and item-based collaborative filtering rather confusing. But a blog article by Martin Kihn puts the difference among these types of recommender systems this way:

Some recommendation systems use a hybrid of the two general types. In the next post, I will try to explain the two types of recommender systems.

References

Long-tail phenomenon by Chris Anderson, Wired Magazine

The Netflix Prize

Recommender Systems by Alpa Jain

How to Build a Recommender System by Martin Kihn