RWTH Aachen University
University of Bonn
Fraunhofer FIT
E-Commerce Seminar WT 08/09




                   Recommender Engines
                      Seminar Paper

                              Thomas Hess (289222)

                                February 1, 2009
Abstract

Recommender engines are used by more and more e-commerce businesses to help consumers find
products they are interested in. This paper describes what recommender engines are and what role
they play in e-commerce. Recommender engines use various techniques that draw on different
knowledge sources to make recommendations. The paper explains these techniques and their
strengths and weaknesses. Some of the common issues that recommender systems face are discussed
and possible solutions presented. Finally, examples of recommender engines in e-commerce are
described, showing which techniques they use and how the e-businesses utilize recommendations on
their websites.
Contents

1 Introduction

2 Recommender Techniques
  2.1 Non-Personalized Recommendation
  2.2 Demographic Recommendation
  2.3 Content-Based Recommendation
  2.4 Collaborative Filtering
      2.4.1 User-Based Approach
      2.4.2 Item-Based Approach
      2.4.3 Model-Based Approach
  2.5 Hybrid Approaches

3 Issues And Solutions
  3.1 Data Collection
  3.2 Cold Start
  3.3 Stability vs. Plasticity
  3.4 Sparsity
  3.5 Performance & Scalability
  3.6 User Input Consistency
  3.7 Privacy

4 Recommender Engine Examples
  4.1 ChoiceStream
  4.2 Amazon.com
  4.3 Digg

5 Conclusion

List of Figures

 2.1    Knowledge Sources of Recommender Engines
 2.2    Non-Personalized Recommendation
 2.3    Demographic Recommendation
 2.4    Content-Based Recommendation
 2.5    User-Based Collaborative Filtering
 2.6    User-Based Collaborative Filtering Example
 2.7    Item-Based Collaborative Filtering
 2.8    Item-Based Collaborative Filtering Example
 2.9    Model-Based Collaborative Filtering

 4.1    ChoiceStream Recommender Engine
 4.2    Amazon – Item With Recommendations
 4.3    Amazon – Shopping Cart With Recommendations
 4.4    Amazon – Your Recommendations
 4.5    Amazon – Recommendation Details
 4.6    Amazon – Your Purchases
 4.7    Digg – Story
 4.8    Digg – Topic Settings
 4.9    Digg – Homepage
 4.10   Digg – Recommendations
 4.11   Digg – Correlated User

1 Introduction

Recommender engines are personalized information agents that attempt to predict which items out of
a large pool a user may be interested in. These items can be of any type, like movies, music, books,
websites, or news articles. The user’s interest in an item is expressed through the rating the user gives
the item. A recommendation system has to predict the ratings for items that the user has not yet seen.
With these estimated ratings the system can recommend the items that have the highest estimated
rating.

Recommender engines have become an integral part of many e-commerce businesses [1, 2]. They are
a serious business tool used by an ever-increasing number of online stores. Recommender
systems are a unique feature of e-commerce, as websites, in contrast to physical stores, are able to
track everything their customers do. The knowledge learned from the customers’ behaviour is the basis for
the recommendations. Because online businesses have no real space constraint, they can offer much
larger stocks, providing their customers with more choices. These large stocks become impossible to
search exhaustively, so e-commerce stores must provide personalized versions with reduced choices to the
individual users. One way to achieve this is the use of recommender engines.

For e-commerce vendors, recommender engines provide multiple benefits. Good recommender systems
present customers with products they are interested in but did not plan to buy, leading them to purchase
more items [2, 3, 4]. Such unplanned purchases do not yet happen as often in online stores as in
traditional stores [2]. Recommender engines can help to gain consumers’ loyalty, which is an essential
business strategy in e-commerce as the competitor is always just “one click away” [4]. Because
recommender systems make it easier and faster to find new items, customers come back more often [2].
The more a user uses a website and purchases items, the more the recommender engine learns about
the user and the better the recommendations get. This helps to build a “value-added relationship”
between the website and the user [4]. Recommender systems are also a way to promote older or
low-demand items, such as niche products [2].




2 Recommender Techniques

The techniques used by recommender engines can be classified based on the information sources they
use [5, 2]. The available sources are the user features (demographics) (e.g. age, gender, profession,
income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered
through questionnaires, explicit ratings, transaction data). See figure 2.1.



2.1 Non-Personalized Recommendation

Non-personalized recommendations are identical for each user. The recommendations are either man-
ually selected (e.g. editor choices) or based on the popularity of items (e.g. average ratings, sales data).
See figure 2.2.




                Figure 2.1: Knowledge Sources of Recommender Engines (From [5])








                    Figure 2.2: Non-Personalized Recommendation (From [5])


Because non-personalized recommendations are easy to compute, they are popular among e-commerce
businesses. They are also an option for websites that offer no personalization.
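A popularity-based list of this kind can be sketched in a few lines. The following is only an illustration under assumptions: the function name `top_items_by_average_rating`, the `min_ratings` threshold, and the sample data are hypothetical and not taken from any of the cited systems.

```python
from collections import defaultdict

def top_items_by_average_rating(ratings, n=3, min_ratings=1):
    """Rank items by mean rating; every user sees the same list."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    averages = {
        item: totals[item] / counts[item]
        for item in totals
        if counts[item] >= min_ratings  # hypothetical guard against one-off ratings
    }
    # Highest average first; ties are broken by item name for a stable order.
    return sorted(averages, key=lambda item: (-averages[item], item))[:n]

# Hypothetical (user, item, rating) triples.
ratings = [
    ("alice", "book_a", 5), ("bob", "book_a", 4),
    ("alice", "book_b", 2), ("carol", "book_b", 3),
    ("bob", "book_c", 5),
]
print(top_items_by_average_rating(ratings, n=2))
```

Raising `min_ratings` is one simple way to keep items with a single enthusiastic rating from topping the list.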



2.2 Demographic Recommendation

Demographic recommendation methods use only the information about the users. The users are
categorized based on the attributes of their demographic profiles in order to find users with similar
features. The engine then recommends items that are preferred by these similar users. See figure 2.3.
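As a rough sketch of this idea, users can be grouped by identical demographic attributes and a new user can be shown what the other members of the same group bought. The segment definition (an age band and a profession), the function name `demographic_recommend`, and the data are assumptions made for illustration; a real system would likely use a softer notion of similarity.

```python
from collections import Counter

def demographic_recommend(target, profiles, purchases, n=2):
    """Recommend the items most often bought by users in the same segment."""
    segment = profiles[target]
    counts = Counter()
    for user, items in purchases.items():
        if user != target and profiles.get(user) == segment:
            counts.update(items)
    owned = set(purchases.get(target, ()))
    # Most frequent first; ties are broken by item name.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in owned][:n]

# Hypothetical profiles: (age band, profession).
profiles = {
    "new_user": ("18-25", "student"),
    "u1": ("18-25", "student"),
    "u2": ("18-25", "student"),
    "u3": ("40-60", "engineer"),
}
purchases = {
    "u1": ["textbook", "headphones"],
    "u2": ["textbook", "backpack"],
    "u3": ["golf_clubs"],
}
print(demographic_recommend("new_user", profiles, purchases))
```

Note that `new_user` has no purchases or ratings at all and still receives recommendations.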




Advantages

   • Because user-item ratings are not used, new users can get recommendations before they have
     rated any item.
   • Knowledge about the items and their features is not needed, therefore the technique is domain-
     independent.




                      Figure 2.3: Demographic Recommendation (From [5])








                       Figure 2.4: Content-Based Recommendation (From [5])


Problems

   • Gathering the required demographic data leads to privacy issues, see 3.7.
   • Demographic classification is too crude for highly personalized recommendations [5, 3]. The
     generalisations created from the classification are often false, especially when it comes to cul-
     tural items like books, music, or movies [6, 3].
   • Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
   • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).



2.3 Content-Based Recommendation

Content-based recommendation methods use the information about item features and the ratings a
user has given to items. The technique combines these ratings into a profile of the user’s interests based
on the features of the rated items. The engine can then find items with the preferred features and
recommend the items with the highest similarity to the ones preferred in the past. See figure 2.4. The
recommendations of a content-based system are based on individual information and ignore
contributions from other users.

The profiles of the users’ interests are often represented as vectors of weights on item features. If
automatic learning methods such as a rule induction algorithm are used to generate them, however,
they can also be rule-based [7].
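The vector representation can be illustrated with a minimal sketch. It assumes that a profile is the rating-weighted sum of the feature vectors of the rated items and that candidates are ranked by cosine similarity to that profile; both choices, as well as the names `build_profile` and `recommend` and the sample data, are assumptions for illustration and not the method of a specific system.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(user_ratings, item_features):
    """Sum the feature vectors of rated items, weighted by the rating."""
    profile = {}
    for item, rating in user_ratings.items():
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0.0) + rating * weight
    return profile

def recommend(user_ratings, item_features, n=1):
    """Rank unrated items by similarity to the user's profile."""
    profile = build_profile(user_ratings, item_features)
    candidates = [i for i in item_features if i not in user_ratings]
    return sorted(candidates,
                  key=lambda i: (-cosine(profile, item_features[i]), i))[:n]

# Hypothetical item features and ratings on a 1 to 5 scale.
item_features = {
    "alien": {"scifi": 1.0, "horror": 1.0},
    "blade_runner": {"scifi": 1.0, "noir": 1.0},
    "notting_hill": {"romance": 1.0, "comedy": 1.0},
    "solaris": {"scifi": 1.0, "drama": 1.0},
    "amelie": {"romance": 1.0, "comedy": 1.0},
}
user_ratings = {"alien": 5, "blade_runner": 4, "notting_hill": 1}
print(recommend(user_ratings, item_features, n=2))
```

Only the target user's own ratings enter the computation, which mirrors the observation that content-based systems ignore contributions from other users.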

Content-based recommendation works well if the items can be properly represented as a set of
features. The quality of the recommendations depends directly on the quality of the available descriptive
data. In order to have a sufficient set of features, the item descriptions must either be in a form from
which features can be extracted automatically with information retrieval techniques (e.g. text), or
the features must be assigned manually, which takes a lot of resources [8]. Besides objective
categorizations, systems can also use (user-generated) tags associated with items that provide a subjective
view.


Problems

   • Content analysis is necessary to determine the item features.
   • The technique depends not only on the quality of the item metadata but also on the homogeneity
     of the stock, so that items can be categorized.
   • The quality of items cannot be evaluated. The similarity computation is limited to the item
     features [5].
   • The technique suffers from the cold start problem for new users, see 3.2.
   • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).



2.4 Collaborative Filtering

Collaborative filtering techniques use the user behaviour in the form of the user-item ratings as their
information source. The concept is to make correlations between users or between items. Collaborative
filtering is widely implemented and the most mature recommendation technique. Three main
approaches of collaborative filtering can be distinguished: user-based, item-based, and model-based
approaches.


Advantages

   • As with demographic recommendation, no knowledge about the item features is needed.
     Collaborative filtering works completely independently of machine-readable item representations
     and is therefore domain-independent.
   • The quality (not just the relevancy) of items can be evaluated, as it is also expressed through
     user-item ratings [5].
   • Collaborative filtering techniques are able to make recommendations “outside the box” because
     they look outside the preferences of the individual user [1].








                       Figure 2.5: User-Based Collaborative Filtering (From [5])


Problems

   • The quality of the recommendations depends on the size of the historical rating data set.
   • The technique suffers from the cold start problem for new users and new items, see 3.2.
   • Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
   • Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).



2.4.1 User-Based Approach

The user-based approach is based on the assumption that users who rated the same items similarly
probably have the same taste. It makes user-to-user correlations by using the rating profiles of different
users to find highly correlated users. These users form like-minded neighbourhoods based on their
shared item preferences. The engine can then recommend the items preferred by the other users in the
neighbourhood. See figure 2.5.

Figure 2.6 shows an example of user-based collaborative recommendation.

If there are few overlapping ratings across users in the data set, however, the user-based approach
runs into the sparsity problem, see 3.4.

User-based collaborative filtering does not scale well for many users and items, because the analysis
and comparison processes become more complex, see 3.5.
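The user-to-user correlation described in this section can be sketched as follows. The sketch assumes the Pearson correlation over co-rated items as the similarity measure and a similarity-weighted average over the `k` most similar users as the prediction; these are common textbook choices rather than something prescribed by the sources, and all names and data are hypothetical.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = [i for i in ratings_a if i in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def predict(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    neighbours = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = pearson(ratings[user], other_ratings)
            if sim > 0:  # ignore uncorrelated and negatively correlated users
                neighbours.append((sim, other_ratings[item]))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return None  # no usable neighbour: the sparsity problem in miniature
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 1},
    "ben":  {"a": 5, "b": 3, "c": 1, "d": 4},
    "cleo": {"a": 1, "b": 3, "c": 5, "d": 1},
}
print(predict(ratings, "ann", "d"))
```

Every prediction compares the target user with all other users at request time, which hints at why this approach becomes expensive for large user bases.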



2.4.2 Item-Based Approach

                Figure 2.6: User-Based Collaborative Filtering Example (From [5])

The item-based approach focuses on items, assuming that items rated similarly are probably similar. It
compares items based on the shared appreciation of users, in order to create neighbourhoods of similar
items. The engine then recommends the neighbouring items of the user’s known preferred ones. See
figure 2.7.

Figure 2.8 shows an example of item-based collaborative recommendation.

Item-based collaborative filtering is more scalable than the user-based approach, as the correlations
are drawn among a limited number of products, instead of a potentially very large number of users.
Items are also easy to categorize, while users’ activities must be examined and analyzed. See 3.5.

In addition, because the number of items is naturally smaller than the number of users, the item-based
approach has a reduced sparsity problem (see 3.4) in comparison to the user-based approach.
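A minimal sketch of the item-to-item comparison is given below. It assumes that each item is represented by the vector of ratings it received and that items are compared with the cosine similarity; the measure, the function names, and the data are illustrative assumptions.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            vectors[item][user] = rating
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_items(ratings, liked_item, n=2):
    """Return the n items whose rating patterns are closest to liked_item."""
    vectors = item_vectors(ratings)
    target = vectors.get(liked_item, {})
    scores = {
        other: cosine(target, vec)
        for other, vec in vectors.items()
        if other != liked_item
    }
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "u1": {"camera": 5, "tripod": 5, "novel": 1},
    "u2": {"camera": 4, "tripod": 5},
    "u3": {"camera": 1, "novel": 5},
}
print(similar_items(ratings, "camera"))
```

Because item neighbourhoods depend only on the rating data and not on the user making the request, a table of similar items could be computed ahead of time and merely looked up online.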




                     Figure 2.7: Item-Based Collaborative Filtering (From [5])








                 Figure 2.8: Item-Based Collaborative Filtering Example (From [5])


2.4.3 Model-Based Approach

For huge data sets, the quadratic complexity of computations on the user-item rating matrix becomes
prohibitive [7]. In real applications, however, predictions must be made quickly. Model-based
approaches address this problem by deriving a model for prediction from historical user-item rating
data, in order to make the online prediction process faster. To build the model, learning techniques
such as Bayesian networks, neural networks, or latent semantic indexing are used. An accurate model
requires a large amount of data. The engine then makes the online recommendations by using the
model. See figure 2.9.

As the model is built in advance of the online recommendation processes, this approach has a higher
performance than the memory-based approaches and avoids the scalability problem, see 3.5.
Depending on the learning techniques used to create the model, this approach can lead to a higher
recommendation accuracy and a reduced sparsity problem [5].

                    Figure 2.9: Model-Based Collaborative Filtering (From [5])

The major drawback of the model-based approach is that the recommendation results do not adapt
automatically to data changes. Instead, the model must be rebuilt to reflect updated data.
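The split between an offline model-building phase and a fast online prediction phase, as well as the need to rebuild, can be illustrated with a deliberately simple stand-in model. The sketch below is not one of the learning techniques named above: it assumes a toy model in which a rating is approximated by the global mean plus a per-user and a per-item offset. The class name `BiasModel` and the data are hypothetical.

```python
class BiasModel:
    """Toy model: rating ~ global mean + user offset + item offset."""

    def __init__(self, ratings):
        # Offline phase: passes over the historical (user, item, rating) data.
        self.mean = sum(r for _, _, r in ratings) / len(ratings)
        self.user_bias = self._offsets(ratings, key_index=0)
        self.item_bias = self._offsets(ratings, key_index=1)

    def _offsets(self, ratings, key_index):
        """Average deviation from the global mean per user or per item."""
        sums, counts = {}, {}
        for row in ratings:
            key = row[key_index]
            sums[key] = sums.get(key, 0.0) + (row[2] - self.mean)
            counts[key] = counts.get(key, 0) + 1
        return {key: sums[key] / counts[key] for key in sums}

    def predict(self, user, item):
        # Online phase: two dictionary lookups, independent of data set size.
        return (self.mean
                + self.user_bias.get(user, 0.0)
                + self.item_bias.get(item, 0.0))

# Hypothetical rating history.
history = [("ann", "a", 4), ("ann", "b", 2), ("ben", "a", 5), ("ben", "b", 3)]
model = BiasModel(history)
before = model.predict("ann", "a")

# New data arrives, but the existing model does not see it ...
history.append(("ann", "a", 1))
stale = model.predict("ann", "a")

# ... until the model is rebuilt from the updated history.
rebuilt = BiasModel(history).predict("ann", "a")
print(before, stale, rebuilt)
```

The unchanged `stale` prediction shows the drawback in miniature: the new rating only takes effect after the rebuild.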



2.5 Hybrid Approaches

Hybrid approaches combine collaborative and demographic or content-based methods in order to over-
come their drawbacks. Collaborative filtering systems often result in better predictive performance
but have problems when limited user-item ratings are available [7]. Demographic and content-based
recommendation systems work without rating data and therefore can compensate for the cold start
problem [1].

There are various methods to combine recommender techniques in a hybrid system [1, 9]:

Weighted Hybridization The scores of the different recommendation components are combined
    numerically. Each component of the hybrid system scores a given item and the scores are
    combined using a linear formula.

Switching Hybridization The system chooses among recommendation components based on the
     situation and applies the selected one. Some reliable criterion must be available on which to
     base the switching decision.

Mixed Hybridization Recommendations from different recommenders are presented side-by-side
    in a combined list. The results of the recommender systems are not combined.

Feature Combination Features derived from different knowledge sources are combined together
     and then injected into a single recommendation algorithm.

Feature Augmentation One recommendation technique is used to compute a feature or set of fea-
     tures, which is then part of the input to the next technique.

Cascaded Hybridization Recommenders are given strict priority, with the lower priority ones break-
     ing ties in the scoring of the higher ones.

Meta-Level Hybridization One recommendation technique is applied to produce a model, which is
    then used as the input for another technique.
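Two of the combination methods listed above, weighted and switching hybridization, are simple enough to sketch directly. The component names, the weights, and the switching criterion (a minimum number of ratings per user) are assumptions chosen for illustration.

```python
def weighted_hybrid(scores_by_component, weights):
    """Weighted hybridization: linear combination of per-component scores."""
    combined = {}
    for name, scores in scores_by_component.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return combined

def switching_hybrid(num_user_ratings, collaborative, content_based,
                     min_ratings=5):
    """Switching hybridization: pick one component based on a criterion.

    The criterion used here (number of ratings the user has given) is a
    hypothetical example of a 'reliable criterion'.
    """
    if num_user_ratings >= min_ratings:
        return collaborative
    return content_based

# Hypothetical item scores produced by two recommendation components.
collaborative = {"item_a": 0.9, "item_b": 0.2}
content_based = {"item_a": 0.4, "item_b": 0.8, "item_c": 0.6}

combined = weighted_hybrid(
    {"cf": collaborative, "cb": content_based},
    weights={"cf": 0.5, "cb": 0.5},
)
print(max(combined, key=combined.get))
print(switching_hybrid(2, collaborative, content_based) is content_based)
```

With equal weights, `item_a` wins the weighted combination, while a user with only two ratings is routed to the content-based scores by the switching variant.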




3 Issues And Solutions

3.1 Data Collection

The data used by recommender engines can be categorized into explicit and implicit data [2].

Explicit data is all data that users themselves feed into the system, such as demographic data, information
about their preferences (e.g. collected through questionnaires), search terms, and explicit ratings and
reviews of items (wisdom of the crowds). The collection of explicit data must not be intrusive or
time-consuming. The way the explicit data is collected can affect the quality and amount of data the users
will provide [10].

Recommendation systems should not rely completely on explicit data. Websites are able to track
their users’ activities in order to acquire implicit data. The most important implicit data source in
e-commerce is the transaction data including the purchase information. Other sources are web usage
patterns like click sequences or reading times, or search engine referrers. Implicit data needs to be
analyzed before it can be used to describe user features or user-item ratings.



3.2 Cold Start

The cold start problem occurs when too little rating data is available in the initial state. The
recommender system then lacks data to produce appropriate recommendations. A distinction is made
between the new user problem and the new item problem.


New User Problem When recommendations follow from user-to-user correlations based on the
accumulation of ratings, a user with few ratings is difficult to categorize.





New Item Problem An item with few ratings cannot easily be recommended. This problem occurs
particularly in domains with many new items (e.g. news articles). As the problem also occurs for long
tail items, it is also called the “long tail problem” [10].

A solution to the cold start problem is the combination of the collaborative technique with demo-
graphic (for the new user problem) or content-based (for the new item problem) techniques in a hybrid
recommender engine, see 2.5. That way the cold start problem gets compensated by techniques that
don’t rely on user-item ratings.

Other solutions to reduce the cold start problem are the use of default ratings (e.g. from the average
rating of all users) [6, 10] or the use of active learning techniques in model-based recommendation
techniques [5].
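The default-rating idea can be sketched as a damped mean: every item is treated as if it had already received `k` ratings equal to the overall average, so that items with few real ratings stay close to that default. The damping constant `k`, the function name `damped_mean`, and the data are assumptions for illustration.

```python
def damped_mean(item_ratings, item, k=3):
    """Average rating of an item, padded with k default ratings."""
    all_ratings = [r for rs in item_ratings.values() for r in rs]
    default = sum(all_ratings) / len(all_ratings)  # average over all ratings
    ratings = item_ratings.get(item, [])
    return (sum(ratings) + k * default) / (len(ratings) + k)

# Hypothetical ratings per item on a 1 to 5 scale.
item_ratings = {
    "classic": [5, 5, 5, 5],
    "flop": [1, 1, 1],
    "new": [5],
}
print(damped_mean(item_ratings, "new"))       # pulled towards the default
print(damped_mean(item_ratings, "unreleased"))  # no ratings: the default itself
```

As an item collects more real ratings, the influence of the `k` default ratings shrinks and the estimate approaches the plain average.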



3.3 Stability vs. Plasticity

The converse of the cold start problem is the stability vs. plasticity problem. When users have rated a
lot of items, their preferences in the established user profiles are difficult to change [1, 9]. But because
in reality taste evolves, this becomes a problem.

A common solution is to gradually discount older ratings so that they have less influence. By doing
so, however, engines risk losing information about long-term interests [1, 9].
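One possible way to discount older ratings is an exponential decay, where the weight of a rating halves after a fixed period. The half-life parameter, the function name `decayed_average`, and the data below are assumptions; the sources do not prescribe a particular decay function.

```python
def decayed_average(timed_ratings, now, half_life_days=180.0):
    """Average rating where a rating's weight halves every half_life_days."""
    weighted_sum = 0.0
    weight_total = 0.0
    for rating, day in timed_ratings:
        weight = 0.5 ** ((now - day) / half_life_days)
        weighted_sum += weight * rating
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

# Hypothetical (rating, day) pairs for one genre: loved a year ago, disliked today.
timed_ratings = [(5, 0), (1, 360)]
plain = sum(r for r, _ in timed_ratings) / len(timed_ratings)
decayed = decayed_average(timed_ratings, now=360)
print(plain, decayed)
```

A short half-life makes the profile more plastic but forgets long-term interests faster, which is exactly the trade-off described above.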

Related to this problem is that users may use a website with different intentions. For example one day
a customer buys books for himself, but the next day he is looking for a present for someone else.



3.4 Sparsity

In most use cases for recommender systems, due to the catalog sizes of e-business vendors, the number
of ratings already obtained is usually very small compared to the number of ratings that need to be
predicted. But collaborative filtering techniques depend on an overlap in ratings across users and have
difficulties when the space of ratings is sparse (few users have rated the same items). Sparsity in the
user-item rating matrix degrades the quality of the recommendations.






To reduce the sparsity, the rating data needs to be adjusted by either adding ratings or
reducing the dimensionality of the matrix. Ratings can be augmented by inserting simulated values
on behalf of the users. These can be ratings derived from other (implicit) data sources, like item views
or clicks, or default values [6].
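The effect of such augmentation can be made concrete with a small sketch that measures the share of empty cells in the rating matrix before and after simulated ratings are inserted for viewed items. The value assigned to a view (`implied_rating=3`), the function names, and the data are assumptions for illustration.

```python
def sparsity(ratings, users, items):
    """Fraction of user-item cells that have no rating."""
    filled = sum(len(ratings.get(u, {})) for u in users)
    return 1.0 - filled / (len(users) * len(items))

def augment_with_views(ratings, views, implied_rating=3):
    """Add a simulated rating for each viewed item that has no explicit rating."""
    augmented = {u: dict(r) for u, r in ratings.items()}
    for user, viewed in views.items():
        for item in viewed:
            # setdefault keeps an existing explicit rating untouched.
            augmented.setdefault(user, {}).setdefault(item, implied_rating)
    return augmented

users = ["u1", "u2"]
items = ["a", "b", "c", "d"]
ratings = {"u1": {"a": 5}, "u2": {"b": 4}}   # explicit ratings
views = {"u1": ["a", "b"], "u2": ["c"]}       # implicit data: item views

augmented = augment_with_views(ratings, views)
print(sparsity(ratings, users, items), sparsity(augmented, users, items))
```

Simulated values are guesses about the user's opinion, so a real system would have to weigh the gain in coverage against the noise they introduce.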

The dimensionality of the rating matrix can be reduced by techniques such as singular value decompo-
sition [1]. Singular value decomposition is a well-known method for matrix factorization that provides
the best lower rank approximations of the original matrix. Dimensionality reduction techniques are
often used in model-based collaborative filtering approaches [1].



3.5 Performance & Scalability

Performance and scalability are important issues for recommender systems as e-commerce websites
must be able to determine recommendations in real-time and often deal with huge data sets of millions
of customers and items. The big growth rates of e-businesses are making the sets even larger in the
user dimension [6].

The computational complexity of a recommendation technique is decisive for its performance. Techniques
that calculate correlation coefficients for M users over N items have a complexity of O(M × N)
in the worst case. Because the user-item rating matrix is commonly sparse, the performance tends
to be closer to O(M + N) [11]. For large data sets, however, this still leads to performance and scaling
issues.

Techniques that can perform the most expensive calculations offline scale better than techniques where
everything must be calculated online, in real time [11]. Demographic and content-based recommen-
dation as well as item- and model-based collaborative filtering can utilize offline computation. But
user-based collaborative filtering can do little or no offline computing, which makes it impractical for
large data sets [11].

In addition to performing calculations offline, all methods that help reduce the size of the data
set improve the performance and scalability of a recommendation technique [6]. For example, users with
very few ratings, or very popular or unpopular items, could be discarded [11]. These methods, however, also
reduce the recommendation quality.
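Such a reduction step might look like the following sketch, which drops users with too few ratings and items that are rated too rarely or too often. All thresholds, the function name `prune`, and the data are hypothetical.

```python
from collections import Counter

def prune(ratings, min_user_ratings=2, min_item_ratings=2,
          max_item_ratings=1000):
    """Drop sparse users and very unpopular or very popular items."""
    item_counts = Counter(
        item for user_ratings in ratings.values() for item in user_ratings
    )
    kept_items = {
        item for item, count in item_counts.items()
        if min_item_ratings <= count <= max_item_ratings
    }
    pruned = {}
    for user, user_ratings in ratings.items():
        if len(user_ratings) < min_user_ratings:
            continue  # too little data about this user
        kept = {i: r for i, r in user_ratings.items() if i in kept_items}
        if kept:
            pruned[user] = kept
    return pruned

# Hypothetical ratings: item "z" has one rating, user "u3" has one rating.
ratings = {
    "u1": {"a": 5, "b": 3, "z": 1},
    "u2": {"a": 4, "b": 2},
    "u3": {"a": 1},
}
print(prune(ratings))
```

In this example the only rating by `u3` and the only rating of item `z` are discarded, which illustrates how pruning trades recommendation quality and coverage for a smaller data set.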






3.6 User Input Consistency

Recommender techniques that work with user-to-user correlations, like demographic or collaborative
filtering, depend on high correlation coefficients between the users in a data set.

Users can be split into three classes based on their correlation coefficients with other users [6]. The
majority of users fall into the class of “white sheep”, which have a high rating correlation with many
other users. Engines can easily find recommendations for these users. The opposite type is the
“black sheep”, for whom there are few or no correlating users. This makes it very difficult to find
recommendations for them. When the overall number of users in a data set increases, however, the chance
of finding similar users increases as well.

The bigger problem is the “gray sheep” problem. These users have different opinions or an unusual
taste that results in low correlation coefficients with many users. They fall on a border between user
cliques. Recommendations for them are very difficult to find, and they also cause odd
recommendations for their correlated users.



3.7 Privacy

Privacy is an important issue in recommender systems. In order to provide personalized recommen-
dations, recommender systems must know something about the users. In fact, the more the systems
know, the more accurate the recommendations can get. Users are reasonably concerned about what
information is collected, how it is used, and if it is stored.

These privacy concerns affect the collection of both explicit and implicit data. Regarding explicit
data, users are reluctant to disclose information about themselves and their interests [2, 4]. If
questionnaires get too personal, users may provide false information in order to protect their privacy [4].
Recommender engines should be able to deal with privacy-concerned users and should not rely solely on
explicit data or on recommender techniques that do, like demographic recommendation.

Regarding implicit data that gets acquired by tracking users’ behaviour, there are concerns that per-
sonal taste or private actions get revealed through the recommendations [5]. Users fear that extensive
consumer profiles get created.






To address these concerns, e-commerce businesses must provide privacy protection mechanisms [5]
and make transparent which data is acquired and analyzed. Usage and storage restrictions must be
assured through privacy policies [4].




4 Recommender Engine Examples

Recommender engines are developed and run by independent technology vendors and by e-commerce
businesses themselves.

The business model of recommendation technology vendors is either to offer the recommender engine
as a hosted service or to license their engines to e-commerce businesses. Examples of technology
vendors are ChoiceStream¹, Baynote², ExpertMaker³, Loomia⁴, Criteo⁵, SourceLight⁶, and Collarity⁷.

Bigger e-commerce businesses in particular develop their own recommender solutions because they
have unique requirements, want unique features, or deal with items that third-party products are not
suited for. Examples are Amazon.com⁸, Netflix⁹, Digg¹⁰, The Internet Movie Database (IMDb)¹¹,
Pandora¹², and Last.fm¹³.

The following sections describe in detail the techniques and usage of the recommender engines of
ChoiceStream, Amazon.com, and Digg.




 1 http://www.choicestream.com
 2 http://www.baynote.com
 3 http://www.expertmaker.com
 4 http://www.loomia.com
 5 http://www.criteo.com
 6 http://www.sourcelight.com
 7 http://www.collarity.com
 8 http://www.amazon.com
 9 http://www.netflix.com
10 http://digg.com
11 http://www.imdb.com
12 http://www.pandora.com
13 http://www.last.fm






4.1 ChoiceStream

ChoiceStream is a personalisation company that offers its recommendation technology
“RealRelevance Recommendations” as a fully-hosted service for e-commerce vendors.

Because the different recommendation techniques all have their drawbacks and are not suited for all
fields of application, ChoiceStream uses a hybrid system based on a variety of techniques that are
chosen and combined depending on the concrete recommendation use case at hand [10]. The use
cases that ChoiceStream distinguishes are listed in table 4.1.

The recommendation techniques used by the ChoiceStream recommender engine are [10]:

Collaborative Filtering Both user-based and item-based collaborative filtering are used.

Collaborative Filtering Using Multiple Correlation Tables Use of multiple correlation tables
     (e.g. item views or clicks in addition to transactions) to overcome the cold start problem
     (see 3.2).

Cohort Analysis Creation of groups of similar users, called cohorts, in order to make better recom-
    mendations for users with sparse rating data.

  Use Case                     Definition
  Rich Profile User            Users for whom you have a lot of data (e.g. more than 5
                               transactions).
  Sparse Profile User          Users for whom you have little data (e.g. 1 to 4 transactions).
  Anonymous / New User         Users for whom you have no data.
  Popular Content              Items in your catalog that you can determine are “most popular”.
                               Typically these will be few in number, but very high volume.
  Mainstream Content           Items for which you have recorded patterns of behavior (e.g. more
                               than 20 transactions per item).
  New Content                  Items for which there are no past transactions.
  Long Tail Content            Items in a catalog which are less well known, but still profitable,
                               and for which there are few past transactions.
  Business Goal Optimization   The requirement to maximize a metric other than the number of
                               transactions, such as revenue, margin, or order size.

    Table 4.1: ChoiceStream – Common Use Cases Requiring Different Algorithms (From [10])






Selective Filtering Selective filtering removes the most popular items from the recommendations
     so that they do not dominate and customers can find less popular items.

Attribute Correlations Item attributes are used to make content-based recommendations in order
     to overcome the cold start problem of collaborative filtering.

Default Recommendations Default recommendations are the fallback if all other techniques fail
     to produce recommendations.

Business Goal Optimization With a multi-term scoring function the recommendation algorithm
     can be adjusted, for example, to favor higher-priced items in order to increase revenue.
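How selective filtering and a multi-term scoring function might interact can be illustrated with a small sketch. It is not ChoiceStream's implementation, which [10] does not disclose in detail. The `Candidate` fields, the two-term score, and the parameters `price_weight` and `popularity_cap` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    item_id: str
    relevance: float   # predicted interest, assumed to lie in [0, 1]
    price: float       # item price in an arbitrary currency
    popularity: int    # number of past transactions for the item


def rank(candidates, price_weight=0.0, popularity_cap=None, top_n=3):
    """Rank candidates with a two-term score and optional selective filtering."""
    if popularity_cap is not None:
        # Selective filtering (hypothetical rule): drop items above the cap.
        candidates = [c for c in candidates if c.popularity <= popularity_cap]
    if not candidates:
        return []
    max_price = max(c.price for c in candidates) or 1.0

    def score(c):
        # Term 1: relevance to the user. Term 2: normalized price (business goal).
        return (1.0 - price_weight) * c.relevance + price_weight * (c.price / max_price)

    ranked = sorted(candidates, key=score, reverse=True)
    return [c.item_id for c in ranked[:top_n]]
```

With `price_weight=0.0` the ranking follows relevance alone. Raising the weight shifts higher-priced items towards the top, and setting `popularity_cap` removes best-sellers before scoring.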

Figure 4.1 shows which techniques the ChoiceStream recommender engine uses for which use cases.
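Before a technique can be chosen, a request has to be assigned to one of the use cases in Table 4.1. The following sketch shows one way this could be done from transaction counts. The thresholds are the example values given in the table. The function names and the handling of the boundary values (exactly 5 or exactly 20 transactions) are assumptions. Popular content is left out because the table does not give a threshold for it.

```python
def classify_user(transaction_count, rich_above=5):
    """Map a user's transaction count to a user use case from Table 4.1."""
    if transaction_count <= 0:
        return "anonymous / new user"
    if transaction_count > rich_above:
        return "rich profile user"
    # Assumption: everything between 1 and the threshold counts as sparse.
    return "sparse profile user"


def classify_item(transaction_count, mainstream_above=20):
    """Map an item's transaction count to a content use case from Table 4.1."""
    if transaction_count <= 0:
        return "new content"
    if transaction_count > mainstream_above:
        return "mainstream content"
    # Assumption: items with few past transactions are treated as long tail.
    return "long tail content"
```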




                   Figure 4.1: ChoiceStream Recommender Engine (From [10])






4.2 Amazon.com

Amazon.com, founded in 1994, is the largest online retailer worldwide and one of the best-known
examples of an e-commerce business that uses a recommender engine. Amazon uses its recommendation
engines extensively to personalize its website.

Amazon’s recommender engine is based on item-based collaborative filtering [5, 6, 11]. It looks for
items correlating to the ones purchased and rated and combines the highly correlated items into a
recommendation list [11].

The recommendation engine consists of an online and an offline component. The offline component
creates an item-to-item matrix with all similar items. The online component can then look up
recommendations in the matrix when they are needed [11]. To build the item-to-item matrix, a
similarity function is used that determines the correlation coefficient between item pairs that
customers tend to purchase together. This expensive calculation is done offline [11, 6]. The online
component then only has to look up items similar to the ones a user has already purchased or rated.
This is a fast operation that can be done online in real time. Its complexity depends only on the
number of items a customer is associated with [11].
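The split into an offline and an online component can be sketched as follows. This is a minimal illustration, not Amazon's code. It assumes that purchases are available as a mapping from customer to a set of items, and it uses the cosine of the items' purchase vectors as the similarity function. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items(purchases):
    """Offline step: map each item to {other_item: cosine similarity}."""
    buyers = defaultdict(set)        # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    co_counts = defaultdict(int)     # (item_a, item_b) -> customers who bought both
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    table = defaultdict(dict)
    for (a, b), both in co_counts.items():
        sim = both / sqrt(len(buyers[a]) * len(buyers[b]))
        table[a][b] = sim
        table[b][a] = sim
    return dict(table)


def recommend(table, owned, top_n=3):
    """Online step: look up only the neighbours of the customer's own items."""
    scores = defaultdict(float)
    for item in owned:
        for other, sim in table.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    # Highest aggregated similarity first, ties broken by item name.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

The cost of `recommend` grows with the number of items in `owned` and their neighbours, not with the total number of customers or catalog items, which mirrors the scaling argument made for the online component.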

By performing the most expensive calculations offline, Amazon’s recommendation system can deal
with the huge data set of approximately 50 million customers per month (from the U.S. alone) and
several million catalog items. The online component scales independently of the catalog size and the
number of customers [11]. Another benefit of the similar-items table is that the algorithm
produces higher-quality recommendations for users with little user-item rating data than traditional
collaborative filtering [11].


Customers Who Bought On the information page for every item, Amazon shows the “Customers
Who Bought” feature that recommends items frequently purchased by customers who purchased the
selected item, see Figure 4.2.

As Figure 4.3 shows, the feature is also used on the shopping cart page. This works as the equivalent
of the impulse items in a supermarket checkout line [11], but here the impulse items are personalized
for each customer.








Figure 4.2: Amazon – Item With Recommendations








Figure 4.3: Amazon – Shopping Cart With Recommendations






Your Recommendations On the page “Your Recommendations” all recommendations are listed
with the ones derived from recent purchases in front, see Figure 4.4. They can be filtered by product
line and subject area. Users can mark the recommended items as already owned or as not interesting
as well as rate them in order to provide the recommender engine with further rating data to influence
what gets recommended. It is also shown why an item is recommended, that is, which purchased item
is correlated with the recommended item.

Additionally the user can view a detail page for every recommendation that lists all correlations to
purchased or otherwise rated items, see Figure 4.5.

Amazon encourages users to refine their user-item rating data by giving the option to rate purchased
items on a 5-point scale. On a page that lists all previous purchases the items can be rated and also
excluded from the recommendation calculation, see Figure 4.6.








Figure 4.4: Amazon – Your Recommendations
            1 Recommended items can be marked as owned or as not interesting, and can be rated
            2 It is shown why items are recommended.








Figure 4.5: Amazon – Recommendation Details








Figure 4.6: Amazon – Your Purchases
            1 Items can be rated
            2 Items can be excluded from the recommendation engine






4.3 Digg

Digg is a social news site, launched in 2004, where users can submit links to websites. Users can rate
these links, called stories, by “digging” or “burying” them. Stories can also be favorited, shared,
and commented on, see Figure 4.7. The stories are categorized into various topics. Users can
configure which topics they are interested in and will then only see stories in these categories
throughout the website, see Figure 4.8.

On the Digg homepage the most popular stories are shown, see Figure 4.9. Popularity is measured
by the number of recent “diggs”. The homepage thus uses non-personalized recommendation.

For registered users Digg provides personalized recommendations through its own recommendation
engine, which is based on user-based collaborative filtering. The engine relies solely on the user-item
ratings expressed by the “digg” function. It works without knowledge about the content of the
stories [12].

The recommendation engine uses the user’s history of “dugg” stories in the last thirty days to make
recommendations [13]. This short time span is appropriate for fast-moving internet news, avoids the
stability vs. plasticity problem, and helps to keep the size of the ratings matrix within limits.

Every time a user “diggs” a story, the engine associates the user with all other users who have also
“dugg” the story. From these associations the recommender system calculates a correlation
coefficient between the users. The coefficient is based on the number of “dugg” stories in common in
relation to the total number of stories “dugg” by each of the associated users [13]. The coefficient has
a value between zero and one: zero if the two users have never “dugg” the same story, one if the
users share all their “dugg” stories. The calculation automatically accounts for the overall
level of user activity. If a user “diggs” a lot of stories, the number of common “dugg” stories must be
high to yield a high correlation coefficient. If a user “diggs” rarely, a small amount of agreement can
suffice.
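The exact formula is not reproduced here. One coefficient with the described properties is the Jaccard index, the number of stories both users have “dugg” divided by the number of stories either of them has “dugg”. The following sketch uses it as a stand-in. That Digg uses precisely this formula is an assumption, and the function name is hypothetical.

```python
def compatibility(dugg_a, dugg_b):
    """Jaccard index of two users' sets of dugg story ids (stand-in formula)."""
    union = dugg_a | dugg_b
    if not union:
        # Neither user has dugg anything, so there is no evidence of agreement.
        return 0.0
    return len(dugg_a & dugg_b) / len(union)
```

The value is zero when the sets are disjoint and one when they are identical. Two shared stories weigh heavily for users who have “dugg” only a handful of stories, but hardly at all for a user with hundreds of “diggs”.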

The users highly correlated to a user are called “Diggers Like You”. The engine recommends the
upcoming stories that have been “dugg” by these users, minus the stories the user has already “dugg”
or buried. Stories are upcoming if they are newly submitted and have not made it to the homepage yet.
The “Diggers Like You” therefore work as a filter for all the upcoming stories. On average, this
means that more than 17,000 submissions per day are boiled down to about 300 recommendations [12].
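The filtering step can be sketched as follows. The data layout (dictionaries of sets), the threshold `min_compat`, the Jaccard-style compatibility, and the ranking by summed compatibility are assumptions made for illustration. The sources only state that upcoming stories “dugg” by correlated users are recommended.

```python
def recommend_upcoming(user, diggs, buried, upcoming, min_compat=0.1):
    """Return upcoming stories dugg by correlated users, best match first."""
    mine = diggs.get(user, set())
    seen = mine | buried.get(user, set())

    # Step 1: find the "Diggers Like You" and their compatibility.
    like_you = {}
    for other, theirs in diggs.items():
        if other == user:
            continue
        union = mine | theirs
        compat = len(mine & theirs) / len(union) if union else 0.0
        if compat >= min_compat:
            like_you[other] = compat

    # Step 2: collect their upcoming stories, minus what the user already saw.
    scores = {}
    for other, compat in like_you.items():
        for story in diggs[other] & upcoming:
            if story not in seen:
                scores[story] = scores.get(story, 0.0) + compat

    # Stories backed by more compatible users come first, ties by story id.
    return sorted(scores, key=lambda s: (-scores[s], s))
```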







Figure 4.7: Digg – Story
            1 Users can “Digg” Stories
            2 Users can Share and Favorite Stories
            3 Recommendations by the Recommender Engine








Figure 4.8: Digg – Topic Settings








Figure 4.9: Digg – Homepage
            1 Non-Personalized Recommendations
            2 Personalized Recommendations from the Recommender Engine






A user’s recommended upcoming stories are displayed on the recommendations page, see Figure 4.10.
The right pane of the page lists the most highly correlated users with their compatibility
percentage, which represents the correlation coefficient. This allows the user to explore the
correlated users. For every recommended story, the correlated users who have “dugg” this story are
also shown, including their compatibility percentage. Clicking on the compatibility percentage of a
correlated user opens a page that displays the correlation to this user in detail, see Figure 4.11.
It lists which stories both users have “dugg” and which stories are currently recommended through
this correlation. The user can also remove the correlation to this user from the recommendation
calculation.

The recommender engine works in real time without prediction models or batch processing. In order
to achieve this for more than 2 million users, Digg uses its own graph database [12].

As a social platform Digg enables users to create social networks by designating other users as friends.
Users can explore the stories their friends found interesting, which makes Digg also a social
recommendation engine.








Figure 4.10: Digg – Recommendations
             1 Recommendations by the Recommender Engine
             2 Correlated User with Compatibility Percentage
             3 Highly Correlated Users with Compatibility Percentage








Figure 4.11: Digg – Correlated User
             1 Remove User from the Recommender Engine
             2 Shared “Dugg” Stories




5 Conclusion

Recommender systems are a powerful technology for personalization. Used in the right way, they
can benefit both consumers and businesses. Consumers profit by finding new interesting products and
businesses can increase their sales.

As e-commerce continues to grow, the technologies of recommender engines are challenged to deal
with greater amounts of data. Systems must therefore be developed further to meet this challenge in
terms of recommendation accuracy, scalability, and performance.

Item-based collaborative filtering proves to be the best recommendation technique in terms of
recommendation quality, scalability, performance, and learning capability [7]. Combined in a hybrid
system with content-based techniques in order to overcome the cold start problem, this is the state
of the art of recommender systems used today.

There are many fields of application for recommender engines, and many have their own requirements
that are fulfilled by different techniques. Which recommendation technique works best therefore
always depends on the concrete use case.




Bibliography


[1] Burke, R. (2002): Hybrid Recommender Systems: Survey and Experiments.
    In: User Modeling and User-Adapted Interaction, Volume 12, Issue 4 (November 2002), Kluwer
    Academic Publishers, pp. 331–370

[2] Leavitt, N. (2006): Recommendation Technology: Will It Boost E-Commerce?
    In: Computer Journal, Volume 39, Issue 5 (May 2006), IEEE Computer Society Press, pp. 13–16

[3] Thompson, C. (2008): If You Liked This, You’re Sure to Love That.
    In: The New York Times Magazine (November 21, 2008),
    http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html

[4] Schafer, J. B. et al. (2001): E-Commerce Recommendation Applications.
    In: Data Mining and Knowledge Discovery, Volume 5, Issue 1-2 (January–April 2001),
    pp. 115–153

[5] Kim, J. (2006): What is a recommender system?
    In: Proceedings of Recommenders06.com (2006), pp. 1–21

[6] McCrae, J. et al. (2004): Collaborative Filtering.
    http://www.imperialviolet.org/suprema.pdf

[7] Candillier, L. et al. (2009): State-of-the-Art Recommender Systems.
    In: Collaborative and Social Information Retrieval and Access (2009), Idea Group Inc, pp. 1–22

[8] Adomavicius, G.; Tuzhilin, A. (2004): Recommendation Technologies: Survey of Current
    Methods and Possible Extensions.
    Working paper, Stern School of Business, New York University






 [9] Burke, R. (2007): Hybrid Web Recommender Systems.
     In: Lecture Notes in Computer Science (2007), Springer Berlin/Heidelberg, pp. 377–408

[10] ChoiceStream, Inc.: Personalization Technology Brief.
     http://www.choicestream.com/resources/

[11] Linden, G. et al. (2003): Amazon.com Recommendations: Item-to-Item Collaborative Filtering.
     In: IEEE Internet Computing, Volume 7, Issue 1 (January/February 2003), pp. 76–80

[12] Rose, K. (2008): Recommendation Engine Announcement.
     http://blog.digg.com/?p=127

[13] Kast, A. (2008): Digg Recommendation Engine White Paper.
     http://digg.com/whitepapers/recommendationengine




"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Recommender Engines Seminar Paper

  • 1. RWTH Aachen University University of Bonn Fraunhofer FIT E-Commerce Seminar WT 08/09 Recommender Engines Seminar Paper Thomas Hess (289222) February 1, 2009
  • 2. Abstract Recommender engines are used by more and more e-commerce businesses to help consumers find products they are interested in. The paper describes what recommender engines are and what role they play in e-commerce. Recommender engines use various techniques that draw on different knowledge sources to make recommendations. The paper explains these techniques and their strengths and weaknesses. Some of the common issues that recommender systems face are discussed and possible solutions presented. Finally, examples of recommender engines in e-commerce are described, showing which techniques they use and how the e-businesses utilize recommendations on their websites.
  • 3. Contents
1 Introduction
2 Recommender Techniques
  2.1 Non-Personalized Recommendation
  2.2 Demographic Recommendation
  2.3 Content-Based Recommendation
  2.4 Collaborative Filtering
    2.4.1 User-Based Approach
    2.4.2 Item-Based Approach
    2.4.3 Model-Based Approach
  2.5 Hybrid Approaches
3 Issues And Solutions
  3.1 Data Collection
  3.2 Cold Start
  3.3 Stability vs. Plasticity
  3.4 Sparsity
  3.5 Performance & Scalability
  3.6 User Input Consistency
  3.7 Privacy
4 Recommender Engine Examples
  4.1 ChoiceStream
  4.2 Amazon.com
  4.3 Digg
5 Conclusion
  • 4. List of Figures
2.1 Knowledge Sources of Recommender Engines
2.2 Non-Personalized Recommendation
2.3 Demographic Recommendation
2.4 Content-Based Recommendation
2.5 User-Based Collaborative Filtering
2.6 User-Based Collaborative Filtering Example
2.7 Item-Based Collaborative Filtering
2.8 Item-Based Collaborative Filtering Example
2.9 Model-Based Collaborative Filtering
4.1 ChoiceStream Recommender Engine
4.2 Amazon – Item With Recommendations
4.3 Amazon – Shopping Cart With Recommendations
4.4 Amazon – Your Recommendations
4.5 Amazon – Recommendation Details
4.6 Amazon – Your Purchases
4.7 Digg – Story
4.8 Digg – Topic Settings
4.9 Digg – Homepage
4.10 Digg – Recommendations
4.11 Digg – Correlated User
  • 5. 1 Introduction

Recommender engines are personalized information agents that attempt to predict which items out of a large pool a user may be interested in. These items can be of any type, such as movies, music, books, websites, or news articles. The user’s interest in an item is expressed through the rating the user gives the item. A recommendation system has to predict the ratings for items that the user has not yet seen. With these estimated ratings, the system can recommend the items that have the highest estimated rating.

Recommender engines have become an integral part of many e-commerce businesses [1, 2]. They are a serious business tool used by an ever-increasing number of online stores. Recommender systems are a unique feature of e-commerce because websites, in contrast to physical stores, are able to track everything their customers do. The knowledge learned from the customers’ behaviour is the basis for the recommendations.

Because online businesses have no real space constraint, they can offer much larger stocks, providing their customers with more choices. These large stocks become impossible to search exhaustively, so e-commerce stores must provide personalized views with reduced choices to the individual users. One way to achieve this is the use of recommender engines.

For e-commerce vendors, recommender engines provide multiple benefits. Good recommender systems present customers with products they are interested in but did not plan to buy, leading them to purchase more items [2, 3, 4]. Such unplanned purchases do not yet happen as often in online stores as in traditional stores [2]. Recommender engines can help to gain consumers’ loyalty, which is an essential business strategy in e-commerce, as the competitor is always just “one click away” [4]. Because recommender systems make it easier and faster to find new items, customers come back more often [2].

The more a user uses a website and purchases items, the more the recommender engine learns about the user and the better the recommendations get. This helps to build a “value-added relationship” between the website and the user [4]. Recommender systems are also a way to promote older or low-demand items, such as niche products [2].
  • 6. 2 Recommender Techniques

The techniques used by recommender engines can be classified based on the information sources they use [5, 2]. The available sources are the user features (demographics, e.g. age, gender, profession, income, location), the item features (e.g. keywords, genres), and the user-item ratings (gathered through questionnaires, explicit ratings, transaction data). See figure 2.1.

Figure 2.1: Knowledge Sources of Recommender Engines (From [5])

2.1 Non-Personalized Recommendation

Non-personalized recommendations are identical for each user. The recommendations are either manually selected (e.g. editor choices) or based on the popularity of items (e.g. average ratings, sales data). See figure 2.2.
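A popularity-based, non-personalized recommendation can be sketched in a few lines. The example below ranks items by average rating and returns the same list for every user. The function name `top_rated`, the `(user, item, rating)` tuple layout, and the `min_ratings` cut-off are illustrative assumptions, not part of any system described in the paper.

```python
from collections import defaultdict


def top_rated(ratings, n=3, min_ratings=2):
    """Rank items by average rating; every user gets the same list.

    ratings: iterable of (user, item, rating) tuples (assumed layout).
    min_ratings: assumed cut-off so that a single enthusiastic rating
    does not push an item to the top.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in ratings:
        totals[item] += rating
        counts[item] += 1
    averages = {
        item: totals[item] / counts[item]
        for item in totals
        if counts[item] >= min_ratings
    }
    # Highest average first; ties are broken alphabetically.
    return sorted(averages, key=lambda item: (-averages[item], item))[:n]


ratings = [
    ("ann", "book", 5), ("bob", "book", 4),
    ("ann", "film", 3), ("cid", "film", 2),
    ("bob", "album", 5),
]
print(top_rated(ratings, n=2))
```

Sales counts could be used in place of average ratings by ranking on `counts` instead of `averages`.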
  • 7. Figure 2.2: Non-Personalized Recommendation (From [5])

Because non-personalized recommendations are easy to compute, they are popular among e-commerce businesses. They are also an option for websites that offer no personalization.

2.2 Demographic Recommendation

Demographic recommendation methods use only the information about the users. The users are categorized based on the attributes of their demographic profiles in order to find users with similar features. The engine then recommends items that are preferred by these similar users. See figure 2.3.

Figure 2.3: Demographic Recommendation (From [5])

Advantages
• Because user-item ratings are not used, new users can get recommendations before they have rated any item.
• Knowledge about the items and their features is not needed, so the technique is domain-independent.
  • 8. Problems
• Gathering the required demographic data leads to privacy issues, see 3.7.
• Demographic classification is too crude for highly personalized recommendations [5, 3]. The generalisations created from the classification are often false, especially when it comes to cultural items like books, music, or movies [6, 3].
• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.3 Content-Based Recommendation

Content-based recommendation methods use the information about item features and the ratings a user has given to items. The technique combines these ratings into a profile of the user’s interests based on the features of the rated items. The engine can then find items with the preferred features and recommend the items most similar to the ones preferred in the past. See figure 2.4.

Figure 2.4: Content-Based Recommendation (From [5])

The recommendations of a content-based system are based on individual information and ignore contributions from other users. The profiles of the users’ interests are often represented as vectors of weights on item features. If automatic learning methods, such as a rule induction algorithm, are used to generate them, the profiles can also be rule-based [7].

Content-based recommendation works well if the items can be properly represented as a set of features. The quality of the recommendations depends directly on the quality of the available descriptive data. In order to have a sufficient set of features, the item descriptions must either be in a form from which features can be extracted automatically with information retrieval techniques (e.g. text), or the features must be assigned manually, which takes a lot of resources [8].

  • 9. Besides objective categorizations, systems can also use (user-generated) tags associated with items, which provide a subjective view.

Problems
• Content analysis is necessary to determine the item features.
• The technique depends not only on the quality of the item metadata but also on the homogeneity of the stock, so that items can be categorized.
• The quality of items cannot be evaluated. The similarity computation is limited to the item features [5].
• The technique suffers from the cold start problem for new users, see 3.2.
• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.4 Collaborative Filtering

Collaborative filtering techniques use the user behaviour, in the form of the user-item ratings, as their information source. The concept is to find correlations between users or between items. Collaborative filtering is widely implemented and the most mature recommendation technique. Three main approaches can be distinguished: user-based, item-based, and model-based.

Advantages
• As with demographic recommendation, no knowledge about the item features is needed. Collaborative filtering works completely independently of machine-readable item representations and is therefore domain-independent.
• The quality (not just the relevancy) of items can be evaluated, as it is also expressed through user-item ratings [5].
• Collaborative filtering techniques are able to make recommendations “outside the box” because they look beyond the preferences of the individual user [1].
  • 10. Problems
• The quality of the recommendations depends on the size of the historical rating data set.
• The technique suffers from the cold start problem for new users and new items, see 3.2.
• Users with an unusual taste may not get good recommendations (“gray sheep” problem, see 3.6).
• Once established, user preferences do not change easily (stability vs. plasticity problem, see 3.3).

2.4.1 User-Based Approach

The user-based approach is based on the assumption that users who rated the same items similarly probably have the same taste. It makes user-to-user correlations by using the rating profiles of different users to find highly correlated users. These users form like-minded neighbourhoods based on their shared item preferences. The engine can then recommend the items preferred by the other users in the neighbourhood. See figure 2.5. Figure 2.6 shows an example of user-based collaborative recommendation.

Figure 2.5: User-Based Collaborative Filtering (From [5])

If there are few overlapping ratings across users in the data set, the user-based approach runs into the sparsity problem, see 3.4. User-based collaborative filtering does not scale well for many users and items, because the analysis and comparison processes become more complex, see 3.5.

2.4.2 Item-Based Approach

The item-based approach focuses on items, assuming that items rated similarly are probably similar. It compares items based on the shared appreciation of users, in order to create neighbourhoods of similar items.

  • 11. Figure 2.6: User-Based Collaborative Filtering Example (From [5])

The engine then recommends the neighbouring items of the user’s known preferred ones. See figure 2.7. Figure 2.8 shows an example of item-based collaborative recommendation.

Item-based collaborative filtering is more scalable than the user-based approach, as the correlations are drawn among a limited number of products instead of a potentially very large number of users. Items are also easy to categorize, while users’ activities must be examined and analyzed. See 3.5.

Because the number of items is typically smaller than the number of users, the item-based approach also has a reduced sparsity problem (see 3.4) in comparison to the user-based approach.

Figure 2.7: Item-Based Collaborative Filtering (From [5])
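The item-based approach described above can be illustrated with a minimal sketch. It assumes cosine similarity between the rating vectors of two items as the similarity measure, which is one common choice and not one the paper prescribes. The names `cosine` and `recommend_item_based` and the nested-dict data layout are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two {user: rating} dicts.

    The dot product runs over users who rated both items; each norm
    runs over all ratings of the respective item.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend_item_based(ratings, user, n=2):
    """ratings: {item: {user: rating}} (assumed layout).

    Scores every item the user has not rated by its similarity to the
    items the user has rated, weighted by the user's own ratings.
    """
    rated = {item for item, by_user in ratings.items() if user in by_user}
    scores = {}
    for candidate, cand_ratings in ratings.items():
        if candidate in rated:
            continue
        score = sum(
            cosine(cand_ratings, ratings[item]) * ratings[item][user]
            for item in rated
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


ratings = {
    "A": {"u1": 5, "u2": 4, "u3": 1},
    "B": {"u1": 5, "u2": 5},
    "C": {"u3": 5, "u4": 4},
    "D": {"u1": 4, "u4": 5},
}
print(recommend_item_based(ratings, "u2"))
```

Swapping the roles of users and items in the input (a `{user: {item: rating}}` layout) would turn the same similarity function into a building block for the user-based approach of 2.4.1.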
  • 12. Figure 2.8: Item-Based Collaborative Filtering Example (From [5])

2.4.3 Model-Based Approach

For huge data sets, the quadratic complexity of computations on the user-item rating matrix becomes very high [7]. In real applications, however, predictions must be made quickly. Model-based approaches address this problem by deriving a prediction model from historical user-item rating data in order to make the online prediction process faster. To build the model, learning techniques like Bayesian networks, neural networks, or latent semantic indexing are used. An accurate model requires a large amount of data. The engine then makes the online recommendations by using the model. See figure 2.9.

Figure 2.9: Model-Based Collaborative Filtering (From [5])

As the model is built in advance of the online recommendation processes, this approach performs better than the memory-based (user-based and item-based) approaches and avoids the scalability problem, see 3.5. Depending on the learning techniques used to create the model, this approach can lead to a higher recommendation accuracy and a reduced sparsity problem [5].

The major drawback of the model-based approach is that the recommendation results do not adapt automatically to data changes. Instead, the model must be rebuilt to reflect updated data.

  • 13. 2.5 Hybrid Approaches

Hybrid approaches combine collaborative filtering with demographic or content-based methods in order to overcome their drawbacks. Collaborative filtering systems often result in better predictive performance but have problems when limited user-item ratings are available [7]. Demographic and content-based recommendation systems work without rating data and can therefore compensate for the cold start problem [1].

There are various methods to combine recommender techniques in a hybrid system [1, 9]:

Weighted Hybridization: The scores of the different recommendation components are combined numerically. Each component of the hybrid system scores a given item and the scores are combined using a linear formula.
Switching Hybridization: The system chooses among recommendation components based on the situation and applies the selected one. Some reliable criterion must be available on which to base the switching decision.
Mixed Hybridization: Recommendations from different recommenders are presented side by side in a combined list. The results of the recommender systems are not combined.
Feature Combination: Features derived from different knowledge sources are combined and then injected into a single recommendation algorithm.
Feature Augmentation: One recommendation technique is used to compute a feature or set of features, which is then part of the input to the next technique.
Cascaded Hybridization: Recommenders are given strict priority, with the lower-priority ones breaking ties in the scoring of the higher ones.
Meta-Level Hybridization: One recommendation technique is applied to produce a model, which is then used as the input for another technique.
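Of the methods listed above, weighted hybridization is the simplest to illustrate. The sketch below combines per-item scores from two components with a linear formula. The component names, the weights, and the convention that a component contributes nothing for items it did not score are assumptions made for this example.

```python
def weighted_hybrid(component_scores, weights):
    """Combine item scores from several recommenders linearly.

    component_scores: {component_name: {item: score}} (assumed layout).
    weights: {component_name: weight}.
    An item that a component did not score contributes 0 for that
    component (an assumption of this sketch).
    """
    combined = {}
    for name, scores in component_scores.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    # Best combined score first; ties are broken alphabetically.
    return sorted(combined, key=lambda item: (-combined[item], item))


component_scores = {
    "collaborative": {"x": 0.9, "y": 0.2},
    "content": {"y": 0.8, "z": 0.5},
}
print(weighted_hybrid(component_scores, {"collaborative": 0.7, "content": 0.3}))
print(weighted_hybrid(component_scores, {"collaborative": 0.3, "content": 0.7}))
```

Shifting weight towards the content-based component, as in the second call, is one way to favour items that have few ratings so far.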
  • 14. 3 Issues And Solutions

3.1 Data Collection

The data used by recommender engines can be categorized into explicit and implicit data [2].

Explicit data is all data that users themselves feed into the system, such as demographic data, information about their preferences (e.g. collected through questionnaires), search terms, and explicit ratings and reviews of items (wisdom of the crowds). The collection of explicit data must not be intrusive or time-consuming. The way the explicit data is collected can affect the quality and amount of data the users will provide [10]. Recommendation systems should not rely completely on explicit data.

Websites are able to track their users’ activities in order to acquire implicit data. The most important implicit data source in e-commerce is the transaction data, including the purchase information. Other sources are web usage patterns like click sequences or reading times, or search engine referrers. Implicit data needs to be analyzed before it can be used to describe user features or user-item ratings.

3.2 Cold Start

The cold start problem occurs when too little rating data is available in the initial state. The recommender system then lacks data to produce appropriate recommendations. A distinction is made between the new user problem and the new item problem.

New User Problem: When recommendations follow from user-to-user correlations based on the accumulation of ratings, a user with few ratings is difficult to categorize.
  • 15. New Item Problem: An item with few ratings cannot easily be recommended. This problem occurs particularly in domains with many new items (e.g. news articles). As the problem also occurs for long tail items, it is also called the “long tail problem” [10].

A solution to the cold start problem is the combination of the collaborative technique with demographic (for the new user problem) or content-based (for the new item problem) techniques in a hybrid recommender engine, see 2.5. That way the cold start problem is compensated for by techniques that do not rely on user-item ratings.

Other ways to reduce the cold start problem are the use of default ratings (e.g. the average rating of all users) [6, 10] or the use of active learning techniques in model-based recommendation techniques [5].

3.3 Stability vs. Plasticity

The converse of the cold start problem is the stability vs. plasticity problem. When users have rated a lot of items, the preferences in their established user profiles are difficult to change [1, 9]. Because taste evolves in reality, this becomes a problem. The solution is to gradually discount older ratings so that they have less influence. By doing so, however, engines risk losing information about long-term interests [1, 9].

A related problem is that users may use a website with different intentions. For example, one day a customer buys books for himself, but the next day he is looking for a present for someone else.

3.4 Sparsity

In most use cases for recommender systems, due to the catalog sizes of e-business vendors, the number of ratings already obtained is very small compared to the number of ratings that need to be predicted. Collaborative filtering techniques, however, depend on an overlap in ratings across users and have difficulties when the space of ratings is sparse (few users have rated the same items). Sparsity in the user-item rating matrix degrades the quality of the recommendations.
  • 16. To reduce the sparsity, the rating data needs to be adjusted by either adding ratings or reducing the dimensionality of the matrix.

Ratings can be augmented by inserting simulated values on behalf of the users. These can be ratings derived from other (implicit) data sources, like item views or clicks, or default values [6].

The dimensionality of the rating matrix can be reduced by techniques such as singular value decomposition [1]. Singular value decomposition is a well-known method for matrix factorization that provides the best lower-rank approximations of the original matrix. Dimensionality reduction techniques are often used in model-based collaborative filtering approaches [1].

3.5 Performance & Scalability

Performance and scalability are important issues for recommender systems, as e-commerce websites must be able to determine recommendations in real time and often deal with huge data sets of millions of customers and items. The high growth rates of e-businesses make these data sets even larger, particularly in the user dimension [6].

The computational complexity of a recommendation technique is decisive for its performance. Techniques that calculate correlation coefficients for M users over N items have a complexity of O(M × N) in the worst case. Due to the common sparsity of the user-item rating matrix, the performance tends to be closer to O(M + N) [11]. For large data sets, however, this still leads to performance and scaling issues.

Techniques that can perform the most expensive calculations offline scale better than techniques where everything must be calculated online, in real time [11]. Demographic and content-based recommendation as well as item-based and model-based collaborative filtering can utilize offline computation. User-based collaborative filtering, however, can do little or no offline computing, which makes it impractical for large data sets [11].
In addition to performing calculations offline, all methods that help reduce the size of the data set improve the performance and scalability of a recommendation technique [6]. For example, users with very few ratings, or very popular or unpopular items, could be discarded [11]. These methods, however, also reduce the recommendation quality.
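The data set reduction mentioned above can be sketched as a single pruning pass over the rating data. The function name `prune_ratings` and all three thresholds are hypothetical; suitable values depend on the data set, and the paper does not give any.

```python
from collections import Counter


def prune_ratings(ratings, min_user_ratings=2, min_item_users=2,
                  max_item_share=0.8):
    """Shrink a {user: {item: rating}} data set in one pass.

    Drops users with fewer than min_user_ratings ratings, then drops
    items rated by fewer than min_item_users of the remaining users
    (very unpopular) or by more than max_item_share of them (very
    popular). All thresholds are illustrative assumptions.
    """
    users = {u: r for u, r in ratings.items() if len(r) >= min_user_ratings}
    counts = Counter(item for r in users.values() for item in r)
    limit = max_item_share * len(users)
    keep = {i for i, c in counts.items() if min_item_users <= c <= limit}
    return {
        u: {i: v for i, v in r.items() if i in keep}
        for u, r in users.items()
    }


ratings = {
    "u1": {"hit": 5, "a": 4, "b": 3},
    "u2": {"hit": 4, "a": 2},
    "u3": {"hit": 5, "b": 1, "rare": 2},
    "u4": {"hit": 3},
}
print(prune_ratings(ratings))
```

Because items are removed after users are filtered, some remaining users can end up with fewer ratings than `min_user_ratings`; a stricter variant would repeat the pass until nothing changes. Either way, the discarded ratings are lost to the recommender, which is the quality trade-off noted in the text.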
  • 17. 3.6 User Input Consistency

Recommender techniques that work with user-to-user correlations, like demographic recommendation or collaborative filtering, depend on high correlation coefficients between the users in a data set. Users can be split into three classes based on their correlation coefficients with other users [6].

The majority of users fall into the class of “white sheep”, who have a high rating correlation with many other users. Engines can easily find recommendations for these users.

The opposite type are the “black sheep”, for whom there are few or no correlating users. This makes it very difficult to find recommendations for them. When the overall number of users in a data set increases, however, the chance to find similar users increases as well.

The bigger problem is the “gray sheep” problem. These users have differing opinions or an unusual taste, which results in low correlation coefficients with many users. They fall on a border between user cliques. Recommendations for them are very difficult to find, and they also cause odd recommendations for their correlated users.

3.7 Privacy

Privacy is an important issue in recommender systems. In order to provide personalized recommendations, recommender systems must know something about the users. In fact, the more the systems know, the more accurate the recommendations can get. Users are reasonably concerned about what information is collected, how it is used, and whether it is stored.

These privacy concerns affect the collection of both explicit and implicit data. Regarding explicit data, users are reluctant to disclose information about themselves and their interests [2, 4]. If questionnaires get too personal, users may provide false information in order to protect their privacy [4]. Recommender engines should be able to deal with privacy-concerned users and not rely solely on explicit data or on recommender techniques that do, like demographic recommendation.
Regarding implicit data that is acquired by tracking users’ behaviour, there are concerns that personal taste or private actions get revealed through the recommendations [5]. Users fear that extensive consumer profiles are created.
  • 18. To confront these concerns, e-commerce businesses must provide privacy protection mechanisms [5] and make transparent which data is acquired and analyzed. Usage and storage restrictions must be assured through privacy policies [4].
  • 19. 4 Recommender Engine Examples

Recommender engines are developed and run by independent technology vendors and by e-commerce businesses themselves.

The business model of recommendation technology vendors is either to offer the recommender engine as a hosted service or to license their engines to e-commerce businesses. Examples of technology vendors are: ChoiceStream (http://www.choicestream.com), Baynote (http://www.baynote.com), ExpertMaker (http://www.expertmaker.com), Loomia (http://www.loomia.com), Criteo (http://www.criteo.com), SourceLight (http://www.sourcelight.com), and Collarity (http://www.collarity.com).

Bigger e-commerce businesses in particular develop their own recommender solutions because they have unique requirements, want unique features, or deal with items that third-party products are not suited for. Examples are: Amazon.com (http://www.amazon.com), Netflix (http://www.netflix.com), Digg (http://digg.com), The Internet Movie Database (IMDb) (http://www.imdb.com), Pandora (http://www.pandora.com), and Last.fm (http://www.last.fm).

In the following, the techniques and usages of the recommender engines of ChoiceStream, Amazon.com, and Digg are described in detail.
  • 20. 4.1 ChoiceStream

ChoiceStream is a personalisation company that offers its recommendation technology “RealRelevance Recommendations” as a fully hosted service for e-commerce vendors.

Because the different recommendation techniques all have their drawbacks and are not suited for all fields of application, ChoiceStream uses a hybrid system based on a variety of techniques that are chosen and combined depending on the concrete recommendation use case at hand [10]. The use cases that ChoiceStream distinguishes are listed in table 4.1.

Table 4.1: ChoiceStream – Common Use Cases Requiring Different Algorithms (From [10])
• Rich Profile User: Users for whom you have a lot of data (e.g. more than 5 transactions).
• Sparse Profile User: Users for whom you have little data (e.g. 1 to 4 transactions).
• Anonymous / New User: Users for whom you have no data.
• Popular Content: Items in your catalog that you can determine are “most popular”. Typically these will be few in number, but very high volume.
• Mainstream Content: Items for which you have recorded patterns of behavior (e.g. more than 20 transactions per item).
• New Content: Items for which there are no past transactions.
• Long Tail Content: Items in a catalog which are less well known, but still profitable, and for which there are few past transactions.
• Business Goal Optimization: The requirement to maximize a metric other than the number of transactions, such as revenue, margin, or order size.

The recommendation techniques used by the ChoiceStream recommender engine are [10]:

Collaborative Filtering: Both user-based and item-based collaborative filtering are used.
Collaborative Filtering Using Multiple Correlation Tables: Use of multiple correlation tables (e.g. item views or clicks in addition to transactions) to overcome the cold start problem (see 3.2).
Cohort Analysis: Creation of groups of similar users, called cohorts, in order to make better recommendations for users with sparse rating data.
  • 21. 4 Recommender Engine Examples Selective Filtering By selective filtering the most popular items are taken out of the recommenda- tions, so they don’t dominate and customers can find less popular items. Attribute Correlations Item attributes are used to make content-based recommendations to over- come the cold start problems of collaborative filtering. Default Recommendations Default recommendations are the fallback function if all other tech- niques fail to determine recommendations. Business Goal Optimization With a multi-term scoring function the recommendation algorithm can be adjusted to for example preferably recommend higher-priced items in order to increase revenue. Figure 4.1 shows what techniques are used for which use cases by the ChoiceStream recommender engine. Figure 4.1: ChoiceStream Recommender Engine (From [10]) 21
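The interplay of selective filtering and business goal optimization can be illustrated with a small sketch. The brief [10] does not publish the scoring function, so everything below is hypothetical: the names `Candidate`, `selective_filter`, and `business_score`, the weight `revenue_weight`, and the linear combination of relevance and normalized price are assumptions chosen only to show what a multi-term score could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    item_id: str
    relevance: float   # score from some recommender, assumed to lie in [0, 1]
    price: float       # item price in an arbitrary currency
    popularity: int    # number of past transactions for the item


def selective_filter(candidates, max_popularity):
    """Drop the most popular items so that they do not dominate the list."""
    return [c for c in candidates if c.popularity <= max_popularity]


def business_score(candidate, max_price, revenue_weight=0.3):
    """Multi-term score: relevance plus a weighted, normalized price term."""
    price_term = candidate.price / max_price if max_price > 0 else 0.0
    return (1.0 - revenue_weight) * candidate.relevance + revenue_weight * price_term


def recommend(candidates, max_popularity, top_n=3, revenue_weight=0.3):
    """Filter out blockbusters, then rank the rest by the multi-term score."""
    kept = selective_filter(candidates, max_popularity)
    if not kept:
        return []  # a real system would fall back to default recommendations
    max_price = max(c.price for c in kept)
    ranked = sorted(
        kept,
        key=lambda c: business_score(c, max_price, revenue_weight),
        reverse=True,
    )
    return [c.item_id for c in ranked[:top_n]]


catalog = [
    Candidate("bestseller", relevance=0.9, price=10.0, popularity=5000),
    Candidate("niche_a", relevance=0.8, price=20.0, popularity=40),
    Candidate("niche_b", relevance=0.7, price=80.0, popularity=25),
]
ranking = recommend(catalog, max_popularity=1000)
```

With `revenue_weight=0.0` the ranking follows relevance alone; raising the weight moves the higher-priced item to the front, which is the kind of adjustment the business goal optimization technique describes.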
4.2 Amazon.com

Amazon.com, founded in 1994, is the largest online retailer worldwide and one of the best-known examples of an e-commerce business that utilizes a recommender engine. Amazon uses its recommendation engine extensively to personalize its website.

Amazon’s recommender engine is based on item-based collaborative filtering [5, 6, 11]. It looks for items that correlate with the ones a customer has purchased and rated, and combines the highly correlated items into a recommendation list [11].

The recommendation engine consists of an online and an offline component. The offline component creates an item-to-item matrix of similar items. The online component can then look up recommendations in the matrix when they are needed [11].

To build the item-to-item matrix, a similarity function determines the correlation coefficient between pairs of items that customers tend to purchase together. This expensive calculation is done offline [11, 6]. The online component then only has to look up items similar to the ones a user has already purchased or rated. This is a simple and fast operation that can be done online in real time. Its complexity depends only on the number of items a customer is associated with [11].

By performing the most expensive calculations offline, Amazon’s recommendation system can deal with a huge data set of approximately 50 million customers per month (from the U.S. alone) and several million catalog items. The online component scales independently of the catalog size and the number of customers [11].

Another benefit of the similar-items table is that the algorithm produces higher-quality recommendations for users with little user-item rating data than traditional collaborative filtering does [11].

Customers Who Bought

On the information page of every item, Amazon shows the “Customers Who Bought” feature, which recommends items frequently purchased by customers who purchased the selected item, see Figure 4.2.

As Figure 4.3 shows, the feature is also used on the shopping cart page. There it works as the equivalent of the impulse items in a supermarket checkout line [11], except that the impulse items are personalized for each customer.
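The division of Amazon’s engine into an offline and an online component, described above, can be sketched as follows. This is a minimal illustration and not Amazon’s implementation: the function names, the toy purchase data, and the choice of cosine similarity over binary purchase vectors as the similarity function are assumptions; [11] describes the actual method.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items_table(purchases):
    """Offline step: compute a similarity score for every co-purchased item pair.

    `purchases` maps a customer id to the set of items that customer bought.
    Similarity is the cosine between the binary purchase vectors of two items.
    """
    buyers_per_item = defaultdict(int)
    co_purchases = defaultdict(int)
    for items in purchases.values():
        for item in items:
            buyers_per_item[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_purchases[(a, b)] += 1

    table = defaultdict(dict)
    for (a, b), together in co_purchases.items():
        sim = together / sqrt(buyers_per_item[a] * buyers_per_item[b])
        table[a][b] = sim
        table[b][a] = sim
    return table


def recommend(table, user_items, top_n=3):
    """Online step: look up neighbours of the user's items and merge them.

    The work depends only on the number of items the user is associated with,
    not on the catalog size or the number of customers.
    """
    scores = defaultdict(float)
    for item in user_items:
        for neighbour, sim in table.get(item, {}).items():
            if neighbour not in user_items:
                scores[neighbour] += sim
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"book", "pen"},
    "dave": {"lamp", "desk"},
}
similar_items = build_similar_items_table(purchases)
suggestions = recommend(similar_items, {"book"})
```

The expensive pairwise pass over all purchases happens once in `build_similar_items_table`; `recommend` only performs dictionary lookups, which is why the online step can run per request.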
Figure 4.2: Amazon – Item With Recommendations

Figure 4.3: Amazon – Shopping Cart With Recommendations
Your Recommendations

The page “Your Recommendations” lists all recommendations, with the ones derived from recent purchases first, see Figure 4.4. They can be filtered by product line and subject area. Users can mark recommended items as already owned or as not interesting, and can rate them, in order to provide the recommender engine with further rating data that influences what gets recommended. The page also shows why an item is recommended, that is, which purchased item is correlated with the recommended item. Additionally, the user can view a detail page for every recommendation that lists all correlations to purchased or otherwise rated items, see Figure 4.5.

Amazon encourages users to refine their user-item rating data by giving them the option to rate purchased items on a 5-point scale. On a page that lists all previous purchases, items can be rated and also excluded from the recommendation calculation, see Figure 4.6.
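How such explanations and exclusions could be derived from a similar-items table can be sketched as follows. The table contents, the function name `explain_recommendations`, and its parameters are hypothetical; the sources do not describe how Amazon computes the explanations.

```python
def explain_recommendations(similar_items, purchased, excluded=frozenset(), owned=frozenset()):
    """Return {recommended item: [(source item, similarity), ...]}.

    `similar_items` maps an item to {neighbour: similarity}. Purchases listed
    in `excluded` are ignored as evidence; items in `owned` are never recommended.
    """
    evidence = {}
    for source in purchased:
        if source in excluded:
            continue
        for neighbour, sim in similar_items.get(source, {}).items():
            if neighbour in purchased or neighbour in owned:
                continue
            evidence.setdefault(neighbour, []).append((source, sim))
    # Strongest correlation first, so the main reason is shown at the top.
    for reasons in evidence.values():
        reasons.sort(key=lambda pair: -pair[1])
    return evidence


similar_items = {
    "camera": {"tripod": 0.8, "memory card": 0.9},
    "novel": {"bookmark": 0.4, "tripod": 0.1},
}
reasons = explain_recommendations(
    similar_items, purchased={"camera", "novel"}, excluded={"novel"}
)
```

Because the purchase of "novel" is excluded here, "bookmark" is not recommended at all and "tripod" is explained by the "camera" purchase alone.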
Figure 4.4: Amazon – Your Recommendations
(1) Recommended items can be marked as owned or as not interesting, and can be rated.
(2) It is shown why items are recommended.

Figure 4.5: Amazon – Recommendation Details

Figure 4.6: Amazon – Your Purchases
(1) Items can be rated.
(2) Items can be excluded from the recommendation engine.
4.3 Digg

Digg is a social news site, launched in 2004, where users can submit links to websites. Users can rate these links, called stories, by “digging” or “burying” them. Stories can also be favorited, shared, and commented on, see Figure 4.7.

The stories are categorized into various topics. Users can configure which topics they are interested in and will then only see stories in these categories throughout the website, see Figure 4.8.

The Digg homepage shows the most popular stories, see Figure 4.9. Popularity is measured by the number of recent “diggs”. The homepage thereby utilizes non-personalized recommendation.

For registered users, Digg provides personalized recommendations through its own recommendation engine, which is based on user-based collaborative filtering. The engine relies solely on the user-item ratings expressed through the “digg” function. It works without knowledge about the content of the stories [12].

The recommendation engine uses the user’s history of “dugg” stories from the last thirty days to make recommendations [13]. This short time span is appropriate for fast-moving internet news, avoids the stability vs. plasticity problem, and helps to keep the size of the ratings matrix within limits.

Every time a user “diggs” a story, the engine associates the user with all other users who have also “dugg” the story. From these associations the recommender system calculates a correlation coefficient between the users. The coefficient is based on the number of “dugg” stories in common in relation to the total number of stories “dugg” by each of the associated users [13]. The coefficient has a value between zero and one: zero if the two users have never “dugg” the same story, one if the users share all their “dugg” stories.

The coefficient calculation automatically accounts for the overall level of user activity. If a user “diggs” a lot of stories, the number of common “dugg” stories must be high to yield a high correlation coefficient. If a user “diggs” rarely, a small amount of agreement can suffice.

The users highly correlated with a user are called “Diggers Like You”. The engine recommends the upcoming stories that have been “dugg” by these users, minus the stories the user has already “dugg” or buried. Stories are upcoming if they are newly submitted and have not made it to the homepage yet.

The “Diggers Like You” therefore work as a filter for all the upcoming stories. On average, more than 17,000 submissions per day are boiled down to about 300 recommendations [12].
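The text states only that the coefficient relates the number of shared “dugg” stories to each user’s total and that it ranges from zero to one. One measure with exactly these properties is the Jaccard index, |A ∩ B| / |A ∪ B|, for the story sets A and B of two users; the sketch below uses it as a stand-in and not as the formula from [13]. The function names, the threshold `min_similarity`, and the sample data are likewise hypothetical.

```python
def correlation(dugg_a, dugg_b):
    """Jaccard index of two users' sets of dugg stories (assumed measure)."""
    union = dugg_a | dugg_b
    if not union:
        return 0.0
    return len(dugg_a & dugg_b) / len(union)


def diggers_like_you(user, diggs, min_similarity=0.3):
    """Return {other user: coefficient} for sufficiently correlated users."""
    mine = diggs[user]
    result = {}
    for other, theirs in diggs.items():
        if other == user:
            continue
        coefficient = correlation(mine, theirs)
        if coefficient >= min_similarity:
            result[other] = coefficient
    return result


def recommend_upcoming(user, diggs, buried, upcoming, min_similarity=0.3):
    """Upcoming stories dugg by correlated users, minus stories already seen."""
    seen = diggs[user] | buried.get(user, set())
    recommended = set()
    for other in diggers_like_you(user, diggs, min_similarity):
        recommended |= (diggs[other] & upcoming) - seen
    return recommended


diggs = {
    "ann": {"s1", "s2", "s3"},
    "ben": {"s1", "s2", "s4", "s5"},
    "cid": {"s6", "s7"},
}
buried = {"ann": {"s5"}}
upcoming = {"s4", "s5", "s6"}
picks = recommend_upcoming("ann", diggs, buried, upcoming)
```

Because the union of both story sets is the denominator, a very active user needs many shared stories to reach a high value, while two users with few diggs can correlate strongly through a small overlap, as described above.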
Figure 4.7: Digg – Story
(1) Users can “digg” stories.
(2) Users can share and favorite stories.
(3) Recommendations by the recommender engine.

Figure 4.8: Digg – Topic Settings

Figure 4.9: Digg – Homepage
(1) Non-personalized recommendations.
(2) Personalized recommendations from the recommender engine.
A user’s recommended upcoming stories are displayed on the recommendations page, see Figure 4.10. The right pane of the page lists the most highly correlated users together with their compatibility percentage, which represents the correlation coefficient. This allows the user to explore the correlated users. For every recommended story, the correlated users who have “dugg” it are also shown, including their compatibility percentage. Clicking on the compatibility percentage of a correlated user opens a page that displays the correlation to this user in detail, see Figure 4.11. It lists which stories both users have “dugg” and which stories are currently recommended through this correlation. The user can also remove the correlation to this user from their recommendation calculation.

The recommender engine works in real time without prediction models or batch processing. To achieve this for more than 2 million users, Digg uses its own graph database [12].

As a social platform, Digg enables users to create social networks by designating other users as friends. Users can explore the stories their friends found interesting, which also makes Digg a social recommendation engine.
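The sources do not describe the internals of Digg’s graph database, so the following is only an in-memory sketch of the general idea: each “digg” updates the user-story associations immediately, and the compatibility percentage is computed on demand from the diggs of the last thirty days. The class `DiggGraph`, its methods, and the use of the Jaccard index as the coefficient are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # time span named in the text


class DiggGraph:
    """Bipartite user-story graph kept up to date on every digg."""

    def __init__(self):
        self.stories_by_user = defaultdict(dict)  # user -> {story: timestamp}
        self.users_by_story = defaultdict(set)    # story -> {users}

    def digg(self, user, story, when):
        """Record a digg; no batch job or model rebuild is required."""
        self.stories_by_user[user][story] = when
        self.users_by_story[story].add(user)

    def recent_stories(self, user, now):
        """Stories the user dugg within the time window."""
        return {
            story
            for story, when in self.stories_by_user.get(user, {}).items()
            if now - when <= WINDOW
        }

    def compatibility(self, user, other, now):
        """Compatibility percentage (0-100), here a Jaccard index times 100."""
        mine = self.recent_stories(user, now)
        theirs = self.recent_stories(other, now)
        union = mine | theirs
        if not union:
            return 0
        return round(100 * len(mine & theirs) / len(union))

    def correlated_users(self, user, now):
        """Users sharing at least one recent story, reached via graph edges."""
        found = {}
        for story in self.recent_stories(user, now):
            for other in self.users_by_story[story]:
                if other != user and other not in found:
                    score = self.compatibility(user, other, now)
                    if score > 0:
                        found[other] = score
        return found


now = datetime(2009, 2, 1)
graph = DiggGraph()
graph.digg("ann", "s1", now - timedelta(days=2))
graph.digg("ann", "s2", now - timedelta(days=3))
graph.digg("ben", "s1", now - timedelta(days=1))
graph.digg("ben", "s9", now - timedelta(days=45))  # outside the window
graph.digg("cid", "s2", now - timedelta(days=40))  # outside the window
neighbours = graph.correlated_users("ann", now)
```

Only users reachable through a shared story are examined, so the cost of a lookup grows with a user’s recent activity rather than with the total number of users.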
Figure 4.10: Digg – Recommendations
(1) Recommendations by the recommender engine.
(2) Correlated user with compatibility percentage.
(3) Highly correlated users with compatibility percentage.

Figure 4.11: Digg – Correlated User
(1) Remove user from the recommender engine.
(2) Shared “dugg” stories.
5 Conclusion

Recommender systems are a powerful technology for personalization. Used in the right way, they can benefit both consumers and businesses: consumers find new, interesting products, and businesses can increase their sales.

As e-commerce continues to grow, recommender engines are challenged to deal with ever greater amounts of data. Systems must therefore be developed further to meet this challenge in terms of recommendation accuracy, scalability, and performance.

According to [7], item-based collaborative filtering is the best recommendation technique in terms of recommendation quality, scalability, performance, and learning capability. Combined in a hybrid system with content-based techniques in order to overcome the cold start problem, it represents the state of the art of recommender systems in use today.

There are many fields of application for recommender engines, and many have their own requirements that are fulfilled by different techniques. Which recommendation technique works best therefore always depends on the concrete use case.
Bibliography

[1] Burke, R. (2002): Hybrid Recommender Systems: Survey and Experiments. In: User Modeling and User-Adapted Interaction, Volume 12, Issue 4 (November 2002), Kluwer Academic Publishers, pp. 331–370
[2] Leavitt, N. (2006): Recommendation Technology: Will It Boost E-Commerce?. In: Computer Journal, Volume 39, Issue 5 (May 2006), IEEE Computer Society Press, pp. 13–16
[3] Thompson, C. (2008): If You Liked This, You’re Sure to Love That. In: The New York Times Magazine (November 21, 2008), http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html
[4] Schafer, J. B. et al. (2001): E-Commerce Recommendation Applications. In: Data Mining and Knowledge Discovery, Volume 5, Issue 1-2 (January–April 2001), pp. 115–153
[5] Kim, J. (2006): What is a recommender system?. In: Proceedings of Recommenders06.com (2006), pp. 1–21
[6] McCrae, J. et al. (2004): Collaborative Filtering. http://www.imperialviolet.org/suprema.pdf
[7] Candillier, L. et al. (2009): State-of-the-Art Recommender Systems. In: Collaborative and Social Information Retrieval and Access (2009), Idea Group Inc, pp. 1–22
[8] Adomavicius, G.; Tuzhilin, A. (2004): Recommendation Technologies: Survey of Current Methods and Possible Extensions. Working paper, Stern School of Business, New York University
[9] Burke, R. (2007): Hybrid Web Recommender Systems. In: Lecture Notes in Computer Science (2007), Springer Berlin/Heidelberg, pp. 377–408
[10] ChoiceStream, Inc.: Personalization Technology Brief. http://www.choicestream.com/resources/
[11] Linden, G. et al. (2003): Amazon.com Recommendations: Item-to-Item Collaborative Filtering. In: IEEE Internet Computing, Volume 7, Issue 1 (January/February 2003), pp. 76–80
[12] Rose, K. (2008): Recommendation Engine Announcement. http://blog.digg.com/?p=127
[13] Kast, A. (2008): Digg Recommendation Engine White Paper. http://digg.com/whitepapers/recommendationengine