Machine Learning at Quora

Xavier (Xavi) Amatriain

Leading AI Products at Google

Published Nov 18, 2015

Ever since I joined Quora almost a year ago, I have been talking about all the very interesting machine learning challenges that we have here. However, when I attended and spoke at MLConf last week, I was surprised that many of the people I talked to still had not heard about what we are doing. In this post I am going to briefly summarize our work and give a few good pointers so you can read more about it.

What we care about at Quora

Quora's mission is to "Share and grow the world's knowledge". We believe that there is a lot of knowledge that is still in people's heads, and we want to bring it to the Internet and then make it possible to share that knowledge in a way that is not only efficient but also engaging. We do that mostly through the paradigm of Questions and Answers, but it is important to know that this is just our vehicle of choice for accomplishing our mission.

One of the things that makes Quora unique is how much we care about three orthogonal dimensions: Relevance, Demand, and Quality.

We care about relevance because we want to make sure that each person gets the kind of knowledge they are most interested in. We care about demand because we want to make sure that questions that many people have get good answers. And, last but very importantly, we care about quality because we believe that quality is an intrinsic property of knowledge. "Bad quality knowledge" is, after all, no knowledge at all.

These three dimensions are important because they are the ones that we will optimize in our product features as well as our machine learning models.

The data

You can think of Quora as the overlay between a knowledge base, a topic interest network, and a social network. This creates a very rich ecosystem of data and data relations that we can use in our machine learning approaches. Take a look at the diagram below.

This summarizes the different data and data relations. For example, a user can follow and endorse another user on a given topic. A user can also follow a topic. Users also act on questions and answers by either writing them or upvoting/downvoting them... and so on.
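To make these relations concrete, here is a minimal sketch of how such an ecosystem could be represented as a typed edge list. This is only an illustration: the `Edge` and `InteractionGraph` names, the relation labels, and the identifiers are all hypothetical and say nothing about how Quora actually stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One relation between two entities (hypothetical schema)."""
    source: str                  # e.g. "user:alice"
    relation: str                # e.g. "follows", "endorses", "wrote", "upvoted"
    target: str                  # e.g. "user:bob", "topic:machine-learning", "answer:1"
    topic: Optional[str] = None  # optional topic context for user-to-user relations


class InteractionGraph:
    """Stores outgoing edges per entity and answers simple lookups."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, source, relation, target, topic=None):
        self._out[source].append(Edge(source, relation, target, topic))

    def targets(self, source, relation):
        """Return the targets of all edges of one relation type from `source`."""
        return [e.target for e in self._out[source] if e.relation == relation]


graph = InteractionGraph()
# A user follows and endorses another user on a given topic.
graph.add("user:alice", "follows", "user:bob", topic="topic:machine-learning")
graph.add("user:alice", "endorses", "user:bob", topic="topic:machine-learning")
# A user follows a topic directly.
graph.add("user:alice", "follows", "topic:machine-learning")
# Users act on answers by writing or voting on them.
graph.add("user:bob", "wrote", "answer:1")
graph.add("user:alice", "upvoted", "answer:1")

followed = graph.targets("user:alice", "follows")
```

Signals for a model could then be derived from lookups over these edges, for example whether the person who upvoted an answer follows its author.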

This complex ecosystem provides a lot of opportunities for leveraging data to improve our product and our user experience. In order to do this, we first need to understand the different effects that are already in place. Our Data Science team has a number of interesting posts on some of this research. Take a look at this post by Shankar Iyer on the dynamics of Upvotes in the network or this other one by Don van der Drift on the Quora Topic Network.

ML product solutions

We use machine learning in many different parts of the product. These are some of the product features that use machine learning behind the scenes:

Answer ranking
Feed ranking
Topic recommendations
User recommendations
Email digest
Ask2Answer
Duplicate Questions
Related Questions
Spam/moderation
Trending now
...

Each of those solutions needs different kinds of data, both for training and testing and for feature generation. We also need to define different target functions and metrics to optimize for. And, of course, we need to use different machine learning models.
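As an illustration of why the metric depends on the feature, the sketch below contrasts a classification metric (precision, which might suit something like a spam filter) with a ranking metric (NDCG, which might suit something like answer ranking). The pairing of metrics to features is my assumption for the example, not a statement of which metrics are used in production, and the DCG variant shown uses linear gain.

```python
import math


def precision(predicted, actual):
    """Fraction of positive predictions that are truly positive (labels are 0/1)."""
    labels_of_predicted_positives = [a for p, a in zip(predicted, actual) if p == 1]
    if not labels_of_predicted_positives:
        return 0.0
    return sum(labels_of_predicted_positives) / len(labels_of_predicted_positives)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain (linear gain) over the top-k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Classification view: each item is judged on its own.
spam_precision = precision(predicted=[1, 1, 0, 1], actual=[1, 0, 0, 1])

# Ranking view: the same set of items scores differently depending on order.
good_order = ndcg_at_k([3, 2, 1], k=3)
bad_order = ndcg_at_k([1, 2, 3], k=3)
```

Precision ignores ordering entirely, while NDCG rewards placing the most relevant items first, which is why the two kinds of problems call for different targets.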

Machine Learning models

As mentioned above, we need different models in order to implement the machine learning product features that we are interested in. Some of them will require a learning-to-rank approach while others will need a binary classifier.
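The difference between the two approaches shows up in the loss function. The sketch below compares a pointwise logistic loss, as used for a binary classifier, with a pairwise logistic loss in the style of RankNet, which is one common way to do learning-to-rank. It is a generic textbook illustration and the function names are my own; it does not describe Quora's actual training code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_loss(score, label):
    """Pointwise loss for a binary classifier.

    `score` is the raw model output (logit) and `label` is 0 or 1.
    """
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def pairwise_logistic_loss(score_preferred, score_other):
    """Pairwise ranking loss: small when the preferred item is scored higher."""
    return math.log(1.0 + math.exp(-(score_preferred - score_other)))


# Classifier: the absolute score of a single item matters.
confident_and_right = log_loss(score=3.0, label=1)
confident_and_wrong = log_loss(score=3.0, label=0)

# Ranker: only the score difference between two items matters.
correct_order = pairwise_logistic_loss(2.0, 1.0)
wrong_order = pairwise_logistic_loss(1.0, 2.0)
shifted_pair = pairwise_logistic_loss(5.0, 4.0)  # same difference as (2.0, 1.0)
```

Because the pairwise loss depends only on the difference between scores, a ranker is free to produce scores on any scale, whereas a classifier's scores need to be meaningful in absolute terms.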

Here is a non-exhaustive list of machine learning models we use:

Logistic Regression
Elastic Nets
Gradient Boosted Decision Trees
Random Forests
(Deep) Neural Networks
LambdaMART
Matrix Factorization
LDA
...

To be clear, we don't use so many models to brag about how many models we know. We do so because each of them ends up working best in some scenarios. Random Forests and Gradient Boosted Decision Trees are pretty interchangeable, but if one works better in a given case, why not use it, as long as it doesn't add complexity to the system?

Do you want to work on this cool stuff?

Here are some positions you might be interested in:

Ankit Dhingra

Data Engineer | ex-Airtable, Airbnb, Facebook

Thanks for sharing this, Xavier. I was curious to know if you guys have written in greater detail about how you assess the quality of an answer and what part ML plays in that. There are obviously a lot of signals that one has to listen to: number of upvotes, number of downvotes, who upvotes (a seasoned user's upvote vs. a new user's upvote), etc. How do you go about assessing which signals are better predictors than the others through Machine Learning?

Jamone Kelly

Sr. Mobile Engineer at Designer Brands | Apple Entrepreneur Camp ‘22 Alumni | Cofounder

Very informative. Thank you for taking the time to share.

Luis Alberto Santana

Software Engineering Manager @ Nubank || Engineering Leadership / Fintech / Distributed Systems

Thanks for the insight! As a Software Engineer interested in Machine Learning this post is awesome!

Jalem Raj Rohit

ML at Paisabazaar

A wonderful post, Xavier. Now, you made me watch that video of MLConf :D
