Machine Learning at Quora

Ever since I joined Quora almost a year ago, I have been talking about all the very interesting machine learning challenges that we have here. However, when I attended and spoke at MLConf last week, I was surprised that many of the people I talked to still had not heard about what we are doing. In this post I will briefly summarize our work and give a few good pointers where you can read more about it.

What we care about at Quora

Quora's mission is to "Share and grow the world's knowledge". We believe that a lot of knowledge is still only in people's heads. We want to bring it to the Internet and provide a way to share it that is not only efficient but also engaging. We do that mostly through the paradigm of Questions and Answers, but it is important to know that this is just our vehicle of choice for accomplishing our mission.

One of the things that makes Quora unique is how much we care about three orthogonal dimensions: Relevance, Demand, and Quality.

We care about relevance because we want to make sure that each person gets the kind of knowledge they are most interested in. We care about demand because we want to make sure that questions that many people have get good answers. And last, but very importantly, we care about quality because we believe that quality is an intrinsic property of knowledge. "Bad quality knowledge" is, after all, no knowledge at all.

These three dimensions are important because they are the ones that we will optimize in our product features as well as our machine learning models.

The data

You can think of Quora as the overlay between a knowledge base, a topic interest network, and a social network. This creates a very rich ecosystem of data and data relations that we can use in our machine learning approaches. Take a look at the diagram below.

This summarizes the different data and data relations. For example, a user can follow and endorse another user on a given topic. A user can also follow a topic. Users also act on questions and answers by either writing them or upvoting/downvoting them... and so on.
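One way to picture these relations is as a graph of typed edges between users, topics, questions, and answers. The following is a minimal sketch only: the class, the relation names, and the node identifiers are all hypothetical and are not Quora's actual schema. It shows how a derived signal, such as "upvotes that come from the author's followers", could fall out of such a structure.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy store of typed edges. All names are illustrative, not Quora's schema."""

    def __init__(self):
        # relation name -> source node -> set of target nodes
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        self._edges[relation][source].add(target)

    def targets(self, source, relation):
        """Nodes reachable from `source` over `relation` (empty set if none)."""
        return set(self._edges.get(relation, {}).get(source, ()))

    def sources(self, relation, target):
        """Nodes that point at `target` over `relation`."""
        return {s for s, ts in self._edges.get(relation, {}).items() if target in ts}


graph = KnowledgeGraph()
graph.add("user:alice", "follows_user", "user:bob")
graph.add("user:alice", "follows_topic", "topic:machine-learning")
# An endorsement is scoped to a topic, so the target is a (user, topic) pair.
graph.add("user:alice", "endorses", ("user:bob", "topic:machine-learning"))
graph.add("user:bob", "wrote", "answer:1")
graph.add("user:alice", "upvoted", "answer:1")
graph.add("user:carol", "downvoted", "answer:1")

# Example derived signal: upvotes on an answer that come from the author's followers.
author = "user:bob"
followers = graph.sources("follows_user", author)
upvoters = graph.sources("upvoted", "answer:1")
print(len(followers & upvoters))
```

Signals like this one are the kind of input that could later feed feature generation for the models discussed below.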

This complex ecosystem provides a lot of opportunities for leveraging data to improve our product and our user experience. In order to do this, we first need to understand the different effects that are already in place. Our Data Science team has a number of interesting posts on some of this research. Take a look at this post by Shankar Iyer on the dynamics of Upvotes in the network, or this other one by Don van der Drift on the Quora Topic Network.

ML product solutions

We use machine learning in many different parts of the product. These are some of the product features that use machine learning behind the scenes:

  • Answer ranking
  • Feed ranking
  • Topic recommendations
  • User recommendations
  • Email digest
  • Ask2Answer
  • Duplicate Questions
  • Related Questions
  • Spam/moderation
  • Trending now
  • ...

Each of those solutions needs different kinds of data, both for training and testing and for feature generation. We also need to define different target functions and metrics to optimize for. And, of course, we need to use different machine learning models.

Machine Learning models

As mentioned above, we need different models in order to implement the machine learning product features that we are interested in. Some of them will require a learning-to-rank approach while others will need a binary classifier.
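To make the distinction concrete, here is a small sketch of how the same labeled rows might be framed for each kind of model. It is an illustration under assumptions, not a description of our pipeline: the row layout `(group_id, features, grade)`, the function names, and the threshold are all hypothetical. A binary classifier sees each row on its own, while a pairwise learning-to-rank setup only compares items within the same group (for example, answers to the same question).

```python
from collections import defaultdict
from itertools import combinations


def to_binary_examples(rows, threshold):
    """Binary classification: each row becomes (features, 0/1 label) on its own."""
    return [(features, int(grade >= threshold)) for _, features, grade in rows]


def to_pairwise_examples(rows):
    """Pairwise ranking: within each group, emit (preferred, other) feature pairs."""
    groups = defaultdict(list)
    for group_id, features, grade in rows:
        groups[group_id].append((features, grade))

    pairs = []
    for items in groups.values():
        for (feat_a, grade_a), (feat_b, grade_b) in combinations(items, 2):
            if grade_a > grade_b:
                pairs.append((feat_a, feat_b))
            elif grade_b > grade_a:
                pairs.append((feat_b, feat_a))
            # Ties carry no preference, so they produce no pair.
    return pairs


# Hypothetical rows: (question id, feature vector, graded label).
rows = [
    ("q1", (0.9,), 2),
    ("q1", (0.1,), 0),
    ("q2", (0.5,), 1),
    ("q2", (0.4,), 1),
]

binary = to_binary_examples(rows, threshold=1)
pairwise = to_pairwise_examples(rows)
```

Note that the two answers to `q2` have the same grade, so they contribute two positive examples to the classifier but no training pair to the ranker.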

Here is a non-exhaustive list of machine learning models we use:

  • Logistic Regression
  • Elastic Nets
  • Gradient Boosted Decision Trees
  • Random Forests
  • (Deep) Neural Networks
  • LambdaMART
  • Matrix Factorization
  • LDA
  • ...

To be clear, we don't use so many models to brag about how many models we know. We do so because each of them turns out to work best in some scenarios. For example, Random Forests and Gradient Boosted Decision Trees are largely interchangeable, but if one works better in a given case, why not use it, as long as it doesn't add complexity to the system?
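The "use whichever works better" idea can be sketched as choosing among candidate models by a metric on held-out data. This is a toy illustration only: the function names are hypothetical, accuracy stands in for whatever metric a given feature actually optimizes, and the two threshold rules stand in for real trained models.

```python
def accuracy(model, examples):
    """Fraction of (features, label) examples the model predicts correctly."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def pick_best(candidates, validation_examples, metric=accuracy):
    """Return (name, score) of the candidate with the highest validation metric."""
    scores = {
        name: metric(model, validation_examples)
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Stand-ins for two trained models; each maps a feature tuple to a 0/1 label.
candidates = {
    "model_a": lambda features: int(features[0] > 0.5),
    "model_b": lambda features: int(features[0] > 0.8),
}
validation = [((0.6,), 1), ((0.9,), 1), ((0.2,), 0), ((0.4,), 0)]

best_name, best_score = pick_best(candidates, validation)
```

In practice the choice would also weigh the cost of running and maintaining each model, which a single validation score does not capture.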

Further reading

Here are some resources that you might be interested in if you want to learn more about machine learning at Quora:

  • My blog post on Quora about Machine Learning at Quora
  • Another blog post by Ofir on Ask-to-answer as a machine learning problem
  • Here is a video of me talking about the same topic.
  • And, here are the slides.

Do you want to work on this cool stuff?

Here are some positions you might be interested in:
