Lists

Table of Contents

The Problem Space

The Performance Issues

The Solution

The Queries

Why it works

Embargo

Postlog

Brought to you by Rick James


The Problem Space


Let's say you have "news articles" (rows in a table) and want a web page showing the latest ten articles about a particular topic.

Variants on "topic":
    ⚈  Category
    ⚈  Tag
    ⚈  Provider (of news article)
    ⚈  Manufacturer (of item for sale)
    ⚈  Ticker (financial stock)

Variants on "news article"
    ⚈  Item for sale
    ⚈  Blog comment
    ⚈  Blog thread

Variants on "latest"
    ⚈  Publication date (unix_timestamp)
    ⚈  Most popular (keep the count)
    ⚈  Most emailed (keep the count)
    ⚈  Manual ranking (1..10 -- 'top ten')

Variants on "10" - there is nothing sacred about "10" in this discussion.

The Performance Issues


Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.

    ⚈  If there are multiple topics for each article, you need a many-to-many table.
    ⚈  You have a flag "is_deleted" that needs filtering on.
    ⚈  You want to "paginate" the list (ten articles per page, for as many pages as necessary).

The Solution


First, let me give you the solution, then I will elaborate on why it works well.

    ⚈  Create one new table called, say, Lists:
    ⚈  Lists has exactly 3 columns: topic, article_id, sequence
    ⚈  Lists has exactly 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)
    ⚈  Only viewable articles are in Lists. (This avoids the filtering on is_deleted, etc) (See also 'Embargo', below.)
    ⚈  Lists is ENGINE=InnoDB. (This gets "clustering".)
    ⚈  sequence is typically the date of the article, but could be some other ordering.
    ⚈  topic could be a string, or could be an id that maps to the 'topic', the choice is not critical to this discussion.
    ⚈  article_id is a link to the bulky row in another table(s) that provides all the details about the article.

The Queries


Find the latest 10 articles for a topic:
SELECT  a.*
    FROM  Articles a
    JOIN  Lists s ON s.article_id = a.article_id
    WHERE  s.topic = ?
    ORDER BY  s.sequence DESC
    LIMIT  10;
You must not have any WHERE conditions touching columns in Articles.

When you mark an article for deletion; you must remove it from Lists:
DELETE  FROM  Lists
    WHERE  article_id = ?;
I emphasize "must" because "deleted" flags and other filtering are often the root of performance issues.

Why it works


By now, you may have discovered why it works.

The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:
    ⚈  1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".
    ⚈  10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.)
The PRIMARY KEY, and using InnoDB, makes these quite efficient.

OK, you pay for this by removing things that you should avoid.
    ⚈  1 disk hit: INDEX(article_id) - finding a few ids
    ⚈  A few more disk hits to DELETE rows from Lists.
This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.

Embargo


Suppose you are displaying news stories, but your provider places an "embargo" on some stores. That is, you are contracturally not allowed to display the articles before a given date.

That maybe as simple as putting the embargo datetime in the sequence column; otherwise it is set to NOW() or the date on the article. Then add this to the the WHERE clause: AND sequence < NOW(). (You already have ORDER BY sequence DESC, article_id DESC.)

Postlog


Invented ~2010; Published ~2012; Added Embargo 2019.
-- Rick James

MySQL Documents by Rick James

HowTo Techniques for Optimizing Tough Tasks:

Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)

Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
    Chunking lengthy DELETE/UPDATE/etc.

Data Warehouse techniques:
    Data Warehouse Overview   Summary Tables   High speed ingestion   Bulk Normalization  

Schema and code design for large Sensor database

Entity-Attribute-Value (EAV) -- a common, poorly performing, design pattern; plus an alternative

5 methods for 'Find Nearest'

Lat/Lng search to Find the nearest 10 pizza parlors
    Lat/Long representation choices

Z-Order 'find nearest'

Pagination, not with OFFSET, LIMIT

Techniques on efficiently finding a random row (On beyond ORDER BY RAND())

GUID/UUID Performance (type 1 only)

IP Range Table Performance -- or other disjoint ranges

Rollup Unique User Counts

Alter of a Huge table -- Mostly obviated by 5.6

Efficient List of Latest 10 news articles

Build and execute a Pivot SELECT (showing rows as columns)

(Groupwise Max): Efficiently find largest row(s) for each group

Other Tips, Tuning, Debugging, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)

Datatypes and building a good schema

Memory Allocation (caching, etc)

Character Set and Collation problem solver
    Trouble with UTF-8   If you want case folding, but accent sensitivity, please file a request at https://bugs.mysql.com .
    Python tips,   PHP tips,   other language tips
    utf8 Collations   utf8mb4 Collations on 8.0

Converting from MyISAM to InnoDB -- includes differences between them

Compound INDEXes plus other insights into the mysteries of INDEXing

Cookbook for Creating Indexes
    Many-to-many mapping table   Handler counts   wp_postmeta   UNION+OFFSET

MySQL Limits -- built-in hard limits
    767-byte INDEX limit

Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)

5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users

Analyze MySQL Performance
    Analyze VARIABLEs and GLOBAL STATUS     Analyze SlowLog

My slides from conferences
MiniFest 2021 - Rick James & Daniel Black - Answering on Stack Overflow(+comments) - MariaDB Frontlines
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB

Contact me via LinkedIn; be sure to include a brief teaser in the Invite request:   View Rick James's profile on LinkedIn

Did my articles help you out? Like what you see? Consider donating:

☕️ Buy me a Banana Latte and bagel ($10) There is no obligation but it would put a utf8mb4 smiley 🙂 on my face, instead of the Mojibake "🙂"