This repository has been archived by the owner on Jun 18, 2020. It is now read-only.

npm/download-counts

npm stats microservice

Note! This code base isn't what npm uses to serve download counts anymore, and its documentation is likely to drift out of correctness as time passes. See the registry API documentation for up-to-date usage info.

Gives you download counts. Eventually, maybe other stuff.

Our blog has an explanation of how npm download counts work, including "what counts as a download?"

Data source

npm's raw log data is continuously written to a series of buckets on AWS S3. Once per day, soon after UTC midnight, a map-reduce cluster is spun up that crunches the previous day's logs and pushes the results into the database. Because the schedule follows UTC, the timing can look slightly unintuitive from other time zones. For example, if you are on the US west coast, the data for the 19th becomes available at around 5pm on the 19th during the winter (because UTC has already moved on to the 20th), but not until around 6pm during the summer, because the US observes daylight saving time while UTC is fixed.
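The one-hour seasonal shift can be checked directly. The following is a minimal sketch, not part of this code base: `utcMidnightInPacific` is a hypothetical helper, and it assumes a Node.js build with full ICU time zone data. It only shows when a UTC day ends in Pacific local time; the data itself lands some time later, once the daily job has finished.

```javascript
// Hypothetical helper: Pacific wall-clock time at which a given UTC day ends.
function utcMidnightInPacific(utcDay) {
  // The end of utcDay is 00:00 UTC on the following day.
  const end = new Date(`${utcDay}T00:00:00Z`);
  end.setUTCDate(end.getUTCDate() + 1);

  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'America/Los_Angeles',
    hourCycle: 'h23',
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    timeZoneName: 'short',
  }).formatToParts(end);

  // Pick individual fields out of the formatted parts.
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')} ` +
    `${get('hour')}:${get('minute')} ${get('timeZoneName')}`;
}

console.log(utcMidnightInPacific('2014-09-19')); // a summer date (daylight saving time)
console.log(utcMidnightInPacific('2014-01-19')); // a winter date (standard time)
```

The two calls differ by one hour of local time even though the UTC instant is midnight in both cases.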

Point values

Gets the total downloads for a given period, for all packages or a specific package.

GET https://api.npmjs.org/downloads/point/{period}[/{package}]

Examples

All packages, last day:
/downloads/point/last-day
All packages, specific date:
/downloads/point/2014-02-01
Package "express", last week:
/downloads/point/last-week/express
Package "express", given date range (8 days, since both dates are inclusive):
/downloads/point/2014-02-01:2014-02-08/express
Package "jquery", last 30 days:
/downloads/point/last-month/jquery
Package "jquery", specific month:
/downloads/point/2014-01-01:2014-01-31/jquery
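The example paths above all follow the same template, so they can be generated rather than typed by hand. This is a minimal sketch under that assumption; `pointUrl` and `dateRange` are hypothetical helper names, not functions from this repository.

```javascript
const POINT_API_ROOT = 'https://api.npmjs.org';

// Hypothetical helper: builds a /downloads/point URL from the documented template.
function pointUrl(period, pkg) {
  const path = `/downloads/point/${period}`;
  // The package segment is optional; leave it out to query all packages.
  return POINT_API_ROOT + (pkg ? `${path}/${pkg}` : path);
}

// Hypothetical helper: formats an explicit start:end period.
// YYYY-MM-DD strings sort chronologically, so a plain string comparison
// is enough to catch a reversed range.
function dateRange(start, end) {
  if (start > end) {
    throw new RangeError(`start ${start} is after end ${end}`);
  }
  return `${start}:${end}`;
}

console.log(pointUrl('last-day'));
console.log(pointUrl('last-week', 'express'));
console.log(pointUrl(dateRange('2014-01-01', '2014-01-31'), 'jquery'));
```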

Parameters

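Acceptable values for {period} are a single date (YYYY-MM-DD) or a date range (YYYY-MM-DD:YYYY-MM-DD), as in the examples above, or one of the following keywords: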

last-day
Gets downloads for the last available day. In practice, this will usually be "yesterday" (in UTC), but if stats for that day have not yet landed, it will be the day before.
last-week
Gets downloads for the last 7 available days.
last-month
Gets downloads for the last 30 available days.
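A client may want to check a period string before sending a request. The sketch below recognizes the forms that appear in this README: the three keywords, a single date, and a start:end range. `classifyPeriod` is a hypothetical name, and the check is purely about shape; it does not verify that a date exists on the calendar, and the server's own validation may differ.

```javascript
const PERIOD_KEYWORDS = ['last-day', 'last-week', 'last-month'];
const ISO_DAY = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: reports which form a {period} string takes.
function classifyPeriod(period) {
  if (PERIOD_KEYWORDS.includes(period)) return 'keyword';
  if (ISO_DAY.test(period)) return 'date';
  // A range is exactly two dates separated by a colon.
  const parts = period.split(':');
  if (parts.length === 2 && parts.every((p) => ISO_DAY.test(p))) return 'range';
  return 'invalid';
}

console.log(classifyPeriod('last-week'));
console.log(classifyPeriod('2014-02-01'));
console.log(classifyPeriod('2014-02-01:2014-02-08'));
console.log(classifyPeriod('yesterday'));
```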

Output

The output is a simple JSON object:

{
  "downloads": 31623,
  "start": "2014-01-01",
  "end": "2014-01-31",
  "package": "jquery"
}

If you have not specified a package, that key will not be present. The start and end dates are inclusive.
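Because both dates are inclusive, the number of days a response covers is one more than the difference between them. A minimal sketch of that arithmetic, using the sample response above; `daysCovered` and `averagePerDay` are hypothetical helpers.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Number of days covered by a point response, counting both start and end.
function daysCovered(point) {
  const start = Date.parse(`${point.start}T00:00:00Z`);
  const end = Date.parse(`${point.end}T00:00:00Z`);
  return Math.round((end - start) / MS_PER_DAY) + 1;
}

// Mean downloads per covered day.
function averagePerDay(point) {
  return point.downloads / daysCovered(point);
}

const pointSample = {
  downloads: 31623,
  start: '2014-01-01',
  end: '2014-01-31',
  package: 'jquery',
};

console.log(daysCovered(pointSample)); // 31
console.log(averagePerDay(pointSample).toFixed(1)); // 1020.1
```

By the same rule, a period written as 2014-02-01:2014-02-08 covers 8 days, not 7.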

Ranges

Gets the downloads per day for a given period, for all packages or a specific package.

GET https://api.npmjs.org/downloads/range/{period}[/{package}]

Examples

All packages, downloads per day, last 7 days:
/downloads/range/last-week
All packages, downloads per day, specific date range:
/downloads/range/2014-02-07:2014-02-14
Package "jquery", downloads per day, last 30 days:
/downloads/range/last-month/jquery
Package "jquery", downloads per day, specific date range:
/downloads/range/2014-01-03:2014-02-03/jquery

Parameters

Same as for /downloads/point.

Output

Responses are very similar to the point API, except that downloads is now an array of days with downloads on each day:

{
  "downloads": [
    {
      "day": "2014-02-27",
      "downloads": 1904088
    },
    ..
    {
      "day": "2014-03-04",
      "downloads": 7904294
    }
  ],
  "start": "2014-02-25",
  "end": "2014-03-04",
  "package": "somepackage"
}

As before, the package key will not be present if you have not specified a package.
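A range response can be reduced on the client to the same total a point query would give, along with the busiest day. The sketch below assumes the response shape shown above; `summarizeRange` is a hypothetical helper, and the sample contains only the two entries that are spelled out in the example, not the elided ones.

```javascript
// Hypothetical helper: reduces a range response to a total and its busiest day.
function summarizeRange(range) {
  let total = 0;
  let peak = null;
  for (const entry of range.downloads) {
    total += entry.downloads;
    if (peak === null || entry.downloads > peak.downloads) {
      peak = entry;
    }
  }
  return { total, peak, days: range.downloads.length };
}

const rangeSample = {
  downloads: [
    { day: '2014-02-27', downloads: 1904088 },
    { day: '2014-03-04', downloads: 7904294 },
  ],
  start: '2014-02-25',
  end: '2014-03-04',
  package: 'somepackage',
};

const summary = summarizeRange(rangeSample);
console.log(summary.total);    // 9808382
console.log(summary.peak.day); // 2014-03-04
```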

Bulk Queries

To perform a bulk query, you can hit the range or point endpoints with a comma-separated list of packages rather than a single package, e.g.:

/downloads/point/last-day/npm,express
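Building a bulk URL from a list of names is a small extension of the single-package case. This is a sketch only; `bulkUrl` is a hypothetical helper. The shape of a bulk response is not described in this README, so the sketch stops at the URL.

```javascript
// Hypothetical helper: builds a bulk query URL for the point or range endpoint.
function bulkUrl(endpoint, period, packages) {
  if (endpoint !== 'point' && endpoint !== 'range') {
    throw new TypeError(`unknown endpoint: ${endpoint}`);
  }
  if (!Array.isArray(packages) || packages.length === 0) {
    throw new TypeError('packages must be a non-empty array of names');
  }
  // Package names are joined with commas in place of the single name.
  return `https://api.npmjs.org/downloads/${endpoint}/${period}/${packages.join(',')}`;
}

console.log(bulkUrl('point', 'last-day', ['npm', 'express']));
console.log(bulkUrl('range', 'last-week', ['npm', 'express']));
```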

Development

The code requires Node.js and a MySQL database to talk to. We have a conveniently pre-configured VM available for download. First, install VirtualBox:

https://www.virtualbox.org/wiki/Downloads

And then install Vagrant:

https://www.vagrantup.com/downloads.html

Now just cd into the root of this repo and run:

vagrant up

When you see "Done!" you are ready to rock.

Running the web service

Install dependencies:

npm install

You will need a config file:

cp test/config.dev.js config.js

For development, you shouldn't need to change anything in here unless your VM didn't come up at the usual IP (192.168.33.10).

Run the server on port 3000:

node index.js 3000

Test that it's working:

curl "http://localhost:3000/downloads/point/2014-03-01"
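The same check can be scripted from Node. This is a hedged sketch, not something shipped in this repo: `looksLikePointResponse` and `smokeTest` are hypothetical names, the code assumes Node.js 18 or later for the global `fetch`, and it assumes the local server returns the point-response fields documented above with a numeric `downloads` value.

```javascript
// Checks that a parsed body has the point-response fields documented above.
function looksLikePointResponse(body) {
  return body !== null &&
    typeof body === 'object' &&
    typeof body.downloads === 'number' &&
    typeof body.start === 'string' &&
    typeof body.end === 'string';
}

// Hypothetical smoke test against a locally running server.
// Defined here but not invoked, so loading this file makes no network calls.
async function smokeTest(baseUrl = 'http://localhost:3000') {
  const res = await fetch(`${baseUrl}/downloads/point/2014-03-01`);
  if (!res.ok) {
    throw new Error(`unexpected HTTP status ${res.status}`);
  }
  const body = await res.json();
  if (!looksLikePointResponse(body)) {
    throw new Error('unexpected response shape');
  }
  return body;
}
```

With the server running on port 3000, calling `smokeTest().then(console.log, console.error)` prints the parsed response or the reason the check failed.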

You can ssh into the VM to play with MySQL or whatever:

vagrant ssh

Importing data from S3 (npm, Inc. only)

New data is generated daily and stored in S3. You can get it with the backfill script like so:

node scripts/backfill.js YYYY-MM-DD N

YYYY-MM-DD is the date you want new data to start. If omitted, it will start importing from the first available data, which is a bad idea except when creating a new production host.

N is the number of days to import after that date. If omitted, it will import all available days. So to get everything from April 1, 2014 onward, for instance, run:

node scripts/backfill.js 2014-04-01

For the AWS JS SDK to work, you must have a ~/.aws/credentials file containing:

[default]
aws_access_key_id = XXXXX
aws_secret_access_key = YYYYY

where XXXXX and YYYYY are your AWS access credentials. The production server has its own credentials specifically for this purpose.

About

Background jobs and a minimal service for collecting and delivering download counts
