joesul.li/van

Digital Ocean Documentation-Generated API Library

Joe Sullivan

The Digital Ocean API documentation is good: it describes each endpoint in a fairly structured way, provides curl examples, and fits on a single page. It's so good, in fact, that I wondered whether I could build a library for the API by scraping it. Here are the results: a complete node.js library and a CLI built on top of it.

Overview

The approach I landed on has two pieces: convert the HTML of the documentation into JSON describing the API, then write an npm package that reads that JSON on initialization and exposes each action to the consumer.

HTML to JSON

Each API action consists of a path, a method, and, optionally, a list of properties to be sent in the body. Below is a JSON representation of one of the more complex operations, transferring an image from one data center to another:

{
  "name": "transferImage",
  "path": "/v2/images/$IMAGE_ID/actions",
  "method": "POST",
  "requiredResourceIdCount": 1,
  "requiredProperties": [
    {
      "name": "type",
      "type": "string",
      "required": true
    },
    {
      "name": "region",
      "type": "string",
      "required": true
    }
  ],
  "staticProperties": {
    "type": "transfer"
  }
}

(Here's the Digital Ocean documentation for that action.)
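Because every action has this same shape, a model can be sanity-checked mechanically. Here is a minimal sketch of such a check. checkModel and countResourceIds are hypothetical helpers, not part of the library, and the sketch assumes that each resource ID appears in the path as an upper-case $PLACEHOLDER (as $IMAGE_ID does) and that every static property is also listed among the required properties (as type is above).

```javascript
// Hypothetical helper: counts $PLACEHOLDER segments in a documented path,
// assuming resource IDs are always written like $IMAGE_ID.
function countResourceIds(path) {
  const matches = path.match(/\$[A-Z][A-Z0-9_]*/g);
  return matches ? matches.length : 0;
}

// Hypothetical consistency check for one action model.
// Returns a list of human-readable problems (empty if the model looks consistent).
function checkModel(model) {
  const problems = [];
  const expected = model.requiredResourceIdCount || 0;
  const found = countResourceIds(model.path);
  if (found !== expected) {
    problems.push(`path has ${found} placeholder(s), model says ${expected}`);
  }
  // Assumption: a static property should also appear in requiredProperties.
  const names = new Set((model.requiredProperties || []).map((p) => p.name));
  for (const key of Object.keys(model.staticProperties || {})) {
    if (!names.has(key)) {
      problems.push(`static property "${key}" is not a listed property`);
    }
  }
  return problems;
}
```

Run over every model, a check like this would flag the kind of irregularities that a scraper otherwise passes through silently.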

A few notes on the model:

I won't go into the actual process of parsing the HTML. It's a little messy, but straightforward. I used cheerio, and all the data required above is present in the documentation. The actual parser script is here.

JSON to .js

Once we have the JSON, a factory converts each model into a function. Running every model through this factory yields the library itself, which looks like this:

api = {
  droplets: {
    createNewDroplet: [Function],
    retrieveExistingDropletById: [Function],
    ...
  },
  images: {
    listAllImages: [Function],
    retrieveExistingImageById: [Function],
    ...
  },
  ...
}

Each function returns a promise via request-promise.
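A minimal sketch of such a factory, built from the fields of the transferImage model. It assumes resource IDs are passed positionally before an optional body object, and it injects the transport as a send function instead of calling request-promise directly, so it runs without network access. makeAction, buildApi, and send are hypothetical names, not the library's actual API, and the grouping of models by resource is also assumed.

```javascript
// Hypothetical factory: turns one action model into a function returning a promise.
// Assumes resource IDs come first (positionally), then an optional body object.
function makeAction(model, send) {
  const idCount = model.requiredResourceIdCount || 0;
  return function (...args) {
    const ids = args.slice(0, idCount);
    const body = args[idCount] || {};
    // Reject if IDs are missing or an object was passed where an ID belongs.
    if (ids.length < idCount || ids.some((id) => typeof id === "object")) {
      return Promise.reject(
        new Error(`${model.name}: expected ${idCount} resource ID(s) before the body`)
      );
    }
    // Substitute each $PLACEHOLDER in the path with the next ID, in order.
    let next = 0;
    const path = model.path.replace(/\$[A-Z][A-Z0-9_]*/g, () =>
      encodeURIComponent(String(ids[next++]))
    );
    // Static properties are fixed by the action, so they override caller input.
    const payload = { ...body, ...(model.staticProperties || {}) };
    const missing = (model.requiredProperties || [])
      .filter((p) => p.required && payload[p.name] === undefined)
      .map((p) => p.name);
    if (missing.length > 0) {
      return Promise.reject(new Error(`${model.name}: missing ${missing.join(", ")}`));
    }
    // Wrapping in a promise turns both sync throws and plain values into a promise.
    return new Promise((resolve) =>
      resolve(send({ method: model.method, path, body: payload }))
    );
  };
}

// Builds the nested api object from models grouped by resource,
// e.g. { images: [transferImageModel], droplets: [...] } (grouping assumed).
function buildApi(groups, send) {
  const api = {};
  for (const [resource, models] of Object.entries(groups)) {
    api[resource] = {};
    for (const model of models) {
      api[resource][model.name] = makeAction(model, send);
    }
  }
  return api;
}
```

With a send that simply echoes its argument, api.images.transferImage(12345, { region: "nyc2" }) resolves to a POST to /v2/images/12345/actions with the body { region: "nyc2", type: "transfer" }. Letting static properties win means a caller cannot accidentally change type on an action that fixes it.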

From library to CLI

That's more or less the process. To use the API from the command line, there is digital-ocean-wrench (npm install -g dow). It uses a map to convert:

    dow floating-ip create --region=nyc2
        to
    api.floatingIPActions.createNewFloatingIPReservedToRegion({ region: "nyc2" })

It's built on yargs, which I found to be great.
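The map itself can be sketched in a few lines. This is an illustration, not dow's source: COMMAND_MAP, parseFlags, and dispatch are hypothetical names, only the floating-ip entry comes from the example above, and a hand-rolled --key=value parser stands in for yargs so the sketch has no dependencies.

```javascript
// Hypothetical lookup table from "resource verb" pairs to [namespace, action].
// Only this entry is taken from the example above; the real map may differ.
const COMMAND_MAP = {
  "floating-ip create": ["floatingIPActions", "createNewFloatingIPReservedToRegion"],
};

// Minimal stand-in for yargs: collects --key=value flags into an object.
// Flags without "=value" are treated as positional arguments here.
function parseFlags(argv) {
  const positional = [];
  const flags = {};
  for (const arg of argv) {
    const m = /^--([^=]+)=(.*)$/.exec(arg);
    if (m) flags[m[1]] = m[2];
    else positional.push(arg);
  }
  return { positional, flags };
}

// Looks up the command and calls the matching library function.
function dispatch(api, argv) {
  const { positional, flags } = parseFlags(argv);
  const key = positional.slice(0, 2).join(" ");
  const target = COMMAND_MAP[key];
  if (!target) throw new Error(`unknown command: ${key}`);
  const [namespace, action] = target;
  // Any remaining positional arguments are passed through as resource IDs.
  return api[namespace][action](...positional.slice(2), flags);
}
```

Flag values stay strings in this sketch; yargs can also coerce types and generate help text, which is a good reason to use it instead.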

More...

Working closely with this documentation, I noticed one pluralization error, one instance where a path is described unusually, one action that isn't described at all, and a few places where non-crucial data wasn't described in the standard way. Yet this is good documentation. It highlights that API documentation usually isn't held to the same quality standards as the code it describes, so it should be used with caution.

Being usable to generate a library seems like a pretty good property for documentation to have. The only requirement is that all of the documentation be presented in a straightforward, structured way. The benefit is that generating complete libraries in different languages becomes easy.
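To illustrate how the same JSON could target another language, here is a minimal sketch of a generator, still in JavaScript, that emits a Python stub for one model. The snake_case naming and the _request helper that the generated Python would call are assumptions for illustration; nothing here is part of the published library.

```javascript
// Converts camelCase action names to snake_case for Python.
function toSnakeCase(name) {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Hypothetical generator: emits one Python function for one action model.
// "_request" is a placeholder for an HTTP helper the generated library would define.
function toPythonStub(model) {
  const placeholders = model.path.match(/\$[A-Z][A-Z0-9_]*/g) || [];
  const idParams = placeholders.map((p) => p.slice(1).toLowerCase());
  const statics = model.staticProperties || {};
  // Properties fixed by the action are not exposed as parameters.
  const bodyParams = (model.requiredProperties || [])
    .map((p) => p.name)
    .filter((name) => !(name in statics));
  const params = [...idParams, ...bodyParams].join(", ");
  // $IMAGE_ID becomes {image_id} inside a Python f-string.
  const pyPath = model.path.replace(/\$([A-Z][A-Z0-9_]*)/g, (_, id) => `{${id.toLowerCase()}}`);
  const bodyEntries = [
    ...Object.entries(statics).map(([k, v]) => `${JSON.stringify(k)}: ${JSON.stringify(v)}`),
    ...bodyParams.map((name) => `${JSON.stringify(name)}: ${name}`),
  ].join(", ");
  return [
    `def ${toSnakeCase(model.name)}(${params}):`,
    `    return _request(${JSON.stringify(model.method)}, f"${pyPath}", {${bodyEntries}})`,
  ].join("\n");
}
```

For the transferImage model above, it produces:

    def transfer_image(image_id, region):
        return _request("POST", f"/v2/images/{image_id}/actions", {"type": "transfer", "region": region})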

How well does this method of library generation travel? A useful next step would be to survey other popular APIs and see whether any are simple enough, and documented well enough, to give it a try. Perhaps some of the code here could be reused; even if not, I'd like to emphasize that this method is still faster, in development time, than implementing each method by hand.

To reiterate: