
Deploying a plumber API on AWS

Introduction:

We introduced Amazon Web Services (AWS) and configured its Command Line Interface (CLI) in a previous post. However, we did not discuss or demonstrate using AWS services and resources via the CLI. Therefore, this post will utilise AWS services to deploy an Application Programming Interface (API). For the sake of simplicity, we will not look into the making of an API itself; instead, we advise interested readers to check the “Controlling and monitoring access to plumber-powered APIs” and “Deploying plumber-powered APIs with Docker” posts.

So, in this post, we will:

  • push a containerised plumber API image to a repository on Amazon Elastic Container Registry (ECR),
  • create a computer cluster on Amazon Elastic Container Service (ECS),
  • define and run a task to serve the API image on that cluster, and
  • connect to the deployed API, both interactively and programmatically.

Relevance, prerequisites and difficulty:

Relevance:

We have already discussed the relevance of APIs and containerised applications in previous posts. If you are new to one or both topics, we recommend you catch up before continuing with this post. Revisiting them is also worthwhile if it has been a while and you want to brush up on your skills. We argue that this tutorial is relevant to the Health Technology Assessment (HTA) context because it covers getting the API(s) we containerised earlier shipped to the intended end users.

Difficulty:

While we have tried to keep the technicality of this post to a minimum, it is fair to rate it as “intermediate” on the difficulty scale. The main reason behind this classification is that this post requires a good understanding of both APIs and containerised applications.

Prerequisites:

First, unless you have a good grasp of APIs and docker, we recommend that you review:

  • the “Controlling and monitoring access to plumber-powered APIs” post, and
  • the “Deploying plumber-powered APIs with Docker” post.

Second, we expect you to have the following software installed on your machine:

  • Docker desktop client (see here),
  • R,
  • RStudio1, and
  • Visual Studio Code1.

Finally, it goes without saying that we need an AWS account with the necessary permissions. Please refer to the Getting started with AWS tutorial for more information.
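Before proceeding, we can confirm that the required tools are in place with a few quick checks from the command line (a minimal sketch; the exact version numbers will differ on your machine):

# Check that the docker client and daemon are up and running:
docker --version
# Check that the AWS CLI is installed:
aws --version
# Confirm that the AWS CLI is configured with valid credentials:
aws sts get-caller-identity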

We also host the files containing the entire code discussed in this tutorial in the “controlling-and-monitoring-access-to-plumber-powered-APIs” folder on our GitHub repository.

The API we will deploy:

Since this post focuses on the hosting process, we will try to keep things simple and re-use the Smith et al. (2022)2 API that we developed further in the “Controlling and monitoring access to plumber-powered APIs” post. Below is a copy of the API code, which is also available in the “controlling-and-monitoring-access-to-plumber-powered-APIs” folder on our GitHub repository.

#################

library(dampack)
library(readr)
library(assertthat)

#* @apiTitle Client API hosting sensitive data
#* 
#* @apiDescription This API contains sensitive data, the client does not 
#* want to share this data but does want a consultant to build a health 
#* economic model using it, and wants that consultant to be able to run 
#* the model for various inputs 
#* (while holding certain inputs fixed and leaving them unknown).

#* Log some information about the incoming request
#* @filter logger
function(req) {
  cat(
    "Time: ", as.character(Sys.time()), "\n",
    "HTTP verb: ", req$REQUEST_METHOD, "\n",
    "Endpoint: ", req$PATH_INFO, "\n",
    "Request issuer: ", req$HTTP_USER_AGENT, "@", req$REMOTE_ADDR, "\n"
  )
  plumber::forward()
}

#* Check user's credentials in the incoming request
#* @filter security
function(req, res, API_key = "R-HTA-220908") {
  ## Forward requests coming to swagger endpoints:
  if (grepl("docs", tolower(req$PATH_INFO)) |
    grepl("openapi", tolower(req$PATH_INFO))) {
    return(plumber::forward())
  }

  ## Check requests coming to other endpoints:
  ### Grab the key passed in the HEADERS list:
  key <- NULL
  if (!is.null(req$HEADERS["key"])) {
    key <- req$HEADERS["key"]
  }
  ### Check the key passed through with the request object, if any:
  if (is.null(key) | is.na(key)) {
    #### Unauthorised users:
    res$status <- 401 # Unauthorised
    #### Log outcome:
    cat(
      "Authorisation status: 401. API key missing! \n"
    )
    return(list(error = "Authentication required. Please add your API key to the HEADER object using the 'key' value and/or contact API administrator."))
  } else {
    #### Correct credentials:
    if (key == API_key) {
      #### Log outcome:
      cat(
        "Authorisation status: authorised - API key accepted! \n"
      )
      plumber::forward()
    } else {
      #### Incorrect credentials:
      res$status <- 403 # Forbidden
      #### Log outcome:
      cat(
        "Authorisation status: 403. API key incorrect! \n"
      )
      return(list(error = "Authentication failed. Please make sure you have authorisation to access the API and/or contact API administrator."))
    }
  }
}

#* Run the DARTH model
#* @serializer csv
#* @param path_to_psa_inputs is the path of the csv
#* @param model_functions gives the github repo to source the model code
#* @param param_updates gives the parameter updates to be run
#* @post /runDARTHmodel
function(path_to_psa_inputs = "parameter_distributions.csv",
         model_functions = paste0("https://raw.githubusercontent.com/",
                                  "BresMed/plumberHE/main/R/darth_funcs.R"),
         param_updates = data.frame(
           parameter = c("p_HS1", "p_S1H"),
           distribution = c("beta", "beta"),
           v1 = c(25, 50),
           v2 = c(150, 70)
         )) {
  
  
  # source the model functions from the shared GitHub repo...
  source(model_functions)
  
  # read in the csv containing parameter inputs
  psa_inputs <- as.data.frame(readr::read_csv(path_to_psa_inputs))
  
  # for each row of the data-frame containing the variables to be changed...
  for(n in 1:nrow(param_updates)){
  
  # update parameters from API input
  psa_inputs <- overwrite_parameter_value(
                            existing_df = psa_inputs,
                            parameter = param_updates[n,"parameter"], 
                            distribution = param_updates[n,"distribution"],
                            v1 = param_updates[n,"v1"],
                            v2 = param_updates[n,"v2"])
  }
  
  # run the model using the single run-model function.
  results <- run_model(psa_inputs)
  
  # check that the model results being returned are the correct dimensions
  # here we expect a single dataframe with 6 columns and 1000 rows
  assertthat::assert_that(
    all(dim(x = results) == c(1000, 6)),
    class(results) == "data.frame",
    msg = "Dimensions or type of data are incorrect,
  please check the model code is correct or contact an administrator.
  This has been logged"
  )
  
  # check that no data matching the sensitive csv data is included in the output
  # searches through the results data-frame for any of the parameter names,
  # if any exist they will flag a TRUE, therefore we assert that all = F
  assertthat::assert_that(all(psa_inputs[, 1] %in%
        as.character(unlist(x = results,
                            recursive = T)) == F))
  
  return(results)
  
}

#* Scientific paper
#* @preempt security
#* @get /paper
function(){
  return("https://wellcomeopenresearch.org/articles/7-194")
}

The deployment docker image:

To build the deployment docker image, we will make use of the same dockerfile we used in the “Controlling and monitoring access to plumber-powered APIs” tutorial. Below are the contents of that dockerfile.

# Dockerfile

# Get the docker image provided by plumber developers:
FROM rstudio/plumber
# Install the R package `pacman`:
RUN R -e "install.packages('pacman')"
# Use pacman to install other required packages:
RUN R -e "pacman::p_load('assertthat', 'dampack', 'ggplot2', 'jsonlite', 'readr')"
# Create a working directory in the container:
WORKDIR /api
# Copy API files to the created working directory in the container:
COPY ./RobertASmithBresMed-plumberHE-809f204/darthAPI /api
# Specify the commands to run once the container runs: 
CMD ["./plumber.R"]

Deploying the API on AWS:

There are a few steps that we need to go through before our API can be deployed. We cover each step in detail below, but briefly, we need to:

  • create a repository to host our API image,
  • push the API image to the created repository,
  • deploy a computer cluster on the cloud, and
  • task that cluster with running our API image.

1. Push our API image to AWS:

1.1. Create a repository on Amazon Elastic Container Registry (ECR):

The screen recording below shows the steps involved in creating a docker-image-capable repository on Amazon Elastic Container Registry (ECR).

[Screen recording: Creating our public AWS ECR repository]
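Incidentally, the same repository can also be created from the CLI. Below is a minimal sketch, assuming the repository name living_hta_repo that we used in the console (public ECR repositories are managed through the us-east-1 region):

# Create a public ECR repository named "living_hta_repo":
aws ecr-public create-repository --repository-name living_hta_repo --region us-east-1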

As we can see at the end of the above recording, the ECR repository we created comes with a set of commands that help us to:

  • connect our docker client to the ECR repository,
  • build the image, if necessary, and tag it with the appropriate repository name, and
  • push the image to the designated repository.

Below we go through each of the above steps to get our image to our ECR repository.

1.2. Authenticate our docker client to the ECR repository:

To get our docker client connected to the ECR repository we created earlier, we need to authenticate it with said repository. Calling the following command from the Command Prompt (calling it from PowerShell was not successful) establishes the required authentication.

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/x2d1l3b3
[Screen recording: Retrieving an authentication token and authenticating our docker client to our AWS ECR repository]

1.3. Build and test the docker image:

To build an image from the dockerfile described earlier and test it locally, we call:

# Build the docker image using the dockerfile mentioned earlier:
docker build --tag living_hta:2.0 --file .\controlling-and-monitoring-access-to-plumber-powered-APIs\Dockerfile .\controlling-and-monitoring-access-to-plumber-powered-APIs
# Spin a container up in the foreground (keep access to the container's inputs and outputs):
docker run -p 8080:8000 -it --rm --name living_hta_api living_hta:2.0
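With the container running in the foreground, we can confirm that the API is reachable before pushing the image. A quick way is to call the /paper endpoint, which the security filter preempts, so no API key is needed (a sketch, assuming the port mapping above):

# The /paper endpoint requires no API key, so a plain GET suffices:
curl "http://localhost:8080/paper"
# Expected response: ["https://wellcomeopenresearch.org/articles/7-194"]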

Now that we have tested our image locally, we can proceed with the necessary actions to get it uploaded to our ECR repository.

1.4. Tag the built image:

Tagging the image we built and tested above allows our docker daemon to push it to the ECR repository named in the tag. Notice that the new image name (or tag) looks much like a URL; this is what docker uses to push the image to the correct repository. In fact, the tag is made up of the repository address, public.ecr.aws/x2d1l3b3/, and the image’s actual name and version (name:version), living_hta_repo:latest.

docker tag living_hta:2.0 public.ecr.aws/x2d1l3b3/living_hta_repo:latest

1.5. Push the tagged image:

The last step to get our image onto our ECR repository is to push it by calling the command below.

docker push public.ecr.aws/x2d1l3b3/living_hta_repo:latest

In the screen recording below, we can see:

  • the living_hta:2.0 image, among other images on our system,
  • that we re-authenticated our local docker client to access our ECR repository,
  • that we tagged the living_hta:2.0 image as public.ecr.aws/x2d1l3b3/living_hta_repo:latest, and
  • that we pushed the newly tagged image to our ECR repository.
[Screen recording: Authenticating docker, tagging the API image and starting to push it to ECR]

Below we can see a screenshot confirming that we successfully pushed the image to the living_hta_repo repository.

[Screenshot: API image pushed to AWS ECR]
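We can also verify the push from the CLI by listing the images stored in the repository (a sketch, using the repository name from above):

# List the images stored in our public ECR repository:
aws ecr-public describe-images --repository-name living_hta_repo --region us-east-1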

2. Create a computer cluster on AWS ECS:

A cluster is a logical grouping of the compute infrastructure (here, EC2 instances) on which our tasks and services run. Therefore, it makes sense that we would want to create a cluster to handle the deployment of our image. To get our AWS ECS cluster up and running, we:

  • navigate to ECS,
  • in the navigation panel on the left-hand side, under “Amazon ECS”, click on “Clusters”,
  • on the next page, click on “Create Cluster”,
  • on the “Select cluster template” page, choose “EC2 Linux + Networking”,
  • on the next page:
    • assign a cluster name,
    • under “Instance configuration”, choose “t2.micro” from the “EC2 instance type” dropdown list,
    • under “Networking”:
      • choose the default VPC,
      • choose the default Subnets,
      • enable “Auto assign public IP”, and
      • select the default Security group.
  • Finally, click on Create at the bottom of the page to finish the cluster creation wizard.
[Screen recording: Creating a cluster on AWS ECS]

A virtual private cloud (VPC) is a virtual network that closely resembles a traditional network that you would operate in your own data center, whereas a subnet is a range of IP addresses in our VPC that allows us to deploy AWS resources in said VPC. The default VPC we selected earlier is preconfigured, so we can immediately start launching and connecting to EC2 instances. A security group, on the other hand, controls the inbound and outbound traffic allowed to reach and leave, respectively, the resources (here, the EC2 instance in our cluster) with which we associated it.

At the time we scripted this tutorial, the “t2.micro” instance type was included in AWS’s free tier, but you are free to choose from the available Amazon EC2 instance types. Below we see our ECS cluster up and running; however, we do not have any tasks running so far.

[Screenshot: Our ECS cluster]
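As a quick sanity check, we can query the cluster’s status from the CLI (a sketch; replace living-hta-cluster with the name you assigned in the wizard, which we did not show above):

# List the clusters in our account:
aws ecs list-clusters
# Inspect the status and registered container instances of our cluster
# (the cluster name below is a placeholder):
aws ecs describe-clusters --clusters living-hta-cluster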

3. Define a task on AWS ECS:

Now that our cluster is running, we will create a task for our EC2 instance(s) to run the API image we pushed earlier to our ECR repository. But before we create the task, we need the image’s URI, i.e. the URL we used earlier to tag our image: public.ecr.aws/x2d1l3b3/living_hta_repo:latest.

In the screen recording below we:

  • navigate to our repository on the ECR and copy the target image URI,
  • navigate back to ECS and:
    • from the navigation panel on the left-hand side, we click on Task Definitions,
    • on the next page, we click on Create new Task Definition,
    • on the Create new Task Definition page, we choose EC2 under the Select launch type compatibility section, and
    • on the Configure task and container definitions page, we:
      • name our task definition (livingHTAtask),
      • under Container definitions, click on Add container, and on the Add container popup page:
        • name the container as (livingHTAcontainer),
        • add the image’s URI under Image,
        • set the hard limit under Memory Limits to 900 (MiB), and
        • map port 8080 on the host to port 8000 in the container.
[Screen recording: Creating our ECS cluster task definition]
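For completeness, a similar task definition can be registered from the CLI. The sketch below mirrors the console settings above (task family livingHTAtask, container livingHTAcontainer, a 900 MiB hard memory limit, and host port 8080 mapped to container port 8000); it is an illustration rather than the exact call the console makes, and the quoting assumes a bash-like shell (Command Prompt users would need to adjust it):

# Register an EC2-compatible task definition equivalent to the one above:
aws ecs register-task-definition \
  --family livingHTAtask \
  --container-definitions '[
    {
      "name": "livingHTAcontainer",
      "image": "public.ecr.aws/x2d1l3b3/living_hta_repo:latest",
      "memory": 900,
      "portMappings": [
        {"hostPort": 8080, "containerPort": 8000, "protocol": "tcp"}
      ]
    }
  ]'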

Notice that the memory limit we set depends on the container we want to run but cannot exceed the memory available on the EC2 instance(s) that make up the cluster. To see the resources supported by the instance, we can:

  • navigate to ECS,
  • Clusters,
  • ECS Instances, and
  • click on Container Instance.
[Screenshot: The resources of our ECS EC2 instance]

4. Run the defined task on the AWS ECS cluster:

We can now link our cluster to the task definition to get our API container running. This process is demonstrated in the gif file below.

[Screen recording: Running our API image on the ECS cluster]
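The equivalent CLI call would look something like the sketch below (the cluster name is a placeholder, as before):

# Run one instance of our task definition on the cluster:
aws ecs run-task --cluster living-hta-cluster --task-definition livingHTAtask --launch-type EC2 --count 1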

5. Allow access to container ports:

Now that we have our container running, we need to expose the EC2 instance(s) port(s) to which the container is linked. Remember the Security group we defined earlier when we created the ECS cluster? That is our firewall, and we need to amend it so that inbound traffic can pass through to the container and, subsequently, to the API. As we can see below, this process involves allowing traffic from all IPv4 and IPv6 addresses through port 8080 of the EC2 instance, which we earlier mapped to port 8000 of the container.

[Screen recording: Exposing ECS cluster ports to access the hosted API]
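For reference, the same ingress rules can be added from the CLI (a sketch; sg-0123456789abcdef0 is a placeholder for the ID of the security group attached to our EC2 instance):

# Allow inbound TCP traffic on port 8080 from all IPv4 addresses:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8080 --cidr 0.0.0.0/0
# Allow inbound TCP traffic on port 8080 from all IPv6 addresses:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --ip-permissions "IpProtocol=tcp,FromPort=8080,ToPort=8080,Ipv6Ranges=[{CidrIpv6=::/0}]"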

6. Connect to the API:

To connect to the hosted API, we need a URL to the ECS cluster or, to be specific, to the EC2 instance in this case. So, once we navigate to the “EC2 Management Console” and select the EC2 instance we want to connect to, a page containing the instance’s details loads. On that page, we need the “Public IPv4 DNS”, which is “ec2-54-91-164-225.compute-1.amazonaws.com” in our current example. Using this address, we can now connect to our API, as we can see in the screen recording below.

[Screen recording: Connecting to the Living HTA API deployed on the ECS cluster]

In the gif file above, we can see that we connected to the API using http://ec2-54-91-164-225.compute-1.amazonaws.com:8080/__docs__/. However, there is one point to highlight: we could not get through to the container using the secure HTTPS protocol; hence, we used HTTP. We will revisit the use of HTTPS in a future tutorial.
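Before moving to R, we can repeat the quick check we ran locally, this time against the public address (again using the /paper endpoint, since it requires no API key):

# Call the unsecured endpoint of the deployed API over HTTP:
curl "http://ec2-54-91-164-225.compute-1.amazonaws.com:8080/paper"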

What about connecting to this API programmatically? Let us use the R code we employed earlier in the “Controlling and monitoring access to plumber-powered APIs” tutorial (copied below). This script, which embeds the credentials needed to access the secure endpoint, is the one we previously used to communicate with the deployed container.

# load a package that exports the pipe "%>%", e.g. magrittr:
library(magrittr)
results <- httr::POST(
  ## the Server URL can also be kept confidential, but will leave here for now:
  url = "http://ec2-54-91-164-225.compute-1.amazonaws.com:8080",
  ## path for the API within the server URL:
  path = "/runDARTHmodel",
  ## code is passed to the client API from GitHub:
  query = list(model_functions = 
                 paste0("https://raw.githubusercontent.com/",
                        "BresMed/plumberHE/main/R/darth_funcs.R")),
  ## set of parameters to be changed:
  body = list(
    param_updates = jsonlite::toJSON(
      data.frame(parameter = c("p_HS1","p_S1H"),
                 distribution = c("beta","beta"),
                 v1 = c(25, 50),
                 v2 = c(150, 100)))),
  ## pass the API key to the request object:
  config = httr::add_headers(
    key = "R-HTA-220908")) %>%  
  httr::content()

Notice that we have updated the url argument in the code above to point to the API we deployed on AWS ECS: http://ec2-54-91-164-225.compute-1.amazonaws.com:8080.

[Screen recording: Interacting with the AWS-hosted API programmatically]

Cleaning up:

Now that we have concluded publishing our API on AWS, we will remove the ECR repository and the ECS cluster from our AWS account to avoid incurring any unnecessary charges.
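Deleting both resources from the console is straightforward, but the CLI alternative looks roughly like the sketch below (the cluster name and task ID are placeholders; a cluster created through the console wizard may also require deleting its underlying CloudFormation stack to terminate the EC2 instance):

# Stop the running task (the task ID below is a placeholder):
aws ecs stop-task --cluster living-hta-cluster --task 0123456789abcdef0
# Deregister the task definition revision:
aws ecs deregister-task-definition --task-definition livingHTAtask:1
# Delete the cluster once it has no registered container instances:
aws ecs delete-cluster --cluster living-hta-cluster
# Delete the public ECR repository together with the images it holds:
aws ecr-public delete-repository --repository-name living_hta_repo --region us-east-1 --force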

Conclusion:

In this post, we demonstrated hosting a plumber-powered API on AWS. However, we must stress that this tutorial was merely an example of how to deploy services on AWS, not the only way. We could have published the same API on AWS via other avenues, and we may do so in the near future.

While we acknowledge that the process might change in time, we hope that the information in this post will help interested readers get their products deployed on AWS.

Sources:

  1. RStudio and Visual Studio Code (aka VS Code) are integrated development environments (IDEs). RStudio is, at least in my opinion, the best IDE for R, whereas VS Code has several great extensions, including a few for docker. 

  2. Smith RA, Schneider PP and Mohammed W. Living HTA: Automating Health Economic Evaluation with R. Wellcome Open Res 2022, 7:194 (https://doi.org/10.12688/wellcomeopenres.17933.2). 

This post is licensed under CC BY 4.0 by the author.