Subpath routing and the developer

Disclaimers

A few quick disclaimers:

  1. I may get some details wrong here. Routing (and specific server implementations of more complex routing patterns) is complicated.
  2. This worked for us, but there are security concerns:
    • Primarily, cookies can be shared across the different services using the same root domain, which means more vigilance is required
  3. This pattern is clearly not a common path/solution, and thus there are fewer experienced hands, meaning other potential issues may be harder to resolve.

Content

Let’s talk about subpath routing…

In URL parlance, the “subpath” is the path portion of a URI, i.e. everything after the domain: /site/index.html in https://example.com/site/index.html, for example.
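To make the terminology concrete, here’s the split using Python’s standard library (purely illustrative, not part of the stack):

```python
from urllib.parse import urlsplit

# Split a full URL into its components; the "subpath" is the path attribute.
parts = urlsplit("https://example.com/site/index.html")
print(parts.netloc)  # example.com
print(parts.path)    # /site/index.html
```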

So let’s say you have a nice website advertising your SaaS app at https://example.com, and your marketing team wants to improve SEO by serving the app itself under https://example.com/site. We now need subpath routing so that all requests to https://example.com/site go to our SaaS app, while requests to any other subpath (e.g. https://example.com/about) still go to our marketing site.
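Before expressing it in load-balancer config, the routing rule we want can be pinned down in a few lines of Python (a hypothetical helper, just to make the behaviour unambiguous):

```python
def route(path: str) -> str:
    """Pick a backend for a request path: /site and anything under it
    go to the app, everything else to the marketing site."""
    if path == "/site" or path.startswith("/site/"):
        return "app"
    return "marketing"

print(route("/site/dashboard"))  # app
print(route("/about"))           # marketing
```

Note that a plain prefix check would misroute paths like /sitemap.xml; requiring either an exact match or a trailing slash avoids that.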

If we’re starting with a SPA, we also need to address bot traffic. Most crawlers don’t execute JavaScript (or don’t execute it well), so we need a way to “pre-render” pages for the (properly behaving) bots.

Ingress

So first we need a load balancer to route traffic between the app and the marketing site. Since $JOB is a GCP shop, we’ll stand up a Cloud Load Balancer to handle initial ingress (including IPv6 support, right?):

locals {
  routing_domains     = ["example.com"]
  routing_root_domain = "example.com"
}

resource "google_compute_security_policy" "routing-policy" {
  name = "routing-layer"
  type = "CLOUD_ARMOR"
  adaptive_protection_config {
    layer_7_ddos_defense_config {
      enable          = true
      rule_visibility = "STANDARD"
    }
  }
  rule {
    action      = "allow"
    description = "Default rule, higher priority overrides it"
    preview     = false
    priority    = 2147483647

    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = [
          "*",
        ]
      }
    }
  }
}

resource "google_compute_url_map" "routing-mappings" {
  name = "routing-mappings"

  default_url_redirect {
    host_redirect          = local.routing_root_domain
    redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
    strip_query            = true
  }

  host_rule {
    hosts        = local.routing_domains
    path_matcher = "routing"
  }

  path_matcher {
    name            = "routing"
    default_service = google_compute_backend_service.routing-marketing.self_link

    path_rule {
      paths   = ["/site", "/site/*"]
      service = google_compute_backend_service.routing-app.id
    }
  }
}

resource "google_compute_global_network_endpoint_group" "routing_marketing" {
  name                  = "routing-layer-marketing"
  network_endpoint_type = "INTERNET_FQDN_PORT"
  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_backend_service" "routing-marketing" {
  name            = "routing-marketing"
  protocol        = "HTTPS"
  security_policy = google_compute_security_policy.routing-policy.self_link
  log_config {
    enable = false
  }
  backend {
    group = google_compute_global_network_endpoint_group.routing_marketing.id
  }
}

resource "google_compute_region_network_endpoint_group" "routing_app" {
  name                  = "routing-app"
  network_endpoint_type = "SERVERLESS"
  region                = var.region
  cloud_run {
    service = google_cloud_run_service.app.name
  }
}

resource "google_compute_backend_service" "routing-app" {
  name            = "routing-app"
  protocol        = "HTTPS"
  security_policy = google_compute_security_policy.routing-policy.self_link
  log_config {
    enable = false
  }
  backend {
    group = google_compute_region_network_endpoint_group.routing_app.id
  }
}

resource "google_compute_target_https_proxy" "routing-https-proxy" {
  name    = "routing-https-proxy"
  url_map = google_compute_url_map.routing-mappings.id
  ssl_certificates = [
    google_compute_managed_ssl_certificate.routing-cert.self_link,
  ]
}

resource "google_compute_url_map" "routing_https_redirect" {
  name = "routing-https-redirect"
  default_url_redirect {
    https_redirect         = true
    redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
    strip_query            = false
  }
}

resource "google_compute_target_http_proxy" "routing_https_redirect" {
  name    = "routing-http-redirect"
  url_map = google_compute_url_map.routing_https_redirect.self_link
}

resource "google_compute_global_address" "routing_lb_ipv4" {
  name       = "routing-lb-address-ipv4"
  ip_version = "IPV4"
}

resource "google_compute_global_address" "routing_lb_ipv6" {
  name       = "routing-lb-address-ipv6"
  ip_version = "IPV6"
}

resource "google_compute_global_forwarding_rule" "routing_http_forward_ipv4" {
  name       = "routing-http-ipv4"
  target     = google_compute_target_http_proxy.routing_https_redirect.self_link
  ip_address = google_compute_global_address.routing_lb_ipv4.address
  port_range = "80"
}

resource "google_compute_global_forwarding_rule" "routing_https_ipv4" {
  name       = "routing-https-ipv4"
  target     = google_compute_target_https_proxy.routing-https-proxy.self_link
  ip_address = google_compute_global_address.routing_lb_ipv4.address
  port_range = "443"
}

resource "google_compute_global_forwarding_rule" "routing_http_forward_ipv6" {
  name       = "routing-http-ipv6"
  target     = google_compute_target_http_proxy.routing_https_redirect.self_link
  ip_address = google_compute_global_address.routing_lb_ipv6.address
  port_range = "80"
}

resource "google_compute_global_forwarding_rule" "routing_https_ipv6" {
  name       = "routing-https-ipv6"
  target     = google_compute_target_https_proxy.routing-https-proxy.self_link
  ip_address = google_compute_global_address.routing_lb_ipv6.address
  port_range = "443"
}

resource "random_id" "routing-cert" {
  byte_length = 8
  prefix      = "routing-"

  keepers = {
    domains = join(",", local.routing_domains)
  }
}

resource "google_compute_managed_ssl_certificate" "routing-cert" {
  managed {
    domains = local.routing_domains
  }
  name = random_id.routing-cert.hex
  lifecycle {
    create_before_destroy = true
  }
}

output "routing_ipv4" {
  value = google_compute_global_address.routing_lb_ipv4.address
}

output "routing_ipv6" {
  value = google_compute_global_address.routing_lb_ipv6.address
}

resource "google_cloud_run_service" "app" {
  name     = "app"
  location = "us-central1"

  template {
    spec {
      timeout_seconds = 300
      containers {
        image = "<container path...probably use GCP's Artifact Registry?>"
        resources {
          limits = {
            cpu    = "1000m"
            memory = "1Gi"
          }
        }
        ports {
          container_port = 80
        }
      }
    }
    metadata {
      labels = {
        "run.googleapis.com/startupProbeType" = "Default"
      }
      annotations = {
        # Limit scale up to prevent any cost blow outs!
        "run.googleapis.com/client-name"           = "gcloud"
        "autoscaling.knative.dev/minScale"         = "0"
        "autoscaling.knative.dev/maxScale"         = "100"
        "run.googleapis.com/execution-environment" = "gen2"
        "run.googleapis.com/cpu-throttling"        = "true"
      }
    }
  }
  autogenerate_revision_name = true
  metadata {
    annotations = {
      # For valid annotation values and descriptions, see
      # https://cloud.google.com/sdk/gcloud/reference/run/deploy#--ingress
      "run.googleapis.com/ingress"     = "internal-and-cloud-load-balancing"
      "run.googleapis.com/client-name" = "gcloud"
      "client.knative.dev/user-image"  = "<container path again...use the same path as above>"
    }
  }
  lifecycle {
    ignore_changes = [
      # This annotation appears to be auto-generated somewhere in GCP, so avoid killing it from the terraform side
      metadata.0.annotations["run.googleapis.com/operation-id"],
    ]
  }
  traffic {
    percent         = 100
    latest_revision = true
  }
}

resource "google_cloud_run_service_iam_binding" "app" {
  location = google_cloud_run_service.app.location
  service  = google_cloud_run_service.app.name
  role     = "roles/run.invoker"
  members = [
    "allUsers"
  ]
}

Most of this Terraform is boilerplate to set up a reasonable load balancer. Even routing to our SPA is simply a matter of pointing the load balancer at a Cloud Run app…

The challenge here is ensuring DNS resolution carries through to our marketing site. If we can configure an alternate DNS entry on the actual marketing site (WordPress, Squarespace, Netlify, and others should all support this), it should be fairly painless. The key step is defining the Internet Network Endpoint Group with the destination DNS entry of our marketing site. Something like:

gcloud --project example-project compute network-endpoint-groups update ${google_compute_global_network_endpoint_group.routing_marketing.name} --add-endpoint="fqdn=<dns entry>,port=443" --global

Setting this value tells the load balancer where to actually ship traffic.

The App

Okay. That’s great and all…but what about actually routing a React app to a subpath? There are two parts here:

  1. Configuring the React App to compile using a subpath
    • There are many potential ways to do this; this implementation was the most straightforward for our use case.
  2. Configuring the actual server process
    • This is where pre-rendering actually comes in. More on that below.
    • We landed on using nginx, but other servers could also do this successfully (e.g. Apache, Traefik, etc.)

React configs

React can set a subpath route in a number of ways. We landed on the following configs:

Static configs

For the actual static site, we needed to ensure favicon(s)/etc. would load:

  <link rel="manifest" href="%PUBLIC_URL%/manifest.json">
  <link rel="shortcut icon" href="%PUBLIC_URL%/favicon.ico">
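The %PUBLIC_URL% placeholders are replaced at build time (Create React App does this during yarn build). Conceptually, the substitution is nothing more than the following sketch (illustrative only, not CRA’s actual code):

```python
def expand_public_url(html: str, public_url: str) -> str:
    """Mimic the build-time %PUBLIC_URL% replacement in index.html.
    Trailing slashes are stripped so PUBLIC_URL=/ yields root-relative paths."""
    return html.replace("%PUBLIC_URL%", public_url.rstrip("/"))

tag = '<link rel="shortcut icon" href="%PUBLIC_URL%/favicon.ico">'
print(expand_public_url(tag, "/site"))
# <link rel="shortcut icon" href="/site/favicon.ico">
```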
Router configs

For React Router itself, we had to configure the basename from PUBLIC_URL:

import React from 'react';
import ReactDOM from 'react-dom';
import { HelmetProvider } from 'react-helmet-async';
import { Provider } from 'react-redux';
import { BrowserRouter as Router, Route } from 'react-router-dom';
import './styles/scss/index.scss';
import { QueryParamProvider } from 'use-query-params';
import App from './App';
import { unregister } from './registerServiceWorker';
import store from './store';

// PUBLIC_URL is baked in at build time; default to '/' for local development
const ROOT_URL = process.env.PUBLIC_URL || '/';

ReactDOM.render(
  <HelmetProvider>
    <Provider store={store}>
      <Router basename={ROOT_URL}>
        <QueryParamProvider ReactRouterRoute={Route}>
          <Route component={App} path="/" />
        </QueryParamProvider>
      </Router>
    </Provider>
  </HelmetProvider>,
  document.getElementById('root')
);
// Unregister any previously installed service worker so stale caches don't serve old bundles
unregister();

This is a rough sketch of the React configuration. Some additional work was required to fix hard-coded static paths, but the TL;DR of that story is essentially grep /static/ src/* -RI and manual updates.

Nginx configs

So a quick diversion around pre-rendering content: there are many options here (Cloudflare and other CDN friends spring to mind), but we landed on leveraging Prerender.io. I’m going to hand-wave away some of the decision details, but suffice it to say, we are leveraging an external system for caching page renders, and we need nginx to be smart enough to route requests from specific user agents to that service…

Also of note is the heavy use of maps in this nginx config. Maps are akin to dictionaries/hash tables in general-purpose languages: key/value pairs (keys can even be regexes!) that define what a source value should be converted to. A map’s default value can reference the result of another map, which lets us chain maps together. This is the closest nginx gets to if/then logic without breaking the general nginx rule of “don’t use if statements”.
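To make the chain below easier to follow, here is the same decision logic as a plain function (a sketch of the evaluation order with a trimmed bot list, not a literal translation of nginx internals):

```python
import re

# Abbreviated versions of the patterns used in the nginx maps below
STATIC_EXT = re.compile(
    r"\.(css|js|xml|png|jpg|jpeg|gif|ico|svg|woff2?)$", re.IGNORECASE)
BOT_UA = re.compile(
    r"googlebot|bingbot|slackbot|discordbot|twitterbot", re.IGNORECASE)

def should_prerender(uri, user_agent, args="", x_prerender="", token="set"):
    """Mirror the nginx map chain: each map either short-circuits with a
    definite answer or falls through (via its default) to the next map."""
    if STATIC_EXT.search(uri):          # $uri map: never prerender assets
        return False
    if token == "NONE":                 # token map: no token configured
        return False
    if x_prerender == "1":              # header map: avoid proxy recursion
        return False
    if "_escaped_fragment_=" in args:   # args map: legacy AJAX-crawling flag
        return True
    return bool(BOT_UA.search(user_agent))  # finally, the user-agent map

print(should_prerender("/site/pricing", "Mozilla/5.0 Googlebot/2.1"))  # True
print(should_prerender("/site/logo.png", "Googlebot"))                 # False
```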

user nginx;
worker_processes  auto;

error_log  stderr notice;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" (prerender: $prerender)';
    log_format prerender_debug '[$time_local] $remote_addr - $remote_user - $server_name user-agent: $http_user_agent prerender: $prerender: $request to: $upstream_addr upstream_response_time: $upstream_response_time msec $msec request_time $request_time';
    log_not_found on;
    log_subrequest on;

    # some basic caching of local files
    open_file_cache max=200 inactive=180s;
    open_file_cache_valid 120s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    access_log  /dev/stdout main;

    sendfile           on;
    keepalive_timeout  65;

    map $http_user_agent $prerender_ua {
        default       0;
        "~*Prerender" 0;

        "~*googlebot"                               1;
        "~*yahoo!\ slurp"                           1;
        "~*bingbot"                                 1;
        "~*yandex"                                  1;
        "~*baiduspider"                             1;
        "~*facebookexternalhit"                     1;
        "~*twitterbot"                              1;
        "~*rogerbot"                                1;
        "~*linkedinbot"                             1;
        "~*embedly"                                 1;
        "~*quora\ link\ preview"                    1;
        "~*showyoubot"                              1;
        "~*outbrain"                                1;
        "~*pinterest\/0\."                          1;
        "~*developers.google.com\/\+\/web\/snippet" 1;
        "~*slackbot"                                1;
        "~*vkshare"                                 1;
        "~*w3c_validator"                           1;
        "~*redditbot"                               1;
        "~*applebot"                                1;
        "~*whatsapp"                                1;
        "~*flipboard"                               1;
        "~*tumblr"                                  1;
        "~*bitlybot"                                1;
        "~*skypeuripreview"                         1;
        "~*nuzzel"                                  1;
        "~*discordbot"                              1;
        "~*google\ page\ speed"                     1;
        "~*qwantify"                                1;
        "~*pinterestbot"                            1;
        "~*bitrix\ link\ preview"                   1;
        "~*xing-contenttabreceiver"                 1;
        "~*chrome-lighthouse"                       1;
        "~*telegrambot"                             1;
        "~*google-inspectiontool"                   1;
        "~*petalbot"                                1;
    }

    map $args $prerender_args {
        default $prerender_ua;
        "~(^|&)_escaped_fragment_=" 1;
    }

    map $http_x_prerender $x_prerender {
        default $prerender_args;
        "1"     0;
    }

    map "${PRERENDER_TOKEN}" $bad_prerender_token {
        default $x_prerender;
        "NONE"  0;
    }

    map $uri $prerender {
        default $bad_prerender_token;
        "~*\.(css|js|xml|less|png|jpg|jpeg|gif|pdf|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|woff2|svg|eot)" 0;
    }

    server {
        listen       *:80 default_server;
        # disable until we are more confident everyone using the container will have IPv6 enabled
        # listen       [::]:80 default_server;

        # suggested headers for good security posture from https://securityheaders.com/
        set $scriptCSP "'self' <additional domains>";
        set $fontCSP "'self' <additional domains>";
        set $styleCSP "'self' 'unsafe-inline' <additional domains>";
        set $styleelemCSP "'self' 'unsafe-inline' <additional domains>";
        set $imgCSP "'self' blob: data: <additional domains>";
        set $frameCSP "'self' <additional domains>";
        set $connectCSP "'self' <additional domains>";
        set $defaultCSP "'self' <additional domains>";
        add_header Content-Security-Policy "script-src $scriptCSP; style-src $styleCSP; style-src-elem $styleelemCSP; font-src $fontCSP; frame-ancestors $frameCSP; img-src $imgCSP; connect-src $connectCSP; default-src $defaultCSP";
        add_header Referrer-Policy "strict-origin-when-cross-origin";
        add_header Access-Control-Allow-Origin "*";
        add_header Permissions-Policy "geolocation=(self), microphone=(), camera=(), fullscreen=(self), payment=(), usb=(), autoplay=(), display-capture=()";
        add_header X-Content-Type-Options "nosniff";
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";

        # our root path should never change, so set it outside any location blocks
        root /usr/share/nginx/html;

        # handle a few "passed through" locations so we can easily deploy management files
        location ~ "^/(sitemap\.xml|robots\.txt)$" {
            try_files $uri =404;
        }

        # we use two location blocks here to allow for better routing
        # location = PUBLIC_URL block *only* routes traffic that hits the root path exactly
        # location ~ ^PUBLIC_URL(.*) block routes any internal paths to their correct potential path in the filesystem
        # e.g. localhost/app/static/img/logo.png -> /usr/share/nginx/html/static/img/logo.png
        location = ${PUBLIC_URL} {
            try_files /index.html =404;
        }

        location ~* ^${PUBLIC_URL}(.*) {
            # if we don't do this, we run into the "regex in regex" problem where nginx will somehow consume our needed capture group
            set $path $1;

            if ($prerender = 1) {
                rewrite .* /prerenderio last;
            }

            # need /$path (vs just $path) for the case where PUBLIC_URL == /
            try_files /$path /index.html =404;
        }

        # code pulled from https://github.com/prerender/prerender-nginx/blob/master/nginx.conf
        location = /prerenderio {
            if ($prerender = 0) {
                return 404;
            }
            proxy_set_header X-Prerender-Token ${PRERENDER_TOKEN};
            proxy_hide_header Cache-Control;
            add_header Cache-Control "private,max-age=600,must-revalidate";

            # add explicit resolver to force DNS resolution and prevent caching of IPs
            resolver 208.67.222.222 208.67.220.220;
            # setting the host as a variable helps prevent IP caching
            set $prerender_host "service.prerender.io";
            proxy_pass https://$prerender_host;

            # hard-code TLS endpoint to avoid post-LB routing not being encrypted
            # also remove / from between $host and $request_uri to avoid duplicate /
            rewrite .* /https://$host$request_uri? break;
        }

        gzip on;
        gzip_vary on;
        gzip_min_length 10240;
        gzip_proxied expired no-cache no-store private auth;
        gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
        gzip_disable "MSIE [1-6]\.";
    }
}
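The two location blocks boil down to a simple path-mapping rule. Here is a Python sketch of the same resolution, assuming PUBLIC_URL=/site and the build output in /usr/share/nginx/html (hypothetical helper, just to illustrate the try_files behaviour):

```python
def resolve(uri, public_url="/site", root="/usr/share/nginx/html"):
    """Mimic the location blocks: an exact hit on PUBLIC_URL serves
    index.html; anything under it strips the prefix and mirrors
    `try_files /$path /index.html` as an ordered candidate list."""
    if uri == public_url:
        return [root + "/index.html"]
    if uri.startswith(public_url):
        path = uri[len(public_url):]
        return [root + "/" + path.lstrip("/"), root + "/index.html"]
    return []  # not ours; the load balancer shouldn't send it here

print(resolve("/site/static/img/logo.png"))
# ['/usr/share/nginx/html/static/img/logo.png', '/usr/share/nginx/html/index.html']
```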

Building

So we have all the parts, but how do we actually build an nginx container with the static site? We’re using Docker, right? I sure hope so…

FROM node:18.16.1 AS web-builder
ARG API_URL=http://example.com
ARG APPLICATION_URL=http://example.com
ARG ENV=testing
ARG PUBLIC_URL=/
RUN echo "Building against $ENV :: $API_URL :: $APPLICATION_URL => Subpath: $PUBLIC_URL"
ARG SEGMENT_KEY=testing
ARG PRERENDER_TOKEN=NONE
ENV REACT_APP_API_URL=$API_URL
ENV REACT_APP_ENV=$ENV
ENV REACT_APP_SEGMENT_KEY=$SEGMENT_KEY
ENV APPLICATION_URL=$APPLICATION_URL
ENV API_URL=$API_URL
ENV PUBLIC_URL=$PUBLIC_URL
ENV CI=false

RUN apt-get update -qq \
  && DEBIAN_FRONTEND=noninteractive apt-get install -qq -y --no-install-recommends gettext-base ca-certificates \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

###
# < copy our actual code in here, including any yarn commands to install dependencies
###

RUN npm install -g serve
RUN PORT=80 yarn workspace web build
COPY packages/web/deploy/nginx.conf.template /nginx.conf.template
RUN envsubst '$PUBLIC_URL $PRERENDER_TOKEN' < /nginx.conf.template > /nginx.conf
###
# any additional template files can be processed here (robots.txt, sitemaps, etc.)
###

FROM nginx:alpine as web-deploy
COPY --from=web-builder /code/build/ /usr/share/nginx/html/
COPY --from=web-builder /nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
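The envsubst step above is doing plain ${VAR} substitution on the nginx template. Python’s string.Template behaves much the same way, which makes the transform easy to reason about (illustrative only; the image uses gettext’s envsubst):

```python
from string import Template

# A fragment of the nginx template, with the same ${PUBLIC_URL} placeholder
template = "location = ${PUBLIC_URL} { try_files /index.html =404; }"
rendered = Template(template).safe_substitute(PUBLIC_URL="/site")
print(rendered)
# location = /site { try_files /index.html =404; }
```

Like envsubst restricted to '$PUBLIC_URL $PRERENDER_TOKEN', safe_substitute leaves unknown variables (e.g. nginx’s own $uri and $http_user_agent) untouched, which is exactly why the restricted variable list matters.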

Conclusion

That’s a lot, and it was a lot of work to get sorted out. The main goal of this post is to make it all discoverable, so someone else might find it in the future and not have to spend days digging through documentation.