Subpath routing and the developer
Disclaimers
A few quick disclaimers:
- I may get some details wrong here. Routing (and specific server implementations of more complex routing patterns) is complicated.
- This worked for us, but there are security concerns:
  - Primarily, cookies can be shared across the different services using the same root domain, which means more vigilance is required.
- This pattern is clearly not a common path/solution, and thus there are fewer experienced hands, meaning other potential issues may be harder to resolve.
Content
Let’s talk about subpath routing…
In HTTP parlance, the “subpath” is the path portion of a URI after the domain, e.g. /site/index.html in https://example.com/site/index.html.
So let’s say you have a nice website advertising your SaaS app at https://example.com, and your marketing team wants to improve SEO by adding the app under https://example.com/site. We now need to implement subpath routing so that all requests to https://example.com/site go to our SaaS app, while all requests to other subpaths (e.g. https://example.com/about) still go to our marketing site.
If we’re starting with a SPA, we also need to address bot traffic. Bots don’t process JavaScript well, so we need a way to “pre-render” pages for the (properly behaving) bots.
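To make the routing rule concrete before diving into infrastructure, here is the decision the load balancer will be making, sketched in Python (/site is just the example subpath from above, and route is a hypothetical helper, not production code):

```python
from urllib.parse import urlparse

def route(url: str) -> str:
    """Pick a backend from the subpath, mirroring the URL map rules."""
    path = urlparse(url).path
    # /site and everything under it goes to the app backend...
    if path == "/site" or path.startswith("/site/"):
        return "app"
    # ...and every other subpath still goes to the marketing site
    return "marketing"
```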
Ingress
So first we need a load balancer to route traffic between the app and the marketing site. Since $JOB is a GCP shop, we’ll stand up a Cloud Load Balancer to handle initial ingress (including IPv6 support, right?):
locals {
routing_domains = ["example.com"]
routing_root_domain = "example.com"
}
resource "google_compute_security_policy" "routing-policy" {
name = "routing-layer"
type = "CLOUD_ARMOR"
adaptive_protection_config {
layer_7_ddos_defense_config {
enable = true
rule_visibility = "STANDARD"
}
}
rule {
action = "allow"
description = "Default rule, higher priority overrides it"
preview = false
priority = 2147483647
match {
versioned_expr = "SRC_IPS_V1"
config {
src_ip_ranges = [
"*",
]
}
}
}
}
resource "google_compute_url_map" "routing-mappings" {
name = "routing-mappings"
default_url_redirect {
host_redirect = local.routing_root_domain
redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
strip_query = true
}
host_rule {
hosts = local.routing_domains
path_matcher = "routing"
}
path_matcher {
name = "routing"
default_service = google_compute_backend_service.routing-marketing.self_link
path_rule {
# route the app subpath (and everything under it) to the app backend
paths = ["/site", "/site/*"]
service = google_compute_backend_service.routing-app.id
}
}
}
resource "google_compute_global_network_endpoint_group" "routing_marketing" {
name = "routing-layer-marketing"
network_endpoint_type = "INTERNET_FQDN_PORT"
lifecycle {
create_before_destroy = true
}
}
resource "google_compute_backend_service" "routing-marketing" {
name = "routing-marketing"
protocol = "HTTPS"
security_policy = google_compute_security_policy.routing-policy.self_link
log_config {
enable = false
}
backend {
group = google_compute_global_network_endpoint_group.routing_marketing.id
}
}
resource "google_compute_region_network_endpoint_group" "routing_app" {
name = "routing-app"
network_endpoint_type = "SERVERLESS"
region = var.region
cloud_run {
service = google_cloud_run_service.app.name
}
}
resource "google_compute_backend_service" "routing-app" {
name = "routing-app"
protocol = "HTTPS"
security_policy = google_compute_security_policy.routing-policy.self_link
log_config {
enable = false
}
backend {
group = google_compute_region_network_endpoint_group.routing_app.id
}
}
resource "google_compute_target_https_proxy" "routing-https-proxy" {
name = "routing-https-proxy"
url_map = google_compute_url_map.routing-mappings.id
ssl_certificates = [
google_compute_managed_ssl_certificate.routing-cert.self_link,
]
}
resource "google_compute_url_map" "routing_https_redirect" {
name = "routing-https-redirect"
default_url_redirect {
https_redirect = true
redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
strip_query = false
}
}
resource "google_compute_target_http_proxy" "routing_https_redirect" {
name = "routing-http-redirect"
url_map = google_compute_url_map.routing_https_redirect.self_link
}
resource "google_compute_global_address" "routing_lb_ipv4" {
name = "routing-lb-address-ipv4"
ip_version = "IPV4"
}
resource "google_compute_global_address" "routing_lb_ipv6" {
name = "routing-lb-address-ipv6"
ip_version = "IPV6"
}
resource "google_compute_global_forwarding_rule" "routing_http_forward_ipv4" {
name = "routing-http-ipv4"
target = google_compute_target_http_proxy.routing_https_redirect.self_link
ip_address = google_compute_global_address.routing_lb_ipv4.address
port_range = "80"
}
resource "google_compute_global_forwarding_rule" "routing_https_ipv4" {
name = "routing-https-ipv4"
target = google_compute_target_https_proxy.routing-https-proxy.self_link
ip_address = google_compute_global_address.routing_lb_ipv4.address
port_range = "443"
}
resource "google_compute_global_forwarding_rule" "routing_http_forward_ipv6" {
name = "routing-http-ipv6"
target = google_compute_target_http_proxy.routing_https_redirect.self_link
ip_address = google_compute_global_address.routing_lb_ipv6.address
port_range = "80"
}
resource "google_compute_global_forwarding_rule" "routing_https_ipv6" {
name = "routing-https-ipv6"
target = google_compute_target_https_proxy.routing-https-proxy.self_link
ip_address = google_compute_global_address.routing_lb_ipv6.address
port_range = "443"
}
resource "random_id" "routing-cert" {
byte_length = 8
prefix = "routing-"
keepers = {
domains = join(",", local.routing_domains)
}
}
resource "google_compute_managed_ssl_certificate" "routing-cert" {
managed {
domains = local.routing_domains
}
name = random_id.routing-cert.hex
lifecycle {
create_before_destroy = true
}
}
output "routing_ipv4" {
value = google_compute_global_address.routing_lb_ipv4.address
}
output "routing_ipv6" {
value = google_compute_global_address.routing_lb_ipv6.address
}
resource "google_cloud_run_service" "app" {
name = "app"
location = "us-central1"
template {
spec {
timeout_seconds = 300
containers {
image = "<container path...probably use GCP's Artifact Registry?>"
resources {
limits = {
cpu = "1000m"
memory = "1Gi"
}
}
ports {
container_port = 80
}
}
}
metadata {
labels = {
"run.googleapis.com/startupProbeType" = "Default"
}
annotations = {
# Limit scale up to prevent any cost blow outs!
"run.googleapis.com/client-name" = "gcloud"
"autoscaling.knative.dev/minScale" = "0"
"autoscaling.knative.dev/maxScale" = "100"
"run.googleapis.com/execution-environment" = "gen2"
"run.googleapis.com/cpu-throttling" = "true"
}
}
}
autogenerate_revision_name = true
metadata {
annotations = {
# For valid annotation values and descriptions, see
# https://cloud.google.com/sdk/gcloud/reference/run/deploy#--ingress
"run.googleapis.com/ingress" = "internal-and-cloud-load-balancing"
"run.googleapis.com/client-name" = "gcloud"
"client.knative.dev/user-image" = "<container path again...use the same path as above>"
}
}
lifecycle {
ignore_changes = [
# This annotation appears to be auto-generated somewhere in GCP, so avoid killing it from the terraform side
metadata.0.annotations["run.googleapis.com/operation-id"],
]
}
traffic {
percent = 100
latest_revision = true
}
}
resource "google_cloud_run_service_iam_binding" "app" {
location = google_cloud_run_service.app.location
service = google_cloud_run_service.app.name
role = "roles/run.invoker"
members = [
"allUsers"
]
}
Most of this Terraform is boilerplate to set up a reasonable load balancer. Even routing to our SPA is simply a matter of pointing the load balancer at a Cloud Run app…
The challenge is ensuring that DNS resolution carries through to our marketing site. If we can configure an alternate DNS entry on the actual marketing site (WordPress, Squarespace, Netlify, and others should all support this), it should be fairly painless. What actually matters is defining the Internet Network Endpoint Group with the destination DNS entry of our marketing site. Something like:
gcloud --project example-project compute network-endpoint-groups update ${google_compute_global_network_endpoint_group.routing_marketing.name} --add-endpoint="fqdn=<dns entry>,port=443" --global
Setting this value tells the load balancer where to actually ship traffic.
The App
Okay. That’s great and all…but what about actually routing a React app to a subpath? There are two parts here:
- Configuring the React app to compile using a subpath
  - There are many potential ways to do this. This implementation was the most straightforward for our use case.
- Configuring the actual server process
  - This is where pre-rendering actually comes in. More on that below.
  - We landed on using nginx, but other servers could also do this successfully (e.g. Apache, Traefik, etc.)
React configs
React can set a subpath route in a number of ways. We landed on the following configs:
Static configs
For the actual static site, we needed to ensure favicon(s)/etc. would load:
<link rel="manifest" href="%PUBLIC_URL%/manifest.json">
<link rel="shortcut icon" href="%PUBLIC_URL%/favicon.ico">
Router Configs
For the actual react router, we had to configure PUBLIC_URL:
import React from 'react';
import ReactDOM from 'react-dom';
import { HelmetProvider } from 'react-helmet-async';
import { Provider } from 'react-redux';
import { BrowserRouter as Router, Route } from 'react-router-dom';
import './styles/scss/index.scss';
import { QueryParamProvider } from 'use-query-params';
import App from './App';
import { unregister } from './registerServiceWorker';
import store from './store';
const ROOT_URL = process.env.PUBLIC_URL || '/';
ReactDOM.render(
<HelmetProvider>
<Provider store={store}>
<Router basename={ROOT_URL}>
<QueryParamProvider ReactRouterRoute={Route}>
<Route component={App} path="/" />
</QueryParamProvider>
</Router>
</Provider>
</HelmetProvider>,
document.getElementById('root')
);
// Unregister the service worker
unregister();
This is a rough sketch of the React configuration. Some additional work was required to fix hard-coded static asset paths, but the TL;DR of that story is essentially grep /static/ src/* -RI and manual updates.
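For anyone wanting to automate that sweep, here is a rough Python equivalent of the grep (find_hardcoded_static is a hypothetical helper for illustration, not something from our codebase):

```python
import pathlib

def find_hardcoded_static(src_root: str):
    """Rough equivalent of `grep /static/ src/* -RI`: flag lines that
    reference /static/ without going through PUBLIC_URL."""
    hits = []
    for path in sorted(pathlib.Path(src_root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary files, like grep -I does
        for lineno, line in enumerate(text.splitlines(), 1):
            if "/static/" in line and "PUBLIC_URL" not in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```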
Nginx configs
So a quick diversion around pre-rendering content: there are many options here (Cloudflare and other CDN friends spring to mind), but we landed on leveraging Prerender.io. I’m going to hand-wave away some of the decision details, but suffice it to say, we are leveraging an external system for caching page renders, and we need nginx to be smart enough to route requests from specific user agents to that service…
Also of note is the heavy use of maps in this nginx config. Maps are somewhat akin to dictionaries/hash tables in general-purpose languages: we set key/value pairs (even with regexes!) that define what a value should be converted to. A map’s resulting value can also be another map’s variable, so you’ll see a pattern of maps chaining into one another. This is the closest thing to actual if/then logic in nginx without breaking the general rule of “don’t use if statements”.
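To make that chain concrete before reading the config itself, here is roughly how the maps compose, sketched in Python (with heavily trimmed bot and file-extension lists; an illustration, not a transcription):

```python
import re

# heavily trimmed bot and static-asset lists, for illustration only
BOT_UA = re.compile(r"googlebot|bingbot|twitterbot|slackbot|discordbot", re.I)
STATIC_EXT = re.compile(r"\.(css|js|png|jpg|jpeg|gif|ico|svg|woff2?)", re.I)

def should_prerender(user_agent, args, x_prerender_header, token, uri):
    # map $http_user_agent $prerender_ua: bots get 1, Prerender's own fetches get 0
    if "prerender" in user_agent.lower():
        ua = 0
    else:
        ua = 1 if BOT_UA.search(user_agent) else 0
    # map $args $prerender_args: an _escaped_fragment_ arg forces prerendering
    args_val = 1 if re.search(r"(^|&)_escaped_fragment_=", args) else ua
    # map $http_x_prerender $x_prerender: never prerender a prerender request (loops!)
    xp = 0 if x_prerender_header == "1" else args_val
    # map "${PRERENDER_TOKEN}" $bad_prerender_token: no token, no prerendering
    tok = 0 if token == "NONE" else xp
    # map $uri $prerender: static assets are always served directly
    return 0 if STATIC_EXT.search(uri) else tok
```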
user nginx;
worker_processes auto;
error_log stderr notice;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" (prerender: $prerender)';
log_format prerender_debug '[$time_local] $remote_addr - $remote_user - $server_name user-agent: $http_user_agent prerender: $prerender: $request to: $upstream_addr upstream_response_time: $upstream_response_time msec $msec request_time $request_time';
log_not_found on;
log_subrequest on;
# some basic caching of local files
open_file_cache max=200 inactive=180s;
open_file_cache_valid 120s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
access_log /dev/stdout main;
sendfile on;
keepalive_timeout 65;
map $http_user_agent $prerender_ua {
default 0;
"~*Prerender" 0;
"~*googlebot" 1;
"~*yahoo!\ slurp" 1;
"~*bingbot" 1;
"~*yandex" 1;
"~*baiduspider" 1;
"~*facebookexternalhit" 1;
"~*twitterbot" 1;
"~*rogerbot" 1;
"~*linkedinbot" 1;
"~*embedly" 1;
"~*quora\ link\ preview" 1;
"~*showyoubot" 1;
"~*outbrain" 1;
"~*pinterest\/0\." 1;
"~*developers.google.com\/\+\/web\/snippet" 1;
"~*slackbot" 1;
"~*vkshare" 1;
"~*w3c_validator" 1;
"~*redditbot" 1;
"~*applebot" 1;
"~*whatsapp" 1;
"~*flipboard" 1;
"~*tumblr" 1;
"~*bitlybot" 1;
"~*skypeuripreview" 1;
"~*nuzzel" 1;
"~*discordbot" 1;
"~*google\ page\ speed" 1;
"~*qwantify" 1;
"~*pinterestbot" 1;
"~*bitrix\ link\ preview" 1;
"~*xing-contenttabreceiver" 1;
"~*chrome-lighthouse" 1;
"~*telegrambot" 1;
"~*google-inspectiontool" 1;
"~*petalbot" 1;
}
map $args $prerender_args {
default $prerender_ua;
"~(^|&)_escaped_fragment_=" 1;
}
map $http_x_prerender $x_prerender {
default $prerender_args;
"1" 0;
}
map "${PRERENDER_TOKEN}" $bad_prerender_token {
default $x_prerender;
"NONE" 0;
}
map $uri $prerender {
default $bad_prerender_token;
"~*\.(css|js|xml|less|png|jpg|jpeg|gif|pdf|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|woff2|svg|eot)" 0;
}
server {
listen *:80 default_server;
# disable until we are more confident everyone using the container will have IPv6 enabled
# listen [::]:80 default_server;
# suggested headers for good security posture from https://securityheaders.com/
set $scriptCSP "'self' <additional domains>";
set $fontCSP "'self' <additional domains>";
set $styleCSP "'self' 'unsafe-inline' <additional domains>";
set $styleelemCSP "'self' 'unsafe-inline' <additional domains>";
set $imgCSP "'self' blob: data: <additional domains>";
set $frameCSP "'self' <additional domains>";
set $connectCSP "'self' <additional domains>";
set $defaultCSP "'self' <additional domains>";
add_header Content-Security-Policy "script-src $scriptCSP; style-src $styleCSP; style-src-elem $styleelemCSP; font-src $fontCSP; frame-ancestors $frameCSP; img-src $imgCSP; connect-src $connectCSP; default-src $defaultCSP";
add_header Referrer-Policy "strict-origin-when-cross-origin";
add_header Access-Control-Allow-Origin "*";
add_header Permissions-Policy "geolocation=(self), microphone=(), camera=(), fullscreen=(self), payment=(), usb=(), autoplay=(), display-capture=()";
add_header X-Content-Type-Options "nosniff";
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
# our root path should never change, so set it outside any location blocks
root /usr/share/nginx/html;
# handle a few "passed through" locations so we can easily deploy management files
location ~ "^/(sitemap.xml|robots.txt)$" {
try_files $uri =404;
}
# we use two location blocks here to allow for better routing
# location = PUBLIC_URL block *only* routes traffic that hits the root path exactly
# location ~ ^PUBLIC_URL(.*) block routes any internal paths to their correct potential path in the filesystem
# e.g. localhost/app/static/img/logo.png -> /usr/share/nginx/html/static/img/logo.png
location = ${PUBLIC_URL} {
try_files /index.html =404;
}
location ~* ^${PUBLIC_URL}(.*) {
# if we don't do this, we run into the "regex in regex" problem where nginx will somehow consume our needed capture group
set $path $1;
if ($prerender = 1) {
rewrite .* /prerenderio last;
}
# need /$path (vs just $path) for the case where PUBLIC_URL == /
try_files /$path /index.html =404;
}
# code pulled from https://github.com/prerender/prerender-nginx/blob/master/nginx.conf
location = /prerenderio {
if ($prerender = 0) {
return 404;
}
proxy_set_header X-Prerender-Token ${PRERENDER_TOKEN};
proxy_hide_header Cache-Control;
add_header Cache-Control "private,max-age=600,must-revalidate";
# add explicit resolver to force DNS resolution and prevent caching of IPs
resolver 208.67.222.222 208.67.220.220;
# setting the host as a variable helps prevent IP caching
set $prerender_host "service.prerender.io";
proxy_pass https://$prerender_host;
# hard-code TLS endpoint to avoid post-LB routing not being encrypted
# also remove / from between $host and $request_uri to avoid duplicate /
rewrite .* /https://$host$request_uri? break;
}
gzip on;
gzip_vary on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
gzip_disable "MSIE [1-6]\.";
}
}
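One subtle piece worth calling out from the /prerenderio block: the rewrite stuffs the full original URL, scheme included, into the upstream path. Sketched in Python, the proxied request ends up looking like this (prerender_upstream is a hypothetical helper illustrating the rewrite, not part of any config):

```python
def prerender_upstream(host: str, request_uri: str) -> str:
    """The effective upstream URL after `rewrite .* /https://$host$request_uri? break;`
    combined with `proxy_pass https://$prerender_host`."""
    # no slash between host and request_uri ($request_uri already starts with /),
    # and the trailing `?` in the rewrite stops nginx re-appending the query string
    return f"https://service.prerender.io/https://{host}{request_uri}"
```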
Building
So we have all the parts, but how do we actually build an nginx container with the static site? We’re using Docker, right? I sure hope so…
FROM node:18.16.1 AS web-builder
ARG API_URL=http://example.com
ARG APPLICATION_URL=http://example.com
ARG ENV=testing
ARG PUBLIC_URL=/
RUN echo "Building against $ENV :: $API_URL :: $APPLICATION_URL => Subpath: $PUBLIC_URL"
ARG SEGMENT_KEY=testing
ARG PRERENDER_TOKEN=NONE
ENV REACT_APP_API_URL=$API_URL
ENV REACT_APP_ENV=$ENV
ENV REACT_APP_SEGMENT_KEY=$SEGMENT_KEY
ENV APPLICATION_URL=$APPLICATION_URL
ENV API_URL=$API_URL
ENV PUBLIC_URL=$PUBLIC_URL
ENV CI=false
RUN apt-get update -qq \
&& DEBIAN_FRONTEND=noninteractive apt-get install -qq -y --no-install-recommends gettext-base ca-certificates \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
###
# < copy our actual code in here, including any yarn commands to install dependencies
###
RUN npm install -g serve
RUN PORT=80 yarn workspace web build
COPY packages/web/deploy/nginx.conf.template /nginx.conf.template
RUN envsubst '$PUBLIC_URL $PRERENDER_TOKEN' < /nginx.conf.template > /nginx.conf
###
# any additional template files can be processed here (robots.txt, sitemaps, etc.)
###
FROM nginx:alpine AS web-deploy
COPY --from=web-builder /code/build/ /usr/share/nginx/html/
COPY --from=web-builder /nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
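One detail worth underscoring: envsubst is told exactly which variables to substitute, because the nginx config is full of its own $variables that must survive templating. A rough Python model of that restricted substitution (not the real envsubst, just an illustration of the behavior):

```python
import re

def envsubst(template: str, variables: dict, allowed: set) -> str:
    """Mimic `envsubst '$PUBLIC_URL $PRERENDER_TOKEN'`: replace only the
    listed variables and leave every other $var (nginx's own!) untouched."""
    def repl(match):
        name = match.group(1) or match.group(2)
        if name in allowed:
            # unset-but-allowed variables become empty, as with envsubst
            return variables.get(name, "")
        return match.group(0)  # not in the allow-list: leave it alone
    return re.sub(r"\$\{(\w+)\}|\$(\w+)", repl, template)
```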
Conclusion
That’s a lot. And it was a lot of work to get sorted out. The main goal of this post is to make it all available so someone else might find it in the future and not have to spend as long digging through documentation.