A step-by-step guide to installing Firecrawl with all web traffic routed through a VPN tunnel using Gluetun, along with a self-hosted SearXNG search engine.
Create the base directory and all required subdirectories:
# Create base directory (adjust path to your preference)
mkdir -p /path/to/dockers/arrs
cd /path/to/dockers/arrs
# Create data directories
mkdir -p Downloads
mkdir -p firecrawl
mkdir -p firecrawl-redis
mkdir -p firecrawl-postgres
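mkdir -p searxng-tunneled
mkdir -p auth          # Gluetun control-server auth config (config.toml)
touch forwarded_port   # Gluetun writes its forwarded port to this file; create it so Docker does not bind-mount a directory here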
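If you prefer to script the host setup, the same layout can be created from Python. This is a minimal sketch: it takes the base directory as an argument, and it also creates the auth directory and an empty forwarded_port file that the Gluetun volumes bind-mount later. Pre-creating that file is an assumption about your layout, made so Docker does not create a directory at that path.

```python
from pathlib import Path

# Data directories from the mkdir commands above, plus "auth" for Gluetun's
# config.toml (an addition in this sketch).
DATA_DIRS = [
    "Downloads",
    "firecrawl",
    "firecrawl-redis",
    "firecrawl-postgres",
    "searxng-tunneled",
    "auth",
]


def create_layout(base: Path) -> list[Path]:
    """Create the data directories under `base` (idempotent) and return them."""
    created = []
    for name in DATA_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Gluetun's volume mounts ./forwarded_port as a single file. Creating it up
    # front stops Docker from creating a directory at that path instead.
    (base / "forwarded_port").touch(exist_ok=True)
    return created
```

For example, `create_layout(Path("/path/to/dockers/arrs"))` is safe to re-run; existing directories and the existing file are left in place.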
Firecrawl needs to be built from source. Clone the repository one level above your docker-compose directory:
# From the parent directory of your arrs folder
cd /path/to/dockers
git clone https://github.com/mendableai/firecrawl.git firecrawl-src
The docker-compose file references ../firecrawl-src/apps/ for building the API, Playwright service, and NuQ PostgreSQL images.
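A quick sanity check after cloning: each build context referenced by the compose file should contain a Dockerfile (the compose entries use `dockerfile: Dockerfile`). The sketch below, built around a hypothetical helper named `missing_build_contexts`, resolves the paths relative to the directory that holds docker-compose.yaml.

```python
from pathlib import Path

# Build contexts used by the compose file, relative to the directory that
# holds docker-compose.yaml (for example /path/to/dockers/arrs).
BUILD_CONTEXTS = [
    "../firecrawl-src/apps/api",
    "../firecrawl-src/apps/playwright-service-ts",
    "../firecrawl-src/apps/nuq-postgres",
]


def missing_build_contexts(compose_dir: Path) -> list[str]:
    """Return the build contexts that have no Dockerfile."""
    return [
        rel for rel in BUILD_CONTEXTS
        if not (compose_dir / rel / "Dockerfile").is_file()
    ]
```

Run `missing_build_contexts(Path("/path/to/dockers/arrs"))`; an empty list means all three images have something to build from.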
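Obtain your WireGuard credentials from your VPN provider: the private key (WIREGUARD_PRIVATE_KEY) and the interface address (WIREGUARD_ADDRESS) used in the configuration below.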
Add the following service to your docker-compose.yaml:
gluetun:
  image: qmcgaw/gluetun
  container_name: gluetun
  cap_add:
    - NET_ADMIN
  devices:
    - /dev/net/tun:/dev/net/tun
  ports:
    # With the recommended pattern (services on the arrs network using the proxy),
    # Firecrawl (3002) and SearXNG (8001) publish their own ports. Only publish
    # ports here for services that use network_mode: "service:gluetun".
    - 8888:8888/tcp # HTTP proxy (optional, for external clients)
    - 8002:8000/tcp # Gluetun HTTP control server (optional)
  volumes:
    - /path/to/dockers/arrs:/gluetun
    - /path/to/dockers/arrs/forwarded_port:/tmp/gluetun/forwarded_port
    - /path/to/dockers/arrs/auth:/gluetun/auth # Auth config for the control server
  environment:
    - FIREWALL_OUTBOUND_SUBNETS=<YOUR_DOCKER_NETWORK_SUBNET>
    - VPN_SERVICE_PROVIDER=<your_vpn_provider>
    - VPN_TYPE=wireguard
    - WIREGUARD_PRIVATE_KEY=<your_wireguard_private_key>
    - WIREGUARD_ADDRESS=<your_wireguard_ip>/32
    - VPN_PORT_FORWARDING=on
    - HTTPPROXY=on # Enable built-in HTTP proxy on port 8888
    - HTTPPROXY_LOG=on # Optional: log proxy requests
    - HTTPPROXY_USER=${PROXY_USERNAME:-} # Optional proxy credentials (empty = no auth)
    - HTTPPROXY_PASSWORD=${PROXY_PASSWORD:-}
    - TZ=America/Chicago
    - UPDATER_PERIOD=24h
  networks:
    - arrs
  restart: unless-stopped
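Gluetun can run an HTTP proxy on port 8888. In this setup Firecrawl and Playwright send their scraping traffic through it, and it can also route external clients or tools through the VPN.
All routes of the HTTP control server (port 8000, published here as 8002) require authentication, configured via a TOML file mounted at /gluetun/auth/config.toml. The HTTP proxy uses its own credentials, set with HTTPPROXY_USER and HTTPPROXY_PASSWORD; when those are empty, the proxy accepts unauthenticated connections.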
Example auth/config.toml:
[[roles]]
name = "proxy"
routes = [
"GET /v1/publicip/ip",
"GET /v1/openvpn/status",
"PUT /v1/openvpn/status"
]
auth = "basic"
username = "your_username"
password = "your_password"
To test the HTTP proxy from another machine:
curl -v -x "http://your_username:your_password@
If the response shows a different IP (your VPN IP), the proxy is working.
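The same check can be scripted. This sketch builds the proxy URL (percent-encoding the credentials, which matters if the password contains `@` or `:`) and fetches your public IP through it with `urllib`. The IP-echo service `https://api.ipify.org` is an assumption; any service that returns your IP works, and credentials only matter if the Gluetun proxy has HTTPPROXY_USER/HTTPPROXY_PASSWORD set.

```python
import json
import urllib.request
from urllib.parse import quote


def proxy_url(host: str, port: int = 8888, user: str = "", password: str = "") -> str:
    """Build an http:// proxy URL, percent-encoding any credentials."""
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    return f"http://{creds}{host}:{port}"


def public_ip_via_proxy(proxy: str, echo_url: str = "https://api.ipify.org?format=json") -> str:
    """Return the public IP that `echo_url` sees when reached through `proxy`."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(echo_url, timeout=15) as resp:
        # api.ipify.org with format=json answers {"ip": "..."}
        return json.loads(resp.read())["ip"]
```

For example, `public_ip_via_proxy(proxy_url("<your_server_ip>", 8888))` should return the VPN exit IP rather than your server's own address.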
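FIREWALL_OUTBOUND_SUBNETS is the most important setting and the #1 cause of connection issues.
The FIREWALL_OUTBOUND_SUBNETS environment variable tells Gluetun's firewall which subnets may be reached directly, bypassing the VPN tunnel. Redis, PostgreSQL, and RabbitMQ run on the Docker bridge network rather than behind the VPN, so their subnet must be listed; otherwise Gluetun's firewall blocks traffic to them from Gluetun itself and from any container that shares its network stack (network_mode: "service:gluetun").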
To find your Docker network subnet:
# After docker-compose creates the network, run:
docker network inspect <network_name> --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
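For a bridge network named arrs, Docker typically assigns a subnet such as 172.18.0.0/16. Compose prefixes the network name with the project name (by default the directory name, so arrs_arrs here), and the subnet can differ between hosts, so use the subnet Docker actually assigned rather than the example value.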
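If you are scripting the setup, the subnet can also be read from the full `docker network inspect` JSON instead of a `--format` template. A minimal sketch; the network name `arrs_arrs` in the usage note assumes the Compose project is named after the arrs directory.

```python
import json
import subprocess


def subnets_from_inspect(inspect_json: str) -> list[str]:
    """Extract IPAM subnets from `docker network inspect <name>` output."""
    subnets = []
    for network in json.loads(inspect_json):
        # IPAM.Config may be missing or null on some networks.
        for cfg in (network.get("IPAM") or {}).get("Config") or []:
            if cfg.get("Subnet"):
                subnets.append(cfg["Subnet"])
    return subnets


def network_subnets(name: str) -> list[str]:
    """Run `docker network inspect` and return the network's subnets."""
    out = subprocess.run(
        ["docker", "network", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return subnets_from_inspect(out)
```

`",".join(network_subnets("arrs_arrs"))` gives a value in the comma-separated form that FIREWALL_OUTBOUND_SUBNETS accepts. If IPv6 is enabled on the network, an IPv6 subnet is listed as well.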
Set this value in the Gluetun environment:
FIREWALL_OUTBOUND_SUBNETS=172.XX.0.0/16
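To confirm that the value actually covers the internal services, compare each container's IP with the configured subnets. This sketch uses Python's `ipaddress` module; the service IPs would come from the `docker inspect` commands used after the stack is started, and the helper name `uncovered` is hypothetical.

```python
import ipaddress


def uncovered(ips: dict[str, str], outbound_subnets: str) -> list[str]:
    """Return the names of services whose IP lies outside every allowed subnet.

    `outbound_subnets` is the comma-separated FIREWALL_OUTBOUND_SUBNETS value.
    """
    nets = [
        ipaddress.ip_network(s.strip(), strict=False)
        for s in outbound_subnets.split(",")
        if s.strip()
    ]
    return [
        name for name, ip in ips.items()
        if not any(ipaddress.ip_address(ip) in net for net in nets)
    ]
```

For example, `uncovered({"redis": "172.18.0.3"}, "172.18.0.0/16")` returns an empty list, while a service whose IP sits on another subnet is reported by name.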
Gluetun must be connected to the same Docker network (arrs) as Redis, PostgreSQL, and RabbitMQ. Without this, services using network_mode: "service:gluetun" cannot reach the internal services, and services on the arrs network cannot reach the proxy at gluetun:8888.
networks:
  - arrs
At the bottom of your docker-compose.yaml, define the network:
networks:
  arrs:
    driver: bridge
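Recommended pattern: attach every service to the arrs network (networks: - arrs) and send web traffic through Gluetun's HTTP proxy, rather than using network_mode: "service:gluetun".
Create a .env file in your project directory: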
# ============================================================
# Firecrawl Self-Host Configuration
# ============================================================
# ---- Required ----
FIRECRAWL_PORT=3002
FIRECRAWL_USE_DB_AUTH=false
# ---- PostgreSQL ----
FIRECRAWL_POSTGRES_USER=firecrawl
FIRECRAWL_POSTGRES_PASSWORD=<CHANGE_ME_STRONG_PASSWORD>
FIRECRAWL_POSTGRES_DB=firecrawl
# ---- Redis ----
# Use Docker service name (no hardcoded IP)
FIRECRAWL_REDIS_URL=redis://redis:6379
# ---- Playwright ----
FIRECRAWL_PLAYWRIGHT_URL=http://playwright-service:3000/scrape
# ---- AI / LLM (OpenAI-compatible endpoint) ----
FIRECRAWL_OPENAI_BASE_URL=<your_llm_endpoint>
FIRECRAWL_OPENAI_API_KEY=<your_api_key>
FIRECRAWL_MODEL_NAME=<your_model_name>
FIRECRAWL_MODEL_EMBEDDING_NAME=
# ---- /search API (SearXNG) ----
FIRECRAWL_SEARXNG_ENDPOINT=http://<your_server_ip>:8001
# ---- Queue Admin UI ----
# Access at: http://<host>:3002/admin/<FIRECRAWL_BULL_AUTH_KEY>/queues
FIRECRAWL_BULL_AUTH_KEY=<CHANGE_ME_STRONG_KEY>
FIRECRAWL_TEST_API_KEY=
# ---- Worker / Concurrency (adjust for your hardware) ----
FIRECRAWL_NUM_WORKERS=16
FIRECRAWL_CONCURRENT_REQUESTS=20
FIRECRAWL_MAX_JOBS=10
FIRECRAWL_BROWSER_POOL=10
# ---- System Resource Thresholds ----
FIRECRAWL_MAX_CPU=0.85
FIRECRAWL_MAX_RAM=0.90
# ---- Logging ----
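FIRECRAWL_LOGGING_LEVEL=info
# ---- Gluetun HTTP proxy credentials (optional) ----
# Must match HTTPPROXY_USER / HTTPPROXY_PASSWORD on the gluetun service; leave empty if the proxy has no auth
PROXY_USERNAME=
PROXY_PASSWORD=
# ---- SearXNG container ----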
SEARXNG_VERSION=latest
SEARXNG_HOST=[::]
PUID=1000
PGID=1001
UMASK=002
TZ=America/Chicago
INSTANCE_NAME='Tunneled SearXNG'
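Before starting the stack, it can help to lint .env for empty required values and leftover `<...>` placeholders. This is a simplified parser (plain KEY=VALUE lines, `#` comments, optional surrounding quotes), not a full implementation of Compose's .env rules. The required list mirrors the Required column of the variable reference table, minus NUQ_RABBITMQ_URL, which is set directly in docker-compose.yaml.

```python
# Variables marked "Required" in the reference table that live in .env.
REQUIRED = [
    "FIRECRAWL_PORT",
    "FIRECRAWL_POSTGRES_USER",
    "FIRECRAWL_POSTGRES_PASSWORD",
    "FIRECRAWL_POSTGRES_DB",
    "FIRECRAWL_REDIS_URL",
    "FIRECRAWL_PLAYWRIGHT_URL",
    "FIRECRAWL_BULL_AUTH_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes, e.g. INSTANCE_NAME='Tunneled SearXNG'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def env_problems(env: dict[str, str]) -> list[str]:
    """List required variables that are empty and values still holding <...>."""
    problems = [f"{k} is missing or empty" for k in REQUIRED if not env.get(k)]
    problems += [
        f"{k} still contains a placeholder"
        for k, v in env.items()
        if "<" in v and ">" in v
    ]
    return problems
```

`env_problems(parse_env(Path(".env").read_text()))` lists what still needs attention before `docker compose up`.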
Create docker-compose.yaml in the same directory. The complete file:
services:
  # ---------------------------------------------------------
  # Gluetun VPN Tunnel
  # ---------------------------------------------------------
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      # Firecrawl (3002) and SearXNG (8001) publish their own ports below;
      # listing them here as well would make the host ports conflict.
      - 8888:8888/tcp # HTTP proxy (optional)
      - 8002:8000/tcp # Gluetun HTTP control server (optional)
    volumes:
      - ./forwarded_port:/tmp/gluetun/forwarded_port
      - ./auth:/gluetun/auth # Auth config for the control server
    environment:
      - FIREWALL_OUTBOUND_SUBNETS=<YOUR_DOCKER_NETWORK_SUBNET>
      - VPN_SERVICE_PROVIDER=<provider>
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=<your_key>
      - WIREGUARD_ADDRESS=<your_ip>/32
      - VPN_PORT_FORWARDING=on
      - HTTPPROXY=on
      - HTTPPROXY_LOG=on
      - HTTPPROXY_USER=${PROXY_USERNAME:-} # Optional proxy credentials (empty = no auth)
      - HTTPPROXY_PASSWORD=${PROXY_PASSWORD:-}
      - TZ=America/Chicago
      - UPDATER_PERIOD=24h
    networks:
      - arrs
    restart: unless-stopped
  # ---------------------------------------------------------
  # SearXNG (on arrs network, web traffic via Gluetun proxy)
  # ---------------------------------------------------------
  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:${SEARXNG_VERSION:-latest}
    networks:
      - arrs
    ports:
      - "8001:8080"
    env_file: ./.env
    volumes:
      # settings.yml lives here. SearXNG only uses the VPN if settings.yml sets
      # outgoing.proxies with an all:// entry pointing at http://gluetun:8888;
      # otherwise its outbound requests leave directly from the host.
      - ./searxng-tunneled/:/etc/searxng/:Z
      - ./searxng-tunneled/core-data:/var/cache/searxng/
    restart: unless-stopped
    depends_on:
      - gluetun
  # ---------------------------------------------------------
  # Firecrawl Playwright Service (on arrs network, web traffic via Gluetun proxy)
  # ---------------------------------------------------------
  playwright-service:
    build:
      context: ../firecrawl-src/apps/playwright-service-ts
      dockerfile: Dockerfile
    shm_size: "2g"
    networks:
      - arrs
    depends_on:
      - gluetun
    environment:
      PORT: "3000"
      # Route browser traffic through Gluetun HTTP proxy
      PROXY_SERVER: "http://gluetun:8888"
      PROXY_USERNAME: ${PROXY_USERNAME:-}
      PROXY_PASSWORD: ${PROXY_PASSWORD:-}
      BLOCK_MEDIA: ${BLOCK_MEDIA:-}
      NO_PROXY: "localhost,127.0.0.1,redis,nuq-postgres,playwright-service,host.docker.internal"
      # Same .env variable that firecrawl-api uses for CRAWL_CONCURRENT_REQUESTS
      MAX_CONCURRENT_PAGES: ${FIRECRAWL_CONCURRENT_REQUESTS:-20}
    cpus: 8.0
    mem_limit: 16G
    memswap_limit: 16G
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"
  # ---------------------------------------------------------
  # Firecrawl API (on arrs network, web traffic via Gluetun proxy)
  # ---------------------------------------------------------
  firecrawl-api:
    build:
      context: ../firecrawl-src/apps/api
      dockerfile: Dockerfile
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    networks:
      - arrs
    ports:
      - "3002:3002"
    depends_on:
      gluetun:
        condition: service_started
      redis:
        condition: service_started
      rabbitmq:
        condition: service_started
      playwright-service:
        condition: service_started
      nuq-postgres:
        condition: service_healthy
    environment:
      HOST: "0.0.0.0"
      PORT: "3002"
      WORKER_PORT: "3005"
      ENV: local
      # Use Docker service names (no hardcoded IPs)
      REDIS_URL: "redis://redis:6379"
      REDIS_RATE_LIMIT_URL: "redis://redis:6379"
      PLAYWRIGHT_MICROSERVICE_URL: "http://playwright-service:3000/scrape"
      POSTGRES_USER: ${FIRECRAWL_POSTGRES_USER:-firecrawl}
      POSTGRES_PASSWORD: ${FIRECRAWL_POSTGRES_PASSWORD}
      POSTGRES_DB: ${FIRECRAWL_POSTGRES_DB:-firecrawl}
      POSTGRES_HOST: "nuq-postgres"
      POSTGRES_PORT: "5432"
      # For local-only/self-hosted use, disable DB auth to avoid Supabase errors
      USE_DB_AUTHENTICATION: "false"
      NUM_WORKERS_PER_QUEUE: ${FIRECRAWL_NUM_WORKERS:-16}
      CRAWL_CONCURRENT_REQUESTS: ${FIRECRAWL_CONCURRENT_REQUESTS:-20}
      MAX_CONCURRENT_JOBS: ${FIRECRAWL_MAX_JOBS:-10}
      BROWSER_POOL_SIZE: ${FIRECRAWL_BROWSER_POOL:-10}
      OPENAI_BASE_URL: ${FIRECRAWL_OPENAI_BASE_URL}
      OPENAI_API_KEY: ${FIRECRAWL_OPENAI_API_KEY}
      MODEL_NAME: ${FIRECRAWL_MODEL_NAME}
      MODEL_EMBEDDING_NAME: ${FIRECRAWL_MODEL_EMBEDDING_NAME}
      BULL_AUTH_KEY: ${FIRECRAWL_BULL_AUTH_KEY}
      TEST_API_KEY: ${FIRECRAWL_TEST_API_KEY}
      LOGGING_LEVEL: ${FIRECRAWL_LOGGING_LEVEL:-info}
      # Route scraping traffic through Gluetun VPN via HTTP proxy
      PROXY_SERVER: "http://gluetun:8888"
      PROXY_USERNAME: ${PROXY_USERNAME:-}
      PROXY_PASSWORD: ${PROXY_PASSWORD:-}
      NO_PROXY: "localhost,127.0.0.1,redis,nuq-postgres,playwright-service,host.docker.internal"
      SEARXNG_ENDPOINT: ${FIRECRAWL_SEARXNG_ENDPOINT}
      SEARXNG_ENGINES: ${FIRECRAWL_SEARXNG_ENGINES:-}
      SEARXNG_CATEGORIES: ${FIRECRAWL_SEARXNG_CATEGORIES:-}
      MAX_CPU: ${FIRECRAWL_MAX_CPU:-0.85}
      MAX_RAM: ${FIRECRAWL_MAX_RAM:-0.90}
      NUQ_RABBITMQ_URL: "amqp://rabbitmq:5672"
    volumes:
      - ./firecrawl:/app/data
    cpus: 16.0
    mem_limit: 32G
    memswap_limit: 32G
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"
  # ---------------------------------------------------------
  # Redis (internal, NO VPN)
  # ---------------------------------------------------------
  redis:
    image: redis:7-alpine
    container_name: firecrawl-redis
    # Use noeviction so jobs/queues are not silently dropped when memory is full
    command: redis-server --bind 0.0.0.0 --maxmemory 8gb --maxmemory-policy noeviction
    volumes:
      - ./firecrawl-redis:/data
    networks:
      - arrs
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
        compress: "true"
  # ---------------------------------------------------------
  # PostgreSQL / NuQ (internal, NO VPN)
  # ---------------------------------------------------------
  nuq-postgres:
    build:
      context: ../firecrawl-src/apps/nuq-postgres
      dockerfile: Dockerfile
    container_name: firecrawl-nuq-postgres
    environment:
      POSTGRES_USER: ${FIRECRAWL_POSTGRES_USER:-firecrawl}
      POSTGRES_PASSWORD: ${FIRECRAWL_POSTGRES_PASSWORD}
      POSTGRES_DB: ${FIRECRAWL_POSTGRES_DB:-firecrawl}
    volumes:
      - ./firecrawl-postgres:/var/lib/postgresql/data
    networks:
      - arrs
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${FIRECRAWL_POSTGRES_USER:-firecrawl} -d ${FIRECRAWL_POSTGRES_DB:-firecrawl}"]
      start_period: 30s
      interval: 10s
      timeout: 5s
      retries: 10
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"
  # ---------------------------------------------------------
  # RabbitMQ for Firecrawl NuQ (internal, NO VPN)
  # ---------------------------------------------------------
  rabbitmq:
    image: rabbitmq:3-management-alpine
    container_name: firecrawl-rabbitmq
    networks:
      - arrs
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
        compress: "true"
networks:
  arrs:
    driver: bridge
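Two services cannot publish the same host port; for example, if 3002 were listed under both gluetun and firecrawl-api, `docker compose up` would fail with a "port is already allocated" error. This sketch looks for such duplicates in the normalized output of `docker compose config --format json` (assuming Compose v2). It ignores host-IP bindings and does not expand port ranges, so treat its results as hints.

```python
import json
import subprocess
from collections import defaultdict


def published_ports(service: dict) -> list[tuple[str, str]]:
    """Return (host_port, protocol) pairs that a service publishes."""
    result = []
    for entry in service.get("ports", []):
        if isinstance(entry, dict):  # long syntax, as emitted by `docker compose config`
            if entry.get("published"):
                result.append((str(entry["published"]), entry.get("protocol") or "tcp"))
            continue
        # Short syntax: [IP:]HOST:CONTAINER[/PROTOCOL]; a bare CONTAINER port is skipped.
        spec, _, proto = str(entry).partition("/")
        parts = spec.split(":")
        if len(parts) >= 2:
            result.append((parts[-2], proto or "tcp"))
    return result


def port_conflicts(compose: dict) -> dict[tuple[str, str], list[str]]:
    """Map each host port published by more than one service to those services."""
    owners = defaultdict(list)
    for name, service in compose.get("services", {}).items():
        for key in published_ports(service):
            owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}


def compose_port_conflicts() -> dict[tuple[str, str], list[str]]:
    """Run `docker compose config --format json` in the current directory and check it."""
    out = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return port_conflicts(json.loads(out))
```

Call `compose_port_conflicts()` from /path/to/dockers/arrs; an empty dict means no host port is claimed twice.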
Start the stack:
cd /path/to/dockers/arrs
docker compose up -d
After the services start, look up the IP addresses Docker assigned to Redis and PostgreSQL, and the subnet of the network they share:
# Get Redis IP
docker inspect firecrawl-redis --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
# Get PostgreSQL IP
docker inspect firecrawl-nuq-postgres --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
# Get the Docker network subnet (for FIREWALL_OUTBOUND_SUBNETS)
docker network inspect <your_project>_arrs --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
The IPs are only for checking against the subnet; the configuration itself should use service names:
- FIRECRAWL_REDIS_URL should use the Redis service name (redis://redis:6379), not a hardcoded IP.
- POSTGRES_HOST in firecrawl-api should use the service name nuq-postgres, not a hardcoded IP.
- FIREWALL_OUTBOUND_SUBNETS in gluetun should use the network subnet.
If your PostgreSQL volume already has data from a previous installation, the NuQ schema initialization script (nuq.sql) will NOT run automatically: the PostgreSQL image only runs initialization scripts when the data directory is empty. Apply it manually:
# Copy the SQL file into the container
docker cp ../firecrawl-src/apps/nuq-postgres/nuq.sql firecrawl-nuq-postgres:/tmp/nuq.sql
# Execute it against the database
docker exec firecrawl-nuq-postgres psql -U <postgres_user> -d <postgres_db> -f /tmp/nuq.sql
Expected output: CREATE SCHEMA, CREATE TABLE, and CREATE INDEX messages. Cron-related errors are non-fatal and can be ignored.
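After setting FIREWALL_OUTBOUND_SUBNETS and applying the schema, recreate the affected containers (gluetun must be recreated to pick up its new environment):
docker compose up -d gluetun firecrawl-api playwright-service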
Check that all containers are running:
docker ps --filter name=firecrawl --filter name=playwright --filter name=gluetun --filter name=searxng --filter name=rabbitmq
All should show Up status (not Restarting).
Test a scrape through the API:
curl -X POST http://<your_server_ip>:3002/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Expected response:
{
  "success": true,
  "data": {
    "markdown": "Example Domain...",
    "metadata": {
      "title": "Example Domain",
      "statusCode": 200,
      ...
    }
  }
}
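The curl test can also be scripted, for example as part of a periodic health check. This sketch sends the same POST /v1/scrape request with `urllib` and checks the fields shown in the expected response; treating `statusCode` 200 plus non-empty markdown as success is an assumption about what you consider healthy.

```python
import json
import urllib.request


def build_scrape_request(base_url: str, target: str) -> urllib.request.Request:
    """Build the same POST /v1/scrape request as the curl example."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",
        data=json.dumps({"url": target}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_ok(response: dict) -> bool:
    """True if the response reports success, has markdown, and a 200 status."""
    data = response.get("data") or {}
    meta = data.get("metadata") or {}
    return bool(response.get("success")) and bool(data.get("markdown")) and meta.get("statusCode") == 200


def scrape(base_url: str, target: str) -> dict:
    """Send the scrape request and return the decoded JSON response."""
    with urllib.request.urlopen(build_scrape_request(base_url, target), timeout=120) as resp:
        return json.loads(resp.read())
```

`scrape_ok(scrape("http://<your_server_ip>:3002", "https://example.com"))` should return True once the stack is working.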
Open http://<your_server_ip>:8001 in your browser. You should see the SearXNG search interface.
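SearXNG can also answer with JSON, which is what a programmatic client needs. In my understanding, SearXNG rejects `format=json` (403 Forbidden) unless `json` is listed under `search.formats` in settings.yml (here, in ./searxng-tunneled/), so enable it there if the request below is refused. A minimal sketch:

```python
import json
import urllib.request
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str) -> str:
    """URL for a SearXNG query that asks for JSON instead of HTML."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query, 'format': 'json'})}"


def searxng_search(base_url: str, query: str) -> list[dict]:
    """Run a query and return the entries under the response's "results" key."""
    with urllib.request.urlopen(searxng_search_url(base_url, query), timeout=30) as resp:
        return json.loads(resp.read()).get("results", [])
```

For example, `searxng_search("http://<your_server_ip>:8001", "example")` returns the result list, or raises an HTTP error if JSON output is not enabled.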
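If something fails, search the firecrawl-api logs for common errors (the container name arrs-firecrawl-api-1 assumes the Compose project is named arrs):
# Check for Redis connection errors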
docker logs arrs-firecrawl-api-1 2>&1 | grep -i "redis.*error\|ETIMEDOUT"
# Check for database errors
docker logs arrs-firecrawl-api-1 2>&1 | grep -i "postgres.*error\|relation.*does not exist"
# Check for RabbitMQ errors
docker logs arrs-firecrawl-api-1 2>&1 | grep -i "rabbitmq\|NUQ_RABBITMQ"
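To run the same checks from a script, the grep patterns above translate directly into Python regular expressions. This sketch assumes the container name arrs-firecrawl-api-1 (Compose project arrs); pass a different name to the hypothetical `scan_container` helper if yours differs.

```python
import re
import subprocess
from collections.abc import Iterable

# Same patterns as the grep -i commands above.
PATTERNS = {
    "redis": re.compile(r"redis.*error|ETIMEDOUT", re.IGNORECASE),
    "postgres": re.compile(r"postgres.*error|relation.*does not exist", re.IGNORECASE),
    "rabbitmq": re.compile(r"rabbitmq|NUQ_RABBITMQ", re.IGNORECASE),
}


def classify(lines: Iterable[str]) -> dict[str, list[str]]:
    """Group log lines by the pattern(s) they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits


def scan_container(name: str = "arrs-firecrawl-api-1") -> dict[str, list[str]]:
    """Run `docker logs` with stderr merged into stdout (like 2>&1) and classify it."""
    proc = subprocess.run(
        ["docker", "logs", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return classify(proc.stdout.splitlines())
```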
Symptom: Error: connect ETIMEDOUT in firecrawl-api logs when connecting to Redis.
Cause: Gluetun's firewall is blocking traffic to the Docker network subnet.
Fix:
1. Verify that FIREWALL_OUTBOUND_SUBNETS matches your Docker network subnet:
docker network inspect <project>_arrs --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
docker inspect gluetun --format '{{range .Config.Env}}{{println .}}{{end}}' | grep FIREWALL_OUTBOUND_SUBNETS
2. Verify that Gluetun is attached to the arrs network:
docker inspect gluetun --format '{{json .NetworkSettings.Networks}}'
The output should include the <project>_arrs network (for example arrs_arrs), not only <project>_default.
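To test reachability directly, run a small TCP check from a container attached to the same network. This sketch only opens and closes TCP connections (it does not speak the Redis, PostgreSQL, or AMQP protocols); the service names and ports are taken from docker-compose.yaml, and `can_connect`/`check_targets` are hypothetical helper names.

```python
import socket

# Internal services from docker-compose.yaml: (service name, port).
TARGETS = [("redis", 6379), ("nuq-postgres", 5432), ("rabbitmq", 5672)]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False


def check_targets() -> dict[str, bool]:
    """Check every internal service and report reachability by host:port."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in TARGETS}
```

For example, save it as check.py with `print(check_targets())` appended and run it in a throwaway container on the project network: `docker run --rm --network arrs_arrs -v "$PWD/check.py:/check.py:ro" python:3.12-alpine python /check.py` (the network name arrs_arrs is an assumption about your project name).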
Symptom: relation "nuq.queue_scrape" does not exist or similar.
Cause: The NuQ schema was not initialized (common when PostgreSQL volume has existing data).
Fix: Apply the schema manually (see Step 8.2).
Symptom: extract-worker crashes with Error: NUQ_RABBITMQ_URL is not configured.
Fix: Ensure the NUQ_RABBITMQ_URL environment variable is set in the firecrawl-api service:
NUQ_RABBITMQ_URL: amqp://rabbitmq:5672
And that the rabbitmq service is listed in depends_on.
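Symptom: Logs show Supabase-related errors.
Cause: DB/Supabase auth is enabled but not configured.
Fix (recommended for local/self-hosted): disable DB authentication by setting USE_DB_AUTHENTICATION: "false" in the firecrawl-api environment.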
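Symptom: firecrawl-api cannot connect to Redis, PostgreSQL, or RabbitMQ.
Cause: Containers were started from an older configuration and are not on the correct network.
Fix: Recreate the stack so every container is attached to the current networks:
docker compose down
docker compose up -d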
Internet
   |
   v
[Gluetun VPN Tunnel]
   |
   +-- HTTP Proxy (port 8888)
   |     |
   |     +-- firecrawl-api (networks: arrs, uses PROXY_SERVER=http://gluetun:8888)
   |     |
   |     +-- playwright-service (networks: arrs, uses PROXY_SERVER=http://gluetun:8888)
   |     |
   |     +-- searxng (networks: arrs, web traffic via same proxy)
   |
[arrs Docker Network]  <--- All services share this network
   |
   +-- redis
   +-- nuq-postgres
   +-- rabbitmq
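Key points:
- All services share the arrs Docker network.
- networks: - arrs plus an HTTP proxy is preferred over network_mode: "service:gluetun" because each service keeps its own network stack, reaches Redis, PostgreSQL, and RabbitMQ directly by service name, and publishes its own ports; only web traffic goes through the VPN.
- FIREWALL_OUTBOUND_SUBNETS must match the Docker network subnet so Gluetun allows traffic to internal services.
Resource limits configured in docker-compose.yaml:
| Service | CPU | RAM |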
|---|---|---|
| firecrawl-api | 16 cores | 32 GB |
| playwright-service | 8 cores | 16 GB |
| redis | - | 8 GB (maxmemory) |
| nuq-postgres | - | - |
| rabbitmq | - | - |
| gluetun | - | - |
| Total Minimum | ~24 cores | ~56 GB |
Adjust cpus and mem_limit values in docker-compose.yaml based on your hardware.
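The totals in the table are the sums of the per-service values: 16 + 8 = 24 CPUs and 32 + 16 + 8 = 56 GB. Note that `cpus` and `mem_limit` are caps that Docker enforces, not reservations, so these figures describe peak usage rather than what is set aside at startup. This sketch recomputes the totals and reads the host's CPU count and RAM for comparison; it uses `os.sysconf`, so it assumes a Linux host.

```python
import os

# (cpus, GB) per service from docker-compose.yaml; Redis has no CPU cap and its
# memory figure is the --maxmemory setting rather than a container limit.
LIMITS = {
    "firecrawl-api": (16.0, 32),
    "playwright-service": (8.0, 16),
    "redis": (0.0, 8),
}


def totals(limits: dict[str, tuple[float, int]]) -> tuple[float, int]:
    """Sum CPU and memory figures across services."""
    cpus = sum(c for c, _ in limits.values())
    mem = sum(m for _, m in limits.values())
    return cpus, mem


def host_resources() -> tuple[int, float]:
    """Host CPU count and total physical RAM in GB (Linux sysconf)."""
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    return os.cpu_count() or 0, ram_gb
```

Comparing `totals(LIMITS)` with `host_resources()` shows how much headroom the host has if every service hits its cap at once.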
To update Firecrawl, pull the latest source and rebuild the images:
cd /path/to/dockers/firecrawl-src
git pull
cd ../arrs
docker compose build firecrawl-api playwright-service nuq-postgres
docker compose up -d
To back up the stateful services:
# PostgreSQL
docker exec firecrawl-nuq-postgres pg_dump -U firecrawl firecrawl > backup.sql
# Redis (writes dump.rdb into ./firecrawl-redis via the /data mount)
docker exec firecrawl-redis redis-cli BGSAVE
# RabbitMQ (definitions)
docker exec firecrawl-rabbitmq rabbitmqctl export_definitions /tmp/definitions.json
docker cp firecrawl-rabbitmq:/tmp/definitions.json ./rabbitmq-definitions.json
To restore from these backups:
# PostgreSQL
docker cp backup.sql firecrawl-nuq-postgres:/tmp/backup.sql
docker exec firecrawl-nuq-postgres psql -U firecrawl -d firecrawl -f /tmp/backup.sql
| Variable | Description | Default | Required |
|---|---|---|---|
| FIRECRAWL_PORT | API port | 3002 | Yes |
| FIRECRAWL_USE_DB_AUTH | Enable Supabase-style DB auth (for local/self-hosted, set to false) | false | No |
| FIRECRAWL_POSTGRES_USER | PostgreSQL username | firecrawl | Yes |
| FIRECRAWL_POSTGRES_PASSWORD | PostgreSQL password | - | Yes |
| FIRECRAWL_POSTGRES_DB | PostgreSQL database name | firecrawl | Yes |
| FIRECRAWL_REDIS_URL | Redis connection URL | - | Yes |
| FIRECRAWL_PLAYWRIGHT_URL | Playwright service URL | http://playwright-service:3000/scrape | Yes |
| FIRECRAWL_OPENAI_BASE_URL | LLM endpoint | - | No |
| FIRECRAWL_OPENAI_API_KEY | LLM API key | - | No |
| FIRECRAWL_MODEL_NAME | Model name for AI tasks | - | No |
| FIRECRAWL_SEARXNG_ENDPOINT | SearXNG endpoint | - | No |
| FIRECRAWL_BULL_AUTH_KEY | Queue admin auth key | - | Yes |
| FIRECRAWL_NUM_WORKERS | Number of queue workers | 16 | No |
| FIRECRAWL_CONCURRENT_REQUESTS | Max concurrent crawl requests | 20 | No |
| FIRECRAWL_MAX_JOBS | Max concurrent jobs | 10 | No |
| FIRECRAWL_BROWSER_POOL | Browser pool size | 10 | No |
| FIRECRAWL_LOGGING_LEVEL | Log level | info | No |
| NUQ_RABBITMQ_URL | RabbitMQ connection URL | amqp://rabbitmq:5672 | Yes |