Firecrawl Self-Hosted Installation Guide (VPN-Tunneled Crawler)

Table of Contents

  1. Prerequisites
  2. Directory Structure
  3. Docker Network Configuration
  4. Gluetun VPN Setup
  5. .env File Template
  6. Complete docker-compose.yaml
  7. Database Schema Setup
  8. Starting Services
  9. Verification Tests
  10. Troubleshooting
  11. Maintenance
  12. Useful Commands
  13. Environment Variables Reference
  14. Architecture Diagrams

1. Prerequisites

Required Software

Software            Minimum Version       Purpose
Docker              24.0+                 Container runtime
Docker Compose      V2 (2.20+)            Multi-container orchestration
Docker Buildx       0.12+                 BuildKit image builder
Git                 2.30+                 Cloning repositories
ProtonVPN Account   Active subscription   VPN credentials for Gluetun

Install Docker Buildx (if not already installed)

# Check if Buildx is installed
docker buildx version

# If not installed, install it (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y docker-buildx-plugin

# Enable BuildKit
echo 'export DOCKER_BUILDKIT=1' >> ~/.bashrc
source ~/.bashrc
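
The minimum versions listed under Required Software can be checked with a short script. This is a minimal sketch, assuming GNU sort (for its -V version ordering) is available; version_ge and check_tool are hypothetical helper names, not part of Docker.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME INSTALLED MINIMUM: prints one status line per tool.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "OK    $1 $2 (>= $3)"
  else
    echo "FAIL  $1 $2 (< $3)"
  fi
}

# Example wiring (run on the Docker host):
#   check_tool docker  "$(docker version --format '{{.Client.Version}}')" 24.0
#   check_tool compose "$(docker compose version --short)" 2.20
#   check_tool git     "$(git --version | awk '{print $3}')" 2.30
```

Only the functions are defined when the script is sourced; the commented lines show one way to feed them the installed versions.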

Hardware Requirements

Component   Minimum                      Recommended
CPU         2 cores                      4+ cores
RAM         4 GB                         8 GB+
Disk        20 GB                        50 GB+ SSD
Network     Stable internet connection   Low-latency to VPN servers

Network Requirements

  • Outbound HTTPS (443): Required for web scraping and VPN connectivity
  • Port 3002: Firecrawl API access (customizable)
  • Port 8085: SearXNG search access (customizable)
  • Admin UI (optional): served on the Firecrawl API port (3002) under /admin/<key>/queues

2. Directory Structure

Create the base directory structure for your deployment:

# Create base directory (adjust path to your preference)
mkdir -p /opt/firecrawl-vpn && cd /opt/firecrawl-vpn

# Create data directories
mkdir -p firecrawl-postgres
mkdir -p firecrawl-redis
mkdir -p firecrawl-data

Expected Directory Layout

/opt/firecrawl-vpn/
├── .env                          # Environment variables
├── docker-compose.yaml           # Service definitions
├── settings.yml                  # SearXNG configuration
├── firecrawl-postgres/           # PostgreSQL data volume
│   └── ...
├── firecrawl-redis/              # Redis data volume
│   └── dump.rdb
└── firecrawl-data/               # Firecrawl working directory
    └── ...

3. Docker Network Configuration

Firecrawl requires a custom Docker network for inter-service communication. The network subnet must be known ahead of time to configure Gluetun's firewall correctly.

Create the Docker Network

# Create the arrs network with a specific subnet
docker network create --driver bridge --subnet 172.28.0.0/16 arrs

IMPORTANT: Note the subnet (172.28.0.0/16). You will need this exact value for Gluetun's FIREWALL_OUTBOUND_SUBNETS setting in Section 4.

Verify Network Creation

docker network inspect arrs

4. Gluetun VPN Setup

Gluetun is a Docker container that creates a VPN tunnel using WireGuard (or OpenVPN). All traffic from containers using the VPN will exit through your ProtonVPN server.

4.1 Get VPN Credentials

  1. Log in to your ProtonVPN account
  2. Navigate to Configuration > Manual Connection
  3. Select a WireGuard server (preferably one with low load)
  4. Copy the following values from the generated WireGuard configuration:
    • Private key: the PrivateKey value (used as WIREGUARD_PRIVATE_KEY)
    • Address: the Address value (used as WIREGUARD_ADDRESSES)
    • Server Name: e.g., ca-wireguard.protonvpn.net

4.2 Gluetun Environment Variables

Set these essential variables in your docker-compose.yaml:

environment:
  - VPN_SERVICE_PROVIDER=protonvpn
  - VPN_TYPE=wireguard
  - WIREGUARD_PRIVATE_KEY=<YOUR_WIREGUARD_PRIVATE_KEY>
  - WIREGUARD_ADDRESSES=<YOUR_WIREGUARD_IP>/32
  - VPN_PORT_FORWARDING=on
  - HTTPPROXY=on                            # Enable HTTP proxy on 8888
  - HTTPPROXY_LOG=on                        # Optional: log proxy requests
  - FIREWALL_OUTBOUND_SUBNETS=172.28.0.0/16 # Must match your Docker network subnet
  - TZ=America/Chicago

4.3 CRITICAL: FIREWALL_OUTBOUND_SUBNETS

The FIREWALL_OUTBOUND_SUBNETS setting tells Gluetun which networks it should allow traffic to without going through the VPN tunnel. This is essential because Docker internal services (Redis, PostgreSQL, RabbitMQ) live on a Docker bridge network that must be reachable directly.

  - FIREWALL_OUTBOUND_SUBNETS=172.28.0.0/16

Why this matters: Without this setting, Gluetun will block all traffic to your internal Docker services, causing Redis connection timeouts and PostgreSQL connection failures. The value 172.28.0.0/16 must match the subnet you created in Section 3. If you used a different subnet, update this value accordingly.

How to find your Docker network subnet:

docker network inspect arrs | grep Subnet
# Output: "Subnet": "172.28.0.0/16"
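
The comparison between the Docker network subnet and FIREWALL_OUTBOUND_SUBNETS can be automated. The sketch below is one possible approach, assuming the variable appears in docker-compose.yaml in the list form shown above; compose_firewall_subnets and subnet_in_list are hypothetical helper names.

```shell
#!/bin/sh
# compose_firewall_subnets FILE: prints the value of FIREWALL_OUTBOUND_SUBNETS
# found in a compose file (first occurrence, trailing comment stripped).
compose_firewall_subnets() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*FIREWALL_OUTBOUND_SUBNETS[=:][[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}

# subnet_in_list SUBNET LIST: succeeds when SUBNET is one of the
# comma-separated entries in LIST.
subnet_in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring (run from the deployment directory):
#   actual=$(docker network inspect arrs --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}')
#   subnet_in_list "$actual" "$(compose_firewall_subnets docker-compose.yaml)" \
#     || echo "FIREWALL_OUTBOUND_SUBNETS does not include $actual"
```

The check is a plain string comparison, so it only detects an exact mismatch; it does not reason about overlapping CIDR ranges.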

4.4 Gluetun Network Configuration

Gluetun must be connected to the arrs network so that all services can reach each other via Docker DNS.

networks:
  - arrs

Recommended pattern:

  • All services (including Gluetun) use networks: - arrs.
  • Firecrawl API, Playwright, and SearXNG route external traffic through Gluetun’s HTTP proxy (port 8888) instead of using network_mode: "service:gluetun".
  • Redis, PostgreSQL, and RabbitMQ are internal-only (no VPN, no external exposure).

5. .env File Template

Create a .env file with the following template. Replace placeholder values with your own credentials.

# ============================================
# Firecrawl Core Settings
# ============================================
FIRECRAWL_VERSION=latest
FIRECRAWL_BULL_AUTH_KEY=YOUR_ADMIN_API_KEY_HERE
FIRECRAWL_BASE_URL=http://localhost:3002

# ============================================
# Database Authentication
# For local/self-hosted use, set to false to avoid Supabase errors
# ============================================
FIRECRAWL_USE_DB_AUTH=false

# ============================================
# PostgreSQL Configuration
# ============================================
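# Variable names match the ${FIRECRAWL_POSTGRES_*} references in docker-compose.yaml
FIRECRAWL_POSTGRES_USER=firecrawl
FIRECRAWL_POSTGRES_PASSWORD=YOUR_SECURE_PASSWORD_HERE
FIRECRAWL_POSTGRES_DB=firecrawl
NUQ_POSTGRES_URL=postgresql://firecrawl:YOUR_SECURE_PASSWORD_HERE@nuq-postgres:5432/firecrawl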

# ============================================
# Redis Configuration
# ============================================
REDIS_URL=redis://redis:6379
REDIS_RATE_LIMIT_URL=redis://redis:6379

# ============================================
# RabbitMQ Configuration (REQUIRED for NuQ workers)
# ============================================
NUQ_RABBITMQ_URL=amqp://rabbitmq:5672

# ============================================
# AI / LLM Configuration (OpenAI-compatible endpoint)
# ============================================
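# Variable names match the ${FIRECRAWL_OPENAI_*} references in docker-compose.yaml
FIRECRAWL_OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE
FIRECRAWL_OPENAI_BASE_URL=https://api.openai.com/v1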

# ============================================
# Worker / Concurrency Settings
# ============================================
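# FIRECRAWL_MAX_JOBS is passed to firecrawl-api as MAX_CONCURRENT_JOBS;
# CRAWL_CONCURRENT_REQUESTS is passed to playwright-service as MAX_CONCURRENT_PAGES
FIRECRAWL_MAX_JOBS=10
CRAWL_CONCURRENT_REQUESTS=5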

# ============================================
# System Resource Thresholds
# ============================================
CPU_THRESHOLD_PERCENT=80
MEMORY_THRESHOLD_PERCENT=90
DISK_THRESHOLD_PERCENT=85

# ============================================
# Logging
# ============================================
# Passed to firecrawl-api as LOGGING_LEVEL
FIRECRAWL_LOGGING_LEVEL=info

SearXNG Environment Variables

# ============================================
# SearXNG (Self-Hosted Search Engine)
# ============================================
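SEARXNG_SECRET=YOUR_SEARXNG_SECRET_KEY_HERE
SEARXNG_IMAGE_PROXY=true
SEARXNG_PORT=8085
SEARXNG_BIND_ADDRESS=0.0.0.0
# Passed to firecrawl-api as SEARXNG_ENDPOINT (Docker service name and container port)
FIRECRAWL_SEARXNG_ENDPOINT=http://searxng:8080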

Security Note: Generate secure random values for all password and secret fields. Do not use the placeholder values shown above in production.
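
One way to follow the security note is to generate the secrets from /dev/urandom and to check that no template placeholder is left in .env. This is a minimal sketch; gen_secret and list_placeholders are hypothetical helper names, and the YOUR_..._HERE pattern is the placeholder convention used in the template above.

```shell
#!/bin/sh
# gen_secret [BYTES]: prints BYTES random bytes (default 32) as lowercase hex.
gen_secret() {
  head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# list_placeholders FILE: prints the names of variables that still contain a
# YOUR_..._HERE placeholder from the template.
list_placeholders() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=.*YOUR_[A-Z_]+_HERE' "$1" | cut -d= -f1
}

# Example:
#   echo "SEARXNG_SECRET=$(gen_secret)"
#   list_placeholders .env
```

An empty result from list_placeholders means every placeholder has been replaced.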


6. Complete docker-compose.yaml

Create a docker-compose.yaml file with all services. This configuration includes:

  • Gluetun (VPN tunnel)
  • Firecrawl API (main scraping service)
  • Playwright Service (browser-based scraping)
  • SearXNG (self-hosted search engine, VPN-tunneled)
  • Redis (session/rate limiting)
  • NuQ PostgreSQL (database for scraped data)
  • RabbitMQ (message queue for workers)

services:
  # ============================================
  # Gluetun VPN Tunnel
  # ============================================
  gluetun:
    image: qmcgaw/gluetun
    container_name: firecrawl-gluetun
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      - 8888:8888/tcp    # HTTP proxy (used by Firecrawl/Playwright)
      - 8002:8000/tcp    # Gluetun HTTP control server (optional)
      # Add other ports for tunneled services as needed
    volumes:
      - ./auth:/gluetun/auth  # Auth config for control server / proxy
    environment:
      - VPN_SERVICE_PROVIDER=protonvpn
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=<YOUR_WIREGUARD_PRIVATE_KEY>
      - WIREGUARD_ADDRESSES=<YOUR_WIREGUARD_IP>/32
      - VPN_PORT_FORWARDING=on
      - HTTPPROXY=on
      - HTTPPROXY_LOG=on
      - FIREWALL_OUTBOUND_SUBNETS=172.28.0.0/16
      - TZ=America/Chicago
      - UPDATER_PERIOD=24h
    networks:
      - arrs
    restart: unless-stopped

  # ============================================
  # Firecrawl API (on arrs network, web traffic via Gluetun proxy)
  # ============================================
  firecrawl-api:
    container_name: firecrawl-api
    build:
      context: ../firecrawl-src/apps/api
      dockerfile: Dockerfile
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    networks:
      - arrs
    ports:
      - "3002:3002"
    depends_on:
      gluetun:
        condition: service_started
      redis:
        condition: service_started
      rabbitmq:
        condition: service_started
      playwright-service:
        condition: service_started
      nuq-postgres:
        condition: service_healthy
    environment:
      HOST: "0.0.0.0"
      PORT: "3002"
      WORKER_PORT: "3005"
      ENV: local

      # Use Docker service names (no hardcoded IPs)
      REDIS_URL: "redis://redis:6379"
      REDIS_RATE_LIMIT_URL: "redis://redis:6379"

      PLAYWRIGHT_MICROSERVICE_URL: "http://playwright-service:3000/scrape"

      POSTGRES_USER: ${FIRECRAWL_POSTGRES_USER:-firecrawl}
      POSTGRES_PASSWORD: ${FIRECRAWL_POSTGRES_PASSWORD}
      POSTGRES_DB: ${FIRECRAWL_POSTGRES_DB:-firecrawl}
      POSTGRES_HOST: "nuq-postgres"
      POSTGRES_PORT: "5432"

      # For local-only/self-hosted use, disable DB auth to avoid Supabase errors
      USE_DB_AUTHENTICATION: "false"
      NUM_WORKERS_PER_QUEUE: ${FIRECRAWL_NUM_WORKERS:-16}
      CRAWL_CONCURRENT_REQUESTS: ${FIRECRAWL_CONCURRENT_REQUESTS:-20}
      MAX_CONCURRENT_JOBS: ${FIRECRAWL_MAX_JOBS:-10}
      BROWSER_POOL_SIZE: ${FIRECRAWL_BROWSER_POOL:-10}

      OPENAI_BASE_URL: ${FIRECRAWL_OPENAI_BASE_URL}
      OPENAI_API_KEY: ${FIRECRAWL_OPENAI_API_KEY}
      MODEL_NAME: ${FIRECRAWL_MODEL_NAME}
      MODEL_EMBEDDING_NAME: ${FIRECRAWL_MODEL_EMBEDDING_NAME}

      BULL_AUTH_KEY: ${FIRECRAWL_BULL_AUTH_KEY}
      TEST_API_KEY: ${FIRECRAWL_TEST_API_KEY}
      LOGGING_LEVEL: ${FIRECRAWL_LOGGING_LEVEL:-info}

      # Route scraping traffic through Gluetun VPN via HTTP proxy
      PROXY_SERVER: "http://gluetun:8888"
      PROXY_USERNAME: ${PROXY_USERNAME:-}
      PROXY_PASSWORD: ${PROXY_PASSWORD:-}
      NO_PROXY: "localhost,127.0.0.1,redis,nuq-postgres,rabbitmq,playwright-service,searxng,host.docker.internal"

      SEARXNG_ENDPOINT: ${FIRECRAWL_SEARXNG_ENDPOINT}
      SEARXNG_ENGINES: ${FIRECRAWL_SEARXNG_ENGINES:-}
      SEARXNG_CATEGORIES: ${FIRECRAWL_SEARXNG_CATEGORIES:-}

      MAX_CPU: ${FIRECRAWL_MAX_CPU:-0.85}
      MAX_RAM: ${FIRECRAWL_MAX_RAM:-0.90}

      NUQ_RABBITMQ_URL: "amqp://rabbitmq:5672"

    volumes:
      - ./firecrawl-data:/app/data
    # Size these limits to the host; Docker rejects a cpus value above the host's CPU count
    cpus: 16.0
    mem_limit: 32G
    memswap_limit: 32G
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"

  # ============================================
  # Playwright Service (on arrs network, web traffic via Gluetun proxy)
  # ============================================
  playwright-service:
    container_name: firecrawl-playwright
    build:
      context: ../firecrawl-src/apps/playwright-service-ts
      dockerfile: Dockerfile
    shm_size: "2g"
    networks:
      - arrs
    depends_on:
      - gluetun
    environment:
      PORT: "3000"
      # Route browser traffic through Gluetun HTTP proxy
      PROXY_SERVER: "http://gluetun:8888"
      PROXY_USERNAME: ${PROXY_USERNAME:-}
      PROXY_PASSWORD: ${PROXY_PASSWORD:-}
      BLOCK_MEDIA: ${BLOCK_MEDIA:-}
      NO_PROXY: "localhost,127.0.0.1,redis,nuq-postgres,playwright-service,host.docker.internal"
      MAX_CONCURRENT_PAGES: ${CRAWL_CONCURRENT_REQUESTS:-20}
    cpus: 8.0
    mem_limit: 16G
    memswap_limit: 16G
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"

  # ============================================
  # SearXNG (on arrs network, web traffic via Gluetun proxy)
  # ============================================
  searxng:
    container_name: firecrawl-searxng
    image: docker.io/searxng/searxng:${SEARXNG_VERSION:-latest}
    networks:
      - arrs
    ports:
      - "8085:8080"
    env_file: ./.env
    volumes:
      - ./settings.yml:/etc/searxng/settings.yml:Z
      - ./searxng-tunneled/core-data:/var/cache/searxng/
    restart: unless-stopped
    depends_on:
      - gluetun

  # ============================================
  # Redis (internal, NO VPN)
  # ============================================
  redis:
    image: redis:7-alpine
    container_name: firecrawl-redis
    # Use noeviction so jobs/queues are not silently dropped when memory is full
    command: redis-server --bind 0.0.0.0 --maxmemory 8gb --maxmemory-policy noeviction
    volumes:
      - ./firecrawl-redis:/data
    networks:
      - arrs
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
        compress: "true"

  # ============================================
  # NuQ PostgreSQL (internal, NO VPN)
  # ============================================
  nuq-postgres:
    build:
      context: ../firecrawl-src/apps/nuq-postgres
      dockerfile: Dockerfile
    container_name: firecrawl-nuq-postgres
    environment:
      POSTGRES_USER: ${FIRECRAWL_POSTGRES_USER:-firecrawl}
      POSTGRES_PASSWORD: ${FIRECRAWL_POSTGRES_PASSWORD}
      POSTGRES_DB: ${FIRECRAWL_POSTGRES_DB:-firecrawl}
    volumes:
      - ./firecrawl-postgres:/var/lib/postgresql/data
    networks:
      - arrs
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${FIRECRAWL_POSTGRES_USER:-firecrawl} -d ${FIRECRAWL_POSTGRES_DB:-firecrawl}"]
      start_period: 30s
      interval: 10s
      timeout: 5s
      retries: 10
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"

  # ============================================
  # RabbitMQ (message queue for workers)
  # ============================================
  rabbitmq:
    image: rabbitmq:3-management-alpine
    container_name: firecrawl-rabbitmq
    networks:
      - arrs
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
        compress: "true"

networks:
  arrs:
    external: true   # Use the network created in Section 3 (subnet 172.28.0.0/16)
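
Compose substitutes an empty string (and prints a warning) for any ${VAR} that is unset and has no :-default fallback, so it is worth checking that every such reference in the file above is defined in .env. The following is a minimal sketch; required_compose_vars and missing_env_vars are hypothetical helper names, and the parsing is a simple text match rather than a YAML-aware one.

```shell
#!/bin/sh
# required_compose_vars FILE: prints every variable referenced as ${NAME}
# (that is, without a ":-default" fallback), one per line, de-duplicated.
required_compose_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u
}

# missing_env_vars COMPOSE_FILE ENV_FILE: prints the required variables that
# have no NAME=... line in the env file.
missing_env_vars() {
  required_compose_vars "$1" | while read -r name; do
    grep -q "^${name}=" "$2" || echo "$name"
  done
}

# Example (run from the deployment directory):
#   missing_env_vars docker-compose.yaml .env
#   docker compose config --quiet    # also validates the file's syntax
```

Any name printed by missing_env_vars would reach the containers as an empty value.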

7. Database Schema Setup

The NuQ PostgreSQL database requires specific tables for job queues (nuq.queue_scrape, nuq.queue_crawl_finished, etc.). These tables are created by the nuq.sql init script, but only on first database initialization. If your PostgreSQL volume already contains data from a previous installation, you must apply the schema manually.

7.1 Obtain the NuQ SQL Script

If you have the Firecrawl source code:

# Path to the NuQ SQL init script
FIRECRAWL_SRC_PATH=../firecrawl-src/apps/nuq-postgres/nuq.sql

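If you don't have the source, clone it next to the deployment directory so that it matches the ../firecrawl-src paths used in Section 6:

cd /opt
git clone https://github.com/mendableai/firecrawl.git firecrawl-src
cd /opt/firecrawl-vpn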

7.2 Apply the Schema Manually

# Copy the SQL file into the PostgreSQL container
docker cp ../firecrawl-src/apps/nuq-postgres/nuq.sql firecrawl-nuq-postgres:/tmp/nuq.sql

# Execute it against the database
docker exec firecrawl-nuq-postgres psql -U firecrawl -d firecrawl -f /tmp/nuq.sql

# Clean up
docker exec firecrawl-nuq-postgres rm /tmp/nuq.sql

7.3 Verify Schema Application

# List the tables in the nuq schema
docker exec firecrawl-nuq-postgres psql -U firecrawl -d firecrawl -c '\dt nuq.*'

# Expected output:
#                  List of relations
#  Schema |         Name         | Type  |   Owner
# --------+----------------------+-------+-----------
#  nuq    | queue_crawl_finished | table | firecrawl
#  nuq    | queue_crawl_init     | table | firecrawl
#  nuq    | queue_scrape         | table | firecrawl
#  nuq    | queue_scrape_done    | table | firecrawl
# (4 rows)
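
The same verification can be scripted so that it reports only what is missing. This is a sketch under the assumption that the four tables in the expected output are the complete set; missing_nuq_tables is a hypothetical helper name.

```shell
#!/bin/sh
# Table names taken from the expected \dt output in this section.
EXPECTED_NUQ_TABLES="queue_crawl_finished queue_crawl_init queue_scrape queue_scrape_done"

# missing_nuq_tables: reads table names on stdin (one per line) and prints
# each expected table that is absent.
missing_nuq_tables() {
  found=$(cat)
  for table in $EXPECTED_NUQ_TABLES; do
    printf '%s\n' "$found" | grep -qx "$table" || echo "$table"
  done
}

# Example wiring:
#   docker exec firecrawl-nuq-postgres psql -U firecrawl -d firecrawl -At \
#     -c "SELECT tablename FROM pg_tables WHERE schemaname = 'nuq'" | missing_nuq_tables
```

No output means all expected tables exist; any printed name indicates that the schema in Section 7.2 still needs to be applied.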

8. Starting Services

8.1 Start All Services

# Navigate to your deployment directory
cd /opt/firecrawl-vpn

# Enable BuildKit for faster builds
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1

# Start all services in detached mode
docker compose up -d --remove-orphans

8.2 Monitor Startup Logs

# Watch all container logs
docker compose logs -f

# Watch specific service logs
docker compose logs -f firecrawl-api
docker compose logs -f gluetun

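Wait for startup to finish. nuq-postgres is the only service with a health check defined in this compose file, and firecrawl-api waits for it to report healthy before starting (depends_on with condition: service_healthy).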
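
Waiting for PostgreSQL can be scripted instead of watching the logs. The sketch below polls Docker's reported health status; wait_until and postgres_healthy are hypothetical helper names, and the 30 x 5 second budget is an arbitrary assumption.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...: runs CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds after each failed attempt. Returns 1 on timeout.
wait_until() {
  tries=$1 delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    "$@" && return 0
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# postgres_healthy: succeeds once Docker reports the health check as passing.
postgres_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' firecrawl-nuq-postgres 2>/dev/null)" = "healthy" ]
}

# Example: wait up to 30 x 5 seconds for PostgreSQL.
#   wait_until 30 5 postgres_healthy && echo "nuq-postgres is healthy"
```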

8.3 Verify Container Status

docker compose ps

Expected output:

NAME                    STATUS
firecrawl-api           Up (healthy)
firecrawl-gluetun       Up
firecrawl-playwright    Up
firecrawl-rabbitmq      Up
firecrawl-redis         Up
firecrawl-searxng       Up
firecrawl-nuq-postgres  Up (healthy)

9. Verification Tests

9.1 Test Firecrawl API - Scrape

curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }' | jq .

Expected response: A JSON object with success: true and data containing the scraped page content.

9.2 Test Firecrawl API - Crawl

curl -X POST http://localhost:3002/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }' | jq .

Expected response: A JSON object with success: true and an id that you can use to check crawl status (GET /v1/crawl/<id>).

9.3 Test Firecrawl API - Search

curl -X POST http://localhost:3002/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "test query"
  }' | jq .

9.4 Test Firecrawl API - Extract

curl -X POST http://localhost:3002/v1/extract \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"]
  }' | jq .

9.5 Test SearXNG Search

curl "http://localhost:8085/search?q=test&format=json" | jq .

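Expected response: JSON with search results from the self-hosted SearXNG engine. The json format must be listed under search.formats in settings.yml; otherwise SearXNG answers with 403 Forbidden.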

9.6 Check Firecrawl Admin UI

Open your browser and navigate to:

http://localhost:3002/admin/YOUR_ADMIN_API_KEY_HERE/queues

You should see the BullMQ dashboard showing queue statuses for scrape, crawl, and extract jobs.

9.7 Verify VPN Tunnel is Working

Check the Gluetun container logs for your public IP:

docker logs firecrawl-gluetun 2>&1 | grep -i "public ip"

Or test by scraping a site that returns your IP:

curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://api.ipify.org?format=json"
  }' | jq .

The returned IP should match your ProtonVPN server's exit IP, not your real public IP.
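
The comparison can be automated by fetching the host's own public IP directly and the exit IP through Firecrawl. This is a rough sketch: first_ipv4 and vpn_in_use are hypothetical helper names, and it assumes the first IPv4 address in the scrape response is the one reported by api.ipify.org.

```shell
#!/bin/sh
# first_ipv4: prints the first dotted-quad IPv4 address found on stdin.
first_ipv4() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# vpn_in_use HOST_IP EXIT_IP: succeeds only when both addresses are known
# and they differ, i.e. scraping traffic does not leave from the host's IP.
vpn_in_use() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example wiring:
#   host_ip=$(curl -s https://api.ipify.org | first_ipv4)
#   exit_ip=$(curl -s -X POST http://localhost:3002/v1/scrape \
#     -H "Content-Type: application/json" \
#     -d '{"url": "https://api.ipify.org?format=json"}' | first_ipv4)
#   if vpn_in_use "$host_ip" "$exit_ip"; then echo "VPN OK ($exit_ip)"; else echo "VPN NOT in use"; fi
```

An empty exit_ip (for example, because the scrape failed) is treated as a failure rather than as proof that the tunnel works.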


10. Troubleshooting

10.1 Redis Connection Timeout (ETIMEDOUT)

Symptom: Firecrawl API logs show Error: connect ETIMEDOUT when connecting to Redis at 172.28.0.2:6379.

Root Cause: Gluetun's firewall is blocking traffic to the Docker bridge network because either:

  1. FIREWALL_OUTBOUND_SUBNETS does not match the actual Docker network subnet
  2. Gluetun is not connected to the arrs network

Fix:

# 1. Verify the Docker network subnet
docker network inspect arrs | grep Subnet

# 2. Check that FIREWALL_OUTBOUND_SUBNETS in the gluetun service of docker-compose.yaml matches
# Should be: FIREWALL_OUTBOUND_SUBNETS=172.28.0.0/16 (or whatever your subnet is)

# 3. Verify Gluetun is on the arrs network
docker network inspect arrs | grep gluetun

# 4. If missing, add networks: - arrs to the gluetun service in docker-compose.yaml

# 5. Restart affected services
docker compose restart gluetun firecrawl-api

10.2 PostgreSQL "relation does not exist"

Symptom: Logs show relation "nuq.queue_scrape" does not exist or similar errors.

Root Cause: The NuQ database schema was never applied because the PostgreSQL volume already had data from a previous installation.

Fix:

# Apply the NuQ schema manually (see Section 7.2)
docker cp ../firecrawl-src/apps/nuq-postgres/nuq.sql firecrawl-nuq-postgres:/tmp/nuq.sql
docker exec firecrawl-nuq-postgres psql -U firecrawl -d firecrawl -f /tmp/nuq.sql
docker exec firecrawl-nuq-postgres rm /tmp/nuq.sql

# Restart firecrawl-api
docker compose restart firecrawl-api

10.3 "NUQ_RABBITMQ_URL is not configured"

Symptom: The extract-worker process crashes with Error: NUQ_RABBITMQ_URL is not configured.

Root Cause: RabbitMQ service is missing from the docker-compose.yaml or the environment variable is not set.

Fix:

# 1. Add RabbitMQ service to docker-compose.yaml (see Section 6)
rabbitmq:
  image: rabbitmq:3-management-alpine
  container_name: firecrawl-rabbitmq
  networks:
    - arrs
  restart: unless-stopped

# 2. Add NUQ_RABBITMQ_URL to firecrawl-api environment
NUQ_RABBITMQ_URL: amqp://rabbitmq:5672

# 3. Add rabbitmq to firecrawl-api depends_on
depends_on:
  rabbitmq:
    condition: service_started

# 4. Restart all services
docker compose up -d --remove-orphans

10.4 Supabase / DB authentication errors

Symptom: Logs show:

  • "Supabase environment variables aren't configured correctly"
  • "Supabase RR client is not configured"

Cause: DB/Supabase auth is enabled but not configured.

Fix (recommended for local/self-hosted):

  • Set USE_DB_AUTHENTICATION: "false" in the firecrawl-api environment.
  • This disables Supabase-style auth and avoids these errors.

10.5 Services Cannot Reach Each Other

Symptom: firecrawl-api cannot connect to Redis/PostgreSQL/RabbitMQ, or shows:

  • "getaddrinfo EAI_AGAIN nuq-postgres"

Cause: Containers were started from an older configuration and are not on the correct network.

Fix:

  1. Ensure all services (gluetun, firecrawl-api, playwright-service, redis, nuq-postgres, rabbitmq) use:

       networks:
         - arrs

  2. Recreate affected containers:

       docker compose up -d --force-recreate firecrawl-api playwright-service redis nuq-postgres rabbitmq

10.6 Gluetun Fails to Connect to VPN

Symptom: Gluetun logs show repeated connection attempts or authentication failures.

Fix:

# Check Gluetun logs for specific error
docker logs firecrawl-gluetun --tail 50

# Common fixes:
# 1. Verify WIREGUARD_PRIVATE_KEY and WIREGUARD_ADDRESSES match your ProtonVPN WireGuard configuration
# 2. Check if the selected server is available
# 3. Try a different VPN_TYPE (wireguard vs openvpn)
# 4. Ensure the /dev/net/tun device is mounted and the NET_ADMIN capability is granted

10.7 Outbound Traffic Not Going Through VPN

Symptom: Scraped content shows your real IP instead of VPN exit IP.

Fix:

  1. Verify Gluetun is connected:
    • docker compose logs gluetun | grep -i "public ip"
  2. Ensure firecrawl-api and playwright-service use:
    • PROXY_SERVER: "http://gluetun:8888"
  3. Ensure FIREWALL_OUTBOUND_SUBNETS only includes internal Docker subnets.

10.8 SearXNG Not Returning Results

Symptom: SearXNG returns empty results or errors when queried.

Fix:

# 1. Check SearXNG logs
docker logs firecrawl-searxng --tail 50

# 2. Verify settings.yml is mounted correctly (docker compose exec takes the service name)
docker compose exec searxng ls /etc/searxng/settings.yml

# 3. Ensure SEARXNG_SECRET is set (required for security)
# 4. Check that SearXNG is on the same network as firecrawl-api (arrs)

10.9 High Resource Usage

Symptom: System becomes unresponsive or containers are killed due to OOM.

Fix:

# 1. Reduce concurrency in .env (these are the names docker-compose.yaml interpolates)
FIRECRAWL_MAX_JOBS=5
CRAWL_CONCURRENT_REQUESTS=2

# 2. Limit container memory in docker-compose.yaml
deploy:
  resources:
    limits:
      memory: 2G

# 3. Monitor resource usage
docker stats

10.10 Pre-built Image Issues

Note: If using pre-built Firecrawl images instead of building from source, you may encounter Supabase-related errors. The NuQ schema and worker setup described in this guide is specific to the self-hosted NuQ architecture. Pre-built images may expect different database tables or connection strings.


11. Maintenance

11.1 Update Firecrawl

# Navigate to deployment directory
cd /opt/firecrawl-vpn

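# Update the Firecrawl source (firecrawl-api, playwright-service and nuq-postgres are built from it)
git -C ../firecrawl-src pull

# Pull newer images for the image-based services (gluetun, searxng, redis, rabbitmq)
docker compose pull

# Rebuild the locally built services
export DOCKER_BUILDKIT=1
docker compose build

# Stop and remove old containers
docker compose down

# Start with the new images
docker compose up -d --remove-orphans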

# Check logs for any migration issues
docker compose logs -f firecrawl-api

11.2 Backup Data

# Create backup directory (run from /opt/firecrawl-vpn)
mkdir -p backup/redis-data

# Backup PostgreSQL data (bind-mounted at ./firecrawl-postgres)
# For a consistent copy, stop the database first: docker compose stop nuq-postgres
docker run --rm \
  -v "$(pwd)/firecrawl-postgres":/data/postgres:ro \
  -v "$(pwd)/backup":/data/backup \
  alpine tar czf /data/backup/postgres-backup-$(date +%Y%m%d).tar.gz -C /data/postgres .

# Backup Redis data
cp -r firecrawl-redis/* backup/redis-data/

# Backup configuration files
cp .env docker-compose.yaml settings.yml backup/

echo "Backup complete: $(pwd)/backup"

11.3 Restore Data

# Stop services
docker compose down

# Restore PostgreSQL (bind-mounted at ./firecrawl-postgres)
docker run --rm \
  -v "$(pwd)/firecrawl-postgres":/data/postgres \
  -v "$(pwd)/backup":/data/backup \
  alpine sh -c "cd /data/postgres && tar xzf /data/backup/postgres-backup-YYYYMMDD.tar.gz"

# Restore Redis
cp -r backup/redis-data/* firecrawl-redis/

# Restart services
docker compose up -d

12. Useful Commands

Container Management

# View all container status
docker compose ps

# View logs for a specific service
docker compose logs -f firecrawl-api

# Restart a specific service
docker compose restart firecrawl-api

# Stop all services
docker compose down

# Stop and remove named/anonymous volumes (WARNING: deletes their data; bind-mounted directories such as ./firecrawl-postgres are not removed)
docker compose down -v

Debugging

# Execute shell inside a container
docker exec -it firecrawl-api sh
docker exec -it firecrawl-nuq-postgres psql -U firecrawl -d firecrawl

# Check network connectivity between containers
docker exec firecrawl-api ping redis
docker exec firecrawl-api ping nuq-postgres
docker exec firecrawl-api ping rabbitmq

# View container resource usage
docker stats

# Inspect Docker network
docker network inspect arrs

VPN-Specific

# Check Gluetun VPN connection status
docker logs firecrawl-gluetun | grep -i "public ip\|connected"

# Test if traffic is going through VPN
curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://api.ipify.org?format=json"}' | jq .

# List active WireGuard interfaces in Gluetun
docker exec firecrawl-gluetun wg show

13. Environment Variables Reference

Firecrawl API

Variable                  Description                                Default                 Required
FIRECRAWL_VERSION         Firecrawl image version                    latest                  No
FIRECRAWL_BULL_AUTH_KEY   Admin API key for BullMQ UI                None                    Yes
FIRECRAWL_BASE_URL        Base URL for the API                       http://localhost:3002   No
FIRECRAWL_USE_DB_AUTH     Enable Supabase-backed DB authentication   false                   No (keep false for self-hosted)
FIRECRAWL_MAX_JOBS        Max concurrent jobs                        10                      No
FIRECRAWL_LOGGING_LEVEL   Logging verbosity                          info                    No

Database

Variable                      Description                                Default     Required
FIRECRAWL_POSTGRES_USER       PostgreSQL username                        firecrawl   Yes
FIRECRAWL_POSTGRES_PASSWORD   PostgreSQL password                        None        Yes
FIRECRAWL_POSTGRES_DB         Database name                              firecrawl   No
NUQ_POSTGRES_URL              Full connection string to NuQ PostgreSQL   None        Yes

Redis

Variable               Description               Default              Required
REDIS_URL              Redis connection URL      redis://redis:6379   Yes
REDIS_RATE_LIMIT_URL   Redis rate limiting URL   redis://redis:6379   Yes

RabbitMQ (NuQ)

Variable           Description               Default                Required
NUQ_RABBITMQ_URL   RabbitMQ connection URL   amqp://rabbitmq:5672   Yes

AI / LLM

Variable                    Description                      Default                     Required
FIRECRAWL_OPENAI_API_KEY    OpenAI-compatible API key        None                        Conditional
FIRECRAWL_OPENAI_BASE_URL   OpenAI-compatible endpoint URL   https://api.openai.com/v1   No

Gluetun VPN

Variable                    Description                         Default                    Required
VPN_SERVICE_PROVIDER        VPN provider name                   protonvpn                  Yes
VPN_TYPE                    VPN protocol                        wireguard                  Yes
WIREGUARD_PRIVATE_KEY       WireGuard private key               None                       Yes
WIREGUARD_ADDRESSES         WireGuard interface address (/32)   None                       Yes
SERVER_COUNTRIES            Preferred VPN server country        US                         No
FIREWALL_ENABLED            Enable outbound firewall            yes                        Yes
FIREWALL_OUTBOUND_PORTS     Allowed outbound ports              80,443                     No
FIREWALL_OUTBOUND_SUBNETS   Internal networks to allow          Must match Docker subnet   Yes

SearXNG

Variable              Description                       Default   Required
SEARXNG_SECRET        SearXNG secret key for sessions   None      Yes
SEARXNG_IMAGE_PROXY   Enable image proxy                true      No
SEARXNG_PORT          SearXNG listening port            8085      No

14. Architecture Diagrams

Network Architecture

Internet
    |
    v
[Gluetun VPN Tunnel]
    |
    +-- HTTP Proxy (port 8888)
            |
            +-- firecrawl-api (networks: arrs, uses PROXY_SERVER=http://gluetun:8888)
            |       |
            |       +-- playwright-service (networks: arrs, uses PROXY_SERVER=http://gluetun:8888)
            |       |
            |       +-- searxng (networks: arrs, web traffic via same proxy)
            |
[arrs Docker Network] <--- All services share this network
        |
        +-- redis
        +-- nuq-postgres
        +-- rabbitmq

Service Communication Flow

User Request
     │
     ▼
Firecrawl API (on arrs network, uses PROXY_SERVER=http://gluetun:8888)
     │
     ├──► Redis (redis:6379) ──────► Session/Rate Limiting
     │
     ├──► NuQ PostgreSQL (nuq-postgres:5432) ──► Store Scraped Data
     │
     ├──► RabbitMQ (rabbitmq:5672) ──────► Job Queue
     │       │
     │       ▼
     │   Extract Worker / Crawl Worker
     │
     ├──► Playwright Service (playwright-service:3000) ──► Browser Rendering
     │
     └──► SearXNG (searxng:8080) ──► Search Results

Resource Requirements Per Service

Service         CPU        RAM      Disk
Gluetun         0.1 core   100 MB   Minimal
Firecrawl API   0.5 core   500 MB   Minimal
Playwright      0.5 core   500 MB   Minimal
SearXNG         0.2 core   200 MB   Minimal
Redis           0.1 core   256 MB   Depends on data
PostgreSQL      0.5 core   512 MB   Depends on data
RabbitMQ        0.1 core   128 MB   Depends on queue size

Total Minimum: 2 cores, 2.2 GB RAM, 20 GB disk
Total Recommended: 4 cores, 8 GB RAM, 50 GB SSD
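
The "Total Minimum" figure is the sum of the per-service rows. As a quick check of that arithmetic, the sketch below adds up the CPU and RAM columns with awk; the data is copied from the table above, and total_minimums is a hypothetical helper name.

```shell
#!/bin/sh
# Per-service minimums copied from the table: name, CPU cores, RAM in MB.
SERVICE_MINIMUMS='Gluetun 0.1 100
Firecrawl-API 0.5 500
Playwright 0.5 500
SearXNG 0.2 200
Redis 0.1 256
PostgreSQL 0.5 512
RabbitMQ 0.1 128'

# total_minimums: sums the CPU and RAM columns and prints the totals.
total_minimums() {
  printf '%s\n' "$SERVICE_MINIMUMS" |
    awk '{ cpu += $2; ram += $3 } END { printf "%.1f cores, %d MB (%.1f GB)\n", cpu, ram, ram / 1000 }'
}

total_minimums
# prints: 2.0 cores, 2196 MB (2.2 GB)
```

These are floor values for idle services; the cpus and mem_limit settings in Section 6 cap usage under load and are a separate concern.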


Quick Reference Card

┌─────────────────────────────────────────────────────────────┐
│                    Quick Start Checklist                      │
├─────────────────────────────────────────────────────────────┤
│ □ Install Docker, Docker Compose V2, Buildx                  │
│ □ Create Docker network: docker network create --subnet ...  │
│ □ Get ProtonVPN credentials                                  │
│ □ Create .env file with your credentials                     │
│ □ Create docker-compose.yaml (Section 6)                     │
│ □ Create settings.yml for SearXNG                            │
│ □ Start services: docker compose up -d                       │
│ □ Apply NuQ schema if needed (Section 7)                     │
│ □ Verify with curl tests (Section 9)                         │
│ □ Check VPN is working (Section 9.7)                         │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    Common URLs                                │
├─────────────────────────────────────────────────────────────┤
│ Firecrawl API:    http://<host>:3002/v1/scrape              │
│ Admin UI:         http://<host>:3002/admin/<key>/queues     │
│ SearXNG:          http://<host>:8085/search                 │
│ RabbitMQ Mgmt:    http://<host>:15672 (default guest/guest) │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    Critical Settings                          │
├─────────────────────────────────────────────────────────────┤
│ FIREWALL_OUTBOUND_SUBNETS must match Docker network subnet  │
│ Gluetun MUST have: networks: - arrs                         │
│ USE_DB_AUTHENTICATION should be "false" for local/self-hosted│
│ NUQ_RABBITMQ_URL must be set for extract workers            │
│ NuQ schema must be applied if PostgreSQL volume exists      │
│ Use PROXY_SERVER=http://gluetun:8888 for VPN routing        │
└─────────────────────────────────────────────────────────────┘