GitHub - ryanbr/network-scanner: A network scanner to output in various adblock formats

A Puppeteer-based tool for scanning websites to find third-party (or optionally first-party) network requests matching specified patterns, and generate Adblock-formatted rules.

Features

Scan websites and detect matching third-party or first-party resources
Output Adblock-formatted blocking rules
Support for multiple filters per site
Grouped titles (! ) before site matches, including redirect source and matching regex
Ignore unwanted domains (global and per-site)
Block unwanted domains during scan (simulate adblock)
Support Chrome, Firefox, Safari user agents (desktop or mobile)
Advanced fingerprint spoofing and referrer header simulation
Delay, timeout, reload options per site
Verbose and debug modes
Dump matched full URLs into matched_urls.log
Save output in normal Adblock format or localhost (127.0.0.1/0.0.0.0)
Subdomain handling (collapse to root or full subdomain)
Optionally match only first-party, third-party, or both
Enhanced redirect handling with JavaScript and meta refresh detection
Capture and drive popup/popunder chains (capture_popups + interact_popups) so domains reachable only via a clicked popup still match
Per-site proxy routing (SOCKS5, SOCKS4, HTTP, HTTPS) with pre-flight health checks

Command Line Arguments

Output Options

Argument	Description
`-o, --output <file>`	Output file for rules. If omitted, prints to console
`--compare <file>`	Remove rules that already exist in this file before output
`--no-color, --no-colour`	Disable colored console output (colors enabled by default)
`--append`	Append new rules to output file instead of overwriting (requires `-o`)

Output Format Options

Argument	Description
`--localhost[=IP]`	Output as `IP domain.com` (default: 127.0.0.1)
	Examples: `--localhost`, `--localhost=0.0.0.0`, `--localhost=192.168.1.1`
`--plain`	Output just domains (no adblock formatting)
`--dnsmasq`	Output as `local=/domain.com/` (dnsmasq format)
`--dnsmasq-old`	Output as `server=/domain.com/` (dnsmasq old format)
`--unbound`	Output as `local-zone: "domain.com." always_null` (unbound format)
`--privoxy`	Output as `{ +block } .domain.com` (Privoxy format)
`--pihole`	Output as `(^|\.)domain\.com$` (Pi-hole regex format)
`--adblock-rules`	Generate adblock filter rules with resource type modifiers (requires `-o`)
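
The format strings in the table above can be summarized in one small function. This is only an illustrative sketch: `formatDomain` and its format names are hypothetical (they mirror the CLI flags, not the scanner's internals), and the Pi-hole line assumes the usual single-backslash regex form `(^|\.)domain\.com$`.

```javascript
// Hypothetical helper: maps a domain to each output style listed in the table.
// Not the scanner's own code; format names simply mirror the CLI flags.
function formatDomain(domain, format = 'adblock', ip = '127.0.0.1') {
  switch (format) {
    case 'adblock':     return `||${domain}^`;
    case 'localhost':   return `${ip} ${domain}`;
    case 'plain':       return domain;
    case 'dnsmasq':     return `local=/${domain}/`;
    case 'dnsmasq-old': return `server=/${domain}/`;
    case 'unbound':     return `local-zone: "${domain}." always_null`;
    case 'privoxy':     return `{ +block } .${domain}`;
    // Dots in the domain are escaped so they match literally in the regex.
    case 'pihole':      return `(^|\\.)${domain.replace(/\./g, '\\.')}$`;
    default: throw new Error(`Unknown format: ${format}`);
  }
}

console.log(formatDomain('example.com', 'localhost', '0.0.0.0'));
console.log(formatDomain('example.com', 'unbound'));
```

Keeping every format behind one switch makes it easy to see that only the wrapper around the domain changes between outputs.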

General Options

Argument	Description
`--debug`	Force debug mode globally
`--silent`	Suppress normal console logs
`--titles`	Add `! <url>` title before each site's group
`--dumpurls`	Dump matched URLs into matched_urls.log
`--remove-tempfiles`	Remove Chrome/Puppeteer temporary files before exit
`--compress-logs`	Compress log files with gzip (requires `--dumpurls`)
`--sub-domains`	Output full subdomains instead of collapsing to root
`--no-interact`	Disable page interactions globally
`--ghost-cursor`	Use ghost-cursor Bezier mouse movements globally (requires `npm i ghost-cursor`)
`--custom-json <file>`	Use a custom config JSON file instead of config.json
`--headful`	Launch browser with GUI (not headless)
`--keep-open`	Keep browser and tabs open after scan completes (use with `--headful` for debugging)
`--use-puppeteer-core`	Use `puppeteer-core` with system Chrome instead of bundled Chromium
`--use-obscura`	Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth
`--load-extension <path>`	Load unpacked Chrome extension from directory (can be used multiple times)
`--dns-cache`	Persist dig/whois results to disk between runs (dig 20hr / whois 36hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), plus the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly.
`--no-dns-precheck`	Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the dig/whois cache TTL) skip their c-ares pre-check via a positive-resolution index.
`--block-ads=<files>`	Block ads/trackers during the scan using EasyList-format filter list(s) (`||domain^`, `/ads/*`, etc.). Comma-separated for multiple: `--block-ads=easylist.txt,easyprivacy.txt`. See Blocking ads during the scan.
`--adblock-engine=<js|rust>`	Matcher backend for `--block-ads` (default: `js`). `rust` uses Brave's `adblock-rs` (much faster on large lists) and requires `npm i adblock-rs`.
`--cdp`	Enable Chrome DevTools Protocol logging (now per-page if enabled)
`--remove-dupes`	Remove duplicate domains from output (only with `-o`)
`--dry-run`	Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules
`--eval-on-doc`	Globally enable evaluateOnNewDocument() for Fetch/XHR interception
`--help`, `-h`	Show this help menu
`--version`	Show script version
`--max-concurrent <number>`	Maximum concurrent site processing (1-50, overrides config/default)
`--dns <ip[,ip,...]>`	Resolver(s) for the DNS pre-check and nettools' `dig` (one pins, several rotate per query; overrides `/etc/resolv.conf`). Does not affect Chrome navigation or `whois`. Useful when the system resolver is flaky and `dig`-gated domains time out
`--show-dead-domains`	At end of scan, list hostnames that did not resolve / were unreachable (`NXDOMAIN`/`ENODATA` + `ERR_NAME_NOT_RESOLVED`/`ERR_ADDRESS_UNREACHABLE`). Excludes blocks/timeouts (those mean the domain is alive). For pruning dead URLs.
`--cleanup-interval <number>`	Browser restart interval in URLs processed (1-1000, overrides config/default)
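
The `--dns-cache` row above mentions TTL-bounded entries, atomic disk writes (tmp + rename), and a clean reset when a cache file is corrupt. A minimal sketch of that pattern follows. The function names, the `savedAt` field, and the warning text are assumptions for illustration, not the scanner's actual cache code.

```javascript
const fs = require('fs');

// Write to a temp file first, then rename over the target. A rename on the
// same filesystem is atomic, so readers never see a half-written cache.
function saveCacheAtomic(file, entries) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(entries));
  fs.renameSync(tmp, file);
}

// Load the cache, drop entries older than ttlMs, and reset to an empty
// cache if the file is missing or corrupt.
function loadCache(file, ttlMs, now = Date.now()) {
  let entries;
  try {
    entries = JSON.parse(fs.readFileSync(file, 'utf8'));
    if (entries === null || typeof entries !== 'object') throw new Error('unexpected shape');
  } catch (err) {
    // Warning wording is illustrative only.
    if (err.code !== 'ENOENT') console.warn(`cache file ${file} is unreadable, starting empty`);
    return {};
  }
  const fresh = {};
  for (const [host, entry] of Object.entries(entries)) {
    if (now - entry.savedAt < ttlMs) fresh[host] = entry; // assumed `savedAt` timestamp (ms)
  }
  return fresh;
}

const DIG_TTL_MS = 20 * 60 * 60 * 1000;   // 20 hours, per the table
const WHOIS_TTL_MS = 36 * 60 * 60 * 1000; // 36 hours, per the table
```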

Validation Options

Argument	Description
`--cache-requests`	Cache HTTP requests to avoid re-requesting same URLs within scan
`--validate-config`	Validate config.json file and exit
`--validate-rules [file]`	Validate rule file format (uses --output/--compare files if no file specified)
`--clean-rules [file]`	Clean rule files by removing invalid lines and optionally duplicates (uses --output/--compare files if no file specified)
`--test-validation`	Run domain validation tests and exit
`--clear-cache`	Clear persistent cache before scanning (improves fresh start performance)
`--ignore-cache`	Bypass all smart caching functionality during scanning

Blocking ads during the scan

--block-ads makes the scanner block matching requests during the scan (separate from capturing rules) — to keep ad/tracker noise out of the page, speed up loads, or test that a list catches what it should.

Adding lists. Pass one or more EasyList-format filter lists (same syntax as uBlock Origin / EasyList):

# Single list
node nwss.js --block-ads=easylist.txt

# Multiple lists — comma-separated, no spaces
node nwss.js --block-ads=easylist.txt,easyprivacy.txt,mylist.txt

Lists are plain-text network rules — ||doubleclick.net^, /ads/*, ||example.com^$script, etc. Element-hiding/cosmetic rules (##…) don't apply to request blocking and are ignored. The scanned page's own top-level document is never blocked (only sub-resources), so a site whose own domain is in a list still loads.
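
To make the rules above concrete, here is a small sketch of how a list argument and its lines could be pre-filtered. The helper names and the `isTopLevelDocument` field are hypothetical; this is not `lib/adblock.js`, just an illustration of the behaviour described.

```javascript
// Split the comma-separated --block-ads value into file names.
function parseBlockAdsArg(value) {
  return value.split(',').map((f) => f.trim()).filter(Boolean);
}

// Keep only network rules: skip blanks, "!" comments, "[Adblock ...]" headers,
// and cosmetic rules (##, #@#, #?#, #$#), which do not affect request blocking.
function isNetworkRule(line) {
  const rule = line.trim();
  if (rule === '' || rule.startsWith('!') || rule.startsWith('[')) return false;
  return !/#[@?$]?#/.test(rule);
}

// The scanned page's own top-level document is never blocked.
// `matchesList` stands in for whichever matcher engine is in use.
function shouldBlock(request, matchesList) {
  if (request.isTopLevelDocument) return false;
  return matchesList(request.url);
}

const lines = ['! Title: demo', '||doubleclick.net^', 'example.com##.ad-banner', '/ads/*'];
console.log(lines.filter(isNetworkRule)); // the two network rules remain
```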

Engine — js vs rust (--adblock-engine, default js):

Engine	Flag	Backend	When
js (default)	`--adblock-engine=js`	`lib/adblock.js` — pure-JS, no extra deps	Default; fine for small/medium lists, works everywhere
rust	`--adblock-engine=rust`	`lib/adblock-rust.js` — Brave's `adblock-rs`	Large lists (full EasyList + EasyPrivacy + …); much faster matching. Drop-in (same rules, same results). Requires `npm install adblock-rs` (needs a Rust toolchain)

The two engines are interchangeable: same rule format, same blocking result; rust is purely a speed option for big lists. The rust engine needs adblock-rs, so if it is not installed, either install it (npm i adblock-rs) or drop the flag to use js.

# Fast matching over big lists with the Rust engine
npm install adblock-rs
node nwss.js --block-ads=easylist.txt,easyprivacy.txt --adblock-engine=rust

config.json Format

Example:

{
  "ignoreDomains": [
    "googleapis.com",
    "googletagmanager.com"
  ],
  "ignoreDomainsByUrl": [
    "\\/jwplayer\\/"
  ],
  "blockDomainsByUrl": [
    "\\/tracker\\/"
  ],
  "sites": [
    {
      "url": "https://example.com/",
      "userAgent": "chrome",
      "filterRegex": "ads|analytics",
      "resourceTypes": ["script", "xhr", "image"],
      "reload": 2,
      "delay": 5000,
      "timeout": 30000,
      "verbose": 1,
      "debug": 1,
      "interact": true,
      "fingerprint_protection": "random",
      "referrer_headers": {
        "mode": "random_search",
        "search_terms": ["example reviews", "best deals"]
      },
      "custom_headers": {
        "X-Custom-Header": "value"
      },
      "firstParty": 0,
      "thirdParty": 1,
      "subDomains": 0,
      "blocked": [
        "googletagmanager.com",
        ".*tracking.*"
      ]
    }
  ]
}

config.json Field Table

Basic Configuration

Field	Values	Default	Description
`url`	String or Array	-	Website URL(s) to scan
`userAgent`	`chrome`, `chrome_mac`, `chrome_linux`, `firefox`, `firefox_mac`, `firefox_linux`, `safari`	-	User agent for page
`filterRegex`	String or Array	`.*`	Regex or list of regexes to match requests
`regex_and`	Boolean	`false`	Use AND logic for multiple filterRegex patterns - ALL patterns must match the same URL
`output_regex`	String	-	Regex applied to each matched URL to build the rule body: capture group 1 (or whole match) becomes `||<capture>` instead of `||host^`. E.g. `^https?:\/\/([^\/]+\/[^\/]+\/)` turns `https://host.com/script/abc.js` into `||host.com/script/`. The capture must include the host. No match → falls back to `||host^`. Adblock-only; domain formats (dnsmasq/pihole/hosts/plain) emit the bare host
`comments`	String or Array	-	String of comments or references
`resourceTypes`	Array	`["script", "xhr", "image", "stylesheet"]`	What resource types to monitor
`reload`	Integer	`1`	Number of times to reload page
`delay`	Milliseconds	`4000`	Wait time after loading/reloading
`timeout`	Milliseconds	`30000`	Timeout for page load
`verbose`	`0` or `1`	`0`	Enable verbose output per site
`debug`	`0` or `1`	`0`	Dump matching URLs for the site
`interact`	`true` or `false`	`false`	Simulate user interaction (hover, click)
`firstParty`	`0` or `1`	`0`	Match first-party requests
`thirdParty`	`0` or `1`	`1`	Match third-party requests
`subDomains`	`0` or `1`	`0`	1 = preserve subdomains in output
`blocked`	Array	-	Domains or regexes to block during scanning
`even_blocked`	Boolean	`false`	Add matching rules even if requests are blocked
`bypass_cache`	Boolean	`false`	Skip all caching for this site's URLs
`window_cleanup`	Boolean or String	`false`	Close old/unused browser windows/tabs after entire URL group completes
`window_cleanup_threshold`	Integer	`8`	For `"realtime"` mode: max pages to keep open before cleanup

Window cleanup modes: false (disabled), true (conservative: closes obvious leftovers), "realtime" (continuously cleans up the oldest pages once the threshold is exceeded), "all" (aggressive: closes all content pages after the group). Both active modes preserve the main Puppeteer window and wait 16 seconds before cleanup to avoid interfering with active operations.

Redirect Handling Options

Field	Values	Default	Description
`max_redirects`	Integer	`10`	Maximum number of redirects to follow (`0` = follow none)
`js_redirect_timeout`	Milliseconds	`5000`	Time to wait for JavaScript redirects
`detect_js_patterns`	Boolean	`true`	Analyze page source for redirect patterns
`redirect_timeout_multiplier`	Number	`1.5`	Increase timeout for redirected URLs

When a page redirects to a new domain, first-party/third-party detection is based on the final redirected domain, and all intermediate redirect domains (like bit.ly, t.co) are automatically excluded from the generated rules.

Advanced Stealth & Fingerprinting

Field	Values	Default	Description
`fingerprint_protection`	`true`, `false`, `"random"`	`false`	Enable navigator/device spoofing
`referrer_headers`	String, Array, or Object	-	Set referrer header for realistic traffic sources
`custom_headers`	Object	-	Add custom HTTP headers to requests

Referrer Header Options

Simple formats:

"referrer_headers": "https://google.com/search?q=example"
"referrer_headers": ["url1", "url2"]

Smart modes:

"referrer_headers": {"mode": "random_search", "search_terms": ["reviews"]}
"referrer_headers": {"mode": "social_media"}
"referrer_headers": {"mode": "direct_navigation"}
"referrer_headers": {"mode": "custom", "custom": ["https://news.ycombinator.com/"]}

Disable referrer for specific URLs:

"referrer_disable": ["https://example.com/no-ref", "sensitive-site.com"]
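
The shapes above (string, array, `custom` mode, and `referrer_disable`) could be resolved roughly like this. `pickReferrer` is a hypothetical helper; substring matching for `referrer_disable` and "no referrer" for `direct_navigation` are assumptions, and the `random_search` and `social_media` modes are generated by the scanner and are not modeled here.

```javascript
// Returns a referrer URL, or null for "send no referrer".
// `rng` is injectable so the choice can be made deterministic in tests.
function pickReferrer(referrerHeaders, targetUrl, referrerDisable = [], rng = Math.random) {
  // Assumption: referrer_disable entries match as substrings of the target URL.
  if (referrerDisable.some((entry) => targetUrl.includes(entry))) return null;

  const pick = (list) => list[Math.floor(rng() * list.length)];

  if (typeof referrerHeaders === 'string') return referrerHeaders;
  if (Array.isArray(referrerHeaders)) return pick(referrerHeaders);
  if (referrerHeaders && referrerHeaders.mode === 'custom') return pick(referrerHeaders.custom);

  // direct_navigation (assumed: no referrer) and the unmodeled modes end up here.
  return null;
}

console.log(pickReferrer({ mode: 'custom', custom: ['https://news.ycombinator.com/'] }, 'https://example.com/'));
```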

Protection Bypassing

Field	Values	Default	Description
`cloudflare_phish`	Boolean	`false`	Auto-click through Cloudflare phishing warnings
`cloudflare_bypass`	Boolean	`false`	Auto-solve Cloudflare "Verify you are human" challenges
`cloudflare_parallel_detection`	Boolean	`true`	Use parallel detection for faster Cloudflare checks
`cloudflare_max_retries`	Integer	`3`	Maximum retry attempts for Cloudflare operations
`cloudflare_cache_ttl`	Milliseconds	`300000`	TTL for Cloudflare detection cache (5 minutes)
`cloudflare_retry_on_error`	Boolean	`true`	Enable retry logic for Cloudflare operations
`flowproxy_detection`	Boolean	`false`	Enable flowProxy protection detection and handling
`flowproxy_page_timeout`	Milliseconds	`45000`	Page timeout for flowProxy sites
`flowproxy_nav_timeout`	Milliseconds	`45000`	Navigation timeout for flowProxy sites
`flowproxy_js_timeout`	Milliseconds	`15000`	JavaScript challenge timeout
`flowproxy_delay`	Milliseconds	`30000`	Delay for rate limiting
`flowproxy_additional_delay`	Milliseconds	`5000`	Additional processing delay

WHOIS/DNS Analysis Options

Field	Values	Default	Description
`whois`	Array	-	Check whois data for ALL specified terms (AND logic)
`whois-or`	Array	-	Check whois data for ANY specified term (OR logic)
`whois_delay`	Milliseconds	`3000`	Delay whois requests to avoid throttling
`whois_server`	String or Array	-	Custom whois server(s) - single server or randomized list
`whois_server_mode`	String	`"random"`	Server selection mode: `"random"` or `"cycle"`
`whois_max_retries`	Integer	`2`	Maximum retry attempts per domain
`whois_timeout_multiplier`	Number	`1.5`	Timeout increase multiplier per retry
`whois_use_fallback`	Boolean	`true`	Add TLD-specific fallback servers
`whois_retry_on_timeout`	Boolean	`true`	Retry on timeout errors
`whois_retry_on_error`	Boolean	`true`	Retry on connection/other errors
`dig`	Array	-	Check dig output for ALL specified terms (AND logic)
`dig-or`	Array	-	Check dig output for ANY specified term (OR logic)
`dig_subdomain`	Boolean	`false`	Use subdomain for dig lookup instead of root domain
`digRecordType`	String	`"A"`	DNS record type for dig (A, CNAME, MX, etc.)
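
The AND/OR distinction between `whois` / `whois-or` and `dig` / `dig-or` can be sketched as below. `evaluateLookup` is a hypothetical name; case-insensitive substring matching, and requiring both checks to pass when both keys are set, are assumptions made for the example.

```javascript
// tool is 'whois' or 'dig'; output is the raw text returned by that tool.
function evaluateLookup(site, tool, output) {
  const all = site[tool];          // e.g. site.whois       -> ALL terms required
  const any = site[`${tool}-or`];  // e.g. site['whois-or'] -> ANY term is enough
  const text = output.toLowerCase();
  const has = (term) => text.includes(term.toLowerCase());

  const andOk = all ? all.every(has) : true;
  const orOk = any ? any.some(has) : true;
  return andOk && orOk;
}

const whoisText = 'Registrar: Cloudflare, Inc.\nRegistrant: Privacy service';
console.log(evaluateLookup({ whois: ['cloudflare', 'privacy'] }, 'whois', whoisText));
console.log(evaluateLookup({ 'whois-or': ['namecheap', 'cloudflare'] }, 'whois', whoisText));
```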

Content Analysis Options

Field	Values	Default	Description
`searchstring`	String or Array	-	Text to search in response content (OR logic)
`searchstring_and`	String or Array	-	Text to search with AND logic - ALL terms must be present
`curl`	Boolean	`false`	Use curl to download content for analysis
`grep`	Boolean	`false`	Use grep instead of JavaScript for pattern matching (requires curl=true)

Advanced Browser Options

Field	Values	Default	Description
`goto_options`	Object	`{"waitUntil": "load"}`	Custom page.goto() options
`clear_sitedata`	Boolean	`false`	Clear all cookies, cache, storage before each load
`forcereload`	Boolean or Array	`false`	Force cache-clearing reload for all URLs (`true`) or specific domains (`["domain1.com"]`)
`isBrave`	Boolean	`false`	Spoof Brave browser detection
`evaluateOnNewDocument`	Boolean	`false`	Inject fetch/XHR interceptor in page
`cdp`	Boolean	`false`	Enable CDP logging for this site
`cdp_specific`	Array	-	Enable CDP logging only for specific domains in the URL list
`css_blocked`	Array	-	CSS selectors to hide elements
`source`	Boolean	`false`	Save page source HTML after load
`screenshot`	Boolean	`false`	Capture screenshot on load failure
`headful`	Boolean	`false`	Launch browser with GUI for this site
`localhost`	String	-	Force custom IP format for this site (e.g., "127.0.0.1", "0.0.0.0", "192.168.1.1")
`adblock_rules`	Boolean	`false`	Generate adblock filter rules with resource types for this site
`interact_duration`	Milliseconds	`2000`	Duration of interaction simulation
`interact_scrolling`	Boolean	`true`	Enable scrolling simulation
`interact_clicks`	Boolean	`false`	Enable element clicking simulation
`interact_click_count`	Integer	`3`	Number of random content-zone clicks per load (capped at 20). Default 3 = primary + 2 backups, since ad SDKs sometimes suppress the 1st/2nd click as warmup
`realistic_click`	Boolean	`false`	Higher click fidelity: denser mouse approach (15 steps), ±1px hand-tremor micro-moves during the press, and ±1.5px mouseup drift (so mousedown≠mouseup coords) — for sites that score click realism. Costs ~80–120ms/click
`interact_typing`	Boolean	`false`	Enable typing simulation
`interact_intensity`	String	`"medium"`	Interaction simulation intensity: "low", "medium", "high"
`cursor_mode`	`"ghost"`	-	Use ghost-cursor Bezier mouse movements (requires `npm i ghost-cursor`)
`ghost_cursor_speed`	Number	auto	Ghost-cursor movement speed multiplier
`ghost_cursor_hesitate`	Milliseconds	`50`	Delay before ghost-cursor clicks
`ghost_cursor_overshoot`	Pixels	auto	Max ghost-cursor overshoot distance before correcting
`ghost_cursor_duration`	Milliseconds	`interact_duration` or `2000`	How long ghost-cursor movements run
`dnsmasq`	Boolean	`false`	Force dnsmasq output for this site
`dnsmasq_old`	Boolean	`false`	Force dnsmasq old format output for this site
`unbound`	Boolean	`false`	Force unbound output for this site
`privoxy`	Boolean	`false`	Force Privoxy output for this site
`pihole`	Boolean	`false`	Force Pi-hole regex output for this site
`ignore_similar`	Boolean	-	Override global `ignore_similar` setting for this site
`ignore_similar_threshold`	Integer	-	Override global similarity threshold for this site
`ignore_similar_ignored_domains`	Boolean	-	Override global `ignore_similar_ignored_domains` for this site

Popup Capture Options

Capture (and optionally drive) the popup/popunder windows that ad and redirect scripts open, so domains reachable only via a popup chain still match filterRegex. The same filterRegex applies to the whole chain — it must contain every pattern you expect along it. Popup capture only fires when the main page is actually clicking, so set interact: true and interact_clicks: true as well.

Field	Values	Default	Description
`capture_popups`	Boolean	`false`	Capture popup windows opened during the scan and evaluate their landing URL + in-popup requests against `filterRegex`/`dig`/`whois` (requires `interact` + `interact_clicks` to fire user-gesture clicks)
`interact_popups`	Boolean	`false`	Mouse-click inside captured popups (3 content-zone clicks) so the chain cascades to its next redirect/ad. Requires `capture_popups`. Clicks popups up to `capture_popups_max_depth − 1` (the deepest captured popup is observed, not clicked)
`capture_popups_max_depth`	Integer	`4`	Max popup-chain depth to capture (`site → p1 → p2 → p3 → destination`). Each extra level multiplies popups + time
`capture_popups_window_ms`	Integer	`5000`	Per-popup capture window (ms) before the popup is auto-closed

VPN Options

Route traffic through a VPN for specific sites. Requires sudo privileges. The VPN connection is established before scanning and torn down after the site completes.

Note: VPN modifies system-level routing. During concurrent scanning, all traffic routes through the active tunnel — not just the site that requested it. For isolated per-site VPN, run sites sequentially or use the same VPN config for all concurrent sites.

WireGuard

Field	Values	Default	Description
`vpn`	String or Object	-	WireGuard VPN configuration
`vpn.config`	String	-	Path to `.conf` file or interface name in `/etc/wireguard/`
`vpn.config_inline`	String	-	Inline WireGuard config content
`vpn.interface`	String	auto	Interface name (auto-derived from config filename)
`vpn.health_check`	Boolean	`true`	Ping through tunnel to verify connectivity
`vpn.test_host`	String	`"1.1.1.1"`	Host to ping for health check
`vpn.retry`	Boolean	`true`	Retry on connection failure
`vpn.max_retries`	Integer	`2`	Maximum retry attempts

OpenVPN

Field	Values	Default	Description
`openvpn`	String or Object	-	OpenVPN configuration
`openvpn.config`	String	-	Path to `.ovpn` file
`openvpn.config_inline`	String	-	Inline OpenVPN config content
`openvpn.name`	String	auto	Connection name (auto-derived from config filename)
`openvpn.username`	String	-	VPN username (written to secure temp file)
`openvpn.password`	String	-	VPN password (written to secure temp file)
`openvpn.auth_file`	String	-	Path to existing auth credentials file
`openvpn.health_check`	Boolean	`true`	Ping through tunnel to verify connectivity
`openvpn.test_host`	String	`"1.1.1.1"`	Host to ping for health check
`openvpn.retry`	Boolean	`true`	Retry on connection failure
`openvpn.max_retries`	Integer	`2`	Maximum retry attempts
`openvpn.connect_timeout`	Milliseconds	`30000`	Timeout for connection establishment
`openvpn.extra_args`	Array	-	Additional OpenVPN command line arguments
`openvpn.verbosity`	String	`"3"`	OpenVPN log verbosity level

Authentication: If the .ovpn file already contains credentials (via auth-user-pass /path/to/file or an inline <auth-user-pass> block), no additional config is needed — just provide the config path. The username/password fields are only needed when the .ovpn file has a bare auth-user-pass directive that expects interactive input.

Proxy Options

Route traffic through a proxy for specific sites. Supports SOCKS5, SOCKS4, HTTP, and HTTPS proxies. Unlike VPN, proxy routing is per-site-group — only URLs in the same config block use the proxy; other sites connect directly.

Note: Chromium's --proxy-server flag is browser-wide. Sites requiring different proxies (or direct vs proxied) are automatically separated into different browser instances. Tasks are sorted so proxy groups are contiguous to minimise restarts.

Field	Values	Default	Description
`proxy`	String	-	Proxy URL: `socks5://host:port`, `http://host:port`, `https://host:port`, or `http://user:pass@host:port`
`proxy_bypass`	Array	`[]`	Domains that skip the proxy (e.g. `["localhost", "127.0.0.1", "*.local"]`)
`proxy_remote_dns`	Boolean	`true`	Resolve DNS through the proxy (SOCKS only — prevents DNS leaks)
`proxy_debug`	Boolean	`false`	Print proxy diagnostics: launch args, auth, health checks, error codes

Legacy aliases (socks5_proxy, socks5_bypass, socks5_remote_dns, socks5_debug) are supported for backwards compatibility.

Proxy Examples

SOCKS5 — no auth:

{
  "url": ["https://blocked-site.com/", "https://another-blocked.com/"],
  "proxy": "socks5://127.0.0.1:1080",
  "searchstring": ["tracking.js"]
}

HTTP proxy with credentials:

{
  "url": ["https://geo-restricted.com/"],
  "proxy": "http://user:pass@proxy.corp.com:3128",
  "searchstring": ["analytics"]
}

SOCKS5 with bypass list and debug:

{
  "url": ["https://target-site.com/"],
  "proxy": "socks5://user:pass@proxy.example.com:9050",
  "proxy_bypass": ["localhost", "127.0.0.1", "*.internal.corp"],
  "proxy_remote_dns": true,
  "proxy_debug": true,
  "searchstring": ["tracker"]
}

Mixed direct + proxied in one config:

[
  {
    "url": ["https://direct-site.com/"],
    "searchstring": ["ads"]
  },
  {
    "url": ["https://blocked-site.com/"],
    "proxy": "socks5://127.0.0.1:1080",
    "searchstring": ["ads"]
  }
]

Proxy Error Handling

If a proxy is unreachable, the batch is skipped with a clear error before any navigation is attempted:

[error] [proxy] Unreachable: socks5://127.0.0.1:1080 — Connection refused
[error] [proxy] Skipping 5 URL(s) in this batch

If a proxy fails mid-scan, Chromium's error code is detected and diagnosed:

[error] [proxy] ERR_SOCKS_CONNECTION_FAILED — proxy: socks5://127.0.0.1:1080 — URL: https://example.com/
[error] [proxy] Check: is the proxy running? Are credentials correct? Is the target reachable from the proxy?

Detected error codes: ERR_PROXY_CONNECTION_FAILED, ERR_SOCKS_CONNECTION_FAILED, ERR_TUNNEL_CONNECTION_FAILED, ERR_PROXY_AUTH_UNSUPPORTED, ERR_PROXY_AUTH_REQUESTED, ERR_SOCKS_CONNECTION_HOST_UNREACHABLE, ERR_PROXY_CERTIFICATE_INVALID, ERR_NO_SUPPORTED_PROXIES.
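
If you post-process scan logs yourself, the list of codes above is enough to tell proxy failures apart from ordinary navigation errors. A minimal sketch, with a hypothetical helper name; Chromium navigation errors typically surface in Puppeteer as messages of the form `net::ERR_... at <url>`.

```javascript
const PROXY_ERROR_CODES = [
  'ERR_PROXY_CONNECTION_FAILED',
  'ERR_SOCKS_CONNECTION_FAILED',
  'ERR_TUNNEL_CONNECTION_FAILED',
  'ERR_PROXY_AUTH_UNSUPPORTED',
  'ERR_PROXY_AUTH_REQUESTED',
  'ERR_SOCKS_CONNECTION_HOST_UNREACHABLE',
  'ERR_PROXY_CERTIFICATE_INVALID',
  'ERR_NO_SUPPORTED_PROXIES',
];

// Returns the matching proxy error code, or null if the message is unrelated.
function detectProxyError(message) {
  return PROXY_ERROR_CODES.find((code) => message.includes(code)) || null;
}

console.log(detectProxyError('net::ERR_SOCKS_CONNECTION_FAILED at https://example.com/'));
console.log(detectProxyError('net::ERR_NAME_NOT_RESOLVED at https://example.com/'));
```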

.nwssconfig — Per-Config Settings

Create a .nwssconfig file in the project root to define CLI settings per config file. When a config filename matches a key, those settings are automatically applied. CLI flags merge with and override .nwssconfig settings.

{
  "configs": {
    "config-clean1.json": {
      "output": "outputfile.txt",
      "max_concurrent": 30,
      "dns_cache": true,
      "cache_requests": true,
      "dumpurls": true,
      "remove_tempfiles": true,
      "color": true
    },
    "config-clean2.json": {
      "output": "outputfile.txt",
      "max_concurrent": 15,
      "dns_cache": true,
      "cache_requests": true,
      "dumpurls": true,
      "remove_tempfiles": true,
      "color": true,
      "debug": true,
      "block_ads": "easylist.txt,easyprivacy.txt"
    }
  }
}

Usage:

node nwss.js config-clean1.json                    # uses .nwssconfig settings
node nwss.js config-clean2.json --debug             # .nwssconfig + debug override
node nwss.js config-other.json --max-concurrent 5   # no match in .nwssconfig, uses CLI flags

Supported settings: output, max_concurrent, dns_cache, cache_requests, dumpurls, remove_tempfiles, color, remove_dupes, compress_logs, debug, silent, verbose, headful, keep_open, dry_run, titles, sub_domains, no_interact, ghost_cursor, plain, cdp, dnsmasq, unbound, privoxy, pihole, eval_on_doc, use_puppeteer_core, use_obscura, ignore_cache, clear_cache, block_ads, compare, localhost, append.

Priority: CLI flags > .nwssconfig > hardcoded defaults.
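
The priority order above amounts to a three-layer merge. The sketch below illustrates it with a hypothetical `resolveSettings` helper; matching the key by file basename is an assumption.

```javascript
const path = require('path');

// Later spreads win: defaults < .nwssconfig entry < CLI flags.
function resolveSettings(defaults, nwssConfig, configFile, cliFlags) {
  const perConfig = (nwssConfig.configs || {})[path.basename(configFile)] || {};
  return { ...defaults, ...perConfig, ...cliFlags };
}

const nwssConfig = {
  configs: {
    'config-clean1.json': { output: 'outputfile.txt', max_concurrent: 30, dns_cache: true },
  },
};
const defaults = { max_concurrent: 6, dns_cache: false };

// .nwssconfig entry applies, then the CLI flag overrides max_concurrent.
console.log(resolveSettings(defaults, nwssConfig, 'config-clean1.json', { max_concurrent: 5 }));
// No matching key: only defaults and CLI flags are used.
console.log(resolveSettings(defaults, nwssConfig, 'config-other.json', { max_concurrent: 5 }));
```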

Global Configuration Options

These options go at the root level of your config.json:

Field	Values	Default	Description
`ignoreDomains`	Array	-	Domains to completely ignore (supports wildcards like `*.ads.com`). Subdomains of any listed entry are also ignored via parent-walk (e.g. `example.com` ignores `cdn.example.com` and `a.b.example.com`).
`ignoreDomainsByUrl`	Array	-	Regex patterns; if a request URL matches, the request's root domain is dynamically ignored for the rest of the scan, and so is any subsequent request to its subdomains (the same cascade as the static `ignoreDomains` list). Example: `["\\/jwplayer\\/", "\\/build\\/assets\\/"]`
`blockDomainsByUrl`	Array	-	Symmetric to `ignoreDomainsByUrl` but for active blocking. Regex patterns; if a request URL matches, the request's root domain is added to a dynamic block set and ALL subsequent requests on that root (and subdomains) are aborted via Puppeteer for the rest of the scan. The triggering request itself is also aborted. Use when seeing a trigger URL is sufficient evidence the whole host is hostile.
`blocked`	Array	-	Global regex patterns to block requests (combined with per-site blocked). Patterns that fail to compile are warned about at scan start (`[config] blocked (global) pattern dropped (compile error): ...`) instead of crashing startup or silently disappearing. Per-pattern hit counts are reported at scan end via `[blocked-stats]` lines so stale patterns are easy to spot.
`whois_server_mode`	String	`"random"`	Default server selection mode for all sites
`ignore_similar`	Boolean	`true`	Ignore domains similar to already found domains
`ignore_similar_threshold`	Integer	`80`	Similarity threshold percentage for ignore_similar
`ignore_similar_ignored_domains`	Boolean	`true`	Ignore domains similar to ignoreDomains list
`max_concurrent_sites`	Integer	`6`	Maximum concurrent site processing (1-50)
`resource_cleanup_interval`	Integer	`80`	Browser restart interval in URLs processed (1-1000)
`cache_path`	String	`".cache"`	Directory path for persistent cache storage
`cache_max_size`	Integer	`5000`	Maximum number of entries in cache
`cache_autosave_minutes`	Integer	`1`	Interval for automatic cache saves (minutes)
`cache_requests`	Boolean	`false`	Enable HTTP request response caching
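
The parent-walk described for `ignoreDomains` can be sketched as follows. `isIgnored` is a hypothetical helper that covers only exact entries and their subdomains; wildcard entries such as `*.ads.com` are not handled here.

```javascript
// Walk from the full host up through its parents, stopping before the bare
// TLD, and report whether any level is in the ignore list.
function isIgnored(host, ignoreDomains) {
  const ignored = new Set(ignoreDomains.map((d) => d.toLowerCase()));
  const labels = host.toLowerCase().split('.');
  const stop = Math.max(1, labels.length - 1); // single-label hosts are still checked once
  for (let i = 0; i < stop; i++) {
    if (ignored.has(labels.slice(i).join('.'))) return true;
  }
  return false;
}

console.log(isIgnored('a.b.example.com', ['example.com']));
console.log(isIgnored('notexample.com', ['example.com']));
```

Comparing whole labels rather than using a suffix check is what keeps `notexample.com` from being ignored by an `example.com` entry.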

Special Characters in searchstring

The searchstring parameter supports all characters, including special symbols. Only double quotes and backslashes need JSON escaping:

{
  "searchstring": [
    ")}return n}function N(n,e,r){try{\"function\"==typeof",
    "addEventListener(\"click\",function(){",
    "{\"status\":\"success\",\"data\":[",
    "console.log('Debug: ' + value);",
    "`API endpoint: ${baseUrl}/users`",
    "@media screen and (max-width: 768px)",
    "if(e&&e.preventDefault){e.preventDefault()}",
    "__webpack_require__(/*! ./module */ \"./src/module.js\")",
    "console.log('Hello world')",
    "#header { background-color: #ff0000; }",
    "$(document).ready(function() {",
    "completion: 85% @ $1,500 budget",
    "SELECT * FROM users WHERE id = *",
    "regex: ^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$",
    "typeof window !== 'undefined'"
  ]
}

Character escaping rules:

" becomes \" (required in JSON)
\ becomes \\ (if searching for literal backslashes)
All other characters are used literally: ' ` @ # $ % * ^ [ ] { } ( ) ; = ! ? :
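
The escaping rules above can be checked by parsing a config fragment and looking at the strings that come out. In this sketch the `containsAny` / `containsAll` helpers are hypothetical stand-ins for the OR logic of `searchstring` and the AND logic of `searchstring_and`; plain substring matching is assumed.

```javascript
// String.raw keeps the backslashes exactly as they would appear in config.json.
const rawConfig = String.raw`{
  "searchstring": [
    "addEventListener(\"click\",function(){",
    "regex: ^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$"
  ]
}`;

const parsed = JSON.parse(rawConfig);

// After parsing, \" is a plain double quote and \\ is a single backslash.
console.log(parsed.searchstring[0]);
console.log(parsed.searchstring[1]);

// Literal substring matching: no character has special meaning.
const containsAny = (body, terms) => terms.some((t) => body.includes(t));
const containsAll = (body, terms) => terms.every((t) => body.includes(t));

const body = 'btn.addEventListener("click",function(){ track(); });';
console.log(containsAny(body, parsed.searchstring));
console.log(containsAll(body, parsed.searchstring));
```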

Usage Examples

Basic Scanning

# Scan with default config and output to console
node nwss.js

# Scan and save rules to file
node nwss.js -o blocklist.txt

# Append new rules to existing file
node nwss.js --append -o blocklist.txt

# Clean existing rules and append new ones
node nwss.js --clean-rules --append -o blocklist.txt

Advanced Options

# Debug mode with URL dumping (colored output is on by default)
node nwss.js --debug --dumpurls -o rules.txt

# Dry run to see what would be matched
node nwss.js --dry-run --debug

# Validate configuration before running
node nwss.js --validate-config

# Clean rule files
node nwss.js --clean-rules existing_rules.txt

# Stealth scanning (stealth options such as fingerprint_protection are set per site in config.json)
node nwss.js --debug -o stealth_rules.txt

Performance Tuning

# High-performance scanning with custom concurrency
node nwss.js --max-concurrent 12 --cleanup-interval 300 -o rules.txt

Stealth Configuration Examples

Memory Management with Window Cleanup

{
  "url": [
    "https://popup-heavy-site1.com",
    "https://popup-heavy-site2.com",
    "https://popup-heavy-site3.com"
  ],
  "filterRegex": "\\.(space|website|tech)\\b",
  "window_cleanup": "all",
  "interact": true,
  "reload": 2,
  "resourceTypes": ["script", "fetch"],
  "comments": "Aggressive cleanup for sites that open many popups"
}

Conservative Memory Management

{
  "url": "https://complex-site.com",
  "filterRegex": "analytics|tracking",
  "window_cleanup": true,
  "interact": true,
  "delay": 8000,
  "reload": 3,
  "comments": [
    "Conservative cleanup preserves potentially active content",
    "Good for sites with complex iframe structures"
  ]
}

Ghost Cursor (Advanced Bezier Mouse)

{
  "url": "https://anti-bot-site.com",
  "interact": true,
  "interact_clicks": true,
  "cursor_mode": "ghost",
  "realistic_click": true,
  "interact_click_count": 3,
  "ghost_cursor_duration": 5000,
  "ghost_cursor_speed": 1.2,
  "fingerprint_protection": "random",
  "filterRegex": "tracking|analytics",
  "comments": "ghost-cursor uses Bezier curves with overshoot for realistic mouse paths"
}

Or enable globally via CLI:

node nwss.js --ghost-cursor --debug -o rules.txt

Ghost-cursor clicks. The cursor moves with cursor_mode: "ghost", but it only clicks when both interact: true and interact_clicks: true are set (same rule as the built-in path). Click behavior:

realistic_click: true — each press adds hand-tremor during the hold and a mouseup drift, so mousedown ≠ mouseup coordinates (the press is routed through the same humanClick the built-in content clicks use).
interact_click_count — number of clicks per load (default 3, capped at 20). The default of 3 matters because some ad SDKs swallow the 1st/2nd click as warmup.
Duration vs. clicks: realistic clicks take ~600–700ms each, and the Bezier movement loop reserves up to half of ghost_cursor_duration for them. The default ghost_cursor_duration: 2000 therefore only fits ~1 click. To fit all of them, raise it to roughly interact_click_count × 700 + movement (e.g. 5000–8000).
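
The duration arithmetic above can be sketched as a small calculation. This is only an illustration, not code from the scanner: the helper names are hypothetical, the click cost is fixed at the upper estimate of 700ms, and "up to half" of the duration is treated as exactly half.

```javascript
// Assumptions taken from the note above (not read from the scanner's source):
// a realistic click costs about 700 ms, and half of ghost_cursor_duration
// is available for clicks.
const CLICK_COST_MS = 700;

// Hypothetical helper: how many realistic clicks fit into a given duration.
function clicksThatFit(durationMs) {
  const clickBudgetMs = durationMs / 2;
  return Math.floor(clickBudgetMs / CLICK_COST_MS);
}

// Hypothetical helper: smallest duration whose click budget covers all clicks.
function minDurationForClicks(clickCount) {
  return clickCount * CLICK_COST_MS * 2;
}

console.log(clicksThatFit(2000));      // 1
console.log(clicksThatFit(5000));      // 3
console.log(minDurationForClicks(3));  // 4200
```

Under these assumptions the default of 2000 leaves room for a single click, while the 5000 used in the example config fits the default interact_click_count of 3.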

Note: ghost-cursor is an optional dependency. Install with npm install ghost-cursor. If not installed, the scanner falls back to the built-in mouse simulation automatically.

E-commerce Site Scanning

{
  "url": "https://shopping-site.com",
  "userAgent": "chrome",
  "fingerprint_protection": "random",
  "referrer_headers": {
    "mode": "random_search",
    "search_terms": ["product reviews", "best deals", "price comparison"]
  },
  "interact": true,
  "delay": 6000,
  "filterRegex": "analytics|tracking|ads"
}

News Site Analysis

{
  "url": "https://news-site.com",
  "userAgent": "firefox",
  "fingerprint_protection": true,
  "referrer_headers": {"mode": "social_media"},
  "custom_headers": {
    "Accept-Language": "en-US,en;q=0.9"
  },
  "filterRegex": "doubleclick|googletagmanager"
}

Tech Blog with Custom Referrers

{
  "url": "https://tech-blog.com",
  "fingerprint_protection": "random",
  "referrer_headers": {
    "mode": "custom",
    "custom": [
      "https://news.ycombinator.com/",
      "https://www.reddit.com/r/programming/",
      "https://lobste.rs/"
    ]
  }
}

Scanning through OpenVPN

{
  "url": "https://geo-restricted-site.com",
  "openvpn": "/etc/openvpn/us-server.ovpn",
  "filterRegex": "tracking|analytics",
  "userAgent": "chrome",
  "fingerprint_protection": "random"
}

OpenVPN with Credentials

{
  "url": "https://region-locked-site.com",
  "openvpn": {
    "config": "/etc/openvpn/eu-server.ovpn",
    "username": "vpn_user",
    "password": "vpn_pass",
    "connect_timeout": 45000
  },
  "filterRegex": "ads|tracking"
}

Scanning through WireGuard

{
  "url": "https://another-site.com",
  "vpn": "/etc/wireguard/wg-us.conf",
  "filterRegex": "analytics",
  "userAgent": "firefox"
}

Memory Management

The scanner includes intelligent window management to prevent memory accumulation during long scans:

Conservative cleanup (window_cleanup: true): Selectively closes pages that appear to be leftovers from previous scans
Aggressive cleanup (window_cleanup: "all"): Closes all content pages from previous operations for maximum memory recovery
Main window preservation: Both modes always preserve the main Puppeteer browser window to maintain stability
Popup window handling: Automatically detects and closes popup windows created by previous site scans
Timing protection: 16-second delay ensures no active operations are interrupted during cleanup
Active page protection: Never affects pages currently being processed by concurrent scanning operations
Memory reporting: Reports estimated memory freed from closed windows for performance monitoring

Use aggressive cleanup for sites that open many popups or when processing large numbers of URLs. Use conservative cleanup when you want to preserve potentially active content but still free obvious leftovers.
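
The difference between the two modes can be illustrated with a small selection function. This is a sketch of the rules as described above, not the scanner's implementation: the page descriptor shape (id, isMain, isActive, looksLikeLeftover) and the function name are hypothetical, and how the scanner actually decides that a page "appears to be a leftover" is not covered here.

```javascript
// Hypothetical page descriptor: { id, isMain, isActive, looksLikeLeftover }.
// mode mirrors the window_cleanup values: true (conservative) or "all" (aggressive).
function pagesToClose(pages, mode) {
  if (mode !== true && mode !== 'all') return []; // cleanup not enabled
  return pages.filter((page) => {
    // Both modes keep the main window and pages that are currently being scanned.
    if (page.isMain || page.isActive) return false;
    // Aggressive closes every remaining page; conservative only obvious leftovers.
    return mode === 'all' ? true : page.looksLikeLeftover === true;
  });
}

const pages = [
  { id: 'main', isMain: true, isActive: false, looksLikeLeftover: false },
  { id: 'scan-in-progress', isMain: false, isActive: true, looksLikeLeftover: false },
  { id: 'old-popup', isMain: false, isActive: false, looksLikeLeftover: true },
  { id: 'idle-page', isMain: false, isActive: false, looksLikeLeftover: false },
];

console.log(pagesToClose(pages, true).map((p) => p.id));  // [ 'old-popup' ]
console.log(pagesToClose(pages, 'all').map((p) => p.id)); // [ 'old-popup', 'idle-page' ]
```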

INSTALL

The steps below use Ubuntu as an example. NOTE: Use Google Chrome rather than Chromium for best compatibility.

Add Google's signing key

wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg

Add Google Chrome repository

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list

Update and install

sudo apt update
sudo apt install google-chrome-stable

dig & whois (needed for network checks)

sudo apt install bind9-dnsutils whois

OpenVPN (optional, for per-site VPN routing)

sudo apt install openvpn

Grant passwordless sudo for OpenVPN operations:

sudo visudo -f /etc/sudoers.d/openvpn-nwss

Add:

your_username ALL=(root) NOPASSWD: /usr/sbin/openvpn, /usr/bin/kill, /usr/bin/pgrep, /usr/bin/pkill

On WSL2, load the TUN module (required each reboot):

sudo modprobe tun

WireGuard (optional, for per-site VPN routing)

sudo apt install wireguard

Grant passwordless sudo for WireGuard operations:

sudo visudo -f /etc/sudoers.d/wg-nwss

Add:

your_username ALL=(root) NOPASSWD: /usr/bin/wg-quick, /usr/bin/wg

Notes

If both firstParty: 0 and thirdParty: 0 are set for a site, it will be skipped.
ignoreDomains applies globally across all sites.
ignoreDomains supports wildcards (e.g., *.ads.com matches tracker.ads.com)
Blocking (blocked) can match full domains or regex.
If a site's blocked field is missing, no extra blocking is applied.
--clean-rules with --append will clean existing files first, then append new rules
--remove-dupes works with all output modes and removes duplicates from final output
Validation tools help ensure rule files are properly formatted before use
--remove-tempfiles removes Chrome/Puppeteer temporary files before exiting, which avoids disk space issues
For maximum stealth, combine fingerprint_protection: "random" with appropriate referrer_headers modes
User agents are automatically updated to latest versions (Chrome 131, Firefox 133, Safari 18.2)
Referrer headers work independently from fingerprint protection - use both for best results
VPN connections (vpn/openvpn) are established before scanning and torn down after the site completes
If an .ovpn file contains embedded credentials, no additional auth config is needed in the JSON
VPN affects system-level routing — all concurrent scans will route through the active tunnel
Both vpn (WireGuard) and openvpn can be set, but vpn takes precedence
Ghost-cursor (cursor_mode: "ghost") is optional — install with npm i ghost-cursor. Falls back to built-in mouse if not installed
Ghost-cursor duration defaults to interact_duration (or 2000ms), capped by the 15s hard timeout
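
The ignoreDomains wildcard note above (*.ads.com matches tracker.ads.com) can be illustrated with a minimal matcher. This is a sketch under assumed semantics, not the scanner's own matching code: the function names are hypothetical, a leading "*." is assumed to match any subdomain but not the bare domain, and a pattern without a wildcard is assumed to require an exact hostname match.

```javascript
// Assumed semantics (not confirmed against the scanner's source):
// "*.ads.com" matches any subdomain of ads.com; other patterns match exactly.
function matchesIgnorePattern(hostname, pattern) {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith('*.')) {
    // Drop the "*" and compare against the ".ads.com" suffix.
    return host.endsWith(pat.slice(1));
  }
  return host === pat;
}

// Hypothetical helper: true if any ignoreDomains entry matches the hostname.
function isIgnored(hostname, ignoreDomains) {
  return ignoreDomains.some((pattern) => matchesIgnorePattern(hostname, pattern));
}

console.log(isIgnored('tracker.ads.com', ['*.ads.com'])); // true
console.log(isIgnored('notads.com', ['*.ads.com']));      // false
console.log(isIgnored('example.com', ['example.com']));   // true
```

Matching on the ".ads.com" suffix rather than on "ads.com" keeps unrelated hosts such as notads.com from being ignored by accident.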

Stealth Testing

scripts/test-stealth.js is a developer-facing smoke test for the fingerprint spoofing stack in lib/fingerprint.js. It launches Puppeteer with the same applyAllFingerprintSpoofing call that nwss.js uses, navigates to public bot-detection pages, and reports what they concluded — pass/warn/fail counts from sannysoft, trust score from creepjs, raw navigator values from browserleaks.

Use it to A/B a stealth change: run before the edit, run after, diff the output. Not a unit test — it doesn't assert; it reports.

node scripts/test-stealth.js                  # all targets, human-readable
node scripts/test-stealth.js sannysoft        # one target only
node scripts/test-stealth.js --no-spoof       # baseline (spoof disabled)
node scripts/test-stealth.js --ua=firefox     # spoof a different UA family
node scripts/test-stealth.js --format=json    # machine-readable for diff/jq
node scripts/test-stealth.js --help           # full flag list

Set PUPPETEER_NO_SANDBOX=1 when running as root (CI containers, some Docker setups). Off by default so local dev doesn't silently drop the Chromium sandbox.
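
For the before/after comparison described above, a generic JSON diff is one way to see what changed between two --format=json runs. The sketch below is illustrative only: the function names are hypothetical, and the sample objects are made-up data, since the actual fields of the JSON report are not documented here.

```javascript
// Flatten nested JSON into { "a.b.c": value } pairs so two reports compare key by key.
function flatten(value, prefix = '', out = {}) {
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Return every flattened key whose value differs between the two reports.
function diffReports(before, after) {
  const a = flatten(before);
  const b = flatten(after);
  const changes = [];
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    if (a[key] !== b[key]) changes.push({ key, before: a[key], after: b[key] });
  }
  return changes;
}

// Made-up sample data; the real report fields may differ.
const before = { sannysoft: { pass: 20, fail: 3 } };
const after = { sannysoft: { pass: 22, fail: 1 } };
console.log(diffReports(before, after));
```

In practice the two inputs would come from saved runs, for example by redirecting the --format=json output to before.json and after.json (assuming the JSON is written to stdout) and parsing each file with JSON.parse.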
