Providers

The 24 image sources webfetch federates — auth, rate limits, license defaults, and when to enable each one.

webfetch ships with 24 provider adapters. Each one normalizes a different source into the same ImageCandidate shape, tagged with a license and a confidence score. The DEFAULT_PROVIDERS set — 19 of the 24 — is optimized for "ship a safe image without human review". The other five are opt-in for ToS-grey, paid, or variant use cases.
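
As a rough illustration of what "the same ImageCandidate shape" could look like, here is a minimal sketch. Only the license tags, the confidence score, and the `viaBrowserFallback` flag come from this page. The other field names, the `RawStockPhoto` record, and the `fromStockPhoto` helper are hypothetical and may not match the library's real types.

```typescript
// Hypothetical sketch: `url`, `provider`, RawStockPhoto, and fromStockPhoto are
// assumptions made for illustration, not the library's actual definitions.
type LicenseTag =
  | "CC0"
  | "PUBLIC_DOMAIN"
  | "CC_BY"
  | "CC_BY_SA"
  | "EDITORIAL_LICENSED"
  | "UNKNOWN";

interface ImageCandidate {
  url: string;
  provider: string;
  license: LicenseTag;
  confidence: number; // 0..1, how sure the adapter is about `license`
  viaBrowserFallback?: boolean;
}

// Invented raw record standing in for one source's API response.
interface RawStockPhoto {
  downloadUrl: string;
}

// Normalize a platform-licensed stock photo: mapped to CC0 at 0.85 confidence,
// the value this page gives for custom licenses that track CC0 terms.
function fromStockPhoto(provider: string, raw: RawStockPhoto): ImageCandidate {
  return { url: raw.downloadUrl, provider, license: "CC0", confidence: 0.85 };
}

const candidate = fromStockPhoto("unsplash", {
  downloadUrl: "https://example.com/a.jpg",
});
console.log(candidate);
```

Each adapter would own one such mapping function, so downstream ranking and filtering code only ever sees the normalized shape.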

#Provider matrix

| Provider | Coverage | Default license | Auth | Rate limit | Opt-in |
| --- | --- | --- | --- | --- | --- |
| wikimedia | portraits, events, logos, history | CC_BY_SA (from meta) | none (UA required) | 20/s | no |
| openverse | any CC-licensed content (50+ GLAMs) | CC_BY (from meta) | none | 5/s | no |
| unsplash | high-quality photography | CC0 (Unsplash License) | UNSPLASH_ACCESS_KEY | 1/s demo, 5000/hr prod | no |
| pexels | stock photography | CC0 (Pexels License) | PEXELS_API_KEY | 3/s | no |
| pixabay | stock photos + illustrations | CC0 (Pixabay License) | PIXABAY_API_KEY | 2/s | no |
| itunes | album covers, artist portraits | EDITORIAL_LICENSED | none | 5/s | no |
| musicbrainz-caa | canonical album art | EDITORIAL_LICENSED | none (UA required) | 1/s hard cap | no |
| spotify | artist + album images | EDITORIAL_LICENSED | SPOTIFY_CLIENT_ID/SECRET | 10/s | no |
| youtube-thumb | video thumbnail by id/URL | EDITORIAL_LICENSED | none | 20/s | no |
| brave | general web image search | UNKNOWN (+heuristic) | BRAVE_API_KEY | 1/s | no |
| bing | general web image search | UNKNOWN (+heuristic) | BING_API_KEY | 3/s | yes |
| serpapi | Google Images + reverse-image | UNKNOWN (+heuristic) | SERPAPI_KEY | 2/s | yes |
| browser | headless fallback on images.google | UNKNOWN (+heuristic) | Playwright installed | 0.25/s | yes |
| flickr | CC-licensed + PD photos | CC_BY (from meta) | FLICKR_API_KEY | 3/s | no |
| internet-archive | PD / CC images | PUBLIC_DOMAIN | none | 5/s | no |
| smithsonian | Open Access museum (100% CC0) | CC0 | SMITHSONIAN_API_KEY (DEMO_KEY) | 1/s | no |
| nasa | NASA imagery (all public domain) | PUBLIC_DOMAIN | none | 5/s | no |
| met-museum | The Met Open Access (CC0) | CC0 | none | 4/s | no |
| europeana | European cultural heritage | CC_BY (from meta) | EUROPEANA_API_KEY | 5/s | no |
| library-of-congress | US historical archive | PUBLIC_DOMAIN | none | 10/s | no |
| wellcome-collection | medical/historical imagery | CC_BY (from meta) | none | 5/s | no |
| rawpixel | CC0 slice of Rawpixel library | CC0 | optional | 3/s | no |
| burst | Shopify Burst (100% CC0) | CC0 | none | 3/s | no |
| europeana-archival | Europeana TEXT records | CC_BY (from meta) | EUROPEANA_API_KEY | 5/s | yes |
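
To make the rate-limit column concrete, the sketch below converts a per-second rate into a minimum gap between requests and tracks when each provider may next be called. The three rates are copied from the matrix. `minIntervalMs` and `ProviderThrottle` are hypothetical names, and nothing here is claimed to mirror how webfetch throttles internally.

```typescript
// Rates copied from the matrix, in requests per second.
const RATE_PER_SECOND: Record<string, number> = {
  wikimedia: 20,
  "musicbrainz-caa": 1,
  browser: 0.25,
};

// Minimum gap between two requests to the same provider, in milliseconds.
function minIntervalMs(provider: string): number {
  const rate = RATE_PER_SECOND[provider];
  if (rate === undefined) throw new Error(`unknown provider: ${provider}`);
  return 1000 / rate;
}

// Hypothetical limiter: remembers the earliest time each provider may be called.
class ProviderThrottle {
  private nextAllowed = new Map<string, number>();

  // Returns how long the caller should wait (ms) before sending,
  // and reserves that slot so the next caller queues behind it.
  reserve(provider: string, nowMs: number): number {
    const earliest = Math.max(nowMs, this.nextAllowed.get(provider) ?? 0);
    this.nextAllowed.set(provider, earliest + minIntervalMs(provider));
    return earliest - nowMs;
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic and easy to test. A real caller would pass `Date.now()` and sleep for the returned number of milliseconds.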

#Default set

DEFAULT_PROVIDERS runs when you don't pass --providers. It excludes every ToS-grey source and everything with per-query cost. Included by default:

wikimedia, openverse, itunes, musicbrainz-caa, unsplash, pexels, pixabay,
spotify, brave, internet-archive, smithsonian, nasa, met-museum, flickr,
europeana, library-of-congress, wellcome-collection, rawpixel, burst

Providers that need auth are skipped gracefully when their keys are missing, so the default set still runs with no keys configured. Provisioning keys simply adds more sources and more results.
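
The skip behavior can be pictured as a simple partition over the provider list. This is a minimal sketch, assuming the key names from the Auth column of the matrix. `REQUIRED_KEYS` and `partitionByKeys` are hypothetical and not part of the webfetch API.

```typescript
// Required environment variables per provider, taken from the Auth column.
// Providers not listed here are treated as needing no key (assumption:
// smithsonian falls back to DEMO_KEY and rawpixel's key is optional).
const REQUIRED_KEYS: Record<string, string[]> = {
  unsplash: ["UNSPLASH_ACCESS_KEY"],
  pexels: ["PEXELS_API_KEY"],
  pixabay: ["PIXABAY_API_KEY"],
  spotify: ["SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"],
  brave: ["BRAVE_API_KEY"],
  flickr: ["FLICKR_API_KEY"],
  europeana: ["EUROPEANA_API_KEY"],
};

type Env = Record<string, string | undefined>;

// Hypothetical helper: splits a provider list into runnable and skipped ids.
function partitionByKeys(
  providers: string[],
  env: Env,
): { runnable: string[]; skipped: string[] } {
  const runnable: string[] = [];
  const skipped: string[] = [];
  for (const id of providers) {
    // A provider is skipped if any of its required keys is unset or empty.
    const missing = (REQUIRED_KEYS[id] ?? []).filter((key) => !env[key]);
    (missing.length === 0 ? runnable : skipped).push(id);
  }
  return { runnable, skipped };
}
```

In Node you would pass `process.env` as the second argument. With only `UNSPLASH_ACCESS_KEY` set, a list of `wikimedia`, `unsplash`, and `spotify` would run the first two and skip `spotify`.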

#When to enable opt-in providers

| Provider | Enable when |
| --- | --- |
| bing | Your queries are English-web-heavy and Brave coverage is thin. |
| serpapi | You need Google-quality recall and accept per-query cost (~$10/1k queries). |
| browser | You explicitly accept ToS risk and have Playwright installed. |
| europeana-archival | You're building editorial or historical layouts and want manuscript / newspaper scans. |

Opt-in providers run only when they are named in the providers list. The browser provider has a second gate: WEBFETCH_ENABLE_BROWSER=1 must also be set. See Browser layer for the consent model.
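
The two gates can be sketched as a small resolver. `resolveProviders` and the `OPT_IN` set are hypothetical names used for illustration. The rules they encode (explicit listing, plus the environment flag for browser) are the ones described above.

```typescript
// The four providers this page marks as opt-in.
const OPT_IN = new Set(["bing", "serpapi", "browser", "europeana-archival"]);

// Hypothetical resolver; `defaults` stands in for DEFAULT_PROVIDERS.
function resolveProviders(
  requested: string[] | undefined,
  defaults: string[],
  env: Record<string, string | undefined>,
): string[] {
  // Gate 1: with no explicit list, use the defaults and never add opt-in ids.
  const ids = requested ?? defaults.filter((id) => !OPT_IN.has(id));
  // Gate 2: browser also needs the environment flag, even when listed.
  const browserEnabled = env["WEBFETCH_ENABLE_BROWSER"] === "1";
  return ids.filter((id) => id !== "browser" || browserEnabled);
}
```

Listing `browser` without the flag drops it silently in this sketch. A real implementation might prefer to warn so the caller knows why it did not run.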

#Per-use-case tuning

#Musicians (artist pages, album pages)

webfetch artist "Taylor Swift" --kind portrait \
  --providers spotify,musicbrainz-caa,itunes,wikimedia

These four cover ~98% of releases with editorial-licensed or CC-BY-SA media. Spotify needs SPOTIFY_CLIENT_ID + SPOTIFY_CLIENT_SECRET.

#Editorial / journalism (portraits, events, landmarks)

webfetch search "shinzo abe state funeral" \
  --providers wikimedia,openverse,unsplash

Wikimedia carries news events and portraits with metadata-backed CC licensing. Openverse federates 50+ GLAM collections. Unsplash fills stylistic gaps.

#Stock photography (marketing, blog hero images)

webfetch search "modern kitchen interior" \
  --providers unsplash,pexels,pixabay --min-width 1600

CC0-equivalent platform licenses, no attribution required. Fastest pipeline of the three.

#Science / nature / reference

webfetch search "axolotl" --providers wikimedia,openverse

#Public-domain archival

webfetch search "moon landing" \
  --providers nasa,library-of-congress,internet-archive,smithsonian

#General web (last-resort coverage)

webfetch search "new product category xyz" \
  --providers wikimedia,openverse,brave --license prefer-safe

prefer-safe keeps unknown-license results in the output but ranks them below the safe ones.
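
The difference between prefer-safe and safe-only comes down to ranking versus filtering. The sketch below illustrates it under one loud assumption: only `UNKNOWN` counts as unsafe. The library may draw that line differently, and `applyPolicy`, `Candidate`, and `isSafe` are hypothetical names.

```typescript
type LicensePolicy = "safe-only" | "prefer-safe";

interface Candidate {
  url: string;
  license: string;
  confidence: number;
}

// Assumption: only UNKNOWN is treated as unsafe in this sketch.
const isSafe = (c: Candidate): boolean => c.license !== "UNKNOWN";

function applyPolicy(candidates: Candidate[], policy: LicensePolicy): Candidate[] {
  // safe-only: drop unknown-license results entirely.
  if (policy === "safe-only") return candidates.filter(isSafe);
  // prefer-safe: keep everything, safe results first, then by confidence descending.
  return [...candidates].sort(
    (a, b) => Number(isSafe(b)) - Number(isSafe(a)) || b.confidence - a.confidence,
  );
}
```

Under prefer-safe, an unknown-license result with high confidence still sorts below a safe result with low confidence, because safety is compared first.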

#Gotchas

  • Wikimedia returns mixed licenses; coercion is 100% metadata-driven. We never default to "probably CC".
  • Openverse sets license_type=commercial,modification so results are commercial-safe by default. Results without metadata are dropped.
  • Unsplash / Pexels / Pixabay technically aren't CC0 — they use custom licenses that track CC0 terms. We map to CC0 with confidence 0.85 (vs 0.95 for real CC0).
  • iTunes / MusicBrainz CAA / Spotify results are tagged EDITORIAL_LICENSED, meaning "OK as part of album/artist identification UI under platform ToS". Always display attribution.
  • Brave returns web images with no structured license; we upgrade via heuristicLicenseFromUrl (Unsplash host → CC0, Commons host → CC-BY-SA pending verification, etc.).
  • Bing — Microsoft retired the classic Bing Search API mid-2025. This adapter still targets v7. Callers may need to swap endpoints.
  • SerpAPI — opt-in because it's a paid relay. Most valuable for find_similar (Google reverse image search).
  • browser — only runs when WEBFETCH_ENABLE_BROWSER=1 AND "browser" is in providers. Every result is flagged viaBrowserFallback: true; downstream code can refuse to ship it.
  • Smithsonian — DEMO_KEY is fine for dev but hits a 30/hr rate limit under real load. Provision SMITHSONIAN_API_KEY.
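
A host-based license upgrade like the one described for Brave results might look roughly like the sketch below. It is a stand-in, not the real heuristicLicenseFromUrl: the function name `guessLicenseFromUrl`, the `LicenseGuess` shape, and the exact host names matched are all assumptions.

```typescript
interface LicenseGuess {
  license: "CC0" | "CC_BY_SA" | "UNKNOWN";
  needsVerification: boolean;
}

// Hypothetical stand-in for a host-based heuristic; host list is an assumption.
function guessLicenseFromUrl(rawUrl: string): LicenseGuess {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    // Unparseable URLs never get upgraded.
    return { license: "UNKNOWN", needsVerification: false };
  }
  // Match the domain itself or any subdomain, but not look-alike hosts.
  const onHost = (domain: string): boolean =>
    host === domain || host.endsWith(`.${domain}`);

  if (onHost("unsplash.com")) return { license: "CC0", needsVerification: false };
  // Commons hosts mixed licenses, so the guess is flagged for verification.
  if (onHost("wikimedia.org")) return { license: "CC_BY_SA", needsVerification: true };
  return { license: "UNKNOWN", needsVerification: false };
}
```

Matching on a dot boundary matters here: a plain suffix check would wrongly upgrade a host such as `notunsplash.com`.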

#Bring your own keys

webfetch reads keys from environment variables. The full set:

export UNSPLASH_ACCESS_KEY=...
export PEXELS_API_KEY=...
export PIXABAY_API_KEY=...
export BRAVE_API_KEY=...
export BING_API_KEY=...
export SERPAPI_KEY=...
export SPOTIFY_CLIENT_ID=...
export SPOTIFY_CLIENT_SECRET=...
export FLICKR_API_KEY=...
export EUROPEANA_API_KEY=...
export SMITHSONIAN_API_KEY=...
export RAWPIXEL_API_KEY=...

Or put them in ~/.webfetchrc under a profile. See CLI reference for the config file format.

#Picking providers programmatically

If you're calling the core library directly, the same ids work:

import { searchImages } from "@webfetch/core";

const out = await searchImages("kyoto temple", {
  providers: ["wikimedia", "openverse", "flickr"],
  licensePolicy: "safe-only",
  minWidth: 1200,
});

For more rationale on why license-first is a correctness concern and not a preference, see License safety.