Providers
The 24 image sources webfetch federates — auth, rate limits, license defaults, and when to enable each one.
webfetch ships with 24 provider adapters. Each one normalizes a different source into the same ImageCandidate shape, tagged with a license and a confidence score. The DEFAULT_PROVIDERS set — 19 of the 24 — is optimized for "ship a safe image without human review". The other five are opt-in for ToS-grey, paid, or variant use cases.
#Provider matrix
| Provider | Coverage | Default license | Auth | Rate limit | Opt-in |
|---|---|---|---|---|---|
| wikimedia | portraits, events, logos, history | CC_BY_SA (from meta) | none (UA required) | 20/s | no |
| openverse | any CC-licensed content (50+ GLAMs) | CC_BY (from meta) | none | 5/s | no |
| unsplash | high-quality photography | CC0 (Unsplash License) | UNSPLASH_ACCESS_KEY | 1/s demo, 5000/hr prod | no |
| pexels | stock photography | CC0 (Pexels License) | PEXELS_API_KEY | 3/s | no |
| pixabay | stock photos + illustrations | CC0 (Pixabay License) | PIXABAY_API_KEY | 2/s | no |
| itunes | album covers, artist portraits | EDITORIAL_LICENSED | none | 5/s | no |
| musicbrainz-caa | canonical album art | EDITORIAL_LICENSED | none (UA required) | 1/s hard cap | no |
| spotify | artist + album images | EDITORIAL_LICENSED | SPOTIFY_CLIENT_ID/SECRET | 10/s | no |
| youtube-thumb | video thumbnail by id/URL | EDITORIAL_LICENSED | none | 20/s | no |
| brave | general web image search | UNKNOWN (+heuristic) | BRAVE_API_KEY | 1/s | no |
| bing | general web image search | UNKNOWN (+heuristic) | BING_API_KEY | 3/s | yes |
| serpapi | Google Images + reverse-image | UNKNOWN (+heuristic) | SERPAPI_KEY | 2/s | yes |
| browser | headless fallback on images.google | UNKNOWN (+heuristic) | Playwright installed | 0.25/s | yes |
| flickr | CC-licensed + PD photos | CC_BY (from meta) | FLICKR_API_KEY | 3/s | no |
| internet-archive | PD / CC images | PUBLIC_DOMAIN | none | 5/s | no |
| smithsonian | Open Access museum (100% CC0) | CC0 | SMITHSONIAN_API_KEY (DEMO_KEY) | 1/s | no |
| nasa | NASA imagery (all public domain) | PUBLIC_DOMAIN | none | 5/s | no |
| met-museum | The Met Open Access (CC0) | CC0 | none | 4/s | no |
| europeana | European cultural heritage | CC_BY (from meta) | EUROPEANA_API_KEY | 5/s | no |
| library-of-congress | US historical archive | PUBLIC_DOMAIN | none | 10/s | no |
| wellcome-collection | medical/historical imagery | CC_BY (from meta) | none | 5/s | no |
| rawpixel | CC0 slice of Rawpixel library | CC0 | optional | 3/s | no |
| burst | Shopify Burst (100% CC0) | CC0 | none | 3/s | no |
| europeana-archival | Europeana TEXT records | CC_BY (from meta) | EUROPEANA_API_KEY | 5/s | yes |
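If you call providers from your own scheduler, the Rate limit column translates directly into a minimum gap between requests. The sketch below is only an illustration of that arithmetic; `minIntervalMs` and `nextStartMs` are hypothetical helpers, not webfetch APIs, and the three budgets are copied from the matrix.

```typescript
// Hypothetical helpers, not part of webfetch.
// Per-second budgets copied from three rows of the provider matrix.
const PER_SECOND = {
  wikimedia: 20,
  "musicbrainz-caa": 1,
  browser: 0.25,
};

// Minimum gap between request starts, in milliseconds.
function minIntervalMs(perSecond: number): number {
  return 1000 / perSecond;
}

// Earliest time the next request may start, given when the last one started.
function nextStartMs(lastStartMs: number, nowMs: number, perSecond: number): number {
  return Math.max(nowMs, lastStartMs + minIntervalMs(perSecond));
}

// browser at 0.25/s means one request every 4000 ms.
const browserGapMs = minIntervalMs(PER_SECOND.browser);

// Last MusicBrainz call started at t=1000 ms and it is now t=1200 ms:
// the 1/s cap pushes the next call to t=2000 ms.
const nextCaaStart = nextStartMs(1000, 1200, PER_SECOND["musicbrainz-caa"]);

// Wikimedia at 20/s only needs 50 ms, so the next call can start immediately.
const nextWikimediaStart = nextStartMs(1000, 1200, PER_SECOND.wikimedia);
```

The slower the budget, the more it dominates end-to-end latency, which is one reason to keep browser out of routine queries.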
#Default set
DEFAULT_PROVIDERS runs when you don't pass --providers. It excludes every ToS-grey source and everything with per-query cost. Included by default:
wikimedia, openverse, itunes, musicbrainz-caa, unsplash, pexels, pixabay,
spotify, brave, internet-archive, smithsonian, nasa, met-museum, flickr,
europeana, library-of-congress, wellcome-collection, rawpixel, burst

Providers that need auth skip gracefully when their keys are missing, so the default set always produces results; it simply produces more when you provision keys.
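To see ahead of time which providers would be skipped for missing keys, you can check your environment against the Auth column of the matrix. This is a minimal sketch under stated assumptions: `splitByCredentials` and `REQUIRED_KEYS` are hypothetical names, not part of webfetch, and the check assumes an empty variable counts as missing. smithsonian and rawpixel are left out of the map because the matrix lists a DEMO_KEY fallback and an optional key for them.

```typescript
// Hypothetical helper, not part of webfetch.
type Env = Record<string, string | undefined>;

// Required variables per provider, taken from the Auth column of the matrix.
const REQUIRED_KEYS: Record<string, string[]> = {
  unsplash: ["UNSPLASH_ACCESS_KEY"],
  pexels: ["PEXELS_API_KEY"],
  pixabay: ["PIXABAY_API_KEY"],
  spotify: ["SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET"],
  brave: ["BRAVE_API_KEY"],
  flickr: ["FLICKR_API_KEY"],
  europeana: ["EUROPEANA_API_KEY"],
};

function splitByCredentials(
  ids: string[],
  env: Env,
): { ready: string[]; skipped: string[] } {
  const ready: string[] = [];
  const skipped: string[] = [];
  for (const id of ids) {
    // Providers without an entry need no key and are always ready.
    const keys = REQUIRED_KEYS[id] ?? [];
    const hasAll = keys.every((k) => (env[k] ?? "") !== "");
    (hasAll ? ready : skipped).push(id);
  }
  return { ready, skipped };
}

// spotify is skipped because only one of its two variables is set.
const sample = splitByCredentials(
  ["wikimedia", "unsplash", "spotify", "nasa"],
  { UNSPLASH_ACCESS_KEY: "abc", SPOTIFY_CLIENT_ID: "id-only" },
);
// sample.ready   -> ["wikimedia", "unsplash", "nasa"]
// sample.skipped -> ["spotify"]
```

In a real script you would pass your process environment as `env`; it is a parameter here so the check stays easy to test.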
#When to enable opt-in providers
| Provider | Enable when |
|---|---|
| bing | Your queries are English-web-heavy and Brave coverage is thin. |
| serpapi | You need Google-quality recall and accept per-query cost (~$10/1k queries). |
| browser | You explicitly accept ToS risk and have Playwright installed. |
| europeana-archival | You're building editorial or historical layouts and want manuscript / newspaper scans. |
Opt-in providers run only when the providers list names them. browser additionally requires WEBFETCH_ENABLE_BROWSER=1. See Browser layer for the consent model.
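The gating rule can be sketched as a small resolver. This is an illustration only: `resolveProviders` is a hypothetical function, not webfetch's actual resolver, and it assumes that browser is silently dropped (rather than raising an error) when the flag is unset.

```typescript
// Hypothetical sketch of the gating rule; not webfetch's actual resolver.
function resolveProviders(
  requested: string[] | undefined,
  defaults: string[],
  env: Record<string, string | undefined>,
): string[] {
  // No explicit list: only the defaults run, so opt-in ids never appear.
  const ids = requested ?? defaults;
  // browser needs the environment flag in addition to being listed.
  const browserAllowed = env["WEBFETCH_ENABLE_BROWSER"] === "1";
  return ids.filter((id) => id !== "browser" || browserAllowed);
}

// Default list shortened for the example.
const exampleDefaults = ["wikimedia", "openverse"];

// No providers list: defaults only.
const noList = resolveProviders(undefined, exampleDefaults, {});

// bing is opt-in but named, so it runs; browser is named but the flag is unset.
const flagUnset = resolveProviders(["brave", "bing", "browser"], exampleDefaults, {});

// Both conditions hold, so browser runs.
const flagSet = resolveProviders(["brave", "browser"], exampleDefaults, {
  WEBFETCH_ENABLE_BROWSER: "1",
});
```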
#Per-use-case tuning
#Musicians (artist pages, album pages)
webfetch artist "Taylor Swift" --kind portrait \
  --providers spotify,musicbrainz-caa,itunes,wikimedia

These four cover ~98% of releases with editorial-licensed or CC-BY-SA media. Spotify needs SPOTIFY_CLIENT_ID + SPOTIFY_CLIENT_SECRET.
#Editorial / journalism (portraits, events, landmarks)
webfetch search "shinzo abe state funeral" \
  --providers wikimedia,openverse,unsplash

Wikimedia carries news events and portraits with metadata-backed CC licensing. Openverse federates 50+ GLAM collections. Unsplash fills stylistic gaps.
#Stock photography (marketing, blog hero images)
webfetch search "modern kitchen interior" \
  --providers unsplash,pexels,pixabay --min-width 1600

CC0-equivalent platform licenses, no attribution required. Fastest pipeline of the three.
#Science / nature / reference
webfetch search "axolotl" --providers wikimedia,openverse
#Public-domain archival
webfetch search "moon landing" \
  --providers nasa,library-of-congress,internet-archive,smithsonian
#General web (last-resort coverage)
webfetch search "new product category xyz" \
  --providers wikimedia,openverse,brave --license prefer-safe

prefer-safe keeps unknown-license results in the output but ranks them below the safe ones.
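The difference between ranking and filtering can be sketched as follows. This is a rough model, not webfetch's ranking code: the `Candidate` fields, the `preferSafe` and `safeOnly` functions, and the rule that only UNKNOWN counts as unsafe are all assumptions made for the example.

```typescript
// Hypothetical model of a result; field names are assumptions.
type LicenseTag =
  | "CC0"
  | "PUBLIC_DOMAIN"
  | "CC_BY"
  | "CC_BY_SA"
  | "EDITORIAL_LICENSED"
  | "UNKNOWN";

interface Candidate {
  url: string;
  license: LicenseTag;
  confidence: number;
}

// Assumption: only UNKNOWN is treated as "not safe" in this sketch.
const unknownRank = (c: Candidate): number => (c.license === "UNKNOWN" ? 1 : 0);

// prefer-safe style: keep everything, but sort known licenses first,
// then by descending confidence within each group.
function preferSafe(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) => unknownRank(a) - unknownRank(b) || b.confidence - a.confidence,
  );
}

// safe-only style: drop unknown-license results entirely.
function safeOnly(candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => c.license !== "UNKNOWN");
}

const results: Candidate[] = [
  { url: "https://example.com/a.jpg", license: "UNKNOWN", confidence: 0.9 },
  { url: "https://example.com/b.jpg", license: "CC0", confidence: 0.85 },
  { url: "https://example.com/c.jpg", license: "CC_BY_SA", confidence: 0.95 },
];

const ranked = preferSafe(results).map((c) => c.url);
// c.jpg, b.jpg, then a.jpg last despite its 0.9 confidence
const filtered = safeOnly(results).map((c) => c.url);
// b.jpg, c.jpg
```

Ranking suits last-resort coverage, where an unknown-license hit is still better than no hit for a human reviewer to look at.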
#Gotchas
- Wikimedia returns mixed licenses; coercion is 100% metadata-driven. We never default to "probably CC".
- Openverse sets license_type=commercial,modification so results are commercial-safe by default. Results without metadata are dropped.
- Unsplash / Pexels / Pixabay technically aren't CC0; they use custom licenses that track CC0 terms. We map to CC0 with confidence 0.85 (vs 0.95 for real CC0).
- iTunes / MusicBrainz CAA / Spotify tag as EDITORIAL_LICENSED. This means "OK as part of album/artist identification UI under platform ToS". Always display attribution.
- Brave returns web images with no structured license; we upgrade via heuristicLicenseFromUrl (Unsplash host → CC0, Commons host → CC-BY-SA pending verification, etc).
- Bing: Microsoft retired the classic Bing Search API mid-2025. This adapter still targets v7. Callers may need to swap endpoints.
- SerpAPI: opt-in because it's a paid relay. Most valuable for find_similar (Google reverse image search).
- browser: only runs when WEBFETCH_ENABLE_BROWSER=1 AND "browser" is in providers. Every result is flagged viaBrowserFallback: true; downstream code can refuse to ship it.
- Smithsonian: DEMO_KEY is fine for dev but hits a 30/hr rate limit under real load. Provision SMITHSONIAN_API_KEY.
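Several of these gotchas come down to a check you can run downstream before shipping an image. A minimal sketch, assuming results expose `license`, `confidence`, and `viaBrowserFallback` fields: only the last name appears in this page, the other two field names, the `canShip` function, and the 0.9 threshold are hypothetical.

```typescript
// Hypothetical result shape; only viaBrowserFallback is named in the docs above.
interface GateCandidate {
  url: string;
  license: string;
  confidence: number;
  viaBrowserFallback?: boolean;
}

// Hypothetical downstream gate, not a webfetch API.
function canShip(c: GateCandidate, minConfidence: number = 0.9): boolean {
  // Refuse anything that came through the headless browser fallback.
  if (c.viaBrowserFallback === true) return false;
  // Refuse results whose license could not be determined.
  if (c.license === "UNKNOWN") return false;
  // With a 0.9 threshold, platform-license "CC0" (0.85) is held back
  // while metadata-backed CC0 (0.95) passes.
  return c.confidence >= minConfidence;
}

const realCc0: GateCandidate = {
  url: "https://example.com/met.jpg",
  license: "CC0",
  confidence: 0.95,
};
const platformCc0: GateCandidate = {
  url: "https://example.com/stock.jpg",
  license: "CC0",
  confidence: 0.85,
};
const viaBrowser: GateCandidate = {
  url: "https://example.com/scraped.jpg",
  license: "CC0",
  confidence: 0.95,
  viaBrowserFallback: true,
};

const decisions = [realCc0, platformCc0, viaBrowser].map((c) => canShip(c));
// [true, false, false]; lowering the threshold to 0.8 admits platformCc0
const relaxed = canShip(platformCc0, 0.8);
```

Whether 0.85 is good enough is a policy decision for your product; the point of the separate confidence values is that you can draw that line yourself.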
#Bring your own keys
webfetch reads keys from environment variables. The full set:
export UNSPLASH_ACCESS_KEY=...
export PEXELS_API_KEY=...
export PIXABAY_API_KEY=...
export BRAVE_API_KEY=...
export BING_API_KEY=...
export SERPAPI_KEY=...
export SPOTIFY_CLIENT_ID=...
export SPOTIFY_CLIENT_SECRET=...
export FLICKR_API_KEY=...
export EUROPEANA_API_KEY=...
export SMITHSONIAN_API_KEY=...
export RAWPIXEL_API_KEY=...

Or put them in ~/.webfetchrc under a profile. See CLI reference for the config file format.
#Picking providers programmatically
If you're calling the core library directly, the same ids work:
import { searchImages } from "@webfetch/core";

const out = await searchImages("kyoto temple", {
  providers: ["wikimedia", "openverse", "flickr"],
  licensePolicy: "safe-only",
  minWidth: 1200,
});

For more rationale on why license-first is a correctness concern and not a preference, see License safety.