DATA METHODOLOGY
How the 13D Watch data moat is built.
Ten primary sources, mostly government and regulatory, are ingested on cron, normalized into Postgres, cross-referenced by issuer CIK and ticker, and enriched with AI analyst briefs. Every claim on the site traces to a primary source.
Data sources
The pipeline ingests the following ten sources directly from their authoritative origins. No third-party aggregators sit between us and the source.
- SEC EDGAR — Schedule 13D, 13G, 13D/A, 13G/A · Activist beneficial-ownership filings · refreshed daily at 11:00 UTC. Both the filing index and the primary_doc.xml body are parsed; Item 4 (Purpose), percent_owned, and CUSIP are extracted from the document body.
- SEC EDGAR — Form 4 · Insider transaction reports · refreshed every 6 hours. Issuer CIK / ticker / name extracted from the XML body, including option exercises, awards, and direct/indirect ownership.
- SEC EDGAR — Form 13F-HR · Quarterly institutional holdings for 10 tracked elite portfolio managers (Berkshire Hathaway, Pershing Square, Baupost, Appaloosa, Third Point, Duquesne Family Office, Viking Global, Tiger Global, Greenlight Capital, Lone Pine).
- FINRA — Consolidated Short Interest · Every settlement-date snapshot (~21K symbols per snapshot, published roughly every two weeks), via the api.finra.org REST API.
- CFTC — Commitment of Traders (Disaggregated) · Weekly futures positioning across 565 contracts back to 2019.
- U.S. Treasury — Auction Results · Bid-to-cover, high yield, dealer take, direct/indirect bid percentages.
- U.S. Treasury — TIC Foreign Holdings · Monthly foreign holdings of U.S. securities by 60 countries.
- FRED (St. Louis Fed) · 51 macro series including yield curve, TIPS real yields, breakeven inflation, credit spreads, BAA/AAA yields, ICE/BofA HY/IG indices, FX, commodities, Fed balance sheet.
- BLS — Sector Employment · 16 series, monthly with revision tracking.
- CoinGecko — Crypto Spot · BTC, ETH, SOL price + market cap + dominance + total volume, daily.
Ingestion architecture
All ingestion runs in a single Cloudflare Worker (data-moat-ingest) on cron triggers. Each source has its own ingest function with idempotent upserts. Raw payloads are also stored in an R2 bucket (data-moat-bronze) as immutable bronze-layer snapshots, so any extraction logic change can be back-applied without re-fetching from the source.
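A minimal sketch of the two properties described above, idempotent upserts and write-once bronze snapshots. It uses in-memory maps as stand-ins for Postgres and R2 so it runs on its own. The row shape, the choice of accession number as the conflict key, and the `source/date/id.json` key layout are all assumptions made for illustration, not the pipeline's actual schema. In Postgres the same upsert behavior would typically come from `INSERT ... ON CONFLICT`.

```typescript
// Hypothetical row shape; the real activist_filings columns are not shown here.
interface FilingRow {
  accessionNumber: string; // assumed natural key for the upsert
  issuerCik: string;
  percentOwned: number | null;
}

// In-memory stand-ins for the Postgres table and the R2 bronze bucket.
const table = new Map<string, FilingRow>();
const bronze = new Map<string, string>();

// Assumed key layout: <source>/<YYYY-MM-DD>/<id>.json
function bronzeKey(source: string, fetchedAt: Date, id: string): string {
  const day = fetchedAt.toISOString().slice(0, 10);
  return `${source}/${day}/${id}.json`;
}

// Write-once: an existing snapshot is never overwritten.
function storeBronze(key: string, rawPayload: string): boolean {
  if (bronze.has(key)) return false;
  bronze.set(key, rawPayload);
  return true;
}

// Idempotent upsert: replaying the same row leaves exactly one row per key.
function upsertFiling(row: FilingRow): void {
  table.set(row.accessionNumber, row);
}

const example: FilingRow = {
  accessionNumber: "0000000000-24-000001", // made-up accession number
  issuerCik: "0000000001",
  percentOwned: 5.2,
};
upsertFiling(example);
upsertFiling(example); // a cron retry replays the same filing
console.log(table.size); // 1
```

Because the raw payload is kept under a stable key, a changed extraction rule can be re-run over `bronze` and pushed through the same upsert without contacting the source again.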
Storage: Supabase Postgres (project ref acouoyxkzrqtcbimveyl). Tables include activist_filings, insider_transactions, institutional_holdings, short_interest, treasury_auctions, tic_holdings, cot_positions, normalized_series (FRED + BLS + CoinGecko), entities. Views activist_latest and institutional_consensus serve common joins.
Cross-referencing and enrichment
Each new activist filing is joined to the other sources at query time, not pre-computed:
- Insider activity: joined by issuer_cik against insider_transactions in the trailing 365 days.
- Institutional consensus: joined by cusip, then by normalized issuer name as a fallback (uppercase, strip trailing entity suffixes like INC/CORP/PLC/LTD).
- Short interest: joined by issuer_ticker against the latest FINRA settlement-date snapshot. Tickers are resolved from activist_filings.issuer_ticker first; if null, looked up via the SEC's company_tickers.json file (10,357 mappings, 24h edge-cached).
- Issuer cluster: count of distinct activist filings on the same issuer_cik in the last 90 days.
- Filer history: count of filings by the same filer_cik in the last 90 days.
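The fallback rules in the list above can be sketched as three small helpers: issuer-name normalization, ticker resolution, and a trailing-window count. This is an illustration under stated assumptions, not the site's actual query code. The suffix list contains only the four examples named in the text, stripping periods and commas before matching is an assumption, and keeping the first ticker seen per CIK is an assumption (a CIK can map to several tickers). The `cik_str` / `ticker` / `title` entry shape is that of the SEC's company_tickers.json file.

```typescript
// Only the suffixes named in the text; the real list is likely longer.
const SUFFIX_RE = /\s+(INC|CORP|PLC|LTD)$/;

// Uppercase, drop periods and commas (assumed), then strip trailing suffixes.
function normalizeIssuerName(name: string): string {
  let s = name.toUpperCase().replace(/[.,]/g, " ").replace(/\s+/g, " ").trim();
  while (SUFFIX_RE.test(s)) s = s.replace(SUFFIX_RE, "");
  return s;
}

// One entry of company_tickers.json, which is an object keyed by "0", "1", ...
interface SecTickerEntry {
  cik_str: number;
  ticker: string;
  title: string;
}

function buildCikToTicker(file: Record<string, SecTickerEntry>): Map<number, string> {
  const byCik = new Map<number, string>();
  for (const key of Object.keys(file)) {
    const entry = file[key];
    // Assumption: keep the first ticker listed for each CIK.
    if (!byCik.has(entry.cik_str)) byCik.set(entry.cik_str, entry.ticker);
  }
  return byCik;
}

// Prefer the ticker stored on the filing; fall back to the SEC mapping.
function resolveTicker(
  filingTicker: string | null,
  issuerCik: string,
  byCik: Map<number, string>
): string | null {
  if (filingTicker) return filingTicker;
  return byCik.get(Number(issuerCik)) ?? null; // Number() drops zero padding
}

// Hypothetical filing shape used only for the window count.
interface Filing {
  accessionNumber: string;
  issuerCik: string;
  filerCik: string;
  filedAt: Date;
}

// Distinct filings matching `value` on `key` within the trailing `days`.
function countRecent(
  filings: Filing[],
  key: "issuerCik" | "filerCik",
  value: string,
  now: Date,
  days = 90
): number {
  const cutoff = now.getTime() - days * 24 * 60 * 60 * 1000;
  const seen = new Set<string>();
  for (const f of filings) {
    if (f[key] === value && f.filedAt.getTime() >= cutoff) seen.add(f.accessionNumber);
  }
  return seen.size;
}

console.log(normalizeIssuerName("Acme Holdings, Inc.")); // ACME HOLDINGS
```

The same `countRecent` helper covers both the issuer cluster (`key = "issuerCik"`) and filer history (`key = "filerCik"`), since the two differ only in the column being matched.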
AI analyst briefs
Each filing receives a 1-2 sentence analyst brief plus an intent classification, generated by Anthropic's Claude Sonnet 4.6 via OpenRouter. The model receives the full enrichment context (filer, issuer, stake, Item 4 text, insider activity, institutional consensus, short interest, issuer cluster, filer history) and returns strict JSON: { summary, intent, confidence }.
Intent classes: passive, strategic, m&a-arb, board-change, capital-structure, distressed, unclear. Briefs are cached indefinitely per accession_number in Cloudflare KV; because filings are immutable, a brief never needs regeneration. Cost is roughly $0.005 per filing, or about $0.04/day at steady state, which implies around eight new filings per day.
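A sketch of how the strict JSON contract and the per-accession cache might be enforced. The validator rejects anything that is not exactly a `{ summary, intent, confidence }` object with a known intent class. The 0 to 1 range for `confidence` is an assumption, since the scale is not stated, and the `Map` is a stand-in for Cloudflare KV. `parseBrief` and `getBrief` are hypothetical names.

```typescript
const INTENTS = [
  "passive",
  "strategic",
  "m&a-arb",
  "board-change",
  "capital-structure",
  "distressed",
  "unclear",
] as const;
type Intent = (typeof INTENTS)[number];

interface Brief {
  summary: string;
  intent: Intent;
  confidence: number; // assumed to be in [0, 1]
}

// Returns a validated brief, or null if the model output breaks the contract.
function parseBrief(raw: string): Brief | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { summary, intent, confidence } = data as Record<string, unknown>;
  if (typeof summary !== "string" || summary.length === 0) return null;
  if (typeof intent !== "string" || !(INTENTS as readonly string[]).includes(intent)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  return { summary, intent: intent as Intent, confidence };
}

// Stand-in for Cloudflare KV, keyed by accession number.
const briefCache = new Map<string, Brief>();

// Generate at most once per filing; invalid output is not cached.
async function getBrief(
  accessionNumber: string,
  generate: () => Promise<string> // hypothetical wrapper around the model call
): Promise<Brief | null> {
  const hit = briefCache.get(accessionNumber);
  if (hit) return hit;
  const brief = parseBrief(await generate());
  if (brief) briefCache.set(accessionNumber, brief);
  return brief;
}
```

Not caching rejected output means a malformed response costs one extra call on the next request rather than leaving a bad brief in place permanently.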
Refresh cadence
Source freshness as currently scheduled:
- EDGAR Schedule 13D/G + activist enrichment backfill: daily at 11:00 UTC
- EDGAR Form 4: every 6 hours
- FRED + Treasury auctions + CoinGecko: daily at 11:00 UTC
- FINRA short interest: daily check, new settlements typically every ~2 weeks
- 13F holdings: quarterly (manual trigger, ~45 days after quarter end)
- CFTC COT: weekly, Saturday 14:00 UTC
- BLS employment: monthly, 5th of month 16:00 UTC
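The schedule above could be expressed as a cron-to-job table along these lines. The cron expressions are inferred from the stated times and the job names are hypothetical; neither is taken from the actual Worker configuration. The time of the daily FINRA check is not stated, so grouping it with the 11:00 UTC batch is a guess. The quarterly 13F run is omitted because it is triggered manually. Cloudflare Workers pass the matching expression to the scheduled handler as `cron`, which is what the lookup keys on.

```typescript
// Assumed cron expressions (UTC) mapped to hypothetical job names.
const SCHEDULE: Record<string, string[]> = {
  "0 11 * * *": [
    "edgar-13d-13g",
    "fred",
    "treasury-auctions",
    "coingecko",
    "finra-short-interest", // daily check; the exact time is a guess
  ],
  "0 */6 * * *": ["edgar-form4"],
  "0 14 * * SAT": ["cftc-cot"],
  "0 16 5 * *": ["bls-employment"],
};

// Minimal stand-in for the scheduled event; only `cron` is used.
function handleScheduled(event: { cron: string }): string[] {
  // Unknown expressions run nothing rather than throwing.
  return SCHEDULE[event.cron] ?? [];
}

console.log(handleScheduled({ cron: "0 */6 * * *" })); // [ 'edgar-form4' ]
```

Keying on the expression string means each lookup key must match the trigger as configured, character for character.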
Accuracy and known limitations
Some data quality limits we disclose openly:
- Item 4 extraction: Item 4 is populated on ~86% of activist_filings rows. Schedule 13D/A amendments often do not repeat Item 4 text, only the items being amended. Where Item 4 is null, the filing detail page shows "Read the raw SEC filing" rather than fabricating.
- Insider activity coverage: ~10% of Form 4 rows have populated issuer fields. The remainder are non-Form-4 filings that EDGAR's atom feed surfaces under type=4 (e.g., 424B2 prospectuses). Our parser now filters these out at ingest, so this percentage will grow over time as the legacy junk is purged.
- Institutional consensus: only 10 elite portfolio managers tracked. Most small-cap and microcap activist targets have zero coverage from this filer set. This is a market reality, not a data bug.
- Short interest: FINRA snapshot is bi-weekly, not real-time. The page shows the latest available settlement date.
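Two of the safeguards above can be sketched briefly: dropping non-Form-4 entries at ingest, and falling back to a link prompt when Item 4 is missing. The `FeedEntry` shape and the decision to keep `4/A` amendments alongside original Form 4 filings are assumptions. The fallback string is the one quoted in the list.

```typescript
// Hypothetical shape of one parsed atom-feed entry.
interface FeedEntry {
  accessionNumber: string;
  formType: string;
}

// Assumption: amendments (4/A) are kept along with original Form 4 filings.
const FORM4_TYPES = new Set(["4", "4/A"]);

// Drop entries the feed returned under type=4 that are not Form 4 filings.
function keepForm4(entries: FeedEntry[]): FeedEntry[] {
  return entries.filter((e) => FORM4_TYPES.has(e.formType.trim().toUpperCase()));
}

// Show extracted Item 4 text, or the fallback when nothing was extracted.
function item4Display(item4: string | null): string {
  return item4 !== null && item4.trim().length > 0 ? item4 : "Read the raw SEC filing";
}

// Share of non-empty values, used for figures like the ~86% population rate.
function populationRate(values: Array<string | null>): number {
  if (values.length === 0) return 0;
  const filled = values.filter((v) => v !== null && v.trim().length > 0).length;
  return filled / values.length;
}

const kept = keepForm4([
  { accessionNumber: "x-1", formType: "4" },
  { accessionNumber: "x-2", formType: "424B2" },
]);
console.log(kept.length); // 1
```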
For methodology questions or data-quality reports, contact contact@13dwatch.com.
The infrastructure that powers this is the moat.
One pipeline. Ten sources. Cross-referenced at query time. Built on Cloudflare Workers + Supabase.