We measured our research page against four benchmarks: Anthropic's research blog (inline hyperlinks, methodology disclosure), Stripe's guides (structured navigation, code examples), McKinsey's publications (survey methodology, raw data release), and a16z's thought leadership (named authors, education-first framing). The audit found systemic gaps in source documentation, citation practice, and evidence presentation. The existing content is solid. The evidence standards need to match the quality of the thinking.
The Benchmark
Across Anthropic, Stripe, McKinsey, and a16z, the minimum research quality bar includes:

- Every factual claim links to its source (inline, not just endnotes).
- Methodology is disclosed (how data was collected, sample size, limitations).
- Authors or institutional attribution is visible.
- Data is presented visually where appropriate.
- Vendor-sourced data is disclosed as such.
- Limitations are acknowledged before conclusions.
Anthropic uses inline hyperlinks to prior work, figures with captions, and categorizes posts as Features, Workflows, and Field Notes. Stripe uses table-of-contents sidebars, card callouts, and embeds documentation standards in engineering career ladders. McKinsey discloses survey sample sizes and released 206 pages of raw data when their ACA survey methodology was questioned. a16z publishes with named authors and positions content to educate rather than sell.
Current State: The Numbers
| Metric | Result |
|---|---|
| Papers with clickable source links | 11/20 (55%) |
| Papers with ZERO clickable links | 9/20 (45%) |
| Papers using inline citations | 0/20 (0%) |
| Papers with methodology disclosure | 0/20 (0%) |
| Author attribution | 0/20 (0%) |
| Data visualization (charts/figures) | 0/20 (0%) |
| Limitations section | 16/20 (80%) |
| Summary callout box | 20/20 (100%) |
| Total sources cited across all papers | ~238 (avg 11.9/paper) |
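The link-count metrics above can be reproduced mechanically. A minimal sketch, assuming each paper is a Markdown file in a single directory (the file layout, function names, and regex are illustrative assumptions, not the audit's actual tooling):

```python
import re
from pathlib import Path

# Matches inline Markdown links like [text](https://example.com);
# endnote-only references deliberately do not match.
INLINE_LINK = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")

def count_source_links(paper: Path) -> int:
    """Count clickable (inline-Markdown) source links in one paper."""
    return len(INLINE_LINK.findall(paper.read_text(encoding="utf-8")))

def audit(papers_dir: str) -> dict[str, int]:
    """Map each paper filename to its clickable-link count."""
    return {
        p.name: count_source_links(p)
        for p in sorted(Path(papers_dir).glob("*.md"))
    }
```

A paper scoring zero here lands in the "ZERO clickable links" row; everything else counts toward the 11/20.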
Evidence Quality Issues Found
Recycled unattributed statistics. The claim “42% of companies scrapped AI initiatives in 2024” appears in at least 3 papers without a traceable source in any of them.
Vendor-sourced data presented as objective. Multiple papers cite vendor marketing data (Oscar Chat, Keepme, Stammer AI) as findings rather than vendor assertions, without disclosing the source's financial interest.
Dated research cited as current. A 2011 Harvard Business Review statistic about lead response appears in multiple papers without acknowledging that the study is well over a decade old.
Conflicting data within papers. The White-Label AI paper cites two market projections ($82.46 billion by 2034 vs. $41.39 billion by 2030) without reconciling their different time horizons, scopes, or sources.
Paper-by-Paper Assessment
Industry Guides are the strongest category (all have clickable links, consistent structure, limitations sections). Agent Orchestration is the single best paper. Methodology papers are the weakest category (4/6 have zero links).
| Paper | Source links | Grade | Primary issue |
|---|---|---|---|
| Agent Orchestration | 18 | A- | Analyst predictions not hedged |
| Lead Capture | 23 | B+ | Vendor-sourced claims need disclosure |
| Restaurant AI | 20 | B+ | $20.1B methodology unclear |
| Platform Integration | 8 | B+ | “92% of photographers” unsourced |
| D1 Sports Videography | 17 | B+ | 7-site sample generalized broadly |
| Conversation Intelligence | 0 | C+ | Zero source links |
| Cross-Model Consensus | 0 | C | Describes system without proving it works |
| Agent Visual Design | 0 | C | 6 sources, none linked, unsubstantiated framework |
The Fix: Prioritized Improvements
Priority 1: Add source links to 9 papers (~4.5 hours). The single most impactful change. A paper without verifiable sources is a blog post, not research. Papers needing links include: Conversation Intelligence, Agent Visual Design, Visualizing AI Agents, Infrastructure Trust, Cross-Model Consensus, Structural Role Separation, and Cinematic Portfolio Design.
Priority 2: Inline citations (~15 hours). Move from endnote-only sources to inline hyperlinks. When a paper says “42% of companies scrapped AI initiatives,” the number should link to the source.
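In Markdown terms, the shift looks like this (the footnote marker and URL are placeholders, since the original claim currently has no traceable source):

```markdown
<!-- Before: endnote-only citation -->
42% of companies scrapped AI initiatives in 2024.[^3]

<!-- After: inline citation -->
[42% of companies](https://example.com/placeholder-study) scrapped AI initiatives in 2024.
```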
Priority 3: Vendor disclosure (~3 hours). Add disclosure when citing vendor-sourced data. “Oscar Chat (a chatbot vendor) reports...” vs. the current “chatbots capture 3x more leads.”
Priority 4: Methodology sections (~10 hours). Add methodology disclosure to papers making empirical claims. Sample size, data collection method, scoring criteria.
Priority 5: Data visualization (~15 hours). Add at least one chart or figure to papers presenting numerical data. The booking funnel with dollar values at each stage should be visual, not prose.
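As a stopgap before producing real charts, even a proportional text rendering beats prose. A minimal sketch, using illustrative stage names and dollar values (not the paper's actual funnel figures):

```python
# Hypothetical booking-funnel stages and dollar values; substitute
# the paper's real figures before publishing anything.
FUNNEL = [
    ("Site visits", 50_000),
    ("Inquiries",   12_000),
    ("Quotes sent",  6_500),
    ("Bookings",     2_400),
]

def render_funnel(stages, width=30):
    """Render each stage as a text bar proportional to the largest value."""
    peak = max(value for _, value in stages)
    lines = []
    for name, value in stages:
        bar = "#" * max(1, round(width * value / peak))
        lines.append(f"{name:<12} ${value:>7,} {bar}")
    return "\n".join(lines)

print(render_funnel(FUNNEL))
```

The same data, once verified, drops straight into any charting tool for the published figure.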
Total estimated effort: ~47.5 hours across the five priorities. The highest-impact fix (P1: source links) takes under 5 hours and immediately moves 9 papers from unverifiable to properly sourced.
Sources
- Anthropic Research Page — research blog structure and citation practices
- Anthropic Alignment Science Blog — Features, Workflows, Field Notes categorization
- Stripe Guides — guide format and design patterns
- Mintlify: How Stripe Creates Documentation — writing culture and documentation standards
- a16z News and Content — thought leadership organization
- EI Exchange: a16z Thought Leadership Analysis — education-first content strategy
- BlogSEO: Inline Citations in Content Authority — citation credibility impact