
Web Scraping & Search: Crawling, Tavily, Exa

Master web content extraction with Graphlit. Learn web crawling patterns, site mapping, search API integration (Tavily, Exa), and web-to-markdown conversion.

Web crawling and search APIs let you ingest competitor sites, documentation, and search results into Graphlit. Graphlit can crawl a site by following links or by reading its sitemap, and it can ingest results from the Tavily and Exa search APIs.

What You'll Learn

  • Web crawling configuration
  • Site mapping and link following
  • Domain and path filtering
  • Tavily search integration
  • Exa search integration
  • Web-to-markdown conversion

Part 1: Web Crawling

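import { Graphlit } from 'graphlit-client';
import { FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Assumes Graphlit credentials are supplied via environment variables
const graphlit = new Graphlit();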

// Crawl documentation site
const webCrawl = await graphlit.createFeed({
  name: 'Documentation Crawler',
  type: FeedServiceTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    readLimit: 500,
    allowedDomains: ['docs.example.com'],  // Stay on domain
    excludedPaths: ['/api/', '/archive/'],  // Skip sections
    depth: 3  // Follow links 3 levels deep
  }
});

What happens:

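  1. Starts at the configured uri
  2. Extracts each page's content and converts it to markdown
  3. Follows links, but only to hosts listed in allowedDomains and outside excludedPaths
  4. Stops once readLimit pages have been read or the depth limit is reached, whichever comes first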
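The steps above amount to a bounded breadth-first traversal. The sketch below illustrates that idea over a hypothetical in-memory link graph; it is not Graphlit's implementation. The names `LinkGraph`, `CrawlOptions`, `isAllowed`, and `crawl` are invented for illustration, and treating `excludedPaths` as path prefixes is an assumption.

```typescript
// Hypothetical in-memory site: each URL maps to the links found on that page.
type LinkGraph = Record<string, string[]>;

interface CrawlOptions {
  readLimit: number;        // maximum number of pages to read
  depth: number;            // maximum number of link hops from the start URL
  allowedDomains: string[]; // hostnames the crawler may visit
  excludedPaths: string[];  // path prefixes to skip (assumed prefix matching)
}

// A link is followed only if its host is allowed and its path is not excluded.
function isAllowed(url: string, options: CrawlOptions): boolean {
  const { hostname, pathname } = new URL(url);
  if (!options.allowedDomains.includes(hostname)) return false;
  return !options.excludedPaths.some((prefix) => pathname.startsWith(prefix));
}

// Breadth-first crawl, level by level, returning pages in the order they were read.
function crawl(start: string, graph: LinkGraph, options: CrawlOptions): string[] {
  const visited = new Set<string>([start]);
  const pages: string[] = [];
  let frontier: string[] = [start];

  for (let level = 0; level <= options.depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (pages.length >= options.readLimit) return pages; // read budget exhausted
      pages.push(url); // stands in for "fetch and convert to markdown"
      for (const link of graph[url] ?? []) {
        if (!visited.has(link) && isAllowed(link, options)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // move one hop further from the start URL
  }
  return pages;
}
```

Breadth-first order means that when readLimit is small, the pages closest to the start URL are read first, which is usually what you want for documentation sites.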

Part 2: Sitemap Crawling

// Crawl from sitemap.xml
const sitemapCrawl = await graphlit.createFeed({
  name: 'Sitemap Crawler',
  type: FeedServiceTypes.Sitemap,
  sitemap: {
    uri: 'https://example.com/sitemap.xml',
    readLimit: 1000
  }
});

Benefits:

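  • Faster than link following, because the URL list is read directly instead of being discovered page by page
  • Covers every page the sitemap lists, including pages that are not linked from elsewhere
  • Follows the structure the site owner publishes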
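To make the first benefit concrete, here is a minimal sketch of how a URL list can be pulled from sitemap XML without fetching any pages. `extractSitemapUrls` is a hypothetical helper, not part of the Graphlit client, and the regular expression is a simplification that handles plain `<loc>` entries only (no sitemap index files or CDATA).

```typescript
// Extract page URLs from sitemap XML, stopping once readLimit URLs are collected.
function extractSitemapUrls(xml: string, readLimit: number): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;

  while (urls.length < readLimit && (match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so decode it back into a usable URL.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}
```

Because the whole list is available up front, readLimit simply truncates it; there is no depth setting to tune.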

Part 3: Tavily Search

// Ingest Tavily search results
const tavilyFeed = await graphlit.createFeed({
  name: 'Tavily Search',
  type: FeedServiceTypes.TavilySearch,
  tavilySearch: {
    apiKey: process.env.TAVILY_API_KEY,
    query: 'AI agent frameworks',
    maxResults: 50
  }
});

Use cases:

  • Market research
  • Competitive intelligence
  • Trend analysis

Part 4: Exa Search

// Ingest Exa search results
const exaFeed = await graphlit.createFeed({
  name: 'Exa Search',
  type: FeedServiceTypes.ExaSearch,
  exaSearch: {
    apiKey: process.env.EXA_API_KEY,
    query: 'machine learning papers',
    maxResults: 100
  }
});
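Both search feeds read their API key from process.env, which yields undefined when the variable is not set. A small guard like the one below fails fast with a clear message instead of sending an empty key. `requireEnv` and the `Env` type are hypothetical helpers written for this guide, not part of the Graphlit client.

```typescript
// Environment shaped like process.env: values may be missing.
type Env = Record<string, string | undefined>;

// Return the named variable, or throw if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In the feeds above you could then write apiKey: requireEnv(process.env, 'TAVILY_API_KEY') or apiKey: requireEnv(process.env, 'EXA_API_KEY').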

Production Patterns

Competitive Intelligence

// Crawl competitor docs
const competitors = [
  'https://competitor1.com/docs',
  'https://competitor2.com/docs',
  'https://competitor3.com/docs'
];

for (const url of competitors) {
  await graphlit.createFeed({
    name: `Competitor: ${url}`,
    type: FeedServiceTypes.Web,
    web: {
      uri: url,
      readLimit: 200,
      allowedDomains: [new URL(url).hostname]
    }
  });
}

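// Search ingested content for pricing and feature mentions.
// Feeds ingest asynchronously, so run this after the crawls have finished.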
const features = await graphlit.queryContents({
  search: 'pricing features'
});
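The loop above calls new URL(url), which throws on a malformed entry and would abort the remaining crawls. One option is to validate the list first and build the per-site settings up front. This is only a sketch: `planCompetitorCrawls` and `CrawlPlan` are hypothetical names, and the returned objects simply mirror the web block used in this guide.

```typescript
interface WebCrawlSettings {
  uri: string;
  readLimit: number;
  allowedDomains: string[];
}

interface CrawlPlan {
  valid: { name: string; web: WebCrawlSettings }[];
  invalid: string[]; // entries that could not be parsed as URLs
}

// Split a URL list into ready-to-use crawl settings and rejected entries.
function planCompetitorCrawls(urls: string[], readLimit: number): CrawlPlan {
  const plan: CrawlPlan = { valid: [], invalid: [] };
  for (const url of urls) {
    try {
      const { hostname } = new URL(url); // throws on malformed input
      plan.valid.push({
        name: `Competitor: ${hostname}`,
        web: { uri: url, readLimit, allowedDomains: [hostname] },
      });
    } catch {
      plan.invalid.push(url);
    }
  }
  return plan;
}
```

Each entry in plan.valid can then be spread into a createFeed call alongside the feed type, and plan.invalid can be logged for review.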

Documentation Monitoring

// Monitor docs for changes
const docsCrawl = await graphlit.createFeed({
  name: 'Docs Monitor',
  type: FeedServiceTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    readLimit: 500
  },
  schedulePolicy: {
    recurrenceType: 'DAILY',  // Re-crawl daily
    interval: 1
  }
});
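A scheduled re-crawl is only useful if you can tell what changed between runs. The sketch below compares two snapshots of extracted content keyed by URL. It is an illustration of the idea, not a description of how Graphlit tracks changes; `Snapshot`, `SnapshotDiff`, and `diffSnapshots` are hypothetical names, and how you obtain the snapshots is left open.

```typescript
// URL -> extracted markdown for one crawl run.
type Snapshot = Record<string, string>;

interface SnapshotDiff {
  added: string[];   // pages present only in the current run
  removed: string[]; // pages present only in the previous run
  changed: string[]; // pages whose content differs between runs
}

// Compare two crawl runs and report which pages were added, removed, or changed.
function diffSnapshots(previous: Snapshot, current: Snapshot): SnapshotDiff {
  const added: string[] = [];
  const changed: string[] = [];

  for (const url of Object.keys(current)) {
    if (!(url in previous)) {
      added.push(url);
    } else if (previous[url] !== current[url]) {
      changed.push(url);
    }
  }
  const removed = Object.keys(previous).filter((url) => !(url in current));
  return { added, removed, changed };
}
```

Exact string comparison is the simplest possible rule; for large sites you might compare content hashes instead so that full page text does not need to be kept between runs.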
