Web scraping and search APIs let you ingest competitor sites, documentation, and search results. Graphlit crawls websites and integrates with Tavily and Exa search APIs.
## What You'll Learn

- Web crawling configuration
- Site mapping and link following
- Domain and path filtering
- Tavily search integration
- Exa search integration
- Web-to-markdown conversion
## Part 1: Web Crawling

```typescript
import { FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Crawl documentation site
const webCrawl = await graphlit.createFeed({
  name: 'Documentation Crawler',
  type: FeedServiceTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    readLimit: 500,
    allowedDomains: ['docs.example.com'], // Stay on this domain
    excludedPaths: ['/api/', '/archive/'], // Skip these sections
    depth: 3 // Follow links up to 3 levels deep
  }
});
```
What happens:

- Starts at `uri`
- Extracts page content → markdown
- Follows links (within `allowedDomains`)
- Continues until `readLimit` pages or `depth` is reached
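The domain and path rules above can be sketched as a link predicate. This is an illustration only: Graphlit applies these filters server-side, and `shouldFollow` is a hypothetical helper, not part of the SDK.

```typescript
// Decide whether a crawler honoring allowedDomains and excludedPaths
// would follow a link (hypothetical helper for illustration only —
// Graphlit evaluates these rules server-side).
function shouldFollow(
  link: string,
  allowedDomains: string[],
  excludedPaths: string[]
): boolean {
  const url = new URL(link);
  const domainOk = allowedDomains.includes(url.hostname);
  const pathOk = !excludedPaths.some((prefix) => url.pathname.startsWith(prefix));
  return domainOk && pathOk;
}

console.log(shouldFollow('https://docs.example.com/guides/intro', ['docs.example.com'], ['/api/', '/archive/'])); // true
console.log(shouldFollow('https://docs.example.com/api/v2', ['docs.example.com'], ['/api/', '/archive/'])); // false
console.log(shouldFollow('https://blog.example.com/post', ['docs.example.com'], [])); // false
```

Note the prefix match on `excludedPaths`: `/api/` excludes the whole subtree, not just one page.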
## Part 2: Sitemap Crawling

```typescript
// Crawl every URL listed in sitemap.xml
const sitemapCrawl = await graphlit.createFeed({
  name: 'Sitemap Crawler',
  type: FeedServiceTypes.Sitemap,
  sitemap: {
    uri: 'https://example.com/sitemap.xml',
    readLimit: 1000
  }
});
```
Benefits:

- Faster than link following (no page-by-page link discovery)
- Reaches every listed page, including pages with no inbound links
- Respects the structure the site owner publishes
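To see why sitemap crawling is faster, here is a minimal sketch of what sitemap-driven discovery does: pull `<loc>` entries straight out of the XML instead of crawling for links. Graphlit does this for you; `extractSitemapUrls` is a hypothetical helper for illustration.

```typescript
// Extract the <loc> entries from a sitemap.xml document
// (illustration only — Graphlit parses the sitemap server-side
// when you create a Sitemap feed).
function extractSitemapUrls(xml: string): string[] {
  const matches = xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g);
  return Array.from(matches, (m) => m[1]);
}

const sitemap = `<?xml version="1.0"?>
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>`;

console.log(extractSitemapUrls(sitemap));
// [ 'https://example.com/', 'https://example.com/pricing' ]
```

Every crawl target is known up front, which is why no depth or link-following configuration is needed.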
## Part 3: Tavily Search

```typescript
// Ingest Tavily search results as content
const tavilyFeed = await graphlit.createFeed({
  name: 'Tavily Search',
  type: FeedServiceTypes.TavilySearch,
  tavilySearch: {
    apiKey: process.env.TAVILY_API_KEY,
    query: 'AI agent frameworks',
    maxResults: 50
  }
});
```
Use cases:
- Market research
- Competitive intelligence
- Trend analysis
## Part 4: Exa Search

```typescript
// Ingest Exa search results as content
const exaFeed = await graphlit.createFeed({
  name: 'Exa Search',
  type: FeedServiceTypes.ExaSearch,
  exaSearch: {
    apiKey: process.env.EXA_API_KEY,
    query: 'machine learning papers',
    maxResults: 100
  }
});
```
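Feeds of every type ingest asynchronously, so production code usually waits for completion before querying. A minimal, SDK-agnostic polling sketch: the completion check is left as a callback, since how you query feed status depends on your client setup.

```typescript
// Generic polling helper: retry an async "is it done?" check at a
// fixed interval until it succeeds or the attempt budget runs out.
// The check itself is an assumption — wire it to however your
// Graphlit client reports feed status.
async function waitUntil(
  isDone: () => Promise<boolean>,
  { intervalMs = 5000, maxAttempts = 60 } = {}
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await isDone()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

For example, `await waitUntil(() => myFeedIsDone(feedId))`, where `myFeedIsDone` is whatever feed-status query your setup provides.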
## Production Patterns

### Competitive Intelligence

```typescript
// Crawl competitor docs
const competitors = [
  'https://competitor1.com/docs',
  'https://competitor2.com/docs',
  'https://competitor3.com/docs'
];

for (const url of competitors) {
  await graphlit.createFeed({
    name: `Competitor: ${url}`,
    type: FeedServiceTypes.Web,
    web: {
      uri: url,
      readLimit: 200,
      allowedDomains: [new URL(url).hostname]
    }
  });
}

// Search across all competitors
const features = await graphlit.queryContents({
  search: 'pricing features'
});
```
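The loop above can be factored into a pure, testable config builder. A sketch: the field names mirror the `createFeed` input used in this guide, and the `'WEB'` string is an assumed stand-in for `FeedServiceTypes.Web` so the example runs without the SDK.

```typescript
// Build one Web feed config per competitor URL, scoped to that
// competitor's domain. Pure function, so the configs can be
// inspected or unit-tested before any feeds are created.
interface WebFeedConfig {
  name: string;
  type: string; // 'WEB' assumed here in place of FeedServiceTypes.Web
  web: { uri: string; readLimit: number; allowedDomains: string[] };
}

function competitorFeeds(urls: string[], readLimit = 200): WebFeedConfig[] {
  return urls.map((url) => ({
    name: `Competitor: ${url}`,
    type: 'WEB',
    web: { uri: url, readLimit, allowedDomains: [new URL(url).hostname] }
  }));
}

console.log(competitorFeeds(['https://competitor1.com/docs'])[0].web.allowedDomains);
// [ 'competitor1.com' ]
```

Deriving `allowedDomains` from each URL's hostname keeps every crawl fenced to its own competitor, even when the start URLs point at deep paths.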
### Documentation Monitoring

```typescript
// Re-crawl docs on a schedule to pick up changes
const docsCrawl = await graphlit.createFeed({
  name: 'Docs Monitor',
  type: FeedServiceTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    readLimit: 500
  },
  schedulePolicy: {
    recurrenceType: 'DAILY', // Re-crawl daily
    interval: 1
  }
});
```
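As a sanity check on the schedule, the next run of a daily policy is just the last run plus the interval. A sketch: Graphlit computes this server-side, and `nextRun` is illustrative only.

```typescript
// Compute when the next scheduled re-crawl would fire, given the
// last run time and an interval in days (illustration of the
// schedulePolicy above — Graphlit handles scheduling itself).
function nextRun(lastRun: Date, intervalDays: number): Date {
  return new Date(lastRun.getTime() + intervalDays * 24 * 60 * 60 * 1000);
}

const last = new Date('2024-06-01T08:00:00Z');
console.log(nextRun(last, 1).toISOString()); // 2024-06-02T08:00:00.000Z
```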
## Related Guides

- Data Connectors - All connector types
- Content Ingestion - Ingestion basics