Core Platform · 24 min read

Data Connectors: Complete Integration Guide

Connect 25+ data sources to Graphlit. Comprehensive guide covering OAuth flows, API keys, feed management, and polling strategies for Slack, Gmail, Drive, GitHub, and more.

Data connectors (Feeds) automatically sync content from external sources into Graphlit. Instead of manually ingesting files, feeds continuously monitor Slack channels, Gmail inboxes, Google Drive folders, GitHub repos, and other supported sources, keeping your knowledge base up to date without manual re-ingestion.

This guide covers feed architecture, OAuth vs API key authentication, all connector types, polling strategies, and production patterns. By the end, you'll know how to connect any data source and build automated content pipelines.

What You'll Learn

  • Feed architecture and lifecycle
  • OAuth flows vs API key authentication
  • Connector patterns by category (messaging, cloud storage, project management)
  • Feed configuration options (readLimit, schedules, filters)
  • Polling vs webhook patterns
  • Production feed management
  • Error handling and retry strategies

Prerequisites:

  • A Graphlit project - Sign up (2 min)
  • SDK installed: npm install graphlit-client (30 sec)
  • OAuth apps set up for connectors you want to use (we'll show you how)

Time to complete: 80 minutes
Difficulty: Intermediate

Developer Note: All Graphlit IDs are GUIDs. Example outputs show realistic GUID format.


Table of Contents

  1. Feed Architecture
  2. Authentication Methods
  3. Messaging Connectors
  4. Cloud Storage Connectors
  5. Project Management Connectors
  6. Social Media & Web Connectors
  7. Feed Management
  8. Advanced Patterns
  9. Production Patterns

Part 1: Feed Architecture

What is a Feed?

A feed is a continuous sync between an external data source and Graphlit. Once created, it:

  1. Initial sync: Fetches existing content (e.g., last 100 Slack messages)
  2. Continuous monitoring: Polls for new content (e.g., every 15 minutes)
  3. Auto-ingestion: New content automatically appears in Graphlit

Key insight: Feeds are "set it and forget it"—no manual re-triggering needed.

✅ Quick Win: Once a feed is created, new content automatically appears in your search results and RAG responses—no additional code needed.

Feed Types

Feeds are categorized by data source:

import { FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

// Messaging
FeedServiceTypes.Slack
FeedServiceTypes.MicrosoftTeams
FeedServiceTypes.Discord
FeedServiceTypes.Gmail
FeedServiceTypes.OutlookEmail
FeedServiceTypes.Intercom

// Cloud Storage
FeedServiceTypes.GoogleDrive
FeedServiceTypes.OneDrive
FeedServiceTypes.SharePoint
FeedServiceTypes.Dropbox
FeedServiceTypes.Box
FeedServiceTypes.GitHub
FeedServiceTypes.AmazonS3
FeedServiceTypes.AzureStorage

// Project Management
FeedServiceTypes.Jira
FeedServiceTypes.Linear
FeedServiceTypes.Notion
FeedServiceTypes.Trello
FeedServiceTypes.GitHubIssues
FeedServiceTypes.GitHubPullRequests

// Social/Web
FeedServiceTypes.Reddit
FeedServiceTypes.Twitter
FeedServiceTypes.YouTube
FeedServiceTypes.Rss
FeedServiceTypes.Web  // Web crawling
FeedServiceTypes.Sitemap

// Calendars
FeedServiceTypes.GoogleCalendar
FeedServiceTypes.OutlookCalendar

Feed Lifecycle

CREATE → ENABLED → SYNCING → INDEXED
    ↓
DISABLED (if paused)
    ↓
DELETED (if removed)
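
One way to read the lifecycle diagram is as a small state machine. The sketch below is an illustrative model only: the state names come from the diagram, but the transition table is an assumption about how the arrows connect (for example, that an indexed feed returns to SYNCING on the next poll). It is not Graphlit's actual state enum or API.

```typescript
// Illustrative model of the lifecycle diagram; not Graphlit's actual state enum.
type FeedLifecycleState =
  | 'CREATE'
  | 'ENABLED'
  | 'SYNCING'
  | 'INDEXED'
  | 'DISABLED'
  | 'DELETED';

// Assumed transitions: a feed can be paused or removed from any active state,
// and an indexed feed goes back to SYNCING when the next poll starts.
const allowedTransitions: Record<FeedLifecycleState, FeedLifecycleState[]> = {
  CREATE: ['ENABLED'],
  ENABLED: ['SYNCING', 'DISABLED', 'DELETED'],
  SYNCING: ['INDEXED', 'DISABLED', 'DELETED'],
  INDEXED: ['SYNCING', 'DISABLED', 'DELETED'],
  DISABLED: ['ENABLED', 'DELETED'],
  DELETED: []
};

// True when the model allows moving directly from one state to another.
function canTransition(from: FeedLifecycleState, to: FeedLifecycleState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```

A table like this can be handy in tests or dashboards that need to flag unexpected state changes, as long as it is checked against the states your project actually reports.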

Part 2: Authentication Methods

OAuth (Recommended for Most Connectors)

OAuth lets users authorize access without sharing passwords. Graphlit manages the OAuth flow.

Connectors using OAuth:

  • Slack
  • Gmail / Google Drive / Google Calendar
  • Microsoft (Outlook, OneDrive, SharePoint, Teams)
  • GitHub
  • Notion
  • Jira
  • Linear
  • Reddit
  • Twitter

OAuth flow:

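  1. User clicks "Connect Slack"
  2. User is redirected to Slack's OAuth page
  3. User authorizes the app
  4. Slack redirects back to your app with an authorization code
  5. Your app exchanges the code for an access token
  6. Create the feed with the token

// Example: Slack OAuth
// URLSearchParams URL-encodes each value, including the redirect URI
const authParams = new URLSearchParams({
  client_id: SLACK_CLIENT_ID,
  scope: 'channels:read,channels:history',
  redirect_uri: REDIRECT_URI
});
const authUrl = `https://slack.com/oauth/v2/authorize?${authParams}`;

// User visits authUrl, authorizes
// Slack redirects back with code

// Exchange code for token
const tokenResponse = await fetch('https://slack.com/api/oauth.v2.access', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    code,
    client_id: SLACK_CLIENT_ID,
    client_secret: SLACK_CLIENT_SECRET,
    redirect_uri: REDIRECT_URI  // Must match the authorize request
  })
});

// Slack reports failures in the JSON body via ok/error
const { ok, access_token, error } = await tokenResponse.json();
if (!ok) throw new Error(`Slack OAuth failed: ${error}`);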

// Create feed with token
const feed = await graphlit.createFeed({
  name: 'My Slack Feed',
  type: FeedServiceTypes.Slack,
  slack: {
    token: access_token,
    channels: ['general', 'engineering'],
    readLimit: 100
  }
});

API Keys (For Services Without OAuth)

Some connectors take credentials directly, and some need no authentication at all:

  • RSS feeds (no auth)
  • Web crawling (no auth)
  • S3 (access key + secret)
  • Azure Storage (connection string)

// Example: S3 feed with API keys
const s3Feed = await graphlit.createFeed({
  name: 'Company S3 Bucket',
  type: FeedServiceTypes.AmazonS3,
  amazonS3: {
    accountName: 'my-company',
    accessKey: process.env.AWS_ACCESS_KEY,
    accessSecret: process.env.AWS_SECRET_KEY,
    bucketName: 'documents',
    prefix: 'pdfs/'  // Optional: filter by folder
  }
});
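
Because `process.env` values can be undefined, it may help to validate credentials before building a feed configuration. The helper below is a minimal sketch; `requireEnv` is a hypothetical name and is not part of the Graphlit SDK. It takes the environment as a parameter so it can be exercised without real secrets.

```typescript
// Hypothetical helper: fail fast when a required credential is missing or blank.
function requireEnv(
  key: string,
  env: Record<string, string | undefined>
): string {
  const value = env[key];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}
```

In a Node.js app this would be called as `requireEnv('AWS_ACCESS_KEY', process.env)`, so a missing key surfaces as a clear error at startup instead of an authentication failure during feed creation.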

Part 3: Messaging Connectors

Slack

Use case: Search team conversations, RAG over chat history, entity extraction from messages.

import { Graphlit } from 'graphlit-client';
import { FeedServiceTypes } from 'graphlit-client/dist/generated/graphql-types';

const graphlit = new Graphlit();

// Create Slack feed
const slackFeed = await graphlit.createFeed({
  name: 'Engineering Slack',
  type: FeedServiceTypes.Slack,
  slack: {
    token: process.env.SLACK_BOT_TOKEN,
    channels: ['engineering', 'product'],  // Channel names
    readMessages: true,    // Sync messages
    readThreads: true,     // Sync replies
    readLimit: 500,        // Last 500 messages per channel
    includeAttachments: true  // Sync files/images
  }
});

console.log('Slack feed created:', slackFeed.createFeed.id);

// Wait for initial sync
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(slackFeed.createFeed.id);
  isDone = status.isFeedDone.result;
  // Only wait when another check is needed
  if (!isDone) await new Promise(r => setTimeout(r, 10000));  // Check every 10s
}

console.log('✓ Slack history synced');

OAuth scopes needed:

  • channels:read - List channels
  • channels:history - Read messages
  • groups:read - List private channels (optional)
  • groups:history - Read messages in private channels (optional)
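
The scope list above can be assembled into the comma-separated `scope` parameter that Slack's authorize URL expects. This is a small sketch; the function and constant names are illustrative and not part of any SDK.

```typescript
// Scopes named in the list above; the helper itself is illustrative.
const PUBLIC_CHANNEL_SCOPES: string[] = ['channels:read', 'channels:history'];
const PRIVATE_CHANNEL_SCOPES: string[] = ['groups:read', 'groups:history'];

// Builds the value for the `scope` query parameter of the authorize URL.
function buildSlackScopeParam(includePrivateChannels: boolean): string {
  const scopes = includePrivateChannels
    ? PUBLIC_CHANNEL_SCOPES.concat(PRIVATE_CHANNEL_SCOPES)
    : PUBLIC_CHANNEL_SCOPES;
  return scopes.join(',');
}
```

Requesting the private-channel scopes only when they are needed keeps the consent screen shorter for workspaces that sync public channels only.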

What gets synced:

  • All messages in specified channels
  • Threaded replies
  • User mentions
  • Files/images attached to messages
  • Reactions (optional)

💡 Pro Tip: Combine Slack feeds with entity extraction to automatically identify who's working on which projects from Slack conversations.

Gmail

Use case: Search emails, extract contacts/companies, email-based RAG.

const gmailFeed = await graphlit.createFeed({
  name: 'My Gmail',
  type: FeedServiceTypes.Gmail,
  gmail: {
    token: process.env.GMAIL_OAUTH_TOKEN,
    readLimit: 100,  // Last 100 emails
    includeAttachments: true
  }
});

OAuth scopes needed:

  • https://www.googleapis.com/auth/gmail.readonly

What gets synced:

  • Email subject, body, sender, recipients
  • Attachments (PDFs, images, etc.)
  • Timestamps
  • Email threads

Microsoft Teams

const teamsFeed = await graphlit.createFeed({
  name: 'Engineering Team',
  type: FeedServiceTypes.MicrosoftTeams,
  teams: {
    token: process.env.TEAMS_OAUTH_TOKEN,
    teamId: 'team-guid',
    channelId: 'channel-guid',
    readLimit: 100
  }
});

Discord

const discordFeed = await graphlit.createFeed({
  name: 'Community Discord',
  type: FeedServiceTypes.Discord,
  discord: {
    token: process.env.DISCORD_BOT_TOKEN,
    guildId: 'guild-id',
    channelId: 'channel-id',
    readLimit: 500
  }
});

Part 4: Cloud Storage Connectors

Google Drive

Use case: Sync company documents, collaborative files, shared folders.

const driveFeed = await graphlit.createFeed({
  name: 'Company Drive',
  type: FeedServiceTypes.GoogleDrive,
  googleDrive: {
    token: process.env.GOOGLE_OAUTH_TOKEN,
    folderId: 'folder-id',  // Optional: sync specific folder
    readLimit: 1000,
    includeSharedDrives: true  // Include Team Drives
  }
});

What gets synced:

  • Google Docs (converted to markdown)
  • Google Sheets (tables extracted)
  • Google Slides (text extracted)
  • PDFs, images, videos
  • Files in subfolders

OAuth scopes needed:

  • https://www.googleapis.com/auth/drive.readonly

OneDrive / SharePoint

// OneDrive personal
const oneDriveFeed = await graphlit.createFeed({
  name: 'My OneDrive',
  type: FeedServiceTypes.OneDrive,
  oneDrive: {
    token: process.env.MICROSOFT_OAUTH_TOKEN,
    folderId: 'folder-id',  // Optional
    readLimit: 500
  }
});

// SharePoint (team sites)
const sharePointFeed = await graphlit.createFeed({
  name: 'Company SharePoint',
  type: FeedServiceTypes.SharePoint,
  sharePoint: {
    token: process.env.MICROSOFT_OAUTH_TOKEN,
    siteId: 'site-id',
    driveId: 'drive-id',
    readLimit: 1000
  }
});

GitHub

Use case: Sync code repos, documentation, READMEs.

const githubFeed = await graphlit.createFeed({
  name: 'Company Repo',
  type: FeedServiceTypes.GitHub,
  github: {
    token: process.env.GITHUB_PAT,  // Personal Access Token
    repositoryOwner: 'my-company',
    repositoryName: 'main-repo',
    includeBranches: ['main', 'develop']
  }
});

What gets synced:

  • Source code files
  • README.md files
  • Documentation
  • Commit messages (optional)

Amazon S3

const s3Feed = await graphlit.createFeed({
  name: 'Documents S3 Bucket',
  type: FeedServiceTypes.AmazonS3,
  amazonS3: {
    accountName: 'my-company',
    accessKey: process.env.AWS_ACCESS_KEY,
    accessSecret: process.env.AWS_SECRET_KEY,
    bucketName: 'company-documents',
    prefix: 'public/',  // Optional: sync specific folder
    region: 'us-east-1'
  }
});

Part 5: Project Management Connectors

Jira

Use case: Search issues, track project status, entity extraction from tickets.

const jiraFeed = await graphlit.createFeed({
  name: 'Engineering Jira',
  type: FeedServiceTypes.Jira,
  jira: {
    token: process.env.JIRA_OAUTH_TOKEN,
    accountId: 'jira-account-id',
    project: 'PROJ',  // Project key
    readLimit: 500
  }
});

What gets synced:

  • Issue title, description, comments
  • Status, assignee, reporter
  • Attachments
  • Custom fields

Linear

const linearFeed = await graphlit.createFeed({
  name: 'Product Linear',
  type: FeedServiceTypes.Linear,
  linear: {
    token: process.env.LINEAR_API_KEY,
    teamId: 'team-id',
    readLimit: 500
  }
});

Notion

const notionFeed = await graphlit.createFeed({
  name: 'Company Wiki',
  type: FeedServiceTypes.Notion,
  notion: {
    token: process.env.NOTION_INTEGRATION_TOKEN,
    databaseId: 'database-id',  // Optional
    readLimit: 1000
  }
});

What gets synced:

  • Pages and sub-pages
  • Databases and records
  • Embedded content
  • Inline comments

GitHub Issues & Pull Requests

// Issues
const issuesFeed = await graphlit.createFeed({
  name: 'Repo Issues',
  type: FeedServiceTypes.GitHubIssues,
  githubIssues: {
    token: process.env.GITHUB_PAT,
    repositoryOwner: 'my-company',
    repositoryName: 'main-repo',
    readLimit: 500,
    includeClosedIssues: true
  }
});

// Pull Requests
const prFeed = await graphlit.createFeed({
  name: 'Repo PRs',
  type: FeedServiceTypes.GitHubPullRequests,
  githubPullRequests: {
    token: process.env.GITHUB_PAT,
    repositoryOwner: 'my-company',
    repositoryName: 'main-repo',
    readLimit: 100
  }
});

Part 6: Social Media & Web Connectors

Reddit

const redditFeed = await graphlit.createFeed({
  name: 'Tech Subreddit',
  type: FeedServiceTypes.Reddit,
  reddit: {
    token: process.env.REDDIT_OAUTH_TOKEN,
    subreddit: 'MachineLearning',
    readLimit: 100,
    sortBy: 'hot'  // 'hot', 'new', 'top'
  }
});

RSS Feeds

const rssFeed = await graphlit.createFeed({
  name: 'Tech News RSS',
  type: FeedServiceTypes.Rss,
  rss: {
    uri: 'https://techcrunch.com/feed/',
    readLimit: 50
  }
});

Web Crawling

Use case: Scrape documentation sites, competitor analysis, content aggregation.

const webCrawl = await graphlit.createFeed({
  name: 'Documentation Crawler',
  type: FeedServiceTypes.Web,
  web: {
    uri: 'https://docs.example.com',
    readLimit: 500,
    allowedDomains: ['docs.example.com'],  // Stay on domain
    excludedPaths: ['/api/', '/archive/']  // Skip sections
  }
});

What gets scraped:

  • Page HTML (converted to markdown)
  • Links (follows to crawl more pages)
  • Images (optional)
  • Metadata (title, description)
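
To make the intent of `allowedDomains` and `excludedPaths` concrete, here is a sketch of the kind of filter they describe. It is an approximation for illustration: the function names are hypothetical, and Graphlit's crawler may apply different matching rules (for example, around subdomains or trailing slashes).

```typescript
// Returns a parsed URL, or null for relative or malformed input.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch (e) {
    return null;
  }
}

// Illustrative filter: keep a page only if its host is allowed
// and its path does not start with an excluded prefix.
function shouldCrawl(
  pageUrl: string,
  allowedDomains: string[],
  excludedPaths: string[]
): boolean {
  const parsed = parseUrl(pageUrl);
  if (parsed === null) return false;
  const { hostname, pathname } = parsed;
  const domainAllowed = allowedDomains.indexOf(hostname) !== -1;
  const pathExcluded = excludedPaths.some(prefix => pathname.startsWith(prefix));
  return domainAllowed && !pathExcluded;
}
```

With the configuration from the example above, a guide page on docs.example.com would pass, while anything under /api/ or on another host would be skipped.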

YouTube

const youtubeFeed = await graphlit.createFeed({
  name: 'Channel Videos',
  type: FeedServiceTypes.YouTube,
  youtube: {
    token: process.env.YOUTUBE_API_KEY,
    channelId: 'channel-id',
    readLimit: 50
  }
});

What gets synced:

  • Video transcripts (auto-generated or manual)
  • Titles, descriptions
  • Thumbnails
  • Comments (optional)

Part 7: Feed Management

Query Feeds

// Get all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  console.log(`${feed.name} (${feed.type})`);
  console.log(`  State: ${feed.state}`);
  console.log(`  Last sync: ${feed.lastSyncDateTime}`);
});

Update Feed

// Change feed configuration
await graphlit.updateFeed(feedId, {
  name: 'Updated Name',
  slack: {
    readLimit: 1000  // Increase sync limit
  }
});

Disable/Enable Feed

// Pause syncing
await graphlit.disableFeed(feedId);

// Resume syncing
await graphlit.enableFeed(feedId);

Delete Feed

// Delete feed (and optionally its content)
await graphlit.deleteFeed(feedId);

// Delete feed but keep synced content
await graphlit.deleteFeed(feedId, false);

Trigger Manual Sync

// Force immediate sync (useful for testing)
await graphlit.triggerFeedSync(feedId);

// Wait for sync to complete
let isDone = false;
while (!isDone) {
  const status = await graphlit.isFeedDone(feedId);
  isDone = status.isFeedDone.result;
  // Only wait when another check is needed
  if (!isDone) await new Promise(r => setTimeout(r, 5000));
}

Part 8: Advanced Patterns

Pattern 1: Feed with Workflow

Apply processing to synced content:

// Create workflow first
const workflow = await graphlit.createWorkflow({
  name: "Extract Entities",
  extraction: { /* ... */ }
});

// Create feed with workflow
const feed = await graphlit.createFeed({
  name: 'Slack with Entities',
  type: FeedServiceTypes.Slack,
  slack: { /* ... */ },
  workflow: { id: workflow.createWorkflow.id }
});

// All synced messages will have entities extracted

Pattern 2: Feed with Collections

Auto-organize synced content:

// Create collection
const collection = await graphlit.createCollection('Slack Messages');

// Create feed that adds to collection
const feed = await graphlit.createFeed({
  name: 'Slack Feed',
  type: FeedServiceTypes.Slack,
  slack: { /* ... */ },
  collections: [{ id: collection.createCollection.id }]
});

Pattern 3: Multi-Feed Strategy

Sync from multiple sources into unified knowledge base:

// Feed 1: Slack
const slackFeed = await graphlit.createFeed({
  name: 'Slack',
  type: FeedServiceTypes.Slack,
  slack: { /* ... */ }
});

// Feed 2: Gmail
const gmailFeed = await graphlit.createFeed({
  name: 'Gmail',
  type: FeedServiceTypes.Gmail,
  gmail: { /* ... */ }
});

// Feed 3: Google Drive
const driveFeed = await graphlit.createFeed({
  name: 'Drive',
  type: FeedServiceTypes.GoogleDrive,
  googleDrive: { /* ... */ }
});

// Now search across all sources
const results = await graphlit.queryContents({
  search: "project update"
});
// Returns results from Slack, Gmail, AND Drive

Pattern 4: Scheduled Feeds

Control sync frequency:

const feed = await graphlit.createFeed({
  name: 'Daily News Feed',
  type: FeedServiceTypes.Rss,
  rss: {
    uri: 'https://news.com/feed',
    readLimit: 50
  },
  schedulePolicy: {
    recurrenceType: 'DAILY',
    interval: 1  // Every 1 day
  }
});

Part 9: Production Patterns

Pattern 1: OAuth Token Refresh

OAuth tokens expire, so check expiry and refresh before each use:

// Store refresh token when user authorizes
const oauthData = {
  accessToken: '...',
  refreshToken: '...',
  expiresAt: Date.now() + 3600000  // Assumes a 1-hour token lifetime
};

// Before creating feed, check if token is expired
// refreshOAuthToken is your own helper that calls the provider's token endpoint
async function getValidToken() {
  if (Date.now() > oauthData.expiresAt) {
    // Refresh token
    const newTokens = await refreshOAuthToken(oauthData.refreshToken);
    oauthData.accessToken = newTokens.accessToken;
    oauthData.expiresAt = Date.now() + 3600000;
  }
  return oauthData.accessToken;
}

// Use refreshed token
const token = await getValidToken();
const feed = await graphlit.createFeed({
  type: FeedServiceTypes.Slack,
  slack: { token, /* ... */ }
});
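
A slightly more defensive variant refreshes a little before the token expires and takes the lifetime from the provider's response instead of assuming one hour. The sketch below is hedged: the `RefreshResult` shape and the injected `refresh` function are assumptions for illustration, since each provider's token endpoint returns different fields.

```typescript
interface OAuthTokens {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // Epoch milliseconds
}

// Assumed shape of a provider's refresh response; real providers differ.
interface RefreshResult {
  accessToken: string;
  refreshToken?: string; // Some providers rotate refresh tokens
  expiresInSeconds: number;
}

const EXPIRY_BUFFER_MS = 60000; // Refresh one minute before expiry

// Returns the current tokens if still valid, otherwise refreshed ones.
async function getValidTokens(
  current: OAuthTokens,
  refresh: (refreshToken: string) => Promise<RefreshResult>,
  nowMs: number = Date.now()
): Promise<OAuthTokens> {
  if (nowMs < current.expiresAt - EXPIRY_BUFFER_MS) {
    return current;
  }
  const refreshed = await refresh(current.refreshToken);
  return {
    accessToken: refreshed.accessToken,
    // Keep the old refresh token when the provider does not rotate it
    refreshToken: refreshed.refreshToken ?? current.refreshToken,
    expiresAt: nowMs + refreshed.expiresInSeconds * 1000
  };
}
```

Returning a new token object rather than mutating shared state makes it easier to persist the result wherever your app stores credentials.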

Pattern 2: Feed Health Monitoring

Monitor feed status:

// Check all feeds
const feeds = await graphlit.queryFeeds();

feeds.feeds.results.forEach(feed => {
  if (feed.state === 'FAILED') {
    console.error(`Feed ${feed.name} failed`);
    // Alert ops team
  }
  
  if (feed.lastSyncDateTime) {
    const hoursSinceSync = (Date.now() - new Date(feed.lastSyncDateTime).getTime()) / 3600000;
    if (hoursSinceSync > 24) {
      console.warn(`Feed ${feed.name} hasn't synced in ${hoursSinceSync.toFixed(1)}h`);
    }
  }
});
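
The staleness check can also be pulled into a pure function, which makes the threshold logic easy to unit test. This is a sketch under assumptions: `FeedSummary` is a hypothetical shape that mirrors only the `name` and `lastSyncDateTime` fields used above, not the SDK's full feed type.

```typescript
// Hypothetical minimal shape; the real feed object has more fields.
interface FeedSummary {
  name: string;
  lastSyncDateTime?: string | null;
}

interface StaleFeed {
  name: string;
  hoursSinceSync: number;
}

// Returns feeds whose last sync is older than maxAgeHours.
// Feeds that have never synced are skipped and should be handled separately.
function findStaleFeeds(
  feeds: FeedSummary[],
  nowMs: number,
  maxAgeHours: number
): StaleFeed[] {
  const stale: StaleFeed[] = [];
  for (const feed of feeds) {
    if (!feed.lastSyncDateTime) continue;
    const syncedAtMs = new Date(feed.lastSyncDateTime).getTime();
    if (isNaN(syncedAtMs)) continue; // Unparseable timestamp
    const hours = (nowMs - syncedAtMs) / 3600000;
    if (hours > maxAgeHours) {
      // Round to one decimal for readable alerts
      stale.push({ name: feed.name, hoursSinceSync: Math.round(hours * 10) / 10 });
    }
  }
  return stale;
}
```

Passing `nowMs` in explicitly keeps the function deterministic; production code would pass `Date.now()`.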

Pattern 3: Rate Limiting

Avoid overwhelming external APIs:

// Create feeds with delays
const urls = ['url1', 'url2', 'url3'];

for (const url of urls) {
  const feed = await graphlit.createFeed({
    type: FeedServiceTypes.Rss,
    rss: { uri: url }
  });
  
  // Wait 5 seconds between feed creations
  await new Promise(r => setTimeout(r, 5000));
}
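
Spacing out requests reduces load, but calls can still fail transiently when an upstream API throttles you. A common complement is retrying with exponential backoff. The helper below is a generic sketch, not part of the Graphlit SDK; `withRetry` and its parameters are hypothetical names, and the `sleep` function is injectable so the logic can be tested without real delays.

```typescript
// Generic retry helper with exponential backoff; not part of the Graphlit SDK.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 4,
  baseDelayMs: number = 1000,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Delay doubles each time: base, 2x base, 4x base, ...
        await sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}
```

A feed creation call could then be wrapped as `withRetry(() => graphlit.createFeed(config))`. Retrying makes sense for timeouts and throttling, while authentication errors are better handled by re-authorizing.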

Common Issues & Solutions

Issue: OAuth Token Invalid

Problem: "Invalid token" error when creating feed.

Solution: Refresh OAuth token or re-authorize:

try {
  const feed = await graphlit.createFeed(config);
} catch (error: any) {
  // Compare case-insensitively: the reported message is "Invalid token"
  if (String(error?.message).toLowerCase().includes('invalid token')) {
    // Redirect user to re-authorize
    window.location.href = getOAuthUrl();
  }
}

Issue: Feed Not Syncing

Problem: Feed created but no content appears.

Solutions:

  1. Check feed state:

const feed = await graphlit.getFeed(feedId);
console.log('State:', feed.feed.state);

  2. Wait for initial sync:

await waitForFeedCompletion(feedId);

  3. Trigger manual sync:

await graphlit.triggerFeedSync(feedId);
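
`waitForFeedCompletion` is not defined elsewhere in this guide. One way to build it is on top of a generic polling helper with a timeout, so a stuck feed cannot block forever. The sketch below is an assumption, not SDK code: `pollUntilDone` is a hypothetical name, and it takes a checker function so it can be tested without network access.

```typescript
// Hypothetical helper: polls until checkDone() resolves true or the timeout elapses.
// Resolves true when done, false on timeout.
async function pollUntilDone(
  checkDone: () => Promise<boolean>,
  intervalMs: number = 10000,
  timeoutMs: number = 600000,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<boolean> {
  let waitedMs = 0;
  while (true) {
    if (await checkDone()) return true;
    if (waitedMs >= timeoutMs) return false;
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

With the `isFeedDone` call used earlier in this guide, `waitForFeedCompletion(feedId)` could be a thin wrapper: `pollUntilDone(async () => (await graphlit.isFeedDone(feedId)).isFeedDone.result === true)`.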

Issue: Too Much Content

Problem: Feed syncs thousands of items, overwhelming system.

Solution: Use readLimit:

const feed = await graphlit.createFeed({
  type: FeedServiceTypes.Slack,
  slack: {
    // ...other Slack options
    readLimit: 100  // Only last 100 messages
  }
});

What's Next?

You now understand data connectors completely. Next steps:

  1. Set up OAuth apps for connectors you need
  2. Create feeds for key data sources
  3. Apply workflows to customize processing
  4. Monitor feed health in production

Happy connecting! 🔌
