GPT-to-Audio: Publish AI-Generated Podcasts with Graphlit, GPT-4 and ElevenLabs

Kirk Marple

February 3, 2024

Text-to-speech models, such as those from ElevenLabs, have become incredibly human-sounding, and they can even clone your own voice.

Now, the Graphlit Platform can be used to generate audio with ElevenLabs voices from any content - web pages, PDFs, audio transcripts and more.

Say we want to generate a podcast about this week's interesting AI news and academic papers.

🔉 Listen to an example here.

We first need to ingest some content to use for our podcast. We'll start by ingesting an interesting blog post about knowledge graphs, but we also want to crawl its hyperlinks and ingest the arXiv papers referenced in the post.

After ingesting all the content we want to use for the podcast, we'll create an LLM specification for the latest GPT-4 Turbo (0125) model, and then publish summarized versions of our content as an MP3 audio file using an ElevenLabs voice.

Once the publishing has completed, you can download the MP3 to post on social media or upload to a media hosting site.

You can use any content you want with this audio publishing process, create your own LLM publishing prompt, and select any ElevenLabs voice.


Create Workflow

To crawl the links automatically, we need to create a workflow object that enables crawling of web page hyperlinks.


Mutation:

mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) {
    id
    name
    state
  }
}

Variables:

{
  "workflow": {
    "enrichment": {
      "link": {
        "enableCrawling": true,
        "allowedLinks": [
          "WEB"
        ],
        "maximumLinks": 10
      }
    },
    "name": "AI Podcast Workflow"
  }
}

Response:

{
  "id": "1ee5b5dd-f445-4aff-b492-f4178e7e8550",
  "name": "AI Podcast Workflow",
  "state": "ENABLED"
}
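
If you're calling the Graphlit API programmatically rather than from the API explorer, a small helper can run any of the GraphQL operations in this tutorial. Below is a minimal Python sketch, assuming the requests library and that your project's GraphQL endpoint URL and bearer token are available in GRAPHLIT_API_URL and GRAPHLIT_JWT environment variables (both names are placeholders; use the values from your own Graphlit project). The later sketches in this post reuse this run_graphql helper and the workflow result it returns.

import os
import requests

# Placeholder configuration: substitute your project's GraphQL endpoint and bearer token.
GRAPHLIT_API_URL = os.environ["GRAPHLIT_API_URL"]
GRAPHLIT_JWT = os.environ["GRAPHLIT_JWT"]

def run_graphql(query: str, variables: dict) -> dict:
    """POST a GraphQL operation and return its 'data' payload, raising on errors."""
    response = requests.post(
        GRAPHLIT_API_URL,
        json={"query": query, "variables": variables},
        headers={"Authorization": f"Bearer {GRAPHLIT_JWT}"},
        timeout=60,
    )
    response.raise_for_status()
    body = response.json()
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]

# Create the crawling workflow using the CreateWorkflow mutation shown above.
CREATE_WORKFLOW = """
mutation CreateWorkflow($workflow: WorkflowInput!) {
  createWorkflow(workflow: $workflow) { id name state }
}
"""

workflow = run_graphql(CREATE_WORKFLOW, {
    "workflow": {
        "enrichment": {"link": {"enableCrawling": True, "allowedLinks": ["WEB"], "maximumLinks": 10}},
        "name": "AI Podcast Workflow",
    }
})["createWorkflow"]
print(workflow["id"], workflow["state"])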


Ingest Contents

To ingest a web page, we specify the uri along with the id of the workflow we created above, since we want to crawl any hyperlinks found in this blog post.

You can repeat this process for any other content you want to include in your AI-generated podcast, and you aren't limited just to web pages. You can use audio transcripts, PDFs, Word documents, or even Slack or email.


Mutation:

mutation IngestPage($name: String, $uri: URL!, $id: ID, $workflow: EntityReferenceInput) {
  ingestPage(name: $name, uri: $uri, id: $id, workflow: $workflow) {
    id
    name
    state
    type
    uri
  }
}

Variables:

{
  "uri": "https://towardsdatascience.com/ultra-foundation-models-for-knowledge-graph-reasoning-9f8f4a0d7f09#71ab",
  "workflow": {
    "id": "1ee5b5dd-f445-4aff-b492-f4178e7e8550"
  }
}

Response:

{
  "type": "PAGE",
  "uri": "https://towardsdatascience.com/ultra-foundation-models-for-knowledge-graph-reasoning-9f8f4a0d7f09#71ab",
  "id": "0c350de4-0e5d-40c8-8245-8b32e44cbd77",
  "name": "ULTRA: Foundation Models for Knowledge Graph Reasoning | by Michael Galkin | Towards Data Science",
  "state": "CREATED"
}
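
Continuing the Python sketch above, you could ingest several pages in one pass by looping over a list of URIs and attaching the workflow id to each IngestPage call. The list below is a placeholder; fill it with whatever sources you want summarized in the podcast.

# Reuses the run_graphql helper and the workflow id from the earlier sketch.
INGEST_PAGE = """
mutation IngestPage($uri: URL!, $workflow: EntityReferenceInput) {
  ingestPage(uri: $uri, workflow: $workflow) { id name state }
}
"""

page_uris = [
    "https://towardsdatascience.com/ultra-foundation-models-for-knowledge-graph-reasoning-9f8f4a0d7f09#71ab",
    # add any other pages you want in the podcast
]

for uri in page_uris:
    content = run_graphql(INGEST_PAGE, {"uri": uri, "workflow": {"id": workflow["id"]}})["ingestPage"]
    print(content["id"], content["name"])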


Create Specification

Once we have the content we want to publish, let's use the GPT-4 Turbo (0125) model to generate the script that will be provided to the ElevenLabs text-to-speech API.

Here we specify the OPEN_AI service, and the GPT4_TURBO_128K_0125 model.

Creating a specification is optional, but GPT-4 tends to give better quality output for the publishing process.


Mutation:

mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) {
    id
    name
    state
    type
    serviceType
  }
}

Variables:

{
  "specification": {
    "type": "COMPLETION",
    "serviceType": "OPEN_AI",
    "openAI": {
      "model": "GPT4_TURBO_128K_0125"
    },
    "name": "GPT-4 0125"
  }
}

Response:

{
  "type": "COMPLETION",
  "serviceType": "OPEN_AI",
  "id": "12748d8b-16f8-4c94-a71c-ccd1ced57322",
  "name": "GPT-4 0125",
  "state": "ENABLED"
}
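
Continuing the same Python sketch, the id returned by createSpecification is what we'll reference later as the publishSpecification, so it's worth keeping the result around.

# Create the GPT-4 Turbo specification; its id is referenced below as publishSpecification.
CREATE_SPECIFICATION = """
mutation CreateSpecification($specification: SpecificationInput!) {
  createSpecification(specification: $specification) { id name state }
}
"""

specification = run_graphql(CREATE_SPECIFICATION, {
    "specification": {
        "type": "COMPLETION",
        "serviceType": "OPEN_AI",
        "openAI": {"model": "GPT4_TURBO_128K_0125"},
        "name": "GPT-4 0125",
    }
})["createSpecification"]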


Publish Contents

We're now ready to publish our AI-generated podcast. Listen to an example here.

We first need to tell Graphlit that we want to use ELEVEN_LABS_AUDIO as our publishing type, with the MP3 format. We are using the ENGLISH_V1 model with a British male voice. You can find all the supported voice IDs on the ElevenLabs website.

We are also assigning the publishSpecification to the GPT-4 specification we created above.

The publish prompt used for this podcast is shown in the publishPrompt variable below.


Graphlit will summarize each of the contents that we ingested, and then use the publishing prompt provided to generate the final script which is passed to the ElevenLabs text-to-speech API.

This process takes a few minutes to complete, and when the publishContents mutation finishes, your MP3 will be ready for download.


Mutation:

mutation PublishContents($summaryPrompt: String, $publishPrompt: String!, $connector: ContentPublishingConnectorInput!, $filter: ContentFilter, $name: String, $summarySpecification: EntityReferenceInput, $publishSpecification: EntityReferenceInput, $workflow: EntityReferenceInput) {
  publishContents(summaryPrompt: $summaryPrompt, publishPrompt: $publishPrompt, connector: $connector, filter: $filter, name: $name, summarySpecification: $summarySpecification, publishSpecification: $publishSpecification, workflow: $workflow) {
    id
    name
    creationDate
    state
    uri
    type
    fileType
    mimeType
    fileName
    fileSize
  }
}

Variables:

{
  "publishPrompt": "Follow these steps.\\n\\nStep 1: Think about a structure for 10 minute long, engaging AI-generated podcast, with an welcome and introduction, an in-depth discussion of 4-6 interesting topics across all provided content, and a wrap-up. Focus your topics around knowledge graphs, knowledge retrieval, and related areas.\\nStep 2: For each topic, write 2-3 detailed paragraphs discussing it in-depth. Touch on key points for each topic which would be interesting to listeners. Mention the content metadata, entities and details from the provided summaries, as appropriate in the discussion. Remove any topic or section headings. Remove any references to podcast background music.  Remove any timestamps.\\nStep 3: Combine all topics into a lengthy, single-person script which can be used to record a podcast. Use friendly and compelling conversation to write the scripts.  You can be witty, but don't be cheesy.\\nStep 4: Remove any unnecessary formatting or final notes about being AI generated. \\n\\nRefer to this as the Graphlit AI podcast, but don't mention anything about the host.",
  "connector": {
    "type": "ELEVEN_LABS_AUDIO",
    "format": "MP3",
    "elevenLabs": {
      "model": "ENGLISH_V1",
      "voice": "NO8iseyk20Aw8Lx2NvBu"
    }
  },
  "publishSpecification": {
    "id": "12748d8b-16f8-4c94-a71c-ccd1ced57322"
  }
}

Response:

{
  "type": "FILE",
  "mimeType": "audio/mp3",
  "fileType": "AUDIO",
  "fileSize": 4145319,
  "uri": "https://redacted.blob.core.windows.net/files/ccd78bbd-7f26-4964-99e9-e231459b4c08/Content.mp3",
  "id": "ccd78bbd-7f26-4964-99e9-e231459b4c08",
  "name": "Content.mp3",
  "state": "CREATED",
  "creationDate": "2024-02-03T03:50:15Z"
}
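
To kick off publishing from the same Python sketch, pass the connector, the publish prompt, and the specification id to publishContents. Here the prompt is read from a local text file (podcast_prompt.txt is just an assumed file name holding the prompt shown in the variables above), and the voice id is the one from the example; pick your own from the ElevenLabs voice library.

# Publish the ingested content as an MP3 using the ElevenLabs connector and the GPT-4 specification.
PUBLISH_CONTENTS = """
mutation PublishContents($publishPrompt: String!, $connector: ContentPublishingConnectorInput!, $publishSpecification: EntityReferenceInput) {
  publishContents(publishPrompt: $publishPrompt, connector: $connector, publishSpecification: $publishSpecification) {
    id name state uri
  }
}
"""

with open("podcast_prompt.txt", "r", encoding="utf-8") as f:
    publish_prompt = f.read()  # the publish prompt shown in the variables above

published = run_graphql(PUBLISH_CONTENTS, {
    "publishPrompt": publish_prompt,
    "connector": {
        "type": "ELEVEN_LABS_AUDIO",
        "format": "MP3",
        "elevenLabs": {"model": "ENGLISH_V1", "voice": "NO8iseyk20Aw8Lx2NvBu"},
    },
    "publishSpecification": {"id": specification["id"]},
})["publishContents"]
print(published["id"], published["state"])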


Get Published MP3

Once you've published your audio file, you can easily get a URI to download it.

Query the content via the id returned from publishContents; the masterUri field provides a time-limited URI for downloading the MP3. Since the published content is re-ingested into Graphlit, you can also look at transcriptUri for the audio transcript automatically generated from the MP3.

Query:

query GetContent($id: ID!) {
  content(id: $id) {
    id
    name
    creationDate
    state
    originalDate
    finishedDate
    workflowDuration
    uri
    type
    fileType
    mimeType
    fileName
    fileSize
    masterUri
    mezzanineUri
    transcriptUri
  }
}

Variables:

{
  "id": "ccd78bbd-7f26-4964-99e9-e231459b4c08"
}

Response:

{
  "type": "FILE",
  "mimeType": "audio/mp3",
  "fileType": "AUDIO",
  "fileName": "Content.mp3",
  "fileSize": 4145319,
  "masterUri": "https://redacted.blob.core.windows.net/files/ccd78bbd-7f26-4964-99e9-e231459b4c08/Content.mp3?sv=2023-11-03&se=2024-02-03T09%3A50%3A15Z&sr=c&sp=rl&sig=v3VHjZz3NGVbHqksmVtmZJiMssC7RMaZJ6DWemkx2MA%3D",
  "mezzanineUri": "https://redacted.blob.core.windows.net/files/ccd78bbd-7f26-4964-99e9-e231459b4c08/Mezzanine/Content.mp3?sv=2023-11-03&se=2024-02-03T09%3A50%3A15Z&sr=c&sp=rl&sig=v3VHjZz3NGVbHqksmVtmZJiMssC7RMaZJ6DWemkx2MA%3D",
  "transcriptUri": "https://redacted.blob.core.windows.net/files/ccd78bbd-7f26-4964-99e9-e231459b4c08/Transcript/Content.json?sv=2023-11-03&se=2024-02-03T09%3A50%3A15Z&sr=c&sp=rl&sig=v3VHjZz3NGVbHqksmVtmZJiMssC7RMaZJ6DWemkx2MA%3D",
  "uri": "https://redacted.blob.core.windows.net/files/ccd78bbd-7f26-4964-99e9-e231459b4c08/Content.mp3",
  "id": "ccd78bbd-7f26-4964-99e9-e231459b4c08",
  "name": "Content.mp3",
  "state": "FINISHED",
  "creationDate": "2024-02-03T03:50:15Z",
  "finishedDate": "2024-02-03T03:50:21Z",
  "workflowDuration": "PT5.8222653S",
}
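
Continuing the Python sketch, you could poll the published content with the GetContent query until its state reaches FINISHED, and then download the MP3 from masterUri.

import time
import requests

# Poll until the publishing workflow finishes, then download the MP3 via the temporary masterUri.
GET_CONTENT = """
query GetContent($id: ID!) {
  content(id: $id) { id state masterUri }
}
"""

while True:
    content = run_graphql(GET_CONTENT, {"id": published["id"]})["content"]
    if content["state"] == "FINISHED":
        break
    time.sleep(10)  # publishing can take a few minutes

audio = requests.get(content["masterUri"], timeout=120)
audio.raise_for_status()
with open("podcast.mp3", "wb") as f:
    f.write(audio.content)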


By combining the power of LLMs, such as OpenAI GPT-4 Turbo, with the latest text-to-speech models from ElevenLabs, Graphlit helps you automate the creation of AI-generated podcasts, daily audio summaries, or any other audio rendition of your ingested content.


Summary

Please email any questions on this tutorial or the Graphlit Platform to questions@graphlit.com.

For more information, you can read our Graphlit Documentation, visit our marketing site, or join our Discord community.