How to Scrape YouTube Transcripts - The Easy Way

Jonathan Geiger
Tags: youtube, api, web-scraping, tutorial, transcript, youtube-shorts

Extracting transcripts from YouTube videos manually doesn't scale. Whether you're analyzing educational content, building accessibility tools, or conducting content research, you need a reliable way to get YouTube transcripts programmatically - including YouTube Shorts.

In this guide, I'll show you how to scrape YouTube transcripts using SocialKit's YouTube Transcript API - a straightforward solution that handles YouTube's dynamic content loading and anti-scraping measures, and works with both regular videos and Shorts.

Getting Started

1. Get Your API Access Key

First, you'll need an API access key. Visit your SocialKit Dashboard to get your free access key. The free tier includes 20 requests - perfect for testing and small projects.

2. The API Endpoint

Here's the endpoint you'll be working with:

GET https://api.socialkit.dev/youtube/transcript

Required Parameters:

  • access_key: Your API access key
  • url: The YouTube video URL (works with regular videos and Shorts)
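
The video URL itself contains characters such as ? and =, so when you assemble the query string by hand it is safer to percent-encode both parameters. Here is a minimal sketch using only Python's standard library; build_request_url is a hypothetical helper name for illustration, not part of SocialKit:

```python
from urllib.parse import urlencode

ENDPOINT = "https://api.socialkit.dev/youtube/transcript"


def build_request_url(access_key: str, video_url: str) -> str:
    """Return the full GET URL with both required parameters percent-encoded."""
    query = urlencode({"access_key": access_key, "url": video_url})
    return f"{ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_request_url("your-access-key-here",
                            "https://youtube.com/watch?v=dQw4w9WgXcQ"))
```

HTTP clients such as requests and axios do this encoding for you when you pass the parameters as a dictionary or object.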

Example Request & Response

Let's look at a real example using a YouTube video:

GET https://api.socialkit.dev/youtube/transcript?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ

Response (abbreviated; only the first 5 of the 61 segments are shown):

{
  "success": true,
  "data": {
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "transcript": "[♪♪♪] ♪ We're no strangers to love ♪ ♪ You know the rules\nand so do I ♪ ♪ A full commitment's\nwhat I'm thinking of ♪ ♪ You wouldn't get this\nfrom any other guy ♪ ♪ I just wanna tell you\nhow I'm feeling ♪ ♪ Gotta make you understand ♪ ♪ Never gonna give you up ♪ ♪ Never gonna let you down ♪ ♪ Never gonna run around\nand desert you ♪ ♪ Never gonna make you cry ♪ ♪ Never gonna say goodbye ♪ ♪ Never gonna tell a lie\nand hurt you ♪",
    "transcriptSegments": [
      {
        "text": "[♪♪♪]",
        "start": 0,
        "duration": 5,
        "timestamp": "00:00"
      },
      {
        "text": "♪ We're no strangers to love ♪",
        "start": 5,
        "duration": 5,
        "timestamp": "00:05"
      },
      {
        "text": "♪ You know the rules\nand so do I ♪",
        "start": 10,
        "duration": 5,
        "timestamp": "00:10"
      },
      {
        "text": "♪ A full commitment's\nwhat I'm thinking of ♪",
        "start": 15,
        "duration": 5,
        "timestamp": "00:15"
      },
      {
        "text": "♪ You wouldn't get this\nfrom any other guy ♪",
        "start": 20,
        "duration": 5,
        "timestamp": "00:20"
      }
    ],
    "wordCount": 458,
    "segments": 61
  }
}

You get:

  • Full transcript as plain text
  • Timestamped segments with precise start times and durations
  • Word count for quick content analysis
  • Segment count to understand video structure
  • Works with YouTube Shorts - same API, same format

Code Examples

Choose your preferred language and start extracting YouTube transcripts:

JavaScript/Node.js Example

const axios = require('axios');

/**
 * Extract transcript from a YouTube video or Short
 * @param {string} videoUrl - The YouTube video URL
 * @param {string} accessKey - Your SocialKit API access key
 * @param {boolean} cache - Enable caching for faster subsequent requests
 * @returns {Promise<Object>} Transcript data
 */
async function getYouTubeTranscript(videoUrl, accessKey, cache = false) {
	const endpoint = 'https://api.socialkit.dev/youtube/transcript';

	let response;
	try {
		response = await axios.get(endpoint, {
			params: {
				access_key: accessKey,
				url: videoUrl,
				cache,
			},
		});
	} catch (error) {
		// axios rejects on non-2xx responses and on network failures
		if (error.response) {
			throw new Error(
				`API Error: ${error.response.data?.message || error.response.statusText}`
			);
		}
		throw new Error(`Request failed: ${error.message}`);
	}

	// A 2xx response can still report a failure in the body.
	// Checking it outside the try block keeps this error from being
	// re-wrapped as "Request failed".
	const data = response.data;
	if (!data.success) {
		throw new Error(`API Error: ${data.message || 'Unknown error'}`);
	}
	return data.data;
}

// Usage
(async () => {
	const ACCESS_KEY = 'your-access-key-here';
	const VIDEO_URL = 'https://youtube.com/watch?v=dQw4w9WgXcQ';

	try {
		const result = await getYouTubeTranscript(VIDEO_URL, ACCESS_KEY);

		console.log(`Video URL: ${result.url}`);
		console.log(`Word Count: ${result.wordCount}`);
		console.log(`Segments: ${result.segments}`);
		console.log(`\nFull Transcript:\n${result.transcript}`);

		console.log('\n--- First 5 Segments ---');
		result.transcriptSegments.slice(0, 5).forEach((segment) => {
			console.log(`[${segment.timestamp}] ${segment.text}`);
		});
	} catch (error) {
		console.error(`Error: ${error.message}`);
	}
})();

Python Example

import requests

def get_youtube_transcript(video_url, access_key, cache=False):
    """
    Extract transcript from a YouTube video or Short

    Args:
        video_url: The YouTube video URL
        access_key: Your SocialKit API access key
        cache: Enable caching for faster subsequent requests

    Returns:
        dict: Transcript data including full text and timestamped segments
    """
    endpoint = "https://api.socialkit.dev/youtube/transcript"

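    params = {
        "access_key": access_key,
        "url": video_url,
        # requests would serialize a Python bool as "True"/"False";
        # send lowercase to match what the JavaScript example sends
        "cache": "true" if cache else "false",
    }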

    try:
        # A timeout keeps the call from hanging on a stalled connection
        response = requests.get(endpoint, params=params, timeout=30)
        response.raise_for_status()

        data = response.json()

        if data.get("success"):
            return data["data"]
        else:
            raise Exception(f"API Error: {data.get('message', 'Unknown error')}")

    except requests.exceptions.RequestException as e:
        # Chain the original exception so the traceback keeps the root cause
        raise Exception(f"Request failed: {e}") from e

# Usage
if __name__ == "__main__":
    ACCESS_KEY = "your-access-key-here"
    VIDEO_URL = "https://youtube.com/watch?v=dQw4w9WgXcQ"

    try:
        result = get_youtube_transcript(VIDEO_URL, ACCESS_KEY)

        print(f"Video URL: {result['url']}")
        print(f"Word Count: {result['wordCount']}")
        print(f"Segments: {result['segments']}")
        print(f"\nFull Transcript:\n{result['transcript']}")

        print("\n--- First 5 Segments ---")
        for segment in result['transcriptSegments'][:5]:
            print(f"[{segment['timestamp']}] {segment['text']}")

    except Exception as e:
        print(f"Error: {str(e)}")

Response Structure Explained

The API returns comprehensive transcript data:

Full Transcript Data

  • transcript: Complete text transcript of the video
  • wordCount: Total number of words in the transcript
  • segments: Total number of transcript segments

Timestamped Segments

Each segment in transcriptSegments contains:

  • text: The spoken text for this segment
  • start: Start time in seconds
  • duration: Duration of the segment in seconds
  • timestamp: Human-readable timestamp (MM:SS format)
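
Because each segment carries start and duration in seconds, the segments map directly onto caption formats. The sketch below converts a transcriptSegments list into SRT caption text; the helper names to_srt_time and segments_to_srt are hypothetical, and the sample data is copied from the response above:

```python
from __future__ import annotations


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT caption text from a list of transcriptSegments entries."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]  # end time is start plus duration
        blocks.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        {"text": "[♪♪♪]", "start": 0, "duration": 5, "timestamp": "00:00"},
        {"text": "♪ We're no strangers to love ♪", "start": 5, "duration": 5,
         "timestamp": "00:05"},
    ]
    print(segments_to_srt(sample))
```

Pass result["transcriptSegments"] from the Python example to segments_to_srt and write the returned string to a .srt file.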

Caching for Better Performance

Enable caching to speed up repeated requests:

// Cache for 1 hour (3600 seconds)
// Note: this is the raw axios response; the transcript is in response.data.data
const response = await axios.get(endpoint, {
	params: {
		access_key: accessKey,
		url: videoUrl,
		cache: true,
		cache_ttl: 3600,
	},
});

Caching Benefits:

  • Faster response times for repeated requests
  • Reduced API costs
  • Better performance for batch processing

The cache_ttl value can range from 1 hour (3600s) to 1 month (2592000s).
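
For Python users, here is a rough equivalent that also checks the TTL against the documented range before sending it. build_params is a hypothetical helper, and sending the lowercase string "true" for cache is an assumption chosen to mirror what the JavaScript example sends:

```python
MIN_TTL = 3600        # 1 hour, the documented lower bound
MAX_TTL = 2_592_000   # 1 month, the documented upper bound


def build_params(access_key, video_url, cache_ttl=None):
    """Return query parameters, enabling caching only when a TTL is given."""
    params = {"access_key": access_key, "url": video_url}
    if cache_ttl is not None:
        if not MIN_TTL <= cache_ttl <= MAX_TTL:
            raise ValueError(
                f"cache_ttl must be between {MIN_TTL} and {MAX_TTL} seconds"
            )
        # Assumption: the API accepts lowercase "true", as axios would send it
        params["cache"] = "true"
        params["cache_ttl"] = cache_ttl
    return params


if __name__ == "__main__":
    print(build_params("your-access-key-here",
                       "https://youtube.com/watch?v=dQw4w9WgXcQ",
                       cache_ttl=3600))
```

The returned dictionary can be passed as params to requests.get.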

Use Cases

Here are practical ways to use YouTube transcript extraction:

  1. Content Analysis: Extract keywords and topics from educational videos
  2. Accessibility: Generate accurate subtitles and captions
  3. Research: Analyze video content at scale for academic studies
  4. SEO: Convert video content into searchable text
  5. AI Training: Create datasets from YouTube educational content
  6. Translation: Source material for multi-language subtitles
  7. Sentiment Analysis: Analyze tone and messaging in video content
  8. Content Moderation: Detect policy violations in video speech
  9. YouTube Shorts Analysis: Extract transcripts from short-form content
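
As a small taste of the content analysis use case, the sketch below searches the timestamped segments for a keyword and returns where it is mentioned. find_mentions is a hypothetical helper, and the sample segments are taken from the example response earlier in this post:

```python
def find_mentions(segments, keyword):
    """Return (timestamp, text) pairs for segments containing keyword.

    Matching is case-insensitive; line breaks inside a segment are
    replaced with spaces so each hit prints on one line.
    """
    needle = keyword.lower()
    return [
        (seg["timestamp"], seg["text"].replace("\n", " "))
        for seg in segments
        if needle in seg["text"].lower()
    ]


if __name__ == "__main__":
    sample = [
        {"text": "♪ We're no strangers to love ♪", "start": 5,
         "duration": 5, "timestamp": "00:05"},
        {"text": "♪ You know the rules\nand so do I ♪", "start": 10,
         "duration": 5, "timestamp": "00:10"},
    ]
    for timestamp, text in find_mentions(sample, "rules"):
        print(f"[{timestamp}] {text}")
```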

YouTube Shorts Support

The API seamlessly handles YouTube Shorts URLs:

// Works with Shorts URLs
const shortsUrl = 'https://youtube.com/shorts/abc123';
const result = await getYouTubeTranscript(shortsUrl, ACCESS_KEY);

No special handling needed - the API automatically detects Shorts and returns the same structured data.
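
Since both URL forms return the same structure, a batch job can mix regular videos and Shorts in one list. Here is a minimal sketch; fetch_many is a hypothetical helper that takes any fetch function, so you could pass a wrapper around the get_youtube_transcript function from the Python example:

```python
def fetch_many(urls, fetch):
    """Call fetch(url) for each URL and return (results, errors), keyed by URL."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # Record the failure and keep going so one bad URL
            # does not stop the whole batch
            errors[url] = str(exc)
    return results, errors


if __name__ == "__main__":
    # Stub fetcher for demonstration; replace it with a real call such as
    # lambda u: get_youtube_transcript(u, ACCESS_KEY)
    def fake_fetch(url):
        return {"url": url, "transcript": "..."}

    urls = [
        "https://youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtube.com/shorts/abc123",
    ]
    results, errors = fetch_many(urls, fake_fetch)
    print(f"{len(results)} succeeded, {len(errors)} failed")
```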

Try It Free First

Want to test transcript quality before integrating the API? Use our free YouTube Transcript Extractor tool to extract transcripts from any YouTube video or Short directly in your browser. No coding required!

Want to Build Your Own Scraper?

Prefer building your own YouTube transcript scraper with Puppeteer? Check out our comprehensive DIY guide:

How to Scrape YouTube Transcripts With Puppeteer: Complete step-by-step tutorial covering:

  • Setting up Puppeteer for YouTube scraping
  • Handling YouTube's dynamic content loading
  • Extracting timestamped transcript segments
  • Bypassing anti-bot detection measures
  • Error handling and retry logic
  • Code examples and best practices

This tutorial is perfect if you:

  • Want to learn how YouTube transcript scraping works under the hood
  • Need custom scraping logic for specific requirements
  • Prefer self-hosted solutions over APIs
  • Want to avoid API costs for low-volume projects

Keep in mind that building your own scraper means maintaining it as YouTube's interface changes, handling rate limits, and dealing with CAPTCHAs. For production use at scale, the API approach is more reliable and cost-effective.
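
Whichever route you choose, transient failures are usually handled by retrying with an increasing delay. The sketch below shows generic exponential backoff around any callable; with_retries is a hypothetical helper, and the attempt count and delays are illustrative defaults rather than values recommended by SocialKit or YouTube:

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on an exception wait base_delay * 2**n seconds and retry.

    The last exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails on the first call, succeeds on the second
        state["calls"] += 1
        if state["calls"] < 2:
            raise RuntimeError("temporary failure")
        return "ok"

    print(with_retries(flaky, base_delay=0.1))
```

The sleep parameter is injectable so the delay logic can be tested without actually waiting.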

Complete YouTube API Suite

SocialKit offers a suite of YouTube APIs to expand your YouTube data extraction capabilities. All of them work with both regular YouTube videos and YouTube Shorts.

Conclusion

Extracting YouTube transcripts doesn't have to be complicated. With SocialKit's YouTube Transcript API, you can skip the complexity of web scraping, browser automation, and anti-bot detection.

Get your free API access key and start extracting YouTube transcripts in minutes. Whether you're building accessibility tools, conducting research, or analyzing content at scale, you'll have reliable transcript data at your fingertips.

Works perfectly with YouTube videos and Shorts - one API for all your transcript needs!

Happy coding! 🚀