How to Scrape YouTube Shorts Video Transcripts With Puppeteer

July 11, 2025 • Jonathan Geiger

web-scraping, puppeteer, youtube-shorts, tutorial, youtube

YouTube Shorts have revolutionized short-form video content, but extracting transcripts from these vertical videos presents unique challenges. Unlike regular YouTube videos, Shorts use a different URL structure that requires special handling when automating transcript extraction.

In this comprehensive guide, we'll explore how to scrape YouTube Shorts transcripts using Puppeteer, with a focus on the critical URL conversion technique that makes this process possible. Whether you're analyzing trending short-form content or building accessibility features, this tutorial provides everything you need to extract transcripts from YouTube Shorts automatically.

Why YouTube Shorts Transcripts Matter

YouTube Shorts generate billions of views daily, making them a goldmine for:

Content trend analysis - Understanding viral short-form content patterns
Accessibility compliance - Providing text alternatives for hearing-impaired users
Market research - Analyzing competitor short-form content strategies
SEO optimization - Extracting keywords from popular short videos
Content repurposing - Converting video content to written formats

Prerequisites

Before we dive into scraping YouTube Shorts transcripts, ensure you have:

Node.js installed (version 18 or higher, which current Puppeteer releases require)
Basic understanding of JavaScript and async/await
Familiarity with DOM manipulation and CSS selectors
Knowledge of regular expressions for URL parsing
Understanding of browser automation concepts

Understanding YouTube Shorts URL Structure

The key difference between regular YouTube videos and Shorts lies in their URL structure:

Regular YouTube video: https://www.youtube.com/watch?v=DS4OsxHR9EQ
YouTube Shorts: https://www.youtube.com/shorts/DS4OsxHR9EQ

To extract transcripts from YouTube Shorts, we need to convert the Shorts URL to the regular YouTube format, as the transcript functionality is only available in the standard video player interface.

URL Conversion Pattern

// Regex pattern to extract video ID from Shorts URL
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;

// Example conversion
const shortsUrl = 'https://www.youtube.com/shorts/DS4OsxHR9EQ';
const videoId = shortsUrl.match(shortsPattern)[1]; // 'DS4OsxHR9EQ'
const regularUrl = `https://www.youtube.com/watch?v=${videoId}`;
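
The regex above covers the canonical Shorts URL. As a rough sketch of a stricter variant, the helper below parses the URL with Node's built-in URL class so that trailing slashes, query strings, mobile hosts, and youtu.be links are also handled. The name toWatchUrl and the 11-character ID check are assumptions made for illustration, not part of the original scraper.

```javascript
// Hypothetical helper: normalizes several YouTube URL shapes to a watch URL.
const VIDEO_ID = /^[\w-]{11}$/; // assumption: IDs are 11 URL-safe characters

const toWatchUrl = (input) => {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a parseable URL
  }

  // Treat www. and m. hosts the same as the bare domain.
  const host = parsed.hostname.replace(/^(www\.|m\.)/, '');
  const parts = parsed.pathname.split('/').filter(Boolean);
  let id = null;

  if (host === 'youtube.com' && parts[0] === 'shorts') {
    id = parts[1]; // /shorts/<id>
  } else if (host === 'youtube.com' && parts[0] === 'watch') {
    id = parsed.searchParams.get('v'); // /watch?v=<id>
  } else if (host === 'youtu.be') {
    id = parts[0]; // youtu.be/<id>
  }

  return id && VIDEO_ID.test(id) ? `https://www.youtube.com/watch?v=${id}` : null;
};
```

Returning null instead of throwing lets the caller decide whether an unrecognized URL should be skipped or reported.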

Setting Up the Project

Create a new project directory and install dependencies:

mkdir youtube-shorts-scraper
cd youtube-shorts-scraper
npm init -y

Install the required packages:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Basic YouTube Shorts Transcript Scraper

Here's a complete implementation that handles YouTube Shorts URL conversion and transcript extraction using the same methods as regular YouTube videos:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Use stealth plugin to avoid detection
puppeteer.use(StealthPlugin());

const convertShortsToRegularUrl = (url) => {
  // Handle both regular YouTube URLs and Shorts URLs
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
  
  if (shortsPattern.test(url)) {
    // Extract video ID from Shorts URL
    const videoId = url.match(shortsPattern)[1];
    return `https://www.youtube.com/watch?v=${videoId}`;
  } else if (regularPattern.test(url)) {
    // Already a regular YouTube URL
    return url;
  } else {
    throw new Error('Invalid YouTube URL format');
  }
};

const scrapeYouTubeShortsTranscript = async (shortsUrl) => {
  // Convert Shorts URL to regular YouTube URL
  const url = convertShortsToRegularUrl(shortsUrl);
  console.log(`Converting: ${shortsUrl} -> ${url}`);

  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
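    // page.waitForTimeout() was removed in Puppeteer v22, so pause with a plain timer
    await new Promise(resolve => setTimeout(resolve, 2000));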

    // Handle cookie banner (EU compliance)
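    try {
      // The callback runs in the page, so return a flag and log it from Node
      const closed = await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) cookieButton.click();
        return Boolean(cookieButton);
      });
      console.log(closed ? 'Closed cookie banner' : 'No cookie banner found');
    } catch (e) {
      console.log('Could not check for cookie banner:', e.message);
    }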

    // Click transcript button
    await page.waitForSelector('ytd-video-description-transcript-section-renderer button', { timeout: 10000 });
    await page.evaluate(() => {
      const transcriptButton = document.querySelector('ytd-video-description-transcript-section-renderer button');
      if (transcriptButton) {
        transcriptButton.click();
        console.log('Clicked transcript button');
      }
    });

    await new Promise(resolve => setTimeout(resolve, 2000));

    // Extract transcript text
    const transcriptText = await page.evaluate(() => {
      const segments = Array.from(document.querySelectorAll('#segments-container yt-formatted-string'));
      return segments.map(element => element.textContent?.trim()).filter(text => text && text.length > 0);
    });

    return transcriptText.join(' ');

  } catch (error) {
    console.error('Error scraping transcript:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const shortsUrl = 'https://www.youtube.com/shorts/DS4OsxHR9EQ';
scrapeYouTubeShortsTranscript(shortsUrl)
  .then(transcript => console.log('Transcript:', transcript))
  .catch(error => console.error('Failed to scrape transcript:', error));

Advanced YouTube Shorts Scraper with Error Handling

The basic implementation works for most videos, but YouTube's interface can vary. Here's a more robust version with comprehensive error handling that uses the same methods as regular YouTube transcript scraping:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

const convertShortsToRegularUrl = (url) => {
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
  
  if (shortsPattern.test(url)) {
    const videoId = url.match(shortsPattern)[1];
    return `https://www.youtube.com/watch?v=${videoId}`;
  } else if (regularPattern.test(url)) {
    return url;
  } else {
    throw new Error('Invalid YouTube URL format');
  }
};

const formatTimestamp = (seconds) => {
  const mins = Math.floor(seconds / 60);
  const secs = seconds % 60;
  return `${mins.toString().padStart(2, '0')}:${secs.toString().padStart(2, '0')}`;
};

const scrapeYouTubeShortsTranscriptAdvanced = async (shortsUrl) => {
  // Convert Shorts URL to regular YouTube URL
  const url = convertShortsToRegularUrl(shortsUrl);
  console.log(`Converting: ${shortsUrl} -> ${url}`);

  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 1024,
      deviceScaleFactor: 1,
    });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
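    // page.waitForTimeout() was removed in Puppeteer v22, so pause with a plain timer
    await new Promise(resolve => setTimeout(resolve, 2000));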

    // Try to close cookie banner if present
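    try {
      // The callback runs in the page, so return a flag and log it from Node
      const closed = await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) cookieButton.click();
        return Boolean(cookieButton);
      });
      console.log(closed ? 'Closed cookie banner' : 'No cookie banner found');
    } catch (e) {
      console.log('Could not check for cookie banner:', e.message);
    }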

    // Click transcript button with multiple fallback selectors
    try {
      await page.waitForSelector('ytd-video-description-transcript-section-renderer button', { timeout: 10000 });
      await page.evaluate(() => {
        const transcriptButton = document.querySelector('ytd-video-description-transcript-section-renderer button');
        if (transcriptButton) {
          transcriptButton.click();
          console.log('Clicked transcript button');
        }
      });
      await new Promise(resolve => setTimeout(resolve, 2000));
    } catch (e) {
      console.log('Transcript button not found, trying alternative selectors');
      
      // Try alternative selectors
      try {
        await page.evaluate(() => {
          const selectors = [
            'button[aria-label*="transcript"]',
            'button[aria-label*="Transcript"]',
            '[data-target-id="engagement-panel-transcript"] button',
            '#transcript-button',
            'button[aria-label*="Show transcript"]'
          ];
          
          for (const selector of selectors) {
            const button = document.querySelector(selector);
            if (button) {
              button.click();
              console.log('Clicked transcript button with selector:', selector);
              return;
            }
          }
        });
        await new Promise(resolve => setTimeout(resolve, 2000));
      } catch (e2) {
        console.log('Alternative transcript button selectors failed');
      }
    }

    // Extract transcript text with multiple fallback selectors
    let transcriptText = await page.evaluate(() => {
      const segments = Array.from(document.querySelectorAll('#segments-container yt-formatted-string'));
      return segments.map(element => element.textContent?.trim()).filter(text => text && text.length > 0);
    });

    if (transcriptText.length === 0) {
      // Try alternative transcript selectors
      const alternativeTranscript = await page.evaluate(() => {
        const selectors = [
          '#segments-container span',
          '#segments-container div',
          '[data-target-id="engagement-panel-transcript"] span',
          '[data-target-id="engagement-panel-transcript"] div',
          '.ytd-transcript-segment-renderer span',
          '.ytd-transcript-segment-renderer div'
        ];
        
        for (const selector of selectors) {
          const elements = document.querySelectorAll(selector);
          if (elements.length > 0) {
            const texts = Array.from(elements).map(el => el.textContent?.trim()).filter(text => text && text.length > 0);
            if (texts.length > 0) {
              return texts;
            }
          }
        }
        return [];
      });

      if (alternativeTranscript.length === 0) {
        throw new Error('No transcript available for this video.');
      }
      
      transcriptText = alternativeTranscript;
    }

    // Attach placeholder timestamps: the 5-second spacing is an assumption, not the video's real timing
    const transcript = transcriptText.map((text, index) => ({
      text,
      start: index * 5,
      duration: 5,
      timestamp: formatTimestamp(index * 5),
    }));

    const fullText = transcript.map(entry => entry.text).join(' ');
    
    return {
      url,
      transcript,
      fullText,
      wordCount: fullText.split(/\s+/).filter(Boolean).length,
      segments: transcript.length,
    };

  } catch (error) {
    console.error('Error scraping transcript:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const shortsUrl = 'https://www.youtube.com/shorts/DS4OsxHR9EQ';
scrapeYouTubeShortsTranscriptAdvanced(shortsUrl)
  .then(result => {
    console.log('Full transcript:', result.fullText);
    console.log('Word count:', result.wordCount);
    console.log('Segments:', result.segments);
    console.log('First few segments:', result.transcript.slice(0, 5));
  })
  .catch(error => console.error('Failed to scrape transcript:', error));
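
The advanced scraper spaces its timestamps at a fixed five seconds, so they do not reflect the real timing. If the timestamp label shown next to each segment (for example "0:04") is scraped along with the text, real start times can be derived from it. The sketch below covers only that conversion. The { label, text } row shape and both function names are hypothetical, and the label format is assumed to be m:ss or h:mm:ss.

```javascript
// Convert a label such as "0:04" or "1:02:03" to seconds (assumed label format).
const parseTimestamp = (label) =>
  label.trim().split(':').reduce((total, part) => total * 60 + Number(part), 0);

// Build segments from hypothetical { label, text } rows scraped from the transcript panel.
const buildSegments = (rows) =>
  rows.map((row, i) => {
    const start = parseTimestamp(row.label);
    const next = rows[i + 1] ? parseTimestamp(rows[i + 1].label) : null;
    return {
      text: row.text,
      start,
      // The last segment's duration is unknown without the video length.
      duration: next === null ? null : next - start,
    };
  });
```

Each duration is taken as the gap to the next segment's start, which is why the final segment is left as null rather than guessed.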

Handling Common Issues

Since YouTube Shorts use the same transcript interface as regular videos after URL conversion, they face the same challenges:

1. Cookie Banners

YouTube shows cookie consent banners in EU regions. Our code handles this by looking for buttons with "cookies" in their aria-label and clicking them automatically.

2. Transcript Button Variations

YouTube's interface changes frequently. We use multiple selectors to find the transcript button, ensuring compatibility across different layouts.

3. Dynamic Content Loading

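Transcripts are loaded dynamically. The examples above pause for a fixed two seconds before extracting, which is simple but can be too short on a slow connection and wasteful on a fast one. Waiting for the transcript container with waitForSelector() is usually more reliable. Note that page.waitForTimeout() has been removed from recent Puppeteer releases, so fixed pauses need a plain setTimeout-based promise instead.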
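
As a minimal sketch of selector-based waiting, the helper below wraps Puppeteer's page.waitForSelector() and reports a timeout as false instead of throwing. The helper name waitForTranscript is hypothetical. The default selector is the #segments-container element used in the scrapers above and may need adjusting if YouTube changes its markup.

```javascript
// Hypothetical helper: resolves true once `selector` appears, false on timeout.
const waitForTranscript = async (page, selector = '#segments-container', timeout = 10000) => {
  try {
    await page.waitForSelector(selector, { timeout });
    return true;
  } catch (error) {
    // Puppeteer rejects with a TimeoutError when the element never appears.
    if (error.name === 'TimeoutError') return false;
    throw error; // anything else (e.g. a closed page) is a real failure
  }
};
```

A caller could use the boolean result to decide whether to try the fallback selectors or to report that no transcript is available.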

4. Rate Limiting

To avoid being blocked, consider:

Adding random delays between requests
Using residential proxies
Implementing retry logic with exponential backoff
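
The retry suggestion can be sketched as follows. This is one possible shape rather than a definitive implementation: the names sleep, backoffDelay, and withRetry are hypothetical, and the default delays are illustrative values that should be tuned to your workload.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay doubles with each attempt and is capped at maxMs.
const backoffDelay = (attempt, baseMs = 1000, maxMs = 30000) =>
  Math.min(maxMs, baseMs * 2 ** attempt);

// Hypothetical wrapper: runs `task`, retrying up to `retries` times after a failure.
const withRetry = async (task, { retries = 3, baseMs = 1000 } = {}) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the last error
      // Random jitter (50-100% of the delay) keeps parallel workers out of step.
      const delay = backoffDelay(attempt, baseMs) * (0.5 + Math.random() / 2);
      await sleep(delay);
    }
  }
};
```

With the scraper from this article, a call would look like withRetry(() => scrapeYouTubeShortsTranscript(url)).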

Best Practices

Respect robots.txt: Always check YouTube's robots.txt file
Implement caching: Store transcripts locally to avoid repeated requests
Error handling: Always wrap your scraping code in try-catch blocks
User agent rotation: Use different user agents to appear more natural
Respect rate limits: Don't overwhelm YouTube's servers
URL validation: Always validate and convert Shorts URLs before processing
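
The caching advice could be implemented in many ways. The sketch below assumes a simple layout of one JSON file per video ID on local disk. The function name createTranscriptCache and the directory layout are hypothetical, and the video ID should be validated before use because it becomes part of a file path.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical cache: one JSON file per video ID under cacheDir.
const createTranscriptCache = (cacheDir) => {
  fs.mkdirSync(cacheDir, { recursive: true });
  const fileFor = (videoId) => path.join(cacheDir, `${videoId}.json`);

  return {
    get(videoId) {
      try {
        return JSON.parse(fs.readFileSync(fileFor(videoId), 'utf8'));
      } catch {
        return null; // a missing or unreadable entry counts as a cache miss
      }
    },
    set(videoId, transcript) {
      fs.writeFileSync(fileFor(videoId), JSON.stringify(transcript));
    },
  };
};
```

Checking cache.get(videoId) before launching a browser avoids a repeat request for a video that has already been scraped.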

Alternative: Using SocialKit YouTube Transcript API

While web scraping works, managing browser instances and handling YouTube's changing interface can be complex. For production applications, consider using SocialKit's YouTube Transcript API, which handles both regular videos and Shorts automatically:

curl "https://api.socialkit.dev/youtube/transcript?access_key=YOUR_ACCESS_KEY&url=https://youtube.com/shorts/DS4OsxHR9EQ"

Example Response for YouTube Shorts

{
  "success": true,
  "data": {
    "url": "https://youtube.com/shorts/DS4OsxHR9EQ",
    "transcript": "Hey everyone! Today I'm showing you this amazing quick tip that will change everything. Watch this transformation happen in just 30 seconds. Isn't that incredible? Make sure to follow for more!",
    "transcriptSegments": [
      {
        "text": "Hey everyone! Today I'm showing you this amazing quick tip",
        "start": 0,
        "duration": 4,
        "timestamp": "00:00"
      },
      {
        "text": "that will change everything. Watch this transformation",
        "start": 4,
        "duration": 3,
        "timestamp": "00:04"
      },
      {
        "text": "happen in just 30 seconds. Isn't that incredible?",
        "start": 7,
        "duration": 4,
        "timestamp": "00:07"
      },
      {
        "text": "Make sure to follow for more!",
        "start": 11,
        "duration": 2,
        "timestamp": "00:11"
      }
    ],
    "wordCount": 32,
    "segments": 4
  }
}
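
For Node.js callers, the same request can be sketched with the built-in fetch available in Node.js 18 and later. The endpoint and the access_key and url parameters come from the curl example above, and the success and data fields mirror the sample response. The function names and the error handling are assumptions, since the API's error format is not shown here.

```javascript
// Endpoint and parameter names are taken from the curl example above.
const buildTranscriptRequest = (accessKey, videoUrl) => {
  const endpoint = new URL('https://api.socialkit.dev/youtube/transcript');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', videoUrl); // URL-encoded automatically
  return endpoint.toString();
};

const fetchTranscript = async (accessKey, videoUrl) => {
  // Uses the global fetch available in Node.js 18+.
  const response = await fetch(buildTranscriptRequest(accessKey, videoUrl));
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  const body = await response.json();
  // `success` and `data` mirror the sample response; the failure shape is an assumption.
  if (!body.success) throw new Error('API reported failure');
  return body.data;
};
```

Building the query with searchParams rather than string concatenation ensures the video URL is encoded correctly when it contains its own query string.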

Benefits of Using SocialKit API:

Automatic URL handling: Works with both Shorts and regular YouTube URLs
No browser management: Eliminates Puppeteer complexity and resource usage
Consistent reliability: Built-in handling of YouTube's interface changes
Scale-ready infrastructure: Process thousands of videos without rate limits
Structured data: Returns properly formatted timestamps and segments
Global availability: Works worldwide without geo-restrictions

Free YouTube Tools

Need quick access to YouTube content without building your own scraper? Try our free tools:

YouTube Video Summarizer Tool

Get AI-powered summaries with our free YouTube Video Summarizer tool:

Generate AI-powered summaries of any YouTube video or YouTube Shorts
Extract key insights including main topics, key points, and important quotes
Analyze video tone and identify target audience
Get instant results without any setup or API keys required

Try the Free YouTube Video Summarizer

YouTube Transcript Extractor Tool

Extract accurate transcripts with our free YouTube Transcript Extractor tool:

Extract accurate transcripts from any YouTube video or YouTube Shorts
Get timestamped segments for easy navigation and reference
Copy individual segments or the complete transcript
Perfect for accessibility and content analysis
100% free with no registration required

Try the Free YouTube Transcript Extractor

Both tools automatically handle YouTube Shorts URLs and are perfect for content creators, students, researchers, and anyone who wants to quickly extract valuable information from YouTube content.

Conclusion

Scraping YouTube Shorts transcripts with Puppeteer requires understanding the unique URL structure and conversion process that transforms Shorts URLs into regular YouTube video URLs. While the technical implementation involves careful handling of dynamic content and various edge cases, the ability to extract transcripts from short-form video content opens up powerful possibilities for content analysis, accessibility, and automation.

The key to success lies in the URL conversion technique - transforming youtube.com/shorts/VIDEO_ID to youtube.com/watch?v=VIDEO_ID - which allows access to YouTube's standard transcript interface. Combined with robust error handling and rate limiting, this approach enables reliable extraction from one of the web's fastest-growing content formats.

For production applications or when dealing with large volumes of videos, consider using a dedicated API service like SocialKit's YouTube Transcript API to ensure reliability and save development time. The API automatically handles both Shorts and regular YouTube videos, eliminating the need for manual URL conversion.

If you're looking for quick ways to extract information from video content without coding, check out our free tools: YouTube Video Summarizer for instant AI-powered summaries and YouTube Transcript Extractor for accurate timestamped transcripts.

Remember to always respect YouTube's terms of service, implement appropriate rate limiting, and consider the ethical implications of automated content extraction. Happy scraping!
