How to Scrape YouTube Videos: Complete Guide
YouTube video data extraction is a common requirement for developers building analytics tools, content management systems, or research applications. This guide shows you how to scrape YouTube video metadata, comments, and transcripts using Puppeteer with practical code examples.
We'll cover three main data types you can extract from YouTube videos: video details (title, views, channel info), comments (with sorting and metadata), and transcripts (timestamped captions). Each section includes working code and links to comprehensive tutorials for advanced implementations.
Prerequisites
Before starting, ensure you have:
- Node.js installed (version 18 or higher; current Puppeteer releases require it)
- Basic knowledge of JavaScript and async/await
- Understanding of DOM manipulation and CSS selectors
- Familiarity with browser automation concepts
Setting Up the Project
Create a new project and install dependencies:
mkdir youtube-scraper
cd youtube-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
The stealth plugin helps avoid detection by making automated browser behavior appear more natural.
Extracting Video Details
Video details include metadata like title, view count, upload date, and channel information. Here's a basic implementation:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function scrapeVideoDetails(url) {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

    // Give YouTube's client-side rendering time to populate the page.
    // (page.waitForTimeout was removed in newer Puppeteer versions,
    // so use a plain delay instead.)
    await new Promise((resolve) => setTimeout(resolve, 3000));

    const videoDetails = await page.evaluate(() => {
      // Extract video title
      const titleElement = document.querySelector('h1.ytd-video-primary-info-renderer yt-formatted-string');
      const title = titleElement ? titleElement.textContent.trim() : '';

      // Extract view count
      const viewsElement = document.querySelector('yt-view-count-renderer .view-count');
      const viewsText = viewsElement ? viewsElement.textContent : '';

      // Extract channel name
      const channelElement = document.querySelector('ytd-video-owner-renderer .ytd-channel-name a');
      const channelName = channelElement ? channelElement.textContent.trim() : '';

      // Extract upload date
      const dateElement = document.querySelector('#info-strings yt-formatted-string');
      const uploadDate = dateElement ? dateElement.textContent : '';

      return {
        title,
        views: viewsText,
        channelName,
        uploadDate,
        url: window.location.href
      };
    });

    return videoDetails;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeVideoDetails('https://www.youtube.com/watch?v=dQw4w9WgXcQ').then(console.log);
This basic implementation extracts core video metadata. For production use, you'll need more robust selectors, error handling, and numeric parsing for view counts.
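For example, the view count arrives as display text like "1,234,567 views" or "1.2M views". Here is a minimal parsing sketch, assuming English-locale formatting; the parseViewCount helper is illustrative, not part of any library:

// Minimal sketch: convert view-count display text into a number.
// Assumes English-locale strings like "1,234,567 views" or "1.2M views".
function parseViewCount(viewsText) {
  if (!viewsText) return 0;
  const match = viewsText.replace(/,/g, '').match(/([\d.]+)\s*([KMB])?/i);
  if (!match) return 0;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const suffix = match[2] ? match[2].toUpperCase() : '';
  return Math.round(parseFloat(match[1]) * (multipliers[suffix] || 1));
}

// parseViewCount('1.2M views');      // 1200000
// parseViewCount('1,234,567 views'); // 1234567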
For advanced video details extraction, see How to Extract YouTube Video Details Using Puppeteer, which covers advanced selectors, like/dislike parsing, duration extraction, and comprehensive error handling.
Extracting Comments
YouTube loads comments dynamically, so the scraper has to scroll down and wait for them to render. Here's a simple implementation:
async function scrapeComments(url, limit = 10) {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 1024 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

    // Scroll toward the comments section to trigger lazy loading
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight / 3);
    });

    // Wait for comments to load
    await page.waitForSelector('#comments', { timeout: 15000 });
    await new Promise((resolve) => setTimeout(resolve, 3000));

    const comments = await page.evaluate((maxComments) => {
      const commentElements = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
      const results = [];

      for (let i = 0; i < Math.min(commentElements.length, maxComments); i++) {
        const element = commentElements[i];
        const authorElement = element.querySelector('#author-text span');
        const textElement = element.querySelector('#content-text');
        const likesElement = element.querySelector('#vote-count-middle');
        const timeElement = element.querySelector('#published-time-text');

        const comment = {
          author: authorElement ? authorElement.textContent.trim() : 'Unknown',
          text: textElement ? textElement.textContent.trim() : '',
          likes: likesElement ? parseInt(likesElement.textContent.replace(/[^\d]/g, ''), 10) || 0 : 0,
          time: timeElement ? timeElement.textContent.trim() : '',
          position: i + 1
        };

        if (comment.text) {
          results.push(comment);
        }
      }

      return results;
    }, limit);

    return comments;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeComments('https://www.youtube.com/watch?v=dQw4w9WgXcQ', 15).then(console.log);
This extracts the first set of loaded comments. For more comments, you need infinite scrolling and deduplication.
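To give a flavor of what that involves, here is a minimal sketch using the same selectors as above: scroll to the bottom, wait, stop once the thread count reaches a target or stops growing, then key each comment on author plus text to drop duplicates. The helper names are ours; the linked tutorial covers the production version:

// Minimal sketch: keep scrolling until enough comment threads have
// rendered, or the count stalls (no more comments are loading).
async function loadMoreComments(page, target = 50, maxScrolls = 20) {
  let previousCount = 0;
  for (let i = 0; i < maxScrolls; i++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, 2000));
    const count = await page.evaluate(
      () => document.querySelectorAll('ytd-comment-thread-renderer').length
    );
    if (count >= target || count === previousCount) break;
    previousCount = count;
  }
}

// Minimal sketch: drop duplicates, keyed on author plus text,
// since re-renders can surface the same comment twice.
function dedupeComments(comments) {
  const seen = new Set();
  return comments.filter((comment) => {
    const key = `${comment.author}::${comment.text}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}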
For advanced comment scraping, see How to Scrape YouTube Comments With Puppeteer, which covers infinite scrolling, comment sorting by top/new, reply counts, and creator hearts detection.
Extracting Transcripts
YouTube transcripts require opening the transcript panel (the "Show transcript" button, typically in the expanded video description) and parsing the timestamped segments:
async function scrapeTranscript(url) {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await new Promise((resolve) => setTimeout(resolve, 3000));

    // Look for the transcript button and click it
    await page.evaluate(() => {
      const buttons = Array.from(document.querySelectorAll('button'));
      const transcriptButton = buttons.find((button) =>
        button.textContent && button.textContent.toLowerCase().includes('transcript')
      );
      if (transcriptButton) {
        transcriptButton.click();
      }
    });

    // Wait for the transcript panel to load
    await new Promise((resolve) => setTimeout(resolve, 2000));

    const transcript = await page.evaluate(() => {
      const transcriptContainer = document.querySelector('#segments-container');
      if (!transcriptContainer) return null; // transcript unavailable or panel not open

      const segments = transcriptContainer.querySelectorAll('ytd-transcript-segment-renderer');
      const results = [];

      segments.forEach((segment, index) => {
        const timestampElement = segment.querySelector('.ytd-transcript-segment-renderer[role="button"] .timestamp');
        const textElement = segment.querySelector('.ytd-transcript-segment-renderer[role="button"] .segment-text');

        if (timestampElement && textElement) {
          results.push({
            timestamp: timestampElement.textContent.trim(),
            text: textElement.textContent.trim(),
            index: index + 1
          });
        }
      });

      return results;
    });

    return transcript;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeTranscript('https://www.youtube.com/watch?v=dQw4w9WgXcQ').then(console.log);
This basic transcript extraction handles the common case where transcripts are available.
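A common post-processing step is converting the panel's "m:ss" or "h:mm:ss" timestamps into seconds so segments can be sorted or aligned with playback. A minimal sketch, assuming that timestamp format:

// Minimal sketch: convert "1:23" or "1:02:03" into total seconds
function timestampToSeconds(timestamp) {
  const parts = timestamp.split(':').map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// timestampToSeconds('1:23');    // 83
// timestampToSeconds('1:02:03'); // 3723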
For advanced transcript scraping, see How to Scrape YouTube Transcripts With Puppeteer, which covers multiple language support, auto-generated vs manual detection, and robust error handling.
API Alternative for Production Use
While Puppeteer gives you complete control, scraping at scale requires handling rate limits, browser management, and YouTube's frequent interface changes. For production applications, consider using the SocialKit YouTube APIs:
// Simple API calls - no browser management needed
const statsRes = await fetch('https://api.socialkit.dev/youtube/stats?url=VIDEO_URL&access_key=KEY');
const videoDetails = await statsRes.json();

const commentsRes = await fetch('https://api.socialkit.dev/youtube/comments?url=VIDEO_URL&limit=100&access_key=KEY');
const comments = await commentsRes.json();

const transcriptRes = await fetch('https://api.socialkit.dev/youtube/transcript?url=VIDEO_URL&access_key=KEY');
const transcript = await transcriptRes.json();
API benefits: reliable infrastructure, automatic updates when YouTube's interface changes, rate limiting handled for you, and 20 free requests monthly.
Conclusion
This guide provides working examples for scraping YouTube video data with Puppeteer. Start with these basic implementations and refer to the detailed tutorials for production-ready solutions with advanced features.
For learning and experimentation, Puppeteer gives you complete control. For production applications requiring reliability and scale, consider using specialized APIs that handle the complexity of YouTube scraping automatically.