
How to Scrape YouTube Comments With Puppeteer

Jonathan Geiger
web-scraping, puppeteer, youtube, tutorial, comments

YouTube comments are a goldmine for community insights, sentiment, and market research. In this guide, you’ll learn how to scrape YouTube comments using Puppeteer, including support for YouTube Shorts. We’ll cover setup, scrolling and sorting, extracting rich metadata (author, likes, replies), and robust techniques for dynamic content.

Prerequisites

Before you start, make sure you have:

  • Node.js (v18 or higher — required by current Puppeteer releases)
  • Familiarity with JavaScript and async/await
  • Basic understanding of DOM selectors and browser automation

Setting Up the Project

mkdir youtube-comments-scraper
cd youtube-comments-scraper
npm init -y

Install dependencies:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin patches common headless-browser fingerprints, reducing the risk of bot detection.

Basic Implementation

This minimal version scrolls to the comments, extracts the first N entries, and returns core fields.

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

async function scrapeBasicComments(url, limit = 10) {
  const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 1024 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

    // Scroll to comments and wait
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
    await page.waitForSelector('#comments', { timeout: 15000 });
    await new Promise(resolve => setTimeout(resolve, 2000)); // page.waitForTimeout was removed in Puppeteer v22

    const comments = await page.evaluate((max) => {
      const items = Array.from(document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer')).slice(0, max);
      return items.map((el, index) => {
        const author = el.querySelector('#author-text span')?.textContent?.trim() || 'Unknown';
        const text = el.querySelector('#content-text')?.textContent?.trim() || '';
        const likesText = el.querySelector('#vote-count-middle')?.textContent || '0';
        const likes = parseInt(likesText.replace(/[^\d]/g, ''), 10) || 0;
        const relativeTime = el.querySelector('#published-time-text')?.textContent?.trim() || '';
        const avatar = el.querySelector('#author-thumbnail img')?.src || '';
        const replyBtn = el.querySelector('#replies button');
        const replyMatch = replyBtn?.textContent?.match(/\d+/);
        const replyCount = replyMatch ? parseInt(replyMatch[0], 10) : 0;
        return { author, text, likes, relativeTime, avatar, replyCount, position: index + 1 };
      }).filter(c => c.text);
    }, limit);

    return comments;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeBasicComments('https://www.youtube.com/watch?v=dQw4w9WgXcQ', 10).then(console.log);
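One caveat: YouTube abbreviates larger counts ("1.2K", "3M"), and the `replace(/[^\d]/g, '')` approach above would read "1.2K" as 12. Here is a sketch of a more robust parser, assuming English-locale suffixes only (the suffix letters vary by interface language):

```javascript
// Parse YouTube's compact count strings ("1.2K", "3M", "347") into numbers.
// Assumption: English-locale suffixes K/M/B; other locales need their own map.
function parseCompactCount(text) {
  if (!text) return 0;
  const match = text.trim().match(/^([\d.,]+)\s*([KMB])?/i);
  if (!match) return 0;
  const value = parseFloat(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return 0;
  const multiplier = { K: 1e3, M: 1e6, B: 1e9 }[(match[2] || '').toUpperCase()] || 1;
  return Math.round(value * multiplier);
}
```

Swap this in for the `parseInt(likesText.replace(...))` line if you need accurate like counts on popular videos.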

Advanced Implementation with Sorting and Infinite Scroll

The following robust function supports sorting by top or new, infinite scrolling, deduplication, reply counts, and creator-heart detection. It works for both standard videos and Shorts.

async function scrapeYouTubeComments(page, limit = 20, sortBy = 'new') {
 console.log(`Starting to scrape YouTube comments, limit: ${limit}, sortBy: ${sortBy}`);
 
 try {
  // First scroll down to make sure comments section is visible
  await page.evaluate(() => {
   window.scrollTo(0, document.body.scrollHeight / 3);
  });
  await new Promise(resolve => setTimeout(resolve, 2000)); // page.waitForTimeout was removed in Puppeteer v22

  // Wait for comments to load
  await page.waitForSelector('#comments', { timeout: 15000 });
  await new Promise(resolve => setTimeout(resolve, 2000));

  // Handle sort dropdown if sortBy is specified
  if (sortBy === 'top' || sortBy === 'new') {
   try {
    console.log(`Setting sort to: ${sortBy}`);
    
    // Find and click the sort dropdown trigger
    await page.waitForSelector('#comments #trigger', { timeout: 5000 });
    
    // Check current sort state
    const currentSort = await page.evaluate(() => {
     const trigger = document.querySelector('#comments #trigger');
     if (trigger && trigger.textContent) {
      const text = trigger.textContent.toLowerCase();
      if (text.includes('newest') || text.includes('new')) return 'new';
      if (text.includes('top')) return 'top';
     }
     return 'unknown';
    });

    console.log(`Current sort: ${currentSort}, wanted: ${sortBy}`);

    // Only click if we need to change the sort
    if (currentSort !== sortBy) {
     // Click the dropdown trigger
     await page.click('#comments #trigger');
      await new Promise(resolve => setTimeout(resolve, 1000));

     // Find and click the appropriate sort option
     const sortOption = sortBy === 'new' ? 'Newest first' : 'Top comments';
     
     await page.evaluate((targetSort) => {
      const menuItems = document.querySelectorAll('#comments tp-yt-paper-listbox tp-yt-paper-item');
      for (const item of menuItems) {
       const text = item.textContent.trim();
       if ((targetSort === 'new' && (text.includes('Newest') || text.includes('newest'))) ||
        (targetSort === 'top' && (text.includes('Top') || text.includes('top')))) {
        item.click();
        console.log(`Clicked sort option: ${text}`);
        return;
       }
      }
     }, sortBy);

     // Wait for comments to reload with new sort
      await new Promise(resolve => setTimeout(resolve, 3000));
     console.log(`Sort changed to: ${sortBy}`);
    }
   } catch (e) {
    console.log('Could not change sort order, using default:', e.message);
   }
  }

  let comments = [];
  let previousCommentsCount = 0;
  let noNewCommentsCount = 0;
  const maxScrollAttempts = 50;
  let scrollAttempts = 0;

  while (comments.length < limit && scrollAttempts < maxScrollAttempts) {
   // Extract current comments
   const currentComments = await page.evaluate(() => {
    const commentElements = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
    const comments = [];

    commentElements.forEach((commentEl, index) => {
     try {
      const authorElement = commentEl.querySelector('#author-text span');
      const textElement = commentEl.querySelector('#content-text');
      const likesElement = commentEl.querySelector('#vote-count-middle');
      const timeElement = commentEl.querySelector('#published-time-text');
      const avatarElement = commentEl.querySelector('#author-thumbnail img');
      const heartElement = commentEl.querySelector('#heart-button[aria-pressed="true"]');

      // Get reply count
      const replyButton = commentEl.querySelector('#replies button');
      let replyCount = 0;
      if (replyButton && replyButton.textContent) {
       const match = replyButton.textContent.match(/(\d+)/);
       replyCount = match ? parseInt(match[1], 10) : 0;
      }

      const comment = {
       id: `comment-${index}`,
       author: authorElement ? authorElement.textContent.trim() : 'Unknown',
       text: textElement ? textElement.textContent.trim() : '',
       likes: likesElement ? parseInt(likesElement.textContent.replace(/[^\d]/g, ''), 10) || 0 : 0,
       relativeTime: timeElement ? timeElement.textContent.trim() : '',
       avatar: avatarElement ? avatarElement.src : '',
       isCreatorHearted: !!heartElement,
       replyCount: replyCount,
       position: index + 1
      };

      if (comment.text) {
       comments.push(comment);
      }
     } catch (e) {
      console.log('Error extracting comment:', e);
     }
    });

    return comments;
   });

   // Update comments array with new unique comments
   const existingIds = new Set(comments.map(c => c.author + c.text + c.relativeTime));
   const newComments = currentComments.filter(c => 
    !existingIds.has(c.author + c.text + c.relativeTime)
   );
   comments.push(...newComments);

   console.log(`Extracted ${currentComments.length} comments, ${newComments.length} new, total: ${comments.length}`);

   // Check if we have enough comments
   if (comments.length >= limit) {
    break;
   }

   // Check if no new comments were loaded
   if (currentComments.length === previousCommentsCount) {
    noNewCommentsCount++;
    if (noNewCommentsCount >= 3) {
     console.log('No new comments loaded after 3 attempts, stopping');
     break;
    }
   } else {
    noNewCommentsCount = 0;
   }

   previousCommentsCount = currentComments.length;

   // Scroll down to load more comments
   await page.evaluate(() => {
    const commentsSection = document.querySelector('#comments');
    if (commentsSection) {
     commentsSection.scrollIntoView({ behavior: 'smooth', block: 'end' });
    }
    window.scrollBy(0, 1000);
   });

   await new Promise(resolve => setTimeout(resolve, 2000));
   scrollAttempts++;
  }

  // Trim to limit and add index, convert relative time to date
  const finalComments = comments.slice(0, limit).map((comment, index) => ({
   ...comment,
   index: index + 1,
   id: `comment-${index + 1}`,
   date: comment.relativeTime ? convertRelativeTimeToDate(comment.relativeTime) : new Date().toISOString()
  }));

  console.log(`Scraped ${finalComments.length} comments total`);
  return finalComments;

 } catch (error) {
  console.error('Error scraping YouTube comments:', error);
  return [];
 }
}

Helper to convert “x hours/days ago” to ISO date:

function convertRelativeTimeToDate(relative) {
  try {
    const text = relative.toLowerCase();
    const now = new Date();
    const match = text.match(/(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago/);
    if (!match) return now.toISOString();
    const amount = parseInt(match[1], 10);
    const unit = match[2];
    const ms = {
      second: 1000,
      minute: 60 * 1000,
      hour: 60 * 60 * 1000,
      day: 24 * 60 * 60 * 1000,
      week: 7 * 24 * 60 * 60 * 1000,
      month: 30 * 24 * 60 * 60 * 1000,
      year: 365 * 24 * 60 * 60 * 1000,
    }[unit] || 0;
    return new Date(now.getTime() - amount * ms).toISOString();
  } catch {
    return new Date().toISOString();
  }
}

Putting it together

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 1024 });
  const url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

  const comments = await scrapeYouTubeComments(page, 25, 'top');
  console.log(JSON.stringify(comments.slice(0, 5), null, 2));

  await browser.close();
})();

Handling Common Issues

  1. Cookie banners: Close consent dialogs if present before interacting with the page.
  2. Layout changes: YouTube updates selectors frequently. Keep multiple selector fallbacks where possible.
  3. Dynamic loading: Use incremental scrolling and short waits between loads to capture more comments.
  4. Rate limiting: Add random delays, rotate user agents, and consider proxies for large-scale scraping.
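
For the first issue, here is a hedged sketch of a consent-dialog dismisser. The selectors below are illustrative assumptions only; YouTube's consent markup varies by region and changes over time, so verify them against the live page:

```javascript
// Try a list of selector fallbacks for the consent dialog and click the
// first one that exists. Selectors are examples and may need updating.
async function dismissConsentDialog(page) {
  const candidates = [
    'button[aria-label="Accept all"]',
    'button[aria-label="Reject all"]',
    'tp-yt-paper-button[aria-label*="Accept"]',
  ];
  for (const selector of candidates) {
    const button = await page.$(selector);
    if (button) {
      await button.click();
      return selector; // report which fallback matched
    }
  }
  return null; // no dialog found; nothing to do
}
```

Call it right after `page.goto` and before scrolling to the comments section.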

Alternative: Use the SocialKit YouTube Comments API

To offload selector maintenance and scale beyond a single browser, use the managed API instead. It supports standard videos and Shorts, sorting, limits, and rich fields.

curl "https://api.socialkit.dev/youtube/comments?access_key=YOUR_ACCESS_KEY&url=https://youtube.com/watch?v=dQw4w9WgXcQ&limit=5&sortBy=top"

Example response (truncated):

{
  "success": true,
  "data": {
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "comments": [
      {
        "author": "@YouTube",
        "text": "can confirm: he never gave us up",
        "likes": 88,
        "date": "",
        "avatar": "https://yt3.ggpht.com/...",
        "replyCount": 1,
        "position": 1
      }
    ]
  }
}
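
The same request can be made from Node using the built-in fetch (Node 18+). `YOUR_ACCESS_KEY` is a placeholder for your actual key; the query parameters mirror the curl example above:

```javascript
// Build the request URL for the SocialKit YouTube comments endpoint,
// mirroring the curl example's query parameters.
function buildCommentsUrl(accessKey, videoUrl, { limit = 5, sortBy = 'top' } = {}) {
  const endpoint = new URL('https://api.socialkit.dev/youtube/comments');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', videoUrl);
  endpoint.searchParams.set('limit', String(limit));
  endpoint.searchParams.set('sortBy', sortBy);
  return endpoint.toString();
}

// Usage (requires a real access key):
// const res = await fetch(buildCommentsUrl('YOUR_ACCESS_KEY', 'https://youtube.com/watch?v=dQw4w9WgXcQ'));
// const { data } = await res.json();
// console.log(data.comments);
```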

Free YouTube Tools

If you want fast results without coding, try our free tools:

YouTube Video Summarizer

Get AI-powered summaries with our free YouTube Video Summarizer:

  • Generate summaries for any YouTube video or Shorts
  • Extract key insights and topics
  • Instant results with no setup

YouTube Transcript Extractor

Extract accurate transcripts with our free YouTube Transcript Extractor:

  • Timestamped segments for easy reference
  • Copy individual segments or full transcript
  • Great for accessibility and analysis

Conclusion

Scraping YouTube comments with Puppeteer enables powerful analysis for community insights, moderation, and market research. For production use or large-scale workloads, consider the managed SocialKit YouTube Comments API to save time and ensure reliability.

More YouTube scraping tutorials

I write a lot about scraping. If you’re interested in more YouTube scraping tutorials, here are a few to follow next: