How to Scrape YouTube Shorts Comments With Puppeteer
YouTube Shorts are wildly popular, but scraping comments from Shorts requires one key trick: converting the Shorts URL to the regular YouTube watch URL. Once converted, you can use the same robust Puppeteer techniques to scroll, sort, and extract comments just like a standard video.
In this guide, you’ll learn how to scrape YouTube Shorts comments with:
- URL conversion (/shorts/VIDEO_ID → /watch?v=VIDEO_ID)
- Sorting by Top or New
- Infinite scrolling and deduplication
- Extracting rich fields: author, text, likes, avatar, relative time, replies, creator heart
Prerequisites
- Node.js v16+ recommended
- Basic Puppeteer knowledge
- Comfort with DOM selectors
Project Setup
mkdir youtube-shorts-comments-scraper
cd youtube-shorts-comments-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Enable stealth to reduce bot-detection risk.
Shorts URL Conversion
Shorts pages don’t expose the same comment interface as standard watch pages, so convert to the watch URL first.
const convertShortsToRegularUrl = (url) => {
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
if (shortsPattern.test(url)) {
const videoId = url.match(shortsPattern)[1];
return `https://www.youtube.com/watch?v=${videoId}`;
} else if (regularPattern.test(url)) {
return url;
}
throw new Error('Invalid YouTube URL format');
};
// Example
// convertShortsToRegularUrl('https://www.youtube.com/shorts/ABC123xyz')
// → 'https://www.youtube.com/watch?v=ABC123xyz'
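If you want to sanity-check the helper before wiring it into a scraper, a few assertions with Node's built-in assert module cover all three branches (this assumes convertShortsToRegularUrl is defined in the same file):
const assert = require('assert');
// Shorts URLs are rewritten to the watch format
assert.strictEqual(
  convertShortsToRegularUrl('https://www.youtube.com/shorts/ABC123xyz'),
  'https://www.youtube.com/watch?v=ABC123xyz'
);
// Regular watch URLs pass through unchanged
assert.strictEqual(
  convertShortsToRegularUrl('https://www.youtube.com/watch?v=ABC123xyz'),
  'https://www.youtube.com/watch?v=ABC123xyz'
);
// Anything else throws
assert.throws(() => convertShortsToRegularUrl('https://example.com/video'));
console.log('URL conversion checks passed');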
Basic Shorts Comments Scraper
This minimal version converts the Shorts URL and extracts the first N comments.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
async function scrapeShortsCommentsBasic(shortsUrl, limit = 10) {
const url = convertShortsToRegularUrl(shortsUrl);
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
try {
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Scroll to comments and wait
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000)); // brief pause so the first comments render
const comments = await page.evaluate((max) => {
const items = Array.from(document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer')).slice(0, max);
return items.map((el, index) => {
const author = el.querySelector('#author-text span')?.textContent?.trim() || 'Unknown';
const text = el.querySelector('#content-text')?.textContent?.trim() || '';
const likesText = el.querySelector('#vote-count-middle')?.textContent || '0';
const likes = parseInt(likesText.replace(/[^\d]/g, ''), 10) || 0;
const relativeTime = el.querySelector('#published-time-text')?.textContent?.trim() || '';
const avatar = el.querySelector('#author-thumbnail img')?.src || '';
const replyBtn = el.querySelector('#replies button');
const replyMatch = replyBtn?.textContent?.match(/\d+/);
const replyCount = replyMatch ? parseInt(replyMatch[0], 10) : 0;
return { author, text, likes, relativeTime, avatar, replyCount, position: index + 1 };
}).filter(c => c.text);
}, limit);
return comments;
} finally {
await browser.close();
}
}
// Usage
// scrapeShortsCommentsBasic('https://www.youtube.com/shorts/ABC123xyz', 10).then(console.log);
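To keep the output rather than just printing it, you can write the resulting array to disk. Here is a minimal sketch using Node's built-in fs module (the comments.json filename is just an example):
const fs = require('fs');
scrapeShortsCommentsBasic('https://www.youtube.com/shorts/ABC123xyz', 10)
  .then((comments) => {
    // Pretty-print the scraped comments to a local file for later analysis
    fs.writeFileSync('comments.json', JSON.stringify(comments, null, 2));
    console.log(`Saved ${comments.length} comments to comments.json`);
  })
  .catch(console.error);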
Advanced Shorts Comments Scraper (Sorting + Infinite Scroll)
This version supports sorting by top or new, plus infinite scroll, deduplication, and creator-heart detection.
async function scrapeYouTubeShortsComments(page, shortsUrl, limit = 20, sortBy = 'new') {
const url = convertShortsToRegularUrl(shortsUrl);
console.log(`Shorts → watch conversion: ${shortsUrl} -> ${url}`);
try {
await page.setViewport({ width: 1280, height: 1024 });
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Ensure comments are visible
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000));
// Optional: change sort order
if (sortBy === 'top' || sortBy === 'new') {
try {
await page.waitForSelector('#comments #trigger', { timeout: 5000 });
const currentSort = await page.evaluate(() => {
const trigger = document.querySelector('#comments #trigger');
const text = trigger?.textContent?.toLowerCase() || '';
if (text.includes('newest') || text.includes('new')) return 'new';
if (text.includes('top')) return 'top';
return 'unknown';
});
if (currentSort !== sortBy) {
await page.click('#comments #trigger');
await new Promise((resolve) => setTimeout(resolve, 800));
await page.evaluate((targetSort) => {
const items = document.querySelectorAll('#comments tp-yt-paper-listbox tp-yt-paper-item');
for (const item of items) {
const text = item.textContent?.trim() || '';
if ((targetSort === 'new' && /Newest/i.test(text)) || (targetSort === 'top' && /Top/i.test(text))) {
item.click();
return;
}
}
}, sortBy);
await new Promise((resolve) => setTimeout(resolve, 2500));
}
} catch (e) {
console.log('Could not change sort order:', e.message);
}
}
let comments = [];
let previousCount = 0;
let stagnantCycles = 0;
const maxScrolls = 50;
let attempts = 0;
while (comments.length < limit && attempts < maxScrolls) {
const batch = await page.evaluate(() => {
const nodes = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
const out = [];
nodes.forEach((el, index) => {
try {
const authorEl = el.querySelector('#author-text span');
const textEl = el.querySelector('#content-text');
const likesEl = el.querySelector('#vote-count-middle');
const timeEl = el.querySelector('#published-time-text');
const avatarEl = el.querySelector('#author-thumbnail img');
const heartEl = el.querySelector('#heart-button[aria-pressed="true"]');
const replyButton = el.querySelector('#replies button');
let replyCount = 0;
if (replyButton && replyButton.textContent) {
const m = replyButton.textContent.match(/(\d+)/);
replyCount = m ? parseInt(m[1], 10) : 0;
}
const item = {
id: `comment-${index}`,
author: authorEl ? authorEl.textContent.trim() : 'Unknown',
text: textEl ? textEl.textContent.trim() : '',
likes: likesEl ? (parseInt(likesEl.textContent.replace(/[^\d]/g, ''), 10) || 0) : 0,
relativeTime: timeEl ? timeEl.textContent.trim() : '',
avatar: avatarEl ? avatarEl.src : '',
isCreatorHearted: !!heartEl,
replyCount,
position: index + 1,
};
if (item.text) out.push(item);
} catch (e) {
// skip
}
});
return out;
});
const key = (c) => c.author + c.text + c.relativeTime;
const existing = new Set(comments.map(key));
const newOnes = batch.filter((c) => !existing.has(key(c)));
comments.push(...newOnes);
if (comments.length >= limit) break;
if (batch.length === previousCount) {
stagnantCycles += 1;
if (stagnantCycles >= 3) break;
} else {
stagnantCycles = 0;
}
previousCount = batch.length;
// Scroll further to load more
await page.evaluate(() => {
const commentsSection = document.querySelector('#comments');
if (commentsSection) commentsSection.scrollIntoView({ behavior: 'smooth', block: 'end' });
window.scrollBy(0, 1000);
});
await new Promise((resolve) => setTimeout(resolve, 1500));
attempts += 1;
}
return comments.slice(0, limit).map((c, i) => ({ ...c, index: i + 1, id: `comment-${i + 1}` }));
} catch (error) {
console.error('Error scraping Shorts comments:', error);
return [];
}
}
Optional helper to convert relative time (e.g., “3 days ago”) to a Date:
function convertRelativeTimeToDate(relative) {
try {
const text = String(relative).toLowerCase();
const now = new Date();
const match = text.match(/(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago/);
if (!match) return now.toISOString();
const amount = parseInt(match[1], 10);
const unit = match[2];
const ms = {
second: 1000,
minute: 60 * 1000,
hour: 60 * 60 * 1000,
day: 24 * 60 * 60 * 1000,
week: 7 * 24 * 60 * 60 * 1000,
month: 30 * 24 * 60 * 60 * 1000,
year: 365 * 24 * 60 * 60 * 1000,
}[unit] || 0;
return new Date(now.getTime() - amount * ms).toISOString();
} catch {
return new Date().toISOString();
}
}
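For example, assuming comments holds the array returned by the scraper, you can attach an approximate absolute timestamp to each item (approximate because relative strings like "3 days ago" lose precision):
// Enrich each comment with an approximate ISO timestamp
const withDates = comments.map((c) => ({
  ...c,
  approximateDate: convertRelativeTimeToDate(c.relativeTime),
}));
// A comment marked "3 days ago" gets a timestamp three days before the scrape ran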
Putting It Together
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
const comments = await scrapeYouTubeShortsComments(page, shortsUrl, 25, 'top');
console.log(JSON.stringify(comments.slice(0, 5), null, 2));
await browser.close();
})();
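If you prefer a spreadsheet-friendly output, the same result can be flattened to CSV. The sketch below is illustrative only: it quotes and escapes text fields naively, so reach for a proper CSV library in production.
// Rough CSV export of the scraped comments (illustrative sketch)
function commentsToCsv(comments) {
  const quote = (s) => `"${String(s).replace(/"/g, '""')}"`;
  const header = 'position,author,likes,replyCount,relativeTime,text';
  const rows = comments.map((c) =>
    [c.position, quote(c.author), c.likes, c.replyCount, quote(c.relativeTime), quote(c.text)].join(',')
  );
  return [header, ...rows].join('\n');
}
// Usage (inside the async IIFE above, after scraping):
// require('fs').writeFileSync('comments.csv', commentsToCsv(comments));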
Handling Common Issues
- Cookie banners: Close consent dialogs before interacting (see the sketch after this list).
- Layout changes: YouTube updates selectors frequently; use multiple fallbacks.
- Dynamic loading: Incremental scrolling + short waits to load more.
- Rate limits: Add random delays, rotate user agents, and consider proxies at scale.
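Below is a hedged sketch for the first and last items: dismissing a consent dialog by matching button text, and adding jittered delays between actions. YouTube's consent button text varies by region and changes over time, so treat the matching pattern as an assumption rather than a stable selector.
// Dismiss YouTube's consent dialog if one appears (button text is an assumption)
async function dismissConsentIfPresent(page) {
  try {
    const clicked = await page.evaluate(() => {
      const buttons = Array.from(document.querySelectorAll('button'));
      const target = buttons.find((b) => /accept all|agree/i.test(b.textContent || ''));
      if (target) {
        target.click();
        return true;
      }
      return false;
    });
    if (clicked) await new Promise((resolve) => setTimeout(resolve, 1500));
  } catch {
    // No consent dialog found; carry on
  }
}
// Jittered delay between actions to look less like a bot
const randomDelay = (min = 500, max = 2000) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));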
Alternative: Use the SocialKit YouTube Comments API
Offload selector maintenance and scale more easily with the managed API. It supports standard videos and Shorts, sorting, limits, and rich fields.
curl "https://api.socialkit.dev/youtube/comments?access_key=YOUR_ACCESS_KEY&url=https://youtube.com/watch?v=dQw4w9WgXcQ&limit=5&sortBy=top"
Example response (truncated)
{
"success": true,
"data": {
"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
"comments": [
{
"author": "@YouTube",
"text": "can confirm: he never gave us up",
"likes": 88,
"date": "",
"avatar": "https://yt3.ggpht.com/...",
"replyCount": 1,
"position": 1
}
]
}
}
Free YouTube Tools
SocialKit’s free YouTube tools also work with Shorts URLs.
Conclusion
Scraping YouTube Shorts comments works seamlessly once you convert Shorts URLs to the standard watch format. With sorting, infinite scrolling, and robust selector handling, you can capture high-quality datasets for analysis, moderation, and research.