How to Scrape YouTube Shorts Comments With Puppeteer
YouTube Shorts are wildly popular, but scraping comments from Shorts requires one key trick: converting the Shorts URL to the regular YouTube watch URL. Once converted, you can use the same robust Puppeteer techniques to scroll, sort, and extract comments just like a standard video.
In this guide, you’ll learn how to scrape YouTube Shorts comments with:
- URL conversion (/shorts/VIDEO_ID → /watch?v=VIDEO_ID)
- Sorting by Top or New
- Infinite scrolling and deduplication
- Extracting rich fields: author, text, likes, avatar, relative time, replies, creator heart
Prerequisites
- Node.js v16+ recommended
- Basic Puppeteer knowledge
- Comfort with DOM selectors
Project Setup
mkdir youtube-shorts-comments-scraper
cd youtube-shorts-comments-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Enable stealth to reduce bot-detection risk.
Shorts URL Conversion
Shorts pages don’t expose the same comment interface as standard watch pages, so convert to the watch URL first.
const convertShortsToRegularUrl = (url) => {
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
if (shortsPattern.test(url)) {
const videoId = url.match(shortsPattern)[1];
return `https://www.youtube.com/watch?v=${videoId}`;
} else if (regularPattern.test(url)) {
return url;
}
throw new Error('Invalid YouTube URL format');
};
// Example
// convertShortsToRegularUrl('https://www.youtube.com/shorts/ABC123xyz')
// → 'https://www.youtube.com/watch?v=ABC123xyz'
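If you want to sanity-check the helper before wiring it into a scraper, a few assertions with Node's built-in assert module cover all three branches (this assumes convertShortsToRegularUrl is defined in the same file):
const assert = require('assert');
// Shorts URLs are rewritten to the watch format
assert.strictEqual(
  convertShortsToRegularUrl('https://www.youtube.com/shorts/ABC123xyz'),
  'https://www.youtube.com/watch?v=ABC123xyz'
);
// Regular watch URLs pass through unchanged
assert.strictEqual(
  convertShortsToRegularUrl('https://www.youtube.com/watch?v=ABC123xyz'),
  'https://www.youtube.com/watch?v=ABC123xyz'
);
// Anything else throws
assert.throws(() => convertShortsToRegularUrl('https://example.com/video'));
console.log('URL conversion checks passed');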
Basic Shorts Comments Scraper
This minimal version converts the Shorts URL and extracts the first N comments.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
async function scrapeShortsCommentsBasic(shortsUrl, limit = 10) {
const url = convertShortsToRegularUrl(shortsUrl);
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
try {
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Scroll to comments and wait
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000)); // brief pause so the first comments render
const comments = await page.evaluate((max) => {
const items = Array.from(document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer')).slice(0, max);
return items.map((el, index) => {
const author = el.querySelector('#author-text span')?.textContent?.trim() || 'Unknown';
const text = el.querySelector('#content-text')?.textContent?.trim() || '';
const likesText = el.querySelector('#vote-count-middle')?.textContent || '0';
const likes = parseInt(likesText.replace(/[^\d]/g, ''), 10) || 0;
const relativeTime = el.querySelector('#published-time-text')?.textContent?.trim() || '';
const avatar = el.querySelector('#author-thumbnail img')?.src || '';
const replyBtn = el.querySelector('#replies button');
const replyMatch = replyBtn?.textContent?.match(/\d+/);
const replyCount = replyMatch ? parseInt(replyMatch[0], 10) : 0;
return { author, text, likes, relativeTime, avatar, replyCount, position: index + 1 };
}).filter(c => c.text);
}, limit);
return comments;
} finally {
await browser.close();
}
}
// Usage
// scrapeShortsCommentsBasic('https://www.youtube.com/shorts/ABC123xyz', 10).then(console.log);
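To keep the output rather than just printing it, you can write the resulting array to disk. Here is a minimal sketch using Node's built-in fs module (the comments.json filename is just an example):
const fs = require('fs');
scrapeShortsCommentsBasic('https://www.youtube.com/shorts/ABC123xyz', 10)
  .then((comments) => {
    // Pretty-print the scraped comments to a local file for later analysis
    fs.writeFileSync('comments.json', JSON.stringify(comments, null, 2));
    console.log(`Saved ${comments.length} comments to comments.json`);
  })
  .catch(console.error);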
Advanced Shorts Comments Scraper (Sorting + Infinite Scroll)
This version supports sorting by top or new, plus infinite scroll, deduplication, and creator-heart detection.
async function scrapeYouTubeShortsComments(page, shortsUrl, limit = 20, sortBy = 'new') {
const url = convertShortsToRegularUrl(shortsUrl);
console.log(`Shorts → watch conversion: ${shortsUrl} -> ${url}`);
try {
await page.setViewport({ width: 1280, height: 1024 });
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Ensure comments are visible
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000));
// Optional: change sort order
if (sortBy === 'top' || sortBy === 'new') {
try {
await page.waitForSelector('#comments #trigger', { timeout: 5000 });
const currentSort = await page.evaluate(() => {
const trigger = document.querySelector('#comments #trigger');
const text = trigger?.textContent?.toLowerCase() || '';
if (text.includes('newest') || text.includes('new')) return 'new';
if (text.includes('top')) return 'top';
return 'unknown';
});
if (currentSort !== sortBy) {
await page.click('#comments #trigger');
await new Promise((resolve) => setTimeout(resolve, 800));
await page.evaluate((targetSort) => {
const items = document.querySelectorAll('#comments tp-yt-paper-listbox tp-yt-paper-item');
for (const item of items) {
const text = item.textContent?.trim() || '';
if ((targetSort === 'new' && /Newest/i.test(text)) || (targetSort === 'top' && /Top/i.test(text))) {
item.click();
return;
}
}
}, sortBy);
await new Promise((resolve) => setTimeout(resolve, 2500));
}
} catch (e) {
console.log('Could not change sort order:', e.message);
}
}
let comments = [];
let previousCount = 0;
let stagnantCycles = 0;
const maxScrolls = 50;
let attempts = 0;
while (comments.length < limit && attempts < maxScrolls) {
const batch = await page.evaluate(() => {
const nodes = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
const out = [];
nodes.forEach((el, index) => {
try {
const authorEl = el.querySelector('#author-text span');
const textEl = el.querySelector('#content-text');
const likesEl = el.querySelector('#vote-count-middle');
const timeEl = el.querySelector('#published-time-text');
const avatarEl = el.querySelector('#author-thumbnail img');
const heartEl = el.querySelector('#heart-button[aria-pressed="true"]');
const replyButton = el.querySelector('#replies button');
let replyCount = 0;
if (replyButton && replyButton.textContent) {
const m = replyButton.textContent.match(/(\d+)/);
replyCount = m ? parseInt(m[1], 10) : 0;
}
const item = {
id: `comment-${index}`,
author: authorEl ? authorEl.textContent.trim() : 'Unknown',
text: textEl ? textEl.textContent.trim() : '',
likes: likesEl ? (parseInt(likesEl.textContent.replace(/[^\d]/g, ''), 10) || 0) : 0,
relativeTime: timeEl ? timeEl.textContent.trim() : '',
avatar: avatarEl ? avatarEl.src : '',
isCreatorHearted: !!heartEl,
replyCount,
position: index + 1,
};
if (item.text) out.push(item);
} catch (e) {
// skip
}
});
return out;
});
const key = (c) => c.author + c.text + c.relativeTime;
const existing = new Set(comments.map(key));
const newOnes = batch.filter((c) => !existing.has(key(c)));
comments.push(...newOnes);
if (comments.length >= limit) break;
if (batch.length === previousCount) {
stagnantCycles += 1;
if (stagnantCycles >= 3) break;
} else {
stagnantCycles = 0;
}
previousCount = batch.length;
// Scroll further to load more
await page.evaluate(() => {
const commentsSection = document.querySelector('#comments');
if (commentsSection) commentsSection.scrollIntoView({ behavior: 'smooth', block: 'end' });
window.scrollBy(0, 1000);
});
await new Promise((resolve) => setTimeout(resolve, 1500));
attempts += 1;
}
return comments.slice(0, limit).map((c, i) => ({ ...c, index: i + 1, id: `comment-${i + 1}` }));
} catch (error) {
console.error('Error scraping Shorts comments:', error);
return [];
}
}
Optional helper to convert relative time (e.g., “3 days ago”) to a Date:
function convertRelativeTimeToDate(relative) {
try {
const text = String(relative).toLowerCase();
const now = new Date();
const match = text.match(/(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago/);
if (!match) return now.toISOString();
const amount = parseInt(match[1], 10);
const unit = match[2];
const ms = {
second: 1000,
minute: 60 * 1000,
hour: 60 * 60 * 1000,
day: 24 * 60 * 60 * 1000,
week: 7 * 24 * 60 * 60 * 1000,
month: 30 * 24 * 60 * 60 * 1000,
year: 365 * 24 * 60 * 60 * 1000,
}[unit] || 0;
return new Date(now.getTime() - amount * ms).toISOString();
} catch {
return new Date().toISOString();
}
}
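For example, assuming comments holds the array returned by the scraper, you can attach an approximate absolute timestamp to each item (approximate because relative strings like "3 days ago" lose precision):
// Enrich each comment with an approximate ISO timestamp
const withDates = comments.map((c) => ({
  ...c,
  approximateDate: convertRelativeTimeToDate(c.relativeTime),
}));
// A comment marked "3 days ago" gets a timestamp three days before the scrape ran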
Putting It Together
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
const comments = await scrapeYouTubeShortsComments(page, shortsUrl, 25, 'top');
console.log(JSON.stringify(comments.slice(0, 5), null, 2));
await browser.close();
})();
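If you prefer a spreadsheet-friendly output, the same result can be flattened to CSV. The sketch below is illustrative only: it quotes and escapes text fields naively, so reach for a proper CSV library in production.
// Rough CSV export of the scraped comments (illustrative sketch)
function commentsToCsv(comments) {
  const quote = (s) => `"${String(s).replace(/"/g, '""')}"`;
  const header = 'position,author,likes,replyCount,relativeTime,text';
  const rows = comments.map((c) =>
    [c.position, quote(c.author), c.likes, c.replyCount, quote(c.relativeTime), quote(c.text)].join(',')
  );
  return [header, ...rows].join('\n');
}
// Usage (inside the async IIFE above, after scraping):
// require('fs').writeFileSync('comments.csv', commentsToCsv(comments));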
Handling Common Issues
- Cookie banners: Close consent dialogs before interacting (see the sketch after this list).
- Layout changes: YouTube updates selectors frequently; use multiple fallbacks.
- Dynamic loading: Incremental scrolling + short waits to load more.
- Rate limits: Add random delays, rotate user agents, and consider proxies at scale.
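Below is a hedged sketch for the first and last items: dismissing a consent dialog by matching button text, and adding jittered delays between actions. YouTube's consent button text varies by region and changes over time, so treat the matching pattern as an assumption rather than a stable selector.
// Dismiss YouTube's consent dialog if one appears (button text is an assumption)
async function dismissConsentIfPresent(page) {
  try {
    const clicked = await page.evaluate(() => {
      const buttons = Array.from(document.querySelectorAll('button'));
      const target = buttons.find((b) => /accept all|agree/i.test(b.textContent || ''));
      if (target) {
        target.click();
        return true;
      }
      return false;
    });
    if (clicked) await new Promise((resolve) => setTimeout(resolve, 1500));
  } catch {
    // No consent dialog found; carry on
  }
}
// Jittered delay between actions to look less like a bot
const randomDelay = (min = 500, max = 2000) =>
  new Promise((resolve) => setTimeout(resolve, min + Math.random() * (max - min)));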
Alternative: Use the SocialKit YouTube Comments API
Offload selector maintenance and scale more easily with the managed API. It supports standard videos and Shorts, sorting, limits, and rich fields.
curl "https://api.socialkit.dev/youtube/comments?access_key=YOUR_ACCESS_KEY&url=https://youtube.com/watch?v=dQw4w9WgXcQ&limit=5&sortBy=top"
Example response (truncated)
{
"success": true,
"data": {
"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
"comments": [
{
"author": "@YouTube",
"text": "can confirm: he never gave us up",
"likes": 88,
"date": "",
"avatar": "https://yt3.ggpht.com/...",
"replyCount": 1,
"position": 1
}
]
}
}
Free YouTube Tools
SocialKit’s free YouTube tools also work with Shorts URLs.
Conclusion
Scraping YouTube Shorts comments works seamlessly once you convert Shorts URLs to the standard watch format. With sorting, infinite scrolling, and robust selector handling, you can capture high-quality datasets for analysis, moderation, and research.