How to Scrape YouTube Comments With Puppeteer
YouTube comments are a goldmine for community insights, sentiment, and market research. In this guide, you’ll learn how to scrape YouTube comments using Puppeteer, including support for YouTube Shorts. We’ll cover setup, scrolling and sorting, extracting rich metadata (author, likes, replies), and robust techniques for dynamic content.
Prerequisites
Before you start, make sure you have:
- Node.js (v18 or higher; recent Puppeteer releases require it)
- Familiarity with JavaScript and async/await
- Basic understanding of DOM selectors and browser automation
Setting Up the Project
mkdir youtube-comments-scraper
cd youtube-comments-scraper
npm init -y
Install dependencies:
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Enable stealth to reduce bot-detection risk.
Basic Implementation
This minimal version scrolls to the comments, extracts the first N entries, and returns core fields.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
async function scrapeBasicComments(url, limit = 10) {
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
try {
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
// Scroll to comments and wait
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000)); // page.waitForTimeout was removed in recent Puppeteer
const comments = await page.evaluate((max) => {
const items = Array.from(document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer')).slice(0, max);
return items.map((el, index) => {
const author = el.querySelector('#author-text span')?.textContent?.trim() || 'Unknown';
const text = el.querySelector('#content-text')?.textContent?.trim() || '';
const likesText = el.querySelector('#vote-count-middle')?.textContent?.trim() || '0';
// YouTube abbreviates counts (e.g. "1.2K"); expand the suffix before parsing
const suffix = { K: 1e3, M: 1e6, B: 1e9 }[likesText.slice(-1).toUpperCase()];
const likes = suffix ? Math.round(parseFloat(likesText) * suffix) : (parseInt(likesText.replace(/[^\d]/g, ''), 10) || 0);
const relativeTime = el.querySelector('#published-time-text')?.textContent?.trim() || '';
const avatar = el.querySelector('#author-thumbnail img')?.src || '';
const replyBtn = el.querySelector('#replies button');
const replyCount = replyBtn && replyBtn.textContent ? parseInt((replyBtn.textContent.match(/\d+/) || ['0'])[0], 10) : 0;
return { author, text, likes, relativeTime, avatar, replyCount, position: index + 1 };
}).filter(c => c.text);
}, limit);
return comments;
} finally {
await browser.close();
}
}
// Usage
// scrapeBasicComments('https://www.youtube.com/watch?v=dQw4w9WgXcQ', 10).then(console.log);
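YouTube abbreviates large counts (e.g. "1.2K", "3M"). If you post-process raw count strings outside the page context, a small parser can expand them. This is a hypothetical helper; the name and suffix table are illustrative choices, not part of any YouTube API:

```javascript
// Expand YouTube-style abbreviated counts ("1.2K", "3M") into integers.
// Hypothetical helper; the suffix table covers English-locale abbreviations.
function parseCompactNumber(text) {
  const trimmed = (text || '').trim();
  const suffix = { K: 1e3, M: 1e6, B: 1e9 }[trimmed.slice(-1).toUpperCase()];
  if (suffix) {
    return Math.round(parseFloat(trimmed) * suffix);
  }
  const digits = trimmed.replace(/[^\d]/g, '');
  return digits ? parseInt(digits, 10) : 0;
}

// parseCompactNumber('1.2K') -> 1200
// parseCompactNumber('847')  -> 847
```

Note that locales other than English use different abbreviations, so set a consistent `Accept-Language` header if you rely on this.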
Advanced Implementation with Sorting and Infinite Scroll
The following, more robust function supports sorting by `top` or `new`, infinite scrolling, deduplication, reply counts, and creator-heart detection. It works for both standard videos and Shorts.
async function scrapeYouTubeComments(page, limit = 20, sortBy = 'new') {
console.log(`Starting to scrape YouTube comments, limit: ${limit}, sortBy: ${sortBy}`);
try {
// First scroll down to make sure comments section is visible
await page.evaluate(() => {
window.scrollTo(0, document.body.scrollHeight / 3);
});
await new Promise((resolve) => setTimeout(resolve, 2000));
// Wait for comments to load
await page.waitForSelector('#comments', { timeout: 15000 });
await new Promise((resolve) => setTimeout(resolve, 2000));
// Handle sort dropdown if sortBy is specified
if (sortBy === 'top' || sortBy === 'new') {
try {
console.log(`Setting sort to: ${sortBy}`);
// Find and click the sort dropdown trigger
await page.waitForSelector('#comments #trigger', { timeout: 5000 });
// Check current sort state
const currentSort = await page.evaluate(() => {
const trigger = document.querySelector('#comments #trigger');
if (trigger && trigger.textContent) {
const text = trigger.textContent.toLowerCase();
if (text.includes('newest') || text.includes('new')) return 'new';
if (text.includes('top')) return 'top';
}
return 'unknown';
});
console.log(`Current sort: ${currentSort}, wanted: ${sortBy}`);
// Only click if we need to change the sort
if (currentSort !== sortBy) {
// Click the dropdown trigger
await page.click('#comments #trigger');
await new Promise((resolve) => setTimeout(resolve, 1000));
// Find and click the appropriate sort option
const sortOption = sortBy === 'new' ? 'Newest first' : 'Top comments';
await page.evaluate((targetSort) => {
const menuItems = document.querySelectorAll('#comments tp-yt-paper-listbox tp-yt-paper-item');
for (const item of menuItems) {
const text = item.textContent.trim();
if ((targetSort === 'new' && (text.includes('Newest') || text.includes('newest'))) ||
(targetSort === 'top' && (text.includes('Top') || text.includes('top')))) {
item.click();
console.log(`Clicked sort option: ${text}`);
return;
}
}
}, sortBy);
// Wait for comments to reload with new sort
await new Promise((resolve) => setTimeout(resolve, 3000));
console.log(`Sort changed to: ${sortBy}`);
}
} catch (e) {
console.log('Could not change sort order, using default:', e.message);
}
}
let comments = [];
let previousCommentsCount = 0;
let noNewCommentsCount = 0;
const maxScrollAttempts = 50;
let scrollAttempts = 0;
while (comments.length < limit && scrollAttempts < maxScrollAttempts) {
// Extract current comments
const currentComments = await page.evaluate(() => {
const commentElements = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
const comments = [];
commentElements.forEach((commentEl, index) => {
try {
const authorElement = commentEl.querySelector('#author-text span');
const textElement = commentEl.querySelector('#content-text');
const likesElement = commentEl.querySelector('#vote-count-middle');
const timeElement = commentEl.querySelector('#published-time-text');
const avatarElement = commentEl.querySelector('#author-thumbnail img');
const heartElement = commentEl.querySelector('#heart-button[aria-pressed="true"]');
// Get reply count
const replyButton = commentEl.querySelector('#replies button');
let replyCount = 0;
if (replyButton && replyButton.textContent) {
const match = replyButton.textContent.match(/(\d+)/);
replyCount = match ? parseInt(match[1], 10) : 0;
}
// YouTube abbreviates counts (e.g. "1.2K"); expand the suffix before parsing
const likesText = likesElement ? likesElement.textContent.trim() : '0';
const suffix = { K: 1e3, M: 1e6, B: 1e9 }[likesText.slice(-1).toUpperCase()];
const likes = suffix ? Math.round(parseFloat(likesText) * suffix) : (parseInt(likesText.replace(/[^\d]/g, ''), 10) || 0);
const comment = {
id: `comment-${index}`,
author: authorElement ? authorElement.textContent.trim() : 'Unknown',
text: textElement ? textElement.textContent.trim() : '',
likes,
relativeTime: timeElement ? timeElement.textContent.trim() : '',
avatar: avatarElement ? avatarElement.src : '',
isCreatorHearted: !!heartElement,
replyCount: replyCount,
position: index + 1
};
if (comment.text) {
comments.push(comment);
}
} catch (e) {
console.log('Error extracting comment:', e);
}
});
return comments;
});
// Update comments array with new unique comments
const existingIds = new Set(comments.map(c => c.author + c.text + c.relativeTime));
const newComments = currentComments.filter(c =>
!existingIds.has(c.author + c.text + c.relativeTime)
);
comments.push(...newComments);
console.log(`Extracted ${currentComments.length} comments, ${newComments.length} new, total: ${comments.length}`);
// Check if we have enough comments
if (comments.length >= limit) {
break;
}
// Check if no new comments were loaded
if (currentComments.length === previousCommentsCount) {
noNewCommentsCount++;
if (noNewCommentsCount >= 3) {
console.log('No new comments loaded after 3 attempts, stopping');
break;
}
} else {
noNewCommentsCount = 0;
}
previousCommentsCount = currentComments.length;
// Scroll down to load more comments
await page.evaluate(() => {
const commentsSection = document.querySelector('#comments');
if (commentsSection) {
commentsSection.scrollIntoView({ behavior: 'smooth', block: 'end' });
}
window.scrollBy(0, 1000);
});
await new Promise((resolve) => setTimeout(resolve, 2000));
scrollAttempts++;
}
// Trim to limit and add index, convert relative time to date
const finalComments = comments.slice(0, limit).map((comment, index) => ({
...comment,
index: index + 1,
id: `comment-${index + 1}`,
date: comment.relativeTime ? convertRelativeTimeToDate(comment.relativeTime) : new Date().toISOString()
}));
console.log(`Scraped ${finalComments.length} comments total`);
return finalComments;
} catch (error) {
console.error('Error scraping YouTube comments:', error);
return [];
}
}
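The deduplication key above concatenates author, text, and relative time. Because relative times tick over between scrolls ("1 hour ago" becomes "2 hours ago"), the same comment can slip through twice. One option, sketched here under the assumption that author plus text is unique enough for your use case, is to drop the time component and hash the rest:

```javascript
// Build a stable dedup key from author + text only, so a changing
// relative timestamp does not re-admit the same comment.
// FNV-1a keeps Set entries short; hypothetical helper, not part of Puppeteer.
function commentKey(comment) {
  const input = `${comment.author}\u0000${comment.text}`;
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash.toString(16);
}
```

You would then replace the `c.author + c.text + c.relativeTime` expressions with `commentKey(c)`.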
Helper to convert “x hours/days ago” to ISO date:
function convertRelativeTimeToDate(relative) {
try {
const text = relative.toLowerCase();
const now = new Date();
const match = text.match(/(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago/);
if (!match) return now.toISOString();
const amount = parseInt(match[1], 10);
const unit = match[2];
const ms = {
second: 1000,
minute: 60 * 1000,
hour: 60 * 60 * 1000,
day: 24 * 60 * 60 * 1000,
week: 7 * 24 * 60 * 60 * 1000,
month: 30 * 24 * 60 * 60 * 1000,
year: 365 * 24 * 60 * 60 * 1000,
}[unit] || 0;
return new Date(now.getTime() - amount * ms).toISOString();
} catch {
return new Date().toISOString();
}
}
Putting it together
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({ headless: 'new', ignoreDefaultArgs: ['--enable-automation'] });
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });
const url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
const comments = await scrapeYouTubeComments(page, 25, 'top');
console.log(JSON.stringify(comments.slice(0, 5), null, 2));
await browser.close();
})();
Handling Common Issues
- Cookie banners: Close consent dialogs if present before interacting with the page.
- Layout changes: YouTube updates selectors frequently. Keep multiple selector fallbacks where possible.
- Dynamic loading: Use incremental scrolling and short waits between loads to capture more comments.
- Rate limiting: Add random delays, rotate user agents, and consider proxies for large-scale scraping.
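A couple of the mitigations above can be sketched as small helpers. The user-agent strings and delay bounds here are illustrative choices, not values YouTube requires:

```javascript
// Random delay within [min, max] ms so scroll pacing is less uniform.
function randomDelay(min = 1500, max = 4000) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Pick a user agent per session from a rotation pool (example strings only).
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
];

function pickUserAgent(pool = USER_AGENTS) {
  return pool[Math.floor(Math.random() * pool.length)];
}

// Usage inside the scraper (sketch):
// await page.setUserAgent(pickUserAgent());
// await new Promise((r) => setTimeout(r, randomDelay()));
```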
Alternative: Use the SocialKit YouTube Comments API
Offload maintenance and scaling by using the managed API. It supports standard videos and Shorts, sorting, limits, and rich metadata fields.
curl "https://api.socialkit.dev/youtube/comments?access_key=YOUR_ACCESS_KEY&url=https://youtube.com/watch?v=dQw4w9WgXcQ&limit=5&sortBy=top"
Example response (truncated):
{
"success": true,
"data": {
"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
"comments": [
{
"author": "@YouTube",
"text": "can confirm: he never gave us up",
"likes": 88,
"date": "",
"avatar": "https://yt3.ggpht.com/...",
"replyCount": 1,
"position": 1
}
]
}
}
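From Node, the same request can be issued with the built-in fetch (Node 18+). The query parameters mirror the curl example above; YOUR_ACCESS_KEY is a placeholder, and the wrapper names are my own:

```javascript
// Build the SocialKit comments request URL from the parameters shown above.
function buildCommentsUrl(accessKey, videoUrl, { limit = 5, sortBy = 'top' } = {}) {
  const params = new URLSearchParams({
    access_key: accessKey,
    url: videoUrl,
    limit: String(limit),
    sortBy,
  });
  return `https://api.socialkit.dev/youtube/comments?${params}`;
}

// Hypothetical wrapper around the endpoint using global fetch (Node 18+).
async function fetchComments(accessKey, videoUrl, options) {
  const res = await fetch(buildCommentsUrl(accessKey, videoUrl, options));
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}
```

URLSearchParams handles percent-encoding of the video URL for you, which a hand-built query string would get wrong.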
Free YouTube Tools
If you want fast results without coding, try our free tools:
YouTube Video Summarizer
Get AI-powered summaries with our free YouTube Video Summarizer:
- Generate summaries for any YouTube video or Shorts
- Extract key insights and topics
- Instant results with no setup
YouTube Transcript Extractor
Extract accurate transcripts with our free YouTube Transcript Extractor:
- Timestamped segments for easy reference
- Copy individual segments or full transcript
- Great for accessibility and analysis
Conclusion
Scraping YouTube comments with Puppeteer enables powerful analysis for community insights, moderation, and market research. For production use or large-scale workloads, consider the managed SocialKit YouTube Comments API to save time and ensure reliability.
More YouTube scraping tutorials
I write a lot about scraping. If you’re interested in more YouTube scraping tutorials, here are a few to follow next: