How to Scrape YouTube Shorts: Complete Guide
YouTube Shorts use a different URL structure that requires a conversion step before you can extract data effectively. This guide shows you how to scrape YouTube Shorts using Puppeteer, starting with the essential URL conversion technique.
The key insight for Shorts scraping is converting the Shorts URL format (youtube.com/shorts/VIDEO_ID) to the regular format (youtube.com/watch?v=VIDEO_ID) to access the full metadata interface. Once converted, you can use standard YouTube scraping techniques.
Prerequisites
Before starting, ensure you have:
- Node.js installed (version 18 or higher for current Puppeteer releases)
- Basic knowledge of JavaScript and async/await
- Understanding of DOM manipulation and CSS selectors
- Familiarity with browser automation concepts
Setting Up the Project
Create a new project and install dependencies:
mkdir youtube-shorts-scraper
cd youtube-shorts-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
The stealth plugin helps avoid detection by patching common automation fingerprints (such as the navigator.webdriver flag), so the automated browser looks more like a regular one.
URL Conversion: The Foundation
Every Shorts scraping operation starts with URL conversion. Here's the essential function:
function convertShortsToRegularUrl(url) {
  // Capture the video ID, stopping at '/', '?', '&' or '#' so a trailing
  // slash or query string never ends up inside the ID
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#\/]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
  const shortsMatch = url.match(shortsPattern);
  if (shortsMatch) {
    return `https://www.youtube.com/watch?v=${shortsMatch[1]}`;
  } else if (regularPattern.test(url)) {
    return url; // Already regular format
  }
  throw new Error('Invalid YouTube URL format');
}

// Examples:
// convertShortsToRegularUrl('https://www.youtube.com/shorts/ABC123xyz')
// Returns: 'https://www.youtube.com/watch?v=ABC123xyz'
// convertShortsToRegularUrl('https://www.youtube.com/watch?v=ABC123xyz')
// Returns: 'https://www.youtube.com/watch?v=ABC123xyz' (no change needed)
This function handles both Shorts URLs and regular YouTube URLs, making your scraper flexible.
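The regex approach above covers the two most common URL shapes. As a rough sketch of a stricter alternative, the same conversion can be built on Node's built-in URL class, which also makes it easy to accept youtu.be and m.youtube.com links. The names extractVideoId and toWatchUrl are hypothetical, the list of accepted hosts is an assumption, and unlike the regex version this one requires an absolute URL including the scheme.

```javascript
// Hypothetical helper: pull the video ID out of several YouTube URL shapes.
function extractVideoId(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws on relative or malformed URLs
  } catch {
    throw new Error(`Not a valid absolute URL: ${input}`);
  }

  // Treat www.youtube.com and m.youtube.com like youtube.com
  const host = parsed.hostname.replace(/^(www\.|m\.)/, '');
  const segments = parsed.pathname.split('/').filter(Boolean);
  let id = null;

  if (host === 'youtu.be') {
    id = segments[0] || null;
  } else if (host === 'youtube.com') {
    if (segments[0] === 'shorts') {
      id = segments[1] || null;
    } else if (segments[0] === 'watch') {
      id = parsed.searchParams.get('v');
    }
  }

  // Assumption: IDs only contain letters, digits, '_' and '-'
  if (!id || !/^[\w-]+$/.test(id)) {
    throw new Error(`Could not find a video ID in: ${input}`);
  }
  return id;
}

// Hypothetical wrapper: always return the regular watch URL.
function toWatchUrl(input) {
  return `https://www.youtube.com/watch?v=${extractVideoId(input)}`;
}
```

Because the ID is read from the parsed path and query rather than matched in the raw string, tracking parameters and trailing slashes are dropped automatically.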
Extracting Shorts Video Details
Extract Shorts metadata by converting the URL first, then using standard extraction techniques:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

async function scrapeShortsDetails(shortsUrl) {
  // Convert Shorts URL to regular format
  const regularUrl = convertShortsToRegularUrl(shortsUrl);
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    // Navigate to converted URL
    await page.goto(regularUrl, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // Give the page time to render. page.waitForTimeout was removed in
    // Puppeteer v22, so a plain timer is used instead
    await new Promise((resolve) => setTimeout(resolve, 3000));
    const shortsDetails = await page.evaluate(() => {
      // Extract video title
      const titleElement = document.querySelector('h1.ytd-video-primary-info-renderer yt-formatted-string');
      const title = titleElement ? titleElement.textContent.trim() : '';
      // Extract view count
      const viewsElement = document.querySelector('yt-view-count-renderer .view-count');
      const viewsText = viewsElement ? viewsElement.textContent : '';
      // Extract channel name
      const channelElement = document.querySelector('ytd-video-owner-renderer .ytd-channel-name a');
      const channelName = channelElement ? channelElement.textContent.trim() : '';
      // Extract upload date
      const dateElement = document.querySelector('#info-strings yt-formatted-string');
      const uploadDate = dateElement ? dateElement.textContent : '';
      return {
        title,
        views: viewsText,
        channelName,
        uploadDate,
        // URL of the loaded watch page (after conversion), not the Shorts URL passed in
        originalUrl: window.location.href,
        isShorts: true
      };
    });
    return shortsDetails;
  } finally {
    // Always release the browser, even if extraction throws
    await browser.close();
  }
}

// Usage
// scrapeShortsDetails('https://www.youtube.com/shorts/ABC123xyz').then(console.log);
The key difference from regular video scraping is the URL conversion step at the beginning.
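The views field comes back as display text rather than a number. A minimal sketch of normalizing it is shown here, assuming English-style formatting (comma thousands separators and K/M/B suffixes). The name parseCount is hypothetical, and other locales would need their own rules.

```javascript
// Hypothetical helper: convert display text such as "1,234,567 views"
// or "1.2M views" into a number. Returns null when no number is found.
function parseCount(text) {
  if (!text) return null;
  // Drop thousands separators, then read a number and an optional suffix.
  // The trailing \b stops "1 month ago" from being read as 1M.
  const match = text.replace(/,/g, '').match(/(\d+(?:\.\d+)?)\s*([KMB])?\b/i);
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const suffix = match[2] ? match[2].toUpperCase() : '';
  return Math.round(parseFloat(match[1]) * (multipliers[suffix] || 1));
}
```

It could be applied to the result after scraping, for example parseCount(details.views), which keeps the in-page extraction code simple.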
For advanced Shorts details extraction, see How to Extract YouTube Shorts Video Details Using Puppeteer, which covers comprehensive metadata extraction, error handling, and mobile-specific optimizations.
Extracting Shorts Comments
Shorts comments work the same way as regular video comments after URL conversion:
async function scrapeShortsComments(shortsUrl, limit = 10) {
  // Convert Shorts URL to regular format
  const regularUrl = convertShortsToRegularUrl(shortsUrl);
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 1024 });
    // Navigate to converted URL
    await page.goto(regularUrl, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // Scroll to comments section (comments are lazy-loaded on scroll)
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight / 3);
    });
    // Wait for comments to load. page.waitForTimeout was removed in
    // Puppeteer v22, so a plain timer is used instead
    await page.waitForSelector('#comments', { timeout: 15000 });
    await new Promise((resolve) => setTimeout(resolve, 3000));
    const comments = await page.evaluate((maxComments) => {
      // Expand abbreviated like counts such as "1.2K" (assumes English-style suffixes)
      const parseLikes = (text) => {
        const match = text.trim().match(/^([\d.]+)\s*([KM])?/i);
        if (!match) return 0;
        const multiplier = { K: 1e3, M: 1e6 }[(match[2] || '').toUpperCase()] || 1;
        return Math.round(parseFloat(match[1]) * multiplier) || 0;
      };
      const commentElements = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
      const results = [];
      for (let i = 0; i < Math.min(commentElements.length, maxComments); i++) {
        const element = commentElements[i];
        const authorElement = element.querySelector('#author-text span');
        const textElement = element.querySelector('#content-text');
        const likesElement = element.querySelector('#vote-count-middle');
        const timeElement = element.querySelector('#published-time-text');
        const comment = {
          author: authorElement ? authorElement.textContent.trim() : 'Unknown',
          text: textElement ? textElement.textContent.trim() : '',
          likes: likesElement ? parseLikes(likesElement.textContent) : 0,
          time: timeElement ? timeElement.textContent.trim() : '',
          position: i + 1
        };
        // Skip placeholders that have no text yet
        if (comment.text) {
          results.push(comment);
        }
      }
      return results;
    }, limit);
    return comments;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeShortsComments('https://www.youtube.com/shorts/ABC123xyz', 15).then(console.log);
For advanced Shorts comment scraping, see How to Scrape YouTube Shorts Comments With Puppeteer, which covers infinite scrolling, comment sorting, and deduplication techniques.
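Comments load lazily, so the fixed delay in scrapeShortsComments may be too short on a slow connection and wasted time on a fast one. One possible refinement is to poll until enough comment threads exist. The helper below is a sketch with a hypothetical name (pollUntil) and assumed default timings. Puppeteer's own page.waitForFunction covers a similar need when the condition can be evaluated entirely inside the page.

```javascript
// Hypothetical helper: call `check` repeatedly until it returns a truthy
// value or the timeout elapses. Resolves to that value, or null on timeout.
async function pollUntil(check, { timeout = 15000, interval = 500 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) return null;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Sketch of use inside scrapeShortsComments, in place of the fixed delay:
// const loaded = await pollUntil(
//   () => page.evaluate(
//     (min) => document.querySelectorAll('ytd-comment-thread-renderer').length >= min,
//     Math.min(limit, 5)
//   ),
//   { timeout: 15000 }
// );
```

Returning null on timeout instead of throwing lets the caller decide whether a video with few or no comments is an error.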
Extracting Shorts Transcripts
Shorts transcripts also require URL conversion before accessing the transcript interface:
async function scrapeShortsTranscript(shortsUrl) {
  // Convert Shorts URL to regular format
  const regularUrl = convertShortsToRegularUrl(shortsUrl);
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });
    // Navigate to converted URL
    await page.goto(regularUrl, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // page.waitForTimeout was removed in Puppeteer v22, so a plain timer is used instead
    await new Promise((resolve) => setTimeout(resolve, 3000));
    // Look for transcript button and click it
    await page.evaluate(() => {
      const buttons = Array.from(document.querySelectorAll('button'));
      const transcriptButton = buttons.find(button =>
        button.textContent && button.textContent.toLowerCase().includes('transcript')
      );
      if (transcriptButton) {
        transcriptButton.click();
      }
    });
    // Wait for transcript panel to load
    await new Promise((resolve) => setTimeout(resolve, 2000));
    const transcript = await page.evaluate(() => {
      const transcriptContainer = document.querySelector('#segments-container');
      // No panel means the video has no transcript or the button was not found
      if (!transcriptContainer) return null;
      const segments = transcriptContainer.querySelectorAll('ytd-transcript-segment-renderer');
      const results = [];
      segments.forEach((segment, index) => {
        const timestampElement = segment.querySelector('.ytd-transcript-segment-renderer[role="button"] .timestamp');
        const textElement = segment.querySelector('.ytd-transcript-segment-renderer[role="button"] .segment-text');
        if (timestampElement && textElement) {
          results.push({
            timestamp: timestampElement.textContent.trim(),
            text: textElement.textContent.trim(),
            index: index + 1
          });
        }
      });
      return results;
    });
    return transcript;
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeShortsTranscript('https://www.youtube.com/shorts/ABC123xyz').then(console.log);
For advanced Shorts transcript scraping, see How to Scrape YouTube Shorts Transcripts With Puppeteer, which covers language detection, error handling for missing transcripts, and mobile optimizations.
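The transcript comes back as an array of { timestamp, text, index } segments, or null when no transcript panel was found. Two small post-processing helpers are sketched below, assuming timestamps follow the "m:ss" or "h:mm:ss" display format. Both function names are hypothetical.

```javascript
// Hypothetical helper: convert "1:05" or "1:02:03" into seconds.
// Returns null for empty or non-numeric input.
function timestampToSeconds(timestamp) {
  if (!timestamp || !timestamp.trim()) return null;
  const parts = timestamp.trim().split(':').map(Number);
  if (parts.some((part) => Number.isNaN(part))) return null;
  // Fold left to right: each step shifts the running total by one base-60 place
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Hypothetical helper: join segment texts into one plain-text string.
// Accepts the null result of a missing transcript and returns ''.
function transcriptToText(segments) {
  if (!Array.isArray(segments)) return '';
  return segments
    .map((segment) => String((segment && segment.text) || '').replace(/\s+/g, ' ').trim())
    .filter(Boolean)
    .join(' ');
}
```

Numeric offsets make segments easy to sort or to align with other data, and the joined text is convenient for search or summarization.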
Complete Shorts Scraper Example
Here's a complete example that extracts all data types from a Shorts URL:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

function convertShortsToRegularUrl(url) {
  // Stop the ID at '/', '?', '&' or '#' so trailing slashes are not captured
  const shortsPattern = /youtube\.com\/shorts\/([^&\n?#\/]+)/;
  const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
  const shortsMatch = url.match(shortsPattern);
  if (shortsMatch) {
    return `https://www.youtube.com/watch?v=${shortsMatch[1]}`;
  } else if (regularPattern.test(url)) {
    return url;
  }
  throw new Error('Invalid YouTube URL format');
}

async function scrapeCompleteShortsData(shortsUrl) {
  const regularUrl = convertShortsToRegularUrl(shortsUrl);
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 1024 });
    await page.goto(regularUrl, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // Give the page time to render before reading details; without this the
    // selectors below can run against a page that has not been populated yet
    await new Promise((resolve) => setTimeout(resolve, 3000));
    // Extract video details
    const details = await page.evaluate(() => {
      const titleElement = document.querySelector('h1.ytd-video-primary-info-renderer yt-formatted-string');
      const viewsElement = document.querySelector('yt-view-count-renderer .view-count');
      const channelElement = document.querySelector('ytd-video-owner-renderer .ytd-channel-name a');
      return {
        title: titleElement ? titleElement.textContent.trim() : '',
        views: viewsElement ? viewsElement.textContent : '',
        channelName: channelElement ? channelElement.textContent.trim() : ''
      };
    });
    // Extract comments (first 5)
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight / 3));
    await page.waitForSelector('#comments', { timeout: 15000 });
    // page.waitForTimeout was removed in Puppeteer v22, so a plain timer is used instead
    await new Promise((resolve) => setTimeout(resolve, 2000));
    const comments = await page.evaluate(() => {
      const commentElements = document.querySelectorAll('#comments #contents > ytd-comment-thread-renderer');
      const results = [];
      for (let i = 0; i < Math.min(commentElements.length, 5); i++) {
        const element = commentElements[i];
        const authorElement = element.querySelector('#author-text span');
        const textElement = element.querySelector('#content-text');
        if (authorElement && textElement) {
          results.push({
            author: authorElement.textContent.trim(),
            text: textElement.textContent.trim()
          });
        }
      }
      return results;
    });
    return {
      originalShortsUrl: shortsUrl,
      convertedUrl: regularUrl,
      details,
      comments
    };
  } finally {
    await browser.close();
  }
}

// Usage
// scrapeCompleteShortsData('https://www.youtube.com/shorts/ABC123xyz').then(console.log);
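When scraping more than one Short, a single failed page should not abort the whole run. The sketch below processes URLs one at a time and records failures alongside successes. The name scrapeMany, the result shape, and the two-second default pause are all assumptions for illustration, and any of the scraper functions above can be passed as scrapeFn.

```javascript
// Hypothetical batch runner: scrape URLs sequentially and collect
// per-URL outcomes instead of stopping at the first error.
async function scrapeMany(urls, scrapeFn, { delayMs = 2000 } = {}) {
  const results = [];
  for (const [index, url] of urls.entries()) {
    try {
      results.push({ url, ok: true, data: await scrapeFn(url) });
    } catch (error) {
      results.push({ url, ok: false, error: error.message });
    }
    // Pause between pages, but not after the last one
    if (index < urls.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}

// Usage
// scrapeMany(
//   ['https://www.youtube.com/shorts/ABC123xyz', 'https://www.youtube.com/shorts/DEF456uvw'],
//   scrapeCompleteShortsData
// ).then(console.log);
```

Running sequentially keeps only one browser open at a time, which is slower than parallel scraping but lighter on memory and less likely to trigger rate limiting.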
API Alternative for Production Use
For production applications, the SocialKit YouTube APIs handle Shorts URLs natively without requiring manual URL conversion:
// APIs handle Shorts URLs directly - no conversion needed.
// Each fetch resolves to a Response object; read its body before using the data.
const shortsDetails = await fetch('https://api.socialkit.dev/youtube/stats?url=https://youtube.com/shorts/VIDEO_ID&access_key=KEY');
const shortsComments = await fetch('https://api.socialkit.dev/youtube/comments?url=https://youtube.com/shorts/VIDEO_ID&limit=100&access_key=KEY');
const shortsTranscript = await fetch('https://api.socialkit.dev/youtube/transcript?url=https://youtube.com/shorts/VIDEO_ID&access_key=KEY');
API benefits: Native Shorts support, no URL conversion needed, reliable infrastructure, 20 free requests monthly.
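The requests above embed the video URL directly in the query string. A slightly more defensive sketch builds the query with URLSearchParams so the url parameter is percent-encoded, and checks the HTTP status before reading the body. The helper names are hypothetical, the endpoints and parameter names are the ones shown above, and the assumption that responses are JSON is not confirmed here. The global fetch requires Node.js 18 or later.

```javascript
// Hypothetical helper: build a request URL for one of the endpoints above
// ('stats', 'comments' or 'transcript'), encoding every query parameter.
function buildSocialKitUrl(endpoint, videoUrl, accessKey, extraParams = {}) {
  const apiUrl = new URL(`https://api.socialkit.dev/youtube/${endpoint}`);
  apiUrl.search = new URLSearchParams({
    url: videoUrl,
    ...extraParams,
    access_key: accessKey
  }).toString();
  return apiUrl.toString();
}

// Hypothetical wrapper: perform the request and parse the body.
async function fetchSocialKit(endpoint, videoUrl, accessKey, extraParams) {
  const response = await fetch(buildSocialKitUrl(endpoint, videoUrl, accessKey, extraParams));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json(); // Assumption: the API responds with JSON
}

// Usage
// fetchSocialKit('comments', 'https://youtube.com/shorts/VIDEO_ID', 'KEY', { limit: 100 })
//   .then(console.log);
```

Encoding matters once the video URL carries its own query string, since an unencoded & would otherwise be read as the start of a new API parameter.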
Conclusion
The key to scraping YouTube Shorts is understanding the URL conversion technique. Once you convert Shorts URLs to regular format, you can use standard YouTube scraping methods. Start with the URL conversion function and build from there.
For production applications requiring reliability and native Shorts support, consider using specialized APIs that handle the URL conversion and scraping complexity automatically.