How to Extract YouTube Shorts Video Details Using Puppeteer
YouTube Shorts have transformed the social media landscape, generating billions of views with their bite-sized vertical video format. However, extracting metadata and analytics data from these short-form videos presents unique technical challenges that differ from regular YouTube videos.
The YouTube Shorts URL Challenge
The fundamental difference between YouTube Shorts and regular videos lies in their URL structure:
YouTube Shorts URL Pattern:
https://www.youtube.com/shorts/VIDEO_ID
Regular YouTube URL Pattern:
https://www.youtube.com/watch?v=VIDEO_ID
To extract video details from a Short, this guide first converts the Shorts URL to the regular watch format, because the selectors used below target the metadata panel of the standard video player page.
URL Conversion Implementation
const convertShortsToRegularUrl = (url) => {
// Define regex patterns for both URL types
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
if (shortsPattern.test(url)) {
// Extract video ID from Shorts URL
const videoId = url.match(shortsPattern)[1];
return `https://www.youtube.com/watch?v=${videoId}`;
} else if (regularPattern.test(url)) {
// Already a regular YouTube URL
return url;
} else {
throw new Error('Invalid YouTube URL format');
}
};
// Example usage
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
const regularUrl = convertShortsToRegularUrl(shortsUrl);
// Result: 'https://www.youtube.com/watch?v=ABC123xyz'
Project Setup
Initialize your YouTube Shorts scraper project:
mkdir youtube-shorts-details-scraper
cd youtube-shorts-details-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
The stealth plugin helps bypass detection mechanisms that might block automated browser sessions.
Basic YouTube Shorts Details Extractor
Here's a fundamental implementation that converts Shorts URLs and extracts core video metadata:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// Enable stealth mode to avoid detection
puppeteer.use(StealthPlugin());
const convertShortsToRegularUrl = (url) => {
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
if (shortsPattern.test(url)) {
const videoId = url.match(shortsPattern)[1];
return `https://www.youtube.com/watch?v=${videoId}`;
} else if (regularPattern.test(url)) {
return url;
} else {
throw new Error('Invalid YouTube URL format');
}
};
const extractYouTubeShortsDetails = async (shortsUrl) => {
// Convert Shorts URL to regular YouTube format
const url = convertShortsToRegularUrl(shortsUrl);
console.log(`Converting: ${shortsUrl} -> ${url}`);
const browser = await puppeteer.launch({
headless: true, // new headless mode in Puppeteer v22+; older releases used 'new'
ignoreDefaultArgs: ['--enable-automation'],
});
try {
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 800 });
// Navigate to converted YouTube URL
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
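// Brief pause for the initial render (page.waitForTimeout was removed in Puppeteer v22)
await new Promise((resolve) => setTimeout(resolve, 1500));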
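// Handle cookie consent banner
try {
// Report the result back to Node; console.log inside evaluate() would print in the browser
const bannerClosed = await page.evaluate(() => {
const cookieButton = document.querySelector(
'button[aria-label*="cookies"]'
);
if (!cookieButton) return false;
cookieButton.click();
return true;
});
console.log(bannerClosed ? 'Cookie banner closed' : 'No cookie banner detected');
await new Promise((resolve) => setTimeout(resolve, 1000));
} catch (e) {
console.log('Could not handle cookie banner:', e.message);
}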
// Scroll to trigger content loading
await page.evaluate(() => window.scrollBy(0, 300));
await new Promise((resolve) => setTimeout(resolve, 800));
// Extract video details
const videoDetails = await page.evaluate(() => {
const extractNumber = (text) => {
if (!text) return 0;
const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
const match = cleanText.match(
/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/
);
if (!match) return 0;
const numStr = match[0];
const suffix = numStr.slice(-1);
if (['K', 'M', 'B'].includes(suffix)) {
const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
if (isNaN(num)) return 0;
switch (suffix) {
case 'K':
return Math.floor(num * 1000);
case 'M':
return Math.floor(num * 1000000);
case 'B':
return Math.floor(num * 1000000000);
}
} else {
const num = parseFloat(numStr.replace(/,/g, ''));
return isNaN(num) ? 0 : Math.floor(num);
}
return 0;
};
const data = {};
// Extract video title
const titleElement = document.querySelector(
'h1.ytd-watch-metadata yt-formatted-string'
);
data.title = titleElement ? titleElement.textContent.trim() : '';
// Extract channel information
const channelElement = document.querySelector('ytd-channel-name a');
if (channelElement) {
data.channelName = channelElement.textContent.trim();
data.channelLink = channelElement.href || '';
}
// Extract view count
const viewsElement = document.querySelector('#info span[class*="view"]');
data.views = viewsElement ? extractNumber(viewsElement.textContent) : 0;
// Extract likes
const likesElement = document.querySelector(
'.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
);
data.likes = likesElement ? extractNumber(likesElement.textContent) : 0;
return data;
});
return {
originalUrl: shortsUrl,
convertedUrl: url,
...videoDetails,
};
} catch (error) {
console.error('Error extracting Shorts details:', error);
throw error;
} finally {
await browser.close();
}
};
// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
extractYouTubeShortsDetails(shortsUrl)
.then((details) => console.log('Shorts Details:', details))
.catch((error) => console.error('Extraction failed:', error));
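The `extractNumber` helper above runs inside `page.evaluate`, which makes it awkward to check on its own. The following is a minimal standalone sketch of the same idea that can be run directly in Node. The name `parseCount` is hypothetical, and the sketch assumes English-formatted counts such as "1,234 views" or "1.2K"; other locales would need different handling.

```javascript
// Hypothetical standalone count parser (not part of Puppeteer or YouTube).
// Assumes English-style text: "1,234 views", "1.2K", "3M views", "1.5B".
const parseCount = (text) => {
  if (!text) return 0;

  // Digits with optional thousands separators and decimals,
  // followed immediately by an optional K/M/B suffix.
  const match = text.match(/(\d[\d,]*(?:\.\d+)?)([KMB])?/);
  if (!match) return 0;

  const value = parseFloat(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return 0;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const multiplier = multipliers[match[2]] || 1;

  // Math.round guards against floating-point results that land just below a whole number
  return Math.round(value * multiplier);
};

console.log(parseCount('1,234 views'));
console.log(parseCount('1.2K'));
console.log(parseCount('3M views'));
```

Keeping the parser as a pure function means it can be unit-tested without launching a browser, then pasted into the `page.evaluate` callback unchanged.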
Comprehensive YouTube Shorts Metadata Extraction
For production use, here's a more complete implementation with fallback selectors, sanity checks on parsed numbers, and additional fields (comments, publish date, description, thumbnail):
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const convertShortsToRegularUrl = (url) => {
const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;
if (shortsPattern.test(url)) {
const videoId = url.match(shortsPattern)[1];
return `https://www.youtube.com/watch?v=${videoId}`;
} else if (regularPattern.test(url)) {
return url;
} else {
throw new Error('Invalid YouTube URL format');
}
};
const scrapeYouTubeShortsDetails = async (shortsUrl) => {
const url = convertShortsToRegularUrl(shortsUrl);
console.log(`Processing: ${shortsUrl} -> ${url}`);
const browser = await puppeteer.launch({
headless: true, // new headless mode in Puppeteer v22+; older releases used 'new'
ignoreDefaultArgs: ['--enable-automation'],
});
try {
const page = await browser.newPage();
await page.setViewport({
width: 1280,
height: 1024,
deviceScaleFactor: 1,
});
// Navigate to YouTube video
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
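// Brief pause for the initial render (page.waitForTimeout was removed in Puppeteer v22)
await new Promise((resolve) => setTimeout(resolve, 1500));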
// Handle cookie banner
try {
await page.evaluate(() => {
const cookieButton = document.querySelector(
'button[aria-label*="cookies"]'
);
if (cookieButton) {
cookieButton.click();
}
});
await new Promise((resolve) => setTimeout(resolve, 1000));
} catch (e) {
console.log('Could not dismiss cookie banner:', e.message);
}
// Scroll to load below-the-fold content
try {
await page.evaluate(() => window.scrollBy(0, 300));
await new Promise((resolve) => setTimeout(resolve, 800));
} catch (e) {
console.log('Could not scroll page');
}
// Extract comprehensive metadata
const metadata = await page.evaluate(() => {
const extractNumber = (text) => {
if (!text) return 0;
const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
const match = cleanText.match(
/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/
);
if (!match) return 0;
const numStr = match[0];
const suffix = numStr.slice(-1);
if (['K', 'M', 'B'].includes(suffix)) {
const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
if (isNaN(num) || num < 0 || num > 999999) return 0;
switch (suffix) {
case 'K':
return Math.floor(num * 1000);
case 'M':
return Math.floor(num * 1000000);
case 'B':
return Math.floor(num * 1000000000);
}
} else {
const num = parseFloat(numStr.replace(/,/g, ''));
if (isNaN(num) || num < 0 || num > 999999999999) return 0;
return Math.floor(num);
}
return 0;
};
const data = {
title: '',
channelName: '',
channelLink: '',
views: 0,
likes: 0,
comments: 0,
publishDate: '',
description: '',
thumbnailUrl: '',
};
// Extract title
const titleElement = document.querySelector(
'h1.ytd-watch-metadata yt-formatted-string'
);
if (titleElement) {
data.title = titleElement.textContent.trim();
}
// Extract channel information
const channelElement = document.querySelector('ytd-channel-name a');
if (channelElement) {
data.channelName = channelElement.textContent.trim();
data.channelLink = channelElement.href || '';
}
// Extract views with multiple fallback selectors
const viewsSelectors = [
'#info span[class*="view"]',
'#info .style-scope.yt-formatted-string',
'#info .view-count',
];
for (const selector of viewsSelectors) {
const viewsElement = document.querySelector(selector);
if (viewsElement && viewsElement.textContent.trim()) {
const text = viewsElement.textContent.trim();
// Accept only text that looks like a view count or a live-viewer count
if (/view|watching/i.test(text)) {
data.views = extractNumber(text);
break;
}
}
}
// Extract likes
const likesElement = document.querySelector(
'.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
);
if (likesElement) {
data.likes = extractNumber(likesElement.textContent);
}
// Extract comments count
const commentsElement = document.querySelector('#title #count span');
if (commentsElement) {
data.comments = extractNumber(commentsElement.textContent);
}
// Extract publish date
const publishElement = document.querySelector(
'ytd-watch-metadata #info-strings yt-formatted-string:nth-child(2)'
);
if (publishElement) {
data.publishDate = publishElement.textContent.trim();
}
// Extract description
const descriptionElement = document.querySelector(
'ytd-watch-metadata #description-text'
);
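if (descriptionElement) {
const fullText = descriptionElement.textContent.trim();
// Truncate to 300 characters and add an ellipsis only when text was cut off
data.description =
fullText.length > 300 ? fullText.substring(0, 300) + '...' : fullText;
}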
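// Extract thumbnail (fall back to the Open Graph image when the player has no poster)
const thumbnailElement = document.querySelector('video');
const ogImage = document.querySelector('meta[property="og:image"]');
data.thumbnailUrl =
(thumbnailElement && thumbnailElement.poster) ||
(ogImage && ogImage.content) ||
'';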
return data;
});
return {
originalShortsUrl: shortsUrl,
convertedUrl: url,
extractedAt: new Date().toISOString(),
...metadata,
};
} catch (error) {
console.error('Error scraping Shorts details:', error);
throw error;
} finally {
await browser.close();
}
};
// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
scrapeYouTubeShortsDetails(shortsUrl)
.then((details) => {
console.log('Shorts Title:', details.title);
console.log('Channel:', details.channelName);
console.log('Views:', details.views.toLocaleString());
console.log('Likes:', details.likes.toLocaleString());
})
.catch((error) => console.error('Failed to scrape Shorts details:', error));
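When several Shorts need to be processed, one failing URL should not abort the whole run. The sketch below shows one possible way to do that: it processes URLs one at a time, records failures instead of throwing, and pauses between requests. The helper name `scrapeMany`, its `delayMs` parameter, and the result shape are all assumptions made for illustration; `scrapeFn` stands in for `scrapeYouTubeShortsDetails`.

```javascript
// Hypothetical batch helper. `scrapeFn` is any async function that takes a URL
// and resolves to a details object (for example scrapeYouTubeShortsDetails).
const scrapeMany = async (urls, scrapeFn, delayMs = 1000) => {
  const results = [];

  for (const url of urls) {
    try {
      const details = await scrapeFn(url);
      results.push({ url, ok: true, details });
    } catch (error) {
      // Record the failure and continue with the next URL
      results.push({ url, ok: false, error: error.message });
    }

    // Pause between requests to avoid hammering the site
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }

  return results;
};
```

A call such as `scrapeMany(urls, scrapeYouTubeShortsDetails)` would then return one entry per URL, with `ok` indicating whether extraction succeeded. Running sequentially keeps only one browser open at a time, at the cost of speed.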
Technical Challenges and Solutions
1. URL Structure Conversion
The most critical aspect of YouTube Shorts scraping is proper URL conversion:
// URL validation and conversion
const validateAndConvertUrl = (url) => {
try {
const urlObj = new URL(url);
if (
urlObj.hostname !== 'www.youtube.com' &&
urlObj.hostname !== 'youtube.com'
) {
throw new Error('Not a YouTube URL');
}
return convertShortsToRegularUrl(url);
} catch (error) {
throw new Error(`Invalid URL: ${error.message}`);
}
};
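As an alternative to regex matching, the video ID can be read from the parsed URL itself, which tolerates a trailing slash and extra query parameters. The following is a rough sketch under stated assumptions: the name `toWatchUrl` and the accepted host list (including `m.youtube.com`) are illustrative choices, not part of the original code. Unlike `convertShortsToRegularUrl`, it always rebuilds a clean watch URL and drops other parameters.

```javascript
// Assumed set of accepted hostnames; extend as needed.
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com']);

// Hypothetical helper: normalize a Shorts or watch URL to a clean watch URL.
const toWatchUrl = (input) => {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }

  if (!YOUTUBE_HOSTS.has(parsed.hostname)) {
    throw new Error(`Not a YouTube URL: ${parsed.hostname}`);
  }

  // "/shorts/ABC123xyz/" -> ["shorts", "ABC123xyz"]
  const segments = parsed.pathname.split('/').filter(Boolean);

  let videoId = null;
  if (segments[0] === 'shorts' && segments[1]) {
    videoId = segments[1];
  } else if (segments[0] === 'watch') {
    videoId = parsed.searchParams.get('v');
  }

  if (!videoId) {
    throw new Error('No video ID found in URL');
  }

  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`;
};

console.log(toWatchUrl('https://www.youtube.com/shorts/ABC123xyz/?feature=share'));
```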
2. Dynamic Content Loading
The watch page renders its metadata asynchronously, so wait for the key elements before reading them:
// Wait for critical elements before extraction
await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
await page.waitForSelector('#info', { timeout: 5000 });
// Scroll to trigger lazy-loaded engagement metrics, then pause briefly
// (page.waitForTimeout was removed in Puppeteer v22)
await page.evaluate(() => window.scrollBy(0, 300));
await new Promise((resolve) => setTimeout(resolve, 800));
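`page.waitForSelector` rejects when its timeout expires, so a single missing element can abort the whole extraction. For elements that are useful but not essential, one option is a small wrapper that turns the rejection into a boolean. This is only a sketch: `waitForOptional` is a hypothetical name, and it treats any failure (not just a timeout) as "element not present".

```javascript
// Hypothetical wrapper around Puppeteer's page.waitForSelector.
// Resolves to true if the selector appears in time, false otherwise.
const waitForOptional = async (page, selector, timeout = 5000) => {
  try {
    await page.waitForSelector(selector, { timeout });
    return true;
  } catch (error) {
    // Any failure, including a timeout, is reported as "not found"
    return false;
  }
};
```

With this in place, a required element such as the title can still use a plain `page.waitForSelector`, while something like `await waitForOptional(page, '#info')` lets the scraper continue and fall back to default values when the element never appears.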
Alternative: SocialKit YouTube Stats API
For production applications requiring reliable YouTube Shorts analytics, consider SocialKit's YouTube Stats API:
curl "https://api.socialkit.dev/youtube/stats?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ"
Example Response
{
"success": true,
"data": {
"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
"title": "Rick Astley - Never Gonna Give You Up (Official Video)",
"channelName": "Rick Astley",
"channelLink": "https://youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
"views": 1428567890,
"likes": 16234567,
"comments": 4567890
}
}
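The same request can be made from Node. The sketch below assumes Node 18 or later (for the built-in `fetch`) and uses only the endpoint and the `access_key` and `url` parameters shown in the curl example. The function names are illustrative, and the error handling assumes that a failed call either returns a non-2xx status or sets `success` to `false`, which the example response does not confirm.

```javascript
// Build the request URL, letting URLSearchParams encode the video URL safely
const buildStatsUrl = (accessKey, videoUrl) => {
  const params = new URLSearchParams({ access_key: accessKey, url: videoUrl });
  return `https://api.socialkit.dev/youtube/stats?${params.toString()}`;
};

// Hypothetical wrapper around the stats endpoint; resolves to the `data` object
const fetchYouTubeStats = async (accessKey, videoUrl) => {
  const response = await fetch(buildStatsUrl(accessKey, videoUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const body = await response.json();
  if (!body.success) {
    // Assumed failure shape: success === false
    throw new Error('API reported an unsuccessful response');
  }
  return body.data;
};
```

Encoding the `url` parameter matters once the video URL carries its own query string, since an unencoded `&` would otherwise be read as a separate API parameter.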
API Benefits:
- Automatic URL handling: Processes both Shorts and regular YouTube URLs
- No conversion needed: Handles URL transformation internally
- Consistent data structure: Standardized response format across all video types
- Real-time accuracy: Always up-to-date with current video statistics
- Scale-ready: Handle thousands of Shorts without rate limits
- Global availability: Works worldwide without geo-restrictions
Free YouTube Tools
Need instant access to YouTube Shorts data? Try our free tools:
YouTube Video Summarizer Tool
Get AI-powered insights with our free YouTube Video Summarizer tool:
- Analyze YouTube Shorts content with AI-powered summaries
- Extract key themes and trending topics from short-form videos
- Identify viral content patterns for your own content strategy
- Get instant insights without any setup or registration required
YouTube Transcript Extractor Tool
Extract content from Shorts with our free YouTube Transcript Extractor tool:
- Extract transcripts from YouTube Shorts automatically
- Get timestamped segments for precise content analysis
- Perfect for accessibility and content repurposing
- 100% free with support for both Shorts and regular videos
Both tools automatically handle YouTube Shorts URLs and provide immediate value for content creators, social media managers, and digital marketers.
Conclusion
Extracting YouTube Shorts video details with Puppeteer hinges on one step: converting the Shorts URL into a standard YouTube watch URL. The watch page exposes the metadata that the selectors in this guide read, including views, likes, comments, publish date, and description.