How to Extract YouTube Shorts Video Details Using Puppeteer

Jonathan Geiger
web-scraping, puppeteer, youtube-shorts, tutorial, youtube

YouTube Shorts have transformed the social media landscape, generating billions of views with their bite-sized vertical video format. However, extracting metadata and engagement statistics from these short-form videos presents technical challenges that regular YouTube videos do not.

The YouTube Shorts URL Challenge

The fundamental difference between YouTube Shorts and regular videos lies in their URL structure:

YouTube Shorts URL Pattern:

https://www.youtube.com/shorts/VIDEO_ID

Regular YouTube URL Pattern:

https://www.youtube.com/watch?v=VIDEO_ID

To extract video details from YouTube Shorts, we must convert the Shorts URL to the regular format, as the detailed metadata interface is only accessible through the standard video player.

URL Conversion Implementation

const convertShortsToRegularUrl = (url) => {
	// Define regex patterns for both URL types
	const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
	const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

	if (shortsPattern.test(url)) {
		// Extract video ID from Shorts URL
		const videoId = url.match(shortsPattern)[1];
		return `https://www.youtube.com/watch?v=${videoId}`;
	} else if (regularPattern.test(url)) {
		// Already a regular YouTube URL
		return url;
	} else {
		throw new Error('Invalid YouTube URL format');
	}
};

// Example usage
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
const regularUrl = convertShortsToRegularUrl(shortsUrl);
// Result: 'https://www.youtube.com/watch?v=ABC123xyz'

Project Setup

Initialize your YouTube Shorts scraper project:

mkdir youtube-shorts-details-scraper
cd youtube-shorts-details-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin reduces the chance that detection mechanisms flag and block automated browser sessions.

Basic YouTube Shorts Details Extractor

Here's a fundamental implementation that converts Shorts URLs and extracts core video metadata:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Enable stealth mode to avoid detection
puppeteer.use(StealthPlugin());

const convertShortsToRegularUrl = (url) => {
	const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
	const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

	if (shortsPattern.test(url)) {
		const videoId = url.match(shortsPattern)[1];
		return `https://www.youtube.com/watch?v=${videoId}`;
	} else if (regularPattern.test(url)) {
		return url;
	} else {
		throw new Error('Invalid YouTube URL format');
	}
};

const extractYouTubeShortsDetails = async (shortsUrl) => {
	// Convert Shorts URL to regular YouTube format
	const url = convertShortsToRegularUrl(shortsUrl);
	console.log(`Converting: ${shortsUrl} -> ${url}`);

	const browser = await puppeteer.launch({
		headless: 'new',
		ignoreDefaultArgs: ['--enable-automation'],
	});

	try {
		const page = await browser.newPage();
		await page.setViewport({ width: 1280, height: 800 });

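		// Navigate to converted YouTube URL
		await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

		// page.waitForTimeout() was removed in Puppeteer v22, so use a plain timer instead
		const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
		await delay(1500);

		// Handle cookie consent banner. The click runs in the browser context,
		// so the result is returned to Node and logged here rather than in the page.
		try {
			const cookieBannerClosed = await page.evaluate(() => {
				const cookieButton = document.querySelector(
					'button[aria-label*="cookies"]'
				);
				if (!cookieButton) return false;
				cookieButton.click();
				return true;
			});
			console.log(
				cookieBannerClosed ? 'Cookie banner closed' : 'No cookie banner detected'
			);
			await delay(1000);
		} catch (e) {
			console.log('Could not handle cookie banner:', e.message);
		}

		// Scroll to trigger content loading
		await page.evaluate(() => window.scrollBy(0, 300));
		await delay(800);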

		// Extract video details
		const videoDetails = await page.evaluate(() => {
			const extractNumber = (text) => {
				if (!text) return 0;

				const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
				const match = cleanText.match(
					/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/
				);
				if (!match) return 0;

				const numStr = match[0];
				const suffix = numStr.slice(-1);

				if (['K', 'M', 'B'].includes(suffix)) {
					const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
					if (isNaN(num)) return 0;

					switch (suffix) {
						case 'K':
							return Math.floor(num * 1000);
						case 'M':
							return Math.floor(num * 1000000);
						case 'B':
							return Math.floor(num * 1000000000);
					}
				} else {
					const num = parseFloat(numStr.replace(/,/g, ''));
					return isNaN(num) ? 0 : Math.floor(num);
				}

				return 0;
			};

			const data = {};

			// Extract video title
			const titleElement = document.querySelector(
				'h1.ytd-watch-metadata yt-formatted-string'
			);
			data.title = titleElement ? titleElement.textContent.trim() : '';

			// Extract channel information
			const channelElement = document.querySelector('ytd-channel-name a');
			if (channelElement) {
				data.channelName = channelElement.textContent.trim();
				data.channelLink = channelElement.href || '';
			}

			// Extract view count
			const viewsElement = document.querySelector('#info span[class*="view"]');
			data.views = viewsElement ? extractNumber(viewsElement.textContent) : 0;

			// Extract likes
			const likesElement = document.querySelector(
				'.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
			);
			data.likes = likesElement ? extractNumber(likesElement.textContent) : 0;

			return data;
		});

		return {
			originalUrl: shortsUrl,
			convertedUrl: url,
			...videoDetails,
		};
	} catch (error) {
		console.error('Error extracting Shorts details:', error);
		throw error;
	} finally {
		await browser.close();
	}
};

// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
extractYouTubeShortsDetails(shortsUrl)
	.then((details) => console.log('Shorts Details:', details))
	.catch((error) => console.error('Extraction failed:', error));
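
Because extractNumber is defined inside page.evaluate, it only runs in the browser context and is awkward to test on its own. The following is a minimal standalone sketch of the same idea that can be exercised directly in Node. The name parseCount is hypothetical, and the sketch assumes English-formatted counts (comma thousands separators, uppercase K/M/B suffixes directly after the number); other locales would need different rules.

```javascript
// Hypothetical standalone variant of the in-page extractNumber helper.
// Assumes English-formatted counts such as "1,234 views" or "1.2M".
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

const parseCount = (text) => {
	if (!text) return 0;

	// Digits with optional thousands separators and decimals,
	// followed by an optional K/M/B suffix
	const match = text.match(/(\d+(?:,\d{3})*(?:\.\d+)?)([KMB])?/);
	if (!match) return 0;

	const value = parseFloat(match[1].replace(/,/g, ''));
	if (Number.isNaN(value)) return 0;

	// Math.round guards against floating-point products that land
	// just below the intended integer
	return Math.round(value * (MULTIPLIERS[match[2]] || 1));
};

console.log(parseCount('1,234,567 views')); // 1234567
console.log(parseCount('1.2M views')); // 1200000
console.log(parseCount('15K')); // 15000
console.log(parseCount('No views')); // 0
```

Keeping the parser outside the page also means it can be reused for likes and comment counts without duplicating it in every page.evaluate call.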

Comprehensive YouTube Shorts Metadata Extraction

For heavier use, here's a more complete implementation with fallback selectors, sanity checks on parsed numbers, and additional fields (comments, publish date, description, and thumbnail):

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

const convertShortsToRegularUrl = (url) => {
	const shortsPattern = /youtube\.com\/shorts\/([^&\n?#]+)/;
	const regularPattern = /youtube\.com\/watch\?v=([^&\n?#]+)/;

	if (shortsPattern.test(url)) {
		const videoId = url.match(shortsPattern)[1];
		return `https://www.youtube.com/watch?v=${videoId}`;
	} else if (regularPattern.test(url)) {
		return url;
	} else {
		throw new Error('Invalid YouTube URL format');
	}
};

const scrapeYouTubeShortsDetails = async (shortsUrl) => {
	const url = convertShortsToRegularUrl(shortsUrl);
	console.log(`Processing: ${shortsUrl} -> ${url}`);

	const browser = await puppeteer.launch({
		headless: 'new',
		ignoreDefaultArgs: ['--enable-automation'],
	});

	try {
		const page = await browser.newPage();
		await page.setViewport({
			width: 1280,
			height: 1024,
			deviceScaleFactor: 1,
		});

		// Navigate to YouTube video
		await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });

		// page.waitForTimeout() was removed in Puppeteer v22, so use a plain timer instead
		const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
		await delay(1500);

		// Handle cookie banner (evaluate may throw if the click triggers a navigation)
		try {
			await page.evaluate(() => {
				const cookieButton = document.querySelector(
					'button[aria-label*="cookies"]'
				);
				if (cookieButton) {
					cookieButton.click();
				}
			});
			await delay(1000);
		} catch (e) {
			console.log('Could not dismiss cookie banner:', e.message);
		}

		// Scroll to load below-the-fold content
		try {
			await page.evaluate(() => window.scrollBy(0, 300));
			await delay(800);
		} catch (e) {
			console.log('Could not scroll page');
		}

		// Extract comprehensive metadata
		const metadata = await page.evaluate(() => {
			const extractNumber = (text) => {
				if (!text) return 0;

				const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
				const match = cleanText.match(
					/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/
				);
				if (!match) return 0;

				const numStr = match[0];
				const suffix = numStr.slice(-1);

				if (['K', 'M', 'B'].includes(suffix)) {
					const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
					if (isNaN(num) || num < 0 || num > 999999) return 0;

					switch (suffix) {
						case 'K':
							return Math.floor(num * 1000);
						case 'M':
							return Math.floor(num * 1000000);
						case 'B':
							return Math.floor(num * 1000000000);
					}
				} else {
					const num = parseFloat(numStr.replace(/,/g, ''));
					if (isNaN(num) || num < 0 || num > 999999999999) return 0;
					return Math.floor(num);
				}

				return 0;
			};

			const data = {
				title: '',
				channelName: '',
				channelLink: '',
				views: 0,
				likes: 0,
				comments: 0,
				publishDate: '',
				description: '',
				thumbnailUrl: '',
			};
            
			// Extract title
			const titleElement = document.querySelector(
				'h1.ytd-watch-metadata yt-formatted-string'
			);
			if (titleElement) {
				data.title = titleElement.textContent.trim();
			}

			// Extract channel information
			const channelElement = document.querySelector('ytd-channel-name a');
			if (channelElement) {
				data.channelName = channelElement.textContent.trim();
				data.channelLink = channelElement.href || '';
			}

			// Extract views with multiple fallback selectors
			const viewsSelectors = [
				'#info span[class*="view"]',
				'#info .style-scope.yt-formatted-string',
				'#info .view-count',
			];

			for (const selector of viewsSelectors) {
				const viewsElement = document.querySelector(selector);
				if (viewsElement && viewsElement.textContent.trim()) {
					const text = viewsElement.textContent.trim();
					if (
						text.includes('view') ||
						/[\d,]+[KMB]?\s*(views?|watching)/i.test(text)
					) {
						data.views = extractNumber(text);
						break;
					}
				}
			}

			// Extract likes
			const likesElement = document.querySelector(
				'.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
			);
			if (likesElement) {
				data.likes = extractNumber(likesElement.textContent);
			}

			// Extract comments count
			const commentsElement = document.querySelector('#title #count span');
			if (commentsElement) {
				data.comments = extractNumber(commentsElement.textContent);
			}

			// Extract publish date
			const publishElement = document.querySelector(
				'ytd-watch-metadata #info-strings yt-formatted-string:nth-child(2)'
			);
			if (publishElement) {
				data.publishDate = publishElement.textContent.trim();
			}

			// Extract description
			const descriptionElement = document.querySelector(
				'ytd-watch-metadata #description-text'
			);
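			if (descriptionElement) {
				const fullText = descriptionElement.textContent.trim();
				// Truncate long descriptions; add an ellipsis only when text was cut
				data.description =
					fullText.length > 300
						? `${fullText.substring(0, 300)}...`
						: fullText;
			}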

			// Extract thumbnail
			const thumbnailElement = document.querySelector('video');
			if (thumbnailElement) {
				data.thumbnailUrl = thumbnailElement.poster || '';
			}

			return data;
		});

		return {
			originalShortsUrl: shortsUrl,
			convertedUrl: url,
			extractedAt: new Date().toISOString(),
			...metadata,
		};
	} catch (error) {
		console.error('Error scraping Shorts details:', error);
		throw error;
	} finally {
		await browser.close();
	}
};

// Usage example
const shortsUrl = 'https://www.youtube.com/shorts/ABC123xyz';
scrapeYouTubeShortsDetails(shortsUrl)
	.then((details) => {
		console.log('Shorts Title:', details.title);
		console.log('Channel:', details.channelName);
		console.log('Views:', details.views.toLocaleString());
		console.log('Likes:', details.likes.toLocaleString());
	})
	.catch((error) => console.error('Failed to scrape Shorts details:', error));
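
Page loads and selector lookups can fail intermittently, so a longer-running script would usually retry a failed extraction instead of giving up on the first error. Here is a minimal sketch of a retry wrapper with exponential backoff. The names withRetry, attempts, and baseDelayMs are hypothetical and not part of Puppeteer; the wrapper works with any async function, including scrapeYouTubeShortsDetails.

```javascript
// Hypothetical helper: retries an async task with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const withRetry = async (task, { attempts = 3, baseDelayMs = 1000 } = {}) => {
	let lastError;
	for (let attempt = 1; attempt <= attempts; attempt++) {
		try {
			return await task();
		} catch (error) {
			lastError = error;
			if (attempt < attempts) {
				// Wait 1x, 2x, 4x ... the base delay before the next attempt
				await sleep(baseDelayMs * 2 ** (attempt - 1));
			}
		}
	}
	// Every attempt failed: surface the most recent error
	throw lastError;
};

// Usage (assumes scrapeYouTubeShortsDetails and shortsUrl are in scope):
// withRetry(() => scrapeYouTubeShortsDetails(shortsUrl), { attempts: 3 })
// 	.then((details) => console.log(details))
// 	.catch((error) => console.error('All attempts failed:', error));
```

Since each call to scrapeYouTubeShortsDetails launches and closes its own browser, every retry starts from a clean session.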

Technical Challenges and Solutions

1. URL Structure Conversion

The most critical aspect of YouTube Shorts scraping is proper URL conversion:

// URL validation and conversion
const validateAndConvertUrl = (url) => {
	try {
		const urlObj = new URL(url);
		if (
			urlObj.hostname !== 'www.youtube.com' &&
			urlObj.hostname !== 'youtube.com'
		) {
			throw new Error('Not a YouTube URL');
		}
		return convertShortsToRegularUrl(url);
	} catch (error) {
		throw new Error(`Invalid URL: ${error.message}`);
	}
};
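
The regex-based converter expects v to be the first query parameter and does not check the hostname itself. As an alternative, the sketch below parses the URL with the built-in URL API and extracts the video ID from the parsed parts. The names extractVideoId, toWatchUrl, and YOUTUBE_HOSTS are hypothetical, and including m.youtube.com in the allowlist is an assumption that goes beyond the hostname check shown above.

```javascript
// Hypothetical helper: pulls the video ID out of a Shorts or watch URL.
// The host allowlist is an assumption; adjust it to the URLs you expect.
const YOUTUBE_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'm.youtube.com']);

const extractVideoId = (input) => {
	let parsed;
	try {
		parsed = new URL(input);
	} catch {
		throw new Error(`Not a valid URL: ${input}`);
	}

	if (!YOUTUBE_HOSTS.has(parsed.hostname)) {
		throw new Error(`Not a YouTube URL: ${parsed.hostname}`);
	}

	// Shorts: /shorts/VIDEO_ID (tolerates a trailing slash)
	const shortsMatch = parsed.pathname.match(/^\/shorts\/([^/]+)\/?$/);
	if (shortsMatch) return shortsMatch[1];

	// Regular videos: /watch?v=VIDEO_ID, regardless of parameter order
	const id = parsed.pathname === '/watch' ? parsed.searchParams.get('v') : null;
	if (id) return id;

	throw new Error('Unsupported YouTube URL format');
};

// Normalizes either URL type to the standard watch URL
const toWatchUrl = (input) =>
	`https://www.youtube.com/watch?v=${extractVideoId(input)}`;

console.log(toWatchUrl('https://www.youtube.com/shorts/ABC123xyz?feature=share'));
// https://www.youtube.com/watch?v=ABC123xyz
```

Working from the parsed pathname and query string also drops tracking parameters such as feature=share from the result.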

2. Dynamic Content Loading

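YouTube renders most of the watch page on the client, so metadata elements appear some time after the initial HTML loads. Wait for the key elements before extracting: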

// Wait for critical elements before extraction
await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
await page.waitForSelector('#info', { timeout: 5000 });

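// Additional wait for engagement metrics
// (page.waitForTimeout() was removed in Puppeteer v22; a plain timer works in any version)
await page.evaluate(() => window.scrollBy(0, 300));
await new Promise((resolve) => setTimeout(resolve, 800));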
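
Fixed delays are either too short on a slow connection or wasted time on a fast one. As a rough alternative, page.waitForFunction can poll inside the page until an element actually contains text. The helper name waitForNonEmptyText is hypothetical; the selectors in the usage comments are the ones used elsewhere in this article.

```javascript
// Hypothetical helper: resolves once the first element matching `selector`
// exists and has non-empty text, or rejects when `timeout` (ms) elapses.
const waitForNonEmptyText = (page, selector, timeout = 10000) =>
	page.waitForFunction(
		// Runs in the browser context; `sel` receives the `selector` argument
		(sel) => {
			const element = document.querySelector(sel);
			return Boolean(element && element.textContent.trim());
		},
		{ timeout },
		selector
	);

// Usage inside the scraper, after page.goto(...):
// await waitForNonEmptyText(page, 'h1.ytd-watch-metadata');
// await waitForNonEmptyText(page, '#info', 5000);
```

Unlike waitForSelector, which only confirms that the element exists, this waits until it has content, which matters for counters that are filled in after the element is created.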

Alternative: SocialKit YouTube Stats API

For production applications requiring reliable YouTube Shorts analytics, consider SocialKit's YouTube Stats API:

curl "https://api.socialkit.dev/youtube/stats?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ"

Example Response

{
	"success": true,
	"data": {
		"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
		"title": "Rick Astley - Never Gonna Give You Up (Official Video)",
		"channelName": "Rick Astley",
		"channelLink": "https://youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
		"views": 1428567890,
		"likes": 16234567,
		"comments": 4567890
	}
}
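
From Node, the same request might look like the sketch below. The endpoint and the access_key and url parameters come from the curl example above; the function names are hypothetical, the error handling assumes the response shape shown here, and the code assumes Node 18+ for the global fetch.

```javascript
// Endpoint and parameter names are taken from the curl example in this article.
const STATS_ENDPOINT = 'https://api.socialkit.dev/youtube/stats';

// Builds the request URL; URLSearchParams percent-encodes the nested video URL
// so its own "?" and "=" do not collide with the outer query string.
const buildStatsUrl = (accessKey, videoUrl) => {
	const params = new URLSearchParams({ access_key: accessKey, url: videoUrl });
	return `${STATS_ENDPOINT}?${params.toString()}`;
};

// Hypothetical client function; assumes a { success, data } response body.
const fetchYouTubeStats = async (accessKey, videoUrl) => {
	const response = await fetch(buildStatsUrl(accessKey, videoUrl));
	if (!response.ok) {
		throw new Error(`Stats request failed with HTTP ${response.status}`);
	}

	const body = await response.json();
	if (!body.success) {
		throw new Error('Stats API reported an unsuccessful response');
	}
	return body.data;
};

// Usage:
// fetchYouTubeStats('<your-access-key>', 'https://youtube.com/watch?v=dQw4w9WgXcQ')
// 	.then((stats) => console.log(stats.title, stats.views))
// 	.catch((error) => console.error(error));
```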

API Benefits:

  • Automatic URL handling: Processes both Shorts and regular YouTube URLs
  • No conversion needed: Handles URL transformation internally
  • Consistent data structure: Standardized response format across all video types
  • Real-time accuracy: Always up-to-date with current video statistics
  • Scale-ready: Handle thousands of Shorts without rate limits
  • Global availability: Works worldwide without geo-restrictions

Free YouTube Tools

Need instant access to YouTube Shorts data? Try our free tools:

YouTube Video Summarizer Tool

Get AI-powered insights with our free YouTube Video Summarizer tool:

  • Analyze YouTube Shorts content with AI-powered summaries
  • Extract key themes and trending topics from short-form videos
  • Identify viral content patterns for your own content strategy
  • Get instant insights without any setup or registration required

YouTube Transcript Extractor Tool

Extract content from Shorts with our free YouTube Transcript Extractor tool:

  • Extract transcripts from YouTube Shorts automatically
  • Get timestamped segments for precise content analysis
  • Perfect for accessibility and content repurposing
  • 100% free with support for both Shorts and regular videos

Both tools automatically handle YouTube Shorts URLs and provide immediate value for content creators, social media managers, and digital marketers.

Conclusion

Extracting YouTube Shorts video details with Puppeteer hinges on one step: converting the Shorts URL into a standard YouTube watch URL. The watch page exposes the metadata the scripts above rely on, including views, likes, comments, publish date, and description.