
How to Extract YouTube Video Details Using Puppeteer

Jonathan Geiger
web-scraping, puppeteer, youtube, youtube shorts, tutorial

YouTube video details contain valuable metadata that developers, content creators, and researchers often need to extract programmatically. While YouTube provides an API, it requires authentication and has rate limits. For simpler use cases or when you need complete control over the extraction process, web scraping with Puppeteer offers a powerful alternative.

In this comprehensive guide, we'll explore how to extract YouTube video details using Puppeteer, including titles, view counts, likes, comments, channel information, and more. We'll cover everything from basic setup to advanced error handling techniques.

Prerequisites

Before diving into the implementation, ensure you have:

  • Node.js installed (version 14 or higher)
  • Basic knowledge of JavaScript and async/await
  • Understanding of DOM manipulation and CSS selectors
  • Familiarity with browser automation concepts
  • Experience with handling dynamic content loading

Setting Up the Project

Create a new project and install dependencies:

mkdir youtube-details-scraper
cd youtube-details-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin helps avoid detection by making our automated browser behavior appear more natural.

Basic Implementation

Let's start with a basic implementation that extracts core video details:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Use stealth plugin to avoid detection
puppeteer.use(StealthPlugin());

const extractYouTubeDetails = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1000);

    // Handle cookie banner (EU compliance)
    try {
      await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) {
          cookieButton.click();
          console.log('Closed cookie banner');
        }
      });
      await page.waitForTimeout(1000);
    } catch (e) {
      console.log('No cookie banner found');
    }

    // Scroll down to load below-the-fold content
    await page.evaluate(() => window.scrollBy(0, 300));
    await page.waitForTimeout(600);

    // Extract video details
    const videoDetails = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        const match = cleanText.match(/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/);
        if (!match) return 0;
        
        const numStr = match[0];
        const suffix = numStr.slice(-1);
        
        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num)) return 0;
          
          switch(suffix) {
            case 'K': return Math.floor(num * 1000);
            case 'M': return Math.floor(num * 1000000);
            case 'B': return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          return isNaN(num) ? 0 : Math.floor(num);
        }
        
        return 0;
      };

      const data = {};

      // Extract title
      const titleElement = document.querySelector('h1.ytd-watch-metadata yt-formatted-string');
      data.title = titleElement ? titleElement.textContent.trim() : '';

      // Extract channel information
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Extract views
      const viewsElement = document.querySelector('#info span[class*="view"]');
      data.views = viewsElement ? extractNumber(viewsElement.textContent) : 0;

      // Extract likes
      const likesElement = document.querySelector('.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content');
      data.likes = likesElement ? extractNumber(likesElement.textContent) : 0;

      return data;
    });

    return videoDetails;

  } catch (error) {
    console.error('Error extracting video details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
extractYouTubeDetails(videoUrl)
  .then(details => console.log('Video Details:', details))
  .catch(error => console.error('Failed to extract details:', error));

Advanced Implementation with Comprehensive Data Extraction

Here's a more robust version that extracts all available video metadata:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

const scrapeYouTubeDetails = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 1024,
      deviceScaleFactor: 1,
    });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1000);

    // Handle cookie banner
    try {
      await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) {
          cookieButton.click();
        }
      });
      await page.waitForTimeout(1000);
    } catch (e) {
      console.log('No cookie banner found');
    }

    // Scroll down to load content below the fold
    try {
      await page.evaluate(() => window.scrollBy(0, 300));
      await page.waitForTimeout(600);
    } catch (e) {
      console.log('Could not scroll page');
    }

    // Extract comprehensive metadata
    const metadata = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        const match = cleanText.match(/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/);
        if (!match) return 0;
        
        const numStr = match[0];
        const suffix = numStr.slice(-1);
        
        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999) return 0;
          
          switch(suffix) {
            case 'K': return Math.floor(num * 1000);
            case 'M': return Math.floor(num * 1000000);
            case 'B': return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999999999) return 0;
          return Math.floor(num);
        }
        
        return 0;
      };

      const data = {
        title: '',
        channelName: '',
        channelLink: '',
        views: 0,
        likes: 0,
        comments: 0,
        publishDate: '',
        description: '',
        thumbnailUrl: ''
      };

      // Title
      const titleElement = document.querySelector('h1.ytd-watch-metadata yt-formatted-string');
      if (titleElement) {
        data.title = titleElement.textContent.trim();
      }

      // Channel Name and Link
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Views - try multiple selectors
      const viewsSelectors = [
        '#info span[class*="view"]',
        '#info .style-scope.yt-formatted-string',
        '#info .view-count',
      ];

      for (const selector of viewsSelectors) {
        const viewsElement = document.querySelector(selector);
        if (viewsElement && viewsElement.textContent.trim()) {
          const text = viewsElement.textContent.trim();
          if (text.includes('views') || text.includes('view') || /[\d,]+[KMB]?\s*(views?|watching)/i.test(text)) {
            data.views = extractNumber(text);
            break;
          }
        }
      }

      // Likes
      const likesElement = document.querySelector(
        '.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
      );
      if (likesElement) {
        data.likes = extractNumber(likesElement.textContent);
      }

      // Comments
      const commentsElement = document.querySelector('#title #count span');
      if (commentsElement) {
        data.comments = extractNumber(commentsElement.textContent);
      }

      // Publish date
      const publishElement = document.querySelector('ytd-watch-metadata #info-strings yt-formatted-string:nth-child(2)');
      if (publishElement) {
        data.publishDate = publishElement.textContent.trim();
      }

      // Description (truncated to 500 characters to keep the payload small)
      const descriptionElement = document.querySelector('ytd-watch-metadata #description-text');
      if (descriptionElement) {
        const fullDescription = descriptionElement.textContent.trim();
        data.description = fullDescription.length > 500
          ? fullDescription.substring(0, 500) + '...'
          : fullDescription;
      }

      // Thumbnail
      const thumbnailElement = document.querySelector('video');
      if (thumbnailElement) {
        data.thumbnailUrl = thumbnailElement.poster || '';
      }

      return data;
    });

    return {
      url,
      extractedAt: new Date().toISOString(),
      ...metadata
    };

  } catch (error) {
    console.error('Error scraping video details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
scrapeYouTubeDetails(videoUrl)
  .then(details => {
    console.log('Video Title:', details.title);
    console.log('Channel:', details.channelName);
    console.log('Views:', details.views.toLocaleString());
    console.log('Likes:', details.likes.toLocaleString());
    console.log('Published:', details.publishDate);
  })
  .catch(error => console.error('Failed to scrape details:', error));

Handling Dynamic Content and Common Issues

1. Waiting for Elements to Load

YouTube loads content dynamically, so proper timing is crucial:

// Wait for specific elements before extraction
await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
await page.waitForSelector('#info', { timeout: 5000 });
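
If a selector never appears at all (YouTube ships layout experiments regularly, and markup can differ by region or login state), it is usually safer to catch the timeout than to let the whole run fail. A minimal sketch:

try {
  await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
} catch (e) {
  // Element did not render in time; log it and let page.evaluate fall back to
  // whatever it can still find instead of aborting the whole extraction.
  console.warn('Title element did not appear within 10s, continuing anyway');
}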

2. Scrolling to Load Below-the-Fold Content

Some elements only load when they come into view:

// Scroll to load engagement metrics
await page.evaluate(() => {
  window.scrollBy(0, 300);
});
await page.waitForTimeout(600);
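
Counts that sit further down the page, such as the comment count extracted in the advanced example, may need a deeper scroll before they render. A hedged sketch; the #comments selector is an assumption and may change along with the rest of YouTube's markup:

// Scroll further so the comments section mounts, then wait briefly for it
await page.evaluate(() => window.scrollBy(0, 1000));
try {
  await page.waitForSelector('#comments', { timeout: 5000 }); // selector is an assumption
} catch (e) {
  console.warn('Comments section did not load; comment count may stay at 0');
}
await page.waitForTimeout(600);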

3. Number Extraction and Formatting

YouTube displays abbreviated counts (1.2M, 45K). The extractNumber helper used in both scrapers handles the following cases (quick examples follow this list):

  • Comma-separated numbers (1,234,567)
  • Abbreviated suffixes (K, M, B)
  • Decimal points (1.2M)
  • Validation and bounds checking
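
For reference, this is how the helper behaves on typical YouTube strings (a quick sanity check; to run it outside the scrapers you would lift extractNumber out of page.evaluate into its own module):

console.log(extractNumber('1,234,567 views')); // 1234567
console.log(extractNumber('1.2M views'));      // 1200000
console.log(extractNumber('45K'));             // 45000
console.log(extractNumber('3.4B views'));      // 3400000000
console.log(extractNumber('No views'));        // 0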

4. Multiple Selector Fallbacks

YouTube's interface changes frequently. Use multiple selectors:

const viewsSelectors = [
  '#info span[class*="view"]',
  '#info .style-scope.yt-formatted-string',
  '#info .view-count',
];

for (const selector of viewsSelectors) {
  const element = document.querySelector(selector);
  if (element && element.textContent.includes('views')) {
    // Extract views
    break;
  }
}

Best Practices

  1. Respect Rate Limits: Add delays between requests to avoid being blocked (see the rate-limiting and caching sketch after this list)
  2. Use Stealth Mode: the puppeteer-extra-plugin-stealth package makes automation harder to detect
  3. Handle Errors Gracefully: Always wrap extraction code in try-catch blocks
  4. Validate Data: Check extracted numbers for reasonable bounds
  5. Cache Results: Store extracted data to avoid repeated requests
  6. Monitor Selector Changes: YouTube updates its interface regularly
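
A minimal sketch of practices 1 and 5 combined: a fixed delay between requests plus an in-memory cache so the same URL is never scraped twice in a run (the 2-second delay is an arbitrary starting point, not a guaranteed-safe rate):

const cache = new Map();
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

const getVideoDetails = async (url) => {
  if (cache.has(url)) {
    return cache.get(url); // practice 5: reuse cached results
  }

  const details = await scrapeYouTubeDetails(url);
  cache.set(url, details);

  await delay(2000); // practice 1: pause before the next request
  return details;
};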

Alternative: Using SocialKit YouTube API

For production applications requiring reliable video details extraction, consider using SocialKit's YouTube Stats API:

curl "https://api.socialkit.dev/youtube/stats?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ"

Example Response

{
	"success": true,
	"data": {
		"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
		"title": "Rick Astley - Never Gonna Give You Up (Official Video)",
		"channelName": "Rick Astley",
		"channelLink": "https://youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
		"views": 1428567890,
		"likes": 16234567,
		"comments": 4567890
	}
}
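
The same request from Node.js looks roughly like this (a sketch; it relies on the global fetch available in Node 18+, and the SOCIALKIT_ACCESS_KEY environment variable name is just a placeholder for wherever you store your key):

const fetchVideoStats = async (videoUrl) => {
  const apiUrl = new URL('https://api.socialkit.dev/youtube/stats');
  apiUrl.searchParams.set('access_key', process.env.SOCIALKIT_ACCESS_KEY); // placeholder env var
  apiUrl.searchParams.set('url', videoUrl);

  const response = await fetch(apiUrl); // global fetch, Node 18+
  return response.json();
};

fetchVideoStats('https://youtube.com/watch?v=dQw4w9WgXcQ')
  .then(({ success, data }) => {
    if (success) {
      console.log(`${data.title}: ${data.views.toLocaleString()} views`);
    }
  })
  .catch(error => console.error('Request failed:', error));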

Benefits of using SocialKit:

  • Reliable extraction: Handles YouTube's interface changes automatically
  • No browser overhead: Faster and more resource-efficient
  • Consistent formatting: Standardized data structure across all videos
  • Built-in retry logic: Automatic error handling and retries
  • Scale-ready: Handle thousands of videos without infrastructure concerns
  • Always up-to-date: Adapts to YouTube changes without code updates

Free YouTube Tools

Need quick video insights without coding? Try our free tools:

YouTube Video Summarizer Tool

Get AI-powered summaries and insights with our free YouTube Video Summarizer tool:

  • Generate detailed summaries of any YouTube video or YouTube Shorts
  • Extract key insights including main topics, quotes, and takeaways
  • Analyze video content for tone, target audience, and themes
  • Get instant results without any setup or registration

Try the Free YouTube Video Summarizer

YouTube Transcript Extractor Tool

Extract accurate transcripts with our free YouTube Transcript Extractor tool:

  • Extract timestamped transcripts from any YouTube video
  • Copy segments individually or get the complete transcript
  • Perfect for content analysis and accessibility purposes
  • 100% free with no API keys required

Try the Free YouTube Transcript Extractor

Both tools complement video details extraction by providing deeper content insights for researchers, content creators, and developers.

Advanced Use Cases

Batch Processing Multiple Videos

const processVideoList = async (urls) => {
  const results = [];
  
  for (const url of urls) {
    try {
      const details = await scrapeYouTubeDetails(url);
      results.push(details);
      
      // Add delay between requests
      await new Promise(resolve => setTimeout(resolve, 2000));
    } catch (error) {
      console.error(`Failed to process ${url}:`, error.message);
      results.push({ url, error: error.message });
    }
  }
  
  return results;
};
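
To drive it, pass in a list of URLs and persist whatever comes back (the output filename is arbitrary):

const fs = require('fs/promises');

const urls = [
  'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  // ...add more video URLs here
];

processVideoList(urls)
  .then(results => fs.writeFile('video-details.json', JSON.stringify(results, null, 2)))
  .then(() => console.log('Saved video-details.json'))
  .catch(error => console.error('Batch failed:', error));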

Channel Analysis

const analyzeChannel = async (channelUrl, videoUrls) => {
  // videoUrls: video URLs gathered from the channel's Videos tab
  // (a collection sketch follows this example)
  const videoDetails = await processVideoList(videoUrls);

  // Ignore entries that failed to scrape so they don't skew the totals
  const scraped = videoDetails.filter(video => !video.error);
  const totalViews = scraped.reduce((sum, video) => sum + video.views, 0);

  return {
    channelUrl,
    totalVideos: scraped.length,
    totalViews,
    averageViews: scraped.length ? Math.round(totalViews / scraped.length) : 0,
    videos: videoDetails
  };
};
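
The snippet above assumes you already have the channel's video URLs. One hedged way to collect them from the channel's Videos tab is sketched below; the a#video-title-link selector is an assumption and, like every selector in this post, should be verified against the live DOM:

const collectChannelVideoUrls = async (channelUrl, limit = 20) => {
  const browser = await puppeteer.launch({ headless: "new" });
  try {
    const page = await browser.newPage();
    await page.goto(`${channelUrl}/videos`, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1500);

    // Grab the first `limit` video links from the grid (selector is an assumption)
    return await page.evaluate((max) => {
      return Array.from(document.querySelectorAll('a#video-title-link'))
        .slice(0, max)
        .map(link => link.href);
    }, limit);
  } finally {
    await browser.close();
  }
};

// Usage (the channel handle is a placeholder):
// const videoUrls = await collectChannelVideoUrls('https://www.youtube.com/@channelhandle');
// const report = await analyzeChannel('https://www.youtube.com/@channelhandle', videoUrls);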

Conclusion

Extracting YouTube video details with Puppeteer provides complete control over the data extraction process. While the implementation requires careful handling of dynamic content loading, element timing, and YouTube's evolving interface, the results enable powerful video analytics and content research capabilities.

For production applications or when processing large volumes of videos, consider using SocialKit's YouTube API for reliability and scale. For quick insights and analysis, our free YouTube tools provide immediate value without any setup.

Remember to respect YouTube's terms of service, implement appropriate rate limiting, and handle errors gracefully to build robust video data extraction systems.

Whether you're building content analytics dashboards, research tools, or social media management platforms, automated YouTube video details extraction opens up endless possibilities for data-driven insights.