
How to Extract YouTube Video Details Using Puppeteer

Jonathan Geiger
web-scraping, puppeteer, youtube, youtube shorts, tutorial

YouTube video details contain valuable metadata that developers, content creators, and researchers often need to extract programmatically. While YouTube provides an API, it requires authentication and has rate limits. For simpler use cases or when you need complete control over the extraction process, web scraping with Puppeteer offers a powerful alternative.

In this comprehensive guide, we'll explore how to extract YouTube video details using Puppeteer, including titles, view counts, likes, comments, channel information, and more. We'll cover everything from basic setup to advanced error handling techniques.

Prerequisites

Before diving into the implementation, ensure you have:

  • Node.js installed (version 14 or higher)
  • Basic knowledge of JavaScript and async/await
  • Understanding of DOM manipulation and CSS selectors
  • Familiarity with browser automation concepts
  • Experience with handling dynamic content loading

Setting Up the Project

Create a new project and install dependencies:

mkdir youtube-details-scraper
cd youtube-details-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin helps avoid detection by making our automated browser behavior appear more natural.

Basic Implementation

Let's start with a basic implementation that extracts core video details:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Use stealth plugin to avoid detection
puppeteer.use(StealthPlugin());

const extractYouTubeDetails = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1000);

    // Handle cookie banner (EU compliance)
    try {
      await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) {
          cookieButton.click();
          console.log('Closed cookie banner');
        }
      });
      await page.waitForTimeout(1000);
    } catch (e) {
      console.log('No cookie banner found');
    }

    // Scroll down to load below-the-fold content
    await page.evaluate(() => window.scrollBy(0, 300));
    await page.waitForTimeout(600);

    // Extract video details
    const videoDetails = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        const match = cleanText.match(/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/);
        if (!match) return 0;
        
        const numStr = match[0];
        const suffix = numStr.slice(-1);
        
        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num)) return 0;
          
          switch(suffix) {
            case 'K': return Math.floor(num * 1000);
            case 'M': return Math.floor(num * 1000000);
            case 'B': return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          return isNaN(num) ? 0 : Math.floor(num);
        }
        
        return 0;
      };

      const data = {};

      // Extract title
      const titleElement = document.querySelector('h1.ytd-watch-metadata yt-formatted-string');
      data.title = titleElement ? titleElement.textContent.trim() : '';

      // Extract channel information
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Extract views
      const viewsElement = document.querySelector('#info span[class*="view"]');
      data.views = viewsElement ? extractNumber(viewsElement.textContent) : 0;

      // Extract likes
      const likesElement = document.querySelector('.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content');
      data.likes = likesElement ? extractNumber(likesElement.textContent) : 0;

      return data;
    });

    return videoDetails;

  } catch (error) {
    console.error('Error extracting video details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
extractYouTubeDetails(videoUrl)
  .then(details => console.log('Video Details:', details))
  .catch(error => console.error('Failed to extract details:', error));

Advanced Implementation with Comprehensive Data Extraction

Here's a more robust version that extracts all available video metadata:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

const scrapeYouTubeDetails = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 1024,
      deviceScaleFactor: 1,
    });

    // Navigate to YouTube video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1000);

    // Handle cookie banner
    try {
      await page.evaluate(() => {
        const cookieButton = document.querySelector('button[aria-label*="cookies"]');
        if (cookieButton) {
          cookieButton.click();
        }
      });
      await page.waitForTimeout(1000);
    } catch (e) {
      console.log('No cookie banner found');
    }

    // Scroll down to load content below the fold
    try {
      await page.evaluate(() => window.scrollBy(0, 300));
      await page.waitForTimeout(600);
    } catch (e) {
      console.log('Could not scroll page');
    }

    // Extract comprehensive metadata
    const metadata = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        const cleanText = text.replace(/[^\d.,KMB\s]/g, '').trim();
        const match = cleanText.match(/(\d{1,3}(?:,\d{3})*(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?[KMB]?)/);
        if (!match) return 0;
        
        const numStr = match[0];
        const suffix = numStr.slice(-1);
        
        if (['K', 'M', 'B'].includes(suffix)) {
          const num = parseFloat(numStr.slice(0, -1).replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999) return 0;
          
          switch(suffix) {
            case 'K': return Math.floor(num * 1000);
            case 'M': return Math.floor(num * 1000000);
            case 'B': return Math.floor(num * 1000000000);
          }
        } else {
          const num = parseFloat(numStr.replace(/,/g, ''));
          if (isNaN(num) || num < 0 || num > 999999999999) return 0;
          return Math.floor(num);
        }
        
        return 0;
      };

      const data = {
        title: '',
        channelName: '',
        channelLink: '',
        views: 0,
        likes: 0,
        comments: 0,
        publishDate: '',
        description: '',
        thumbnailUrl: ''
      };

      // Title
      const titleElement = document.querySelector('h1.ytd-watch-metadata yt-formatted-string');
      if (titleElement) {
        data.title = titleElement.textContent.trim();
      }

      // Channel Name and Link
      const channelElement = document.querySelector('ytd-channel-name a');
      if (channelElement) {
        data.channelName = channelElement.textContent.trim();
        data.channelLink = channelElement.href || '';
      }

      // Views - try multiple selectors
      const viewsSelectors = [
        '#info span[class*="view"]',
        '#info .style-scope.yt-formatted-string',
        '#info .view-count',
      ];

      for (const selector of viewsSelectors) {
        const viewsElement = document.querySelector(selector);
        if (viewsElement && viewsElement.textContent.trim()) {
          const text = viewsElement.textContent.trim();
          if (text.includes('views') || text.includes('view') || /[\d,]+[KMB]?\s*(views?|watching)/i.test(text)) {
            data.views = extractNumber(text);
            break;
          }
        }
      }

      // Likes
      const likesElement = document.querySelector(
        '.ytd-watch-metadata .yt-spec-button-view-model .yt-spec-button-shape-next__button-text-content'
      );
      if (likesElement) {
        data.likes = extractNumber(likesElement.textContent);
      }

      // Comments
      const commentsElement = document.querySelector('#title #count span');
      if (commentsElement) {
        data.comments = extractNumber(commentsElement.textContent);
      }

      // Publish date
      const publishElement = document.querySelector('ytd-watch-metadata #info-strings yt-formatted-string:nth-child(2)');
      if (publishElement) {
        data.publishDate = publishElement.textContent.trim();
      }

      // Description (truncated to 500 characters to keep the payload small)
      const descriptionElement = document.querySelector('ytd-watch-metadata #description-text');
      if (descriptionElement) {
        const fullDescription = descriptionElement.textContent.trim();
        data.description = fullDescription.length > 500
          ? fullDescription.substring(0, 500) + '...'
          : fullDescription;
      }

      // Thumbnail
      const thumbnailElement = document.querySelector('video');
      if (thumbnailElement) {
        data.thumbnailUrl = thumbnailElement.poster || '';
      }

      return data;
    });

    return {
      url,
      extractedAt: new Date().toISOString(),
      ...metadata
    };

  } catch (error) {
    console.error('Error scraping video details:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const videoUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ';
scrapeYouTubeDetails(videoUrl)
  .then(details => {
    console.log('Video Title:', details.title);
    console.log('Channel:', details.channelName);
    console.log('Views:', details.views.toLocaleString());
    console.log('Likes:', details.likes.toLocaleString());
    console.log('Published:', details.publishDate);
  })
  .catch(error => console.error('Failed to scrape details:', error));

Handling Dynamic Content and Common Issues

1. Waiting for Elements to Load

YouTube loads content dynamically, so proper timing is crucial:

// Wait for specific elements before extraction
await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
await page.waitForSelector('#info', { timeout: 5000 });
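
If a selector never appears at all (YouTube ships layout experiments regularly, and markup can differ by region or login state), it is usually safer to catch the timeout than to let the whole run fail. A minimal sketch:

try {
  await page.waitForSelector('h1.ytd-watch-metadata', { timeout: 10000 });
} catch (e) {
  // Element did not render in time; log it and let page.evaluate fall back to
  // whatever it can still find instead of aborting the whole extraction.
  console.warn('Title element did not appear within 10s, continuing anyway');
}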

2. Scrolling to Load Below-the-Fold Content

Some elements only load when they come into view:

// Scroll to load engagement metrics
await page.evaluate(() => {
  window.scrollBy(0, 300);
});
await page.waitForTimeout(600);
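
Counts that sit further down the page, such as the comment count extracted in the advanced example, may need a deeper scroll before they render. A hedged sketch; the #comments selector is an assumption and may change along with the rest of YouTube's markup:

// Scroll further so the comments section mounts, then wait briefly for it
await page.evaluate(() => window.scrollBy(0, 1000));
try {
  await page.waitForSelector('#comments', { timeout: 5000 }); // selector is an assumption
} catch (e) {
  console.warn('Comments section did not load; comment count may stay at 0');
}
await page.waitForTimeout(600);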

3. Number Extraction and Formatting

YouTube displays abbreviated counts (1.2M, 45K). The extractNumber helper used in both scrapers handles the following cases (quick examples follow this list):

  • Comma-separated numbers (1,234,567)
  • Abbreviated suffixes (K, M, B)
  • Decimal points (1.2M)
  • Validation and bounds checking
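
For reference, this is how the helper behaves on typical YouTube strings (a quick sanity check; to run it outside the scrapers you would lift extractNumber out of page.evaluate into its own module):

console.log(extractNumber('1,234,567 views')); // 1234567
console.log(extractNumber('1.2M views'));      // 1200000
console.log(extractNumber('45K'));             // 45000
console.log(extractNumber('3.4B views'));      // 3400000000
console.log(extractNumber('No views'));        // 0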

4. Multiple Selector Fallbacks

YouTube's interface changes frequently. Use multiple selectors:

const viewsSelectors = [
  '#info span[class*="view"]',
  '#info .style-scope.yt-formatted-string',
  '#info .view-count',
];

for (const selector of viewsSelectors) {
  const element = document.querySelector(selector);
  if (element && element.textContent.includes('views')) {
    // Extract views
    break;
  }
}

Best Practices

  1. Respect Rate Limits: Add delays between requests to avoid being blocked (see the rate-limiting and caching sketch after this list)
  2. Use Stealth Mode: the puppeteer-extra-plugin-stealth package makes automation harder to detect
  3. Handle Errors Gracefully: Always wrap extraction code in try-catch blocks
  4. Validate Data: Check extracted numbers for reasonable bounds
  5. Cache Results: Store extracted data to avoid repeated requests
  6. Monitor Selector Changes: YouTube updates its interface regularly
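
A minimal sketch of practices 1 and 5 combined: a fixed delay between requests plus an in-memory cache so the same URL is never scraped twice in a run (the 2-second delay is an arbitrary starting point, not a guaranteed-safe rate):

const cache = new Map();
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

const getVideoDetails = async (url) => {
  if (cache.has(url)) {
    return cache.get(url); // practice 5: reuse cached results
  }

  const details = await scrapeYouTubeDetails(url);
  cache.set(url, details);

  await delay(2000); // practice 1: pause before the next request
  return details;
};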

Alternative: Using SocialKit YouTube API

For production applications requiring reliable video details extraction, consider using SocialKit's YouTube Stats API:

curl "https://api.socialkit.dev/youtube/stats?access_key=<your-access-key>&url=https://youtube.com/watch?v=dQw4w9WgXcQ"

Example Response

{
	"success": true,
	"data": {
		"url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
		"title": "Rick Astley - Never Gonna Give You Up (Official Video)",
		"channelName": "Rick Astley",
		"channelLink": "https://youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
		"views": 1428567890,
		"likes": 16234567,
		"comments": 4567890
	}
}
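
The same request from Node.js looks roughly like this (a sketch; it relies on the global fetch available in Node 18+, and the SOCIALKIT_ACCESS_KEY environment variable name is just a placeholder for wherever you store your key):

const fetchVideoStats = async (videoUrl) => {
  const apiUrl = new URL('https://api.socialkit.dev/youtube/stats');
  apiUrl.searchParams.set('access_key', process.env.SOCIALKIT_ACCESS_KEY); // placeholder env var
  apiUrl.searchParams.set('url', videoUrl);

  const response = await fetch(apiUrl); // global fetch, Node 18+
  return response.json();
};

fetchVideoStats('https://youtube.com/watch?v=dQw4w9WgXcQ')
  .then(({ success, data }) => {
    if (success) {
      console.log(`${data.title}: ${data.views.toLocaleString()} views`);
    }
  })
  .catch(error => console.error('Request failed:', error));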

Benefits of using SocialKit:

  • Reliable extraction: Handles YouTube's interface changes automatically
  • No browser overhead: Faster and more resource-efficient
  • Consistent formatting: Standardized data structure across all videos
  • Built-in retry logic: Automatic error handling and retries
  • Scale-ready: Handle thousands of videos without infrastructure concerns
  • Always up-to-date: Adapts to YouTube changes without code updates

Free YouTube Tools

Need quick video insights without coding? Try our free tools:

YouTube Video Summarizer Tool

Get AI-powered summaries and insights with our free YouTube Video Summarizer tool:

  • Generate detailed summaries of any YouTube video or YouTube Shorts
  • Extract key insights including main topics, quotes, and takeaways
  • Analyze video content for tone, target audience, and themes
  • Get instant results without any setup or registration

Try the Free YouTube Video Summarizer

YouTube Transcript Extractor Tool

Extract accurate transcripts with our free YouTube Transcript Extractor tool:

  • Extract timestamped transcripts from any YouTube video
  • Copy segments individually or get the complete transcript
  • Perfect for content analysis and accessibility purposes
  • 100% free with no API keys required

Try the Free YouTube Transcript Extractor

Both tools complement video details extraction by providing deeper content insights for researchers, content creators, and developers.

Advanced Use Cases

Batch Processing Multiple Videos

const processVideoList = async (urls) => {
  const results = [];
  
  for (const url of urls) {
    try {
      const details = await scrapeYouTubeDetails(url);
      results.push(details);
      
      // Add delay between requests
      await new Promise(resolve => setTimeout(resolve, 2000));
    } catch (error) {
      console.error(`Failed to process ${url}:`, error.message);
      results.push({ url, error: error.message });
    }
  }
  
  return results;
};
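
To drive it, pass in a list of URLs and persist whatever comes back (the output filename is arbitrary):

const fs = require('fs/promises');

const urls = [
  'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  // ...add more video URLs here
];

processVideoList(urls)
  .then(results => fs.writeFile('video-details.json', JSON.stringify(results, null, 2)))
  .then(() => console.log('Saved video-details.json'))
  .catch(error => console.error('Batch failed:', error));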

Channel Analysis

const analyzeChannel = async (channelUrl, videoUrls) => {
  // videoUrls: video URLs gathered from the channel's Videos tab
  // (a collection sketch follows this example)
  const videoDetails = await processVideoList(videoUrls);

  // Ignore entries that failed to scrape so they don't skew the totals
  const scraped = videoDetails.filter(video => !video.error);
  const totalViews = scraped.reduce((sum, video) => sum + video.views, 0);

  return {
    channelUrl,
    totalVideos: scraped.length,
    totalViews,
    averageViews: scraped.length ? Math.round(totalViews / scraped.length) : 0,
    videos: videoDetails
  };
};
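
The snippet above assumes you already have the channel's video URLs. One hedged way to collect them from the channel's Videos tab is sketched below; the a#video-title-link selector is an assumption and, like every selector in this post, should be verified against the live DOM:

const collectChannelVideoUrls = async (channelUrl, limit = 20) => {
  const browser = await puppeteer.launch({ headless: "new" });
  try {
    const page = await browser.newPage();
    await page.goto(`${channelUrl}/videos`, { waitUntil: 'domcontentloaded', timeout: 30000 });
    await page.waitForTimeout(1500);

    // Grab the first `limit` video links from the grid (selector is an assumption)
    return await page.evaluate((max) => {
      return Array.from(document.querySelectorAll('a#video-title-link'))
        .slice(0, max)
        .map(link => link.href);
    }, limit);
  } finally {
    await browser.close();
  }
};

// Usage (the channel handle is a placeholder):
// const videoUrls = await collectChannelVideoUrls('https://www.youtube.com/@channelhandle');
// const report = await analyzeChannel('https://www.youtube.com/@channelhandle', videoUrls);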

Conclusion

Extracting YouTube video details with Puppeteer provides complete control over the data extraction process. While the implementation requires careful handling of dynamic content loading, element timing, and YouTube's evolving interface, the results enable powerful video analytics and content research capabilities.

For production applications or when processing large volumes of videos, consider using SocialKit's YouTube API for reliability and scale. For quick insights and analysis, our free YouTube tools provide immediate value without any setup.

Remember to respect YouTube's terms of service, implement appropriate rate limiting, and handle errors gracefully to build robust video data extraction systems.

Whether you're building content analytics dashboards, research tools, or social media management platforms, automated YouTube video details extraction opens up endless possibilities for data-driven insights.