
How to Scrape TikTok Video Stats with Puppeteer

Jonathan Geiger
Tags: web-scraping, puppeteer, tiktok, tutorial, stats

TikTok has become one of the most influential social media platforms, with billions of videos generating massive engagement daily. For developers, marketers, and researchers, accessing TikTok video statistics programmatically is crucial for content analysis, trend monitoring, and competitive research. While TikTok doesn't provide a public API for video stats, web scraping with Puppeteer offers a powerful solution.

In this comprehensive guide, we'll explore how to extract TikTok video statistics using Puppeteer, including views, likes, comments, shares, creator information, and more. We'll cover everything from basic setup to advanced error handling techniques.

Prerequisites

Before diving into the implementation, ensure you have:

  • Node.js installed (version 18 or higher for current Puppeteer releases)
  • Basic knowledge of JavaScript and async/await
  • Understanding of DOM manipulation and CSS selectors
  • Familiarity with browser automation concepts
  • Experience with handling dynamic content loading
  • Important: Knowledge of anti-bot detection systems

Setting Up the Project

Create a new project and install dependencies:

mkdir tiktok-stats-scraper
cd tiktok-stats-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

The stealth plugin matters for TikTok scraping: it patches common headless-browser giveaways (such as the navigator.webdriver flag) so the automated browser looks more like a regular one.

Important Disclaimer: CAPTCHA and Anti-Bot Measures

⚠️ Important Notice: TikTok implements sophisticated anti-bot detection systems and may present CAPTCHAs or other verification challenges during automated scraping. If you encounter these issues, you may need to use specialized services like:

  • Bright Data - Professional proxy and CAPTCHA solving services

For production use cases, consider using SocialKit's TikTok Stats API, which handles these challenges automatically.

Basic Implementation

Let's start with a basic implementation that extracts core TikTok video statistics:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Use stealth plugin to avoid detection
puppeteer.use(StealthPlugin());

const extractTikTokStats = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800 });

    // Navigate to TikTok video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
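    // Give client-side rendering time to finish.
    // (page.waitForTimeout was removed in Puppeteer v22, so use a plain timer.)
    await new Promise((resolve) => setTimeout(resolve, 5000));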

    // Extract video statistics
    const videoStats = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        // Remove all non-numeric characters except dots, commas, and K/M/B suffixes
        const cleanText = text.replace(/[^\d.,KMBkmb]/g, '').trim();
        
        if (!cleanText) return 0;
        
        // Check if it ends with K, M, or B (case insensitive)
        const lastChar = cleanText.slice(-1).toUpperCase();
        
        if (['K', 'M', 'B'].includes(lastChar)) {
          // Has suffix - extract the number part
          const numberPart = cleanText.slice(0, -1).replace(/,/g, '');
          const baseNum = parseFloat(numberPart);
          
          if (isNaN(baseNum)) return 0;
          
          let multiplier;
          switch(lastChar) {
            case 'K': multiplier = 1000; break;
            case 'M': multiplier = 1000000; break;
            case 'B': multiplier = 1000000000; break;
          }
          
          return Math.floor(baseNum * multiplier);
        } else {
          // No suffix - just parse the number
          const num = parseFloat(cleanText.replace(/,/g, ''));
          if (isNaN(num)) return 0;
          return Math.floor(num);
        }
      };

      const data = {
        title: '',
        channelName: '',
        channelLink: '',
        views: 0, // not populated by this DOM-only version
        likes: 0,
        comments: 0,
        shares: 0,
        description: '',
        musicTitle: ''
      };

      // Extract title/description
      const titleElement = document.querySelector('[data-e2e="browse-video-desc"]');
      if (titleElement) {
        data.title = titleElement.textContent.trim();
        data.description = data.title;
      }

      // Extract channel name
      const channelNameElement = document.querySelector('[data-e2e="browse-username"]');
      if (channelNameElement) {
        data.channelName = channelNameElement.textContent.trim();
      }

      // Extract channel link
      const channelLinkElement = document.querySelector('[data-e2e="browse-user-avatar"]');
      if (channelLinkElement && channelLinkElement.href) {
        data.channelLink = channelLinkElement.href;
      }

      // Extract engagement metrics
      const likesElement = document.querySelector('[data-e2e="like-count"]');
      if (likesElement) {
        data.likes = extractNumber(likesElement.textContent);
      }

      const commentsElement = document.querySelector('[data-e2e="comment-count"]');
      if (commentsElement) {
        data.comments = extractNumber(commentsElement.textContent);
      }

      const sharesElement = document.querySelector('[data-e2e="share-count"]');
      if (sharesElement) {
        data.shares = extractNumber(sharesElement.textContent);
      }

      // Extract music information
      const musicElement = document.querySelector('[data-e2e="browse-music"]');
      if (musicElement) {
        data.musicTitle = musicElement.textContent.trim();
      }

      return data;
    });

    return videoStats;

  } catch (error) {
    console.error('Error extracting TikTok stats:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const tiktokUrl = 'https://www.tiktok.com/@thepeteffect/video/7522711492140059912';
extractTikTokStats(tiktokUrl)
  .then(stats => console.log('TikTok Stats:', stats))
  .catch(error => console.error('Failed to extract stats:', error));
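
The fixed five-second pause in the basic implementation is simple, but it can be either too long or too short depending on the connection. One possible alternative, sketched here, is to wait for a stat element to appear. The helper name waitForStats and the 15-second default are assumptions for illustration; page.waitForSelector is Puppeteer's own method.

```javascript
// Wait until a stat element appears; return false on failure instead of throwing.
// `page` is a Puppeteer Page. The default selector is the like counter used above.
const waitForStats = async (
  page,
  selector = '[data-e2e="like-count"]',
  timeoutMs = 15000
) => {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return true;
  } catch (error) {
    // The element never showed up: timeout, a CAPTCHA page, or changed markup.
    return false;
  }
};
```

Inside extractTikTokStats this could replace the fixed delay, for example `const ready = await waitForStats(page);`, with the caller deciding what to do when it returns false.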

Advanced Implementation with JSON Data Extraction

TikTok stores detailed video statistics in JSON format within the page. Here's a more robust version that extracts comprehensive data:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

const scrapeTikTokStats = async (url) => {
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"],
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-accelerated-2d-canvas',
      '--no-first-run',
      '--no-zygote',
      '--single-process',
      '--disable-gpu'
    ]
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 1024,
      deviceScaleFactor: 1,
    });

    // Set user agent to appear more natural
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');

    // Navigate to TikTok video
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
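    // Give client-side rendering time to finish.
    // (page.waitForTimeout was removed in Puppeteer v22, so use a plain timer.)
    await new Promise((resolve) => setTimeout(resolve, 5000));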

    // Extract comprehensive metadata
    const metadata = await page.evaluate(() => {
      const extractNumber = (text) => {
        if (!text) return 0;
        
        const cleanText = text.replace(/[^\d.,KMBkmb]/g, '').trim();
        if (!cleanText) return 0;
        
        const lastChar = cleanText.slice(-1).toUpperCase();
        
        if (['K', 'M', 'B'].includes(lastChar)) {
          const numberPart = cleanText.slice(0, -1).replace(/,/g, '');
          const baseNum = parseFloat(numberPart);
          
          if (isNaN(baseNum)) return 0;
          
          let multiplier;
          switch(lastChar) {
            case 'K': multiplier = 1000; break;
            case 'M': multiplier = 1000000; break;
            case 'B': multiplier = 1000000000; break;
          }
          
          return Math.floor(baseNum * multiplier);
        } else {
          const num = parseFloat(cleanText.replace(/,/g, ''));
          if (isNaN(num)) return 0;
          return Math.floor(num);
        }
      };

      const data = {
        title: '',
        channelName: '',
        channelLink: '',
        views: 0,
        likes: 0,
        comments: 0,
        shares: 0,
        description: '',
        duration: '',
        thumbnailUrl: '',
        musicTitle: '',
      };

      // First, try to get stats from the JSON data in script element
      try {
        const scriptElement = document.querySelector('#__UNIVERSAL_DATA_FOR_REHYDRATION__');
        if (scriptElement && scriptElement.textContent) {
          console.log('TikTok: Found script element with JSON data');
          const jsonData = JSON.parse(scriptElement.textContent);
          
          // Navigate to statsV2
          const statsV2 = jsonData["__DEFAULT_SCOPE__"]?.["webapp.video-detail"]?.["itemInfo"]?.["itemStruct"]?.["statsV2"];
          
          if (statsV2) {
            console.log('TikTok: Found statsV2:', statsV2);
            
            // Extract stats with exact numbers
            data.views = parseInt(statsV2.playCount || '0', 10);
            data.likes = parseInt(statsV2.diggCount || '0', 10);
            data.comments = parseInt(statsV2.commentCount || '0', 10);
            data.shares = parseInt(statsV2.shareCount || '0', 10);
            
            console.log('TikTok: Extracted stats from JSON:', {
              views: data.views,
              likes: data.likes,
              comments: data.comments,
              shares: data.shares
            });
          } else {
            console.log('TikTok: Could not find statsV2 in JSON data');
          }
        }
      } catch (e) {
        console.log('TikTok: Error parsing JSON data:', e.message);
      }

      // Extract title/description
      const titleElement = document.querySelector('[data-e2e="browse-video-desc"]');
      if (titleElement) {
        data.title = titleElement.textContent.trim();
        data.description = data.title;
      }

      // Extract channel name
      const channelNameElement = document.querySelector('[data-e2e="browse-username"]');
      if (channelNameElement) {
        data.channelName = channelNameElement.textContent.trim();
      }

      // Extract channel link
      const channelLinkElement = document.querySelector('[data-e2e="browse-user-avatar"]');
      if (channelLinkElement && channelLinkElement.href) {
        data.channelLink = channelLinkElement.href;
      }

      // If JSON parsing didn't work, fallback to DOM scraping for stats
      if (data.views === 0 && data.likes === 0 && data.comments === 0 && data.shares === 0) {
        console.log('TikTok: Using DOM fallback for stats extraction');
        
        // Extract likes
        const likesElement = document.querySelector('[data-e2e="like-count"]');
        if (likesElement) {
          data.likes = extractNumber(likesElement.textContent);
        }

        // Extract comments
        const commentsElement = document.querySelector('[data-e2e="comment-count"]');
        if (commentsElement) {
          data.comments = extractNumber(commentsElement.textContent);
        }

        // Extract shares
        const sharesElement = document.querySelector('[data-e2e="share-count"]');
        if (sharesElement) {
          data.shares = extractNumber(sharesElement.textContent);
        }
      }

      // Extract music info
      const musicElement = document.querySelector('[data-e2e="browse-music"]');
      if (musicElement) {
        data.musicTitle = musicElement.textContent.trim();
      }

      // Extract thumbnail URL (note: this selector matches the creator's avatar image, not a video cover)
      const thumbnailElement = document.querySelector('[data-e2e="browse-user-avatar"] img');
      if (thumbnailElement && thumbnailElement.src) {
        data.thumbnailUrl = thumbnailElement.src;
      }

      // Try to get duration from video element
      const videoElement = document.querySelector('video');
      if (videoElement && videoElement.duration) {
        const duration = Math.floor(videoElement.duration);
        const minutes = Math.floor(duration / 60);
        const seconds = duration % 60;
        data.duration = `${minutes}:${seconds.toString().padStart(2, '0')}`;
      }

      return data;
    });

    return {
      url,
      extractedAt: new Date().toISOString(),
      platform: 'tiktok',
      ...metadata
    };

  } catch (error) {
    console.error('Error scraping TikTok stats:', error);
    throw error;
  } finally {
    await browser.close();
  }
};

// Usage
const tiktokUrl = 'https://www.tiktok.com/@thepeteffect/video/7522711492140059912';
scrapeTikTokStats(tiktokUrl)
  .then(stats => {
    console.log('Video Title:', stats.title);
    console.log('Creator:', stats.channelName);
    console.log('Views:', stats.views?.toLocaleString() || 'N/A');
    console.log('Likes:', stats.likes?.toLocaleString() || 'N/A');
    console.log('Comments:', stats.comments?.toLocaleString() || 'N/A');
    console.log('Shares:', stats.shares?.toLocaleString() || 'N/A');
    console.log('Music:', stats.musicTitle);
  })
  .catch(error => console.error('Failed to scrape stats:', error));
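
One thing to keep in mind with the code above: the console.log calls inside page.evaluate run in the browser context, so their output does not show up in the Node terminal by default. A small sketch of how those messages could be forwarded follows. The helper name forwardBrowserLogs and the "[browser]" prefix are illustrative choices, not part of Puppeteer.

```javascript
// Forward browser-side console output to the Node process.
// `page` is a Puppeteer Page; `log` defaults to console.log but can be swapped out.
const forwardBrowserLogs = (page, log = console.log) => {
  // Puppeteer emits a 'console' event with a ConsoleMessage for each browser log call.
  page.on('console', (msg) => log(`[browser] ${msg.text()}`));
};
```

Call it right after browser.newPage() and before page.goto() so that messages logged during page load are captured as well.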

Handling Dynamic Content and Common Issues

1. TikTok's JSON Data Structure

TikTok stores comprehensive video data in a script tag with ID __UNIVERSAL_DATA_FOR_REHYDRATION__. Reading it gives exact counts rather than the abbreviated values (1.2M, 45K) shown in the interface:

// Extract exact view counts from JSON data
const scriptElement = document.querySelector('#__UNIVERSAL_DATA_FOR_REHYDRATION__');
if (scriptElement && scriptElement.textContent) {
  const jsonData = JSON.parse(scriptElement.textContent);
  const statsV2 = jsonData["__DEFAULT_SCOPE__"]?.["webapp.video-detail"]?.["itemInfo"]?.["itemStruct"]?.["statsV2"];
  
  if (statsV2) {
    data.views = parseInt(statsV2.playCount || '0', 10);
    data.likes = parseInt(statsV2.diggCount || '0', 10);
    data.comments = parseInt(statsV2.commentCount || '0', 10);
    data.shares = parseInt(statsV2.shareCount || '0', 10);
  }
}
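
To make that lookup easier to test outside the browser, the same logic can be pulled into a pure function that takes the already-parsed JSON. This is a sketch under the assumption that the key path used in this article still holds; the name readStatsV2 and the sample payload (with made-up numbers) are hypothetical.

```javascript
// Pull the four counters out of the parsed rehydration JSON.
// Returns null when the statsV2 object is missing, so callers can tell
// "not found" apart from "found, and the counts are zero".
const readStatsV2 = (jsonData) => {
  const statsV2 = jsonData?.['__DEFAULT_SCOPE__']?.['webapp.video-detail']
    ?.itemInfo?.itemStruct?.statsV2;
  if (!statsV2) return null;

  // Counts arrive as strings; fall back to 0 for missing or malformed values.
  const toInt = (value) => {
    const n = parseInt(value ?? '0', 10);
    return Number.isNaN(n) ? 0 : n;
  };

  return {
    views: toInt(statsV2.playCount),
    likes: toInt(statsV2.diggCount),
    comments: toInt(statsV2.commentCount),
    shares: toInt(statsV2.shareCount),
  };
};

// Illustrative payload shaped like the path above (numbers are invented).
const samplePayload = {
  __DEFAULT_SCOPE__: {
    'webapp.video-detail': {
      itemInfo: {
        itemStruct: {
          statsV2: { playCount: '2459876', diggCount: '123456', commentCount: '5678', shareCount: '987' },
        },
      },
    },
  },
};

console.log(readStatsV2(samplePayload)); // { views: 2459876, likes: 123456, comments: 5678, shares: 987 }
console.log(readStatsV2({}));            // null
```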

2. Fallback DOM Extraction

When JSON parsing fails, fall back to DOM element extraction:

// Fallback to DOM scraping if JSON extraction fails
if (data.views === 0 && data.likes === 0) {
  const likesElement = document.querySelector('[data-e2e="like-count"]');
  if (likesElement) {
    data.likes = extractNumber(likesElement.textContent);
  }
}
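
The all-zero check above works, but it cannot distinguish "JSON was missing" from "the video really has zero engagement". One way to make the decision explicit is sketched below, assuming a hypothetical convention where the JSON step yields null when nothing was found. The function name chooseStats and the source field are illustrative additions.

```javascript
// Prefer JSON-derived stats; otherwise use whatever the DOM scrape produced.
// jsonStats: object with views/likes/comments/shares, or null if the JSON path was missing.
// domStats: partial object from DOM scraping (views is typically absent there).
const chooseStats = (jsonStats, domStats = {}) => {
  if (jsonStats) {
    return { ...jsonStats, source: 'json' };
  }
  // Fill any counters the DOM scrape did not provide with 0.
  return { views: 0, likes: 0, comments: 0, shares: 0, ...domStats, source: 'dom' };
};

console.log(chooseStats(null, { likes: 45000, comments: 120 }));
// { views: 0, likes: 45000, comments: 120, shares: 0, source: 'dom' }
```

Recording the source alongside the numbers makes it easier to spot later which results came from abbreviated DOM values.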

3. Number Extraction and Formatting

TikTok uses abbreviated numbers (1.2M, 45K). Our extractNumber function handles:

  • Comma-separated numbers (1,234,567)
  • Abbreviated suffixes (K, M, B) - case insensitive
  • Decimal points (1.2M)
  • Empty or unparseable input (returns 0)
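
To see these cases in action, here is the same extractNumber logic lifted out of page.evaluate so it can be run directly in Node. The sample inputs are illustrative strings, not values captured from TikTok.

```javascript
// Standalone copy of the article's extractNumber helper, runnable in plain Node.
const extractNumber = (text) => {
  if (!text) return 0;

  // Keep only digits, separators, and the K/M/B suffix letters.
  const cleanText = text.replace(/[^\d.,KMBkmb]/g, '').trim();
  if (!cleanText) return 0;

  const multipliers = { K: 1000, M: 1000000, B: 1000000000 };
  const lastChar = cleanText.slice(-1).toUpperCase();
  const multiplier = multipliers[lastChar];

  if (multiplier !== undefined) {
    // Suffix present: parse the numeric part and scale it.
    const baseNum = parseFloat(cleanText.slice(0, -1).replace(/,/g, ''));
    return isNaN(baseNum) ? 0 : Math.floor(baseNum * multiplier);
  }

  // No suffix: strip thousands separators and parse directly.
  const num = parseFloat(cleanText.replace(/,/g, ''));
  return isNaN(num) ? 0 : Math.floor(num);
};

console.log(extractNumber('45K'));       // 45000
console.log(extractNumber('1.2M'));      // 1200000
console.log(extractNumber('1,234,567')); // 1234567
console.log(extractNumber(''));          // 0
```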

4. Anti-Bot Detection Handling

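TikTok employs sophisticated anti-bot measures. The configuration below drops the --enable-automation flag and sets a desktop user agent; the remaining flags mainly help Chromium run in containerized or low-resource environments: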

// Enhanced browser configuration
const browser = await puppeteer.launch({
  headless: "new",
  ignoreDefaultArgs: ["--enable-automation"],
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-accelerated-2d-canvas',
    '--no-first-run',
    '--no-zygote',
    '--single-process',
    '--disable-gpu'
  ]
});

// Set realistic user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
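
Because these measures can cause intermittent failures (timeouts, empty pages), it may help to wrap the scrape in a retry with exponential backoff. The following is a minimal sketch, not something TikTok or Puppeteer prescribes: the name withRetries, the three-attempt default, and the two-second base delay are all assumptions. Retrying will not get past a CAPTCHA on its own.

```javascript
// Retry an async task with exponential backoff between attempts.
// `task` is any function returning a promise, e.g. () => scrapeTikTokStats(url).
const withRetries = async (
  task,
  {
    attempts = 3,
    baseDelayMs = 2000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = {}
) => {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Delays grow as baseDelayMs, 2x, 4x, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
};
```

Usage could look like `withRetries(() => scrapeTikTokStats(tiktokUrl))`. Each attempt launches a fresh browser in that function, which keeps retries independent of one another.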

Alternative: Using SocialKit TikTok API

For production applications requiring reliable TikTok stats extraction, consider using SocialKit's TikTok Stats API:

curl "https://api.socialkit.dev/tiktok/stats?access_key=<your-access-key>&url=https://www.tiktok.com/@thepeteffect/video/7522711492140059912"

Example Response

{
  "success": true,
  "data": {
    "url": "https://www.tiktok.com/@thepeteffect/video/7522711492140059912",
    "title": "Cute pet tricks that will make you smile 🐕✨",
    "channelName": "@thepeteffect",
    "channelLink": "https://www.tiktok.com/@thepeteffect",
    "views": 2459876,
    "likes": 123456,
    "comments": 5678,
    "shares": 987,
    "musicTitle": "Original Sound - thepeteffect",
    "duration": "0:45"
  }
}
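
The same request can be made from Node. The sketch below uses the endpoint and the access_key and url parameters from the curl example, and assumes the response has the success/data shape shown above. The function names are hypothetical, and the global fetch requires Node 18 or newer.

```javascript
// Build the request URL for the endpoint shown in the curl example.
const buildStatsUrl = (accessKey, videoUrl) => {
  const endpoint = new URL('https://api.socialkit.dev/tiktok/stats');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', videoUrl); // searchParams URL-encodes the value
  return endpoint.toString();
};

// Fetch stats for one video and return the "data" object from the response.
const fetchTikTokStats = async (accessKey, videoUrl) => {
  const response = await fetch(buildStatsUrl(accessKey, videoUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.json();
  if (!body.success) {
    throw new Error('API reported an unsuccessful result');
  }
  return body.data;
};
```

Building the URL with searchParams encodes the nested TikTok URL, which avoids problems if the video link ever contains its own query string.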

Benefits of using SocialKit:

  • Bypass anti-bot detection: Handles CAPTCHAs and verification challenges automatically
  • No infrastructure overhead: Faster and more resource-efficient than running browsers
  • Consistent data extraction: Adapts to TikTok's interface changes automatically
  • Built-in retry logic: Automatic error handling and intelligent retries
  • Scale-ready: Handle thousands of videos without proxy or infrastructure concerns
  • Always up-to-date: Maintains compatibility with TikTok's evolving platform
  • JSON data access: Gets exact view counts from TikTok's internal data structures

Conclusion

Scraping TikTok video statistics with Puppeteer provides valuable insights for content analysis and trend monitoring. For production use, consider using SocialKit's TikTok Stats API to save time and ensure reliability.
