How to Scrape TikTok Video Stats with Puppeteer
TikTok has become one of the most influential social media platforms, with billions of videos generating massive engagement daily. For developers, marketers, and researchers, accessing TikTok video statistics programmatically is crucial for content analysis, trend monitoring, and competitive research. While TikTok doesn't provide a public API for video stats, web scraping with Puppeteer offers a powerful solution.
In this comprehensive guide, we'll explore how to extract TikTok video statistics using Puppeteer, including views, likes, comments, shares, creator information, and more. We'll cover everything from basic setup to advanced error handling techniques.
Prerequisites
Before diving into the implementation, ensure you have:
- Node.js installed (version 18 or higher, which current Puppeteer releases require)
- Basic knowledge of JavaScript and async/await
- Understanding of DOM manipulation and CSS selectors
- Familiarity with browser automation concepts
- Experience with handling dynamic content loading
- Important: Knowledge of anti-bot detection systems
Setting Up the Project
Create a new project and install dependencies:
mkdir tiktok-stats-scraper
cd tiktok-stats-scraper
npm init -y
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
The stealth plugin matters for TikTok scraping because it masks common automation fingerprints (such as the navigator.webdriver flag), which helps the automated browser avoid detection.
Important Disclaimer: CAPTCHA and Anti-Bot Measures
⚠️ Important Notice: TikTok implements sophisticated anti-bot detection systems and may present CAPTCHAs or other verification challenges during automated scraping. If you run into these, you may need a specialized service such as:
- Bright Data - Professional proxy and CAPTCHA solving services
For production use cases, consider using SocialKit's TikTok Stats API which handles these challenges automatically.
Basic Implementation
Let's start with a basic implementation that extracts core TikTok video statistics:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// Use stealth plugin to avoid detection
puppeteer.use(StealthPlugin());
const extractTikTokStats = async (url) => {
const browser = await puppeteer.launch({
headless: "new",
ignoreDefaultArgs: ["--enable-automation"]
});
try {
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 800 });
// Navigate to TikTok video
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
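// Give client-side rendering time to finish.
// (page.waitForTimeout was removed in newer Puppeteer releases, so use a plain timer.)
await new Promise((resolve) => setTimeout(resolve, 5000));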
// Extract video statistics
const videoStats = await page.evaluate(() => {
const extractNumber = (text) => {
if (!text) return 0;
// Remove all non-numeric characters except dots, commas, and K/M/B suffixes
const cleanText = text.replace(/[^\d.,KMBkmb]/g, '').trim();
if (!cleanText) return 0;
// Check if it ends with K, M, or B (case insensitive)
const lastChar = cleanText.slice(-1).toUpperCase();
if (['K', 'M', 'B'].includes(lastChar)) {
// Has suffix - extract the number part
const numberPart = cleanText.slice(0, -1).replace(/,/g, '');
const baseNum = parseFloat(numberPart);
if (isNaN(baseNum)) return 0;
let multiplier;
switch(lastChar) {
case 'K': multiplier = 1000; break;
case 'M': multiplier = 1000000; break;
case 'B': multiplier = 1000000000; break;
}
// Round rather than floor so a floating-point product such as 1199.9999 is not truncated
return Math.round(baseNum * multiplier);
} else {
// No suffix - just parse the number
const num = parseFloat(cleanText.replace(/,/g, ''));
if (isNaN(num)) return 0;
return Math.floor(num);
}
};
const data = {
title: '',
channelName: '',
channelLink: '',
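views: 0, // stays 0 in this basic version: no DOM selector below reads the view count (the JSON-based version does)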
likes: 0,
comments: 0,
shares: 0,
description: '',
musicTitle: ''
};
// Extract title/description
const titleElement = document.querySelector('[data-e2e="browse-video-desc"]');
if (titleElement) {
data.title = titleElement.textContent.trim();
data.description = data.title;
}
// Extract channel name
const channelNameElement = document.querySelector('[data-e2e="browse-username"]');
if (channelNameElement) {
data.channelName = channelNameElement.textContent.trim();
}
// Extract channel link
const channelLinkElement = document.querySelector('[data-e2e="browse-user-avatar"]');
if (channelLinkElement && channelLinkElement.href) {
data.channelLink = channelLinkElement.href;
}
// Extract engagement metrics
const likesElement = document.querySelector('[data-e2e="like-count"]');
if (likesElement) {
data.likes = extractNumber(likesElement.textContent);
}
const commentsElement = document.querySelector('[data-e2e="comment-count"]');
if (commentsElement) {
data.comments = extractNumber(commentsElement.textContent);
}
const sharesElement = document.querySelector('[data-e2e="share-count"]');
if (sharesElement) {
data.shares = extractNumber(sharesElement.textContent);
}
// Extract music information
const musicElement = document.querySelector('[data-e2e="browse-music"]');
if (musicElement) {
data.musicTitle = musicElement.textContent.trim();
}
return data;
});
return videoStats;
} catch (error) {
console.error('Error extracting TikTok stats:', error);
throw error;
} finally {
await browser.close();
}
};
// Usage
const tiktokUrl = 'https://www.tiktok.com/@thepeteffect/video/7522711492140059912';
extractTikTokStats(tiktokUrl)
.then(stats => console.log('TikTok Stats:', stats))
.catch(error => console.error('Failed to extract stats:', error));
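Before launching a browser, it can help to check that the input really looks like a TikTok video URL. The following is a minimal sketch, assuming the URL shape used in the example above (`/@username/video/<numeric id>`); the helper name `parseTikTokVideoUrl` is hypothetical and not part of the original script.

```javascript
// Hypothetical helper: returns { username, videoId } for URLs shaped like
// https://www.tiktok.com/@<username>/video/<numeric id>, or null otherwise.
const parseTikTokVideoUrl = (input) => {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }

  // Accept tiktok.com and its subdomains (for example www.tiktok.com)
  const host = parsed.hostname.toLowerCase();
  if (host !== 'tiktok.com' && !host.endsWith('.tiktok.com')) return null;

  // Expected path shape: /@username/video/1234567890
  const match = parsed.pathname.match(/^\/@([^/]+)\/video\/(\d+)\/?$/);
  if (!match) return null;

  // Keep the ID as a string: it is larger than Number.MAX_SAFE_INTEGER
  return { username: match[1], videoId: match[2] };
};

console.log(parseTikTokVideoUrl('https://www.tiktok.com/@thepeteffect/video/7522711492140059912'));
// { username: 'thepeteffect', videoId: '7522711492140059912' }
```

Returning null instead of throwing lets the caller skip bad input without a try/catch. Other TikTok URL forms (short links, for instance) are not handled by this sketch.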
Advanced Implementation with JSON Data Extraction
TikTok stores detailed video statistics in JSON format within the page. Here's a more robust version that extracts comprehensive data:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const scrapeTikTokStats = async (url) => {
const browser = await puppeteer.launch({
headless: "new",
ignoreDefaultArgs: ["--enable-automation"],
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-accelerated-2d-canvas',
'--no-first-run',
'--no-zygote',
'--single-process',
'--disable-gpu'
]
});
try {
const page = await browser.newPage();
await page.setViewport({
width: 1280,
height: 1024,
deviceScaleFactor: 1,
});
// Set user agent to appear more natural
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
// Navigate to TikTok video
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
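// Give client-side rendering time to finish.
// (page.waitForTimeout was removed in newer Puppeteer releases, so use a plain timer.)
await new Promise((resolve) => setTimeout(resolve, 5000));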
// Extract comprehensive metadata
const metadata = await page.evaluate(() => {
const extractNumber = (text) => {
if (!text) return 0;
const cleanText = text.replace(/[^\d.,KMBkmb]/g, '').trim();
if (!cleanText) return 0;
const lastChar = cleanText.slice(-1).toUpperCase();
if (['K', 'M', 'B'].includes(lastChar)) {
const numberPart = cleanText.slice(0, -1).replace(/,/g, '');
const baseNum = parseFloat(numberPart);
if (isNaN(baseNum)) return 0;
let multiplier;
switch(lastChar) {
case 'K': multiplier = 1000; break;
case 'M': multiplier = 1000000; break;
case 'B': multiplier = 1000000000; break;
}
// Round rather than floor so a floating-point product such as 1199.9999 is not truncated
return Math.round(baseNum * multiplier);
} else {
const num = parseFloat(cleanText.replace(/,/g, ''));
if (isNaN(num)) return 0;
return Math.floor(num);
}
};
const data = {
title: '',
channelName: '',
channelLink: '',
views: 0,
likes: 0,
comments: 0,
shares: 0,
description: '',
duration: '',
thumbnailUrl: '',
musicTitle: '',
};
// First, try to get stats from the JSON data in script element
try {
const scriptElement = document.querySelector('#__UNIVERSAL_DATA_FOR_REHYDRATION__');
if (scriptElement && scriptElement.textContent) {
console.log('TikTok: Found script element with JSON data');
const jsonData = JSON.parse(scriptElement.textContent);
// Navigate to statsV2
const statsV2 = jsonData["__DEFAULT_SCOPE__"]?.["webapp.video-detail"]?.["itemInfo"]?.["itemStruct"]?.["statsV2"];
if (statsV2) {
console.log('TikTok: Found statsV2:', statsV2);
// Extract stats with exact numbers
data.views = parseInt(statsV2.playCount || '0', 10);
data.likes = parseInt(statsV2.diggCount || '0', 10);
data.comments = parseInt(statsV2.commentCount || '0', 10);
data.shares = parseInt(statsV2.shareCount || '0', 10);
console.log('TikTok: Extracted stats from JSON:', {
views: data.views,
likes: data.likes,
comments: data.comments,
shares: data.shares
});
} else {
console.log('TikTok: Could not find statsV2 in JSON data');
}
}
} catch (e) {
console.log('TikTok: Error parsing JSON data:', e.message);
}
// Extract title/description
const titleElement = document.querySelector('[data-e2e="browse-video-desc"]');
if (titleElement) {
data.title = titleElement.textContent.trim();
data.description = data.title;
}
// Extract channel name
const channelNameElement = document.querySelector('[data-e2e="browse-username"]');
if (channelNameElement) {
data.channelName = channelNameElement.textContent.trim();
}
// Extract channel link
const channelLinkElement = document.querySelector('[data-e2e="browse-user-avatar"]');
if (channelLinkElement && channelLinkElement.href) {
data.channelLink = channelLinkElement.href;
}
// If JSON parsing didn't work, fallback to DOM scraping for stats
if (data.views === 0 && data.likes === 0 && data.comments === 0 && data.shares === 0) {
console.log('TikTok: Using DOM fallback for stats extraction');
// Extract likes
const likesElement = document.querySelector('[data-e2e="like-count"]');
if (likesElement) {
data.likes = extractNumber(likesElement.textContent);
}
// Extract comments
const commentsElement = document.querySelector('[data-e2e="comment-count"]');
if (commentsElement) {
data.comments = extractNumber(commentsElement.textContent);
}
// Extract shares
const sharesElement = document.querySelector('[data-e2e="share-count"]');
if (sharesElement) {
data.shares = extractNumber(sharesElement.textContent);
}
}
// Extract music info
const musicElement = document.querySelector('[data-e2e="browse-music"]');
if (musicElement) {
data.musicTitle = musicElement.textContent.trim();
}
// Extract thumbnail URL (note: this selector reads the creator's avatar image, not a video cover frame)
const thumbnailElement = document.querySelector('[data-e2e="browse-user-avatar"] img');
if (thumbnailElement && thumbnailElement.src) {
data.thumbnailUrl = thumbnailElement.src;
}
// Try to get duration from video element
const videoElement = document.querySelector('video');
if (videoElement && videoElement.duration) {
const duration = Math.floor(videoElement.duration);
const minutes = Math.floor(duration / 60);
const seconds = duration % 60;
data.duration = `${minutes}:${seconds.toString().padStart(2, '0')}`;
}
return data;
});
return {
url,
extractedAt: new Date().toISOString(),
platform: 'tiktok',
...metadata
};
} catch (error) {
console.error('Error scraping TikTok stats:', error);
throw error;
} finally {
await browser.close();
}
};
// Usage
const tiktokUrl = 'https://www.tiktok.com/@thepeteffect/video/7522711492140059912';
scrapeTikTokStats(tiktokUrl)
.then(stats => {
console.log('Video Title:', stats.title);
console.log('Creator:', stats.channelName);
console.log('Views:', stats.views?.toLocaleString() || 'N/A');
console.log('Likes:', stats.likes?.toLocaleString() || 'N/A');
console.log('Comments:', stats.comments?.toLocaleString() || 'N/A');
console.log('Shares:', stats.shares?.toLocaleString() || 'N/A');
console.log('Music:', stats.musicTitle);
})
.catch(error => console.error('Failed to scrape stats:', error));
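When several videos need to be processed, running them one at a time with a pause in between is a gentler pattern than firing all requests at once. The sketch below is an assumption-laden illustration, not part of the original script: `scrapeSequentially` and `sleep` are hypothetical names, the 3-second default delay is arbitrary, and `scrapeFn` stands for any async function that takes a URL (such as `scrapeTikTokStats` above).

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `scrapeFn` over `urls` one at a time and
// collects a result entry per URL, whether it succeeded or failed.
const scrapeSequentially = async (urls, scrapeFn, delayMs = 3000) => {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    try {
      results.push({ url, ok: true, stats: await scrapeFn(url) });
    } catch (error) {
      // Record the failure and keep going instead of aborting the batch
      results.push({ url, ok: false, error: error.message });
    }
    // Pause between requests, but not after the last one
    if (i < urls.length - 1) await sleep(delayMs);
  }
  return results;
};
```

A call such as `scrapeSequentially([tiktokUrl], scrapeTikTokStats)` would return an array with one entry per URL. Passing the scraper in as a parameter also makes the batching logic easy to test with a stub.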
Handling Dynamic Content and Common Issues
1. TikTok's JSON Data Structure
TikTok stores comprehensive video data in a script tag with ID __UNIVERSAL_DATA_FOR_REHYDRATION__. This approach provides more accurate statistics:
// Extract exact view counts from JSON data
const scriptElement = document.querySelector('#__UNIVERSAL_DATA_FOR_REHYDRATION__');
if (scriptElement && scriptElement.textContent) {
const jsonData = JSON.parse(scriptElement.textContent);
const statsV2 = jsonData["__DEFAULT_SCOPE__"]?.["webapp.video-detail"]?.["itemInfo"]?.["itemStruct"]?.["statsV2"];
if (statsV2) {
data.views = parseInt(statsV2.playCount || '0', 10);
data.likes = parseInt(statsV2.diggCount || '0', 10);
data.comments = parseInt(statsV2.commentCount || '0', 10);
data.shares = parseInt(statsV2.shareCount || '0', 10);
}
}
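The same lookup can be pulled into a pure function that takes the script tag's text, which makes it testable without a browser. This is a minimal sketch: the function name `parseStatsFromRehydrationJson` is hypothetical, and the fixture is made up to mirror only the key path used above. TikTok's real payload is much larger and may change.

```javascript
// Hypothetical helper: takes the text content of the
// #__UNIVERSAL_DATA_FOR_REHYDRATION__ script tag and returns numeric stats,
// or null when the JSON is invalid or the statsV2 object is missing.
const parseStatsFromRehydrationJson = (jsonText) => {
  let jsonData;
  try {
    jsonData = JSON.parse(jsonText);
  } catch {
    return null;
  }

  const statsV2 =
    jsonData?.['__DEFAULT_SCOPE__']?.['webapp.video-detail']?.itemInfo?.itemStruct?.statsV2;
  if (!statsV2) return null;

  // Counts are parsed with parseInt, as in the snippet above; bad values become 0
  const toInt = (value) => {
    const n = parseInt(value ?? '0', 10);
    return Number.isNaN(n) ? 0 : n;
  };

  return {
    views: toInt(statsV2.playCount),
    likes: toInt(statsV2.diggCount),
    comments: toInt(statsV2.commentCount),
    shares: toInt(statsV2.shareCount),
  };
};

// Made-up fixture that mirrors only the path read above
const sampleJson = JSON.stringify({
  __DEFAULT_SCOPE__: {
    'webapp.video-detail': {
      itemInfo: {
        itemStruct: {
          statsV2: { playCount: '2459876', diggCount: '123456', commentCount: '5678', shareCount: '987' },
        },
      },
    },
  },
});

console.log(parseStatsFromRehydrationJson(sampleJson));
// { views: 2459876, likes: 123456, comments: 5678, shares: 987 }
```

Returning null on failure, rather than an object of zeros, lets the caller tell "no data found" apart from a video whose counts really are zero.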
2. Fallback DOM Extraction
When JSON parsing fails, fall back to DOM element extraction:
// Fallback to DOM scraping if JSON extraction fails
if (data.views === 0 && data.likes === 0) {
const likesElement = document.querySelector('[data-e2e="like-count"]');
if (likesElement) {
data.likes = extractNumber(likesElement.textContent);
}
}
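One caveat with an all-zeros check is that it cannot tell a failed JSON parse from a video that really has zero engagement. A possible alternative, sketched here under the assumption that the JSON and DOM results are collected as two separate objects, is to mark missing values as null and merge field by field. `mergeStats` and `STAT_FIELDS` are hypothetical names.

```javascript
// Fields reported by both extraction paths
const STAT_FIELDS = ['views', 'likes', 'comments', 'shares'];

// Hypothetical helper: prefer JSON-derived values and fill any field that is
// null/undefined from the DOM-derived values. A real 0 from JSON is kept.
const mergeStats = (jsonStats, domStats) => {
  const merged = {};
  for (const field of STAT_FIELDS) {
    merged[field] = jsonStats?.[field] ?? domStats?.[field] ?? null;
  }
  return merged;
};

// JSON extraction failed entirely: DOM values are used where available
console.log(mergeStats(null, { likes: 1200, comments: 45, shares: 3 }));
// { views: null, likes: 1200, comments: 45, shares: 3 }

// JSON extraction succeeded with real zeros: they are not overwritten
console.log(mergeStats({ views: 0, likes: 0, comments: 0, shares: 0 }, { likes: 99 }));
// { views: 0, likes: 0, comments: 0, shares: 0 }
```

The nullish coalescing operator (`??`) only falls through on null or undefined, which is what keeps a legitimate 0 intact.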
3. Number Extraction and Formatting
TikTok uses abbreviated numbers (1.2M, 45K). Our extractNumber function handles:
- Comma-separated numbers (1,234,567)
- Abbreviated suffixes (K, M, B) - case insensitive
- Decimal points (1.2M)
- Empty or unparseable input, which returns 0
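The cases in this list can be checked with a compact standalone variant of the parser. This is a sketch rather than the exact function used in the scripts: the name `parseCount` is hypothetical, it uses a lookup table instead of a switch, and it rounds with Math.round so a floating-point product that lands just below a whole number is not truncated.

```javascript
// Hypothetical standalone variant of the count parser described above.
const parseCount = (text) => {
  if (!text) return 0;

  // Keep digits, separators, and K/M/B suffix letters only
  const cleaned = String(text).replace(/[^\d.,KMBkmb]/g, '');
  if (!cleaned) return 0;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const suffix = cleaned.slice(-1).toUpperCase();
  const multiplier = multipliers[suffix] ?? 1; // 1 when there is no suffix

  // Drop the suffix (if any) and the thousands separators before parsing
  const digits = (multiplier === 1 ? cleaned : cleaned.slice(0, -1)).replace(/,/g, '');
  const value = parseFloat(digits);

  return Number.isNaN(value) ? 0 : Math.round(value * multiplier);
};

console.log(parseCount('1.2M'));      // 1200000
console.log(parseCount('45K'));       // 45000
console.log(parseCount('1,234,567')); // 1234567
console.log(parseCount('1.5k'));      // 1500
console.log(parseCount(''));          // 0
```

Note that this sketch treats commas as thousands separators only. Locales that use a comma as the decimal mark would need different handling.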
4. Anti-Bot Detection Handling
TikTok employs sophisticated anti-bot measures. The launch options and user agent below remove some of the most obvious automation signals:
// Enhanced browser configuration
const browser = await puppeteer.launch({
headless: "new",
ignoreDefaultArgs: ["--enable-automation"],
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-accelerated-2d-canvas',
'--no-first-run',
'--no-zygote',
'--single-process',
'--disable-gpu'
]
});
// Set realistic user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
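Browser configuration alone may not be enough, since a challenge page or a slow load can still make a single attempt fail. One common complement, which the original scripts do not include, is retrying with an increasing delay. The sketch below is generic: `withRetries` and `wait` are hypothetical names, and the retry count and delays are arbitrary defaults.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `task` until it succeeds or the retry budget
// is used up, doubling the delay after each failed attempt.
const withRetries = async (task, { retries = 3, baseDelayMs = 2000 } = {}) => {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // no attempts left
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
};
```

A scrape could then be wrapped as `withRetries(() => scrapeTikTokStats(tiktokUrl))`. Retrying does not solve CAPTCHAs; it only helps with transient failures, so persistent blocks still call for the services mentioned earlier.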
Alternative: Using SocialKit TikTok API
For production applications requiring reliable TikTok stats extraction, consider using SocialKit's TikTok Stats API:
curl "https://api.socialkit.dev/tiktok/stats?access_key=<your-access-key>&url=https://www.tiktok.com/@thepeteffect/video/7522711492140059912"
Example Response
{
"success": true,
"data": {
"url": "https://www.tiktok.com/@thepeteffect/video/7522711492140059912",
"title": "Cute pet tricks that will make you smile 🐕✨",
"channelName": "@thepeteffect",
"channelLink": "https://www.tiktok.com/@thepeteffect",
"views": 2459876,
"likes": 123456,
"comments": 5678,
"shares": 987,
"musicTitle": "Original Sound - thepeteffect",
"duration": "0:45"
}
}
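The same request can be made from Node.js. This is a minimal sketch, assuming Node 18 or later (for the built-in `fetch`) and assuming the response has the `success` and `data` fields shown in the example above; the function names are hypothetical and the access key is a placeholder.

```javascript
// Hypothetical helper: builds the request URL for the endpoint shown above.
// URLSearchParams percent-encodes the video URL so it survives as one parameter.
const buildSocialKitStatsUrl = (accessKey, videoUrl) => {
  const endpoint = new URL('https://api.socialkit.dev/tiktok/stats');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', videoUrl);
  return endpoint.toString();
};

// Hypothetical helper: performs the request and returns the `data` object.
// The `success`/`data` shape is taken from the example response, not from API docs.
const fetchSocialKitStats = async (accessKey, videoUrl) => {
  const response = await fetch(buildSocialKitStatsUrl(accessKey, videoUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.json();
  if (!body.success) {
    throw new Error('API response did not report success');
  }
  return body.data;
};
```

It could be called as `fetchSocialKitStats('<your-access-key>', tiktokUrl).then(console.log)`. Keeping the access key in an environment variable rather than in source code is a sensible default.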
Benefits of using SocialKit:
- Bypass anti-bot detection: Handles CAPTCHAs and verification challenges automatically
- No infrastructure overhead: Faster and more resource-efficient than running browsers
- Consistent data extraction: Adapts to TikTok's interface changes automatically
- Built-in retry logic: Automatic error handling and intelligent retries
- Scale-ready: Handle thousands of videos without proxy or infrastructure concerns
- Always up-to-date: Maintains compatibility with TikTok's evolving platform
- JSON data access: Gets exact view counts from TikTok's internal data structures
Conclusion
Scraping TikTok video statistics with Puppeteer provides valuable insights for content analysis and trend monitoring. For production use, consider using SocialKit TikTok Stats API to save time and ensure reliability.