March 10, 2021

Introduction

In this blog post, we will see how to fetch the text (captions or transcription) of all videos from a particular YouTube channel. For that, we need the YouTube Data API v3 and the channel ID of the channel whose captions we want to fetch.

Below are the steps to fetch the captions of every video on the channel for which captions are available.

Enable YouTube Data API v3

Sign in to the Google Cloud Console. Select an existing project or create a new one. Then search for "YouTube Data API", select YouTube Data API v3, and enable it.

Get API key for your project

After enabling the API, select the Credentials option from the APIs & Services menu.

Now click on CREATE CREDENTIALS and then select the API key option.

The console then displays the newly created key. Copy the key and store it somewhere safe; we will need it to access the API.
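One way to keep the saved key out of the source file is to read it from an environment variable. The following is only a minimal sketch: the variable name YOUTUBE_API_KEY and the helper load_api_key are hypothetical names chosen for illustration, not something the API requires.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Example usage (requires the variable to be set in your shell first):
# api_key = load_api_key()
```

With this approach the script can be shared or committed without exposing the key itself.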

Get YouTube channel’s channel ID

Search YouTube for the channel whose captions you want and open its page. Copy the URL of the channel page, which will look like https://www.youtube.com/channel/CHANNEL_ID. Here, CHANNEL_ID is the channel ID.
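If you would rather not copy the ID by hand, it can be pulled out of a URL of the form described above. This is a small sketch using only the standard library; extract_channel_id is a hypothetical helper name, and it only recognises the /channel/CHANNEL_ID shape.

```python
from urllib.parse import urlparse


def extract_channel_id(url):
    """Return the CHANNEL_ID part of a /channel/CHANNEL_ID URL, or None."""
    # Split the URL path into its non-empty segments, e.g. ["channel", "CHANNEL_ID"]
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


print(extract_channel_id("https://www.youtube.com/channel/CHANNEL_ID"))
```

URLs of any other shape (for example, a handle URL such as https://www.youtube.com/@name) return None with this sketch, so the ID would have to be found another way.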

Getting captions using Python

To fetch the captions of videos using Python, install the requirements below:

				
pip install google-api-python-client

pip install youtube_transcript_api
			

Now copy the code given below into a Python file and enter your API key and channel ID in it. Note: This example fetches English captions; if your captions are in another language, change languages=['en'] to the code of that language.

				
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi

api_key = "Your API key"  # replace it with your API key
channel_id = "Your Channel_id"  # replace it with your channel ID
youtube = build('youtube', 'v3', developerKey=api_key)


def get_channel_videos(channel_id):
    """Return every item in the channel's Uploads playlist."""
    # Get the ID of the Uploads playlist, which lists all of the channel's videos
    res = youtube.channels().list(id=channel_id,
                                  part='contentDetails').execute()
    playlist_id = res['items'][0]['contentDetails']['relatedPlaylists']['uploads']
    videos = []
    next_page_token = None

    # Page through the playlist, 50 items (the API maximum) at a time
    while True:
        res = youtube.playlistItems().list(playlistId=playlist_id,
                                           part='snippet',
                                           maxResults=50,
                                           pageToken=next_page_token).execute()
        videos += res['items']
        next_page_token = res.get('nextPageToken')

        # No nextPageToken means the last page has been reached
        if next_page_token is None:
            break

    return videos


videos = get_channel_videos(channel_id)
# List of the video IDs of every video on the channel
video_ids = [video['snippet']['resourceId']['videoId'] for video in videos]

for video_id in video_ids:
    try:
        # get_transcript is the call available when this post was written;
        # newer releases of youtube_transcript_api may expose a different interface
        responses = YouTubeTranscriptApi.get_transcript(
            video_id, languages=['en'])
        print('\n' + "Video: " + "https://www.youtube.com/watch?v=" + str(video_id) + '\n' + '\n' + "Captions:")
        for response in responses:
            print(response['text'])
    except Exception as e:
        # Raised when captions are disabled or no English transcript exists
        print(e)
				
			

After running the file, the output lists each video's link followed by its captions.
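The script prints caption segments one by one. If you want a single block of text per video instead, the segments can be joined. The sketch below assumes each segment is a dictionary with a 'text' key (as used in the script) plus 'start' and 'duration' keys in seconds; the helper names transcript_to_text and format_timestamp, and the sample data, are made up for illustration.

```python
def format_timestamp(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def transcript_to_text(segments, with_timestamps=False):
    """Join caption segments into one string, one segment per line."""
    lines = []
    for segment in segments:
        # Captions may contain line breaks inside a segment; flatten them
        text = segment["text"].replace("\n", " ").strip()
        if not text:
            continue
        if with_timestamps:
            text = f"[{format_timestamp(segment['start'])}] {text}"
        lines.append(text)
    return "\n".join(lines)


# Made-up sample in the assumed segment format
sample = [
    {"text": "hello\nworld", "start": 0.0, "duration": 1.5},
    {"text": "second line", "start": 75.2, "duration": 2.0},
]
print(transcript_to_text(sample, with_timestamps=True))
```

In the script, responses could be passed to transcript_to_text in place of the inner print loop.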

Note: If captions are disabled for a video, or no English (en) transcript is available, an exception message is printed for that video instead of its captions.
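Rather than only printing the exception, it can be handy to record which videos failed and why. The following is a rough sketch, not part of the original script: fetch_all_transcripts is a hypothetical helper, and fetch_transcript is any callable you supply that takes a video ID and returns its captions. A fake fetcher stands in for the real API here so the example runs without network access.

```python
def fetch_all_transcripts(video_ids, fetch_transcript):
    """Fetch captions for each video ID, collecting failures separately."""
    transcripts = {}
    failures = {}
    for video_id in video_ids:
        try:
            transcripts[video_id] = fetch_transcript(video_id)
        except Exception as error:
            # Keep the exception type and message so the cause can be reviewed later
            failures[video_id] = f"{type(error).__name__}: {error}"
    return transcripts, failures


def fake_fetch(video_id):
    """Stand-in fetcher for this sketch: fails for one made-up video ID."""
    if video_id == "no_captions":
        raise ValueError("captions are disabled")
    return [{"text": f"caption for {video_id}"}]


transcripts, failures = fetch_all_transcripts(["abc", "no_captions"], fake_fetch)
print(sorted(transcripts))
print(failures)
```

In the script above, the fetcher could be something like lambda video_id: YouTubeTranscriptApi.get_transcript(video_id, languages=['en']), assuming that call is available in your installed version of the library.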
