Description: A comprehensive Python tool that downloads YouTube videos or processes local video files, transcribes the audio, translates the transcript into a target language, and finally generates a video with dual subtitles (original and translated). Built on the Whisper model and several translation APIs, the script aims to make video content accessible across language barriers.
GPT-3.5's translations were significantly more accurate and fluent than those of Google Translate and other traditional machine translation.

Key Features:
- Versatile Input: Download YouTube videos or use local video files for processing.
- Dynamic Transcription and Translation: Uses the Whisper model for transcription and supports multiple translation backends, including M2M100, Google Translate, and GPT-3.5.
- Dual Subtitles Generation: Produces a video with both original and translated subtitles side-by-side.
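
As a rough illustration of the dual-subtitle step, the sketch below writes one SRT file in which each cue shows the original line with its translation directly underneath. Stacking the two lines in a single cue is just one way to display both languages. It assumes segments arrive as `(start_seconds, end_seconds, original, translation)` tuples. `format_srt_time` and `build_dual_srt` are hypothetical helpers, not functions taken from the project.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, original_text, translated_text)
Segment = Tuple[float, float, str, str]


def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_dual_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text where each cue shows the original line above its translation."""
    blocks = []
    for number, (start, end, original, translation) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{original.strip()}\n"
            f"{translation.strip()}\n"
        )
    # Each block already ends with a newline, so joining on "\n" leaves a blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        (0.0, 2.5, "Hello everyone.", "大家好。"),
        (2.5, 5.04, "Welcome back to the channel.", "欢迎回到频道。"),
    ]
    print(build_dual_srt(demo))
```

A file produced this way could then be burned into the video with FFmpeg's `subtitles` filter, for example `ffmpeg -i input.mp4 -vf subtitles=dual.srt output.mp4`.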

Extended Features:
- Subtitle Translation with GPT-3.5: A dedicated script for high-quality translation leveraging OpenAI's GPT-3.5 model, especially beneficial for context-specific translations or idiomatic expressions.
- Google Colab Integration: An interactive Google Colab notebook for users to easily try out the script without local setup.

Impact and Recognition:
- Bilibili Engagement: Videos translated with this tool have drawn significant attention on Bilibili, totaling over 500,000 views in just three months, which points to a strong appetite for quality translated content.
- GitHub Metrics: The repository has received over 20,000 page views and more than 500 unique clones, a sign that developers find the tool useful.
- Efficiency Highlight: The tool can translate a 20-minute YouTube video in under 5 minutes. Its English-to-Chinese subtitle translations are accurate enough that only minimal manual correction is needed.

Method:
- Whisper Model Integration: The tool uses the Whisper model for transcription, producing the high-quality transcripts that accurate subtitle translation depends on.
- Subtitle Translation System: I designed a comprehensive prompt that translates subtitles in batches into the target language while staying faithful to the context of each line in the video.
The prompt follows guidelines from the DeepLearning.AI Prompt Engineering course, using techniques such as few-shot examples, delimiters, and structured output. It has been tested extensively and refined iteratively.
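
The batch-with-context approach can be sketched as follows: the subtitles are split into fixed-size batches, and each request carries the current batch together with the previous batch, its translation, the next batch, and the running proper-noun mapping as reference material. This is a minimal sketch under stated assumptions. The batch size of 20, the helper names, and every payload key except `translation_mapping` are hypothetical and not the project's actual values.

```python
import json
from typing import Dict, List, Optional

# Hypothetical subtitle record: {"index": int, "original_text": str}
Subtitle = Dict[str, object]


def make_batches(subtitles: List[Subtitle], batch_size: int = 20) -> List[List[Subtitle]]:
    """Split the subtitle list into consecutive batches of at most batch_size lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [subtitles[i:i + batch_size] for i in range(0, len(subtitles), batch_size)]


def build_user_message(
    batches: List[List[Subtitle]],
    position: int,
    previous_translation: Optional[List[Dict[str, object]]],
    translation_mapping: Dict[str, str],
    error_message: Optional[str] = None,
) -> str:
    """Serialize the current batch plus its reference context as a JSON string."""
    payload = {
        # Context only: the model should translate just the current batch.
        "previous_batch_subtitles": batches[position - 1] if position > 0 else [],
        "previous_batch_translation": previous_translation or [],
        "current_batch_subtitles": batches[position],
        "next_batch_subtitles": batches[position + 1] if position + 1 < len(batches) else [],
        "translation_mapping": translation_mapping,
    }
    if error_message:
        # Feedback from a rejected attempt, e.g. a line-count mismatch.
        payload["error_message"] = error_message
    # ensure_ascii=False keeps non-Latin text (such as Chinese) readable in the request.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    subs = [{"index": i, "original_text": f"line {i}"} for i in range(1, 6)]
    batches = make_batches(subs, batch_size=2)
    print(build_user_message(batches, 1, None, {"Whisper": "Whisper"}))
```

Sending the neighbouring batches as reference rather than as text to translate is what lets the model repair sentences that are split across a batch boundary without emitting extra lines.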

The system prompt:

```python
# Built inside the translator class: self.target_language, self.titles and self.video_info
# are instance attributes. Doubled braces {{ }} produce literal braces in the f-string.
system_content = f"""You are a program responsible for translating subtitles. Your task is to translate the current batch of subtitles into {self.target_language} for the video titled '{self.titles}' and follow the guidelines below.

Guidelines:
- Keep in mind that each index should correspond exactly with the original text, and your translation should be faithful to the context of each sentence.
- Translate with informal slang if necessary, ensuring that the translation is accurate and reflects the context and terminology. Please do not output any text other than the translation.
- You will also receive some additional information for your reference only, such as the previous batch of subtitles, the translation of the previous batch, the next batch of subtitles, and maybe error messages.
- Please ensure that each line in the current batch has a corresponding translated line.
- If the last sentence in the current batch is incomplete, you may ignore the last sentence. If the first sentence in the current batch is incomplete, you may combine it with the last sentence of the previous batch to make the sentence complete.
- Please only output the translation of the current batch of subtitles (current_batch_subtitles_translation).
- Do not put the translation of the next whole sentence in the current sentence.
- Each index in the current batch of subtitles must correspond to the exact original text and translation. Do not combine sentences from different indices.
- Ensure that the number of lines in the current batch of subtitles is the same as the number of lines in the translation.
- You may translate with conversational language if the original text is informal.
- Additional information for the video: {self.video_info}
- Please ensure that the translation and the original text are matched correctly.
- You may receive original text in other languages, but please only output the {self.target_language} translation.
- Please translate the following subtitles and summarize all the proper nouns that appear to generate a mapping.
- Please only output the proper nouns and their translation that appear in the current batch of subtitles; do not repeat those from the input.
- You may receive translation_mapping as input, which is a mapping of proper nouns to their translation in {self.target_language}.
- Please follow this mapping to translate the subtitles to improve translation consistency.
- Target language: {self.target_language}
- Please output proper JSON with this format:
{{
  "current_batch_subtitles_translation": [
    {{ "index": <int>, "original_text": <str>, "translation": <str> }}
  ],
  "translation_mapping": {{ "proper nouns": <translation in target language> }}
}}"""
```
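
Because the prompt asks for strict JSON with exactly one translation per index, a reply can be checked before it is accepted. The sketch below parses the model output, verifies that its indices match the current batch one-to-one and in order, rejects empty translations, and only then merges new proper nouns into the running `translation_mapping`. On failure it returns an error string that could be included in a retry request, which is one way to supply the "error messages" the prompt mentions. `parse_batch_reply` and the retry idea are assumptions for illustration, not the project's code.

```python
import json
from typing import Dict, List, Optional, Tuple


def parse_batch_reply(
    reply_text: str,
    current_batch: List[Dict[str, object]],
    translation_mapping: Dict[str, str],
) -> Tuple[Optional[List[Dict[str, object]]], Optional[str]]:
    """Check a model reply against the current batch.

    Returns (translations, None) on success or (None, error_message) on failure.
    On success, new proper nouns are merged into translation_mapping in place.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        return None, f"Output was not valid JSON: {exc}"
    if not isinstance(data, dict):
        return None, "Output must be a JSON object."

    translations = data.get("current_batch_subtitles_translation")
    if not isinstance(translations, list):
        return None, "Missing 'current_batch_subtitles_translation' list."

    # Indices must match the current batch one-to-one and in order (no merged or dropped lines).
    expected = [item["index"] for item in current_batch]
    received = [item.get("index") if isinstance(item, dict) else None for item in translations]
    if received != expected:
        return None, f"Expected indices {expected}, got {received}."

    for item in translations:
        text = item.get("translation")
        if not isinstance(text, str) or not text.strip():
            return None, f"Index {item['index']} has an empty or missing translation."

    # Only accept the proper-noun mapping once the batch itself is valid.
    new_terms = data.get("translation_mapping", {})
    if isinstance(new_terms, dict):
        translation_mapping.update({str(k): str(v) for k, v in new_terms.items()})
    return translations, None
```

Keeping the mapping update behind the validation checks means a malformed reply cannot pollute the glossary that later batches rely on for consistent proper-noun translations.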

Relevant Links: