Create .SRT Subtitles & .LRC Lyrics from Audio with Whisper AI and GPT-5-nano
Overview
This workflow automates the entire process of creating professional subtitle (.SRT) and synced lyrics (.LRC) files from audio recordings. Upload your vocal track, let Whisper AI transcribe it with precise timestamps, and GPT-5-nano segments it into natural, singable lyric lines. With an optional quality control step, you can manually refine the output while maintaining perfect timestamp alignment.
Key Features
Whisper AI Transcription**: Word-level timestamps with multi-language support via ISO codes Intelligent Segmentation**: GPT-5-nano formats transcriptions into natural lyric lines (2-8 words per line) Quality Control Option**: Download, edit, and re-upload corrections with smart timestamp matching Advanced Alignment**: Levenshtein distance algorithm preserves timestamps during manual edits Dual Format Export**: Generate both .SRT (video subtitles) and .LRC (synced lyrics) files No Storage Needed**: Files generated in-memory for instant download Multi-Language**: Supports various languages through Whisper API
Use Cases Generate synced lyrics for music video releases on YouTube Create .LRC files for Musixmatch, Apple Music, and Spotify Prepare professional subtitles for social media content Batch process subtitle files for catalog releases Maintain consistent lyric formatting across artists Streamline content delivery for streaming platforms Speed up video editing workflow
Perfect For For Musicians & Artists For Record Labels For Content Creators
What You'll Need
Required Setup OpenAI API Key** for Whisper transcription and GPT-5-nano segmentation
Recommended Input Format**: MP3 audio files (max 25MB) Content**: Clean vocal tracks work best (isolated vocals recommended, but whole tracks works still good) Languages**: Any language supported by Whisper (specify via ISO code)
How It Works
Automatic Mode (No Quality Check) Upload your MP3 vocal track to the workflow Transcription: Whisper AI processes audio with word-level timestamps Segmentation: GPT-5-nano formats text into natural lyric lines Generation: Workflow creates .SRT and .LRC files Download your ready-to-use subtitle files
Manual Quality Control Mode Upload your MP3 vocal track and enable quality check Transcription: Whisper AI processes audio with timestamps Initial Segmentation: GPT-5-nano creates first draft Download the .TXT file for review Edit lyrics in any text editor (keep line structure intact) Re-upload corrected .TXT file Smart Matching: Advanced diff algorithm aligns changes with original timestamps Download final .SRT and .LRC files with perfect timing
Technical Details
Transcription API**: OpenAI Whisper (/v1/audio/transcriptions) Segmentation Model**: GPT-5-nano with custom lyric-focused prompt System Prompt*: "You are helping with preparing song lyrics for musicians. Take the following transcription and split it into lyric-like lines. Keep lines short (2–8 words), natural for singing/rap phrasing, and do not change the wording." Timestamp Matching*: Levenshtein distance + alignment algorithm File Size Limit**: 25MB (n8n platform default) Processing**: All in-memory, no disk storage Cost**: Based on Whisper API usage (varies with audio length)
Output Formats
.SRT (SubRip Subtitle) Standard format for: YouTube video subtitles Video editing software (Premiere, DaVinci Resolve, etc.) Media players (VLC, etc.)
.LRC (Lyric File) Synced lyrics format for: Musixmatch Apple Music Spotify Music streaming services Audio players with lyrics display
Pro Tips
💡 For Best Results: Use isolated vocal tracks when possible (remove instrumentals) Ensure clear recordings with minimal background noise For quality check edits, only modify text content—don't change line breaks Test with shorter tracks first to optimize your workflow
⚙️ Customization Options: Adjust GPT segmentation style by modifying the system prompt Add language detection or force specific languages in Whisper settings Customize output file naming conventions in final nodes Extend workflow with additional format exports if needed
Workflow Components
Audio Input: Upload interface for MP3 files Whisper Transcribe: OpenAI API call with timestamp extraction Post-Processing: GPT-5-nano segmentation into lyric format Routing Quality Check: Decision point for manual review Timestamp Matching: Diff and alignment for corrected text Subtitles Preparation: JSON formatting for both output types File Generation: Convert to .SRT and .LRC formats Download Nodes: Export final files
Template Author: Questions or need help with setup? 📧 Email:xciklv@gmail.com 💼 LinkedIn:https://www.linkedin.com/in/vaclavcikl/
Related Templates
Instagram Full Profile Scraper with Apify and Google Sheets
📸 Instagram Full Profile Scraper with Apify and Google Sheets This n8n workflow automates the process of scraping ful...
Technology News Workflow Explanation with Key Services
This workflow contains community nodes that are only compatible with the self-hosted version of n8n. Auto-Publish Techn...
Auto-classify Gmail emails with AI and apply labels for inbox organization
Who is this for? Professionals and individuals who receive high volumes of emails, those who want to automatically organ...
🔒 Please log in to import templates to n8n and favorite templates
Workflow Visualization
Loading...
Preparing workflow renderer
Comments (0)
Login to post comments