Audio overlay in videos: A step-by-step guide

August 29, 2025
10 Min
Video Engineering

A developer building a video creation app once said, “All my users wanted was to add a voiceover and some music to their videos. I didn’t think it’d be the hardest part of the stack.”

But it often is. Audio overlay seems simple - until you have to line up multiple tracks, fade them in and out, adjust volume levels, and make sure everything syncs perfectly across devices. If you’re using FFmpeg, it means wrangling timestamps and obscure flags just to do something your end users expect to “just work.”

And here’s the kicker: viewers notice bad audio faster than bad visuals. Stats show 60% of viewers engage more with videos that sound good - and one-third drop off if the audio is off within the first 30 seconds. Whether it’s a late voiceover, mismatched background score, or a track that never fades out, sloppy audio breaks the moment.

What is audio overlay?

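Audio overlay means layering additional audio, such as background music, voiceovers, or sound effects, onto a video's timeline. If you're building video tools - whether for creators, educators, podcasters, or internal teams - you need a way to handle audio overlays cleanly, at scale, and with precision.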

In this guide, we’ll walk through two ways to do just that:

  • First, the traditional route using FFmpeg for full control.
  • Then, how to do it in a single API call using FastPix - with fade-in, fade-out, and timestamp-based overlays built right in.

How to overlay audio on video using FFmpeg

Most developers don’t want to handcraft media pipelines - but they do it anyway because FFmpeg is the default. It’s open-source, powerful, and everywhere. Need to overlay audio on a video? Sure, FFmpeg can do it - if you’re okay writing commands that feel more like debugging compiler errors than building features.

Here’s the reality: FFmpeg gives you low-level control over audio placement, volume, fades, and sync. But everything has to be spelled out - timestamps, offsets, filters, codecs, stream maps. One wrong flag and your overlay is either out of sync or doesn’t render at all.

So why are we even talking about it? Because it’s where most teams start. They build something quick, test it locally, and wrap it in a script or background job. But the moment your users want dynamic overlays - multiple tracks, precise sync, fades, intros, outros - you’re stuck maintaining a brittle setup.

If you’re building something that others will use - a platform where users upload content and expect things to “just work” - FFmpeg isn’t scalable on its own. You’ll either wrap it in a service or switch to an API like FastPix that does the heavy lifting for you.

That said, here’s how FFmpeg handles audio overlay - in case you want to try it, wrap it, or understand how it works before moving to the API model.

1. Understanding the timeline

To properly overlay audio, you must understand your timeline. A timeline is a visual representation of a video's sequence of events, where you can arrange and edit video clips, audio, and other media. The most critical elements to focus on are:

  • Timestamps: Each point in the timeline corresponds to a specific moment in the video.
  • Audio sync points: If you’re adding audio that must sync with specific actions in the video (e.g., dialogue or effects), you must ensure they align perfectly.
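
FFmpeg accepts positions either as plain seconds or as `HH:MM:SS` strings, so it can help to normalize sync points to seconds before comparing them. A minimal Python sketch; `to_seconds` is a hypothetical helper name, not part of FFmpeg:

```python
def to_seconds(timestamp: str) -> float:
    """Convert "SS", "MM:SS", or "HH:MM:SS[.mmm]" into seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60x the one after it.
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


if __name__ == "__main__":
    print(to_seconds("00:00:05"))    # 5.0
    print(to_seconds("01:02:03.5"))  # 3723.5
```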

2. Preparing your files

You'll need your video file and the audio file you plan to overlay. Ensure that both are in compatible formats. FFmpeg supports a wide range of formats, including MP4, MOV, WAV, and MP3.

Here’s an example setup:

  • Video file: video.mp4
  • Audio file: background_music.mp3

3. Basic audio overlay using FFmpeg

FFmpeg is powerful for this task because it provides detailed control over timing, quality, and transitions.

Here’s a simple command to overlay audio:

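Code snippet:

ffmpeg -i video.mp4 -i background_music.mp3 -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest output.mp4

    `-i video.mp4`: This specifies the input video file.

    `-i background_music.mp3`: This adds the audio file.

    `-map 0:v:0 -map 1:a:0`: Takes the video stream from the first input and the audio stream from the second. Without explicit mapping, FFmpeg chooses one audio stream on its own, which may be the video's original track rather than the music.

    `-c:v copy`: Copies the video without re-encoding.

    `-c:a aac`: Encodes the audio using AAC codec for compatibility.

    `-shortest`: Stops writing when the shorter of the two streams ends.

This command takes the video from "video.mp4," uses the audio from "background_music.mp3" as the output's audio track, keeps the original video codec without altering its quality, encodes the audio in AAC format, and outputs the final result as "output.mp4." The `-strict experimental` flag seen in older tutorials is only needed on old FFmpeg builds where the built-in AAC encoder was still marked experimental.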
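
The basic command produces a single audio track from one source. To hear the video's own audio and the music together, one option is FFmpeg's `amix` filter. The Python sketch below only builds the argument list, which you could then hand to `subprocess.run`. It assumes the video already has an audio stream, and `build_mix_command` is a hypothetical helper name:

```python
import shlex


def build_mix_command(video: str, music: str, output: str) -> list[str]:
    """Build an ffmpeg command that mixes `music` over the video's own audio."""
    # duration=first ends the mix when the first input (the video's audio) ends.
    graph = "[0:a][1:a]amix=inputs=2:duration=first[aout]"
    return [
        "ffmpeg", "-i", video, "-i", music,
        "-filter_complex", graph,
        "-map", "0:v:0", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac",
        output,
    ]


if __name__ == "__main__":
    cmd = build_mix_command("video.mp4", "background_music.mp3", "output.mp4")
    print(shlex.join(cmd))
```

Note that `amix` rescales its inputs by default, so the mixed result may sound quieter than either source on its own.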

4. Syncing audio with the video timeline

If you need the audio to start at a specific time, you can adjust the timing with FFmpeg’s `-itsoffset` option, which applies to the input that follows it:

Code snippet:

ffmpeg -i video.mp4 -itsoffset 00:00:05 -i background_music.mp3 -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac output.mp4

In this example, the audio will start 5 seconds after the video begins. This approach is useful if the audio should align with a certain event or visual cue in the video.
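
`-itsoffset` shifts the timestamps of a whole input. When the audio is already going through a filter graph, the `adelay` filter is another way to get a similar delay, by padding the start of the track with silence. A small Python sketch, assuming a stereo music file by default (`adelay` takes one delay per channel, in milliseconds); `delay_filter` is a hypothetical helper name:

```python
def delay_filter(start_seconds: float, channels: int = 2) -> str:
    """Return an adelay filter string that pads the start of a track with silence."""
    if start_seconds < 0:
        raise ValueError("adelay can only push audio later, not earlier")
    millis = round(start_seconds * 1000)
    # One delay value per channel, separated by "|".
    return "adelay=" + "|".join([str(millis)] * channels)


if __name__ == "__main__":
    print(delay_filter(5))  # adelay=5000|5000
```

The returned string can be placed inside `-filter_complex`, for example as `[1:a]adelay=5000|5000[music]`.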

5. Advanced synchronization using timestamps

Sometimes, basic time offsets aren't enough. For instance, you might need to fade in music at a particular moment or adjust audio levels over time. FFmpeg allows you to control audio precisely along the timeline:

Fade in:

ffmpeg -i video.mp4 -i background_music.mp3 -filter_complex "[1:a]afade=t=in:st=5:d=3[music]" -map 0:v:0 -map "[music]" -c:v copy -c:a aac output.mp4

This command fades the music in, starting at 5 seconds and lasting for 3 seconds. The music is silent before the fade begins.

Adjusting audio volume:

ffmpeg -i video.mp4 -i background_music.mp3 -filter_complex "[1:a]volume=0.5[music]" -map 0:v:0 -map "[music]" -c:v copy -c:a aac output.mp4

This command halves the volume of the background music.
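
Fades and volume changes can also be chained in a single filter string, separated by commas. The Python sketch below composes a fade-in, a fade-out, and a volume change for the second input; `music_chain` and the `[music]` label are illustrative names, and the timings are placeholders:

```python
def music_chain(fade_in_start: float, fade_in_len: float,
                fade_out_start: float, fade_out_len: float,
                volume: float = 0.5) -> str:
    """Chain fade-in, fade-out, and volume into one filter string for input 1."""
    filters = [
        f"afade=t=in:st={fade_in_start:g}:d={fade_in_len:g}",
        f"afade=t=out:st={fade_out_start:g}:d={fade_out_len:g}",
        f"volume={volume:g}",
    ]
    # Filters in a chain are applied left to right.
    return "[1:a]" + ",".join(filters) + "[music]"


if __name__ == "__main__":
    print(music_chain(5, 3, 50, 4))
    # [1:a]afade=t=in:st=5:d=3,afade=t=out:st=50:d=4,volume=0.5[music]
```

Pass the result to `-filter_complex` and select it with `-map "[music]"`.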

The FFmpeg commands above require solid video engineering knowledge, and writing them is tedious for beginners or for developers who want a quick solution without the intricacies. Many platforms offer these features without hand-built pipelines, and one of the best options is the FastPix API.

How to use FastPix API to overlay audio on video

FastPix is a full-stack video API platform built for developers who are tired of stitching together complex video workflows from scratch. Instead of juggling tools like FFmpeg, storage layers, and encoding pipelines, FastPix lets you do everything - upload, transform, analyze, stream, protect - through a single API.

It’s designed for real products, not prototypes. If you’re building anything from a short video app to a podcast tool to an internal training platform, FastPix helps you ship media features faster - without wrangling low-level video engineering.

Unlike FFmpeg, where you manage every flag and filter manually, FastPix lets you overlay audio tracks with just a few JSON parameters. You specify what to play, when to play it, and how it should fade in or out - and FastPix takes care of syncing, mixing, and rendering.

The API works for both uploaded files and those stored in public URLs, and scales naturally for user-generated content, dynamic timelines, and programmatic customization.

Let’s walk through how it works. (or see detailed guide)

Basic audio overlay using API

To overlay audio on a video, you include an `imposeTracks` parameter inside the audio input of the payload. Each object in `imposeTracks` defines:

  • url: the audio file to overlay
  • startTime: when it should begin (in seconds)
  • endTime: when it should stop
  • fadeInLevel: optional fade-in duration
  • fadeOutLevel: optional fade-out duration

Here’s a simple example that overlays a single track:

POST https://v1.fastpix.io/on-demand

{
    "inputs": [
        {
            "type": "video",
            "url": "https://static.fastpix.io/sample.mp4",
            "startTime": 0,
            "endTime": 60
        },
        {
            "type": "audio",
            "imposeTracks": [
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
                    "startTime": 0,
                    "endTime": 5,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 4
                }
            ]
        }
    ]
}

In the payload above, the input with "type": "audio" carries an `imposeTracks` parameter, which holds a list of JSON objects. Each object specifies when the overlay audio starts and ends on the video, along with its fade-in and fade-out durations.
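
If you generate these payloads from application code, a small builder keeps the structure in one place. The Python sketch below only constructs and prints the JSON body. The field names come from the payload above, while `impose_track` and `overlay_payload` are hypothetical helper names, and sending the request (including authentication) is left out because it is not covered here:

```python
import json


def impose_track(url, start, end, fade_in=None, fade_out=None):
    """Build one entry for the imposeTracks list."""
    track = {"url": url, "startTime": start, "endTime": end}
    # The fade fields are optional, so only include them when given.
    if fade_in is not None:
        track["fadeInLevel"] = fade_in
    if fade_out is not None:
        track["fadeOutLevel"] = fade_out
    return track


def overlay_payload(video_url, video_end, tracks):
    """Build a request body with one video input and one audio input."""
    return {
        "inputs": [
            {"type": "video", "url": video_url, "startTime": 0, "endTime": video_end},
            {"type": "audio", "imposeTracks": list(tracks)},
        ]
    }


if __name__ == "__main__":
    track = impose_track(
        "https://fastpix-audio.com/example-impose-audio-track.m4a", 0, 5, 1, 4
    )
    body = overlay_payload("https://static.fastpix.io/sample.mp4", 60, [track])
    print(json.dumps(body, indent=4))
```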

Overlaying multiple audio tracks

You can also overlay several audio tracks at different points on the timeline.

Let's assume you have a 1-minute video and want to overlay different audio at different timestamps, each with its own fade-in and fade-out. Doing this with a hand-written FFmpeg command quickly becomes complicated and hard to read. With the payload below, it takes one more object per track.

POST https://v1.fastpix.io/on-demand

{
    "inputs": [
        {
            "type": "video",
            "url": "https://static.fastpix.io/sample.mp4",
            "startTime": 0,
            "endTime": 60
        },
        {
            "type": "audio",
            "imposeTracks": [
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
                    "startTime": 0,
                    "endTime": 13,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 4
                },
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track-1.m4a",
                    "startTime": 14,
                    "endTime": 23,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 2
                },
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track-2.m4a",
                    "startTime": 24,
                    "endTime": 60,
                    "fadeInLevel": 2,
                    "fadeOutLevel": 4
                }
            ]
        }
    ]
}

In the payload above, the `imposeTracks` list contains multiple JSON objects. Each object has its own URL, start time, end time, and fade settings, which makes it easy to overlay audio onto any video asset with customizable options.
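
When track windows come from user input, it can be worth checking them before building the request. The Python sketch below flags tracks that run past the end of the video or overlap each other. It assumes you want non-overlapping windows, as in the example above; whether the API itself rejects overlapping windows is not stated here, and `check_tracks` is a hypothetical helper name:

```python
def check_tracks(tracks, video_duration):
    """Return a list of problems found in an imposeTracks list (empty if none)."""
    problems = []
    ordered = sorted(tracks, key=lambda t: t["startTime"])
    for i, track in enumerate(ordered):
        if track["startTime"] >= track["endTime"]:
            problems.append(f"track {i}: startTime must be before endTime")
        if track["endTime"] > video_duration:
            problems.append(f"track {i}: ends after the video does")
        # Compare against the previous track in start-time order.
        if i > 0 and track["startTime"] < ordered[i - 1]["endTime"]:
            problems.append(f"track {i}: overlaps the previous track")
    return problems


if __name__ == "__main__":
    example = [
        {"startTime": 0, "endTime": 13},
        {"startTime": 14, "endTime": 23},
        {"startTime": 24, "endTime": 60},
    ]
    print(check_tracks(example, video_duration=60))  # []
```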

Applications of audio overlay

| Industry / Use Case | How Audio Overlay is Used |
| --- | --- |
| Media production (Film, TV, video) | Add background music, voiceovers, or sound effects (footsteps, explosions, etc.) to match visuals. |
| Podcasts | Insert intro/outro music for branding; layer interviews with ambiance or promo clips. |
| Gaming | Add narration over gameplay; overlay soundtracks and effects to enhance immersion. |
| Advertising | Use jingles, slogans, or voiceovers to make product videos more engaging; build consistent audio branding. |
| Music production | Mix vocals and instruments, create mashups, or produce remixes with new layers. |
| Education | Add voiceover explanations in training videos; overlay translations or pronunciation guides. |
| VR / AR | Use spatial audio overlays to create immersive soundscapes that react to user interactions. |

These are just a few of the many ways audio overlay shows up in real products. Whether you’re building for entertainment, education, or immersive experiences, getting audio right is what makes video feel polished and professional.

Other transformations you can do with FastPix

If you’re thinking about audio overlay, chances are you’ll also need other transformations to polish or customize your videos. With FastPix, these don’t require complex scripts - just simple API calls. A few common ones:

  • Replace existing audio
    Swap out an original audio track with a new one - useful for voiceovers, alternate languages, or fixing poor-quality recordings.  
  • Add intro and outro
    Automatically attach branded intros or closing segments to your videos without manual editing.
  • Optimize audio loudness
    Balance volume levels across tracks so dialogue, music, and effects sound consistent to your viewers.
  • Remove unwanted parts
    Trim silence, cut out mistakes, or drop filler sections without re-encoding the entire video. See guide.

Each of these features works just like audio overlay - simple JSON inputs instead of long FFmpeg commands. And because they’re API-first, they slot directly into whatever product or workflow you’re building.

What’s even more powerful is how these transformations can be combined with audio overlay to create richer, production-ready results.

Creating advanced media with FastPix

By chaining operations together, you can handle multiple transformations in a single API call. For example:

  • Audio replace + audio overlay
  • Intro + outro + audio overlay
  • Audio overlay + trim media

Instead of managing separate scripts or workflows, you declare everything once in JSON and let FastPix handle the heavy lifting. This makes it easy to build advanced media features that feel seamless to your end users.


Audio replace + audio overlay

With the FastPix API, replacing audio in a video file and adding background music (BGM) at specific timestamps is a straightforward process. The API allows you to seamlessly replace the existing audio and overlay BGMs in one operation. This can all be done using a single API payload, as demonstrated below:

POST https://v1.fastpix.io/on-demand

{
    "inputs": [
        {
            "type": "video",
            "url": "https://static.fastpix.io/sample.mp4",
            "startTime": 0,
            "endTime": 60
        },
        {
            "type": "audio",
            "swapTrackUrl": "https://fastpix-audio.com/example-impose-audio-track.m4a",
            "imposeTracks": [
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
                    "startTime": 0,
                    "endTime": 5,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 4
                }
            ]
        }
    ]
}

In the payload above, the main video URL is provided, along with audio tracks for replacement and overlay. The track in `swapTrackUrl` replaces the existing audio, while the tracks in `imposeTracks` overlay new audio at the specified timestamps. This allows both operations, audio replacement and overlaying, to be performed in a single request.

Intro + outro + audio overlay

With the FastPix API, you can add an intro and outro to a video file while also overlaying background music (BGM) at specific timestamps. This entire operation can be executed using a single API payload, as demonstrated below:

POST https://v1.fastpix.io/on-demand

{
    "inputs": [
        {
            "type": "video",
            "url": "https://static.fastpix.io/sample.mp4",
            "introUrl": "https://static.fastpix.io/sample-1.mp4",
            "outroUrl": "https://static.fastpix.io/sample-2.mp4",
            "startTime": 0,
            "endTime": 60
        },
        {
            "type": "audio",
            "imposeTracks": [
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
                    "startTime": 0,
                    "endTime": 5,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 4
                }
            ]
        }
    ]
}

In the payload above, the video input carries the main video URL together with `introUrl` and `outroUrl`, and the input with "type": "audio" lists the impose tracks. This attaches the intro and outro and overlays audio at the specified timestamps with just one payload.

Audio overlay + trim media

You can perform this operation using a single payload as shown below:

POST https://v1.fastpix.io/on-demand

{
    "inputs": [
        {
            "type": "video",
            "url": "https://static.fastpix.io/sample.mp4",
            "startTime": 0,
            "endTime": 30
        },
        {
            "type": "audio",
            "imposeTracks": [
                {
                    "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
                    "startTime": 0,
                    "endTime": 5,
                    "fadeInLevel": 1,
                    "fadeOutLevel": 4
                }
            ]
        }
    ]
}

In the payload above, the `startTime` and `endTime` parameters on the video input set the trim range, and the input with "type": "audio" lists the impose tracks. This trims the video and overlays audio at the specified timestamps with just one payload.
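
The three combined payloads differ only in which optional fields are present, so they can share one builder. The Python sketch below uses the field names from the payloads in this section (`swapTrackUrl`, `introUrl`, `outroUrl`, `imposeTracks`); `build_payload` and its argument names are hypothetical:

```python
import json


def build_payload(video_url, start, end, impose_tracks,
                  swap_track_url=None, intro_url=None, outro_url=None):
    """Build a request body, adding optional fields only when they are set."""
    video = {"type": "video", "url": video_url}
    if intro_url:
        video["introUrl"] = intro_url
    if outro_url:
        video["outroUrl"] = outro_url
    # startTime and endTime double as the trim range for the video.
    video["startTime"] = start
    video["endTime"] = end

    audio = {"type": "audio"}
    if swap_track_url:
        audio["swapTrackUrl"] = swap_track_url
    audio["imposeTracks"] = list(impose_tracks)
    return {"inputs": [video, audio]}


if __name__ == "__main__":
    track = {
        "url": "https://fastpix-audio.com/example-impose-audio-track.m4a",
        "startTime": 0, "endTime": 5, "fadeInLevel": 1, "fadeOutLevel": 4,
    }
    # Overlay + trim: keep only the first 30 seconds of the video.
    body = build_payload("https://static.fastpix.io/sample.mp4", 0, 30, [track])
    print(json.dumps(body, indent=4))
```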

Conclusion

Audio overlay is just one piece of building modern video experiences. With FastPix, it doesn’t stop there - you can replace tracks, stitch intros and outros, trim media, balance audio levels, and layer it all together in a single workflow.

The goal isn’t just to make editing easier. It’s to give developers the tools to build polished, production-ready features that scale - without worrying about timelines, offsets, or infrastructure.  

Whether you’re building a creator platform, a podcast tool, or an education app, FastPix helps you deliver videos that sound as good as they look.

Try FastPix free with 25 credits → fastpix.io/signup

If you want to see it in action first, book a walkthrough with our team.

Want to build it yourself? → Explore the Video on Demand guide.

Frequently Asked Questions (FAQs)

What is audio overlay in video editing?

Audio overlay refers to adding or replacing audio tracks in a video. This can involve background music, voiceovers, or sound effects, enhancing the video’s overall experience.

How do I sync audio with video using FFmpeg?

To sync audio with video using FFmpeg, you adjust the audio’s start time, length, or playback speed to match the video's timing. The -itsoffset option allows you to delay or advance the audio to sync with video frames. You can also trim or loop the audio if needed to ensure it aligns perfectly with the video’s events or scene changes.

Can I replace the existing audio in a video using FFmpeg?

Yes, FFmpeg can replace the existing audio in a video with a new audio file. You use the -map option to select the video stream from the original file and the audio stream from the new file, and copy the video stream (-c:v copy) so its quality is unchanged. This method keeps the video intact while swapping out the audio.

How does FastPix handle large video files during audio overlay?

FastPix handles large video files efficiently by breaking them into smaller chunks for seamless processing. This ensures smooth overlay application without affecting video quality or performance.

What is the best format for adding audio to a video?

MP3 and AAC are the most common formats for audio overlay. AAC is generally preferred for its superior quality at lower bit rates, making it ideal for high-quality video projects. MP3 is also widely used due to its compatibility and decent audio quality, but it may not provide the same compression efficiency as AAC for video projects.

Can I overlay multiple audio tracks in a video using FFmpeg?

Yes, you can overlay multiple audio tracks using FFmpeg. Add each audio file as its own input, then combine them into a single output track with the amix filter, using the volume filter to adjust levels and the adelay filter to set when each track starts. The -map option on its own only selects streams; it does not mix them. This is useful for creating complex soundscapes or layering different audio elements like background music, voiceovers, and sound effects.
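
One way to combine several tracks into a single output is a filter graph that delays and scales each audio input and then mixes them with `amix`. The Python sketch below builds such a graph string. It assumes input 0 is the video, inputs 1..N are stereo audio files, and `mix_graph` plus the `[aout]` label are illustrative names:

```python
def mix_graph(tracks):
    """Build a filter graph from (start_seconds, volume) pairs for inputs 1..N."""
    parts, labels = [], []
    for index, (start, volume) in enumerate(tracks, start=1):
        millis = round(start * 1000)
        label = f"[a{index}]"
        # Delay both stereo channels, then scale the track's volume.
        parts.append(f"[{index}:a]adelay={millis}|{millis},volume={volume:g}{label}")
        labels.append(label)
    # Mix every prepared track into one output stream.
    parts.append(f"{''.join(labels)}amix=inputs={len(labels)}:duration=longest[aout]")
    return ";".join(parts)


if __name__ == "__main__":
    print(mix_graph([(0, 0.5), (14, 1.0)]))
    # [1:a]adelay=0|0,volume=0.5[a1];[2:a]adelay=14000|14000,volume=1[a2];[a1][a2]amix=inputs=2:duration=longest[aout]
```

Pass the string to `-filter_complex` and select the result with `-map "[aout]"` alongside `-map 0:v:0`.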
