What is real-time transcoding?

December 19, 2025
6 min
Video Engineering

When you upload a video and it takes time before it can be watched, what’s usually happening is pre-encoding. The platform immediately creates multiple versions of that video, at different resolutions and bitrates, so it’s ready for every possible viewer before anyone presses play. It’s a safe approach, and for a long time it was the default.

It also means you pay the full cost upfront. Every upload triggers encoding jobs, storage, and delivery setup, whether the video is watched once, a thousand times, or never.

That model breaks down fast on UGC platforms. Thousands of short videos get uploaded every day, most of which won’t see sustained traffic. But the system treats them all the same, and costs scale with uploads, not views.

Real-time transcoding changes that. You store a single master file. Nothing gets encoded until someone hits play.

TL;DR – What’s inside

This guide breaks down how real-time transcoding works and why it matters for modern video platforms. Inside, you’ll find:

  • What real-time transcoding is (and how it differs from traditional pipelines)
  • How packaging works alongside transcoding to enable instant playback
  • The limitations of legacy models like linear, file-based, and edge packaging
  • A step-by-step walkthrough of the real-time pipeline
  • Side-by-side comparison: traditional vs. real-time
  • Real-world use cases (OTT, live-to-VOD, multilingual, personalized playback)
  • Caching strategies for on-demand systems
  • How to use the FastPix API to request format, resolution, language, and more
  • Infra savings from switching to real-time: storage, speed, and CDN cost

What is real-time transcoding?

At the core, it’s simple: you store one high-quality source, usually a mezzanine file. Nothing else is generated in advance.

Then a viewer shows up.

At that moment, your system makes a set of decisions:

  • What device is requesting the stream? That determines codec compatibility (AV1, H.264, HEVC), screen resolution, and decoding capabilities.
  • What kind of network are they on? This helps pick a target bitrate, so you don’t stream 1080p to a phone on spotty 3G.
  • What context applies to this viewer? Are they in a restricted region? Do they need captions? Do you need to add a watermark or dynamic branding layer?


Based on that input, you transcode exactly what’s needed: one or more renditions, at the right bitrates, using the right codec, and then package the output into segments for delivery.

That includes:

  • Generating .ts or .mp4 chunks
  • Creating a fresh HLS or DASH manifest (.m3u8, .mpd)
  • Making it available for playback instantly.


It’s encoding + packaging + delivery, all stitched together at request time. So instead of pre-rendering five renditions of every video and hoping one of them is a close match, you generate the right one when it’s actually needed. This makes your pipeline reactive, not predictive.
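To make that concrete, here’s a minimal sketch of the kind of per-request decision logic described above. The type names, bitrate thresholds, and codec fallbacks are illustrative assumptions, not a prescribed implementation:

```typescript
// Illustrative only: the inputs (device, network, context) come from the playback
// request; the thresholds and codec preferences below are assumptions.

type PlaybackRequest = {
  device: { supportsAV1: boolean; supportsHEVC: boolean; maxHeight: number };
  network: { downlinkKbps: number }; // estimated throughput for this session
  context: { needsWatermark: boolean; captionLang?: string };
};

type RenditionPlan = {
  codec: "av1" | "hevc" | "h264";
  height: number;
  videoBitrateKbps: number;
  watermark: boolean;
  captionLang?: string;
};

function planRendition(req: PlaybackRequest): RenditionPlan {
  // Prefer the most efficient codec the device can actually decode.
  const codec = req.device.supportsAV1 ? "av1"
              : req.device.supportsHEVC ? "hevc"
              : "h264";

  // Cap resolution by both the screen and the available bandwidth (rough cutoffs).
  const byNetwork = req.network.downlinkKbps > 6000 ? 1080
                  : req.network.downlinkKbps > 3000 ? 720
                  : 480;
  const height = Math.min(req.device.maxHeight, byNetwork);

  // Leave headroom so audio and container overhead fit inside measured throughput.
  const videoBitrateKbps = Math.round(req.network.downlinkKbps * 0.7);

  return {
    codec,
    height,
    videoBitrateKbps,
    watermark: req.context.needsWatermark,
    captionLang: req.context.captionLang,
  };
}

// Example: a phone on a slow connection ends up with a 480p H.264 rendition.
console.log(planRendition({
  device: { supportsAV1: false, supportsHEVC: false, maxHeight: 1080 },
  network: { downlinkKbps: 2500 },
  context: { needsWatermark: true, captionLang: "es" },
}));
```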

You just read how real-time transcoding kicks in the moment someone hits play, encoding exactly what’s needed based on device, network, and viewer context.

But that’s only half the job.

Packaging is what makes it play

Transcoding gives you raw video. But that’s not what the player needs.

If you’re using HLS or DASH (and you almost certainly are), the stream has to be:

  • Segmented into short chunks (.ts, .mp4)
  • Packaged with a manifest (.m3u8, .mpd)
  • Structured to support ABR (adaptive bitrate)

This isn’t a separate batch step. In a real-time pipeline, it happens immediately after encoding, before the first segment is even requested.

Packaging is what turns the output into a playable stream. That means:

  • Cutting the video into segments (usually 2–6 seconds)
  • Generating the manifest that points to those segments
  • Building the bitrate ladder so the player can switch qualities

And it needs to be fast. Lag here means buffering icons, playback stalls, or worse, a video that doesn’t play at all. In real-time systems, transcoding and packaging are tightly connected. Transcoding decides what gets delivered. Packaging decides how it reaches the viewer. If either one is missing, the pipeline breaks.
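As a rough illustration of what the packaging step produces, here’s a sketch that assembles an HLS master playlist from a chosen bitrate ladder. The variants and URIs are made-up examples; a real packager derives them from the renditions selected at request time:

```typescript
// Minimal sketch of the manifest a player requests first. Only the standard HLS
// tags are used; the ladder below is an example, not a recommendation.

type Variant = { bandwidth: number; width: number; height: number; uri: string };

function buildHlsMasterPlaylist(variants: Variant[]): string {
  const lines = ["#EXTM3U", "#EXT-X-VERSION:6"];
  for (const v of variants) {
    lines.push(
      `#EXT-X-STREAM-INF:BANDWIDTH=${v.bandwidth},RESOLUTION=${v.width}x${v.height}`,
      v.uri,
    );
  }
  return lines.join("\n") + "\n";
}

console.log(buildHlsMasterPlaylist([
  { bandwidth: 800_000,   width: 640,  height: 360,  uri: "360p/index.m3u8" },
  { bandwidth: 2_800_000, width: 1280, height: 720,  uri: "720p/index.m3u8" },
  { bandwidth: 5_000_000, width: 1920, height: 1080, uri: "1080p/index.m3u8" },
]));
```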

How video was packaged before real-time

Before packaging and transcoding were coupled together, there were a few standard ways to get video ready for delivery. Each one solved a specific problem. None of them were built for what we expect today: fast playback, session-level logic, or efficiency at scale.

Let’s walk through the three common models, and where they fall short.

Linear packaging

How linear packaging works

This diagram shows how traditional live streaming pipelines work.

  1. A live video signal (usually via SDI or another ingest) is encoded on-prem.
  2. That stream is pushed over the internet to a transcoder.
  3. The transcoded stream, typically a single-bitrate MPEG-TS, is then passed to a packager.
  4. The packager segments the stream into HTTP-based chunks (for HLS or DASH).
  5. Those segments and manifests are sent to the CDN for delivery.

The CDN then replicates segments across edge locations for faster delivery.

Why it worked:

  • Solid choice for scheduled, cable-style broadcasts
  • Easy to plug into legacy encoder setups

Why it doesn’t scale today:

  • Latency builds up: everything flows sequentially, ingest → encode → segment → distribute.
  • Limited ABR logic: the bitrate ladder is fixed.
  • Ad insertion is hard: timing needs to be pre-aligned.
  • CDN strain: every rendition is pushed and cached, whether it’s used or not.
  • Viewer-specific logic? Not possible.

This model assumes every viewer gets the same stream. That’s the core limitation.

File-based packaging

Used primarily for on-demand content, this method pre-processes everything. You take a video file, convert it into HLS/DASH chunks, generate the manifest, and store all of it, ready for whenever someone wants to stream it.

How file-based packaging works

 

Workflow:

  • Input: encoded TS or MP4 file
  • Output: full HLS/DASH package (segments + manifests)
  • Delivery: origin → CDN → player

Why it worked:

  • Easy to cache and distribute
  • Clean pipeline for static content

Where it breaks:

  • No per-session control: everything is baked up front
  • Storage-heavy: multiple renditions stored for every asset
  • Bandwidth waste: you deliver the full file, even if the user watches 20%
  • ABR is fixed: network conditions can’t be responded to in real time
  • Slow to update: if a caption changes, you re-package the whole thing

Edge packaging

Edge packaging tries to reduce the delay and network load by pushing the packager out to the CDN’s edge, closer to the viewer. Instead of packaging at a central origin, the stream is segmented at or near the CDN’s edge node, then handed off for delivery.

How edge packaging works

Why it helps:

  • Lowers roundtrip time for packaging
  • Reduces load on the origin
  • Speeds up live segment availability


But it comes with trade-offs:

  • Infra overhead: now you’re managing packaging software at distributed edge locations
  • Content sync is messy: manifests and updates must be propagated carefully
  • Relies on edge stability: performance varies by region
  • Still pre-defined: even though it’s packaged late, you’re still using pre-encoded formats
  • Security and cache invalidation get tricky

None of these models are wrong. They were built for a different kind of video, when streams were predictable, devices were similar, and latency wasn’t a dealbreaker.

But today, that’s not the workload.

You’re dealing with thousands of device types. Fluctuating network conditions. Viewers who expect video to start instantly and adapt in real time. You need decisions made per session, not per file.

How real-time transcoding works

The diagram illustrates how a single master file moves through the pipeline, getting transformed and delivered based on each playback request:

How real-time transcoding works

  1. File input & storage: A high-quality master file is uploaded and stored. This becomes the mezzanine source that every downstream step relies on.
  2. File-based packaging: From storage, the master is processed by a file-based packager into CMAF (Common Media Application Format). This produces mezzanine ABR files ready for reuse.
  3. Mezzanine storage: These packaged mezzanine files are stored, so you don’t need to regenerate renditions for every future request. They act as the working set between the original master and the final playback formats.
  4. Packaging for playback: When a playback request comes in, the CMAF mezzanine is packaged into the exact streaming protocol required (HLS, DASH, or MSS), so it fits the device, network, and player environment.
  5. CDN delivery: The CDN distributes those packaged streams to all endpoints. Edge caching ensures repeat requests are served instantly without reprocessing.
  6. Playback on devices: Smart TVs, mobile apps, and browsers receive the right stream format and resolution, ensuring smooth playback without unnecessary overhead.
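Condensed into code, steps 3 to 5 look roughly like the sketch below. The function names and in-memory cache are placeholders standing in for real storage, packager, and CDN integrations; they are not part of any particular SDK:

```typescript
// A condensed sketch of the request-time path: check for an already-packaged
// output, otherwise pull the CMAF mezzanine and package it for this session.

type Protocol = "hls" | "dash" | "mss";

// Stand-in cache of packaged outputs, keyed by asset + protocol + rendition.
const packagedCache = new Map<string, string>();

async function fetchCmafMezzanine(assetId: string): Promise<Uint8Array> {
  // In a real pipeline this reads the CMAF mezzanine (step 3) from storage.
  return new Uint8Array();
}

async function packageForPlayback(
  mezzanine: Uint8Array,
  protocol: Protocol,
  rendition: string,
): Promise<string> {
  // Stand-in for the just-in-time packager (step 4): returns a manifest URL.
  const ext = protocol === "dash" ? "mpd" : "m3u8";
  return `https://cdn.example.com/${protocol}/${rendition}/index.${ext}`;
}

async function handlePlaybackRequest(
  assetId: string,
  protocol: Protocol,
  rendition: string,
): Promise<string> {
  const key = `${assetId}:${protocol}:${rendition}`;

  // Step 5: repeat requests are served without reprocessing.
  const cached = packagedCache.get(key);
  if (cached) return cached;

  // Steps 3-4: pull the mezzanine and package it for this specific request.
  const mezzanine = await fetchCmafMezzanine(assetId);
  const manifestUrl = await packageForPlayback(mezzanine, protocol, rendition);

  packagedCache.set(key, manifestUrl);
  return manifestUrl;
}

handlePlaybackRequest("asset-123", "hls", "720p").then(console.log);
```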

| Area | Traditional pipelines | Real-time transcoding |
| --- | --- | --- |
| Storage | Multiple renditions stored per video | Single high-quality master stored once |
| Latency | Fetch and buffer delay before playback | Encode and stream instantly on request |
| Playback logic | Pre-defined ladders fixed in advance | Session-aware delivery, adaptive to context |
| Format support | Limited to pre-encoded renditions | Any format generated per request |
| Content updates | Requires re-encoding the entire video | Adjust logic without reprocessing storage |

Use cases for real-time transcoding

Modern video products demand flexibility. Traditional pipelines with pre-encoded ladders can’t keep up with unpredictable formats, audiences, and workloads. Here’s where developers lean on real-time transcoding:

1. Playback in fast-moving platforms

Who it’s for: News outlets, OTT apps, UGC platforms
Why it matters: New videos appear constantly, and playback has to adapt instantly to device type, bandwidth, and viewer conditions. Real-time pipelines prevent delays and wasted storage on unused renditions.

2. Serving multilingual audiences

Who it’s for: Global streaming apps, education platforms, enterprise training systems
Why it matters: Audiences want subtitles, dubs, or alternate audio tracks in their own language. Real-time pipelines can deliver them per request without multiplying storage costs.

3. Instant live-to-VOD publishing

Who it’s for: Sports, concerts, events, classrooms
Why it matters: Viewers expect replays as soon as the event ends. Real-time transcoding turns live streams into on-demand assets immediately, without overnight processing jobs.

4. Personalized and monetized experiences

Who it’s for: Fitness platforms, e-learning apps, subscription services
Why it matters: User-level overlays, account-based watermarks, or gated content shouldn’t require separate files for every scenario. Real-time pipelines let you apply that logic dynamically at scale.

Caching and delivery logic (What devs should know)

Real-time transcoding changes how you think about caching. You're no longer pre-generating thousands of segments and pushing them to the edge; you're responding to playback demand in real time. That means caching strategies need to evolve too.

Start with what’s worth caching:

  • Manifest files should always be cached. They’re small, frequently requested, and serve as the index for any playback session.
  • Popular video segments, especially the first few seconds of high-traffic videos, are also prime candidates for caching. If dozens of users are hitting the same video intro, there’s no reason to transcode it over and over.
  • Don’t cache full renditions unless demand patterns justify it. Pre-emptively caching an entire ladder for a video most users won’t watch fully leads to storage bloat and unnecessary CDN spend.

Instead of brute-force caching, combine short TTLs (time to live) with edge invalidation logic. That way, you let the CDN serve what it can, while the backend steps in only when something changes: format, user context, or access control.
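One way that split can show up at the delivery layer is in the cache headers you set per asset type. The TTL values and path patterns below are placeholder assumptions meant to show the shape of the policy, not recommended numbers:

```typescript
// Sketch of a per-asset-type caching policy: short TTLs for manifests, longer
// TTLs for the hottest segments, and a modest default for everything else.

function cacheControlFor(path: string): string {
  if (path.endsWith(".m3u8") || path.endsWith(".mpd")) {
    // Manifests: always cached, but briefly, so session logic can change.
    return "public, max-age=30";
  }
  if (/\/(seg|chunk)[-_]?0*[0-5]\.(ts|m4s|mp4)$/.test(path)) {
    // The first few segments of a video are the hottest; keep them warm longer.
    return "public, max-age=3600";
  }
  // Everything else: cache briefly and let real demand decide what stays warm.
  return "public, max-age=120";
}

console.log(cacheControlFor("/v/abc/720p/index.m3u8")); // public, max-age=30
console.log(cacheControlFor("/v/abc/720p/seg-001.ts")); // public, max-age=3600
```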

It’s not just about load reduction. It’s about smarter delivery: caching what matters, skipping what doesn’t, and letting your system stay responsive under pressure.

FastPix real-time transcoding API

The FastPix API gives you direct control over how each video is prepared and delivered: no media pipeline complexity, no FFmpeg scripting, no manual workflows. You store a single master file and use API requests to handle everything else dynamically.

Here’s how it works:

  1. Upload the master file
    Store a high-quality mezzanine version. No renditions or duplicates.

  2. Playback request includes parameters like:
    • format: Choose HLS, DASH, or CMAF
    • resolution: e.g. 720p, 1080p, or device-dependent
    • codec: H.264, H.265, VP9, AV1
    • audio_track and subtitle: For multilingual viewers
    • watermark: For per-user overlays or licensing logic

  3. The API responds with:
    • Transcoded segments
    • ABR manifest (ready for adaptive playback)
    • CDN-ready delivery path
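Put together, a playback request might look something like the sketch below. The endpoint path, payload field names, and response shape are hypothetical stand-ins built from the parameter list above, not the documented FastPix contract; check the FastPix guides and docs for the real API:

```typescript
// Hypothetical request shape only: endpoint, fields, and response schema are
// illustrative stand-ins for the parameters listed above.

async function requestPlayback(assetId: string, apiToken: string): Promise<string> {
  const response = await fetch(`https://api.example.com/v1/assets/${assetId}/playback`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify({
      format: "hls",                  // or "dash" / "cmaf"
      resolution: "720p",             // or leave it device-dependent
      codec: "h264",                  // H.264 / H.265 / VP9 / AV1
      audio_track: "es",              // multilingual audio
      subtitle: "es",                 // multilingual captions
      watermark: { text: "user-42" }, // per-user overlay or licensing logic
    }),
  });

  if (!response.ok) throw new Error(`Playback request failed: ${response.status}`);

  // Expecting something like a manifest URL pointing at the CDN-ready stream.
  const { manifestUrl } = await response.json();
  return manifestUrl as string;
}
```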

You also get webhooks and analytics hooks:

  • video.transcoded.ready: Use this to trigger downstream actions
  • Segment-level performance data: Know which parts of your video are being played, skipped, or abandoned
  • Playback logs: Great for debugging edge cases or handling user complaints
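For the video.transcoded.ready event, a minimal receiver could look like the following. The payload fields (and the absence of signature verification) are assumptions for illustration; verify the actual event schema and security requirements in the FastPix docs:

```typescript
// Minimal webhook receiver sketch for the video.transcoded.ready event.

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/webhooks/fastpix") {
    res.writeHead(404).end();
    return;
  }

  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);

    if (event.type === "video.transcoded.ready") {
      // Trigger downstream actions: notify the app, warm the CDN, update the catalog.
      console.log("Rendition ready for asset:", event.data?.assetId);
    }

    res.writeHead(200).end("ok");
  });
}).listen(3000);
```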

You don’t need to spin up workers, manage queues, or pre-encode anything. The API figures out what’s needed and delivers it. To learn more about what we offer, check our guides and docs.

Infra gains worth bragging about

When teams switch to real-time pipelines, the benefits go beyond performance: they show up in hard infrastructure savings and dev-time wins. Here’s what we see most often:

  • 80% less storage used
    You’re no longer storing 4-6 renditions per video. One master file covers every format and every use case. For large libraries or UGC platforms, this adds up to massive cost savings.

  • 30-40% faster startup time
    Static ladders come with baggage: clients must parse the manifest, probe segments, and buffer before playback starts. With real-time delivery tuned to each session, you avoid unnecessary lookups and get to the first frame faster.

  • Lower CDN egress costs
    By serving smaller, targeted chunks and caching only what’s requested, your CDN usage drops, even as your traffic grows. You're delivering less data per session without sacrificing quality.
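As a back-of-envelope illustration of where the storage number comes from, here’s a quick calculation with assumed file sizes and ladder depth; with these particular numbers the saving is closer to 60%, and deeper or higher-bitrate ladders push it toward the 80% figure above:

```typescript
// Back-of-envelope only: library size, master size, and rendition sizes are
// assumptions, not measured figures.

const assets = 10_000;                      // videos in the library
const masterGb = 2;                         // one high-quality master per video
const ladder = [1.5, 0.9, 0.5, 0.3, 0.15];  // pre-encoded renditions, GB each

const preEncodedGb = assets * (masterGb + ladder.reduce((a, b) => a + b, 0));
const realTimeGb = assets * masterGb;

console.log(`Pre-encoded library: ${preEncodedGb.toLocaleString()} GB`);
console.log(`Single-master library: ${realTimeGb.toLocaleString()} GB`);
console.log(`Saved: ${Math.round((1 - realTimeGb / preEncodedGb) * 100)}%`);
```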

And just as important: your team isn’t building or maintaining this system. You’re calling an API that does the heavy lifting, reliably, scalably, and predictably.

Final words

Real-time transcoding doesn’t just cut infra costs; it clears out delivery bottlenecks. If you’re already personalizing content by user, region, or device, your video pipeline should do the same. FastPix lets you upload once, transcode on demand, and stream exactly what’s needed. Start with one video. See the difference.

Frequently Asked Questions (FAQs)

How does Just-in-Time encoding impact server load and performance?

JIT encoding processes video content in real-time, which can increase server load compared to serving pre-encoded content. To mitigate this, it's essential to optimize server resources and implement efficient encoding algorithms to handle on-the-fly processing without compromising performance.

What are the security considerations when implementing Just-in-Time encoding?

Implementing JIT encoding requires robust security measures to protect content during real-time processing. This includes securing the encoding pipeline, ensuring encrypted transmission of video segments, and implementing access controls to prevent unauthorized access to the content.

How does Just-in-Time encoding affect content delivery networks (CDNs)?

JIT encoding can influence CDN performance by introducing additional processing steps before content delivery. It's crucial to configure CDNs to handle dynamic content efficiently, ensuring that encoded video segments are cached appropriately to reduce latency and improve user experience.

How does Just-in-Time encoding handle adaptive bitrate streaming?

JIT encoding supports adaptive bitrate streaming by generating video segments in multiple quality levels on demand. This allows the streaming service to deliver the best possible video quality based on the viewer's network conditions and device capabilities, enhancing the viewing experience.

How does Just-in-Time encoding help reduce latency during video streaming?

JIT encoding reduces latency by encoding only the segments needed for immediate playback. This minimizes the time between a user’s request and the start of playback, offering faster start times compared to streaming pre-encoded videos, which might involve waiting for large video files to load.
