There is one old and stubborn illusion in the video surveillance market. If two cameras say H.264 or H.265 on the label, then they should behave more or less the same. They should put roughly the same load on the server, consume roughly the same archive space, handle difficult scenes in roughly the same way, and cooperate equally well with recorders, VMS platforms, browsers, and mobile clients. On paper, that idea sounds comforting. In real life, it works about as reliably as the promise “I will just step in for a minute and come right back out.”
The problem is not that there are no standards. The standards do exist, and they are very serious ones. The problem is that these standards mainly define the format of the compressed video bitstream, its syntax, the rules for parsing it, and the rules for decoding it. What they do not define is a single mandatory encoding algorithm. In other words, the standard explains in detail what the result must look like and how it must be read correctly, but it does not force every manufacturer to cook the same dish using the same recipe.
That is exactly why two cameras with the same H.264 or H.265 label can produce very different streams. Their frame structure may differ. Their bitrate control may differ. Their motion analysis depth may differ. Their number of reference frames may differ. Their aggressiveness in compression may differ. Their internal preprocessing may differ. Their hidden presets may differ. And the user, looking at the web interface, usually sees little or none of that. At best, they see buttons labeled “High Quality,” “Smart Codec,” “Storage,” “Balanced,” and other polite euphemisms.
That is where the famous compatibility zoo comes from, the one engineers know all too well. One camera that is “supposedly H.265” works perfectly at a site. Another one is also “supposedly H.265,” and suddenly the server starts gasping for air, the archive inflates, the mobile client stumbles, and the user stares at the whole situation as if they have been deceived not only by the vendor, but by physics itself.
What H.264 and H.265 Actually Define
When people say “H.264 codec” or “H.265 codec,” they often create the false impression that they are talking about one concrete encoding method. In reality, this is not one method, but a set of rules, limits, and tools.
The standard defines how the compressed stream must be structured, what elements are allowed inside it, how they are described, how the decoder must interpret them, how the image must be reconstructed, which profiles and levels are valid, and what limits apply to image size, frame rate, buffers, processing complexity, and compatibility.
But the standard does not say: here is the one true encoder, here is its one true set of decisions, and all cameras in the world must think the same way. A manufacturer is free to choose how exactly to search for motion, how to decide on block partitioning, how to evaluate the importance of details, how to control bitrate, when to insert key frames, how many reference frames to keep, how aggressively to use B-frames, how to suppress noise, and how to save resources on its system-on-chip.
As a result, the standard gives a common grammar for the language, but it does not force everyone to write in the same handwriting. One camera writes neatly like a careful school student. Another writes broadly and with flair. A third writes like a doctor filling out a prescription. Formally, all of them may still remain valid within the same ecosystem.
Why Cameras with the Same “Codec” Encode Differently
The first and main reason is that manufacturers use different encoding engines and different hardware platforms. One camera may have a newer system-on-chip with decent hardware encoding. Another may use an older and rougher one. A third may have its own “smart” archive-saving add-ons. A fourth may focus on low latency rather than image quality.
The second reason is that even with the same formally available set of coding tools, there is still a huge amount of room for choice. Which modes should be enabled by default. Which ones should be disabled. Where should computation be saved. How thoroughly should motion be searched. How deeply should the scene be analyzed. How should night noise be handled. How should the balance between quality and archive size be chosen.
The third reason is that the user usually sees only a small fraction of the actual parameters. In the camera interface there may be resolution, frame rate, bitrate, sometimes profile, sometimes the key frame interval, and more rarely a “smart codec” toggle or a quality mode. But behind those few buttons there are dozens, sometimes hundreds, of internal variables that the manufacturer simply never shows. Sometimes because they do not want to scare the user. Sometimes because they know perfectly well that if they showed everything honestly, the web interface would stop looking like a camera menu and start looking like a spacecraft launch console.
The fourth reason is that before the codec ever touches the image, the image has to be captured. That means noise reduction, sharpening, wide dynamic range, night mode, exposure, automatic gain, anti-flicker settings, white balance, optics, and electronics all step into the game. The camera may significantly alter the original material before encoding begins. And once the source material changes, the codec’s behavior changes too. Sometimes the archive becomes smaller not because the encoding is more brilliant, but because the image was smoothed down in advance until there was simply less left to encode.
Why Old and New Versions of the Standard Do Not Cancel Each Other Out
H.264 and H.265 did not appear on a single day and then freeze forever. These recommendations evolved over many years. New profiles were added. New levels were added. New service fields were added. Clarifications appeared for compatibility, color, bit depth, multiview modes, supplemental messages, and many other details. The standard lives, changes, and gets refined.
But a camera installed in the field lives by different rules. It may have been released five, seven, or ten years ago. Its hardware is fixed. Its encoding pipeline is whatever the vendor made it at the time. It does not turn modern just because the rest of the world moves on. It continues to operate honestly within the subset of capabilities that was built into it from the beginning.
That is why “support for H.264” or “support for H.265” almost never means support for the entire standard in full. Usually it means a very specific subset: certain profiles, certain levels, certain bit depths, certain chroma formats, a certain set of coding tools, buffer and processing limits, and the internal limits of the system-on-chip itself.
Old cameras continue to work not because the standards stopped changing, but because mainstream systems know how to live with different subsets of those standards. But that does not change the fact that newer or more sophisticated streams can turn out to be heavier, more demanding, and less predictable for an older server, decoder, or software client.
What the “Version of H.264” or “Version of H.265” in a Particular Camera Really Means
In engineering practice, the question of the codec version in a camera almost always has to be translated into plain language. Not “which version of H.265,” but “which exact capabilities and limitations from the H.265 ecosystem this camera actually supports.”
You have to look not at the elegant line in the datasheet, but at the actual properties of the stream. The important things are the profile, the level, and for H.265 also the tier, the bit depth, the chroma format, the group-of-pictures structure, the number of reference frames, the bitrate behavior, the presence or absence of B-frames, the intra and inter prediction modes, the service fields, and how all of that is implemented on the specific hardware.
That is exactly why having the same VMS does not by itself guarantee the same performance across different cameras. If a server worked calmly with one brand, that does not mean a new brand with the same H.264 or H.265 label will be just as light. The stability of the surveillance software does not rescue the situation here. What matters is the real stream profile, not the slogan.
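If you want to see that real profile instead of the slogan, the stream will tell you itself. Here is a minimal sketch using ffprobe, assuming it is installed and the camera exposes an RTSP stream; the address below is a placeholder you would replace with your own.

```python
import json
import subprocess

# Placeholder address: substitute the RTSP URL of the actual camera.
STREAM_URL = "rtsp://192.0.2.10:554/stream1"

# Ask ffprobe for the stream-level properties discussed above.
result = subprocess.run(
    [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries",
        "stream=codec_name,profile,level,width,height,"
        "avg_frame_rate,pix_fmt,has_b_frames,refs",
        "-of", "json",
        STREAM_URL,
    ],
    capture_output=True, text=True, check=True,
)

stream = json.loads(result.stdout)["streams"][0]
for key, value in stream.items():
    print(f"{key}: {value}")
```

Two cameras that both say H.265 on the box can return noticeably different answers here, and those answers predict server behavior far better than the label does.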
Why Profile and Level Matter So Much
A profile defines which coding tools are allowed in the stream at all. A level defines limits on image size, processing rate, buffers, and other resources that the decoder must be able to handle. For H.265 there is an additional axis of limitations, the tier, which bounds the permissible stream throughput at each level.
From a practical point of view, that means the following. Profile and level affect much more than just a label in the stream header. They affect decoder requirements, the allowed complexity of the stream, memory, processing speed, and often indirectly the choice of coding tools available.
If the camera produces a stream with a heavier set of capabilities, the server, recorder, or client may need more resources. Even if the resolution looks familiar and the bitrate looks similar. Not all differences live on the surface.
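To make a level feel less abstract, here is the arithmetic behind one well-known data point. H.264 Level 4.0 permits at most 245,760 macroblocks per second, a figure taken from the level table in the specification; everything else is division.

```python
import math

# H.264 Level 4.0: at most 245,760 macroblocks per second,
# per the level table in the H.264 specification.
MAX_MBPS_LEVEL_4_0 = 245_760

def max_fps(width: int, height: int, max_mbps: int = MAX_MBPS_LEVEL_4_0) -> float:
    """Upper bound on frame rate the level permits at a given resolution."""
    # H.264 codes pictures in 16x16 macroblocks, rounding dimensions up.
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return max_mbps / mbs_per_frame

print(f"1080p: {max_fps(1920, 1080):.1f} fps")  # ~30.1 fps
print(f" 720p: {max_fps(1280, 720):.1f} fps")   # ~68.3 fps
```

That is why 1080p at 30 frames per second sits comfortably in Level 4.0, while 1080p at 60 needs a higher level, with bigger buffers and heavier decoder requirements attached.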
Resolution and Frame Rate: The Most Visible, but Not the Only Culprits
Resolution and frame rate always affect load, and there is no point arguing with that. The more pixels there are in each frame and the more frames per second must be processed, the more data must be compressed, transmitted, stored, and later decoded.
But there is an important trap here. Many people think that if the resolution and frame rate are the same, then the load will be about the same too. That is wrong. Resolution and frame rate define only the upper layer of the problem. They do not describe the full character of the stream. Two cameras at 1920×1080 and 25 frames per second can behave very differently if one uses more aggressive B-frames, more reference frames, a different bitrate control mode, and more complex motion analysis.
So yes, resolution and frame rate matter. But by themselves they do not explain the full picture. They are only the first page of the case file, not the entire volume.
Bitrate and Stream Control: This Is Where the Fate of the Archive Is Decided
Bitrate affects archive size almost directly. If a stream really stays in a constant mode, then the archive volume per day or month can be calculated rather predictably. But real life is rarely that tidy.
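The tidy case is still worth writing down, because it is the baseline everything else deviates from. For a genuinely constant stream, archive growth is pure arithmetic:

```python
def archive_gb_per_day(bitrate_mbps: float, cameras: int = 1) -> float:
    """Approximate daily archive growth for a constant-bitrate stream."""
    bytes_per_second = bitrate_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 * cameras / 1_000_000_000

# One 4 Mbps camera: ~43 GB per day, roughly 1.3 TB per month.
print(f"{archive_gb_per_day(4):.1f} GB/day")
print(f"{archive_gb_per_day(4) * 30:.0f} GB/month")
```

With variable bitrate, the same numbers only bound the worst case, and only if the camera actually enforces its cap.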
There are different ways to control the stream. Constant bitrate, variable bitrate with limits, more open-ended variable bitrate, constant quantization modes, modes aimed at constant visual quality, and also the manufacturers’ own “smart” approaches.
With constant bitrate, the archive is relatively predictable. But image quality starts depending more heavily on scene complexity. In a calm environment the image may look good, but in a difficult and noisy scene the encoder begins to suffocate and losses increase.
With variable bitrate or quality-oriented modes, the situation is different. The encoder starts spending bits depending on the scene content. In a quiet hallway the archive behaves modestly. At night, in rain, with noise everywhere, it suddenly starts growing as if disk space were free.
That is why the same “4 Mbps” shown in the interface of two different cameras does not necessarily mean the same real archive size or the same quality on a difficult scene. A great deal depends on how exactly the camera interprets that mode and what internal limits it applies.
Buffers and Level Restrictions: The Hidden Accounting of the Stream
There are parameters that the user rarely sees, but that have a major influence on stream behavior. These are limits on instantaneous bitrate, buffer sizes, and the model of how the stream fills those buffers. They are needed so that the bitstream stays within the declared level and so that the receiving side can digest it.
If the buffers are configured more loosely, the camera can allow sharper bitrate spikes, preserve quality better in difficult moments, and then save bits again in static scenes. If the buffers are tight, the stream becomes more disciplined, but quality in hard scenes may drop more noticeably.
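The standard mental model here is a leaky bucket: the network drains bits at a fixed rate while frames of very unequal size are poured in, and the encoder must never let the bucket overflow. A toy simulation of the idea follows; it is not the actual hypothetical reference decoder math from the specifications, just the shape of it.

```python
def fits_in_buffer(frame_bits, drain_bps, buffer_bits, fps):
    """Check whether a sequence of frame sizes respects a leaky-bucket limit."""
    drained_per_frame = drain_bps / fps
    fullness = 0.0
    for i, bits in enumerate(frame_bits):
        fullness = max(0.0, fullness + bits - drained_per_frame)
        if fullness > buffer_bits:
            print(f"frame {i}: overflow ({fullness:.0f} > {buffer_bits} bits)")
            return False
    return True

# One large key frame followed by 24 small inter frames, 4 Mbps at 25 fps.
gop = [1_200_000] + [120_000] * 24
print(fits_in_buffer(gop, drain_bps=4_000_000, buffer_bits=2_000_000, fps=25))
```

Shrink buffer_bits and the key frame no longer fits, which is exactly the moment a real encoder would be forced to quantize it harder.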
The user almost never sees this. They see only the word “bitrate,” while all the internal accounting takes place without them. Which, as any experienced engineer knows, is exactly how the scariest accounting is usually done.
Group-of-Pictures Structure: Where Space Is Saved and Problems Are Hidden
One of the most important layers of encoding, which strongly affects archive size, compatibility, and stream behavior, is the group-of-pictures structure. This includes group length, the distance between key frames, fixed or adaptive structure, open or closed groups, behavior at scene changes, periodic intra refresh without hard key frames, and lookahead analysis of future frames.
If key frames appear often, the stream is easier to seek through and easier to recover after losses, but the archive usually grows. If key frames are placed more rarely, compression improves, but the stream becomes more dependent on neighboring frames and less convenient in some access scenarios.
A long group of pictures almost always helps compression. But once packet loss, unstable networks, attempts to open archived footage from arbitrary positions, or mobile clients with imperfect decoders enter the story, the old savings can suddenly come back to bite.
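The good news is that none of this has to be taken on faith. A short capture reveals the actual structure; a sketch assuming ffprobe and a placeholder stream address:

```python
import subprocess
from collections import Counter

STREAM_URL = "rtsp://192.0.2.10:554/stream1"  # placeholder

# Decode the first 250 frames and record each frame's type (I, P, or B).
result = subprocess.run(
    [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=pict_type",
        "-of", "csv=p=0",
        "-read_intervals", "%+#250",
        STREAM_URL,
    ],
    capture_output=True, text=True, check=True,
)

types = result.stdout.split()
print(Counter(types))  # e.g. Counter({'P': 240, 'I': 10}) for a GOP of 25
if "I" in types:
    print("approximate GOP length:", round(len(types) / types.count("I")))
```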
B-Frames: The Archive’s Best Friend and Sometimes the Enemy of Simplicity
B-frames allow the encoder to use information not only from past frames, but also from frames that come later in display order. This is a very powerful compression tool. When implemented well, it reduces stream size without a catastrophic drop in quality.
But B-frames make the stream structure more complex. Their number, depth, hierarchy, adaptive use, and selection as reference frames strongly affect encoder load, memory, latency, and compatibility.
One manufacturer may use very few B-frames for the sake of simplicity and low latency. Another may rely heavily on them to save archive space. As a result, two streams with the same resolution and the same nominal bitrate can behave very differently in practice, both in quality and in how easy they are to decode.
This is where surprising stories often come from, such as “why does the new camera with the same resolution store less but make the server work harder?” Because saving space and being easy to process do not always go hand in hand.
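The latency part, at least, is easy to put a number on: a B-frame cannot be decoded until the later frame it references has arrived, so every run of consecutive B-frames adds at least that many frame intervals of delay.

```python
def min_reorder_delay_ms(consecutive_b_frames: int, fps: float) -> float:
    """Lower bound on the extra latency introduced by consecutive B-frames."""
    return consecutive_b_frames * 1000.0 / fps

print(min_reorder_delay_ms(2, 25))  # 80 ms: two B-frames in a row at 25 fps
print(min_reorder_delay_ms(7, 25))  # 280 ms: a deep hierarchical structure
```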
Reference Frames: A Quiet Parameter with Loud Consequences
The number of reference frames has a strong effect on inter-frame prediction. The more useful previous frames the encoder is allowed to keep in memory, the better it can sometimes find matches, and the more efficiently it can compress repeating areas of the scene.
But every clever thought comes with a bill. Memory requirements grow. Motion search becomes more complex. Sometimes decoder load grows as well. And a very rich lifestyle in terms of reference frames does not always combine well with older or weaker hardware.
In practice, increasing the number of reference frames often helps compression, especially in calm and repetitive scenes. But it is not a free gain. It is an exchange of archive space for computation and resources.
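The bill is easy to itemize, because every reference picture is a full decoded frame that must stay resident in memory. For 8-bit 4:2:0 material that is 1.5 bytes per pixel:

```python
def ref_buffer_mb(width: int, height: int, refs: int, bpp: float = 1.5) -> float:
    """Memory held by the reference picture list (8-bit 4:2:0 assumed)."""
    return width * height * bpp * refs / 1_000_000

print(f"{ref_buffer_mb(1920, 1080, 1):.1f} MB")   # ~3.1 MB per 1080p reference
print(f"{ref_buffer_mb(1920, 1080, 4):.1f} MB")   # ~12.4 MB with 4 references
print(f"{ref_buffer_mb(3840, 2160, 4):.1f} MB")   # ~49.8 MB with 4 at 4K
```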
Motion Estimation: The Place Where Cameras Think Differently
Motion estimation is probably one of the most important and most hidden sources of difference between vendors. This is where the encoder decides how to search for similarity between blocks in neighboring frames, how wide the search should be, how deeply positions should be refined, how far candidate shifts should be considered, and how much computation should be spent on all of that.
If motion estimation is rough, the encoder runs faster and lighter, but compresses worse. If motion estimation is thorough, it can produce a more compact stream or better quality at the same bitrate, but the computational cost rises sharply.
For the user, all of this is usually wrapped into a single checkbox like “quality” or “speed.” In reality, an entire world may be changing underneath. That is why two cameras with the same nominal bitrate on the same scene can produce different archives and different visual results. One truly searches for motion. The other only pretends to search for motion, because it is in a hurry to go home.
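To see where the computation actually goes, here is the textbook version of the idea: exhaustive block matching, which tries every shift inside a search window and keeps the one with the smallest sum of absolute differences. Real encoders use much smarter search patterns, but the cost structure is already visible in the toy.

```python
import numpy as np

def full_search(block, ref, top, left, radius):
    """Exhaustive motion search: best (dy, dx) shift for one block, by SAD."""
    h, w = block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(block.astype(int) - ref[y:y+h, x:x+w].astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))    # the scene shifted by (2, -3)
block = cur[24:40, 24:40]
print(full_search(block, ref, 24, 24, radius=8))  # finds ((-2, 3), 0)
```

Doubling the radius quadruples the number of candidates. That is the budget decision one vendor makes generously and another quietly skips.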
Block Partitioning: An Especially Important Topic for H.265
In H.264, the image is partitioned into fixed 16×16 macroblocks with a limited set of subdivisions. In H.265, that system became much more flexible and much more complex. Coding tree units of up to 64×64 pixels appeared, splitting recursively into a richer hierarchy of large and small coding and transform units, with more choices of sizes and shapes.
That gave H.265 one of its main strengths. It can adapt more effectively to the structure of the image. Large uniform areas, tiny details, complicated borders, different types of motion, all of that can be described more flexibly.
But flexibility never comes free. The more partitioning variants the encoder analyzes, the heavier the computation becomes. That is why the quality of the H.265 implementation differs so much from vendor to vendor. A good implementation delivers a serious compression advantage. A poor one turns a modern codec into a heavy way of getting an average result.
This is where a good part of the disappointment comes from when people say, “we turned on H.265 and nothing magical happened.” Magic is rarely included in the box.
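The flavor of the decision can be shown with a deliberately naive sketch: split a block into four while it still looks busy and is still larger than the minimum size. A real H.265 encoder compares the rate-distortion cost of every variant instead of using a variance threshold, which is precisely where the computation goes; this toy only mimics the shape of the outcome.

```python
import numpy as np

def split_blocks(img, top, left, size, min_size=8, threshold=100.0):
    """Naive quadtree: recurse into busy areas, keep flat areas whole."""
    block = img[top:top + size, left:left + size]
    if size > min_size and block.var() > threshold:
        half = size // 2
        result = []
        for dy in (0, half):
            for dx in (0, half):
                result += split_blocks(img, top + dy, left + dx, half,
                                       min_size, threshold)
        return result
    return [(top, left, size)]

rng = np.random.default_rng(1)
img = np.full((64, 64), 128.0)                      # flat wall
img[40:64, 40:64] = rng.normal(128, 40, (24, 24))   # one textured corner
for top, left, size in split_blocks(img, 0, 0, 64):
    print(f"block at ({top:2}, {left:2}), size {size}")
```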
Intra and Inter Prediction: How the Encoder Guesses the Picture
The encoder is constantly trying to guess the best way to describe the image content. It can predict an area from neighboring parts within the same frame. It can search for similarity in previous or nearby frames. It can transmit the difference between the prediction and reality. It can use different directions, modes, and methods of combining candidates.
The richer and smarter the prediction system, the better the compression usually becomes. But then computation becomes harder, and the final result depends even more on the quality of the implementation. One encoder will save bits by using skips and merges more often. Another will spend more time looking for better matches. A third will be too protective of detail and increase the stream size.
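The simplest member of that family makes the mechanics concrete: DC intra prediction guesses a block as the mean of the already decoded pixels above and to the left of it, and only the difference gets coded. A toy version:

```python
import numpy as np

def dc_intra_predict(above, left, size):
    """DC intra mode: predict a block as the mean of its decoded neighbors."""
    dc = np.concatenate([above, left]).mean()
    return np.full((size, size), dc)

above = np.array([120, 122, 121, 119], dtype=float)  # row above the block
left = np.array([118, 121, 120, 122], dtype=float)   # column to its left
block = np.full((4, 4), 120.5)                       # the actual (flat) content

residual = block - dc_intra_predict(above, left, 4)
print(f"residual energy: {np.abs(residual).sum():.1f}")  # tiny for flat areas
```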
In practice, this is one of the places where different vendors’ encoders show their personality. One loves thrift. Another loves confidence. A third likes doing everything its own way and then calling it a proprietary technology.
Transform and Residual Coding: The Heart of Losses and Trade-Offs
After prediction, the encoder still has to code what remains unsaid, that is, the difference between the predicted image and the real one. This is where transforms and subsequent quantization come in. The more precise and flexible these processes are, the more opportunities there are for efficient compression.
H.264 already had a serious set of such tools. H.265 expanded that world even further. It added transform sizes from 4×4 all the way up to 32×32, compared to the 4×4 and 8×8 blocks of H.264, which is especially useful for different kinds of image content.
But again, this is a question of computational cost. Deep and careful residual analysis improves compression, while rough and fast analysis saves computation. In inexpensive cameras, this choice is often made not in favor of perfection, but in favor of making sure the system-on-chip can survive real-time operation at all.
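The payoff of the transform is energy compaction, and it is easy to demonstrate: transform a smooth block and watch nearly everything land in a handful of low-frequency coefficients. A sketch using SciPy's DCT; the standards use integer approximations of the same idea.

```python
import numpy as np
from scipy.fft import dctn

# A smooth gradient block, typical of sky, walls, or road surface.
x = np.linspace(0, 1, 8)
block = 100 + 50 * np.add.outer(x, x)

coeffs = dctn(block, norm="ortho")
energy = coeffs ** 2
top4 = np.sort(energy.ravel())[-4:].sum()
print(f"energy share of 4 coefficients out of 64: {top4 / energy.sum():.4f}")
```

Quantization then throws away the negligible remainder, which is where bits are saved in smooth areas and texture is lost in busy ones.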
Quantization: Where the Main “Magic of Quality” Actually Hides
Quantization determines how roughly the encoder will discard fine differences in the signal to save bits. The stronger the quantization, the smaller the stream and archive become, but the more detail is lost in textures, shadows, and complex image areas.
There is an entire universe of parameters here. The base quantization level, allowed limits, separate offsets for different frame types, adaptive quantization, importance weighting of image areas, perceptual choices, and optimization based on both distortion and bit cost.
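One well-known anchor point in that universe: in both H.264 and H.265 the quantization step roughly doubles for every increase of 6 in QP, so what looks like a modest slider movement shifts the quality-versus-size balance dramatically.

```python
def qstep(qp: int) -> float:
    """Approximate quantizer step size; doubles every 6 QP in H.264/H.265."""
    return 2 ** ((qp - 4) / 6)

for qp in (22, 28, 34, 40):
    print(f"QP {qp}: step ~{qstep(qp):.0f}")
# QP 28 -> ~16, QP 34 -> ~32: six QP higher, twice as coarse.
```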
This is where the encoder decides what it considers important and what it does not. A moving human face, a noisy background, leaves in the wind, a car plate, night asphalt, blown-out sky, a lamp in the frame, clothing texture, all of these may be judged differently.
As a result, one camera at the same bitrate may produce a cleaner-looking image but destroy fine texture. Another may preserve texture but inflate the stream. A third may jump in quality from one scene to another. And all three will still insist that they were trying their best.
Entropy Coding: A Boring Name with a Very Important Effect
This part is rarely discussed outside professional circles, but it noticeably affects the efficiency of how the stream is packed. In H.264, the choice between its two entropy coding methods, the simpler CAVLC and the more efficient CABAC, had important consequences for the trade-off between compression efficiency and computational complexity. In H.265, only CABAC remains, and it became even more sophisticated internally.
The basic idea is simple. Even after all decisions about prediction and quantization have been made, the resulting data still has to be packed into a bitstream as intelligently as possible. More efficient packing almost always means a smaller stream, but it also requires more complex processing.
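The limit being chased here is Shannon entropy: heavily skewed symbol distributions, and video residuals are heavily skewed toward zero, can be packed into far fewer bits than a fixed-length code would spend. A quick illustration:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy: the lower bound in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four symbols, heavily skewed toward zero, as residual data tends to be.
skewed = [0.80, 0.10, 0.05, 0.05]
print("fixed-length code: 2.00 bits/symbol")
print(f"entropy bound:     {entropy_bits(skewed):.2f} bits/symbol")
```

An arithmetic coder like CABAC gets close to that bound, at the price of more complex, harder-to-parallelize processing.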
The user usually does not see this. The server, the decoder, and the archive certainly do.
In-Loop Filters: Why the Picture May Improve or Degrade During the Process Itself
Both H.264 and H.265 use internal filters that are applied during image reconstruction and influence later prediction. In H.264 this is above all the deblocking filter. In H.265, sample adaptive offset is added on top of it.
These filters affect both the visual appearance and the efficiency of compression. If they are disabled or weakened, computation can become a little lighter, but the image may turn rougher and compression efficiency may worsen as well. If they are used fully, the image often improves, but the computational price grows.
So even parameters that seem to belong “only to picture quality” often also affect archive size and performance.
Weighted Prediction, Skips, and Merges: Small Things That Are Not Small at All
Some coding tools are especially useful when lighting changes in the frame, when headlights appear, when shadows move, when flashes happen, and when the real world behaves like the real world. Weighted prediction allows the encoder to handle brightness changes between frames more intelligently. Skip and merge modes allow it to describe areas efficiently when little has changed or when an already found solution can be reused.
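At its core, weighted prediction is just pred = w × ref + o: predict a block as a scaled and offset copy of the reference instead of a plain copy. A toy comparison of residual energy when the scene simply dims:

```python
import numpy as np

rng = np.random.default_rng(2)
ref = rng.uniform(60, 200, (16, 16))    # reference block
cur = 0.7 * ref                         # same content, 30 percent darker

w = cur.mean() / ref.mean()             # crude estimate of the weight
plain = np.abs(cur - ref).sum()         # residual after a plain copy
weighted = np.abs(cur - w * ref).sum()  # residual after weighted prediction

print(f"plain copy residual:          {plain:.0f}")
print(f"weighted prediction residual: {weighted:.0f}")  # essentially zero
```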
All of this can bring noticeable compression gains. But only if it is implemented well. With one manufacturer, such tools genuinely help. With another, they exist mostly for checkboxes and marketing. With a third, they may work so aggressively that the archive becomes smaller while the actual usefulness of the image for surveillance shrinks along with it.
Parallel Processing and Real Performance
Camera performance and server performance depend not only on what is being encoded, but also on how the processing is organized. Division into regions, multithreaded operation, internal queue behavior, methods of parallelizing analysis and motion search, all of this affects how many resources are required to achieve the same result.
One manufacturer may choose a rougher but faster scheme so that the camera runs confidently in real time on weak hardware. Another may use richer logic in pursuit of better quality, but require a more serious system-on-chip. A third may choose a compromise and enable more advanced modes only under certain conditions.
For the user, the result usually looks simple: one camera is “light,” another is “heavy.” But behind that simplicity there is a very different internal architecture.
Presets and Proprietary Modes: The Place Where Marketing Meets Engineering
One of the most underestimated problems in the market is that the user rarely controls individual technical parameters directly. Instead, the user sees presets. “High Quality.” “Low Latency.” “Storage Saving.” “Smart Codec.” “Scene Mode.” “Night Priority.” And so on.
Behind one such button, the camera may be changing group-of-pictures length, B-frame count, reference frame count, lookahead depth, quantization settings, filters, bitrate strategy, scene change behavior, skip aggressiveness, noise reduction level, and many other things at the same time.
So when a camera promises a “smart codec,” that is not necessarily a lie. It is simply a very broad promise. Behind it may be a genuinely useful set of heuristics, or a collection of decisions that save archive space mainly by quietly eating details, something the user notices later, right when they need to read a plate number, identify a face, or inspect a small but crucial event.
Service Fields and Metadata: Not the Main Factor in Size, but a Major Factor in Compatibility
Besides the image itself, the stream contains service information. Color parameters, ranges, aspect information, additional messages, display metadata, and sometimes information needed for other capabilities.
Usually these elements do not make the archive dramatically larger. But for compatibility and correct playback they can be very important. This is exactly where an older client may suddenly start behaving oddly with a newer stream. Formally, the stream is correct. In practice, compatibility is less than perfect.
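These fields are visible with the same inspection habit as everything else; assuming ffprobe and a placeholder address once more:

```python
import subprocess

STREAM_URL = "rtsp://192.0.2.10:554/stream1"  # placeholder

# Color-related fields that decide how a client should render the picture.
subprocess.run([
    "ffprobe", "-v", "error",
    "-select_streams", "v:0",
    "-show_entries",
    "stream=color_range,color_space,color_transfer,color_primaries,pix_fmt",
    "-of", "default=noprint_wrappers=1",
    STREAM_URL,
], check=True)
```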
That is how the integrator’s favorite stories are born: “everything is by the standard, and yet the site has another surprise.”
What Has the Strongest Effect on Archive Size
If we remove the exotic cases and leave the main suspects, then archive size is most strongly affected by resolution, frame rate, bitrate control mode, actual scene complexity, group-of-pictures length, the presence and policy of B-frames, the number of reference frames, quantization strength, adaptive quality redistribution mechanisms, motion estimation quality, internal filters, noise reduction before encoding, and the manufacturer’s proprietary presets.
In other words, the archive is not determined by one line that says “H.265 codec,” but by an entire ecosystem of decisions. That is exactly why replacing a camera with one that is “the same on paper” can sometimes change real storage consumption much more than anyone would like.
What Has the Strongest Effect on Server, Client, and Decoder Load
On the receiving side, the biggest influences on performance are profile, level, bit depth, chroma format, resolution, frame rate, the number of reference frames, group-of-pictures structure, the presence of B-frames, buffer limits, and the general nature of the bitstream itself.
It is important to understand one simple thing. A decoder is stressed not only by the number of pixels, but also by the properties of the stream itself. Two streams with the same resolution may be very different in decoding complexity. That is why server sizing cannot be based only on the number of cameras and megapixels. You have to look at the real streams.
This is exactly the engineering point that often gets lost in conversations of the “but it worked before” variety. What worked before was not some abstract H.264. What worked before was a specific set of specific streams.
Why the Camera Web Interface Shows Only a Small Slice of Reality
Because if manufacturers showed the user everything, the camera would risk becoming a course in compression theory rather than a device. They are forced to simplify the interface. So they leave only the most understandable parameters at the top: resolution, frame rate, bitrate, sometimes profile, sometimes key frame interval, sometimes a “smart” mode switch.
Everything else is hidden inside. And even the parameters that are shown are often not called by their true engineering names. Instead of the precise name, the user sees a convenient marketing label. Instead of a group of real decisions, they get one switch. Instead of ten internal variables, they get the phrase “Balance of quality and storage.”
That is not necessarily a bad thing. But it is exactly why many end users end up confused. The camera says the same thing on the label, the same numbers seem to be set in the interface, and yet the result in the field is completely different. This is not magic. It is simply complicated engineering wrapped in friendly buttons.
What This Means for the Design of Video Surveillance Systems
The main practical conclusion is very simple. When designing or upgrading a system, you cannot rely only on the brand, only on the H.264 or H.265 label, or only on the few parameters visible in the web interface.
You need to evaluate the real stream profile. Which profile and level the video uses. What the bit depth is. What the group-of-pictures structure is. What the real average and peak bitrate are. Whether B-frames are present. How many reference frames are used. How the stream behaves during daytime and at night. How it reacts to noise, rain, moving leaves, vehicles, and difficult lighting. What the “smart codec” actually does. What noise reduction does. How the stream is decoded on the server, on the mobile client, in the archive, in the browser, and during analytics.
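Much of that evaluation is scriptable. For instance, the real average and peak bitrate can be measured by summing packet sizes per second over a recorded sample; a sketch, with a placeholder file name, again assuming ffprobe:

```python
import json
import subprocess
from collections import defaultdict

SAMPLE = "night_sample.mkv"  # placeholder: a short recording from the camera

result = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "packet=pts_time,size", "-of", "json", SAMPLE],
    capture_output=True, text=True, check=True,
)

bits_per_second = defaultdict(int)
for pkt in json.loads(result.stdout)["packets"]:
    ts = pkt.get("pts_time")
    if ts not in (None, "N/A"):
        bits_per_second[int(float(ts))] += int(pkt["size"]) * 8

rates = list(bits_per_second.values())
print(f"average: {sum(rates) / len(rates) / 1e6:.2f} Mbps")
print(f"peak 1s: {max(rates) / 1e6:.2f} Mbps")
```

Run the same measurement on a daytime sample and a night sample, and the gap between the datasheet bitrate and reality usually explains itself.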
If that is not done, then replacing cameras turns into a lottery. Sometimes a pleasant one. Sometimes an expensive one. And sometimes a very educational one. The lesson is usually paid for in disks, processor time, and nerves.
Final Thoughts
H.264 and H.265 are not one single encoding algorithm with fixed and identical behavior across all manufacturers. They are families of compressed bitstream formats and decoding rules, within which a large set of tools, limits, and encoder implementation choices is allowed.
That is exactly why two cameras with the same label on the box may compress the image differently, consume archive space differently, load the server differently, and behave differently in a real surveillance system. The stability of the software alone does not save you here. What has to be evaluated is not some mythical “same codec,” but the actual stream parameters and the real behavior of a specific camera.