Disclaimer: LLMs were used for proof-reading and grammar check.
C++20 gave us coroutines. The language machinery was all there (co_yield, co_return, co_await), but the standard library shipped no concrete coroutine types. You had to write your own promise type, your own iterator, your own bookkeeping. That boilerplate was intimidating enough to put most people off.
C++23 fixes the most obvious gap: std::generator<T>. It’s a synchronous, pull-based, lazy sequence. You write a function that co_yields values, and the caller iterates over them with a range-for. No allocator gymnastics, no hand-rolled promise types. See appendix for an example of creating your own generator.
Let’s look at the feature in detail, work through some examples, and point out a few caveats.
What std::generator actually is
An std::generator<T> is a view. It satisfies std::ranges::input_range, which means it plugs into the entire ranges pipeline. You can combine it with adaptors like views::take, views::filter and views::transform without ever materializing the full sequence in memory.
The function suspends at each co_yield and resumes when the caller asks for the next element. Values are produced one at a time, on demand. If the caller stops iterating, the remaining values are never computed.
This makes it trivial to express infinite sequences, stateful streams, or anything where computing the entire result set upfront is wasteful or impossible.
Example 1: Fibonacci, forever
Take one of the best-known integer sequences: Fibonacci. It is infinite, so you could never store all of it in a std::vector, but it works fine as a generator because nothing is stored beyond the two running values.
#include <cstdint>
#include <generator>
#include <iostream>
#include <ranges>

std::generator<std::uint64_t> fibonacci() {
    std::uint64_t a = 0, b = 1;
    while (true) {
        co_yield a;
        auto next = a + b;
        a = b;
        b = next;
    }
}

int main() {
    for (auto n : fibonacci() | std::views::take(20)) {
        std::cout << n << '\n';
    }
}
There’s no sentinel, no size. The generator runs until you stop pulling from it. Compose it with take, filter, or drop, whatever you need. The generator doesn’t know or care.
A few things worth noticing:
- The while (true) loop never terminates on its own. That’s fine. When the caller destroys the generator (by leaving the range-for), the coroutine frame is destroyed and the loop just stops.
- You get performance characteristics close to those of a manual iterator class, without actually writing one.
- The return type is std::generator<std::uint64_t>. That’s it. No template metaprogramming, no CRTP base class.
Note: Compare how the generated assembly differs between this example and a hand-rolled Fibonacci range iterator coroutine in the appendix!
Example 2: Camera driver class
Fibonacci is clean but not realistic. Here’s something closer to production: a camera device that yields frames lazily, where each yield can succeed or fail. We pair std::generator with std::expected to get a typed error channel without exceptions.
In your camera interface, you could define something like:
using FrameResult = std::expected</*struct*/ Frame, /*enum class*/ FrameError>;
And to facilitate a polling mechanism from the device something like:
virtual std::generator<FrameResult> frames() = 0;
Now creating non-deterministic test cases that yield FrameErrors randomly can be a piece of cake! A mock class representing the camera device can be as simple as:
class CameraMock : public CameraI {
public:
    explicit CameraMock(std::uint32_t w, std::uint32_t h)
        : width(w), height(h) {}

    std::generator<FrameResult> frames() override {
        std::random_device rd;
        std::mt19937 rng{rd()};
        std::uniform_int_distribution<int> fault(0, 19);
        while (true) {
            int roll = fault(rng);
            if (roll == 0) {
                co_yield std::unexpected(FrameError::Timeout);
            }
            else if (roll == 1) {
                co_yield std::unexpected(FrameError::CorruptedData);
            }
            else if (roll == 2) {
                co_yield std::unexpected(FrameError::DeviceLost);
            }
            else {
                co_yield getMockFrame(width, height, seq++);
            }
        }
    }

private:
    std::uint32_t width, height;
    std::uint64_t seq{};
};
The generator lives inside a CameraMock class. Each iteration either produces a frame or reports an error and it is up to the caller to decide how to handle it. The consumer doesn’t know anything about the camera’s internal state machine. It just pulls results from a range.
This pattern, a generator of expected values, turns out to be genuinely useful. The producer can signal errors inline without throwing, and the consumer handles them in the same loop that processes success values. No separate error callback, no out-of-band signalling. The control flow reads top to bottom.
You could also compose this with ranges. Want to skip errors and only process valid frames?
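// A lambda is used instead of &FrameResult::has_value because taking the
// address of a standard-library member function is not guaranteed to work.
auto good_frames = camera.frames()
    | std::views::filter([](const auto& r) { return r.has_value(); });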
Note: this range is not evaluated yet. I like to think of the variable good_frames as an un-invoked lambda: evaluation happens while iterating it. Of course, lambdas capture variables at the time of definition, not invocation, so the comparison isn’t one-to-one.
Whether you should do that depends on whether you need to log the errors. But the option is there, for free, because std::generator is a range.
Or use the monadic interface on std::expected directly. Each element in the generator is already an expected, so you can chain and_then and or_else per-frame without unwrapping anything yourself:
for (auto&& result : camera.frames() | std::views::take(100)) {
    result
        .and_then([&](const Frame& frame) -> CameraI::FrameResult {
            // do something
            ++processed;
            return frame;
        })
        .or_else([&](FrameError e) -> CameraI::FrameResult {
            std::cerr << std::format("[cam] error: {}\n", to_string(e));
            ++errors;
            return std::unexpected(e);
        });
}
and_then runs only when the expected holds a value. or_else runs only on the error path. No if/else, no operator*. The types route execution(!). You can also propagate or convert errors mid-chain if a processing step can also fail. Can an interface get cleaner than that?
The sharp edges
A few things to know before you consider this for production:
Single-pass. std::generator is an input range, not a forward range. You can iterate it exactly once. If you need to make two passes, collect the values into a container first.
Move semantics matter. When you co_yield an rvalue, the generator stores a pointer to the yielded object rather than a copy. For large types like our Frame, co_yield a moved variable or a temporary, and consume the value before the next iteration advances the coroutine. A reference obtained in one iteration dangles once the coroutine resumes, and using it afterwards is undefined behaviour.
No co_await. std::generator is synchronous, and a co_await expression inside its body does not compile. If you need to await an async operation, you need a different coroutine type. Generators yield values; they don’t wait on futures.
HALO: when the heap allocation disappears
Every coroutine needs a frame to store its local variables, parameters, and suspension-point bookkeeping. By default that frame lives on the heap: a new on creation, a delete on destruction. For a generator that yields millions of values in a tight loop, that allocation is noise. But it’s still there, and in latency-sensitive or real-time code it can matter.
HALO stands for Heap Allocation eLision Optimization. If the compiler can prove that:
- the coroutine’s lifetime is bounded by the caller, and
- the frame size is known at compile time
then it can place the coroutine frame on the caller’s stack (or keep its state in registers) instead of the heap, and the new/delete pair vanishes entirely. The standard permits this elision but does not guarantee it: whether it actually happens depends on the compiler, the optimisation level and how much inlining succeeds, so check the generated assembly if you rely on it.
The key factor is visibility. The compiler needs to see both the coroutine body and the call site in the same translation unit, and it needs to prove that the coroutine object doesn’t escape.
// Note: `generator` stands for a generic generator type whose begin() and
// end() return the same iterator type, which std::accumulate requires.
// std::generator::end() returns std::default_sentinel_t instead.

// HALO-friendly: coroutine defined in same TU, lifetime bounded by caller
generator<int> range(int from, int to) {
    for (int i = from; i < to; ++i)
        co_yield i;
}

int main() {
    auto s = range(1, 10);
    return std::accumulate(s.begin(), s.end(), 0);
    // s destroyed here: compiler sees the full lifetime
}

// HALO-unfriendly (a separate program): coroutine defined in another TU
generator<int> range(int from, int to); // declaration only

int main() {
    auto s = range(1, 10); // compiler can't see the ramp function
    return std::accumulate(s.begin(), s.end(), 0);
}
The set of functions the compiler must inline is small and bounded: the coroutine ramp function, get_return_object(), begin(), the constructor/move-constructor/destructor of the generator type, and coroutine_handle<>::destroy. Critically, neither the coroutine body itself nor the algorithm consuming it (accumulate in this case) needs to be inlined. The optimizer only needs to see enough to prove that every path from creation to scope exit calls destroy on the coroutine handle.
In practice this means: keep your generators close to their consumers. Define them in headers or the same file, consume them in tight scopes, and don’t stash them in containers or pass them across API boundaries if you care about the allocation. When HALO kicks in, an std::generator loop can compile down to much the same code as a hand-written state machine, with no heap allocation at all.
Wrapping up
std::generator isn’t a revolution. It’s the missing piece that makes C++ coroutines usable for the most common case: producing a sequence of values lazily. The machinery was already there; now there’s a standard type that takes care of the boilerplate everyone was avoiding anyway.
Pair it with std::expected and you get a clean, composable pattern for fallible streams. Pair it with ranges and you get lazy pipelines with zero allocations beyond the coroutine frame. That’s a good trade for a one-line return type change (exaggerating).
Compiler explorer
You can find the above example compiled here:
Appendix
In computer science, it’s all about tradeoffs, and this feature is no exception. Notice the code below:
If we enable the commented-out code on line 38, we can make the dereference cheaper. The difference in the generated assembly is that std::generator stores the yielded value via a pointer in the promise, so operator* must dereference it (mov rax, QWORD PTR [rax]), adding a dependent load.
Also, begin() is cheaper: FibonacciRange returns a small iterator (just a coroutine handle) in two registers (rax+rdx). std::generator::_Iterator is larger or non-trivially constructible, forcing the caller to pass a destination pointer in rdi, which means an extra memory round-trip.
Finally, FibonacciRange has a smaller stack footprint: its locals sit around [rbp-96], versus [rbp-176] for the generator version, reflecting a smaller iterator/view type.
This gap can get even larger with -O3, but I’ll leave that to the reader…