Mastering memory efficiency with std::span
Before diving into the usage of std::span, it is important to understand the concept of resource ownership. When a container such as a std::vector is created, the container owns the memory used to store its data, meaning that the container is responsible for managing the lifetime of that memory. Let’s look at a few common scenarios where a std::vector is used and determine whether ownership is affected:
- Viewing data in a container:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> myVector{1, 2, 3};
    for (const auto& value : myVector) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
}
// Output:
// 1 2 3
In this example, a range-based for loop is used to iterate over myVector in the main function. Here, myVector owns the memory used to store the data: creating the std::vector allocated that memory, and it will be deallocated when myVector goes out of scope at the end of main. Iterating over myVector doesn’t affect ownership.
const auto& is used within the for loop for efficiency and as good practice: const indicates that value should not be modified, auto lets the compiler deduce the type, and & makes value a reference to the element instead of an unnecessary copy.
- Passing a container by value:
#include <iostream>
#include <vector>

// data is a copy of whatever vector the caller passes in.
int calculateSum(std::vector<int> data) {
    int sum = 0;
    for (const auto& value : data) {
        sum += value;
    }
    return sum;
}

int main() {
    std::vector<int> myVector{1, 2, 3};
    const auto sum = calculateSum(myVector);
    std::cout << "Sum: " << sum << std::endl;
}
// Output:
// Sum: 6
In this example, we create myVector just as before, then pass it by value into the calculateSum function. Passing myVector by value creates a copy of the container, which calculateSum receives as the parameter named data. This means that data owns its own copy of the elements, while myVector still owns the original data inside main.
This is okay in some situations. Because calculateSum only reads the contents to calculate a sum and never modifies them, this technically works (even though an unnecessary copy is made). However, consider the following example of pass-by-value where the programmer assumes that myVector will be modified by another function:
#include <iostream>
#include <vector>

// data is a copy, so the increments never reach the caller's vector.
void incrementEachValue(std::vector<int> data) {
    for (auto& value : data) {
        value++;
    }
}

int main() {
    std::vector<int> myVector{1, 2, 3};
    incrementEachValue(myVector);
    for (const auto& value : myVector) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
}
// Output:
// 1 2 3
Here, we pass myVector by value into a function that is expected to modify the contents of myVector. But because pass-by-value makes a copy of myVector, the function only modifies that copy and the original is left unchanged. Instead, myVector needs to be passed by reference.
- Passing a container by reference:
#include <iostream>
#include <vector>

// data now refers to the caller's vector, so no copy is made.
void incrementEachValue(std::vector<int>& data) {
    for (auto& value : data) {
        value++;
    }
}

int main() {
    std::vector<int> myVector{1, 2, 3};
    incrementEachValue(myVector);
    for (const auto& value : myVector) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
}
// Output:
// 2 3 4
By passing myVector by reference instead of by value, incrementEachValue now operates on the actual data inside of myVector. This solves the previous problem!
- Moving a container using std::move:
#include <iostream>
#include <utility>
#include <vector>

int calculateSum(std::vector<int> data) {
    int sum = 0;
    for (const auto& value : data) {
        sum += value;
    }
    return sum;
}

int main() {
    std::vector<int> myVector{1, 2, 3};
    // std::move lets data take over myVector's storage instead of copying it.
    const auto sum = calculateSum(std::move(myVector));
    std::cout << "Sum: " << sum << std::endl;
}
// Output:
// Sum: 6
In this example, we use std::move to transfer ownership of the data in myVector to the calculateSum function. This removes the copy made by plain pass-by-value without having to pass anything by reference. “Moving” the data also clearly indicates ownership of resources.
The problem with this approach is that myVector will now be empty since we “moved” it. This is perfectly fine in many cases, but for this example, does calculateSum really need ownership of the data? Wouldn’t it be ideal for calculateSum to have a clear non-owning view of the data while avoiding copies, and keep the ownership of data with myVector inside of main?
Introducing std::span
In C++20, we now have std::span, which is a lightweight, non-owning view of contiguous data. This provides benefits similar to pass-by-reference while clearly indicating that the user of a std::span does not own the data they are reading or modifying. Here is the incrementEachValue example rewritten to use std::span:
#include <iostream>
#include <span>
#include <vector>

// data is a non-owning view of the caller's elements.
void incrementEachValue(std::span<int> data) {
    for (auto& value : data) {
        value++;
    }
}

int main() {
    std::vector<int> myVector{1, 2, 3};
    incrementEachValue(myVector);
    for (const auto& value : myVector) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
}
// Output:
// 2 3 4
This is nearly identical to the pass-by-reference example but with std::span instead. A std::span (with the default dynamic extent) is essentially just a struct with two fields: a pointer to the beginning of contiguous data and a “size” field. This makes the std::span extremely lightweight. When myVector is passed into incrementEachValue, we are essentially passing in a pointer to the underlying data of myVector as well as the size of myVector, rather than the entire container.
Please note that incrementEachValue is only a basic example intended to show the usage of std::span; it is not necessarily the best way to increment each value of a collection. That can be achieved more concisely with a one-liner like this: std::for_each(myVector.begin(), myVector.end(), [](auto& value) { value++; });
Other benefits of std::span
In addition to eliminating unnecessary copies of data, another useful benefit of std::span is that it provides a unified interface that can be used for many collections of data, including C-style arrays. Consider the following example:
#include <iostream>
#include <span>

void incrementEachValue(std::span<int> data) {
    for (auto& value : data) {
        value++;
    }
}

int main() {
    int myArray[]{1, 2, 3};
    // The array's size is deduced when the span is constructed.
    incrementEachValue(myArray);
    for (const auto& value : myArray) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
}
// Output:
// 2 3 4
This is the same example used previously, except that main is now using a C-style array instead of std::vector. Similarly, if a buffer is provided through a pointer and a “size” field instead of an actual array, std::span can handle this as well:
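#include <cstddef>
#include <iostream>
#include <span>

// Stand-in for a C-style API that hands back a pointer and a size.
// Assumption for this example: it provides a buffer holding {1, 2, 3}.
void getBuffer(int** buffer, std::size_t* size) {
    static int storage[]{1, 2, 3};
    *buffer = storage;
    *size = 3;
}

int calculateSum(const std::span<const int>& data) {
    int sum = 0;
    for (const auto& value : data) {
        sum += value;
    }
    return sum;
}

int main() {
    int* myBuffer = nullptr;
    std::size_t myBufferSize = 0;
    getBuffer(&myBuffer, &myBufferSize);
    const auto sum = calculateSum(std::span<int>(myBuffer, myBufferSize));
    std::cout << "Sum: " << sum << std::endl;
}
// Output:
// Sum: 6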
You may be wondering: “Why does calculateSum take in a const std::span<const int>& instead of just std::span<int> in this example? I thought std::span removes the need for pass-by-reference?” It is true that pass-by-reference is no longer needed for the underlying data of the std::span. Passing the std::span itself by const reference also works here, because the temporary constructed in main can bind to it, but it is not required: a std::span is cheap to copy, so taking it by value (std::span<const int>) is the generally recommended style. Lastly, we have a view of const int rather than int in this case because calculateSum does not modify the underlying data values.
Another great benefit of std::span is that slices or “subspans” can be passed around instead of entire collections. When a container is passed by reference, a reference to the entire container is given, which may not be ideal in some cases. Consider the following example:
#include <iostream>
#include <span>
#include <vector>

// Read-only view, since the elements are not modified.
int calculateSum(std::span<const int> data) {
    int sum = 0;
    for (const auto& value : data) {
        sum += value;
    }
    return sum;
}

int main() {
    std::vector<int> myVector{1, 2, 3, 4};
    std::span<int> mySpan{myVector};
    std::span<int> mySubspan{mySpan.subspan(0, mySpan.size() / 2)}; // First half
    const auto sum = calculateSum(mySubspan);
    std::cout << "Sum: " << sum << std::endl;
}
// Output:
// Sum: 3
In this example, we create a subspan of the first half of myVector and calculate the sum of just those elements. This allows us to keep the same simple calculateSum function and create various non-owning subspans without creating any copies of data.
Drawbacks of std::span
The obvious drawback of std::span is that it only works for contiguous data. However, std::vector and std::array, as well as C-style arrays, are all commonly used contiguous data structures, so std::span has plenty of use cases.
Conclusion
std::span is a very useful C++20 feature for operating on contiguous collections of data. This lightweight, non-owning view allows programmers to have a unified interface for functions (even if they interface with legacy C-style arrays/buffers), easily pass around slices of data collections without unnecessary copies, and overall perform many tasks with high memory efficiency and few complications.