C++ Beyond the Syllabus #9: std::string_view
The modern C++ solution for read-only access of string data.
This article is a part of the C++ Beyond the Syllabus series.
You’re probably familiar with both C-strings and C++’s std::string. You might be wondering why we need yet another string type. The answer is that std::string_view isn’t really another string type: it doesn’t actually own any string-like state.
A Quick Recap
C-Strings
C-strings are null-terminated arrays of characters (i.e., they end with '\0') used in both C and C++. These require careful management to avoid common issues like buffer overflows.
C-strings can be allocated pretty much anywhere depending on how they are defined…
const char* str_literal = "Hello!"; // Allocated in read-only static memory
char stack_str[100]; // Allocates memory on the stack
char* heap_str = new char[100]; // Allocates memory on the heap
If you use the new keyword to allocate a C-string (char array) on the heap, you must remember to release it with the delete[] keyword as well.
The only way to know the size of a C-string without caching it somewhere is by traversing until you come across the null-character.
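As a rough illustration of that traversal, here is a minimal sketch of a length function that walks the array until it reaches '\0'. The name c_string_length is hypothetical and exists only for this example; in practice you would call std::strlen, which performs the same linear walk.

```cpp
#include <cstddef>

// Hypothetical helper for illustration: counts the characters up to, but not
// including, the terminating '\0'. The cost grows linearly with the length.
std::size_t c_string_length(const char* str)
{
    std::size_t length = 0;
    while (str[length] != '\0')
    {
        ++length;
    }
    return length;
}
```

Calling c_string_length("Hello!") visits seven characters (six letters plus the terminator) to report a length of 6.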
std::string
The C++ standard library provides std::string to make string usage safer and more convenient.
std::string maintains a null-terminated character buffer under the hood along with a ton of helper functions to help you efficiently operate on it.
For very short strings, mainstream implementations of std::string use small string optimization (SSO) and avoid heap allocation, but this only applies below a small, implementation-specific length (commonly 15 or 22 characters).
Why don’t these suffice?
Well, a lot of applications work with both C-strings and std::strings. Even when an application uses a single string representation throughout, it may use libraries that expect a different string type, requiring either redundant library code or unnecessary conversions and copies of data.
Prior to std::string_view, code had three main options to be compatible with both string types:
- Duplicate all logic: For any code operating on std::string, write equivalent code for C-strings. This doesn’t even work all of the time: it falls apart (or becomes far more convoluted) when you need both string types to be represented in a container.
- Handle all std::strings as C-strings: We can efficiently grab the underlying C-string from any std::string object, but then we’re effectively disregarding all of the great features that come with std::string.
- Convert all C-strings to std::strings: This requires traversing the entire C-string (which can be arbitrarily long!) to find the size, then allocating the memory to store it within a new std::string, and then traversing the entire C-string again to copy its contents over to the std::string! Needless to say, this option won’t suffice in any performance-sensitive application.
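To make the trade-off concrete, here is a small sketch of the first and third options. The names count_spaces and count_spaces_via_copy are hypothetical and simply stand in for any read-only string logic.

```cpp
#include <cstddef>
#include <string>

// Option 1: the same logic written twice, once per string type.
std::size_t count_spaces(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text)
    {
        if (c == ' ') { ++count; }
    }
    return count;
}

std::size_t count_spaces(const char* text)
{
    std::size_t count = 0;
    for (; *text != '\0'; ++text)
    {
        if (*text == ' ') { ++count; }
    }
    return count;
}

// Option 3: keep only the std::string logic and convert at the call site.
// std::string(text) builds a temporary that measures and copies the C-string.
std::size_t count_spaces_via_copy(const char* text)
{
    return count_spaces(std::string(text));
}
```

Both approaches give the same answer; the difference is in how much code you maintain and how many times the characters are traversed and copied.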
Introducing std::string_view
Hopefully now you’re on board with the need for a new way to efficiently handle strings of either type.
Unfortunately for us, we can’t have it all…
std::string is the C++ mechanism for making string manipulation safe and convenient, and that already exists. If we want to manipulate C-strings and std::strings in a uniform way, we’re still going to have to convert all C-strings to std::strings or vice versa.
What we can do, however, is create an efficient read-only wrapper around C-strings, which can be used for std::string’s underlying C-string as well.
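As a minimal sketch of what such a wrapper buys us, here is a single read-only function that accepts both string types without copying any characters. The function name count_spaces_view is hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// One implementation for every caller: std::string arguments convert to a
// view implicitly, and C-strings are wrapped after a single length scan.
std::size_t count_spaces_view(std::string_view text)
{
    std::size_t count = 0;
    for (char c : text)
    {
        if (c == ' ') { ++count; }
    }
    return count;
}
```

With this signature, both count_spaces_view("a b c") and count_spaces_view(some_std_string) compile, and neither call allocates a new string.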
Designing std::string_view
If you were to design such a wrapper, how would you do it?
Here are a few observations/requirements to get us started. We want…
- to make initializing the wrapper efficient.
- instances of the wrapper to have a small memory footprint.
- the wrapper to be capable of performing all of the same read-only operations as std::string.
All of these requirements are actually pretty simple to achieve.
The std::string_view implementation typically has only two member variables…
- a const char* pointing to the first character
- a size_t representing the size of the string
Together, these members require 16 bytes of memory on a typical 64-bit system (or 8 bytes on a 32-bit system), which is pretty small in the grand scheme of things.
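To see how little is involved, here is a stripped-down sketch of such a wrapper. MiniView and make_view are hypothetical names for this illustration; the real std::string_view is a class template with many more members, and its exact layout is up to the implementation.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, simplified view type for illustration only.
struct MiniView
{
    const char* data;  // first character of the viewed string
    std::size_t size;  // number of characters, excluding any '\0'
};

// Constant time: both members are initialized directly from the arguments.
MiniView make_view(const char* start, std::size_t size)
{
    return MiniView{start, size};
}

// Linear time: the length has to be discovered by walking to the '\0'.
MiniView make_view(const char* start)
{
    return MiniView{start, std::strlen(start)};
}
```

On common platforms sizeof(MiniView) is simply the size of a pointer plus the size of a size_t, and copying a view copies those two values rather than the characters.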
std::string_view also provides nearly all of the read-only helper functions that std::string provides, enabling convenient comparison, substring, and hashing functionality.
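Here is a short sketch of those read-only helpers in action, splitting a "key=value" entry. The functions key_of, value_of, and hash_of are hypothetical helpers written for this example; find, substr, and std::hash are the standard facilities they rely on.

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

// Returns the part before '='. If there is no '=', find returns npos and
// substr(0, npos) yields the whole entry.
std::string_view key_of(std::string_view entry)
{
    return entry.substr(0, entry.find('='));
}

// Returns the part after '=', or an empty view if there is no '='.
std::string_view value_of(std::string_view entry)
{
    const std::size_t separator = entry.find('=');
    if (separator == std::string_view::npos)
    {
        return {};
    }
    return entry.substr(separator + 1);
}

// Views with equal contents produce equal hashes.
std::size_t hash_of(std::string_view text)
{
    return std::hash<std::string_view>{}(text);
}
```

Unlike std::string::substr, the substr used here returns another view into the same characters, so no copy is made. That also means the results are only valid while the original characters are alive.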
Constructing std::string_view
There are many constructors, but there are two that I think are worthy of a shoutout:
constexpr string_view( const CharT* start, size_type size );
This constructor is trivial. You pass in a potentially-const char* and the string size (not including the null character). The time complexity to construct the std::string_view is then constant because the member variables can be directly initialized with these arguments.
You can use this constructor to build a std::string_view from a std::string because you’ll already have a pointer to the first character (std::string::data()) and the size (std::string::size()). If you happen to be storing the size of a C-string, you can use this constructor for those too.
constexpr string_view( const CharT* start );
This is generally the constructor you’ll use when building a std::string_view from a C-string because C-strings do not cache their size. This constructor has linear time complexity as it must traverse the C-string until it finds a null character.
Notably, std::string_view has no constructor with a std::string parameter. The conversion lives on the other side instead: std::string provides a conversion operator to std::string_view, so a std::string converts implicitly, and in constant time, wherever a std::string_view is expected. That convenience also makes it easy to create a view of a temporary std::string by accident. We’ll touch on other gotchas regarding ownership in the next section.
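The three ways of obtaining a view can be sketched as follows. The function names are hypothetical wrappers added for illustration; the constructors and the std::string conversion operator they exercise are part of the standard library.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Pointer + size: constant time, no traversal needed.
std::string_view view_of_prefix(const char* start, std::size_t size)
{
    return std::string_view(start, size);
}

// Pointer only: linear time, walks the C-string to find the '\0'.
std::string_view view_of_c_string(const char* start)
{
    return std::string_view(start);
}

// std::string: converts through std::string's conversion operator to
// std::string_view. The caller must keep `text` alive while the view is used.
std::string_view view_of_string(const std::string& text)
{
    return text;
}
```

In all three cases the resulting view points at the caller's characters; for example, view_of_string(s).data() is the same address as s.data().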
What’s the catch?
This isn’t the end-all-be-all future of string manipulation in C++. There are actually quite a few drawbacks and reasons not to use std::string_view.
They Are Non-Owning
If you use them right, this is a good thing. The fact that std::string_view does exactly what its name suggests (i.e., views strings), as opposed to owning them, makes it an ultra-lightweight way to view both C-strings and std::strings.
This same property can cause some hiccups, though. What do you suspect the following example will output?
#include <iostream>
#include <string>
#include <string_view>

std::string_view generate_string()
{
    const std::string temp = "Temporary String";
    std::string_view temp_view_1(temp.data(), temp.size());
    std::cout << "temp_view_1: " << temp_view_1 << "\n";
    return temp_view_1;
}

int main()
{
    std::string_view temp_view_2 = generate_string();
    std::cout << "temp_view_2: " << temp_view_2;
}
Let’s walk through it.
The std::string_view temp_view_1 is created atop the local string temp. When generate_string returns temp_view_1, temp is destroyed and the underlying data goes away with it!
If we run this example in Compiler Explorer, we unsurprisingly see that temp_view_2 is observing garbage. Reading through the dangling view is undefined behavior, so the exact output can differ between runs and compilers:
temp_view_1: Temporary String
temp_view_2: C��Oaq�
As the example above illustrates, std::string_view is similar to a reference or pointer in the sense that you need to make sure the underlying data outlives the view. If not, any access through the std::string_view is undefined behavior.
Likewise, std::string can grow in size (similar to std::vector). If the size of a std::string outgrows its capacity, a new, larger section of memory will be allocated and all of the underlying data will be copied to the new location. This will invalidate any std::string_view that pointed into the std::string before it grew. Any future access through such a std::string_view is undefined behavior.
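The buffer move can be observed without touching any invalidated memory. In this sketch, growth_moves_buffer and append_then_view are hypothetical helpers; the first records the buffer address as an integer before and after growing the string, and the second shows one safe ordering, assuming the suffix does not itself view into the string being modified.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Returns true if appending `extra` characters moved the string's buffer.
// The old address is kept only as an integer, never dereferenced.
bool growth_moves_buffer(std::string& text, std::size_t extra)
{
    const auto before = reinterpret_cast<std::uintptr_t>(text.data());
    text.append(extra, 'x');
    const auto after = reinterpret_cast<std::uintptr_t>(text.data());
    return before != after;
}

// A safe ordering: modify the string first, then create the view.
std::string_view append_then_view(std::string& text, std::string_view suffix)
{
    text.append(suffix);  // may reallocate
    return text;          // view created after the modification
}
```

Growing a two-character string by a thousand characters forces a move on mainstream implementations, so any view taken before the append would be left pointing at the old location.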
The fix is simple:
Your code must ensure that a std::string_view never outlives the underlying data it was constructed from.
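Applied to the earlier generate_string example, two possible repairs are sketched below. The function names generate_string_owned and generate_string_static are hypothetical variants introduced here; which one fits depends on where the characters actually come from.

```cpp
#include <string>
#include <string_view>

// Fix 1: return an owning std::string, so the characters travel with the
// return value instead of dying with the local variable.
std::string generate_string_owned()
{
    std::string temp = "Temporary String";
    return temp;
}

// Fix 2: view data with static storage duration. A string literal lives for
// the whole program, so the returned view never dangles.
std::string_view generate_string_static()
{
    return "Temporary String";
}
```

Note that writing std::string_view v = generate_string_owned(); would reintroduce the bug, because the returned temporary is destroyed at the end of that statement. Store the result in a std::string first and create views from that.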
When Not To Use
There are a few clear scenarios outlining when not to use std::string_view:
- If your application only works with std::strings, const std::string& is typically more space-efficient than std::string_view, as the single reference is usually implemented as one pointer, half the size of a std::string_view.
- When the lifetime of the underlying data is unknown. For example, you may not want to use std::string_view to represent string-like data returned from a function or when processing string data across multiple threads.
Wrapping Up
Large C++ applications often include various libraries, input sources, and other logic, which employ both C-string and std::string representations of string-like data. std::string_view is the go-to modern C++ solution for read-only access of string data in these applications.