Omar Abid

A brief introduction to Rust

This blog post takes a deep dive into the Rust world of mutability. By deep dive, I mean that the post is considerably long, so it will take time to go through the different examples. The topic is specific, but we will have to go through various Rust concepts along the way: Ownership/Borrowing, Lifetimes, Unsafe, Sync, Closures, Macros and more. This might be intimidating, and I think this is where many developers are put off.

This article assumes some familiarity with Rust; that is, if you have successfully run a "Hello World" program, then you are qualified! If you have struggled with some of the hard concepts, then this article could be a good introduction. It will not go deep into each one of them, but it will give you enough knowledge to understand the bigger picture.


We will start from a reasonably easy problem: We want to access a particular variable from any location in our program; by access, I mean both read and write access. That might seem like a straightforward thing to do, especially if you are coming from other languages like JavaScript. If we have a variable (and the browser is our sandbox), then we can make it global by attaching it to the window object. So if we have window.config, we can access (both read and write) the config variable from anywhere in our code.

So our problem is we have this config object but in Rust. The configuration object is a struct. It can be set and initialized somewhere in our program. The initialization bit is important as we will find out later. Our configuration could be simple default values set in the code, and updated later; or it could require reading files, exchanging data over the network, etc. So we want to be sure that we can do both.

We have other requirements, and although most use cases don't need these niceties, it is better to have them for the future (and these are the kind of problems that Rust is intended to solve). For example, our program might become multithreaded in the future, as we anticipate more cores in the CPU and less improvement in the clock rate.

Things are already starting to get confusing, so how about we start with a simple example: the most basic implementation of a configuration struct.

main.rs

#[derive(Debug)]
struct MyConfig {
    debug: bool,
}

fn main() {
    println!("Hello, world!");
    println!("Debugging value: {}", config().debug);
    println!("Rechecking Debugging value: {}", config().debug);
}

fn config() -> MyConfig {
    MyConfig { debug: false }
}

output

Hello, world!
Debugging value: false
Rechecking Debugging value: false

Pretty simple: We use a function named config to return an instance of our configuration. Inside config, we make a new MyConfig instance for each call and then return it. (We could have placed this code in a different file and called it with mod filename; we are going to do that later, so don't worry!).

So far so good. But there is a small issue here: Every time we call the config function, we return a new instance of MyConfig. Practically speaking, it is the same configuration; but technically speaking, it is a different object. Imagine, for instance, that our config function was querying a database for a particular field. The config function will be doing the query every time we call it. This is not good; after all we should only do it one time both for efficiency and consistency.
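To make the "different object" point concrete, here is a small sketch. The helper distinct_instances is a name made up for this illustration (it is not part of the program above); it calls config twice and checks that the two results hold the same value but do not live at the same address.

```rust
#[derive(Debug)]
struct MyConfig {
    debug: bool,
}

fn config() -> MyConfig {
    // A brand-new value is built on every call.
    MyConfig { debug: false }
}

// Hypothetical helper: true when two calls give equal contents
// but two separate objects.
fn distinct_instances() -> bool {
    let first = config();
    let second = config();
    // Both are alive at the same time, so they must have their own addresses.
    first.debug == second.debug && !std::ptr::eq(&first, &second)
}

fn main() {
    println!("Two separate objects: {}", distinct_instances());
}
```

If config were doing real work (a database query, say), that work would be repeated for each of the two objects.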

What we want is a single initialization of our configuration. That initialization is safeguarded somewhere, and then we access it as many times as we want. Basically, a global variable accessible from the config function or any other location in our program.

This time, we will separate our program into two files: main.rs and config.rs. The code could function from within main.rs but the separation of concerns is helpful.

main.rs

mod config;

use config::MyConfig;

fn main() {
    println!("Hello, world!");

    println!("Debug: {}", MyConfig::global_config().debug);
    println!("Rechecking Debug: {}", MyConfig::global_config().debug);
}

config.rs

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

static CONFIG: MyConfig = MyConfig { debug: false };

impl MyConfig {
    pub fn global_config() -> &'static MyConfig {
        &CONFIG
    }
}

Alright, a lot is going on here, especially if you are new to the Rust syntax and confused by it. But don't worry, we will explain it in detail. First, we are still using a function to return our configuration; but this time it is attached to the MyConfig struct. Think of a Rust struct like a JavaScript object, or like a class in OOP languages, where you can add both fields and methods. The function that returns our configuration is global_config.

Let's go back further, this line deserves some explanation.

static CONFIG: MyConfig = MyConfig { debug: false };

Creating a global variable in Rust is pretty easy. We just use the static keyword. Now, our variable can be available anywhere in our program. In fact, this is what the rust-lang documentation says about the static keyword:

A 'static lifetime is the longest possible lifetime, and lasts for the lifetime of the running program.

Okay, that was a trap. What is a lifetime? Well, you can go with the raw and naive definition: How long you will live. But that definition doesn't seem to be about our static keyword, or is it? It is also way too simplistic. The rust-lang website will already throw words like borrow checker at you. That's scary, so how about we stick to the raw definition for now: How long the variable will live.

So what's 'static? It is the longest lifetime; which means the entirety of our program. It means our variable will be alive and accessible as long as our program is running. The static keyword doesn't mean our variable is a constant but just that it has a 'static lifetime. A better word is immortal but that doesn't sound geeky and cryptic enough and hence the choice of 'static. If you are still confused, notice that 'static is the name of the lifetime while static is a keyword that gives our variable a 'static lifetime.

Our configuration is initialized once and we declare a global variable named CONFIG. To access it, we use the function global_config implemented in our struct; but since it is a global variable we could access it from anywhere really.
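A quick way to convince yourself that there is now a single configuration: ask for it twice and compare the addresses. This is only a sketch built on the config.rs above, squeezed into one file so it runs on its own; std::ptr::eq is a standard-library function that compares where two references point.

```rust
#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

static CONFIG: MyConfig = MyConfig { debug: false };

impl MyConfig {
    pub fn global_config() -> &'static MyConfig {
        &CONFIG
    }
}

fn main() {
    let first = MyConfig::global_config();
    let second = MyConfig::global_config();
    // Both references point at the one and only CONFIG.
    println!("Same object: {}", std::ptr::eq(first, second));
    println!("Debug: {}", first.debug);
}
```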

Let's dissect the global_config function. If you are new to Rust, you might have already noticed the gibberish in that function. We are not returning MyConfig in the function signature but adding a bunch of keywords; something must be going on.

To understand what is going on, you might need to understand the Rust Ownership/Borrowing model. Don't worry, we will stick to the naive definition: You can't take/move what you don't own, but you can borrow it. You can't make changes on what you borrow unless you request permissions; and even then you are going to have to follow some rules.
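Here is that naive definition as a sketch. The three helper functions (read_debug, enable_debug, consume) are invented for this illustration; each one asks for a different level of access to the same struct.

```rust
#[derive(Debug)]
struct MyConfig {
    debug: bool,
}

// Borrowing: we may look, but not change.
fn read_debug(config: &MyConfig) -> bool {
    config.debug
}

// Borrowing with permission to change: a mutable reference.
fn enable_debug(config: &mut MyConfig) {
    config.debug = true;
}

// Taking ownership: the caller gives the value away for good.
fn consume(config: MyConfig) -> bool {
    config.debug
}

fn main() {
    let mut config = MyConfig { debug: false };
    println!("Before: {}", read_debug(&config));
    enable_debug(&mut config);
    println!("After: {}", read_debug(&config));
    let last_value = consume(config);
    // Using `config` from here on would be a compile error: it has been moved.
    println!("Moved away with debug = {}", last_value);
}
```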

So let's test the constitution. If you run this line inside the main function.

main.rs

let myconfig: MyConfig = config::CONFIG;

On compilation, we are going to get the following error.

error[E0507]: cannot move out of static item
 --> src/main.rs:9:30
  |
9 |     let myconfig: MyConfig = config::CONFIG;
  |                              ^^^^^^^^^^^^^^
  |                              |
  |                              cannot move out of static item
  |                              help: consider borrowing here: `&config::CONFIG`

We do not own CONFIG. The ruler, therefore, does not allow moving it, and most importantly, there is no way around it. The ruler also gives us a hint: consider borrowing.

Borrowing is actually pretty cool: We can access the object from where it is located; and that is basically it. But that serves our purposes, right? (Plus Rust doesn't charge us any interest!).

So this code is fine. (again we can run this code on main.rs)

  let myconfig: &MyConfig = &config::CONFIG;
  dbg!(myconfig.debug);

Notice that here we neither own nor move CONFIG. We are merely referencing it. The type of myconfig is &MyConfig and not MyConfig: a reference, and not the actual thing. To create a reference, we use an ampersand &.

If you have been following along, you almost understand what the code in global_config does. Except for one more sketchy keyword: &'static. If we try to return &MyConfig instead of &'static MyConfig, we get the following error.

error[E0106]: missing lifetime specifier
 --> src/config.rs:9:31
  |
9 |     pub fn global_config() -> &MyConfig {
  |                               ^ help: consider giving it a 'static lifetime: `&'static`
  |
  = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from

The ruler is asking for a lifetime for our reference. You might be wondering: Why, and what does that mean? And what does a reference mean in the first place?

The Rust Programming Language book has a rather helpful definition.

These ampersands are references, and they allow you to refer to some value without taking ownership of it.

The opposite of referencing by using & is dereferencing, which is accomplished with the dereference operator, *.

Okay, so referencing is actually borrowing. But what is a lifetime? Rust By Example has an average explanation.

A lifetime is a construct the compiler (or more specifically, its borrow checker) uses to ensure all borrows are valid. Specifically, a variable's lifetime begins when it is created and ends when it is destroyed. While lifetimes and scopes are often referred to together, they are not the same.

Take, for example, the case where we borrow a variable via &. The borrow has a lifetime that is determined by where it is declared. As a result, the borrow is valid as long as it ends before the lender is destroyed. However, the scope of the borrow is determined by where the reference is used.

What if CONFIG is dropped from memory? Now, we know for certain that this will not happen because CONFIG is immortal, but the compiler doesn't know that. In our signature, we are returning a reference to MyConfig and not CONFIG; so it is not clear how long the real object will live for. If that object is dropped, then our reference would be left pointing at nothing. We need to ensure that our reference lives at most as long as the object it is pointing at. Not more.

In my opinion, the hard part about lifetimes is the unreadable gibberish that accompanies them. First, we need to define (or name) a lifetime. We do that in the function signature, right after the function name. We will make use of < and > and an apostrophe; that's pretty clean. To make it more confusing, let's name our lifetime "a" even though we could use a more meaningful name. If everybody in the community does that, it will greatly help beginners: They will think it is some sort of keyword out there. But as we will use it again in the returned value with an apostrophe, they will finally succumb and concede that it is pure wizardry.

Look at this! This code actually runs!

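    pub fn global_config<'a>() -> &'a MyConfig {
        // CONFIG is a static, so a reference to it is valid for any lifetime 'a.
        // Returning a reference to a local variable here would be rejected (E0515).
        &CONFIG
    }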

Instead of defining a lifetime, we can assign the 'static lifetime. That makes sense since the item we are borrowing from has a 'static lifetime itself and should live for the entirety of the program. We don't need to define the 'static lifetime though, as it already is. If we replace the 'a with 'static and remove the definition, we will achieve our previous program.
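The difference between a named lifetime and 'static is easier to see when the function actually borrows from something. In the sketch below, debug_flag is a made-up helper: its 'a says "the reference I return lives no longer than the config you handed me", while global_config needs no such link because CONFIG never goes away.

```rust
#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

static CONFIG: MyConfig = MyConfig { debug: false };

// Hypothetical helper: the returned reference is tied to the argument through 'a.
fn debug_flag<'a>(config: &'a MyConfig) -> &'a bool {
    &config.debug
}

// Nothing is passed in, so the only honest lifetime is 'static.
fn global_config() -> &'static MyConfig {
    &CONFIG
}

fn main() {
    let local = MyConfig { debug: true };
    // `flag` may not outlive `local`; the compiler checks this for us.
    let flag = debug_flag(&local);
    println!("Local debug: {}", flag);
    println!("Global debug: {}", global_config().debug);
}
```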


Let's move ahead. Now that your business is growing, your configuration is getting more sophisticated. In order to initialize the configuration, you need to run some tasks (like querying a database or reading from a file). Basically, you want to call a function that returns your configuration instead of having it statically defined.

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

pub static CONFIG: MyConfig = MyConfig::new();

impl MyConfig {
    pub fn new() -> Self {
        MyConfig { debug: false }
    }
    pub fn global_config() -> &'static MyConfig {
        &CONFIG
    }
}

This new code defines a function new, which you might think of as a constructor. It creates a new instance of MyConfig and returns it. Since it is implemented within the MyConfig struct, it is basically returning an instance of itself; thus the use of the keyword Self.

Unluckily, this doesn't work. The ruler, again, complains.

error[E0015]: calls in statics are limited to constant functions, tuple structs and tuple variants
 --> src/config.rs:6:31
  |
6 | pub static CONFIG: MyConfig = MyConfig::new();
  |                               ^^^^^^^^^^^^^^^

In case you think this is restrictive, you need to understand one more idea: Statics are evaluated at compile time; in other words, they are hard-coded into the binary. Ordinary functions, on the other hand, are only called at run time (the error message hints at the one exception, constant functions, but those cannot read a file or query a database). This means Rust can't determine the value of our static variable at compile time, and therefore can't compile the program.
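The error message mentions constant functions, and for a constructor as trivial as ours that is actually enough. The following is a sketch under one big assumption: new never needs anything that only exists at run time. Marking it const fn lets the compiler run it while building the static.

```rust
#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

impl MyConfig {
    // `const fn` allows this call to be evaluated at compile time.
    pub const fn new() -> Self {
        MyConfig { debug: false }
    }

    pub fn global_config() -> &'static MyConfig {
        &CONFIG
    }
}

pub static CONFIG: MyConfig = MyConfig::new();

fn main() {
    println!("Debug: {}", MyConfig::global_config().debug);
}
```

This stops working the moment new has to read a file or talk to a database, which is exactly the case we care about, so it does not remove the obstacle in general.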

This is a major roadblock and there seems to be no way around it. Since it is impossible to assign a value to a non-mutable static variable at run time, we will have to resort to magic. If you are a scientist and don't believe in magic, then this is a good time.

First, we need to add an external dependency named lazy_static to the Cargo.toml file.

Cargo.toml

[dependencies]
lazy_static = "1.3.0"

Next, we import the dependency in main.rs.

main.rs

#[macro_use]
extern crate lazy_static;

And now to the magical part, replace the previous definition with the following.

config.rs

lazy_static! {
    pub static ref CONFIG: MyConfig = MyConfig::new();
}

Tada! Now our program works. Pretty cool, heh? It is not clear what kind of magic the lazy_static did but it just works. So that was lucky!


Except things are about to get a bit more complicated. A new bureaucrat in the organization demands to be able to pass the debug value to the configuration constructor. Bureaucrats want control, and this particular one is not happy that debug can only be set from within the MyConfig constructor since it is outside of his jurisdiction.

So we change our constructor to allow him to sneak in.

    pub fn new(custom_debug: Option<bool>) -> Self {
        // We are just giving him the illusion of power; we don't actually use the custom_debug variable.
        MyConfig { debug: false }
    }

This code is fine, but the program won't compile because the static variable now has to pass an optional boolean value to new. In case you don't know what Options are, this is a good time to introduce them. The bureaucrat is not certain that he wants to pass a debug value all the time. He just wants to have the option in case he feels like it. Rust, however, requires that parameters (declared in the function signature) are fully respected. That is, if we have a debug parameter, an argument needs to be passed for it.

To manage this bureaucracy, Rust provides an Option type: It wraps the real value if there is one (Some); and if there is none, it holds a value named None. Simple. It also has a set of methods to manage that additional bureaucracy overhead.

To make our code compile, we can do the following.

lazy_static! {
    pub static ref CONFIG: MyConfig = MyConfig::new(Some(false));
}

Notice that we are using a function named Some. This is because the value that we need to pass is not a bool but an Option<bool>. Some is the Option variant that wraps our bool inside an Option. So that was easy.
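If the constructor did honor the bureaucrat's wish, it would have to open the Option and decide what to do in each case. The sketch below is a variation on our new, not the version used in this article: it shows the explicit match and, in a second function I named new_short for the occasion, the same thing with one of Option's helper methods.

```rust
#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

impl MyConfig {
    pub fn new(custom_debug: Option<bool>) -> Self {
        // Explicit version: look inside the Option with match.
        let debug = match custom_debug {
            Some(value) => value,
            None => false,
        };
        MyConfig { debug }
    }

    // Hypothetical twin of `new`: same behavior, using unwrap_or.
    pub fn new_short(custom_debug: Option<bool>) -> Self {
        MyConfig { debug: custom_debug.unwrap_or(false) }
    }
}

fn main() {
    println!("{:?}", MyConfig::new(Some(true)));
    println!("{:?}", MyConfig::new(None));
    println!("{:?}", MyConfig::new_short(None));
}
```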

Except that bureaucrats can't access the config.rs file. They need to pass the value from another place: main.rs. Since our CONFIG is not defined there, we need to create a new function to accommodate. This is going to complicate our setup but we have to comply.

Easy, right?

pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {
        lazy_static! {
            pub static ref CONFIG: MyConfig = MyConfig::new(Some(false));
        }

        Ok(())
}

Here we introduce, again, a new type: Result. Result is similar to Option but serves another purpose and the name should be self-explanatory. Result could be either a success and return our result (in this case ()) or a failure and return an error (in this case i32). I'm not going to go through it but you can check the Rust docs.
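Since our init always returns Ok(()), it does not show much of what Result is for. Here is a tiny sketch with a function that can actually fail. parse_debug and the error code 1 are both invented for the example.

```rust
// Hypothetical helper: turns a piece of text into a debug flag.
// The error code 1 is an arbitrary choice for this sketch.
fn parse_debug(raw: &str) -> Result<bool, i32> {
    match raw {
        "true" => Ok(true),
        "false" => Ok(false),
        _ => Err(1),
    }
}

fn main() {
    for raw in ["true", "false", "maybe"] {
        // The caller has to deal with both outcomes.
        match parse_debug(raw) {
            Ok(value) => println!("{} -> debug = {}", raw, value),
            Err(code) => println!("{} -> failed with error code {}", raw, code),
        }
    }
}
```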

Unfortunately, this doesn't work. Even though CONFIG has a static (infinite) lifetime, it is declared inside the init function, so you can't name it outside of it. In order to be able to do that, you would need to return it. But we don't want to do that, and it wouldn't work either (we do not own CONFIG).

If you think things have started to get overly complicated, then hold your breath. In the last example, we were not passing the custom_debug variable; we were just accepting it in the function signature. Now that our bureaucrats have found out about our misdeed, we need to pass it to the lazy_static! block; but if we do, we get the following error.

error[E0434]: can't capture dynamic environment in a fn item
  --> src/config.rs:12:61
   |
12 |             pub static ref CONFIG: MyConfig = MyConfig::new(custom_debug);
   |                                                             ^^^^^^^^^^^^
   |
   = help: use the `|| { ... }` closure form instead

Great, things are getting out of control, and Rust is throwing an unreadable error message. What is an fn item, a dynamic environment and a closure? That's a lot to take in from a single error message, and we already have another part of our program that is not working. The truth is, we are coming to the realization that lazy_static! is not magic. It is just a macro.
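Leaving lazy_static! aside for a second, the vocabulary of that error can be shown in plain Rust. In the sketch below (both build_with_* functions are made up for the illustration), a closure is allowed to use a variable from the function around it, which is what "capturing the dynamic environment" means. A nested fn item is not, so the value has to be handed over as an argument.

```rust
#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

impl MyConfig {
    pub fn new(custom_debug: Option<bool>) -> Self {
        MyConfig { debug: custom_debug.unwrap_or(false) }
    }
}

// Hypothetical helper: the closure captures custom_debug from the surrounding scope.
fn build_with_closure(custom_debug: Option<bool>) -> MyConfig {
    let make = || MyConfig::new(custom_debug);
    make()
}

// Hypothetical helper: a nested fn item sees nothing from the surrounding scope.
// Writing MyConfig::new(custom_debug) inside `make` would trigger error E0434.
fn build_with_fn_item(custom_debug: Option<bool>) -> MyConfig {
    fn make(value: Option<bool>) -> MyConfig {
        MyConfig::new(value)
    }
    make(custom_debug)
}

fn main() {
    println!("{:?}", build_with_closure(Some(true)));
    println!("{:?}", build_with_fn_item(None));
}
```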

What is a macro?

A macro is a way to have Rust code generated for you. But what exactly is this macro generating? That could shed some light on what it does and maybe hint at better solutions.

Well, it turns out that it is possible to know that. Using this command (the -Z flag requires a nightly toolchain), we can output the final code that Rust is compiling.

cargo rustc -- -Z unstable-options --pretty=expanded

If you run this, you will see that a lot is going on. println! is a macro. That means the final code is much more than that single line. For now, we will focus on the code generated by lazy_static!.

#[allow(missing_copy_implementations)]
#[allow(non_camel_case_types)]
#[allow(dead_code)]
pub struct CONFIG {
    __private_field: (),
}
#[doc(hidden)]
pub static CONFIG: CONFIG = CONFIG{__private_field: (),};
impl ::lazy_static::__Deref for CONFIG {
    type Target = MyConfig;
    fn deref(&self) -> &MyConfig {
        #[inline(always)]
        fn __static_ref_initialize() -> MyConfig {
            MyConfig::new(custom_debug)
        }
        #[inline(always)]
        fn __stability() -> &'static MyConfig {
            static LAZY: ::lazy_static::lazy::Lazy<MyConfig> =
                ::lazy_static::lazy::Lazy::INIT;
            LAZY.get(__static_ref_initialize)
        }
        __stability()
    }
}
impl ::lazy_static::LazyStatic for CONFIG {
    fn initialize(lazy: &Self) { let _ = &**lazy; }
}

Oops...

There is a lot going on here. First, lazy_static! is creating a new type (config::MyConfig::init::CONFIG). This can explain this weird error message when we try to return CONFIG.

error[E0308]: mismatched types
  --> src/config.rs:15:12
   |
15 |         Ok(CONFIG)
   |            ^^^^^^ expected struct `config::MyConfig`, found struct `config::MyConfig::init::CONFIG`
   |
   = note: expected type `config::MyConfig`
              found type `config::MyConfig::init::CONFIG`

Cool. We are no longer using the MyConfig struct but another one that lazy_static! invented. It looks like lazy_static! has code that executes our function, gets the returned value and creates a new type that imitates our struct. But don't take my word for it. I just happen to know about Rust as much as you do!
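The generated code above does give one solid clue: the invented type implements Deref with MyConfig as its target. Here is a hand-written sketch of that trick, stripped of the lazy part. ConfigHandle is a name I made up to stand in for the generated CONFIG type; it is not what lazy_static! really produces.

```rust
use std::ops::Deref;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

// Hypothetical stand-in for the type that lazy_static! invents.
pub struct ConfigHandle {
    inner: MyConfig,
}

impl Deref for ConfigHandle {
    type Target = MyConfig;

    fn deref(&self) -> &MyConfig {
        &self.inner
    }
}

fn print_debug(config: &MyConfig) {
    println!("Debug: {}", config.debug);
}

fn main() {
    let handle = ConfigHandle { inner: MyConfig { debug: true } };
    // Field access goes through Deref automatically.
    println!("Debug via handle: {}", handle.debug);
    // &ConfigHandle coerces to &MyConfig, yet ConfigHandle is still its own type.
    print_debug(&handle);
}
```

That is why CONFIG.debug works everywhere while returning CONFIG where a MyConfig is expected gives a type mismatch.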


Remind me again: Why are we here?

The thing is, our problem seems to be simple. If only we were able to change the value of the static variable after declaring it. That would solve all of our headaches. Let us get rid of lazy_static! and possibly enjoy the rest of the day with friends.

Who said we can't?

main.rs

mod config;

use config::MyConfig;

fn main() {
    println!("Hello, world!");

    println!("Debug value: {}", MyConfig::global_config().debug);
    MyConfig::init(Some(true));
    println!("Recheck Debug value: {}", MyConfig::global_config().debug);
}

config.rs

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

pub static mut CONFIG: MyConfig = MyConfig { debug: false };

impl MyConfig {
    pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {
        unsafe {
            CONFIG.debug = custom_debug.unwrap_or(false);
        }
        Ok(())
    }

    pub fn global_config() -> &'static MyConfig {
        unsafe { &CONFIG }
    }
}

output

Hello, world!
Debug value: false
Recheck Debug value: true

Notice a few things here: static means a 'static (infinite) lifetime. It doesn't mean we can't mutate the variable. It is not a constant. We can change it if we add the keyword mut. Rust doesn't consider it safe (thus the unsafe keyword), but what does Rust know about the life of the streets??! Our program works. There is plastic in the ocean; I mean, in the grand scheme of things, this is a minor issue, right?


But this is not what Rust stands for, is it? We need to live up to the hype, but how do we go about that?

We can start by putting this line at the top of our program.

#![forbid(unsafe_code)]

Running the previous code will now result in an error.

error: usage of an `unsafe` block
  --> src/config.rs:10:9
   |
10 | /         unsafe {
11 | |             CONFIG.debug = custom_debug.unwrap_or(false);
12 | |         }
   | |_________^
   |

Now we know what we want. We want a global variable that we can, at a certain point, mutate with external variables. But we must do that safely; and be ready for the future of multiple processors, cores and threads.

Let's start with the global variable mutation. One might think that it is not possible to safely mutate static variables. And that's actually correct which is why the Rust team has come up with some solutions.

Hidden somewhere in the documentation of one of Rust's types (Cell), we can read the following.

Rust memory safety is based on this rule: Given an object T, it is only possible to have one of the following:

  • Having several immutable references (&T) to the object (also known as aliasing).
  • Having one mutable reference (&mut T) to the object (also known as mutability).

Yes, that's our problem right here.

This is enforced by the Rust compiler. However, there are situations where this rule is not flexible enough. Sometimes it is required to have multiple references to an object and yet mutate it.

Shareable mutable containers exist to permit mutability in a controlled manner, even in the presence of aliasing. bla bla bla...
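Before pointing Cell at our global, here is what "mutation in the presence of aliasing" looks like on a plain local variable. This is a minimal single-threaded sketch; bump is a helper invented for it.

```rust
use std::cell::Cell;

// Hypothetical helper: changes the value through a shared (&) reference.
// No &mut is needed, because Cell moves the mutation inside the container.
fn bump(counter: &Cell<i32>) {
    counter.set(counter.get() + 1);
}

fn main() {
    let counter = Cell::new(0);
    // Two shared references to the same Cell, alive at the same time.
    let first = &counter;
    let second = &counter;
    bump(first);
    bump(second);
    println!("Counter: {}", counter.get());
}
```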

Cool. That's our solution right there. The Cell! Let's go and implement it.

main.rs

mod config;

use config::MyConfig;

fn main() {
    dbg!(MyConfig::global_config());
}

config.rs

use std::cell::Cell;

#[derive(Debug, Copy, Clone)]
pub struct MyConfig {
    pub debug: bool,
}

pub static CONFIG: Cell<MyConfig> = Cell::new(MyConfig { debug: false });

impl MyConfig {
    pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {
        match custom_debug {
            Some(debug_value) => CONFIG.set(MyConfig { debug: debug_value }),
            None => (),
        };

        Ok(())
    }

    pub fn global_config() -> MyConfig {
        CONFIG.get()
    }
}

output

error[E0277]: `std::cell::Cell<config::MyConfig>` cannot be shared between threads safely
 --> src/config.rs:8:1
  |
8 | pub static CONFIG: Cell<MyConfig> = Cell::new(MyConfig { debug: false });
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `std::cell::Cell<config::MyConfig>` cannot be shared between threads safely
  |
  = help: the trait `std::marker::Sync` is not implemented for `std::cell::Cell<config::MyConfig>`
  = note: shared static variables must have a type that implements `Sync`

This doesn't work, but it is important to note that this particular configuration adds a few complications: the Clone and Copy traits in the derive attribute. While they seem harmless now, they might be limiting as your configuration grows more complex. In case you are wondering what they are about, we will get to them later on in this article.

But, for now, the gist of it, is that Cell is not thread-safe; whatever that means. We need a type like Cell but that happens to be thread-safe.

Browsing more through the Rust website, we find the atomic types.

Atomic types provide primitive shared-memory communication between threads, and are the building blocks of other concurrent types.

Reading further into the documentation.

Atomic variables are safe to share between threads (they implement Sync) but they do not themselves provide the mechanism for sharing and follow the threading model of rust. The most common way to share an atomic variable is to put it into an Arc (an atomically-reference-counted shared pointer).
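For the record, an atomic can sit directly in a static, because atomics implement Sync. The sketch below stores only the debug flag (not a whole MyConfig) in an AtomicBool; set_debug and debug are names chosen for the illustration, and SeqCst is simply the most conservative ordering, picked here without any tuning.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// A global flag that can be read and written safely from any thread.
static DEBUG: AtomicBool = AtomicBool::new(false);

// Hypothetical setter for the flag.
pub fn set_debug(value: bool) {
    DEBUG.store(value, Ordering::SeqCst);
}

// Hypothetical getter for the flag.
pub fn debug() -> bool {
    DEBUG.load(Ordering::SeqCst)
}

fn main() {
    println!("Debug: {}", debug());
    set_debug(true);
    println!("Debug: {}", debug());
}
```

It works without unsafe, but only because our configuration is a single bool. A struct with several fields does not fit into one atomic.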

Atomic variables look like they are the building blocks for higher-level types that might solve our problem. We should not use them (as per the advice of the documentation) and instead use the higher-level abstractions.

Most of the low-level synchronization primitives are quite error-prone and inconvenient to use, which is why the standard library also exposes some higher-level synchronization objects.

These objects are Arc, Barrier, Condvar, Mutex, Once, and RwLock. That's a lot and might mean that the problem at hand is rather hard. Imagine you have two threads executing at the same time: You'd want a solid framework to let them share access to the same variable. That's basically what this mess is about. If we were to have a single-threaded application, things would be much simpler. But we live in a world where quantum tunneling is a real thing!
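To get a feel for the "two threads, one variable" situation, here is a sketch using two of those objects, Arc and Mutex. It has nothing to do with our configuration yet; count_in_threads is a made-up function that lets several threads increment one shared counter.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: `threads` threads each add 1 to a shared counter
// `per_thread` times, and the final total is returned.
fn count_in_threads(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock guarantees one writer at a time.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("Total: {}", count_in_threads(4, 1000));
}
```

No increment is lost, because every thread has to take the lock before touching the number.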

So which one do we need? We want multiple access to the global variable (configuration). There are several that can fit the bill. Each has its own quirks and limitations.

Let's look for example at Once.

A synchronization primitive which can be used to run a one-time global initialization. Useful for one-time initialization for FFI or related functionality. This type can only be constructed with the ONCE_INIT value or the equivalent Once::new constructor.

This sounds good if we want to do a single, one-time initialization of our configuration variable. But what is the code going to look like, and what if we initialize twice?

main.rs

mod config;

use config::MyConfig;

fn main() {
    println!("Hello, world!");

    println!("Debug value: {}", MyConfig::global_config().debug);

    MyConfig::init(Some(true));

    println!("Recheck Debug value: {}", MyConfig::global_config().debug);

    MyConfig::init(Some(false));

    println!("Recheck Debug value: {}", MyConfig::global_config().debug);
}

config.rs

use std::sync::Once;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

pub static INIT: Once = Once::new();
pub static mut CONFIG: MyConfig = MyConfig { debug: false };

impl MyConfig {
    pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {
        INIT.call_once(|| unsafe {
            match custom_debug {
                Some(debug_value) => CONFIG = MyConfig { debug: debug_value },
                None => (),
            };
        });
        Ok(())
    }

    pub fn global_config() -> &'static MyConfig {
        unsafe { &CONFIG }
    }
}

The program runs and here is the output.

Hello, world!
Debug value: false
Recheck Debug value: true
Recheck Debug value: true

Cool. The init function is functioning as expected. If we call init a second time, however, nothing happens. We can pass variables to our closure. So everything is working just fine. Plus we got rid of the lazy_static crate. One more thing kept out of this mess.
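The "nothing happens the second time" behavior can be isolated in a few lines. In this sketch the closure only counts how often it runs; RUNS and the runs helper are additions of mine, not part of the program above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
// Hypothetical counter: how many times did the closure really run?
static RUNS: AtomicUsize = AtomicUsize::new(0);

fn init() {
    // The closure runs on the first call only; later calls return right away.
    INIT.call_once(|| {
        RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

fn runs() -> usize {
    RUNS.load(Ordering::SeqCst)
}

fn main() {
    init();
    init();
    init();
    println!("The closure ran {} time(s)", runs());
}
```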

Except, if you have a good eye, you should have noticed something that you probably won't like much (at least if you agreed with our previous monologue). We are using unsafe code!

Well, according to Rust, this should be pretty safe.

// Accessing a `static mut` is unsafe much of the time, but if we do so
// in a synchronized fashion (e.g., write once or read all) then we're
// good to go!

But there is still something bugging me about it: First, we can only change the configuration one time. This might be enough but it is not quite what we want. Second, we are using unsafe blocks of code. While this is somewhat safe, it opens up the possibility that a developer handles the code unsafely. After all, you just opened a can of worms in a controlled environment. That might have risks since the environment might get out of our control as the code base grows and more developers are recruited.

Better keep the can of worms closed if we can afford it. Luckily, we have a wide range of sync objects to choose from. Let's look at RwLock.

This type of lock allows a number of readers or at most one writer at any point in time. The write portion of this lock typically allows modification of the underlying data (exclusive access) and the read portion of this lock typically allows for read-only access (shared access).

Sounds too good to be true? This allows us, basically, to modify the global variable (from a single location) and access it from multiple locations. It does come with some complications, however.

The priority policy of the lock is dependent on the underlying operating system's implementation, and this type does not guarantee that any particular policy will be used.

The type parameter T represents the data that this lock protects. It is required that T satisfies Send to be shared across threads and Sync to allow concurrent access through readers. The RAII guards returned from the locking methods implement Deref (and DerefMut for the write methods) to allow access to the content of the lock.

That's some more gibberish out there; but this looks like something we should be able to manage. Let's give it a try.

main.rs

#[macro_use]
extern crate lazy_static;
mod config;

use config::MyConfig;

fn main() {
    MyConfig::global_config();
    MyConfig::init(Some(true));
    MyConfig::global_config();
    MyConfig::init(Some(false));
    MyConfig::global_config();
}

config.rs

use std::sync::RwLock;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

lazy_static! {
    pub static ref CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig { debug: false });
}

impl MyConfig {
    pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {
        let mut w = CONFIG.write().unwrap();
        match custom_debug {
            Some(debug_value) => *w = MyConfig { debug: debug_value },
            None => (),
        };
        Ok(())
    }

    pub fn global_config()
    {
        let m = CONFIG.read().unwrap();
        dbg!(m);
    }
}

This program requires lazy_static as a dependency. If we cargo run it, we will get the following output.

[src/config.rs:26] m = RwLockReadGuard {
    lock: RwLock {
        data: MyConfig {
            debug: false,
        },
    },
}
[src/config.rs:26] m = RwLockReadGuard {
    lock: RwLock {
        data: MyConfig {
            debug: true,
        },
    },
}
[src/config.rs:26] m = RwLockReadGuard {
    lock: RwLock {
        data: MyConfig {
            debug: false,
        },
    },
}

Looks good? We have a static global variable CONFIG. Inside this global variable is a RwLock<MyConfig> object. We can mutate the inside of the RwLock by using the write function. We do this in the init function of MyConfig; and we can also pass variables.
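Here is a condensed sketch of the same idea with several reader threads, to show what the lock buys us. Two assumptions to flag: it relies on RwLock::new being a const fn (true since Rust 1.63, which postdates the lazy_static version used in this article), and the function names set_debug and current_debug are mine. It also copies the bool out instead of handing out the guard.

```rust
use std::sync::RwLock;
use std::thread;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

// Assumes Rust 1.63 or later, where RwLock::new can be used in a static.
static CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig { debug: false });

// Hypothetical writer: takes exclusive access for the duration of the statement.
fn set_debug(value: bool) {
    CONFIG.write().unwrap().debug = value;
}

// Hypothetical reader: takes shared access and copies the flag out.
fn current_debug() -> bool {
    CONFIG.read().unwrap().debug
}

fn main() {
    set_debug(true);
    // Several readers can hold the lock at the same time.
    let readers: Vec<_> = (0..4).map(|_| thread::spawn(current_debug)).collect();
    for reader in readers {
        println!("Reader saw debug = {}", reader.join().unwrap());
    }
}
```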

So everything is good, right? Well, kind of. You might have noticed that we are not returning the CONFIG struct or a reference to it. Instead, we are displaying the content inside the function itself. But that might not be what you want, you might want to access the values from another place and access it directly.

We can try that, but let's first understand what CONFIG.read().unwrap() is returning. Using the dbg! macro, we get the following.

[src/config.rs:26] m = RwLockReadGuard {
    lock: RwLock {
        data: MyConfig {
            debug: false,
        },
    },
}

So it is not our MyConfig. Or, it is our MyConfig but wrapped inside another structure of the type RwLockReadGuard and stored two levels deep (lock and data fields). The documentation for it is the following.

RAII structure used to release the shared read access of a lock when dropped.

This structure is created by the read and try_read methods on RwLock.

That's more gibberish. What is RAII?

Resource acquisition is initialization (RAII)[1] is a programming idiom[2] used in several object-oriented languages to describe a particular language behavior. In RAII, holding a resource is a class invariant, and is tied to object lifetime: resource allocation (or acquisition) is done during object creation (specifically initialization), by the constructor, while resource deallocation (release) is done during object destruction (specifically finalization), by the destructor.

There is another article that has another definition.

All that manual management was unpleasant, to say the least. In the mid-80s, Bjarne Stroustrup invented a new paradigm for his brand-new language, C++. He called it Resource Acquisition Is Initialization, and the fundamental insights were the following: objects can be specified to have constructors and destructors which are called automatically at appropriate times by the compiler, this provides a much more convenient way to manage the memory a given object requires, and the technique is also useful for resources which are not memory.
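In Rust, the destructor half of RAII is the Drop trait. Here is a minimal sketch of the idea; NoisyGuard and run_demo are hypothetical names made up for this illustration and have nothing to do with the standard library's guard types. The guard records a message when it goes out of scope, which is the same moment a real lock guard would release its lock.

```rust
use std::cell::RefCell;

// Hypothetical guard type, used only for this illustration.
struct NoisyGuard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for NoisyGuard<'a> {
    // Runs automatically when the guard goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

fn run_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // Acquiring the "resource" is just creating the guard.
        let _guard = NoisyGuard { name: "resource", log: &log };
        log.borrow_mut().push(String::from("using resource"));
    } // `_guard` is dropped here, so the release is recorded here.
    log.borrow_mut().push(String::from("after scope"));
    log.into_inner()
}

fn main() {
    for line in run_demo() {
        println!("{}", line);
    }
}
```

Nobody calls drop by hand in this sketch; the release happens because the guard's scope ended.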

So that's good. But isn't everything in Rust a RAII structure and what does it mean for our RwLockReadGuard? To see how this unfolds, let's try returning our RwLockReadGuard. We will have to unlock every time we want to access the data but there is a price for everything, right?

main.rs

#[macro_use]
extern crate lazy_static;
mod config;

use config::MyConfig;

fn main() {
    // Initialize the configuration
    MyConfig::init(Some(true));

    // Return 'RwLockReadGuard'
    let a = MyConfig::global_config();

    // Display 'MyConfig'
    dbg!(&*a);
}

config.rs

use std::sync::RwLock;
use std::sync::RwLockReadGuard;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
}

lazy_static! {
    pub static ref CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig { debug: false });
}

impl MyConfig {
    pub fn init(custom_debug: Option<bool>) -> Result<(), i32> {

        let mut w = CONFIG.write().unwrap();

        match custom_debug {
            Some(debug_value) => *w = MyConfig { debug: debug_value },
            None => (),
        };

        Ok(())
    }

    pub fn global_config() -> RwLockReadGuard<'static, MyConfig> {
        let m: RwLockReadGuard<'static, MyConfig> = CONFIG.read().unwrap();

        m
    }
}

Run it and we get the following.

[src/main.rs:15] &*a = MyConfig {
    debug: true,
}

So that's perfect, right? Let's try to re-initialize our configuration struct.

main.rs

#[macro_use]
extern crate lazy_static;
mod config;

use config::MyConfig;

fn main() {
    // Initialize the configuration
    MyConfig::init(Some(true));

    // Return 'RwLockReadGuard'
    let a = MyConfig::global_config();

    // Display 'MyConfig'
    dbg!(&*a);

    // Reinit MyConfig
    MyConfig::init(Some(false));

    // Return 'RwLockReadGuard'
    let b = MyConfig::global_config();

    // Display 'MyConfig'
    dbg!(&*b);
}

Output

[src/main.rs:15] &*a = MyConfig {
    debug: true,
}

Hmm, not exactly the output we expect. Wait! The program didn't quit. It is stuck there!

We tried to cheat our way out of this complexity by plowing through the different types and trying different things. But, obviously, we hit another roadblock. It'd be better to get an understanding of why our program suddenly halted and never returned. It is like our program is stuck in an infinite loop but there is no loop in our code. Or is there?


Remind me again: Why are we here?

Why are we going through all of this complexity? What exactly is wrong with simple unsafe code? Well, to understand that, you might need to witness the dangers of unsafe code in real life.

First, let's remember that our problem only exists if we are working on multiple threads. This means multiple cores. Let's look at this video from a Udacity online MOOC (High Performance Computer Architecture). I would recommend you take the whole course later as it goes deeper on other related topics; but for now this video will do it.

Synchronization Example

In a nutshell, our problem is that two threads trying to access the same memory location (or variable) at the same time can conflict. To ensure consistency, the processor provides a set of operations named atomic operations that can guarantee the linearizability of execution. That's why using unsafe code with Once is safe: Once ensures that the initialization runs exactly once, and that any other thread waits until it has finished.

Also, what are atomic operations? Amdan from Stack Overflow has a rather useful explanation. Basically, atomic operations are indivisible: no other thread can observe one half-finished, so they don't suffer from this conflict. The processor supports them directly. That might help explain some of the gibberish in the RwLock definition above.
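To make "atomic" a bit more concrete, here is a small sketch using AtomicI32 from the standard library. COUNTER and count_with_atomics are hypothetical names chosen for this illustration, and the snippet is independent of the configuration code. It uses thread::spawn, which the examples that follow explain in more detail.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

// Hypothetical global counter, used only for this illustration.
static COUNTER: AtomicI32 = AtomicI32::new(0);

fn count_with_atomics() -> i32 {
    // Start from zero so the function can be called more than once.
    COUNTER.store(0, Ordering::SeqCst);

    let handles: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                for _ in 1..1000 {
                    // Read, add and write back as one indivisible step.
                    COUNTER.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    COUNTER.load(Ordering::SeqCst)
}

fn main() {
    // Two threads, 999 increments each.
    println!("{}", count_with_atomics());
}
```

Because every increment is indivisible, none of them can be lost, and no unsafe block is needed to touch the global.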

Okay, let's try a couple examples (actually three) and see why unsafe code can give inaccurate results or worse.

In the first example, we show how to use multi-threading by executing two threads together. This example is not related to the configuration code, so we can (and probably should) execute it in a different program.

main.rs

use std::thread;

fn main() {
    // Executes Thread 1
    let thread1 = thread::spawn(|| {
        for _ in 1..100 {
            println!("Thread 1");
        }
    });

    // Executes Thread 2
    let thread2 = thread::spawn(|| {
        for _ in 1..100 {
            println!("Thread 2");
        }
    });

    // Makes sure our threads have executed and returned
    thread1.join().unwrap();
    thread2.join().unwrap();

    println!("Execution completed");
}

output

...
Thread 2
Thread 2
Thread 2
Thread 1
Thread 1
Thread 1
...
Thread 2
Thread 2
Execution completed

What our program does is launch two threads. Each thread is going to write to the terminal. Thread 1 will write "Thread 1" and Thread 2 will write "Thread 2". Since our threads are executing at the same time, we get a mix of these two lines; and it is not predictable which will happen first.

You might notice here that we are using a new weird sign ||. That denotes a closure. A closure is very similar to an anonymous function with the benefit that it can capture variables from its environment. There are various articles about Rust closures but, for now, if you stick to the idea that they are anonymous functions, then that's enough.
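If the "capture variables from its environment" part sounds abstract, here is a minimal sketch. The function name closure_demo and the strings are made up for this illustration. The first closure borrows a variable from the surrounding function; the second uses the move keyword so that the spawned thread owns what it captured.

```rust
use std::thread;

fn closure_demo() -> (String, String) {
    let greeting = String::from("Hello");

    // This closure borrows `greeting` from the enclosing scope.
    let greet = |name: &str| format!("{}, {}!", greeting, name);
    let local = greet("Rust");

    let label = String::from("worker");

    // `move` transfers ownership of `label` into the closure, which is
    // what a spawned thread needs since it may outlive this function.
    let handle = thread::spawn(move || format!("{} done", label));
    let from_thread = handle.join().unwrap();

    (local, from_thread)
}

fn main() {
    let (local, from_thread) = closure_demo();
    println!("{}", local);
    println!("{}", from_thread);
}
```

The closures in the thread example above capture nothing, which is why they work without move.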

It is important to highlight that these threads are independent from our main thread or program. While these threads start executing as soon as we spawn them, they might not finish before our program exits. So we must force them to finish execution before our program exits with the join function.

Now that we are comfortable running multiple threads, let's do more: Create a global variable and modify it from both threads. This might seem harmless but it is not something that Rust will allow safely without jumping through hoops. So let's start with the simple and direct approach: Unsafe code. We will declare a global mutable variable, and then mutate it from inside each thread. The variable is a 32-bit integer and from inside the threads we will loop and increment this integer.

Same as before, this example can run on its own.

main.rs

use std::thread;

pub static mut INC: i32 = 0;

fn main() {
    let thread1 = thread::spawn(|| {
        for _ in 1..1000 {
            unsafe {
                INC = INC + 1;
            }
        }
    });

    let thread2 = thread::spawn(|| {
        for _ in 1..1000 {
            unsafe {
                INC = INC + 1;
            }
        }
    });

    thread1.join().unwrap();
    thread2.join().unwrap();

    unsafe {
        println!("{}", INC);
    }
}

Alright, let's execute our program. We do 999 iterations in each thread, so we expect our integer to have the value of 1998.

First execution.

output

1254

Hmm. That's not our expectation, let's try again.

output

1443

Cool. We made a random number generator! (Slightly joking; it would be interesting to study how random these numbers are, though).

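But why are our results not consistent? Well, it is the synchronization issue. INC = INC + 1 is not a single step: the thread reads the value, adds one, and writes the result back. The two threads are not synchronized and on multiple occasions (more than enough), they both read the same old value before either has written. When that happens, both write back the same result and the value is only incremented once.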
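Here is that unlucky interleaving replayed by hand in a single thread, so the outcome is the same on every run. It is only an illustration of the read-then-write ordering, not real concurrent code, and the names (lost_update, seen_by_thread1, seen_by_thread2) are made up for it.

```rust
fn lost_update() -> i32 {
    let mut inc = 0;

    // Both "threads" read the current value before either one writes.
    let seen_by_thread1 = inc;
    let seen_by_thread2 = inc;

    // Thread 1 writes back its result.
    inc = seen_by_thread1 + 1;
    println!("after thread 1: {}", inc);

    // Thread 2 writes back a result computed from the stale value,
    // overwriting thread 1's increment.
    inc = seen_by_thread2 + 1;
    println!("after thread 2: {}", inc);

    inc
}

fn main() {
    // Two increments were attempted, but only one survived.
    println!("final value: {}", lost_update());
}
```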

So how do we fix this problem? Well, this is exactly why the Rust team came up with the synchronization primitives.

Let's try to use RwLock now.

main.rs

#[macro_use]
extern crate lazy_static;

use std::thread;
use std::sync::RwLock;

lazy_static! {
    pub static ref INC: RwLock<i32> = RwLock::new(0);
}

fn main() {
    let thread1 = thread::spawn(|| {
        for _ in 1..1000 {
            let mut w = INC.write().unwrap();
            *w = *w + 1;
        }
    });

    let thread2 = thread::spawn(|| {
        for _ in 1..1000 {
            let mut w = INC.write().unwrap();
            *w = *w + 1;
        }
    });

    thread1.join().unwrap();
    thread2.join().unwrap();

    let r = INC.read().unwrap();
    println!("{}", *r);
}

output

1998

Instead of mutating the global variable directly, we are mutating our value through RwLock. RwLock ensures that only one thread can write at a time. If one thread is incrementing the variable, the other thread has to wait. You should have already noticed the name of RwLock. It is a lock. Go through the definition again and you might understand more of it now. That's why in the previous RwLock example our program was stuck: the read guard a was still alive when init asked for the write lock, so the write was waiting for a lock that was never going to be released.
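We can watch this happen without freezing the program by using try_write, which returns an error instead of waiting. This is a minimal sketch; the local lock stands in for CONFIG and the function name writer_blocked_while_reading is made up for the illustration.

```rust
use std::sync::RwLock;

fn writer_blocked_while_reading() -> (bool, bool) {
    // A local lock standing in for the global CONFIG.
    let lock = RwLock::new(0);

    // Take a read guard and keep it alive, like `a` in the stuck program.
    let reader = lock.read().unwrap();

    // `try_write` does not wait; it fails while the read guard exists.
    let blocked = lock.try_write().is_err();

    // Dropping the guard releases the read lock.
    drop(reader);

    // Now the write lock can be taken.
    let free = lock.try_write().is_ok();

    (blocked, free)
}

fn main() {
    let (blocked, free) = writer_blocked_while_reading();
    println!("blocked while reading: {}", blocked);
    println!("free after drop: {}", free);
}
```

With write instead of try_write, the first attempt would wait for the reader to go away, which is the situation the stuck program was in.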

Alright, let's make use of this knowledge and implement our final solution. For the fun of it, we are going to add a String field. Because, you know, bureaucracy!

main.rs

#[macro_use]
extern crate lazy_static;

mod config;

use config::MyConfig;
use std::ops::Deref;

fn main() {
    // First, we get an RwLockReadGuard
    let guard = MyConfig::global_config();

    // Next, we deref the guard and get MyConfig
    let config = guard.deref();

    // Freely access MyConfig by borrowing it
    dbg!(&config);
    dbg!(&config.info);

    // We drop the guard and unlock the RwLock
    drop(guard);

    // Re-initialize our configuration
    MyConfig::init(Some(true), Some(String::from("Updated")));

    // Go through the same process again
    let guard = MyConfig::global_config();

    let config = guard.deref();

    dbg!(&config);
    dbg!(&config.info);

    drop(guard);
}

config.rs

use std::sync::RwLock;
use std::sync::RwLockReadGuard;

#[derive(Debug)]
pub struct MyConfig {
    pub debug: bool,
    pub info: String,
}

lazy_static! {
    pub static ref CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig { debug: false, info: String::from("Some info") });
}

impl MyConfig {
    pub fn init(custom_debug: Option<bool>, custom_info: Option<String>) -> Result<(), i32> {
        let mut w = CONFIG.write().unwrap();

        match (custom_debug , custom_info) {
            (Some(debug_value), Some(custom_info)) => *w = MyConfig { debug: debug_value, info: custom_info},
            _ => (),
        };

        Ok(())
    }

    pub fn global_config() -> RwLockReadGuard<'static, MyConfig> {
        let m = CONFIG.read().unwrap();

        m
    }
}

output

[src/main.rs:17] &config = MyConfig {
    debug: false,
    info: "Some info",
}
[src/main.rs:18] &config.info = "Some info"
[src/main.rs:32] &config = MyConfig {
    debug: true,
    info: "Updated",
}
[src/main.rs:33] &config.info = "Updated"

Hooray! This works! It is a pain in the _ but it serves our purposes, and it is safe. That should be everything Rust stands for (At least, I'm certain of the pain in the _ part)!

Alright, could this be simpler? We promised above to cover the Copy and Clone traits, and this is the part where we do. Remember the Ownership/Borrowing concept we talked about at first? That you can't take something you don't own but you can borrow it? But what if you were able to clone it? Well, that makes you able to return the actual type.

Not every type can derive Copy: String, for example, cannot, which is why the example below stores the info field as a &'static str. We won't go into the details of why and how. Partly because we have already covered a lot on this topic but also because there are lots of good resources on this. The Rust Programming Language book does a good job on that: What is ownership.
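As a quick illustration of the difference, here is a sketch with two hypothetical structs, CopyConfig and CloneConfig, made up for this comparison. The first holds only Copy fields and is duplicated by a plain assignment; the second holds a String, so it can only derive Clone and has to be duplicated explicitly.

```rust
// Every field is Copy, so the struct can derive Copy as well.
#[derive(Debug, Clone, Copy)]
struct CopyConfig {
    debug: bool,
    info: &'static str,
}

// String is not Copy, so this struct can only derive Clone.
#[derive(Debug, Clone)]
struct CloneConfig {
    debug: bool,
    info: String,
}

fn copy_demo() -> (CopyConfig, CopyConfig) {
    let a = CopyConfig { debug: false, info: "Some info" };
    // Plain assignment makes a copy; `a` stays usable afterwards.
    let mut b = a;
    b.info = "Updated";
    (a, b)
}

fn clone_demo() -> (CloneConfig, CloneConfig) {
    let a = CloneConfig { debug: false, info: String::from("Some info") };
    // Without `.clone()`, this assignment would move `a` instead.
    let mut b = a.clone();
    b.info = String::from("Updated");
    (a, b)
}

fn main() {
    let (a, b) = copy_demo();
    println!("copy:  {} {} / {} {}", a.debug, a.info, b.debug, b.info);

    let (a, b) = clone_demo();
    println!("clone: {} {} / {} {}", a.debug, a.info, b.debug, b.info);
}
```

In both cases, changing the duplicate leaves the original untouched.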

Let's see how this would work.

main.rs

#[macro_use]
extern crate lazy_static;

mod config;

use config::MyConfig;

fn main() {
    let a = MyConfig::global_config();

    dbg!(a);

    MyConfig::init(Some(true), Some("Updated"));

    let a = MyConfig::global_config();

    dbg!(a);
}

config.rs

use std::sync::RwLock;

#[derive(Debug, Clone, Copy)]
pub struct MyConfig {
    pub debug: bool,
    pub info: &'static str,
}

lazy_static! {
    pub static ref CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig { debug: false, info: "Some info" });
}

impl MyConfig {
    pub fn init(custom_debug: Option<bool>, custom_info: Option<&'static str>) -> Result<(), i32> {
        let mut w = CONFIG.write().unwrap();

        match (custom_debug , custom_info) {
            (Some(debug_value), Some(custom_info)) => *w = MyConfig { debug: debug_value, info: custom_info},
            _ => (),
        };

        Ok(())
    }

    pub fn global_config() -> MyConfig {
        let m = CONFIG.read().unwrap();

        // This doesn't return *m but a copy of it
        *m
    }
}

output

[src/main.rs:11] a = MyConfig {
    debug: false,
    info: "Some info",
}
[src/main.rs:17] a = MyConfig {
    debug: true,
    info: "Updated",
}

That's convenient but you have to keep in mind that we are returning copies of the configuration and not the actual configuration.
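Here is a small sketch of what that means in practice. It is a simplified variant of the code above, with some assumptions: it declares the static directly instead of using lazy_static (this relies on RwLock::new being usable in a static, which needs Rust 1.63 or later), init takes plain values instead of Options, and snapshot_demo is a made-up helper.

```rust
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq)]
struct MyConfig {
    debug: bool,
    info: &'static str,
}

// Assumption: Rust 1.63+, where `RwLock::new` can be used in a static.
static CONFIG: RwLock<MyConfig> = RwLock::new(MyConfig {
    debug: false,
    info: "Some info",
});

// Simplified `init`: always replaces the whole configuration.
fn init(debug: bool, info: &'static str) {
    *CONFIG.write().unwrap() = MyConfig { debug, info };
}

// Returns a copy of the current configuration.
fn global_config() -> MyConfig {
    *CONFIG.read().unwrap()
}

fn snapshot_demo() -> (MyConfig, MyConfig) {
    init(false, "Some info");

    // `before` is a copy taken at this moment.
    let before = global_config();

    // Updating the global does not touch the copy we already hold.
    init(true, "Updated");
    let after = global_config();

    (before, after)
}

fn main() {
    let (before, after) = snapshot_demo();
    println!("{:?}", before);
    println!("{:?}", after);
}
```

So a copy taken before the update keeps the old values, and any code that wants the new ones has to call global_config again.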


I hope this introduction to Rust wasn't overwhelming.
