Idiomatic Rust

As with most programming languages, there are many ways to write Rust. However, unlike some other languages, Rust takes a more dogmatic approach to what is considered idiomatic and what is not.

This is partly due to Rust's philosophy of providing one good way of doing things while making the other, unidiomatic ways more awkward, which steers programmers towards common patterns.

However, these common patterns and habits might not be immediately obvious to newcomers. Rust's heritage is complicated: its low-level nature puts it alongside C and C++, but the language it owes most of its inspiration to is OCaml, which sits firmly in the high-level, functional part of the programming language spectrum. While Rust has a C-like syntax, its semantics are largely its own or inspired by OCaml and Cyclone.

The full list of inspirations can be found here: https://doc.rust-lang.org/reference/influences.html

It is quite an extensive list, in which procedural languages are a minority.

In this chapter, we shall name some of the most common patterns to help make your code more idiomatic.

Borrowed types over borrowing an owned type

For a number of Rust's standard library types, there exist two variants of one type:

  • An owned one
  • A borrowed one

The difference is that the owned one typically satisfies a 'static lifetime bound, meaning it does not borrow from another value. It lives until it is dropped (by going out of scope, by being dropped manually with std::mem::drop(), or by being part of a value that is later dropped), and its lifetime is not limited by that of anything else.

Borrowed types tend to be thinner and often don't require allocations of their own; they are exactly what it says on the tin: strongly typed references to some owned type. There is no hard requirement for a 1:1 correspondence, so one borrowed type can be produced from several owned types (a &str can be borrowed from a String or from a Box<str>, for instance). In some cases, we may consider borrowed types views into the owned types.

We have met at least two explicit examples of borrowed types. For strings, it is &str, borrowed from String. With vectors, arrays and other collections, the classic borrowed type is a slice: for a collection such as Vec<T>, the slice is &[T], or &mut [T] for mutable access.

Especially when you want to accept a parameter in a function or a method and don't or can't take it by value, prefer borrowed types over borrowing an owned type.

There are two reasons for this:

  • Greater flexibility
  • Avoiding an extra level of indirection (the owned type usually already accesses its data through an internal pointer, which the borrowed type exposes directly; a reference to the owned type is a pointer to that pointer).

See the following example:

#![allow(unused)]
fn main() {
// unidiomatic
fn takes_string(input: &String) {
    // This only accepts a reference to String
}

// idiomatic
fn takes_string2(input: &str) {
    // This accepts:
    // - a reference to a String (&String coerces to &str)
    // - a string slice or string literal (&str)
    // - a reference to anything else that derefs to str
}
}
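To see the flexibility in practice, here is a small sketch; shout and total are hypothetical helpers made up for this illustration. Each takes a borrowed type, and callers can pass several different owned types to it thanks to deref and unsize coercions.

```rust
// Hypothetical helpers: each parameter is a borrowed type.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

fn main() {
    let owned = String::from("hello");
    let boxed: Box<str> = "boxed".into();

    // &String and &Box<str> both deref-coerce to &str; literals already are &str.
    assert_eq!(shout(&owned), "HELLO");
    assert_eq!(shout(&boxed), "BOXED");
    assert_eq!(shout("literal"), "LITERAL");

    let vec: Vec<u32> = vec![1, 2, 3];
    let array: [u32; 3] = [4, 5, 6];

    // &Vec<u32> and &[u32; 3] both coerce to &[u32], and so does a sub-slice.
    assert_eq!(total(&vec), 6);
    assert_eq!(total(&array), 15);
    assert_eq!(total(&vec[1..]), 5);
}
```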

Mutability

In Rust, variables (more often called bindings) are immutable by default. This is yet another of Rust's safety measures, as it makes you state explicitly that you intend to mutate something. If you try to mutate something you did not declare as mutable, the compiler rejects the code: such an attempt may indicate a logic error or a typo, which could otherwise have gone unnoticed and had unforeseen consequences at runtime.

Making variables needlessly mutable forfeits this benefit. Fortunately, unless you disable the unused_mut lint, the compiler warns you when a variable is declared mutable but never needs to be.

Immutability can also make code easier for the compiler to reason about, although any performance benefit is likely to be marginal in your use case.

It is a good idea, and considered idiomatic, to minimize the stretches of code in which variables are mutable, especially if a variable only needs to be mutable for a very short period (such as during some initialization).

The general way to do this is to use a block containing a temporary mutable variable, make that variable the value of the block, and store the result in an immutable variable.

Here is an example:

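#![allow(unused)]
fn main() {
let data = {
    let mut data = vec![3, 1, 2]; // stands in for a call such as get_data()
    data.sort(); // the only mutation we need
    // Lieutenant Commander
    data
};
// from here on, `data` is immutable
}

As you can see, we limited the mutable section to its absolute minimum.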
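Here is the same pattern as a complete, runnable sketch; unique_sorted_words is a hypothetical helper invented for this example. The mutation (sorting and deduplicating) happens inside the block, and the binding that escapes it is immutable.

```rust
// Hypothetical helper: returns the distinct words of `text`, sorted.
fn unique_sorted_words(text: &str) -> String {
    // Mutability is confined to this block.
    let words = {
        let mut words: Vec<&str> = text.split_whitespace().collect();
        words.sort_unstable();
        words.dedup();
        words
    };

    // From here on `words` is immutable: calling `words.push(..)` would not compile.
    words.join(" ")
}

fn main() {
    assert_eq!(unique_sorted_words("the cat saw the dog"), "cat dog saw the");
}
```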

Returning

There are two ways of returning a value from a function:

  • with the return keyword, which works the same as in C-like languages
  • implicitly, by making the value the final expression of a block and not terminating it with a semicolon

In idiomatic Rust, the return keyword is only used to indicate early returns, whereas the second method is preferred otherwise.

There is a small semantic difference, which you might have already noticed in the example from the previous point: the implicit form only produces the value of the enclosing block, rather than returning from the entire function.

The latter approach is considered idiomatic for the following reasons, which you may debate depending on your programming history:

  • Readability (it's just the value there, so it's clear)
  • Flexibility (you can take the block and make it the value of a variable instead of a function body without modifying it)
  • It leans better into Rust's "everything is an expression" dogma, which we will look into in the next point

To compare, this code snippet illustrates the point:

#![allow(unused)]
fn main() {
// considered unidiomatic and Clippy will yell at you
fn does_something() -> usize {
    return 42;
}

// perfection itself
fn does_something2() -> usize {
    42
}
}
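The two forms are not mutually exclusive. In this sketch (average is a hypothetical function), return handles the early exit for empty input, while the normal result is the function's final expression.

```rust
// Hypothetical function: `return` only for the early exit.
fn average(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        // Early return: leaves the function immediately.
        return None;
    }
    let sum: f64 = values.iter().sum();
    // Final expression without a semicolon: the value of the function.
    Some(sum / values.len() as f64)
}

fn main() {
    assert_eq!(average(&[]), None);
    assert_eq!(average(&[1.0, 2.0, 3.0]), Some(2.0));
}
```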

Expressions

In Rust, almost everything is an expression. Even constructs that evaluate to unit, written (), which is Rust's equivalent of the void type well known from mainstream languages, are expressions, since unit is a proper value.

You should therefore think in expressions, treat control structures as expressions too, and use them directly when you want a value:

#![allow(unused)]
enum SomeEnum {
    Variant1,
    Variant2,
    Variant3,
    Variant4,
}

fn main() {
let (condition, x) = (true, 10);
let option = Some(3);
let value_of_enum = SomeEnum::Variant2;

let number = if condition { x * 2 } else { x / 2 };

let res = loop {
    break 42;
};

let something_else = if let Some(x) = option {
    x * 145
} else {
    // do something and terminate the program with an error exit code
    std::process::exit(14)
};

println!("{}", something_else);

let does_it_match = match value_of_enum {
    SomeEnum::Variant1 => "variant1",
    SomeEnum::Variant2 => "variant2",
    SomeEnum::Variant3 => "variant3",
    _ => unreachable!("we expected this eventuality to be unreachable, alas, we were wrong"),
};
}

Of course, in expressions with multiple arms, every arm has to produce a value, and all of them have to be of the same type; otherwise the code will not type-check, and rustc will, proverbially, not let you pass.

However, you may have noticed that some arms in the previous example end in expressions that are clearly not of the same type as the rest, namely std::process::exit(val) and unreachable!().

These are (or internally call) so-called diverging functions, meaning execution never continues past them. This is indicated by their return type being ! (called the never type), which can be coerced into any other type, so it type-checks in any arm.

For all intents and purposes, the same holds for the expression return val: evaluating it ends the function early, so it also has the never type and type-checks anywhere. (break and continue behave the same way inside loops.)
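A short sketch of divergence in practice, with two hypothetical functions: parse_port uses return inside a match arm, and fail is declared to return !, so calling it satisfies any expected type.

```rust
// The `Err` arm diverges with `return`, so the `match` still has type `u16`.
fn parse_port(input: &str) -> Result<u16, String> {
    let port = match input.trim().parse::<u16>() {
        Ok(port) => port,
        Err(e) => return Err(format!("invalid port {input:?}: {e}")),
    };
    Ok(port)
}

// A function whose return type is `!` never returns normally.
fn fail(message: &str) -> ! {
    panic!("{}", message)
}

fn main() {
    assert_eq!(parse_port(" 8080 "), Ok(8080));
    assert!(parse_port("http").is_err());

    let divisor = 4;
    // The `else` branch diverges, so the whole `if` still has type `i32`.
    let quotient = if divisor != 0 { 100 / divisor } else { fail("division by zero") };
    assert_eq!(quotient, 25);
}
```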

Iterators

If you have spent any time playing around with functional programming languages, you might have noticed that many of them really don't like loops. This makes sense, since functional programming languages also tend to really not like mutable state, which commonly accompanies loops.

Rust is similar, although it does have fully featured loops, three kinds of them in fact: loop, while and for. In idiomatic Rust, however, these are seldom used, essentially only for event loops or for things that would be too awkward to express with iterators.

Otherwise, iterators are preferred: wherever you can, and wherever it makes sense to you, use iterators over loops.

The two generally perform the same, and in some cases Rust can optimize iterators even better than loops, for example by eliding bounds checks that an indexed loop would need.

When writing an operation on an iterator, it is traditional and idiomatic to break down what you are doing into very small steps, even if that means applying the same kind of transformation two or more times in a row. There is usually no performance downside, and the result is more readable.

Also remember that iterators are lazy: nothing happens until you consume them, for example with .collect() or .for_each().
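A sketch comparing the two styles on a made-up task (summing the squares of the even numbers); both function names are hypothetical. The second half uses a std::cell::Cell counter, an instrumentation trick for this example only, to show that the closure in .map() does not run until .collect() consumes the iterator.

```rust
use std::cell::Cell;

// Loop version: needs a mutable accumulator.
fn sum_even_squares_loop(values: &[i64]) -> i64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

// Iterator version: small steps, no visible mutable state.
fn sum_even_squares_iter(values: &[i64]) -> i64 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0)
        .map(|&v| v * v)
        .sum()
}

fn main() {
    let values: [i64; 6] = [1, 2, 3, 4, 5, 6];
    assert_eq!(sum_even_squares_loop(&values), 56);
    assert_eq!(sum_even_squares_iter(&values), 56);

    // Laziness: building the adapter chain runs no closures...
    let calls = Cell::new(0);
    let doubled = values.iter().map(|v| {
        calls.set(calls.get() + 1);
        v * 2
    });
    assert_eq!(calls.get(), 0);

    // ...only consuming it does.
    let doubled: Vec<i64> = doubled.collect();
    assert_eq!(calls.get(), 6);
    assert_eq!(doubled, [2, 4, 6, 8, 10, 12]);
}
```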

See more in the iterators chapter.

Example:

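use std::fs::read_dir;
use std::io;
use std::path::PathBuf;

fn main() -> io::Result<()> {
    // collect the paths of all regular files directly inside "theme"
    let template_files = read_dir("theme")?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_file())
        .collect::<Vec<PathBuf>>();

    println!("{:?}", template_files);
    Ok(())
}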

This could also be written as a single .filter_map and .collect, but the small, single-purpose steps are easier to read.

TIP: If you are writing an iterator with a .for_each() consumer that is more than, say, five lines long, it should probably be a for loop instead.

Recursion

On the other hand, this is where Rust's love story with FP falls a little short. Recursion is a common pattern in functional programming, used for example as a substitute for loops, or to model other algorithms.

However, unlike most functional programming languages, Rust does not guarantee tail call optimization (TCO). The optimizer may sometimes turn a tail call into a jump, but you cannot rely on it, so if you use recursion extensively, you will eventually overflow the stack and your program will crash.

Guaranteed TCO was deemed not to belong in Rust for the following reasons:

  • They "play badly" with deterministic destruction
  • They "play badly" with assumptions in C tools, including platform ABIs and dynamic linking
  • They require a calling convention which is a performance hit relative to the C convention
  • We find most cases of tail recursion reasonably well convert to loops or iterators
  • It can make debugging more difficult, since the eliminated stack frames no longer appear in backtraces

Therefore, avoid recursion if you expect it might nest thousands of levels deep or more.
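As a sketch of such a conversion (the three function names are hypothetical), here is a tail-recursive sum alongside a loop and an iterator that compute the same value. Only the recursive version may use stack space proportional to n, since Rust does not promise to eliminate the tail call.

```rust
// Tail-recursive: the call is in tail position, but each call may still
// push a new stack frame, so a very large `n` can overflow the stack.
fn sum_to_recursive(n: u64, acc: u64) -> u64 {
    if n == 0 {
        acc
    } else {
        sum_to_recursive(n - 1, acc + n)
    }
}

// The same computation as a loop: constant stack usage for any `n`.
fn sum_to_loop(n: u64) -> u64 {
    let mut acc = 0;
    for i in 1..=n {
        acc += i;
    }
    acc
}

// Or, following the advice on iterators, as a single expression.
fn sum_to_iter(n: u64) -> u64 {
    (1..=n).sum()
}

fn main() {
    assert_eq!(sum_to_recursive(100, 0), 5050);
    assert_eq!(sum_to_loop(100), 5050);
    // Fine for the iterator, while the recursive version might not survive this depth.
    assert_eq!(sum_to_iter(1_000_000), 500_000_500_000);
}
```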

Constructors

Constructors are an established feature of object-oriented programming languages. They typically have special syntax that differs slightly from normal methods, and may be the only way of producing an instance of a non-primitive type.

Rust does not have any notion of a constructor; instead, certain conventions have been established to imitate them:

  • An associated function called new() taking zero parameters
  • An associated function called new(params..) taking some parameters, if they are absolutely necessary for creating the type
  • An associated function called from_X(X..) taking some parameters, if it is not possible to implement std::convert::From<X> (maybe because there is more than one parameter) and a new() exists
  • Implementations of the std::convert::From trait where possible

Keep in mind that the From trait should never fail. If the conversion may fail, use TryFrom.

#![allow(unused)]
fn main() {
/// Time in seconds.
pub struct Second {
    value: u64
}

impl Second {
    /// Constructs a new instance of [`Second`].
    // Note this is an associated function - no self.
    pub fn new(value: u64) -> Self {
        Self { value }
    }

    /// Returns the value in seconds.
    pub fn value(&self) -> u64 {
        self.value
    }
}
}
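To illustrate the remaining conventions, here is a sketch that re-declares Second so the block stands alone. The from_minutes constructor and the i64 conversion are hypothetical additions: From<u64> covers the infallible case, a named from_X function covers a second unit that From<u64> cannot also express, and TryFrom<i64> covers the case that can fail.

```rust
use std::convert::TryFrom;

/// Time in seconds (re-declared so this block is self-contained).
#[derive(Debug, PartialEq)]
pub struct Second {
    value: u64,
}

impl Second {
    pub fn new(value: u64) -> Self {
        Self { value }
    }

    /// Hypothetical `from_X` constructor: `From<u64>` already means
    /// "seconds", so minutes need a named function instead.
    pub fn from_minutes(minutes: u64) -> Self {
        Self::new(minutes * 60)
    }

    pub fn value(&self) -> u64 {
        self.value
    }
}

// Infallible: every u64 is a valid number of seconds.
impl From<u64> for Second {
    fn from(value: u64) -> Self {
        Self::new(value)
    }
}

// Fallible: a negative i64 cannot be a duration, so use TryFrom.
impl TryFrom<i64> for Second {
    type Error = String;

    fn try_from(value: i64) -> Result<Self, Self::Error> {
        u64::try_from(value)
            .map(Self::new)
            .map_err(|_| format!("negative duration: {value}"))
    }
}

fn main() {
    assert_eq!(Second::from(90_u64).value(), 90);
    let half_minute: Second = 30_u64.into();
    assert_eq!(half_minute, Second::new(30));
    assert_eq!(Second::from_minutes(2), Second::new(120));
    assert!(Second::try_from(-5_i64).is_err());
    assert_eq!(Second::try_from(5_i64), Ok(Second::new(5)));
}
```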

Default and Debug

The Default trait is related to the previous topic. A recommended way to implement the new() constructor is to implement or derive Default and then call it in the method implementation.

If you are fine with the default values for all members of your structure, you can derive the trait, otherwise, implement it manually.

#[derive(Default)]
struct SomeOptions {
    foo: i32,
    bar: f32,
}

impl SomeOptions {
    fn new() -> Self {
        Default::default()
    }
}

fn main() {
    let options: SomeOptions = Default::default();
    let other_options = SomeOptions::new();
}

Another trait that is a good idea to derive or implement for all of your types is Debug. This trait is covered in the Standard Library traits chapter. It makes it significantly easier for others to use your type in their own types when they need Debug as well: if a field's type does not implement Debug, #[derive(Debug)] on the containing type will not compile.
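When the zero values a derive would produce are not sensible defaults, implement Default by hand. A sketch with a hypothetical ConnectionOptions type and made-up default values, which also derives Debug (and PartialEq, used only for the comparison):

```rust
/// Hypothetical settings type: derived defaults (empty host, port 0)
/// would be useless, so `Default` is implemented manually.
#[derive(Debug, PartialEq)]
struct ConnectionOptions {
    host: String,
    port: u16,
    retries: u32,
}

impl Default for ConnectionOptions {
    fn default() -> Self {
        Self {
            host: String::from("localhost"),
            port: 8080,
            retries: 3,
        }
    }
}

impl ConnectionOptions {
    // `new()` simply delegates to `Default`.
    fn new() -> Self {
        Self::default()
    }
}

fn main() {
    let options = ConnectionOptions::new();
    assert_eq!(options, ConnectionOptions::default());
    assert_eq!(options.host, "localhost");
    assert_eq!(options.port, 8080);
    // The derived `Debug` makes the value printable with `{:?}`.
    println!("{:?}", options);
}
```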

Struct-update syntax

What if you want to create an instance of a struct from another instance of the same type, sharing the values of some fields? There are two naive options: clone the structure and change the fields that need to be different, or create a new instance and copy over the fields that need to be the same.

However, Rust has a better tool called struct update syntax. It uses the .. syntax, which you may have seen already, and is often used in tandem with the Default trait when you want to use default values but override some of them.

fn main() {
    let options = SomeOptions { foo: 42, ..Default::default() };
}

You can use it to implement better constructor functions.
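A self-contained sketch of both uses; SomeOptions is re-declared here with an extra, hypothetical name field, and with_foo is a hypothetical constructor. Note that ..base moves the remaining fields out of base; since only the Copy fields foo and bar are taken here, base stays usable afterwards.

```rust
#[derive(Default)]
struct SomeOptions {
    foo: i32,
    bar: f32,
    name: String,
}

impl SomeOptions {
    // Hypothetical constructor: `foo` is required, everything else is default.
    fn with_foo(foo: i32) -> Self {
        Self { foo, ..Default::default() }
    }
}

fn main() {
    let base = SomeOptions::with_foo(42);
    assert_eq!(base.bar, 0.0);

    // Take every field from `base` except `name`.
    let named = SomeOptions { name: String::from("custom"), ..base };
    assert_eq!(named.foo, 42);
    assert_eq!(named.name, "custom");

    // Only Copy fields were taken from `base`, so it was not moved.
    assert_eq!(base.name, "");
}
```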

Fallible code

If your code can fail, it is a good idea to propagate the error and use the ? operator (also called the try operator), which either continues execution with the success value or propagates the error upwards.

To be able to use the ? operator, your function needs to return Result or Option (or another type implementing the currently unstable Try trait, such as std::ops::ControlFlow). When returning Result, the error you propagate must be convertible into your function's error type via From.

More on this in the Options and Results chapter.

Avoid using the legacy, deprecated try!() macro. Since the 2018 edition, try is a reserved keyword, so the macro can only be invoked as r#try!().
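A minimal sketch with two hypothetical functions: add_numbers propagates std::num::ParseIntError from str::parse with ?, and first_char_upper shows ? on an Option.

```rust
use std::num::ParseIntError;

// Each `?` either unwraps the `Ok` value or returns the `Err` to the caller.
fn add_numbers(a: &str, b: &str) -> Result<i64, ParseIntError> {
    let a: i64 = a.trim().parse()?;
    let b: i64 = b.trim().parse()?;
    Ok(a + b)
}

// In a function returning `Option`, `?` returns `None` early.
fn first_char_upper(s: &str) -> Option<char> {
    let first = s.chars().next()?;
    Some(first.to_ascii_uppercase())
}

fn main() {
    assert_eq!(add_numbers("2", " 40"), Ok(42));
    assert!(add_numbers("2", "forty").is_err());
    assert_eq!(first_char_upper("rust"), Some('R'));
    assert_eq!(first_char_upper(""), None);
}
```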

Concatenating strings

Concatenating strings is a common operation.

It is possible to build strings using the push and push_str methods on a mutable String, or by using the plus operator. However, in cases where performance is not a concern and/or you have a mix of literal and non-literal strings, using the format! macro is better.

It also saves you from converting each value to a String by hand: anything implementing Display can be interpolated directly.

#![allow(unused)]
fn main() {
let name = String::from("Anonymous"); // stands in for a name obtained elsewhere
let new_string = format!("The year of Bitcoin is {}. - {}", 2022, name);
}

Do not use this macro to convert a single value to a string: format!("{}", val) is less readable than val.to_string(), and it can be slower, since to_string() has specialized fast paths for some common types such as str and String, at no added benefit.
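For comparison, a sketch building the same string three ways (the function names are hypothetical): with push_str, with +, and with format!. Only the format! version interpolates the non-string year without an explicit conversion.

```rust
fn greeting_push(name: &str, year: u32) -> String {
    let mut s = String::from("Hello, ");
    s.push_str(name);
    s.push_str("! It is ");
    s.push_str(&year.to_string()); // numbers must be converted by hand
    s
}

fn greeting_plus(name: &str, year: u32) -> String {
    // `+` consumes the String on the left and appends a &str on the right.
    String::from("Hello, ") + name + "! It is " + &year.to_string()
}

fn greeting_format(name: &str, year: u32) -> String {
    // Anything implementing `Display` can be interpolated directly.
    format!("Hello, {name}! It is {year}")
}

fn main() {
    let expected = "Hello, Ferris! It is 2022";
    assert_eq!(greeting_push("Ferris", 2022), expected);
    assert_eq!(greeting_plus("Ferris", 2022), expected);
    assert_eq!(greeting_format("Ferris", 2022), expected);

    // For a single value, prefer `to_string()` over `format!("{}", ...)`.
    assert_eq!(2022.to_string(), "2022");
}
```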

Builder pattern

Rust does not have any notion of an optional parameter, or in other words, a parameter with a default value. To avoid excessive typing, or Option parameters everywhere, the Builder pattern is used.

The Builder pattern uses either the target type itself (with default values preset) or a separate "builder" type, which has methods typically named after the fields or configuration values that need to be set.

#![allow(unused)]
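fn main() {
// The type the builder produces
struct User {
    email: Option<String>,
    first_name: Option<String>,
    last_name: Option<String>,
}
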
struct UserBuilder {
    email: Option<String>,
    first_name: Option<String>,
    last_name: Option<String>
}

impl UserBuilder {
    fn new() -> Self {
        Self {
            email: None,
            first_name: None,
            last_name: None,
        }
    }

    fn email(mut self, email: impl Into<String>) -> Self {
        self.email = Some(email.into());
        self
    }

    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }

    // there is no consensus on what this method is called;
    // it may also just be whichever method consumes the builder
    //
    // consider a scenario where you are building a type for Email:
    // the consuming method may be send(self)
    fn build(self) -> User {
        let Self { email, first_name, last_name } = self;
        User { email, first_name, last_name }
    }
}

let user = UserBuilder::new()
    .email("ferris@example.com")
    .first_name("Ferris")
    .build();
}

Example kindly adapted from Sergey Potapov.
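The other variant, where the target type acts as its own builder, could look like the following sketch. Request, its fields and the default values are hypothetical; new presets the defaults and each method overrides one of them.

```rust
use std::time::Duration;

/// Hypothetical request configuration that is its own builder.
#[derive(Debug)]
struct Request {
    url: String,
    timeout: Duration,
    retries: u32,
}

impl Request {
    // Required values are parameters; optional ones get defaults.
    fn new(url: impl Into<String>) -> Self {
        Self {
            url: url.into(),
            timeout: Duration::from_secs(30),
            retries: 0,
        }
    }

    fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }

    fn retries(mut self, retries: u32) -> Self {
        self.retries = retries;
        self
    }
}

fn main() {
    let request = Request::new("https://example.com").retries(3);
    assert_eq!(request.url, "https://example.com");
    assert_eq!(request.retries, 3);
    assert_eq!(request.timeout, Duration::from_secs(30)); // default kept

    let quick = Request::new("https://example.com").timeout(Duration::from_secs(5));
    assert_eq!(quick.timeout, Duration::from_secs(5));
    println!("{:?}", quick);
}
```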

Panicking vs Result/Option

There are two main ways to indicate failure: using the Result and Option type (only use Option if there are no discernible error states you could report to the user of a type/function/library), and using panics.

It is a rookie mistake to conflate panics with exceptions and to use them wherever exceptions would be used in programming languages that have them. This is considered an anti-pattern.

Panics should be used sparingly, as they signify unrecoverable errors: depending on configuration, a panic either aborts the entire process or unwinds and terminates the thread it occurred on. So only use them when it is impossible to handle the error, or when you need code to type-check in situations where you know some code is unreachable or some seemingly fallible operation cannot actually fail, but only you know it, and Rust doesn't.

#![allow(unused)]
fn main() {
panic!();
panic!("this is a terrible mistake!");
panic!("this is a {} {message}", "fancy", message = "message");
std::panic::panic_any(4);
}

If the main thread panics, all your other threads are terminated and your program exits with code 101. If you have set panic = "abort" in your Cargo manifest, the stack won't be unwound, so you cannot expect values to be dropped or your program to terminate gracefully.
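A sketch contrasting the two (divide is a hypothetical function): an expected failure is reported through Result, while a panic in a spawned thread, assuming the default unwinding behaviour, ends only that thread and shows up as an Err from JoinHandle::join.

```rust
use std::thread;

// Expected, recoverable failure: report it and let the caller decide.
fn divide(a: i32, b: i32) -> Result<i32, String> {
    if b == 0 {
        Err(String::from("division by zero"))
    } else {
        Ok(a / b)
    }
}

fn main() {
    assert_eq!(divide(10, 2), Ok(5));
    assert!(divide(1, 0).is_err());

    // With the default `panic = "unwind"`, this panic ends only the spawned
    // thread; `join` returns `Err` instead of bringing down the program.
    let handle = thread::spawn(|| {
        panic!("unrecoverable error in worker thread");
    });
    assert!(handle.join().is_err());
    println!("the main thread is still running");
}
```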

Strong types

As Pascal Hertleif spoke about in his old but gold Writing Idiomatic Libraries in Rust talk, in idiomatic Rust, you should avoid stringly-typed APIs. More broadly, prefer strong typing, and create types where applicable.

You want to do this to make your APIs more expressive (it is easier to discern what things mean by looking at their signatures). It also helps you delegate some degree of sanity checking to the compiler, or force the end user through checks that you set in your wrapping type's implementation.

This boils down to three guidelines:

  • use enums over numeric state constants
#![allow(unused)]
fn main() {
// good
enum E {
    Invalid,
    Valid { /* fields */ }
}

// bad
const ERROR_INVALID: isize = -1;
}
  • use two-variant enums over bools where bool would indicate one of two possible states/values
#![allow(unused)]
fn main() {
// good
enum Visibility {
    Visible,
    Hidden
}

struct MyType {
    // ...
    visibility: Visibility,
}

// bad
struct MyTypeWithBool {
    visible: bool,
}
}
  • use newtypes when representing units:
#![allow(unused)]
fn main() {
// good
struct Voltage(f32);

let voltage = Voltage(14.1);

// bad
let voltage = 14.1;
}
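Newtypes can also carry the sanity checks mentioned above. In this sketch, Percentage and apply_discount are hypothetical; the only intended way to obtain a Percentage is through new, which validates the range, so functions taking one do not need to check it again. (The tuple field is private to the defining module, so the guarantee holds for code outside it.)

```rust
/// Hypothetical newtype: a percentage guaranteed to be in 0..=100.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percentage(u8);

impl Percentage {
    pub fn new(value: u8) -> Result<Self, String> {
        if value <= 100 {
            Ok(Self(value))
        } else {
            Err(format!("{value} is not a valid percentage"))
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

// Takes a `Percentage`, so the range check has already happened.
fn apply_discount(price_cents: u32, discount: Percentage) -> u32 {
    price_cents - price_cents * u32::from(discount.get()) / 100
}

fn main() {
    let discount = Percentage::new(25).unwrap();
    assert_eq!(apply_discount(2000, discount), 1500);
    assert!(Percentage::new(150).is_err());
}
```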

Unsafe

Avoid unsafe where you can. Binding to foreign libraries and making raw syscalls are unavoidable exceptions. Especially avoid exposing an unsafe API from your library.

Remember that by the definition used by Rust, the following are considered unsafe:

  • thread-unsafe operations, such as those that could cause data races
  • operations that may void memory safety
  • operations where there exists a combination of parameters that may do one of the above or cause undefined behavior
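The usual way to avoid exposing unsafe code is to keep it behind a safe function that checks the invariants itself. The sketch below is modelled on the standard library's own slice::split_at_mut (use that in real code): the signature is safe, and the single unsafe block is justified by the assertion before it.

```rust
use std::slice;

// Safe signature, unsafe implementation detail.
fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // This check upholds the invariant the unsafe code relies on.
    assert!(mid <= len, "mid out of bounds");

    // SAFETY: `mid <= len`, so both ranges lie inside the original slice,
    // and they do not overlap, so handing out two `&mut` is sound.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut values = [1, 2, 3, 4, 5];
    let (left, right) = split_at_mut(&mut values, 2);
    left[0] = 10;
    right[0] = 30;
    assert_eq!(values, [10, 2, 30, 4, 5]);
}
```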

Matching on Result and Option just to find which variant

A common anti-pattern in Rust is using the if let syntax just to find out which variant an Option or Result is, without doing anything with the contained value (of course, this does not apply to Option::None, as it has no contained value).

Prefer using variant detection methods.

#![allow(unused)]
fn main() {
let x: Option<u32> = Some(2);
assert_eq!(x.is_some(), true);

let x: Option<u32> = None;
assert_eq!(x.is_some(), false); // but .is_none() would be true!


let x: Result<i32, &str> = Ok(-3);
assert_eq!(x.is_err(), false);

let x: Result<i32, &str> = Err("Some error message");
assert_eq!(x.is_err(), true);


let x: Result<i32, &str> = Ok(-3);
assert_eq!(x.is_ok(), true);

let x: Result<i32, &str> = Err("Some error message");
assert_eq!(x.is_ok(), false);
}

For the variants that carry a value, there are also predicate versions, Option::is_some_and, Result::is_ok_and and Result::is_err_and (stable since Rust 1.70), which return true only if the variant matches and the contained value also satisfies a predicate closure.

For custom enums, you have to implement these.

You may also use the matches! macro (may be helpful when implementing said methods).

#![allow(unused)]
fn main() {
let foo = 'f';
assert!(matches!(foo, 'A'..='Z' | 'a'..='z'));

let bar = Some(4);
assert!(matches!(bar, Some(x) if x > 2));
}
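A sketch of such hand-written methods for a hypothetical Connection enum, using matches! for the plain check and a match for the predicate version, followed by the standard library's predicate methods (which require Rust 1.70 or later).

```rust
/// Hypothetical custom enum with hand-written variant checks.
enum Connection {
    Connected { latency_ms: u32 },
    Connecting,
    Disconnected,
}

impl Connection {
    fn is_connected(&self) -> bool {
        matches!(self, Connection::Connected { .. })
    }

    // Mirrors the `is_some_and` style: variant check plus a predicate.
    fn is_connected_and(&self, pred: impl FnOnce(u32) -> bool) -> bool {
        match self {
            Connection::Connected { latency_ms } => pred(*latency_ms),
            _ => false,
        }
    }
}

fn main() {
    let fast = Connection::Connected { latency_ms: 12 };
    assert!(fast.is_connected());
    assert!(fast.is_connected_and(|ms| ms < 50));
    assert!(!Connection::Connecting.is_connected());
    assert!(!Connection::Disconnected.is_connected_and(|_| true));

    // The standard library equivalents:
    assert!(Some(4).is_some_and(|x| x > 2));
    assert!(Err::<i32, &str>("boom").is_err_and(|e| e.len() == 4));
}
```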

Anti-pattern: Cloning to satisfy the borrow checker

When you are new to Rust, fighting with the borrow checker is a common occurrence. It may be tempting to resolve these initially confusing issues by cloning values willy-nilly. Calling .clone() makes a copy of the data, which for types such as String or Vec means a new heap allocation. That is not optimal, and it does not help you learn ownership.

#![allow(unused)]
fn main() {
// define any variable
let mut x = 5;

// Borrow `x` -- but clone it first
let y = &mut (x.clone());

// without the x.clone() above, this line would fail to compile, as
// x would still be mutably borrowed by y (y is used again below);
// thanks to x.clone(), x itself was never borrowed, so this line compiles
println!("{}", x);

// use the borrow after the println! so that it is still alive there;
// note that this increments the clone, not x
*y += 1;
}
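Besides the cost, the clone above also changes the meaning of the program: the increment lands on the copy, not on x. A sketch of both versions as hypothetical functions shows the difference; the borrow-checker-friendly fix is simply to finish using the mutable borrow before reading x again.

```rust
// Cloning version from above: `y` points at a temporary copy of `x`.
fn clone_then_increment() -> i32 {
    let x = 5;
    let y = &mut x.clone();
    *y += 1;
    x // still 5: only the copy was incremented
}

// Without clone: use the mutable borrow first, then read `x`.
fn increment_then_read() -> i32 {
    let mut x = 5;
    let y = &mut x;
    *y += 1;
    // `y` is not used after this point, so reading `x` is allowed.
    x
}

fn main() {
    assert_eq!(clone_then_increment(), 5);
    assert_eq!(increment_then_read(), 6);
}
```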

However, keep in mind that .clone() is the idiomatic way to hand out another Rc/Arc pointer to the same value, and in that case it does not create a deep copy of the underlying data: it only increments the reference count.
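A small sketch of that case using std::rc::Rc (share is a hypothetical helper). Rc::ptr_eq and Rc::strong_count confirm that both handles point to the same allocation and that cloning only bumped the count:

```rust
use std::rc::Rc;

// Hypothetical helper: gives out another owner of the same shared list.
fn share(list: &Rc<Vec<String>>) -> Rc<Vec<String>> {
    // Copies the pointer and increments the reference count;
    // the Vec and its Strings are not duplicated.
    Rc::clone(list)
}

fn main() {
    let names = Rc::new(vec![String::from("Ferris"), String::from("Corro")]);
    let shared = share(&names);

    assert!(Rc::ptr_eq(&names, &shared)); // same allocation
    assert_eq!(Rc::strong_count(&names), 2);

    drop(shared);
    assert_eq!(Rc::strong_count(&names), 1);
}
```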

There are also cases where unnecessary cloning is acceptable:

  • You are still very new to ownership and boy, we sure as hell don't want to torture you to death all at once, do we?
  • The code does not have significant performance or memory constraints, but you need to get things done fast, such as prototyping, hackathon projects or competitions (you can always go back and iron things out).
  • In situations where satisfying the borrow checker is really complex and you prefer to provide better readability over performance

You should run cargo clippy on your code, as it can detect some unnecessary clones.

Anti-pattern: Denying or allowing all warnings

It may also be tempting for well-intentioned authors to ensure that their code builds without warnings. I too have fallen into this trap in the past.

You go and you annotate your crate root with the following.

#![deny(warnings)]

fn main() {
    // And there was peace in the land for forty years
    //
    // - Judges 5:31 GNT
}

While this is short and will stop the build if there is anything wrong, it will also stop the build over things that are not really wrong. This opts out of Rust's stability guarantees: new versions of the compiler may (and often do) introduce new warnings, and your crate suddenly fails to build with them.

New compile errors are introduced occasionally as well, yes, but they usually go through a grace period as warnings before being turned into hard errors.

If you want to do this in your CI/CD pipeline, use the RUSTFLAGS="-D warnings" env variable.

Otherwise, consider denying exact warnings, for example:

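// nonstandard_style replaces the older bad_style name;
// private_interfaces and private_bounds replace private_in_public (Rust 1.74+);
// const_err is now a hard error and no longer a lint
#![deny(nonstandard_style,
        dead_code,
        improper_ctypes,
        non_shorthand_field_patterns,
        no_mangle_generic_items,
        overflowing_literals,
        path_statements,
        patterns_in_fns_without_body,
        private_interfaces,
        private_bounds,
        unconditional_recursion,
        unused,
        unused_allocation,
        unused_comparisons,
        unused_parens,
        while_true)]

fn main() {}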

You can also use deny to enforce a degree of style:

//! Crate-level documentation, required once missing_docs is denied.
#![deny(missing_debug_implementations,
        missing_docs,
        trivial_casts,
        trivial_numeric_casts,
        unused_extern_crates,
        unused_import_braces,
        unused_qualifications,
        unused_results)]

fn main() {}

Using outdated Rust

Sometimes you can't avoid it (for example, if it is imposed on you by your work environment, a particular tool, or missing platform support), in which case you are out of luck. But if you can, make sure to use the latest language edition and the latest Rust version so that you can fully utilize the tools Rust provides.

Newer versions of Rust may make your code more performant, less verbose and may contain security fixes.

Anti-pattern: Deref polymorphism

As elaborated in the chapter on advanced trait usage, Rust does not have type inheritance, so it may be tempting to simulate it by using Deref to imitate a sort of polymorphism. Consider this Java code, in which Bar inherits the method m() from Foo:

class Foo {
    void m() { /* ... */ }
}

class Bar extends Foo {}

public class Main {
    public static void main(String[] args) {
        Bar b = new Bar();
        b.m(); // inherited from Foo
    }
}

You could emulate it using the anti-pattern as such:

use std::ops::Deref;

struct Foo {}

impl Foo {
    fn m(&self) {
        //..
    }
}

struct Bar {
    f: Foo,
}

impl Deref for Bar {
    type Target = Foo;
    fn deref(&self) -> &Foo {
        &self.f
    }
}

fn main() {
    let b = Bar { f: Foo {} };
    b.m();
}

This is an anti-pattern because it goes against Rust's approach to object-oriented programming; more precisely, it abuses the Deref trait, which is primarily meant for smart pointers, to do something it was not intended for.

It is also not a faithful replacement for inheritance: traits implemented by Foo are not automatically implemented for Bar, so this pattern interacts badly with trait bounds and thus with generic programming.

There is no drop-in alternative; instead, you have to learn to favour composition over inheritance, for example by delegating to the inner value explicitly and sharing behaviour through traits.
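A sketch of that alternative for the Foo and Bar example. The Greet trait and its methods are hypothetical: shared behaviour lives in a trait with a default method, and Bar holds a Foo and delegates to it explicitly. Unlike the Deref version, generic code with a Greet bound accepts both types.

```rust
// Shared behaviour expressed as a trait instead of a base class.
trait Greet {
    fn name(&self) -> String;

    // Default method: every implementor gets this behaviour for free.
    fn greet(&self) -> String {
        format!("Hello from {}", self.name())
    }
}

struct Foo;

impl Greet for Foo {
    fn name(&self) -> String {
        String::from("Foo")
    }
}

// Bar composes a Foo and delegates to it explicitly.
struct Bar {
    foo: Foo,
}

impl Greet for Bar {
    fn name(&self) -> String {
        format!("Bar (wrapping {})", self.foo.name())
    }
}

// Generic code works with both, because both satisfy the trait bound.
fn introduce(item: &impl Greet) -> String {
    item.greet()
}

fn main() {
    assert_eq!(introduce(&Foo), "Hello from Foo");
    assert_eq!(introduce(&Bar { foo: Foo }), "Hello from Bar (wrapping Foo)");
}
```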

The project: Making a static website generator idiomatic

For this project, you will take a minimalistic static site generator that I have written and deliberately made significantly less idiomatic, and make it as idiomatic as you can.

Clone the project from here: https://gitlab.ii.zone/lho/hyper-rat-idiomatic-rust

To check whether you have broken it, go into the test-site folder and run cargo run.

This will build HTML in the test-site/build folder.

Either open the html files manually or host them with a simple HTTP server such as simple-http-server -p 7777 --index.

It should look like this:

[Screenshots of the generated index.html, second.html and third.html pages]