2: Rust Basics: Syntax and Variables, Compiling programs with Cargo

Now that we have a working Rust installation, it is finally time to dig in and get familiar with Rust as a language. In this lesson, we will gain an overview of Rust's syntax, the usage of variables, and how to work with Cargo to compile our programs.

You may be wondering why we dedicate an entire lesson to this. The reason is simple: variables are not as trivial in Rust as they are elsewhere, and they touch upon topics such as pattern matching, shadowing, and ownership, so there are plenty of topics we need to introduce, at least in the briefest of terms.

Without further ado, let's get into Rust syntax. There is no way to really sugarcoat it, so let's just go concept by concept.

Comments

Single-line comments in Rust start with // and run to the end of the line. Multi-line (block) comments are written between /* and */. Block comments can be nested freely, so you don't have to worry about existing comments if you want to disable a part of the code that may already contain comments.

#![allow(unused)]
fn main() {
// This is a single-line comment
/* This is a
   multi-line comment */
}
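To see the nesting in practice, here is a minimal sketch in which a block comment disables code that itself already contains a block comment. The answer function and its contents are made up purely for illustration:

```rust
// `answer` is a hypothetical function used only for this illustration.
fn answer() -> i32 {
    /* Temporarily disabled:
    let draft = 41; /* an older comment inside the disabled code */
    return draft;
    */
    42
}

fn main() {
    println!("{}", answer()); // prints "42"
}
```

In C, the inner */ would already end the comment. In Rust, the comment only ends once every /* has been matched by a */.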

Doc Comments

Rust supports documentation comments that are used to generate external documentation. Doc comments start with /// (three slashes) or //! (two slashes and an exclamation mark). Doc comments are formatted in Markdown and may contain images, links and code examples.

//! This is a module-level doc comment. It documents the enclosing module
//! (here, the crate root), so it must come before any items.
#![allow(unused)]

/// This is a doc comment for the following struct.
struct MyStruct {
    // ... we will look at structs in Lesson 4
}

fn main() {}

The general rule is that the /// doc-comment documents the item right underneath it, whereas the //! documents the item it is contained in. When documenting modules declared by file structure, the //! doc comment is the only practical option.

Literals

Literals represent fixed, immutable values, just like in any language. In Rust, you have numeric literals, string literals, character literals, and so on.

fn main() {
    let integer = 10;
    let floating_point = 3.14;
    let character = 'a';
    let string = "Hello, Rust!";
    let boolean = true;
}

For numeric types, you can specify the type right at the literal:

fn main() {
    let integer = 10u32;
}

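Numbers always have a specific type, which is either the default (i32 for integers, f64 for floating-point numbers), or the one given by a suffix or inferred from the context.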
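As a small sketch of how suffixes and defaults play out, the following measures the size of a few literals with std::mem::size_of_val. The helper name literal_sizes and the chosen values are only illustrative:

```rust
use std::mem::size_of_val;

// Hypothetical helper: returns the size in bytes of five differently typed literals.
fn literal_sizes() -> [usize; 5] {
    let small = 10u8;        // suffix selects u8
    let big = 10_000i64;     // underscores are allowed as visual separators
    let single = 2.5f32;     // suffix selects f32
    let default_int = 10;    // no suffix and no other hints: falls back to i32
    let default_float = 0.5; // no suffix and no other hints: falls back to f64

    [
        size_of_val(&small),
        size_of_val(&big),
        size_of_val(&single),
        size_of_val(&default_int),
        size_of_val(&default_float),
    ]
}

fn main() {
    println!("{:?}", literal_sizes()); // prints "[1, 8, 4, 4, 8]"
}
```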

Variable Bindings

Variables in Rust are immutable by default, and you declare them with the let keyword.

fn main() {
    let x = 5; // immutable variable
    let mut y = 10; // mutable variable
    y = 15; // re-assigning mutable variable
}

The type can be specified by including a : Type right after the binding identifier:

fn main() {
    let x: i32 = 5;
    println!("x is: {}", x);
}

We explicitly set the type of the x variable to i32, which is also the default integer type when no other type can be inferred.

NOTE: Rust is statically typed, meaning that types are always concrete and never change during the runtime of the program. If you do not specify a type explicitly, it will be deduced from the context; a variable binding is never untyped.
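To illustrate deduction from context, here is a minimal sketch where a binding without any annotation ends up as a u64 purely because of how it is used later. The function name inferred_size is hypothetical:

```rust
use std::mem::size_of_val;

// Hypothetical helper: returns the size in bytes of a binding whose type is inferred.
fn inferred_size() -> usize {
    let x = 5;       // no annotation and no suffix here...
    let _y: u64 = x; // ...but this later use requires x to be a u64
    size_of_val(&x)  // so x occupies 8 bytes, not the 4 of the default i32
}

fn main() {
    println!("{}", inferred_size()); // prints "8"
}
```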

Etymology

Note that in Rust, what are commonly known as variables in many other languages are referred to as "variable bindings". They are called bindings because they bind a name to a value, essentially tying the name to the value, so we can use the name to refer to the value later in the program.

Shadowing

Variable shadowing is another interesting feature of Rust’s variable bindings. Shadowing occurs when we declare a new variable with the same name as a previous variable. The new variable "shadows" the name of the previous variable, meaning the original is no longer accessible and any future use of the variable will refer to the new one.

Here’s an example of shadowing:

fn main() {
    let x = 5;
    println!("x is: {}", x); // prints "x is: 5"

    let x = "Rust";
    println!("x is: {}", x); // prints "x is: Rust"
}

In this example, the first let x binds x to the value 5. The second let x shadows the first, binding x to the value "Rust".

Shadowing is particularly useful when you want to change the type of a variable, or bind a new value to the same name while keeping the binding immutable. Here's an example where shadowing is used to 'change' the type of a variable:

fn main() {
    let x = "5";
    println!("x is: {}", x); // prints "x is: 5"

    let x: i32 = x.parse().expect("Not a number");
    println!("x is: {}", x); // prints "x is: 5"
}

In this case, the binding x is initially a string. Then, it is shadowed by a new x that is a result of parsing the original string x into an i32. It is a common idiom to repeat the shadowing of the same variable name as you build from inputs to final values (parsing is one such case).

Shadowing allows you to reuse variable names, which can lead to more concise and readable code, but it can also introduce bugs if used carelessly, as the original variable becomes inaccessible. On the other hand, you can use shadowing to impose further restrictions that can help keep your code bug free - shadowing a mutable variable as immutable when you know you will no longer need to mutate it.
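The last point, shadowing a mutable binding as immutable, could look roughly like this. The build_greeting function is a made-up example and uses the String type, which is introduced later in this lesson:

```rust
// Hypothetical helper used only to illustrate "freezing" a binding via shadowing.
fn build_greeting() -> String {
    let mut greeting = String::from("Hello");
    greeting.push_str(", Rust"); // mutation is allowed while the binding is `mut`

    let greeting = greeting; // shadow it with an immutable binding of the same name
    // greeting.push('!');   // uncommenting this line would be a compile-time error

    greeting
}

fn main() {
    println!("{}", build_greeting()); // prints "Hello, Rust"
}
```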

For starters, we can take the primitive types. Here is a handy table of the ones available in Rust and their C equivalents:

Rust Type   Numeric Type     Size (bytes)                    Corresponding C Type
i8          Integer          1                               int8_t
u8          Unsigned         1                               uint8_t
i16         Integer          2                               int16_t
u16         Unsigned         2                               uint16_t
i32         Integer          4                               int32_t
u32         Unsigned         4                               uint32_t
i64         Integer          8                               int64_t
u64         Unsigned         8                               uint64_t
i128        Integer          16                              __int128_t (GCC)
u128        Unsigned         16                              __uint128_t (GCC)
isize       Integer          Dependent on the architecture   intptr_t
usize       Unsigned         Dependent on the architecture   uintptr_t
f32         Floating Point   4                               float
f64         Floating Point   8                               double

The size column indicates the size of each type in bytes. isize and usize are architecture-dependent, and the i128 and u128 types may have equivalent C types depending on the compiler used, but you can't find them in the C standard. The f32 and f64 represent floating-point numbers in Rust, corresponding to float and double in C, respectively.
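If you want to check these sizes on your own machine, a minimal sketch using std::mem::size_of could look like this. The values for isize and usize depend on your target, so no fixed output is claimed for them:

```rust
use std::mem::size_of;

fn main() {
    println!("i8:    {}", size_of::<i8>());   // prints "i8:    1"
    println!("i32:   {}", size_of::<i32>());  // prints "i32:   4"
    println!("u64:   {}", size_of::<u64>());  // prints "u64:   8"
    println!("i128:  {}", size_of::<i128>()); // prints "i128:  16"
    println!("f32:   {}", size_of::<f32>());  // prints "f32:   4"
    println!("f64:   {}", size_of::<f64>());  // prints "f64:   8"

    // Architecture-dependent: typically 8 on 64-bit targets and 4 on 32-bit targets.
    println!("usize: {}", size_of::<usize>());
    println!("isize: {}", size_of::<isize>());
}
```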

There are other primitive types in Rust. These are the simple ones:

Rust Type   Description
bool        A boolean type representing the values true or false.
char        A character type representing a single Unicode character, like 'a'.
str         A string slice type, typically used as &str, representing a reference to a UTF-8 encoded string slice.
unit        The unit type () representing an empty tuple, often used to signify that a function doesn’t return any meaningful value.

And then we have three types related to collections:

Rust Type   Description
tuple       A collection of values with different types. The size is fixed at compile-time. For example, (i32, f64, &str).
array       A collection of values with the same type. The size is fixed at compile-time. For example, [i32; 5].
slice       A dynamically-sized view into a contiguous sequence, [T]. It is more commonly used as a reference, &[T], representing a view into an array or another slice.
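To make the relationship between arrays and slices a bit more tangible, here is a small sketch. The sum function is a hypothetical helper; it accepts any &[i32], so it works both for a whole array and for a view into part of it:

```rust
// Hypothetical helper: adds up all elements of a slice, whatever its length.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let array: [i32; 5] = [1, 2, 3, 4, 5]; // length is part of the type
    let middle: &[i32] = &array[1..4];     // a view of the elements 2, 3, 4

    println!("{}", middle.len());  // prints "3"
    println!("{}", sum(middle));   // prints "9"
    println!("{}", sum(&array));   // prints "15" (the array reference coerces to a slice)
}
```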

Blocks

A block in Rust is a group of statements enclosed within curly braces {}. It can be used to group statements together.

fn main() {
    {
        let x = 10;
        println!("x inside block: {}", x);
    }
}

Like many things in Rust, a block is an expression, and you can use it to produce a value to assign to a variable binding:

fn main() {
    let x = 5;
    let y = {
        let temp = x * 2;
        temp + 1
    };
    println!("y is: {}", y); // prints "y is: 11"
}

In this example, y is assigned the value of the block, which is temp + 1, resulting in 11.

Additionally, Rust allows you to name a block and use the break keyword to exit the block early and specify the value it should result in. Here’s an example:

fn main() {
    let x = 5;
    let y = 'block: {
        if x < 10 {
            break 'block x * 2;
        }
        x + 1
    };
    println!("y is: {}", y); // prints "y is: 10"
}

These features make Rust's block expressions a powerful tool for structuring your code. You can also use them to limit the scope of temporary variables, so that similarly named helpers do not pollute the surrounding namespace. This is also useful for macros, which often generate block expressions.

Statements

A statement performs an action. In Rust, each statement ends with a semicolon ;.

fn main() {
    let x = 5; // statement
    println!("x is: {}", x); // statement
}

If you omit the semicolon after the final expression in a block, a control flow expression, or a function body, its value becomes that block's return value. You can put any expression there, but keep in mind that many evaluate to Rust's equivalent of void, the empty tuple (), often referred to as unit.
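A short sketch of the semicolon rule, using two made-up functions (double and parity) that are not part of the lesson material:

```rust
// Hypothetical helper: the final expression has no semicolon, so it is the return value.
fn double(x: i32) -> i32 {
    x * 2
}

// Hypothetical helper: `if` is an expression too, and each branch ends without a semicolon.
fn parity(x: i32) -> &'static str {
    if x % 2 == 0 { "even" } else { "odd" }
}

fn main() {
    let unit = {
        double(1); // with a semicolon the value is discarded...
    };             // ...so the whole block evaluates to ()

    println!("{} {} {:?}", double(21), parity(7), unit); // prints "42 odd ()"
}
```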

For example, in both Rust and C, assignment is an expression, meaning it evaluates to a value. However, there is a key difference between the two languages in how assignment expressions are handled.

In C, an assignment expression evaluates to the value that was assigned, making it useful in certain scenarios, like conditional statements or within other expressions:

#include <stdio.h>

int main() {
    int x;
    if ((x = 10)) {
        printf("x is: %d\n", x); // prints "x is: 10"
    }
    return 0;
}

In contrast, Rust’s assignment expression always evaluates to the aforementioned unit type (). This means that you can’t use the value of the assignment in the same way you might in C:

fn main() {
    let x;
    if (x = 10) { // This will result in a compile-time error!
        println!("x is: {}", x);
    }
}

This Rust code will not compile because the expression x = 10 evaluates to (), and if expects a boolean expression. The unit type () doesn’t carry any meaningful information, and as such, using assignment as an expression in Rust is not very useful.

In Rust, if you need to assign a value and use it within a condition, you need to separate the assignment and the condition:

fn main() {
    let x;
    x = 10;
    if x == 10 {
        println!("x is: {}", x); // prints "x is: 10"
    }
}

This design choice in Rust encourages more explicit and clear code, reducing the chance of subtle bugs introduced by assignments inside expressions. There is a further reason in that having the same behavior as C could clash with Rust's memory management model, but we will get back to that later.

Tuple Declarations

A tuple is a fixed-size, ordered list of elements, possibly of different types.

fn main() {
    let tuple = (1, 2.0, "Rust");
    let (integer, floating_point, string) = tuple; // destructuring a tuple
}

Destructuring a tuple (which in this case creates three separate independent bindings - integer, floating_point and string) is often the most useful way to deal with a tuple.

If you want to preserve the tuple and access its elements as elements of a tuple, you would use the dot syntax with an index:

fn main() {
    let tuple = (1, "hello", 4.5);
    let (x, y, z) = tuple;
    println!("x: {}, y: {}, z: {}", x, y, z);

    // Accessing elements of a tuple
    let first_element = tuple.0;
    let second_element = tuple.1;
    let third_element = tuple.2;

    println!("First element: {}", first_element); // prints "First element: 1"
    println!("Second element: {}", second_element); // prints "Second element: hello"
    println!("Third element: {}", third_element); // prints "Third element: 4.5"
}

In Rust, you can access the elements of a tuple using a dot followed by the index of the value you want to access, starting from 0. So, tuple.0 refers to the first element, tuple.1 to the second, and so on.

This way, you can either destructure the tuple to access its elements, as seen with (x, y, z), or you can use indexing with a dot notation to access individual elements directly.
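One place where both styles show up is a function that returns several values at once as a tuple. The following is only a sketch; min_max is a hypothetical helper, not something from the standard library:

```rust
// Hypothetical helper: returns the smaller and the larger of two numbers as a tuple.
fn min_max(a: i32, b: i32) -> (i32, i32) {
    if a <= b { (a, b) } else { (b, a) }
}

fn main() {
    // Destructuring the returned tuple into two bindings
    let (low, high) = min_max(7, 3);
    println!("low: {}, high: {}", low, high); // prints "low: 3, high: 7"

    // Keeping the tuple and using dot syntax instead
    let pair = min_max(10, 20);
    println!("range: {}", pair.1 - pair.0); // prints "range: 10"
}
```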

Array Declarations

An array is a collection of objects of the same type, stored in contiguous memory locations. The length of the array is fixed, and must be known at compile time.

fn main() {
    let array = [1, 2, 3, 4, 5]; // type is [i32; 5]
    let first = array[0]; // accessing array elements
}

These syntax elements, presented from simplest to more complex, provide a good foundational understanding for starting with Rust.

Rust's Standard Library

The Rust Standard Library is the foundation of portable Rust software, a set of minimal and battle-tested shared abstractions. It offers core types, like Vec<T> and Option<T>, library-defined operations on language primitives, standard macros, I/O and multithreading, among many other features.

Finding Documentation

  • Locally: If you have Rust installed via rustup, you can access the local documentation with the following command:

    rustup doc --std
    

    This will open up the documentation in your default web browser.

  • Remotely: The official Rust documentation, including the Standard Library, can be found online at:

    Rust Documentation

Essential Modules in Rust's Standard Library

1. std::io

Handles input and output functionality. Commonly used for reading from and writing to files, stdin, and stdout.

#![allow(unused)]
fn main() {
use std::io;
}
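As a rough sketch of what reading a line of input can look like, the hypothetical helper below (read_trimmed_line is our own name, not a standard library function) works with anything that implements the BufRead trait. The example feeds it an in-memory byte slice so that it runs without waiting for keyboard input:

```rust
use std::io::{self, BufRead};

// Hypothetical helper: reads one line from any buffered reader and
// strips trailing whitespace, including the newline.
fn read_trimmed_line<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

fn main() -> io::Result<()> {
    // A byte slice stands in for real input here;
    // pass `io::stdin().lock()` instead to read from the terminal.
    let input = read_trimmed_line(&b"Hello, Rust!\n"[..])?;
    println!("You typed: {}", input); // prints "You typed: Hello, Rust!"
    Ok(())
}
```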

2. std::fmt

Formatting and printing. Contains traits that dictate display and debug print behaviors.

#![allow(unused)]
fn main() {
use std::fmt;
}

3. std::fs

Filesystem operations. Used for reading and writing files, directory manipulation, and more.

#![allow(unused)]
fn main() {
use std::fs;
}

4. std::collections

A module that provides various data structures like HashMap, HashSet, VecDeque, etc.

#![allow(unused)]
fn main() {
use std::collections::HashMap;
}
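A typical first use of HashMap is counting things. Here is a minimal sketch that counts how often each word appears in a piece of text; count_words is a hypothetical helper written for this illustration:

```rust
use std::collections::HashMap;

// Hypothetical helper: maps each whitespace-separated word to its number of occurrences.
fn count_words(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Insert 0 if the word is new, then increment its counter
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("the quick fox jumps over the lazy dog");
    println!("{}", counts["the"]);       // prints "2"
    println!("{:?}", counts.get("cat")); // prints "None"
}
```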

5. std::error

Error handling utilities. Provides the Error trait, which can be used to define custom error types.

#![allow(unused)]
fn main() {
use std::error::Error;
}

6. std::thread

Multithreading and concurrency. Enables the creation and management of threads.

#![allow(unused)]
fn main() {
use std::thread;
}

7. std::time

Time operations, like measuring durations or obtaining the current time.

#![allow(unused)]
fn main() {
use std::time::{Duration, Instant};
}

8. std::net

Networking operations, including TCP and UDP primitives.

#![allow(unused)]
fn main() {
use std::net::TcpListener;
}

9. std::option and std::result

Enums representing optional values (Option<T>) and potential errors (Result<T, E>). They are fundamental to Rust's error handling and control flow.

#![allow(unused)]
fn main() {
use std::option::Option;
use std::result::Result;
}
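Both enums are part of the prelude, so in practice you rarely import them explicitly. The sketch below shows one function returning a Result and one returning an Option; parse_age and first_char are hypothetical helpers, and the match syntax used here is covered in more detail later in the course:

```rust
use std::num::ParseIntError;

// Hypothetical helper: parsing can fail, so the outcome is a Result.
fn parse_age(input: &str) -> Result<u8, ParseIntError> {
    input.trim().parse::<u8>()
}

// Hypothetical helper: an empty string has no first character, so the outcome is an Option.
fn first_char(input: &str) -> Option<char> {
    input.chars().next()
}

fn main() {
    match parse_age("42") {
        Ok(age) => println!("age: {}", age), // prints "age: 42"
        Err(error) => println!("invalid input: {}", error),
    }

    println!("{}", parse_age("abc").is_err()); // prints "true"
    println!("{:?}", first_char("Rust"));      // prints "Some('R')"
    println!("{:?}", first_char(""));          // prints "None"
}
```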

10. std::str and std::string

String and string slice types and associated functions.

#![allow(unused)]
fn main() {
use std::str;
use std::string::String;
}

Rust's String Type

While we are already touching upon the topic of strings, we should properly introduce them. Rust has several string types, but for us, only two are important: the str primitive type and the String standard library type.

Unlike the str (aka string slice), a String is growable and allows modification. It's UTF-8 encoded, ensuring any valid String will be properly encoded Unicode data.

Creating a String

  • From a Literal: Use to_string() to create a String from a string literal.

    #![allow(unused)]
    fn main() {
    let my_string = "Hello, world!".to_string();
    }
  • From a String Slice: You can also create it directly from a string slice (str) using the from function.

    #![allow(unused)]
    fn main() {
    let my_string = String::from("Hello, world!");
    }

Manipulating a String

  • Appending: You can append to a String using push_str or push.

    #![allow(unused)]
    fn main() {
    let mut hello = String::from("Hello, ");
    hello.push_str("world!"); // Append a str
    hello.push('!'); // Append a char
    }
  • Concatenation: String can be concatenated using the + operator or the format! macro.

    #![allow(unused)]
    fn main() {
    let hello = String::from("Hello, ");
    let world = "world!";
    let hello_world = hello + world;
    }

    Note: When using +, the left operand gets moved and cannot be used again.

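  • Indexing: String does not support indexing by an integer position (such as my_string[0]) because it’s encoded in UTF-8, where a character takes between one and four bytes, so the n-th character cannot be located in constant time.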
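The two points above that have no example yet, the format! macro and the lack of integer indexing, can be sketched as follows. The greet function is hypothetical, and the name is written with an escape sequence (\u{eb} is the letter ë) so that the byte layout is unambiguous:

```rust
// Hypothetical helper: format! builds a new String and only borrows its arguments.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let name = String::from("Zo\u{eb}"); // "Zoë": 3 characters, 4 bytes
    let greeting = greet(&name);
    println!("{}", greeting); // prints "Hello, Zoë!"
    println!("{}", name);     // `name` was not moved: prints "Zoë"

    // let c = name[2]; // does not compile: String cannot be indexed by an integer
    println!("{:?}", name.chars().nth(2)); // prints "Some('ë')"
    println!("{:?}", name.get(0..2));      // prints "Some("Zo")"
    println!("{:?}", name.get(0..3));      // prints "None" (byte 3 is in the middle of 'ë')
}
```

Unlike +, format! leaves all of its inputs usable afterwards, which is often the more convenient choice when combining more than two pieces.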

Converting Between String and str

  • You can create a string slice by referencing a String.

    #![allow(unused)]
    fn main() {
    let my_string = String::from("Hello, world!");
    let string_slice: &str = &my_string;
    }

Unicode and UTF-8 Encoding

  • String holds UTF-8 bytes and ensures the data is valid UTF-8, enabling the representation of a wide range of characters from various languages and symbols.

Accessing Bytes and Characters

  • To iterate over Unicode scalar values (char), use chars():

    #![allow(unused)]
    fn main() {
    let my_string = String::from("Hello, world!");
    for c in my_string.chars() {
      println!("{}", c);
    }
    }
  • To iterate over bytes, use bytes():

    #![allow(unused)]
    fn main() {
    let my_string = String::from("Hello, world!");
    for b in my_string.bytes() {
      println!("{}", b);
    }
    }

Memory and Allocation

  • String is allocated on the heap, and it can dynamically grow or shrink as needed.
  • Memory is automatically reclaimed when String goes out of scope, thanks to Rust’s ownership system and the Drop trait.

Useful Methods

  • len(): Get the length in bytes.
  • is_empty(): Check if the String is empty.
  • split_whitespace(): Iterator over words.
  • replace(from, to): Replace a substring.
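Here is a quick sketch of these methods in action. The word_count function is a hypothetical helper, and the sample texts are arbitrary; note in particular that len() counts bytes rather than characters:

```rust
// Hypothetical helper: counts whitespace-separated words.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let text = String::from("hello big world");

    println!("{}", text.len());                   // prints "15"
    println!("{}", text.is_empty());              // prints "false"
    println!("{}", word_count(&text));            // prints "3"
    println!("{}", text.replace("big", "small")); // prints "hello small world"

    // "h\u{e9}llo" is "héllo": 5 characters, but 'é' takes 2 bytes in UTF-8
    println!("{}", "h\u{e9}llo".len());           // prints "6"
    println!("{}", "h\u{e9}llo".chars().count()); // prints "5"
}
```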

Where to Find More Information

You can find more details in the Rust standard library documentation for String and str.

Homework

For the lesson "Rust Basics: Syntax and Variables", we're building upon the foundational concepts you've learned and applying them to a practical task.

Your assignment is to write a program that reads from standard input, transmutes text according to the provided specification, and prints the result back to the user. The behavior of the program should be modified based on parsed CLI arguments.

Description:

In this (still very simple) exercise, you'll be using Rust's string manipulation capabilities. Here's what you need to do:

  1. Setting up the Crate:

    • Add the slug crate to your Cargo project to help with the slugify feature. To do this, open your Cargo.toml file and under the [dependencies] section, add: slug = "latest_version". (Replace "latest_version" with the most recent version number from crates.io, which is 0.1.4)
    • Once added, you can use it in your project by adding use slug::slugify; at the top of your main Rust file. View the crate's documentation to see how to use it: https://docs.rs/slug/0.1.4/slug/
  2. Read Input:

    • Read a string from the standard input.
  3. Parse CLI Arguments:

    • Based on the provided CLI argument, the program should change how it transforms the text. Use the std::env::args() function to collect the CLI arguments:
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();

    // args[0] is conventionally the program's own name; user-supplied arguments start at args[1]
    println!("{}", args[0]);
}

Note that the .len() and .is_empty() methods are available on Vec<String> to help you figure out whether you received the necessary parameters.

  4. Transmute Text:
    • If the argument is lowercase, convert the entire text to lowercase.
    • If the argument is uppercase, convert the entire text to uppercase.
    • If the argument is no-spaces, remove all spaces from the text.
    • If the argument is slugify, convert the text into a slug (a version of the text suitable for URLs) using the slug crate.

For one bonus point, try making two additional transformations of your own.

  5. Print Result:
    • Print the transmuted text back to the user.
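Going back to the argument parsing step, checking whether an argument was actually supplied could be sketched like this. It is only one possible approach; first_argument and the printed messages are made up for illustration, and the transformations themselves are left for you to implement:

```rust
use std::env;

// Hypothetical helper: returns the first user-supplied argument, if there is one.
fn first_argument(args: &[String]) -> Option<&str> {
    // args[0] is conventionally the program name, so the first real argument is args[1]
    args.get(1).map(|s| s.as_str())
}

fn main() {
    let args: Vec<String> = env::args().collect();

    match first_argument(&args) {
        Some(mode) => println!("Selected transformation: {}", mode),
        None => println!("No transformation given"),
    }
}
```

Using get instead of indexing with args[1] avoids a panic when the program is started without arguments.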

Hint: For string manipulations, the Rust standard library provides handy methods like:

  • to_lowercase()
  • to_uppercase()
  • replace(" ", "")

Submission:

  • After implementing and testing your program, commit your changes and push the updated code to the GitHub repository you created for the previous homework.
  • Submit the link to your updated GitHub repository on our class submission platform, ensuring your repository remains public for access and review.

Deadline:

  • Please complete and submit this assignment by Tuesday, October 16. Attach the link to the GitHub repository to the Google Classroom assignment again.

By the end of this exercise, you'll have tried out string manipulation, using external crates, and managing your Rust projects with Cargo. All of these will be super important in the future. Should you face any hurdles or have questions, don't hesitate to ask or consult the Rust documentation.

Forge ahead, and happy coding!