Getting Started

Strafe is organized into a collection of subcrates that follow the naming convention of r2rs-* or strafe-*.

Each of the r2rs-* crates is a translation of its respective R package (for instance, the r2rs-mass crate is a translation of the MASS package in R). These are collected in the main r2rs crate, so those familiar with R's organization can find the functionality they want the way they would in R.

The strafe-* crates either reexport the r2rs-* crates in a more streamlined fashion (for example, strafe-datasets combines the datasets found within r2rs-datasets, r2rs-mass, and r2rs-rfit into one place) or provide core functionality not found within any r2rs-* crate (like the strafe-consts crate, which provides numerical constants not otherwise available). These are collected in the final strafe crate for easy import.

For those unfamiliar with R, the strafe crate is recommended over the r2rs crate. The following examples use strafe, added to Cargo.toml as follows.

[dependencies]
strafe = "0.1.0"

Random Number Generation

All functions that use random number generation must be provided with a random number generator (a struct that implements the RNG trait). Random number generators are accessible via the numerics export. This allows applications to use multiple generators, to set and reuse seeds for their generators, and to run multithreaded without generators interfering with one another.
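To illustrate why passing the generator explicitly gives reproducible results, here is a minimal std-only sketch. It does not use strafe: the Generator trait, the SplitMix64 struct, and the draw_uniform function are hypothetical stand-ins chosen for illustration, and SplitMix64 is a different algorithm from the Mersenne Twister.

```rust
/// Hypothetical stand-in for a random number generator trait (not strafe's RNG trait).
trait Generator {
    fn next_u64(&mut self) -> u64;
}

/// SplitMix64, a small well-known generator, used here only for illustration.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn from_seed(seed: u64) -> Self {
        Self { state: seed }
    }
}

impl Generator for SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        // Advance the state, then scramble it into the output value.
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draws `n` uniform values in [0, 1) from whichever generator the caller passes in.
fn draw_uniform<G: Generator>(rng: &mut G, n: usize) -> Vec<f64> {
    (0..n)
        .map(|_| (rng.next_u64() >> 11) as f64 / (1u64 << 53) as f64)
        .collect()
}

fn main() {
    // Two independent generators with the same seed produce the same sequence.
    let mut a = SplitMix64::from_seed(1);
    let mut b = SplitMix64::from_seed(1);
    let xs = draw_uniform(&mut a, 3);
    let ys = draw_uniform(&mut b, 3);
    println!("{}", xs == ys); // prints "true"
}
```

Because each generator owns its state, nothing is shared between callers. In a multithreaded program each thread can hold its own seeded generator.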

By default, R 4.x uses the Mersenne Twister generator. Knowing this, and using the same seed in both R and strafe, we can generate exactly the same numbers, as shown below:

set.seed(1)
rnorm(5)
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
use strafe::distribution::{Distribution, NormalBuilder};  
use strafe::numerics::{MersenneTwister, RNG};  
use strafe::types::FloatConstraint;  
  
fn main() {
    let mut rng = MersenneTwister::new();
    rng.set_seed(1);

    let norm = NormalBuilder::new().build();
    let rand_nums = (0..5)
        .map(|_| norm.random_sample(&mut rng).unwrap())
        .collect::<Vec<_>>();
    println!("{rand_nums:?}");
}
[-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078]

Distributions

Strafe provides a number of statistical distributions from R's base, and these distributions can be used to perform comparisons, calculations, and tests. Internally strafe uses these distributions for virtually all its calculations. Each distribution accepts numeric types for its arguments (f64, i32, usize, etc.) and checks that these arguments are valid via the Constraint types. For example, the NormalBuilder accepts a mean that is Real (any number that is not NaN), and a standard deviation that is Positive (any number that is greater than 0 and not NaN). You can find these constraint types in the strafe-type crate.
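The idea of validating arguments through constraint types can be sketched with plain newtypes. This is a std-only illustration under stated assumptions, not strafe's implementation: the Real and Positive structs below and their new constructors are hypothetical and only mirror the two rules described above.

```rust
/// Hypothetical "Real" constraint: any value that is not NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Real(f64);

/// Hypothetical "Positive" constraint: any value greater than 0 and not NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Positive(f64);

impl Real {
    /// Accepts anything that converts losslessly to f64 (f64, f32, i32, ...).
    fn new(value: impl Into<f64>) -> Result<Self, String> {
        let v = value.into();
        if v.is_nan() {
            Err("value must not be NaN".to_string())
        } else {
            Ok(Real(v))
        }
    }
}

impl Positive {
    fn new(value: impl Into<f64>) -> Result<Self, String> {
        let v = value.into();
        // `v > 0.0` is false for NaN, so this one comparison rejects NaN as well.
        if v > 0.0 {
            Ok(Positive(v))
        } else {
            Err(format!("{v} is not greater than 0"))
        }
    }
}

fn main() {
    println!("{:?}", Real::new(0)); // Ok(Real(0.0))
    println!("{:?}", Positive::new(1)); // Ok(Positive(1.0))
    println!("{}", Positive::new(-2.5).is_err()); // true
    println!("{}", Real::new(f64::NAN).is_err()); // true
}
```

Checking once at construction means later code that receives a Positive does not need to recheck the value.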

Below is an example of using the standard normal distribution to find the lower and upper 95% quantiles, and 11 densities evenly spaced between -1 and 1.

use strafe::distribution::{Distribution, NormalBuilder};  
use strafe::types::FloatConstraint;  
  
fn main() {  
    let norm = NormalBuilder::new()  
        .with_mean(0)  
        .with_standard_deviation(1)  
        .build();  
    let lower = norm.quantile(0.95, false);  
    let upper = norm.quantile(0.95, true);  
    println!("{lower} {upper}");  
  
    let densities = (-5..=5)  
        .map(|x| norm.density(x as f64 / 5.0).unwrap())  
        .collect::<Vec<_>>();  
    println!("{densities:#.7?}");  
}
-1.6448536269514715 1.6448536269514715
[
    0.2419707,
    0.2896916,
    0.3332246,
    0.3682701,
    0.3910427,
    0.3989423,
    0.3910427,
    0.3682701,
    0.3332246,
    0.2896916,
    0.2419707,
]
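
As a cross-check, the densities listed above can be reproduced from the closed-form normal density, exp(-z^2 / 2) / (sd * sqrt(2 * pi)) with z = (x - mean) / sd. The sketch below uses only the standard library; normal_density is a hypothetical helper written for this illustration and is not part of strafe.

```rust
use std::f64::consts::PI;

/// Density of the normal distribution with the given mean and standard deviation.
fn normal_density(x: f64, mean: f64, sd: f64) -> f64 {
    let z = (x - mean) / sd;
    (-0.5 * z * z).exp() / (sd * (2.0 * PI).sqrt())
}

fn main() {
    // The same 11 points as the example above: -1.0, -0.8, ..., 0.8, 1.0.
    let densities: Vec<f64> = (-5..=5)
        .map(|i| normal_density(i as f64 / 5.0, 0.0, 1.0))
        .collect();
    println!("{densities:#.7?}");
}
```

The middle value is 1 / sqrt(2 * pi), about 0.3989423, and the two end values are the density at -1 and 1, about 0.2419707.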

Datasets

Strafe provides a large number of datasets that originate in R packages. These can all be found in the datasets module as functions that return Polars DataFrames. Below is an example of accessing the Iris dataset from the R datasets package.

use std::error::Error;  
  
use strafe::datasets::iris;  
  
fn main() -> Result<(), Box<dyn Error>> {  
    let iris = iris()?;  
  
    println!("{iris}");  
  
    Ok(())  
}
shape: (150, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species   │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa    │
│ …            ┆ …           ┆ …            ┆ …           ┆ …         │
│ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ virginica │
│ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ virginica │
│ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ virginica │
│ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ virginica │
│ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ virginica │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

For more information on how to use polars please see the Polars User Guide.

Plotting

Plotting in strafe is provided by the plots module, which uses the plotters crate as its backend. You must provide your own drawing area for the plots, generated either from a plotters backend or from an evcxr instance. Below is an example of plotting the iris dataset from a binary and from evcxr.

Binary Example

use std::error::Error;  
  
use strafe::datasets::iris;  
use strafe::datasets::polars::prelude::*;  
use strafe::plots::prelude::*;  
  
fn main() -> Result<(), Box<dyn Error>> {  
    let iris = iris()?;  
  
    let x = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .collect()?  
        .column("Sepal.Length")?  
        .f64()?  
        .to_vec()  
        .iter()  
        .flatten()  
        .cloned()  
        .collect();  
  
    let y = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .collect()?  
        .column("Sepal.Width")?  
        .f64()?  
        .to_vec()  
        .iter()  
        .flatten()  
        .cloned()  
        .collect();  
  
    let root = SVGBackend::new("plot.svg", (1024, 768)).into_drawing_area();  
    Plot::new()  
        .with_options(PlotOptions {  
            title: "Iris".to_string(),  
            x_axis_label: "Sepal Length".to_string(),  
            y_axis_label: "Sepal Width".to_string(),  
            ..Default::default()  
        })  
        .with_plottable(Points {  
            x,  
            y,  
            ..Default::default()  
        })  
        .plot(&root)?;  
  
    Ok(())  
}

plot.svg
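
The .iter().flatten().cloned().collect() chain in the binary example turns a list of optional values into plain numbers. The sketch below shows that idiom with the standard library only. It assumes that the column's to_vec() yields a Vec<Option<f64>> in which None marks a missing value. drop_missing is a hypothetical helper name.

```rust
/// Keeps the values that are present and drops the missing ones.
fn drop_missing(values: &[Option<f64>]) -> Vec<f64> {
    // Iterating over an `&Option<f64>` yields its value if it is `Some`
    // and nothing if it is `None`, so `flatten` skips the gaps.
    values.iter().flatten().cloned().collect()
}

fn main() {
    let column = vec![Some(5.1), None, Some(4.9)];
    println!("{:?}", drop_missing(&column)); // prints "[5.1, 4.9]"
}
```

Note that x and y are filtered independently in this way, so the two vectors stay aligned only when neither column has missing values in the selected rows.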

EVCXR Example

Note: Currently having some issues with EVCXR. In theory it should work just fine, but I'm running into two issues.

  1. The long compile time is really killing the workflow. Tried using sccache, but this doesn't seem to be doing anything for some reason? I dunno, maybe I'm doing something wrong. Further research required.
  2. Plotting isn't actually generating a plot like it's supposed to. It just generates a bunch of text that makes up the SVG image without actually displaying the SVG image. Also requires more research.

If you have any ideas, please leave a comment on the release notes! Thanks!

For more information on evcxr and how to set up Rust in Jupyter Notebooks, see their repo.

Models

Linear models can be built from a variety of types by using the ModelMatrix intermediate type. This allows us to get rid of some of the bloat from the previous example and create a fit on the iris data.

We can access the data from a fit via methods on the fit. For example, below we call parameters() on the fit, then use estimate() to get each parameter's estimate and confidence_interval() to get its confidence interval.

use std::error::Error;  
  
use strafe::datasets::iris;  
use strafe::datasets::polars::prelude::*;  
use strafe::tests::two_way::LeastSquaresRegressionBuilder;  
use strafe::traits::{Model, ModelBuilder};  
use strafe::types::ModelMatrix;  
  
fn main() -> Result<(), Box<dyn Error>> {  
    let iris = iris()?;  
  
    let x = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .select(&[col("Sepal.Length")])  
        .collect()?;  
  
    let y = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .select(&[col("Sepal.Width")])  
        .collect()?;  
  
    let mut fit = LeastSquaresRegressionBuilder::new()  
        .with_x(&ModelMatrix::from(x))  
        .with_y(&ModelMatrix::from(y))  
        .build();  
  
    for p in fit.parameters()? {  
        println!("{} {:?}", p.estimate(), p.confidence_interval());  
    }

    Ok(())
}
-0.569432673039648 (-2.390925862499916, 1.2520605164206204)
0.7985283006471533 (0.43554707179202234, 1.1615095295022844)
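
The two lines above are presumably the intercept and the slope of the fitted line, in that order. For a single predictor, ordinary least squares has a simple closed form, sketched below with the standard library only. This is an illustration of the textbook formula, not strafe's implementation, and least_squares is a hypothetical helper.

```rust
/// Returns (intercept, slope) of the least squares line through the points,
/// or None if the inputs differ in length, are too short, or x is constant.
fn least_squares(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Sum of cross deviations and sum of squared deviations of x.
    let sxy: f64 = x
        .iter()
        .zip(y)
        .map(|(a, b)| (a - mean_x) * (b - mean_y))
        .sum();
    let sxx: f64 = x.iter().map(|a| (a - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        return None;
    }

    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    Some((intercept, slope))
}

fn main() {
    // Points that lie exactly on y = 1 + 2x.
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [3.0, 5.0, 7.0, 9.0];
    println!("{:?}", least_squares(&x, &y)); // prints "Some((1.0, 2.0))"
}
```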

Below is an example of fitting the model, printing the fit, and plotting some of the fit plots.

use std::error::Error;  
  
use strafe::datasets::iris;  
use strafe::datasets::polars::prelude::*;  
use strafe::plots::prelude::*;  
use strafe::tests::two_way::LeastSquaresRegressionBuilder;  
use strafe::traits::ModelBuilder;  
use strafe::types::ModelMatrix;  
  
fn main() -> Result<(), Box<dyn Error>> {  
    let iris = iris()?;  
  
    let x = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .select(&[col("Sepal.Length")])  
        .collect()?;  
  
    let y = iris  
        .clone()  
        .lazy()  
        .filter(col("Species").eq(lit("setosa")))  
        .select(&[col("Sepal.Width")])  
        .collect()?;  
  
    let mut fit = LeastSquaresRegressionBuilder::new()  
        .with_x(&ModelMatrix::from(x))  
        .with_y(&ModelMatrix::from(y))  
        .build();  
  
    println!("{fit}");  
  
    let root = SVGBackend::new("fit.svg", (1024, 768)).into_drawing_area();  
    fit.plot_fit(&root, &Default::default());  
  
    let root = SVGBackend::new("resid_lev.svg", (1024, 768)).into_drawing_area();  
    fit.plot_residual_leverage(&root, &Default::default());  
  
    let root = SVGBackend::new("qq.svg", (1024, 768)).into_drawing_area();  
    fit.plot_quantile_quantile(&root, &Default::default());  
  
    Ok(())  
}
Residuals:
┌─────────┬──────────────┬─────────┬──────────────┬─────────┐
│ Minimum │ 1st Quantile │ Median  │ 3rd Quantile │ Maximum │
╞═════════╪══════════════╪═════════╪══════════════╪═════════╡
│ -0.7239 │ -0.1827      │ -0.0030 │ 0.1573       │ 0.5170  │
└─────────┴──────────────┴─────────┴──────────────┴─────────┘

Coefficients:
┌────┬──────────┬─────────────────────────┬─────────────────────────┬─────────┬────────────┐
│    │ Estimate │ Confidence Interval (L) │ Confidence Interval (U) │ T-Value │ P-Value    │
╞════╪══════════╪═════════════════════════╪═════════════════════════╪═════════╪════════════╡
│ x0 │ -0.5694  │ -2.3909                 │ 1.2520                  │ -1.0914 │ 0.2805     │
│ x1 │ 0.7985   │ 0.4355                  │ 1.1615                  │ 7.6807  │ 6.7098e-10 │
└────┴──────────┴─────────────────────────┴─────────────────────────┴─────────┴────────────┘

Tests:
┌──────────────────────────────┬───────────┬────────────┬────────┐
│                              │ Statistic │ P-Value    │ Alpha  │
╞══════════════════════════════╪═══════════╪════════════╪════════╡
│ Multiple R-squared (Robust)  │ 0.9945    │ 0.0054     │ 0.1500 │
│ Significance of Regression   │ 58.993    │ 6.7098e-10 │ 0.05   │
│ Shapiro-Wilk Normal Residual │ 0.9868    │ 0.8459     │ 0.05   │
└──────────────────────────────┴───────────┴────────────┴────────┘
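
One way to sanity-check the tables above: in a regression with a single predictor, the F statistic for significance of regression equals the square of the slope's t statistic, and the two tests share a p-value. The short sketch below checks the printed values against that identity. f_from_t is a hypothetical helper, and the input is the rounded T-Value from the Coefficients table.

```rust
/// For a regression with a single predictor, F equals the squared t statistic of the slope.
fn f_from_t(t: f64) -> f64 {
    t * t
}

fn main() {
    let t_value = 7.6807; // T-Value of x1 in the Coefficients table
    println!("{:.3}", f_from_t(t_value)); // prints "58.993"
}
```

This agrees with the Significance of Regression statistic, and both rows report the same p-value of 6.7098e-10.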

fit.svg

resid_lev.svg

qq.svg