A simple package for reproducible package management in R. It differs from other approaches to package management, such as renv, by providing all-in-one management for packages in R focused around a single function, Require.
Some packages, including those in our PredictiveEcology repository, have many package dependencies. Some of them are on CRAN, but some are still in development and so are hosted elsewhere. Mixing many package dependencies that are constantly evolving creates challenges with standard R package management. For example, what is the best way to move analyses from one machine to another, or to set up a series of High Performance Compute nodes? How should we use functions like install.packages, which are clearly intended to be used once or very few times, in a reproducible workflow? How do we deal with many packages on GitHub that have many common dependencies? Finally, how do we do all this for many concurrent projects without installing hundreds of packages in a new directory for every project?
The Require package attempts to address these issues and others. It differs from packrat in that it is much simpler and is closer to base R package management.
Require can use hierarchical library paths, as in base R, with many paths in .libPaths(), or can set a single library path to be standAlone. This allows “system” packages and “project-specific” packages to be used together, as in base R. It differs from other packages like renv in that it is focused around a single function, Require, that can be used in a reproducible workflow.
renv takes a record of the installed package versions, the “snapshot”, as its foundation; any change a user makes to package versions then updates this snapshot. renv does not keep this information in the source code of the project. Require instead treats the “snapshot” as a decision the user makes when it is time to fix the package versions. Package versions are primarily controlled by the code developer, who states the minimum (or maximum) package version in the source code. This means that, by default, projects are somewhat more fluid: they carry no version constraint where none is required, or minimum (or maximum) package versions where one is, until it is time to freeze them, say when publishing or when setting up virtual machines with an identical setup. From this perspective, renv is more “top-down” and Require is more “bottom-up”, though each can emulate the other’s behaviour.
We define a reproducible workflow as a workflow that can be run from the start to any point in the project, without having to “skip over”, “comment out”, or “jump to” particular lines or chunks of code. Require is designed to support such a workflow.
Require is essentially a wrapper around functions that install packages, e.g., install_github, and around one of the main functions for loading packages.
# install.packages("Require")
library(Require)
Require("data.table")
And with version numbering:
It is vectorized on package names, and can include mixed github and CRAN, mixed version number and not:
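The specification strings used in this README, such as "data.table (>=1.12.8)" or "PredictiveEcology/SpaDES@development", combine a package name with an optional version requirement. The base R sketch below shows one way such a string could be read and checked against an installed version. It is an illustration only, not Require's own parser: parse_spec and satisfies are hypothetical helper names, and the accepted operators are an assumption.

```r
# Hypothetical helper: split a specification such as "pkg (>=1.2.3)" into
# its package name, comparison operator and version. Specifications without
# a parenthesised part (e.g. "dplyr" or "owner/repo@branch") get NA for both.
parse_spec <- function(spec) {
  name <- trimws(sub("\\(.*$", "", spec))            # text before any "("
  has_version <- grepl("\\(", spec)
  inside <- if (has_version) gsub("^.*\\(|\\).*$", "", spec) else NA_character_
  op <- if (has_version) gsub("[^<>=]", "", inside) else NA_character_
  version <- if (has_version) trimws(gsub("[<>=]", "", inside)) else NA_character_
  list(name = name, op = op, version = version)
}

# Hypothetical helper: does an installed version meet the specification?
satisfies <- function(installed, spec) {
  p <- parse_spec(spec)
  if (is.na(p$op)) return(TRUE)                      # no version requirement
  cmp <- utils::compareVersion(installed, p$version) # -1, 0 or 1
  switch(p$op,
         ">=" = cmp >= 0,
         "<=" = cmp <= 0,
         "==" = cmp == 0,
         ">"  = cmp > 0,
         "<"  = cmp < 0,
         stop("unsupported operator: ", p$op))
}

specs <- c("data.table (>=1.12.8)", "dplyr",
           "PredictiveEcology/SpaDES@development")
lapply(specs, parse_spec)
satisfies("1.13.0", "data.table (>=1.12.8)")
satisfies("1.12.0", "data.table (==1.12.8)")
```

Because the version requirement lives inside the string, a character vector of such strings can mix constrained and unconstrained packages freely.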
We can keep all packages in an isolated folder for a project, using standAlone = TRUE:
setLibPaths("Project_A_Packages", standAlone = TRUE)
.libPaths() # this is just to check what happened in the previous line -- there are 2 folders only
Require("data.table (>=1.12.8)")
Or we can use a hybrid of our main, “personal” library and a project specific one for “extra” packages:
setLibPaths("Project_A_Packages", standAlone = FALSE)
.libPaths() # we have added a library to original ones on this system
Require("data.table (>=1.12.8)")
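The difference between the two settings can be sketched with base R's .libPaths() alone, without calling Require. This is an analogy under stated assumptions, not a description of how setLibPaths is implemented: the project folder is a hypothetical directory created inside tempdir(), the hybrid case is modelled by placing the project library in front of the existing paths, and the isolated case by setting the project library on its own.

```r
old <- .libPaths()                       # remember the current library paths

# Hypothetical project library, created in a temporary directory
project_lib <- file.path(tempdir(), "Project_A_Packages")
dir.create(project_lib, showWarnings = FALSE)

# Hybrid: the project library is searched first, then the original libraries
.libPaths(c(project_lib, old))
hybrid <- .libPaths()

# Isolated: only the project library is requested; base R always keeps
# its own system library on the path as well
.libPaths(project_lib)
isolated <- .libPaths()

.libPaths(old)                           # restore the original paths

hybrid
isolated
```

In both cases the project library comes first, so packages installed there take precedence over copies in any other library on the path.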
In the same way as above, we can specify maximum or exact package versions. Require will retrieve these from the CRAN archives.
Because it is vectorized, there can be a long list of packages at the top of a project file, with various sources and version specifications.
library(Require)
setLibPaths("ProjectA", standAlone = TRUE)
Require(c("data.table (==1.12.8)", "dplyr", "reproducible",
          "PredictiveEcology/SpaDES@development", "raster (>=3.1.5)"))
When a system is set up with the correct packages and versions, we can take a snapshot and give that file to another person or machine:
Move to a new machine, say
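The idea behind a snapshot can be sketched in base R: record the name and version of every installed package in a file, then compare that file against the libraries on another machine. This is not Require's own snapshot mechanism; write_snapshot, snapshot_diff and the file name are hypothetical, and the sketch only reports differences without installing anything.

```r
# Hypothetical helper: write the name and version of every package found in
# `lib` to a CSV file. If a package is present in several libraries, only
# the first copy on the path (the one R would load) is recorded.
write_snapshot <- function(file, lib = .libPaths()) {
  ip <- utils::installed.packages(lib.loc = lib)
  ip <- ip[!duplicated(ip[, "Package"]), , drop = FALSE]
  snap <- data.frame(Package = ip[, "Package"], Version = ip[, "Version"],
                     stringsAsFactors = FALSE, row.names = NULL)
  utils::write.csv(snap, file, row.names = FALSE)
  invisible(snap)
}

# Hypothetical helper: return the snapshot entries that are missing from
# this machine or installed here at a different version.
snapshot_diff <- function(file, lib = .libPaths()) {
  wanted <- utils::read.csv(file, stringsAsFactors = FALSE,
                            colClasses = "character")
  ip <- utils::installed.packages(lib.loc = lib)
  have <- ip[, "Version"][match(wanted$Package, ip[, "Package"])]
  wanted[is.na(have) | have != wanted$Version, , drop = FALSE]
}

snapshot_file <- file.path(tempdir(), "packageVersions.csv")
write_snapshot(snapshot_file)
snapshot_diff(snapshot_file)   # no rows on the machine that wrote the file
```

On a second machine, the rows returned by snapshot_diff would be the packages that still need to be installed or changed to match the first.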
When installing on many machines on a network, a local cache can speed up installations. Setting options("Require.RPackageCache" = someSharedDirectory) turns on the local cache. By default, binaries will be saved on Windows. Also by default, binaries will be built on the fly on *nix systems, and each binary will be cached for even faster installs later (turned off with options("Require.buildBinaries" = FALSE)).
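As a minimal sketch of turning the cache on: the directory below is a hypothetical stand-in created inside tempdir(); on a real network it would be a path that every machine can reach.

```r
# Hypothetical cache location; replace with a shared network directory
someSharedDirectory <- file.path(tempdir(), "RPackageCache")
dir.create(someSharedDirectory, recursive = TRUE, showWarnings = FALSE)

# Point the option named in this README at that directory
options("Require.RPackageCache" = someSharedDirectory)

# Confirm the option is set for the current R session
getOption("Require.RPackageCache")
```

Options set this way last only for the current session, so the call would typically sit near the top of a project script or in a startup file.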
Install from CRAN:
Install from GitHub:
# install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/Require", dependencies = TRUE)
The Require package is a simple, lightweight package focused around a single function. It has very few dependencies and so can be used to install packages without interfering with itself.