Developing R packages

OECD Stats Day 2023

María Paula Caldas

Economics Department

R/Python/Algobank CoP

Introduction

Motivation and prerequisites

What is an R package?

A package bundles together code, data, and documentation in a format that is easy to share with others

# R makes it easy to install and use packages from CRAN or other repositories
install.packages("ggplot2")

library(ggplot2)

# Some packages contain data, which can be documented
data(package = "ggplot2")
?ggplot2::diamonds

# Beyond object documentation, packages can include short articles to 
# describe broader functionality
vignette("ggplot2-specs")

ggplot(diamonds, aes(x, y)) + 
  geom_bin_2d(show.legend = FALSE) +
  scale_y_log10()

Why package your R code?

It’s easy for users!

  • Most already know install.packages() and library()
  • {remotes} makes it easy to install and build packages hosted on code sharing platforms
  • You can publish packages to CRAN, to the R-universe or to internal package repositories (e.g. Sonatype Nexus, coming to the OECD)

Why package your R code?

As a developer, adopting a package infrastructure helps you iterate faster and create more robust code

  • Clearly stating your dependencies
  • Separating development from deployment
  • Providing a framework to create unit tests and do automated checks
  • Giving the ability to version your releases
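
As a small illustration, the version and dependencies that a package declares in its DESCRIPTION file can be read back from R. This sketch uses the base {stats} package only because it is always installed; any installed package would work the same way.

```r
# Version of an installed package, as declared in its DESCRIPTION file
stats_version <- utils::packageVersion("stats")
print(stats_version)

# Packages it declares as imports (a single comma-separated string)
stats_imports <- utils::packageDescription("stats")$Imports
print(stats_imports)
```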

Why package your R code?

It gives you a set of tools to help people understand your code and learn how to interact with it.

  • Concise documentation of functions, data and examples
  • Vignettes, for long-form documentation
  • Automatically create package websites with {pkgdown}
  • Conventions for communicating changes (NEWS.md) and contribution guidelines (CONTRIBUTING.md)

What do you need to start building R packages?

  • Be curious and willing!
  • Enough R to create a function
  • is not a must, but a very nice-to-have
  • Some markdown, to write documentation

Let’s create a package

Posit Cloud Project

https://posit.cloud/content/7235458

1 Set up the basic infrastructure


library(usethis)
library(devtools)

create_package("location-i-want/mypackage")
  • What happens when you run create_package()?
  • What files do you see?
  • Open the DESCRIPTION file and edit some fields
  • Run devtools::check(). What do you see?
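
To explore the first two questions without touching your own folders, one option is to create the package in a temporary directory. This is a minimal sketch: the location and the package name are placeholders, and open = FALSE is used so that the current session stays where it is.

```r
library(usethis)

# Assumption: a throwaway location inside the session's temporary directory
pkg_path <- file.path(tempdir(), "mypackage")

# open = FALSE keeps the current session and working directory as they are
create_package(pkg_path, open = FALSE)

# List what was generated, including hidden files
list.files(pkg_path, all.files = TRUE, no.. = TRUE)
```

The listing should include at least a DESCRIPTION file, a NAMESPACE file and an R/ folder; the other files depend on your set-up.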

2 Create a function, and use it!


  • Open a file to write your function

    use_r("name-of-your-file")
  • Write a small function

    If you need a little inspiration
    # Report the share of the year that has elapsed on `date`
    year_progress <- function(date, is_leap_year = FALSE) {
      # "%j" gives the day of the year (001-366)
      numerator   <- as.numeric(format(date, "%j"))
      denominator <- if (is_leap_year) 366 else 365
      share <- round(numerator * 100 / denominator)
      message(share, "% of the year is done!")
    }
  • When you are done, go to the Console, type devtools::load_all() and press ENTER. What happens?
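
Once the function is loaded, it can be called like any other. The sketch below repeats the inspiration function so that it runs on its own; the dates are arbitrary examples.

```r
year_progress <- function(date, is_leap_year = FALSE) {
  numerator   <- as.numeric(format(date, "%j"))
  denominator <- if (is_leap_year) 366 else 365
  share <- round(numerator * 100 / denominator)
  message(share, "% of the year is done!")
}

# 2 July 2023 is day 183 of a 365-day year
year_progress(as.Date("2023-07-02"))
#> 50% of the year is done!

# 31 December 2024 is day 366 of a leap year
year_progress(as.Date("2024-12-31"), is_leap_year = TRUE)
#> 100% of the year is done!
```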

3 Document


  • Place your cursor in the body of your function
  • Navigate to Code > Insert Roxygen Skeleton
  • Fill the documentation, and save
  • In the console, run devtools::document()
  • See the documentation you just wrote with ?yourfun
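
For reference, a filled-in skeleton for the inspiration function might look like the sketch below. The title and descriptions are illustrative wording, not a required text; only the tags (@param, @return, @export, @examples) are roxygen2 conventions.

```r
#' Report how much of the year has elapsed
#'
#' @param date A `Date`, for example `Sys.Date()`.
#' @param is_leap_year Logical. Set to `TRUE` if `date` falls in a leap year.
#'
#' @return Called for its side effect of printing a message.
#'   Returns `NULL` invisibly.
#' @export
#'
#' @examples
#' year_progress(as.Date("2023-07-02"))
year_progress <- function(date, is_leap_year = FALSE) {
  numerator   <- as.numeric(format(date, "%j"))
  denominator <- if (is_leap_year) 366 else 365
  share <- round(numerator * 100 / denominator)
  message(share, "% of the year is done!")
}
```

Running devtools::document() turns these comments into a help page under man/ and adds the function to NAMESPACE.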

Internal vs. exported functions

  • @export identifies user-facing functions, i.e. functions available to your users when they load your package with library(), or call a function with ::

    usethis::create_package
  • Other functions are internal. They help you break down your logic into smaller functions that are easier to test, but they may not be of interest to users. They can still be reached with :::

    usethis:::user_path_prep
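
The same distinction can be inspected for any installed package. As a rough sketch, the code below uses the base {stats} package (chosen only because it is always available) and compares everything defined in its namespace with what it exports.

```r
# Names the package makes available to users
exported <- getNamespaceExports("stats")

# Every object defined inside the package namespace
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# What is left over is internal: reachable with ::: but not with ::
internal <- setdiff(all_objects, exported)

"median" %in% exported   # an exported, user-facing function
length(internal) > 0     # the package also has internal objects
```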

4 Test


Unit tests are deliberate tests we perform whilst developing a package to monitor the correct functioning of our code

  • What would make a good unit test for the function below?

    # Report the share of the year that has elapsed on `date`
    year_progress <- function(date, is_leap_year = FALSE) {
      # "%j" gives the day of the year (001-366)
      numerator   <- as.numeric(format(date, "%j"))
      denominator <- if (is_leap_year) 366 else 365
      share <- round(numerator * 100 / denominator)
      message(share, "% of the year is done!")
    }
  • Create a test with use_test() and run it with test()
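
One possible answer, sketched with {testthat}: check the message for a known date and for the last day of a leap year. In a package these expectations would live in the file that use_test() creates; the function is repeated here only so that the example runs on its own, and the chosen dates are arbitrary.

```r
library(testthat)

year_progress <- function(date, is_leap_year = FALSE) {
  numerator   <- as.numeric(format(date, "%j"))
  denominator <- if (is_leap_year) 366 else 365
  share <- round(numerator * 100 / denominator)
  message(share, "% of the year is done!")
}

test_that("year_progress() reports the expected share", {
  # Day 183 of 365 rounds to 50%
  expect_message(year_progress(as.Date("2023-07-02")), "^50% of the year")
  # The first day of the year rounds down to 0%
  expect_message(year_progress(as.Date("2023-01-01")), "^0% of the year")
})

test_that("year_progress() handles leap years", {
  # Day 366 of 366 is exactly 100%
  expect_message(
    year_progress(as.Date("2024-12-31"), is_leap_year = TRUE),
    "^100% of the year"
  )
})
```

Because the function only emits a message, the tests have to match on text. Returning the share as a value would make it easier to test, which is a design choice worth discussing.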

5 Install


The final step before deployment is to install your package

install()

If you are sharing with users, also consider increasing its version, and documenting the main changes in the NEWS.md file.

use_news_md()
use_version()
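
A short aside on why versions are worth handling as versions rather than as text. The sketch below uses base R only, and the version numbers are made up for illustration.

```r
old_release <- package_version("0.9.0")
new_release <- package_version("0.10.0")

# Compared component by component: 10 is greater than 9
new_release > old_release
#> [1] TRUE

# Compared as plain text: "1" sorts before "9"
"0.10.0" > "0.9.0"
#> [1] FALSE
```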

Sharing with other developers


{usethis} makes it easy to set up version control

use_git()
use_github()

{oecdthis} aims to do the same for OECD staff

use_git()
use_algobank()

Lessons

What has been the experience in ECO?


The ECO Data Platform is an ambitious IT project to migrate core databases and programs into open-source, code-first solutions.

A key part of this platform is an ecosystem of R packages that helps statisticians and analysts connect to, curate and interact with our databases

What has been the experience in ECO?


Advantages of using packages

  • Better documentation and examples for statisticians
  • Better versioning of code and deployment cycles
  • Reducing dependencies and NAMESPACE conflicts in analysis code

Where we are trying to improve

  • Defining a framework for governance and maintenance
  • Upskilling staff in Git, R and good practice

Thank you!

Any questions?