Safely Rename Data Frames
Stu Field, SomaLogic Operating Co., Inc.
Source:vignettes/articles/tips-safely-rename-df.Rmd
tips-safely-rename-df.Rmd
Introduction
Renaming variables/features of a data frame (or tibble
)
is a common task in data science. Doing so safely is often a
struggle. This can be achieved safely via the
dplyr::rename()
function via 2 steps:
- Set up the mapping in either a named vector
- Apply the
dplyr::rename()
function via!!!
syntax - Alternatively, roll-your-own
rename()
function
-
Note: all entries in the mapping (i.e. key) object
must be present as
names
in the data frame object.
Example with mtcars
# Create map/key of the names to map
key <- c(MPG = "mpg", CARB = "carb") # named vector
key
#> MPG CARB
#> "mpg" "carb"
# rename `mtcars`
rename(mtcars, !!! key) |> head()
#> MPG cyl disp hp drat wt qsec vs am gear CARB
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
A SomaScan example (example_data
)
Occasionally it might be required to rename AptNames
(seq.1234.56
) -> SeqIds
(1234-56
) when analyzing SomaScan data.
getAnalytes(example_data) |>
head()
#> [1] "seq.10000.28" "seq.10001.7" "seq.10003.15" "seq.10006.25"
#> [5] "seq.10008.43" "seq.10011.65"
# create map (named vector)
key2 <- getAnalytes(example_data)
names(key2) <- getSeqId(key2) # re-name `seq.XXXX` -> SeqIds
key2 <- c(key2, ID = "SampleId") # SampleId -> ID
head(key2, 10L)
#> 10000-28 10001-7 10003-15 10006-25
#> "seq.10000.28" "seq.10001.7" "seq.10003.15" "seq.10006.25"
#> 10008-43 10011-65 10012-5 10013-34
#> "seq.10008.43" "seq.10011.65" "seq.10012.5" "seq.10013.34"
#> 10014-31 10015-119
#> "seq.10014.31" "seq.10015.119"
# rename analytes of `example_data`
getAnalytes(example_data) |>
head(10L)
#> [1] "seq.10000.28" "seq.10001.7" "seq.10003.15" "seq.10006.25"
#> [5] "seq.10008.43" "seq.10011.65" "seq.10012.5" "seq.10013.34"
#> [9] "seq.10014.31" "seq.10015.119"
new <- rename(example_data, !!! key2)
getAnalytes(new) |>
head(10L)
#> [1] "10000-28" "10001-7" "10003-15" "10006-25" "10008-43"
#> [6] "10011-65" "10012-5" "10013-34" "10014-31" "10015-119"
Alternative to dplyr
If you prefer to avoid the dplyr
import/dependency, you
can achieve a similar result with similar syntax by writing your own
renaming function:
rename2 <- function (.data, ...) {
map <- c(...)
loc <- setNames(match(map, names(.data), nomatch = 0L), names(map))
loc <- loc[loc > 0L]
newnames <- names(.data)
newnames[loc] <- names(loc)
setNames(.data, newnames)
}
Now, with similar syntax (but cannot use
!!!
):
# rename `mtcars` in-line
rename2(mtcars, MPG = "mpg", CARB = "carb") |>
head()
#> MPG cyl disp hp drat wt qsec vs am gear CARB
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# rename `mtcars` via named `key`
rename2(mtcars, key) |>
head()
#> MPG cyl disp hp drat wt qsec vs am gear CARB
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1