Joining strings with missing data together in R

programming

strings

Author

Ben Harrap

Published

September 24, 2025

Joining multiple string columns together in R isn’t a particularly tricky thing to do. Until recently though, I found it unreasonably annoying trying to join strings where missing data is present. For example, let’s say we’ve got the following address data:

library(tidyverse)
addresses <- tibble(
  unit = c("Shop 4","Unit 185", NA, NA),
  street = c("2 Frencham Pl","20 Allara St","1 McCoy Cct","130 Garran Rd Acton ACT 2601"),
  suburb = c("Downer","Canberra","Acton", NA),
  postcode = c("2602","2601","2601", NA)
)

The first two have a value for each element, while the third address doesn’t have a unit, and the fourth for some reason has all the elements in street. What I want to do is combine all of these address elements into a single string and ignore missing values. Let’s review the ways we can go about this.

`tidyr::unite()`

I’ll cut to the chase – tidyr::unite() is perfect for this:

addresses |>
  bind_cols(
    unite(
      data = addresses,
      col = "address",
      sep = " ",
      na.rm = TRUE
    )
  )

# A tibble: 4 × 5
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <chr>                 
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601     1 McCoy Cct Acton 2601
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>     130 Garran Rd Acton A…

I only recently learned about unite() (thanks anonymous cat data scientist!) and was surprised to find it in tidyr and not in stringr. Anyway, it’s perfect!

`base::paste()`

We could use paste(), except that it includes NA as the letters NA in the output, which is not what I want.

addresses |> 
  mutate(
    address = paste(
      unit, street, suburb, postcode,
      sep = " "
      )
  )

# A tibble: 4 × 5
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <chr>                 
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601     NA 1 McCoy Cct Acton …
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>     NA 130 Garran Rd Acto…

We could do some cleaning to overcome this issue:

addresses |>
  replace_na(
    replace = list(unit = "", suburb = "", postcode = "")
  ) |>
  mutate(
    address = paste(
      unit, street, suburb, postcode,
      sep = " "
    )
  )

# A tibble: 4 × 5
  unit       street                       suburb     postcode address           
  <chr>      <chr>                        <chr>      <chr>    <chr>             
1 "Shop 4"   2 Frencham Pl                "Downer"   "2602"   "Shop 4 2 Frencha…
2 "Unit 185" 20 Allara St                 "Canberra" "2601"   "Unit 185 20 Alla…
3 ""         1 McCoy Cct                  "Acton"    "2601"   " 1 McCoy Cct Act…
4 ""         130 Garran Rd Acton ACT 2601 ""         ""       " 130 Garran Rd A…

But the empty strings are now causing excessive separators to be included. We can fix this with str_trim() and str_squish(), but it’s getting silly at this point. Just use unite()!

addresses |>
  replace_na(
    replace = list(unit = "", suburb = "", postcode = "")
  ) |>
  mutate(
    address = str_trim(
      paste(
        unit, street, suburb, postcode,
        sep = " "
      )
    )
  )

# A tibble: 4 × 5
  unit       street                       suburb     postcode address           
  <chr>      <chr>                        <chr>      <chr>    <chr>             
1 "Shop 4"   2 Frencham Pl                "Downer"   "2602"   Shop 4 2 Frencham…
2 "Unit 185" 20 Allara St                 "Canberra" "2601"   Unit 185 20 Allar…
3 ""         1 McCoy Cct                  "Acton"    "2601"   1 McCoy Cct Acton…
4 ""         130 Garran Rd Acton ACT 2601 ""         ""       130 Garran Rd Act…

`base::sprintf()`

Another base option is to use sprintf(), which seems like we’d still need to do some extra work to handle the NA.

addresses |> 
  mutate(
    address = sprintf(
      "%s %s %s %s",
      unit, street, suburb, postcode
      )
  )

# A tibble: 4 × 5
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <chr>                 
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601     NA 1 McCoy Cct Acton …
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>     NA 130 Garran Rd Acto…

`stringr::str_c()`

We could use str_c() but we’ll have the same issues as we did with paste. Notice though that str_c returns NA if any of the input strings are NA.

addresses |> 
  mutate(
    address = str_c(
      unit, street, suburb, postcode,
      sep = " "
      )
  )

# A tibble: 4 × 5
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <chr>                 
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601     <NA>                  
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>     <NA>

`stringr::str_flatten()`

Interestingly, we can achieve the same result as unite() if we perform str_flatten() on a row-wise basis. I doubt this is a good idea though, as row-wise stuff tends to get very expensive, very quickly.

addresses |> 
  rowwise() |> 
  mutate(
    address = str_flatten(
      c(unit, street, suburb, postcode),
      collapse = " ",
      na.rm = TRUE
      )
  )

# A tibble: 4 × 5
# Rowwise: 
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <chr>                 
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601     1 McCoy Cct Acton 2601
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>     130 Garran Rd Acton A…

`glue::glue()`

glue is another option and arguably works better than paste or str_c because of the .na option letting us handle the NA within the function itself.

library(glue)
addresses |> 
  mutate(
    address = glue(
      "{unit}","{street}","{suburb}","{postcode}",
      .na = "",
      .sep = " "
      )
  )

# A tibble: 4 × 5
  unit     street                       suburb   postcode address               
  <chr>    <chr>                        <chr>    <chr>    <glue>                
1 Shop 4   2 Frencham Pl                Downer   2602     Shop 4 2 Frencham Pl …
2 Unit 185 20 Allara St                 Canberra 2601     Unit 185 20 Allara St…
3 <NA>     1 McCoy Cct                  Acton    2601      1 McCoy Cct Acton 26…
4 <NA>     130 Garran Rd Acton ACT 2601 <NA>     <NA>      130 Garran Rd Acton …

Again though, we’d need to trim the leading whitespace created.

Summary

There’s not much to say except use unite()!