I have an R data frame. Several columns have binary values (e.g. ‘Y’ or ‘N’). Some fields in these binary columns had NULLs (`NA` values, in R terms). I wanted to change the NULLs to ‘N’.
I thought the task was obvious: just use `replace_na`, right?
No. I tried `replace_na`, `is.na`, and `mutate` with `coalesce`. Nothing took. I didn’t get errors, but the NULLs remained. I tried banging my head on the desk, but that didn’t help, either.
Finally, while testing with a small dummy data set, I got a warning and, with it, a clue to the problem.
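Long story short, I needed to convert the columns from `factor` to `character` before trying to replace the NULLs. A factor can only hold values that are already among its levels; ‘N’ was not one of them, so each attempted replacement came back as `NA`. After converting the columns, NULL replacement worked just fine.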
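A minimal base R sketch of the same idea, without tidyr: the factor below has only the level "Y", so the replacement happens on a character copy and the result is turned back into a factor. The vectors `f`, `ch`, and `f_fixed` are made-up examples, not the data from this post.

```r
# A factor only accepts values that are among its levels
f <- factor(c(NA, "Y", NA))
levels(f)  # only "Y": NA is not a level, and neither is "N"

# Work on a character copy, which has no levels to respect
ch <- as.character(f)
ch[is.na(ch)] <- "N"  # base R equivalent of replace_na(ch, "N")

# Convert back to a factor; "N" now becomes a level of its own
f_fixed <- factor(ch)
f_fixed
#> [1] N Y N
#> Levels: N Y
```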
Try it yourself …
Create test data
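```r
library(tidyr)  # provides replace_na() and re-exports the %>% pipe
# create test data.frame (stringsAsFactors = TRUE is needed on R >= 4.0.0)
tdf <- data.frame(col1=letters[1:3], col2=c(NA, "Y", NA), stringsAsFactors=TRUE)
# view data.frame
tdf
```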
```
col1    col2
<fctr>  <fctr>
a       NA
b       Y
c       NA
3 rows
```
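Whether your columns are factors at all depends on how the data frame was built. Before R 4.0.0, `data.frame()` and `read.csv()` converted strings to factors by default; from 4.0.0 on they keep them as character. A quick class check, sketched here with two made-up data frames (`as_chr` and `as_fct`), tells you whether the factor problem applies.

```r
# Same column built with the R >= 4.0.0 default and with the old default
as_chr <- data.frame(col2 = c(NA, "Y", NA))
as_fct <- data.frame(col2 = c(NA, "Y", NA), stringsAsFactors = TRUE)

class(as_chr$col2)  # "character" on R >= 4.0.0: replace_na('N') works directly
class(as_fct$col2)  # "factor": needs the conversion described in this post

# Check every column of a data frame at once
sapply(as_fct, is.factor)
```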
Problem
```r
# replace NAs with 'N'
tdf$col2 <- tdf$col2 %>% replace_na('N')
#> Warning: invalid factor level, NA generated
# (newer tidyr releases may stop with an error here instead of warning)
```
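The warning says that ‘N’ is not one of the factor’s levels. One alternative, sketched here in base R and not the fix used in this post, keeps the column a factor the whole time by adding ‘N’ to its levels before assigning it. The vector `f` is a made-up example.

```r
f <- factor(c(NA, "Y", NA))

# Make "N" a legal value first; the assignment then succeeds without a warning
levels(f) <- c(levels(f), "N")
f[is.na(f)] <- "N"
f
#> [1] N Y N
#> Levels: Y N
```

Note that the new level is appended, so the level order is Y, N rather than the alphabetical order that `factor()` would produce.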
Solution
```r
# convert column to character before replace_na, then back to factor
tdf$col2 <- as.factor(as.character(tdf$col2) %>% replace_na('N'))
# display results
tdf
```
```
col1    col2
<fctr>  <fctr>
a       N
b       Y
c       N
3 rows
```
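The original problem involved several binary columns, not just one. A sketch of applying the same convert, replace, convert-back step to several columns in one pass, assuming dplyr 1.0.0 or later (for `across()`) and tidyr; the data frame `flags` and its columns are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: several Y/N flag columns stored as factors, some NA
flags <- data.frame(
  id    = 1:3,
  flag1 = c(NA, "Y", "N"),
  flag2 = c("Y", NA, NA),
  stringsAsFactors = TRUE
)

# For each listed column: factor -> character, NA -> "N", then back to factor
flags <- flags %>%
  mutate(across(c(flag1, flag2),
                ~ factor(replace_na(as.character(.x), "N"))))
```

Listing the flag columns explicitly, rather than selecting every factor column, avoids rewriting factor columns that are not Y/N flags.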