Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (e.g. converting character vectors to factors).

Creating

data_frame() is a nice way to create data frames. It encapsulates best practices for data frames.
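As a rough illustration of what such best practices look like in use, the sketch below assumes the tibble package is installed and that data_frame() is still available in the installed version. The column names are made up for the example.

```r
library(tibble)

# Character input stays character; it is not converted to a factor.
df <- data_frame(x = letters[1:3], y = 1:3, z = y * 2)
class(df$x)

# Arguments are evaluated in order, so z can refer to y from the same call.
df$z

# Column names are kept as given, even when they are not syntactic.
odd <- data_frame(`a b` = 1)
names(odd)
```

With base data.frame(), the same name would be rewritten to a.b unless check.names = FALSE is passed.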

Coercion

To complement data_frame(), tibble provides as_data_frame() to coerce objects into tibbles. Generally, as_data_frame() methods are much simpler than as.data.frame() methods. This is not precisely what as.data.frame() does, but its approach is similar to do.call(cbind, lapply(x, data.frame)), i.e. it coerces each component to a data frame and then cbind()s them all together.
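The do.call(cbind, lapply(x, data.frame)) pattern can be written out in base R as follows. This is only a sketch of the approach described above, not the actual implementation of as.data.frame(); the list x is invented for the example.

```r
# A small named list standing in for the input.
x <- list(a = 1:3, b = c(2.5, 3.5, 4.5))

# Step 1: coerce each component to its own one-column data frame.
pieces <- lapply(x, data.frame)

# Step 2: bind the one-column data frames together column-wise.
combined <- do.call(cbind, pieces)

# Restore the original component names on the result.
names(combined) <- names(x)
dim(combined)
```

Each component pays the full cost of building a data frame, which is where the overhead comes from.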

as_data_frame() has been written with an eye for performance:

l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

microbenchmark::microbenchmark(
  as_data_frame(l),
  as.data.frame(l)
)
#> Unit: microseconds
#>              expr      min        lq      mean   median       uq      max
#>  as_data_frame(l)  354.722  391.6325  528.6779  429.110  459.535 4979.859
#>  as.data.frame(l) 2683.058 3159.3130 3606.2274 3219.353 3504.535 8933.009
#>  neval cld
#>    100  a 
#>    100   b

The speed of as.data.frame() is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
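A minimal base R sketch of that situation is shown below. The inputs are hypothetical small records invented for the example; real inputs would usually be messier.

```r
# Hypothetical inputs: 200 small named lists, standing in for parsed records.
inputs <- lapply(seq_len(200), function(i) list(id = i, value = i / 2))

# Each record is coerced separately, so the coercion cost is paid once per input.
frames <- lapply(inputs, as.data.frame)

# Stack the one-row data frames into a single data frame.
tidy <- do.call(rbind, frames)
dim(tidy)
```

With thousands of inputs, the per-call overhead of the coercion step adds up.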

Tibbles vs data frames

There are two key differences between tibbles and data frames: printing and subsetting.

Printing

When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type:

data_frame(x = 1:1000)
#> Source: local data frame [1,000 x 1]
#> 
#>        x
#>    <int>
#> 1      1
#> 2      2
#> 3      3
#> 4      4
#> 5      5
#> 6      6
#> 7      7
#> 8      8
#> 9      9
#> 10    10
#> ..   ...

You can control the default appearance with options.
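The sketch below sets the options that, to the best of my knowledge, the tibble package reads when printing: tibble.print_max, tibble.print_min and tibble.width. Treat the names as assumptions and check them against the documentation of the installed version. The values are arbitrary.

```r
# If a tibble has more than 20 rows, print only the first 5.
options(tibble.print_max = 20, tibble.print_min = 5)

# Always print all columns, regardless of the width of the screen.
options(tibble.width = Inf)

# Inspect the values that are now set.
getOption("tibble.print_max")
getOption("tibble.width")
```

Options set this way last for the current R session only.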

Subsetting

Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a single column:

df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
#> [1] "data.frame"
class(df1[, 1])
#> [1] "integer"

df2 <- data_frame(x = 1:3, y = 3:1)
class(df2[, 1:2])
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df2[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

To extract a single column use [[ or $:

class(df2[[1]])
#> [1] "integer"
class(df2$x)
#> [1] "integer"
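For comparison, a base data frame can be made to behave consistently, but only by asking for it explicitly. The sketch uses base R only.

```r
df1 <- data.frame(x = 1:3, y = 3:1)

# drop = FALSE keeps the result a data frame, even for a single column.
one_col <- df1[, 1, drop = FALSE]
class(one_col)

# Indexing with a single argument (list-style) also returns a data frame.
class(df1[1])
```

Tibbles make this the default, so code does not need to remember drop = FALSE.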

Tibbles are also stricter with $. Tibbles never do partial matching, and will throw an error if the column does not exist:

df <- data.frame(abc = 1)
df$a
#> [1] 1

df2 <- data_frame(abc = 1)
df2$a
#> Error: Unknown column 'a'
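With a plain data frame, partial matching can be avoided by using [[ instead of $, or by checking the name first. The sketch below uses base R only.

```r
df <- data.frame(abc = 1)

# [[ matches names exactly by default, so an abbreviated name finds nothing.
exact_lookup <- df[["a"]]
is.null(exact_lookup)

# Partial matching with [[ has to be requested explicitly.
partial_lookup <- df[["a", exact = FALSE]]

# An explicit check against the column names makes the intent clear.
has_a <- "a" %in% names(df)
```

Note that the exact lookup returns NULL silently rather than raising an error, so a missing column can still go unnoticed.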