A data frame is a table of values suitable for data analysis.
Formally, a data frame is a table-like structure made up of three components:
a collection of vectors, one per column, that hold the data; a set of keys
(called an index) that labels the columns; and a second index that labels
the rows.
Each column may have a different data type. Column values are immutable,
but the collection of columns is not. This means that you can add columns to
and remove columns from a data frame. You can even replace a column with a new
one that has the same key. But the column vectors themselves cannot change.
As a consequence, any operation that adds or removes rows would have to change
every column vector, so it returns a new data frame instead of modifying the
existing one.
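To make this structure concrete, here is a minimal sketch in Python. It is not the library's API: the class `MiniFrame` and its methods `add_column`, `remove_column`, and `append_row` are hypothetical names chosen for illustration. Column vectors are stored as tuples so that their values cannot change, while the dictionary that holds them can gain, lose, or replace entries. Appending a row builds a new tuple for every column and returns a new frame.

```python
from typing import Any, Dict, Hashable, Sequence, Tuple


class MiniFrame:
    """Hypothetical minimal data frame: a row index, a column index,
    and one immutable vector (a tuple) per column."""

    def __init__(self, row_keys: Sequence[Hashable],
                 columns: Dict[Hashable, Sequence[Any]]):
        self.row_index: Tuple[Hashable, ...] = tuple(row_keys)
        # The column index is the ordered set of keys of this dict.
        self._columns: Dict[Hashable, Tuple[Any, ...]] = {}
        for key, values in columns.items():
            self.add_column(key, values)

    @property
    def column_index(self) -> Tuple[Hashable, ...]:
        return tuple(self._columns)

    def __getitem__(self, key: Hashable) -> Tuple[Any, ...]:
        return self._columns[key]

    # The collection of columns is mutable: these methods change the frame in place.
    def add_column(self, key: Hashable, values: Sequence[Any]) -> None:
        vector = tuple(values)  # a tuple, so the column's values cannot change
        if len(vector) != len(self.row_index):
            raise ValueError("column length must match the row index")
        self._columns[key] = vector  # an existing key is replaced by the new column

    def remove_column(self, key: Hashable) -> None:
        del self._columns[key]

    # Adding a row would change every column vector, so a new frame is returned.
    def append_row(self, row_key: Hashable,
                   values: Dict[Hashable, Any]) -> "MiniFrame":
        new_columns = {k: col + (values[k],) for k, col in self._columns.items()}
        return MiniFrame(self.row_index + (row_key,), new_columns)


df = MiniFrame(["a", "b"], {"x": [1, 2], "name": ["p", "q"]})
df.add_column("y", [0.5, 1.5])
df.add_column("x", [10, 20])   # replaces column "x"
df.remove_column("name")
bigger = df.append_row("c", {"x": 30, "y": 2.5})
print(df.row_index, bigger.row_index)  # ('a', 'b') ('a', 'b', 'c')
```

Note that `df` itself is unchanged by `append_row`; only the new frame `bigger` has the extra row.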
In many ways, a data frame is similar to a matrix. Both are two-dimensional
tables that can have row and column indexes.
Whereas vectors and matrices are ideally suited for computational tasks,
the emphasis with data frames is on data manipulation and transformation.
This difference in focus is reflected in the fact that vectors and matrices
are generic over the type of the elements, while data frames are generic
over the type of the row and column keys.
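The difference in generic parameters can be sketched with Python type hints. This is an illustration under assumptions, not the library's actual types: `Vector` and `Frame` are hypothetical classes, and Python does not enforce these annotations at run time. `Vector` is parameterized by its element type `T`, while `Frame` is parameterized by its row key type `R` and column key type `C`. Because each column may hold a different element type, the columns are typed as `Vector[Any]`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Generic, Hashable, Tuple, TypeVar

T = TypeVar("T")                  # element type of a vector
R = TypeVar("R", bound=Hashable)  # row key type
C = TypeVar("C", bound=Hashable)  # column key type


@dataclass(frozen=True)
class Vector(Generic[T]):
    """Hypothetical vector: generic over the type of its elements."""
    values: Tuple[T, ...]


@dataclass
class Frame(Generic[R, C]):
    """Hypothetical data frame: generic over its row and column key types.
    Columns may hold different element types, hence Vector[Any]."""
    row_index: Tuple[R, ...]
    columns: Dict[C, Vector[Any]]

    def row(self, key: R) -> Dict[C, Any]:
        # Linear lookup of the row position, kept simple for illustration.
        i = self.row_index.index(key)
        return {c: v.values[i] for c, v in self.columns.items()}


prices: Vector[float] = Vector((1.5, 2.0, 2.5))
sales: Frame[date, str] = Frame(
    row_index=(date(2024, 1, 1), date(2024, 1, 2)),
    columns={"units": Vector((3, 5)), "region": Vector(("north", "south"))},
)
print(sales.row(date(2024, 1, 2)))  # {'units': 5, 'region': 'south'}
```

Here the rows are keyed by dates and the columns by strings, while the columns themselves hold integers and strings.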
In this section:
Indexes
How to create data frames
Operations on data frames
Importing and exporting
The art of transforming data frames into a form
suitable for processing by statistical and machine learning algorithms.
Grouping and Aggregation
One of the core operations on data frames is the grouping
and aggregation of data.
Working with Categorical Data
Variables that can take on only a limited number of values
deserve special treatment.
Working with Time Series Data
Many data sets have observations that are indexed by a date or time value.
This section discusses several enhancements that make working with
such data sets more convenient.