Data Analysis Library User's Guide

A data frame is a table of values suitable for data analysis. Formally, a data frame is a table-like structure made up of three components: a collection of vectors that contain the data for each column, a column index (a set of keys that label the columns), and a row index (a set of keys that label the rows).

Each column may have a different data type. Column values are immutable, but the collection of columns is not. This means that you can add columns to a data frame, remove columns from it, or replace a column with a new one that has the same key. The column vectors themselves, however, cannot change. As a consequence, any operation that adds or removes rows returns a new data frame.
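The following Python sketch illustrates the model described above. It is a toy illustration only, not the library's API: the class SketchFrame and all of its members are hypothetical names, and the sample values are made up. Column vectors are stored as tuples so they cannot be modified in place, the collection of columns is an ordinary mutable dictionary, and appending a row builds a new frame.

```python
class SketchFrame:
    """Hypothetical toy model of a data frame (not the library's API)."""

    def __init__(self, row_index, columns):
        # Row index: the keys that label the rows.
        self.row_index = tuple(row_index)
        # Column collection: mutable mapping from column key to column vector.
        self._columns = {}
        for key, values in columns.items():
            self.set_column(key, values)

    @property
    def column_index(self):
        # Column index: the keys that label the columns, in insertion order.
        return tuple(self._columns)

    def column(self, key):
        return self._columns[key]

    def set_column(self, key, values):
        # Adds a column, or replaces the column that has the same key.
        # The vector is frozen into a tuple so it cannot change afterwards.
        values = tuple(values)
        if len(values) != len(self.row_index):
            raise ValueError("column length must match the row index")
        self._columns[key] = values

    def remove_column(self, key):
        del self._columns[key]

    def append_row(self, row_key, row):
        # Adding a row cannot modify the existing vectors,
        # so the result is a new frame with new, longer vectors.
        new_columns = {k: v + (row[k],) for k, v in self._columns.items()}
        return SketchFrame(self.row_index + (row_key,), new_columns)


# Illustrative data only.
frame = SketchFrame(["a", "b"], {"name": ["Ann", "Bob"], "age": [31, 45]})
frame.set_column("score", [9.5, 7.0])   # add a column in place
frame.set_column("age", [32, 46])       # replace a column with the same key
bigger = frame.append_row("c", {"name": "Cy", "age": 28, "score": 8.0})
```

In this sketch, frame still has two rows after the call to append_row; only bigger has three. Columns with different element types (strings, integers, floating-point numbers) sit side by side in the same frame.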

A data frame is very similar to a matrix. Both are two-dimensional tables that can have row and column indexes. Whereas vectors and matrices are ideally suited for computational tasks, the emphasis with data frames is on data manipulation and transformation. This difference in focus is reflected in the fact that vectors and matrices are generic over the type of the elements, while data frames are generic over the type of the row and column keys.
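The difference in type parameters can be sketched with Python type hints. This is an illustration under stated assumptions, not the library's API: SketchMatrix and KeyedFrame are hypothetical names, and the sample data is made up. The matrix is parameterized by its element type, while the frame is parameterized by its row key and column key types and leaves the element type open for each column.

```python
from datetime import date
from typing import Any, Dict, Generic, Hashable, Tuple, TypeVar

T = TypeVar("T")                    # element type of a matrix
R = TypeVar("R", bound=Hashable)    # row key type of a frame
C = TypeVar("C", bound=Hashable)    # column key type of a frame


class SketchMatrix(Generic[T]):
    """Hypothetical matrix: generic over the element type."""

    def __init__(self, rows: Tuple[Tuple[T, ...], ...]) -> None:
        self.rows = rows


class KeyedFrame(Generic[R, C]):
    """Hypothetical data frame: generic over row and column key types."""

    def __init__(self, row_keys: Tuple[R, ...],
                 columns: Dict[C, Tuple[Any, ...]]) -> None:
        self.row_keys = row_keys
        self.columns = columns

    def value(self, row_key: R, column_key: C) -> Any:
        # Look up the row position by key, then read from the column vector.
        position = self.row_keys.index(row_key)
        return self.columns[column_key][position]


# Every element of the matrix is a float.
m: SketchMatrix[float] = SketchMatrix(((1.0, 2.0), (3.0, 4.0)))

# Rows are keyed by date, columns by string; element types vary per column.
prices: KeyedFrame[date, str] = KeyedFrame(
    (date(2024, 1, 2), date(2024, 1, 3)),
    {"ticker": ("ABC", "ABC"), "close": (10.5, 10.75)},
)
close_on_jan_3 = prices.value(date(2024, 1, 3), "close")
```

The type hints here are only annotations; Python does not enforce them at run time. They serve to show which types each structure is parameterized by.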

Overview

In this section:

  • Data Frames: Indexes. How to create data frames. Operations on data frames. Importing and exporting.

  • Data Wrangling: The art of transforming data frames into a form suitable for processing by statistical and machine learning algorithms.

  • Grouping and Aggregation: One of the core operations on data frames is the grouping and aggregation of data.

  • Working with Categorical Data: Variables that can take on only a limited number of values deserve special treatment.

  • Working with Time Series Data: Many data sets have observations that are indexed by a date or time value. This section discusses several enhancements that make working with such data sets more convenient.