Transform
*********


Pivot by a single column
========================

The "Table.pivot()" method is a general process for grouping data by
row and, optionally, by column, and then calculating some aggregation
for each group. Consider the following table:

+-----------+-----------+----------+---------+
| name      | race      | gender   | age     |
|===========|===========|==========|=========|
| Joe       | white     | female   | 20      |
+-----------+-----------+----------+---------+
| Jane      | asian     | male     | 20      |
+-----------+-----------+----------+---------+
| Jill      | black     | female   | 20      |
+-----------+-----------+----------+---------+
| Jim       | latino    | male     | 25      |
+-----------+-----------+----------+---------+
| Julia     | black     | female   | 25      |
+-----------+-----------+----------+---------+
| Joan      | asian     | female   | 25      |
+-----------+-----------+----------+---------+

In the very simplest case, this table can be pivoted to count the
number occurences of values in a column:

   transformed = table.pivot('race')

Result:

+-----------+----------+
| race      | pivot    |
|===========|==========|
| white     | 1        |
+-----------+----------+
| asian     | 2        |
+-----------+----------+
| black     | 2        |
+-----------+----------+
| latino    | 1        |
+-----------+----------+


Pivot by multiple columns
=========================

You can pivot by multiple columns either as additional row-groups, or
as intersecting columns. For example, given the table in the previous
example:

   transformed = table.pivot(['race', 'gender'])

Result:

+-----------+----------+---------+
| race      | gender   | pivot   |
|===========|==========|=========|
| white     | female   | 1       |
+-----------+----------+---------+
| asian     | male     | 1       |
+-----------+----------+---------+
| black     | female   | 2       |
+-----------+----------+---------+
| latino    | male     | 1       |
+-----------+----------+---------+
| asian     | female   | 1       |
+-----------+----------+---------+

For the column, version you would do:

   transformed = table.pivot('race', 'gender')

Result:

+-----------+----------+----------+
| race      | male     | female   |
|===========|==========|==========|
| white     | 0        | 1        |
+-----------+----------+----------+
| asian     | 1        | 1        |
+-----------+----------+----------+
| black     | 0        | 2        |
+-----------+----------+----------+
| latino    | 1        | 0        |
+-----------+----------+----------+


Pivot to sum
============

The default pivot aggregation is "Count" but you can also supply other
operations. For example, to aggregate each group by "Sum" of their
ages:

   transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'))

+-----------+----------+----------+
| race      | male     | female   |
|===========|==========|==========|
| white     | 0        | 20       |
+-----------+----------+----------+
| asian     | 20       | 25       |
+-----------+----------+----------+
| black     | 0        | 45       |
+-----------+----------+----------+
| latino    | 25       | 0        |
+-----------+----------+----------+


Pivot to percent of total
=========================

Pivot allows you to apply a "Computation" to each row of aggregated
results prior to returning the table. Use the stringified name of the
aggregation as the column argument to your computation:

   transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'), computation=agate.Percent('sum'))

+-----------+----------+----------+
| race      | male     | female   |
|===========|==========|==========|
| white     | 0        | 14.8     |
+-----------+----------+----------+
| asian     | 14.8     | 18.4     |
+-----------+----------+----------+
| black     | 0        | 33.3     |
+-----------+----------+----------+
| latino    | 18.4     | 0        |
+-----------+----------+----------+

*Note: actual computed percentages will be much more precise.*

It's helpful when constructing these cases to think of all the cells
in the pivot table as a single sequence.


Denormalize key/value columns into separate columns
===================================================

It's common for very large datasets to be distributed in a
"normalized" format, such as:

+-----------+-------------+-----------+
| name      | property    | value     |
|===========|=============|===========|
| Jane      | gender      | female    |
+-----------+-------------+-----------+
| Jane      | race        | black     |
+-----------+-------------+-----------+
| Jane      | age         | 24        |
+-----------+-------------+-----------+
| ...       | ...         | ...       |
+-----------+-------------+-----------+

The "Table.denormalize()" method can be used to transform the table so
that each unique property has its own column.

   transformed = table.denormalize('name', 'property', 'value')

Result:

+-----------+------------+----------+---------+
| name      | gender     | race     | age     |
|===========|============|==========|=========|
| Jane      | female     | black    | 24      |
+-----------+------------+----------+---------+
| Jack      | male       | white    | 35      |
+-----------+------------+----------+---------+
| Joe       | male       | black    | 28      |
+-----------+------------+----------+---------+


Normalize separate columns into key/value columns
=================================================

Sometimes you have a dataset where each property has its own column,
but your analysis would be easier if all properties were stored
together. Consider this table:

+-----------+------------+----------+---------+
| name      | gender     | race     | age     |
|===========|============|==========|=========|
| Jane      | female     | black    | 24      |
+-----------+------------+----------+---------+
| Jack      | male       | white    | 35      |
+-----------+------------+----------+---------+
| Joe       | male       | black    | 28      |
+-----------+------------+----------+---------+

The "Table.normalize()" method can be used to transform the table so
that all the properties and their values share two columns.

   transformed = table.normalize('name', ['gender', 'race', 'age'])

Result:

+-----------+-------------+-----------+
| name      | property    | value     |
|===========|=============|===========|
| Jane      | gender      | female    |
+-----------+-------------+-----------+
| Jane      | race        | black     |
+-----------+-------------+-----------+
| Jane      | age         | 24        |
+-----------+-------------+-----------+
| ...       | ...         | ...       |
+-----------+-------------+-----------+
