Compute new values
******************


Change
======

   new_table = table.compute([
       ('2000_change', agate.Change('2000', '2001')),
       ('2001_change', agate.Change('2001', '2002')),
       ('2002_change', agate.Change('2002', '2003'))
   ])

Or, better yet, compute the whole decade using a loop:

   computations = []

   for year in range(2000, 2010):
       change = agate.Change(year, year + 1)
       computations.append(('%i_change' % year, change))

   new_table = table.compute(computations)


Percent
=======

Calculate the percentage for each value in a column with "Percent".
Values are divided into the sum of the column by default.

   columns = ('value',)
   rows = ([1],[2],[2],[5])
   new_table = agate.Table(rows, columns)

   new_table = new_table.compute([
       ('percent', agate.Percent('value'))
   ])

   new_table.print_table()

   | value | percent |
   | ----- | ------- |
   |     1 |      10 |
   |     2 |      20 |
   |     2 |      20 |
   |     5 |      50 |

Override the denominator with a keyword argument.

   new_table = new_table.compute([
       ('percent', agate.Percent('value', 5))
   ])

   new_table.print_table()

   | value | percent |
   | ----- | ------- |
   |     1 |      20 |
   |     2 |      40 |
   |     2 |      40 |
   |     5 |     100 |


Percent change
==============

Want percent change instead of value change? Just swap out the
"Computation":

   computations = []

   for year in range(2000, 2010):
       change = agate.PercentChange(year, year + 1)
       computations.append(('%i_change' % year, change))

   new_table = table.compute(computations)


Indexed/cumulative change
=========================

Need your change indexed to a starting year? Just fix the first
argument:

   computations = []

   for year in range(2000, 2010):
       change = agate.Change(2000, year + 1)
       computations.append(('%i_change' % year, change))

   new_table = table.compute(computations)

Of course you can also use "PercentChange" if you need percents rather
than values.


Round to two decimal places
===========================

agate stores numerical values using Python's "decimal.Decimal" type.
This data type ensures numerical precision beyond what is supported by
the native "float()" type, however, because of this we can not use
Python's builtin "round()" function. Instead we must use
"decimal.Decimal.quantize()".

We can use "Table.compute()" to apply the quantize to generate a
rounded column from an existing one:

   from decimal import Decimal

   number_type = agate.Number()

   def round_price(row):
       return row['price'].quantize(Decimal('0.01'))

   new_table = table.compute([
       ('price_rounded', agate.Formula(number_type, round_price))
   ])

To round to one decimal place you would simply change "0.01" to "0.1".


Difference between dates
========================

Calculating the difference between dates (or dates and times) works
exactly the same as it does for numbers:

   new_table = table.compute([
       ('age_at_death', agate.Change('born', 'died'))
   ])


Levenshtein edit distance
=========================

The Levenshtein edit distance is a common measure of string
similarity. It can be used, for instance, to check for typos between
manually-entered names and a version that is known to be spelled
correctly.

Implementing Levenshtein requires writing a custom "Computation". To
save ourselves building the whole thing from scratch, we will lean on
the python-Levenshtein library for the actual algorithm.

   import agate
   from Levenshtein import distance

   class LevenshteinDistance(agate.Computation):
       """
       Computes Levenshtein edit distance between the column and a given string.
       """
       def __init__(self, column_name, compare_string):
           self._column_name = column_name
           self._compare_string = compare_string

       def get_computed_data_type(self, table):
           """
           The return value is a numerical distance.
           """
           return agate.Number()

       def validate(self, table):
           """
           Verify the column is text.
           """
           column = table.columns[self._column_name]

           if not isinstance(column.data_type, agate.Text):
               raise agate.DataTypeError('Can only be applied to Text data.')

       def run(self, table):
           """
           Find the distance, returning null when the input column was null.
           """
           new_column = []

           for row in table.rows:
             val = row[self._column_name]

             if val is None:
                 new_column.append(None)
             else:
                 new_column.append(distance(val, self._compare_string))

           return new_column

This code can now be applied to any "Table" just as any other
"Computation" would be:

   new_table = table.compute([
       ('distance', LevenshteinDistance('column_name', 'string to compare'))
   ])

The resulting column will contain an integer measuring the edit
distance between the value in the column and the comparison string.


USA Today Diversity Index
=========================

The USA Today Diversity Index is a widely cited method for evaluating
the racial diversity of a given area. Using a custom "Computation"
makes it simple to calculate.

Assuming that your data has a column for the total population, another
for the population of each race and a final column for the hispanic
population, you can implement the diversity index like this:

   class USATodayDiversityIndex(agate.Computation):
       def get_computed_data_type(self, table):
           return agate.Number()

       def run(self, table):
           new_column = []

           for row in table.rows:
             race_squares = 0

             for race in ['white', 'black', 'asian', 'american_indian', 'pacific_islander']:
                 race_squares += (row[race] / row['population']) ** 2

             hispanic_squares = (row['hispanic'] / row['population']) ** 2
             hispanic_squares += (1 - (row['hispanic'] / row['population'])) ** 2

             new_column.append((1 - (race_squares * hispanic_squares)) * 100)

           return new_column

We apply the diversity index like any other computation:

   with_index = table.compute([
       ('diversity_index', USATodayDiversityIndex())
   ])


Simple Moving Average
=====================

A simple moving average is the average of some number of prior values
in a series. It is typically used to smooth out variation in time
series data.

The following custom "Computation" will compute a simple moving
average. This example assumes your data is already sorted.

   class SimpleMovingAverage(agate.Computation):
       """
       Computes the simple moving average of a column over some interval.
       """
       def __init__(self, column_name, interval):
           self._column_name = column_name
           self._interval = interval

       def get_computed_data_type(self, table):
           """
           The return value is a numerical average.
           """
           return agate.Number()

       def validate(self, table):
           """
           Verify the column is numerical.
           """
           column = table.columns[self._column_name]

           if not isinstance(column.data_type, agate.Number):
               raise agate.DataTypeError('Can only be applied to Number data.')

       def run(self, table):
           new_column = []

           for i, row in enumerate(table.rows):
               if i < self._interval:
                   new_column.append(None)
               else:
                   values = tuple(r[self._column_name] for r in table.rows[i - self._interval:i])

                   if None in values:
                       new_column.append(None)
                   else:
                       new_column.append(sum(values) / self._interval)

           return new_column

You would use the simple moving average like so:

   with_average = table.compute([
       ('six_month_moving_average', SimpleMovingAverage('price', 6))
   ])
