Emulate SQL
***********

agate's command structure is very similar to SQL. The primary
difference between agate and SQL is that commands like "SELECT" and
"WHERE" explicitly create new tables. You can chain them together as
you would with SQL, but be aware each command is actually creating a
new table.

Note:

  All examples in this section use the PostgreSQL dialect for
  comparison.

If you want to read and write data from SQL, see From a SQL database.


SELECT
======

SQL:

   SELECT state, total FROM table;

agate:

   new_table = table.select(['state', 'total'])


WHERE
=====

SQL:

   SELECT * FROM table WHERE LOWER(state) = 'california';

agate:

   new_table = table.where(lambda row: row['state'].lower() == 'california')


ORDER BY
========

SQL:

   SELECT * FROM table ORDER BY total DESC;

agate:

   new_table = table.order_by(lambda row: row['total'], reverse=True)


DISTINCT
========

SQL:

   SELECT DISTINCT ON (state) * FROM table;

agate:

   new_table = table.distinct('state')

Note:

  Unlike most SQL implementations, agate always returns the full row.
  Use "Table.select()" if you want to filter the columns first.


INNER JOIN
==========

SQL (two ways):

   SELECT * FROM patient, doctor WHERE patient.doctor = doctor.id;

   SELECT * FROM patient INNER JOIN doctor ON (patient.doctor = doctor.id);

agate:

   joined = patients.join(doctors, 'doctor', 'id', inner=True)


LEFT OUTER JOIN
===============

SQL:

   SELECT * FROM patient LEFT OUTER JOIN doctor ON (patient.doctor = doctor.id);

agate:

   joined = patients.join(doctors, 'doctor', 'id')


FULL OUTER JOIN
===============

SQL:

   SELECT * FROM patient FULL OUTER JOIN doctor ON (patient.doctor = doctor.id);

agate:

   joined = patients.join(doctors, 'doctor', 'id', full_outer=True)


GROUP BY
========

agate's "Table.group_by()" works slightly different than SQLs. It does
not require an aggregate function. Instead it returns "TableSet". To
see how to perform the equivalent of a SQL aggregate, see below.

   doctors = patients.group_by('doctor')

You can group by two or more columns by chaining the command.

   doctors_by_state = patients.group_by('state').group_by('doctor')


HAVING
======

agate's "TableSet.having()" works very similar to SQL's keyword of the
same name.

   doctors = patients.group_by('doctor')
   popular_doctors = doctors.having([
       ('patient_count', Count())
   ], lambda t: t['patient_count'] > 100)

This filters to only those doctors whose table includes at least 100
results. Can add as many aggregations as you want to the list and each
will be available, by name in the test function you pass.

For example, here we filter to popular doctors with more an average
review of at least three stars:

   doctors = patients.group_by('doctor')
   popular_doctors = doctors.having([
       ('patient_count', Count()),
       ('average_stars', Average('stars'))
   ], lambda t: t['patient_count'] > 100 and t['average_stars'] >= 3)


Chain commands together
=======================

SQL:

   SELECT state, total FROM table WHERE LOWER(state) = 'california' ORDER BY total DESC;

agate:

   new_table = table \
       .select(['state', 'total']) \
       .where(lambda row: row['state'].lower() == 'california') \
       .order_by('total', reverse=True)

Note:

  Chaining commands in this way is sometimes not a good idea. Being
  explicit about each step can lead to clearer code.


Aggregate functions
===================

SQL:

   SELECT mean(age), median(age) FROM patients GROUP BY doctor;

agate:

   doctors = patients.group_by('doctor')
   patient_ages = doctors.aggregate([
       ('patient_count', agate.Count()),
       ('age_mean', agate.Mean('age')),
       ('age_median', agate.Median('age'))
   ])

The resulting table will have four columns: "doctor", "patient_count",
"age_mean" and "age_median".
