33
44Comparison with SQL
55********************
6- Since many potential pandas users have some familiarity with
7- `SQL <http://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
6+ Since many potential pandas users have some familiarity with
7+ `SQL <http://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
88various SQL operations would be performed using pandas.
99
10- If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>`
10+ If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>`
1111to familiarize yourself with the library.
1212
1313As is customary, we import pandas and numpy as follows:
@@ -17,8 +17,8 @@ As is customary, we import pandas and numpy as follows:
1717 import pandas as pd
1818 import numpy as np
1919
20- Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
21- the data into a DataFrame called `tips` and assume we have a database table of the same name and
20+ Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
21+ the data into a DataFrame called `tips` and assume we have a database table of the same name and
2222structure.
2323
2424.. ipython:: python
@@ -44,7 +44,7 @@ With pandas, column selection is done by passing a list of column names to your
4444
4545 tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
4646
47- Calling the DataFrame without the list of column names would display all columns (akin to SQL's
47+ Calling the DataFrame without the list of column names would display all columns (akin to SQL's
4848``*``).
4949
5050WHERE
@@ -58,14 +58,14 @@ Filtering in SQL is done via a WHERE clause.
5858 WHERE time = 'Dinner'
5959 LIMIT 5;
6060
61- DataFrames can be filtered in multiple ways, the most intuitive of which is using
61+ DataFrames can be filtered in multiple ways, the most intuitive of which is using
6262`boolean indexing <http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing>`_.
6363
6464.. ipython:: python
6565
6666 tips[tips['time'] == 'Dinner'].head(5)
6767
68- The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
68+ The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
6969returning all rows with True.
7070
7171.. ipython:: python
@@ -74,7 +74,7 @@ returning all rows with True.
7474 is_dinner.value_counts()
7575 tips[is_dinner].head(5)
7676
77- Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
77+ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
7878(AND).
7979
8080.. code-block:: sql
@@ -101,16 +101,16 @@ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame usi
101101 # tips by parties of at least 5 diners OR bill total was more than $45
102102 tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]
103103
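The parentheses around each condition matter: in Python, ``|`` and ``&`` bind more tightly than comparison operators such as ``>=`` and ``>``. The following is a minimal sketch on a made-up miniature frame (the name ``df`` and its values are hypothetical stand-ins, not the real ``tips`` data) showing both combinations side by side.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; the values are invented.
df = pd.DataFrame({'size': [2, 6, 3, 5],
                   'total_bill': [50.0, 20.0, 10.0, 48.0]})

# Each comparison is wrapped in parentheses because | and & bind more
# tightly than >= and > in Python.
either = df[(df['size'] >= 5) | (df['total_bill'] > 45)]  # SQL: ... OR ...
both = df[(df['size'] >= 5) & (df['total_bill'] > 45)]    # SQL: ... AND ...

print(either)
print(both)
```

Storing each condition in a named boolean ``Series`` first (as with ``is_dinner`` earlier) is an alternative that avoids the nested parentheses.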
104- NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull`
104+ NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull`
105105methods.
106106
107107.. ipython:: python
108-
108+
109109 frame = pd.DataFrame({'col1': ['A', 'B', np.nan, 'C', 'D'],
110110                       'col2': ['F', np.nan, 'G', 'H', 'I']})
111111 frame
112112
113- Assume we have a table of the same structure as our DataFrame above. We can see only the records
113+ Assume we have a table of the same structure as our DataFrame above. We can see only the records
114114where ``col2`` IS NULL with the following query:
115115
116116.. code-block:: sql
@@ -138,12 +138,12 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.
138138
139139 GROUP BY
140140--------
141- In pandas, SQL's GROUP BY operations are performed using the similarly named
142- :meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a
141+ In pandas, SQL's GROUP BY operations are performed using the similarly named
142+ :meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a
143143process where we'd like to split a dataset into groups, apply some function (typically aggregation),
144144and then combine the groups together.
145145
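To make the split, apply, and combine steps concrete, here is a rough sketch that performs them by hand and then compares the result with a single ``groupby`` call. The frame ``df`` and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data; not the real tips dataset.
df = pd.DataFrame({'sex': ['Female', 'Male', 'Male', 'Female', 'Male'],
                   'tip': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
# Apply: compute one aggregate (the mean tip) for each piece.
pieces = {key: group['tip'].mean() for key, group in df.groupby('sex')}

# Combine: gather the per-group results into a single Series.
manual = pd.Series(pieces, name='tip')

# groupby performs all three steps in one expression.
direct = df.groupby('sex')['tip'].mean()

print(manual)
print(direct)
```

In practice the one-line form is preferable; the manual version is shown only to expose what happens at each step.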
146- A common SQL operation would be getting the count of records in each group throughout a dataset.
146+ A common SQL operation would be getting the count of records in each group throughout a dataset.
147147For instance, a query getting us the number of tips left by sex:
148148
149149.. code-block:: sql
@@ -163,23 +163,23 @@ The pandas equivalent would be:
163163
164164 tips.groupby('sex').size()
165165
166- Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not
167- :meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count`
166+ Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not
167+ :meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count`
168168applies the function to each column, returning the number of ``not null`` records within each.
169169
170170.. ipython:: python
171171
172172 tips.groupby('sex').count()
173173
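The distinction between ``size`` and ``count`` only becomes visible when a column contains missing values. As a small sketch on a hypothetical frame with one missing tip (the data is invented for illustration), ``size`` behaves like SQL's ``COUNT(*)`` while ``count`` on a column behaves like ``COUNT(tip)``:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing tip in the 'Male' group.
df = pd.DataFrame({'sex': ['Female', 'Male', 'Male', 'Female'],
                   'tip': [1.0, np.nan, 3.0, 4.0]})

sizes = df.groupby('sex').size()           # rows per group, NaN included
counts = df.groupby('sex')['tip'].count()  # non-null tips per group

print(sizes)
print(counts)
```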
174- Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an
174+ Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an
175175individual column:
176176
177177.. ipython:: python
178178
179179 tips.groupby('sex')['total_bill'].count()
180180
181- Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount
182- differs by day of the week. :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary
181+ Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount
182+ differs by day of the week. :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary
183183to your grouped DataFrame, indicating which functions to apply to specific columns.
184184
185185.. code-block:: sql
@@ -198,7 +198,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum
198198
199199 tips.groupby('day').agg({'tip': np.mean, 'day': np.size})
200200
201- Grouping by more than one column is done by passing a list of columns to the
201+ Grouping by more than one column is done by passing a list of columns to the
202202:meth:`~pandas.DataFrame.groupby` method.
203203
204204.. code-block:: sql
@@ -207,7 +207,7 @@ Grouping by more than one column is done by passing a list of columns to the
207207 FROM tips
208208 GROUP BY smoker, day;
209209 /*
210- smoker day
210+ smoker day
211211 No Fri 4 2.812500
212212 Sat 45 3.102889
213213 Sun 57 3.167895
@@ -226,16 +226,16 @@ Grouping by more than one column is done by passing a list of columns to the
226226
227227JOIN
228228----
229- JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
230- :meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
231- parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
229+ JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
230+ :meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
231+ parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
232232columns to join on (column names or indices).
233233
234234.. ipython:: python
235235
236236 df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
237237                     'value': np.random.randn(4)})
238- df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
238+ df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
239239                     'value': np.random.randn(4)})
240240
241241 Assume we have two database tables of the same name and structure as our DataFrames.
@@ -256,7 +256,7 @@ INNER JOIN
256256 # merge performs an INNER JOIN by default
257257 pd.merge(df1, df2, on='key')
258258
259- :meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's
259+ :meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's
260260column with another DataFrame's index.
261261
262262.. ipython:: python
@@ -296,7 +296,7 @@ RIGHT JOIN
296296
297297 FULL JOIN
298298~~~~~~~~~
299- pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the
299+ pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the
300300joined columns find a match. As of writing, FULL JOINs are not supported in all RDBMS (for example, MySQL).
301301
302302.. code-block:: sql
@@ -364,7 +364,7 @@ SQL's UNION is similar to UNION ALL, however UNION will remove duplicate rows.
364364 Los Angeles 5
365365 */
366366
367- In pandas, you can use :meth:`~pandas.concat` in conjunction with
367+ In pandas, you can use :meth:`~pandas.concat` in conjunction with
368368:meth:`~pandas.DataFrame.drop_duplicates`.
369369
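As a minimal sketch of the difference between UNION ALL and UNION, assume two small frames of cities and ranks. The names ``a`` and ``b`` and their contents are hypothetical stand-ins for the two tables being combined.

```python
import pandas as pd

# Hypothetical tables; 'Chicago' with rank 1 appears in both.
a = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'],
                  'rank': [1, 2, 3]})
b = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'],
                  'rank': [1, 4, 5]})

# UNION ALL: stack the rows and keep duplicates.
union_all = pd.concat([a, b])

# UNION: stack the rows, then drop rows that are identical in every column.
union = pd.concat([a, b]).drop_duplicates()

print(union_all)
print(union)
```

Note that ``drop_duplicates`` compares column values only, so the repeated index labels produced by ``concat`` do not affect which rows are dropped.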
370370.. ipython:: python
@@ -377,4 +377,4 @@ UPDATE
377377
378378
379379DELETE
380- ------
380+ ------