@@ -989,6 +989,60 @@ Note that ``df.groupby('A').colname.std().`` is more efficient than
989 989 is only interesting over one column (here ``colname``), it may be filtered
990 990 *before* applying the aggregation function.
991 991
992+ .. _groupby.observed:
993+
994+ Handling of (un)observed Categorical values
995+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
996+
997+ When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword
998+ controls whether to return a cartesian product of all possible grouper values (``observed=False``) or only those
999+ that are actually observed (``observed=True``).
1000+
1001+ Show all values:
1002+
1003+ .. ipython:: python
1004+
1005+    pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
1006+
1007+ Show only the observed values:
1008+
1009+ .. ipython:: python
1010+
1011+    pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
1012+
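The single-grouper examples above do not show the cartesian product. As a rough sketch of the multi-grouper case (the frame and the column names ``cat``, ``key`` and ``val`` are made up for illustration and are not part of the original change), the two settings can be compared like this:

```python
import pandas as pd

# Hypothetical data: category "c" is declared but never appears in the rows.
cat = pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"])
df = pd.DataFrame({"cat": cat, "key": ["x", "y", "x"], "val": [1, 2, 3]})

# observed=False: one row per (category, key) combination,
# i.e. 3 categories x 2 keys = 6 rows, with a count of 0 where nothing was seen.
all_combos = df.groupby(["cat", "key"], observed=False)["val"].count()

# observed=True: only the 3 combinations that actually occur in the data.
seen_only = df.groupby(["cat", "key"], observed=True)["val"].count()

print(all_combos)
print(seen_only)
```

Passing ``observed=True`` can keep the result much smaller when the categorical has many unused categories.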
1013+ The dtype of the returned group index will *always* include *all* of the categories that were grouped.
1014+
1015+ .. ipython:: python
1016+
1017+    s = pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
1018+    s.index.dtype
1019+
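The same should hold with ``observed=True``: the unobserved category is dropped from the result rows but stays in the index dtype. A minimal sketch to check this (the variable names are illustrative only):

```python
import pandas as pd

# Category "b" is declared but never observed.
cat = pd.Categorical(["a", "a", "a"], categories=["a", "b"])
s = pd.Series([1, 1, 1]).groupby(cat, observed=True).count()

# Only the observed group appears as a row in the result...
print(list(s.index))

# ...while the index dtype still carries every declared category.
print(list(s.index.categories))
```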
1020+ .. note::
1021+    Decimal and object columns are also "nuisance" columns. They are excluded from aggregate functions automatically in groupby.
1022+
1023+    If you do wish to include decimal or object columns in an aggregation with other non-nuisance data types, you must do so explicitly.
1024+
1025+ .. ipython:: python
1026+
1027+    from decimal import Decimal
1028+    dec = pd.DataFrame(
1029+        {'id': [123, 456, 123, 456],
1030+         'int_column': [1, 2, 3, 4],
1031+         'dec_column': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')]
1032+        },
1033+        columns=['id', 'int_column', 'dec_column']
1034+    )
1035+
1036+    # Decimal columns can be summed explicitly by themselves...
1037+    dec.groupby(['id'], as_index=False)['dec_column'].sum()
1038+
1039+    # ...but when selected together with standard data types they are excluded
1040+    dec.groupby(['id'], as_index=False)[['int_column', 'dec_column']].sum()
1041+
1042+    # Use .agg to aggregate over standard and "nuisance" data types at the same time
1043+    dec.groupby(['id'], as_index=False).agg({'int_column': 'sum', 'dec_column': 'sum'})
1044+
1045+
992 1046 .. _groupby.missing:
993 1047
994 1048 NA and NaT group handling