@@ -1980,58 +1980,87 @@ def plot_series(data, kind='line', ax=None, # Series unique
19801980
19811981
19821982_shared_docs['boxplot'] = """
1983- Make a box-and-whisker plot from DataFrame column optionally grouped
1984- by some columns or other inputs. The box extends from the Q1 to Q3
1985- quartile values of the data, with a line at the median (Q2).
1986- The whiskers extend from the edges of box to show the range of the data.
1987- Flier points (outliers) are those past the end of the whiskers.
1988- The position of the whiskers is set by default to 1.5 IQR (`whis=1.5``)
1989- from the edge of the box.
1983+ Make a box plot from DataFrame columns.
1984+
1985+ Make a box-and-whisker plot from DataFrame columns optionally grouped
1986+ by some other columns. A box plot is a method for graphically depicting
1987+ groups of numerical data through their quartiles.
1988+ The box extends from the Q1 to Q3 quartile values of the data,
1989+ with a line at the median (Q2). The whiskers extend from the edges
1990+ of the box to show the range of the data. The position of the whiskers
1991+ is set by default to 1.5*IQR (IQR = Q3 - Q1) from the edges of the box.
1992+ Outlier points are those past the end of the whiskers.
19901993
19911994 For further details see
1992- Wikipedia's entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot/ >`_.
1995+ Wikipedia's entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot>`_.
19931996
19941997 Parameters
19951998 ----------
1996- column : column name or list of names, or vector
1999+ column : str or list of str, optional
2000+ Column name or list of names, or vector.
19972001 Can be any valid input to groupby.
1998- by : string or sequence
2002+ by : str or array-like
19992003 Column in the DataFrame to groupby.
2000- ax : Matplotlib axes object, ( default `None`)
2004+ ax : object of class matplotlib.axes.Axes, default `None`
20012005 The matplotlib axes to be used by boxplot.
2002- fontsize : int or string
2003- The font-size used by matplotlib.
2004- rot : label rotation angle
2005- The rotation angle of labels.
2006- grid : boolean( default `True`)
2006+ fontsize : float or str
2007+ Tick label font size in points or as a string (e.g., 'large')
2008+ (see `matplotlib.axes.Axes.tick_params
2009+ <https://matplotlib.org/api/_as_gen/
2010+ matplotlib.axes.Axes.tick_params.html>`_).
2011+ rot : int or float, default 0
2012+ The rotation angle of labels (in degrees)
2013+ with respect to the screen coordinate system.
2014+ grid : boolean, default `True`
20072015 Setting this to True will show the grid.
20082016 figsize : A tuple (width, height) in inches
2009- The size of the figure to create in inches by default.
2010- layout : tuple (optional)
2011- Tuple (rows, columns) used for the layout of the plot.
2012- return_type : {None, 'axes', 'dict', 'both'}, default None
2013- The kind of object to return. The default is ``axes``
2014- 'axes' returns the matplotlib axes the boxplot is drawn on;
2015- 'dict' returns a dictionary whose values are the matplotlib
2016- Lines of the boxplot;
2017- 'both' returns a namedtuple with the axes and dict.
2018- When grouping with ``by``, a Series mapping columns to ``return_type``
2019- is returned, unless ``return_type`` is None, in which case a NumPy
2020- array of axes is returned with the same shape as ``layout``.
2021- See the prose documentation for more.
2022- kwds : Keyword Arguments (optional)
2017+ The size of the figure to create in matplotlib.
2018+ layout : tuple (rows, columns) (optional)
2019+ For example, (3, 5) will display the subplots
2020+ using 3 rows and 5 columns, starting from the top-left.
2021+ return_type : {None, 'axes', 'dict', 'both'}, default 'axes'
2022+ The kind of object to return. The default is ``axes``.
2023+
2024+ * 'axes' returns the matplotlib axes the boxplot is drawn on.
2025+ * 'dict' returns a dictionary whose values are the matplotlib
2026+ Lines of the boxplot.
2027+ * 'both' returns a namedtuple with the axes and dict.
2028+ * when grouping with ``by``, a Series mapping columns to
2029+ ``return_type`` is returned (e.g.
2030+ ``df.boxplot(column=['Col1','Col2'], by='var',return_type='axes')``
2031+ may return ``Series([AxesSubplot(..),AxesSubplot(..)],
2032+ index=['Col1','Col2'])``).
2033+
2034+ If ``return_type`` is `None`, a NumPy array
2035+ of axes with the same shape as ``layout`` is returned
2036+ (e.g. ``df.boxplot(column=['Col1','Col2'],
2037+ by='var',return_type=None)`` may return
2038+ ``array([<matplotlib.axes._subplots.AxesSubplot object at ..>,
2039+ <matplotlib.axes._subplots.AxesSubplot object at ..>],
2040+ dtype=object)``).
2041+ **kwds : Keyword Arguments (optional)
20232042 All other plotting keyword arguments to be passed to
2024- matplotlib's function.
2043+ `matplotlib.pyplot.boxplot <https://matplotlib.org/api/_as_gen/
2044+ matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot>`_.
20252045
20262046 Returns
20272047 -------
2028- lines : dict
2029- ax : matplotlib Axes
2030- (ax, lines): namedtuple
2048+ result:
2049+ Options:
2050+
2051+ * ax : object of class
2052+ matplotlib.axes.Axes (for ``return_type='axes'``)
2053+ * lines : dict (for ``return_type='dict'``)
2054+ * (ax, lines): namedtuple (for ``return_type='both'``)
2055+ * :class:`~pandas.Series` (for ``return_type != None``
2056+ and data grouped with ``by``)
2057+ * :class:`~numpy.ndarray` (for ``return_type=None``
2058+ and data grouped with ``by``)
20312059
20322060 See Also
20332061 --------
20342062 matplotlib.pyplot.boxplot: Make a box and whisker plot.
2063+ matplotlib.pyplot.hist: Make a histogram.
20352064
20362065 Notes
20372066 -----
@@ -2041,72 +2070,57 @@ def plot_series(data, kind='line', ax=None, # Series unique
20412070
20422071 Examples
20432072 --------
2073+
2074+ Boxplots can be created for every column in the dataframe
2075+ by ``df.boxplot()`` or by indicating the columns to be used:
2076+
20442077 .. plot::
20452078 :context: close-figs
20462079
20472080 >>> np.random.seed(1234)
2081+ >>> df = pd.DataFrame(np.random.rand(10,4),
2082+ ... columns=['Col1', 'Col2', 'Col3', 'Col4'])
2083+ >>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])
20482084
2049- >>> df = pd.DataFrame({
2050- ... u'stratifying_var': np.random.uniform(0, 100, 20),
2051- ... u'price': np.random.normal(100, 5, 20),
2052- ... u'demand': np.random.normal(100, 10, 20)})
2053-
2054- >>> df[u'quartiles'] = pd.qcut(
2055- ... df[u'stratifying_var'], 4,
2056- ... labels=[u'0-25%%', u'25-50%%', u'50-75%%', u'75-100%%'])
2057-
2058- >>> df
2059- stratifying_var price demand quartiles
2060- 0 19.151945 106.605791 108.416747 0-25%%
2061- 1 62.210877 92.265472 123.909605 50-75%%
2062- 2 43.772774 98.986768 100.761996 25-50%%
2063- 3 78.535858 96.720153 94.335541 75-100%%
2064- 4 77.997581 100.967107 100.361419 50-75%%
2065- 5 27.259261 102.767195 79.250224 0-25%%
2066- 6 27.646426 106.590758 102.477922 0-25%%
2067- 7 80.187218 97.653474 91.028432 75-100%%
2068- 8 95.813935 103.377770 98.632052 75-100%%
2069- 9 87.593263 90.914864 100.182892 75-100%%
2070- 10 35.781727 99.084457 107.554140 0-25%%
2071- 11 50.099513 105.294846 102.152686 25-50%%
2072- 12 68.346294 98.010799 108.410088 50-75%%
2073- 13 71.270203 101.687188 85.541899 50-75%%
2074- 14 37.025075 105.237893 85.980267 25-50%%
2075- 15 56.119619 105.229691 98.990818 25-50%%
2076- 16 50.308317 104.318586 94.517576 25-50%%
2077- 17 1.376845 99.389542 98.553805 0-25%%
2078- 18 77.282662 100.623565 103.540203 50-75%%
2079- 19 88.264119 98.386026 99.644870 75-100%%
2080-
2081- To plot the boxplot of the ``demand`` just put:
2085+ Boxplots of variable distributions grouped by the values of a third
2086+ variable can be created using the option ``by``. For instance:
20822087
20832088 .. plot::
20842089 :context: close-figs
20852090
2086- >>> boxplot = df.boxplot(column=u'demand', by=u'quartiles')
2091+ >>> df = pd.DataFrame(np.random.rand(10,2), columns=['Col1', 'Col2'])
2092+ >>> df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
2093+ >>> boxplot = df.boxplot(by='X')
20872094
2088- Use ``grid=False`` to hide the grid:
2095+ A list of strings (e.g. ``['X','Y']``) can be passed to boxplot
2096+ in order to group the data by combinations of the variables on the x-axis:
20892097
20902098 .. plot::
20912099 :context: close-figs
20922100
2093- >>> boxplot = df.boxplot(column=u'demand', by=u'quartiles', grid=False)
2101+ >>> df = pd.DataFrame(np.random.rand(10,3),
2102+ ... columns=['Col1', 'Col2', 'Col3'])
2103+ >>> df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
2104+ >>> df['Y'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
2105+ >>> boxplot = df.boxplot(column=['Col1','Col2'], by=['X','Y'])
20942106
2095- Optionally, the layout can be changed by setting ``layout=(rows, cols) ``:
2107+ The layout of the boxplot can be adjusted by passing a tuple to ``layout``:
20962108
20972109 .. plot::
20982110 :context: close-figs
20992111
2100- >>> boxplot = df.boxplot(column=[u'price',u'demand'],
2101- ... by=u'quartiles', layout=(1,2),
2102- ... figsize=(8,5))
2112+ >>> df = pd.DataFrame(np.random.rand(10,2), columns=['Col1', 'Col2'])
2113+ >>> df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
2114+ >>> boxplot = df.boxplot(by='X', layout=(2,1))
2115+
2116+ Additional formatting can be done to the boxplot, like suppressing the grid
2117+ (``grid=False``), rotating the labels on the x-axis (e.g. ``rot=45``)
2118+ or changing the fontsize (e.g. ``fontsize=15``):
21032119
21042120 .. plot::
21052121 :context: close-figs
21062122
2107- >>> boxplot = df.boxplot(column=[u'price',u'demand'],
2108- ... by=u'quartiles', layout=(2,1),
2109- ... figsize=(5,8))
2123+ >>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15)
21102124 """
21112125
21122126@Appender(_shared_docs['boxplot'] % _shared_doc_kwargs)
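
The new summary paragraph describes the box (Q1 to Q3), the median line, and whiskers placed at 1.5*IQR from the box edges. A minimal sketch of that arithmetic is shown below, using only the standard library. The helper name `box_stats` is hypothetical and not part of pandas or matplotlib. The quartile method ("inclusive", linear interpolation) and the choice to end each whisker at the most extreme data point inside the 1.5*IQR fence are assumptions of this sketch, chosen to resemble what matplotlib documents for its default `whis=1.5`.

```python
import statistics


def box_stats(values, whis=1.5):
    """Return the quantities a box plot draws for one column of numbers.

    Hypothetical helper for illustration only; not a pandas/matplotlib API.
    """
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range
    # (the "inclusive" method is an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit whis * IQR beyond the edges of the box.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Everything past the fences is drawn as an outlier point.
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this sample the box spans 3.25 to 7.75, the upper fence is 14.5, and the value 30 is the only point reported as an outlier.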
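
The rewritten `return_type` entry lists several possible return objects. The sketch below tries each option on a small frame so the differences can be inspected directly. It assumes pandas, NumPy and matplotlib are installed, selects the non-interactive `Agg` backend so that no display is needed, and reuses the column names from the docstring examples. The exact Axes class names printed may differ between matplotlib versions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import numpy as np
import pandas as pd

rng = np.random.RandomState(1234)
df = pd.DataFrame(rng.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

# 'axes': the matplotlib Axes the boxplot was drawn on.
ax = df.boxplot(column=["Col1", "Col2"], return_type="axes")

# 'dict': a dictionary whose values are lists of matplotlib artists.
lines = df.boxplot(column=["Col1", "Col2"], return_type="dict")

# 'both': a namedtuple with fields ``ax`` and ``lines``.
both = df.boxplot(column=["Col1", "Col2"], return_type="both")

# Grouping with ``by`` and a non-None return_type: a Series indexed
# by the plotted column names.
grouped = df.boxplot(column=["Col1", "Col2"], by="X", return_type="axes")

print(type(ax).__name__)
print(sorted(lines))
print(both._fields)
print(list(grouped.index))
```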