IO Tools (Text, CSV, HDF5, ...)
===============================

The pandas I/O API is a set of top level ``reader`` functions accessed like
:func:`pandas.read_csv` that generally return a pandas object. The corresponding
``writer`` functions are object methods that are accessed like
:meth:`DataFrame.to_csv`. Below is a table containing available ``readers`` and
``writers``.

.. csv-table::
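As a quick, runnable illustration of this reader/writer pairing (a minimal sketch using an in-memory buffer rather than a file on disk):

```python
import pandas as pd
from io import StringIO

# A reader function returns a DataFrame; the matching writer method
# serializes it back out. Here we round-trip through an in-memory buffer.
df = pd.read_csv(StringIO("a,b\n1,2\n3,4"))
text = df.to_csv(index=False)
print(text)
```

The same pattern holds for the other reader/writer pairs in the table: top-level ``pd.read_*`` functions in, ``DataFrame.to_*`` methods out.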
Parsing options
'''''''''''''''

The functions :func:`read_csv` and :func:`read_table` accept the following
common arguments:

Basic
+++++
Error Handling
++++++++++++++

error_bad_lines : boolean, default ``True``
    Lines with too many fields (e.g. a csv line with too many commas) will by
    default cause an exception to be raised, and no ``DataFrame`` will be
    returned. If ``False``, then these "bad lines" will be dropped from the
    ``DataFrame`` that is returned. See :ref:`bad lines <io.bad_lines>`
    below.
warn_bad_lines : boolean, default ``True``
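To make the default behavior concrete, here is a small sketch (the ragged CSV is invented for illustration) showing the exception raised when a line has too many fields:

```python
import pandas as pd
from io import StringIO

# The third data line has four fields where the header promises three,
# so the default parser raises a ParserError rather than guessing.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

try:
    pd.read_csv(StringIO(data))
    msg = ""
except pd.errors.ParserError as err:
    msg = str(err)

print(msg)
```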
Specifying column data types
''''''''''''''''''''''''''''

You can indicate the data type for the whole ``DataFrame`` or individual
columns:

.. ipython:: python
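For instance (a minimal sketch with invented column names), a dict maps column names to dtypes while unlisted columns keep their inferred types:

```python
import pandas as pd
from io import StringIO

# Request float64 for "a" and object (string) for "b"; "c" is inferred.
data = "a,b,c\n1,2,3\n4,5,6"
df = pd.read_csv(StringIO(data), dtype={"a": "float64", "b": "object"})
print(df.dtypes)
```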
Specifying Categorical dtype
''''''''''''''''''''''''''''

.. ipython:: python

    pd.read_csv(StringIO(data)).dtypes
    pd.read_csv(StringIO(data), dtype='category').dtypes

Individual columns can be parsed as a ``Categorical`` using a dict
specification:

.. ipython:: python
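A runnable sketch of the dict form (the column names and data are invented for the example):

```python
import pandas as pd
from io import StringIO

# Only col1 becomes Categorical; col2 and col3 keep their inferred dtypes.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
df = pd.read_csv(StringIO(data), dtype={"col1": "category"})
print(df.dtypes)
```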
Duplicate names parsing
'''''''''''''''''''''''

If the file or header contains duplicate names, pandas will by default
distinguish between them so as to prevent overwriting data:

.. ipython:: python

    data = 'a,b,a\n0,1,2\n3,4,5'
    pd.read_csv(StringIO(data))

There is no more duplicate data because ``mangle_dupe_cols=True`` by default,
which modifies a series of duplicate columns 'X', ..., 'X' to become
'X', 'X.1', ..., 'X.N'. If ``mangle_dupe_cols=False``, duplicate data can
arise:

.. code-block:: python
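The default mangling can be seen directly (a small self-contained sketch of the behavior described above):

```python
import pandas as pd
from io import StringIO

# The duplicated "a" column is renamed "a.1" so no data is overwritten.
data = "a,b,a\n0,1,2\n3,4,5"
df = pd.read_csv(StringIO(data))
print(list(df.columns))
```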
    For optimal performance, this should be vectorized, i.e., it should accept arrays
    as arguments.

You can explore the date parsing functionality in
`date_converters.py <https://github.com/pandas-dev/pandas/blob/master/pandas/io/date_converters.py>`__
and add your own. We would love to turn this module into a community-supported
set of date/time parsers. To get you started, ``date_converters.py`` contains
functions to parse dual date and time columns, year/month/day columns,
and year/month/day/hour/minute/second columns. It also contains a
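As a hedged sketch of the year/month/day case (the file contents are invented here), the same effect can be had with a vectorized :func:`to_datetime` call over the component columns:

```python
import pandas as pd
from io import StringIO

data = "year,month,day,value\n2000,1,5,1.0\n2000,1,6,2.0"
df = pd.read_csv(StringIO(data))

# pd.to_datetime accepts a DataFrame of year/month/day components and
# assembles one datetime64 column, vectorized over all rows at once.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df["date"].tolist())
```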
NA Values
'''''''''

To control which values are parsed as missing values (which are signified by
``NaN``), specify a string in ``na_values``. If you specify a list of strings,
then all values in it are considered to be missing values. If you specify a
number (a ``float``, like ``5.0`` or an ``integer`` like ``5``), the
corresponding equivalent values will also imply a missing value (in this case
effectively ``[5.0, 5]`` are recognized as ``NaN``).

To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
Let us consider some examples:

.. code-block:: python
    read_csv(path, na_values=[5])

In the example above, ``5`` and ``5.0`` will be recognized as ``NaN``, in
addition to the defaults. A string will first be interpreted as a numerical
``5``, then as a ``NaN``.

.. code-block:: python
Above, both ``NA`` and ``0`` as strings are ``NaN``.

.. code-block:: python

    read_csv(path, na_values=["Nope"])

The default values, in addition to the string ``"Nope"``, are recognized as
``NaN``.

.. _io.infinity:
    print(data)
    pd.read_csv(StringIO(data), skipinitialspace=True)

The parsers make every attempt to "do the right thing" and not be fragile. Type
inference is a pretty big deal. If a column can be coerced to integer dtype
without altering the contents, the parser will do so. Any non-numeric
columns will come through as object dtype as with the rest of pandas objects.

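Type inference in action (a minimal sketch):

```python
import pandas as pd
from io import StringIO

# "a" coerces cleanly to integers; "b" is non-numeric and stays object.
data = "a,b\n1,x\n2,y"
df = pd.read_csv(StringIO(data))
print(df.dtypes)
```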
Fallback Behavior
+++++++++++++++++

If the JSON serializer cannot handle the container contents directly, it will
fall back in the following manner:

- if the dtype is unsupported (e.g. ``np.complex``) then the ``default_handler``, if provided, will be called
Data Conversion
+++++++++++++++

The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True``
will try to parse the axes and all of the data into appropriate types,
including dates. If you need to override specific dtypes, pass a dict to
``dtype``. ``convert_axes`` should only be set to ``False`` if you need to
preserve string-like numbers (e.g. '1', '2') in an axis.

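For example (a sketch with invented JSON data), disabling ``convert_axes`` keeps string-like axis labels as strings:

```python
import pandas as pd
from io import StringIO

# The keys look numeric; convert_axes=False prevents them from being
# converted to integer axis labels on read.
json_data = '{"0": {"1": "a", "2": "b"}, "5": {"1": "c", "2": "d"}}'
df = pd.read_json(StringIO(json_data), convert_axes=False)
print(df.columns.tolist(), df.index.tolist())
```

With the default ``convert_axes=True``, the same labels would come back as integers.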
.. note::
The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``) and
Excel 2007+ (``.xlsx``) files using the ``xlrd`` Python
module. The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
similar to working with :ref:`csv <io.read_csv_table>` data.
See the :ref:`cookbook <cookbook.excel>` for some advanced strategies.

.. _io.excel_reader:
Clipboard
---------

A handy way to grab data is to use the :func:`~pandas.read_clipboard` function,
which takes the contents of the clipboard buffer and passes them to the
``read_table`` method. For instance, you can copy the following text to the
clipboard (CTRL-C on many operating systems):

.. code-block:: python
  on an attempt at serialization.

You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, ``fastparquet``, or ``auto``.
If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
then ``pyarrow`` is tried, falling back to ``fastparquet``.

See the documentation for `pyarrow <http://arrow.apache.org/docs/python/>`__ and `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__.
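The option consulted by this fallback logic can be inspected directly (a small sketch; ``auto`` is the shipped default):

```python
import pandas as pd

# When engine is not passed, read_parquet/to_parquet first consult this
# option; with "auto", pyarrow is tried before fastparquet.
engine = pd.get_option("io.parquet.engine")
print(engine)
```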
    dtypes: float64(1), int64(1)
    memory usage: 15.3 MB

When writing, the top three functions in terms of speed are
``test_pickle_write``, ``test_feather_write`` and ``test_hdf_fixed_write_compress``.

.. code-block:: ipython