@@ -853,3 +853,129 @@ Of course if you need integer based selection, then use ``iloc``
853 853 .. ipython:: python
854 854
855 855    dfir.iloc[0:5]
856+
857+ Miscellaneous indexing FAQ
858+ --------------------------
859+
860+ Integer indexing with ix
861+ ~~~~~~~~~~~~~~~~~~~~~~~~
862+
863+ Label-based indexing with integer axis labels is a thorny topic. It has been
864+ discussed heavily on mailing lists and among various members of the scientific
865+ Python community. In pandas, our general viewpoint is that labels matter more
866+ than integer locations. Therefore, with an integer axis index *only*
867+ label-based indexing is possible with the standard tools like ``.ix``. The
868+ following code will generate exceptions:
869+
870+ .. code-block:: python
871+
872+    s = pd.Series(range(5))
873+    s[-1]
874+    df = pd.DataFrame(np.random.randn(5, 4))
875+    df
876+    df.ix[-2:]
877+
878+ This deliberate decision was made to prevent ambiguities and subtle bugs (many
879+ users reported finding bugs when the API change was made to stop "falling back"
880+ on position-based indexing).
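
When positional access is what you actually want, ``iloc`` states that intent
explicitly. The following is a minimal sketch rather than part of the examples
above; it assumes ``numpy`` and ``pandas`` are importable as ``np`` and ``pd``,
and the variable names are arbitrary.

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series(range(5))
   df = pd.DataFrame(np.random.randn(5, 4))

   # Position-based: the last element and the last two rows
   last_value = s.iloc[-1]
   last_two_rows = df.iloc[-2:]

   # Label-based: 4 is a label of the default integer index
   same_value = s.loc[4]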
881+
882+ Non-monotonic indexes require exact matches
883+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
884+
885+ If the index of a ``Series`` or ``DataFrame`` is monotonically increasing or decreasing, then the bounds
886+ of a label-based slice can be outside the range of the index, much like slice indexing a
887+ normal Python ``list``. Monotonicity of an index can be tested with the ``is_monotonic_increasing`` and
888+ ``is_monotonic_decreasing`` attributes.
889+
890+ .. ipython:: python
891+
892+    df = pd.DataFrame(index=[2, 3, 3, 4, 5], columns=['data'], data=range(5))
893+    df.index.is_monotonic_increasing
894+
895+    # no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
896+    df.loc[0:4, :]
897+
898+    # slice bounds are outside the index, so an empty DataFrame is returned
899+    df.loc[13:15, :]
900+
901+ On the other hand, if the index is not monotonic, then both slice bounds must be
902+ *unique* members of the index.
903+
904+ .. ipython:: python
905+
906+    df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5], columns=['data'], data=range(6))
907+    df.index.is_monotonic_increasing
908+
909+    # OK because 2 and 4 are in the index
910+    df.loc[2:4, :]
911+
912+ .. code-block:: python
913+
914+    # 0 is not in the index
915+    In [9]: df.loc[0:4, :]
916+    KeyError: 0
917+
918+    # 3 is not a unique label
919+    In [11]: df.loc[2:3, :]
920+    KeyError: 'Cannot get right slice bound for non-unique label: 3'
921+
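One way to sidestep these errors is to sort the index before slicing, so that
the monotonic rules described earlier apply again. This is a hedged sketch, not
a recommendation from the original text: it assumes that reordering the rows is
acceptable for your data, and ``df_sorted`` and ``subset`` are illustrative
names.

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5], columns=['data'],
                     data=list(range(6)))

   # Sorting yields a monotonically increasing (still non-unique) index
   df_sorted = df.sort_index()
   is_sorted = df_sorted.index.is_monotonic_increasing

   # Bounds no longer need to be unique members of the index
   subset = df_sorted.loc[0:4, :]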
922+
923+ Endpoints are inclusive
924+ ~~~~~~~~~~~~~~~~~~~~~~~
925+
926+ Compared with standard Python sequence slicing in which the slice endpoint is
927+ not inclusive, label-based slicing in pandas **is inclusive**. The primary
928+ reason for this is that it is often not possible to easily determine the
929+ "successor" or next element after a particular label in an index. For example,
930+ consider the following Series:
931+
932+ .. ipython:: python
933+
934+    s = pd.Series(np.random.randn(6), index=list('abcdef'))
935+    s
936+
937+ Suppose we wished to slice from ``c`` to ``e``. Using integers, this would be:
938+
939+ .. ipython:: python
940+
941+    s[2:5]
942+
943+ However, if you only had ``c`` and ``e``, determining the next element in the
944+ index can be somewhat complicated. For example, the following does not work:
945+
946+ ::
947+
948+    s.loc['c':'e' + 1]
949+
950+ A very common use case is to limit a time series to start and end at two
951+ specific dates. To enable this, we made the design decision to make label-based
952+ slicing include both endpoints:
953+
954+ .. ipython:: python
955+
956+    s.loc['c':'e']
957+
958+ This is most definitely a "practicality beats purity" sort of thing, but it is
959+ something to watch out for if you expect label-based slicing to behave exactly
960+ in the way that standard Python integer slicing works.
961+
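The time series use case can be sketched as follows. The dates and the name
``ts`` are made up for illustration; the point is only that the label-based
slice keeps both endpoint dates, while the equivalent positional slice needs a
stop value one past the last row.

.. code-block:: python

   import numpy as np
   import pandas as pd

   ts = pd.Series(np.arange(10),
                  index=pd.date_range('2013-01-01', periods=10))

   # Label-based: both endpoint dates are included
   by_label = ts.loc['2013-01-03':'2013-01-05']

   # Position-based: the stop position is excluded, so 2:5 covers the same rows
   by_position = ts.iloc[2:5]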
962+
963+ Indexing potentially changes underlying Series dtype
964+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
965+
966+ The use of ``reindex_like`` can potentially change the dtype of a ``Series``.
967+
968+ .. ipython:: python
969+
970+    series = pd.Series([1, 2, 3])
971+    x = pd.Series([True])
972+    x.dtype
973+    x = pd.Series([True]).reindex_like(series)
974+    x.dtype
975+
976+ This is because ``reindex_like`` silently inserts ``NaNs`` and the ``dtype``
977+ changes accordingly. This can cause some issues when using ``numpy`` ``ufuncs``
978+ such as ``numpy.logical_and``.
979+
980+ See `this old issue <https://github.com/pydata/pandas/issues/2388>`__ for a more
981+ detailed discussion.
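
If the boolean dtype needs to survive, one possible workaround is to call
``reindex`` with an explicit ``fill_value`` instead of ``reindex_like``. This is
a sketch under an assumption: treating missing entries as ``False`` is a choice
made here for illustration and may not suit every case.

.. code-block:: python

   import numpy as np
   import pandas as pd

   series = pd.Series([1, 2, 3])
   x = pd.Series([True])

   # reindex_like introduces missing values, so the result is no longer bool
   widened = x.reindex_like(series)

   # reindex with an explicit fill_value keeps the bool dtype
   filled = x.reindex(series.index, fill_value=False)
   result = np.logical_and(filled, series > 1)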