.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/pandas/indexing_vs_slicing.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_pandas_indexing_vs_slicing.py: ========================= 5.2 indexing and slicing ========================= This lesson shows how to select rows or columns from pandas dataframe. .. GENERATED FROM PYTHON SOURCE LINES 9-12 .. code-block:: default import pandas as pd print(pd.__version__) .. rst-class:: sphx-glr-script-out .. code-block:: none 1.5.3 .. GENERATED FROM PYTHON SOURCE LINES 13-15 .. code-block:: default pd.set_option('display.max_columns', 11) .. GENERATED FROM PYTHON SOURCE LINES 16-17 Let's create a dataframe .. GENERATED FROM PYTHON SOURCE LINES 17-35 .. code-block:: default df = pd.DataFrame({ 'name': ['Ali', 'Hasan', 'Husain', 'Ali', 'Muhammad', 'Jafar', 'Musa', 'Raza', 'Taqi', 'Naqi', 'Askari'], 'age': [63, 47, 57, 57, 57, 65, 55, 55, 24, 40, 27], 'other': ['muavia','muavia','yazid','walid','hisham','mansur','harun','mamun','Mutasim','Mutasim','mutaz'], 'cityb': ['MKH', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD', 'MAD'], 'duration':[29, 10, 11, 34, 19, 32, 35, 20, 16, 34, 6], 'YoB_H': [-22, 3, 4, 38, 56, 83, 128, 148, 195, 212, 232], 'YOB_G': [600, 625, 626, 659, 676, 702, 745, 766, 811, 828, 844], 'YoM_H': [40, 50, 61, 95, 114, 148, 183, 203, 220, 254, 260], 'YoM_G': [661, 670, 680, 712, 733, 765, 799, 818, 835, 868, 874], 'dynasty':[None,'umayad','umayad','umayad','umayad','abbasid','abbasid','abbasid','abbasid','abbasid','abbasid'], 'cityd': ['NJF', 'MAD', 'KBL', 'MAD', 'MAD', 'MAD', 'BGD', 'MAS', 'BGH', 'SAM', 'SAM'] }, index=['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eigth', 'ninth', 'tenth', 'eleventh'] ) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ first Ali 63 muavia MKH 29 -22 600 40 661 second Hasan 47 muavia MAD 10 3 625 50 670 third Husain 57 yazid MAD 11 4 626 61 680 fourth Ali 57 walid MAD 34 38 659 95 712 fifth Muhammad 57 hisham MAD 19 56 676 114 733 sixth Jafar 65 mansur MAD 32 83 702 148 765 seventh Musa 55 harun MAD 35 128 745 183 799 eigth Raza 55 mamun MAD 20 148 766 203 818 ninth Taqi 24 Mutasim MAD 16 195 811 220 835 tenth Naqi 40 Mutasim MAD 34 212 828 254 868 eleventh Askari 27 mutaz MAD 6 232 844 260 874 dynasty cityd first None NJF second umayad MAD third umayad KBL fourth umayad MAD fifth umayad MAD sixth abbasid MAD seventh abbasid BGD eigth abbasid MAS ninth abbasid BGH tenth abbasid SAM eleventh abbasid SAM .. GENERATED FROM PYTHON SOURCE LINES 36-38 .. code-block:: default print(df.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (11, 11) .. GENERATED FROM PYTHON SOURCE LINES 41-43 The indexing operator, ``[]``, can be used for slicing and for selecting rows and columns. However, it can not be used for both purposes (slicing and selecting rows/columns) simultaneously. .. GENERATED FROM PYTHON SOURCE LINES 45-46 We can select a single column from dataframe as below .. GENERATED FROM PYTHON SOURCE LINES 46-49 .. code-block:: default print(df['name']) .. rst-class:: sphx-glr-script-out .. code-block:: none first Ali second Hasan third Husain fourth Ali fifth Muhammad sixth Jafar seventh Musa eigth Raza ninth Taqi tenth Naqi eleventh Askari Name: name, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 50-51 For selecting multiple columns, we must pass a list of columns. .. GENERATED FROM PYTHON SOURCE LINES 53-56 .. code-block:: default print(df[['other', 'YoM_H']]) .. rst-class:: sphx-glr-script-out .. code-block:: none other YoM_H first muavia 40 second muavia 50 third yazid 61 fourth walid 95 fifth hisham 114 sixth mansur 148 seventh harun 183 eigth mamun 203 ninth Mutasim 220 tenth Mutasim 254 eleventh mutaz 260 .. GENERATED FROM PYTHON SOURCE LINES 57-59 when slice notation ``:`` is used, then selection happens either by row labels or by integer location .. GENERATED FROM PYTHON SOURCE LINES 61-62 Select rows starting from index of `second` till `tenth` .. GENERATED FROM PYTHON SOURCE LINES 62-64 .. code-block:: default print(df['second':'tenth']) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ second Hasan 47 muavia MAD 10 3 625 50 670 third Husain 57 yazid MAD 11 4 626 61 680 fourth Ali 57 walid MAD 34 38 659 95 712 fifth Muhammad 57 hisham MAD 19 56 676 114 733 sixth Jafar 65 mansur MAD 32 83 702 148 765 seventh Musa 55 harun MAD 35 128 745 183 799 eigth Raza 55 mamun MAD 20 148 766 203 818 ninth Taqi 24 Mutasim MAD 16 195 811 220 835 tenth Naqi 40 Mutasim MAD 34 212 828 254 868 dynasty cityd second umayad MAD third umayad KBL fourth umayad MAD fifth umayad MAD sixth abbasid MAD seventh abbasid BGD eigth abbasid MAS ninth abbasid BGH tenth abbasid SAM .. GENERATED FROM PYTHON SOURCE LINES 65-66 Select every second row starting from 3rd till 7th .. GENERATED FROM PYTHON SOURCE LINES 66-68 .. code-block:: default print(df[2:6:2]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ third Husain 57 yazid MAD 11 4 626 61 680 fifth Muhammad 57 hisham MAD 19 56 676 114 733 dynasty cityd third umayad KBL fifth umayad MAD .. GENERATED FROM PYTHON SOURCE LINES 69-74 However, there are more specific methods for indexing and slicing a dataframe. These are ``loc``, ``iloc``, ``at`` and ``iat``. ``at`` and ``iat`` are meant to access a scalar, i.e, a single element in the dataframe, while ``loc`` and ``iloc`` are used to access several elements at the same time, potentially to perform vectorized operations .. GENERATED FROM PYTHON SOURCE LINES 76-82 ``loc`` -------- - only work on index - label based .. GENERATED FROM PYTHON SOURCE LINES 84-86 It is used when we want to select rows or columns from a dataframe using the names of columns or the name of index. .. GENERATED FROM PYTHON SOURCE LINES 88-91 The index operator ``[]`` after ``.loc`` can have two values/identifiers separated by comma ",". The first identifier (before comma) tells which row/rows we want to select and second identifier tells, which columns we want to select. .. GENERATED FROM PYTHON SOURCE LINES 93-94 For example if we want to select a row whose index is "third", we can use ``loc``. .. GENERATED FROM PYTHON SOURCE LINES 94-97 .. code-block:: default print(df.loc['third']) .. rst-class:: sphx-glr-script-out .. code-block:: none name Husain age 57 other yazid cityb MAD duration 11 YoB_H 4 YOB_G 626 YoM_H 61 YoM_G 680 dynasty umayad cityd KBL Name: third, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 98-102 Above we did not specify the second identifier i.e. there is no comma. This is because the if we don't specify the columns, it will give all the columns. We can select multiple rows with .loc with a list of strings .. GENERATED FROM PYTHON SOURCE LINES 104-107 .. code-block:: default print(df.loc[['second', 'fourth', 'sixth']]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ second Hasan 47 muavia MAD 10 3 625 50 670 fourth Ali 57 walid MAD 34 38 659 95 712 sixth Jafar 65 mansur MAD 32 83 702 148 765 dynasty cityd second umayad MAD fourth umayad MAD sixth abbasid MAD .. GENERATED FROM PYTHON SOURCE LINES 108-109 Selecting multiple rows with .loc with slice notation ``:`` .. GENERATED FROM PYTHON SOURCE LINES 111-114 .. code-block:: default print(df.loc['second':'fifth']) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ second Hasan 47 muavia MAD 10 3 625 50 670 third Husain 57 yazid MAD 11 4 626 61 680 fourth Ali 57 walid MAD 34 38 659 95 712 fifth Muhammad 57 hisham MAD 19 56 676 114 733 dynasty cityd second umayad MAD third umayad KBL fourth umayad MAD fifth umayad MAD .. GENERATED FROM PYTHON SOURCE LINES 115-118 In following code, we simultaneously select rows and columns by their labels. Before comman, we tell which rows we want and after comma we tell which columns we want. .. GENERATED FROM PYTHON SOURCE LINES 120-122 .. code-block:: default print(df.loc[['fifth', 'sixth'], 'other':]) .. rst-class:: sphx-glr-script-out .. code-block:: none other cityb duration YoB_H YOB_G YoM_H YoM_G dynasty cityd fifth hisham MAD 19 56 676 114 733 umayad MAD sixth mansur MAD 32 83 702 148 765 abbasid MAD .. GENERATED FROM PYTHON SOURCE LINES 123-124 If we want to select all the rows, we can use colon i.e. ``:``. .. GENERATED FROM PYTHON SOURCE LINES 126-129 .. code-block:: default print(df.loc[:, 'name':'cityd':2]) .. rst-class:: sphx-glr-script-out .. code-block:: none name other duration YOB_G YoM_G cityd first Ali muavia 29 600 661 NJF second Hasan muavia 10 625 670 MAD third Husain yazid 11 626 680 KBL fourth Ali walid 34 659 712 MAD fifth Muhammad hisham 19 676 733 MAD sixth Jafar mansur 32 702 765 MAD seventh Musa harun 35 745 799 BGD eigth Raza mamun 20 766 818 MAS ninth Taqi Mutasim 16 811 835 BGH tenth Naqi Mutasim 34 828 868 SAM eleventh Askari mutaz 6 844 874 SAM .. GENERATED FROM PYTHON SOURCE LINES 130-135 Above we wanted to select all the rows (indicated by ``:``) and every second columns starting from `name` to `cityd`. We can also select rows/columns with conditions. For example if we want rows where `age` is above 50, we can do as below .. GENERATED FROM PYTHON SOURCE LINES 137-139 .. code-block:: default print(df.loc[df['age']>50]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ first Ali 63 muavia MKH 29 -22 600 40 661 third Husain 57 yazid MAD 11 4 626 61 680 fourth Ali 57 walid MAD 34 38 659 95 712 fifth Muhammad 57 hisham MAD 19 56 676 114 733 sixth Jafar 65 mansur MAD 32 83 702 148 765 seventh Musa 55 harun MAD 35 128 745 183 799 eigth Raza 55 mamun MAD 20 148 766 203 818 dynasty cityd first None NJF third umayad KBL fourth umayad MAD fifth umayad MAD sixth abbasid MAD seventh abbasid BGD eigth abbasid MAS .. GENERATED FROM PYTHON SOURCE LINES 140-145 In above code, ``df['age']>50``, is the condition. The output of ``df['age']>50`` is a boolean array. Thus when we pass a boolean array to ``loc``, it returns us rows based upon the specified condition. We can have multiple conditions as well .. GENERATED FROM PYTHON SOURCE LINES 145-148 .. code-block:: default print(df.loc[(df['age']>50) & (df['duration']>30)]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ fourth Ali 57 walid MAD 34 38 659 95 712 sixth Jafar 65 mansur MAD 32 83 702 148 765 seventh Musa 55 harun MAD 35 128 745 183 799 dynasty cityd fourth umayad MAD sixth abbasid MAD seventh abbasid BGD .. GENERATED FROM PYTHON SOURCE LINES 149-153 We can not do above conditioning with strings. What if we want all rows where `other` is either muavia or yazid. In such a case we can provide all the values as a list inside the ``isin`` method. .. GENERATED FROM PYTHON SOURCE LINES 153-156 .. code-block:: default print(df.loc[df['other'].isin(['muavia', 'yazid'])]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ first Ali 63 muavia MKH 29 -22 600 40 661 second Hasan 47 muavia MAD 10 3 625 50 670 third Husain 57 yazid MAD 11 4 626 61 680 dynasty cityd first None NJF second umayad MAD third umayad KBL .. GENERATED FROM PYTHON SOURCE LINES 157-158 We can even combine boolean indexing/condition with label based indexing. .. GENERATED FROM PYTHON SOURCE LINES 158-161 .. code-block:: default print(df.loc[df['age'] > 30, ['other', 'YoM_H']]) .. rst-class:: sphx-glr-script-out .. code-block:: none other YoM_H first muavia 40 second muavia 50 third yazid 61 fourth walid 95 fifth hisham 114 sixth mansur 148 seventh harun 183 eigth mamun 203 tenth Mutasim 254 .. GENERATED FROM PYTHON SOURCE LINES 162-165 **Question**: Write the names of the people who were in `umayad` dynasty using loc? .. GENERATED FROM PYTHON SOURCE LINES 167-172 ``iloc`` --------- - integer location based - work on position .. GENERATED FROM PYTHON SOURCE LINES 175-178 ``iloc`` is used to select rows and columns from dataframe by their location/index/position value. If we don't know the actual names of columns and just want the columns by their locations/position, we can use iloc. For example if we want the last row from dataframe, we can do as below .. GENERATED FROM PYTHON SOURCE LINES 180-182 .. code-block:: default print(df.iloc[-1]) .. rst-class:: sphx-glr-script-out .. code-block:: none name Askari age 27 other mutaz cityb MAD duration 6 YoB_H 232 YOB_G 844 YoM_H 260 YoM_G 874 dynasty abbasid cityd SAM Name: eleventh, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 183-184 If we want the last column, we can do as below .. GENERATED FROM PYTHON SOURCE LINES 184-186 .. code-block:: default print(df.iloc[:, -1]) .. rst-class:: sphx-glr-script-out .. code-block:: none first NJF second MAD third KBL fourth MAD fifth MAD sixth MAD seventh BGD eigth MAS ninth BGH tenth SAM eleventh SAM Name: cityd, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 187-190 The ``:`` above tells that we want all rows. If we want 5th row, we can dow as below .. GENERATED FROM PYTHON SOURCE LINES 190-192 .. code-block:: default print(df.iloc[4]) .. rst-class:: sphx-glr-script-out .. code-block:: none name Muhammad age 57 other hisham cityb MAD duration 19 YoB_H 56 YOB_G 676 YoM_H 114 YoM_G 733 dynasty umayad cityd MAD Name: fifth, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 193-194 If we want to select multiple rows, we need to pass a list. .. GENERATED FROM PYTHON SOURCE LINES 196-197 Select thrid and second last row .. GENERATED FROM PYTHON SOURCE LINES 197-199 .. code-block:: default print(df.iloc[[2, -2]]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ third Husain 57 yazid MAD 11 4 626 61 680 tenth Naqi 40 Mutasim MAD 34 212 828 254 868 dynasty cityd third umayad KBL tenth abbasid SAM .. GENERATED FROM PYTHON SOURCE LINES 200-201 Selecting multiple rows with .iloc with slice notation .. GENERATED FROM PYTHON SOURCE LINES 203-204 Select every second row starting from first till 8th .. GENERATED FROM PYTHON SOURCE LINES 204-206 .. code-block:: default print(df.iloc[:7:2]) .. rst-class:: sphx-glr-script-out .. code-block:: none name age other cityb duration YoB_H YOB_G YoM_H YoM_G \ first Ali 63 muavia MKH 29 -22 600 40 661 third Husain 57 yazid MAD 11 4 626 61 680 fifth Muhammad 57 hisham MAD 19 56 676 114 733 seventh Musa 55 harun MAD 35 128 745 183 799 dynasty cityd first None NJF third umayad KBL fifth umayad MAD seventh abbasid BGD .. GENERATED FROM PYTHON SOURCE LINES 207-208 Simultaneous selection of rows and columns .. GENERATED FROM PYTHON SOURCE LINES 210-211 Select second and fifth row and third column .. GENERATED FROM PYTHON SOURCE LINES 211-213 .. code-block:: default print(df.iloc[[1,4], 2]) .. rst-class:: sphx-glr-script-out .. code-block:: none second muavia fifth hisham Name: other, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 214-215 As we did with ``loc``, we can also use a boolean array for selection to ``iloc``. .. GENERATED FROM PYTHON SOURCE LINES 217-218 Select 3rd and fifth column but where age is greater than 30 .. GENERATED FROM PYTHON SOURCE LINES 218-220 .. code-block:: default print(df.iloc[(df['age'] > 30).values, [2, 4]]) .. rst-class:: sphx-glr-script-out .. code-block:: none other duration first muavia 29 second muavia 10 third yazid 11 fourth walid 34 fifth hisham 19 sixth mansur 32 seventh harun 35 eigth mamun 20 tenth Mutasim 34 .. GENERATED FROM PYTHON SOURCE LINES 221-223 **Question**: Select first and last rows and from first and last columns using ``iloc`` .. GENERATED FROM PYTHON SOURCE LINES 225-230 ``at`` ------ Selection with .at is nearly identical to .loc but it only selects a single 'cell' in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma. .. GENERATED FROM PYTHON SOURCE LINES 233-236 .. code-block:: default print(df.at['sixth', 'duration']) .. rst-class:: sphx-glr-script-out .. code-block:: none 32 .. GENERATED FROM PYTHON SOURCE LINES 237-241 ``iat`` ------- Selection with `iat` is nearly identical to `iloc` but it only selects a single scalar value. You must pass it an integer for both the row and column locations .. GENERATED FROM PYTHON SOURCE LINES 243-246 .. code-block:: default print(df.iat[2, 5]) .. rst-class:: sphx-glr-script-out .. code-block:: none 4 .. GENERATED FROM PYTHON SOURCE LINES 247-249 **Question**: Calculate the average age of people in `umayad` and `abbasid` dynasties using ``loc`` and ``iloc``? .. GENERATED FROM PYTHON SOURCE LINES 252-256 **Question**: Calculate the `years in office` by subtracting `YoM_H` from `YoM_H` of the above row and find out which one had the longest and shortest stay in office? Since there is no row above the first row, consider 11 as the `YoM_H` of the row above. .. GENERATED FROM PYTHON SOURCE LINES 258-260 **Question**: Who lived the longest and shortest? .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.057 seconds) .. _sphx_glr_download_auto_examples_pandas_indexing_vs_slicing.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/AtrCheema/python-seekho/master?urlpath=lab/tree/notebooks/auto_examples/pandas/indexing_vs_slicing.ipynb :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: indexing_vs_slicing.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: indexing_vs_slicing.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_