.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/pandas/dataframe_vs_series.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_pandas_dataframe_vs_series.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_pandas_dataframe_vs_series.py:


==================
5.1 Introduction
==================

.. GENERATED FROM PYTHON SOURCE LINES 7-14

.. code-block:: default

    import time
    import numpy as np
    import pandas as pd

    print(time.asctime())
    print(pd.__version__, np.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Mon Nov 11 19:32:24 2024
    1.5.3 1.26.4


.. GENERATED FROM PYTHON SOURCE LINES 15-27

Suppose we have an array ``[0.4, 0.3, 0.5, 0.2, 0.6, 0.3]`` whose values represent
concentrations in water measured every hour between 13:00 and 19:00. A plain array
cannot encode this information. If we want to attach the (temporal) reference of each
value, we have to manage it ourselves, for example by storing it in a separate array.
Pandas has this ability built in: it lets us attach references, or labels, to arrays.
A two-dimensional pandas object has two kinds of labels: the labels for the rows,
called ``index``, and the labels for the columns, called ``columns``.
We can therefore think of pandas as a library that provides labelled arrays.

The core data structure in pandas is ``DataFrame`` which consists of one or more
columns. A single column in a DataFrame is a ``Series``.
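
As a minimal sketch of the idea, the concentrations from the text can be stored
in a ``Series`` whose ``index`` carries the measurement times. The hour labels
(six strings from ``"13:00"`` to ``"18:00"``) and the name ``concentration`` are
assumptions made for illustration only.

```python
import pandas as pd

# Concentrations quoted in the text.
values = [0.4, 0.3, 0.5, 0.2, 0.6, 0.3]

# Assumed labels: one string per hourly reading, "13:00" ... "18:00".
hours = [f"{h}:00" for h in range(13, 19)]

# The index keeps the time reference together with the data.
conc = pd.Series(values, index=hours, name="concentration")
print(conc)

# A value can now be looked up by its label instead of its position.
print(conc["15:00"])
```

With a plain array we would have to remember that position 2 corresponds to
15:00; here the label travels with the value.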

.. GENERATED FROM PYTHON SOURCE LINES 27-31

.. code-block:: default


    df = pd.DataFrame(np.random.random((10, 3)))
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0  0.819711  0.469719  0.047138
    1  0.995136  0.799338  0.014367
    2  0.220825  0.169214  0.077614
    3  0.472161  0.297704  0.213042
    4  0.096419  0.898009  0.074485
    5  0.210453  0.086511  0.951945
    6  0.461789  0.427172  0.143734
    7  0.906187  0.200208  0.497694
    8  0.477416  0.382160  0.434236
    9  0.428262  0.910327  0.273702


.. GENERATED FROM PYTHON SOURCE LINES 32-34

The data in the columns is stored as NumPy arrays. Therefore, DataFrames and Series
share many characteristics with NumPy arrays.
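
The following sketch illustrates a few of these NumPy-like characteristics on a
small frame built only for this example (the names ``arr``, ``doubled`` and
``col_sum`` are illustrative).

```python
import numpy as np
import pandas as pd

# A small 3x2 frame built from a NumPy array.
arr = np.arange(6, dtype=float).reshape(3, 2)
df = pd.DataFrame(arr, columns=["a", "b"])

# NumPy-style attributes are available on the DataFrame.
print(df.shape, df.ndim, df.size)
print(df.dtypes)

# Arithmetic is element-wise, as with NumPy arrays.
doubled = df * 2

# Reductions accept an ``axis`` argument; axis=0 reduces over the rows.
col_sum = df.sum(axis=0)

# NumPy ufuncs can be applied directly and return a labelled pandas object.
root = np.sqrt(df["a"])
print(type(root))
```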

.. GENERATED FROM PYTHON SOURCE LINES 34-36

.. code-block:: default

    print(df.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (10, 3)


.. GENERATED FROM PYTHON SOURCE LINES 37-39

By default, the column names are just integers starting from 0; however,
we can define the column names ourselves as well.

.. GENERATED FROM PYTHON SOURCE LINES 39-43

.. code-block:: default


    df = pd.DataFrame(np.random.random((10, 3)), columns=['a', 'b', 'c'])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              a         b         c
    0  0.643263  0.594007  0.442692
    1  0.286817  0.179552  0.060201
    2  0.463696  0.816541  0.600126
    3  0.150433  0.366931  0.906603
    4  0.288914  0.052968  0.615138
    5  0.158648  0.694414  0.973317
    6  0.744459  0.436719  0.882664
    7  0.175492  0.733806  0.173178
    8  0.051456  0.641243  0.885918
    9  0.474923  0.216120  0.883705


.. GENERATED FROM PYTHON SOURCE LINES 44-47

.. code-block:: default


    print(df.columns)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Index(['a', 'b', 'c'], dtype='object')


.. GENERATED FROM PYTHON SOURCE LINES 48-49

``df.columns`` is a list-like structure. However, it is not exactly a list.
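
One practical difference, sketched below on a one-row frame made up for this
example, is that an ``Index`` supports list-like reads but rejects item
assignment. The flag ``mutation_failed`` is only an illustrative helper.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])
cols = df.columns

# List-like behaviour: indexing, length and membership tests all work.
print(cols[0], len(cols), "b" in cols)

# Unlike a list, an Index cannot be modified element by element.
try:
    cols[0] = "z"
except TypeError as err:
    print("Index is immutable:", err)
    mutation_failed = True
else:
    mutation_failed = False

# To change the labels, assign a whole new set of column names instead.
df.columns = ["z", "b", "c"]
print(df.columns)
```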

.. GENERATED FROM PYTHON SOURCE LINES 49-51

.. code-block:: default

    print(type(df.columns))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.indexes.base.Index'>


.. GENERATED FROM PYTHON SOURCE LINES 52-53

We can, however, convert the columns to a list.

.. GENERATED FROM PYTHON SOURCE LINES 53-55

.. code-block:: default

    df.columns.to_list()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ['a', 'b', 'c']


.. GENERATED FROM PYTHON SOURCE LINES 56-59

.. code-block:: default


    print(type(df.columns.to_list()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'list'>


.. GENERATED FROM PYTHON SOURCE LINES 60-61

The default labels for the rows, i.e. the ``index``, are integers starting from 0.

.. GENERATED FROM PYTHON SOURCE LINES 61-63

.. code-block:: default

    print(df.index)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    RangeIndex(start=0, stop=10, step=1)


.. GENERATED FROM PYTHON SOURCE LINES 64-65

However, we can set an ``index`` of our choice as well.
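
Once the rows carry custom labels, they can be addressed either by label or by
position. The sketch below assumes a small illustrative frame; ``by_label`` and
``by_position`` are names chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

# Replace the default RangeIndex with year labels.
df.index = [2000 + i for i in range(4)]

# .loc selects by label: the row labelled 2002, column "b".
by_label = df.loc[2002, "b"]

# .iloc selects by position: third row, second column.
by_position = df.iloc[2, 1]

print(by_label, by_position)
```

Both lookups address the same cell; only the way of referring to it differs.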

.. GENERATED FROM PYTHON SOURCE LINES 65-71

.. code-block:: default


    df = pd.DataFrame(np.random.random((10, 3)),
                      columns=['a', 'b', 'c'],
                      index=[2000+i for i in range(10)])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                 a         b         c
    2000  0.978551  0.820233  0.227459
    2001  0.317172  0.157027  0.695845
    2002  0.767413  0.547235  0.369868
    2003  0.425290  0.032933  0.923786
    2004  0.251980  0.580116  0.266811
    2005  0.947292  0.581080  0.211948
    2006  0.868994  0.163130  0.771538
    2007  0.627722  0.624592  0.296365
    2008  0.337227  0.355143  0.516186
    2009  0.441642  0.761122  0.799617


.. GENERATED FROM PYTHON SOURCE LINES 72-75

.. code-block:: default


    print(df.index)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Int64Index([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009], dtype='int64')


.. GENERATED FROM PYTHON SOURCE LINES 76-77

The default name of ``index`` is ``None``.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

.. code-block:: default

    print(df.index.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    None


.. GENERATED FROM PYTHON SOURCE LINES 80-81

However, we can set the name of the index as well.
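
A named index becomes useful when the index is turned back into a regular
column. The sketch below uses ``rename_axis`` as one alternative way of naming
the index and ``reset_index`` to move it into a column; the frame itself is
made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=[2000, 2001, 2002])

# rename_axis returns a new frame whose index is named "years".
df = df.rename_axis("years")
print(df.index.name)

# reset_index moves the index into a column that inherits the index name.
flat = df.reset_index()
print(flat.columns.to_list())
```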

.. GENERATED FROM PYTHON SOURCE LINES 81-84

.. code-block:: default

    df.index.name = 'years'
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                  a         b         c
    years                              
    2000   0.978551  0.820233  0.227459
    2001   0.317172  0.157027  0.695845
    2002   0.767413  0.547235  0.369868
    2003   0.425290  0.032933  0.923786
    2004   0.251980  0.580116  0.266811
    2005   0.947292  0.581080  0.211948
    2006   0.868994  0.163130  0.771538
    2007   0.627722  0.624592  0.296365
    2008   0.337227  0.355143  0.516186
    2009   0.441642  0.761122  0.799617


.. GENERATED FROM PYTHON SOURCE LINES 85-88

.. code-block:: default


    print(df.index.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    years


.. GENERATED FROM PYTHON SOURCE LINES 89-92

.. code-block:: default


    print(type(df))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.frame.DataFrame'>


.. GENERATED FROM PYTHON SOURCE LINES 93-99

.. code-block:: default


    df = pd.DataFrame(np.random.randint(0, 10, (10, 1)),
                      columns=['a'],
                      index=[2000+i for i in range(10)])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

          a
    2000  1
    2001  6
    2002  8
    2003  5
    2004  0
    2005  6
    2006  2
    2007  8
    2008  2
    2009  3


.. GENERATED FROM PYTHON SOURCE LINES 100-103

.. code-block:: default


    print(type(df))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.frame.DataFrame'>


.. GENERATED FROM PYTHON SOURCE LINES 104-107

.. code-block:: default


    print(df.columns)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Index(['a'], dtype='object')


.. GENERATED FROM PYTHON SOURCE LINES 108-111

Series
=========
A Series consists of a single column. It can be constructed using ``pd.Series``.

.. GENERATED FROM PYTHON SOURCE LINES 111-115

.. code-block:: default


    s = pd.Series(np.random.random(10))
    print(s)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0    0.240092
    1    0.316543
    2    0.336670
    3    0.256285
    4    0.662836
    5    0.947630
    6    0.188108
    7    0.525331
    8    0.612323
    9    0.212473
    dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 116-119

.. code-block:: default


    print(type(s))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.series.Series'>


.. GENERATED FROM PYTHON SOURCE LINES 120-123

.. code-block:: default


    print(s.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (10,)


.. GENERATED FROM PYTHON SOURCE LINES 124-127

.. code-block:: default


    print(s.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    None


.. GENERATED FROM PYTHON SOURCE LINES 128-133

.. code-block:: default


    s = pd.Series(np.random.random(10),
                  name="a")
    print(s)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0    0.342054
    1    0.232343
    2    0.547047
    3    0.942240
    4    0.766106
    5    0.313809
    6    0.783459
    7    0.325427
    8    0.380389
    9    0.418716
    Name: a, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 134-137

.. code-block:: default


    print(s.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    a


.. GENERATED FROM PYTHON SOURCE LINES 138-139

The Series is literally the data structure for a single column of a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 141-147

.. code-block:: default


    df = pd.DataFrame(np.random.random((10, 3)),
                      columns=['a', 'b', 'c'],
                      index=[2000+i for i in range(10)])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

                 a         b         c
    2000  0.611481  0.646587  0.884365
    2001  0.884323  0.265456  0.499323
    2002  0.035322  0.413177  0.404945
    2003  0.138943  0.159374  0.096236
    2004  0.007718  0.173929  0.156492
    2005  0.244785  0.159650  0.829702
    2006  0.226925  0.632816  0.999603
    2007  0.251388  0.524466  0.769429
    2008  0.024436  0.454759  0.473400
    2009  0.611078  0.184098  0.176071


.. GENERATED FROM PYTHON SOURCE LINES 148-149

A single column in a DataFrame is a Series.
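
To make the distinction concrete, the sketch below (on a small illustrative
frame) contrasts selecting with a single label, which yields a ``Series``, and
selecting with a list of labels, which yields a one-column ``DataFrame``.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# A single label returns the column as a Series named after the column.
col = df["a"]
print(type(col), col.name)

# A list of labels returns a DataFrame, even if the list has one entry.
sub = df[["a"]]
print(type(sub), sub.shape)

# A Series can be promoted back to a one-column DataFrame.
back = col.to_frame()
print(back.equals(sub))
```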

.. GENERATED FROM PYTHON SOURCE LINES 149-152

.. code-block:: default


    print(type(df['a']))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'pandas.core.series.Series'>


.. GENERATED FROM PYTHON SOURCE LINES 153-159

.. code-block:: default


    s = pd.Series(np.random.random(10),
                  index=[2000+i for i in range(10)],
                  name="a")
    print(s)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2000    0.520191
    2001    0.355112
    2002    0.645158
    2003    0.384966
    2004    0.996232
    2005    0.068262
    2006    0.720012
    2007    0.611447
    2008    0.931790
    2009    0.001561
    Name: a, dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 160-162

Since pandas is built upon NumPy arrays, we can extract the underlying NumPy
array from a DataFrame using the ``.values`` attribute.
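
The ``to_numpy()`` method is an alternative to ``.values``. The sketch below,
on a frame made up for this example, also shows that columns of different
numeric types are converted to one common dtype when the whole frame is
exported.

```python
import pandas as pd

# One integer column and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

# Exporting the whole frame yields a single 2-D array with a common dtype.
arr = df.to_numpy()
print(arr.dtype, arr.shape)

# Exporting a single column keeps that column's own dtype.
col = df["a"].to_numpy()
print(col.dtype)
```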

.. GENERATED FROM PYTHON SOURCE LINES 162-164

.. code-block:: default

    print(df.values)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[0.61148101 0.64658729 0.8843646 ]
     [0.88432322 0.26545644 0.499323  ]
     [0.03532233 0.41317685 0.40494501]
     [0.13894266 0.15937354 0.0962357 ]
     [0.00771782 0.17392933 0.15649219]
     [0.24478489 0.15965015 0.82970221]
     [0.22692541 0.63281557 0.99960347]
     [0.25138839 0.52446576 0.76942873]
     [0.0244364  0.4547589  0.47339962]
     [0.61107766 0.1840978  0.17607135]]


.. GENERATED FROM PYTHON SOURCE LINES 165-168

.. code-block:: default


    print(type(df.values))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'numpy.ndarray'>


.. GENERATED FROM PYTHON SOURCE LINES 169-175

.. code-block:: default


    df = pd.DataFrame(np.random.randint(0, 14, (10, 3)),
                      columns=['a', 'b', 'c'],
                      index=[2000+i for i in range(10)])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           a   b   c
    2000  12  10   1
    2001   6   5   2
    2002   2   0   4
    2003  12   1  12
    2004  11   1   0
    2005   1   9  13
    2006   6   3  11
    2007   9   0  11
    2008   7   7   0
    2009   5   8   2


.. GENERATED FROM PYTHON SOURCE LINES 176-179

.. code-block:: default


    print(type(df.values))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'numpy.ndarray'>


.. GENERATED FROM PYTHON SOURCE LINES 180-183

.. code-block:: default


    print(df.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (10, 3)


.. GENERATED FROM PYTHON SOURCE LINES 184-187

.. code-block:: default


    df.describe()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>count</th>
          <td>10.000000</td>
          <td>10.000000</td>
          <td>10.000000</td>
        </tr>
        <tr>
          <th>mean</th>
          <td>7.100000</td>
          <td>4.400000</td>
          <td>5.600000</td>
        </tr>
        <tr>
          <th>std</th>
          <td>3.900142</td>
          <td>3.893014</td>
          <td>5.440588</td>
        </tr>
        <tr>
          <th>min</th>
          <td>1.000000</td>
          <td>0.000000</td>
          <td>0.000000</td>
        </tr>
        <tr>
          <th>25%</th>
          <td>5.250000</td>
          <td>1.000000</td>
          <td>1.250000</td>
        </tr>
        <tr>
          <th>50%</th>
          <td>6.500000</td>
          <td>4.000000</td>
          <td>3.000000</td>
        </tr>
        <tr>
          <th>75%</th>
          <td>10.500000</td>
          <td>7.750000</td>
          <td>11.000000</td>
        </tr>
        <tr>
          <th>max</th>
          <td>12.000000</td>
          <td>10.000000</td>
          <td>13.000000</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 188-191

.. code-block:: default


    df.head()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2000</th>
          <td>12</td>
          <td>10</td>
          <td>1</td>
        </tr>
        <tr>
          <th>2001</th>
          <td>6</td>
          <td>5</td>
          <td>2</td>
        </tr>
        <tr>
          <th>2002</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
        </tr>
        <tr>
          <th>2003</th>
          <td>12</td>
          <td>1</td>
          <td>12</td>
        </tr>
        <tr>
          <th>2004</th>
          <td>11</td>
          <td>1</td>
          <td>0</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 192-195

.. code-block:: default


    df.head(8)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2000</th>
          <td>12</td>
          <td>10</td>
          <td>1</td>
        </tr>
        <tr>
          <th>2001</th>
          <td>6</td>
          <td>5</td>
          <td>2</td>
        </tr>
        <tr>
          <th>2002</th>
          <td>2</td>
          <td>0</td>
          <td>4</td>
        </tr>
        <tr>
          <th>2003</th>
          <td>12</td>
          <td>1</td>
          <td>12</td>
        </tr>
        <tr>
          <th>2004</th>
          <td>11</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2005</th>
          <td>1</td>
          <td>9</td>
          <td>13</td>
        </tr>
        <tr>
          <th>2006</th>
          <td>6</td>
          <td>3</td>
          <td>11</td>
        </tr>
        <tr>
          <th>2007</th>
          <td>9</td>
          <td>0</td>
          <td>11</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 196-197

Get the last N rows of a DataFrame
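
As a brief sketch on an illustrative frame, ``tail(n)`` matches positional
slicing with ``iloc``, and a negative ``n`` returns everything except the first
rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["a", "b"])

# The last three rows, written in two equivalent ways.
last3 = df.tail(3)
same = df.iloc[-3:]
print(last3.equals(same))

# Without an argument, tail returns the last five rows.
print(len(df.tail()))

# A negative n returns all rows except the first |n| rows.
all_but_first2 = df.tail(-2)
print(all_but_first2.index[0])
```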

.. GENERATED FROM PYTHON SOURCE LINES 199-202

.. code-block:: default


    df.tail()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2005</th>
          <td>1</td>
          <td>9</td>
          <td>13</td>
        </tr>
        <tr>
          <th>2006</th>
          <td>6</td>
          <td>3</td>
          <td>11</td>
        </tr>
        <tr>
          <th>2007</th>
          <td>9</td>
          <td>0</td>
          <td>11</td>
        </tr>
        <tr>
          <th>2008</th>
          <td>7</td>
          <td>7</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2009</th>
          <td>5</td>
          <td>8</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 203-206

.. code-block:: default


    df.tail(7)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2003</th>
          <td>12</td>
          <td>1</td>
          <td>12</td>
        </tr>
        <tr>
          <th>2004</th>
          <td>11</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2005</th>
          <td>1</td>
          <td>9</td>
          <td>13</td>
        </tr>
        <tr>
          <th>2006</th>
          <td>6</td>
          <td>3</td>
          <td>11</td>
        </tr>
        <tr>
          <th>2007</th>
          <td>9</td>
          <td>0</td>
          <td>11</td>
        </tr>
        <tr>
          <th>2008</th>
          <td>7</td>
          <td>7</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2009</th>
          <td>5</td>
          <td>8</td>
          <td>2</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 207-211

.. code-block:: default


    df.mean()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    a    7.1
    b    4.4
    c    5.6
    dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 212-215

.. code-block:: default


    df.to_dict()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'a': {2000: 12, 2001: 6, 2002: 2, 2003: 12, 2004: 11, 2005: 1, 2006: 6, 2007: 9, 2008: 7, 2009: 5}, 'b': {2000: 10, 2001: 5, 2002: 0, 2003: 1, 2004: 1, 2005: 9, 2006: 3, 2007: 0, 2008: 7, 2009: 8}, 'c': {2000: 1, 2001: 2, 2002: 4, 2003: 12, 2004: 0, 2005: 13, 2006: 11, 2007: 11, 2008: 0, 2009: 2}}


.. GENERATED FROM PYTHON SOURCE LINES 216-219

.. code-block:: default


    df.to_dict('list')


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'a': [12, 6, 2, 12, 11, 1, 6, 9, 7, 5], 'b': [10, 5, 0, 1, 1, 9, 3, 0, 7, 8], 'c': [1, 2, 4, 12, 0, 13, 11, 11, 0, 2]}


.. GENERATED FROM PYTHON SOURCE LINES 220-224

.. code-block:: default


    df['d'] = np.random.randint(0, 10, (10,))
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           a   b   c  d
    2000  12  10   1  4
    2001   6   5   2  4
    2002   2   0   4  6
    2003  12   1  12  9
    2004  11   1   0  5
    2005   1   9  13  5
    2006   6   3  11  5
    2007   9   0  11  5
    2008   7   7   0  0
    2009   5   8   2  8


.. GENERATED FROM PYTHON SOURCE LINES 225-229

.. code-block:: default


    df.pop('d')
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           a   b   c
    2000  12  10   1
    2001   6   5   2
    2002   2   0   4
    2003  12   1  12
    2004  11   1   0
    2005   1   9  13
    2006   6   3  11
    2007   9   0  11
    2008   7   7   0
    2009   5   8   2


.. GENERATED FROM PYTHON SOURCE LINES 230-234

.. code-block:: default


    df.columns = ['x', 'y', 'z']
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           x   y   z
    2000  12  10   1
    2001   6   5   2
    2002   2   0   4
    2003  12   1  12
    2004  11   1   0
    2005   1   9  13
    2006   6   3  11
    2007   9   0  11
    2008   7   7   0
    2009   5   8   2


.. GENERATED FROM PYTHON SOURCE LINES 235-236

Count the rows of a pandas DataFrame.
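
A short sketch on a made-up frame: ``len(df)`` gives the number of rows
regardless of missing values, whereas ``df.count()`` reports the number of
non-missing entries per column, which can be smaller.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# Number of rows, including rows that contain NaN.
n_rows = len(df)
print(n_rows, df.shape[0], len(df.index))

# Number of non-missing values in each column.
non_null = df.count()
print(non_null)
```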

.. GENERATED FROM PYTHON SOURCE LINES 238-241

.. code-block:: default


    len(df.index)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    10


.. GENERATED FROM PYTHON SOURCE LINES 242-245

.. code-block:: default


    print(df.shape[0])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    10


.. GENERATED FROM PYTHON SOURCE LINES 246-247

Change the order of the DataFrame columns.
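
When the desired order is known in advance, the columns can also be listed
explicitly. The sketch below uses an illustrative frame and shows two
equivalent spellings.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# Select the columns in the desired order.
reordered = df[["z", "x", "y"]]

# reindex with a list of column labels gives the same result here.
same = df.reindex(columns=["z", "x", "y"])

print(reordered.columns.to_list())
print(reordered.equals(same))

# The original frame keeps its own column order.
print(df.columns.to_list())
```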

.. GENERATED FROM PYTHON SOURCE LINES 249-255

.. code-block:: default


    cols = df.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    df = df[cols]
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

           z   x   y
    2000   1  12  10
    2001   2   6   5
    2002   4   2   0
    2003  12  12   1
    2004   0  11   1
    2005  13   1   9
    2006  11   6   3
    2007  11   9   0
    2008   0   7   7
    2009   2   5   8


.. GENERATED FROM PYTHON SOURCE LINES 256-257

Drop the rows of a pandas DataFrame whose value in a certain column is NaN.

.. GENERATED FROM PYTHON SOURCE LINES 259-263

.. code-block:: default


    df = pd.DataFrame(np.random.randn(6,3))
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0  0.792738 -2.566105  0.434209
    1  1.059024 -0.031717  0.358016
    2  0.301728  0.249756 -0.682732
    3 -0.944110  1.394479  1.437554
    4  1.052420  1.079581  0.149270
    5 -1.030245 -0.500275 -2.613331


.. GENERATED FROM PYTHON SOURCE LINES 264-268

.. code-block:: default


    df.iloc[::2, 0] = np.nan
    df.iloc[::4, 2] = np.nan
    df.iloc[::3, 2] = np.nan
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0       NaN -2.566105       NaN
    1  1.059024 -0.031717  0.358016
    2       NaN  0.249756 -0.682732
    3 -0.944110  1.394479       NaN
    4       NaN  1.079581       NaN
    5 -1.030245 -0.500275 -2.613331


.. GENERATED FROM PYTHON SOURCE LINES 269-270

Drop all rows that contain NaN values.

.. GENERATED FROM PYTHON SOURCE LINES 270-272

.. code-block:: default

    df.dropna()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>0</th>
          <th>1</th>
          <th>2</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>1</th>
          <td>1.059024</td>
          <td>-0.031717</td>
          <td>0.358016</td>
        </tr>
        <tr>
          <th>5</th>
          <td>-1.030245</td>
          <td>-0.500275</td>
          <td>-2.613331</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 273-274

Drop the rows that have NaN in a specific column.
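
Besides a boolean mask built with ``notna()``, ``dropna`` accepts a ``subset``
argument naming the columns to inspect. The sketch below uses a small frame
with hand-placed NaNs (an assumption for illustration) and checks that both
approaches agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [np.nan, 1.0, np.nan],
    1: [0.1, 0.2, 0.3],
    2: [np.nan, 0.5, 0.7],
})

# Drop only the rows where column 2 is NaN; NaNs in other columns are kept.
kept = df.dropna(subset=[2])

# The same selection written as a boolean mask.
masked = df[df[2].notna()]

print(kept)
print(kept.equals(masked))
```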

.. GENERATED FROM PYTHON SOURCE LINES 274-276

.. code-block:: default

    print(df[df[2].notna()])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    1  1.059024 -0.031717  0.358016
    2       NaN  0.249756 -0.682732
    5 -1.030245 -0.500275 -2.613331


.. GENERATED FROM PYTHON SOURCE LINES 277-278

Count the NaN values in each column of a DataFrame.
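
If only one column, or only the grand total, is of interest, the same
``isna().sum()`` pattern can be narrowed or extended. A minimal sketch on a
made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 2.0],
    "b": [1.0, np.nan, 3.0, 4.0],
})

# NaN count for a single column.
n_a = df["a"].isna().sum()

# NaN count over the whole frame: sum the per-column counts.
total = df.isna().sum().sum()

print(n_a, total)
```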

.. GENERATED FROM PYTHON SOURCE LINES 280-285

.. code-block:: default


    df = pd.DataFrame(np.random.randn(6,3))
    df.iloc[::2, 0] = np.nan
    df.iloc[::4, 2] = np.nan
    df.iloc[::3, 2] = np.nan
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0       NaN  0.139581       NaN
    1 -1.509370  1.762801 -1.067253
    2       NaN  0.095276 -0.777572
    3  0.492278  0.061235       NaN
    4       NaN -0.038476       NaN
    5 -0.067574 -0.885610 -0.721969


.. GENERATED FROM PYTHON SOURCE LINES 286-289

.. code-block:: default


    df.isna().sum()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0    3
    1    0
    2    3
    dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 290-291

Count per column:

.. GENERATED FROM PYTHON SOURCE LINES 291-293

.. code-block:: default

    df.isnull().sum(axis=0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0    3
    1    0
    2    3
    dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 294-295

Count per row:

.. GENERATED FROM PYTHON SOURCE LINES 295-297

.. code-block:: default

    df.isnull().sum(axis=1)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0    2
    1    0
    2    1
    3    1
    4    2
    5    0
    dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 298-299

Check whether any value in a DataFrame is NaN.

.. GENERATED FROM PYTHON SOURCE LINES 301-306

.. code-block:: default


    df = pd.DataFrame(np.random.randn(6,3))
    df.iloc[::2, 0] = np.nan
    df.iloc[::4, 2] = np.nan
    df.iloc[::3, 2] = np.nan
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0       NaN  0.163625       NaN
    1 -1.087252  2.721640 -0.415199
    2       NaN  1.432081 -0.653284
    3  1.336880  0.197294       NaN
    4       NaN  0.015589       NaN
    5  1.388623  0.203641  1.691277


.. GENERATED FROM PYTHON SOURCE LINES 307-308

Boolean mask that marks which entries are NaN:

.. GENERATED FROM PYTHON SOURCE LINES 308-310

.. code-block:: default

    df.isnull()


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>0</th>
          <th>1</th>
          <th>2</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>True</td>
          <td>False</td>
          <td>True</td>
        </tr>
        <tr>
          <th>1</th>
          <td>False</td>
          <td>False</td>
          <td>False</td>
        </tr>
        <tr>
          <th>2</th>
          <td>True</td>
          <td>False</td>
          <td>False</td>
        </tr>
        <tr>
          <th>3</th>
          <td>False</td>
          <td>False</td>
          <td>True</td>
        </tr>
        <tr>
          <th>4</th>
          <td>True</td>
          <td>False</td>
          <td>True</td>
        </tr>
        <tr>
          <th>5</th>
          <td>False</td>
          <td>False</td>
          <td>False</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 311-312

Column-wise:

.. GENERATED FROM PYTHON SOURCE LINES 312-314

.. code-block:: default

    df.isnull().any()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0     True
    1    False
    2     True
    dtype: bool


.. GENERATED FROM PYTHON SOURCE LINES 315-316

Check whether there is any NaN in the entire data:

.. GENERATED FROM PYTHON SOURCE LINES 316-318

.. code-block:: default

    df.isnull().any().any()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    True


.. GENERATED FROM PYTHON SOURCE LINES 319-320

Replace NaN values with zeros in a DataFrame or in one of its columns.

.. GENERATED FROM PYTHON SOURCE LINES 322-327

.. code-block:: default


    df = pd.DataFrame(np.random.randn(6,3))
    df.iloc[::2, 0] = np.nan
    df.iloc[::4, 2] = np.nan
    df.iloc[::3, 2] = np.nan
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0       NaN -0.168681       NaN
    1  0.563927  0.017890 -1.375824
    2       NaN -0.810597 -1.174800
    3 -0.722840  0.883346       NaN
    4       NaN -0.310959       NaN
    5  0.940451 -1.261573 -1.127101


.. GENERATED FROM PYTHON SOURCE LINES 328-331

.. code-block:: default


    df.fillna(0)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>0</th>
          <th>1</th>
          <th>2</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>0.000000</td>
          <td>-0.168681</td>
          <td>0.000000</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.563927</td>
          <td>0.017890</td>
          <td>-1.375824</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.000000</td>
          <td>-0.810597</td>
          <td>-1.174800</td>
        </tr>
        <tr>
          <th>3</th>
          <td>-0.722840</td>
          <td>0.883346</td>
          <td>0.000000</td>
        </tr>
        <tr>
          <th>4</th>
          <td>0.000000</td>
          <td>-0.310959</td>
          <td>0.000000</td>
        </tr>
        <tr>
          <th>5</th>
          <td>0.940451</td>
          <td>-1.261573</td>
          <td>-1.127101</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 332-333

To fill the NaNs in only one column

.. GENERATED FROM PYTHON SOURCE LINES 333-337

.. code-block:: default


    # assign the result back; chained ``inplace=True`` may not update ``df`` in newer pandas
    df[2] = df[2].fillna(0)
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0       NaN -0.168681  0.000000
    1  0.563927  0.017890 -1.375824
    2       NaN -0.810597 -1.174800
    3 -0.722840  0.883346  0.000000
    4       NaN -0.310959  0.000000
    5  0.940451 -1.261573 -1.127101
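
When different columns need different fill values, ``fillna`` also accepts a dict that maps column labels to values. A minimal sketch, using an illustrative frame ``fill_df`` and arbitrarily chosen fill values:

```python
import numpy as np
import pandas as pd

# Illustrative frame with NaNs in columns 0 and 2
fill_df = pd.DataFrame({0: [np.nan, 1.0], 1: [2.0, 3.0], 2: [np.nan, 4.0]})

# Fill column 0 with 0.0 and column 2 with -1.0; column 1 is left alone.
# fillna returns a new DataFrame, so fill_df itself is not modified.
filled = fill_df.fillna({0: 0.0, 2: -1.0})
print(filled)
```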


.. GENERATED FROM PYTHON SOURCE LINES 338-339

Check whether a column exists in a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 341-345

.. code-block:: default


    df = pd.DataFrame(np.random.randn(6,3))
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

              0         1         2
    0  0.775496  0.028101  0.380929
    1 -0.932216  0.469528 -0.663859
    2 -0.616558 -1.267830  0.580904
    3 -1.582028 -0.355916  0.460871
    4  0.672658 -1.117510  0.144625
    5  0.613005  0.732261 -0.276066


.. GENERATED FROM PYTHON SOURCE LINES 346-350

.. code-block:: default


    if 0 in df.columns:
        print("true")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    true
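
The same membership test extends to several columns at once. The sketch below assumes an illustrative frame ``cols_df`` and a hypothetical list of wanted names; it checks whether all of them exist and reports the missing ones.

```python
import pandas as pd

cols_df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
wanted = ["a", "c", "z"]  # hypothetical column names to look for

# True only if every wanted name is a column of the frame
all_present = set(wanted).issubset(cols_df.columns)

# Names that are not columns of the frame
missing = [name for name in wanted if name not in cols_df.columns]

print(all_present, missing)
```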


.. GENERATED FROM PYTHON SOURCE LINES 351-352

Convert a Python dict into a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 354-371

.. code-block:: default


    d = {
        '2012-06-08': 388,
        '2012-06-09': 388,
        '2012-06-10': 388,
        '2012-06-11': 389,
        '2012-06-12': 389,
        '2012-06-13': 389,
        '2012-06-14': 389,
        '2012-06-15': 389,
        '2012-06-16': 389,
        '2012-06-17': 389,
        '2012-06-18': 390,
        '2012-06-19': 390,
        '2012-06-20': 390,
    }


.. GENERATED FROM PYTHON SOURCE LINES 372-375

.. code-block:: default


    pd.DataFrame(d.items())


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>0</th>
          <th>1</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>2012-06-08</td>
          <td>388</td>
        </tr>
        <tr>
          <th>1</th>
          <td>2012-06-09</td>
          <td>388</td>
        </tr>
        <tr>
          <th>2</th>
          <td>2012-06-10</td>
          <td>388</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2012-06-11</td>
          <td>389</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2012-06-12</td>
          <td>389</td>
        </tr>
        <tr>
          <th>5</th>
          <td>2012-06-13</td>
          <td>389</td>
        </tr>
        <tr>
          <th>6</th>
          <td>2012-06-14</td>
          <td>389</td>
        </tr>
        <tr>
          <th>7</th>
          <td>2012-06-15</td>
          <td>389</td>
        </tr>
        <tr>
          <th>8</th>
          <td>2012-06-16</td>
          <td>389</td>
        </tr>
        <tr>
          <th>9</th>
          <td>2012-06-17</td>
          <td>389</td>
        </tr>
        <tr>
          <th>10</th>
          <td>2012-06-18</td>
          <td>390</td>
        </tr>
        <tr>
          <th>11</th>
          <td>2012-06-19</td>
          <td>390</td>
        </tr>
        <tr>
          <th>12</th>
          <td>2012-06-20</td>
          <td>390</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 376-379

.. code-block:: default


    pd.DataFrame(d.items(), columns=['Date', 'DateValue'])


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Date</th>
          <th>DateValue</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>2012-06-08</td>
          <td>388</td>
        </tr>
        <tr>
          <th>1</th>
          <td>2012-06-09</td>
          <td>388</td>
        </tr>
        <tr>
          <th>2</th>
          <td>2012-06-10</td>
          <td>388</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2012-06-11</td>
          <td>389</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2012-06-12</td>
          <td>389</td>
        </tr>
        <tr>
          <th>5</th>
          <td>2012-06-13</td>
          <td>389</td>
        </tr>
        <tr>
          <th>6</th>
          <td>2012-06-14</td>
          <td>389</td>
        </tr>
        <tr>
          <th>7</th>
          <td>2012-06-15</td>
          <td>389</td>
        </tr>
        <tr>
          <th>8</th>
          <td>2012-06-16</td>
          <td>389</td>
        </tr>
        <tr>
          <th>9</th>
          <td>2012-06-17</td>
          <td>389</td>
        </tr>
        <tr>
          <th>10</th>
          <td>2012-06-18</td>
          <td>390</td>
        </tr>
        <tr>
          <th>11</th>
          <td>2012-06-19</td>
          <td>390</td>
        </tr>
        <tr>
          <th>12</th>
          <td>2012-06-20</td>
          <td>390</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 380-382

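Passing the dict directly, as in ``pd.DataFrame(d)``, fails because all of its
values are scalars. It raises
``ValueError: If using all scalar values, you must pass an index``.
Wrapping the dict in a list makes it a single row instead.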
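
The error can be reproduced safely inside a ``try`` block. This sketch uses a shortened, illustrative dict ``scalars`` and also shows that passing an explicit ``index`` is another way to get a one-row frame.

```python
import pandas as pd

scalars = {"2012-06-08": 388, "2012-06-09": 388}  # shortened illustrative dict

try:
    pd.DataFrame(scalars)  # all values are scalars and no index is given
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Supplying an index tells pandas how many rows to build
one_row = pd.DataFrame(scalars, index=[0])
print(one_row)
```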

.. GENERATED FROM PYTHON SOURCE LINES 385-387

.. code-block:: default

    pd.DataFrame([d])


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>2012-06-08</th>
          <th>2012-06-09</th>
          <th>2012-06-10</th>
          <th>2012-06-11</th>
          <th>2012-06-12</th>
          <th>2012-06-13</th>
          <th>2012-06-14</th>
          <th>2012-06-15</th>
          <th>2012-06-16</th>
          <th>2012-06-17</th>
          <th>2012-06-18</th>
          <th>2012-06-19</th>
          <th>2012-06-20</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>388</td>
          <td>388</td>
          <td>388</td>
          <td>389</td>
          <td>389</td>
          <td>389</td>
          <td>389</td>
          <td>389</td>
          <td>389</td>
          <td>389</td>
          <td>390</td>
          <td>390</td>
          <td>390</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 388-391

.. code-block:: default


    pd.DataFrame.from_dict(d, orient='index', columns=['DateValue'])


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>DateValue</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>2012-06-08</th>
          <td>388</td>
        </tr>
        <tr>
          <th>2012-06-09</th>
          <td>388</td>
        </tr>
        <tr>
          <th>2012-06-10</th>
          <td>388</td>
        </tr>
        <tr>
          <th>2012-06-11</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-12</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-13</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-14</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-15</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-16</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-17</th>
          <td>389</td>
        </tr>
        <tr>
          <th>2012-06-18</th>
          <td>390</td>
        </tr>
        <tr>
          <th>2012-06-19</th>
          <td>390</td>
        </tr>
        <tr>
          <th>2012-06-20</th>
          <td>390</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />
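
Another route, sketched here under the assumption that the keys are date strings, goes through a ``Series``: the dict keys become the index, which can be parsed into real timestamps before turning it into a two-column frame. The names ``counts``, ``ser`` and ``frame`` are illustrative.

```python
import pandas as pd

counts = {"2012-06-08": 388, "2012-06-11": 389, "2012-06-18": 390}

# Dict keys become the index, dict values become the data
ser = pd.Series(counts, name="DateValue")

# Parse the string keys into timestamps
ser.index = pd.to_datetime(ser.index)

# Name the index and move it into an ordinary column
frame = ser.rename_axis("Date").reset_index()
print(frame)
```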

.. GENERATED FROM PYTHON SOURCE LINES 392-393

Count how often each value occurs in a DataFrame column.

.. GENERATED FROM PYTHON SOURCE LINES 395-401

.. code-block:: default


    df = pd.DataFrame(np.random.randint(0, 14, (10, 3)),
                      columns=['a', 'b', 'c'],
                      index=[2000+i for i in range(10)])
    df['a'].value_counts()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    6     2
    2     2
    12    2
    9     1
    11    1
    0     1
    5     1
    Name: a, dtype: int64
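
``value_counts`` can also return relative frequencies, and the result can be re-ordered by value instead of by count. A small sketch with fixed, illustrative data so the numbers are reproducible:

```python
import pandas as pd

vc_df = pd.DataFrame({"a": [6, 2, 6, 9, 2, 6]})

counts = vc_df["a"].value_counts()                # absolute counts, most frequent first
shares = vc_df["a"].value_counts(normalize=True)  # fractions that sum to 1
by_value = counts.sort_index()                    # ordered by the value itself

print(counts)
print(shares)
print(by_value)
```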


.. GENERATED FROM PYTHON SOURCE LINES 402-406

.. code-block:: default


    for index, row in df.iterrows():
        print(index, row, '\n')


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2000 a     6
    b    12
    c     5
    Name: 2000, dtype: int64 

    2001 a     2
    b     1
    c    10
    Name: 2001, dtype: int64 

    2002 a     6
    b     0
    c    12
    Name: 2002, dtype: int64 

    2003 a    9
    b    8
    c    7
    Name: 2003, dtype: int64 

    2004 a    11
    b    11
    c     4
    Name: 2004, dtype: int64 

    2005 a    0
    b    0
    c    7
    Name: 2005, dtype: int64 

    2006 a    12
    b     6
    c     8
    Name: 2006, dtype: int64 

    2007 a    12
    b     5
    c     1
    Name: 2007, dtype: int64 

    2008 a     2
    b     2
    c    13
    Name: 2008, dtype: int64 

    2009 a     5
    b     2
    c    13
    Name: 2009, dtype: int64 
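
``iterrows`` builds a ``Series`` for every row. When only the values are needed, ``itertuples`` is a lighter alternative that yields one named tuple per row. A minimal sketch on an illustrative two-row frame:

```python
import pandas as pd

it_df = pd.DataFrame({"a": [6, 2], "b": [12, 1]}, index=[2000, 2001])

rows = []
for row in it_df.itertuples():
    # row.Index is the index label; columns are available as attributes
    rows.append((row.Index, row.a, row.b))

print(rows)
```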


.. GENERATED FROM PYTHON SOURCE LINES 407-412

.. code-block:: default


    df = pd.DataFrame(np.random.randint(0, 14, (10, 3)),
                      columns=['a', 'b', 'c'])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        a   b   c
    0   9  10  13
    1   7   7   3
    2   5   2  13
    3   8   8   7
    4   7   6   3
    5   3   1   5
    6   4  10   7
    7   0   8   5
    8  12  12  12
    9  13   4   9


.. GENERATED FROM PYTHON SOURCE LINES 413-416

.. code-block:: default


    print(df['a']/df['b'])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0    0.900000
    1    1.000000
    2    2.500000
    3    1.000000
    4    1.166667
    5    3.000000
    6    0.400000
    7    0.000000
    8    1.000000
    9    3.250000
    dtype: float64
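
The division above works element by element. If the divisor column contains zeros, pandas does not raise an error but returns ``inf`` (or NaN for 0/0). The sketch below uses an illustrative frame ``div_df`` to show this and one possible way to turn infinities into NaN.

```python
import numpy as np
import pandas as pd

div_df = pd.DataFrame({"a": [9, 0, 5], "b": [3, 0, 0]})

# Element-wise division: 9/3, 0/0 and 5/0
ratio = div_df["a"] / div_df["b"]

# Treat infinite results as missing values
safe_ratio = ratio.replace([np.inf, -np.inf], np.nan)

print(ratio)
print(safe_ratio)
```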


.. GENERATED FROM PYTHON SOURCE LINES 417-418

Add an empty column to a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 420-424

.. code-block:: default


    df["d"] = ""
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        a   b   c d
    0   9  10  13  
    1   7   7   3  
    2   5   2  13  
    3   8   8   7  
    4   7   6   3  
    5   3   1   5  
    6   4  10   7  
    7   0   8   5  
    8  12  12  12  
    9  13   4   9  


.. GENERATED FROM PYTHON SOURCE LINES 425-428

.. code-block:: default


    print(df['d'])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0    
    1    
    2    
    3    
    4    
    5    
    6    
    7    
    8    
    9    
    Name: d, dtype: object


.. GENERATED FROM PYTHON SOURCE LINES 429-433

.. code-block:: default


    df["d"] = np.nan
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        a   b   c   d
    0   9  10  13 NaN
    1   7   7   3 NaN
    2   5   2  13 NaN
    3   8   8   7 NaN
    4   7   6   3 NaN
    5   3   1   5 NaN
    6   4  10   7 NaN
    7   0   8   5 NaN
    8  12  12  12 NaN
    9  13   4   9 NaN
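
The assignments above modify ``df`` directly. As a sketch of an alternative, ``assign`` returns a new frame with the extra column and leaves the original untouched. The frame ``base`` is illustrative, and the nullable ``Int64`` column is just one possible choice of dtype for an "empty" column.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"a": [1, 2, 3]})

# New frame with a float column full of NaN
with_nan = base.assign(d=np.nan)

# New frame with a nullable-integer column full of <NA>
with_int = base.assign(d=pd.array([pd.NA] * len(base), dtype="Int64"))

print(with_nan.dtypes)
print(with_int.dtypes)
```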


.. GENERATED FROM PYTHON SOURCE LINES 434-435

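What does ``axis`` mean in pandas? With ``axis=0`` the operation runs down the
rows and gives one result per column; with ``axis=1`` it runs across the
columns and gives one result per row.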

.. GENERATED FROM PYTHON SOURCE LINES 437-440

.. code-block:: default


    df.mean(axis=0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    a    6.8
    b    6.8
    c    7.7
    d    NaN
    dtype: float64


.. GENERATED FROM PYTHON SOURCE LINES 441-444

.. code-block:: default


    df.mean(axis=1)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0    10.666667
    1     5.666667
    2     6.666667
    3     7.666667
    4     5.333333
    5     3.000000
    6     7.000000
    7     4.333333
    8    12.000000
    9     8.666667
    dtype: float64
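
Because the frame above holds random numbers, the effect of ``axis`` is easier to verify on a tiny frame whose means can be computed by hand. The frame ``ax_df`` below is illustrative; the string aliases ``"index"`` and ``"columns"`` are equivalent to 0 and 1.

```python
import pandas as pd

ax_df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: collapse the rows, one mean per column
col_means = ax_df.mean(axis=0)

# axis=1: collapse the columns, one mean per row
row_means = ax_df.mean(axis=1)

# "columns" is an alias for axis=1
same_as_axis1 = ax_df.mean(axis="columns")

print(col_means)
print(row_means)
```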


.. GENERATED FROM PYTHON SOURCE LINES 445-446

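Replace values with ``replace``: first 9 with NaN, then NaN with a blank/empty
string. Each call returns a new DataFrame and leaves ``df`` unchanged.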

.. GENERATED FROM PYTHON SOURCE LINES 448-451

.. code-block:: default


    df.replace(9, np.nan)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
          <th>d</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>NaN</td>
          <td>10</td>
          <td>13.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>1</th>
          <td>7.0</td>
          <td>7</td>
          <td>3.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>2</th>
          <td>5.0</td>
          <td>2</td>
          <td>13.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>3</th>
          <td>8.0</td>
          <td>8</td>
          <td>7.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>4</th>
          <td>7.0</td>
          <td>6</td>
          <td>3.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>5</th>
          <td>3.0</td>
          <td>1</td>
          <td>5.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>6</th>
          <td>4.0</td>
          <td>10</td>
          <td>7.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>7</th>
          <td>0.0</td>
          <td>8</td>
          <td>5.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>8</th>
          <td>12.0</td>
          <td>12</td>
          <td>12.0</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>9</th>
          <td>13.0</td>
          <td>4</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 452-455

.. code-block:: default


    df.replace(np.nan, '')


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>a</th>
          <th>b</th>
          <th>c</th>
          <th>d</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>9</td>
          <td>10</td>
          <td>13</td>
          <td></td>
        </tr>
        <tr>
          <th>1</th>
          <td>7</td>
          <td>7</td>
          <td>3</td>
          <td></td>
        </tr>
        <tr>
          <th>2</th>
          <td>5</td>
          <td>2</td>
          <td>13</td>
          <td></td>
        </tr>
        <tr>
          <th>3</th>
          <td>8</td>
          <td>8</td>
          <td>7</td>
          <td></td>
        </tr>
        <tr>
          <th>4</th>
          <td>7</td>
          <td>6</td>
          <td>3</td>
          <td></td>
        </tr>
        <tr>
          <th>5</th>
          <td>3</td>
          <td>1</td>
          <td>5</td>
          <td></td>
        </tr>
        <tr>
          <th>6</th>
          <td>4</td>
          <td>10</td>
          <td>7</td>
          <td></td>
        </tr>
        <tr>
          <th>7</th>
          <td>0</td>
          <td>8</td>
          <td>5</td>
          <td></td>
        </tr>
        <tr>
          <th>8</th>
          <td>12</td>
          <td>12</td>
          <td>12</td>
          <td></td>
        </tr>
        <tr>
          <th>9</th>
          <td>13</td>
          <td>4</td>
          <td>9</td>
          <td></td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />
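
For this particular case ``fillna("")`` gives the same result as ``replace(np.nan, "")``. A short sketch on an illustrative frame; note that a column filled with strings is generally no longer numeric afterwards.

```python
import numpy as np
import pandas as pd

blank_df = pd.DataFrame({"a": [1, 2], "d": [np.nan, np.nan]})

# Every NaN becomes an empty string; other values are kept
blanked = blank_df.fillna("")

print(blanked)
print(blanked.dtypes)
```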

.. GENERATED FROM PYTHON SOURCE LINES 456-457

Rename specific column(s) in a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 459-463

.. code-block:: default


    df = pd.DataFrame(np.random.randint(0, 14, (10, 3)), columns=['a', 'b', 'c'])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

        a   b   c
    0   2   4  12
    1   9  13   2
    2  11   1   4
    3   9   9   5
    4   9   5   4
    5  12   0   1
    6  10   2  11
    7   6   6   1
    8  12   7   4
    9   6   9   7


.. GENERATED FROM PYTHON SOURCE LINES 464-468

.. code-block:: default


    df.rename(columns={'a':'log(A)'}, inplace=True)
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       log(A)   b   c
    0       2   4  12
    1       9  13   2
    2      11   1   4
    3       9   9   5
    4       9   5   4
    5      12   0   1
    6      10   2  11
    7       6   6   1
    8      12   7   4
    9       6   9   7
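
``rename`` accepts several entries in the mapping, and it also accepts a function that is applied to every column name. Without ``inplace=True`` it returns a new frame. The frame and the new names below are illustrative.

```python
import pandas as pd

rn_df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Rename two columns at once; "b" is not in the mapping and stays as it is
renamed = rn_df.rename(columns={"a": "alpha", "c": "gamma"})

# Apply a function to every column name
upper = rn_df.rename(columns=str.upper)

print(list(renamed.columns))
print(list(upper.columns))
```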


.. GENERATED FROM PYTHON SOURCE LINES 469-470

Print a DataFrame without its index.

.. GENERATED FROM PYTHON SOURCE LINES 473-476

.. code-block:: default


    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

       log(A)   b   c
    0       2   4  12
    1       9  13   2
    2      11   1   4
    3       9   9   5
    4       9   5   4
    5      12   0   1
    6      10   2  11
    7       6   6   1
    8      12   7   4
    9       6   9   7


.. GENERATED FROM PYTHON SOURCE LINES 477-480

.. code-block:: default


    df.style.hide(axis="index")


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style type="text/css">
    </style>
    <table id="T_6e473">
      <thead>
        <tr>
          <th id="T_6e473_level0_col0" class="col_heading level0 col0" >log(A)</th>
          <th id="T_6e473_level0_col1" class="col_heading level0 col1" >b</th>
          <th id="T_6e473_level0_col2" class="col_heading level0 col2" >c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td id="T_6e473_row0_col0" class="data row0 col0" >2</td>
          <td id="T_6e473_row0_col1" class="data row0 col1" >4</td>
          <td id="T_6e473_row0_col2" class="data row0 col2" >12</td>
        </tr>
        <tr>
          <td id="T_6e473_row1_col0" class="data row1 col0" >9</td>
          <td id="T_6e473_row1_col1" class="data row1 col1" >13</td>
          <td id="T_6e473_row1_col2" class="data row1 col2" >2</td>
        </tr>
        <tr>
          <td id="T_6e473_row2_col0" class="data row2 col0" >11</td>
          <td id="T_6e473_row2_col1" class="data row2 col1" >1</td>
          <td id="T_6e473_row2_col2" class="data row2 col2" >4</td>
        </tr>
        <tr>
          <td id="T_6e473_row3_col0" class="data row3 col0" >9</td>
          <td id="T_6e473_row3_col1" class="data row3 col1" >9</td>
          <td id="T_6e473_row3_col2" class="data row3 col2" >5</td>
        </tr>
        <tr>
          <td id="T_6e473_row4_col0" class="data row4 col0" >9</td>
          <td id="T_6e473_row4_col1" class="data row4 col1" >5</td>
          <td id="T_6e473_row4_col2" class="data row4 col2" >4</td>
        </tr>
        <tr>
          <td id="T_6e473_row5_col0" class="data row5 col0" >12</td>
          <td id="T_6e473_row5_col1" class="data row5 col1" >0</td>
          <td id="T_6e473_row5_col2" class="data row5 col2" >1</td>
        </tr>
        <tr>
          <td id="T_6e473_row6_col0" class="data row6 col0" >10</td>
          <td id="T_6e473_row6_col1" class="data row6 col1" >2</td>
          <td id="T_6e473_row6_col2" class="data row6 col2" >11</td>
        </tr>
        <tr>
          <td id="T_6e473_row7_col0" class="data row7 col0" >6</td>
          <td id="T_6e473_row7_col1" class="data row7 col1" >6</td>
          <td id="T_6e473_row7_col2" class="data row7 col2" >1</td>
        </tr>
        <tr>
          <td id="T_6e473_row8_col0" class="data row8 col0" >12</td>
          <td id="T_6e473_row8_col1" class="data row8 col1" >7</td>
          <td id="T_6e473_row8_col2" class="data row8 col2" >4</td>
        </tr>
        <tr>
          <td id="T_6e473_row9_col0" class="data row9 col0" >6</td>
          <td id="T_6e473_row9_col1" class="data row9 col1" >9</td>
          <td id="T_6e473_row9_col2" class="data row9 col2" >7</td>
        </tr>
      </tbody>
    </table>

    </div>
    <br />
    <br />
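
The styler produces HTML, which suits notebooks. For plain-text output, for example in a terminal, ``to_string(index=False)`` is one option. A minimal sketch on an illustrative frame:

```python
import pandas as pd

pr_df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Render the frame as text and leave out the index column
text = pr_df.to_string(index=False)
print(text)
```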

.. GENERATED FROM PYTHON SOURCE LINES 481-482

Replace NaN values with the column means. The DataFrame here contains no NaN,
so the output is unchanged.

.. GENERATED FROM PYTHON SOURCE LINES 484-487

.. code-block:: default


    df.fillna(df.mean())


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>log(A)</th>
          <th>b</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>2</td>
          <td>4</td>
          <td>12</td>
        </tr>
        <tr>
          <th>1</th>
          <td>9</td>
          <td>13</td>
          <td>2</td>
        </tr>
        <tr>
          <th>2</th>
          <td>11</td>
          <td>1</td>
          <td>4</td>
        </tr>
        <tr>
          <th>3</th>
          <td>9</td>
          <td>9</td>
          <td>5</td>
        </tr>
        <tr>
          <th>4</th>
          <td>9</td>
          <td>5</td>
          <td>4</td>
        </tr>
        <tr>
          <th>5</th>
          <td>12</td>
          <td>0</td>
          <td>1</td>
        </tr>
        <tr>
          <th>6</th>
          <td>10</td>
          <td>2</td>
          <td>11</td>
        </tr>
        <tr>
          <th>7</th>
          <td>6</td>
          <td>6</td>
          <td>1</td>
        </tr>
        <tr>
          <th>8</th>
          <td>12</td>
          <td>7</td>
          <td>4</td>
        </tr>
        <tr>
          <th>9</th>
          <td>6</td>
          <td>9</td>
          <td>7</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />
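
To see the effect of filling with column means, the frame needs some missing values. The sketch below uses an illustrative frame ``mean_df`` whose column means (2.0 and 6.0) are easy to check by hand.

```python
import numpy as np
import pandas as pd

mean_df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],   # mean of the non-missing values is 2.0
    "b": [np.nan, 4.0, 8.0],   # mean of the non-missing values is 6.0
})

# mean() skips NaN by default; fillna matches the means to columns by label
imputed = mean_df.fillna(mean_df.mean())
print(imputed)
```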

.. GENERATED FROM PYTHON SOURCE LINES 488-489

Retrieve the number of columns in a DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 491-494

.. code-block:: default


    len(df.columns)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    3


.. GENERATED FROM PYTHON SOURCE LINES 495-498

.. code-block:: default


    print(df.shape[1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    3


.. GENERATED FROM PYTHON SOURCE LINES 499-501

We can create an empty DataFrame by specifying only its columns
or only its index (rows).

.. GENERATED FROM PYTHON SOURCE LINES 503-507

.. code-block:: default


    df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Empty DataFrame
    Columns: [A, B, C, D, E, F, G]
    Index: []


.. GENERATED FROM PYTHON SOURCE LINES 508-511

.. code-block:: default


    print(df.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (0, 7)


.. GENERATED FROM PYTHON SOURCE LINES 512-516

.. code-block:: default


    df = pd.DataFrame(index=range(1,8))
    print(df)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Empty DataFrame
    Columns: []
    Index: [1, 2, 3, 4, 5, 6, 7]


.. GENERATED FROM PYTHON SOURCE LINES 517-519

.. code-block:: default


    print(df.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (7, 0)
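
Both arguments can also be given together. In that case the frame has the requested shape and every cell is a missing value. A short sketch with illustrative labels:

```python
import pandas as pd

# Three rows and two columns, with no data supplied
shell = pd.DataFrame(index=range(1, 4), columns=["A", "B"])

print(shell)
print(shell.shape)
```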


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.461 seconds)


.. _sphx_glr_download_auto_examples_pandas_dataframe_vs_series.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/AtrCheema/python-seekho/master?urlpath=lab/tree/notebooks/auto_examples/pandas/dataframe_vs_series.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: dataframe_vs_series.py <dataframe_vs_series.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: dataframe_vs_series.ipynb <dataframe_vs_series.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_