.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/pandas/dataframe_vs_series.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_pandas_dataframe_vs_series.py: ================== 5.1 introduction ================== .. GENERATED FROM PYTHON SOURCE LINES 7-14 .. code-block:: default import time import numpy as np import pandas as pd print(time.asctime()) print(pd.__version__, np.__version__) .. rst-class:: sphx-glr-script-out .. code-block:: none Mon Nov 11 19:32:24 2024 1.5.3 1.26.4 .. GENERATED FROM PYTHON SOURCE LINES 15-27 Suppose we have an array [0.4, 0.3, 0.5, 0.2, 0.6, 0.3]. Let's say the values in this array represent concentrations in water measured every hour from 13 pm to 19 pm. However, with just an array, we don't have the ability to encode this information. If we want to add the (temporal) reference of each value we have to add it ourself for example by saving that in a separate array. Pandas comes with this in-built ability that we can add reference or labels to arrays. Every array in pandas has two kinds of references. The reference for the rows which is called ``index`` and the reference for the columns which is called ``columns``. Therefore we can call pandas a library which have referenced/labelled arrays. The core data structure in pandas is ``DataFrame`` which consists of one or more columns. A single column in a DataFrame is a ``Series``. .. GENERATED FROM PYTHON SOURCE LINES 27-31 .. code-block:: default df = pd.DataFrame(np.random.random((10, 3))) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 0.819711 0.469719 0.047138 1 0.995136 0.799338 0.014367 2 0.220825 0.169214 0.077614 3 0.472161 0.297704 0.213042 4 0.096419 0.898009 0.074485 5 0.210453 0.086511 0.951945 6 0.461789 0.427172 0.143734 7 0.906187 0.200208 0.497694 8 0.477416 0.382160 0.434236 9 0.428262 0.910327 0.273702 .. GENERATED FROM PYTHON SOURCE LINES 32-34 The data in columns is stored as numpy arrays. Therefore, a DataFrames and Series have a lot of characteristics similar to that of numpy arrays. .. GENERATED FROM PYTHON SOURCE LINES 34-36 .. code-block:: default print(df.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (10, 3) .. GENERATED FROM PYTHON SOURCE LINES 37-39 By default the columns names are just integers starting from 0, however we can define the column names ourselves as well. .. GENERATED FROM PYTHON SOURCE LINES 39-43 .. code-block:: default df = pd.DataFrame(np.random.random((10, 3)), columns=['a', 'b', 'c']) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 0 0.643263 0.594007 0.442692 1 0.286817 0.179552 0.060201 2 0.463696 0.816541 0.600126 3 0.150433 0.366931 0.906603 4 0.288914 0.052968 0.615138 5 0.158648 0.694414 0.973317 6 0.744459 0.436719 0.882664 7 0.175492 0.733806 0.173178 8 0.051456 0.641243 0.885918 9 0.474923 0.216120 0.883705 .. GENERATED FROM PYTHON SOURCE LINES 44-47 .. code-block:: default print(df.columns) .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['a', 'b', 'c'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 48-49 The columns are list like structures. However they are not exactly lists. .. GENERATED FROM PYTHON SOURCE LINES 49-51 .. code-block:: default print(type(df.columns)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 52-53 We can however, convert the columns to list though. .. GENERATED FROM PYTHON SOURCE LINES 53-55 .. code-block:: default df.columns.to_list() .. rst-class:: sphx-glr-script-out .. code-block:: none ['a', 'b', 'c'] .. GENERATED FROM PYTHON SOURCE LINES 56-59 .. code-block:: default print(type(df.columns.to_list())) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 60-61 The default label for the rows i.e. ``index`` consists of numbers starting from 0. .. GENERATED FROM PYTHON SOURCE LINES 61-63 .. code-block:: default print(df.index) .. rst-class:: sphx-glr-script-out .. code-block:: none RangeIndex(start=0, stop=10, step=1) .. GENERATED FROM PYTHON SOURCE LINES 64-65 However, we can set ``index`` of our choice as well. .. GENERATED FROM PYTHON SOURCE LINES 65-71 .. code-block:: default df = pd.DataFrame(np.random.random((10, 3)), columns=['a', 'b', 'c'], index=[2000+i for i in range(10)]) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 2000 0.978551 0.820233 0.227459 2001 0.317172 0.157027 0.695845 2002 0.767413 0.547235 0.369868 2003 0.425290 0.032933 0.923786 2004 0.251980 0.580116 0.266811 2005 0.947292 0.581080 0.211948 2006 0.868994 0.163130 0.771538 2007 0.627722 0.624592 0.296365 2008 0.337227 0.355143 0.516186 2009 0.441642 0.761122 0.799617 .. GENERATED FROM PYTHON SOURCE LINES 72-75 .. code-block:: default print(df.index) .. rst-class:: sphx-glr-script-out .. code-block:: none Int64Index([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009], dtype='int64') .. GENERATED FROM PYTHON SOURCE LINES 76-77 The default name of ``index`` is ``None``. .. GENERATED FROM PYTHON SOURCE LINES 77-79 .. code-block:: default print(df.index.name) .. rst-class:: sphx-glr-script-out .. code-block:: none None .. GENERATED FROM PYTHON SOURCE LINES 80-81 However, we can set the name of index as well. .. GENERATED FROM PYTHON SOURCE LINES 81-84 .. code-block:: default df.index.name = 'years' print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c years 2000 0.978551 0.820233 0.227459 2001 0.317172 0.157027 0.695845 2002 0.767413 0.547235 0.369868 2003 0.425290 0.032933 0.923786 2004 0.251980 0.580116 0.266811 2005 0.947292 0.581080 0.211948 2006 0.868994 0.163130 0.771538 2007 0.627722 0.624592 0.296365 2008 0.337227 0.355143 0.516186 2009 0.441642 0.761122 0.799617 .. GENERATED FROM PYTHON SOURCE LINES 85-88 .. code-block:: default print(df.index.name) .. rst-class:: sphx-glr-script-out .. code-block:: none years .. GENERATED FROM PYTHON SOURCE LINES 89-92 .. code-block:: default print(type(df)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 93-99 .. code-block:: default df = pd.DataFrame(np.random.randint(0, 10, (10, 1)), columns=['a'], index=[2000+i for i in range(10)]) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a 2000 1 2001 6 2002 8 2003 5 2004 0 2005 6 2006 2 2007 8 2008 2 2009 3 .. GENERATED FROM PYTHON SOURCE LINES 100-103 .. code-block:: default print(type(df)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 104-107 .. code-block:: default print(df.columns) .. rst-class:: sphx-glr-script-out .. code-block:: none Index(['a'], dtype='object') .. GENERATED FROM PYTHON SOURCE LINES 108-111 Series ========= A Series consists of a single column. It can be constructed using ``pd.Series``. .. GENERATED FROM PYTHON SOURCE LINES 111-115 .. code-block:: default s = pd.Series(np.random.random(10)) print(s) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 0.240092 1 0.316543 2 0.336670 3 0.256285 4 0.662836 5 0.947630 6 0.188108 7 0.525331 8 0.612323 9 0.212473 dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 116-119 .. code-block:: default print(type(s)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 120-123 .. code-block:: default print(s.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (10,) .. GENERATED FROM PYTHON SOURCE LINES 124-127 .. code-block:: default print(s.name) .. rst-class:: sphx-glr-script-out .. code-block:: none None .. GENERATED FROM PYTHON SOURCE LINES 128-133 .. code-block:: default s = pd.Series(np.random.random(10), name="a") print(s) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 0.342054 1 0.232343 2 0.547047 3 0.942240 4 0.766106 5 0.313809 6 0.783459 7 0.325427 8 0.380389 9 0.418716 Name: a, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 134-137 .. code-block:: default print(s.name) .. rst-class:: sphx-glr-script-out .. code-block:: none a .. GENERATED FROM PYTHON SOURCE LINES 138-139 the Series is literally the data structure for a single column of a DataFrame. .. GENERATED FROM PYTHON SOURCE LINES 141-147 .. code-block:: default df = pd.DataFrame(np.random.random((10, 3)), columns=['a', 'b', 'c'], index=[2000+i for i in range(10)]) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 2000 0.611481 0.646587 0.884365 2001 0.884323 0.265456 0.499323 2002 0.035322 0.413177 0.404945 2003 0.138943 0.159374 0.096236 2004 0.007718 0.173929 0.156492 2005 0.244785 0.159650 0.829702 2006 0.226925 0.632816 0.999603 2007 0.251388 0.524466 0.769429 2008 0.024436 0.454759 0.473400 2009 0.611078 0.184098 0.176071 .. GENERATED FROM PYTHON SOURCE LINES 148-149 A single column in a DataFrame is a Series. .. GENERATED FROM PYTHON SOURCE LINES 149-152 .. code-block:: default print(type(df['a'])) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 153-159 .. code-block:: default s = pd.Series(np.random.random(10), index=[2000+i for i in range(10)], name="a") print(s) .. rst-class:: sphx-glr-script-out .. code-block:: none 2000 0.520191 2001 0.355112 2002 0.645158 2003 0.384966 2004 0.996232 2005 0.068262 2006 0.720012 2007 0.611447 2008 0.931790 2009 0.001561 Name: a, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 160-162 Since pandas is based upon numpy arrays. We can extract actual numpy arrays from DataFrame using `.values` method. .. GENERATED FROM PYTHON SOURCE LINES 162-164 .. code-block:: default print(df.values) .. rst-class:: sphx-glr-script-out .. code-block:: none [[0.61148101 0.64658729 0.8843646 ] [0.88432322 0.26545644 0.499323 ] [0.03532233 0.41317685 0.40494501] [0.13894266 0.15937354 0.0962357 ] [0.00771782 0.17392933 0.15649219] [0.24478489 0.15965015 0.82970221] [0.22692541 0.63281557 0.99960347] [0.25138839 0.52446576 0.76942873] [0.0244364 0.4547589 0.47339962] [0.61107766 0.1840978 0.17607135]] .. GENERATED FROM PYTHON SOURCE LINES 165-168 .. code-block:: default print(type(df.values)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 169-175 .. code-block:: default df = pd.DataFrame(np.random.randint(0, 14, (10, 3)), columns=['a', 'b', 'c'], index=[2000+i for i in range(10)]) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 2000 12 10 1 2001 6 5 2 2002 2 0 4 2003 12 1 12 2004 11 1 0 2005 1 9 13 2006 6 3 11 2007 9 0 11 2008 7 7 0 2009 5 8 2 .. GENERATED FROM PYTHON SOURCE LINES 176-179 .. code-block:: default print(type(df.values)) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 180-183 .. code-block:: default print(df.values.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (10, 3) .. GENERATED FROM PYTHON SOURCE LINES 184-187 .. code-block:: default df.describe() .. raw:: html
a b c
count 10.000000 10.000000 10.000000
mean 7.100000 4.400000 5.600000
std 3.900142 3.893014 5.440588
min 1.000000 0.000000 0.000000
25% 5.250000 1.000000 1.250000
50% 6.500000 4.000000 3.000000
75% 10.500000 7.750000 11.000000
max 12.000000 10.000000 13.000000


.. GENERATED FROM PYTHON SOURCE LINES 188-191 .. code-block:: default df.head() .. raw:: html
a b c
2000 12 10 1
2001 6 5 2
2002 2 0 4
2003 12 1 12
2004 11 1 0


.. GENERATED FROM PYTHON SOURCE LINES 192-195 .. code-block:: default df.head(8) .. raw:: html
a b c
2000 12 10 1
2001 6 5 2
2002 2 0 4
2003 12 1 12
2004 11 1 0
2005 1 9 13
2006 6 3 11
2007 9 0 11


.. GENERATED FROM PYTHON SOURCE LINES 196-197 Get the last N rows of a DataFrame .. GENERATED FROM PYTHON SOURCE LINES 199-202 .. code-block:: default df.tail() .. raw:: html
a b c
2005 1 9 13
2006 6 3 11
2007 9 0 11
2008 7 7 0
2009 5 8 2


.. GENERATED FROM PYTHON SOURCE LINES 203-206 .. code-block:: default df.tail(7) .. raw:: html
a b c
2003 12 1 12
2004 11 1 0
2005 1 9 13
2006 6 3 11
2007 9 0 11
2008 7 7 0
2009 5 8 2


.. GENERATED FROM PYTHON SOURCE LINES 207-211 .. code-block:: default df.mean() .. rst-class:: sphx-glr-script-out .. code-block:: none a 7.1 b 4.4 c 5.6 dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 212-215 .. code-block:: default df.to_dict() .. rst-class:: sphx-glr-script-out .. code-block:: none {'a': {2000: 12, 2001: 6, 2002: 2, 2003: 12, 2004: 11, 2005: 1, 2006: 6, 2007: 9, 2008: 7, 2009: 5}, 'b': {2000: 10, 2001: 5, 2002: 0, 2003: 1, 2004: 1, 2005: 9, 2006: 3, 2007: 0, 2008: 7, 2009: 8}, 'c': {2000: 1, 2001: 2, 2002: 4, 2003: 12, 2004: 0, 2005: 13, 2006: 11, 2007: 11, 2008: 0, 2009: 2}} .. GENERATED FROM PYTHON SOURCE LINES 216-219 .. code-block:: default df.to_dict('list') .. rst-class:: sphx-glr-script-out .. code-block:: none {'a': [12, 6, 2, 12, 11, 1, 6, 9, 7, 5], 'b': [10, 5, 0, 1, 1, 9, 3, 0, 7, 8], 'c': [1, 2, 4, 12, 0, 13, 11, 11, 0, 2]} .. GENERATED FROM PYTHON SOURCE LINES 220-224 .. code-block:: default df['d'] = np.random.randint(0, 10, (10,)) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c d 2000 12 10 1 4 2001 6 5 2 4 2002 2 0 4 6 2003 12 1 12 9 2004 11 1 0 5 2005 1 9 13 5 2006 6 3 11 5 2007 9 0 11 5 2008 7 7 0 0 2009 5 8 2 8 .. GENERATED FROM PYTHON SOURCE LINES 225-229 .. code-block:: default df.pop('d') print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 2000 12 10 1 2001 6 5 2 2002 2 0 4 2003 12 1 12 2004 11 1 0 2005 1 9 13 2006 6 3 11 2007 9 0 11 2008 7 7 0 2009 5 8 2 .. GENERATED FROM PYTHON SOURCE LINES 230-234 .. code-block:: default df.columns = ['x', 'y', 'z'] print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none x y z 2000 12 10 1 2001 6 5 2 2002 2 0 4 2003 12 1 12 2004 11 1 0 2005 1 9 13 2006 6 3 11 2007 9 0 11 2008 7 7 0 2009 5 8 2 .. GENERATED FROM PYTHON SOURCE LINES 235-236 row count of pandas dataframe .. GENERATED FROM PYTHON SOURCE LINES 238-241 .. code-block:: default len(df.index) .. rst-class:: sphx-glr-script-out .. code-block:: none 10 .. GENERATED FROM PYTHON SOURCE LINES 242-245 .. code-block:: default print(df.shape[0]) .. rst-class:: sphx-glr-script-out .. code-block:: none 10 .. GENERATED FROM PYTHON SOURCE LINES 246-247 change the order of DataFrame columns .. GENERATED FROM PYTHON SOURCE LINES 249-255 .. code-block:: default cols = df.columns.tolist() cols = cols[-1:] + cols[:-1] df = df[cols] print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none z x y 2000 1 12 10 2001 2 6 5 2002 4 2 0 2003 12 12 1 2004 0 11 1 2005 13 1 9 2006 11 6 3 2007 11 9 0 2008 0 7 7 2009 2 5 8 .. GENERATED FROM PYTHON SOURCE LINES 256-257 drop rows of Pandas DataFrame whose value in a certain column is NaN .. GENERATED FROM PYTHON SOURCE LINES 259-263 .. code-block:: default df = pd.DataFrame(np.random.randn(6,3)) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 0.792738 -2.566105 0.434209 1 1.059024 -0.031717 0.358016 2 0.301728 0.249756 -0.682732 3 -0.944110 1.394479 1.437554 4 1.052420 1.079581 0.149270 5 -1.030245 -0.500275 -2.613331 .. GENERATED FROM PYTHON SOURCE LINES 264-268 .. code-block:: default df.iloc[::2,0] = np.nan; df.iloc[::4,2] = np.nan; df.iloc[::3,2] = np.nan print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 NaN -2.566105 NaN 1 1.059024 -0.031717 0.358016 2 NaN 0.249756 -0.682732 3 -0.944110 1.394479 NaN 4 NaN 1.079581 NaN 5 -1.030245 -0.500275 -2.613331 .. GENERATED FROM PYTHON SOURCE LINES 269-270 dropping all rows having NaN values .. GENERATED FROM PYTHON SOURCE LINES 270-272 .. code-block:: default df.dropna() .. raw:: html
0 1 2
1 1.059024 -0.031717 0.358016
5 -1.030245 -0.500275 -2.613331


.. GENERATED FROM PYTHON SOURCE LINES 273-274 dropping NaN in specific columns .. GENERATED FROM PYTHON SOURCE LINES 274-276 .. code-block:: default print(df[df[2].notna()]) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 1 1.059024 -0.031717 0.358016 2 NaN 0.249756 -0.682732 5 -1.030245 -0.500275 -2.613331 .. GENERATED FROM PYTHON SOURCE LINES 277-278 count the NaN values in a column in DataFrame .. GENERATED FROM PYTHON SOURCE LINES 280-285 .. code-block:: default df = pd.DataFrame(np.random.randn(6,3)) df.iloc[::2,0] = np.nan; df.iloc[::4,2] = np.nan; df.iloc[::3,2] = np.nan print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 NaN 0.139581 NaN 1 -1.509370 1.762801 -1.067253 2 NaN 0.095276 -0.777572 3 0.492278 0.061235 NaN 4 NaN -0.038476 NaN 5 -0.067574 -0.885610 -0.721969 .. GENERATED FROM PYTHON SOURCE LINES 286-289 .. code-block:: default df.isna().sum() .. rst-class:: sphx-glr-script-out .. code-block:: none 0 3 1 0 2 3 dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 290-291 for columns .. GENERATED FROM PYTHON SOURCE LINES 291-293 .. code-block:: default df.isnull().sum(axis = 0) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 3 1 0 2 3 dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 294-295 for rows .. GENERATED FROM PYTHON SOURCE LINES 295-297 .. code-block:: default df.isnull().sum(axis = 1) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 2 1 0 2 1 3 1 4 2 5 0 dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 298-299 check if any value is NaN in a DataFrame .. GENERATED FROM PYTHON SOURCE LINES 301-306 .. code-block:: default df = pd.DataFrame(np.random.randn(6,3)) df.iloc[::2,0] = np.nan; df.iloc[::4,2] = np.nan; df.iloc[::3,2] = np.nan print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 NaN 0.163625 NaN 1 -1.087252 2.721640 -0.415199 2 NaN 1.432081 -0.653284 3 1.336880 0.197294 NaN 4 NaN 0.015589 NaN 5 1.388623 0.203641 1.691277 .. GENERATED FROM PYTHON SOURCE LINES 307-308 how many NaN .. GENERATED FROM PYTHON SOURCE LINES 308-310 .. code-block:: default df.isnull() .. raw:: html
0 1 2
0 True False True
1 False False False
2 True False False
3 False False True
4 True False True
5 False False False


.. GENERATED FROM PYTHON SOURCE LINES 311-312 column wise .. GENERATED FROM PYTHON SOURCE LINES 312-314 .. code-block:: default df.isnull().any() .. rst-class:: sphx-glr-script-out .. code-block:: none 0 True 1 False 2 True dtype: bool .. GENERATED FROM PYTHON SOURCE LINES 315-316 if there is any NaN in entire data .. GENERATED FROM PYTHON SOURCE LINES 316-318 .. code-block:: default df.isnull().any().any() .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 319-320 replace NaN values by Zeroes in a column of a Dataframe? .. GENERATED FROM PYTHON SOURCE LINES 322-327 .. code-block:: default df = pd.DataFrame(np.random.randn(6,3)) df.iloc[::2,0] = np.nan; df.iloc[::4,2] = np.nan; df.iloc[::3,2] = np.nan print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 NaN -0.168681 NaN 1 0.563927 0.017890 -1.375824 2 NaN -0.810597 -1.174800 3 -0.722840 0.883346 NaN 4 NaN -0.310959 NaN 5 0.940451 -1.261573 -1.127101 .. GENERATED FROM PYTHON SOURCE LINES 328-331 .. code-block:: default df.fillna(0) .. raw:: html
0 1 2
0 0.000000 -0.168681 0.000000
1 0.563927 0.017890 -1.375824
2 0.000000 -0.810597 -1.174800
3 -0.722840 0.883346 0.000000
4 0.000000 -0.310959 0.000000
5 0.940451 -1.261573 -1.127101


.. GENERATED FROM PYTHON SOURCE LINES 332-333 To fill the NaNs in only one column .. GENERATED FROM PYTHON SOURCE LINES 333-337 .. code-block:: default df[2].fillna(0, inplace=True) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 NaN -0.168681 0.000000 1 0.563927 0.017890 -1.375824 2 NaN -0.810597 -1.174800 3 -0.722840 0.883346 0.000000 4 NaN -0.310959 0.000000 5 0.940451 -1.261573 -1.127101 .. GENERATED FROM PYTHON SOURCE LINES 338-339 check if a column exists in Pandas .. GENERATED FROM PYTHON SOURCE LINES 341-345 .. code-block:: default df = pd.DataFrame(np.random.randn(6,3)) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 0 0.775496 0.028101 0.380929 1 -0.932216 0.469528 -0.663859 2 -0.616558 -1.267830 0.580904 3 -1.582028 -0.355916 0.460871 4 0.672658 -1.117510 0.144625 5 0.613005 0.732261 -0.276066 .. GENERATED FROM PYTHON SOURCE LINES 346-350 .. code-block:: default if 0 in df.columns: print("true") .. rst-class:: sphx-glr-script-out .. code-block:: none true .. GENERATED FROM PYTHON SOURCE LINES 351-352 Python dict into a dataframe .. GENERATED FROM PYTHON SOURCE LINES 354-371 .. code-block:: default d = { '2012-06-08': 388, '2012-06-09': 388, '2012-06-10': 388, '2012-06-11': 389, '2012-06-12': 389, '2012-06-13': 389, '2012-06-14': 389, '2012-06-15': 389, '2012-06-16': 389, '2012-06-17': 389, '2012-06-18': 390, '2012-06-19': 390, '2012-06-20': 390, } .. GENERATED FROM PYTHON SOURCE LINES 372-375 .. code-block:: default pd.DataFrame(d.items()) .. raw:: html
0 1
0 2012-06-08 388
1 2012-06-09 388
2 2012-06-10 388
3 2012-06-11 389
4 2012-06-12 389
5 2012-06-13 389
6 2012-06-14 389
7 2012-06-15 389
8 2012-06-16 389
9 2012-06-17 389
10 2012-06-18 390
11 2012-06-19 390
12 2012-06-20 390


.. GENERATED FROM PYTHON SOURCE LINES 376-379 .. code-block:: default pd.DataFrame(d.items(), columns=['Date', 'DateValue']) .. raw:: html
Date DateValue
0 2012-06-08 388
1 2012-06-09 388
2 2012-06-10 388
3 2012-06-11 389
4 2012-06-12 389
5 2012-06-13 389
6 2012-06-14 389
7 2012-06-15 389
8 2012-06-16 389
9 2012-06-17 389
10 2012-06-18 390
11 2012-06-19 390
12 2012-06-20 390


.. GENERATED FROM PYTHON SOURCE LINES 380-382 uncomment following line pd.DataFrame(d) # ValueError: If using all scalar values, you must pass an index .. GENERATED FROM PYTHON SOURCE LINES 385-387 .. code-block:: default pd.DataFrame([d]) .. raw:: html
2012-06-08 2012-06-09 2012-06-10 2012-06-11 2012-06-12 2012-06-13 2012-06-14 2012-06-15 2012-06-16 2012-06-17 2012-06-18 2012-06-19 2012-06-20
0 388 388 388 389 389 389 389 389 389 389 390 390 390


.. GENERATED FROM PYTHON SOURCE LINES 388-391 .. code-block:: default pd.DataFrame.from_dict(d, orient='index', columns=['DateVaue']) .. raw:: html
DateVaue
2012-06-08 388
2012-06-09 388
2012-06-10 388
2012-06-11 389
2012-06-12 389
2012-06-13 389
2012-06-14 389
2012-06-15 389
2012-06-16 389
2012-06-17 389
2012-06-18 390
2012-06-19 390
2012-06-20 390


.. GENERATED FROM PYTHON SOURCE LINES 392-393 Count the frequency that a value occurs in a dataframe column .. GENERATED FROM PYTHON SOURCE LINES 395-401 .. code-block:: default df = pd.DataFrame(np.random.randint(0, 14, (10, 3)), columns=['a', 'b', 'c'], index=[2000+i for i in range(10)]) df['a'].value_counts() .. rst-class:: sphx-glr-script-out .. code-block:: none 6 2 2 2 12 2 9 1 11 1 0 1 5 1 Name: a, dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 402-406 .. code-block:: default for index, row in df.iterrows(): print(index, row, '\n') .. rst-class:: sphx-glr-script-out .. code-block:: none 2000 a 6 b 12 c 5 Name: 2000, dtype: int64 2001 a 2 b 1 c 10 Name: 2001, dtype: int64 2002 a 6 b 0 c 12 Name: 2002, dtype: int64 2003 a 9 b 8 c 7 Name: 2003, dtype: int64 2004 a 11 b 11 c 4 Name: 2004, dtype: int64 2005 a 0 b 0 c 7 Name: 2005, dtype: int64 2006 a 12 b 6 c 8 Name: 2006, dtype: int64 2007 a 12 b 5 c 1 Name: 2007, dtype: int64 2008 a 2 b 2 c 13 Name: 2008, dtype: int64 2009 a 5 b 2 c 13 Name: 2009, dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 407-412 .. code-block:: default df = pd.DataFrame(np.random.randint(0, 14, (10, 3)), columns=['a', 'b', 'c']) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 0 9 10 13 1 7 7 3 2 5 2 13 3 8 8 7 4 7 6 3 5 3 1 5 6 4 10 7 7 0 8 5 8 12 12 12 9 13 4 9 .. GENERATED FROM PYTHON SOURCE LINES 413-416 .. code-block:: default print(df['a']/df['b']) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 0.900000 1 1.000000 2 2.500000 3 1.000000 4 1.166667 5 3.000000 6 0.400000 7 0.000000 8 1.000000 9 3.250000 dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 417-418 add an empty column to a dataframe? .. GENERATED FROM PYTHON SOURCE LINES 420-424 .. code-block:: default df["d"] = "" print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c d 0 9 10 13 1 7 7 3 2 5 2 13 3 8 8 7 4 7 6 3 5 3 1 5 6 4 10 7 7 0 8 5 8 12 12 12 9 13 4 9 .. GENERATED FROM PYTHON SOURCE LINES 425-428 .. code-block:: default print(df['d']) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 1 2 3 4 5 6 7 8 9 Name: d, dtype: object .. GENERATED FROM PYTHON SOURCE LINES 429-433 .. code-block:: default df["d"] = np.nan print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c d 0 9 10 13 NaN 1 7 7 3 NaN 2 5 2 13 NaN 3 8 8 7 NaN 4 7 6 3 NaN 5 3 1 5 NaN 6 4 10 7 NaN 7 0 8 5 NaN 8 12 12 12 NaN 9 13 4 9 NaN .. GENERATED FROM PYTHON SOURCE LINES 434-435 What does axis in pandas mean? .. GENERATED FROM PYTHON SOURCE LINES 437-440 .. code-block:: default df.mean(axis=0) .. rst-class:: sphx-glr-script-out .. code-block:: none a 6.8 b 6.8 c 7.7 d NaN dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 441-444 .. code-block:: default df.mean(axis=1) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 10.666667 1 5.666667 2 6.666667 3 7.666667 4 5.333333 5 3.000000 6 7.000000 7 4.333333 8 12.000000 9 8.666667 dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 445-446 Replace NaN with blank/empty string .. GENERATED FROM PYTHON SOURCE LINES 448-451 .. code-block:: default df.replace(9, np.nan) .. raw:: html
a b c d
0 NaN 10 13.0 NaN
1 7.0 7 3.0 NaN
2 5.0 2 13.0 NaN
3 8.0 8 7.0 NaN
4 7.0 6 3.0 NaN
5 3.0 1 5.0 NaN
6 4.0 10 7.0 NaN
7 0.0 8 5.0 NaN
8 12.0 12 12.0 NaN
9 13.0 4 NaN NaN


.. GENERATED FROM PYTHON SOURCE LINES 452-455 .. code-block:: default df.replace(np.nan, '') .. raw:: html
a b c d
0 9 10 13
1 7 7 3
2 5 2 13
3 8 8 7
4 7 6 3
5 3 1 5
6 4 10 7
7 0 8 5
8 12 12 12
9 13 4 9


.. GENERATED FROM PYTHON SOURCE LINES 456-457 Rename specific column(s) in pandas .. GENERATED FROM PYTHON SOURCE LINES 459-463 .. code-block:: default df = pd.DataFrame(np.random.randint(0, 14, (10, 3)), columns=['a', 'b', 'c']) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none a b c 0 2 4 12 1 9 13 2 2 11 1 4 3 9 9 5 4 9 5 4 5 12 0 1 6 10 2 11 7 6 6 1 8 12 7 4 9 6 9 7 .. GENERATED FROM PYTHON SOURCE LINES 464-468 .. code-block:: default df.rename(columns={'a':'log(A)'}, inplace=True) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none log(A) b c 0 2 4 12 1 9 13 2 2 11 1 4 3 9 9 5 4 9 5 4 5 12 0 1 6 10 2 11 7 6 6 1 8 12 7 4 9 6 9 7 .. GENERATED FROM PYTHON SOURCE LINES 469-470 print DataFrame without index .. GENERATED FROM PYTHON SOURCE LINES 473-476 .. code-block:: default print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none log(A) b c 0 2 4 12 1 9 13 2 2 11 1 4 3 9 9 5 4 9 5 4 5 12 0 1 6 10 2 11 7 6 6 1 8 12 7 4 9 6 9 7 .. GENERATED FROM PYTHON SOURCE LINES 477-480 .. code-block:: default df.style.hide_index() .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/python-seekho/checkouts/latest/scripts/pandas/dataframe_vs_series.py:478: FutureWarning: this method is deprecated in favour of `Styler.hide(axis="index")` df.style.hide_index() .. raw:: html
log(A) b c
2 4 12
9 13 2
11 1 4
9 9 5
9 5 4
12 0 1
10 2 11
6 6 1
12 7 4
6 9 7


.. GENERATED FROM PYTHON SOURCE LINES 481-482 replace nan values with average of columns .. GENERATED FROM PYTHON SOURCE LINES 484-487 .. code-block:: default df.fillna(df.mean()) .. raw:: html
log(A) b c
0 2 4 12
1 9 13 2
2 11 1 4
3 9 9 5
4 9 5 4
5 12 0 1
6 10 2 11
7 6 6 1
8 12 7 4
9 6 9 7


.. GENERATED FROM PYTHON SOURCE LINES 488-489 retrieve the number of columns in a dataframe? .. GENERATED FROM PYTHON SOURCE LINES 491-494 .. code-block:: default len(df.columns) .. rst-class:: sphx-glr-script-out .. code-block:: none 3 .. GENERATED FROM PYTHON SOURCE LINES 495-498 .. code-block:: default print(df.shape[1]) .. rst-class:: sphx-glr-script-out .. code-block:: none 3 .. GENERATED FROM PYTHON SOURCE LINES 499-501 We can create empty DataFrame by telling how many columns should exist or how many rows should exist. .. GENERATED FROM PYTHON SOURCE LINES 503-507 .. code-block:: default df = pd.DataFrame(columns=['A','B','C','D','E','F','G']) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none Empty DataFrame Columns: [A, B, C, D, E, F, G] Index: [] .. GENERATED FROM PYTHON SOURCE LINES 508-511 .. code-block:: default print(df.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (0, 7) .. GENERATED FROM PYTHON SOURCE LINES 512-516 .. code-block:: default df = pd.DataFrame(index=range(1,8)) print(df) .. rst-class:: sphx-glr-script-out .. code-block:: none Empty DataFrame Columns: [] Index: [1, 2, 3, 4, 5, 6, 7] .. GENERATED FROM PYTHON SOURCE LINES 517-519 .. code-block:: default print(df.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (7, 0) .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.461 seconds) .. _sphx_glr_download_auto_examples_pandas_dataframe_vs_series.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/AtrCheema/python-seekho/master?urlpath=lab/tree/notebooks/auto_examples/pandas/dataframe_vs_series.ipynb :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: dataframe_vs_series.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: dataframe_vs_series.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_