5.9 pivot vs melt#

import time
import pandas as pd

print(time.asctime())
print(pd.__version__)
Mon Nov 11 07:57:23 2024
1.5.3

Let’s consider a dataframe which consists of daily streamflow data from 840 polish stations for the month of January 2020. The data is available in a zip file. We can read the data using the read_csv method.

url = "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2020/codz_2020_01.zip"

df = pd.read_csv(
    url, compression='zip', encoding="ISO-8859-1",  engine='python',
    on_bad_lines="skip",
    names=['stn_id', 'year', 'day', 'q_cms', 'month'],
    usecols=[0, 3, 5, 7, 9],
    dtype={'stn_id': 'str', 'year': 'int', 'day': 'int', 'q_cms': 'float', 'month': 'int'},
    parse_dates={'date': ['year', 'month', 'day']},
    index_col='date'
    )


df
stn_id q_cms
date
2020-11-01 149180020 12.90
2020-11-02 149180020 12.50
2020-11-03 149180020 12.10
2020-11-04 149180020 12.60
2020-11-05 149180020 13.40
... ... ...
2020-11-26 149190250 0.44
2020-11-27 149190250 0.44
2020-11-28 149190250 0.44
2020-11-29 149190250 0.44
2020-11-30 149190250 0.44

24913 rows × 2 columns



print(df.shape)
(24913, 2)

The above dataframe consists of data for 480 stations, each stacked on top of the other.

len(df['stn_id'].unique())
840

If we want data for each station in a separate column, we can use the pivot_table method.

pivoted_table = df.pivot_table(index=df.index, columns="stn_id", values="q_cms")

pivoted_table
stn_id 149180010 149180020 149180030 149180040 149180050 ... 154220110 154230010 154230020 154230030 154230040
date
2020-11-01 17.6 12.9 3.44 0.47 0.53 ... 0.41 1.30 99999.999 2.38 0.69
2020-11-02 17.0 12.5 3.53 0.47 0.86 ... 0.42 1.26 99999.999 2.38 0.72
2020-11-03 16.6 12.1 3.47 0.43 0.95 ... 0.42 1.26 99999.999 2.41 0.70
2020-11-04 16.8 12.6 3.42 0.47 1.87 ... 0.44 1.30 99999.999 2.41 0.71
2020-11-05 18.9 13.4 4.52 0.47 1.62 ... 0.45 1.30 99999.999 2.41 0.75
... ... ... ... ... ... ... ... ... ... ... ...
2020-11-26 21.5 15.3 4.57 0.60 0.39 ... 0.45 1.25 99999.999 2.49 0.76
2020-11-27 22.4 15.8 4.79 0.60 0.47 ... 0.46 1.29 99999.999 2.53 0.76
2020-11-28 21.8 15.3 4.70 0.56 0.53 ... 0.47 1.26 99999.999 2.46 0.76
2020-11-29 20.7 14.4 4.96 0.60 0.47 ... 0.48 1.26 99999.999 2.50 0.78
2020-11-30 20.9 14.0 5.46 0.64 0.42 ... 0.51 1.27 99999.999 2.50 0.79

30 rows × 840 columns



(30, 840)
Index([' 149180010', ' 149180020', ' 149180030', ' 149180040', ' 149180050',
       ' 149180060', ' 149180070', ' 149180080', ' 149180090', ' 149180100',
       ...
       ' 154220060', ' 154220070', ' 154220080', ' 154220090', ' 154220100',
       ' 154220110', ' 154230010', ' 154230020', ' 154230030', ' 154230040'],
      dtype='object', name='stn_id', length=840)
840

Melt#

Melt is kind of opposite to that of pivot. It stacks the columns on top of each other.

melted_table = df.melt(id_vars=["stn_id"], value_vars=["q_cms"])

melted_table
stn_id variable value
0 149180020 q_cms 12.90
1 149180020 q_cms 12.50
2 149180020 q_cms 12.10
3 149180020 q_cms 12.60
4 149180020 q_cms 13.40
... ... ... ...
24908 149190250 q_cms 0.44
24909 149190250 q_cms 0.44
24910 149190250 q_cms 0.44
24911 149190250 q_cms 0.44
24912 149190250 q_cms 0.44

24913 rows × 3 columns



Total running time of the script: ( 0 minutes 1.761 seconds)

Gallery generated by Sphinx-Gallery