5.9 pivot vs melt
Contents
Note
Click here to download the full example code or to run this example in your browser via Binder
5.9 pivot vs melt#
import time
import pandas as pd
print(time.asctime())
print(pd.__version__)
Mon Nov 11 07:57:23 2024
1.5.3
Let’s consider a dataframe which consists of daily streamflow data from 840 polish stations for the month of January 2020. The data is available in a zip file. We can read the data using the read_csv method.
url = "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2020/codz_2020_01.zip"
df = pd.read_csv(
url, compression='zip', encoding="ISO-8859-1", engine='python',
on_bad_lines="skip",
names=['stn_id', 'year', 'day', 'q_cms', 'month'],
usecols=[0, 3, 5, 7, 9],
dtype={'stn_id': 'str', 'year': 'int', 'day': 'int', 'q_cms': 'float', 'month': 'int'},
parse_dates={'date': ['year', 'month', 'day']},
index_col='date'
)
df
print(df.shape)
(24913, 2)
The above dataframe consists of data for 480 stations, each stacked on top of the other.
len(df['stn_id'].unique())
840
If we want data for each station in a separate column, we can use the pivot_table method.
pivoted_table = df.pivot_table(index=df.index, columns="stn_id", values="q_cms")
pivoted_table
(30, 840)
Index([' 149180010', ' 149180020', ' 149180030', ' 149180040', ' 149180050',
' 149180060', ' 149180070', ' 149180080', ' 149180090', ' 149180100',
...
' 154220060', ' 154220070', ' 154220080', ' 154220090', ' 154220100',
' 154220110', ' 154230010', ' 154230020', ' 154230030', ' 154230040'],
dtype='object', name='stn_id', length=840)
840
Melt#
Melt is kind of opposite to that of pivot. It stacks the columns on top of each other.
melted_table = df.melt(id_vars=["stn_id"], value_vars=["q_cms"])
melted_table
Total running time of the script: ( 0 minutes 1.761 seconds)