Note

Click here to download the full example code or to run this example in your browser via Binder

5.9 pivot vs melt#

import time
import pandas as pd

print(time.asctime())
print(pd.__version__)

Mon Nov 11 07:57:23 2024
1.5.3

Let’s consider a dataframe which consists of daily streamflow data from 840 polish stations for the month of January 2020. The data is available in a zip file. We can read the data using the read_csv method.

url = "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2020/codz_2020_01.zip"

df = pd.read_csv(
    url, compression='zip', encoding="ISO-8859-1",  engine='python',
    on_bad_lines="skip",
    names=['stn_id', 'year', 'day', 'q_cms', 'month'],
    usecols=[0, 3, 5, 7, 9],
    dtype={'stn_id': 'str', 'year': 'int', 'day': 'int', 'q_cms': 'float', 'month': 'int'},
    parse_dates={'date': ['year', 'month', 'day']},
    index_col='date'
    )


df

	stn_id	q_cms
date
2020-11-01	149180020	12.90
2020-11-02	149180020	12.50
2020-11-03	149180020	12.10
2020-11-04	149180020	12.60
2020-11-05	149180020	13.40
...	...	...
2020-11-26	149190250	0.44
2020-11-27	149190250	0.44
2020-11-28	149190250	0.44
2020-11-29	149190250	0.44
2020-11-30	149190250	0.44

24913 rows × 2 columns

print(df.shape)

(24913, 2)

The above dataframe consists of data for 480 stations, each stacked on top of the other.

len(df['stn_id'].unique())

If we want data for each station in a separate column, we can use the pivot_table method.

pivoted_table = df.pivot_table(index=df.index, columns="stn_id", values="q_cms")

pivoted_table

stn_id	149180010	149180020	149180030	149180040	149180050	...	154220110	154230010	154230020	154230030	154230040
date
2020-11-01	17.6	12.9	3.44	0.47	0.53	...	0.41	1.30	99999.999	2.38	0.69
2020-11-02	17.0	12.5	3.53	0.47	0.86	...	0.42	1.26	99999.999	2.38	0.72
2020-11-03	16.6	12.1	3.47	0.43	0.95	...	0.42	1.26	99999.999	2.41	0.70
2020-11-04	16.8	12.6	3.42	0.47	1.87	...	0.44	1.30	99999.999	2.41	0.71
2020-11-05	18.9	13.4	4.52	0.47	1.62	...	0.45	1.30	99999.999	2.41	0.75
...	...	...	...	...	...	...	...	...	...	...	...
2020-11-26	21.5	15.3	4.57	0.60	0.39	...	0.45	1.25	99999.999	2.49	0.76
2020-11-27	22.4	15.8	4.79	0.60	0.47	...	0.46	1.29	99999.999	2.53	0.76
2020-11-28	21.8	15.3	4.70	0.56	0.53	...	0.47	1.26	99999.999	2.46	0.76
2020-11-29	20.7	14.4	4.96	0.60	0.47	...	0.48	1.26	99999.999	2.50	0.78
2020-11-30	20.9	14.0	5.46	0.64	0.42	...	0.51	1.27	99999.999	2.50	0.79

30 rows × 840 columns

pivoted_table.shape

(30, 840)

pivoted_table.columns

Index([' 149180010', ' 149180020', ' 149180030', ' 149180040', ' 149180050',
       ' 149180060', ' 149180070', ' 149180080', ' 149180090', ' 149180100',
       ...
       ' 154220060', ' 154220070', ' 154220080', ' 154220090', ' 154220100',
       ' 154220110', ' 154230010', ' 154230020', ' 154230030', ' 154230040'],
      dtype='object', name='stn_id', length=840)

len(pivoted_table.columns)

Melt#

Melt is kind of opposite to that of pivot. It stacks the columns on top of each other.

melted_table = df.melt(id_vars=["stn_id"], value_vars=["q_cms"])

melted_table

	stn_id	variable	value
0	149180020	q_cms	12.90
1	149180020	q_cms	12.50
2	149180020	q_cms	12.10
3	149180020	q_cms	12.60
4	149180020	q_cms	13.40
...	...	...	...
24908	149190250	q_cms	0.44
24909	149190250	q_cms	0.44
24910	149190250	q_cms	0.44
24911	149190250	q_cms	0.44
24912	149190250	q_cms	0.44

24913 rows × 3 columns

Total running time of the script: ( 0 minutes 1.761 seconds)

Gallery generated by Sphinx-Gallery

5.9 pivot vs melt

Contents

5.9 pivot vs melt#

Melt#