Note

Click here to download the full example code or to run this example in your browser via Binder

5.4 reading/writing#

This file describes how to read data from files and write data into files using pandas.

Important

This lesson is still under development.

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/AtrCheema/AI4Water/master/ai4water/datasets/arg_busan.csv")

type(df)

df

	index	tide_cm	wat_temp_c	sal_psu	air_temp_c	...	aac_coppml	Total_otus	otu_5575	otu_273	otu_94
0	6/19/2018 0:00	36.407149	19.321232	33.956058	19.780000	...	NaN	NaN	NaN	NaN	NaN
1	6/19/2018 0:30	35.562515	19.320124	33.950508	19.093333	...	NaN	NaN	NaN	NaN	NaN
2	6/19/2018 1:00	34.808016	19.319666	33.942532	18.733333	...	NaN	NaN	NaN	NaN	NaN
3	6/19/2018 1:30	30.645216	19.320406	33.931263	18.760000	...	NaN	NaN	NaN	NaN	NaN
4	6/19/2018 2:00	26.608980	19.326729	33.917961	18.633333	...	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...
1441	9/7/2019 22:00	-3.989912	20.990612	33.776449	23.700000	...	NaN	NaN	NaN	NaN	NaN
1442	9/7/2019 22:30	-2.807042	21.012014	33.702310	23.620000	...	NaN	NaN	NaN	NaN	NaN
1443	9/7/2019 23:00	-3.471326	20.831739	33.726177	23.666667	...	NaN	NaN	NaN	NaN	NaN
1444	9/7/2019 23:30	0.707771	21.006086	33.716274	23.633333	...	NaN	NaN	NaN	NaN	NaN
1445	9/8/2019 0:00	1.011731	20.896149	33.729773	23.600000	...	NaN	NaN	NaN	NaN	NaN

1446 rows × 26 columns

df.to_csv("arg_busan.csv")

The index of df was 0,1,2,… By default, to_csv function writes the index to csv

print(df.index)

RangeIndex(start=0, stop=1446, step=1)

print(df.index.name)

None

we can avoid writing the index to csv file by setting index=False.

df.to_csv("arg_busan.csv", index=False)

we can also explicitly tell pandas what label for index to use when writing the index to csv file.

df.to_csv("arg_busan.csv", index_label="index")

if we want to save a dataframe to Excel file we can do it as following

df.to_excel("arg_busan.xlsx")  # we must have ``openpyxl`` package for that

we can define the sheet name and exlcude the index as following

df.to_excel("arg_busan.xlsx", index=False, sheet_name="data")

to read the excel file as dataframe we can make use of read_excel function

df = pd.read_excel("arg_busan.xlsx")
print(df)

               index    tide_cm  wat_temp_c    sal_psu  air_temp_c  ...  \
   6/19/2018 0:00  36.407149   19.321232  33.956058   19.780000  ...
   6/19/2018 0:30  35.562515   19.320124  33.950508   19.093333  ...
   6/19/2018 1:00  34.808016   19.319666  33.942532   18.733333  ...
   6/19/2018 1:30  30.645216   19.320406  33.931263   18.760000  ...
   6/19/2018 2:00  26.608980   19.326729  33.917961   18.633333  ...
...              ...        ...         ...        ...         ...  ...
9/7/2019 22:00  -3.989912   20.990612  33.776449   23.700000  ...
9/7/2019 22:30  -2.807042   21.012014  33.702310   23.620000  ...
9/7/2019 23:00  -3.471326   20.831739  33.726177   23.666667  ...
9/7/2019 23:30   0.707771   21.006086  33.716274   23.633333  ...
 9/8/2019 0:00   1.011731   20.896149  33.729773   23.600000  ...

      aac_coppml  Total_otus  otu_5575  otu_273  otu_94
          NaN         NaN       NaN      NaN     NaN
          NaN         NaN       NaN      NaN     NaN
          NaN         NaN       NaN      NaN     NaN
          NaN         NaN       NaN      NaN     NaN
          NaN         NaN       NaN      NaN     NaN
...          ...         ...       ...      ...     ...
       NaN         NaN       NaN      NaN     NaN
       NaN         NaN       NaN      NaN     NaN
       NaN         NaN       NaN      NaN     NaN
       NaN         NaN       NaN      NaN     NaN
       NaN         NaN       NaN      NaN     NaN

[1446 rows x 26 columns]

we can tell which column should be used as index for the dataframe

df = pd.read_excel("arg_busan.xlsx", index_col="index")
print(df)

                  tide_cm  wat_temp_c    sal_psu  air_temp_c  pcp_mm  ...  \
index                                                                 ...
6/19/2018 0:00  36.407149   19.321232  33.956058   19.780000     0.0  ...
6/19/2018 0:30  35.562515   19.320124  33.950508   19.093333     0.0  ...
6/19/2018 1:00  34.808016   19.319666  33.942532   18.733333     0.0  ...
6/19/2018 1:30  30.645216   19.320406  33.931263   18.760000     0.0  ...
6/19/2018 2:00  26.608980   19.326729  33.917961   18.633333     0.0  ...
...                   ...         ...        ...         ...     ...  ...
9/7/2019 22:00  -3.989912   20.990612  33.776449   23.700000     0.0  ...
9/7/2019 22:30  -2.807042   21.012014  33.702310   23.620000     0.0  ...
9/7/2019 23:00  -3.471326   20.831739  33.726177   23.666667     0.0  ...
9/7/2019 23:30   0.707771   21.006086  33.716274   23.633333     0.0  ...
9/8/2019 0:00    1.011731   20.896149  33.729773   23.600000     0.0  ...

                aac_coppml  Total_otus  otu_5575  otu_273  otu_94
index
6/19/2018 0:00         NaN         NaN       NaN      NaN     NaN
6/19/2018 0:30         NaN         NaN       NaN      NaN     NaN
6/19/2018 1:00         NaN         NaN       NaN      NaN     NaN
6/19/2018 1:30         NaN         NaN       NaN      NaN     NaN
6/19/2018 2:00         NaN         NaN       NaN      NaN     NaN
...                    ...         ...       ...      ...     ...
9/7/2019 22:00         NaN         NaN       NaN      NaN     NaN
9/7/2019 22:30         NaN         NaN       NaN      NaN     NaN
9/7/2019 23:00         NaN         NaN       NaN      NaN     NaN
9/7/2019 23:30         NaN         NaN       NaN      NaN     NaN
9/8/2019 0:00          NaN         NaN       NaN      NaN     NaN

[1446 rows x 25 columns]

print(type(df.index))

<class 'pandas.core.indexes.base.Index'>

Although we index of dataframe is date and time but pandas does not recognize it as data and time but it recognizes it just as numbers

df = pd.read_excel("arg_busan.xlsx", index_col="index", parse_dates=True)

print(df)

                       tide_cm  wat_temp_c    sal_psu  air_temp_c  pcp_mm  \
index
2018-06-19 00:00:00  36.407149   19.321232  33.956058   19.780000     0.0
2018-06-19 00:30:00  35.562515   19.320124  33.950508   19.093333     0.0
2018-06-19 01:00:00  34.808016   19.319666  33.942532   18.733333     0.0
2018-06-19 01:30:00  30.645216   19.320406  33.931263   18.760000     0.0
2018-06-19 02:00:00  26.608980   19.326729  33.917961   18.633333     0.0
...                        ...         ...        ...         ...     ...
2019-09-07 22:00:00  -3.989912   20.990612  33.776449   23.700000     0.0
2019-09-07 22:30:00  -2.807042   21.012014  33.702310   23.620000     0.0
2019-09-07 23:00:00  -3.471326   20.831739  33.726177   23.666667     0.0
2019-09-07 23:30:00   0.707771   21.006086  33.716274   23.633333     0.0
2019-09-08 00:00:00   1.011731   20.896149  33.729773   23.600000     0.0

                     ...  aac_coppml  Total_otus  otu_5575  otu_273  otu_94
index                ...
2018-06-19 00:00:00  ...         NaN         NaN       NaN      NaN     NaN
2018-06-19 00:30:00  ...         NaN         NaN       NaN      NaN     NaN
2018-06-19 01:00:00  ...         NaN         NaN       NaN      NaN     NaN
2018-06-19 01:30:00  ...         NaN         NaN       NaN      NaN     NaN
2018-06-19 02:00:00  ...         NaN         NaN       NaN      NaN     NaN
...                  ...         ...         ...       ...      ...     ...
2019-09-07 22:00:00  ...         NaN         NaN       NaN      NaN     NaN
2019-09-07 22:30:00  ...         NaN         NaN       NaN      NaN     NaN
2019-09-07 23:00:00  ...         NaN         NaN       NaN      NaN     NaN
2019-09-07 23:30:00  ...         NaN         NaN       NaN      NaN     NaN
2019-09-08 00:00:00  ...         NaN         NaN       NaN      NaN     NaN

[1446 rows x 25 columns]

Now the index of dataframe is read as DateTimeIndex

print(type(df.index))

#%%

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Total running time of the script: ( 0 minutes 4.364 seconds)

Gallery generated by Sphinx-Gallery