5.4 reading/writing
Note
Click here to download the full example code or to run this example in your browser via Binder
5.4 reading/writing#
This file describes how to read data from files and write data into files using pandas.
Important
This lesson is still under development.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/AtrCheema/AI4Water/master/ai4water/datasets/arg_busan.csv")
type(df)
df.to_csv("arg_busan.csv")
The index of df was 0,1,2,… By default, to_csv function writes the index to csv
print(df.index)
RangeIndex(start=0, stop=1446, step=1)
print(df.index.name)
None
we can avoid writing the index to csv file by setting index=False.
df.to_csv("arg_busan.csv", index=False)
we can also explicitly tell pandas what label for index to use when writing the index to csv file.
df.to_csv("arg_busan.csv", index_label="index")
if we want to save a dataframe to Excel file we can do it as following
df.to_excel("arg_busan.xlsx") # we must have ``openpyxl`` package for that
we can define the sheet name and exlcude the index as following
df.to_excel("arg_busan.xlsx", index=False, sheet_name="data")
to read the excel file as dataframe we can make use of read_excel function
df = pd.read_excel("arg_busan.xlsx")
print(df)
index tide_cm wat_temp_c sal_psu air_temp_c ... \
0 6/19/2018 0:00 36.407149 19.321232 33.956058 19.780000 ...
1 6/19/2018 0:30 35.562515 19.320124 33.950508 19.093333 ...
2 6/19/2018 1:00 34.808016 19.319666 33.942532 18.733333 ...
3 6/19/2018 1:30 30.645216 19.320406 33.931263 18.760000 ...
4 6/19/2018 2:00 26.608980 19.326729 33.917961 18.633333 ...
... ... ... ... ... ... ...
1441 9/7/2019 22:00 -3.989912 20.990612 33.776449 23.700000 ...
1442 9/7/2019 22:30 -2.807042 21.012014 33.702310 23.620000 ...
1443 9/7/2019 23:00 -3.471326 20.831739 33.726177 23.666667 ...
1444 9/7/2019 23:30 0.707771 21.006086 33.716274 23.633333 ...
1445 9/8/2019 0:00 1.011731 20.896149 33.729773 23.600000 ...
aac_coppml Total_otus otu_5575 otu_273 otu_94
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... ... ... ... ... ...
1441 NaN NaN NaN NaN NaN
1442 NaN NaN NaN NaN NaN
1443 NaN NaN NaN NaN NaN
1444 NaN NaN NaN NaN NaN
1445 NaN NaN NaN NaN NaN
[1446 rows x 26 columns]
we can tell which column should be used as index for the dataframe
df = pd.read_excel("arg_busan.xlsx", index_col="index")
print(df)
tide_cm wat_temp_c sal_psu air_temp_c pcp_mm ... \
index ...
6/19/2018 0:00 36.407149 19.321232 33.956058 19.780000 0.0 ...
6/19/2018 0:30 35.562515 19.320124 33.950508 19.093333 0.0 ...
6/19/2018 1:00 34.808016 19.319666 33.942532 18.733333 0.0 ...
6/19/2018 1:30 30.645216 19.320406 33.931263 18.760000 0.0 ...
6/19/2018 2:00 26.608980 19.326729 33.917961 18.633333 0.0 ...
... ... ... ... ... ... ...
9/7/2019 22:00 -3.989912 20.990612 33.776449 23.700000 0.0 ...
9/7/2019 22:30 -2.807042 21.012014 33.702310 23.620000 0.0 ...
9/7/2019 23:00 -3.471326 20.831739 33.726177 23.666667 0.0 ...
9/7/2019 23:30 0.707771 21.006086 33.716274 23.633333 0.0 ...
9/8/2019 0:00 1.011731 20.896149 33.729773 23.600000 0.0 ...
aac_coppml Total_otus otu_5575 otu_273 otu_94
index
6/19/2018 0:00 NaN NaN NaN NaN NaN
6/19/2018 0:30 NaN NaN NaN NaN NaN
6/19/2018 1:00 NaN NaN NaN NaN NaN
6/19/2018 1:30 NaN NaN NaN NaN NaN
6/19/2018 2:00 NaN NaN NaN NaN NaN
... ... ... ... ... ...
9/7/2019 22:00 NaN NaN NaN NaN NaN
9/7/2019 22:30 NaN NaN NaN NaN NaN
9/7/2019 23:00 NaN NaN NaN NaN NaN
9/7/2019 23:30 NaN NaN NaN NaN NaN
9/8/2019 0:00 NaN NaN NaN NaN NaN
[1446 rows x 25 columns]
print(type(df.index))
<class 'pandas.core.indexes.base.Index'>
Although we index of dataframe is date and time but pandas does not recognize it as data and time but it recognizes it just as numbers
df = pd.read_excel("arg_busan.xlsx", index_col="index", parse_dates=True)
print(df)
tide_cm wat_temp_c sal_psu air_temp_c pcp_mm \
index
2018-06-19 00:00:00 36.407149 19.321232 33.956058 19.780000 0.0
2018-06-19 00:30:00 35.562515 19.320124 33.950508 19.093333 0.0
2018-06-19 01:00:00 34.808016 19.319666 33.942532 18.733333 0.0
2018-06-19 01:30:00 30.645216 19.320406 33.931263 18.760000 0.0
2018-06-19 02:00:00 26.608980 19.326729 33.917961 18.633333 0.0
... ... ... ... ... ...
2019-09-07 22:00:00 -3.989912 20.990612 33.776449 23.700000 0.0
2019-09-07 22:30:00 -2.807042 21.012014 33.702310 23.620000 0.0
2019-09-07 23:00:00 -3.471326 20.831739 33.726177 23.666667 0.0
2019-09-07 23:30:00 0.707771 21.006086 33.716274 23.633333 0.0
2019-09-08 00:00:00 1.011731 20.896149 33.729773 23.600000 0.0
... aac_coppml Total_otus otu_5575 otu_273 otu_94
index ...
2018-06-19 00:00:00 ... NaN NaN NaN NaN NaN
2018-06-19 00:30:00 ... NaN NaN NaN NaN NaN
2018-06-19 01:00:00 ... NaN NaN NaN NaN NaN
2018-06-19 01:30:00 ... NaN NaN NaN NaN NaN
2018-06-19 02:00:00 ... NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
2019-09-07 22:00:00 ... NaN NaN NaN NaN NaN
2019-09-07 22:30:00 ... NaN NaN NaN NaN NaN
2019-09-07 23:00:00 ... NaN NaN NaN NaN NaN
2019-09-07 23:30:00 ... NaN NaN NaN NaN NaN
2019-09-08 00:00:00 ... NaN NaN NaN NaN NaN
[1446 rows x 25 columns]
Now the index of dataframe is read as DateTimeIndex
print(type(df.index))
#%%
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Total running time of the script: ( 0 minutes 4.364 seconds)