5.4 reading/writing
Note
Click here to download the full example code or to run this example in your browser via Binder
5.4 reading/writing#
This file describes how to read data from files and write data into files using pandas.
Important
This lesson is still under development.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/AtrCheema/AI4Water/master/ai4water/datasets/arg_busan.csv")
type(df)
df.to_csv("arg_busan.csv")
The index of df was 0,1,2,… By default, to_csv function writes the index to csv
print(df.index)
RangeIndex(start=0, stop=1446, step=1)
print(df.index.name)
None
we can avoid writing the index to csv file by setting index=False
.
df.to_csv("arg_busan.csv", index=False)
we can also explicitly tell pandas what label for index to use when writing the index to csv file.
df.to_csv("arg_busan.csv", index_label="index")
if we want to save a dataframe to Excel file we can do it as following
df.to_excel("arg_busan.xlsx") # we must have ``openpyxl`` package for that
we can define the sheet name and exlcude the index as following
df.to_excel("arg_busan.xlsx", index=False, sheet_name="data")
to read the excel file as dataframe we can make use of read_excel function
df = pd.read_excel("arg_busan.xlsx")
print(df)
index tide_cm wat_temp_c ... otu_5575 otu_273 otu_94
0 6/19/2018 0:00 36.407149 19.321232 ... NaN NaN NaN
1 6/19/2018 0:30 35.562515 19.320124 ... NaN NaN NaN
2 6/19/2018 1:00 34.808016 19.319666 ... NaN NaN NaN
3 6/19/2018 1:30 30.645216 19.320406 ... NaN NaN NaN
4 6/19/2018 2:00 26.608980 19.326729 ... NaN NaN NaN
... ... ... ... ... ... ... ...
1441 9/7/2019 22:00 -3.989912 20.990612 ... NaN NaN NaN
1442 9/7/2019 22:30 -2.807042 21.012014 ... NaN NaN NaN
1443 9/7/2019 23:00 -3.471326 20.831739 ... NaN NaN NaN
1444 9/7/2019 23:30 0.707771 21.006086 ... NaN NaN NaN
1445 9/8/2019 0:00 1.011731 20.896149 ... NaN NaN NaN
[1446 rows x 26 columns]
we can tell which column should be used as index for the dataframe
df = pd.read_excel("arg_busan.xlsx", index_col="index")
print(df)
tide_cm wat_temp_c sal_psu ... otu_5575 otu_273 otu_94
index ...
6/19/2018 0:00 36.407149 19.321232 33.956058 ... NaN NaN NaN
6/19/2018 0:30 35.562515 19.320124 33.950508 ... NaN NaN NaN
6/19/2018 1:00 34.808016 19.319666 33.942532 ... NaN NaN NaN
6/19/2018 1:30 30.645216 19.320406 33.931263 ... NaN NaN NaN
6/19/2018 2:00 26.608980 19.326729 33.917961 ... NaN NaN NaN
... ... ... ... ... ... ... ...
9/7/2019 22:00 -3.989912 20.990612 33.776449 ... NaN NaN NaN
9/7/2019 22:30 -2.807042 21.012014 33.702310 ... NaN NaN NaN
9/7/2019 23:00 -3.471326 20.831739 33.726177 ... NaN NaN NaN
9/7/2019 23:30 0.707771 21.006086 33.716274 ... NaN NaN NaN
9/8/2019 0:00 1.011731 20.896149 33.729773 ... NaN NaN NaN
[1446 rows x 25 columns]
print(type(df.index))
<class 'pandas.core.indexes.base.Index'>
Although we index of dataframe is date and time but pandas does not recognize it as data and time but it recognizes it just as numbers
df = pd.read_excel("arg_busan.xlsx", index_col="index", parse_dates=True)
print(df)
tide_cm wat_temp_c ... otu_273 otu_94
index ...
2018-06-19 00:00:00 36.407149 19.321232 ... NaN NaN
2018-06-19 00:30:00 35.562515 19.320124 ... NaN NaN
2018-06-19 01:00:00 34.808016 19.319666 ... NaN NaN
2018-06-19 01:30:00 30.645216 19.320406 ... NaN NaN
2018-06-19 02:00:00 26.608980 19.326729 ... NaN NaN
... ... ... ... ... ...
2019-09-07 22:00:00 -3.989912 20.990612 ... NaN NaN
2019-09-07 22:30:00 -2.807042 21.012014 ... NaN NaN
2019-09-07 23:00:00 -3.471326 20.831739 ... NaN NaN
2019-09-07 23:30:00 0.707771 21.006086 ... NaN NaN
2019-09-08 00:00:00 1.011731 20.896149 ... NaN NaN
[1446 rows x 25 columns]
Now the index of dataframe is read as DateTimeIndex
print(type(df.index))
#%%
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Total running time of the script: ( 0 minutes 3.608 seconds)