Plotting time series in python is so simple

As an econ student, I was more inclined to code in VBA or R. However, my first, indirect contact with Python came through SageMath, a powerful solver written in Python. The Sage language shares some similarities with Python. The software is very attractive because you can call functions from different software packages (SymPy, R, Maxima, etc.) within a single platform.

Alas, SageMath is not ready on Debian, which is the OS of my workstation, and I needed to plot time series. So why not try Python?

I found a very good post on a blog titled « Spectral differences ». I wanted to import a CSV file containing futures prices. You might be European like me, in which case your CSV file uses the semicolon rather than the comma as a separator. Just look at the pandas documentation for the function « read_csv »; it is very simple. To import a CSV with a semicolon as a separator, we write:

import pandas as pd
data_df = pd.read_csv('us_ng_1998_2014.csv', sep=';')

As you can see, you do not have to declare objects in Python. You can directly define the content of an object with the equals sign (« = ») and Python will determine what it is. Here, it sees that « data_df » is a DataFrame. pandas, which we rename « pd » for convenience, is a library. All its functions are attributes of the library, called as « library.function(inputs) ».
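To try the separator without the original file, here is a minimal sketch that reads a small semicolon-separated sample from memory. The sample rows and the second column name « 2M » are made up for illustration; only « read_csv » and its « sep » argument come from the text above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for us_ng_1998_2014.csv:
# semicolon separator, nameless first column, decimal commas.
sample_csv = (
    ";1M;2M\n"
    "02/01/1998;2,25;2,30\n"
    "05/01/1998;2,18;2,24\n"
)

data_df = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(type(data_df).__name__)  # the object pandas built for us
print(list(data_df.columns))   # the nameless first column gets a default name
print(data_df["1M"].iloc[0])   # still text at this point, comma included
```

Note that the prices are read as text because of the decimal commas, which is why the later conversion steps are needed.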

Now we can use the matplotlib library to plot the dataframe. It is important to note that we are plotting time series. First, matplotlib has to recognise the date format, which is not automatic if the dates are in the European format. I adapted the answer to this Stack Overflow question:

data_df['Unnamed: 0'] = pd.to_datetime(data_df['Unnamed: 0'], format='%d/%m/%Y')
data_df.set_index(['Unnamed: 0'], inplace=True)

« Unnamed: 0 » is the name given to my first column, which has no name. The first line converts the column to dates directly. Inside the format string, you have to indicate the order: %d is for the day, %m for the month and %Y for the four-digit year. If your CSV file uses slashes, write slashes; if it uses dashes, write dashes. The second line sets the column as the index, so if you print a column, the dates will appear next to the values. I recommend replacing commas with points for the decimal values. Another Stack Overflow question was very useful here. The conversion is pretty simple in our case:

data_df = data_df.replace(',', '.', regex=True)
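The two steps above, date parsing and decimal replacement, can be checked on a tiny hand-made dataframe. This is only a sketch: the rows are invented, and the column names simply mirror the ones used in this post.

```python
import pandas as pd

# Hypothetical rows mimicking the CSV right after import:
# dates as text in day/month/year order, prices with decimal commas.
data_df = pd.DataFrame({
    "Unnamed: 0": ["02/01/1998", "05/01/1998"],
    "1M": ["2,25", "2,18"],
})

# Day first, then month, then four-digit year, separated by slashes.
data_df["Unnamed: 0"] = pd.to_datetime(data_df["Unnamed: 0"], format="%d/%m/%Y")
data_df.set_index("Unnamed: 0", inplace=True)

# Swap decimal commas for points; the values are still text afterwards.
data_df = data_df.replace(",", ".", regex=True)

print(data_df.index[0].month)  # 1, so 02/01/1998 was read as 2 January
print(data_df["1M"].tolist())
```

Giving the format explicitly avoids any guess about whether 02/01 means 2 January or 1 February.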

If matplotlib does not recognise the values as numeric, you can convert them with convert_objects, which is available in the pandas version I used.
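In later pandas releases, convert_objects was deprecated and then removed, and « pd.to_numeric » does the same job. Here is a minimal sketch, assuming the commas have already been replaced by points; the sample values, including the deliberately bad « n/a » entry, are invented.

```python
import pandas as pd

# Hypothetical text columns, as they look after the comma replacement.
data_df = pd.DataFrame({
    "1M": ["2.25", "2.18", "n/a"],
    "2M": ["2.30", "2.24", "2.31"],
})

# Apply to_numeric column by column; errors="coerce" turns anything
# that cannot be parsed into NaN instead of raising an error.
numeric_df = data_df.apply(pd.to_numeric, errors="coerce")

print(numeric_df.dtypes)
print(numeric_df["1M"].isna().sum())  # counts the values that could not be parsed
```

Coercing to NaN is a choice: drop « errors="coerce" » if you would rather have the script stop on a malformed value.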

At last, I managed to write my first Python script pretty easily:

import pandas as pd
import matplotlib.pyplot as plt

data_df = pd.read_csv('us_ng_1998_2014.csv', sep=';')
#print(data_df.columns)
data_df['Unnamed: 0'] = pd.to_datetime(data_df['Unnamed: 0'], format='%d/%m/%Y')
data_df = data_df.replace(',', '.', regex=True)
data_df.set_index(['Unnamed: 0'], inplace=True)
#print(data_df['1M'])  #check the index
# to_numeric is the current equivalent of convert_objects(convert_numeric=True)
data_df = data_df.apply(pd.to_numeric, errors='coerce')
#print(data_df.dtypes)  #check the types
data_df['1M'].plot()  #plot my first price column
#data_df.plot()  #plot everything
plt.show()

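As a possible shortcut, « read_csv » also accepts « decimal » and « index_col » arguments, which can take care of the decimal commas and the index at import time. The sketch below is an alternative to the script above, not the method I used; the sample data and the output file name « first_month.png » are made up, and the « Agg » backend is only there so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same layout as the futures file.
sample_csv = (
    ";1M;2M\n"
    "02/01/1998;2,25;2,30\n"
    "05/01/1998;2,18;2,24\n"
    "06/01/1998;2,21;2,27\n"
)

# decimal="," parses 2,25 as a number; index_col=0 uses the date column as index.
data_df = pd.read_csv(io.StringIO(sample_csv), sep=";", decimal=",", index_col=0)

# The index is still text, so convert it with the explicit European format.
data_df.index = pd.to_datetime(data_df.index, format="%d/%m/%Y")

ax = data_df["1M"].plot()         # one line, dates on the x axis
ax.set_ylabel("price")
plt.savefig("first_month.png")    # hypothetical output file name
```

With the numbers parsed at import time, the replace and numeric conversion steps are no longer needed.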
