TimeSeries SciKit
Introduction
Please note that this package is still experimental and subject to API changes.
The TimeSeries scikits module provides classes and functions for manipulating, reporting, and plotting time series of various frequencies.
License
The TimeSeries scikit is free for both commercial and non-commercial use, under the BSD terms. Click here for license details.
Requirements
In order to use the TimeSeries package, you need to have the following packages installed:
- Python 2.4 or later
- setuptools : scikits is a namespace package, and as a result every scikit requires setuptools to be installed to function properly.
- numpy 1.1.0 or later
Optional (but recommended)
- scipy : Some of the lib sub-modules (interpolate, moving_funcs) make use of scipy functions.
- matplotlib : matplotlib is required for time series plotting.
Setup
The module includes a setup script which you can use in the standard python manner to compile the C code. If you have difficulty installing, please ask for assistance on the scipy-user mailing list.
The timeseries module itself is currently only available through Subversion (http://svn.scipy.org/svn/scikits/trunk/timeseries). To install it, run python setup.py install in the directory where you checked out the source code.
If you are using Windows and are having trouble compiling the module, please see the following page in the cookbook: Compiling Extensions on Windows
The current plan is to begin doing official releases and distributing Windows binaries once an official release of numpy has been made which includes the new version of masked array. In the meantime, please bear with us.
Dates
Even if you have no use for time series in general, you may still find the Date class contained in the module quite useful. A Date object combines some date and/or time related information with a given frequency. You can picture the frequency as the unit into which the date is expressed. For example, we can create dates in the following manner:
>>> # The following imports are assumed throughout the documentation
>>> import numpy as np
>>> import numpy.ma as ma
>>> import datetime
>>> import scikits.timeseries as ts
>>>
>>> D = ts.Date(freq='D', year=2007, month=1, day=1)
>>> M = ts.Date(freq='M', year=2007, month=1)
>>> Y = ts.Date(freq='A', year=2007)
Observe that you only need to specify as much information as is relevant to the frequency. The importance of the frequency will become clearer later on.
~-A more technical note: Date objects are internally stored as integers. The conversion to integers and back is controlled by the frequency. In the example above, the internal representations of the three objects D, M and Y are 732677, 24073 and 2007, respectively.-~
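The three integers quoted above are consistent with a very simple encoding, sketched here in plain Python. The formulas are inferred from those three values only, not taken from the package source, and the helper names (daily_value, monthly_value, annual_value) are hypothetical:

```python
import datetime


def daily_value(year, month, day):
    """Assumed daily encoding: the proleptic Gregorian ordinal (01-Jan-0001 is 1)."""
    return datetime.date(year, month, day).toordinal()


def monthly_value(year, month):
    """Assumed monthly encoding: months counted from January of year 1, starting at 1."""
    return (year - 1) * 12 + month


def annual_value(year):
    """Assumed annual encoding: the year itself."""
    return year


print(daily_value(2007, 1, 1))   # 732677
print(monthly_value(2007, 1))    # 24073
print(annual_value(2007))        # 2007
```

This is why the frequency matters: the same integer means a different point in time depending on the unit it counts.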
Construction of a Date object
Several options are available to construct a Date object explicitly. In each case, the frequency argument must be given. Valid frequency specifications are given in the section Frequencies below.
- Give appropriate values to any of the year, month, day, quarter, hour, minute, second arguments.
>>> ts.Date(freq='Q',year=2004,quarter=3)
<Q : 2004Q3>
>>> ts.Date(freq='D',year=2001,month=1,day=1)
<D : 01-Jan-2001>
- Use the string keyword.
>>> ts.Date('D', string='2007-01-01')
<D : 01-Jan-2007>
- Use the datetime keyword with an existing datetime.datetime object.
>>> ts.Date('D', datetime=datetime.datetime.now())
- Use the value keyword and provide an integer representation of the date.
>>> ts.Date('D', value=732677)
<D : 01-Jan-2007>
Frequencies
For any functions or class constructors taking a frequency argument, the frequency can be specified in one of two ways: using a valid string representation of the frequency, or using the integer frequency constants. The constants can be found in the timeseries.const sub-module. The following table lists the frequency constants and their valid string aliases.
| CONSTANT | String aliases (case insensitive) |
For annual frequencies, "Year" is determined by where the last month of the year falls.
| FR_ANN | 'A', 'Y', 'ANNUAL', 'ANNUALLY', 'YEAR', 'YEARLY' |
| FR_ANNDEC | 'A-DEC', 'A-December', 'Y-DEC', 'ANNUAL-DEC', etc... (annual frequency with December year end, equivalent to FR_ANN) |
| FR_ANNNOV | 'A-NOV', 'A-NOVEMBER', 'Y-NOVEMBER', 'ANNUAL-NOV', etc... (annual frequency with November year end) |
| FR_ANNOCT | 'A-OCT', 'A-OCTOBER', 'Y-OCTOBER', 'ANNUAL-OCT', etc... (annual frequency with October year end) |
| FR_ANNSEP | 'A-SEP', 'A-SEPTEMBER', 'Y-SEPTEMBER', 'ANNUAL-SEP', etc... (annual frequency with September year end) |
| FR_ANNAUG | 'A-AUG', 'A-AUGUST', 'Y-AUGUST', 'ANNUAL-AUG', etc... (annual frequency with August year end) |
| FR_ANNJUL | 'A-JUL', 'A-JULY', 'Y-JULY', 'ANNUAL-JUL', etc... (annual frequency with July year end) |
| FR_ANNJUN | 'A-JUN', 'A-JUNE', 'Y-JUNE', 'ANNUAL-JUN', etc... (annual frequency with June year end) |
| FR_ANNMAY | 'A-MAY', 'Y-MAY', 'YEARLY-MAY', 'ANNUAL-MAY', etc... (annual frequency with May year end) |
| FR_ANNAPR | 'A-APR', 'A-APRIL', 'Y-APRIL', 'ANNUAL-APR', etc... (annual frequency with April year end) |
| FR_ANNMAR | 'A-MAR', 'A-MARCH', 'Y-MARCH', 'ANNUAL-MAR', etc... (annual frequency with March year end) |
| FR_ANNFEB | 'A-FEB', 'A-FEBRUARY', 'Y-FEBRUARY', 'ANNUAL-FEB', etc... (annual frequency with February year end) |
| FR_ANNJAN | 'A-JAN', 'A-JANUARY', 'Y-JANUARY', 'ANNUAL-JAN', etc... (annual frequency with January year end) |
For the following quarterly frequencies, "Year" is determined by where the last quarter of the current group of quarters ENDS
| FR_QTR | 'Q', 'QUARTER', 'QUARTERLY' |
| FR_QTREDEC | 'Q-DEC', 'QTR-December', 'QUARTERLY-DEC', etc... (quarterly frequency with December year end, equivalent to FR_QTR) |
| FR_QTRENOV | 'Q-NOV', 'QTR-NOVEMBER', 'QUARTERLY-NOV', etc... (quarterly frequency with November year end) |
| FR_QTREOCT | 'Q-OCT', 'QTR-OCTOBER', 'QUARTERLY-OCT', etc... (quarterly frequency with October year end) |
| FR_QTRESEP | 'Q-SEP', 'QTR-SEPTEMBER', 'QUARTERLY-SEP', etc... (quarterly frequency with September year end) |
| FR_QTREAUG | 'Q-AUG', 'QTR-AUGUST', 'QUARTERLY-AUG', etc... (quarterly frequency with August year end) |
| FR_QTREJUL | 'Q-JUL', 'QTR-JULY', 'QUARTERLY-JUL', etc... (quarterly frequency with July year end) |
| FR_QTREJUN | 'Q-JUN', 'QTR-JUNE', 'QUARTERLY-JUN', etc... (quarterly frequency with June year end) |
| FR_QTREMAY | 'Q-MAY', 'QTR-MAY', 'QUARTERLY-MAY', etc... (quarterly frequency with May year end) |
| FR_QTREAPR | 'Q-APR', 'QTR-APRIL', 'QUARTERLY-APR', etc... (quarterly frequency with April year end) |
| FR_QTREMAR | 'Q-MAR', 'QTR-MARCH', 'QUARTERLY-MAR', etc... (quarterly frequency with March year end) |
| FR_QTREFEB | 'Q-FEB', 'QTR-FEBRUARY', 'QUARTERLY-FEB', etc... (quarterly frequency with February year end) |
| FR_QTREJAN | 'Q-JAN', 'QTR-JANUARY', 'QUARTERLY-JAN', etc... (quarterly frequency with January year end) |
For the following quarterly frequencies, "Year" is determined by where the first quarter of the current group of quarters STARTS
| FR_QTRSDEC | 'Q-S-DEC', 'QTR-S-December', etc... (quarterly frequency with December year end) |
| FR_QTRSNOV | 'Q-S-NOV', 'QTR-S-NOVEMBER', etc... (quarterly frequency with November year end) |
| FR_QTRSOCT | 'Q-S-OCT', 'QTR-S-OCTOBER', etc... (quarterly frequency with October year end) |
| FR_QTRSSEP | 'Q-S-SEP', 'QTR-S-SEPTEMBER', etc... (quarterly frequency with September year end) |
| FR_QTRSAUG | 'Q-S-AUG', 'QTR-S-AUGUST', etc... (quarterly frequency with August year end) |
| FR_QTRSJUL | 'Q-S-JUL', 'QTR-S-JULY', etc... (quarterly frequency with July year end) |
| FR_QTRSJUN | 'Q-S-JUN', 'QTR-S-JUNE', etc... (quarterly frequency with June year end) |
| FR_QTRSMAY | 'Q-S-MAY', 'QTR-S-MAY', etc... (quarterly frequency with May year end) |
| FR_QTRSAPR | 'Q-S-APR', 'QTR-S-APRIL', etc... (quarterly frequency with April year end) |
| FR_QTRSMAR | 'Q-S-MAR', 'QTR-S-MARCH', etc... (quarterly frequency with March year end) |
| FR_QTRSFEB | 'Q-S-FEB', 'QTR-S-FEBRUARY', etc... (quarterly frequency with February year end) |
| FR_QTRSJAN | 'Q-S-JAN', 'QTR-S-JANUARY', etc... (quarterly frequency with January year end) |
| FR_MTH | 'M', 'MONTH', 'MONTHLY' |
| FR_WK | 'W', 'WEEK', 'WEEKLY' |
| FR_WKSUN | 'W-SUN', 'WEEK-SUNDAY', 'WEEKLY-SUN', etc... (weekly frequency with Sunday being the last day of the week, equivalent to FR_WK) |
| FR_WKSAT | 'W-SAT', 'WEEK-SATURDAY', 'WEEKLY-SAT', etc... (weekly frequency with Saturday being the last day of the week) |
| FR_WKFRI | 'W-FRI', 'WEEK-FRIDAY', 'WEEKLY-FRI', etc... (weekly frequency with Friday being the last day of the week) |
| FR_WKTHU | 'W-THU', 'WEEK-THURSDAY', 'WEEKLY-THU', etc... (weekly frequency with Thursday being the last day of the week) |
| FR_WKWED | 'W-WED', 'WEEK-WEDNESDAY', 'WEEKLY-WED', etc... (weekly frequency with Wednesday being the last day of the week) |
| FR_WKTUE | 'W-TUE', 'WEEK-TUESDAY', 'WEEKLY-TUE', etc... (weekly frequency with Tuesday being the last day of the week) |
| FR_WKMON | 'W-MON', 'WEEK-MONDAY', 'WEEKLY-MON', etc... (weekly frequency with Monday being the last day of the week) |
| FR_BUS | 'B', 'BUSINESS', 'BUSINESSLY' |
| FR_DAY | 'D', 'DAY', 'DAILY' |
| FR_HR | 'H', 'HOUR', 'HOURLY' |
| FR_MIN | 'T', 'MINUTE', 'MINUTELY' |
| FR_SEC | 'S', 'SECOND', 'SECONDLY' |
| FR_UND | 'U', 'UNDEF', 'UNDEFINED' |
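To make the "year end" aliases more concrete, here is a minimal sketch of how a calendar month could be assigned to a year under an annual frequency such as 'A-NOV'. It assumes that a period is labelled with the calendar year in which it ends, which is how we read the note on annual frequencies above. The function annual_year and its alias parsing are hypothetical and are not part of the package:

```python
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']


def annual_year(year, month, freq='A-DEC'):
    """Return the year label of a calendar (year, month) for an annual alias.

    Only aliases of the form 'A', 'A-NOV' or 'ANNUAL-NOVEMBER' are handled.
    Assumption: the label is the calendar year in which the period ends.
    """
    alias = freq.upper()                      # aliases are case insensitive
    end_name = alias.split('-')[-1][:3] if '-' in alias else 'DEC'
    end_month = MONTHS.index(end_name) + 1
    return year if month <= end_month else year + 1


print(annual_year(2006, 11, 'A-NOV'))  # 2006
print(annual_year(2006, 12, 'a-nov'))  # 2007
print(annual_year(2006, 12, 'A'))      # 2006
```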
Convenience functions
- now : get the current Date at a specified frequency
- prevbusday : get the previous business day, determined by a specified cut off time. See the function's doc string for more details.
Manipulating dates
You can convert a Date object from one frequency to another with the asfreq method. When converting to a higher frequency (for example, from monthly to daily), you may optionally specify the "relation" parameter with the value "START" or "END" (default is "END"). Note that if you convert a daily Date to a monthly frequency and back to a daily one, you will lose your day information in the process (similarly for converting any higher frequency to a lower one):
>>> D = ts.Date('D', year=2007, month=12, day=31)
>>> D.asfreq('M')
<M : Dec-2007>
>>> D.asfreq('M').asfreq('D', relation="START")
<D : 01-Dec-2007>
>>> D.asfreq('M').asfreq('D', relation="END")
<D : 31-Dec-2007>
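The loss of information is easier to see with a date that is neither the first nor the last day of its month. The following sketch mimics the round trip with the standard library only. The helpers to_month and month_to_day are hypothetical stand-ins for asfreq, not the package's implementation:

```python
import calendar
import datetime


def to_month(d):
    """Daily to monthly: keep only (year, month), the day is dropped."""
    return (d.year, d.month)


def month_to_day(ym, relation='END'):
    """Monthly to daily: pick the first ('START') or last ('END') day of the month."""
    year, month = ym
    if relation.upper() == 'START':
        return datetime.date(year, month, 1)
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, last_day)


d = datetime.date(2007, 12, 15)
m = to_month(d)
print(month_to_day(m, 'START'))  # 2007-12-01
print(month_to_day(m, 'END'))    # 2007-12-31
```

Neither result equals the original 15th of December: once the date is expressed in months, the day is gone.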
You can add and subtract integers from a Date object to get a new Date object. The frequency of the new object is the same as the original one. For example:
>>> yesterday = ts.now('D') - 1
>>> infivemonths = ts.now('M') + 5
You can also subtract a Date from another Date of the same frequency to determine the number of periods between the two dates.
>>> Y = ts.Date('A', year=2007)
>>> days_in_year = Y.asfreq('D', relation='END') - Y.asfreq('D', relation='START') + 1
>>> days_in_year
365
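Because dates are stored as integers, this arithmetic reduces to ordinary integer arithmetic on the internal values. Here is a small sketch for the monthly case, assuming the encoding (year - 1) * 12 + month, which matches the value 24073 quoted earlier for January 2007. The helper names are hypothetical:

```python
def month_to_value(year, month):
    """Assumed internal value of a monthly date."""
    return (year - 1) * 12 + month


def value_to_month(value):
    """Inverse of month_to_value: recover (year, month) from the integer."""
    return (value - 1) // 12 + 1, (value - 1) % 12 + 1


nov_2007 = month_to_value(2007, 11)

# Adding an integer moves forward by that many periods.
print(value_to_month(nov_2007 + 5))  # (2008, 4)

# Subtracting two dates of the same frequency counts the periods between them.
print(month_to_value(2007, 12) - month_to_value(2007, 1) + 1)  # 12
```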
Some other methods worth mentioning are:
- toordinal : converts an object to the equivalent proleptic gregorian date.
- tostring : converts an object to the corresponding string.
Formatting Dates as Strings
To output a date as a string, you can simply cast it to a string (call str on it) and a default output format for that frequency will be used, or you can use the strfmt method for explicit control. The strfmt method of the Date class takes one argument: a format string. This behaves in essentially the same manner as the strftime function in the standard python time module and accepts the same directives, plus several additional directives outlined below.
| Directive | Meaning |
| %q | the quarter of the date |
| %f | Year without century as a decimal number [00,99]. The year in this case is the year of the date determined by the year for the current quarter. This is the same as %y unless the Date is one of the quarterly frequencies. In financial terms, this is the 'fiscal year'. |
| %F | Year with century as a decimal number. The year in this case is the year of the date determined by the year for the current quarter. This is the same as %Y unless the Date is one of the quarterly frequencies. In financial terms, this is the 'fiscal year'. |
Examples
>>> a = ts.Date(freq='q-jul', year=2006, quarter=1)
>>> a.strfmt("%F-Q%q")
'2006-Q1'
>>> a.strfmt("%b-%Y") # this will output the last month in the quarter for this date
'Oct-2005'
>>> b = ts.Date(freq='d', year=2006, month=4, day=25)
>>> b.strfmt("%d-%b-%Y")
'25-Apr-2006'
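The relation between the calendar date and the values reported by %F and %q can be sketched as follows, for quarterly frequencies whose year is determined by where the last quarter ends. The function fiscal_quarter is a hypothetical illustration, not the package's code:

```python
def fiscal_quarter(year, month, year_end_month=12):
    """Return (fiscal_year, quarter) for a calendar (year, month).

    year_end_month is the last month of the fiscal year (7 for 'Q-JUL').
    The fiscal year is labelled by the calendar year in which it ends.
    """
    start_month = year_end_month % 12 + 1      # first month of the fiscal year
    offset = (month - start_month) % 12        # months since the fiscal year began
    quarter = offset // 3 + 1
    fiscal_year = year if month <= year_end_month else year + 1
    return fiscal_year, quarter


# October 2005 is the last month of the first quarter of fiscal year 2006
# when the year ends in July, as in the 'q-jul' example above.
print(fiscal_quarter(2005, 10, year_end_month=7))  # (2006, 1)
print(fiscal_quarter(2006, 4))                     # (2006, 2)
```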
DateArray objects
DateArrays are simply ndarrays of Date objects. They accept the same methods as a Date object, with the addition of:
- tovalue : converts the array to an array of integers. Each integer is the internal representation of the corresponding date.
- has_missing_dates : outputs a boolean on whether some dates are missing or not.
- has_duplicated_dates : outputs a boolean on whether some dates are duplicated or not.
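Since each date is an integer, both checks amount to simple tests on a collection of integers. The following is a minimal sketch of the idea on plain lists, not the package's implementation. The monthly values assume the encoding (year - 1) * 12 + month:

```python
def has_duplicated_dates(values):
    """True if the same date value appears more than once."""
    return len(set(values)) != len(values)


def has_missing_dates(values):
    """True if some period between the first and last date is absent."""
    unique = set(values)
    if not unique:
        return False
    return max(unique) - min(unique) + 1 != len(unique)


# January to September 2005, then February to December 2006.
months = [(2005 - 1) * 12 + m for m in range(1, 10)]
months += [(2006 - 1) * 12 + m for m in range(2, 13)]

print(has_duplicated_dates(months))  # False
print(has_missing_dates(months))     # True
```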
Construction
To construct a DateArray object, you can use the factory function date_array (preferred), or call the class directly. See the __doc__ strings of date_array and DateArray for parameter details.
TimeSeries
A TimeSeries object is the combination of three ndarrays:
- dates: DateArray object.
- data : ndarray.
- mask : Boolean ndarray, indicating missing or invalid data.
These three arrays can be accessed as attributes of a TimeSeries object. Another very useful attribute is series, which gives direct access to data and mask as a masked array.
Construction
To construct a TimeSeries, you can use the factory function time_series (preferred) or call the class directly. See the __doc__ strings of time_series and TimeSeries for parameter details.
Use the class constructor when you want to bypass some of the overhead associated with the additional flexibility in the factory function.
Let us construct a series of 600 random elements, starting 600 business days ago, at a business daily frequency:
>>> data = np.random.uniform(-100,100,600)
>>> today = ts.now('B')
>>> series = ts.time_series(data, dtype=np.float_, freq='B', start_date=today-600)
We can check that series.dates is a DateArray object and that series.series is a MaskedArray object.
>>> isinstance(series.dates, ts.DateArray)
True
>>> isinstance(series.series, ma.MaskedArray)
True
So, if you are already familiar with MaskedArray, using TimeSeries should be straightforward. Just keep in mind that another attribute is always present, dates.
Dates and Data compatibility
The example we just introduced corresponds to the simplest case of only one variable indexed in time. In that case, the DateArray object should have the same size as the data part. In our example, the length of the DateArray was automatically adjusted to match the data length, and we have DateArray.size == series.size.
However, it is often convenient to use series with multiple variables. A simple representation of this kind of data is a matrix, with as many rows as actual observations and as many columns as variables. In that case, the DateArray object should have the same length as the number of rows. More generally, DateArray.size should be equal to series.shape[0].
When a TimeSeries is created from multi-dimensional data and a single starting date, it is assumed that the data consists of several variables: the length of the DateArray is then adjusted to match len(data). However, you can force the length of the DateArray with the optional length parameter.
For example, let us consider the case of an array of (50 x 12) points, corresponding to 50 years of monthly data.
>>> data = np.random.uniform(-1,1,50*12).reshape(50,12)
We may want to consider each month independently from the others: in that case, we want an annual series of 50 observations, each observation consisting of 12 variables. We define the time series as:
>>> newseries = ts.time_series(data, start_date=ts.now('Y')-50)
>>> newseries._dates.size
50
But we can also consider the series as monthly data. We could even ravel the initial data, or force the length of the DateArray:
>>> newseries = ts.time_series(data, start_date=ts.now('M')-600, length=600)
>>> newseries._dates.size
600
Now, let us consider the case of a (5x10x10) array. For example, each (10x10) slice could be a raster map, or a picture. The following code defines a daily series of 5 maps:
>>> data = np.random.uniform(-1,1,5*10*10).reshape(5,10,10)
>>> newseries = ts.time_series(data, start_date=ts.now('D'))
Indexing
Elements of a TimeSeries can be accessed just like with regular ndarrays. Thus,
>>> series[0]
outputs the first element, while
>>> series[-30:]
outputs the last 30 elements.
But you can also use a date:
>>> thirtydaysago = today - 30
>>> series[thirtydaysago:]
or even a string...
>>> series[thirtydaysago.tostring():]
or a sequence/ndarray of integers...
>>> series[[0,-1]]
~-The latter is quite useful: it gives you the first and last data points of your series.-~
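For a series without missing dates, indexing by date is equivalent to indexing by the offset of that date from the start of the series. Here is a sketch of that translation for a daily series, using the standard library only. The helper position and the sample data are hypothetical, and this is not how the package itself resolves dates:

```python
import datetime

start = datetime.date(2007, 1, 1)
data = list(range(100))          # 100 consecutive daily observations


def position(d):
    """Integer index of date d in a gap-free daily series starting at start."""
    return d.toordinal() - start.toordinal()


cutoff = datetime.date(2007, 3, 12)
tail = data[position(cutoff):]   # everything from the cutoff date onwards

print(position(cutoff))          # 70
print(len(tail))                 # 30
```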
In a similar way, setting elements of a TimeSeries works seamlessly. Let us set negative values to zero...
>>> series[series<0] = 0
... and the values falling on Fridays to 100
>>> series[series.weekday == 4] = 100
We can also index on multiple criteria. We will create a temporary array of 'weekdays' to avoid recomputing the weekdays multiple times. Here we will set all Wednesdays and Fridays to 100.
>>> weekdays = series.weekday
>>> series[(weekdays == 2) | (weekdays == 4)] = 100
You should keep in mind that TimeSeries are basically MaskedArrays. If some data of an array are masked, you will not be able to use this array as an index; you will have to fill it first.
Missing Observations (aka masked values)
Missing observations are handled in exactly the same way as with masked arrays. If you are familiar with masked arrays, then there is nothing new to learn. Please see the main numpy documentation for additional info on masked arrays.
Operations on TimeSeries
If you work with only one TimeSeries, you can use the maskedarray commands to process the data. For example:
>>> series_log = ma.log(series)
Note that invalid values (negative, in this case) are automatically masked. You could also use the standard numpy version of the function instead. However, the reduce and accumulate methods of some ufuncs (such as add or multiply) will only function properly with the maskedarray versions. ~-The reason is that the methods of the numpy ufuncs do not know how to properly ignore masked values for such operations.-~
When working with multiple series, only series of the same frequency, size and starting date can be used in basic operations. The function align_series ~-(or its alias aligned)-~ forces series to have matching starting and ending dates. By default, the starting date will be set to the smallest starting date of the series, and the ending date to the largest.
Let's construct a list of months, starting on Jan 2005 and ending on Dec 2006, with a gap from Oct 2005 to Jan 2006.
>>> mlist_1 = ['2005-%02i' % i for i in range(1,10)]
>>> mlist_1 += ['2006-%02i' % i for i in range(2,13)]
>>> mdata_1 = np.arange(len(mlist_1))
>>> mser_1 = ts.time_series(mdata_1, mlist_1, freq='M')
Let us check whether there are some duplicated dates (no):
>>> mser_1.has_duplicated_dates()
False
...or missing dates (yes):
>>> mser_1.has_missing_dates()
True
Let us construct a second monthly series, this time without missing dates:
>>> mlist_2 = ['2004-%02i' % i for i in range(1,13)]
>>> mlist_2 += ['2005-%02i' % i for i in range(1,13)]
>>> mser_2 = ts.time_series(np.arange(len(mlist_2)), mlist_2, freq='M')
We cannot perform binary operations on these two series (such as adding them together) because the dates of the series do not line up. Thus, we need to align them first.
>>> (malg_1, malg_2) = ts.align_series(mser_1, mser_2)
Now we can add the two series. Only the data that fall on dates common to the original, non-aligned series will actually be added; the others will be masked. After all, we are adding masked arrays.
>>> mser_3 = malg_1 + malg_2
We could have filled the initial series first (replace masked values with a specified value):
>>> mser_3 = malg_1.filled(0) + malg_2.filled(0)
When aligning the series, we could have forced the series to start/end at some given dates:
>>> (malg_1, malg_2) = ts.align_series(mser_1, mser_2, start_date='2004-06', end_date='2006-06')
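The idea behind alignment can be sketched without the package. In this simplified model each series is a dict mapping an integer date value to a datum, and None plays the role of a masked value. The functions align and add_masked are hypothetical and do not reproduce align_series in detail:

```python
def align(*series):
    """Extend every series to the common span, from earliest start to latest end.

    Returns the span and one list per series, with None where a date is absent.
    """
    start = min(min(s) for s in series)
    end = max(max(s) for s in series)
    span = list(range(start, end + 1))
    return span, [[s.get(period) for period in span] for s in series]


def add_masked(a, b):
    """Element-wise sum where a masked (None) operand masks the result."""
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


first = {3: 10, 4: 20}     # covers periods 3 and 4
second = {4: 1, 5: 2}      # covers periods 4 and 5

span, (a, b) = align(first, second)
print(span)                # [3, 4, 5]
print(a)                   # [10, 20, None]
print(b)                   # [None, 1, 2]
print(add_masked(a, b))    # [None, 21, None]
```

Only period 4 is present in both inputs, so it is the only unmasked value in the sum.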
Time Shifting Operations
Calculating things like rate of change, or difference in a TimeSeries can be done most easily using a special method called tshift.
Suppose we want to calculate a Year over Year rate of return for a monthly time series. One might initially try to do something along the lines of...
>>> YoY_change = 100*(mser[12:]/mser[:-12] - 1)
This will give you the correct numerical result, but since mser[12:] and mser[:-12] have different start and end dates, the result will be forced to a plain MaskedArray. Also, it will not be the same shape as your original input series, which may also be inconvenient. To get around these issues, use the tshift method instead.
>>> YoY_change = 100*(mser/mser.tshift(-12, copy=False) - 1)
mser.tshift(-12, copy=False) returns a series with the same start_date and end_date as mser, but values shifted to the right by 12 periods. Note that this will result in 12 masked values at the start of the resulting series. By default tshift copies any data it uses from the original series, but for situations like the example above you may want to avoid that.
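The effect of the shift can be illustrated on a plain list, with None standing in for masked values. This is only a sketch of the behaviour described above. The list-based tshift below is hypothetical and ignores dates and the copy argument entirely:

```python
def tshift(values, nper):
    """Shift values by nper periods while keeping the same length.

    A negative nper moves values towards later positions (to the right) and
    pads the start with None; a positive nper does the opposite.
    """
    n = len(values)
    if nper == 0:
        return list(values)
    if abs(nper) >= n:
        return [None] * n
    if nper < 0:
        return [None] * (-nper) + list(values[:nper])
    return list(values[nper:]) + [None] * nper


monthly = [100.0 + i for i in range(24)]      # two years of monthly data
lagged = tshift(monthly, -12)                 # value observed 12 months earlier

yoy_change = [None if prev is None else 100 * (cur / prev - 1)
              for cur, prev in zip(monthly, lagged)]

print(len(yoy_change))      # 24, same length as the input
print(yoy_change[:12])      # twelve None entries: no data a year earlier
```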
TimeSeries Frequency Conversion
To convert a TimeSeries to another frequency, use the convert method or function. The optional argument func must be a function that acts on a 1D masked array and returns a scalar.
>>> mseries = series.convert('M',func=ma.average)
If func is None (the default value), the convert method/function returns a 2D array, where each row corresponds to the new frequency, and the columns to the original data. In our example, convert will return a 2D array with 23 columns, as there are at most 23 business days per month.
>>> mseries_default = series.convert('M')
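The two modes of convert can be pictured with a small grouping routine. This sketch works on plain lists and dicts and is not the package's algorithm. In particular, padding short rows at the end with None is a simplifying assumption, and convert_to_monthly is a hypothetical name:

```python
import datetime


def convert_to_monthly(dates, values, func=None):
    """Group daily values by (year, month).

    With func, each group is reduced to a scalar. Without it, each group is
    returned as a row, padded with None so that all rows have equal length.
    """
    groups = {}
    for d, v in zip(dates, values):
        groups.setdefault((d.year, d.month), []).append(v)
    if func is not None:
        return {key: func(group) for key, group in groups.items()}
    width = max(len(group) for group in groups.values())
    return {key: group + [None] * (width - len(group))
            for key, group in groups.items()}


dates = [datetime.date(2007, 1, 30), datetime.date(2007, 1, 31),
         datetime.date(2007, 2, 1)]
values = [1, 2, 3]

print(convert_to_monthly(dates, values, func=sum))  # {(2007, 1): 3, (2007, 2): 3}
print(convert_to_monthly(dates, values))            # {(2007, 1): [1, 2], (2007, 2): [3, None]}
```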
When converting from a lower frequency to a higher frequency, an extra argument position is used to determine the placement of values in the resulting series. The value of the argument is either 'START' or 'END' ('END' by default). This will yield a series with a lot of masked values. To fill in these masked values, see the section Interpolating Masked Values below.
"asfreq" vs "convert": Be careful not to confuse these two methods. "asfreq" simply takes every date in the .dates attribute of the TimeSeries and changes it to the specified frequency, so the resulting series will have the same shape as the original series. "convert" is a more complicated function that takes a series with no missing or duplicated dates and creates a series at the new frequency with no missing or duplicated dates and intelligently places the data from the original series into appropriate points in the new series.
Interpolating Masked Values
The timeseries.lib.interpolate sub-module contains several functions for filling in masked values in an array. Currently this includes:
- interp_masked1d
- forward_fill
- backward_fill
Let us take a monthly TimeSeries, convert it to business frequency, and then interpolate the resulting masked values.
>>> import scikits.timeseries.lib.interpolate as itp
>>> mser = ts.time_series(np.arange(12, dtype=np.float_), start_date=ts.now('M'))
>>> bser = mser.convert("B", position='END')
>>> bser_ffill = itp.forward_fill(bser, maxgap=30)
>>> bser_bfill = itp.backward_fill(bser)
>>> bser_linear = itp.interp_masked1d(bser, kind='linear')
The optional maxgap parameter for forward_fill and backward_fill will ensure that if there are more than maxgap consecutive masked values, they will not be filled. Using maxgap=30 like in our above example will ensure that missing months from our original monthly series are not filled in.
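The role of maxgap is perhaps clearer on a small list, where None stands for a masked value. The function below is a simplified sketch written for this page, not the code of the interpolate sub-module:

```python
def forward_fill(values, maxgap=None):
    """Replace each run of None with the last valid value before it.

    Runs longer than maxgap are left untouched, as are leading None values.
    """
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        j = i
        while j < n and result[j] is None:   # find the end of the masked run
            j += 1
        gap = j - i
        if i > 0 and (maxgap is None or gap <= maxgap):
            for k in range(i, j):
                result[k] = result[i - 1]
        i = j
    return result


data = [1, None, None, 2, None, None, None, 3]
print(forward_fill(data, maxgap=2))  # [1, 1, 1, 2, None, None, None, 3]
print(forward_fill(data))            # [1, 1, 1, 2, 2, 2, 2, 3]
```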
Reports
Report Class
The Report class allows you to generate tabular reports of TimeSeries objects with dates in the leftmost column. An instance of the Report class is essentially a template for generating reports. All parameters to the __init__ method of the class are optional; any options you specify simply serve as the defaults for this instance.
When you call your Report instance (by invoking the __call__ method), you may specify any of the options that are valid for creation of the Report instance. These options affect only the current call; they do not modify the defaults for that instance.
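The "template" behaviour (defaults fixed at construction, overrides valid for one call only) can be sketched with a few lines of Python. MiniReport is a hypothetical toy class written for this page. It borrows the option names delim, mask_rep and output from the examples further down but shares no code with the real Report class:

```python
import io
import sys


class MiniReport(object):
    """Toy report template: stores default options, accepts per-call overrides."""

    def __init__(self, **defaults):
        self.options = {'delim': ' | ', 'mask_rep': '--', 'output': None}
        self.options.update(defaults)

    def __call__(self, rows, **overrides):
        opts = dict(self.options)      # copy, so the defaults are never modified
        opts.update(overrides)
        out = opts['output'] if opts['output'] is not None else sys.stdout
        for row in rows:
            cells = [opts['mask_rep'] if c is None else str(c) for c in row]
            out.write(opts['delim'].join(cells) + '\n')


report = MiniReport(mask_rep='#N/A')
rows = [['05-Feb-2007', 1.5, None]]

report(rows)                           # 05-Feb-2007 | 1.5 | #N/A
buf = io.StringIO()
report(rows, delim=',', output=buf)    # override for this call only
report(rows)                           # the default delimiter is back
```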
Parameters
Both the __init__ and __call__ methods accept all of the following parameters:
- *tseries : time series objects. Must all be at the same frequency, but do not need to be aligned.
- dates (DateArray, None) : dates at which values of all the series will be output. If not specified, data will be output from the minimum start_date to the maximum end_date of all the time series objects.
- header_row (list, None) : List of column headers. Specifying the header for the date column is optional.
- header_char (str, '-'): Character to be used for the row separator line between the header and first row of data. None for no separator. This is ignored if header_row is None.
- header_justify (List of strings or single string, None) : Determines how headers are justified. If not specified, all headers are left justified. If a string is specified, it must be one of 'left', 'right', or 'center' and all headers will be justified the same way. If a list is specified, each header will be justified according to the specification for that header in the list. Specifying the justification for the date column header is optional.
- row_char (str, None): Character to be used for the row separator line between each row of data. None for no separator.
- footer_func (List of functions or single function, None) : A function or list of functions for summarizing each data column in the report. For example, ma.sum to get the sum of the column. If a list of functions is provided there must be exactly one function for each column. Do not specify a function for the Date column.
- footer_char (str, '-'): Character to be used for the row separator line between the last row of data and the footer. None for no separator. This is ignored if footer_func is None.
- footer_label (str, None) : label for the footer row. This goes at the end of the date column. This is ignored if footer_func is None.
- justify (List of strings or single string, None) : Determines how data are justified in their column. If not specified, the date column and string columns are left justified, and everything else is right justified. If a string is specified, it must be one of 'left', 'right', or 'center' and all columns will be justified the same way. If a list is specified, each column will be justified according to the specification for that column in the list. Specifying the justification for the date column is optional.
- prefix (str, '') : A string prepended to each printed row.
- postfix (str, '') : A string appended to each printed row.
- mask_rep (str, '--'): String used to represent masked values in output.
- datefmt (str, None) : Formatting string used for displaying the dates in the date column. If None, str() is simply called on the dates.
- fmtfunc (List of functions or single function, None) : A function or list of functions for formatting each data column in the report. If not specified, str() is simply called on each item. If a list of functions is provided, there must be exactly one function for each column. Do not specify a function for the Date column, that is handled by the datefmt argument.
- wrapfunc (List of functions or single function, lambda x:x): A function f(text) for wrapping text; each element in the column is first wrapped by this function. Instances of wrap_onspace, wrap_onspace_strict, and wrap_always (which are part of this module) work well for this. Eg. wrapfunc=wrap_onspace(10) . If a list is specified, each column will be wrapped according to the specification for that column in the list. Specifying a function for the Date column is optional.
- col_width (list of integers or single integer, None): use this to specify a width for all columns (single integer), or each column individually (list of integers). The column will be at least as wide as col_width, but may be larger if cell contents exceed col_width. If specifying a list, you may optionally specify the width for the Date column as the first entry.
- output (buffer, sys.stdout): output must have a write method.
- fixed_width (boolean, True): If True, columns are fixed width (ie. cells will be padded with spaces to ensure all cells in a given column are the same width). If False, col_width will be ignored and cells will not be padded.
Examples
# the following variables will be used throughout the examples
import scikits.timeseries.lib.reportlib as rl
ser1 = ts.time_series(np.random.uniform(-100,100,10), start_date=ts.now('b')-5)
ser2 = ts.time_series(np.random.uniform(-100,100,10), start_date=ts.now('b'))
strings = ['some string', 'another string', 'yet another, string', 'final string']
ser3 = ts.time_series(strings, start_date=ts.now('b'), dtype=np.string_)
dArray = ts.date_array(start_date=ts.now('b'), length=3)
Example 1: Basic report
>>> basicReport = rl.Report(ser1, ser2, ser3)
>>> basicReport()
"""
29-Jan-2007 | -95.4554568525 | -- | --
30-Jan-2007 | 8.58356086571 | -- | --
31-Jan-2007 | 41.6353000447 | -- | --
01-Feb-2007 | -70.4674570816 | -- | --
02-Feb-2007 | 2.98803489327 | -- | --
05-Feb-2007 | -21.6474414786 | -77.750560056 | some string
06-Feb-2007 | 84.3212422071 | 56.2238118715 | another string
07-Feb-2007 | 23.5664556686 | 64.2491772743 | yet another, string
08-Feb-2007 | 34.8778728662 | -39.4734173695 | final string
09-Feb-2007 | -64.0545308092 | -83.7175337221 | --
12-Feb-2007 | -- | 52.4958419122 | --
13-Feb-2007 | -- | 7.1396171176 | --
14-Feb-2007 | -- | -57.7688749366 | --
15-Feb-2007 | -- | 71.2844695721 | --
16-Feb-2007 | -- | 87.1665936067 | --
"""
Example 2: csv report for excel
>>> mycsv = open('mycsv.csv', 'w')
>>> strfmt = lambda x: '"'+str(x)+'"'
>>> fmtfunc = [None, None, strfmt]
>>> csvReport = rl.Report(ser1, ser2, ser3, fmtfunc=fmtfunc, mask_rep='#N/A', delim=',', fixed_width=False)
>>> csvReport() # output to sys.stdout
"""
29-Jan-2007,67.4086881661,#N/A,#N/A
30-Jan-2007,-78.8405461996,#N/A,#N/A
31-Jan-2007,10.0559754743,#N/A,#N/A
01-Feb-2007,-71.149716374,#N/A,#N/A
02-Feb-2007,-46.055865283,#N/A,#N/A
05-Feb-2007,35.9105419931,85.1744316431,"some string"
06-Feb-2007,2.93015788615,-87.0634270731,"another string"
07-Feb-2007,-49.0774248826,-91.4854233865,"yet another, string"
08-Feb-2007,94.8175754225,36.587114053,"final string"
09-Feb-2007,-88.9474880802,37.3563788938,#N/A
12-Feb-2007,#N/A,21.1325367724,#N/A
13-Feb-2007,#N/A,72.2437957896,#N/A
14-Feb-2007,#N/A,37.2619438419,#N/A
15-Feb-2007,#N/A,-87.1465826319,#N/A
16-Feb-2007,#N/A,63.5556895555,#N/A
"""
>>> csvReport(output=mycsv) # output to file
Example 3: HTML report
>>> numfmt = lambda x: '%.2f' % x
>>> fmtfunc = [numfmt, numfmt, None]
>>> footer_func = [ma.sum, ma.sum, None]
>>> footer_label = "Total"
>>> htmlReport = rl.Report(ser1, ser2, ser3)
>>> htmlReport.set_options(prefix='<tr><td>', delim='</td><td>', postfix='</td></tr>')
>>> htmlReport.set_options(wrapfunc=rl.wrap_onspace(10,nls='<BR>'))
>>> htmlReport.set_options(fmtfunc=fmtfunc)
>>> htmlReport.set_options(footer_label=footer_label, footer_func=footer_func, footer_char='')
>>> htmlReport.set_options(dates=dArray)
>>> htmlReport() # output to sys.stdout
"""
<tr><td>05-Feb-2007</td><td> 91.66</td><td>-99.21</td><td>some<BR>string </td></tr>
<tr><td>06-Feb-2007</td><td>-68.84</td><td> 30.50</td><td>another<BR>string </td></tr>
<tr><td>07-Feb-2007</td><td> 93.53</td><td> 90.46</td><td>yet<BR>another,<BR>string</td></tr>
<tr><td>Total </td><td>116.36</td><td> 21.75</td><td> </td></tr>
"""
Example 4: Extra Options
>>> basicReport = rl.Report(ser1, ser2, ser3, dates=dArray)
#.............................................................................
"""Output report with a header. By default, a line of dashes will separate the header and the first row of data. Optionally, you can specify a label for the Date column as well (so a list with four entries instead of three like this example). If you wish to get rid of the separator line, or use a different character, specify: header_char=''"""
>>> basicReport(header_row=['col 1', 'col 2', 'col 3'])
"""
            | col 1 | col 2 | col 3
-------------------------------------------------------------------
06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string
07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string
08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string
"""
#.............................................................................
"""Change column justification for the report. You can specify a single string ('right', 'left', or 'center') and this will impact all columns, or you can specify a list of strings (optionally including the Date column, which is 'left' by default)"""
>>> basicReport(justify=['left', 'left', 'right'])
"""
06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string
07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string
08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string
"""
#.............................................................................
"""Change formatting of Date column"""
>>> basicReport(datefmt='%d')
"""
06 | 2.59583929443 | -96.2110139217 | some string
07 | -24.1064434097 | 86.0387977626 | another string
08 | -21.6432010416 | 4.83754030508 | yet another, string
"""
#.............................................................................
"""Add a separator line between each row"""
>>> basicReport(row_char='-')
"""
06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string
-------------------------------------------------------------------
07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string
-------------------------------------------------------------------
08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string
"""
#.............................................................................
"""Report different series. Notice that the other options set remain intact (ie. dates=dArray)"""
>>> basicReport(ser1)
"""
06-Feb-2007 | 2.59583929443
07-Feb-2007 | -24.1064434097
08-Feb-2007 | -21.6432010416
"""
#.............................................................................
"""Specify column widths. Just as in the header and justify options, you can specify a single value to affect all columns, or a list which optionally includes a specification for the Date column. Specify -1 to auto-size a column"""
>>> basicReport(col_width=[20, 20, -1])
"""
06-Feb-2007 | 2.59583929443 | -96.2110139217 | some string
07-Feb-2007 | -24.1064434097 | 86.0387977626 | another string
08-Feb-2007 | -21.6432010416 | 4.83754030508 | yet another, string
"""
Plotting
Introduction
The plotlib submodule (imported in the examples below as scikits.timeseries.lib.plotlib) makes it relatively simple to produce time series plots using matplotlib. It relieves the user of the burden of setting up appropriately spaced and formatted tick labels.
If you have never used matplotlib, you should first go through the tutorial on the matplotlib website before following the examples below.
Examples
Adaptation of date_demo2.py in matplotlib tutorial
import matplotlib.pyplot as plt
from matplotlib.finance import quotes_historical_yahoo
import scikits.timeseries as ts
import scikits.timeseries.lib.plotlib as tpl

# retrieve data from yahoo. The standard datetime python module is needed here
import datetime
date1 = datetime.date(2002, 1, 5)
date2 = datetime.date(2003, 12, 1)
quotes = quotes_historical_yahoo('INTC', date1, date2)

"""the dates from the yahoo quotes module get returned as integers, which
happen to correspond to the integer representation of 'DAILY' frequency dates
in the timeseries module. So create a DateArray of daily dates, then convert
this to business day frequency afterwards."""
dates = ts.date_array([q[0] for q in quotes], freq='DAILY').asfreq('BUSINESS')
opens = [q[1] for q in quotes]
raw_series = ts.time_series(opens, dates)

"""fill_missing_dates will insert masked values for any missing data points.
Note that you could plot the series without doing this, but it would cause
missing values to be linearly interpolated rather than left empty in the
plot"""
series = ts.fill_missing_dates(raw_series)

fig = tpl.tsfigure()
fsp = fig.add_tsplot(111)
fsp.tsplot(series, '-')
fsp.format_dateaxis()

"""add grid lines at start of each quarter. Grid lines appear at the major
tick marks by default (which, due to the dynamic nature of the ticks for time
series plots, cannot be guaranteed to be at quarter start). So if you want
grid lines to appear at specific intervals, you must first specify xticks
explicitly"""
dates = series.dates
quarter_starts = dates[dates.quarter != (dates-1).quarter]
fsp.set_xticks(quarter_starts.tovalue())
fsp.grid()

fsp.set_xlim(int(series.start_date), int(series.end_date))
plt.show()
The above code produces the following plot:
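The quarter-start selection in the example above (dates.quarter != (dates-1).quarter) keeps every business day whose previous business day falls in a different quarter. A minimal sketch of the same idea with the standard datetime module follows. It does not use DateArray: the helper names are hypothetical, and business days are simply taken to be Monday through Friday, ignoring holidays.

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)


def quarter(d):
    """Calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


def previous_business_day(d):
    """Step back to the nearest earlier weekday (holidays are ignored)."""
    d -= ONE_DAY
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= ONE_DAY
    return d


def business_days(start, end):
    """All weekdays from start to end, inclusive."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:
            days.append(d)
        d += ONE_DAY
    return days


dates = business_days(datetime.date(2002, 1, 7), datetime.date(2002, 12, 31))
# Keep the dates whose previous business day lies in a different quarter.
quarter_starts = [d for d in dates
                  if quarter(d) != quarter(previous_business_day(d))]
print(quarter_starts)
```

For 2002 this selects 1 April, 1 July and 1 October. The first date in the range is not selected because the business day before it is still in the first quarter.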
Monthly Data along with an exponential moving average
import matplotlib.pyplot as plt
import numpy as np
import scikits.timeseries as ts
import scikits.timeseries.lib.plotlib as tpl
from scikits.timeseries.lib.moving_funcs import mov_average_expw

# generate some random data
data = np.cumprod(1 + np.random.normal(0, 1, 300)/100)
series = ts.time_series(data, start_date=ts.Date(freq='M', year=1982, month=1))

fig = tpl.tsfigure()
fsp = fig.add_tsplot(111)
fsp.tsplot(series, '-', mov_average_expw(series, 40), 'r--')
fsp.format_dateaxis()
fsp.set_xlim(int(series.start_date), int(series.end_date))
plt.show()
The above code produces the following plot:
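For readers unfamiliar with exponential moving averages, here is a minimal pure-Python sketch of the calculation that the dashed line in the example above represents. It is not the implementation of mov_average_expw. It assumes the common convention of a smoothing factor of 2/(span + 1), seeds the average with the first observation, and ignores masked values, any of which may differ from what the library does.

```python
def ewma(values, span):
    """Exponentially weighted moving average (illustrative sketch only).

    Assumes alpha = 2 / (span + 1) and starts from the first value.
    """
    alpha = 2.0 / (span + 1)
    result = []
    avg = None
    for v in values:
        # Blend the new observation with the running average.
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        result.append(avg)
    return result


smoothed = ewma([1.0, 2.0, 3.0], span=3)
print(smoothed)
```

A larger span gives a smaller smoothing factor, so the average reacts more slowly to new observations. With a span of 40, as in the example above, each new point contributes just under 5% of the updated average under this convention.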
Separate scales for left and right axis
import matplotlib.pyplot as plt
import numpy as np
import numpy.ma as ma
import scikits.timeseries as ts
import scikits.timeseries.lib.plotlib as tpl

# generate some random data
data1 = np.cumprod(1 + np.random.normal(0, 1, 300)/100)
data2 = np.cumprod(1 + np.random.normal(0, 1, 300)/100)*100
series1 = ts.time_series(data1, start_date=ts.Date(freq='M', year=1982, month=1)-50)
series2 = ts.time_series(data2, start_date=ts.Date(freq='M', year=1982, month=1))

fig = tpl.tsfigure()
fsp = fig.add_tsplot(111)

# plot series on left axis
fsp.tsplot(series1, 'b-', label='<- left series')
fsp.set_ylim(ma.min(series1.series), ma.max(series1.series))

# create right axis
fsp_right = fsp.add_yaxis(position='right', yscale='log')

# plot series on right axis
fsp_right.tsplot(series2, 'r-', label='-> right series')
fsp_right.set_ylim(ma.min(series2.series), ma.max(series2.series))

# setup legend
fsp.legend((fsp.lines[-1], fsp_right.lines[-1]),
           (fsp.lines[-1].get_label(), fsp_right.lines[-1].get_label()),
           loc=(0,1))

fsp.format_dateaxis()
fsp.set_xlim(int(min(series1.start_date, series2.start_date)),
             int(max(series1.end_date, series2.end_date)))
plt.show()
The above code produces the following plot:
Sample plots at various levels of zoom
The following charts show daily data plotted over date ranges of varying length, which demonstrates the dynamic nature of the axis labels. With interactive plotting, the labels are updated as you scroll and zoom.
Databases
Storing and retrieving time series from standard relational databases is very
simple once you know a few tricks. For these examples, I use the ceODBC
database module (http://ceodbc.sourceforge.net/) which I have found to be more
reliable and faster than the pyodbc module. However, I *think* these examples
should work with the pyodbc module as well.
SQL Server 2005 Express edition is the database used in the examples. Other
standard relational databases should also work, but I have not personally
verified it.
A database called "test" is assumed to have been created already along with a
table called "test_table" described by the following query:
CREATE TABLE test_table (
    [date] [datetime] NULL,
    [value] [decimal](18, 6) NULL
)
If you have verified these examples to work with other databases and python
db modules, it would be greatly appreciated if you could add a note to the
wiki.
Example
import ceODBC as odbc
import scikits.timeseries as ts

test_series = ts.time_series(range(50), start_date=ts.now('b'))

# let's mask one value just to make things interesting
test_series[5] = ts.masked

conn = odbc.Connection(
    "Driver={SQL Native Client};Server=localhost;Database=test;Uid=userid;Pwd=password;")
crs = conn.cursor()

# start with an empty table for these examples
crs.execute("DELETE FROM test_table")

# convert series to list of (datetime, value) tuples which can be interpreted
# by the database module. Note that masked values will get converted to None
# with the tolist method. None gets translated to NULL when inserted into the
# database.
_tslist = test_series.tolist()

# insert time series data
crs.executemany("""
    INSERT INTO test_table
    ([date], [value]) VALUES (?, ?)
""", _tslist
)

# Read the data back out of the database.
# Explicitly cast data of type decimal to float for reading purposes,
# otherwise you will get decimal objects for your result.
crs.execute("""
    SELECT
        [date],
        CAST(ISNULL([value], 999) AS float) as vals, -- convert NULLs to 999
        (CASE
            WHEN [value] is NULL THEN 1
            ELSE 0
        END) AS mask -- retrieve a mask column
    FROM test_table
    ORDER BY [date] ASC
""")

# zip(*arg) converts row based results to column based results. This is the
# crucial trick needed for easily reading time series data from a relational
# database with Python
_dates, _values, _mask = zip(*crs.fetchall())
_series = ts.time_series(_values, dates=_dates, mask=_mask, freq='B')

# commit changes to the database
conn.commit()
conn.close()
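If you do not have SQL Server or ceODBC at hand, the two tricks used above (None becoming NULL on insert, and zip(*rows) turning rows into columns) can be tried with the sqlite3 module from the Python standard library. This is a stand-alone sketch that has not been run against the timeseries module. The dates are stored as text, the sample rows are made up, and SQLite's IFNULL stands in for ISNULL.

```python
import sqlite3

# (date, value) pairs; None marks a missing (masked) observation.
rows_in = [
    ("2008-01-01", 0.0),
    ("2008-01-02", None),
    ("2008-01-03", 2.0),
]

conn = sqlite3.connect(":memory:")
crs = conn.cursor()
crs.execute("CREATE TABLE test_table (date TEXT, value REAL)")

# None is stored as NULL.
crs.executemany("INSERT INTO test_table (date, value) VALUES (?, ?)", rows_in)

# Replace NULLs with a placeholder and return a separate mask column.
crs.execute("""
    SELECT date,
           IFNULL(value, 999) AS vals,
           CASE WHEN value IS NULL THEN 1 ELSE 0 END AS mask
    FROM test_table
    ORDER BY date ASC
""")

# zip(*rows) transposes the row-based result into three column tuples.
dates, values, mask = zip(*crs.fetchall())
conn.close()

print(dates)
print(values)
print(mask)
```

The three tuples correspond to the _dates, _values and _mask arguments passed to ts.time_series in the example above.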
Support / Feedback
- For help using the timeseries scikit, please post questions to the scipy-user mailing list
- For development-related inquiries (enhancements, bugs, etc.), please post questions to the scipy-dev mailing list
- Please file bug reports on trac under the timeseries component
Attachments
- example1.png (28.7 kB) - added by mattknox_ca on 01/26/08 10:03:17.
- example2.png (28.9 kB) - added by mattknox_ca on 01/26/08 10:07:45.
- example3.png (40.5 kB) - added by mattknox_ca on 01/26/08 10:08:12.
- zoom1.png (24.7 kB) - added by mattknox_ca on 01/26/08 10:08:35.
- zoom2.png (33.2 kB) - added by mattknox_ca on 01/26/08 10:08:51.
- zoom3.png (30.4 kB) - added by mattknox_ca on 01/26/08 10:09:12.
- zoom4.png (51.4 kB) - added by mattknox_ca on 01/26/08 10:09:26.







