This tutorial shows how to aggregate the value of variables in a time series by certain time frames with XTS objects.
We will take the univariate timeseries dataset co2 (from Jan 1959 to Dec 1997) as an example (already provided in library xts).
library(xts)
#load the timeseries dataset co2
data(co2)
#convert the timeseries to a XTS object named co2.xts
co2.xts <- as.xts(co2)
#show first 15 rows in co2.xts
head(co2.xts, 15)
[,1]
Jan 1959 315.42
Feb 1959 316.31
Mar 1959 316.50
Apr 1959 317.56
May 1959 318.13
Jun 1959 318.00
Jul 1959 316.39
Aug 1959 314.65
Sep 1959 313.68
Oct 1959 313.18
Nov 1959 314.66
Dec 1959 315.43
Jan 1960 316.27
Feb 1960 316.81
Mar 1960 317.42
This is an example of a monthly timeseries. There are several ways to achieve aggregation of this timeseries.
endpoint
and period.apply
to perform aggregation
Usage of endpoint()
Function endpoint
takes a time series and returns the locations (indices) of the last observations in each interval. We can invoke it like this:
ep <- endpoints(co2.xts, on="quarters", k=1)
head(ep, 10)
[1] 0 3 6 9 12 15 18 21 24 27
We can notice that endpoint()
has three parameters. Valid values for the argument on
include: “us” (microseconds), “microseconds”, “ms” (milliseconds), “milliseconds”, “secs” (seconds), “seconds”, “mins” (minutes), “minutes”, “hours”, “days”, “weeks”, “months”, “quarters”, and “years”. Subsecond on periods are not supported on Windows yet.
endpoint()
returns a numeric vector corresponding to the last observation in each period specified by argument on
, with a zero added to the beginning of the vector, and the index of the last observation in the time series at the end.
Argument k
indicates the function iterates along every k-th element. We set k to 2 in the following code chunk to locate the final index of every two years.
ep2 <- endpoints(co2.xts, on="years", k=2)
head(ep2, 10)
[1] 0 12 36 60 84 108 132 156 180 204
Usage of period.apply()
on univariate timeseries
After getting the indices of the last observations of each time interval, we can use period.apply()
to calculate a specified aggregation function value over non-overlapping intervals along the time series data.
period.apply()
also has sevreal parameters. We can use the return value of endpoints()
as the value of argument INDEX
and specify the function type in argument FUN
. The following code chunks aggregate the mean, standard deviation and cumulated sum values quarterly from the timeseries respectively.
quarterlymean <- period.apply(co2.xts, INDEX = ep, FUN = mean)
head(quarterlymean)
[,1]
Mar 1959 316.0767
Jun 1959 317.8967
Sep 1959 314.9067
Dec 1959 314.4233
Mar 1960 316.8333
Jun 1960 319.3900
quarterlysd <- period.apply(co2.xts, INDEX = ep, FUN = sd)
head(quarterlysd)
[,1]
Mar 1959 0.5765703
Jun 1959 0.2987195
Sep 1959 1.3731108
Dec 1959 1.1435180
Mar 1960 0.5753550
Jun 1960 0.5011986
quarterlycumsum <- period.apply(co2.xts, INDEX = ep, FUN = cumsum)
head(quarterlycumsum)
[,1] [,2] [,3]
Mar 1959 315.42 631.73 948.23
Jun 1959 317.56 635.69 953.69
Sep 1959 316.39 631.04 944.72
Dec 1959 313.18 627.84 943.27
Mar 1960 316.27 633.08 950.50
Jun 1960 318.87 638.74 958.17
Aggregation of multivariate timeseries
We can also apply aggregation to multivariate time series using this method. The aggregation values will be calculated separately on each column (variable). Let’s take dataset sample_matrix as an example (already provided in library xts).
data(sample_matrix)
sample.xts <- as.xts(sample_matrix)
head(sample.xts)
Open High Low Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-01-04 50.42096 50.42096 50.26414 50.33236
2007-01-05 50.37347 50.37347 50.22103 50.33459
2007-01-06 50.24433 50.24433 50.11121 50.18112
2007-01-07 50.13211 50.21561 49.99185 49.99185
ep3 <- endpoints(sample.xts, on="weeks")
weeklymean <- period.apply(sample.xts, INDEX = ep3, FUN = mean)
head(weeklymean)
Open High Low Close
2007-01-08 50.21096 50.27109 50.10555 50.19192
2007-01-15 50.20139 50.35916 50.14784 50.27336
2007-01-22 50.44732 50.55528 50.31774 50.40107
2007-01-29 50.05702 50.13030 49.96866 50.07504
2007-02-05 50.28181 50.41690 50.22217 50.37982
2007-02-12 50.69801 50.82414 50.62509 50.73846
Aggregation of high frequency timeseries
Even high frequency timeseries data can be conveniently aggregated. We can take a stock trading timeseries as an example.
#set the maximum number of digits to print to 6 when formatting time values
#in seconds, since we will use microsecond time indices in this example.
options(digits.secs=6)
#restore the xts object from a RDS file.
stock_data.xts <- readRDS(file = "stockxts.rds")
head(stock_data.xts, 20)
Price Volume Market.VWAP
2019-07-17 10:01:27.900519 41.070 49 41.08583
2019-07-17 10:01:27.904071 41.070 100 41.08582
2019-07-17 10:01:27.904653 41.070 100 41.08582
2019-07-17 10:01:27.908002 41.070 437 41.08579
2019-07-17 10:01:27.908526 41.070 105 41.08578
2019-07-17 10:01:27.912014 41.060 473 41.08573
2019-07-17 10:01:27.912523 41.060 477 41.08568
2019-07-17 10:01:27.916008 41.060 71 41.08567
2019-07-17 10:01:27.916518 41.060 49 41.08566
2019-07-17 10:01:27.916518 41.090 65 41.08566
2019-07-17 10:01:28.016527 41.060 15 41.08566
2019-07-17 10:01:28.204061 41.090 18 41.08566
2019-07-17 10:01:28.204560 41.090 100 41.08566
2019-07-17 10:01:28.208549 41.090 34 41.08566
2019-07-17 10:01:28.467587 41.075 49 41.08566
2019-07-17 10:01:30.483712 41.075 27 41.08566
2019-07-17 10:01:31.567144 41.075 56 41.08566
2019-07-17 10:01:33.031348 41.075 52 41.08566
2019-07-17 10:01:33.031886 41.090 49 41.08566
2019-07-17 10:01:33.035198 41.090 8 41.08566
After loading the timeseries from a rds file, we can calculate the mean trading price, volume and VWAP of this stock per second and show the results.
#perform aggregation on high frequency financial time series
ep <- endpoints(stock_data.xts, 'seconds')
secmean <- period.apply(stock_data.xts, INDEX = ep, FUN = mean)
#align the timestamps to the next seconds
secmean <- align.time(secmean, 1)
head(secmean, 10)
Price Volume Market.VWAP
2019-07-17 10:01:28 41.06800 192.60000 41.08574
2019-07-17 10:01:29 41.08100 43.20000 41.08566
2019-07-17 10:01:31 41.07500 27.00000 41.08566
2019-07-17 10:01:32 41.07500 56.00000 41.08566
2019-07-17 10:01:34 41.08500 33.75000 41.08566
2019-07-17 10:01:35 41.11217 76.86957 41.08575
2019-07-17 10:01:36 41.10800 141.20000 41.08591
2019-07-17 10:01:37 41.14000 34.00000 41.08600
2019-07-17 10:01:38 41.14875 58.25000 41.08602
2019-07-17 10:01:39 41.13750 8.50000 41.08607
split
and lapply
to perform aggregation on intervalsThe sencond way is to use split()
to split your time series data into non-overlapping chunks and use lapply()
to perform aggregations on these periods. The following code chunk shows how to do this on univariate time series co2.
sp.co2 <- split(co2.xts, "years")
#rbind() is used to combine the results by row into a matrix.
yearlymean <- do.call(rbind, lapply(sp.co2, mean))
head(yearlymean)
[,1]
[1,] 315.8258
[2,] 316.7475
[3,] 317.4850
[4,] 318.2975
[5,] 318.8325
[6,] 319.4625
We can compare the result with what we get from using period.apply()
.
yearlymean2 <- period.apply(co2.xts, INDEX = endpoints(co2.xts, on="years"), FUN = mean)
head(yearlymean2)
[,1]
Dec 1959 315.8258
Dec 1960 316.7475
Dec 1961 317.4850
Dec 1962 318.2975
Dec 1963 318.8325
Dec 1964 319.4625
If you want to perform aggregations on mutivariate timeseries, you can write code like this:
weeklymean2 <- do.call(rbind, lapply(split(sample.xts, "weeks"), function(w) apply(w,2,mean)))
head(weeklymean2)
Open High Low Close
[1,] 50.21096 50.27109 50.10555 50.19192
[2,] 50.20139 50.35916 50.14784 50.27336
[3,] 50.44732 50.55528 50.31774 50.40107
[4,] 50.05702 50.13030 49.96866 50.07504
[5,] 50.28181 50.41690 50.22217 50.37982
[6,] 50.69801 50.82414 50.62509 50.73846
rollapply
to perform aggregation to rolling windows in time seriesIn function rollapply
, Argument width
means an integer specifying the window width (in numbers of observations) and FUN
specifies the function to be applied. In the following code chunck, we calculated the weekly mean of different columns in sample.xts
with a sliding window. Please note values of the first 6 rows would be NA
.
samweeksum <- rollapply(sample.xts, width = 7, FUN = mean)
head(samweeksum,15)
Open High Low Close
2007-01-02 NA NA NA NA
2007-01-03 NA NA NA NA
2007-01-04 NA NA NA NA
2007-01-05 NA NA NA NA
2007-01-06 NA NA NA NA
2007-01-07 NA NA NA NA
2007-01-08 50.21096 50.27109 50.10555 50.19192
2007-01-09 50.20454 50.25354 50.08471 50.16271
2007-01-10 50.15908 50.21191 50.03925 50.10197
2007-01-11 50.08256 50.18594 49.98513 50.08865
2007-01-12 50.05957 50.18398 49.97809 50.08159
2007-01-13 50.07093 50.21765 50.00847 50.11470
2007-01-14 50.11829 50.27599 50.07586 50.20178
2007-01-15 50.20139 50.35916 50.14784 50.27336
2007-01-16 50.29072 50.46522 50.25666 50.38265
Load timeseries data “AirPassengers” and convert it to a XTS object. This time series data contains monthly totals of international airline passengers between 1949 to 1960. Please try to use the methods introduced in this tutorial to calculate the mean and stand deviation of the numbers over quarterly and yearly intervals.