library(tdata)
In this vignette, I will introduce you to the main features of the tdata package. I will use various datasets to demonstrate how to perform common tasks, such as defining frequency types and converting data between frequencies.
Please note that currently, only one section is provided in this vignette. Additional examples will be added in subsequent updates.
Let’s get started!
In the first example, I will use oil price data. The required data can be downloaded with the Quandl package using the following code (note that the end date in this example may differ from yours):
oil_price <- Quandl::Quandl("OPEC/ORB", start_date="2010-01-01")
To manipulate data using the tdata package, we generally need to create a variable. In this example, we’ll create a variable from the oil price data. First, we’ll use the values in the first column to define a frequency. Since the first column contains a list of dates, we’ll use a ‘List-Date’ frequency:
start_freq <- f.list.date(oil_price$Date)
Now that we have defined the frequency, we can create a variable using the following code:
var_dl <- variable(oil_price$Value, start_freq, "Oil Price")
This creates an array where each element is labeled by a date. We can print this variable using the print function:
print(var_dl)
## Variable:
## Name = Oil Price
## Length = 3466
## Frequency Class = List (Date): Ld
## Start Frequency = 20230608
## Fields: NULL
We can also convert the variable back to a data.frame using the as.data.frame function:
df_var_dl <- as.data.frame(var_dl)
In this section, we’ll convert var_dl to a daily variable. This can be done by sorting the data and filling in any gaps. The convert.to.daily function can do this for us:
var_daily <- convert.to.daily(var_dl)
Using this function is more efficient than manually sorting the data and filling in gaps because var_daily, as a daily variable, only stores a single date: the frequency of the first observation. The remaining frequencies (or dates) are inferred from this first date. The same holds for every frequency type in the tdata package except ‘Lists’. We can print the starting frequency using the print function:
print(var_daily$startFrequency)
## Frequency: 20100104 (Daily: d)
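To make the idea of "sorting and filling in gaps" concrete, here is a minimal base R sketch. It does not use tdata and is not how convert.to.daily is implemented; the data frame obs and its values are made up for illustration. It also shows why a regular daily series only needs its start date: every other date can be recovered from the position of the element.

```r
# Hypothetical irregular observations (not the OPEC data), in descending order
obs <- data.frame(
  Date  = as.Date(c("2010-01-08", "2010-01-06", "2010-01-04")),
  Value = c(79.1, 80.3, 78.5)
)

# Step 1: sort by date
obs <- obs[order(obs$Date), ]

# Step 2: build the full daily calendar between the first and last observation
all_days <- seq(min(obs$Date), max(obs$Date), by = "day")

# Step 3: place the known values; days without an observation stay NA
daily_values <- rep(NA_real_, length(all_days))
daily_values[match(obs$Date, all_days)] <- obs$Value

# Only the start date and the values need to be kept:
# the date of element i is start_date + (i - 1)
start_date <- all_days[1]
recovered_days <- start_date + seq_along(daily_values) - 1
```

Here daily_values has five elements, two of which are NA, and recovered_days matches all_days without the full date vector ever being stored.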
Each frequency in the tdata package has a string representation and a class ID. We can get these values using the following code:
class_id <- get.class.id(var_daily$startFrequency)
str_rep <- as.character(var_daily$startFrequency)
## [1] "class_id: d, str_rep: 20100104"
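If you need the start of the series as a regular Date object, the string representation can be parsed with base R. This sketch assumes that the daily string representation has the form YYYYMMDD, as the output above suggests; the variable str_rep_example is a hypothetical stand-in for str_rep so that the snippet runs without tdata.

```r
# String copied from the output above (assumed format: YYYYMMDD)
str_rep_example <- "20100104"

# Parse it into a Date object
start_date <- as.Date(str_rep_example, format = "%Y%m%d")

# Day of the week as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
start_wday <- as.POSIXlt(start_date)$wday

# Formatting the Date again gives back the original string
round_trip <- format(start_date, "%Y%m%d")
```

In this case start_wday is 1, so the first observation falls on a Monday.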
Plotting the data is straightforward. We simply convert the data to a data.frame using the as.data.frame function and then plot it. However, I won’t plot the daily variable in this example because filling the gaps in the original ‘List-Date’ data leaves many NA values. In the next section, I’ll aggregate the data and plot it.
In this section, we’ll convert the daily variable to a weekly variable. Unlike the previous conversion, this involves aggregating the data rather than sorting and filling in gaps. To do this, we’ll need to use an aggregator function that takes an array of data as an argument and returns a scalar value. Summary statistic functions such as mean and median are natural choices for this (we’ll also need to handle NA values). In this example, I’ll use a built-in function to get the last available data point in each week as the representative value for that week. Here’s the code:
var_weekly <- convert.to.weekly(var_daily, "mon", "last")
The second argument, "mon", specifies that the week starts on Monday, and the third, "last", selects the built-in aggregator described above. Note that the weekly frequency points to the first day of the week. We can now convert the variable to a data.frame and plot it using the following code:
df_var_weekly <- as.data.frame(var_weekly)
par(las = 2, cex.axis = 0.8)
plot(factor(rownames(df_var_weekly)),
     df_var_weekly$`Oil Price`,
     xlab = NULL, ylab = "$",
     main = "Weekly Oil Price")
Plotting the generated weekly data
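For readers who want to see what a "last available value per week" aggregation involves, here is a minimal base R sketch. It is an illustration under my own assumptions, not the implementation behind convert.to.weekly: the series is made up, and last_available is a hypothetical helper that plays the role of the aggregator (a function that takes a vector and returns a scalar while handling NA values).

```r
# Hypothetical daily series starting on Monday 2010-01-04 (values are made up)
days   <- seq(as.Date("2010-01-04"), by = "day", length.out = 14)
values <- c(78.5, NA, 80.3, NA, 79.1, NA, NA,
            77.0, 76.2, NA, NA, NA, NA, NA)

# Aggregator: returns the last non-NA value, or NA if the week has no data
last_available <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[length(x)]
}

# Label each day with the Monday that starts its week.
# as.POSIXlt()$wday is 0 for Sunday, 1 for Monday, ..., 6 for Saturday
wday <- as.POSIXlt(days)$wday
week_start <- days - (wday + 6) %% 7

# Apply the aggregator within each week
weekly <- tapply(values, format(week_start, "%Y%m%d"), last_available)
```

The result has one element per week, named after the Monday that starts it, which mirrors the convention that the weekly frequency points to the first day of the week. Swapping last_available for a function such as `function(x) mean(x, na.rm = TRUE)` would give a weekly average instead.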
There are other frequency types and conversion functions available in the tdata package that you can explore on your own.