Time Series Generators
All of the examples on this page use the TimeSeriesGenerator().create_time_series() method.
Straight Line
We get a straight line by placing the interpolation nodes on a straight line:
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
and by setting the following parameters to zero:
randomwalk_scale = 0
noise_scale = 0
season_eff = 0
Linear trend: With straight line
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=[[n_periods * i / 4, 100 * i] for i in range(4)],  # (1)
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=0,  # (2)
    noise_scale=0,  # (3)
    season_eff=0,  # (4)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="Straight line with no noise",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_straight_line.html")
fig.show()
(1) Straight line interpolation
(2) No random walks
(3) No noise
(4) No seasonality
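As a quick check, the node expression from this example can be evaluated on its own, without the generator. The sketch below reuses only the n_periods value and the list comprehension from the snippet; the slope comparison is an added illustration and is not part of the library.

```python
# Evaluate the interpolation nodes used in the straight-line example.
n_periods = 1096
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
print(interpolation_nodes)
# [[0.0, 0], [274.0, 100], [548.0, 200], [822.0, 300]]

# Consecutive nodes share the same slope, so all four lie on one straight line.
slopes = [
    (y1 - y0) / (x1 - x0)
    for (x0, y0), (x1, y1) in zip(interpolation_nodes, interpolation_nodes[1:])
]
print(len(set(slopes)) == 1)
# True
```

Note that range(4) places the last node at period 822, three quarters of the way through the 1096 periods.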
Smooth Curve
If the interpolation nodes are more randomised (not in a straight line), the generator will aim to build a smooth curve which passes through each interpolation node.
For example, if you specify the nodes as:
interpolation_nodes = [(0.0, 0), (274.0, 400), (548.0, 250), (822.0, 50)]
Then you will get a curve that looks like:
Linear trend: With random interpolation nodes
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=list(
        zip([n_periods * i / 4 for i in range(4)], [0, 400, 250, 50])
    ),  # (1)
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=0,  # (2)
    noise_scale=0,  # (3)
    season_eff=0,  # (4)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="Smooth curve with no noise",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("./images/linear_smooth_curve.html")
fig.show()
(1) Randomised interpolation
(2) No random walks
(3) No noise
(4) No seasonality
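To get a feel for what "a smooth line which passes through each interpolation node" means, here is a minimal sketch that fits a single cubic polynomial through the four nodes of this section. This page does not say which interpolation method the generator uses, so the Lagrange polynomial and the helper name lagrange below are assumptions chosen purely for illustration.

```python
# Hypothetical illustration: one cubic polynomial through the four nodes.
# The generator's own interpolation method may differ.
nodes = [(0.0, 0), (274.0, 400), (548.0, 250), (822.0, 50)]


def lagrange(x: float, points: list[tuple[float, float]]) -> float:
    """Evaluate the Lagrange interpolating polynomial through `points` at `x`."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if i != j:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# The curve reproduces every node and varies smoothly in between.
for x, y in nodes:
    print(x, y, round(lagrange(x, nodes), 6))

# A point halfway between the first two nodes.
print(round(lagrange(137.0, nodes), 2))
```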
Noise
Noise shifts each data point away from the linear trend line by an amount drawn from a normal distribution, with the scale being the standard deviation.
It is controlled with the noise_scale parameter.
Linear trend: With noise
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=0,  # (1)
    noise_scale=10,  # (2)
    season_eff=0,  # (3)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="Straight line with noise",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_noise.html")
fig.show()
(1) No random walks
(2) A little bit of noise
(3) No seasonality
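The effect of noise_scale can be sketched without the library, using only the standard library. The snippet below assumes zero-mean Gaussian noise added to a linear trend whose slope is read off the interpolation nodes and extended past the last node; both choices are assumptions made for illustration, not a description of the generator's internals.

```python
import random
import statistics

# Assumed stand-ins for the example: a linear trend plus Gaussian noise.
n_periods = 1096
noise_scale = 10
rng = random.Random(42)  # seeded so the sketch is reproducible

# Slope of 100 units per 274 periods, taken from the interpolation nodes.
trend = [100 * t / 274 for t in range(n_periods)]

# One independent normal draw per period, with noise_scale as the standard deviation.
noise = [rng.gauss(0, noise_scale) for _ in range(n_periods)]
values = [level + shock for level, shock in zip(trend, noise)]

# The sample standard deviation of the noise should sit close to noise_scale.
print(round(statistics.stdev(noise), 2))
```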
If we increase noise_scale, the standard deviation of the normal distribution widens, which adds more noise.
Linear trend: With more noise
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=0,  # (1)
    noise_scale=50,  # (2)
    season_eff=0,  # (3)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="Straight line with more noise",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_more_noise.html")
fig.show()
(1) No random walks
(2) A lot of noise
(3) No seasonality
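Comparing noise_scale = 10 with noise_scale = 50 under the same seed shows why the second plot looks like a stretched version of the first. The sketch below uses Python's random.gauss rather than the generator, and the helper name gaussian_noise is hypothetical; whether the library reuses the same underlying draws across scales is an assumption.

```python
import math
import random


def gaussian_noise(scale: float, n: int, seed: int = 42) -> list[float]:
    """Draw n zero-mean normal values with standard deviation `scale`."""
    rng = random.Random(seed)
    return [rng.gauss(0, scale) for _ in range(n)]


small = gaussian_noise(10, 1096)
large = gaussian_noise(50, 1096)

# With the same seed, every draw is scaled by the same factor of 50 / 10 = 5,
# so the shape of the noise is unchanged and only its spread grows.
same_shape = all(
    math.isclose(b, 5 * a, rel_tol=1e-9) for a, b in zip(small, large)
)
print(same_shape)
```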
Random Walk
A random walk is a random process that describes a path consisting of a succession of random steps. Each step is drawn from a normal distribution and then added to the previous value using the Autoregressive (AR) and Moving Average (MA) models (see the ARMA model for more mathematical detail).
It is controlled with the randomwalk_scale parameter. Similar to the noise_scale parameter, this sets the standard deviation of the normal distribution used to generate the random walk:
Linear trend: With randomwalk
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=3,  # (1)
    noise_scale=0,  # (2)
    season_eff=0,  # (3)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With randomwalk",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_randomwalk.html")
fig.show()
(1) A little bit of random walk
(2) No noise
(3) No seasonality
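The "succession of random steps" can be written down in a few lines. This is a minimal textbook sketch, assuming independent zero-mean normal steps that are summed cumulatively; it is not the generator's implementation, which routes the steps through its ARMA machinery.

```python
import random
from itertools import accumulate

randomwalk_scale = 3
n_periods = 1096
rng = random.Random(42)  # seeded so the sketch is reproducible

# One normal step per period, with randomwalk_scale as the standard deviation.
steps = [rng.gauss(0, randomwalk_scale) for _ in range(n_periods)]

# Each value of the walk is the previous value plus the new step.
walk = list(accumulate(steps))

print(len(walk), round(walk[-1], 2))
```

Unlike plain noise, which returns to the trend line at every period, a random walk keeps whatever displacement it has accumulated so far.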
If you increase this scale, it will increase the standard deviation of each progressive step, introducing more randomisation.
Linear trend: With more randomwalk
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[],
    MA=[],
    exogenous=[],
    randomwalk_scale=7,  # (1)
    noise_scale=0,  # (2)
    season_eff=0,  # (3)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With more randomwalk",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_more_randomwalk.html")
fig.show()
(1) A lot more random walk
(2) No noise
(3) No seasonality
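For a plain random walk with independent normal steps, the spread of the path after n steps is the step scale times the square root of n, which is why a larger randomwalk_scale drifts further from the trend. The simulation below checks this for the two scales used on this page; it is a stand-alone sketch with a hypothetical helper walk_endpoint and does not call the generator.

```python
import math
import random
import statistics


def walk_endpoint(scale: float, n_steps: int, rng: random.Random) -> float:
    """Return the final position of a random walk with normal steps."""
    return sum(rng.gauss(0, scale) for _ in range(n_steps))


rng = random.Random(42)
n_steps = 100
results = {}
for scale in (3, 7):
    endpoints = [walk_endpoint(scale, n_steps, rng) for _ in range(2000)]
    observed = statistics.stdev(endpoints)
    expected = scale * math.sqrt(n_steps)  # theoretical spread after n_steps
    results[scale] = (observed, expected)
    print(scale, round(observed, 1), round(expected, 1))
```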
The random walk can also be used in conjunction with the AR and MA models. For example, if you set AR = [0.9], then you will get a time series that combines a random walk with an autoregressive process. To read more about the AR parameter, check the docs for the generate_ARMA() method.
Linear trend: With randomwalk and AR
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[0.9],  # (1)
    MA=[0],  # (2)
    exogenous=[],
    randomwalk_scale=7,  # (3)
    noise_scale=0,  # (4)
    season_eff=0,  # (5)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With randomwalk and AR[0.9]",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_randomwalk_and_ar.html")
fig.show()
(1) Auto Regression: Each element is affected 90% by the previous value
(2) Moving Average: There is no Moving Average effect
(3) A large random walk effect
(4) No noise
(5) No seasonality
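The idea that "each element is affected 90% by the previous value" corresponds to the textbook AR(1) recursion. The sketch below implements that recursion directly with a hypothetical helper ar1; how generate_ARMA() initialises the series and combines it with the trend is not shown on this page, so treat this as an assumption-laden illustration rather than the library's code.

```python
import random


def ar1(phi: float, scale: float, n: int, seed: int = 42):
    """Simulate x[t] = phi * x[t-1] + e[t], with e[t] ~ Normal(0, scale)."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0, scale) for _ in range(n)]
    series = []
    previous = 0.0  # assumed starting value
    for shock in shocks:
        previous = phi * previous + shock
        series.append(previous)
    return series, shocks


series, shocks = ar1(phi=0.9, scale=7, n=1096)

# Each value carries over 90% of the previous value plus a fresh shock.
print(round(series[1], 4), round(0.9 * series[0] + shocks[1], 4))
```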
If you set MA = [0.4], then you will get a time series that combines a random walk with a moving average process. To read more about the MA parameter, check the docs for the generate_ARMA() method.
Linear trend: With randomwalk and MA
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    manual_outliers=[],
    AR=[1],  # (1)
    MA=[0.4],  # (2)
    exogenous=[],
    randomwalk_scale=7,  # (3)
    noise_scale=0,  # (4)
    season_eff=0,  # (5)
    season_conf=None,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With randomwalk and MA[0.4]",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_randomwalk_and_ma.html")
fig.show()
(1) Auto Regression: There is no Auto Regressive effect
(2) Moving Average: Each value is corrected by 40% from the random walk effect
(3) A large random walk effect
(4) No noise
(5) No seasonality
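A moving average term with coefficient 0.4 can be illustrated with the textbook MA(1) formula, in which each value is the current shock plus 40% of the previous shock. The helper ma1 below is hypothetical and stands apart from the library; the exact way generate_ARMA() applies the MA coefficients is an assumption here.

```python
import random


def ma1(theta: float, scale: float, n: int, seed: int = 42):
    """Simulate x[t] = e[t] + theta * e[t-1], with e[t] ~ Normal(0, scale)."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0, scale) for _ in range(n)]
    series = [shocks[0]]  # no earlier shock exists for the first value
    for t in range(1, n):
        series.append(shocks[t] + theta * shocks[t - 1])
    return series, shocks


series, shocks = ma1(theta=0.4, scale=7, n=1096)

# Each value mixes the current shock with 40% of the previous one.
print(round(series[5], 4), round(shocks[5] + 0.4 * shocks[4], 4))
```

In contrast to an autoregressive term, the influence of a shock in an MA(1) process lasts for exactly one further period.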
Seasonality
The seasonality is a periodic fluctuation in the data, which is controlled with the season_eff parameter. The shape of the seasonality is then configured with the season_conf parameter, which is a dictionary with the following keys:
- style: The style of the seasonality. One of:
    - fixed+error: A fixed error pattern.
    - semi-markov: A semi-Markov pattern.
    - holiday: A fixed list of holiday dates.
    - sin: A sine wave pattern.
    - sin_covar: A sine wave covariance pattern.
- season_dates: A list of dates for the seasonality. This is only used if style is holiday.
- period_length: The length of the period for the seasonality. For example, if the frequency is weekly, this would be 7. This is only used if style is sin or sin_covar.
- period_sd: The standard deviation of the period for the seasonality. This is only used if style is sin or sin_covar.
- start_index: The starting index for the seasonality. This is only used if style is sin or sin_covar.
For example, if you set the season_conf parameter to:
season_conf = {
    "style": "sin",
    "period_length": 365,
    "start_index": 0,
}
Then you will get a sine wave pattern with a period of 365 days and a starting index of 0.
The amplitude is the maximum value of the sine wave, which is 0.5 in this case (the season_eff value used in the snippet).
Linear trend: With yearly seasonality
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    AR=[],
    MA=[],
    randomwalk_scale=0,
    exogenous=[],
    season_eff=0.5,  # (1)
    season_conf={"style": "sin", "period_length": 365, "start_index": 0},  # (2)
    manual_outliers=[],
    noise_scale=0,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With yearly seasonality (sine wave)",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_yearly_seasonality.html")
fig.show()
(1) Add seasonality effect of 0.5
(2) Use sine wave, with a wavelength of 365
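The sine-wave seasonality described in this section can be sketched as a plain function of the period index. The helper sine_season below is hypothetical: it assumes that start_index acts as a phase shift and that the amplitude equals the season_eff value of 0.5. How the generator then combines the wave with the trend is not covered here.

```python
import math


def sine_season(
    n_periods: int,
    period_length: int,
    start_index: int = 0,
    amplitude: float = 0.5,
) -> list[float]:
    """Return one sine value per period, repeating every `period_length` periods."""
    return [
        amplitude * math.sin(2 * math.pi * (t - start_index) / period_length)
        for t in range(n_periods)
    ]


season = sine_season(n_periods=1096, period_length=365, start_index=0)

# The wave starts at zero, peaks near the amplitude, and repeats after 365 periods.
print(season[0], round(max(season), 3), round(abs(season[365]), 6))
```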
Another example, with a different period length and starting index:
season_conf = {
    "style": "sin",
    "period_length": 30,
    "start_index": 4,
}
Then you will get a sine wave pattern with a period of 30 days (approximate monthly seasonality) and a starting index of 4. The amplitude is the maximum value of the sine wave, which is 0.5 in this case (the season_eff value used in the snippet).
Linear trend: With monthly seasonality
# Imports
from datetime import datetime

import pandas as pd
from plotly import express as px, io as pio, graph_objects as go
from synthetic_data_generators.time_series import TimeSeriesGenerator

# Settings
pio.templates.default = "simple_white+gridon"
SEED = 42
TSG = TimeSeriesGenerator()
n_periods = 1096

# Create data
interpolation_nodes = [[n_periods * i / 4, 100 * i] for i in range(4)]
df: pd.DataFrame = TSG.create_time_series(
    start_date=datetime(2019, 1, 1),
    n_periods=n_periods,
    interpolation_nodes=interpolation_nodes,
    level_breaks=[],
    AR=[],
    MA=[],
    randomwalk_scale=0,
    exogenous=[],
    season_eff=0.5,  # (1)
    season_conf={"style": "sin", "period_length": 30, "start_index": 4},  # (2)
    manual_outliers=[],
    noise_scale=0,
    seed=SEED,
)

# Build plot
fig: go.Figure = px.line(
    df,
    x="Date",
    y="Value",
    title="Linear Trend",
    subtitle="With (approximate) monthly seasonality (sine wave)",
).update_layout(title_x=0.5, title_xanchor="center")

# Render plot
fig.write_html("images/linear_with_monthly_seasonality.html")
fig.show()
(1) Add seasonality effect of 0.5
(2) Use sine wave, with a wavelength of 30
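A quick piece of arithmetic shows how different the two seasonal plots should look: the number of complete waves is simply n_periods divided by period_length. The snippet uses only values taken from the examples in this section.

```python
n_periods = 1096

# Number of sine-wave cycles that fit into the series for each period length.
cycles = {
    label: n_periods / period_length
    for label, period_length in (("yearly", 365), ("monthly", 30))
}

for label, count in cycles.items():
    print(f"{label}: {count:.1f} cycles over {n_periods} periods")
# yearly: 3.0 cycles over 1096 periods
# monthly: 36.5 cycles over 1096 periods
```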