Forecasting Canon [7751.T] with a Tiny Multilayer Perceptron (MLP)
Monthly closes, lag features, baseline checks, and a compact neural net
time-series
forecasting
finance
equities
Japan
Canon
7751.T
lag-features
baselines
regression
neural-networks
MLP
torch
tidyquant
R (Programming Language)
Author
A. Srikanth
Published
October 10, 2025
By The Numbers
Context
This post follows a simple idea: take a single company, Canon [7751.T], and ask a small model to forecast its monthly closing price in Japanese Yen (JPY). Monthly data keeps the focus on how long term investors think and act, and it cuts a lot of the noise you get from daily swings. The goal isn’t to show off with a giant architecture; it’s to draw a clear line from raw prices to a forecast you can understand and evaluate.
Let’s break it down. The model is a tiny neural network called a multilayer perceptron (MLP). It takes a short list of recent monthly closing prices and predicts the next one. Each layer does two things: it makes a weighted sum and then applies an activation. I use ReLU, which has a simple formula: ReLU\((x)\) = max\((0, x)\). It keeps positive numbers and turns negatives into zero. That small move adds just enough bend beyond a straight line and trains quickly.
To keep the model from memorizing, I add a bit of dropout. The whole setup stays small on purpose, flexible enough to catch gentle curves and basic interactions, but still well behaved on a modest dataset. I train on older months, hold out the most recent block to check my work, and judge it the obvious way: do the plots hug the data, do the errors drop, and does the behavior make sense without hand waving? Out-of-sample predictions start in October 2021 (2021-10). If a unit goes quiet because it only sees negatives, a common tweak is Leaky ReLU, which leaves a tiny slope so it can keep learning.
Objectives
The primary goal is to produce out-of-sample monthly forecasts and measure error with MSE and RMSE. A secondary goal is to prove that a tiny multilayer perceptron can beat plain baselines. If it cannot, we should know that quickly. Reproducibility matters, so the pipeline fixes seeds, keeps the split chronological, and scales features using only training statistics. The visuals carry the same styling throughout, so we can compare charts at a glance and see whether the gains are real or imagined.
Data Sources
Prices come through tidyquant::tq_get, which pulls historical data that mirrors Yahoo Finance. The notebook fetches daily closes for the selected ticker and collapses them to a single observation per month using the last business day. That choice respects the market calendar and avoids mid-month gaps. Parameters control the ticker, the date range, and the number of lags, so you can swap instruments or extend the horizon without touching the modeling code.
Data Preparation
The monthly price series becomes a supervised learning table using lagged values as features. For each month, the target is the next observed price, and the inputs are the previous n_lags prices in order. The split is chronological, with the earlier part used for training and the most recent stretch held out for validation. All scaling is learned on the training set and then applied to the validation set, which keeps information from leaking forward. Finally, the plotting index is aligned to months so the x axis label, TIME (MONTHS), reflects the actual unit.
Methodology
Every story needs a baseline. Here there are two. The first simply repeats last month’s price. The second is a linear regression on the lagged features. The neural model is small on purpose. It stacks two ReLU layers with dropout and produces a single value, or a short vector in the multi period version. Training uses Adam with a fixed learning rate and small batches. Validation loss is tracked each epoch, and the best weights are saved to a checkpoint so you can roll back if later epochs drift. By the end of this, we should end up with two plots—actuals in a solid white line vs predictions in a dotted white line, and a loss curve showing training and validation across epochs.
The following shows Canon’s monthly closing price using the last business day of each month. It smooths daily noise and makes long swings easy to see, from early climbs to mid-cycle drawdowns and later recoveries. Axes are TIME and PRICE (JPY), so the scale matches what you would see on a brokerage chart.
This tibble reports how many monthly observations went into training and how many were held out for validation. The split is chronological, so the model only learns from the past and is tested on the most recent block. These counts also set the denominator for error metrics.
Code
make_lags <-function(x, n_lags) { X <-embed(x, n_lags +1) y <- X[, 1] X <- X[, -1, drop =FALSE]colnames(X) <-paste0("lag_", n_lags:1)list(X = X, y = y)}N_LAGS <- params$n_lagsVALID_FR <- params$valid_fracseries <- monthly_df$pricelagged <-make_lags(series, N_LAGS)X_all <- lagged$Xy_all <- lagged$yn <-nrow(X_all)valid_n <-max(1, round(n * VALID_FR))valid_i0 <- n - valid_n +1X_train <- X_all[1:(valid_i0 -1), , drop =FALSE]y_train <- y_all[1:(valid_i0 -1)]X_valid <- X_all[valid_i0:n, , drop =FALSE]y_valid <- y_all[valid_i0:n]tibble(set =c("Train", "Valid"),n =c(nrow(X_train), nrow(X_valid)))
# A tibble: 2 × 2
set n
<chr> <int>
1 Train 189
2 Valid 47
Smaller RMSE is better. The persistence baseline model repeats last month’s price, while the linear model uses lagged features. If the linear RMSE is lower, it means simple lag structure adds value before we even touch a neural network.
# A tibble: 2 × 2
Model RMSE
<chr> <dbl>
1 Persistence (t-1) 217.
2 Linear Regression 211.
Solid white is the actual validation series and dotted white is the baseline prediction. The x axis is TIME (MONTHS) and the y axis is PRICE (JPY). Use this to spot where a trivial approach lags turning points or underreacts to sharp moves.
Code
suppressPackageStartupMessages({ library(tidyr); library(ggplot2); library(tibble) })# ---- Styling to match plot-mlp-forecast --------------------------------------bg_col <-"#313131"fg_col <-get0("fg_col", ifnotfound ="#ffffff")grid_col <-"#252525"font_family <-"Menlo".fixed_fig_width <-get0(".fixed_fig_width", ifnotfound =1100L).fixed_fig_height <-get0(".fixed_fig_height", ifnotfound =390L).wrapper_max_w <-get0(".wrapper_max_w", ifnotfound =790L)# ---- Inputs & sanity checks ---------------------------------------------------stopifnot(exists("y_valid"), exists("lin_pred"))n <-min(length(y_valid), length(lin_pred))stopifnot(n >0)# Accept an override if you define valid_start_chr = "YYYY-MM-01" upstreamvalid_start_chr <-get0("valid_start_chr", ifnotfound ="2021-10-01")valid_start <-as.Date(valid_start_chr)if (is.na(valid_start)) stop("valid_start_chr is not a parseable date (YYYY-MM-DD).")# Build months and 0-based index for the validation windowvalid_dates <-seq(valid_start, by ="month", length.out = n)idx0 <-seq.int(0L, n -1L)start_label <-format(valid_start, "%Y-%m")# Print a clear sanity line alongside the plot outputcat(sprintf("Validation Start (Index 0): %s\n", start_label))
Validation Start (Index 0): 2021-10
Code
if (start_label !="2021-10") {warning(sprintf("Sanity check: validation begins at %s, not 2021-10.", start_label))}# ---- Data for plotting --------------------------------------------------------y_plot <- y_valid[seq_len(n)]p_plot <- lin_pred[seq_len(n)]df_plot <-tibble(idx0 = idx0,date = valid_dates,Actual = y_plot,Prediction = p_plot) |>pivot_longer(c(Actual, Prediction), names_to ="Series", values_to ="Value")# ---- Plot --------------------------------------------------------------------ggplot(df_plot, aes(idx0, Value, linetype = Series)) +geom_line(linewidth =1.2, color ="#ffffff", lineend ="round") +scale_linetype_manual(values =c(Actual ="solid", Prediction ="dotted")) +guides(linetype =guide_legend(title =NULL)) +labs(x =sprintf("TIME (SINCE %s)", start_label),y ="PRICE (JPY)" ) +theme_minimal(base_size =12, base_family = font_family) +theme(plot.margin =margin(t =36, r =18, b =36, l =18),axis.title.x =element_text(margin =margin(t =14), color = fg_col),axis.title.y =element_text(margin =margin(r =14), color = fg_col),axis.text =element_text(size =14, color = fg_col),legend.text =element_text(size =14, color = fg_col),legend.position ="top",legend.key =element_rect(fill = bg_col, color =NA),plot.background =element_rect(fill = bg_col, color =NA),panel.background =element_rect(fill = bg_col, color =NA),panel.grid.major =element_line(color = grid_col),panel.grid.minor =element_blank() )
# A tibble: 1 × 3
Model MSE RMSE
<chr> <dbl> <dbl>
1 MLP (Torch) 49921. 223.
This has the same visual grammar as the baseline plot. If the dotted line hugs the solid line more tightly than before, the tiny MLP (Torch for R) is learning mild nonlinearity and timing improvements. Misses around fast reversals point to places where more features or a different target scale could help.
Code
suppressPackageStartupMessages({ library(tidyr); library(ggplot2); library(tibble) })# Styling to match your baselinebg_col <-"#313131"fg_col <-get0("fg_col", ifnotfound ="#ffffff")grid_col <-"#252525"font_family <-"Menlo".fixed_fig_width <-get0(".fixed_fig_width", ifnotfound =1100L).fixed_fig_height <-get0(".fixed_fig_height", ifnotfound =390L).wrapper_max_w <-get0(".wrapper_max_w", ifnotfound =790L)# Ensure lengths alignn <-min(length(y_valid), length(mlp_pred))y_plot <- y_valid[seq_len(n)]p_plot <- mlp_pred[seq_len(n)]tibble(idx =seq_along(y_plot),Actual = y_plot,Prediction = p_plot) |>pivot_longer(-idx, names_to ="Series", values_to ="Value") |>ggplot(aes(idx, Value, linetype = Series)) +geom_line(linewidth =1.2, color ="#ffffff", lineend ="round") +scale_linetype_manual(values =c(Actual ="solid", Prediction ="dotted")) +guides(linetype =guide_legend(title =NULL)) +labs(x ="TIME (MONTHS)", y ="PRICE (JPY)") +theme_minimal(base_size =12, base_family = font_family) +theme(plot.margin =margin(t =36, r =18, b =36, l =18),axis.title.x =element_text(margin =margin(t =14), color = fg_col),axis.title.y =element_text(margin =margin(r =14), color = fg_col),axis.text =element_text(size =14, color = fg_col),legend.text =element_text(size =14, color = fg_col),legend.position ="top",legend.key =element_rect(fill = bg_col, color =NA),plot.background =element_rect(fill = bg_col, color =NA),panel.background =element_rect(fill = bg_col, color =NA),panel.grid.major =element_line(color = grid_col),panel.grid.minor =element_blank() )
The left part of the curve should drop quickly as the model finds easy structure. A shallow, stable gap between train and validation later in training is a good sign. If train keeps falling while validation stalls or rises, the model is starting to memorize.
Code
suppressPackageStartupMessages({library(torch)library(tibble)library(tidyr)library(ggplot2)})# ---- Ensure torch runtime -----------------------------------------------------if (!torch_is_installed()) install_torch()set.seed(42); torch_manual_seed(42)# ---- Get REAL data ------------------------------------------------------------get_splits <-function() {if (all(c("X_train","y_train","X_valid","y_valid") %in%ls(envir = .GlobalEnv))) {list(X_train =as.matrix(get("X_train", envir = .GlobalEnv)),y_train =as.numeric(get("y_train", envir = .GlobalEnv)),X_valid =as.matrix(get("X_valid", envir = .GlobalEnv)),y_valid =as.numeric(get("y_valid", envir = .GlobalEnv)) ) } elseif (exists("monthly_df", inherits =TRUE)) { make_lags <-function(x, n_lags) { E <-embed(x, n_lags +1)list(X = E[, -1, drop =FALSE], y = E[, 1]) } N_LAGS <-get0("N_LAGS", ifnotfound =12L) VALID_FR <-get0("VALID_FR", ifnotfound =0.20) series <- monthly_df$price lagged <-make_lags(series, N_LAGS) X_all <- lagged$X; y_all <- lagged$y n <-nrow(X_all) v_n <-max(1, round(n * VALID_FR)) v_i0 <- n - v_n +1list(X_train = X_all[1:(v_i0 -1), , drop =FALSE],y_train = y_all[1:(v_i0 -1)],X_valid = X_all[v_i0:n, , drop =FALSE],y_valid = y_all[v_i0:n] ) } else {stop("No real data found. Provide X_train/y_train/X_valid/y_valid or monthly_df.") }}splits <-get_splits()X_train <- splits$X_train; y_train <- splits$y_trainX_valid <- splits$X_valid; y_valid <- splits$y_valid# Align just in casen <-min(nrow(X_valid), length(y_valid))X_valid <- X_valid[seq_len(n), , drop =FALSE]y_valid <- y_valid[seq_len(n)]# ---- Scale by training stats --------------------------------------------------x_mu <-colMeans(X_train)x_sd <-apply(X_train, 2, sd); x_sd[!is.finite(x_sd) | x_sd ==0] <-1X_train_s <-sweep(sweep(X_train, 2, x_mu, "-"), 2, x_sd, "/")X_valid_s <-sweep(sweep(X_valid, 2, x_mu, "-"), 2, x_sd, "/")y_mu <-mean(y_train); y_sd <-sd(y_train); if (!is.finite(y_sd) || y_sd ==0) y_sd <-1y_train_s <- (y_train - y_mu) / y_sdy_valid_s <- (y_valid - y_mu) / y_sdx_tr <-torch_tensor(X_train_s, dtype =torch_float())y_tr <-torch_tensor(matrix(y_train_s, ncol =1), dtype =torch_float())x_va <-torch_tensor(X_valid_s, dtype =torch_float())y_va <-torch_tensor(matrix(y_valid_s, ncol =1), dtype =torch_float())# ---- MLP: 3->8->4->1 with ReLU + Dropout(0.2) --------------------------------mlp <-nn_module(initialize =function(input_dim) { self$fc1 <-nn_linear(input_dim, 8) self$drop1 <-nn_dropout(p =0.2) self$fc2 <-nn_linear(8, 4) self$drop2 <-nn_dropout(p =0.2) self$fc3 <-nn_linear(4, 1) },forward =function(x) { x %>% self$fc1() %>%nnf_relu() %>% self$drop1() %>% self$fc2() %>%nnf_relu() %>% self$drop2() %>% self$fc3() })net <-mlp(ncol(X_train_s))optim <-optim_adam(net$parameters, lr =1e-3)epochs <-1000Lbatch_size <-5Ln_tr <-nrow(X_train_s)loss_tr <-numeric(epochs)loss_va <-numeric(epochs)for (e in1:epochs) { ord <-sample.int(n_tr) nb <-0; acc <-0for (i inseq(1, n_tr, by = batch_size)) { idx <- ord[i:min(i + batch_size -1L, n_tr)] xb <- x_tr[idx, , drop =FALSE] yb <- y_tr[idx, , drop =FALSE] net$train(); optim$zero_grad() out <-net(xb) loss <-nnf_mse_loss(out, yb) loss$backward(); optim$step() acc <- acc +as.numeric(loss$item()); nb <- nb +1L } loss_tr[e] <- acc / nb net$eval()with_no_grad({ pv <-net(x_va) loss_va[e] <-as.numeric(nnf_mse_loss(pv, y_va)$item()) })}# ---- Plot loss curve (no legend title) ---------------------------------------bg_col <-"#313131"fg_col <-get0("fg_col", ifnotfound ="#ffffff")grid_col <-"#252525"font_family <-"Menlo"tibble(iter =seq_len(epochs), Train = loss_tr, Validation = loss_va) |>pivot_longer(-iter, names_to ="Series", values_to ="Loss") |>ggplot(aes(iter, Loss, linetype = Series)) +geom_line(color ="#ffffff", linewidth =1.2, lineend ="round") +scale_linetype_manual(name =NULL, # remove legend titlevalues =c(Train ="solid", Validation ="dotted") ) +guides(linetype =guide_legend(title =NULL)) +labs(x ="ITERATION (EPOCHS)", y ="MSE LOSS") +theme_minimal(base_size =12, base_family = font_family) +theme(plot.margin =margin(t =36, r =18, b =36, l =18),axis.title.x =element_text(margin =margin(t =14), color = fg_col),axis.title.y =element_text(margin =margin(r =14), color = fg_col),axis.text =element_text(size =14, color = fg_col),legend.text =element_text(size =14, color = fg_col),legend.title =element_blank(), # also remove via themelegend.position ="top",legend.key =element_rect(fill = bg_col, color =NA),plot.background =element_rect(fill = bg_col, color =NA),panel.background =element_rect(fill = bg_col, color =NA),panel.grid.major =element_line(color = grid_col),panel.grid.minor =element_blank() )
Quick note: MSE is in squared JPY and RMSE is in JPY, so RMSE is easier to reason about. Compare that RMSE to both baselines—if it beats persistence and edges out a linear model, the network’s earning its keep.
In the plots, each dotted segment is a short forecast horizon stitched across time: the first step is usually the cleanest, and later steps get noisier as uncertainty compounds. If those segments swing too much, we can concentrate on modeling log prices or returns to stabilize the scale.
Code
suppressPackageStartupMessages({library(nnet)library(tibble)library(tidyr)library(ggplot2)})set.seed(42)# ---- Use REAL data ------------------------------------------------------------# Prefer existing splits; otherwise derive from `monthly_df` (+ N_LAGS, VALID_FR)get_splits <-function() {if (all(c("X_train","y_train","X_valid","y_valid") %in%ls(envir = .GlobalEnv))) {list(X_train =as.matrix(get("X_train", envir = .GlobalEnv)),y_train =as.numeric(get("y_train", envir = .GlobalEnv)),X_valid =as.matrix(get("X_valid", envir = .GlobalEnv)),y_valid =as.numeric(get("y_valid", envir = .GlobalEnv)) ) } elseif (exists("monthly_df", inherits =TRUE)) { make_lags <-function(x, n_lags) { E <-embed(x, n_lags +1)list(X = E[, -1, drop =FALSE], y = E[, 1]) } N_LAGS <-get0("N_LAGS", ifnotfound =12L) VALID_FR <-get0("VALID_FR", ifnotfound =0.20) series <- monthly_df$price lagged <-make_lags(series, N_LAGS) X_all <- lagged$X; y_all <- lagged$y n <-nrow(X_all); v_n <-max(1, round(n * VALID_FR)); v_i0 <- n - v_n +1list(X_train = X_all[1:(v_i0 -1), , drop =FALSE],y_train = y_all[1:(v_i0 -1)],X_valid = X_all[v_i0:n, , drop =FALSE],y_valid = y_all[v_i0:n] ) } else {stop("No real data available. Provide X_train/y_train/X_valid/y_valid or monthly_df.") }}splits <-get_splits()X_train <- splits$X_train; y_train <- splits$y_trainX_valid <- splits$X_valid; y_valid <- splits$y_valid# Align just in casen <-min(nrow(X_valid), length(y_valid))X_valid <- X_valid[seq_len(n), , drop =FALSE]y_valid <- y_valid[seq_len(n)]# ---- "Reshape to 2D" equivalent + scaling (R matrices are already 2D) --------x_mu <-colMeans(X_train)x_sd <-apply(X_train, 2, sd); x_sd[!is.finite(x_sd) | x_sd ==0] <-1X_train_s <-sweep(sweep(X_train, 2, x_mu, "-"), 2, x_sd, "/")X_valid_s <-sweep(sweep(X_valid, 2, x_mu, "-"), 2, x_sd, "/")y_mu <-mean(y_train); y_sd <-sd(y_train); if (!is.finite(y_sd) || y_sd ==0) y_sd <-1y_train_s <- (y_train - y_mu) / y_sd# ---- Fit MLP (single hidden layer ~ scikit example’s first layer size) -------mlp <-nnet(x = X_train_s, y = y_train_s,size =8, linout =TRUE, decay =1e-3, maxit =1000, trace =FALSE)# ---- Predict on validation and invert scaling --------------------------------y_pred_s <-as.numeric(predict(mlp, X_valid_s))y_pred <- y_pred_s * y_sd + y_mu# ---- Metrics ------------------------------------------------------------------sk_mlp_mse <-mean((y_valid - y_pred)^2, na.rm =TRUE)sk_mlp_rmse <-sqrt(sk_mlp_mse)print(tibble(Model ="R MLP (nnet)", MSE = sk_mlp_mse, RMSE = sk_mlp_rmse))
# A tibble: 1 × 3
Model MSE RMSE
<chr> <dbl> <dbl>
1 R MLP (nnet) 250989. 501.
Code
# ---- Dark Menlo styling: white lines, solid/dotted, no legend title ----------bg_col <-"#313131"fg_col <-get0("fg_col", ifnotfound ="#ffffff")grid_col <-"#252525"font_family <-"Menlo"tibble(idx =seq_along(y_valid),Actual = y_valid,Prediction = y_pred) |>pivot_longer(-idx, names_to ="Series", values_to ="Value") |>ggplot(aes(idx, Value, linetype = Series)) +geom_line(color ="#ffffff", linewidth =1.2, lineend ="round") +scale_linetype_manual(name =NULL, # remove legend titlevalues =c(Actual ="solid", Prediction ="dotted") ) +labs(x ="TIME (MONTHS)", y ="PRICE (JPY)") +theme_minimal(base_size =12, base_family = font_family) +theme(plot.margin =margin(t =36, r =18, b =36, l =18),axis.title.x =element_text(margin =margin(t =14), color = fg_col),axis.title.y =element_text(margin =margin(r =14), color = fg_col),axis.text =element_text(size =14, color = fg_col),legend.text =element_text(size =14, color = fg_col),legend.position ="top",legend.key =element_rect(fill = bg_col, color =NA),plot.background =element_rect(fill = bg_col, color =NA),panel.background =element_rect(fill = bg_col, color =NA),panel.grid.major =element_line(color = grid_col),panel.grid.minor =element_blank() )
The large vertical scale below reflects raw price targets across multiple horizons. We typically want a steep early drop, then a gentle slide or a flat line. A consistent validation floor with small random wiggles suggests the model has learned what it can from the current inputs.
Code
suppressPackageStartupMessages({library(torch)library(tibble)library(tidyr)library(ggplot2)})# Torch runtimeif (!torch_is_installed()) install_torch()set.seed(42); torch_manual_seed(42)# ---- Real data ---------------------------------------------------------------if (exists("monthly_df", inherits =TRUE) &&"price"%in%names(monthly_df)) { prices <-as.numeric(monthly_df$price)} elseif (exists("series", inherits =TRUE)) { prices <-as.numeric(get("series", inherits =TRUE))} else {stop("Provide `monthly_df$price` or `series`.")}# ---- Params ------------------------------------------------------------------N_LAGS <-get0("N_LAGS", ifnotfound =3L)N_FUTURE <-get0("N_FUTURE", ifnotfound =2L)VALID_FR <-get0("VALID_FR", ifnotfound =0.20)VALID_SIZE <-get0("VALID_SIZE", ifnotfound =NA_integer_)BATCH_SIZE <-get0("BATCH_SIZE", ifnotfound =5L)N_EPOCHS <-get0("N_EPOCHS", ifnotfound =1000L)PRINT_EVERY<-get0("PRINT_EVERY",ifnotfound =50L)# ---- create_input_data -------------------------------------------------------create_input_data <-function(series, n_lags =1L, n_leads =1L) { series <-as.numeric(series) n <-length(series) - n_lags - n_leads +1Lif (n <=0) stop("Series too short for the chosen n_lags and n_leads.") X <-matrix(NA_real_, nrow = n, ncol = n_lags) y <-matrix(NA_real_, nrow = n, ncol = n_leads)for (step inseq_len(n)) { end_step <- step + n_lags -1L forward_end <- end_step + n_leads X[step, ] <- series[step:end_step] y[step, ] <- series[(end_step +1L):forward_end] }list(X = X, y = y)}xy <-create_input_data(prices, n_lags = N_LAGS, n_leads = N_FUTURE)X <- xy$X; y <- xy$y# ---- Split (match Python logic) ----------------------------------------------n_samples <-nrow(X)if (is.na(VALID_SIZE)) VALID_SIZE <-max(1L, round(n_samples * VALID_FR))valid_ind <- n_samples - VALID_SIZE + (N_FUTURE -1L)valid_ind <-max(1L, min(valid_ind, n_samples -1L))X_train <- X[seq_len(valid_ind), , drop =FALSE]y_train <- y[seq_len(valid_ind), , drop =FALSE]X_valid <- X[(valid_ind +1L):n_samples, , drop =FALSE]y_valid <- y[(valid_ind +1L):n_samples, , drop =FALSE]# ---- Tensors -----------------------------------------------------------------x_tr <-torch_tensor(X_train, dtype =torch_float())y_tr <-torch_tensor(y_train, dtype =torch_float())x_va <-torch_tensor(X_valid, dtype =torch_float())y_va <-torch_tensor(y_valid, dtype =torch_float())# ---- Model -------------------------------------------------------------------mlp <-nn_module(initialize =function(input_size, output_size) { self$linear1 <-nn_linear(input_size, 16) self$linear2 <-nn_linear(16, 8) self$linear3 <-nn_linear(8, output_size) self$dropout <-nn_dropout(p =0.2) },forward =function(x) { x %>% self$linear1() %>%nnf_relu() %>% self$dropout() %>% self$linear2() %>%nnf_relu() %>% self$dropout() %>% self$linear3() })model <-mlp(N_LAGS, N_FUTURE)loss_fn <-nn_mse_loss()optimizer <-optim_adam(model$parameters, lr =1e-3)# ---- Train -------------------------------------------------------------------iter_batches <-function(n, bs) split(sample.int(n), ceiling(seq_along(sample.int(n)) / bs))train_losses <-numeric(N_EPOCHS)valid_losses <-numeric(N_EPOCHS)best_epoch <-NA_integer_best_val <-Inffor (epoch inseq_len(N_EPOCHS)) { model$train() acc <-0; count <-0for (batch_idx initer_batches(nrow(x_tr), BATCH_SIZE)) { xb <- x_tr[batch_idx, , drop =FALSE] yb <- y_tr[batch_idx, , drop =FALSE] optimizer$zero_grad() y_hat <-model(xb$reshape(c(-1, N_LAGS))) loss <-loss_fn(y_hat, yb) loss$backward(); optimizer$step() acc <- acc +as.numeric(loss$item()) *length(batch_idx) count <- count +length(batch_idx) } train_losses[epoch] <- acc / count model$eval()with_no_grad({ y_hat_va <-model(x_va$reshape(c(-1, N_LAGS))) vloss <-as.numeric(loss_fn(y_hat_va, y_va)$item()) }) valid_losses[epoch] <- vlossif (epoch >1&& vloss < best_val) { best_val <- vloss best_epoch <- epoch# FIX: use the module method to get the state dict checkpoint_path <-file.path(getwd(), "mlp_checkpoint_2.pt")try(torch_save(model$state_dict(), checkpoint_path), silent =TRUE) }if (epoch %% PRINT_EVERY ==0) {message(sprintf("<%d> - Train: %.4f \t Valid: %.4f", epoch, train_losses[epoch], vloss)) }}if (!is.na(best_epoch)) message(sprintf("Lowest loss recorded in epoch: %d", best_epoch))# ---- Plot loss (dark Menlo, white solid/dotted, no legend title) -------------bg_col <-"#313131"fg_col <-get0("fg_col", ifnotfound ="#ffffff")grid_col <-"#252525"font_family <-"Menlo"tibble(iter =seq_len(N_EPOCHS),Train = train_losses,Validation = valid_losses) |>pivot_longer(-iter, names_to ="Curve", values_to ="Loss") |>ggplot(aes(iter, Loss, linetype = Curve)) +geom_line(color ="#ffffff", linewidth =1.2, lineend ="round") +scale_linetype_manual(name =NULL, values =c(Train ="solid", Validation ="dotted")) +labs(x ="ITERATION (EPOCHS)", y ="MSE LOSS") +theme_minimal(base_size =12, base_family = font_family) +theme(plot.margin =margin(t =36, r =18, b =36, l =18),axis.title.x =element_text(margin =margin(t =14), color = fg_col),axis.title.y =element_text(margin =margin(r =14), color = fg_col),axis.text =element_text(size =14, color = fg_col),legend.text =element_text(size =14, color = fg_col),legend.position ="top",legend.key =element_rect(fill = bg_col, color =NA),plot.background =element_rect(fill = bg_col, color =NA),panel.background =element_rect(fill = bg_col, color =NA),panel.grid.major =element_line(color = grid_col),panel.grid.minor =element_blank() )
Analysis
Here, the numbers tell us whether the model improved on the baselines. The shapes tell us how. If the dotted prediction line hugs the solid line during slow trends, the network is capturing the dominant pattern. If it lags at turning points or overreacts after sharp drops, that shows where the model struggles. Residuals often reveal regimes, quiet months where the error stays tame and chaotic runs where noise dominates. The loss curves add context. A good run drops fast early, then flattens. If training loss keeps falling while validation stalls or climbs, capacity or leakage is likely the culprit.
Results & Next Steps
Here’s the million-dollar question: should you trust this with real money? Not on its own. It’s a single-series, monthly model with limited samples. It often beats persistence and sometimes edges a linear model, but that’s not a tradable edge. I would test it like a skeptic: use rolling walk-forward splits, include fees and slippage, and report more than RMSE—directional accuracy, hit rate, turnover, max drawdown, and a simple Sharpe (more on this later). Check stability across regimes; if it breaks in stress, it’s noise.
Now, how to make it sturdier. Try log prices or returns to steady scale, and add uncertainty with Monte Carlo dropout or bootstrapping; if the bands swallow the signal, stop. If not, treat the forecast as a weak prior and combine it with simple, testable signals like momentum, seasonality, realized volatility, or an index filter. Require confirmation, keep positions small, size by volatility, use hard stops, paper trade for a quarter, then—maybe—a tiny allocation with alerts watching for drift. Bottom line: it’s a useful baseline, but not a standalone system.