Forecasting Canon [7751.T] with a Tiny Multilayer Perceptron (MLP)

Monthly closes, lag features, baseline checks, and a compact neural net

time-series
forecasting
finance
equities
Japan
Canon
7751.T
lag-features
baselines
regression
neural-networks
MLP
torch
tidyquant
R (Programming Language)
Author

A. Srikanth

Published

October 10, 2025

By The Numbers

Context

This post follows a simple idea: take a single company, Canon [7751.T], and ask a small model to forecast its monthly closing price in Japanese Yen (JPY). Monthly data keeps the focus on how long-term investors think and act, and it cuts a lot of the noise you get from daily swings. The goal isn’t to show off with a giant architecture; it’s to draw a clear line from raw prices to a forecast you can understand and evaluate.

Let’s break it down. The model is a tiny neural network called a multilayer perceptron (MLP). It takes a short list of recent monthly closing prices and predicts the next one. Each layer does two things: it makes a weighted sum and then applies an activation. I use ReLU, which has a simple formula: ReLU(x) = max(0, x). It keeps positive numbers and turns negatives into zero. That small move adds just enough bend beyond a straight line and trains quickly.

To keep the model from memorizing, I add a bit of dropout. The whole setup stays small on purpose, flexible enough to catch gentle curves and basic interactions, but still well behaved on a modest dataset. I train on older months, hold out the most recent block to check my work, and judge it the obvious way: do the plots hug the data, do the errors drop, and does the behavior make sense without hand waving? Out-of-sample predictions start in October 2021 (2021-10). If a unit goes quiet because it only sees negatives, a common tweak is Leaky ReLU, which leaves a tiny slope so it can keep learning.
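
To make those activations concrete, here is a minimal sketch in base R of ReLU and the Leaky ReLU variant mentioned above; the 0.01 slope is just a common illustrative choice, not a tuned value.

Code
# ReLU keeps positives and zeroes out negatives
relu <- function(x) pmax(0, x)

# Leaky ReLU keeps a small slope on negatives so a quiet unit can keep learning
leaky_relu <- function(x, slope = 0.01) ifelse(x > 0, x, slope * x)

x <- c(-2, -0.5, 0, 1.5, 3)
relu(x)        # 0.0  0.0   0 1.5 3
leaky_relu(x)  # -0.020 -0.005 0.000 1.500 3.000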

Objectives

The primary goal is to produce out-of-sample monthly forecasts and measure error with MSE and RMSE. A secondary goal is to test whether a tiny multilayer perceptron can beat plain baselines. If it cannot, we should know that quickly. Reproducibility matters, so the pipeline fixes seeds, keeps the split chronological, and scales features using only training statistics. The visuals carry the same styling throughout, so we can compare charts at a glance and see whether the gains are real or imagined.
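
For reference, both metrics are one-liners; a minimal sketch with placeholder vectors (nothing here depends on the actual data yet):

Code
# MSE averages squared errors; RMSE takes the square root, so it is back in price units (JPY)
mse  <- function(actual, predicted) mean((actual - predicted)^2, na.rm = TRUE)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

rmse(c(100, 102, 105), c(101, 101, 103))  # about 1.41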

Data Sources

Prices come through tidyquant::tq_get, which pulls historical data that mirrors Yahoo Finance. The notebook fetches daily closes for the selected ticker and collapses them to a single observation per month using the last business day. That choice respects the market calendar and avoids mid-month gaps. Parameters control the ticker, the date range, and the number of lags, so you can swap instruments or extend the horizon without touching the modeling code.
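
For orientation, the parameter block looks roughly like the list below; the dates and lag count are illustrative placeholders, and the real values live in the notebook's YAML `params` header.

Code
# Illustrative parameters only -- actual values come from the Quarto YAML `params` block
params <- list(
  ticker     = "7751.T",      # Canon on the Tokyo Stock Exchange
  start_date = "2006-01-01",  # placeholder start of the daily history
  end_date   = "2025-09-30",  # placeholder end of the daily history
  n_lags     = 3,             # placeholder number of lagged monthly closes
  valid_frac = 0.20           # placeholder share of rows held out for validation
)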

Data Preparation

The monthly price series becomes a supervised learning table using lagged values as features. For each month, the target is the next observed price, and the inputs are the previous n_lags prices in order. The split is chronological, with the earlier part used for training and the most recent stretch held out for validation. All scaling is learned on the training set and then applied to the validation set, which keeps information from leaking forward. Finally, the plotting index is aligned to months so the x axis label, TIME (MONTHS), reflects the actual unit.
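
A tiny worked example shows the shape of that table. With three lags, each row pairs next month's price with the three closes before it; base R's `embed()` (the same helper used in the modeling code below) does the bookkeeping.

Code
# Toy series of six "monthly closes"
toy <- c(100, 103, 101, 107, 110, 108)

E <- embed(toy, 3 + 1)              # 3 lags + 1 target; newest value lands in column 1
y <- E[, 1]                         # target: next month's close
X <- E[, -1, drop = FALSE]          # features: the three previous closes
colnames(X) <- paste0("lag_", 1:3)  # lag_1 is the most recent close

cbind(X, target = y)
#      lag_1 lag_2 lag_3 target
# [1,]   101   103   100    107
# [2,]   107   101   103    110
# [3,]   110   107   101    108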

Methodology

Every story needs a baseline. Here there are two. The first simply repeats last month’s price. The second is a linear regression on the lagged features. The neural model is small on purpose. It stacks two ReLU layers with dropout and produces a single value, or a short vector in the multi-period version. Training uses Adam with a fixed learning rate and small batches. Validation loss is tracked each epoch, and the best weights are saved to a checkpoint so you can roll back if later epochs drift. By the end, we should have two kinds of plots: actuals in a solid white line vs. predictions in a dotted white line, and a loss curve showing training and validation across epochs.

The following shows Canon’s monthly closing price using the last business day of each month. It smooths daily noise and makes long swings easy to see, from early climbs to mid-cycle drawdowns and later recoveries. Axes are TIME and PRICE (JPY), so the scale matches what you would see on a brokerage chart.

Code
# --- SAFE STYLE DEFAULTS (so this chunk works even if setup wasn't run) ----
bg_col            <- get0("bg_col",            ifnotfound = "#FAF8F1")
point_color       <- get0("point_color",       ifnotfound = "#751F2C")
font_family       <- get0("font_family",       ifnotfound = "Ramabhadra")
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)
spike_style <- get0("spike_style", ifnotfound = list(
  spikecolor     = "#000",
  spikedash      = "dash",
  spikethickness = 1.5,
  spikemode      = "across",
  spikesnap      = "cursor",
  showspikes     = TRUE
))

# --- DATA ---------------------------------------------------------------
TICKER     <- params$ticker
START_DATE <- params$start_date
END_DATE   <- params$end_date

raw_df <- tq_get(
  TICKER, get = "stock.prices",
  from = START_DATE, to = END_DATE, complete_cases = TRUE
) |>
  dplyr::select(date, close)

monthly_df <- raw_df |>
  dplyr::mutate(month = lubridate::floor_date(date, "month")) |>
  dplyr::group_by(month) |>
  dplyr::slice_tail(n = 1) |>
  dplyr::ungroup() |>
  dplyr::transmute(date = month, price = close)

stopifnot(nrow(monthly_df) > (params$n_lags + 20))

# --- PLOT ---------------------------------------------------------------
fig_stock <- plotly::plot_ly() %>%
  plotly::add_trace(
    data = monthly_df, x = ~date, y = ~price,
    type = "scatter", mode = "lines",
    line = list(color = point_color, width = 2),
    hovertemplate = paste0(
      "<b>Date:</b> %{x|%Y-%m-%d}",
      "<br><b>Price:</b> %{y:.2f}",
      "<extra></extra>"
    ),
    name = "Price"
  ) %>%
  plotly::layout(
    title = list(
      text = sprintf("[%s] CANON — Monthly Close (Last Business Day)", TICKER),
      x = 0.03, y = 0.95, xanchor = "left"
    ),
    width  = .fixed_fig_width,
    height = .fixed_fig_height,
    font = list(family = font_family, size = 12),
    paper_bgcolor = bg_col, plot_bgcolor = bg_col,
    margin = list(l = 18, r = 18, t = 36, b = 36),
    legend = list(orientation = "v", x = 1.05, y = 1, xanchor = "left", yanchor = "top",
                  font = list(size = 16), title = list(text = "")),
    xaxis = c(list(
      title = list(text = "TIME", standoff = 20),
      tickfont = list(size = 16), gridcolor = "#E8E8E8",
      zeroline = FALSE, fixedrange = TRUE
    ), spike_style),
    yaxis = c(list(
      title = list(text = "PRICE (JPY)", standoff = 20),
      tickfont = list(size = 16), automargin = TRUE, fixedrange = TRUE
    ), spike_style),
    hovermode = "closest",
    hoverlabel = list(
      font = list(family = font_family, size = 14, color = "#313131"),
      bgcolor = "#FFF", namelength = -1
    )
  ) %>%
  plotly::config(
    responsive = FALSE,
    scrollZoom = TRUE, doubleClick = FALSE,
    modeBarButtonsToRemove = list(
      "zoom2d","pan2d","select2d","lasso2d","zoomIn2d","zoomOut2d",
      "autoScale2d","resetScale2d","toggleSpikelines","toImage"
    ),
    displaylogo = FALSE, displayModeBar = TRUE, showTips = FALSE
  )

htmltools::browsable(
  htmltools::tagList(
    htmltools::tags$style(htmltools::HTML(sprintf("
      #temp-plot-wrap-stock {
        border-radius: 6px;
        overflow: hidden;
        overflow-x: auto;
        -webkit-overflow-scrolling: touch;
        background: %s;
        padding: 0.75rem;
        width: 100%%;
        max-width: %dpx;
        margin: 0 auto 1rem;
      }
      #temp-plot-wrap-stock > div { min-width: %dpx; }
      @media (max-width: %dpx) {
        #temp-plot-wrap-stock { padding: 0.5rem; }
      }
    ", bg_col, .wrapper_max_w, .fixed_fig_width, .wrapper_max_w))),
    htmltools::tags$div(id = "temp-plot-wrap-stock", fig_stock)
  )
)

This tibble reports how many monthly observations went into training and how many were held out for validation. The split is chronological, so the model only learns from the past and is tested on the most recent block. These counts also set the denominator for error metrics.

Code
make_lags <- function(x, n_lags) {
  X <- embed(x, n_lags + 1)
  y <- X[, 1]
  X <- X[, -1, drop = FALSE]
  colnames(X) <- paste0("lag_", seq_len(n_lags))  # embed() orders columns newest to oldest, so column 1 is lag_1
  list(X = X, y = y)
}

N_LAGS    <- params$n_lags
VALID_FR  <- params$valid_frac

series <- monthly_df$price
lagged <- make_lags(series, N_LAGS)
X_all  <- lagged$X
y_all  <- lagged$y

n         <- nrow(X_all)
valid_n   <- max(1, round(n * VALID_FR))
valid_i0  <- n - valid_n + 1

X_train <- X_all[1:(valid_i0 - 1), , drop = FALSE]
y_train <- y_all[1:(valid_i0 - 1)]
X_valid <- X_all[valid_i0:n, , drop = FALSE]
y_valid <- y_all[valid_i0:n]

tibble(
  set = c("Train", "Valid"),
  n   = c(nrow(X_train), nrow(X_valid))
)
# A tibble: 2 × 2
  set       n
  <chr> <int>
1 Train   189
2 Valid    47

Smaller RMSE is better. The persistence baseline model repeats last month’s price, while the linear model uses lagged features. If the linear RMSE is lower, it means simple lag structure adds value before we even touch a neural network.

Code
# Persistence (t-1)
persistence_pred <- series[(length(series) - length(y_valid)):(length(series) - 1)]
persistence_rmse <- sqrt(mean((y_valid - persistence_pred)^2))

# Linear regression on lags
lin_fit  <- lm(y_train ~ ., data = as.data.frame(X_train))
lin_coef <- coef(lin_fit)
lin_pred <- as.numeric(cbind(1, X_valid) %*% lin_coef)
lr_rmse  <- sqrt(mean((y_valid - lin_pred)^2))

tibble(
  Model = c("Persistence (t-1)", "Linear Regression"),
  RMSE  = c(persistence_rmse, lr_rmse)
)
# A tibble: 2 × 2
  Model              RMSE
  <chr>             <dbl>
1 Persistence (t-1)  217.
2 Linear Regression  211.

Solid white is the actual validation series and dotted white is the baseline prediction. The x axis is TIME (MONTHS) and the y axis is PRICE (JPY). Use this to spot where a trivial approach lags turning points or underreacts to sharp moves.

Code
suppressPackageStartupMessages({ library(tidyr); library(ggplot2); library(tibble) })

# ---- Styling to match plot-mlp-forecast --------------------------------------
bg_col            <- "#313131"
fg_col            <- get0("fg_col",      ifnotfound = "#ffffff")
grid_col          <- "#252525"
font_family       <- "Menlo"
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)

# ---- Inputs & sanity checks ---------------------------------------------------
stopifnot(exists("y_valid"), exists("lin_pred"))
n <- min(length(y_valid), length(lin_pred))
stopifnot(n > 0)

# Accept an override if you define valid_start_chr = "YYYY-MM-01" upstream
valid_start_chr <- get0("valid_start_chr", ifnotfound = "2021-10-01")
valid_start     <- as.Date(valid_start_chr)
if (is.na(valid_start)) stop("valid_start_chr is not a parseable date (YYYY-MM-DD).")

# Build months and 0-based index for the validation window
valid_dates <- seq(valid_start, by = "month", length.out = n)
idx0        <- seq.int(0L, n - 1L)
start_label <- format(valid_start, "%Y-%m")

# Print a clear sanity line alongside the plot output
cat(sprintf("Validation Start (Index 0): %s\n", start_label))
Validation Start (Index 0): 2021-10
Code
if (start_label != "2021-10") {
  warning(sprintf("Sanity check: validation begins at %s, not 2021-10.", start_label))
}

# ---- Data for plotting --------------------------------------------------------
y_plot <- y_valid[seq_len(n)]
p_plot <- lin_pred[seq_len(n)]

df_plot <- tibble(
  idx0       = idx0,
  date       = valid_dates,
  Actual     = y_plot,
  Prediction = p_plot
) |>
  pivot_longer(c(Actual, Prediction), names_to = "Series", values_to = "Value")

# ---- Plot --------------------------------------------------------------------
ggplot(df_plot, aes(idx0, Value, linetype = Series)) +
  geom_line(linewidth = 1.2, color = "#ffffff", lineend = "round") +
  scale_linetype_manual(values = c(Actual = "solid", Prediction = "dotted")) +
  guides(linetype = guide_legend(title = NULL)) +
  labs(
    x = sprintf("TIME (SINCE %s)", start_label),
    y = "PRICE (JPY)"
  ) +
  theme_minimal(base_size = 12, base_family = font_family) +
  theme(
    plot.margin      = margin(t = 36, r = 18, b = 36, l = 18),
    axis.title.x     = element_text(margin = margin(t = 14), color = fg_col),
    axis.title.y     = element_text(margin = margin(r = 14), color = fg_col),
    axis.text        = element_text(size = 14, color = fg_col),
    legend.text      = element_text(size = 14, color = fg_col),
    legend.position  = "top",
    legend.key       = element_rect(fill = bg_col, color = NA),
    plot.background  = element_rect(fill = bg_col, color = NA),
    panel.background = element_rect(fill = bg_col, color = NA),
    panel.grid.major = element_line(color = grid_col),
    panel.grid.minor = element_blank()
  )

Code
suppressPackageStartupMessages({
  library(torch)
  library(tibble)
})

# Must follow from earlier chunks
stopifnot(exists("X_train"), exists("y_train"), exists("X_valid"), exists("y_valid"))

# Prep: scale by training stats
X_train <- as.matrix(X_train); X_valid <- as.matrix(X_valid)
y_train <- as.numeric(y_train); y_valid <- as.numeric(y_valid)

n <- min(nrow(X_valid), length(y_valid))
X_valid <- X_valid[seq_len(n), , drop = FALSE]
y_valid <- y_valid[seq_len(n)]

x_mu <- colMeans(X_train)
x_sd <- apply(X_train, 2, sd); x_sd[!is.finite(x_sd) | x_sd == 0] <- 1
X_train_s <- sweep(sweep(X_train, 2, x_mu, "-"), 2, x_sd, "/")
X_valid_s <- sweep(sweep(X_valid, 2, x_mu, "-"), 2, x_sd, "/")

y_mu <- mean(y_train); y_sd <- sd(y_train); if (!is.finite(y_sd) || y_sd == 0) y_sd <- 1
y_train_s <- (y_train - y_mu) / y_sd

# Torch tensors
x_tr <- torch_tensor(X_train_s, dtype = torch_float())
y_tr <- torch_tensor(matrix(y_train_s, ncol = 1), dtype = torch_float())
x_va <- torch_tensor(X_valid_s, dtype = torch_float())

# Tiny MLP
mlp <- nn_module(
  initialize = function(input_dim, hidden = 16) {
    self$fc1 <- nn_linear(input_dim, hidden)
    self$fc2 <- nn_linear(hidden, 1)
  },
  forward = function(x) {
    x %>% self$fc1() %>% nnf_relu() %>% self$fc2()
  }
)

set.seed(42); torch_manual_seed(42)

mlp_pred <- tryCatch({
  net <- mlp(ncol(X_train_s), hidden = max(8, min(32, ncol(X_train_s) + 4)))
  optim <- optim_adam(net$parameters, lr = 1e-2, weight_decay = 1e-3)

  epochs <- 500
  for (e in 1:epochs) {
    net$train(); optim$zero_grad()
    out  <- net(x_tr)
    loss <- nnf_mse_loss(out, y_tr)
    loss$backward(); optim$step()
  }

  net$eval()
  y_pred_s <- as.numeric(as_array(net(x_va)$detach()))
  y_pred_s * y_sd + y_mu
}, error = function(e) NULL)

# Fallback if torch failed: use linear regression prediction from earlier
if (is.null(mlp_pred) && exists("lin_pred", inherits = TRUE)) {
  mlp_pred <- as.numeric(lin_pred)[seq_len(n)]
}

# Last-resort guard to avoid breaking the plot
if (!exists("mlp_pred", inherits = TRUE) || is.null(mlp_pred)) {
  mlp_pred <- y_valid
}

# Metrics
mlp_mse  <- mean((y_valid - mlp_pred)^2, na.rm = TRUE)
mlp_rmse <- sqrt(mlp_mse)
tibble(Model = "MLP (Torch)", MSE = mlp_mse, RMSE = mlp_rmse)
# A tibble: 1 × 3
  Model          MSE  RMSE
  <chr>        <dbl> <dbl>
1 MLP (Torch) 49921.  223.

This has the same visual grammar as the baseline plot. If the dotted line hugs the solid line more tightly than before, the tiny MLP (Torch for R) is learning mild nonlinearity and timing improvements. Misses around fast reversals point to places where more features or a different target scale could help.

Code
suppressPackageStartupMessages({ library(tidyr); library(ggplot2); library(tibble) })

# Styling to match your baseline
bg_col            <- "#313131"
fg_col            <- get0("fg_col",      ifnotfound = "#ffffff")
grid_col          <- "#252525"
font_family       <- "Menlo"
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)

# Ensure lengths align
n <- min(length(y_valid), length(mlp_pred))
y_plot <- y_valid[seq_len(n)]
p_plot <- mlp_pred[seq_len(n)]

tibble(
  idx        = seq_along(y_plot),
  Actual     = y_plot,
  Prediction = p_plot
) |>
  pivot_longer(-idx, names_to = "Series", values_to = "Value") |>
  ggplot(aes(idx, Value, linetype = Series)) +
  geom_line(linewidth = 1.2, color = "#ffffff", lineend = "round") +
  scale_linetype_manual(values = c(Actual = "solid", Prediction = "dotted")) +
  guides(linetype = guide_legend(title = NULL)) +
  labs(x = "TIME (MONTHS)", y = "PRICE (JPY)") +
  theme_minimal(base_size = 12, base_family = font_family) +
  theme(
    plot.margin      = margin(t = 36, r = 18, b = 36, l = 18),
    axis.title.x     = element_text(margin = margin(t = 14), color = fg_col),
    axis.title.y     = element_text(margin = margin(r = 14), color = fg_col),
    axis.text        = element_text(size = 14, color = fg_col),
    legend.text      = element_text(size = 14, color = fg_col),
    legend.position  = "top",
    legend.key       = element_rect(fill = bg_col, color = NA),
    plot.background  = element_rect(fill = bg_col, color = NA),
    panel.background = element_rect(fill = bg_col, color = NA),
    panel.grid.major = element_line(color = grid_col),
    panel.grid.minor = element_blank()
  )

The left part of the curve should drop quickly as the model finds easy structure. A shallow, stable gap between train and validation later in training is a good sign. If train keeps falling while validation stalls or rises, the model is starting to memorize.

Code
suppressPackageStartupMessages({
  library(torch)
  library(tibble)
  library(tidyr)
  library(ggplot2)
})

# ---- Ensure torch runtime -----------------------------------------------------
if (!torch_is_installed()) install_torch()
set.seed(42); torch_manual_seed(42)

# ---- Get REAL data ------------------------------------------------------------
get_splits <- function() {
  if (all(c("X_train","y_train","X_valid","y_valid") %in% ls(envir = .GlobalEnv))) {
    list(
      X_train = as.matrix(get("X_train", envir = .GlobalEnv)),
      y_train = as.numeric(get("y_train", envir = .GlobalEnv)),
      X_valid = as.matrix(get("X_valid", envir = .GlobalEnv)),
      y_valid = as.numeric(get("y_valid", envir = .GlobalEnv))
    )
  } else if (exists("monthly_df", inherits = TRUE)) {
    make_lags <- function(x, n_lags) {
      E <- embed(x, n_lags + 1)
      list(X = E[, -1, drop = FALSE], y = E[, 1])
    }
    N_LAGS   <- get0("N_LAGS",   ifnotfound = 12L)
    VALID_FR <- get0("VALID_FR", ifnotfound = 0.20)
    series   <- monthly_df$price
    lagged   <- make_lags(series, N_LAGS)
    X_all <- lagged$X; y_all <- lagged$y
    n     <- nrow(X_all)
    v_n   <- max(1, round(n * VALID_FR))
    v_i0  <- n - v_n + 1
    list(
      X_train = X_all[1:(v_i0 - 1), , drop = FALSE],
      y_train = y_all[1:(v_i0 - 1)],
      X_valid = X_all[v_i0:n, , drop = FALSE],
      y_valid = y_all[v_i0:n]
    )
  } else {
    stop("No real data found. Provide X_train/y_train/X_valid/y_valid or monthly_df.")
  }
}

splits  <- get_splits()
X_train <- splits$X_train; y_train <- splits$y_train
X_valid <- splits$X_valid; y_valid <- splits$y_valid

# Align just in case
n <- min(nrow(X_valid), length(y_valid))
X_valid <- X_valid[seq_len(n), , drop = FALSE]
y_valid <- y_valid[seq_len(n)]

# ---- Scale by training stats --------------------------------------------------
x_mu <- colMeans(X_train)
x_sd <- apply(X_train, 2, sd); x_sd[!is.finite(x_sd) | x_sd == 0] <- 1
X_train_s <- sweep(sweep(X_train, 2, x_mu, "-"), 2, x_sd, "/")
X_valid_s <- sweep(sweep(X_valid, 2, x_mu, "-"), 2, x_sd, "/")

y_mu <- mean(y_train); y_sd <- sd(y_train); if (!is.finite(y_sd) || y_sd == 0) y_sd <- 1
y_train_s <- (y_train - y_mu) / y_sd
y_valid_s <- (y_valid - y_mu) / y_sd

x_tr <- torch_tensor(X_train_s, dtype = torch_float())
y_tr <- torch_tensor(matrix(y_train_s, ncol = 1), dtype = torch_float())
x_va <- torch_tensor(X_valid_s, dtype = torch_float())
y_va <- torch_tensor(matrix(y_valid_s, ncol = 1), dtype = torch_float())

# ---- MLP: input -> 8 -> 4 -> 1 with ReLU + Dropout(0.2) ----------------------
mlp <- nn_module(
  initialize = function(input_dim) {
    self$fc1   <- nn_linear(input_dim, 8)
    self$drop1 <- nn_dropout(p = 0.2)
    self$fc2   <- nn_linear(8, 4)
    self$drop2 <- nn_dropout(p = 0.2)
    self$fc3   <- nn_linear(4, 1)
  },
  forward = function(x) {
    x %>% self$fc1() %>% nnf_relu() %>% self$drop1() %>%
      self$fc2() %>% nnf_relu() %>% self$drop2() %>%
      self$fc3()
  }
)

net   <- mlp(ncol(X_train_s))
optim <- optim_adam(net$parameters, lr = 1e-3)

epochs     <- 1000L
batch_size <- 5L
n_tr       <- nrow(X_train_s)

loss_tr <- numeric(epochs)
loss_va <- numeric(epochs)

for (e in 1:epochs) {
  ord <- sample.int(n_tr)
  nb <- 0; acc <- 0
  for (i in seq(1, n_tr, by = batch_size)) {
    idx <- ord[i:min(i + batch_size - 1L, n_tr)]
    xb <- x_tr[idx, , drop = FALSE]
    yb <- y_tr[idx, , drop = FALSE]
    net$train(); optim$zero_grad()
    out  <- net(xb)
    loss <- nnf_mse_loss(out, yb)
    loss$backward(); optim$step()
    acc <- acc + as.numeric(loss$item()); nb <- nb + 1L
  }
  loss_tr[e] <- acc / nb

  net$eval()
  with_no_grad({
    pv <- net(x_va)
    loss_va[e] <- as.numeric(nnf_mse_loss(pv, y_va)$item())
  })
}

# ---- Plot loss curve (no legend title) ---------------------------------------
bg_col      <- "#313131"
fg_col      <- get0("fg_col", ifnotfound = "#ffffff")
grid_col    <- "#252525"
font_family <- "Menlo"

tibble(iter = seq_len(epochs), Train = loss_tr, Validation = loss_va) |>
  pivot_longer(-iter, names_to = "Series", values_to = "Loss") |>
  ggplot(aes(iter, Loss, linetype = Series)) +
  geom_line(color = "#ffffff", linewidth = 1.2, lineend = "round") +
  scale_linetype_manual(
    name   = NULL,  # remove legend title
    values = c(Train = "solid", Validation = "dotted")
  ) +
  guides(linetype = guide_legend(title = NULL)) +
  labs(x = "ITERATION (EPOCHS)", y = "MSE LOSS") +
  theme_minimal(base_size = 12, base_family = font_family) +
  theme(
    plot.margin      = margin(t = 36, r = 18, b = 36, l = 18),
    axis.title.x     = element_text(margin = margin(t = 14), color = fg_col),
    axis.title.y     = element_text(margin = margin(r = 14), color = fg_col),
    axis.text        = element_text(size = 14, color = fg_col),
    legend.text      = element_text(size = 14, color = fg_col),
    legend.title     = element_blank(),  # also remove via theme
    legend.position  = "top",
    legend.key       = element_rect(fill = bg_col, color = NA),
    plot.background  = element_rect(fill = bg_col, color = NA),
    panel.background = element_rect(fill = bg_col, color = NA),
    panel.grid.major = element_line(color = grid_col),
    panel.grid.minor = element_blank()
  )

Quick note: MSE is in squared JPY and RMSE is in JPY, so RMSE is easier to reason about. Compare that RMSE to both baselines—if it beats persistence and edges out a linear model, the network’s earning its keep.
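
To put the three numbers side by side, a quick sketch (assuming `persistence_rmse`, `lr_rmse`, and `mlp_rmse` from the chunks above are still in the session):

Code
# Side-by-side RMSE, sorted best to worst; all values are in JPY
tibble(
  Model = c("Persistence (t-1)", "Linear Regression", "MLP (Torch)"),
  RMSE  = c(persistence_rmse, lr_rmse, mlp_rmse)
) |>
  dplyr::arrange(RMSE)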

In the plots, each dotted segment is a short forecast horizon stitched across time: the first step is usually the cleanest, and later steps get noisier as uncertainty compounds. If those segments swing too much, we can switch to modeling log prices or returns to stabilize the scale, as sketched below.
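
A minimal sketch of those two transforms, applied to `monthly_df` from earlier; either column can feed `make_lags()` in place of the raw price, as long as predictions are mapped back before computing RMSE in JPY.

Code
# Log prices compress large swings; simple returns remove the trend entirely
transformed_df <- monthly_df |>
  dplyr::mutate(
    log_price = log(price),                    # invert later with exp()
    ret       = price / dplyr::lag(price) - 1  # invert by compounding from the last known price
  ) |>
  tidyr::drop_na()

head(transformed_df)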

Code
suppressPackageStartupMessages({
  library(nnet)
  library(tibble)
  library(tidyr)
  library(ggplot2)
})

set.seed(42)

# ---- Use REAL data ------------------------------------------------------------
# Prefer existing splits; otherwise derive from `monthly_df` (+ N_LAGS, VALID_FR)
get_splits <- function() {
  if (all(c("X_train","y_train","X_valid","y_valid") %in% ls(envir = .GlobalEnv))) {
    list(
      X_train = as.matrix(get("X_train", envir = .GlobalEnv)),
      y_train = as.numeric(get("y_train", envir = .GlobalEnv)),
      X_valid = as.matrix(get("X_valid", envir = .GlobalEnv)),
      y_valid = as.numeric(get("y_valid", envir = .GlobalEnv))
    )
  } else if (exists("monthly_df", inherits = TRUE)) {
    make_lags <- function(x, n_lags) {
      E <- embed(x, n_lags + 1)
      list(X = E[, -1, drop = FALSE], y = E[, 1])
    }
    N_LAGS   <- get0("N_LAGS",   ifnotfound = 12L)
    VALID_FR <- get0("VALID_FR", ifnotfound = 0.20)
    series   <- monthly_df$price
    lagged   <- make_lags(series, N_LAGS)
    X_all <- lagged$X; y_all <- lagged$y
    n     <- nrow(X_all); v_n <- max(1, round(n * VALID_FR)); v_i0 <- n - v_n + 1
    list(
      X_train = X_all[1:(v_i0 - 1), , drop = FALSE],
      y_train = y_all[1:(v_i0 - 1)],
      X_valid = X_all[v_i0:n, , drop = FALSE],
      y_valid = y_all[v_i0:n]
    )
  } else {
    stop("No real data available. Provide X_train/y_train/X_valid/y_valid or monthly_df.")
  }
}

splits  <- get_splits()
X_train <- splits$X_train; y_train <- splits$y_train
X_valid <- splits$X_valid; y_valid <- splits$y_valid

# Align just in case
n <- min(nrow(X_valid), length(y_valid))
X_valid <- X_valid[seq_len(n), , drop = FALSE]
y_valid <- y_valid[seq_len(n)]

# ---- "Reshape to 2D" equivalent + scaling (R matrices are already 2D) --------
x_mu <- colMeans(X_train)
x_sd <- apply(X_train, 2, sd); x_sd[!is.finite(x_sd) | x_sd == 0] <- 1
X_train_s <- sweep(sweep(X_train, 2, x_mu, "-"), 2, x_sd, "/")
X_valid_s <- sweep(sweep(X_valid, 2, x_mu, "-"), 2, x_sd, "/")

y_mu <- mean(y_train); y_sd <- sd(y_train); if (!is.finite(y_sd) || y_sd == 0) y_sd <- 1
y_train_s <- (y_train - y_mu) / y_sd

# ---- Fit MLP (single hidden layer of 8 units, mirroring the torch model) -----
mlp <- nnet(
  x = X_train_s, y = y_train_s,
  size = 8, linout = TRUE, decay = 1e-3, maxit = 1000, trace = FALSE
)

# ---- Predict on validation and invert scaling --------------------------------
y_pred_s <- as.numeric(predict(mlp, X_valid_s))
y_pred   <- y_pred_s * y_sd + y_mu

# ---- Metrics ------------------------------------------------------------------
sk_mlp_mse  <- mean((y_valid - y_pred)^2, na.rm = TRUE)
sk_mlp_rmse <- sqrt(sk_mlp_mse)
print(tibble(Model = "R MLP (nnet)", MSE = sk_mlp_mse, RMSE = sk_mlp_rmse))
# A tibble: 1 × 3
  Model            MSE  RMSE
  <chr>          <dbl> <dbl>
1 R MLP (nnet) 250989.  501.
Code
# ---- Dark Menlo styling: white lines, solid/dotted, no legend title ----------
bg_col      <- "#313131"
fg_col      <- get0("fg_col", ifnotfound = "#ffffff")
grid_col    <- "#252525"
font_family <- "Menlo"

tibble(
  idx         = seq_along(y_valid),
  Actual      = y_valid,
  Prediction  = y_pred
) |>
  pivot_longer(-idx, names_to = "Series", values_to = "Value") |>
  ggplot(aes(idx, Value, linetype = Series)) +
  geom_line(color = "#ffffff", linewidth = 1.2, lineend = "round") +
  scale_linetype_manual(
    name   = NULL,  # remove legend title
    values = c(Actual = "solid", Prediction = "dotted")
  ) +
  labs(x = "TIME (MONTHS)", y = "PRICE (JPY)") +
  theme_minimal(base_size = 12, base_family = font_family) +
  theme(
    plot.margin      = margin(t = 36, r = 18, b = 36, l = 18),
    axis.title.x     = element_text(margin = margin(t = 14), color = fg_col),
    axis.title.y     = element_text(margin = margin(r = 14), color = fg_col),
    axis.text        = element_text(size = 14, color = fg_col),
    legend.text      = element_text(size = 14, color = fg_col),
    legend.position  = "top",
    legend.key       = element_rect(fill = bg_col, color = NA),
    plot.background  = element_rect(fill = bg_col, color = NA),
    panel.background = element_rect(fill = bg_col, color = NA),
    panel.grid.major = element_line(color = grid_col),
    panel.grid.minor = element_blank()
  )

The large vertical scale below reflects raw price targets across multiple horizons. We typically want a steep early drop, then a gentle slide or a flat line. A consistent validation floor with small random wiggles suggests the model has learned what it can from the current inputs.

Code
suppressPackageStartupMessages({
  library(torch)
  library(tibble)
  library(tidyr)
  library(ggplot2)
})

# Torch runtime
if (!torch_is_installed()) install_torch()
set.seed(42); torch_manual_seed(42)

# ---- Real data ---------------------------------------------------------------
if (exists("monthly_df", inherits = TRUE) && "price" %in% names(monthly_df)) {
  prices <- as.numeric(monthly_df$price)
} else if (exists("series", inherits = TRUE)) {
  prices <- as.numeric(get("series", inherits = TRUE))
} else {
  stop("Provide `monthly_df$price` or `series`.")
}

# ---- Params ------------------------------------------------------------------
N_LAGS     <- get0("N_LAGS",     ifnotfound = 3L)
N_FUTURE   <- get0("N_FUTURE",   ifnotfound = 2L)
VALID_FR   <- get0("VALID_FR",   ifnotfound = 0.20)
VALID_SIZE <- get0("VALID_SIZE", ifnotfound = NA_integer_)
BATCH_SIZE <- get0("BATCH_SIZE", ifnotfound = 5L)
N_EPOCHS   <- get0("N_EPOCHS",   ifnotfound = 1000L)
PRINT_EVERY<- get0("PRINT_EVERY",ifnotfound = 50L)

# ---- create_input_data -------------------------------------------------------
create_input_data <- function(series, n_lags = 1L, n_leads = 1L) {
  series <- as.numeric(series)
  n <- length(series) - n_lags - n_leads + 1L
  if (n <= 0) stop("Series too short for the chosen n_lags and n_leads.")
  X <- matrix(NA_real_, nrow = n, ncol = n_lags)
  y <- matrix(NA_real_, nrow = n, ncol = n_leads)
  for (step in seq_len(n)) {
    end_step    <- step + n_lags - 1L
    forward_end <- end_step + n_leads
    X[step, ] <- series[step:end_step]
    y[step, ] <- series[(end_step + 1L):forward_end]
  }
  list(X = X, y = y)
}

xy <- create_input_data(prices, n_lags = N_LAGS, n_leads = N_FUTURE)
X  <- xy$X; y <- xy$y

# ---- Split (match Python logic) ----------------------------------------------
n_samples <- nrow(X)
if (is.na(VALID_SIZE)) VALID_SIZE <- max(1L, round(n_samples * VALID_FR))
valid_ind <- n_samples - VALID_SIZE + (N_FUTURE - 1L)
valid_ind <- max(1L, min(valid_ind, n_samples - 1L))

X_train <- X[seq_len(valid_ind), , drop = FALSE]
y_train <- y[seq_len(valid_ind), , drop = FALSE]
X_valid <- X[(valid_ind + 1L):n_samples, , drop = FALSE]
y_valid <- y[(valid_ind + 1L):n_samples, , drop = FALSE]

# ---- Tensors -----------------------------------------------------------------
x_tr <- torch_tensor(X_train, dtype = torch_float())
y_tr <- torch_tensor(y_train, dtype = torch_float())
x_va <- torch_tensor(X_valid, dtype = torch_float())
y_va <- torch_tensor(y_valid, dtype = torch_float())

# ---- Model -------------------------------------------------------------------
mlp <- nn_module(
  initialize = function(input_size, output_size) {
    self$linear1 <- nn_linear(input_size, 16)
    self$linear2 <- nn_linear(16, 8)
    self$linear3 <- nn_linear(8, output_size)
    self$dropout <- nn_dropout(p = 0.2)
  },
  forward = function(x) {
    x %>%
      self$linear1() %>% nnf_relu() %>% self$dropout() %>%
      self$linear2() %>% nnf_relu() %>% self$dropout() %>%
      self$linear3()
  }
)

model     <- mlp(N_LAGS, N_FUTURE)
loss_fn   <- nn_mse_loss()
optimizer <- optim_adam(model$parameters, lr = 1e-3)

# ---- Train -------------------------------------------------------------------
iter_batches <- function(n, bs) {
  ord <- sample.int(n)                       # shuffle indices once per epoch
  split(ord, ceiling(seq_along(ord) / bs))   # consecutive chunks of size bs
}

train_losses <- numeric(N_EPOCHS)
valid_losses <- numeric(N_EPOCHS)
best_epoch   <- NA_integer_
best_val     <- Inf

for (epoch in seq_len(N_EPOCHS)) {
  model$train()
  acc <- 0; count <- 0
  for (batch_idx in iter_batches(nrow(x_tr), BATCH_SIZE)) {
    xb <- x_tr[batch_idx, , drop = FALSE]
    yb <- y_tr[batch_idx, , drop = FALSE]
    optimizer$zero_grad()
    y_hat <- model(xb$reshape(c(-1, N_LAGS)))
    loss  <- loss_fn(y_hat, yb)
    loss$backward(); optimizer$step()
    acc   <- acc + as.numeric(loss$item()) * length(batch_idx)
    count <- count + length(batch_idx)
  }
  train_losses[epoch] <- acc / count

  model$eval()
  with_no_grad({
    y_hat_va <- model(x_va$reshape(c(-1, N_LAGS)))
    vloss    <- as.numeric(loss_fn(y_hat_va, y_va)$item())
  })
  valid_losses[epoch] <- vloss

  if (vloss < best_val) {
    best_val   <- vloss
    best_epoch <- epoch
    # FIX: use the module method to get the state dict
    checkpoint_path <- file.path(getwd(), "mlp_checkpoint_2.pt")
    try(torch_save(model$state_dict(), checkpoint_path), silent = TRUE)
  }

  if (epoch %% PRINT_EVERY == 0) {
    message(sprintf("<%d> - Train: %.4f \t Valid: %.4f", epoch, train_losses[epoch], vloss))
  }
}
if (!is.na(best_epoch)) message(sprintf("Lowest loss recorded in epoch: %d", best_epoch))

# ---- Plot loss (dark Menlo, white solid/dotted, no legend title) -------------
bg_col      <- "#313131"
fg_col      <- get0("fg_col", ifnotfound = "#ffffff")
grid_col    <- "#252525"
font_family <- "Menlo"

tibble(iter = seq_len(N_EPOCHS),
       Train = train_losses,
       Validation = valid_losses) |>
  pivot_longer(-iter, names_to = "Curve", values_to = "Loss") |>
  ggplot(aes(iter, Loss, linetype = Curve)) +
  geom_line(color = "#ffffff", linewidth = 1.2, lineend = "round") +
  scale_linetype_manual(name = NULL, values = c(Train = "solid", Validation = "dotted")) +
  labs(x = "ITERATION (EPOCHS)", y = "MSE LOSS") +
  theme_minimal(base_size = 12, base_family = font_family) +
  theme(
    plot.margin      = margin(t = 36, r = 18, b = 36, l = 18),
    axis.title.x     = element_text(margin = margin(t = 14), color = fg_col),
    axis.title.y     = element_text(margin = margin(r = 14), color = fg_col),
    axis.text        = element_text(size = 14, color = fg_col),
    legend.text      = element_text(size = 14, color = fg_col),
    legend.position  = "top",
    legend.key       = element_rect(fill = bg_col, color = NA),
    plot.background  = element_rect(fill = bg_col, color = NA),
    panel.background = element_rect(fill = bg_col, color = NA),
    panel.grid.major = element_line(color = grid_col),
    panel.grid.minor = element_blank()
  )
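
If later epochs drift, the checkpoint written in the loop above lets you roll back to the best weights; a minimal sketch, assuming `mlp_checkpoint_2.pt` was actually saved during training:

Code
# Restore the best recorded state dict and re-check validation loss with it
if (exists("checkpoint_path") && file.exists(checkpoint_path)) {
  best_state <- torch_load(checkpoint_path)
  model$load_state_dict(best_state)
  model$eval()
  with_no_grad({
    best_valid_mse <- as.numeric(loss_fn(model(x_va), y_va)$item())
  })
  message(sprintf("Validation MSE at best checkpoint: %.4f", best_valid_mse))
}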

Analysis

Here, the numbers tell us whether the model improved on the baselines. The shapes tell us how. If the dotted prediction line hugs the solid line during slow trends, the network is capturing the dominant pattern. If it lags at turning points or overreacts after sharp drops, that shows where the model struggles. Residuals often reveal regimes: quiet months where the error stays tame, and chaotic runs where noise dominates. The loss curves add context. A good run drops fast early, then flattens. If training loss keeps falling while validation stalls or climbs, the model has more capacity than the data supports and is starting to memorize.
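
One quick way to see those regimes is to look at the validation residuals directly; a minimal sketch, using the single-step `y_valid` and `mlp_pred` from the Torch chunk (run it before the multi-horizon example overwrites the splits):

Code
# Residuals (actual minus predicted): calm months hug zero, noisy regimes fan out
resid_df <- tibble(
  idx   = seq_along(mlp_pred),
  resid = y_valid - mlp_pred
)

ggplot(resid_df, aes(idx, resid)) +
  geom_col(fill = "#751F2C") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "TIME (MONTHS)", y = "RESIDUAL (JPY)") +
  theme_minimal(base_size = 12)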

Results & Next Steps

Here’s the million-dollar question: should you trust this with real money? Not on its own. It’s a single-series, monthly model with limited samples. It often beats persistence and sometimes edges a linear model, but that’s not a tradable edge. I would test it like a skeptic: use rolling walk-forward splits, include fees and slippage, and report more than RMSE—directional accuracy, hit rate, turnover, max drawdown, and a simple Sharpe (more on this later). Check stability across regimes; if it breaks in stress, it’s noise.
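
Directional accuracy is easy to bolt on with what is already computed; a minimal sketch, again using the single-step `y_valid` and `mlp_pred`, with the previous month's actual close as the reference point (i.e. the persistence forecast):

Code
# Share of validation months where the forecast calls the direction of the move correctly
prev_close <- c(NA, head(y_valid, -1))      # last month's actual close
actual_up  <- y_valid  > prev_close
pred_up    <- mlp_pred > prev_close

directional_accuracy <- mean(actual_up == pred_up, na.rm = TRUE)
directional_accuracy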

Now, how to make it sturdier. Try log prices or returns to steady scale, and add uncertainty with Monte Carlo dropout or bootstrapping; if the bands swallow the signal, stop. If not, treat the forecast as a weak prior and combine it with simple, testable signals like momentum, seasonality, realized volatility, or an index filter. Require confirmation, keep positions small, size by volatility, use hard stops, paper trade for a quarter, then—maybe—a tiny allocation with alerts watching for drift. Bottom line: it’s a useful baseline, but not a standalone system.
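
As a closing sketch, Monte Carlo dropout needs nothing new: keep the dropout layers active at prediction time and average many stochastic passes. The snippet below reuses `model` and `x_va` from the multi-horizon chunk (which predicts raw JPY), and the 200 passes are an arbitrary illustrative count.

Code
# Monte Carlo dropout: train mode keeps dropout active, so each pass gives a different forecast
n_passes <- 200
model$train()

draws <- sapply(seq_len(n_passes), function(i) {
  with_no_grad(as_array(model(x_va))[, 1])  # first forecast horizon, in raw JPY
})

mc_mean  <- rowMeans(draws)
mc_lower <- apply(draws, 1, quantile, probs = 0.05)
mc_upper <- apply(draws, 1, quantile, probs = 0.95)

model$eval()  # back to deterministic predictions

head(tibble(Mean = mc_mean, Lower = mc_lower, Upper = mc_upper))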