Ashish Srikanth - Scoring Styles and Matchups in NBA 2024

By The Numbers

Context

All numbers in this analysis come from Basketball-Reference’s 2024–25 season pages, beginning with the league game log. Basketball-Reference is part of Sports Reference, an independent statistics publisher that compiles official box scores, play-by-play reports, and league summaries, then standardizes team names and fields and runs consistency checks before publishing searchable tables. I pulled the season’s game logs and team pages, normalized teams and dates, and used those tables to build the possession-based metrics and matchup summaries behind the visuals and the model that follow.

Code

suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(readr); library(janitor); library(stringr)
  library(ggplot2); library(ggrepel); library(scales); library(DT); library(purrr)
  library(ggridges); library(lubridate)
  library(glmnet); library(plotly)
  library(tibble)
})

tot_raw <- readr::read_csv("data/2425NBASeason_TotalStats.csv", show_col_types = FALSE) |> clean_names()
pbg_raw <- readr::read_csv("data/2425NBASeason_PointsByGame.csv", show_col_types = FALSE) |> clean_names()
h2h_raw <- readr::read_csv("data/2425NBASeason_TeamVersusTeam.csv", show_col_types = FALSE) |> clean_names()

norm_team <- function(x) gsub("\\s*\\*$", "", as.character(x))

parse_date_std <- function(x) {
  if (inherits(x, "Date")) return(x)
  x_chr <- as.character(x)
  parsed <- suppressWarnings(parse_date_time(x_chr, orders = c("a b d Y","Y-m-d","mdY","m/d/Y"), tz = "UTC"))
  as.Date(parsed)
}

Clean the Game-Level File into Long Form

The game log arrives with one row per game and the columns DATE, START_TIME, VISITOR, PTS_001, HOME, PTS_002, ATTENDANCE, ARENA, and NOTES. I reshape it to one row per team per game, keeping the date, team, opponent, a home or away flag, points, attendance, arena, and notes. Standardizing dates and normalizing team names prevents missed joins caused by punctuation or season-specific suffixes such as a trailing asterisk. This long format underpins the distributions, the top-nine ranking by median points, and the matchup grid.

Code

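# Fail early if any column used in the reshape below is missing from the game log
stopifnot(all(c("date","start_time","visitor","pts_001","home","pts_002","attendance","arena","notes") %in% names(pbg_raw)))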

away_long <- pbg_raw |>
  transmute(
    date       = parse_date_std(date),
    team       = norm_team(visitor),
    opponent   = norm_team(home),
    home_away  = "away",
    pts        = suppressWarnings(as.numeric(pts_001)),
    attendance, arena, notes
  )

home_long <- pbg_raw |>
  transmute(
    date       = parse_date_std(date),
    team       = norm_team(home),
    opponent   = norm_team(visitor),
    home_away  = "home",
    pts        = suppressWarnings(as.numeric(pts_002)),
    attendance, arena, notes
  )

pbg <- bind_rows(away_long, home_long) |>
  arrange(date, home_away) |>
  select(date, team, opponent, home_away, pts, attendance, arena, notes)
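As a quick sanity check on the reshape, the sketch below applies the same away/home stacking to two made-up games and confirms that every game yields exactly two rows and that no points are lost. It is written in base R so it runs without the packages loaded above. The team labels, scores, and the name `toy_long` are assumptions for illustration, not real results.

```r
# Toy game log with the same column layout as pbg_raw (teams and scores are made up)
toy <- data.frame(
  date    = as.Date(c("2024-10-22", "2024-10-23")),
  visitor = c("Team A", "Team C*"),
  pts_001 = c(100, 110),
  home    = c("Team B*", "Team D"),
  pts_002 = c(105, 98),
  stringsAsFactors = FALSE
)

# Same normalizer as above: drop a trailing asterisk and any space before it
norm_team <- function(x) gsub("\\s*\\*$", "", as.character(x))

away <- data.frame(date = toy$date, team = norm_team(toy$visitor),
                   opponent = norm_team(toy$home), home_away = "away",
                   pts = toy$pts_001, stringsAsFactors = FALSE)
home <- data.frame(date = toy$date, team = norm_team(toy$home),
                   opponent = norm_team(toy$visitor), home_away = "home",
                   pts = toy$pts_002, stringsAsFactors = FALSE)

toy_long <- rbind(away, home)
toy_long <- toy_long[order(toy_long$date, toy_long$home_away), ]

# Each game should contribute one away row and one home row
stopifnot(nrow(toy_long) == 2 * nrow(toy))
# Total points must survive the reshape
stopifnot(sum(toy_long$pts) == sum(toy$pts_001) + sum(toy$pts_002))
print(toy_long)
```

The same two checks can be run against `pbg` and `pbg_raw` once the real files are loaded.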

Build Team Totals and Matchup Tables

Team totals supply field goals, attempts, threes, twos, free throws, rebounds, assists, steals, blocks, turnovers, fouls, points, games, and minutes. From these I compute a possession estimate and the core rates directly. The head-to-head table is converted from records such as 2–1 into wins, losses, games played, and win percentage, then mapped onto a square grid with team abbreviations on both axes so each cell represents Team A’s performance against Team B.

Code

# Totals
stopifnot(all(c("team","g","mp","fg","fga","x3p","x3pa","x2p","x2pa","ft","fta","orb","drb","trb","ast","stl","blk","tov","pf","pts") %in% names(tot_raw)))

tot <- tot_raw |>
  mutate(team = norm_team(team))

stopifnot("team" %in% names(h2h_raw))
opp_cols <- setdiff(names(h2h_raw), c("rk","team"))

h2h_long <- h2h_raw |>
  mutate(team = norm_team(team)) |>
  pivot_longer(
    cols = all_of(opp_cols),
    names_to = "opponent_abbr",
    values_to = "record"
  ) |>
  filter(!is.na(record), record != "") |>
  # Split "W-L" strings into integer wins and losses
  separate(record, into = c("w","l"), sep = "-", remove = TRUE, convert = TRUE, fill = "right") |>
  mutate(
    gp      = w + l,
    win_pct = ifelse(gp > 0, w / gp, NA_real_),
    wl_diff = w - l
  ) |>
  filter(!is.na(win_pct))

h2h <- h2h_long |>
  select(team, opponent = opponent_abbr, win_pct, wl_diff, gp)
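To make the record conversion concrete, here is a minimal base R sketch of the same arithmetic on a few hypothetical records. It assumes the records use a plain hyphen, as the `sep = "-"` argument above does; the values are invented for illustration.

```r
# Hypothetical head-to-head cells in "W-L" form, including one empty cell
record <- c("2-1", "0-3", "4-0", "")

# Keep only non-empty cells, as the pipeline above does
record <- record[!is.na(record) & record != ""]

# Split each record on the hyphen and convert both halves to integers
parts <- strsplit(record, "-", fixed = TRUE)
w <- as.integer(vapply(parts, function(p) p[1], character(1)))
l <- as.integer(vapply(parts, function(p) p[2], character(1)))

gp      <- w + l                               # games played
win_pct <- ifelse(gp > 0, w / gp, NA_real_)    # guard against 0-0 cells
wl_diff <- w - l

parsed <- data.frame(record, w, l, gp, win_pct, wl_diff)
print(parsed)
```

A 2-1 record becomes three games played, a win percentage of two thirds, and a win-loss difference of one.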

Quick Preview

Code

tot_view <- tot |>
  dplyr::select(team,g,mp,fg,fga,x3p,x3pa,x2p,x2pa,ft,fta,orb,drb,trb,ast,stl,blk,tov,pf,pts)
names(tot_view) <- toupper(names(tot_view))

pbg_view <- pbg |>
  dplyr::select(date, team, opponent, pts) |>
  dplyr::slice(1:18)
names(pbg_view) <- toupper(names(pbg_view))

h2h_view <- h2h |>
  dplyr::arrange(team, opponent) |>
  dplyr::slice(1:18)
names(h2h_view) <- toupper(names(h2h_view))

bg_col           <- get0("bg_col",            ifnotfound = "#FAF8F1")
.wrapper_max_w   <- get0(".wrapper_max_w",    ifnotfound = 790L)

tbl_css <- htmltools::tags$style(htmltools::HTML(sprintf("
  .tbl-card {
    border-radius: 6px;
    background: %s;
    padding: 0.75rem;
    width: 100%%;
    max-width: %dpx;
    margin: 0 auto 1rem;
    box-sizing: border-box;
    overflow: hidden; /* IMPORTANT: keep DT scrollbars inside */
  }
  @media (max-width: %dpx) {
    .tbl-card { padding: 0.5rem; }
  }
  /* Keep cells from wrapping unpredictably during width calc */
  .tbl-card table.dataTable thead th,
  .tbl-card table.dataTable tbody td { white-space: nowrap; }
", bg_col, .wrapper_max_w, .wrapper_max_w)))

make_dt_block <- function(id, data, height = "135px") {
  htmltools::tags$div(
    class = "tbl-card",
    id = paste0("tbl-wrap-", id),
    DT::datatable(
      data,
      rownames  = TRUE,
      options   = list(
        autoWidth       = FALSE,
        dom             = "tip",
        scrollY         = height,
        scrollX         = TRUE,
        scrollCollapse  = TRUE,
        ordering        = FALSE,
        paging          = FALSE
      ),
      callback = DT::JS(sprintf("
        // Keep header/body aligned across draws, resizes, and font loads
        var adjust = function(){ table.columns.adjust(); };
        table.on('init.dt',  adjust);
        table.on('draw.dt',  adjust);
        setTimeout(adjust, 0);
        setTimeout(adjust, 250);
        setTimeout(adjust, 1000);
        if (typeof window !== 'undefined') {
          window.addEventListener('resize', adjust, { passive: true });
        }
        if (document && document.fonts && document.fonts.ready) {
          document.fonts.ready.then(adjust);
        }
        // Observe wrapper size changes
        try {
          var wrap = document.getElementById('%s');
          if (wrap && 'ResizeObserver' in window) {
            new ResizeObserver(function(){ adjust(); }).observe(wrap);
          }
        } catch(e){}
      ", paste0("tbl-wrap-", id)))
    )
  )
}

htmltools::tagList(
  tbl_css,

  make_dt_block(
    id = "tot",
    data = tot_view %>% dplyr::arrange(dplyr::desc(PTS)),
    height = "200px"
  ),

  make_dt_block(
    id = "pbg",
    data = pbg_view,
    height = "135px"
  ),

  make_dt_block(
    id = "h2h",
    data = h2h_view,
    height = "135px"
  )
)

The backbone is a possession-based view. A possession is estimated as field goal attempts (FGA) minus offensive rebounds (ORB) plus turnovers (TOV) plus \(0.44 \times\) free throw attempts (FTA). Offensive rating (OffRtg) is points scored per 100 possessions. Pace is possessions scaled to a 48-minute game. The four core ingredients are effective field goal percentage (eFG%), turnover percentage (TOV%), free throw rate (FTR), and offensive rebounds per 100 possessions (ORB/100). eFG% credits the extra point on made threes by computing (field goals + 0.5 × made threes) divided by field goal attempts. TOV% is the share of possessions that end in a turnover. FTR is free throw attempts relative to field goal attempts and serves as a proxy for paint contact. ORB/100 is a second-chance rate that travels across tempos.

Possession Math and Four Factors

We use the common heuristic:

\[ \text{Possessions} = \text{FGA} - \text{ORB} + \text{TOV} + 0.44 \times \text{FTA} \]

OffRtg is points per 100 possessions. Pace scales by minutes to account for overtime. eFG% captures shot value. TOV% prices waste. FTR converts contact into expected value. ORB per 100 keeps denominators aligned.

Code

feat <- tot |>
  mutate(
    team      = as.character(team),
    poss      = fga - orb + tov + 0.44 * fta,
    pace      = 48 * poss / (mp/5),
    efg       = (fg + 0.5 * x3p) / pmax(fga, 1),
    tov_rate  = tov / pmax(poss, 1),
    ftr       = fta / pmax(fga, 1),
    x3p_rate  = x3pa / pmax(fga, 1),
    x3p_pct   = ifelse(x3pa > 0, x3p / x3pa, NA_real_),
    x2p_pct   = ifelse(x2pa > 0, x2p / x2pa, NA_real_),
    ft_pct    = ifelse(fta  > 0, ft  / fta,  NA_real_),
    orb100    = 100 * orb / pmax(poss, 1),
    ast_rate  = ast / pmax(fg, 1),
    offrtg    = 100 * pts / pmax(poss, 1)
  ) |>
  filter(is.finite(offrtg), is.finite(efg), is.finite(tov_rate), is.finite(ftr))
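A worked example may help make the arithmetic tangible. The sketch below runs the same formulas on one set of hypothetical season totals; the numbers are illustrative and not taken from any team. It treats `mp` as minutes summed over the five players on the floor, which is the assumption behind the `mp/5` term above.

```r
# Hypothetical season totals for one team (illustrative values, not real data)
tm <- list(fg = 3400, fga = 7200, x3p = 1100, fta = 1800,
           orb = 900, tov = 1100, pts = 9400, mp = 19805)

poss     <- tm$fga - tm$orb + tm$tov + 0.44 * tm$fta   # 7200 - 900 + 1100 + 792
offrtg   <- 100 * tm$pts / poss                        # points per 100 possessions
pace     <- 48 * poss / (tm$mp / 5)                    # possessions per 48 team minutes
efg      <- (tm$fg + 0.5 * tm$x3p) / tm$fga            # each made three counts as 1.5 makes
tov_rate <- tm$tov / poss                              # share of possessions ending in a turnover
ftr      <- tm$fta / tm$fga                            # free throw attempts per field goal attempt
orb100   <- 100 * tm$orb / poss                        # offensive rebounds per 100 possessions

round(c(poss = poss, offrtg = offrtg, pace = pace, efg = efg,
        tov_rate = tov_rate, ftr = ftr, orb100 = orb100), 3)
```

With these inputs the possession estimate is 8,192, which puts OffRtg at roughly 114.7 points per 100 possessions.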

Shot Diet and Care with the Ball

The shot-diet plot places three-point attempt share on the horizontal axis and two-point percentage on the vertical axis, with bubble size tied to FTR. Two stable archetypes stand out. Teams that push a large share of threes succeed when accuracy holds near or above league average because eFG% scales quickly with made threes. Teams that live in the paint succeed when two-point percentage is strong and the FTR bubble is large, since free throws provide points that do not depend on jump-shot variance. Doing neither tends to produce a middling offense that stalls when a first option is taken away.

Code

bg_col            <- get0("bg_col",            ifnotfound = "#FAF8F1")
font_family       <- get0("font_family",       ifnotfound = "Ramabhadra")
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)
spike_style       <- get0("spike_style", ifnotfound = list(
  spikecolor="#000", spikedash="dash", spikethickness=1.5,
  spikemode="across", spikesnap="cursor", showspikes=TRUE
))
point_color <- "#751F2C"

if (!exists("rescale_sz", mode = "function")) {
  rescale_sz <- function(x, to = c(8, 28)) {
    rng <- range(x, na.rm = TRUE)
    if (!is.finite(diff(rng)) || diff(rng) == 0) return(rep(mean(to), length(x)))
    (x - rng[1]) / diff(rng) * diff(to) + to[1]
  }
}

hover_sd <- with(feat, paste0(
  "<b>TEAM:</b> ", team,
  "<br><b>3PA SHARE:</b> ", scales::percent(x3p_rate, 0.1),
  "<br><b>2P%:</b> ",      scales::percent(x2p_pct, 0.1),
  "<br><b>FTR:</b> ",      signif(ftr, 3),
  "<br><b>OFFRTG:</b> ",   round(offrtg, 1)
))

fig_sd <- plot_ly() |>
  add_trace(
    type = "scatter", mode = "markers",
    x = feat$x3p_rate, y = feat$x2p_pct,
    marker = list(
      size = rescale_sz(feat$ftr, c(10, 28)),
      color = point_color,
      opacity = 0.7,
      line = list(width = 0, color = "rgba(0,0,0,0)")
    ),
    hoverinfo = "text", hovertext = hover_sd,
    name = "TEAM"
  )

if (exists("style_plot")) {
  fig_sd <- style_plot(
    fig_sd,
    "SHOT DIET — 3PA SHARE vs 2P% (SIZE = FTR)",
    "3PA SHARE", "2P%"
  )
} else {
  fig_sd <- fig_sd |>
    layout(
      title = list(text = "SHOT DIET — 3PA SHARE vs 2P% (SIZE = FTR)", x = 0.03, y = 0.95, xanchor = "left"),
      width  = .fixed_fig_width,
      height = .fixed_fig_height,
      font = list(family = font_family, size = 12),
      paper_bgcolor = bg_col, plot_bgcolor = bg_col,
      margin = list(l = 18, r = 18, t = 36, b = 36),
      legend = list(orientation = "v", x = 1.05, y = 1, xanchor = "left", yanchor = "top",
                    font = list(size = 16), title = list(text = "")),
      xaxis = c(list(
        title = list(text = "3PA SHARE", standoff = 20),
        tickformat = ".0%", tickfont = list(size = 16),
        gridcolor = "#E8E8E8", zeroline = FALSE, fixedrange = TRUE
      ), spike_style),
      yaxis = c(list(
        title = list(text = "2P%", standoff = 20),
        tickformat = ".0%", tickfont = list(size = 16),
        automargin = TRUE, fixedrange = TRUE
      ), spike_style),
      hovermode = "closest",
      hoverlabel = list(
        font = list(family = font_family, size = 14, color = "#313131"),
        bgcolor = "#FFF", namelength = -1
      )
    ) |>
    config(
      responsive = FALSE,
      scrollZoom = TRUE, doubleClick = FALSE,
      modeBarButtonsToRemove = list(
        "zoom2d","pan2d","select2d","lasso2d","zoomIn2d","zoomOut2d",
        "autoScale2d","resetScale2d","toggleSpikelines","toImage"
      ),
      displaylogo = FALSE, displayModeBar = TRUE, showTips = FALSE
    )
}

page_css_sd <- htmltools::tags$style(htmltools::HTML(sprintf("
  #shot-diet-wrap {
    border-radius: 6px;
    overflow: hidden;
    overflow-x: auto;             /* horizontal scroll */
    -webkit-overflow-scrolling: touch;
    background: %s;
    padding: 0.75rem;
    width: 100%%;
    max-width: %dpx;
    margin: 0 auto 1rem;
  }
  #shot-diet-wrap > div { min-width: %dpx; }  /* forces inner canvas width */
  @media (max-width: %dpx) { #shot-diet-wrap { padding: 0.5rem; } }
", bg_col, .wrapper_max_w, .fixed_fig_width, .wrapper_max_w)))

htmltools::browsable(
  htmltools::tagList(
    page_css_sd,
    htmltools::tags$div(id = "shot-diet-wrap", fig_sd)
  )
)
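The bubble sizes in this plot come from the `rescale_sz` helper, which maps the smallest value to the low end of a pixel range and the largest to the high end. The short sketch below repeats the helper so it runs on its own and feeds it three hypothetical FTR values, plus a constant vector to show the fallback when there is no spread.

```r
# Copy of the helper defined above: linear min-max rescaling into a pixel range
rescale_sz <- function(x, to = c(8, 28)) {
  rng <- range(x, na.rm = TRUE)
  if (!is.finite(diff(rng)) || diff(rng) == 0) return(rep(mean(to), length(x)))
  (x - rng[1]) / diff(rng) * diff(to) + to[1]
}

ftr_toy <- c(0.20, 0.25, 0.30)                 # hypothetical free throw rates
sizes   <- rescale_sz(ftr_toy, c(10, 28))      # min -> 10, midpoint -> 19, max -> 28
flat    <- rescale_sz(c(0.25, 0.25), c(10, 28))  # no spread: every bubble gets the mid size

print(sizes)
print(flat)
```

Because the mapping is relative to the sample range, bubble sizes show rank within this set of teams rather than absolute FTR levels.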

The efficiency-versus-care scatter places eFG% on the x axis and TOV% on the y axis, with the y axis flipped so lower TOV% appears higher and bubble size tied to OffRtg. The upper right is the favorable quadrant. Teams that shoot well and value possessions separate over a game’s worth of trips. eFG% explains most level differences in OffRtg. Among teams with similar eFG%, the one with fewer wasted trips pulls away. FTR and ORB/100 are not plotted in this scatter, but they raise the floor by creating extra points and extra chances when the jumper cools.

Code

bg_col            <- get0("bg_col",            ifnotfound = "#FAF8F1")
font_family       <- get0("font_family",       ifnotfound = "Ramabhadra")
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)
spike_style <- get0("spike_style", ifnotfound = list(
  spikecolor="#000", spikedash="dash", spikethickness=1.5,
  spikemode="across", spikesnap="cursor", showspikes=TRUE
))
point_color <- "#751F2C"

if (!exists("rescale_sz", mode = "function")) {
  rescale_sz <- function(x, to = c(8, 28)) {
    rng <- range(x, na.rm = TRUE)
    if (!is.finite(diff(rng)) || diff(rng) == 0) return(rep(mean(to), length(x)))
    (x - rng[1]) / diff(rng) * diff(to) + to[1]
  }
}

hover_ec <- with(feat, paste0(
  "<b>TEAM:</b> ", team,
  "<br><b>EFG%:</b> ", scales::percent(efg, 0.1),
  "<br><b>TOV%:</b> ", scales::percent(tov_rate, 0.1),
  "<br><b>OFFRTG:</b> ", round(offrtg, 1)
))

fig_ec <- plot_ly() |>
  add_trace(
    type = "scatter", mode = "markers",
    x = feat$efg, y = feat$tov_rate,
    marker = list(
      size  = rescale_sz(feat$offrtg, c(10, 28)),
      color = point_color, opacity = 0.7,
      line  = list(width = 0, color = "rgba(0,0,0,0)")
    ),
    hoverinfo = "text", hovertext = hover_ec,
    name = "TEAM"
  ) |>
  layout(yaxis = list(autorange = "reversed"))

if (exists("style_plot")) {
  fig_ec <- style_plot(
    fig_ec,
    "EFFICIENCY vs CARE — EFG% vs TOV% (SIZE = OFFRTG)",
    "EFG%", "TOV%"
  )
} else {
  fig_ec <- fig_ec |>
    layout(
      title = list(text = "EFFICIENCY vs CARE — EFG% vs TOV% (SIZE = OFFRTG)", x = 0.03, y = 0.95, xanchor = "left"),
      width  = .fixed_fig_width,
      height = .fixed_fig_height,
      font = list(family = font_family, size = 12),
      paper_bgcolor = bg_col, plot_bgcolor = bg_col,
      margin = list(l = 18, r = 18, t = 36, b = 36),
      legend = list(orientation = "v", x = 1.05, y = 1, xanchor = "left", yanchor = "top",
                    font = list(size = 16), title = list(text = "")),
      xaxis = c(list(
        title = list(text = "EFG%", standoff = 20),
        tickformat = ".0%", tickfont = list(size = 16),
        gridcolor = "#E8E8E8", zeroline = FALSE, fixedrange = TRUE
      ), spike_style),
      yaxis = c(list(
        title = list(text = "TOV%", standoff = 20),
        tickformat = ".0%", tickfont = list(size = 16),
        automargin = TRUE, fixedrange = TRUE
      ), spike_style),
      hovermode = "closest",
      hoverlabel = list(
        font = list(family = font_family, size = 14, color = "#313131"),
        bgcolor = "#FFF", namelength = -1
      )
    ) |>
    config(
      responsive = FALSE,
      scrollZoom = TRUE, doubleClick = FALSE,
      modeBarButtonsToRemove = list(
        "zoom2d","pan2d","select2d","lasso2d","zoomIn2d","zoomOut2d",
        "autoScale2d","resetScale2d","toggleSpikelines","toImage"
      ),
      displaylogo = FALSE, displayModeBar = TRUE, showTips = FALSE
    )
}

page_css_ec <- htmltools::tags$style(htmltools::HTML(sprintf("
  #eff-care-wrap {
    border-radius: 6px;
    overflow: hidden;
    overflow-x: auto;             /* horizontal scroll */
    -webkit-overflow-scrolling: touch;
    background: %s;
    padding: 0.75rem;
    width: 100%%;
    max-width: %dpx;
    margin: 0 auto 1rem;
  }
  #eff-care-wrap > div { min-width: %dpx; }
  @media (max-width: %dpx) { #eff-care-wrap { padding: 0.5rem; } }
", bg_col, .wrapper_max_w, .fixed_fig_width, .wrapper_max_w)))

htmltools::browsable(
  htmltools::tagList(
    page_css_ec,
    htmltools::tags$div(id = "eff-care-wrap", fig_ec)
  )
)
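The quadrant reading can also be expressed as a simple label per team. The sketch below is one way to do that, assuming the quadrants are split at the sample medians of eFG% and TOV%; the plot itself draws no reference lines, so the median split, the label wording, and the toy values are all assumptions for illustration.

```r
# Made-up team rates for illustration only
toy <- data.frame(
  team     = c("Team A", "Team B", "Team C", "Team D"),
  efg      = c(0.570, 0.565, 0.520, 0.525),
  tov_rate = c(0.120, 0.145, 0.118, 0.150),
  stringsAsFactors = FALSE
)

# Split at the sample medians; lower TOV% is better, so "careful" means below the median
good_shooting <- toy$efg > median(toy$efg)
careful       <- toy$tov_rate < median(toy$tov_rate)

toy$quadrant <- ifelse(good_shooting & careful,  "efficient and careful",
                ifelse(good_shooting & !careful, "efficient but loose",
                ifelse(!good_shooting & careful, "careful but cold",
                                                 "neither")))
print(toy)
```

Applied to `feat`, the first label corresponds to the upper right of the flipped-axis scatter.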

Game-Level Distributions

The distributions of points by game for the top nine offenses show where nights usually land and how often the ceiling appears. The dashed vertical line marks the league-wide median. Medians cluster around the league middle, which sets expectations for a normal night. Long right tails mark blowout gear that shows up when threes fall in waves or when the rim is compromised. Narrow shapes with short tails mark steady output and fewer surprises. Wide shapes with longer tails mark volatility that can swing a series.

The head-to-head grid shows pockets rather than sweeps. Each top-nine team has clearly favorable opponents and at least one problem pairing. With prior-informed smoothing in place, a 100 percent square built on one or two early games regresses toward a more realistic value, and late-season tiles converge to the raw record as games accumulate.

Code

bg_col            <- get0("bg_col",            ifnotfound = "#FAF8F1")
font_family       <- get0("font_family",       ifnotfound = "Ramabhadra")
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)
point_color       <- "#751F2C"

league_med <- median(pbg$pts, na.rm = TRUE)

top9_order <- pbg |>
  dplyr::group_by(team) |>
  dplyr::summarise(MED = median(pts, na.rm = TRUE), .groups = "drop") |>
  dplyr::arrange(dplyr::desc(MED)) |>
  dplyr::slice_head(n = 9)

team_levels <- top9_order |>
  dplyr::arrange(MED) |>
  dplyr::pull(team)

pbg_top <- pbg |>
  dplyr::filter(team %in% team_levels)
pbg_top$team <- factor(pbg_top$team, levels = team_levels)

fig_vi <- plotly::plot_ly(type = "violin", orientation = "h")
for (tm in levels(pbg_top$team)) {
  dat <- pbg_top[pbg_top$team == tm, , drop = FALSE]
  fig_vi <- fig_vi |>
  plotly::add_trace(
    y = dat$team, x = dat$pts, name = tm,
    type = "violin", orientation = "h",
    points    = "all",
    pointpos  = 0,
    jitter    = 0.12,
    marker    = list(
      size  = 12,
      color = point_color,
      opacity = 0.9
    ),
    meanline  = list(visible = TRUE, color = point_color),
    line      = list(color = point_color),
    fillcolor = bg_col, opacity = 0.7,
    hoverinfo = "text",
    text = paste0("<b>TEAM:</b> ", tm, "<br><b>PTS:</b> ", dat$pts)
  )
}

fig_vi <- fig_vi |>
  plotly::layout(
    title = list(text = "POINTS BY GAME — DISTRIBUTIONS (TOP 9 TEAMS)", x = 0.03, y = 0.95, xanchor = "left"),
    width  = .fixed_fig_width,
    height = .fixed_fig_height,
    font = list(family = font_family, size = 12),
    paper_bgcolor = bg_col, plot_bgcolor = bg_col,
    margin = list(l = 18, r = 18, t = 36, b = 36),
    legend = list(orientation = "v", x = 1.05, y = 1, xanchor = "left", yanchor = "top",
                  font = list(size = 16), title = list(text = "")),
    xaxis = list(
      title = list(text = "PTS", standoff = 20),
      tickfont = list(size = 16),
      gridcolor = "#E8E8E8",
      zeroline = FALSE,
      fixedrange = TRUE
    ),
    yaxis = list(
      title = list(text = "TEAM", standoff = 20),
      tickfont = list(size = 16),
      automargin = TRUE,
      fixedrange = TRUE
    ),
    hovermode = "closest",
    hoverlabel = list(
      font = list(family = font_family, size = 14, color = "#313131"),
      bgcolor = "#FFF", namelength = -1
    ),
    shapes = list(list(
      type = "line", x0 = league_med, x1 = league_med, xref = "x",
      y0 = 0, y1 = 1, yref = "paper",
      line = list(color = "#000000", dash = "dash", width = 1)
    ))
  ) |>
  plotly::config(
    responsive = FALSE,
    scrollZoom = TRUE, doubleClick = FALSE,
    modeBarButtonsToRemove = list(
      "zoom2d","pan2d","select2d","lasso2d","zoomIn2d","zoomOut2d",
      "autoScale2d","resetScale2d","toggleSpikelines","toImage"
    ),
    displaylogo = FALSE, displayModeBar = TRUE, showTips = FALSE
  )

pbg_vi_css <- htmltools::tags$style(htmltools::HTML(sprintf("
  #pbg-violin-wrap {
    border-radius: 6px;
    overflow: hidden;
    overflow-x: auto;
    -webkit-overflow-scrolling: touch;
    background: %s;
    padding: 0.75rem;
    width: 100%%;
    max-width: %dpx;
    margin: 0 auto 1rem;
  }
  #pbg-violin-wrap > div { min-width: %dpx; }
  @media (max-width: %dpx) { #pbg-violin-wrap { padding: 0.5rem; } }
", bg_col, .wrapper_max_w, .fixed_fig_width, .wrapper_max_w)))

htmltools::browsable(
  htmltools::tagList(
    pbg_vi_css,
    htmltools::tags$div(id = "pbg-violin-wrap", fig_vi)
  )
)
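The shapes described above can be summarized numerically as well. The sketch below computes a median, an interquartile range, and the distance from the median to the 10th and 90th percentiles for two hypothetical teams. The choice of these particular summaries is an assumption, and the scores are invented.

```r
# Made-up game scores for two hypothetical teams (not real results)
toy <- data.frame(
  team = rep(c("Steady", "Streaky"), each = 8),
  pts  = c(112, 114, 115, 116, 117, 118, 119, 121,
            98, 104, 109, 116, 118, 125, 134, 146)
)

# Median for the typical night, IQR for width, tail lengths for asymmetry
summarise_team <- function(x) {
  q <- quantile(x, c(0.10, 0.25, 0.50, 0.75, 0.90), names = FALSE)
  c(median = q[3], iqr = q[4] - q[2],
    upper_tail = q[5] - q[3], lower_tail = q[3] - q[1])
}

shape <- t(sapply(split(toy$pts, toy$team), summarise_team))
print(round(shape, 1))
```

The two toy teams have nearly identical medians, but the second has a much wider interquartile range and an upper tail longer than its lower tail, which is the numeric counterpart of a wide violin with a long right tail.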

Head-to-Head Texture (Win Percentage)

The head-to-head file contains win–loss records (e.g., 2–1) by opponent. We visualize win percentage vs opponent as a heat map.

Code

bg_col            <- get0("bg_col",            ifnotfound = "#FAF8F1")
font_family       <- get0("font_family",       ifnotfound = "Ramabhadra")
.fixed_fig_width  <- get0(".fixed_fig_width",  ifnotfound = 1100L)
.fixed_fig_height <- get0(".fixed_fig_height", ifnotfound = 390L)
.wrapper_max_w    <- get0(".wrapper_max_w",    ifnotfound = 790L)

library(dplyr); library(tidyr); library(plotly); library(htmltools)

top9 <- pbg %>%
  group_by(team) %>%
  summarise(MED = median(pts, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(MED)) %>%
  slice_head(n = 9)

abbr_map <- c(
  "Atlanta Hawks"="ATL","Boston Celtics"="BOS","Brooklyn Nets"="BRK","Charlotte Hornets"="CHO",
  "Chicago Bulls"="CHI","Cleveland Cavaliers"="CLE","Dallas Mavericks"="DAL","Denver Nuggets"="DEN",
  "Detroit Pistons"="DET","Golden State Warriors"="GSW","Houston Rockets"="HOU","Indiana Pacers"="IND",
  "Los Angeles Clippers"="LAC","LA Clippers"="LAC","Los Angeles Lakers"="LAL","Memphis Grizzlies"="MEM",
  "Miami Heat"="MIA","Milwaukee Bucks"="MIL","Minnesota Timberwolves"="MIN","New Orleans Pelicans"="NOP",
  "New York Knicks"="NYK","Oklahoma City Thunder"="OKC","Orlando Magic"="ORL","Philadelphia 76ers"="PHI",
  "Phoenix Suns"="PHO","Portland Trail Blazers"="POR","Sacramento Kings"="SAC","San Antonio Spurs"="SAS",
  "Toronto Raptors"="TOR","Utah Jazz"="UTA","Washington Wizards"="WAS"
)

top9_abbr <- unname(abbr_map[top9$team])
top9_abbr <- top9_abbr[!is.na(top9_abbr)]

base <- h2h %>%
  mutate(
    TEAM_ABBR = unname(abbr_map[team]),
    OPP_ABBR  = toupper(opponent),
    WIN_NORM  = ifelse(is.na(win_pct), NA_real_,
                       ifelse(win_pct > 1, win_pct/100, win_pct))
  ) %>%
  filter(!is.na(TEAM_ABBR)) %>%
  select(TEAM_ABBR, OPP_ABBR, WIN_NORM) %>%
  filter(TEAM_ABBR %in% top9_abbr, OPP_ABBR %in% top9_abbr) %>%
  group_by(TEAM_ABBR, OPP_ABBR) %>%
  summarise(WIN_NORM = mean(WIN_NORM, na.rm = TRUE), .groups = "drop")

rev9 <- base %>%
  transmute(TEAM_ABBR = OPP_ABBR, OPP_ABBR = TEAM_ABBR, WIN_REV = WIN_NORM)

symm9 <- full_join(base, rev9, by = c("TEAM_ABBR","OPP_ABBR")) %>%
  mutate(
    WIN_NORM = coalesce(WIN_NORM, if_else(!is.na(WIN_REV), 1 - WIN_REV, NA_real_)),
    WIN_NORM = if_else(TEAM_ABBR == OPP_ABBR, NA_real_, WIN_NORM)
  ) %>%
  select(TEAM_ABBR, OPP_ABBR, WIN_NORM)

grid9 <- crossing(TEAM_ABBR = top9_abbr, OPP_ABBR = top9_abbr) %>%
  left_join(symm9, by = c("TEAM_ABBR","OPP_ABBR"))

wide9 <- grid9 %>%
  mutate(TEAM_ABBR = factor(TEAM_ABBR, levels = top9_abbr),
         OPP_ABBR  = factor(OPP_ABBR,  levels = top9_abbr)) %>%
  arrange(TEAM_ABBR, OPP_ABBR) %>%
  pivot_wider(
    names_from = OPP_ABBR, values_from = WIN_NORM,
    values_fn  = list(WIN_NORM = ~mean(.x, na.rm = TRUE)),
    values_fill = NA_real_
  )

teams_y <- as.character(wide9$TEAM_ABBR)
opps_x  <- colnames(wide9)[-1]
z_mat   <- as.matrix(as.data.frame(wide9[, -1, drop = FALSE]))

# Flatten z row by row so each value lines up with its row-major TEAM/OPP label
z_vec <- as.vector(t(z_mat))
hover_vec <- ifelse(
  is.na(z_vec),
  paste0("<b>TEAM:</b> ", rep(teams_y, each = length(opps_x)),
         "<br><b>OPP:</b> ", rep(opps_x,  times = length(teams_y)),
         "<br><b>WIN%:</b> —"),
  paste0("<b>TEAM:</b> ", rep(teams_y, each = length(opps_x)),
         "<br><b>OPP:</b> ", rep(opps_x,  times = length(teams_y)),
         "<br><b>WIN%:</b> ", sprintf('%.2f%%', z_vec * 100))
)
hover_txt <- matrix(hover_vec, nrow = length(teams_y), byrow = TRUE)

fig_top9 <- plot_ly(
  type = "heatmap",
  x = opps_x, y = teams_y, z = z_mat,
  zmin = 0, zmax = 1,
  colorscale = list(list(0, "#B95763"), list(1, "#751F2C")),
  hoverinfo = "text", text = hover_txt,
  colorbar = list(
    title = list(text = ""),
    tickmode = "array",
    tickvals = c(0, 0.25, 0.5, 0.75, 1),
    ticktext = c("0%", "25%", "50%", "75%", "100%")
  ),
  xgap = 1, ygap = 1, zsmooth = FALSE, hoverongaps = TRUE
) %>%
  layout(
    title = list(text = "WIN PERCENTAGE BY MATCHUP — TOP 9 TEAMS", x = 0.03, y = 0.95, xanchor = "left"),
    width  = .fixed_fig_width,
    height = .fixed_fig_height,
    font = list(family = font_family, size = 12),
    paper_bgcolor = bg_col, plot_bgcolor = bg_col,
    margin = list(l = 18, r = 70, t = 36, b = 36),
    xaxis = list(
      title = list(text = "OPPONENT (ABBR.)", standoff = 20),
      tickangle = 45, tickfont = list(size = 16),
      gridcolor = "#E8E8E8", zeroline = FALSE, fixedrange = TRUE
    ),
    yaxis = list(
      title = list(text = "TEAM (ABBR.)", standoff = 20),
      tickfont = list(size = 16), automargin = TRUE
    ),
    hovermode = "closest",
    hoverlabel = list(
      font = list(family = font_family, size = 14, color = "#313131"),
      bgcolor = "#FFF", namelength = -1
    )
  ) %>%
  config(
    responsive = FALSE,
    scrollZoom = TRUE, doubleClick = FALSE,
    modeBarButtonsToRemove = list(
      "zoom2d","pan2d","select2d","lasso2d","zoomIn2d","zoomOut2d",
      "autoScale2d","resetScale2d","toggleSpikelines","toImage"
    ),
    displaylogo = FALSE, displayModeBar = TRUE, showTips = FALSE
  )

css_top9 <- tags$style(HTML(sprintf("
  #h2h-top9-wrap {
    border-radius: 6px;
    overflow: hidden;
    overflow-x: auto;
    -webkit-overflow-scrolling: touch;
    background: %s;
    padding: 0.75rem;
    width: 100%%; max-width: %dpx;
    margin: 0 auto 1rem;
  }
  #h2h-top9-wrap > div { min-width: %dpx; }
  @media (max-width: %dpx) { #h2h-top9-wrap { padding: 0.5rem; } }
", bg_col, .wrapper_max_w, .fixed_fig_width, .wrapper_max_w)))

browsable(tagList(css_top9, tags$div(id = "h2h-top9-wrap", fig_top9)))
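The grid code above plots raw win percentage; the prior-informed smoothing described earlier is not part of it. One form such smoothing could take is shrinkage in the style of a Beta-binomial model, which adds a few pseudo-games at a prior win rate before dividing. In the sketch below the prior mean of 0.5, the weight of two pseudo-games, and the function name `smooth_win_pct` are all assumptions for illustration, not the settings used in the analysis.

```r
# Shrink a raw head-to-head record toward a prior win rate.
# prior_mean and prior_games are assumed values, not taken from the analysis.
smooth_win_pct <- function(w, l, prior_mean = 0.5, prior_games = 2) {
  (w + prior_mean * prior_games) / (w + l + prior_games)
}

# Hypothetical records: an early 1-0, a mid-season 2-1, and a completed 4-0
w <- c(1, 2, 4)
l <- c(0, 1, 0)

smoothed_tbl <- data.frame(
  record   = paste(w, l, sep = "-"),
  raw      = w / (w + l),
  smoothed = smooth_win_pct(w, l)
)
print(smoothed_tbl)
```

Under these assumed settings a 1-0 start shows as about 67 percent instead of 100 percent, and the pull toward the prior weakens as games accumulate. A pair that has not yet met falls back to the prior mean.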

A Compact Predictive Step

I fit a least absolute shrinkage and selection operator (LASSO) with one row per team to relate OffRtg to eFG%, TOV%, FTR, and ORB/100, with controls for pace, three-point attempt share (the shot-mix control), three-point percentage, and assist rate. Predictors are standardized before fitting so the penalty treats them on a common scale. Cross-validation (CV) selects the penalty by splitting teams into folds, training on most folds, predicting the held-out fold, and rotating until every fold is tested once. For this dataset CV chose λ_min = 0.03371709, a moderate level of shrinkage that keeps only the predictors with consistent signal while damping the rest. Results reported here use that λ; if an even sparser model with similar error is preferred, the 1 SE choice (λ_1se) is a reasonable alternative.

Code

mf <- feat |>
  transmute(
    team, offrtg,
    efg, tov_rate, ftr, orb100,
    x3p_pct, x3p_rate, ast_rate, pace
  ) |>
  drop_na()

X <- scale(as.matrix(select(mf, -team, -offrtg)))
y <- mf$offrtg

set.seed(2425)
cvfit <- cv.glmnet(X, y, alpha = 1, nfolds = min(10, nrow(X)), standardize = FALSE, family = "gaussian")
best_lambda <- cvfit$lambda.min
best_lambda

[1] 0.03371709
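To see which predictors survive at each penalty, the coefficients can be read off the cross-validated fit with `coef()`. The sketch below shows the pattern on synthetic data so it runs on its own: the matrix `X_toy`, the response `y_toy`, and the effect sizes used to generate them are random stand-ins, not the team data or the fitted results from this analysis.

```r
library(glmnet)

# Synthetic stand-in for the 30-row team matrix (random numbers, not NBA data)
set.seed(1)
n <- 30; p <- 4
X_toy <- scale(matrix(rnorm(n * p), n, p,
                      dimnames = list(NULL, c("efg", "tov_rate", "ftr", "orb100"))))
y_toy <- 113 + 3 * X_toy[, "efg"] - 1.5 * X_toy[, "tov_rate"] + rnorm(n, sd = 1)

# Same call pattern as above: LASSO (alpha = 1) on pre-scaled predictors
cv_toy <- cv.glmnet(X_toy, y_toy, alpha = 1, nfolds = 10,
                    standardize = FALSE, family = "gaussian")

# Coefficients at the error-minimizing penalty and at the sparser 1-SE penalty
coef_min <- as.matrix(coef(cv_toy, s = "lambda.min"))
coef_1se <- as.matrix(coef(cv_toy, s = "lambda.1se"))

coef_tbl <- cbind(lambda_min = coef_min[, 1], lambda_1se = coef_1se[, 1])
print(round(coef_tbl, 3))
```

On the real fit the same two `coef()` calls on `cvfit` return the intercept and one coefficient per predictor, with exact zeros for predictors the penalty removed.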

Wrap-Up and Next Steps

ORB/100 is a clean rate, but it is not the same as true offensive rebounding percentage, which measures offensive rebounds against the rebounds actually available by bringing in the opponent’s defensive rebounds. OffRtg here is unadjusted for opponent strength, venue, or rest, and the head-to-head smoothing uses a simple prior that does not account for home and away splits or lineup availability. The LASSO operates on 30 team rows, which makes coefficients directional rather than precise. Results are stable enough for interpretation, but they should be read as patterns rather than exact margins.
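For reference, the true offensive rebounding rate is commonly defined at the team level as shown here, where Opp DRB is the opponent's defensive rebound total, a column that is not among the team totals used in this analysis:

\[ \text{ORB\%} = \frac{\text{ORB}}{\text{ORB} + \text{Opp DRB}} \]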

Two extensions are natural. First, fold in defense to move from OffRtg to net rating (NetRtg) so the analysis reflects both ends of the floor. Second, replace ORB/100 with a true rate that accounts for opponent defensive rebounds and integrate opponent quality, venue, rest, and injury context for both the model and the prior used in the matchup grid. An elastic net that blends LASSO with ridge is a sensible next model to stabilize correlated shot-mix variables. None of these changes alters the central message. eFG% sets the level, TOV% separates similar shooters, and FTR with ORB/100 keeps the floor high when variance arrives.