Sentiment in Donald Trump’s Public Statements and FX Market Dynamics

An empirical examination of communication and exchange rate behavior

predictive analytics
financial markets
political communication
sentiment analysis
classification
regularization
Python
Author

A. Srikanth

Published

December 19, 2025

Project Spotlight

Context

This project started with a simple (slightly chaotic) question: when Donald Trump fires off an emotionally loaded post about another country, does the U.S. dollar actually react?

Instead of treating this like a vague “markets respond to news” claim, we framed it as a measurable pipeline problem: take unstructured text (Tweets/Truth Social posts), convert it into a sentiment signal, link each post to a specific country (and therefore a currency), and then test whether FX prices and volatility move in the window right after those posts land. The focus wasn’t to build a magic trading bot. It was to build something reproducible and honest that connects behavioral signals to market data, then see what survives contact with reality.

Objectives

The work in this project had four main parts. The first was to engineer a reliable Tweet/Truth pipeline that could be re-run without manual intervention, including scraping, parsing, sentiment scoring, and country detection. The second was to engineer an exchange rate pipeline that pulls a consistent daily time series for a small set of currency pairs, stored in a modular way so joins and validation are simple.

The third part was exploratory and diagnostic: quantify how many posts exist in the sample, how many contain direct foreign country mentions, and how sentiment behaves overall and by country. The last part was the most practical stress test. If extreme sentiment really creates tradable uncertainty, then a direction-neutral long-volatility structure like a straddle should at least have a chance. I implemented a threshold-based long straddle backtest to see whether realized moves after extreme posts were large enough to overcome the option premium.

Data Sources

The text data came from a one-year archive of Donald Trump’s social media posts hosted on rollcall.com, covering a trailing 12-month window from late 2024 to late 2025. Raw pages were saved locally as HTML to support offline parsing, avoid partial downloads, and keep the pipeline reproducible. Each post was then enriched with a VADER sentiment score and metadata such as date and source, plus extracted country mentions and mapped currencies.

The market data came from Yahoo Finance via yfinance, using daily closing exchange rates for four USD-referenced currency pairs: GBP/USD, INR/USD, JPY/USD, and CAD/USD. Each currency pair was stored in its own CSV so the time series could be updated independently and anomalies could be traced without breaking the full dataset.
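For illustration, the per-pair fetch can be as small as the sketch below; the ticker format and column handling are assumptions that can vary across yfinance versions, and the full pipeline in the Code section adds caching and incremental updates on top of this.

from pathlib import Path

import yfinance as yf

out_dir = Path("data/exchange_rates")
out_dir.mkdir(parents=True, exist_ok=True)

for code in ["GBP", "INR", "JPY", "CAD"]:
    # Yahoo quotes FX pairs with an "=X" suffix, e.g. GBPUSD=X
    hist = yf.download(f"{code}USD=X", period="1y", progress=False)
    hist[["Close"]].to_csv(out_dir / f"{code}_USD_rates.csv")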

Analysis

The first takeaway from the exploratory analysis was scale. Over the one-year window, the dataset contained 6,896 posts, and posting frequency stayed high throughout the period, often above twenty posts per day during the election season. For the FX question, that firehose is mostly noise, because most posts do not reference a foreign country directly. After filtering to posts with direct country mentions (including common abbreviations), the dataset shrank from nearly 7,000 posts to roughly 400 FX-relevant posts.

Those country-mention posts were mostly focused on a single country at a time, which matters because it reduces ambiguity when aligning a post to a single currency. There were multi-country posts too, usually in comparative statements, and those tended to show slightly stronger sentiment polarization because they emphasize contrast, praising one partner while criticizing another.

Sentiment itself was more balanced than the stereotype suggests. Using VADER compound polarity, approximately half of the posts scored as positive. When sentiment was segmented by country, Japan showed the highest average positive sentiment, Canada and India were more mixed, and the United Kingdom sat near the middle. That distribution shaped the working hypothesis for the rest of the analysis: the more useful signal is not direction but intensity. Extreme sentiment can coincide with short-term volatility in USD pairs, but it is unlikely to deliver durable or directional predictability in price movement.

Methodology

The Truth/Tweet pipeline was built to be transparent and repeatable using lightweight tools. Collection was automated with a Fish shell script (fetch_and_save_trump_tweets.fish) to handle repeated runs needed under rate limits. Sentiment analysis was performed in Python using NLTK and VADER because it is designed for social media text and remains interpretable. Country identification was implemented with a regex scan over post text, backed by pycountry to standardize names and reduce false matches. Those country hits were then mapped to currencies using a programmatic dictionary built from pycountry plus forex-python, producing a consistent country to currency linkage for downstream joins.
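For example, the name-standardization step can lean on pycountry's lookup, which accepts common names and ISO codes and raises LookupError on a miss. A minimal sketch (the helper name is illustrative, not the pipeline's actual function):

from typing import Optional

import pycountry

def canonical_country(candidate: str) -> Optional[str]:
    # Normalize a regex hit ("GBR", "JP", "Japan") to pycountry's canonical name
    try:
        return pycountry.countries.lookup(candidate).name
    except LookupError:
        return None

print(canonical_country("GBR"))  # United Kingdom
print(canonical_country("JP"))   # Japan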

The exchange rate pipeline pulled daily closes with yfinance for GBP/USD, INR/USD, JPY/USD, and CAD/USD, and wrote each series to its own CSV. This kept the workflow modular and made the merge step straightforward, since correlations and event-window checks can be run per currency without mixing time series.

To test whether extreme sentiment can monetize volatility, a long straddle backtest was implemented. Since proprietary FX options chains were not available, synthetic weekly options were generated with a fixed premium of 0.001 per leg, implying a 0.2% total premium for the straddle. A position was opened at the close on any day where abs(sentiment) >= 0.75 and closed at the next close, a one-day holding period. Across the four currency pairs, this produced 68 straddle trades. An early version of the backtest appeared mildly positive, but after errors in the payoff logic were corrected, the validated implementation showed an overall average return of -34.18%.
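The payoff logic is simple enough to sketch. Below is a minimal reconstruction under the stated assumptions (fixed 0.001 premium per leg, entry at the signal-day close, exit at the next close); the function name and the convention of denominating returns against the premium paid are illustrative, not the project's exact implementation.

import pandas as pd

PREMIUM_PER_LEG = 0.001  # synthetic fixed premium per leg (0.2% total for the straddle)

def straddle_returns(rates: pd.Series, sentiment: pd.Series,
                     threshold: float = 0.75) -> pd.Series:
    # rates: daily closes; sentiment: daily compound scores; both share a date index
    df = pd.DataFrame({"rate": rates, "sent": sentiment}).dropna()
    entry = df["rate"]                      # open the straddle at this close
    exit_ = df["rate"].shift(-1)            # unwind at the next close
    payoff = (exit_ - entry).abs() / entry  # ATM straddle pays the absolute % move
    cost = 2 * PREMIUM_PER_LEG
    ret = payoff / cost - 1.0               # -1.0 means the premium is a total loss
    return ret[df["sent"].abs() >= threshold].dropna()

Under this convention, an average near -34% says that the typical next-day move after an extreme post recovered only about two-thirds of the 0.2% premium.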

Results & Next Steps

The headline finding is that Trump-style social media sentiment shows up more as a short-term volatility signal than a directional one. Around highly polarized posts, USD pairs can get briefly noisier, but the effect is small and usually fades within a day.

That’s why the trading idea struggled. We tried a simple “bet on movement” trade (a long straddle): it makes money only if the currency moves a lot right after a very emotional post. In practice, the market usually didn’t move enough to cover the cost of placing that bet, so the strategy lost money on average.

For future work, I would prioritize two experiments. First, run a formal event study on a multi-year sample with explicit timing conventions and controls for overlapping macro and geopolitical events, reporting abnormal returns and abnormal realized volatility over windows such as [0,1], [0,3], and [0,5] days, with statistical tests. Second, test trading ideas under more realistic assumptions: real option pricing rather than a fixed cost, clearer timing rules, and smarter triggers that combine sentiment with whether the market is already in a volatile regime and whether a post is unusually attention-grabbing.
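To make the first experiment concrete, here is a minimal sketch of the abnormal-return and abnormal-volatility calculation for one currency, assuming a simple mean-adjusted benchmark over a pre-event estimation window; the window lengths and helper name are illustrative.

import pandas as pd

def event_window_stats(returns: pd.Series, event_dates: pd.DatetimeIndex,
                       horizon: int = 3, est_len: int = 60) -> pd.DataFrame:
    # returns: daily % returns indexed by date; events without enough history are skipped
    rows = []
    idx = returns.index
    for d in event_dates:
        pos = idx.searchsorted(d)
        if pos < est_len or pos + horizon >= len(idx):
            continue
        est = returns.iloc[pos - est_len:pos]       # pre-event benchmark window
        evt = returns.iloc[pos:pos + horizon + 1]   # event window [0, horizon]
        rows.append({
            "event": d,
            "car": (evt - est.mean()).sum(),        # cumulative abnormal return
            "abn_vol": evt.std(ddof=1) - est.std(ddof=1),  # abnormal realized volatility
        })
    return pd.DataFrame(rows)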

Code
import re
import warnings
from pathlib import Path
from datetime import datetime, timedelta
from typing import Optional, List, Dict, Tuple

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import interp1d

warnings.filterwarnings("ignore")


# -----------------------------------------------------------------------------
# Styling
# -----------------------------------------------------------------------------
FIG_W = 9
FIG_H = 5
FIG_DPI = 110

color_blue = "#033c73"
color_indigo = "#6610f2"
color_purple = "#6f42c1"
color_red = "#751F2C"

plt.rcParams.update(
    {
        "figure.figsize": (FIG_W, FIG_H),
        "figure.dpi": FIG_DPI,
        "font.family": "Ramabhadra",
        "font.weight": "bold",
        "text.color": "black",
        "axes.labelcolor": "black",
        "axes.titlecolor": "black",
        "xtick.color": "black",
        "ytick.color": "black",
        "axes.titlesize": 14,
        "axes.titleweight": "bold",
        "axes.labelsize": 12,
        "axes.labelweight": "bold",
        "xtick.labelsize": 10,
        "ytick.labelsize": 10,
        "legend.fontsize": 10,
        "axes.grid": True,
        "grid.alpha": 0.25,
    }
)

pd.set_option("display.width", 110)
pd.set_option("display.max_columns", 20)
pd.set_option("display.float_format", lambda x: f"{x:0.3f}")


# -----------------------------------------------------------------------------
# Paths
# -----------------------------------------------------------------------------
ROOT = Path(".")
DATA_DIR = ROOT / "data"
tweets_and_truths_IN = DATA_DIR / "tweets_and_truths.csv"
EXCHANGE_DIR = DATA_DIR / "exchange_rates"


# -----------------------------------------------------------------------------
# Utilities
# -----------------------------------------------------------------------------
def section(title: str) -> None:
    bar = "-" * len(title)
    print(f"\n{title}\n{bar}")


def ensure_dirs() -> None:
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    EXCHANGE_DIR.mkdir(parents=True, exist_ok=True)

# -----------------------------------------------------------------------------
# Console + display helpers
# -----------------------------------------------------------------------------
pd.set_option("display.width", 110)
pd.set_option("display.max_columns", 4)   # <-- max 3 columns everywhere
pd.set_option("display.max_colwidth", 60)
pd.set_option("display.float_format", lambda x: f"{x:0.3f}")

def print_tbl(df: pd.DataFrame, title: str = "", n: int = 6) -> None:
    """Compact, tibble-like print: at most 4 columns, first n rows."""
    if title:
        section(title)
    if df is None or len(df) == 0:
        print("(empty)")
        return
    view = df.iloc[:n, : min(4, df.shape[1])]
    print(view.to_string(index=False))

# -----------------------------------------------------------------------------
# Country + currency extraction (compiled regex for speed)
# -----------------------------------------------------------------------------
_COUNTRY_PATTERNS: List[Tuple[str, str]] = [
    (r"\b(?:United States|USA|US|America|American)\b", "United States"),
    (r"\bCanada\b|\bCanadian\b", "Canada"),
    (r"\bChina\b|\bChinese\b", "China"),
    (r"\bJapan\b|\bJapanese\b", "Japan"),
    (r"\b(?:United Kingdom|UK|Britain|British|England|English)\b", "United Kingdom"),
    (r"\bGermany\b|\bGerman\b", "Germany"),
    (r"\bFrance\b|\bFrench\b", "France"),
    (r"\bItaly\b|\bItalian\b", "Italy"),
    (r"\bSpain\b|\bSpanish\b", "Spain"),
    (r"\bNetherlands\b|\bDutch\b", "Netherlands"),
    (r"\bIndia\b|\bIndian\b", "India"),
    (r"\bBrazil\b|\bBrazilian\b", "Brazil"),
    (r"\bMexico\b|\bMexican\b", "Mexico"),
    (r"\bRussia\b|\bRussian\b", "Russia"),
    (r"\bAustralia\b|\bAustralian\b", "Australia"),
    (r"\bSouth Korea\b|\bKorean\b", "South Korea"),
    (r"\bSaudi Arabia\b|\bSaudi\b", "Saudi Arabia"),
    (r"\bTurkey\b|\bTurkish\b", "Turkey"),
    (r"\bSwitzerland\b|\bSwiss\b", "Switzerland"),
    (r"\bSweden\b|\bSwedish\b", "Sweden"),
    (r"\bNorway\b|\bNorwegian\b", "Norway"),
    (r"\bDenmark\b|\bDanish\b", "Denmark"),
    (r"\bArgentina\b|\bArgentine\b", "Argentina"),
    (r"\bSouth Africa\b", "South Africa"),
    (r"\bSingapore\b", "Singapore"),
    (r"\bHong Kong\b", "Hong Kong"),
]
_COUNTRY_REGEX: List[Tuple[re.Pattern, str]] = [
    (re.compile(pat, flags=re.IGNORECASE), cname) for pat, cname in _COUNTRY_PATTERNS
]

_COUNTRY_TO_CCY: Dict[str, str] = {
    "United States": "USD",
    "Canada": "CAD",
    "United Kingdom": "GBP",
    "Japan": "JPY",
    "Germany": "EUR",
    "France": "EUR",
    "Italy": "EUR",
    "Spain": "EUR",
    "Netherlands": "EUR",
    "Switzerland": "CHF",
    "Sweden": "SEK",
    "Norway": "NOK",
    "Denmark": "DKK",
    "China": "CNY",
    "India": "INR",
    "Brazil": "BRL",
    "Mexico": "MXN",
    "Russia": "RUB",
    "Australia": "AUD",
    "South Korea": "KRW",
    "Saudi Arabia": "SAR",
    "Turkey": "TRY",
    "Argentina": "ARS",
    "South Africa": "ZAR",
    "Singapore": "SGD",
    "Hong Kong": "HKD",
}


def find_countries(text: str) -> List[str]:
    if not isinstance(text, str) or not text.strip():
        return []
    found = set()
    for rgx, cname in _COUNTRY_REGEX:
        if rgx.search(text):
            found.add(cname)
    return sorted(found)


def countries_to_currencies(countries: List[str]) -> List[str]:
    out: List[str] = []
    seen = set()
    for c in countries:
        code = _COUNTRY_TO_CCY.get(c)
        if code and code not in seen:
            out.append(code)
            seen.add(code)
    return out


# -----------------------------------------------------------------------------
# Sentiment: VADER if available, fallback if not
# -----------------------------------------------------------------------------
# Sentiment → color helpers: red ↔ blue gradient with a blended (non-white) midpoint

def _hex_to_rgb01(h: str) -> Tuple[float, float, float]:
    h = h.lstrip("#")
    return (int(h[0:2], 16) / 255.0, int(h[2:4], 16) / 255.0, int(h[4:6], 16) / 255.0)

def _lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t

def _mix(c1, c2, t: float):
    return (_lerp(c1[0], c2[0], t), _lerp(c1[1], c2[1], t), _lerp(c1[2], c2[2], t))

def _sentiment_to_rgb(s: float) -> Tuple[float, float, float]:
    neg = _hex_to_rgb01(color_red)
    pos = _hex_to_rgb01(color_blue)

    # blended midpoint, then darken slightly so it doesn't look white on a white background
    mid = _mix(neg, pos, 0.5)
    mid = _mix(mid, (0.0, 0.0, 0.0), 0.18)  # darken 18%

    s = float(np.clip(s, -1.0, 1.0))

    if s < -0.05:
        t = (s - (-1.0)) / ((-0.05) - (-1.0))
        return _mix(neg, mid, t)

    if s > 0.05:
        t = (s - 0.05) / (1.0 - 0.05)
        return _mix(mid, pos, t)

    return mid

def categorize_sentiment(compound: float) -> str:
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def _try_make_vader():
    try:
        import nltk  # type: ignore
        from nltk.sentiment import SentimentIntensityAnalyzer  # type: ignore
        try:
            nltk.data.find("vader_lexicon")
        except LookupError:
            nltk.download("vader_lexicon")
        return SentimentIntensityAnalyzer()
    except Exception:
        return None


_VADER = _try_make_vader()

_POS_WORDS = {
    "great","good","amazing","excellent","winning","win","success","strong","best","love","like",
    "fantastic","incredible","positive","progress","boom","booming","proud","tremendous","beautiful"
}
_NEG_WORDS = {
    "bad","terrible","awful","worst","weak","failure","fail","loser","sad","disaster","fake","fraud",
    "corrupt","crime","criminal","hate","angry","negative","problem","crisis","scam","crooked"
}
_TOKEN_RE = re.compile(r"[a-zA-Z']+")


def _simple_compound(text: str) -> float:
    toks = _TOKEN_RE.findall((text or "").lower())
    if not toks:
        return 0.0
    pos = sum(t in _POS_WORDS for t in toks)
    neg = sum(t in _NEG_WORDS for t in toks)
    raw = (pos - neg) / max(1, pos + neg)
    return float(np.clip(raw, -1.0, 1.0))


def add_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    section("Sentiment")
    texts = df["text"].astype(str)

    if _VADER is not None:
        print("✓ Using VADER (NLTK)")
        scores = texts.apply(lambda t: _VADER.polarity_scores(t))
        df["sentiment_compound"] = scores.apply(lambda s: float(s["compound"]))
        df["sentiment_positive"] = scores.apply(lambda s: float(s["pos"]))
        df["sentiment_negative"] = scores.apply(lambda s: float(s["neg"]))
        df["sentiment_neutral"] = scores.apply(lambda s: float(s["neu"]))
    else:
        # VADER unavailable: fall back to the simple lexicon scorer (kept silent)
        df["sentiment_compound"] = texts.apply(_simple_compound)
        df["sentiment_positive"] = df["sentiment_compound"].clip(lower=0.0)
        df["sentiment_negative"] = (-df["sentiment_compound"]).clip(lower=0.0)
        df["sentiment_neutral"] = 1.0 - (df["sentiment_positive"] + df["sentiment_negative"]).clip(0.0, 1.0)

    df["sentiment_label"] = df["sentiment_compound"].apply(categorize_sentiment)

    print(df["sentiment_label"].value_counts())
    print(f"Average compound: {df['sentiment_compound'].mean():.3f}")
    print(f"Range: {df['sentiment_compound'].min():.3f} to {df['sentiment_compound'].max():.3f}")
    return df


def build_processed_tweets_and_truths() -> pd.DataFrame:
    if not tweets_and_truths_IN.exists():
        raise FileNotFoundError(
            f"Missing input tweets_and_truths file: {tweets_and_truths_IN.as_posix()}\n"
            "Create data/tweets_and_truths.csv with columns: date, text (optional: id, image_url)."
        )

    section("Load tweets_and_truths")
    df = pd.read_csv(tweets_and_truths_IN, on_bad_lines="skip")
    if "date" not in df.columns or "text" not in df.columns:
        raise ValueError("tweets_and_truths.csv must contain columns: date, text")

    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date", "text"]).copy()

    if "id" not in df.columns:
        df = df.reset_index().rename(columns={"index": "tweet_id"})

    section("Country + currency extraction")
    df["countries_found"] = df["text"].astype(str).apply(find_countries)
    df["countries_mentioned"] = df["countries_found"].apply(lambda x: ", ".join(x) if x else "")
    df["currencies_found"] = df["countries_found"].apply(countries_to_currencies)
    df["currency_codes"] = df["currencies_found"].apply(lambda x: ", ".join(x) if x else "")

    df = add_sentiment(df)
    return df


# -----------------------------------------------------------------------------
# Exchange rates: fetch if yfinance available, else offline
# -----------------------------------------------------------------------------
def _try_import_yfinance():
    try:
        import yfinance as yf  # type: ignore
        return yf
    except Exception:
        return None


_YF = _try_import_yfinance()


def yahoo_symbol(currency_code: str) -> Optional[str]:
    c = currency_code.upper()
    if c == "USD":
        return None
    return f"{c}USD=X"


def read_last_date(filepath: Path) -> Optional[pd.Timestamp]:
    try:
        d = pd.read_csv(filepath)
        if d.empty:
            return None
        d["Date"] = pd.to_datetime(d["Date"], errors="coerce")
        d = d.dropna(subset=["Date"]).sort_values("Date")
        return None if d.empty else d["Date"].iloc[-1]
    except Exception:
        return None


def fetch_rates(currency_code: str, start: datetime, end: datetime) -> Optional[pd.DataFrame]:
    if _YF is None:
        return None
    sym = yahoo_symbol(currency_code)
    if sym is None:
        return None
    hist = _YF.download(sym, start=start, end=end, progress=False)
    if hist is None or hist.empty:
        return None
    out = pd.DataFrame({"Date": hist.index, "Rate": hist["Close"].values})
    out["Date"] = pd.to_datetime(out["Date"], errors="coerce")
    out = out.dropna(subset=["Date", "Rate"]).sort_values("Date")
    return None if out.empty else out.reset_index(drop=True)


def ensure_exchange_rate_file(currency_code: str, target_end=None) -> Optional[Path]:
    currency_code = currency_code.upper()
    fp = EXCHANGE_DIR / f"{currency_code}_USD_rates.csv"

    if target_end is None:
        target_end = datetime.now().date()

    last = read_last_date(fp)
    start = datetime(2016, 1, 1) if last is None else (pd.to_datetime(last) + pd.Timedelta(days=1)).to_pydatetime()
    end = datetime.combine(target_end, datetime.min.time()) + timedelta(days=1)

    if last is not None and pd.to_datetime(last).date() >= target_end:
        return fp

    new = fetch_rates(currency_code, start=start, end=end)
    if new is None or new.empty:
        return fp if fp.exists() else None

    if fp.exists():
        old = pd.read_csv(fp)
        old["Date"] = pd.to_datetime(old["Date"], errors="coerce")
        old = old.dropna(subset=["Date"]).sort_values("Date")
        merged = pd.concat([old[["Date", "Rate"]], new], ignore_index=True)
        merged = merged.drop_duplicates(subset=["Date"]).sort_values("Date").reset_index(drop=True)
    else:
        merged = new

    merged.to_csv(fp, index=False)  # cache only; no plots saved
    return fp


def ensure_exchange_rates_for_targets(targets: List[str]) -> None:
    section("Exchange rates")
    missing: List[str] = []

    for ccy in targets:
        fp = EXCHANGE_DIR / f"{ccy.upper()}_USD_rates.csv"
        if fp.exists():
            print(f"✓ {ccy}: {fp.as_posix()}")
            continue

        if _YF is not None:
            out = ensure_exchange_rate_file(ccy)
            if out is not None and out.exists():
                print(f"✓ {ccy}: fetched → {out.as_posix()}")
                continue

        print(f"✗ {ccy}: missing {fp.as_posix()}")
        missing.append(ccy.upper())

    if missing:
        print("\nOffline mode: add these files to see full plots:")
        for ccy in missing:
            print(f"  - data/exchange_rates/{ccy}_USD_rates.csv  (Date, Rate)")


# -----------------------------------------------------------------------------
# Correlation plots (DISPLAY ONLY)
# -----------------------------------------------------------------------------
def load_exchange_rate_data(currency_code: str) -> Optional[pd.DataFrame]:
    fp = EXCHANGE_DIR / f"{currency_code.upper()}_USD_rates.csv"
    if not fp.exists():
        return None

    d = pd.read_csv(fp)
    if "Date" not in d.columns or "Rate" not in d.columns:
        return None
    d["Date"] = pd.to_datetime(d["Date"], errors="coerce")
    d = d.dropna(subset=["Date"]).sort_values("Date").copy()
    if d.empty:
        return None
    d["Daily_Change_Pct"] = d["Rate"].pct_change() * 100
    return d


def create_currency_plot(df: pd.DataFrame, currency_code: str, ax) -> None:
    ex = load_exchange_rate_data(currency_code)
    if ex is None:
        ax.text(0.5, 0.5, f"No data available for {currency_code}", ha="center", va="center", transform=ax.transAxes)
        return

    # posts that mention this currency
    if "currencies_found" in df.columns:
        tw = df[df["currencies_found"].apply(lambda x: isinstance(x, (list, tuple)) and (currency_code in x))].copy()
    else:
        tw = df[df.get("currency_codes", "").astype(str).str.contains(currency_code, na=False)].copy()

    if tw.empty:
        ax.text(0.5, 0.5, f"No tweets_and_truths found mentioning {currency_code}", ha="center", va="center", transform=ax.transAxes)
        return

    valid_dates = ex["Date"]
    changes = ex["Daily_Change_Pct"].fillna(0)
    smoothed = gaussian_filter1d(changes, sigma=2)

    ax.plot(
        valid_dates,
        smoothed,
        color=color_indigo,
        linewidth=2.0,
        alpha=0.85,
        label=f"{currency_code}-USD DAILY CHANGE % (SMOOTHED)",
    )

    # interpolate the smoothed curve so each post can be pinned to it by date
    curve_dates = pd.Series(valid_dates).reset_index(drop=True)
    curve_x = np.array([(d - curve_dates.iloc[0]).days for d in curve_dates], dtype=float)
    curve_y = np.array(smoothed, dtype=float)
    curve_interp = interp1d(curve_x, curve_y, kind="linear", fill_value="extrapolate")

    change_lookup = {row["Date"].strftime("%Y-%m-%d"): row["Date"] for _, row in ex.iterrows()}

    tw["date"] = pd.to_datetime(tw["date"], errors="coerce")
    tw = tw.dropna(subset=["date"]).copy()

    xdates, yvals, colors, meta = [], [], [], []
    for idx, row in tw.iterrows():
        ds = row["date"].strftime("%Y-%m-%d")
        if ds not in change_lookup:
            continue
        days_from_start = (row["date"] - curve_dates.iloc[0]).days
        y_on_curve = float(curve_interp(days_from_start))

        s = float(row.get("sentiment_compound", 0.0))
        xdates.append(change_lookup[ds])
        yvals.append(y_on_curve)
        colors.append(_sentiment_to_rgb(s))
        meta.append((row.get("id", row.get("tweet_id", idx)), s, change_lookup[ds], y_on_curve))

    if not xdates:
        ax.text(0.5, 0.5, f"No aligned dates for {currency_code}", ha="center", va="center", transform=ax.transAxes)
        return

    ax.scatter(
        xdates,
        yvals,
        s=60,
        alpha=0.95,
        facecolors=colors,
        edgecolors=colors,
        linewidth=1.2,
        zorder=5,
        label=f"{currency_code} MENTIONED ({len(xdates)})",
    )

    # highlight the most positive / most negative aligned posts and report them
    pos = max(meta, key=lambda t: t[1])
    neg = min(meta, key=lambda t: t[1])
    extreme_info = ""

    if pos[1] > 0.05:
        ax.scatter([pos[2]], [pos[3]], s=90, facecolors="none", edgecolors=color_indigo, linewidth=4, zorder=10)
        extreme_info += f"POS(SCORE:{pos[1]:.2f}): {pos[0]}  "
    if neg[1] < -0.05:
        ax.scatter([neg[2]], [neg[3]], s=90, facecolors="none", edgecolors=color_red, linewidth=4, zorder=10)
        extreme_info += f"NEG(SCORE:{neg[1]:.2f}): {neg[0]}"
    if extreme_info:
        print(f"{currency_code} extremes: {extreme_info.strip()}")

    ax.axhline(0, color=color_purple, linestyle="--", linewidth=1.5, alpha=0.8)
    ax.set_xlabel("DATE")
    ax.set_ylabel("DAILY CHANGE %")
    ax.set_title(f"{currency_code} EXCHANGE RATE VS TWEET/TRUTH SENTIMENT")
    ax.legend(loc="upper right", fontsize=9)


def display_correlation_graphs(df: pd.DataFrame) -> None:
    currencies = ["CAD", "INR", "JPY", "GBP"]

    # individual plots
    for code in currencies:
        fig, ax = plt.subplots(1, 1, figsize=(FIG_W, FIG_H))
        create_currency_plot(df, code, ax)
        fig.tight_layout()
        plt.show()

    # legend plot
    fig, ax = plt.subplots(1, 1, figsize=(10, 2))
    ax.axis("off")
    ax_legend = fig.add_axes([0.05, 0.15, 0.9, 0.7])
    ax_legend.set_xlim(0, 1)
    ax_legend.set_ylim(0, 1)

    # Legend gradient: red → blended midpoint → blue, matching _sentiment_to_rgb
    neg = _hex_to_rgb01(color_red)
    pos = _hex_to_rgb01(color_blue)
    mid = _mix(neg, pos, 0.5)
    
    for i in range(100):
        x = i / 100
        if x < 0.5:
            t = x / 0.5
            col = _mix(neg, mid, t)
        else:
            t = (x - 0.5) / 0.5
            col = _mix(mid, pos, t)
        ax_legend.axvspan(x, x + 0.01, color=col, alpha=0.95)
    
    ax_legend.text(0.1, 0.5, "NEGATIVE", ha="center", va="center",
              fontweight="bold", fontsize=14, color="#ffffff")
    ax_legend.text(0.5, 0.5, "NEUTRAL",  ha="center", va="center",
                  fontweight="bold", fontsize=14, color="#ffffff")
    ax_legend.text(0.9, 0.5, "POSITIVE", ha="center", va="center",
                  fontweight="bold", fontsize=14, color="#ffffff")
                  
    ax_legend.set_yticks([])

    # sentiment ticks: -100%, -50%, 0%, 50%, 100%
    ticks = [0.0, 0.25, 0.5, 0.75, 1.0]
    ax_legend.set_xticks(ticks)
    ax_legend.set_xticklabels(["-100%", "-50%", "0%", "50%", "100%"])

    # match tick sizing to the other plots
    ax_legend.tick_params(axis="x", labelsize=11)
    ax_legend.tick_params(axis="y", labelsize=11)

    ax_legend.grid(True, axis="x", alpha=0.3)

    fig.tight_layout()
    plt.show()


# -----------------------------------------------------------------------------
# Runner (display-only)
# -----------------------------------------------------------------------------
def run_pipeline_display_only() -> pd.DataFrame:
    ensure_dirs()

    df = build_processed_tweets_and_truths()

    targets = ["CAD", "INR", "JPY", "GBP"]
    ensure_exchange_rates_for_targets(targets)

    display_correlation_graphs(df)

    section("Done")
    return df


if __name__ == "__main__":
    run_pipeline_display_only()

Load tweets_and_truths
----------------------

Country + currency extraction
-----------------------------

Sentiment
---------
sentiment_label
neutral     4769
positive    1703
negative     424
Name: count, dtype: int64
Average compound: 0.174
Range: -1.000 to 1.000

Exchange rates
--------------
✓ CAD: data/exchange_rates/CAD_USD_rates.csv
✓ INR: data/exchange_rates/INR_USD_rates.csv
✓ JPY: data/exchange_rates/JPY_USD_rates.csv
✓ GBP: data/exchange_rates/GBP_USD_rates.csv

Done
----
      page_number  browse_flag  ... sentiment_neutral  sentiment_label
0               1         True  ...             0.000         positive
1               1         True  ...             0.000         negative
2               1         True  ...             1.000          neutral
3               1         True  ...             1.000          neutral
4               1         True  ...             0.000         negative
...           ...          ...  ...               ...              ...
6891          139         True  ...             1.000          neutral
6892          139         True  ...             1.000          neutral
6893          139         True  ...             1.000          neutral
6894          139         True  ...             1.000          neutral
6895          139         True  ...             1.000          neutral

[6896 rows x 34 columns]