Struggling with long-term accuracy? Learn how the iTransformer architecture treats variables as tokens to improve scaling and multivariate correlations.

The iTransformer changes the modeling axis: instead of making a token out of each timestamp, it treats the entire history of an individual variable as its own token. This architectural inversion is designed to let Transformers benefit from long lookback windows and capture complex multivariate correlations, two areas where timestamp-token Transformers often see performance degrade.
A traditional Transformer treats each individual timestamp as a token, which merges the values of all variables at that moment into a single embedding and can blur variables that behave very differently. In contrast, the iTransformer "flips" this architecture by treating the entire historical sequence of a single variable as a token. This "variate-token attention" lets the attention mechanism find correlations between different variables, while the Feed-Forward Network (FFN) is applied to each variable token separately to learn the temporal patterns and dynamics within that variable's history.
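The inversion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the iTransformer implementation: the function names and the toy data are hypothetical, and the raw history of each variable is used directly as query and key where a real model would first apply learned embeddings and projections. The per-token FFN is left out.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def to_variate_tokens(rows):
    """Turn T timestamp rows of N variables into N tokens of length T."""
    return [list(column) for column in zip(*rows)]

def variate_attention(tokens):
    """Scaled dot-product attention weights across variables (N x N).

    Assumption: raw histories act as queries and keys; a real model
    would use learned projections of embedded tokens instead.
    """
    dim = len(tokens[0])
    scores = [[sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
               for key in tokens] for query in tokens]
    return [softmax(row) for row in scores]

# Hypothetical data: 4 timestamps (rows), 3 variables (columns).
rows = [
    [1.0, 2.0, -1.0],
    [2.0, 4.0, -2.0],
    [3.0, 6.0, -3.0],
    [4.0, 8.0, -4.0],
]
tokens = to_variate_tokens(rows)       # one token per variable
weights = variate_attention(tokens)    # how strongly each variable attends to the others
print(len(tokens), len(tokens[0]))     # prints: 3 4
```

In this toy example the first variable attends more strongly to the second (which moves with it) than to the third (which moves against it), which is the kind of cross-variable relationship the attention layer is meant to pick up.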
Patching involves breaking a long line of data points into smaller groups or "chunks" rather than analyzing individual points. This technique serves two primary purposes: it significantly reduces the number of tokens the model must process, which increases computational speed, and it helps the model focus on "local" patterns. By grouping points together, the model can better understand the context of a specific window of time while ignoring the noise often found in individual data points.
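As a rough sketch of the idea, the helper below splits a series into fixed-length patches. The name `make_patches` and its `patch_len` and `stride` parameters are illustrative assumptions, not the API of any particular library; trailing points that do not fill a whole patch are simply dropped here, whereas real implementations often pad instead.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D series into patches of length patch_len.

    A stride equal to patch_len gives non-overlapping patches;
    a smaller stride gives overlapping ones. Leftover points at
    the end that do not fill a full patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[start:start + patch_len]
            for start in range(0, last_start + 1, stride)]

series = list(range(12))
patches = make_patches(series, patch_len=4, stride=4)
print(len(series), "points ->", len(patches), "tokens")  # prints: 12 points -> 3 tokens
```

Each patch becomes one token, so the model here processes 3 tokens instead of 12, and every token carries a small window of local context.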
The Mixture of Experts (MoE) architecture allows a massive foundation model to handle highly diverse types of data by dynamically routing different temporal patterns to specialized sub-networks, called experts. For example, if the model identifies a financial sequence, a gating network can route it to experts that have specialized in volatility; if it sees climate data, it can route it to experts in long-term trends. Because only a few experts are active for any given input, this specialization keeps the model from being overwhelmed by the heterogeneity of different time series and lets it scale to billions of parameters without a proportional increase in compute per input.
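One common way to implement this routing is sparse top-k gating, sketched below. The text does not say which gating scheme a given foundation model uses, so treat this as one plausible variant: the experts are hypothetical scalar functions standing in for sub-networks, and the gate scores are hard-coded where a real model would compute them with a learned router.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Select the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

def moe_forward(x, gate_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    routing = route_top_k(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())

# Hypothetical experts: simple functions standing in for sub-networks.
experts = [
    lambda x: 2.0 * x,    # expert 0
    lambda x: x + 10.0,   # expert 1
    lambda x: -x,         # expert 2
    lambda x: x * x,      # expert 3
]
gate_logits = [0.1, 3.0, -1.0, 3.0]  # assumed router scores for one input

routing = route_top_k(gate_logits, k=2)
output = moe_forward(4.0, gate_logits, experts, k=2)
print(routing)  # prints: {1: 0.5, 3: 0.5}
print(output)   # prints: 15.0
```

Only two of the four experts are evaluated for this input, which is why total parameter count can grow much faster than the compute spent per input.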
ICEEMDAN (Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) is used to break down a messy, volatile signal, such as power grid load, into several "Intrinsic Mode Functions" (IMFs). This process is similar to breaking a complex musical chord into individual notes: it separates high-frequency jitters, medium-frequency seasonal cycles, and long-term trends. By decomposing the data first, practitioners can use specialized layers like LSTMs to capture local fluctuations in each frequency band before passing the refined information to a Transformer to analyze the global picture.
RevIN (Reversible Instance Normalization) is a stabilization technique used to handle "non-stationary" data, where the mean or variance shifts over time, such as during a market crash. It functions as a "normalization sandwich": each input window is normalized with its own statistics before entering the model to help it learn patterns more effectively, and the output is then "denormalized" with the same statistics to return the predictions to their original real-world scale. This prevents high-magnitude variables from drowning out subtle patterns and helps the model remain robust during sudden data shifts.
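The "sandwich" can be sketched as a pair of functions wrapped around a model. This is a simplified illustration: the learnable scale and shift that RevIN also applies are omitted, and `naive_model` is a hypothetical stand-in for a real forecaster.

```python
import math

def revin_normalize(window, eps=1e-5):
    """Normalize one input window with its own mean and standard deviation."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [(v - mean) / std for v in window], (mean, std)

def revin_denormalize(predictions, stats):
    """Map model outputs back to the original scale of the input window."""
    mean, std = stats
    return [p * std + mean for p in predictions]

def naive_model(normalized_window, horizon):
    """Hypothetical forecaster: repeat the last normalized value."""
    return [normalized_window[-1]] * horizon

window = [100.0, 102.0, 101.0, 105.0]
normalized, stats = revin_normalize(window)
forecast = revin_denormalize(naive_model(normalized, horizon=2), stats)
print([round(v, 6) for v in forecast])  # prints: [105.0, 105.0]
```

Because the statistics come from the window itself, shifting the whole window up by a constant leaves the normalized input unchanged, so the model sees the same pattern regardless of the current level.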
From Columbia University alumni built in San Francisco
