Research
Focus areas, open-source software, and selected publications.
Focus
The Loza Lab develops statistical and deep learning methods that leverage Real-World Data to improve clinical care, with an emphasis on multimodal medical foundation models.
Software
Open-source repositories for mixed-type sequence modeling and medical data tokenization.
multivariategpt
A decoder-only transformer for mixed categorical and numeric data. Categorical inputs behave like language-model tokens, while numeric values use a modified embedding and loss so that continuous quantities stay continuous (no discretization). Jointly modeling each token's class and value extends next-token prediction to mixed sequences, which is relevant for EHR time series and other irregular multivariate streams.
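To make the idea of joint class-and-value modeling concrete, here is a minimal, hypothetical sketch in plain Python. It is not the multivariateGPT implementation. The vocabulary, the `embed` scheme (class embedding plus a value-scaled direction vector), and the Gaussian negative log-likelihood used for the value term are all illustrative assumptions. The loss follows the factorization log p(class, value) = log p(class) + log p(value | class), so purely categorical targets contribute only the cross-entropy term.

```python
import math
import random

# Hypothetical illustration only; not the multivariateGPT code.
# Each event is (token, value); value is None for purely categorical tokens.
DIM = 4
VOCAB = ["<bos>", "sex=F", "heart_rate", "lactate"]
NUMERIC = {"heart_rate", "lactate"}  # assumed numeric token classes

rng = random.Random(0)
token_emb = {t: [rng.gauss(0, 0.1) for _ in range(DIM)] for t in VOCAB}
value_dir = {t: [rng.gauss(0, 0.1) for _ in range(DIM)] for t in NUMERIC}


def embed(token, value=None):
    """Class embedding, plus a value-scaled direction for numeric tokens (assumed scheme).

    The raw (already scaled) value enters the embedding directly, so no binning occurs.
    """
    vec = list(token_emb[token])
    if token in NUMERIC and value is not None:
        vec = [e + value * d for e, d in zip(vec, value_dir[token])]
    return vec


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mixed_loss(class_logits, pred_mean, pred_log_var, target_token, target_value=None):
    """Cross-entropy on the next token class plus, for numeric targets,
    a Gaussian negative log-likelihood on its value (one possible choice)."""
    ce = -log_softmax(class_logits)[VOCAB.index(target_token)]
    if target_token not in NUMERIC or target_value is None:
        return ce
    var = math.exp(pred_log_var)
    nll = 0.5 * (math.log(2 * math.pi) + pred_log_var
                 + (target_value - pred_mean) ** 2 / var)
    return ce + nll


# Usage: uniform class logits, and a value head predicting mean 0.5, log-variance 0.
loss = mixed_loss([0.0, 0.0, 0.0, 0.0], 0.5, 0.0, "lactate", 0.7)
```

In a real model the class logits and the value head's mean and log-variance would come from the transformer's output at each position; here they are passed in directly so the loss can be inspected on its own.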
medspipeline
Python tools to transform wide tabular clinical data into long-format MEDS (Medical Event Data Standard) tables, choose distributions for scaling, and tokenize the result for medical foundation models, including a path to train multivariateGPT on the tokenized data.
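The three steps described above (wide-to-long conversion, distribution choice for scaling, tokenization) can be sketched in plain Python as follows. This is a hypothetical illustration, not the medspipeline API: the function names are invented, the output columns are assumed to follow the MEDS-style `subject_id`, `time`, `code`, `numeric_value` layout, and the rule for picking a "normal" or "lognormal" scaling per code (whichever gives lower absolute skewness) is an illustrative heuristic.

```python
import math
from statistics import mean, pstdev

# Hypothetical sketch; function names and the scaling heuristic are assumptions.


def wide_to_long(rows, id_col, time_col):
    """Melt wide rows (one column per measurement) into MEDS-style long events."""
    events = []
    for row in rows:
        for col, val in row.items():
            if col in (id_col, time_col) or val is None:
                continue
            base = {"subject_id": row[id_col], "time": row[time_col]}
            if isinstance(val, (int, float)) and not isinstance(val, bool):
                events.append({**base, "code": col, "numeric_value": float(val)})
            else:  # categorical values are folded into the code itself
                events.append({**base, "code": f"{col}={val}", "numeric_value": None})
    return events


def skewness(xs):
    """Population skewness; 0.0 for constant inputs."""
    m, s = mean(xs), pstdev(xs)
    if s == 0:
        return 0.0
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def fit_scalers(events):
    """Per numeric code, choose 'normal' or 'lognormal', then store z-score parameters."""
    by_code = {}
    for e in events:
        if e["numeric_value"] is not None:
            by_code.setdefault(e["code"], []).append(e["numeric_value"])
    scalers = {}
    for code, xs in by_code.items():
        dist = "normal"
        if min(xs) > 0 and abs(skewness([math.log(x) for x in xs])) < abs(skewness(xs)):
            dist = "lognormal"
        ys = [math.log(x) for x in xs] if dist == "lognormal" else xs
        scalers[code] = (dist, mean(ys), pstdev(ys) or 1.0)
    return scalers


def tokenize(events, scalers):
    """Sort by subject and time, then emit (subject_id, code, scaled_value) tokens."""
    tokens = []
    for e in sorted(events, key=lambda e: (e["subject_id"], e["time"])):
        v = e["numeric_value"]
        if v is not None:
            dist, mu, sd = scalers[e["code"]]
            v = ((math.log(v) if dist == "lognormal" else v) - mu) / sd
        tokens.append((e["subject_id"], e["code"], v))
    return tokens


# Usage with a toy wide table (ISO timestamps sort correctly as strings).
rows = [
    {"id": 1, "t": "2024-01-01T08:00", "sex": "F", "lactate": 1.2},
    {"id": 1, "t": "2024-01-01T09:00", "sex": None, "lactate": 4.8},
    {"id": 2, "t": "2024-01-02T10:00", "sex": "M", "lactate": 0.9},
]
events = wide_to_long(rows, "id", "t")
tokens = tokenize(events, fit_scalers(events))
```

Keeping numeric values as scaled floats rather than bins is what lets a model such as multivariateGPT treat them as continuous quantities downstream.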
Selected papers
For the full publication list and citation metrics, see Google Scholar.
Post-ED Trajectory Prediction in Abdominal Pain with a Generative Medical Event Model
Kent A. McCann, Donald S. Wright, Mark Iscoe, Edward R. Melnick, Lucila Ohno-Machado, Daniella Meeker, Arjun K. Venkatesh, Rohit B. Sangal, Andrew J. Loza
medRxiv, 2026
Conditional Attribute Estimation with Autoregressive Sequence Models
Erica Stutz, Giacomo Marino, Daniella Meeker, Qiao Liu, Andrew J. Loza
arXiv preprint, 2026
multivariateGPT: a decoder-only transformer for multivariate categorical and numeric data
Andrew J. Loza, Jun Yup Kim, Shangzheng Song, Yihang Liu, Joseph J. Y. Sung, R Andrew Taylor, Dennis L. Shung
arXiv preprint, 2025