

Hargen (Youze) Zheng
Undergraduate
Data Science and Math-CS @UC San Diego
yoz018@ucsd.edu
I'm a third-year undergraduate student at UC San Diego, double majoring in Data Science (B.S.) and Mathematics-Computer Science (B.S.). I am fortunate to conduct research at the Laboratory for Emerging Intelligence, where I am advised by Dr. Leon Bergen and Dr. Ramamohan Paturi.
I'm interested in Natural Language Processing (NLP) and Large Language Models (LLMs), as well as Machine Learning and Artificial Intelligence more broadly. My previous research is listed below.
- Mamba Retriever
◇ We explored the use of GPT-4o mini for synthetic data generation and trained State Space Models (Mamba-2 130M & 1.3B) for long-context question-answering tasks. The trained models outperform leading embedding models on the MTEB Leaderboard, including NV-Embed, across 41 long-document Q&A tasks. The 1.3B variant performs comparably to GPT-4o on documents exceeding 256K tokens while using fewer tokens and less compute.
- The RoBBR Benchmark [1]
◇ We created a benchmark dataset to evaluate LLMs' ability to aggregate information from long-form biomedical reports. The benchmark results reveal that current LLMs fall significantly short of expert-level performance.