Jiahui Yu

Research Lead, OpenAI

G Scholar  /  GitHub
LinkedIn  /  Twitter


I lead the Perception team at OpenAI. Previously, I co-led Gemini Multimodal at Google DeepMind. I work on deep learning and high-performance computing.


Selected Projects

GPT-4o
OpenAI: Jiahui Yu, Visual Perception Lead.

Gemini: A Family of Highly Capable Multimodal Models
Gemini Team Google: Jiahui Yu, Co-Lead, Multimodal Vision.
ArXiv 2023 / Gemini

PaLM 2 Technical Report
PaLM2 Team Google: Jiahui Yu, Core Contributor, Architecture and Modeling Workstream; and Contributor, Fine-tuning Workstream
ArXiv 2023 / PaLM2

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
TMLR 2022 / parti.research.google

CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
TMLR 2022 / Google AI Blog

Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alex Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
ICLR 2022 / Google AI Blog

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
ICLR 2022 / Google AI Blog

Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
ArXiv 2022

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara Sainath, Yonghui Wu, Ruoming Pang.
ICLR 2021

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization
Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang.
ICASSP 2021

Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications
Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya.
Proceedings of the IEEE 2020

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le.
ECCV 2020

ContextNet: Improving Convolutional Neural Networks for ASR with Global Context
Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu.
INTERSPEECH 2020

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang.
INTERSPEECH 2020

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
Jiahui Yu and Thomas Huang.
NeurIPS Workshop 2019 / Code

Universally Slimmable Networks and Improved Training Techniques
Jiahui Yu and Thomas Huang.
ICCV 2019 / Code

Free-Form Image Inpainting with Gated Convolution (DeepFill v2)
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang.
ICCV 2019 (Oral Presentation) / Code

Slimmable Neural Networks
Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang.
ICLR 2019 / Code / OpenReview
Top-10 Rated Papers on OpenReview, ICLR 2019.

Wide Activation for Efficient and Accurate Image Super-Resolution
Jiahui Yu, Yuchen Fan, Jianchao Yang, Ning Xu, Zhaowen Wang, Xinchao Wang, Thomas Huang.
BMVC 2019 (challenge report) / Code
Won 1st place in the NTIRE Challenge on Single Image Super-Resolution, CVPR 2018.

Generative Image Inpainting with Contextual Attention (DeepFill v1)
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang.
CVPR 2018 / Code

UnitBox: An Advanced Object Detection Network
Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, Thomas Huang.
ACMMM 2016
Adapted into the official TensorFlow Object Detection API.


Service

  • Journal Reviewer: TPAMI, IJCV, TIP, TMM, TMI, TVCG, TVCJ, TCSVT, TSC, TGRS, IMAGE, JSTSP, JVCI, NEUCOM, PR, SPL, MULT, TNNLS, ENG, IROS, ACCESS, Hindawi.
  • Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, SIGGRAPH, ACM Multimedia, AAAI, IJCAI, ICME, PRCV, PG, ACCV, ICASSP.