[Seminar Announcement] Invited Seminar by Prof. Yulhwa Kim, Sungkyunkwan University (Wed, 4/15, 17:00): "Efficient KV Cache Management for Scalable LLM Serving"
- ice
- 2026-04-14
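Hello.
On Wednesday, April 15, 2026, we will host a seminar with Prof. Yulhwa Kim of the Department of Semiconductor Systems Engineering at Sungkyunkwan University.
The Sustainable IT Technology Seminar is a lecture series on the latest research and trends related to the sustainability of IT, in which experts from a range of disciplines are invited to give in-depth talks.
All interested students are warmly encouraged to attend.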
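[Seminar Details]
- ■Date and time: Wednesday, April 15, 2026, 17:00 ~ 17:50
- ■Venue: Engineering Building 1, Bldg. 23, Room 23219
- ■Speaker: Prof. Yulhwa Kim (Department of Semiconductor Systems Engineering)
- ■Topic: Efficient KV Cache Management for Scalable LLM Serving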
- ■Abstract: Efficient large language model (LLM) serving is increasingly constrained by the growing memory and latency overhead of key-value (KV) cache management, particularly in long-context and reasoning-intensive scenarios. This talk first provides an overview of LLM architectures and the role of the KV cache in inference. It then presents two complementary approaches for improving KV cache efficiency without retraining or architectural modification. First, we introduce Reasoning Path Compression (RPC), which exploits semantic sparsity in reasoning processes to periodically remove redundant KV entries during decoding, significantly reducing memory usage and improving throughput. Second, we present FastKV, which leverages layer-dependent attention dynamics to enable token-selective propagation, preserving full-context processing in early layers while aggressively pruning tokens in later layers. Together, these approaches demonstrate that substantial redundancy exists in both generated tokens and input contexts, and that effectively managing this redundancy is key to scalable and efficient LLM serving.
- ■Bio: Yulhwa Kim received the B.S. and Ph.D. degrees in Convergence IT Engineering from Pohang University of Science and Technology, South Korea, in 2016 and 2022, respectively. From 2022 to 2024, she held a postdoctoral position at the Inter-University Semiconductor Research Center (ISRC) at Seoul National University, South Korea. She is currently an assistant professor in the Department of Semiconductor Systems Engineering at Sungkyunkwan University, South Korea. Her research primarily focuses on the hardware-software co-design of efficient AI systems through neural network compression and deep learning accelerator design.
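- ■Host: Prof. 전정훈 (current Dean of the College of Information and Communication Engineering / Department of Semiconductor Systems Engineering)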


