Time: Friday, January 9, 2026, 8:30-9:30
Venue: Conference Room 1104, Block A, Science and Education Building, Feicuihu Campus
Speaker: Prof. Xia Hu
Affiliation: Shanghai AI Laboratory
Host: College of Artificial Intelligence Innovation
Abstract:
Large language models (LLMs) exhibit human-like conversational abilities, but scaling them to long contexts (e.g., extracting information from lengthy healthcare articles) faces two key challenges: they cannot exceed their pre-training context lengths, and they are hard to deploy because inference memory grows with context length. A critical insight is that LLMs are strongly robust to the noise introduced by lossy computation (e.g., low-precision arithmetic). This talk discusses advances in large-scale LLM deployment for long contexts. On the algorithmic side, we extend LLM context length by at least 8× by coarsening the positional information of distant tokens. On the systems side, we quantize the intermediate states of past tokens to 2 bits, achieving 8× memory efficiency and a 3.5× wall-clock speedup without sacrificing performance. Finally, we highlight the latest healthcare applications of LLMs, in particular the use of long-context retrieval techniques to mitigate hallucinations in healthcare chatbots.
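To make the algorithmic idea concrete, here is a minimal sketch of one way to coarsen the positional information of distant tokens: distances inside a local window stay exact, while farther distances are merged in groups, shrinking the range of position ids the model ever sees. The window size w, group size g, and function name are illustrative assumptions, not the speaker's actual method.

def coarsened_relative_position(rel: int, w: int = 512, g: int = 8) -> int:
    """Map a relative distance rel (query_pos - key_pos) to the
    position id fed to attention.

    Distances within the local window w stay exact; larger distances
    are merged in groups of g, so a context of length L only needs
    position ids up to roughly w + (L - w) / g -- a g-fold (here 8x)
    extension without exceeding the pre-training position range.
    """
    if rel <= w:
        return rel  # nearby tokens keep exact relative positions
    return w + (rel - w) // g  # distant tokens share coarsened positions

if __name__ == "__main__":
    for rel in (10, 512, 520, 4096, 32768):
        print(rel, "->", coarsened_relative_position(rel))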
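The systems result compresses the intermediate states of past tokens (in practice, the attention key/value cache) to 2 bits. A common way to do this is group-wise asymmetric quantization with a per-group scale and zero-point; the NumPy round trip below is a sketch under that assumption, and the group size of 32 is illustrative rather than the speaker's configuration.

import numpy as np

def quantize_2bit(x: np.ndarray, group: int = 32):
    """Asymmetric 2-bit quantization along the last axis, per group.

    Each group of `group` values keeps a float32 zero-point and scale;
    the values themselves become integers in {0, 1, 2, 3}.
    """
    assert x.shape[-1] % group == 0, "last axis must divide into groups"
    g = x.reshape(*x.shape[:-1], -1, group)
    lo = g.min(axis=-1, keepdims=True)                    # zero-point
    scale = (g.max(axis=-1, keepdims=True) - lo) / 3.0    # 4 levels
    scale = np.where(scale == 0.0, 1.0, scale)            # avoid div by 0
    q = np.clip(np.round((g - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale.astype(np.float32), lo.astype(np.float32)

def dequantize_2bit(q, scale, lo, shape):
    """Reconstruct an approximation of the original float tensor."""
    return (q * scale + lo).reshape(shape)

if __name__ == "__main__":
    kv = np.random.randn(4, 128).astype(np.float32)  # stand-in for a cache
    q, s, z = quantize_2bit(kv)
    rec = dequantize_2bit(q, s, z, kv.shape)
    print("mean abs reconstruction error:", np.abs(rec - kv).mean())

Packing 2-bit codes in place of 16-bit floats is an 8x reduction before the small per-group scale/zero-point overhead, which is consistent with the 8× memory-efficiency figure quoted in the abstract.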
Speaker Bio:
Prof. Xia Hu is Assistant to the Director and a Leading Scientist at Shanghai AI Laboratory. He was previously a full professor at Rice University and director of its Data Science Center, and he helped found the company AIPOW as co-founder and chief scientist. His research has long focused on machine learning and artificial intelligence: he has published more than 200 papers at top international conferences and journals such as ICLR, NeurIPS, KDD, WWW, and SIGIR, with over 40,000 citations. AutoKeras, the open-source automated machine learning system developed under his leadership, has become one of the most widely used AutoML frameworks; his NCF algorithm and system were adopted into the official recommendations of the mainstream AI framework TensorFlow; and the anomaly detection systems he built are widely used in products at NVIDIA, General Electric, Trane, and Apple. He has received best paper awards or nominations at ICML, WWW, WSDM, and INFORMS, the US NSF CAREER Award, the KDD Rising Star Award, and the IEEE Atluri Scholar Award, among other honors. He is an associate editor of ACM TIST and the journal Big Data, serves on the editorial board of DMKD, and was general chair of WSDM 2020 and of a medical informatics conference.