2024-10-10 |
Inference Scaling for Long-Context Retrieval Augmented Generation |
Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky |
|
2024-10-10 |
$ε$-VAE: Denoising as Visual Decoding |
Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu |
|
2024-10-10 |
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search |
Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu |
|
2024-10-10 |
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention |
Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia |
|
2024-10-09 |
EBES: Easy Benchmarking for Event Sequences |
Dmitry Osin, Igor Udovichenko, Viktor Moskvoretskii, Egor Shvetsov, Evgeny Burnaev |
|
2024-10-09 |
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models |
Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang |
|
2024-10-09 |
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation |
Liang Chen, Sinan Tan, Zefan Cai, Weichu Xie, Haozhe Zhao, Yichi Zhang, Junyang Lin, Jinze Bai, Tianyu Liu, Baobao Chang |
|
2024-10-09 |
ControlAR: Controllable Image Generation with Autoregressive Models |
Zongming Li, Tianheng Cheng, Shoufa Chen, Peize Sun, Haocheng Shen, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang |
|
2024-10-09 |
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks |
Yijiong Yu |
|
2024-10-09 |
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions |
Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu |
|
2024-10-09 |
LongGenBench: Long-context Generation Benchmark |
Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu |
|
2024-10-09 |
$\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization |
Dylan Zhang, Justin Wang, Francois Charton |
|
2024-10-09 |
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References |
Qiyuan Zhang, Yufei Wang, Tiezheng Yu, Yuxin Jiang, Chuhan Wu, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Fuyuan Lyu, Chen Ma |
|
2024-10-09 |
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach |
Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-Michel Morel |
|
2024-10-09 |
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation |
Aurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He |
|
2024-10-08 |
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery |
Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, Huan Sun |
|
2024-10-08 |
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment |
Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo |
|
2024-10-08 |
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents |
Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su |
|
2024-10-08 |
What Matters for Model Merging at Scale? |
Prateek Yadav, Tu Vu, Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, Tsendsuren Munkhdalai |
|
2024-10-08 |
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification |
Benjamin Feuer, Jiawei Xu, Niv Cohen, Patrick Yubeaton, Govind Mittal, Chinmay Hegde |
|
2024-10-08 |
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models |
Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar |
|
2024-10-08 |
Named Clinical Entity Recognition Benchmark |
Wadood M Abdul, Marco AF Pimentel, Muhammad Umar Salman, Tathagata Raha, Clément Christophe, Praveen K Kanithi, Nasir Hayat, Ronnie Rajan, Shadab Khan |
|
2024-10-08 |
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning |
Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou |
|
2024-10-08 |
Autonomous Character-Scene Interaction Synthesis from Text Instruction |
Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang, Yixin Zhu |
|
2024-10-08 |
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations |
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov |
|
2024-10-08 |
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs |
Lei Wang, Shan Dong, Yuhui Xu, Hanze Dong, Yalu Wang, Amrita Saha, Ee-Peng Lim, Caiming Xiong, Doyen Sahoo |
|
2024-10-08 |
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction |
Leheng Li, Weichao Qiu, Xu Yan, Jing He, Kaiqiang Zhou, Yingjie Cai, Qing Lian, Bingbing Liu, Ying-Cong Chen |
|
2024-10-08 |
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion |
Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, Ming-Hsuan Yang |
|
2024-10-08 |
UniMuMo: Unified Text, Music and Motion Generation |
Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan |
|
2024-10-08 |
Differential Transformer |
weirdcat |
|
2024-10-08 |
TLDR: Token-Level Detective Reward Model for Large Vision Language Models |
Deqing Fu, Tong Xiao, Rui Wang, Wang Zhu, Pengchuan Zhang, Guan Pang, Robin Jia, Lawrence Chen |
|
2024-10-08 |
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide |
Dohun Lee, Bryan S Kim, Geon Yeong Park, Jong Chul Ye |
|
2024-10-08 |
Presto! Distilling Steps and Layers for Accelerating Music Generation |
Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan |
|
2024-10-08 |
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles |
Qingchen Yu, Shichao Song, Ke Fang, Yunfeng Shi, Zifan Zheng, Hanyu Wang, Simin Niu, Zhiyu Li |
|
2024-10-08 |
Grounding Language in Multi-Perspective Referential Communication |
Zineng Tang, Lingjun Mao, Alane Suhr |
|
2024-10-08 |
FAN: Fourier Analysis Networks |
Yihong Dong, Ge Li, Yongding Tao, Xue Jiang, Kechi Zhang, Jia Li, Jing Su, Jun Zhang, Jingjing Xu |
|
2024-10-08 |
MLP-KAN: Unifying Deep Representation and Function Learning |
Yunhong He, Yifeng Xie, Zhengqing Yuan, Lichao Sun |
|
2024-10-08 |
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark |
Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jeng-Neng Hwang, Saining Xie, Christopher D. Manning |
|
2024-10-08 |
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs |
Dung Nguyen Manh, Thang Phan Chau, Nam Le Hai, Thong T. Doan, Nam V. Nguyen, Quang Pham, Nghi D. Q. Bui |
|
2024-10-07 |
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning |
Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang |
|
2024-10-07 |
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding |
Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu |
|
2024-10-07 |
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs |
Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang |
|
2024-10-07 |
NRGBoost: Energy-Based Generative Boosted Trees |
João Bravo |
|
2024-10-07 |
NL-Eye: Abductive NLI for Images |
Mor Ventura, Michael Toker, Nitay Calderon, Zorik Gekhman, Yonatan Bitton, Roi Reichart |
|
2024-10-07 |
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction |
Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu |
|
2024-10-07 |
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond |
Shubhi Bansal, Sreeharish A, Madhava Prasath J, Manikandan S, Sreekanth Madisetty, Mohammad Zia Ur Rehman, Chandravardhan Singh Raghaw, Gaurav Duggal, Nagendra Kumar |
|
2024-10-07 |
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models |
Jangyeong Kim, Donggoo Kang, Junyoung Choi, Jeonga Wi, Junho Gwon, Jiun Bae, Dumim Yoon, Junghyun Han |
|
2024-10-07 |
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction |
Zhaojian Yu, Yinghao Wu, Genesis Wang, Heming Weng |
|
2024-10-07 |
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise |
Rose E. Wang, Ana T. Ribeiro, Carly D. Robinson, Susanna Loeb, Dora Demszky |
|
2024-10-07 |
Erasing Conceptual Knowledge from Language Models |
Rohit Gandikota, Sheridan Feucht, Samuel Marks, David Bau |
|
2024-10-07 |
Selective Attention Improves Transformer |
Yaniv Leviathan, Matan Kalman, Yossi Matias |
|
2024-10-05 |
Contextual Document Embeddings |
John X. Morris, Alexander M. Rush |
|
2024-10-05 |
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models |
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu |
|
2024-10-05 |
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations |
Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman |
|
2024-10-05 |
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning |
Xiao Yu, Baolin Peng, Vineeth Vajipey, Hao Cheng, Michel Galley, Jianfeng Gao, Zhou Yu |
|
2024-10-05 |
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models |
Shayekh Bin Islam, Md Asib Rahman, K S M Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md Rizwan Parvez |
|
2024-10-05 |
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics |
Zhiwen You, Kanyao Han, Haotian Zhu, Bertram Ludäscher, Jana Diesner |
|
2024-10-05 |
Intelligence at the Edge of Chaos |
jasondavies |
|
2024-10-05 |
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning |
Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan |
|
2024-10-04 |
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment |
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux |
|
2024-10-04 |
Learning the Latent Rules of a Game from Data: A Chess Story |
Ben Fauber |
|
2024-10-04 |
LLMs as Markov Chains |
akrymski |
|
2024-10-04 |
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling |
Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng |
|
2024-10-04 |
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos |
Jianrui Zhang, Mu Cai, Yong Jae Lee |
|
2024-10-04 |
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models |
Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber |
|
2024-10-04 |
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis |
Ulyana Piterbarg, Lerrel Pinto, Rob Fergus |
|
2024-10-04 |
Distilling an End-to-End Voice Assistant Without Instruction Training Data |
William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang |
|
2024-10-04 |
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? |
Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang |
|
2024-10-04 |
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation |
Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel |
|
2024-10-04 |
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second |
Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun |
|
2024-10-04 |
Contrastive Localized Language-Image Pre-Training |
Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan |
|
2024-10-04 |
Loong: Generating Minute-level Long Videos with Autoregressive Language Models |
Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu |
|
2024-10-04 |
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models |
Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang |
|
2024-10-04 |
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration |
Jintao Zhang, Jia Wei, Pengle Zhang, Jun Zhu, Jianfei Chen |
|
2024-10-04 |
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data |
Sreyan Ghosh, Sonal Kumar, Zhifeng Kong, Rafael Valle, Bryan Catanzaro, Dinesh Manocha |
|
2024-10-04 |
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis |
Xiaobiao Du, Yida Wang, Xin Yu |
|
2024-10-04 |
LLaVA-Critic: Learning to Evaluate Multimodal Models |
Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li |
|
2024-10-04 |
Video Instruction Tuning With Synthetic Data |
Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li |
|
2024-10-04 |
General Preference Modeling with Preference Representations for Aligning Language Models |
Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu |
|
2024-10-04 |
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data |
Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes |
|
2024-10-04 |
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation |
Mike Ranzinger, Jon Barker, Greg Heinrich, Pavlo Molchanov, Bryan Catanzaro, Andrew Tao |
|
2024-10-04 |
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control |
Haozhe Chen, Run Chen, Julia Hirschberg |
|
2024-10-03 |
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation |
Bryan Li, Samar Haider, Fiona Luo, Adwait Agashe, Chris Callison-Burch |
|
2024-10-03 |
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios |
Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu |
|
2024-10-03 |
Old Optimizer, New Norm: An Anthology |
Jeremy Bernstein, Laker Newhouse |
|
2024-10-03 |
FactAlign: Long-form Factuality Alignment of Large Language Models |
Chao-Wei Huang, Yun-Nung Chen |
|
2024-10-03 |
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs |
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang |
|
2024-10-03 |
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages |
Marco Gaido, Sara Papi, Luisa Bentivogli, Alessio Brutti, Mauro Cettolo, Roberto Gretter, Marco Matassoni, Mohamed Nabih, Matteo Negri |
|
2024-10-03 |
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration |
Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang |
|
2024-10-03 |
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding |
Ye Liu, Zongyang Ma, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen |
|
2024-10-03 |
Selective Aggregation for Low-Rank Adaptation in Federated Learning |
Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, Liangqiong Qu |
|
2024-10-03 |
Not All LLM Reasoners Are Created Equal |
Arian Hosseini, Alessandro Sordoni, Daniel Toyama, Aaron Courville, Rishabh Agarwal |
|
2024-10-03 |
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis |
Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro |
|
2024-10-03 |
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection |
Yang Cao, Yuanliang Jv, Dan Xu |
|
2024-10-03 |
Quantifying Generalization Complexity for Large Language Models |
Zhenting Qi, Hongyin Luo, Xuliang Huang, Zhuokai Zhao, Yibo Jiang, Xiangjun Fan, Himabindu Lakkaraju, James Glass |
|
2024-10-03 |
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks |
Mengzhao Jia, Wenhao Yu, Kaixin Ma, Tianqing Fang, Zhihan Zhang, Siru Ouyang, Hongming Zhang, Meng Jiang, Dong Yu |
|
2024-10-03 |
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling |
Jinghan Li, Zhicheng Sun, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu |
|
2024-10-03 |
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging |
Yuling Shi, Songsong Wang, Chengcheng Wan, Xiaodong Gu |
|
2024-10-03 |
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis |
Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jon Barron, Yinda Zhang |
|
2024-10-03 |
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation |
Rinon Gal, Adi Haviv, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Gal Chechik |
|