
Pruning Foundation Models for High Accuracy without Retraining

Despite their superior performance, large language models (LLMs) are challenging to deploy due to their massive parameter counts and computation. While pruning is a promising technique to reduce model size and accelerate inference, the traditional …
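For context, the sketch below illustrates the general idea of weight pruning with a one-shot magnitude criterion. It is a minimal, generic example, not the retraining-free method proposed in the paper; the layer size, sparsity level, and helper name are assumptions for illustration.

```python
import torch

def magnitude_prune(linear: torch.nn.Linear, sparsity: float = 0.5) -> torch.nn.Linear:
    """Zero out the smallest-magnitude weights of a linear layer in place."""
    w = linear.weight.data
    k = int(w.numel() * sparsity)                    # number of weights to remove
    threshold = w.abs().flatten().kthvalue(k).values # k-th smallest magnitude
    mask = (w.abs() > threshold).to(w.dtype)         # keep only larger-magnitude weights
    linear.weight.data = w * mask
    return linear

# Example: prune roughly half of the weights of a single projection layer.
layer = torch.nn.Linear(4096, 4096)
magnitude_prune(layer, sparsity=0.5)
print(f"sparsity: {(layer.weight == 0).float().mean():.2f}")
```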

Rethinking Token Reduction for State Space Models

Recent advancements in State Space Models (SSMs) have attracted significant interest, particularly in models optimized for parallel training and handling long-range dependencies. Architectures like Mamba have scaled to billions of parameters with …
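As background, a minimal sketch of the discretized linear state-space recurrence that architectures like Mamba build on: each token updates a fixed-size hidden state, so memory does not grow with sequence length the way attention's KV cache does. The toy dimensions and random parameters are assumptions; Mamba's selective, input-dependent parameterization is omitted.

```python
import numpy as np

# Discretized linear SSM: h_t = A h_{t-1} + B x_t,  y_t = C h_t
d_state, d_model, seq_len = 16, 4, 128
A = np.eye(d_state) * 0.9                 # toy state-transition matrix
B = np.random.randn(d_state, d_model) * 0.1
C = np.random.randn(d_model, d_state) * 0.1

x = np.random.randn(seq_len, d_model)     # input token sequence
h = np.zeros(d_state)
ys = []
for t in range(seq_len):
    h = A @ h + B @ x[t]                  # recurrent state update per token
    ys.append(C @ h)                      # per-token output
y = np.stack(ys)                          # (seq_len, d_model)
```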

DiffClass: Diffusion-Based Class Incremental Learning

Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. On top of that, exemplar-free CIL is even more challenging because access to data from previous tasks is forbidden. Recent exemplar-free CIL methods attempt to mitigate …

InstructGIE: Towards Generalizable Image Editing

Recent advances in image editing have been driven by the development of denoising diffusion models, marking a significant leap forward in this field. Despite these advances, the generalization capabilities of recent image editing approaches remain …

FasterVD: On Acceleration of Video Diffusion Models

Powered by Denoising Diffusion Probabilistic Models, video content generation has recently gained significant research interest. However, diffusion pipelines require intensive computation and model storage, which poses challenges for their wide …
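To make the computation cost concrete, here is a minimal sketch of a DDPM-style reverse (sampling) loop: the denoising network runs once per timestep, so cost grows linearly with the number of steps and, for video, with the number of frames. The toy denoiser, noise schedule, and tensor shapes are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

T = 1000                                   # diffusion steps; each needs a full network pass
betas = np.linspace(1e-4, 0.02, T)         # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def toy_denoiser(x, t):
    """Stand-in for the expensive noise-prediction network eps_theta(x_t, t)."""
    return np.zeros_like(x)

x = np.random.randn(8, 3, 64, 64)          # e.g., 8 video frames of 64x64 latents
for t in reversed(range(T)):
    eps = toy_denoiser(x, t)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / np.sqrt(alphas[t])            # DDPM posterior mean update
    if t > 0:
        x = x + np.sqrt(betas[t]) * np.random.randn(*x.shape)  # add sampling noise
```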

Lotus: Learning-Based Online Thermal and Latency Variation Management for Two-Stage Detectors on Edge Devices

Two-stage object detectors exhibit high accuracy and precise localization, especially for identifying small objects, which makes them favorable for various edge applications. However, the high computation costs associated with two-stage detection methods …

Pruning Parameterization with Bi-level Optimization for Efficient Semantic Segmentation on the Edge

With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications. Vision Transformers (ViTs) have shown considerably stronger results for many …

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

With the popularity of battery-powered edge computing, an important yet under-explored problem is supporting DNNs on diverse edge devices. On the one hand, different edge platforms have various runtime requirements and computation/memory …

Towards Real-Time Segmentation on the Edge

Research on real-time segmentation mainly focuses on desktop GPUs. However, autonomous driving and many other applications rely on real-time segmentation on the edge, and current methods are far from this goal. In addition, recent advances in vision …

Advancing Model Pruning via Bi-level Optimization

The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of …
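As a sketch of the general bi-level view of pruning referenced in the title: a binary mask $m$ is optimized in the upper level while the remaining weights $\theta$ are optimized in the lower level. This is the generic formulation only; the specific relaxation and solver used in the paper are not reproduced here.

```latex
\begin{aligned}
\min_{m \in \{0,1\}^{n},\; \|m\|_{0} \le k} \quad & \mathcal{L}_{\mathrm{val}}\big(m \odot \theta^{*}(m)\big) \\
\text{s.t.} \quad & \theta^{*}(m) = \arg\min_{\theta} \, \mathcal{L}_{\mathrm{train}}\big(m \odot \theta\big)
\end{aligned}
```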