Recent advancements in bird’s-eye view (BEV) perception have highlighted the superior performance of LiDAR-camera fusion systems over single-modality approaches, garnering considerable interest in the field. Despite this progress, the integration of temporal information, a technique that has considerably benefited camera-only BEV models, remains underexplored for LiDAR-camera fusion. This paper presents Q-TempFusion, a novel approach to temporal multi-sensor fusion designed to improve the BEV model’s inference speed while maintaining predictive performance on par with the current state of the art. Moreover, we are the first to profile a multi-modality BEV model on hardware devices. To address the substantial memory demands and non-trivial latency that hinder deployment in on-vehicle systems, particularly when temporal dynamics are incorporated into complex multi-sensor models, we introduce an activation-aware quantization framework that, guided by the profiling results, produces a fully 8-bit quantized Q-TempFusion model which can be deployed directly on target devices with negligible degradation in detection performance. Our experiments show that Q-TempFusion (8-bit) achieves 70.3% mAP and 72.7% NDS with a 3x–18x FPS improvement over leading multi-modality baselines, while Q-TempFusion (32-bit) achieves 72.1% mAP and 74.8% NDS, comparable to state-of-the-art multi-modality approaches. These results suggest that Q-TempFusion is a promising step toward real-time multi-sensor BEV applications, setting a new benchmark for efficient and reliable perception.
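
To make the quantization idea concrete, the following is a minimal sketch of activation-aware symmetric INT8 quantization, assuming per-tensor activation scales calibrated from profiled data and per-channel weight scales. All function names and parameters here are hypothetical illustrations, not the paper's actual implementation:

```python
# Minimal sketch of activation-aware INT8 quantization (illustrative only;
# not the paper's implementation). "Activation-aware" means the activation
# scale is derived from profiled calibration data rather than a fixed range.
import numpy as np

def calibrate_scale(activations: np.ndarray) -> float:
    """Derive a symmetric per-tensor scale from observed activation range."""
    max_abs = float(np.max(np.abs(activations)))
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize_int8(x: np.ndarray, scale) -> np.ndarray:
    """Quantize a float tensor to int8 with the given scale(s)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# Example: profile a layer's activations on calibration data, then quantize
# activations (per tensor) and weights (per output channel).
rng = np.random.default_rng(0)
calib_acts = rng.normal(0.0, 2.0, size=(1024, 64)).astype(np.float32)
weights = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)

act_scale = calibrate_scale(calib_acts)           # per-tensor activation scale
w_scales = np.maximum(np.abs(weights).max(axis=0), 1e-8) / 127.0  # per-channel

q_acts = quantize_int8(calib_acts, act_scale)
q_weights = quantize_int8(weights, w_scales)

# Simulated int8 matmul with int32 accumulation, then dequantization:
# out_float ~= act_scale * (q_acts @ q_weights) * w_scales (broadcast per channel).
acc = q_acts.astype(np.int32) @ q_weights.astype(np.int32)
out = acc.astype(np.float32) * act_scale * w_scales
```

In this sketch, the calibration statistics stand in for the hardware profiling results mentioned above; in practice such scales would be collected per layer during a profiling pass before deployment.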