Image processing Books
£80.74
Springer Computer Vision ECCV 2024
Book Synopsis
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation.- IRGen: Generative Modeling for Image Retrieval.- Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality.- FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos.- A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting.- VISA: Reasoning Video Object Segmentation via Large Language Model.- Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models.- IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation.- Scaling Backwards: Minimal Synthetic Pre-training?.- BAMM: Bidirectional Autoregressive Motion Model.- Event-based Head Pose Estimation: Benchmark and Method.- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos.- Towards Multi-modal Transformers in Federated Learning.- Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning.- QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images.- Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics.- DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution.- Do not move together: per-Gaussian Deformation for 4DGS.- DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion.- CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection.- Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning.- RPBG: Towards Robust Neural Point-based Graphics in the Wild.- GaussReg: Fast 3D Registration with Gaussian Splatting.- Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators.- Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation.- IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with motion complexity map.- TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data.
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
Diffusion Model is a Good Pose Estimator from 3D RF-Vision.- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues.- Learning 3D-aware GANs from Unposed Images with Template Feature Field.- TAPTR: Tracking Any Point with Transformers as Detection.- Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning.- Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance.- BRAVE: Broadening the visual encoding of vision-language models.- HUMOS: Human Motion Model Conditioned on Body Shape.- Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields.- MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction.- FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning.- LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation.- Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation.- Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration.- CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians.- Bayesian Evidential Deep Learning for Online Action Detection.- AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation.- Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather.- Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction.- Memory-Efficient Fine-Tuning for Quantized Diffusion Model.- VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing.- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model.- Human Hair Reconstruction with Strand-Aligned 3D Gaussians.- COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation.- SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders.- Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection.- Global-to-Pixel Regression for Human Mesh Recovery.
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal.- Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation.- Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning.- Text2Place: Affordance-aware Text Guided Human Placement.- OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations.- Zero-Shot Multi-Object Scene Completion.- Beta-Tuned Timestep Diffusion Model.- POA: Pre-training Once for Models of All Sizes.- Taming Latent Diffusion Model for Neural Radiance Field Inpainting.- MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation.- ByteEdit: Boost, Comply and Accelerate Generative Image Editing.- ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion.- High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs.- Accelerating Image Super-Resolution Networks with Pixel-Level Classification.- LASS3D: Language-Assisted Semi-Supervised 3D Semantic Segmentation with Progressive Unreliable Data Exploitation.- Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution.- Click-Gaussian: Interactive Segmentation to Any 3D Gaussians.- Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes.- DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction.- Track Everything Everywhere Fast and Robustly.- Towards Open-ended Visual Quality Comparison.- FreeInit: Bridging Initialization Gap in Video Diffusion Models.- DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs.- Eliminating Feature Ambiguity for Few-Shot Segmentation.- Soft Prompt Generation for Domain Generalization.- Shedding More Light on Robust Classifiers under the lens of Energy-based Models.
£64.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark.- AttnZero: Efficient Attention Discovery for Vision Transformers.- Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search.- Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search.- UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation.- TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning.- Spectral Subsurface Scattering for Material Classification.- nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding.- Dynamic Neural Radiance Field From Defocused Monocular Video.- PiTe: Pixel-Temporal Alignment for Large Video-Language Model.- CarFormer: Self-Driving with Learned Object-Centric Representations.- FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models.- Plain-Det: A Plain Multi-Dataset Object Detector.- Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation.- Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation.- Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching.- Text-Guided Video Masked Autoencoder.- Diffusion Models for Open-Vocabulary Segmentation.- Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation.- EvSign: Sign Language Recognition and Translation with Streaming Events.- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots.- Zero-shot Object Counting with Good Exemplars.- TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering.- SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds.- PartSTAD: 2D-to-3D Part Segmentation Task Adaptation.- FutureDepth: Learning to Predict the Future Improves Video Depth Estimation.- LLM as Copilot for Coarse-grained Vision-and-Language Navigation.
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation.- AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling.- SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models.- Quantized Prompt for Efficient Generalization of Vision-Language Models.- Online Temporal Action Localization with Memory-Augmented Transformer.- Efficient Cascaded Multiscale Adaptive Network for Image Restoration.- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.- Occlusion-Aware Seamless Segmentation.- OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection.- Referring Atomic Video Action Recognition.- Agent3D-Zero: An Agent for Zero-shot 3D Understanding.- Stream Query Denoising for Vectorized HD-Map Construction.- SAGS: Structure-Aware 3D Gaussian Splatting.- Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval.- OneRestore: A Universal Restoration Framework for Composite Degradation.- Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation.- SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks.- Bag of Tricks to Boost Adversarial Transferability.- RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency.- Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting.- WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation.- A Unified Framework for Gradient-based Saliency Map Generation of Black-box Models.- Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance.- COIN-Matting: Confounder Intervention for Image Matting.- SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding.- Audio-driven Talking Face Generation with Stabilized Synchronization Loss.- Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos.
£71.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation.- Learning to Obstruct Few-Shot Image Classification over Restricted Classes.- RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion.- L-DiffER: Single Image Reflection Removal with Language-based Diffusion Model.- AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting.- OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving.- CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner.- HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning.- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion.- LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer.- Blind image deblurring with noise-robust kernel estimation.- Binomial Self-compensation for Motion Error in Dynamic 3D Scanning.- AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes.- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation.- VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting.- Momentum Auxiliary Network for Supervised Local Learning.- HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion.- Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains.- Improving Zero-Shot Generalization for CLIP with Variational Adapter.- Realistic Human Motion Generation with Cross-Diffusion Models.- EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding.- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection.- Towards Reliable Advertising Image Generation Using Human Feedback.- Topology-Preserving Downsampling of Binary Images.- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders.- Classification Matters: Improving Video Action Detection with Class-Specific Attention.- Improving Medical Multi-modal Contrastive Learning with Expert Annotations.
£124.86
Springer Computer Vision ECCV 2024
Book Synopsis
Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias.- Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization.- SILC: Improving Vision Language Pretraining with Self-Distillation.- Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction.- Leveraging temporal contextualization for video action recognition.- ChEX: Interactive Localization and Region Description in Chest X-rays.- AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale.- CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts.- ZigMa: A DiT-style Zigzag Mamba Diffusion Model.- EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.- On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines.- HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization.- Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time.- Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries.- Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction.- Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning.- WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians.- SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference.- Flying with Photons: Rendering Novel Views of Propagating Light.- RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos.- MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images.- 3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views.- Removing Distributional Discrepancies in Captions Improves Image-Text Alignment.- Resilience of Entropy Model in Distributed Neural Networks.- Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis.- Implicit Concept Removal of Diffusion Models.- PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery.
£71.99
Springer Computer Vision ECCV 2024
Book Synopsis
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting.- Robust-Wide: Robust Watermarking against Instruction-driven Image Editing.- OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal.- Formula-Supervised Visual-Geometric Pre-training.- VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding.- Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing.- Restoring Images in Adverse Weather Conditions via Histogram Transformer.- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer.- NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis.- Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs.- G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields.- Getting it Right: Improving Spatial Consistency in Text-to-Image Models.- Generating 3D House Wireframes with Semantics.- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image.- Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments.- Nonverbal Interaction Detection.- UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving.- Responsible Visual Editing.- Drag Anything: Motion Control for Anything using Entity Representation.- SegPoint: Segment Any Point Cloud via Large Language Model.- Navigation Instruction Generation with BEV Perception and Large Language Models.- Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch.- Vista3D: unravel the 3d darkside of a single image.- The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation.- Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection.- FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally.- Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising.
£64.99
Springer Computer Vision ECCV 2024
£66.49
Springer Computer Vision ECCV 2024
Book Synopsis
Modeling and Driving Human Body Soundfields through Acoustic Primitives.- m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks.- Label-anticipated Event Disentanglement for Audio-Visual Video Parsing.- High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding.- Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization.- I-MedSAM: Implicit Medical Image Segmentation with Segment Anything.- ReMamber: Referring Image Segmentation with Mamba Twister.- TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting.- CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios.- Segmentation-guided Layer-wise Image Vectorization with Gradient Fills.- Implicit Style-Content Separation using B-LoRA.- OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models.- ActionVOS: Actions as Prompts for Video Object Segmentation.- FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance.- U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation.- Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization.- Rotary Position Embedding for Vision Transformer.- Local All-Pair Correspondence for Point Tracking.- MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection.- ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments.- S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis.- ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos.- Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos.- PQ-SAM: Post-training Quantization for Segment Anything Model.- CPM: Class-conditional Prompting Machine for Audio-visual Segmentation.- Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition.- DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment.
£71.99
Springer Computer Vision ECCV 2024
Book Synopsis
Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning.- Improving Knowledge Distillation via Regularizing Feature Direction and Norm.- 3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views.- Lazy Diffusion Transformer for Interactive Image Editing.- Non-parametric Sensor Noise Modeling and Synthesis.- Stripe Observation Guided Inference Cost-free Attention Mechanism.- The Nerfect Match: Exploring NeRF Features for Visual Localization.- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance.- Robust Calibration of Large Vision-Language Adapters.- Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation.- Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training.- milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing.- denoiSplit: a method for joint microscopy image splitting and unsupervised denoising.- AugDETR: Improving Multi-scale Learning for Detection Transformer.- Spherical World-Locking for Audio-Visual Localization in Egocentric Videos.- SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images.- SIGMA: Sinkhorn-Guided Masked Video Modeling.- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis.- Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams.- Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images.- Understanding Physical Dynamics with Counterfactual World Modeling.- MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition.- 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation.- Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance.- Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild.- DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation.- SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild.
£80.99
Springer Computer Vision ECCV 2024
Book Synopsis
Neural Metamorphosis.- WHAC: World-grounded Humans and Cameras.- Federated Learning with Local Openset Noisy Labels.- Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection.- PSALM: Pixelwise Segmentation with Large Multi-modal Model.- Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model.- Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images.- Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture.- Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities.- Kinetic Typography Diffusion Model.- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction.- Light-in-Flight for a World-in-Motion.- GroupDiff: Diffusion-based Group Portrait Editing.- Faceptor: A Generalist Model for Face Perception.- Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks.- Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels.- InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping.- KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval.- Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images.- Learning with Unmasked Tokens Drives Stronger Vision Learners.- Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken.- Multi-Task Domain Adaptation for Language Grounding with 3D Objects.- Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels.- Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture.- Camera-LiDAR Cross-modality Gait Recognition.- LiteSAM is Actually what you Need for segment Everything.- IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning.
£64.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
Recursive Visual Programming.- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.- Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks.- Learning to Adapt SAM for Segmenting Cross-domain Point Clouds.- Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging.- ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers.- Fine-grained Dynamic Network for Generic Event Boundary Detection.- Take A Step Back: Rethinking the Two Stages in Visual Reasoning.- AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation.- Learning with Counterfactual Explanations for Radiology Report Generation.- SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models.- Better Regression Makes Better Test-time Adaptive 3D Object Detection.- ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.- Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization.- Finding Visual Task Vectors.- Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation.- Event Camera Data Dense Pre-training.- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning.- Rethinking Image-to-Video Adaptation: An Object-centric Perspective.- Layer-Wise Relevance Propagation with Conservation Property for ResNet.- DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism.- EgoLifter: Open-world 3D Segmentation for Egocentric Perception.- MEVG: Multi-event Video Generation with Text-to-Video Models.- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively.- Data-to-Model Distillation: Data-Efficient Learning Framework.- DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays.- AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network.
£64.99
Springer Computer Vision ECCV 2024
£104.49
Springer Computer Vision ECCV 2024
£71.99
Springer Computer Vision ECCV 2024
£71.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis
Tri^{2}-plane: Thinking Head Avatar via Feature Pyramid.- ControlCap: Controllable Region-level Captioning.- Free Lunch for Gait Recognition: A Novel Relation Descriptor.- SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding.- Adaptive Correspondence Scoring for Unsupervised Medical Image Registration.- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models.- Watch Your Steps: Local Image and Scene Editing by Text Instructions.- Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation.- 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences.- Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation.- Human-in-the-Loop Visual Re-ID for Population Size Estimation.- SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation.- PointNeRF++: A multi-scale, point-based Neural Radiance Field.- A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties.- UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding.- Fast View Synthesis of Casual Videos with Soup-of-Planes.- Adaptive Human Trajectory Prediction via Latent Corridors.- Video Question Answering with Procedural Programs.- DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification.- TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling.- C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition.- LLMGA: Multimodal Large Language Model based Generation Assistant.- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos.- Shape from Heat Conduction.- An Adaptive Screen-Space Meshing Approach for Normal Integration.- Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation.- HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning.
£64.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
£152.99
Springer Computer Vision ECCV 2024
£123.49
Springer Computer Vision ECCV 2024
£104.49
Springer Computer Vision ECCV 2024
Book Synopsis
Real-time Holistic Robot Pose Estimation with Unknown States.- CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning.- A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars.- An accurate detection is not all you need to combat label noise in web-noisy datasets.- Online Vectorized HD Map Construction using Geometry.- Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids.- Learned HDR Image Compression for Perceptually Optimal Storage and Display.- Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion.- Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration.- Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression.- Improving Virtual Try-On with Garment-focused Diffusion Models.- Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection.- Disentangled Generation and Aggregation for Robust Radiance Fields.- UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation.- Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation.- MoAI: Mixture of All Intelligence for Large Language and Vision Models.- Semantic-guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift.- Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations.- SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models.- Open-World Dynamic Prompt and Continual Visual Representation Learning.- Learning Video Context as Interleaved Multimodal Sequences.- Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors.- Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding.- Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks.- Multi-scale Cross Distillation for Object Detection in Aerial Images.- Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation.- Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence.
£71.99
Springer Computer Vision ECCV 2024
Book Synopsis
CS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning.- Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models.- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.- Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression.- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation.- CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing.- UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt.- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents.- ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference.- Two-Stage Active Learning for Efficient Temporal Action Segmentation.- TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation.- MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views.- Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions.- Towards More Practical Group Activity Detection: A New Benchmark and Model.- Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models.- Zero-Shot Image Feature Consensus with Deep Functional Maps.- WindPoly: Polygonal Mesh Reconstruction via Winding Numbers.- MinD-3D: Reconstruct High-quality 3D objects in Human Brain.- Tokenize Anything via Prompting.- Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views.- Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks.- City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web.- GRAPE: Generalizable and Robust Multi-view Facial Capture.- Training-Free Model Merging for Multi-target Domain Adaptation.- Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses.- Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection.- Open-Vocabulary Camouflaged Object Segmentation.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisCogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.- SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers.- Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM.- Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation.- GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring.- Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring.- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion.- CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning.- Curved Diffusion: A Generative Model With Optical Geometry Control.- Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians.- MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis.- OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation.- Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures.- Conceptual Codebook Learning for Vision-Language Models.- LingoQA: Video Question Answering for Autonomous Driving.- AnimateMe: 4D Facial Expressions via Diffusion Models.- HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning.- LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis.- PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors.- Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention.- iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning.- Context Diffusion: In-Context Aware Image Generation.- Pose Guided Fine-Grained Sign Language Video Generation.- RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos.- Certifiably Robust Image Watermark.- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept 
Discovery.- Online Zero-Shot Classification with CLIP.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisDiffusion Models as Optimizers for Efficient Planning in Offline RL.- Enhanced Sparsification via Stimulative Training.- How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs.- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation.- Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image.- Efficient Snapshot Spectral Imaging: Calibration-Free Parallel Structure with Aperture Diffraction Fusion.- Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective.- PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking.- HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models.- On the Approximation Risk of Few-Shot Class-Incremental Learning.- Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach.- Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization.- SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning.- Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning.- MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation.- PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training.- General Geometry-aware Weakly Supervised 3D Object Detection.- Long-CLIP: Unlocking the Long-Text Capability of CLIP.- Dolfin: Diffusion Layout Transformers without Autoencoder.- Real-time 3D-aware Portrait Editing from a Single Image.- StructLDM: Structured Latent Diffusion for 3D Human Generation.- Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation.- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models.- Norma: A Noise Robust Memory-Augmented Framework for 
Whole Slide Image Classification.- Continuous Memory Representation for Anomaly Detection.- InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser.- PACE: Pose Annotations in Cluttered Environments.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisEvaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off.- SkyScenes: A Synthetic Dataset for Aerial Scene Understanding.- Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps.- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction.- AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation.- PFedEdit: Personalized Federated Learning via Automated Model Editing.- De-Confusing Pseudo-Labels in Source-Free Domain Adaptation.- GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes.- EraseDraw: Learning to Insert Objects by Erasing Them from Images.- SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference.- Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models.- Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training.- Keypoint Promptable Re-Identification.- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas.- DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting.- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos.- Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores.- MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception.- Training A Secure Model against Data-Free Model Extraction.- EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control.- TriNeRFLet: A Wavelet Based Triplane NeRF Representation.- EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset.- Photorealistic Video Generation with Diffusion Models.- RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement.- TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models.- Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval.- DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation.
£71.99
Springer Computer Vision ECCV 2024
Book SynopsisEx2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding.- Self-Supervised Audio-Visual Soundscape Stylization.- SAVE: Protagonist Diversification with Structure Agnostic Video Editing.- VideoAgent: Long-form Video Understanding with Large Language Model as Agent.- Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning.- Source-Free Domain-Invariant Performance Prediction.- Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures.- Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort.- Direct Distillation between Different Domains.- Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery.- V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation.- GRiT: A Generative Region-to-text Transformer for Object Understanding.- LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System.- Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning.- Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending.- Geometry Fidelity for Spherical Images.- BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling.- CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning.- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation.- Benchmarking Spurious Bias in Few-Shot Image Classifiers.- TurboEdit: Real-time text-based disentangled real image editing.- Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy.- Augmented Neural Fine-tuning for Efficient Backdoor Purification.- REDIR: Refocus-free Event-based De-occlusion Image Reconstruction.- Free-Editor: Zero-shot Text-driven 3D Scene Editing.- DPA-Net: 
Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly.- An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation.
£80.99
Springer Computer Vision ECCV 2024
Book SynopsisHowToCaption: Prompting LLMs to Transform Video Annotations at Scale.- LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection.- Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction.- On Pretraining Data Diversity for Self-Supervised Learning.- Look Around and Learn: Self-Training Object Detection by Exploration.- Bayesian Self-Training for Semi-Supervised 3D Segmentation.- Motion and Structure from Event-based Normal Flow.- ParCo: Part-Coordinating Text-to-Motion Synthesis.- Learning to Complement and to Defer to Multiple Users.- Tiny Models are the Computational Saver for Large Models.- DragVideo: Interactive Drag-style Video Editing.- Multi-Sentence Grounding for Long-term Instructional Video.- Do Generalised Classifiers really work on Human Drawn Sketches?.- KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding.- Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.- MotionDirector: Motion Customization of Text-to-Video Diffusion Models.- Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer.- Enhanced Motion Forecasting with Visual Relation Reasoning.- Rate-Distortion-Cognition Controllable Versatile Neural Image Compression.- Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers.- LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar.- MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models.- Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models.- Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer.- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors.- Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation.- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion.
£64.99
Springer Computer Vision ECCV 2024
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisVEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors.- HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation.- SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting.- Temporal-Mapping Photography for Event Cameras.- Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data.- LineFit: A Geometric Approach for Fitting Line Segments in Images.- Six-Point Method for Multi-Camera Systems with Reduced Solution Space.- Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network.- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.- AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition.- HERGen: Elevating Radiology Report Generation with Longitudinal Data.- Labeled Data Selection for Category Discovery.- Dependency-aware Differentiable Neural Architecture Search.- WAS: Dataset and Methods for Artistic Text Segmentation.- CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection.- GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer.- Norface: Improving Facial Expression Analysis by Identity Normalization.- Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy.- SNeRV: Spectra-preserving Neural Representation for Video.- COMO: Compact Mapping and Odometry.- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction.- SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder.- EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation.- An Information Theoretical View for Out-Of-Distribution Detection.- DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes.- Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation.- Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisFew-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt.- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models.- Generalizable Symbolic Optimizer Learning.- Online Continuous Generalized Category Discovery.- Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation.- Tackling Structural Hallucination in Image Translation with Local Diffusion.- Hierarchical Separable Video Transformer for Snapshot Compressive Imaging.- Unified Medical Image Pre-training in Language-Guided Common Semantic Space.- On the Vulnerability of Skip Connections to Model Inversion Attacks.- Adversarial Robustification via Text-to-Image Diffusion Models.- Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection.- Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector.- Reinforcement Learning via Auxiliary Task Distillation.- DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation.- Pre-trained Visual Dynamics Representations for Efficient Policy Learning.- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields.- Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception.- Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models.- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation.- TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias.- Learning Quantized Adaptive Conditions for Diffusion Models.- STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay.- Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry.- Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention.- High-Fidelity Modeling of Generalizable Wrinkle Deformation.- Instruction Tuning-free Visual Token Complement for Multimodal LLMs.
£71.99
Springer Computer Vision ECCV 2024
Book SynopsisLG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation.- Efficient Training with Denoised Neural Weights.- Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning.- Integration of Global and Local Representations for Fine-grained Cross-modal Alignment.- Local and Global Flatness for Federated Domain Generalization.- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints.- Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models.- Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs.- Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer.- Implicit Neural Models to Extract Heart Rate from Video.- Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering.- PFGS: High Fidelity Point Cloud Rendering via Feature Splatting.- Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation.- E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation.- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions.- LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement.- Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs.- Efficient Vision Transformers with Partial Attention.- Generalized Coverage for More Robust Low-Budget Active Learning.- Rasterized Edge Gradients: Handling Discontinuities Differentially.- Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment.- FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning.- LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images.- Learning Natural Consistency Representation for Face Forgery Video Detection.- ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video.- Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems.- R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model.
£125.99
Springer Computer Vision ECCV 2024
Book SynopsisTeach CLIP to Develop a Number Sense for Ordinal Regression.- Compact 3D Scene Representation via Self-Organizing Gaussian Grids.- Pix2Gif: Motion-Guided Diffusion for GIF Generation.- VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking.- SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes.- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning.- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models.- ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples.- Towards Certifiably Robust Face Recognition.- Linking in Style: Understanding learned features in deep learning models.- Stable Video Portraits.- UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework.- CliffPhys: Camera-based Respiratory Measurement using Clifford Neural Networks.- Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network.- PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers.- Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection.- Synthesizing Environment-Specific People in Photographs.- Weight Conditioning for Smooth Optimization of Neural Networks.- Energy-Calibrated VAE with Test Time Free Lunch.- MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection.- SceneTeller: Language-to-3D Scene Generation.- MagMax: Leveraging Model Merging for Seamless Continual Learning.- InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.- DiffusionPen: Towards Controlling the Style of Handwritten Text Generation.- Debiasing surgeon: fantastic weights and how to find them.- Denoising Vision Transformers.- Differentiable Product Quantization for Memory Efficient Camera Relocalization.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisScore Distillation Sampling with Learned Manifold Corrective.- FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving.- Benchmarking the Robustness of Cross-view Geo-localization Models.- GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth.- SUMix: Mixup with Semantic and Uncertain Information.- Flatness-aware Sequential Learning Generates Resilient Backdoors.- Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models.- IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception.- DiffClass: Diffusion-Based Class Incremental Learning.- Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees.- Instant 3D Human Avatar Generation using Image Diffusion Models.- PromptFusion: Decoupling Stability and Plasticity for Continual Learning.- Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance.- Adapting to Shifting Correlations with Unlabeled Data Calibration.- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity.- Information Bottleneck Based Data Correction in Continual Learning.- On Spectral Properties of Gradient-based Explanation Methods.- Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization.- O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation.- Dataset Distillation by Automatic Training Trajectories.- FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation.- EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding.- UniIR: Training and Benchmarking Universal Multimodal Information Retrievers.- SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning.- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation.- Bones Can't Be Triangles: Accurate and Efficient Vertebrae 
Keypoint Estimation through Collaborative Error Revision.- latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisHow to Train the Teacher Model for Effective Knowledge Distillation.- Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers.- Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models.- Modality Translation for Object Detection Adaptation without forgetting prior knowledge.- FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning.- Learning Multimodal Latent Generative Models with Energy-Based Prior.- On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition.- LaWa: Using Latent Space for In-Generation Image Watermarking.- Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution.- Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable.- Co-speech Gesture Video Generation with 3D Human Meshes.- When and How do negative prompts take effect?.- GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views.- CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting.- Snuffy: Efficient Whole Slide Image Classifier.- Learning to Build by Building Your Own Instructions.- Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling.- BlenderAlchemy: Editing 3D Graphics with Vision-Language Models.- DepS: Delayed e-Shrinking for Faster Once-For-All Training.- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models.
£59.99
Springer Computer Vision ECCV 2024
Book SynopsisSur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images.- HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes.- Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation.- Consistent 3D Line Mapping.- Distributed Active Client Selection With Noisy Clients Using Model Association Scores.- PixOOD: Pixel-Level Out-of-Distribution Detection.- GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns.- Towards a Density Preserving Objective Function for Learning on Point Sets.- AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking.- VF-NeRF: Viewshed Fields for Rigid NeRF Registration.- Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction.- Trainable Highly-expressive Activation Functions.- Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising.- Self-Supervised Representation Learning for Adversarial Attack Detection.- Do text-free diffusion models learn discriminative visual representations?.- Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness.- DOCCI: Descriptions of Connected and Contrasting Images.- EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks.- AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild.- Dataset Quantization with Active Learning based Adaptive Sampling.- LogoSticker: Inserting Logos into Diffusion Models for Customized Generation.- LEROjD: Lidar Extended Radar-Only Object Detection.- ProCreate, Don't Reproduce! 
Propulsive Energy Diffusion for Creative Generation.- Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching.- Probabilistic Image-Driven Traffic Modeling via Remote Sensing.- IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination.- VideoStudio: Generating Consistent-Content and Multi-Scene Videos.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisSemantic Residual Prompts for Continual Learning.- TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds.- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling.- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection.- Occupancy as Set of Points.- UAV First-Person Viewers Are Radiance Field Learners.- Rethinking Few-shot Class-incremental Learning: Learning from Yourself.- ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection.- A Fair Ranking and New Model for Panoptic Scene Graph Generation.- Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning.- Compensation Sampling for Improved Convergence in Diffusion Models.- Situated Instruction Following.- Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography.- SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model.- GalLop: Learning global and local prompts for vision-language models.- Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor.- Lossy Image Compression with Foundation Diffusion Models.- CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation.- FMBoost: Boosting Latent Diffusion with Flow Matching.- COMPOSE: Comprehensive Portrait Shadow Editing.- LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration.- Diffusion Models as Data Mining Tools.- Graph Neural Network Causal Explanation via Neural Causal Models.- Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions.- Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering.- GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers.- SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather.
£64.99
Springer Computer Vision ECCV 2024
Book SynopsisGenerating Physically Realistic and Directable Human Motions from Multi-Modal Inputs.- CoTracker: It is Better to Track Together.- SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.- PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology.- Improving Adversarial Transferability via Model Alignment.- RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios.- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation.- Embodied Understanding of Driving Scenarios.- Learning to Drive via Asymmetric Self-Play.- OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation.- ViLA: Efficient Video-Language Alignment for Video Question Answering.- Factorizing Text-to-Video Generation by Explicit Image Conditioning.- MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices.- Open-Set Biometrics: Beyond Good Closed-Set Models.- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening.- Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution.- Osmosis: RGBD Diffusion Prior for Underwater Image Restoration.- Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization.- Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements.- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields.- Flowed Time of Flight Radiance Fields.- 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing.- Fast Registration of Photorealistic Avatars for VR Facial Animation.- CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings.- HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs.- Image-to-Lidar Relational Distillation for Autonomous Driving Data.- Thinking Outside the BBox: Unconstrained Generative Object Compositing.
£71.99
Springer Computer Vision ECCV 2024
Book SynopsisLarge-scale Reinforcement Learning for Diffusion Models.- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion.- FedHARM: Harmonizing Model Architectural Diversity in Federated Learning.- EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS.- Global Counterfactual Directions.- TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving.- RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark.- EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models.- RICA^2: Rubric-Informed, Calibrated Assessment of Actions.- Region-centric Image-Language Pretraining for Open-Vocabulary Detection.- Commonly Interesting Images.- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities.- CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching.- Caltech Aerial RGB-Thermal Dataset in the Wild.- Diffusion Soup: Model Merging for Text-to-Image Diffusion Models.- Volumetric Rendering with Baked Quadrature Fields.- CityGuessr: City-Level Video Geo-Localization on a Global Scale.- Pseudo-Labelling Should Be Aware of Disguising Channel Activations.- Bayesian Detector Combination for Object Detection with Crowdsourced Annotations.- Revising Densification in Gaussian Splatting.- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing.- Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss.- Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions.- UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation.- PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis.- R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding.- A Graph-Based Approach for Category-Agnostic Pose Estimation.
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis: Depth-guided NeRF Training via Earth Mover's Distance.- INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding.- DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks.- Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time.- Diagnosing and Re-learning for Balanced Multimodal Learning.- Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration.- Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders.- BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion.- SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views.- MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning.- Discovering Unwritten Visual Classifiers with Large Language Models.- LITA: Language Instructed Temporal-Localization Assistant.- MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain.- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs.- Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data.- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation.- CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection.- SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging.- Minimalist Vision with Freeform Pixels.- All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation.- LatentEditor: Text Driven Local Editing of 3D Scenes.- Single-Photon 3D Imaging with Equi-Depth Photon Histograms.- Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision.- Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models.- POET: Prompt Offset Tuning for Continual Human Action Adaptation.- Domain Generalization of 3D Object Detection by Density-Resampling.- IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers.
£59.99
Springer Computer Vision ECCV 2024
Book Synopsis: Reinforcement Learning Friendly Vision-Language Model for Minecraft.- Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation.- Training-free Composite Scene Generation for Layout-to-Image Synthesis.- Robustness Preserving Fine-tuning using Neuron Importance.- ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation.- PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation.- Similarity of Neural Architectures using Adversarial Attack Transferability.- Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers.- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation.- OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web.- AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering.- Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models.- Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks.- Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation.- MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos.- Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement.- Scene-Conditional 3D Object Stylization and Composition.- GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning.- Revisit Anything: Visual Place Recognition via Image Segment Retrieval.- EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching.- DGD: Dynamic 3D Gaussians Distillation.- Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation.- DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation.- Self-Guided Generation of Minority Samples Using Diffusion Models.- DEVIAS: Learning Disentangled Video Representations of Action and Scene.- AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset.- RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting.
£59.99
Springer Computer Vision ECCV 2024
Book Synopsis: WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding.- Spiking Wavelet Transformer.- WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing.- PDT Uav Target Detection Dataset for Pests and Diseases Tree.- Hypernetworks for Generalizable BRDF Representation.- Photon Inhibition for Energy-Efficient Single-Photon Imaging.- COD: Learning Conditional Invariant Representation for Domain Adaptation Regression.- RANRAC: Robust Neural Scene Representations via Random Ray Consensus.- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model.- Characterizing Model Robustness via Natural Input Gradients.- UpFusion: Novel View Diffusion from Unposed Sparse View Observations.- Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding.- SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks.- Tuning-Free Image Customization with Image and Text Guidance.- FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification.- Emerging Property of Masked Token for Effective Pre-training.- DQ-DETR: DETR with Dynamic Query for Tiny Object Detection.- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation.- SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians.- Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections.- Few-shot Defect Image Generation based on Consistency Modeling.- Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits.- CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs.- Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning.- Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline.- Video Editing via Factorized Diffusion Distillation.- Trackastra: Transformer-based cell tracking for live-cell microscopy.
£64.99
Springer Computer Vision ECCV 2024
Book Synopsis: SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions.- InterFusion: Text-Driven Generation of 3D Human-Object Interaction.- GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval.- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving.- Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition.- NeRF-XL: NeRF at Any Scale with Multi-GPU.- CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems.- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?.- Compositional Substitutivity of Visual Reasoning for Visual Question Answering.- LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models.- DNI: Dilutional Noise Initialization for Diffusion Video Editing.- Two-Stage Video Shadow Detection via Temporal-Spatial Adaption.- Towards Physical World Backdoor Attacks against Skeleton Action Recognition.- SAM-guided Graph Cut for 3D Instance Segmentation.- Fully Authentic Visual Question Answering Dataset from Online Communities.- Active Generation for Image Classification.- FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors.- Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes.- Understanding Multi-compositional learning in Vision and Language models via Category Theory.- FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients.- Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration.- Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image.- Diffusion-Guided Weakly Supervised Semantic Segmentation.- Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment.- When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset.- NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image.- Segment and Recognize Anything at Any Granularity.
£64.99