ELECTRONIC LETTERS ON COMPUTER VISION AND IMAGE ANALYSIS (ELCVIA), vol. 24, no. 2, pp. 273-285, 2026 (Scopus)
We address global occupancy-map completion from partial, sequential observations by framing it as conditional image generation. Using HouseExpo/SUNCG floorplans, we build a Gazebo+ROS pipeline that collects time-ordered exploration data and release a 128×128 dataset for reproducible benchmarking. Six model families are compared under a unified setup: Conditional VAE (CVAE), pix2pix-hinge Conditional GAN (CGAN), RePaint-style diffusion (DDPM), Vision Transformer (ViT) encoder-decoder, residual UNet, and a geometric, training-free baseline (GEOM). UNet achieves the lowest pixelwise errors (L1/MSE/RMSE) and the best discrete scores (IoU/F1/Accuracy). GEOM, despite being training-free, attains the highest SSIM, demonstrating that explicit geometric priors can rival data-driven learning when structural regularity dominates. Diffusion offers strong perceptual quality but at higher computational cost. Our findings reveal that occupancy-map completion is fundamentally a structure-exploitation task: convolutional architectures inherently favor rectilinear layouts, while attention-based models require more data to learn comparable priors. We release the dataset, ROS pipeline, and all implementations to enable reproducible research in generative mapping.
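The discrete scores mentioned above (IoU and F1) can be illustrated with a minimal sketch. This is not the paper's evaluation code; it only shows one standard way to compute the two overlap metrics between a predicted and a ground-truth binary occupancy map, where cells with value 1 are assumed to mean "occupied":

```python
import numpy as np

def occupancy_scores(pred, target):
    """IoU and F1 between two binary occupancy maps (1 = occupied)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()   # true positives
    union = np.logical_or(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()     # false positives
    fn = np.logical_and(~pred, target).sum()     # false negatives
    iou = inter / union if union else 1.0
    f1 = 2 * inter / (2 * inter + fp + fn) if (inter + fp + fn) else 1.0
    return float(iou), float(f1)

# Toy 2x2 maps: one overlapping occupied cell, one false positive.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
iou, f1 = occupancy_scores(pred, target)  # IoU = 0.5, F1 = 2/3
```

Both metrics ignore how far a wrong cell is from the true structure, which is one reason the abstract also reports SSIM as a structure-sensitive complement.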