WaterGen: Decoupling Scene and Medium in Underwater Image Generation

Jiayi Wu1,*, Tianfu Wang1,*, Tianyi Xiong1, Dehao Yuan1, Xiaomin Lin2, Md Jahidul Islam3, Cornelia Fermuller1, Christopher Metzler1, Yiannis Aloimonos1
1University of Maryland 2University of South Florida 3University of Florida
* Equal contribution
ECCV 2026
WaterGen synthesizes diverse underwater scenes while independently controlling physical water-medium effects such as attenuation, scattering, and background light.
WaterGen teaser showing controlled underwater image generation
Top-right images are real; top-left images are generated by WaterGen.

Overview

Underwater vision systems need large, varied, and accurately labeled data, but real underwater collection is expensive and hard to annotate. WaterGen addresses this by treating underwater generation as two separate controls: the scene content and the surrounding water medium.

The model first generates clean underwater scene latents using a LoRA-adapted latent diffusion backbone. A medium-conditioned decoder then applies physically meaningful attenuation and backscattering according to specified water parameters. This decoupling lets a single generated scene be rendered under many water conditions without changing the underlying geometry.
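This page does not spell out the physical model used by the decoder. As a rough illustration only, a widely used simplified underwater image formation model combines a directly attenuated scene signal with range-dependent backscatter: I = J * exp(-beta_D * z) + B_inf * (1 - exp(-beta_B * z)), evaluated per color channel. The sketch below applies that model in pixel space with NumPy. The function name `apply_medium`, its parameter names, and the coefficient values in the usage note are assumptions made for illustration; WaterGen's actual decoder is learned and operates on latents.

```python
import numpy as np


def apply_medium(clean, depth, beta_d, beta_b, b_inf):
    """Degrade a clean RGB image with a simplified underwater formation model.

    NOTE: illustrative stand-in, not WaterGen's learned decoder.

    clean:  (H, W, 3) float array in [0, 1], the clean scene J.
    depth:  (H, W) float array, range from camera to scene in meters (z).
    beta_d: (3,) per-channel attenuation coefficients in 1/m.
    beta_b: (3,) per-channel backscatter coefficients in 1/m.
    b_inf:  (3,) per-channel background (veiling) light in [0, 1].
    """
    z = np.asarray(depth, dtype=float)[..., None]  # (H, W, 1), broadcasts over channels
    beta_d = np.asarray(beta_d, dtype=float)
    beta_b = np.asarray(beta_b, dtype=float)
    b_inf = np.asarray(b_inf, dtype=float)

    # Direct signal: scene radiance attenuated exponentially with range.
    direct = np.asarray(clean, dtype=float) * np.exp(-beta_d * z)
    # Backscatter: veiling light that saturates toward b_inf as range grows.
    backscatter = b_inf * (1.0 - np.exp(-beta_b * z))
    return np.clip(direct + backscatter, 0.0, 1.0)
```

For example, calling `apply_medium(clean, depth, [0.40, 0.10, 0.05], [0.30, 0.10, 0.05], [0.05, 0.30, 0.40])` with hypothetical coefficients attenuates red fastest and pulls distant pixels toward a blue-green background color, while pixels at zero range are left unchanged.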

The resulting paired clean/degraded images serve as scalable synthetic training data for underwater restoration and semantic segmentation, improving downstream performance across multiple models and datasets.

Scene-Medium Decoupling

Comparison of SDXL, non-decoupled generation, and WaterGen medium control
WaterGen preserves scene structure while changing medium appearance. In contrast, the base SDXL model and a non-decoupled diffusion baseline often entangle object layout with water color and turbidity.

Controlled Underwater Generation

Qualitative comparison across underwater generation methods and water types
Given common water types from the UF7D setting, WaterGen produces realistic underwater degradations while maintaining text-image alignment and scene fidelity.

Proposed Method

Key idea. WaterGen separates underwater generation into two complementary parts: a diffusion model handles diverse scene generation, while a physically grounded decoder handles accurate local water-medium effects. This keeps object layout, scene geometry, and prompt alignment in the generative latent space, and moves attenuation, scattering, and background-light control to the stage where pixel-level radiometric accuracy matters most.

WaterGen inference pipeline
At inference time, WaterGen generates a clean semantic latent from a text prompt, then decodes it with explicit water-medium parameters. The same latent can be decoded repeatedly under different physical water conditions.
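The control flow described above can be sketched as follows: the latent is sampled once, then decoded once per medium setting. This is a minimal sketch, not WaterGen's API. The names `render_under_media`, `generate_latent`, and `decode` are hypothetical, and the two model stages are passed in as plain callables so that nothing here assumes a particular implementation.

```python
from typing import Any, Callable, List, Mapping, Sequence

# Hypothetical type aliases: a medium is any mapping of parameter names to values.
Medium = Mapping[str, Any]


def render_under_media(
    prompt: str,
    media: Sequence[Medium],
    generate_latent: Callable[[str], Any],
    decode: Callable[[Any, Medium], Any],
) -> List[Any]:
    """Generate one clean latent, then decode it under each medium setting.

    generate_latent and decode are stand-ins for the diffusion stage and the
    medium-conditioned decoder; both are assumptions made for illustration.
    """
    latent = generate_latent(prompt)  # diffusion sampling runs once per prompt
    # Every output shares the same latent, so scene content is held fixed
    # and only the medium parameters differ between renderings.
    return [decode(latent, medium) for medium in media]
```

Because sampling runs once per prompt, adding another water condition costs only one more decoder pass.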

WaterGen two-stage training pipeline
Training is split into two stages: clean-target diffusion fine-tuning for underwater scene semantics, followed by medium-conditioned decoder training with physically synthesized degradation.

Fine-Grained Medium Control

WaterGen medium control grid
Holding the generated scene fixed, WaterGen smoothly varies underwater appearance by changing physical medium parameters. This makes it possible to generate aligned clean/degraded pairs for learning under challenging water conditions.
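One simple way to produce such a smooth sweep is to interpolate linearly between two medium settings and decode the same scene at each step. The sketch below only builds the parameter grid. The `MediumParams` fields, the function `sweep_medium`, and linear interpolation itself are assumptions made for illustration; the page does not state how WaterGen parameterizes or samples its medium controls.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]


@dataclass(frozen=True)
class MediumParams:
    """Hypothetical per-channel (R, G, B) water-medium parameters."""
    attenuation: RGB       # attenuation coefficients, 1/m
    backscatter: RGB       # backscatter coefficients, 1/m
    background_light: RGB  # background (veiling) light in [0, 1]


def _lerp(a: RGB, b: RGB, t: float) -> RGB:
    """Linearly interpolate two RGB triples; t=0 gives a, t=1 gives b."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def sweep_medium(start: MediumParams, end: MediumParams, steps: int) -> List[MediumParams]:
    """Return `steps` evenly spaced settings from start to end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    grid = []
    for i in range(steps):
        t = i / (steps - 1)
        grid.append(
            MediumParams(
                attenuation=_lerp(start.attenuation, end.attenuation, t),
                backscatter=_lerp(start.backscatter, end.backscatter, t),
                background_light=_lerp(start.background_light, end.background_light, t),
            )
        )
    return grid
```

Pairing each setting in the grid with the same clean rendering yields aligned clean/degraded pairs at graded difficulty levels.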

Quantitative Results

On clear underwater scene generation, WaterGen improves visual quality and text-image alignment while adding explicit text + medium controllability.

Method   | UIQM ↑          | MUSIQ ↑          | CLIP Score ↑    | Controllability
Atlantis | 2.8338 ± 0.1927 | 67.5437 ± 1.7373 | 0.2457 ± 0.0274 | Text-only
TIDE     | 2.3725 ± 0.3816 | 66.4304 ± 2.1780 | 0.2305 ± 0.0118 | Text-only
WaterGen | 3.0239 ± 0.1317 | 69.2638 ± 0.8813 | 0.2614 ± 0.0073 | Text + Medium

Downstream Applications

Restoration results with WaterGen data augmentation
Synthetic pairs from WaterGen improve underwater restoration training by exposing models to broader attenuation and backscattering conditions.

Large segmentation comparison with WaterGen data augmentation
For underwater segmentation, WaterGen provides degraded images paired with reliable clean-image pseudo-labels, improving mask completeness under strong degradation.

Generation Diversity

Diverse zero-shot WaterGen underwater environments
WaterGen also generalizes to diverse and unusual text prompts, producing high-fidelity underwater scenes while keeping medium effects controllable.

BibTeX

@inproceedings{wu2026watergen,
  title={WaterGen: Decoupling Scene and Medium in Underwater Image Generation},
  author={Wu, Jiayi and Wang, Tianfu and Xiong, Tianyi and Yuan, Dehao and Lin, Xiaomin and Islam, Md Jahidul and Fermuller, Cornelia and Metzler, Christopher and Aloimonos, Yiannis},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2026}
}