WaterGen: Decoupling Scene and Medium in Underwater Image Generation

Jiayi Wu¹,*, Tianfu Wang¹,*, Tianyi Xiong¹, Dehao Yuan¹, Xiaomin Lin², Md Jahidul Islam³, Cornelia Fermuller¹, Christopher Metzler¹, Yiannis Aloimonos¹

¹University of Maryland ²University of South Florida ³University of Florida

* Equal contribution

ECCV 2026

WaterGen synthesizes diverse underwater scenes while independently controlling physical water-medium effects such as attenuation, scattering, and background light.

WaterGen teaser showing controlled underwater image generation

Top-right images are real; top-left images are generated by WaterGen.

Overview

Underwater vision systems need large, varied, and accurately labeled data, but real underwater collection is expensive and hard to annotate. WaterGen addresses this by treating underwater generation as two separate controls: the scene content and the surrounding water medium.

The model first generates clean underwater scene latents using a LoRA-adapted latent diffusion backbone. A medium-conditioned decoder then applies physically meaningful attenuation and backscattering according to specified water parameters. This decoupling lets a single generated scene be rendered under many water conditions without changing the underlying geometry.
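
The attenuation and backscattering terms can be illustrated with a commonly used underwater image formation model: the scene signal decays exponentially with distance, while backscatter grows toward the background light. This page does not state the exact formulation inside WaterGen's decoder, so the model choice, the function name `apply_medium`, and all coefficient values in this minimal sketch are assumptions made for illustration.

```python
import math

def apply_medium(clean_rgb, distance_m, beta_d, beta_b, background):
    """Degrade one RGB pixel (values in [0, 1]) seen through water.

    Assumed model, per channel c:
        I_c = J_c * exp(-beta_d_c * z) + B_c * (1 - exp(-beta_b_c * z))
    """
    degraded = []
    for j, bd, bb, b in zip(clean_rgb, beta_d, beta_b, background):
        direct = j * math.exp(-bd * distance_m)            # attenuated scene radiance
        scatter = b * (1.0 - math.exp(-bb * distance_m))   # backscattered veiling light
        degraded.append(min(1.0, max(0.0, direct + scatter)))
    return tuple(degraded)

# Illustrative coefficients only (red is attenuated fastest here).
BETA_D = (0.40, 0.10, 0.06)
BETA_B = (0.30, 0.12, 0.08)
BACKGROUND = (0.05, 0.35, 0.45)

near = apply_medium((0.8, 0.5, 0.3), 1.0, BETA_D, BETA_B, BACKGROUND)
far = apply_medium((0.8, 0.5, 0.3), 10.0, BETA_D, BETA_B, BACKGROUND)
```

With these placeholder values, `near` keeps most of the original color, while `far` loses red and drifts toward the background light. The scene value `clean_rgb` is never modified, which is the property the decoupling relies on.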

The resulting paired clean/degraded images become scalable synthetic training data for underwater restoration and semantic segmentation, improving downstream performance across multiple models and datasets.

Scene-Medium Decoupling

Comparison of SDXL, non-decoupled generation, and WaterGen medium control

WaterGen preserves scene structure while changing medium appearance. In contrast, base and non-decoupled diffusion baselines often entangle object layout with water color and turbidity.

Controlled Underwater Generation

Qualitative comparison across underwater generation methods and water types

Given common water types from the UF7D setting, WaterGen produces realistic underwater degradations while maintaining text-image alignment and scene fidelity.

Proposed Method

Key idea. WaterGen separates underwater generation into two complementary parts: a diffusion model handles diverse scene generation, while a physically grounded decoder handles accurate local water-medium effects. This keeps object layout, scene geometry, and prompt alignment in the generative latent space, and moves attenuation, scattering, and background-light control to the stage where pixel-level radiometric accuracy matters most.

At inference time, WaterGen generates a clean semantic latent from a text prompt, then decodes it with explicit water-medium parameters. The same latent can be decoded repeatedly under different physical water conditions.
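
The inference flow described above, one latent decoded under several media, might be organized as in the following sketch. Everything here is a hypothetical stand-in: `Medium`, `generate_clean_latent`, and `decode_with_medium` are not WaterGen's actual API, and the toy arithmetic only mimics the call structure, not the real diffusion model or decoder.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Medium:
    attenuation: float       # higher means stronger signal loss
    backscatter: float       # higher means more veiling light
    background_light: float  # intensity the haze converges to

def generate_clean_latent(prompt: str, seed: int = 0) -> tuple:
    """Toy stand-in for the diffusion stage: deterministic per prompt and seed."""
    rng = random.Random(f"{prompt}|{seed}")
    return tuple(rng.random() for _ in range(8))

def decode_with_medium(latent: tuple, medium: Medium) -> tuple:
    """Toy stand-in for the medium-conditioned decoder."""
    keep = math.exp(-medium.attenuation)
    haze = medium.background_light * (1.0 - math.exp(-medium.backscatter))
    return tuple(x * keep + haze for x in latent)

# Generate the scene once, then render it under several assumed water conditions.
latent = generate_clean_latent("a coral reef with a sea turtle")
media = {
    "clear": Medium(0.0, 0.0, 0.0),
    "coastal": Medium(0.6, 0.4, 0.5),
    "turbid": Medium(1.5, 1.2, 0.6),
}
renders = {name: decode_with_medium(latent, m) for name, m in media.items()}
```

Because the latent is immutable and reused, every entry in `renders` shares the same scene content and differs only through the medium parameters.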

Training is split into two stages: clean-target diffusion fine-tuning for underwater scene semantics, followed by medium-conditioned decoder training with physically synthesized degradation.

Fine-Grained Medium Control

Holding the generated scene fixed, WaterGen smoothly varies underwater appearance by changing physical medium parameters. This makes it possible to generate aligned clean/degraded pairs for learning under challenging water conditions.
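
A smooth sweep of this kind can be sketched as linear interpolation between two medium settings. The parameter names, the per-channel tuple layout, and the `CLEAR` and `TURBID` values are assumptions for illustration; the page does not specify how WaterGen parameterizes the medium.

```python
def lerp_medium(start: dict, end: dict, t: float) -> dict:
    """Linearly blend two parameter sets; t=0 gives start, t=1 gives end."""
    return {
        key: tuple((1.0 - t) * a + t * b for a, b in zip(start[key], end[key]))
        for key in start
    }

def medium_sweep(start: dict, end: dict, steps: int) -> list:
    """Return `steps` evenly spaced settings from start to end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp_medium(start, end, i / (steps - 1)) for i in range(steps)]

# Hypothetical per-channel (R, G, B) settings.
CLEAR = {
    "attenuation": (0.05, 0.02, 0.01),
    "backscatter": (0.02, 0.02, 0.02),
    "background_light": (0.00, 0.20, 0.30),
}
TURBID = {
    "attenuation": (1.20, 0.60, 0.50),
    "backscatter": (0.90, 0.70, 0.60),
    "background_light": (0.10, 0.40, 0.35),
}

sweep = medium_sweep(CLEAR, TURBID, steps=5)
```

Decoding the same scene once per entry in `sweep` would give a sequence of aligned images that range from nearly clean to heavily degraded.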

Quantitative Results

On clear underwater scene generation, WaterGen achieves the highest UIQM, MUSIQ, and CLIP Score among the compared methods, and it is the only one that offers both text and medium control.

Method	UIQM ↑	MUSIQ ↑	CLIP Score ↑	Controllability
Atlantis	2.8338 ± 0.1927	67.5437 ± 1.7373	0.2457 ± 0.0274	Text-only
TIDE	2.3725 ± 0.3816	66.4304 ± 2.1780	0.2305 ± 0.0118	Text-only
WaterGen	3.0239 ± 0.1317	69.2638 ± 0.8813	0.2614 ± 0.0073	Text + Medium

Downstream Applications

Restoration results with WaterGen data augmentation

Synthetic pairs from WaterGen improve underwater restoration training by exposing models to broader attenuation and backscattering conditions.
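
One way to assemble such training pairs is to render each clean image under several randomly sampled media. In this sketch the sampling ranges, the helper names, and `toy_degrade` are hypothetical placeholders; in practice the degradation would come from the medium-conditioned decoder.

```python
import random

def sample_medium(rng: random.Random) -> dict:
    """Draw one hypothetical medium setting; the ranges are illustrative only."""
    return {
        "attenuation": rng.uniform(0.05, 1.5),
        "backscatter": rng.uniform(0.05, 1.0),
        "background_light": rng.uniform(0.1, 0.7),
    }

def make_restoration_pairs(clean_images, degrade_fn, variants_per_image, seed=0):
    """Return (degraded, clean) pairs, with several media per clean image."""
    rng = random.Random(seed)
    pairs = []
    for clean in clean_images:
        for _ in range(variants_per_image):
            medium = sample_medium(rng)
            pairs.append((degrade_fn(clean, medium), clean))
    return pairs

def toy_degrade(image, medium):
    """Placeholder degradation so the sketch runs end to end."""
    scale = 1.0 / (1.0 + medium["attenuation"])
    return [px * scale + 0.1 * medium["backscatter"] for px in image]

clean_images = [[0.2, 0.8], [0.5, 0.5]]
pairs = make_restoration_pairs(clean_images, toy_degrade, variants_per_image=3)
```

Each clean target appears with three different degraded inputs, so a restoration model sees the same content under varied water conditions.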

Large segmentation comparison with WaterGen data augmentation

For underwater segmentation, WaterGen provides degraded images paired with reliable clean-image pseudo-labels, improving mask completeness under strong degradation.
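
Since the degraded renderings stay pixel-aligned with the clean image, a mask predicted on the clean image can be reused for every degraded copy. The sketch below illustrates that bookkeeping only; `attach_pseudo_labels` and `toy_segmenter` are hypothetical names, and the threshold rule stands in for a real segmentation model.

```python
def attach_pseudo_labels(clean_image, degraded_variants, segment_fn):
    """Label the clean image once and reuse the mask for each aligned degraded copy."""
    mask = segment_fn(clean_image)
    samples = []
    for degraded in degraded_variants:
        if len(degraded) != len(mask):
            raise ValueError("degraded image and mask must be pixel-aligned")
        samples.append((degraded, mask))
    return samples

def toy_segmenter(image, threshold=0.5):
    """Placeholder for a real segmentation model: bright pixels are foreground."""
    return [1 if px > threshold else 0 for px in image]

# Flattened toy images: two foreground pixels followed by two background pixels.
clean = [0.9, 0.8, 0.1, 0.2]
degraded_variants = [
    [0.45, 0.40, 0.30, 0.32],  # mild degradation
    [0.30, 0.29, 0.27, 0.28],  # strong degradation, contrast almost gone
]
samples = attach_pseudo_labels(clean, degraded_variants, toy_segmenter)
```

Applying the same toy rule directly to the strongly degraded copy would mark no foreground at all, which is the situation where labels inherited from the clean image help most.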

Generation Diversity

Diverse zero-shot WaterGen underwater environments

WaterGen also generalizes to diverse and unusual text prompts, producing high-fidelity underwater scenes while keeping medium effects controllable.

BibTeX

@inproceedings{wu2026watergen,
  title={WaterGen: Decoupling Scene and Medium in Underwater Image Generation},
  author={Wu, Jiayi and Wang, Tianfu and Xiong, Tianyi and Yuan, Dehao and Lin, Xiaomin and Islam, Md Jahidul and Fermuller, Cornelia and Metzler, Christopher and Aloimonos, Yiannis},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2026}
}