NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting

Abstract

Traditional volume visualization (VolVis) methods, such as direct volume rendering, suffer from rigid transfer function designs and high computational costs. Although novel view synthesis approaches improve rendering efficiency, they still demand additional learning effort from non-experts and lack support for semantic-level interaction. To bridge this gap, we propose NLI4VolVis, an interactive system that enables users to explore, query, and edit volumetric scenes using natural language.

NLI4VolVis integrates multi-view semantic segmentation and vision-language models to extract and understand semantic components in a scene. We introduce a multi-agent large language model (LLM) architecture equipped with extensive function-calling tools to interpret user intents and execute visualization tasks. The agents leverage external tools and declarative VolVis commands to interact with a VolVis engine powered by editable 3D Gaussians, enabling open-vocabulary object querying, real-time scene editing, best-view selection, and 2D stylization.
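The agents act on the scene through function-calling tools. As a minimal sketch, assuming the LLM emits each tool call as a JSON object with `name` and `arguments` fields (a common convention, not necessarily the format NLI4VolVis uses), a tool registry and dispatcher could look like the following. The `SCENE` dictionary and the tools `set_opacity` and `set_color` are hypothetical stand-ins for the real VolVis engine commands.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory scene state: component name -> editable properties.
SCENE: Dict[str, Dict[str, Any]] = {
    "fins": {"opacity": 1.0, "color": (0.8, 0.5, 0.2)},
    "body": {"opacity": 1.0, "color": (0.6, 0.6, 0.6)},
}

# Registry mapping tool names (as exposed to the LLM) to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so an agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_opacity(component: str, value: float) -> str:
    # Clamp to the valid opacity range [0, 1].
    SCENE[component]["opacity"] = max(0.0, min(1.0, value))
    return f"{component} opacity set to {SCENE[component]['opacity']}"


@tool
def set_color(component: str, rgb: List[float]) -> str:
    SCENE[component]["color"] = tuple(rgb)
    return f"{component} color set to {tuple(rgb)}"


def dispatch(tool_call_json: str) -> str:
    """Execute one tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Report unknown tools back to the agent instead of crashing.
        return f"error: unknown tool {call['name']!r}"
    return fn(**call["arguments"])


# A call an LLM might emit for "make the fins semi-transparent".
print(dispatch('{"name": "set_opacity", "arguments": {"component": "fins", "value": 0.3}}'))
```

Returning a string from every tool lets the result be fed back to the agent as an observation, which is one common way to let an LLM verify that an edit took effect.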

We validate our system through case studies and a user study, highlighting its improved accessibility and usability in volumetric data exploration.

Case Studies

We demonstrate the capabilities of NLI4VolVis through six representative case studies, showcasing the system's ability to handle diverse datasets and natural language interactions across different visualization scenarios.

Carp

Biological dataset segmented into seven components, explored through guided tours and structure highlighting. Users can isolate the fins, enhance specific parts like the pectoral fin, and apply stylized transformations such as "Transform the entire fish into a cyborg" in real time.

Backpack

CT scan exploration using vague but natural descriptions like "the box," "the storage container," or "the square-shaped object." The system resolves these open-ended references through vision-language embeddings, identifying and highlighting the correct components.
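Open-vocabulary matching of this kind is typically done by comparing embeddings. The sketch below assumes each segmented component already has an embedding from a vision-language model and that the user's phrase is embedded into the same space; it returns the component with the highest cosine similarity. The three-dimensional vectors, the component names, and the `threshold` parameter are made-up illustrations, not values from the system (real embeddings have hundreds of dimensions).

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def query_component(text_emb: List[float],
                    component_embs: Dict[str, List[float]],
                    threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching component, or None if no match is close enough."""
    best_name, best_score = None, -1.0
    for name, emb in component_embs.items():
        score = cosine(text_emb, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical toy embeddings for three segmented backpack components.
COMPONENTS = {
    "box": [0.9, 0.1, 0.0],
    "strap": [0.1, 0.9, 0.1],
    "bottle": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the phrase "the storage container".
STORAGE_CONTAINER = [0.8, 0.2, 0.1]
print(query_component(STORAGE_CONTAINER, COMPONENTS))  # box
```

The threshold lets the system answer "not found" (or ask a clarifying question) instead of always highlighting the nearest component.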

Chameleon

Context-driven visualization demonstrating adaptive lighting and environmental styling. When asked "How might the chameleon look in a desert?" the system changes the chameleon's skin color to match a desert environment and stylizes the scene accordingly through iterative perception and reasoning.

Kingsnake

Implicit instruction interpretation where "Display the snake inside the egg" leads the system to render a fully visible snake within a semi-transparent egg, demonstrating natural language understanding and automatic opacity adjustment.
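One way to express this kind of edit is as a short list of declarative commands applied to per-component opacities. The command schema below (`op`, `target`, `value`) and the 0.25 egg opacity are hypothetical; the sketch only illustrates how an implicit instruction could reduce to explicit opacity settings, and how an `isolate` command could serve requests such as "Show me only the blue part."

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scene:
    # Hypothetical per-component opacity table (0 = hidden, 1 = fully opaque).
    opacity: Dict[str, float] = field(default_factory=dict)

    def apply(self, commands: List[dict]) -> None:
        """Apply declarative commands in order."""
        for cmd in commands:
            if cmd["op"] == "set_opacity":
                self.opacity[cmd["target"]] = cmd["value"]
            elif cmd["op"] == "isolate":
                # Keep only the target visible; hide every other component.
                for name in self.opacity:
                    self.opacity[name] = 1.0 if name == cmd["target"] else 0.0
            else:
                raise ValueError(f"unknown op: {cmd['op']}")


scene = Scene(opacity={"snake": 1.0, "egg": 1.0})
# "Display the snake inside the egg": keep the snake opaque and make the
# enclosing egg semi-transparent so the snake shows through.
scene.apply([
    {"op": "set_opacity", "target": "snake", "value": 1.0},
    {"op": "set_opacity", "target": "egg", "value": 0.25},
])
print(scene.opacity)  # {'snake': 1.0, 'egg': 0.25}
```

Keeping edits declarative means the LLM only has to produce a small, checkable data structure, while the rendering engine owns the actual state changes.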

Mantle Temperature

Temperature simulation with colormap visualization from green (coolest) to orange (hottest). Users can highlight specific temperature regions, for example "hot mantle in bright blue," and apply creative stylizations such as transforming the scene to resemble a Black Forest cake, combining scientific precision with artistic creativity.
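The green-to-orange colormap and the blue highlight can be sketched as linear interpolation plus a range override. The exact RGB endpoints, the `highlight` interval, and the temperature values in the example are assumptions for illustration; the case study does not specify them.

```python
from typing import Optional, Tuple

RGB = Tuple[float, float, float]

GREEN: RGB = (0.0, 0.8, 0.0)        # assumed color for the coolest value
ORANGE: RGB = (1.0, 0.5, 0.0)       # assumed color for the hottest value
BRIGHT_BLUE: RGB = (0.0, 0.5, 1.0)  # assumed highlight color


def colormap(t: float) -> RGB:
    """Linearly interpolate from GREEN (t=0) to ORANGE (t=1)."""
    t = max(0.0, min(1.0, t))
    return tuple((1.0 - t) * g + t * o for g, o in zip(GREEN, ORANGE))


def color_with_highlight(temp: float, t_min: float, t_max: float,
                         highlight: Optional[Tuple[float, float]] = None) -> RGB:
    """Map a temperature to a color, overriding values inside `highlight`."""
    if highlight is not None and highlight[0] <= temp <= highlight[1]:
        return BRIGHT_BLUE
    return colormap((temp - t_min) / (t_max - t_min))


# Hypothetical temperatures in arbitrary units on a 0..100 range;
# "hot mantle in bright blue" is modeled as highlighting 80..100.
print(color_with_highlight(90.0, 0.0, 100.0, highlight=(80.0, 100.0)))  # (0.0, 0.5, 1.0)
print(color_with_highlight(0.0, 0.0, 100.0, highlight=(80.0, 100.0)))   # (0.0, 0.8, 0.0)
```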

Supernova

Astrophysical simulation exploration with scientific reasoning. When asked "Tell me what the green part is about," the system identifies it as the shockwave. Follow-up queries like "Show me only the blue part" lead to isolation of turbulent plasma with detailed explanations, demonstrating combined visual reasoning and scientific knowledge.

System Pipeline

NLI4VolVis employs a multi-stage pipeline that integrates natural language processing with volume visualization. The system consists of four main components:

1. Semantic Understanding: Multi-view semantic segmentation and vision-language models extract and understand semantic components within volumetric scenes, enabling open-vocabulary object querying.

2. Editable 3D Gaussian Rendering: The VolVis engine powered by editable 3D Gaussians provides real-time scene editing, best-view selection, and 2D stylization capabilities.

3. LLM Multi-Agent Collaboration: A coordinated system of specialized language model agents interprets user requests, leveraging function-calling tools to decompose complex tasks into executable visualization commands.

4. Natural Language Interface: Users input natural language queries through an intuitive chat interface, allowing them to express complex visualization intents without technical expertise.
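The division of labor described above can be illustrated with a toy coordinator that splits a request into subtasks and routes each one to a specialized agent. This is a hypothetical sketch: the agent names, the keyword-based `plan` function, and the command dictionaries are placeholders, and in NLI4VolVis the planning and routing are performed by LLM agents rather than by string matching.

```python
from typing import Callable, Dict, List, Tuple

# Each specialized agent is stubbed as a function from a subtask string to a
# declarative command; a real system would call an LLM here.
Agent = Callable[[str], dict]


def query_agent(task: str) -> dict:
    return {"op": "query", "text": task}


def edit_agent(task: str) -> dict:
    return {"op": "edit", "text": task}


def view_agent(task: str) -> dict:
    return {"op": "best_view", "text": task}


AGENTS: Dict[str, Agent] = {"query": query_agent, "edit": edit_agent, "view": view_agent}
EDIT_WORDS = ("make", "color", "hide", "opacity")


def plan(request: str) -> List[Tuple[str, str]]:
    """Stand-in coordinator: split on ' then ' and route each part by keyword."""
    steps = []
    for part in request.split(" then "):
        lower = part.lower()
        if "view" in lower:
            steps.append(("view", part))
        elif any(word in lower for word in EDIT_WORDS):
            steps.append(("edit", part))
        else:
            steps.append(("query", part))
    return steps


def run(request: str) -> List[dict]:
    """Plan the request, then let each assigned agent emit its command."""
    return [AGENTS[role](subtask) for role, subtask in plan(request)]


for cmd in run("find the pectoral fin then make it red then show the best view"):
    print(cmd)
```

Separating planning from execution keeps each agent's prompt and tool set small, which is one motivation for multi-agent designs of this kind.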

System pipeline overview showing the integration of natural language processing, multi-agent coordination, semantic understanding, and 3D Gaussian-based volume rendering.

Interactive Interface

NLI4VolVis features an intuitive interface that democratizes volume visualization through natural language interaction. The interface combines traditional visualization controls with conversational AI, allowing users to express their visualization intents in plain English rather than learning complex technical parameters.

The NLI4VolVis interface consists of four key components: (a) panel, (b) rendering window, (c) chat widget, and (d) action log.

User Study

We conducted a user study to evaluate the effectiveness and usability of NLI4VolVis. The study involved eight participants with varying levels of visualization expertise, from novices to domain experts, to assess the system's accessibility and practical utility.

Key Advantages

The user study revealed several significant advantages of our natural language approach:

Improved Accessibility: Novice users demonstrated significantly faster task completion times and reduced learning curves when using natural language commands.
Reduced Cognitive Load: Users no longer needed to memorize complex parameter relationships or technical visualization terminology, allowing them to focus on data analysis rather than interface mechanics.
Increased Exploration Efficiency: The multi-agent system enabled users to perform complex visualization tasks through simple conversational exchanges, leading to more comprehensive data exploration.

Identified Shortcomings

Despite the overall positive reception, the study also identified areas for improvement:

Ambiguity in Natural Language: Some participants experienced difficulties when their natural language queries were ambiguous or imprecise, requiring multiple iterations to achieve desired results.
LLM Latency: Response delays from the language model agents occasionally disrupted the interactive flow, particularly for complex multi-step visualization tasks.
Limited Domain Coverage: The system's effectiveness was constrained by the predefined function tools, occasionally struggling with highly specialized domain-specific terminology.

Survey responses to the nine post-study questionnaire questions, with mean scores (on a 1–5 scale) and standard deviations.

Overall, the user study demonstrates that NLI4VolVis successfully democratizes volume visualization by making it more accessible to non-expert users while maintaining the sophistication required for complex data exploration tasks. The identified shortcomings provide valuable directions for future improvements, particularly in handling natural language ambiguity and LLM response latency.

BibTeX

@article{ai2025nli4volvis,
  author  = {Ai, Kuangshi and Tang, Kaiyuan and Wang, Chaoli},
  title   = {{NLI4VolVis}: Natural Language Interaction for Volume Visualization via Multi-{LLM} Agents and Editable {3D Gaussian} Splatting},
  journal = {IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2025)},
  volume  = {32},
  number  = {1},
  year    = {2026},
  note    = {Conditionally Accepted}
}