AI generation of 3D objects and environments - DreamFusion, Point-E, Gaussian Splatting, and virtual worlds

Text-to-3D and World Generation

AI-based 3D generation creates objects, characters, and entire environments from text descriptions or images.

Evolution of 3D generation

┌─────────────────────────────────────────────────────────────────┐
│                   EVOLUTION OF 3D GENERATION                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  TRADITIONAL (before 2022)                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  3D artist → Manual modeling → Texturing → Rendering     │   │
│  │  Time: hours to days per object                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  2022: DREAMFUSION / CLIP-MESH                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Text → Diffusion-guided optimization → 3D NeRF/mesh     │   │
│  │  Time: 1-2 hours per object (high-end GPU/TPU)           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  2023: FEED-FORWARD MODELS                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Text/Image → Neural network → 3D in seconds             │   │
│  │  Time: 10-60 seconds                                     │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  2024: GAUSSIAN SPLATTING + GENERATION                          │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Text → 3D Gaussians → High-quality real-time rendering  │   │
│  │  Photorealistic quality, editable                        │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

3D representations

Types of representations

┌─────────────────────────────────────────────────────────────────┐
│                    3D REPRESENTATIONS FOR AI                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. MESH (triangles)                                            │
│     ┌───────────────┐                                           │
│     │    /\         │  Pros:                                    │
│     │   /  \        │  ✓ Industry standard                      │
│     │  /____\       │  ✓ Editable in Blender, Maya              │
│     │  |\    |      │  Cons:                                    │
│     │  | \   |      │  ✗ Topology is hard to learn              │
│     │  |__\__|      │  ✗ Fixed resolution                       │
│     └───────────────┘                                           │
│                                                                 │
│  2. NERF (Neural Radiance Field)                                │
│     ┌───────────────┐                                           │
│     │  f(x,y,z,θ,φ) │  Pros:                                    │
│     │      ↓        │  ✓ Photorealistic quality                 │
│     │  (color, σ)   │  ✓ View-dependent (reflections)           │
│     └───────────────┘  Cons:                                    │
│     Network predicts    ✗ Slow to render                         │
│     color + density     ✗ Hard to edit                           │
│                                                                 │
│  3. GAUSSIAN SPLATTING                                          │
│     ┌───────────────┐                                           │
│     │  ● ● ●        │  Pros:                                    │
│     │   ● ● ●       │  ✓ Real-time rendering (100+ fps)         │
│     │    ● ● ●      │  ✓ Quality close to NeRF                  │
│     └───────────────┘  ✓ Editable                               │
│     Millions of 3D      Cons:                                    │
│     Gaussians           ✗ Large files                            │
│                                                                 │
│  4. POINT CLOUD                                                 │
│     ┌───────────────┐                                           │
│     │  . . .        │  Simple but limited                       │
│     │   . . .       │  Used as an intermediate                  │
│     │    . . .      │                                           │
│     └───────────────┘                                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
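
To make the difference between a mesh and a point cloud concrete, here is a minimal NumPy sketch. It stores a tetrahedron as vertex and face arrays, then converts it to a point cloud by sampling the surface uniformly. The function names `triangle_areas` and `sample_surface` are illustrative and do not come from any of the libraries discussed here.

```python
import numpy as np

# Triangle mesh: vertex positions plus integer indices into them.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])  # tetrahedron


def triangle_areas(vertices, faces):
    """Area of every triangle, from the cross product of two edges."""
    a, b, c = (vertices[faces[:, k]] for k in range(3))
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)


def sample_surface(vertices, faces, n, rng):
    """Mesh -> point cloud: draw n points uniformly on the surface."""
    areas = triangle_areas(vertices, faces)
    # Pick triangles with probability proportional to their area.
    tri = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # Uniform barycentric coordinates (reflect samples outside the triangle).
    u, v = rng.random(n), rng.random(n)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    a, b, c = (vertices[faces[tri, k]] for k in range(3))
    return a + u[:, None] * (b - a) + v[:, None] * (c - a)


point_cloud = sample_surface(vertices, faces, 4096, np.random.default_rng(0))
print(point_cloud.shape)
```

The mesh keeps connectivity (which vertices form a surface), while the point cloud keeps only positions, which is why going back from points to a mesh requires a separate surface reconstruction step.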

DreamFusion and Score Distillation

DreamFusion architecture

┌─────────────────────────────────────────────────────────────────┐
│                           DREAMFUSION                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Prompt: "A red sports car"                                     │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │   ┌──────────────┐      ┌──────────────────────────┐     │   │
│  │   │  3D NeRF     │      │  Diffusion model         │     │   │
│  │   │ (optimized)  │      │  (Imagen, frozen)        │     │   │
│  │   └──────┬───────┘      └────────────┬─────────────┘     │   │
│  │          │                           │                   │   │
│  │          │ Render from               │                   │   │
│  │          │ random angle              │                   │   │
│  │          ▼                           │                   │   │
│  │   ┌──────────────┐                   │                   │   │
│  │   │  Rendered    │───────────────────┘                   │   │
│  │   │  2D image    │                                       │   │
│  │   └──────┬───────┘                                       │   │
│  │          │                                               │   │
│  │          ▼                                               │   │
│  │   ┌──────────────────────────────────────────────────┐   │   │
│  │   │  Score Distillation Sampling (SDS)               │   │   │
│  │   │                                                  │   │   │
│  │   │  "Does this image look like 'red sports car'     │   │   │
│  │   │   according to the diffusion model?"             │   │   │
│  │   │                                                  │   │   │
│  │   │  Gradient → updates the NeRF                     │   │   │
│  │   └──────────────────────────────────────────────────┘   │   │
│  │                                                          │   │
│  │  Repeat ~15,000 times from random viewpoints             │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Time: about 1.5 hours on TPU v4 in the original paper          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
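
The loop in the diagram can be written compactly. The sketch below follows the notation of the DreamFusion paper. The symbols are introduced here for illustration and do not appear in the text above: $g$ is the differentiable renderer, $\theta$ the NeRF parameters, $y$ the prompt, $\epsilon_\phi$ the frozen noise predictor, $w(t)$ a timestep weighting, $\alpha_t, \sigma_t$ the noise schedule, and $s$ the guidance scale.

```latex
x = g(\theta), \qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I)

\hat{\epsilon} = \epsilon_\phi(x_t; \varnothing, t)
  + s \left( \epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t) \right)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon} \left[ w(t) \left( \hat{\epsilon} - \epsilon \right)
    \frac{\partial x}{\partial \theta} \right]
```

The key design choice is that the Jacobian of the diffusion network is dropped: the residual $\hat{\epsilon} - \epsilon$ is treated as a constant and pushed back only through the renderer, which is what keeps each step cheap.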

Conceptual implementation

# Score Distillation Sampling (SDS) - simplified concept

import torch
import torch.nn.functional as F

class ScoreDistillationLoss:
    """
    Uses a frozen, pre-trained diffusion model
    to guide the 3D optimization
    """

    def __init__(self, diffusion_model, guidance_scale=100):
        self.diffusion_model = diffusion_model
        self.guidance_scale = guidance_scale

    def __call__(
        self,
        rendered_image: torch.Tensor,  # Image rendered from the NeRF
        text_embedding: torch.Tensor,   # Prompt embedding
        noise_level: float = 0.5,
    ) -> torch.Tensor:
        """Computes the SDS loss"""
        # 1. Add noise to the rendered image
        # (simplified linear mix; real schedules use separate scale factors)
        noise = torch.randn_like(rendered_image)
        noisy_image = rendered_image * (1 - noise_level) + noise * noise_level

        # 2. Predict the noise with the frozen diffusion model
        with torch.no_grad():
            # Text-conditioned prediction
            predicted_noise_cond = self.diffusion_model(
                noisy_image,
                text_embedding,
                noise_level
            )
            # Unconditional prediction
            predicted_noise_uncond = self.diffusion_model(
                noisy_image,
                None,
                noise_level
            )

        # 3. Classifier-free guidance
        predicted_noise = predicted_noise_uncond + self.guidance_scale * (
            predicted_noise_cond - predicted_noise_uncond
        )

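        # 4. SDS gradient: (predicted_noise - noise) is applied directly to
        # the rendered image, without backpropagating through the diffusion
        # model. The surrogate loss below has exactly that gradient with
        # respect to rendered_image, which then flows back into the NeRF.
        # (An MSE between noise and predicted_noise would carry no gradient,
        # since neither tensor depends on the NeRF parameters.)
        grad = predicted_noise - noise
        target = (rendered_image - grad).detach()
        loss = 0.5 * F.mse_loss(rendered_image, target, reduction="sum")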

        return loss


class DreamFusionOptimizer:
    """Optimizes a NeRF with Score Distillation"""

    def __init__(self, nerf_model, diffusion_model, prompt: str):
        self.nerf = nerf_model
        self.sds = ScoreDistillationLoss(diffusion_model)
        # encode_text and random_camera_pose are placeholder helpers,
        # not defined in this conceptual sketch
        self.text_embedding = encode_text(prompt)
        self.optimizer = torch.optim.Adam(nerf_model.parameters(), lr=1e-3)

    def train_step(self):
        # 1. Sample a random viewpoint
        camera_pose = random_camera_pose()

        # 2. Render the image from the NeRF
        rendered = self.nerf.render(camera_pose)

        # 3. Compute the SDS loss
        loss = self.sds(rendered, self.text_embedding)

        # 4. Backpropagate and update
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss.item()

Feed-forward models

Point-E (OpenAI)

┌─────────────────────────────────────────────────────────────────┐
│                             POINT-E                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Two-stage pipeline:                                            │
│                                                                 │
│  Text: "A green chair"                                          │
│           │                                                     │
│           ▼                                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  STAGE 1: Text-to-Image                                  │   │
│  │  GLIDE generates a 2D image from the prompt              │   │
│  └─────────────────────┬────────────────────────────────────┘   │
│                        │                                        │
│                        ▼                                        │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  STAGE 2: Image-to-3D                                    │   │
│  │  Point cloud diffusion (1024 → 4096 points)              │   │
│  └─────────────────────┬────────────────────────────────────┘   │
│                        │                                        │
│                        ▼                                        │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  OPTIONAL: Point cloud → Mesh                            │   │
│  │  (Surface reconstruction)                                │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Time: 1-2 minutes on a single GPU                              │
│  Quality: Medium (basic point cloud)                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

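# Using Point-E (adapted from the text-to-point-cloud example in the
# openai/point-e repository; check it against your installed version)
import torch
from tqdm.auto import tqdm

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.download import load_checkpoint
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.util.plotting import plot_point_cloud

# Load the models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 1. Base model: text-conditioned point cloud diffusion (1024 points).
#    This released checkpoint goes directly from text to points,
#    without the intermediate GLIDE image.
print('Loading base model...')
base_name = 'base40M-textvec'
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint(base_name, device))
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

# 2. Upsampler: 1024 -> 4096 points
print('Loading upsampler model...')
upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.eval()
upsampler_model.load_state_dict(load_checkpoint('upsample', device))
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),  # only the base model sees the text
)

# Generate
prompt = "a red motorcycle"
samples = None
for x in tqdm(sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x

# Visualize
point_cloud = sampler.output_to_point_clouds(samples)[0]
fig = plot_point_cloud(point_cloud)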

Shap-E (OpenAI)

# Shap-E - directly generates NeRF/mesh parameters
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the models
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# Generate
prompt = "a shark"

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode to a mesh
from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    # write_obj expects an open text file rather than a path
    with open(f'mesh_{i}.obj', 'w') as f:
        mesh.write_obj(f)

Recent models (2024)

┌─────────────────────────────────────────────────────────────────┐
│                     TEXT-TO-3D MODELS 2024                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  INSTANT3D (Adobe Research)                                     │
│  • Feed-forward, no per-object optimization                     │
│  • ~20 seconds per object                                       │
│  • Quality close to DreamFusion                                 │
│                                                                 │
│  MESHY                                                          │
│  • Commercial API                                               │
│  • Text-to-3D and Image-to-3D                                   │
│  • Export to GLB, FBX, OBJ, USDZ                                │
│                                                                 │
│  TRIPO3D                                                        │
│  • Commercial; open-source TripoSR for image-to-3D              │
│  • Generation in ~10 seconds                                    │
│  • Good quality/speed trade-off                                 │
│                                                                 │
│  LUMA GENIE                                                     │
│  • Commercial API                                               │
│  • Very high quality                                            │
│  • Detailed textures                                            │
│                                                                 │
│  COMPARISON:                                                    │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Model       │ Time    │ Quality │ Textures │ Price       │   │
│  ├──────────────────────────────────────────────────────────┤   │
│  │ DreamFusion │ 1-2 h   │ ★★★★    │ ★★★      │ GPU time    │   │
│  │ Point-E     │ 1-2 min │ ★★      │ ★        │ Free        │   │
│  │ Shap-E      │ 30 s    │ ★★★     │ ★★       │ Free        │   │
│  │ Meshy       │ 2 min   │ ★★★★    │ ★★★★     │ $$$         │   │
│  │ Luma Genie  │ 1 min   │ ★★★★★   │ ★★★★★    │ $$$$        │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Gaussian Splatting

Principle

┌─────────────────────────────────────────────────────────────────┐
│                      3D GAUSSIAN SPLATTING                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Represents a scene with millions of 3D "Gaussians"             │
│                                                                 │
│  Each Gaussian has:                                             │
│  • Position (x, y, z)                                           │
│  • Covariance (size and orientation)                            │
│  • Opacity                                                      │
│  • Color (spherical harmonics for view dependence)              │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │      ●●●                                                 │   │
│  │     ●●●●●        Millions of Gaussians                   │   │
│  │    ●●●●●●●       make up the object                      │   │
│  │     ●●●●●                                                │   │
│  │      ●●●                                                 │   │
│  │                                                          │   │
│  │  Rendering: projection + alpha compositing               │   │
│  │  → 100+ FPS in real time!                                │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ADVANTAGES:                                                    │
│  ✓ Quality close to NeRF                                        │
│  ✓ Rendering roughly 100x faster than NeRF                      │
│  ✓ Editable (move or delete Gaussians)                          │
│  ✓ No neural network at render time                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
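
Two of the ingredients listed in the box can be sketched in a few lines of NumPy. The first is the covariance built from a per-axis scale and a rotation quaternion (Σ = R S Sᵀ Rᵀ, the parameterization used in the 3D Gaussian Splatting paper). The second is front-to-back alpha compositing of depth-sorted splats for a single pixel. This is a simplified illustration, not the CUDA rasterizer from the reference implementation; the function names are chosen here and the projection to screen space is left out.

```python
import numpy as np


def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given as (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])


def covariance(scale, quat):
    """3D covariance of one Gaussian: rotate a diagonal scale matrix."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    M = R @ np.diag(scale)
    return M @ M.T  # symmetric positive semi-definite by construction


def composite(colors, alphas):
    """Blend depth-sorted splats (nearest first) covering one pixel."""
    pixel = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        pixel += transmittance * alpha * np.asarray(color, dtype=float)
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# A half-transparent red splat in front of a half-transparent green one.
pixel, remaining = composite([[1, 0, 0], [0, 1, 0]], [0.5, 0.5])
print(pixel, remaining)
```

Storing scale and rotation separately, rather than the six free entries of Σ, keeps the covariance valid under gradient descent. Because rendering only needs a sort and this blending loop, no network has to be evaluated per pixel.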

Using Gaussian Splatting

# Training Gaussian Splatting from images
# Requires the repo: github.com/graphdeco-inria/gaussian-splatting

"""
# 1. Prepare the data (images + camera poses)
# Required layout:
# data/
#   ├── images/
#   │   ├── 000.jpg
#   │   ├── 001.jpg
#   │   └── ...
#   └── sparse/  (COLMAP poses)

# 2. Training
python train.py -s data/ -m output/

# 3. Real-time visualization
SIBR_viewers/bin/SIBR_gaussianViewer_app -m output/
"""

# For text-to-3D generation with Gaussian Splatting:
# the DreamGaussian project

"""
# Installation
pip install -r requirements.txt

# Generate from text
python main.py --config configs/text.yaml \\
    prompt="a delicious hamburger" \\
    save_path=./output

# Generate from an image
python main.py --config configs/image.yaml \\
    input=./examples/hamburger.png \\
    save_path=./output
"""

DreamGaussian

┌─────────────────────────────────────────────────────────────────┐
│                          DREAMGAUSSIAN                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Combines Score Distillation + Gaussian Splatting               │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  STAGE 1: Fast generation (2 min)                        │   │
│  │  • Initializes with random Gaussians                     │   │
│  │  • Optimizes with SDS (diffusion model)                  │   │
│  │  • Result: raw 3D Gaussians                              │   │
│  │                                                          │   │
│  │  STAGE 2: Mesh refinement (1 min)                        │   │
│  │  • Extracts a mesh from the Gaussians                    │   │
│  │  • Automatic UV unwrapping                               │   │
│  │  • Generates a high-resolution texture                   │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Total: ~3 minutes (vs 1-2 h for DreamFusion)                   │
│  Quality: comparable to DreamFusion                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Virtual world generation

World models and procedural generation

┌─────────────────────────────────────────────────────────────────┐
│                       AI WORLD GENERATION                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  APPROACHES:                                                    │
│                                                                 │
│  1. ASSET GENERATION + PLACEMENT                                │
│     ┌────────────────────────────────────────────────────┐      │
│     │ LLM plans → Text-to-3D generates → Auto placement  │      │
│     └────────────────────────────────────────────────────┘      │
│     Ex: "An enchanted forest with a castle"                     │
│     → Generates: trees, castle, rocks, etc.                     │
│     → Places them automatically using rules                     │
│                                                                 │
│  2. DIFFUSION OVER THE WHOLE SCENE                              │
│     ┌────────────────────────────────────────────────────┐      │
│     │ Generates the complete 3D scene in one pass        │      │
│     └────────────────────────────────────────────────────┘      │
│     More coherent but harder                                    │
│                                                                 │
│  3. WORLD MODELS (simulation)                                   │
│     ┌────────────────────────────────────────────────────┐      │
│     │ AI that "imagines" how the environment evolves     │      │
│     └────────────────────────────────────────────────────┘      │
│     Ex: Genie (Google DeepMind), UniSim                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Genie (Google DeepMind)

┌─────────────────────────────────────────────────────────────────┐
│                              GENIE                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  "Generative Interactive Environments"                          │
│  Generates playable worlds from a single image                  │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  Starting image     Actions        Generated world       │   │
│  │  ┌──────────┐       ┌─────┐        ┌──────────┐          │   │
│  │  │  🏠      │   +   │ → ← │   =    │  🏠 →    │          │   │
│  │  │ platform │       │ ↑ ↓ │        │ playable │          │   │
│  │  └──────────┘       │ jump│        └──────────┘          │   │
│  │                     └─────┘                              │   │
│  │                                                          │   │
│  │  The model learns:                                       │   │
│  │  • Implicit physics (gravity, collisions)                │   │
│  │  • Game rules (platformer, puzzle)                       │   │
│  │  • Visual consistency                                    │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Dataset drawn from ~200,000 hours of Internet gameplay video   │
│  11 billion parameters                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Terrain generation

# AI terrain generation
# Combines diffusion + heightmap

import torch
import numpy as np

class TerrainGenerator:
    """Generates 3D terrain from descriptions"""

    def __init__(self, diffusion_model):
        self.diffusion = diffusion_model

    def generate(self, prompt: str, size: int = 512) -> np.ndarray:
        """
        Generates a terrain from a prompt.

        Args:
            prompt: "A mountain range with a river valley"
            size: Heightmap resolution

        Returns:
            2D heightmap (can be converted to a mesh)
        """
        # 1. Generate a satellite view of the terrain
        satellite_view = self.diffusion.generate(
            f"Satellite view of {prompt}, top-down, terrain map"
        )

        # 2. Convert to a heightmap
        # (specialized model or depth estimation; image_to_heightmap,
        #  smooth_terrain and add_erosion are placeholders not defined here)
        heightmap = self.image_to_heightmap(satellite_view)

        # 3. Post-processing
        heightmap = self.smooth_terrain(heightmap)
        heightmap = self.add_erosion(heightmap)

        return heightmap

    def heightmap_to_mesh(self, heightmap: np.ndarray) -> "Mesh":
        """Converts a heightmap to a 3D mesh"""
        h, w = heightmap.shape
        vertices = []
        faces = []

        # Create the vertices
        for y in range(h):
            for x in range(w):
                vertices.append([
                    x / w,
                    heightmap[y, x],
                    y / h
                ])

        # Create the faces (two triangles per grid cell)
        for y in range(h - 1):
            for x in range(w - 1):
                i = y * w + x
                faces.append([i, i + w, i + 1])
                faces.append([i + 1, i + w, i + w + 1])

        return Mesh(vertices, faces)

# Usage
generator = TerrainGenerator(diffusion_model)
terrain = generator.generate("Alpine mountains with snow peaks and pine forest")
mesh = generator.heightmap_to_mesh(terrain)
mesh.export("terrain.obj")
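
The class above leaves `smooth_terrain` and the `Mesh` type undefined. As a rough, self-contained sketch of those pieces, the following NumPy version smooths a heightmap with a 3x3 box blur, builds the same grid triangulation without Python loops, and writes a Wavefront OBJ file. The blur is only a stand-in for whatever smoothing the author intended, `write_obj` is a hypothetical helper, and x and z are normalized by `cols - 1` and `rows - 1` (an assumption made here so the grid spans exactly [0, 1]).

```python
import numpy as np


def smooth_terrain(heightmap, passes=1):
    """3x3 box blur with edge padding, applied `passes` times."""
    h = heightmap.astype(float)
    rows, cols = h.shape
    for _ in range(passes):
        p = np.pad(h, 1, mode="edge")
        h = sum(p[dy:dy + rows, dx:dx + cols]
                for dy in range(3) for dx in range(3)) / 9.0
    return h


def heightmap_to_mesh(heightmap):
    """Return (vertices, faces) for a regular grid; needs at least 2x2 cells."""
    rows, cols = heightmap.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    vertices = np.stack(
        [xs / (cols - 1), heightmap, ys / (rows - 1)], axis=-1
    ).reshape(-1, 3)
    # Index of the top-left vertex of every grid cell.
    i = (ys[:-1, :-1] * cols + xs[:-1, :-1]).ravel()
    faces = np.concatenate([
        np.stack([i, i + cols, i + 1], axis=1),
        np.stack([i + 1, i + cols, i + cols + 1], axis=1),
    ])
    return vertices, faces


def write_obj(path, vertices, faces):
    """Write a minimal OBJ file (vertex positions and triangles only)."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        for a, b, c in faces + 1:  # OBJ indices start at 1
            f.write(f"f {a} {b} {c}\n")


# Example: a random 64x64 heightmap, smoothed twice, then triangulated.
rng = np.random.default_rng(0)
terrain = smooth_terrain(rng.random((64, 64)), passes=2)
vertices, faces = heightmap_to_mesh(terrain)
print(vertices.shape, faces.shape)
```

Calling `write_obj("terrain.obj", vertices, faces)` would then produce a file that Blender and most other 3D tools can import.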

Applications

Video games

┌─────────────────────────────────────────────────────────────────┐
│                     VIDEO GAME APPLICATIONS                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. ASSET GENERATION                                            │
│     • Props and decorative objects                              │
│     • Variations of the same object                             │
│     • Rapid prototyping                                         │
│                                                                 │
│  2. NPCs AND CHARACTERS                                         │
│     • Unique character generation                               │
│     • Player customization                                      │
│     • Varied crowds                                             │
│                                                                 │
│  3. PROCEDURAL ENVIRONMENTS                                     │
│     • Generated dungeons                                        │
│     • Endless landscapes                                        │
│     • Towns and villages                                        │
│                                                                 │
│  EXAMPLES:                                                      │
│  • Roblox: integrates AI for asset creation                     │
│  • Unity: Muse and Sentis AI tools                              │
│  • Epic: MetaHuman + generative AI                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Virtual reality and the metaverse

# AI-assisted VR world creation pipeline

class VRWorldBuilder:
    """VR world builder with AI assistance"""

    def __init__(
        self,
        text_to_3d,
        scene_understanding,
        physics_engine
    ):
        self.text_to_3d = text_to_3d
        self.scene_ai = scene_understanding
        self.physics = physics_engine

    def create_from_description(self, description: str) -> "VRScene":
        """
        Creates a complete VR scene from a description.

        Ex: "A cozy coffee shop with wooden furniture,
             large windows, and plants"
        """
        # 1. Parse the description
        scene_plan = self.scene_ai.plan_scene(description)
        # → {"room_type": "interior", "style": "cozy",
        #    "objects": ["table", "chairs", "counter", "plants"...]}

        # 2. Generate each object
        assets = {}
        for obj in scene_plan["objects"]:
            obj_description = f"{scene_plan['style']} {obj}"
            assets[obj] = self.text_to_3d.generate(obj_description)

        # 3. Place the objects intelligently
        layout = self.scene_ai.compute_layout(
            room_bounds=scene_plan["bounds"],
            objects=assets,
            constraints=scene_plan.get("constraints", [])
        )

        # 4. Add physics
        scene = VRScene()
        for obj_name, position in layout.items():
            scene.add_object(
                assets[obj_name],
                position=position,
                physics=self.physics.create_collider(assets[obj_name])
            )

        # 5. Automatic lighting
        scene.auto_lighting(style=scene_plan["style"])

        return scene

# Usage
builder = VRWorldBuilder(...)
coffee_shop = builder.create_from_description(
    "A cozy Parisian coffee shop with exposed brick walls, "
    "small round tables, vintage espresso machine"
)
coffee_shop.export_unity("coffee_shop.prefab")
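
The `compute_layout` step is where a planner has to turn a list of objects into positions. One very simple way to do that, shown here purely as an illustration, is rejection sampling over axis-aligned rectangular footprints: try random positions and keep the first one that does not overlap anything already placed. The signature below is hypothetical and simpler than the one used by `VRWorldBuilder` (it takes footprints rather than generated assets and ignores constraints).

```python
import random


def compute_layout(room_size, footprints, max_tries=1000, seed=0):
    """Place rectangles in a room without overlap.

    room_size:  (width, depth) of the floor
    footprints: {name: (width, depth)} of each object
    Returns {name: (x, z)}, the corner of each footprint nearest the origin.
    """
    rng = random.Random(seed)
    placed = {}  # name -> (x, z, w, d)
    # Place the largest objects first; they are the hardest to fit.
    order = sorted(footprints.items(), key=lambda kv: -kv[1][0] * kv[1][1])
    for name, (w, d) in order:
        for _ in range(max_tries):
            x = rng.uniform(0, room_size[0] - w)
            z = rng.uniform(0, room_size[1] - d)
            # Two rectangles are disjoint if they are separated along x or z.
            if all(x + w <= px or px + pw <= x or z + d <= pz or pz + pd <= z
                   for px, pz, pw, pd in placed.values()):
                placed[name] = (x, z, w, d)
                break
        else:
            raise RuntimeError(f"could not place {name!r}")
    return {name: (x, z) for name, (x, z, _, _) in placed.items()}


room = (5.0, 4.0)
furniture = {
    "counter": (2.0, 0.6),
    "table": (0.8, 0.8),
    "chair": (0.5, 0.5),
    "plant": (0.4, 0.4),
}
for name, (x, z) in compute_layout(room, furniture).items():
    print(f"{name}: x={x:.2f}, z={z:.2f}")
```

A real system would add semantic constraints (chairs next to tables, plants near windows), typically supplied by the LLM planning step, but the overlap test stays the same.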

Limitations and future

┌─────────────────────────────────────────────────────────────────┐
│                       CURRENT LIMITATIONS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ⚠ QUALITY VS TIME                                              │
│    • High quality = minutes to hours                            │
│    • Real time = reduced quality                                │
│    • Not yet production-ready for AAA titles                    │
│                                                                 │
│  ⚠ CONSISTENCY                                                  │
│    • Keeping an object consistent across viewpoints             │
│    • Consistent style across objects in a scene                 │
│    • Animations are hard to generate                            │
│                                                                 │
│  ⚠ CONTROL                                                      │
│    • Precise edits are difficult                                │
│    • 3D "prompt engineering" is an art                          │
│    • Results are sometimes unpredictable                        │
│                                                                 │
│  FUTURE TRENDS:                                                 │
│  → Feed-forward models with DreamFusion-level quality           │
│  → Fine-grained editing via prompts (like InstructPix2Pix)      │
│  → Coherent world generation in a single pass                   │
│  → Built-in animation (4D generation)                           │
│  → Native integration into game engines                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Further reading

DreamFusion Paper - Poole et al.
3D Gaussian Splatting
Point-E - OpenAI
Shap-E - OpenAI
DreamGaussian
Genie Paper - DeepMind
Meshy - commercial API
Luma AI - Genie and 3D capture
