Robotique et IA Incarnée

L'IA incarnée (Embodied AI) est l'intersection entre l'intelligence artificielle et la robotique, où les algorithmes doivent interagir avec le monde physique réel.

Vue d'ensemble

IA INCARNÉE - STACK TECHNOLOGIQUE :

┌─────────────────────────────────────────────────────────────┐
│                      INTELLIGENCE                            │
├─────────────────────────────────────────────────────────────┤
│  Planification    │  Apprentissage    │  Raisonnement       │
│  de tâches        │  par renforcement │  symbolique         │
├─────────────────────────────────────────────────────────────┤
│                      PERCEPTION                              │
├─────────────────────────────────────────────────────────────┤
│  Vision 3D        │  LIDAR/Radar      │  Capteurs tactiles  │
│  Reconnaissance   │  SLAM             │  Proprioception     │
├─────────────────────────────────────────────────────────────┤
│                      CONTRÔLE                                │
├─────────────────────────────────────────────────────────────┤
│  Cinématique      │  Dynamique        │  Contrôle moteur    │
│  inverse          │  du mouvement     │  temps réel         │
├─────────────────────────────────────────────────────────────┤
│                      MATÉRIEL                                │
├─────────────────────────────────────────────────────────────┤
│  Actionneurs      │  Capteurs         │  Processeurs        │
│  (moteurs, etc.)  │  (caméras, etc.)  │  embarqués          │
└─────────────────────────────────────────────────────────────┘

Types de robots IA

Classification par forme

Type	Exemples	Applications
Bras robotiques	Franka, UR5, Kuka	Industrie, chirurgie
Humanoïdes	Atlas, Optimus, Figure	Assistance, industrie
Quadrupèdes	Spot, ANYmal	Inspection, sécurité
Drones	DJI, Skydio	Livraison, cartographie
Véhicules autonomes	Waymo, Tesla	Transport
Robots mobiles	Roomba, AMR	Logistique, nettoyage

Niveaux d'autonomie

SPECTRE D'AUTONOMIE :

Téléopération    Semi-autonome     Autonome supervisé    Pleine autonomie
     │                 │                   │                    │
     ▼                 ▼                   ▼                    ▼
┌─────────┐      ┌─────────┐         ┌─────────┐         ┌─────────┐
│ Humain  │      │ Humain  │         │  Robot  │         │  Robot  │
│ contrôle│      │ supervise│        │ décide  │         │ décide  │
│ tout    │      │ + corrige│        │ humain  │         │  seul   │
│         │      │         │         │ valide  │         │         │
└─────────┘      └─────────┘         └─────────┘         └─────────┘

Exemples:
• Chirurgie    • Voitures      • Drones         • Aspirateurs
  robotique      assistées       industriels      robots

Perception robotique

Vision 3D avec caméras de profondeur

import pyrealsense2 as rs
import numpy as np
import cv2

class DepthCamera:
    def __init__(self):
        self.pipeline = rs.pipeline()
        config = rs.config()
        config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
        config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
        self.pipeline.start(config)

        # Alignement depth sur color
        self.align = rs.align(rs.stream.color)

    def get_frames(self) -> tuple:
        frames = self.pipeline.wait_for_frames()
        aligned = self.align.process(frames)

        depth_frame = aligned.get_depth_frame()
        color_frame = aligned.get_color_frame()

        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        return color_image, depth_image

    def get_point_cloud(self) -> np.ndarray:
        """Retourne un nuage de points 3D."""
        frames = self.pipeline.wait_for_frames()
        depth = frames.get_depth_frame()

        pc = rs.pointcloud()
        points = pc.calculate(depth)

        vertices = np.asanyarray(points.get_vertices())
        return np.array([[v[0], v[1], v[2]] for v in vertices])

    def get_distance(self, x: int, y: int) -> float:
        """Distance en mètres à un pixel donné."""
        frames = self.pipeline.wait_for_frames()
        depth = frames.get_depth_frame()
        return depth.get_distance(x, y)

SLAM (Simultaneous Localization and Mapping)

# Exemple conceptuel avec ORB-SLAM
import numpy as np

class SimpleSLAM:
    """SLAM simplifié basé sur des features visuelles."""

    def __init__(self):
        self.map_points = []  # Points 3D de la carte
        self.keyframes = []   # Images clés
        self.pose = np.eye(4)  # Pose caméra (4x4 transformation)

    def process_frame(self, image: np.ndarray, depth: np.ndarray):
        # 1. Extraire les features (ORB, SIFT, etc.)
        features = self.extract_features(image)

        # 2. Matcher avec la frame précédente
        matches = self.match_features(features, self.prev_features)

        # 3. Estimer le mouvement (PnP ou odométrie visuelle)
        motion = self.estimate_motion(matches, depth)
        self.pose = self.pose @ motion

        # 4. Ajouter des points à la carte
        new_points = self.triangulate_points(matches, depth)
        self.map_points.extend(new_points)

        # 5. Optimiser (bundle adjustment)
        if self.should_add_keyframe():
            self.keyframes.append((image, self.pose.copy()))
            self.optimize_map()

        return self.pose

    def extract_features(self, image):
        orb = cv2.ORB_create(nfeatures=1000)
        keypoints, descriptors = orb.detectAndCompute(image, None)
        return keypoints, descriptors

    def get_map(self) -> np.ndarray:
        """Retourne la carte 3D."""
        return np.array(self.map_points)

    def get_trajectory(self) -> list:
        """Retourne la trajectoire du robot."""
        return [kf[1][:3, 3] for kf in self.keyframes]

Détection et segmentation d'objets

from ultralytics import YOLO
import numpy as np

class ObjectDetector:
    def __init__(self, model_path: str = "yolov8n.pt"):
        self.model = YOLO(model_path)

    def detect(self, image: np.ndarray) -> list[dict]:
        """Détecte les objets dans une image."""
        results = self.model(image, verbose=False)[0]

        detections = []
        for box in results.boxes:
            detections.append({
                "class": results.names[int(box.cls)],
                "confidence": float(box.conf),
                "bbox": box.xyxy[0].tolist(),  # [x1, y1, x2, y2]
                "center": [
                    (box.xyxy[0][0] + box.xyxy[0][2]) / 2,
                    (box.xyxy[0][1] + box.xyxy[0][3]) / 2
                ]
            })

        return detections

    def detect_with_depth(
        self,
        color: np.ndarray,
        depth: np.ndarray
    ) -> list[dict]:
        """Détection avec estimation de distance."""
        detections = self.detect(color)

        for det in detections:
            cx, cy = int(det["center"][0]), int(det["center"][1])
            # Distance moyenne dans la région centrale
            region = depth[cy-5:cy+5, cx-5:cx+5]
            det["distance"] = float(np.median(region[region > 0])) / 1000  # en mètres

        return detections

Contrôle robotique

Cinématique inverse

import numpy as np
from scipy.spatial.transform import Rotation

class RobotArm:
    """Bras robotique 6 DOF simplifié."""

    def __init__(self, link_lengths: list[float]):
        self.links = link_lengths
        self.joint_limits = [(-np.pi, np.pi)] * len(link_lengths)

    def forward_kinematics(self, joint_angles: np.ndarray) -> np.ndarray:
        """Calcule la position de l'effecteur à partir des angles."""
        T = np.eye(4)

        for i, (angle, length) in enumerate(zip(joint_angles, self.links)):
            # Matrice de transformation DH
            T_i = self._dh_matrix(angle, length)
            T = T @ T_i

        return T  # Pose 4x4 de l'effecteur

    def inverse_kinematics(
        self,
        target_pose: np.ndarray,
        initial_guess: np.ndarray = None
    ) -> np.ndarray:
        """Trouve les angles pour atteindre une pose cible."""
        from scipy.optimize import minimize

        if initial_guess is None:
            initial_guess = np.zeros(len(self.links))

        def cost(angles):
            current = self.forward_kinematics(angles)
            position_error = np.linalg.norm(current[:3, 3] - target_pose[:3, 3])
            rotation_error = np.linalg.norm(current[:3, :3] - target_pose[:3, :3])
            return position_error + 0.1 * rotation_error

        result = minimize(
            cost,
            initial_guess,
            method='SLSQP',
            bounds=self.joint_limits
        )

        return result.x

    def _dh_matrix(self, theta: float, d: float, a: float = 0, alpha: float = 0):
        """Matrice de transformation Denavit-Hartenberg."""
        ct, st = np.cos(theta), np.sin(theta)
        ca, sa = np.cos(alpha), np.sin(alpha)

        return np.array([
            [ct, -st*ca,  st*sa, a*ct],
            [st,  ct*ca, -ct*sa, a*st],
            [0,   sa,     ca,    d],
            [0,   0,      0,     1]
        ])

Planification de trajectoire

import numpy as np
from scipy.interpolate import CubicSpline

class TrajectoryPlanner:
    def __init__(self, robot: RobotArm):
        self.robot = robot

    def plan_joint_trajectory(
        self,
        start: np.ndarray,
        goal: np.ndarray,
        duration: float,
        dt: float = 0.01
    ) -> list[np.ndarray]:
        """Planifie une trajectoire lisse en espace articulaire."""
        t = np.linspace(0, 1, int(duration / dt))

        # Profil de vitesse trapézoïdal
        s = self._trapezoidal_profile(t)

        trajectory = []
        for si in s:
            q = start + si * (goal - start)
            trajectory.append(q)

        return trajectory

    def plan_cartesian_trajectory(
        self,
        waypoints: list[np.ndarray],
        duration: float
    ) -> list[np.ndarray]:
        """Planifie une trajectoire passant par des waypoints cartésiens."""
        # Interpolation spline des positions
        positions = np.array([wp[:3, 3] for wp in waypoints])
        t_waypoints = np.linspace(0, 1, len(waypoints))

        spline_x = CubicSpline(t_waypoints, positions[:, 0])
        spline_y = CubicSpline(t_waypoints, positions[:, 1])
        spline_z = CubicSpline(t_waypoints, positions[:, 2])

        # Échantillonner la trajectoire
        t = np.linspace(0, 1, int(duration * 100))
        trajectory = []

        for ti in t:
            target = np.eye(4)
            target[:3, 3] = [spline_x(ti), spline_y(ti), spline_z(ti)]

            # Cinématique inverse pour chaque point
            joints = self.robot.inverse_kinematics(target)
            trajectory.append(joints)

        return trajectory

    def _trapezoidal_profile(self, t: np.ndarray) -> np.ndarray:
        """Profil de vitesse trapézoïdal (accélération-vitesse-décélération)."""
        accel_time = 0.25  # 25% accélération, 50% vitesse constante, 25% décélération

        s = np.zeros_like(t)
        for i, ti in enumerate(t):
            if ti < accel_time:
                s[i] = 2 * (ti / accel_time) ** 2 * accel_time
            elif ti < 1 - accel_time:
                s[i] = accel_time + (ti - accel_time)
            else:
                remaining = 1 - ti
                s[i] = 1 - 2 * (remaining / accel_time) ** 2 * accel_time

        return s / s[-1]  # Normaliser entre 0 et 1

Apprentissage robotique

Apprentissage par imitation

import torch
import torch.nn as nn
import numpy as np

class BehaviorCloning(nn.Module):
    """Apprentissage par imitation de démonstrations humaines."""

    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.network(obs)

    def get_action(self, obs: np.ndarray) -> np.ndarray:
        with torch.no_grad():
            obs_tensor = torch.FloatTensor(obs).unsqueeze(0)
            action = self.forward(obs_tensor)
            return action.numpy()[0]

def train_behavior_cloning(
    demonstrations: list[tuple],  # [(obs, action), ...]
    epochs: int = 100
):
    """Entraîne un modèle de behavior cloning."""
    obs_dim = len(demonstrations[0][0])
    action_dim = len(demonstrations[0][1])

    model = BehaviorCloning(obs_dim, action_dim)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Préparer les données
    obs = torch.FloatTensor([d[0] for d in demonstrations])
    actions = torch.FloatTensor([d[1] for d in demonstrations])

    for epoch in range(epochs):
        predicted = model(obs)
        loss = loss_fn(predicted, actions)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

    return model

Reinforcement Learning robotique

import gymnasium as gym
import numpy as np
from stable_baselines3 import SAC

class RobotEnv(gym.Env):
    """Environnement Gym pour un robot."""

    def __init__(self, robot):
        super().__init__()
        self.robot = robot

        # Espaces d'observation et d'action
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32
        )
        self.action_space = gym.spaces.Box(
            low=-1, high=1, shape=(6,), dtype=np.float32
        )

        self.target = None
        self.max_steps = 100
        self.current_step = 0

    def reset(self, seed=None):
        super().reset(seed=seed)
        self.robot.reset()
        self.target = self._random_target()
        self.current_step = 0
        return self._get_obs(), {}

    def step(self, action):
        # Appliquer l'action
        self.robot.apply_action(action)
        self.current_step += 1

        # Calculer la récompense
        obs = self._get_obs()
        distance = np.linalg.norm(obs[:3] - self.target)
        reward = -distance

        # Bonus si proche de la cible
        if distance < 0.05:
            reward += 10
            terminated = True
        else:
            terminated = False

        truncated = self.current_step >= self.max_steps

        return obs, reward, terminated, truncated, {}

    def _get_obs(self):
        joint_positions = self.robot.get_joint_positions()
        end_effector = self.robot.get_end_effector_pose()[:3, 3]
        return np.concatenate([joint_positions, end_effector, self.target])

    def _random_target(self):
        return np.random.uniform([-0.5, -0.5, 0.1], [0.5, 0.5, 0.5])

# Entraînement avec SAC
env = RobotEnv(robot)
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)

Robots et LLM

Contrôle par langage naturel

from anthropic import Anthropic
import json

class LanguageRobotController:
    """Contrôle un robot via des commandes en langage naturel."""

    def __init__(self, robot):
        self.robot = robot
        self.client = Anthropic()
        self.known_objects = {}  # Objets détectés et leurs positions

    def execute_command(self, command: str) -> str:
        """Exécute une commande en langage naturel."""
        # Obtenir le plan d'action du LLM
        plan = self._get_action_plan(command)

        # Exécuter chaque étape
        results = []
        for step in plan["steps"]:
            result = self._execute_step(step)
            results.append(result)

        return "\n".join(results)

    def _get_action_plan(self, command: str) -> dict:
        """Utilise un LLM pour décomposer la commande en actions."""
        system = """Tu es un planificateur de robot. Décompose les commandes en actions primitives.

Actions disponibles :
- move_to(object_name): Se déplacer vers un objet
- pick(object_name): Saisir un objet
- place(location): Poser l'objet tenu
- look_around(): Scanner l'environnement
- say(message): Parler

Réponds en JSON avec le format :
{"steps": [{"action": "...", "params": {...}}, ...]}"""

        context = f"Objets connus : {list(self.known_objects.keys())}"

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            system=system,
            messages=[{
                "role": "user",
                "content": f"{context}\n\nCommande : {command}"
            }]
        )

        return json.loads(response.content[0].text)

    def _execute_step(self, step: dict) -> str:
        """Exécute une action primitive."""
        action = step["action"]
        params = step.get("params", {})

        if action == "move_to":
            target = self.known_objects.get(params["object_name"])
            if target:
                self.robot.move_to(target)
                return f"Déplacé vers {params['object_name']}"
            return f"Objet {params['object_name']} non trouvé"

        elif action == "pick":
            success = self.robot.pick()
            return "Objet saisi" if success else "Échec de saisie"

        elif action == "place":
            self.robot.place()
            return f"Objet posé à {params.get('location', 'position actuelle')}"

        elif action == "look_around":
            detections = self.robot.scan_environment()
            self.known_objects.update(detections)
            return f"Objets détectés : {list(detections.keys())}"

        elif action == "say":
            return f"Robot dit : {params.get('message', '')}"

        return f"Action inconnue : {action}"

# Exemple d'utilisation
controller = LanguageRobotController(robot)
controller.execute_command("Trouve la tasse rouge et pose-la sur la table")

Simulation robotique

PyBullet

import pybullet as p
import pybullet_data

class RobotSimulator:
    def __init__(self, robot_urdf: str):
        # Connecter au moteur physique
        self.physics_client = p.connect(p.GUI)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())

        # Configurer la simulation
        p.setGravity(0, 0, -10)
        p.setTimeStep(1/240)

        # Charger le sol
        self.plane = p.loadURDF("plane.urdf")

        # Charger le robot
        self.robot = p.loadURDF(robot_urdf, [0, 0, 0.5])
        self.num_joints = p.getNumJoints(self.robot)

    def set_joint_positions(self, positions: list[float]):
        """Définit les positions des joints."""
        for i, pos in enumerate(positions):
            p.setJointMotorControl2(
                self.robot,
                i,
                p.POSITION_CONTROL,
                targetPosition=pos
            )

    def step(self):
        """Avance la simulation d'un pas."""
        p.stepSimulation()

    def get_end_effector_pose(self) -> tuple:
        """Retourne la pose de l'effecteur."""
        state = p.getLinkState(self.robot, self.num_joints - 1)
        position = state[0]
        orientation = state[1]
        return position, orientation

    def add_object(self, urdf: str, position: list[float]) -> int:
        """Ajoute un objet à la scène."""
        return p.loadURDF(urdf, position)

    def run_simulation(self, duration: float):
        """Exécute la simulation pendant une durée donnée."""
        steps = int(duration * 240)
        for _ in range(steps):
            self.step()

Isaac Sim (NVIDIA)

# Exemple conceptuel pour NVIDIA Isaac Sim
# (Nécessite l'installation d'Isaac Sim)

class IsaacSimRobot:
    """Interface pour robots dans Isaac Sim."""

    def __init__(self, usd_path: str):
        from omni.isaac.core import World
        from omni.isaac.core.robots import Robot

        self.world = World(stage_units_in_meters=1.0)
        self.robot = self.world.scene.add(
            Robot(prim_path="/World/Robot", usd_path=usd_path)
        )
        self.world.reset()

    def step(self):
        self.world.step(render=True)

    def apply_action(self, joint_positions: list[float]):
        self.robot.set_joint_positions(joint_positions)

    def get_camera_image(self, camera_name: str):
        from omni.isaac.sensor import Camera
        camera = Camera(prim_path=f"/World/{camera_name}")
        return camera.get_rgba()

Frameworks robotiques

ROS 2 (Robot Operating System)

# Exemple de nœud ROS 2
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class RobotController(Node):
    def __init__(self):
        super().__init__('robot_controller')

        # Publishers
        self.cmd_vel_pub = self.create_publisher(Twist, '/cmd_vel', 10)

        # Subscribers
        self.image_sub = self.create_subscription(
            Image, '/camera/image_raw', self.image_callback, 10
        )

        # Timer pour le contrôle
        self.timer = self.create_timer(0.1, self.control_loop)

        self.bridge = CvBridge()
        self.target_detected = False

    def image_callback(self, msg):
        """Callback pour les images de la caméra."""
        cv_image = self.bridge.imgmsg_to_cv2(msg, "bgr8")
        # Traiter l'image (détection, etc.)
        self.target_detected = self.detect_target(cv_image)

    def control_loop(self):
        """Boucle de contrôle principale."""
        twist = Twist()

        if self.target_detected:
            twist.linear.x = 0.2  # Avancer
        else:
            twist.angular.z = 0.5  # Tourner pour chercher

        self.cmd_vel_pub.publish(twist)

    def detect_target(self, image):
        # Implémentation de la détection
        pass

def main():
    rclpy.init()
    controller = RobotController()
    rclpy.spin(controller)
    controller.destroy_node()
    rclpy.shutdown()

Résumé

IA INCARNÉE / ROBOTIQUE :

PERCEPTION
├── Vision 3D (caméras RGBD, stéréo)
├── SLAM (localisation + cartographie)
├── Détection/segmentation d'objets
└── Fusion multi-capteurs

CONTRÔLE
├── Cinématique directe/inverse
├── Planification de trajectoire
├── Contrôle en temps réel
└── Évitement d'obstacles

APPRENTISSAGE
├── Imitation (behavior cloning)
├── Reinforcement Learning
├── Sim-to-Real transfer
└── Learning from demonstrations

LLM + ROBOTS
├── Commandes en langage naturel
├── Planification de tâches
├── Dialogue et explication
└── Raisonnement contextuel

OUTILS
├── ROS 2 : Framework standard
├── PyBullet : Simulation physique
├── Isaac Sim : Simulation NVIDIA
└── MuJoCo : Physique précise

L'IA incarnée est un domaine en pleine expansion avec l'arrivée des robots humanoïdes (Tesla Optimus, Figure, 1X) et l'intégration des LLM pour des interactions plus naturelles.

Robotique et IA Incarnée

On this page