Documentation Index
Fetch the complete documentation index at: https://docs.innate.bot/llms.txt
Use this file to discover all available pages before exploring further.
Policy-defined skills are neural network policies that run end-to-end—taking sensor input and directly outputting robot actions. Unlike code-defined skills where you write explicit logic, policy-defined skills learn behaviors from demonstration.
This approach works well for manipulation tasks where the variability of real-world scenarios makes explicit programming impractical.
Two Types of Policies
| Type | Definition | Use Case |
|---|
| Learned | Neural network trained on demonstrations | Tasks requiring visual adaptation |
| Replay | Recorded action sequence | Repeatable, fixed motions |
Learned Skills (ACT Policy)
Learned skills use Action Chunking with Transformers (ACT)—a neural network architecture that takes camera images and joint positions as input, and outputs action sequences.
{
"name": "pick_socks",
"type": "learned",
"guidelines": "Use when you need to pick up socks from the floor",
"execution": {
"model_type": "act_policy",
"checkpoint": "act_policy_step_135000.pth",
"stats_file": "dataset_stats.pt",
"action_dim": 10,
"duration": 45.0,
"start_pose": [-0.015, -0.399, 1.456, -1.135, -0.023, 0.833]
}
}
Execution Flow
-
Load Policy: Neural network checkpoint loaded into GPU memory
-
Move to Start Pose: Arm moves to consistent initial configuration
-
Inference Loop (25Hz): Every 40ms:
-
Capture images from both cameras
-
Read current joint positions
-
Run policy forward pass
-
Output: 6 joint commands + 2 base velocity commands
-
Progress Monitoring: Early termination when progress > 95%
-
End Pose: Optionally return to safe configuration
Key Characteristics
-
Reactive: Continuously adjusts based on visual feedback
-
Adaptive: Handles variation in object position, lighting, orientation
-
Coordinated: Can move arm and base simultaneously
Replay Skills
Replay skills play back pre-recorded action sequences. Simpler than learned skills, but deterministic and reliable.
{
"name": "wave",
"type": "replay",
"guidelines": "Use when greeting someone or saying hello",
"execution": {
"model_type": "replay",
"replay_file": "episode_0.h5",
"replay_frequency": 50.0,
"start_pose": [1.577, -0.6, 1.477, -0.738, 0.0, 0.0],
"end_pose": [1.577, -0.6, 1.477, -0.738, 0.0, 0.0]
}
}
Execution Flow
-
Load Recording: H5 file containing timestamped actions
-
Move to Start Pose: Arm moves to recorded initial position
-
Playback (50Hz): Execute recorded actions in sequence
-
End Pose: Return to specified configuration
Execution Pipeline
Both skill types are executed by the BehaviorServer:
BASIC calls skill
|
v
PrimitiveExecutionActionServer
|
v (physical skill detected)
BehaviorServer.ExecuteBehavior
|
+-- Learned --> Load policy, run inference loop
|
+-- Replay ---> Load H5 file, playback loop
|
v
Robot hardware (/mars/arm/commands, /cmd_vel)
Creating New Skills
Learned Skill Workflow
-
Collect Demonstrations: Teleoperate robot through task 10-100 times
-
Train Policy: Run ACT training pipeline to produce checkpoint
-
Create Metadata: Add
metadata.json with execution parameters
-
Deploy: Place in
skills/<name>/ directory
Replay Skill Workflow
-
Record Episode: Teleoperate through motion once, save to H5
-
Create Metadata: Add
metadata.json with replay parameters
-
Deploy: Place in
skills/<name>/ directory
Demonstration Quality
The quality of learned skills depends directly on demonstration quality:
| Good Demonstrations | Poor Demonstrations |
|---|
| Consistent starting positions | Variable starting positions |
| Smooth, deliberate motions | Hesitant, jerky motions |
| Varied object positions | Always same position |
| Include error recovery | Only perfect executions |
Skill Selection Guide
| Scenario | Recommended Type |
|---|
| Task requires visual adaptation | Learned |
| Object position varies | Learned |
| Motion must be identical every time | Replay |
| Simple gesture (wave, point) | Replay |
| Faster development cycle | Replay |