Training overview - Innate Docs

Innate lets you train end-to-end manipulation policies directly from your phone or browser. You demonstrate a task with the leader arm, upload the data, and deploy the result as a skill your robot can execute autonomously. Training runs on Innate’s cloud GPUs, so you don’t need a GPU or any ML setup of your own.

The underlying architecture is ACT (Action Chunking with Transformers), a neural network that observes camera images and joint positions, then outputs coordinated arm and base actions. In practice, the robot learns the manipulation by watching you do it, instead of you hand-coding every motion.
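The "action chunking" in ACT's name refers to predicting a short sequence of future actions per forward pass, rather than one action at a time. The sketch below illustrates only that general idea in Python. It is not Innate's implementation: `ChunkedExecutor`, `dummy_policy`, and the chunk size of 4 are hypothetical, and this page does not specify the real chunk length or how consecutive chunks are combined.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Action = List[float]           # one action vector (arm + base commands)
Observation = Sequence[float]  # stand-in for camera images + joint positions


class ChunkedExecutor:
    """Runs a chunk-predicting policy one action at a time.

    `policy` is a hypothetical callable that returns a list of future
    actions (a "chunk") for the current observation.
    """

    def __init__(self, policy: Callable[[Observation], List[Action]]) -> None:
        self.policy = policy
        self.queue: Deque[Action] = deque()
        self.policy_calls = 0

    def step(self, obs: Observation) -> Action:
        # Query the policy only when the previous chunk is used up.
        if not self.queue:
            self.queue.extend(self.policy(obs))
            self.policy_calls += 1
        return self.queue.popleft()


def dummy_policy(obs: Observation) -> List[Action]:
    """Toy policy (assumption): a chunk of 4 actions derived from obs[0]."""
    return [[obs[0] + i] for i in range(4)]


executor = ChunkedExecutor(dummy_policy)
actions = [executor.step([float(t)]) for t in range(8)]
print(executor.policy_calls)  # 8 control steps need only 2 policy calls
```

With a chunk of 4 actions, eight control steps trigger two forward passes. That is the practical appeal of chunking: fewer policy queries per second of motion.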

The pipeline at a glance

Record episodes  →  Upload dataset  →  Train on cloud  →  Download model  →  Run as a skill
  (app / web)         (~10s/min)        (1-3 hours)         (automatic)        (app or code)

You can drive the whole pipeline from the phone app or the web app — each stage maps to a tab (app) or page (web):

Stage	Phone app	Web app
Record demonstrations	Record tab	Collect page
Configure & launch training	Train tab	Training page
Monitor runs	Runs tab	Running now card
Download & activate	Completed tab	automatic

Once a trained model is activated, the skill is available to agents and code, and (in the phone app) appears in Manual Control.

What goes in, what comes out

Input: A dataset of teleoperated demonstrations. Each episode captures synchronized camera images (main + wrist), joint positions, joint velocities, and optionally wheel odometry at 30 Hz. We recommend at least 30 good-quality episodes with consistent start poses and smooth motions; more diverse data almost always improves robustness (see the recording tips). Curious what’s inside an episode file? See Dataset format.

Output: A PyTorch checkpoint that runs inference at 25 Hz, outputting 6 arm joint commands and 2 base velocity commands every 40 ms.
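To make the numbers above concrete, here is a small Python sketch of the rates and action dimensions. The constants come from this page. Everything else is an assumption for illustration, not the actual checkpoint or dataset API: the helper names, the flat 8-value layout (arm first, then base), and the reading that every stream in an episode is sampled at 30 Hz.

```python
from dataclasses import dataclass
from typing import List

RECORD_HZ = 30       # demonstration sampling rate (from this page)
INFERENCE_HZ = 25    # policy output rate (from this page)
NUM_ARM_JOINTS = 6   # arm joint commands per step
NUM_BASE_CMDS = 2    # base velocity commands per step


@dataclass
class ArmBaseCommand:
    arm_joints: List[float]     # 6 values
    base_velocity: List[float]  # 2 values


def split_action(action: List[float]) -> ArmBaseCommand:
    """Split a flat 8-value action into arm and base parts.

    The flat layout (arm first, then base) is an assumption.
    """
    expected = NUM_ARM_JOINTS + NUM_BASE_CMDS
    if len(action) != expected:
        raise ValueError(f"expected {expected} values, got {len(action)}")
    return ArmBaseCommand(action[:NUM_ARM_JOINTS], action[NUM_ARM_JOINTS:])


def step_period_ms(rate_hz: float) -> float:
    """Time between consecutive steps at the given rate."""
    return 1000.0 / rate_hz


def frames_in_episode(duration_s: float, rate_hz: int = RECORD_HZ) -> int:
    """Approximate number of recorded frames in one demonstration."""
    return round(duration_s * rate_hz)


cmd = split_action([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.05, 0.0])
print(step_period_ms(INFERENCE_HZ))  # 25 Hz corresponds to one command every 40 ms
print(frames_in_episode(20))         # a 20-second demonstration at 30 Hz
```

Recording and inference run at different rates (30 Hz versus 25 Hz), so a frame count from the dataset does not map one-to-one to control steps at deployment.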

When to use trained skills

Trained policies shine when the task needs visuomotor coordination — reaching, grasping, placing — especially when object positions vary between runs. For fixed, repeatable motions (a wave, a gesture), a replay skill is simpler: record once, play back. The full comparison lives in the skill selection guide.

Next steps

Record a dataset

Collect high-quality demonstrations.

Train a policy

Configure and launch training on Innate’s cloud.

Deploy your skill

Download, activate, and run your trained model.

Dataset format

Understand what’s inside each episode file.
