The Training Manager is an experimental tool — built in an evening by the team to scratch an itch. It works, it’s useful, and it ships with the OS. But it’s rough around the edges. Contributions welcome.
The Training Manager is a local web server that runs on your robot and gives you a browser-based dashboard for the entire training pipeline. It’s the power-user complement to the app’s training UI — useful when you need to merge datasets, remove bad episodes, or point a training run at a custom ACT fork.

Launch it

The Training Manager is bundled with the training-client CLI. Run it from inside the robot (via SSH) or inside the Docker container:
python -m training_client.cli ui
This starts a local web server and prints two URLs:
  Training Manager
    Local:   http://localhost:8080
    Network: http://192.168.50.22:8080
Open the Network URL from any device on the same Wi-Fi network — your laptop, phone, or tablet.

CLI options

Flag             Default                       Description
--port           8080                          HTTP port
--skills-dir     ~/skills                      Root skills directory
-s, --server     env TRAINING_SERVER_URL       Orchestrator URL
-t, --token      env INNATE_SERVICE_KEY        Service key
--issuer         env INNATE_AUTH_ISSUER_URL    Auth issuer URL
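For example, a launch that overrides the port and passes the orchestrator settings explicitly might look like this (the URL and flag values are placeholders for your own setup; by default the environment variables above are used):

```shell
# Placeholder values — substitute your own orchestrator URL and service key,
# or omit -s/-t and rely on TRAINING_SERVER_URL / INNATE_SERVICE_KEY.
python -m training_client.cli ui \
  --port 9090 \
  --skills-dir ~/skills \
  --server https://training-v1.innate.bot \
  --token "$INNATE_SERVICE_KEY"
```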

The three tabs

The UI is organized into three tabs: Skills, Datasets, and Training.

Skills tab

Browse every skill directory on the robot. Each card shows the skill name, type, episode count, and whether a trained checkpoint exists. Click a skill to open its detail view, where you can:
  • Edit metadata — change the skill name, guidelines (the text BASIC reads to decide when to use this skill), and execution parameters
  • View the full metadata.json — useful for debugging or verifying that a checkpoint was activated correctly
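Under the hood, these edits are reads and writes to each skill's metadata.json (see the Architecture section). A minimal sketch of that load-modify-save pattern — the field names here are hypothetical, not the actual schema:

```python
import json
import tempfile
from pathlib import Path

# Build a throwaway skills directory with one skill.
skills_dir = Path(tempfile.mkdtemp())
skill = skills_dir / "pick_up_cup"
skill.mkdir()
meta_path = skill / "metadata.json"

# Hypothetical fields — the real metadata.json schema may differ.
meta_path.write_text(json.dumps({
    "name": "pick_up_cup",
    "guidelines": "Use when a cup is visible on the table.",
}))

# Edit metadata the way the Skills tab does: load, modify, write back.
meta = json.loads(meta_path.read_text())
meta["guidelines"] = "Use when any cup or mug is within reach."
meta_path.write_text(json.dumps(meta, indent=2))
```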

Datasets tab

This is where the Training Manager really earns its keep. For each skill, you can:
  • Browse episodes — see every episode in the dataset with timestamps and metadata
  • Play back video — watch the recorded camera feeds for any episode directly in the browser (both main and wrist cameras)
  • Delete episodes — select bad episodes and create a cleaned copy of the dataset without them. The original is preserved; a new skill directory is created with the episodes re-indexed.
  • Merge datasets — combine episodes from multiple skills into a single new dataset. Select which episodes to include from each source. This is useful when you’ve recorded demonstrations across multiple sessions or want to mix data from different setups.
  • Upload to cloud — submit a skill and upload its data to Innate’s training servers, with a progress bar showing compression and upload stages
Merge workflow example: You recorded 30 episodes of “pick up cup” last week and 25 more today with a slightly different cup. Instead of retraining separately, merge both into a 55-episode “pick up cup v2” dataset and train once on the combined data.
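The re-indexing idea behind merge (and the cleaned copies that delete produces) can be sketched as follows — the directory layout and file naming here are illustrative, not the tool's actual on-disk format:

```python
import shutil
import tempfile
from pathlib import Path

def merge_datasets(sources, dest):
    """Copy episode files from several source datasets into one new
    dataset directory, re-numbering episodes sequentially from zero.
    Sources are left untouched."""
    dest.mkdir(parents=True, exist_ok=True)
    index = 0
    for src in sources:
        for episode in sorted(src.glob("episode_*.bin")):
            shutil.copy(episode, dest / f"episode_{index:06d}.bin")
            index += 1
    return index  # total episodes in the merged dataset

# Demo: 30 episodes from last week + 25 from today -> 55 merged.
root = Path(tempfile.mkdtemp())
week1, week2 = root / "cup_v1", root / "cup_v1_session2"
for d, n in [(week1, 30), (week2, 25)]:
    d.mkdir()
    for i in range(n):
        (d / f"episode_{i:06d}.bin").write_bytes(b"demo")

total = merge_datasets([week1, week2], root / "cup_v2")
```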

Training tab

View all training runs across all skills, create new runs, and monitor progress. When creating a new run, you get full control over:

Hyperparameters — all the same parameters from the app, plus more:
Parameter                Default    Description
LEARNING_RATE            5e-5       Transformer learning rate
LEARNING_RATE_BACKBONE   5e-5       Vision backbone (ResNet18) learning rate
BATCH_SIZE               96         Training batch size
MAX_STEPS                120,000    Total training iterations
CHUNK_SIZE               30         Action chunk length
NUM_WORKERS              4          Data loader workers
WORLD_SIZE               4          Number of GPUs
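CHUNK_SIZE is the number of future actions the ACT policy predicts per forward pass, so at execution time a trajectory is covered in chunks rather than one action at a time. A toy illustration (the trajectory lengths are made up, and this ignores temporal ensembling, which re-predicts before a chunk finishes):

```python
import math

CHUNK_SIZE = 30  # actions predicted per policy inference (from the table)

def inference_calls(trajectory_len, chunk_size=CHUNK_SIZE):
    """How many policy forward passes cover a trajectory if every
    predicted chunk is executed to completion."""
    return math.ceil(trajectory_len / chunk_size)

print(inference_calls(900))  # 900-step trajectory -> 30 calls
print(inference_calls(35))   # a partial final chunk still needs a call -> 2
```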
Repository and branch — point the training server at a custom ACT repository and branch. This is the key feature for researchers: fork the ACT training code, modify the architecture or loss function, and run training against your fork without any server-side changes.
Field        Description
Repository   GitHub owner/repo path (e.g. your-org/act-custom)
Ref          Branch name, tag, or commit SHA to check out
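A typical fork workflow might look like this (the repository and branch names are placeholders):

```shell
# Fork the ACT training code on GitHub first, then:
git clone https://github.com/your-org/act-custom.git
cd act-custom
git checkout -b my-loss-experiment
# ...modify the architecture or loss function...
git commit -am "Experiment with a new loss term"
git push origin my-loss-experiment

# Then, in the Training tab, point the run at the fork:
#   Repository: your-org/act-custom
#   Ref:        my-loss-experiment
```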
Infrastructure — configure GPU type, GPU count, time budget, and cost budget. Architecture parameters are shown as read-only for reference (vision backbone, model dimensions, encoder/decoder layers, VAE settings).

Each run card shows its current status with live updates via server-sent events (SSE), so you can watch a run progress through the lifecycle without refreshing.

Log terminal

A collapsible terminal panel at the bottom of every page streams real-time backend logs. This shows every API call, upload progress message, and error — handy for debugging when something doesn’t work as expected.
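Both the run-status and log streams use standard server-sent events framing (`data:` lines separated by blank lines), so any HTTP client can consume them. A minimal stdlib parser sketch — the sample payload below is invented, not the server's actual log format:

```python
def parse_sse(stream_text):
    """Yield the data payload of each server-sent event.
    Events are separated by blank lines; an event may span several
    'data:' lines, which SSE joins with newlines."""
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
        elif line == "" and data_lines:
            yield "\n".join(data_lines)
            data_lines = []
    if data_lines:
        yield "\n".join(data_lines)

# Invented sample stream — the real log lines will differ.
sample = (
    "data: [INFO] upload started\n\n"
    "data: [INFO] compressing episodes\n\n"
)
print(list(parse_sse(sample)))
```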

Architecture

The Training Manager is a FastAPI backend serving a React + Tailwind SPA. The backend delegates all cloud operations to the same training_client library that the ROS training node uses, so there’s no duplicate logic.
Browser ──→ FastAPI server (port 8080)
               ├── /api/skills     → reads/writes ~/skills/*/metadata.json
               ├── /api/datasets   → episode browsing, video streaming, merge, delete
               ├── /api/training   → list runs, create runs, watch status (via SSE)
               ├── /api/logs       → real-time log stream (SSE)
               └── /*              → serves the React SPA
                      │
                      ▼
               training_client.SkillManager
                      │
                      ▼
               Innate Training Orchestrator (training-v1.innate.bot)