
This article documents the missing “brain” layer of my local robot assistant runtime.
In previous posts, I had already built most of the low-level runtime needed for a local voice robot on the NVIDIA Jetson Orin Nano:
- an Isaac ROS / ROS 2 development container,
- a working camera and robot runtime,
- a local wake-word and Whisper speech-to-text pipeline,
- a low-latency Piper TTS node,
- PulseAudio output through a Jabra USB speaker,
- and ROS 2 topics to connect the system.
This article adds the missing piece:
a local LLM between Whisper and Piper.
The goal is a complete local voice loop for touchless interaction with the Jarvis-like robot brain:
-> Wake word
-> microphone
-> VAD
-> Whisper STT
-> /asr/text
-> llm_bridge_streaming
-> local LLM through Ollama
-> /tts/say
-> Piper TTS
-> Jabra speaker
This builds directly on these previous posts:
- Install a Local AI Runtime container on Jetson Orin Nano with Isaac ROS
- Running Piper TTS on NVIDIA Jetson Orin Nano with Low Latency
- ROS 2 Architecture Patterns That Scale
- Real-Time Linux for Robotics
The end result is a fully local voice-to-voice robot brain running on the Jetson Orin Nano Super.
No cloud API is required for the interaction loop.
Final Architecture
The system is split between the Jetson host and the Isaac ROS container.
The host runs:
- the wake-word Python script,
- Whisper HTTP server,
- Ollama,
- the local LLM.
The container runs:
- ROS 2,
- the LLM bridge node,
- Piper TTS,
- the robot runtime.
The final architecture looks like this:
┌──────────────────────────────────────────────────────────────┐
│ Jetson Orin Nano Super host │
│ │
│ ┌────────────────────┐ ┌───────────────────────────┐ │
│ │ Wake word + VAD │ │ whisper-server │ │
│ │ whisper_wake...py │──────▶│ 127.0.0.1:8080/inference │ │
│ └─────────┬──────────┘ └───────────────────────────┘ │
│ │ │
│ │ publishes ROS 2 std_msgs/String │
│ ▼ │
│ /asr/text │
│ │
│ ┌────────────────────┐ │
│ │ Ollama │ │
│ │ llama3.2:3b │ │
│ │ 127.0.0.1:11434 │ │
│ └─────────▲──────────┘ │
│ │ │
└────────────┼─────────────────────────────────────────────────┘
│ host networking
▼
┌──────────────────────────────────────────────────────────────┐
│ Isaac ROS / ROS 2 container │
│ │
│ ┌────────────────────────────┐ │
│ │ llm_bridge_streaming │ │
│ │ subscribes: /asr/text │ │
│ │ calls: Ollama HTTP API │ │
│ │ publishes: /tts/say │ │
│ └──────────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────┐ │
│ │ piper_tts │ │
│ │ subscribes: /tts/say │ │
│ │ outputs audio via pacat │ │
│ └──────────────┬─────────────┘ │
│ │ │
│ ▼ │
│ PulseAudio socket -> Jabra speaker │
└──────────────────────────────────────────────────────────────┘
The important ROS 2 topics are:
/asr/text
/tts/say
/asr/text carries transcribed user speech.
/tts/say carries the final text that Piper should speak.
Why I Used ROS 2 Topics
The main design goal was to avoid tightly coupling the system.
Whisper should not know which LLM is used.
Piper should not know whether the text came from a human, a test command, a cloud model, or a local model.
The LLM bridge should only translate one event into another:
/asr/text -> local LLM -> /tts/say
This makes each layer testable.
For example, I can test Piper without Whisper or the LLM:
ros2 topic pub --once /tts/say std_msgs/msg/String "{data: 'Hello from direct Piper test.'}"
I can also test the LLM and Piper without speaking:
ros2 topic pub --once /asr/text std_msgs/msg/String "{data: 'Say hello and introduce yourself briefly.'}"
That separation was essential during debugging.
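That decoupling can be captured as a pure function with injected dependencies. A minimal illustrative sketch, testable without ROS, a microphone, or an LLM (the names `handle_asr_text`, `llm_call`, and `publish_tts` are mine, not the actual node's API):

```python
# Hypothetical sketch of the bridge contract: one /asr/text event in,
# one /tts/say event out. Dependencies are injected so each layer can
# be replaced with a fake in tests.
def handle_asr_text(user_text, llm_call, publish_tts):
    """Translate one transcription into one spoken reply."""
    if not user_text.strip():
        return  # ignore empty transcriptions
    reply = llm_call(user_text)
    if reply:
        publish_tts(reply)

# Exercising the contract with fakes:
spoken = []
handle_asr_text("Say hello.", lambda text: "Hello!", spoken.append)
print(spoken)  # ['Hello!']
```

The same shape is what the real node implements, with `llm_call` backed by Ollama and `publish_tts` backed by a ROS 2 publisher.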
Existing Components Before Adding the LLM
Before building the LLM bridge, the system already had:
- a working Isaac ROS container,
- a working ROS 2 workspace,
- a working Jabra USB speaker,
- a working Piper TTS ROS 2 node,
- a working wake-word pipeline,
- a working Whisper HTTP server.
The previous Piper TTS node subscribed to:
/tts/say
and could be tested with:
ros2 topic pub --once /tts/say std_msgs/msg/String "{data: 'Hello from ROS two.'}"
The missing piece was a node that could subscribe to:
/asr/text
call a local LLM, and then publish the answer to:
/tts/say
Starting the Whisper HTTP Server
Whisper runs on the Jetson host through whisper.cpp.
I start the server with:
~/whisper.cpp/build/bin/whisper-server \
  -m ~/whisper.cpp/models/ggml-small.bin \
  -t 4 \
  --host 127.0.0.1 \
  --port 8080
This exposes the local endpoint:
http://127.0.0.1:8080/inference
My wake-word script sends each detected utterance to that endpoint.
The script was already optimized and working well, so I did not want to rewrite it.
Instead, I created a ROS-enabled copy.
Creating a ROS-Enabled Copy of the Wake-Word Script
The original script was:
~/whisper_wake_up_led.py
I created a copy:
cp ~/whisper_wake_up_led.py ~/whisper_wake_up_led_ros.py
chmod +x ~/whisper_wake_up_led_ros.py
The goal was to keep the original script untouched.
The ROS-enabled version publishes the transcription to:
/asr/text
I added these imports:
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
At the beginning of main() I initialized a ROS 2 node:
rclpy.init()
ros_node = Node('whisper_to_ros_bridge')
asr_pub = ros_node.create_publisher(String, '/asr/text', 10)
Then, after receiving a transcription from Whisper, I added:
if text:
    msg = String()
    msg.data = text
    asr_pub.publish(msg)
    rclpy.spin_once(ros_node, timeout_sec=0.0)
    print(f"[ROS] published /asr/text: {text}", flush=True)
And during shutdown:
ros_node.destroy_node()
rclpy.shutdown()
The host-side voice input chain became:
Wake word -> VAD -> whisper-server -> transcription -> /asr/text
Choosing the Local LLM
I first tested Qwen3 4B with Ollama:
ollama pull qwen3:4b
It installed and ran correctly, but it was not a good match for this real-time voice pipeline.
Even when using:
"think": false
the model still produced long reasoning-style responses in practice.
The robot started saying things like:
Okay, the user asked me to say hello…
Let me think about how to respond…
First, I need to make sure…
That is not acceptable for a spoken robot assistant.
I do not want to hide the reasoning after the fact.
I want the model not to produce it in the first place.
So I removed Qwen from the final design and switched to:
llama3.2:3b
I installed it with:
ollama pull llama3.2:3b
This was much more appropriate for the project:
- small enough for Jetson Orin Nano,
- fast enough for short spoken replies,
- no reasoning leakage,
- predictable output,
- good enough for a first local robot brain.
Starting Ollama
Ollama runs on the Jetson host:
ollama serve
I tested the model with:
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [
    { "role": "system", "content": "You are a spoken robot assistant. Always reply in English. Use one short sentence only. Output only the final answer." },
    { "role": "user", "content": "Say hello." }
  ],
  "stream": false
}'
The expected output is a short, direct response.
For example:
Hello!
No chain of thought.
No “let me think”.
No long explanation.
Creating the ROS 2 LLM Bridge
The bridge node is called:
llm_bridge_streaming
It lives in:
~/ros2_ws/src/robot_assistant/robot_assistant/llm_bridge_streaming.py
Its job is simple:
/asr/text -> Ollama -> /tts/say
The bridge subscribes to:
self.sub = self.create_subscription(String, self.asr_topic, self.on_asr_text, 10)
It publishes to:
self.tts_pub = self.create_publisher(String, self.tts_topic, 10)
The Ollama payload uses streaming:
payload = {
    "model": self.model,
    "messages": [
        {"role": "system", "content": self.system_prompt},
        {"role": "user", "content": user_text},
    ],
    "stream": True,
    "keep_alive": self.keep_alive,
}
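With `"stream": True`, Ollama replies with one JSON object per line, each carrying a fragment of the assistant message. A minimal sketch of how such a stream can be decoded (the function name `iter_stream_tokens` is mine, and the sample lines mimic the documented response shape; in the real node the lines come from the HTTP response body):

```python
import json

def iter_stream_tokens(lines):
    """Yield content fragments from Ollama's newline-delimited JSON
    chat stream, stopping at the final object marked "done": true."""
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        obj = json.loads(line)
        token = obj.get("message", {}).get("content", "")
        if token:
            yield token
        if obj.get("done"):
            break

# Sample lines shaped like an Ollama streaming response:
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print("".join(iter_stream_tokens(sample)))  # Hello!
```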
The system prompt is intentionally strict:
You are a spoken robot assistant.
Always reply in English.
Use one short sentence only.
Output only the final answer.
Do not reveal reasoning.
Do not think aloud.
I keep the LLM in English because my current Whisper setup returns English text, even when I speak French.
That gives a consistent internal pipeline:
Whisper output: English
LLM input: English
LLM output: English
Piper voice: English
Why the Bridge Streams
Waiting for the complete LLM answer before speaking creates a bad user experience.
A spoken robot should not behave like this:
Human speaks
…
Robot waits
…
LLM finishes
…
TTS starts
It should behave more like this:
Human speaks
…
LLM starts generating
…
TTS starts speaking the first useful chunk
The bridge streams tokens from Ollama, buffers them, and sends short speakable chunks to Piper.
I do not send every token to Piper.
That would create unnatural speech.
Instead, the bridge sends short chunks or completed sentences.
Example:
LLM output:
Hello, I am your robot assistant and I am ready to help.
TTS chunks:
Hello, I am your robot assistant
and I am ready to help.
This improves perceived latency while keeping the speech understandable.
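The chunking heuristic can be sketched as a small buffer that flushes on a sentence ending, or at a clause break once enough words have accumulated. This is an illustrative approximation of the behavior described above (the class and parameter names are mine; the real node's heuristics may differ):

```python
class ChunkBuffer:
    """Buffer streamed LLM tokens and flush speakable chunks:
    a completed sentence, or at least min_words words at a comma."""

    def __init__(self, min_words=6):
        self.min_words = min_words
        self.buf = ""

    def feed(self, token):
        """Add one token; return a chunk to speak, or None."""
        self.buf += token
        text = self.buf.strip()
        if not text:
            return None
        ends_sentence = text[-1] in ".!?"
        at_clause = text[-1] == ","
        enough = len(text.split()) >= self.min_words
        if ends_sentence or (at_clause and enough):
            self.buf = ""
            return text
        return None

    def flush(self):
        """Return whatever remains when the stream ends."""
        text, self.buf = self.buf.strip(), ""
        return text or None

# Feeding the example sentence as streamed tokens:
tokens = ["Hello,", " I am", " your robot", " assistant,",
          " and I am", " ready to help."]
buf = ChunkBuffer(min_words=6)
chunks = [c for t in tokens if (c := buf.feed(t)) is not None]
print(chunks)
```

Run on the example above, this yields two chunks split at the clause boundary, matching the two `/tts/say` messages shown.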
Registering the Node in setup.py
The package already had a setup.py at:
~/ros2_ws/src/robot_assistant/setup.py
I added the bridge to the console_scripts list:
entry_points={
    'console_scripts': [
        'router = robot_assistant.router_node:main',
        'chat_stub = robot_assistant.chat_stub_node:main',
        'vlm_stub = robot_assistant.vlm_stub_node:main',
        'orchestrator = robot_assistant.orchestrator_node:main',
        'piper_tts = robot_assistant.piper_tts_node:main',
        'llm_bridge_streaming = robot_assistant.llm_bridge_streaming:main',
    ],
},
Then I rebuilt inside the container:
source /opt/ros/humble/setup.bash
cd /home/admin/ros2_ws
colcon build --symlink-install --merge-install --packages-up-to robot_assistant
source /home/admin/ros2_ws/install/setup.bash
I verified the executable:
ros2 pkg executables robot_assistant | grep llm_bridge_streaming
Expected output:
robot_assistant llm_bridge_streaming
Audio Problem: PulseAudio Socket After Reboot
The most annoying part of the setup was audio after reboot.
Previously, Piper used:
/run/user/1000/pulse/native
But after reboot, PulseAudio sometimes exposed:
/tmp/pulse-XXXXXX/native
So I now always check the real PulseAudio socket on the host:
pactl info | grep "Server String"
Example output:
Server String: /tmp/pulse-PKdhtXMmr18n/native
Then I pass this exact socket to Piper.
This was critical.
When Piper was using the wrong PulseAudio socket, the ROS node received /tts/say messages but no audio came out.
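Since the socket path changes across reboots, the lookup can also be scripted. A hypothetical helper that parses the `pactl info` output and formats it as a `PULSE_SERVER` value (in practice the output would be captured via subprocess; here it is fed the sample string from above):

```python
def pulse_socket_from_pactl(pactl_output):
    """Extract the PulseAudio socket path from `pactl info` output
    and return it in PULSE_SERVER form, or None if not found."""
    for line in pactl_output.splitlines():
        if line.startswith("Server String:"):
            path = line.split(":", 1)[1].strip()
            return "unix:" + path
    return None

sample = (
    "Server Name: pulseaudio\n"
    "Server String: /tmp/pulse-PKdhtXMmr18n/native\n"
)
print(pulse_socket_from_pactl(sample))  # unix:/tmp/pulse-PKdhtXMmr18n/native
```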
Starting Piper TTS
Piper runs inside the ROS 2 container.
First I enter the container:
docker exec -it -u admin isaac_ros_dev-aarch64-container bash
Then I run Piper with the correct PulseAudio socket:
export AMENT_TRACE_SETUP_FILES=
export PULSE_SERVER=unix:/tmp/pulse-PKdhtXMmr18n/native
source /opt/ros/humble/setup.bash
source /home/admin/ros2_ws/install/setup.bash
source /home/admin/ros2_ws/bootstrap_container.sh
ros2 run robot_assistant piper_tts --ros-args \
  -p model_path:=/home/admin/ros2_ws/models/piper/en_US-amy-low.onnx \
  -p pulse_server:=unix:/tmp/pulse-PKdhtXMmr18n/native \
  -p pulse_sink:=alsa_output.usb-0b0e_Jabra_SPEAK_410_USB_08C8C2AE4A84x011200-00.analog-stereo \
  -p use_cuda_flag:=true \
  -p max_chars:=120 \
  -p interrupt:=true \
  -p drop_old:=true \
  -p warmup:=false \
  -p length_scale:=0.7 \
  -p pacat_latency_msec:=60 \
  -p pacat_process_time_msec:=30 \
  -p read_chunk_bytes:=2048 \
  -p inter_utterance_silence_ms:=40 \
  -p restart_piper_on_interrupt:=false \
  -p drop_audio_after_interrupt_ms:=150
To test Piper directly:
ros2 topic pub --once /tts/say std_msgs/msg/String "{data: 'Hello from direct Piper test.'}"
If I hear sound, Piper is working.
Starting the LLM Bridge
In a second container terminal:
docker exec -it -u admin isaac_ros_dev-aarch64-container bash
Then:
source /opt/ros/humble/setup.bash
source /home/admin/ros2_ws/install/setup.bash
source /home/admin/ros2_ws/bootstrap_container.sh
ros2 run robot_assistant llm_bridge_streaming --ros-args \
  -p asr_topic:=/asr/text \
  -p tts_topic:=/tts/say \
  -p ollama_url:=http://127.0.0.1:11434/api/chat \
  -p model:=llama3.2:3b \
  -p min_words_before_flush:=6
To test the complete LLM-to-TTS path:
ros2 topic pub --once /asr/text std_msgs/msg/String "{data: 'Say hello and introduce yourself briefly.'}"
The robot should speak a short response.
Full Reboot Procedure
This is the complete procedure I now use after rebooting the Jetson Orin Nano Super.
Terminal 1: Start audio and the Isaac ROS container
On the host:
~/start_isaac_ros_audio.sh
Terminal 2: Check the PulseAudio socket
On the host:
pactl info | grep "Server String"
Example:
Server String: /tmp/pulse-PKdhtXMmr18n/native
Keep this value for Piper.
Terminal 3: Start the Whisper HTTP server
On the host:
~/whisper.cpp/build/bin/whisper-server \
  -m ~/whisper.cpp/models/ggml-small.bin \
  -t 4 \
  --host 127.0.0.1 \
  --port 8080
Terminal 4: Start Ollama
On the host:
ollama serve
Terminal 5: Start Piper
On the host:
docker exec -it -u admin isaac_ros_dev-aarch64-container bash
Inside the container:
export AMENT_TRACE_SETUP_FILES=
export PULSE_SERVER=unix:/tmp/pulse-PKdhtXMmr18n/native
source /opt/ros/humble/setup.bash
source /home/admin/ros2_ws/install/setup.bash
source /home/admin/ros2_ws/bootstrap_container.sh
ros2 run robot_assistant piper_tts --ros-args \
  -p model_path:=/home/admin/ros2_ws/models/piper/en_US-amy-low.onnx \
  -p pulse_server:=unix:/tmp/pulse-PKdhtXMmr18n/native \
  -p pulse_sink:=alsa_output.usb-0b0e_Jabra_SPEAK_410_USB_08C8C2AE4A84x011200-00.analog-stereo \
  -p use_cuda_flag:=true \
  -p max_chars:=120 \
  -p interrupt:=true \
  -p drop_old:=true \
  -p warmup:=false \
  -p length_scale:=0.7 \
  -p pacat_latency_msec:=60 \
  -p pacat_process_time_msec:=30 \
  -p read_chunk_bytes:=2048 \
  -p inter_utterance_silence_ms:=40 \
  -p restart_piper_on_interrupt:=false \
  -p drop_audio_after_interrupt_ms:=150
Terminal 6: Start the LLM bridge
On the host:
docker exec -it -u admin isaac_ros_dev-aarch64-container bash
Inside the container:
source /opt/ros/humble/setup.bash
source /home/admin/ros2_ws/install/setup.bash
source /home/admin/ros2_ws/bootstrap_container.sh
ros2 run robot_assistant llm_bridge_streaming --ros-args \
  -p asr_topic:=/asr/text \
  -p tts_topic:=/tts/say \
  -p ollama_url:=http://127.0.0.1:11434/api/chat \
  -p model:=llama3.2:3b \
  -p min_words_before_flush:=6
Terminal 7: Start the wake-word and Whisper ROS bridge
On the host:
source /opt/ros/humble/setup.bash
/home/thomas/whisper_wake_up_led_ros.py
At this point the robot is live.
Debug Checklist
Check ROS topics
Inside the container:
ros2 topic list
Expected:
/asr/text
/tts/say
/parameter_events
/rosout
Check ROS nodes
Inside the container:
ros2 node list
Expected:
/llm_bridge_streaming
/piper_tts
Check ASR messages
ros2 topic echo /asr/text
Check TTS messages
ros2 topic echo /tts/say
Test Piper only
ros2 topic pub --once /tts/say std_msgs/msg/String "{data: 'Hello from direct Piper test.'}"
Test LLM and Piper without Whisper
ros2 topic pub --once /asr/text std_msgs/msg/String "{data: 'Say hello and introduce yourself briefly.'}"
Check Whisper server
On the host:
ps aux | grep whisper-server | grep -v grep
Check Ollama
On the host:
curl http://127.0.0.1:11434/api/version
Check the local LLM
On the host:
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [
    { "role": "system", "content": "You are a spoken robot assistant. Always reply in English. Use one short sentence only. Output only the final answer." },
    { "role": "user", "content": "Say hello." }
  ],
  "stream": false
}'
What Worked
The final working stack is:
whisper.cpp server
wake-word Python script
ROS 2 /asr/text
llm_bridge_streaming
Ollama
llama3.2:3b
ROS 2 /tts/say
Piper TTS
PulseAudio
Jabra USB speaker
The best design choice was to keep the contracts simple:
Speech recognition output -> /asr/text
Speech synthesis input -> /tts/say
Everything else can evolve around those two topics.
What Did Not Work
Qwen3 4B was not a good match for this real-time spoken robot loop.
Even when disabling thinking through the API, it still produced long reasoning-style text in my tests.
For a chatbot, that may be acceptable.
For a robot voice loop, it is not.
The robot should not say:
Let me think about how to answer…
It should just answer.
Switching to llama3.2:3b made the system much more usable.
What I Learned
The LLM is not the whole robot brain.
The real robot brain is the architecture around it:
perception -> state -> reasoning -> speech/action
In this article, I only implemented the first usable voice loop:
listen -> transcribe -> reason -> speak
But because it is built on ROS 2 topics, it is now extensible.
The next logical steps are:
- add robot memory,
- inject robot state into the LLM prompt,
- expose ROS 2 actions as tools,
- connect perception topics,
- add behavior arbitration,
- and move from “voice assistant” to “robot executive”.
Final Result
The Jetson Orin Nano Super now runs a fully local voice-to-voice robot brain:
OK ROBOT
-> VAD
-> Whisper
-> /asr/text
-> local LLM
-> /tts/say
-> Piper
-> speaker
It is fully local (no Wi-Fi, no Ethernet, no fiber required).
It is modular.
It is debuggable.
And most importantly, it is now connected through ROS 2 in a way that can grow into a real robot control architecture.
