Categories
Coding home automation

Running LLaMA-Omni2 to replace Home Assistant voice assistants and control a Roborock S7

TL;DR: I can run some basic inference and TTS, but there is no proper pipeline or any integration with Home Assistant available, so next I’ll go back to Rhasspy.

In my last post I set up my Roborock S7 (aka Rocki) with Home Assistant and set up a voice assistant with the Voice Preview device and Google Gemini models.

In this post, I want to document my exploration into running a proper speech-to-speech omni model to control Rocki. First step: get the model to run and somehow be able to input speech.

Following: https://github.com/ictnlp/LLaMA-Omni2

git clone https://github.com/ictnlp/LLaMA-Omni2
cd LLaMA-Omni2
# sidetrack to install Anaconda: go to https://repo.anaconda.com/archive/
# I selected https://repo.anaconda.com/archive/Anaconda3-2025.06-1-Linux-x86_64.sh
# I run Linux in WSL on Windows
conda create -n llama-omni2 python=3.10
conda activate llama-omni2
pip install -e .
# now start a Python shell to download the Whisper speech encoder
python
>>> import whisper
>>> model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
>>> exit()
huggingface-cli download --resume-download ICTNLP/cosy2_decoder --local-dir models/cosy2_decoder
model_name=LLaMA-Omni2-7B
huggingface-cli download --resume-download ICTNLP/$model_name --local-dir models/$model_name
# it's downloading a lot of large files...
# maybe the 7B is a little big for my RTX3060 with 12GB VRAM, and also might be slow, so for testing, let's get the smallest 0.5B model. 
# And who knows how this will work, like does whisper run in parallel, then blocking VRAM?
model_name=LLaMA-Omni2-0.5B
huggingface-cli download --resume-download ICTNLP/$model_name --local-dir models/$model_name

# FIX 1: the demo failed with errors until I installed matcha-tts
pip install matcha-tts
# FIX 2: install ffmpeg (source: https://gist.github.com/ScottJWalter/eab4f534fa2fc9eb51278768fd229d70)
# that PPA was made for Ubuntu 14.04 (trusty); on current Ubuntu releases ffmpeg is
# in the standard repositories, so the last command alone is probably enough
sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg

# open 3 terminals, make sure to activate the conda environment in each
# 1)
python -m llama_omni2.serve.controller --host 0.0.0.0 --port 10000

# 2)
python -m llama_omni2.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --vocoder-dir models/cosy2_decoder
# this fails with errors around jsonable_encoder; Gemini recommended:
pip install --upgrade pydantic
pip install --upgrade fastapi
# the problem persists...

# 3)
model_name=LLaMA-Omni2-0.5B
python -m llama_omni2.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path models/$model_name --model-name $model_name

Ok, it didn’t work out of the box. 😐
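When a three-process setup like this fails, a quick first check is whether each process is actually listening on its port. Below is a minimal sketch using only the Python standard library. The port numbers come from the commands above; the helper names `is_listening` and `report` are made up for illustration and are not part of LLaMA-Omni2.

```python
import socket

# Ports taken from the three commands above.
SERVICES = {"controller": 10000, "gradio web server": 8000, "model worker": 40000}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


def report(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name:18s} {'listening' if up else 'not reachable'}")
```

This only tells you whether a port is open, not whether the worker registered with the controller, so it narrows the problem down rather than solving it.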

Next I tried the local inference Python script. This works!
I adapted the question file to point at my own recording and got a response:
“Why is the sky blue?”

###questions.json
[
    {
        "id": "helpful_base_0",
        "conversation": [
            {
                "from": "human",
                "speech": "examples/wav/whyskyblue.wav"
            }
        ]
    }
]
##
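If you want to ask more than one question, writing this file by hand gets tedious. The sketch below generates the same structure from a list of recordings. The `question_<n>` id scheme and the function name are my own assumptions; I have not checked whether the inference script cares about the id format.

```python
import json


def build_questions(wav_paths):
    """Build one single-turn entry per audio file, mirroring the file above."""
    return [
        {
            "id": f"question_{i}",  # hypothetical id scheme, not from the repo
            "conversation": [{"from": "human", "speech": str(path)}],
        }
        for i, path in enumerate(wav_paths)
    ]


if __name__ == "__main__":
    # Print the JSON; redirect it to examples/questions.json to use it.
    print(json.dumps(build_questions(["examples/wav/whyskyblue.wav"]), indent=4))
```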

output_dir=examples/$model_name
mkdir -p $output_dir

python llama_omni2/inference/run_llama_omni2.py \
    --model_path models/$model_name \
    --question_file examples/questions.json \
    --answer_file $output_dir/answers.jsonl \
    --temperature 0 \
    --s2s

python llama_omni2/inference/run_cosy2_decoder.py \
    --input-path $output_dir/answers.jsonl \
    --output-dir $output_dir/wav \
    --lang en

With my own question (“Why is the sky blue?”):

run_llama_omni2.py takes ~18.6s

run_cosy2_decoder.py takes ~14.1s
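Timings like these can be collected with a small wrapper instead of a stopwatch. This is a sketch, not how the numbers above were measured. The `timed_run` helper is hypothetical, and the placeholder commands need to be swapped for the two inference invocations.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder commands; substitute the two python invocations shown above.
    steps = {
        "step 1": [sys.executable, "-c", "pass"],
        "step 2": [sys.executable, "-c", "pass"],
    }
    timings = {name: timed_run(cmd) for name, cmd in steps.items()}
    for name, secs in timings.items():
        print(f"{name}: {secs:.1f}s")
    print(f"total: {sum(timings.values()):.1f}s")
```

With the two measured values, the total comes to 18.6s + 14.1s = 32.7s for a single short question.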

Result:

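So a single short question takes roughly 33 seconds end to end, and there is still no integration with Home Assistant. I think I need a more out-of-the-box approach here. Maybe go back to Rhasspy?!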

Categories
Coding home automation Server

Home Assistant voice assistant and Roborock integration

TL;DR: Roborock S7 and Home Assistant work quite well with a voice assistant.

So today I wanted to try to find a replacement for the Home Assistant voice assistant. I successfully set up a voice assistant within Home Assistant, but its performance was not what I was hoping for. My main scenario was to start my vacuum with Home Assistant.

Rocki and home assistant

So how do I do that…? The Roborock S7 integration with Home Assistant was easy. However, in order to start a room clean with one click, you have to create a boolean helper variable for each room and a script that maps these variables to room IDs. The script then starts a clean of whichever rooms have their boolean switched on:

Note for the inexperienced Home Assistant user: you CAN edit everything in code instead of in the GUI.

sequence:
  - variables:
      room_configs:
        - name: living_room
          boolean: input_boolean.rocki_room_living_room
          id: 16
        - name: kitchen
          boolean: input_boolean.rocki_room_kitchen
          id: 17
        - name: storeroom
          boolean: input_boolean.rocki_room_storeroom
          id: 19
        - name: dining_room
          boolean: input_boolean.rocki_room_dining_room
          id: 20
        - name: foodstorage
          boolean: input_boolean.rocki_room_foodstorage
          id: 21
        - name: office
          boolean: input_boolean.rocki_room_office
          id: 23
        - name: hallway
          boolean: input_boolean.rocki_room_hallway
          id: 24
  - variables:
      selected_rooms: |-
        {% set ns = namespace(rooms=[]) %}
        {% for room in room_configs %}
          {% if is_state(room.boolean, 'on') %}
            {% set ns.rooms = ns.rooms + [room.id] %}
          {% endif %}
        {% endfor %}
        {{ ns.rooms }}         
  - data:
      command: app_segment_clean
      params:
        - segments: |
            {{selected_rooms}}
    target:
      entity_id: vacuum.rocki
    action: vacuum.send_command
alias: Selective Cleaning
description: ""
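The template in the second `variables` step is the core of the script: it walks the room list and collects the IDs of every room whose boolean is on. For readers who find Jinja hard to read, here is the same logic as a plain Python sketch. The `states` dictionary stands in for Home Assistant's state machine and is purely illustrative.

```python
# Same room table as in the script above.
ROOM_CONFIGS = [
    {"name": "living_room", "boolean": "input_boolean.rocki_room_living_room", "id": 16},
    {"name": "kitchen", "boolean": "input_boolean.rocki_room_kitchen", "id": 17},
    {"name": "storeroom", "boolean": "input_boolean.rocki_room_storeroom", "id": 19},
    {"name": "dining_room", "boolean": "input_boolean.rocki_room_dining_room", "id": 20},
    {"name": "foodstorage", "boolean": "input_boolean.rocki_room_foodstorage", "id": 21},
    {"name": "office", "boolean": "input_boolean.rocki_room_office", "id": 23},
    {"name": "hallway", "boolean": "input_boolean.rocki_room_hallway", "id": 24},
]


def selected_rooms(states):
    """Return the segment IDs of all rooms whose helper boolean is 'on'."""
    return [room["id"] for room in ROOM_CONFIGS if states.get(room["boolean"]) == "on"]


if __name__ == "__main__":
    # Stand-in for Home Assistant's entity states (illustrative only).
    states = {
        "input_boolean.rocki_room_kitchen": "on",
        "input_boolean.rocki_room_hallway": "on",
        "input_boolean.rocki_room_office": "off",
    }
    print(selected_rooms(states))  # → [17, 24]
```

The resulting list is what ends up in `segments` for the `app_segment_clean` command.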

So, how does it work? I activate each boolean on my dashboard and then tell it to clean:

I was really happy here 🙂 And after a few trials, I can safely say, my flat has never been vacuumed so thoroughly!

Kudos to: https://www.youtube.com/watch?v=xe7xjnGqYiU

Voice Assistant

I’m a happy owner of the Voice Preview Edition (https://www.home-assistant.io/voice-pe/). Getting started with the voice assistant was not THAT successful. I run Home Assistant in Docker, which might make it a little more complicated.

The integrated Home Assistant voice assistant did nothing, and I ended up deleting it.

I also tried Home Assistant Cloud, and I would have been happy to support them monthly if it had given me a working voice assistant.

Next I tried Local-LLM via HACS. This seemed promising, but neither llama.cpp on my server nor an Ollama instance on my PC worked properly.

What did work was integrating it with Google Gemini. I just used Google services for everything.

And it worked:

BUT: to start my cleaning now, I have to issue a voice command like this:

Ok nabu! Activate the hallway, storeroom, kitchen and dining room, then start cleaning

This command mimics how I would do it manually via the dashboard: activate the booleans, then initiate cleaning.
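Spelled out, that voice command boils down to two service calls: switch on the room booleans, then run the script. The sketch below builds those two calls for Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a bearer token). The script entity id `script.selective_cleaning` is my assumption based on the alias, and the base URL and token are placeholders you would have to fill in.

```python
import json
import urllib.request

BOOLEAN_PREFIX = "input_boolean.rocki_room_"  # naming used in the cleaning script
SCRIPT_ENTITY = "script.selective_cleaning"   # assumption: entity id derived from the alias


def build_calls(rooms):
    """Return (path, payload) pairs: switch on the room booleans, then run the script."""
    booleans = [BOOLEAN_PREFIX + room for room in rooms]
    return [
        ("/api/services/input_boolean/turn_on", {"entity_id": booleans}),
        ("/api/services/script/turn_on", {"entity_id": SCRIPT_ENTITY}),
    ]


def send(base_url, token, path, payload):
    """POST one service call to Home Assistant and return the HTTP status."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the calls; use send() with your own URL and token to execute them.
    for path, payload in build_calls(["hallway", "storeroom", "kitchen", "dining_room"]):
        print(path, json.dumps(payload))
```

Seeing it this way also shows why the spoken command feels clumsy: it is a direct translation of the dashboard clicks rather than a single "clean these rooms" intent.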

Another problem: my nabu device doesn’t engage in a more nuanced dialogue, so I cannot chain requests. Like:

Ok nabu! What time is it?
>>> It is …
When was the last cleaning?
>>> Last cleaning was …
Ok, so please clean kitchen and dining room again
>>> Starting cleaning…, do you want me to create an automation that rocki cleans the kitchen every Wednesday?
No thanks.

OK, so the last part may be over the top, but still, that’s my goal.
So what was the hacker reaction to this? I figured I might need to program everything from scratch and do it myself! 😅

Let’s see!