
Running LLaMA-Omni2 to replace the Home Assistant voice assistant and control the Roborock S7

TL;DR: I can run some basic inference and TTS, but there is no proper pipeline or Home Assistant integration available…, so next I’ll go back to Rhasspy.

In my last post I set up my Roborock S7 (aka Rocki) with Home Assistant and configured a voice assistant with the Voice Preview Edition device and Google Gemini models.

In this post, I want to document my exploration into running a proper speech-to-speech omni model to control Rocki. First step: get the model to run and somehow be able to input speech.

Following: https://github.com/ictnlp/LLaMA-Omni2

git clone https://github.com/ictnlp/LLaMA-Omni2
cd LLaMA-Omni2
# sidetrack to install anaconda: go to https://repo.anaconda.com/archive/
# i selected https://repo.anaconda.com/archive/Anaconda3-2025.06-1-Linux-x86_64.sh
# I run linux in WSL on windows
conda create -n llama-omni2 python=3.10
conda activate llama-omni2
pip install -e .
# now run python shell
python
> import whisper
> model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
> exit()
huggingface-cli download --resume-download ICTNLP/cosy2_decoder --local-dir models/cosy2_decoder
model_name=LLaMA-Omni2-7B
huggingface-cli download --resume-download ICTNLP/$model_name --local-dir models/$model_name
# it's downloading a lot of large files...
# maybe the 7B is a little big for my RTX3060 with 12GB VRAM, and also might be slow, so for testing, let's get the smallest 0.5B model. 
# And who knows how this will work, like does whisper run in parallel, then blocking VRAM?
model_name=LLaMA-Omni2-0.5B
huggingface-cli download --resume-download ICTNLP/$model_name --local-dir models/$model_name
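To answer the VRAM question from the comment above, the simplest thing is to keep an eye on the GPU while the demo or inference scripts run; this is plain nvidia-smi, nothing specific to this repo:

# watch GPU memory usage once the servers / inference scripts are running
watch -n 1 nvidia-smi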

# FIX 1: somehow we need matcha-tts; I ran into errors, and installing it allows the demo to run
pip install matcha-tts
# FIX 2: install ffmpeg (source: https://gist.github.com/ScottJWalter/eab4f534fa2fc9eb51278768fd229d70)
sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install ffmpeg

# open 3 terminals, make sure to activate the conda environment in each
# 1) controller (note: the package is llama_omni2; omni_speech.serve is the module path from LLaMA-Omni v1)
python -m llama_omni2.serve.controller --host 0.0.0.0 --port 10000

# 2)
python -m llama_omni2.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --vocoder-dir models/cosy2_decoder
# this has problems: jsonable_encoder errors; Gemini recommended:
pip install --upgrade pydantic
pip install --upgrade fastapi
# problem persists...

# 3)
model_name=LLaMA-Omni2-0.5B
python -m llama_omni2.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path models/$model_name --model-name $model_name
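Before blaming the models, a quick sanity check that all three processes are actually listening can help. A small sketch using netcat (assuming it is installed); the ports are the ones from the commands above:

# check that controller (10000), gradio web server (8000) and model worker (40000) are up
for port in 10000 8000 40000; do
    nc -z localhost $port && echo "port $port: open" || echo "port $port: closed"
done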

Ok, it didn’t work out of the box. 😐

Trying the local inference Python script instead. This works!
I adapted the questions, recorded my own audio, and got a response to:
“Why is the sky blue?”

###questions.json
[
    {
        "id": "helpful_base_0",
        "conversation": [
            {
                "from": "human",
				"speech": "examples/wav/whyskyblue.wav"
            }
        ]
    }
]
##
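The WAV itself can come from anywhere; since ffmpeg is installed anyway (FIX 2), converting an arbitrary recording to 16 kHz mono — the sample rate Whisper works with — is one call. The input filename here is just an example:

# convert any recording (e.g. from a phone) to a 16 kHz mono WAV for the speech encoder
ffmpeg -i my_recording.m4a -ar 16000 -ac 1 examples/wav/whyskyblue.wav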

output_dir=examples/$model_name
mkdir -p $output_dir

python llama_omni2/inference/run_llama_omni2.py \
    --model_path models/$model_name \
    --question_file examples/questions.json \
    --answer_file $output_dir/answers.jsonl \
    --temperature 0 \
    --s2s

python llama_omni2/inference/run_cosy2_decoder.py \
    --input-path $output_dir/answers.jsonl \
    --output-dir $output_dir/wav \
    --lang en
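To check what actually came out, the text replies are in the answers JSONL and the synthesized speech lands in the wav directory. The exact filenames depend on the question ids, so the glob below is just a convenience (ffplay ships with the ffmpeg package; playback under WSL may need WSLg audio):

# show the text answers and play the first synthesized reply
cat $output_dir/answers.jsonl
ffplay -autoexit "$(ls $output_dir/wav/*.wav | head -n 1)"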

With my own question (“Why is the sky blue?”):

run_llama_omni2.py takes ~18.6 s

run_cosy2_decoder.py takes ~14.1 s

Result:

So: I think I need a more out-of-the-box approach here. Maybe go back to Rhasspy?!


Home Assistant voice assistant and Roborock integration

TL;DR: the Roborock S7 and Home Assistant work quite well with the voice assistant.

So today I wanted to try to find a replacement for the Home Assistant voice assistant. I had successfully set up a voice assistant within Home Assistant; however, its performance was not what I was hoping for. My main scenario is starting my vacuum via Home Assistant.

Rocki and Home Assistant

So how do I do that…? The Roborock S7 integration with Home Assistant was easy. However, in order to start a room clean with one click, you have to create a boolean helper for each room and a script that maps these booleans to room IDs. Whichever booleans are activated then determine which rooms get cleaned:

Note for the inexperienced Home Assistant user: you CAN edit everything in code rather than in the GUI.

sequence:
  - variables:
      room_configs:
        - name: living_room
          boolean: input_boolean.rocki_room_living_room
          id: 16
        - name: kitchen
          boolean: input_boolean.rocki_room_kitchen
          id: 17
        - name: storeroom
          boolean: input_boolean.rocki_room_storeroom
          id: 19
        - name: dining_room
          boolean: input_boolean.rocki_room_dining_room
          id: 20
        - name: foodstorage
          boolean: input_boolean.rocki_room_foodstorage
          id: 21
        - name: office
          boolean: input_boolean.rocki_room_office
          id: 23
        - name: hallway
          boolean: input_boolean.rocki_room_hallway
          id: 24
  - variables:
      selected_rooms: |-
        {% set ns = namespace(rooms=[]) %}
        {% for room in room_configs %}
          {% if is_state(room.boolean, 'on') %}
            {% set ns.rooms = ns.rooms + [room.id] %}
          {% endif %}
        {% endfor %}
        {{ ns.rooms }}         
  - data:
      command: app_segment_clean
      params:
        - segments: |
            {{selected_rooms}}
    target:
      entity_id: vacuum.rocki
    action: vacuum.send_command
alias: Selective Cleaning
description: ""

So, how does it work? I activate each boolean on my dashboard and then tell it to clean:

I was really happy here 🙂 And after a few trials, I can safely say, my flat has never been vacuumed so thoroughly!

Kudos to: https://www.youtube.com/watch?v=xe7xjnGqYiU
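As a side note, the same selective clean can also be fired from a shell through Home Assistant’s REST API, which is handy for testing without the dashboard. This is only a sketch: the host, the long-lived access token and the room IDs (here kitchen and dining room from the script above) need to be adapted:

# call vacuum.send_command directly via the Home Assistant REST API
curl -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "vacuum.rocki", "command": "app_segment_clean", "params": [{"segments": [17, 20]}]}' \
  http://homeassistant.local:8123/api/services/vacuum/send_command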

Voice Assistant

I’m a happy owner of the Voice Preview Edition (https://www.home-assistant.io/voice-pe/). Getting started with the voice assistant was not THAT successful. I run Home Assistant in Docker, which might make it a little more complicated.
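For context, a containerized Home Assistant install is roughly the standard docker run from the official docs (config path and timezone are placeholders); --network=host matters because mDNS discovery of devices like the Voice PE relies on it:

# standard Home Assistant container install, as documented by Home Assistant
docker run -d \
  --name homeassistant \
  --privileged \
  --restart=unless-stopped \
  -e TZ=Europe/Berlin \
  -v /PATH_TO_YOUR_CONFIG:/config \
  -v /run/dbus:/run/dbus:ro \
  --network=host \
  ghcr.io/home-assistant/home-assistant:stable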

The integrated Home Assistant voice assistant did nothing, and I ended up deleting it.

I also tried Home Assistant Cloud, and I would have been happy to support them monthly if it had given me a working voice assistant.

Next I tried Local-LLM via HACS. This seemed promising, but neither llama.cpp on my server nor an Ollama instance on my PC worked properly.

What did work was integrating it with Google Gemini. I just used all Google services.

And it worked:

BUT: in order to start my cleaning, I now have to issue a voice command like this:

Ok nabu! Activate the hallway, storeroom, kitchen and dining room, then start cleaning

This command mimics how I would do it manually via the dashboard: activate the booleans and then initiate cleaning.

Another problem: my Nabu device doesn’t engage in a more nuanced dialogue; I cannot chain commands or anything. Like:

Ok nabu! What time is it?
>>> It is …
When was the last cleaning?
>>> Last cleaning was …
Ok, so please clean kitchen and dining room again
>>> Starting cleaning…, do you want me to create an automation so that Rocki cleans the kitchen every Wednesday?
No thanks.

OK, so the last part may be over the top, but still, that’s my goal.
So what was the hacker reaction to this? I figured I might need to program everything from scratch and do it myself! 😅
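The rough shape of that do-it-yourself pipeline would be: record a command, transcribe it, match a simple intent, and call Home Assistant. A toy sketch, assuming the openai-whisper CLI is installed; token, host and the room id are placeholders (17 is the kitchen in the script above):

# record 5 seconds, transcribe, and start a kitchen clean if the word "kitchen" was said
arecord -r 16000 -c 1 -f S16_LE -d 5 command.wav
whisper command.wav --model base --language en --output_format txt --output_dir .
grep -qi "kitchen" command.txt && curl -X POST \
  -H "Authorization: Bearer $HA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "vacuum.rocki", "command": "app_segment_clean", "params": [{"segments": [17]}]}' \
  http://homeassistant.local:8123/api/services/vacuum/send_command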

Let’s see!


Set up your Linux server

State: draft

Note: This shall serve as a collection of all relevant commands, so that one might save some time searching the world wide web.

USER and OS: Configure your own user and set up daily OS upgrades.

# Change root pw:
passwd
# add personal user 
adduser USERNAME
# add user to sudo group
adduser USERNAME sudo
# install unattended upgrades package...
sudo apt-get update
sudo apt-get install unattended-upgrades
# ... and enable
sudo dpkg-reconfigure --priority=low unattended-upgrades

FIREWALL: Install a firewall and set it up. You may want to change the SSH port. Make sure to first allow SSH (so that you don’t lock yourself out of your own server), then allow the new port, and once you have connected successfully, deny SSH on the old port.
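One possible sequence with ufw (the note above doesn’t pin down a specific firewall; 2222 is just an example for the new SSH port and has to match Port in /etc/ssh/sshd_config):

sudo apt-get install ufw
# allow the default SSH port first, so enabling the firewall cannot lock you out
sudo ufw allow ssh
# allow the new SSH port
sudo ufw allow 2222/tcp
sudo ufw enable
# reconnect on the new port to verify it works, then close the old one
sudo ufw delete allow ssh
sudo ufw deny ssh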


Strato Linux VPS (Linux-V10) and CapRover – no happy ending

TL;DR: CapRover does NOT run on a Strato Linux VServer (e.g. Linux V10). Think about an nginx reverse proxy, certbot and docker-compose instead.

The whole idea of running my own server was to install CapRover and then be able to easily create and push web applications. However, it did not work as intended…

This is a note to everybody who tries to install CapRover, a self-hosted PaaS (something like Heroku), on a Linux VPS hosted at Strato: it does not work.

The Strato VPS Linux servers, e.g. Linux V10, seem to be Docker containers themselves (this might be the reason why one can rent such a server at the low price of 5 EUR/month). CapRover, however, would need to be installed on a non-dockerized host machine. Inside a Docker container, the Swarm and networking capabilities seem to be restricted.
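If you want to verify this yourself before pointing CapRover at a cheap VPS, a quick test for whether the “server” is itself a container (not from the original setup, just standard tools):

# prints the container technology (e.g. docker, lxc, openvz) or "none" on a real VM / bare metal
systemd-detect-virt --container
# this file only exists inside Docker containers
ls -la /.dockerenv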