ChatGPT / AI - Image Feedback

I’ve been seeing lots of YouTube videos showing people using platforms such as Home Assistant to send a picture to ChatGPT or another AI platform, which sends back a description of what’s in the picture.

I don’t see an option to send images to the ChatGPT binding. Am I missing something?

Is anybody doing anything like this with OH? If so, can you share your rule/binding with me?

Best, Jay

I run a local AI model and send it a snapshot from Frigate; it gives back the description as text, which I send to Piper TTS and then to a Chromecast for an audible announcement. It’s done just for fun and giggles.

Here is my rule, written in JRuby:

require "faraday"
require "base64" # needed for Base64.strict_encode64 below
require "json"   # needed for JSON.parse below

OLLAMA_URL = "http://192.168.1.10:11434/api/generate"

def describe_snapshot(image)
  image_data = Base64.strict_encode64(image)

  payload = {
    model: "llava",
    prompt: <<~PROMPT,
      Describe the people as if they are non-human species from Star Trek.
      Count the number of people correctly.
      Don't use the words the camera or image.
      Don't describe the setting.
      Keep it to a maximum of 25 words.
    PROMPT
    images: [image_data],
    stream: false,
    options: { temperature: 0.2 } # Lower temp for more deterministic output
  }

  response = Faraday.post(OLLAMA_URL, payload.to_json, "Content-Type" => "application/json")
  result = JSON.parse(response.body)
  result["response"]
rescue => e
  logger.error("Error during describe snapshot: #{e.message}")
  nil
end
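With stream: false, Ollama’s /api/generate endpoint returns a single JSON object whose response field carries the generated text, which is what the helper above extracts. A minimal sketch of that parsing, using a made-up response body:

```ruby
require "json"

# Illustrative response body in the shape Ollama returns for /api/generate
# when "stream": false is set; the description text itself is made up.
sample_body = <<~JSON
  {
    "model": "llava",
    "response": "Two Klingons stand on the porch, arms crossed, plotting honorably.",
    "done": true
  }
JSON

result = JSON.parse(sample_body)
puts result["response"]  # this is the text the rule hands to TTS
```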

rule "Describe people in Front Porch snapshot" do
  changed FrontPorch_Person_Snapshot
  debounce_for 2.seconds
  only_if(&:state?)
  not_if { Sleep_Mode.on? }
  only_if { FrontPorch_People.state.positive? }
  run do |event|
    logger.info "Describing people in Front Porch snapshot"
    # Uncomment the next 2 lines if you want to save the snapshot image to disk
    # snapshot = Time.now.strftime("%Y%m%d-%H%M%S")
    # File.write(OpenHAB::Core.config_folder / "photos/#{snapshot}.jpg", event.state.bytes.to_s)
    describe_snapshot(event.state.bytes.to_s)&.then do |desc|
      logger.info "Snapshot description: #{desc}"
      Voice.say desc
    end
  end
end

It uses FrontPorch_Person_Snapshot and FrontPorch_People, which are items linked to Frigate’s MQTT topics frigate/frontporch/person/snapshot and frigate/frontporch/person.
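For reference, the item definitions might look something like this. The exact types aren’t shown in the thread, so this is an assumption (an Image item for the snapshot, a Number for the person count), and the channel links to the MQTT thing are omitted:

```
Image   FrontPorch_Person_Snapshot   "Front Porch Person Snapshot"
Number  FrontPorch_People            "Front Porch People [%d]"
```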

Note that my rule can also save the snapshot images to disk (the commented-out lines); you don’t need that for interfacing with the AI.

I use the following ollama docker-compose (note that after the container is up, the vision model still needs to be pulled once, e.g. docker exec -it ollama ollama pull llava):

version: "3.8"
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ~/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

I only have a super lame NVIDIA GPU (T600, 4 GB RAM). It takes about 30-60 seconds for Ollama to return the result; I don’t think Ollama even uses my GPU at all. The CPU is an Intel Xeon E-2276G @ 3.80 GHz.
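If you want to confirm where those 30-60 seconds go, you can time the model call from the rule itself. A minimal pure-Ruby sketch (no openHAB dependencies; the sleep here is a stand-in for the real describe_snapshot call):

```ruby
# Run a block and return its result together with the elapsed wall-clock time.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Stand-in for describe_snapshot(image); swap the block body for the real call.
description, seconds = timed { sleep 0.1; "a placeholder description" }
puts format("model call took %.2f s", seconds)
```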

2025-12-25 15:42:35.799 [INFO ] [ripting.ipcameras.rule.ipcameras:220] - Describing people in Front Porch snapshot
2025-12-25 15:43:00.328 [INFO ] [ripting.ipcameras.rule.ipcameras:220] - Describe result:  There is one person in this image, who appears to be dancing or performing some sort of action with his right arm extended outward. He is wearing a black t-shirt and blue shorts. The background suggests an outdoor residential area. 

I could definitely use a better prompt to come up with more amusing descriptions but I haven’t played with it much.

Here’s a working rules DSL version; I used Grok to help me code and debug it.

Create an account at https://platform.openai.com/, grab an API key, and fund it (as low as $5/month).

Script (analyze_image.sh)

#!/usr/bin/env bash

# Usage:
#   ./analyze_image.sh /path/to/your/image.jpg "Describe exactly what is happening in this image"

# ────────────────────────────────────────────────────────────────────────────────
# Configuration - CHANGE THESE
# ────────────────────────────────────────────────────────────────────────────────

OPENAI_API_KEY="<your API key>" 

MODEL="gpt-4o-mini"        # or gpt-4o, gpt-4-turbo
MAX_TOKENS=300

# ────────────────────────────────────────────────────────────────────────────────
# Check arguments
# ────────────────────────────────────────────────────────────────────────────────

if [ $# -lt 2 ]; then
    echo "Usage: $0 <image_file> <prompt>"
    echo "Example:"
    echo "  $0 ./snapshot.jpg 'Describe exactly what is happening. Be specific about people, objects and actions.'"
    exit 1
fi

IMAGE_PATH="$1"
PROMPT="$2"

# Check if file exists and is readable
if [ ! -f "$IMAGE_PATH" ] || [ ! -r "$IMAGE_PATH" ]; then
    echo "Error: File not found or not readable: $IMAGE_PATH"
    exit 1
fi

# ────────────────────────────────────────────────────────────────────────────────
# Convert image to base64
# ────────────────────────────────────────────────────────────────────────────────

# Detect mime type (jpg, png, webp, etc.)
MIME_TYPE=$(file --mime-type -b "$IMAGE_PATH")

# Most common fallback
if [[ "$MIME_TYPE" == *"jpeg"* ]]; then
    MIME="image/jpeg"
elif [[ "$MIME_TYPE" == *"png"* ]]; then
    MIME="image/png"
elif [[ "$MIME_TYPE" == *"webp"* ]]; then
    MIME="image/webp"
else
    MIME="image/jpeg"  # default - OpenAI is quite forgiving
fi

# Note: -w 0 (no line wrapping) is GNU coreutils; on macOS/BSD use: base64 < "$IMAGE_PATH" | tr -d '\n'
BASE64_IMAGE=$(base64 -w 0 "$IMAGE_PATH")

# ────────────────────────────────────────────────────────────────────────────────
# Build JSON payload
# ────────────────────────────────────────────────────────────────────────────────

# NOTE: $PROMPT is interpolated verbatim into the JSON, so avoid double quotes
# and backslashes in the prompt (or build the payload with `jq --arg` to escape it).
JSON_PAYLOAD=$(cat <<EOF
{
  "model": "$MODEL",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "$PROMPT"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:$MIME;base64,$BASE64_IMAGE"
          }
        }
      ]
    }
  ],
  "max_tokens": $MAX_TOKENS
}
EOF
)

# ────────────────────────────────────────────────────────────────────────────────
# Send request to OpenAI
# ────────────────────────────────────────────────────────────────────────────────

echo "Sending request to OpenAI... (image size: $(($(stat -c%s "$IMAGE_PATH") / 1024)) KB)"

RESPONSE=$(curl -s -X POST https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d "$JSON_PAYLOAD")

# ────────────────────────────────────────────────────────────────────────────────
# Pretty print the answer (requires jq installed)
# ────────────────────────────────────────────────────────────────────────────────

if command -v jq >/dev/null 2>&1; then
    ANSWER=$(echo "$RESPONSE" | jq -r '.choices[0].message.content // .error.message // "No answer"')
    echo -e "\nAnswer:\n"
    echo "$ANSWER"
else
    echo "Response (raw - install jq for pretty output):"
    echo "$RESPONSE" | grep -A 10 '"content"' || echo "$RESPONSE"
fi

# Optional: save full response
# echo "$RESPONSE" > response_$(date +%Y%m%d_%H%M%S).json

Item:

Switch	AI_Switch						"AI Switch [%s]"			(HomeState)	
String	AI_Results						"AI Results [%s]"			(HomeState)	

Rule (attachmentFront holds the full path of the image, including the file name):

var String results51 = "NULL"
AI_Results.postUpdate('OpenAI returned NO results.')

if (AI_Switch.state == ON) {
    try {
        results51 = executeCommandLine(Duration.ofSeconds(10), "/bin/bash", "/etc/openhab/scripts/analyze_image.sh", attachmentFront, "Describe exactly what is happening in this image. Be very specific about people, clothing, objects and any unusual activity.")

        if (results51 !== null && results51.toString != '') {
            AI_Results.postUpdate(results51)
            logInfo("OPENAI", results51)
        }
    } catch (Exception u5) {
        logError("OPENAI", "OpenAI picture analysis FAILED via curl. Exception is " + u5.message)
    }
}

Example of the results from it:

Sending request to OpenAI… (image size: 43 KB)

Answer:

In the image, it appears to show a residential area during the morning hours. There is a driveway leading to a house, with a front lawn that has sparse grass and some bushes.

A person is seen standing on the driveway, wearing a green long-sleeve shirt and beige shorts. The individual is facing away from the camera and appears to be looking towards the street.

There is a vehicle parked in the street, which is a blue car. The surroundings indicate it might be winter, as a few patches of snow are visible on the grass. Near the left side of the image, an American flag is positioned on a pole. The overall scene is quiet, with no other notable activity occurring.

Best, Jay


I’m curious, what’s the latency / time between starting to submit the query (so it includes the upload time) and getting back the result?

Around 1 second; it’s super fast and costs less than a penny to process that image.

Best, Jay

Cool! If you have fun prompts or know where we can find a list of such prompts, please share! 🙂

From here (Germany) it takes around 10 seconds (gpt-4o-mini), which is a little too long.
I will have a look into the Ollama stuff …

Great idea, thx a lot 👍