AI & Contexts

person · January 10, 2025

Personal AI, and context

But consider: contexts. This isn't something I've read anywhere, but it makes sense for my own goal. Being able to hand the LLM as many contextually related pieces of information as possible at query time could be very interesting. Rather than hand-waving about Jarvis, I'll use a real-world example: https://youtu.be/KXoIpwKsekY?si=LAU47vwiJcC0Ywjt

I've previously talked about Home Assistant-style LLM integration being useful. I want to talk about why that makes sense, and what makes it better. The word is context.

Context in LLMs has also been a sticking point over time. Context windows range from a few thousand tokens to well over 100,000. The Great Gatsby fits into roughly 72,000 tokens, so you can load an entire short novel. But that's a bit wasteful if you think about it. RAG systems bypass the need for something so brute-force: you can semantically search, rerank, and optimize away the need to load the whole text. It's better to save that window for the historical data of a chat or conversation.

So maybe you don't need to load all of The Great Gatsby. Instead, think of referencing five or six different books and asking for thematic throughlines, or checking several papers in your own RAG system for citations.
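To make that concrete, here's a minimal sketch of the retrieve-instead-of-load idea. Nothing here is a specific library; embed() is a stand-in for whatever embedding model you'd actually use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Score every chunk against the query and keep only the best few;
    # a reranker could refine this ordering further.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return scored[:k]

# Feed only the relevant passages from all six books, not the full texts:
# prompt = "Find thematic throughlines:\n" + "\n---\n".join(
#     top_k_chunks("shared themes of wealth and decay", all_book_chunks))
```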

I believe one of the most powerful ways forward for current systems, toward a general voice interface, is actually through the use of simultaneous contextual data. What do I mean?

In the above video, the commands spoken to the Home Assistant are not direct. They're implicit, passive requests, in a sense. Nothing in the sentences directly indicates to the system what to actuate. The speaker isn't saying "Alexa, turn on lamp 2."

But the system is likely (I haven't looked at the code or architecture yet) passing in the context from which the command was derived. A wakeword service picks up user input, then something like a speech-to-text model (probably Whisper) converts the speech to text. The text is then likely passed to the LLM, via tools or direct context, along with:

  • Which room the microphone that received the input is in
  • What other devices are in that room, either directly or in an accessible manner

From there, the LLM can determine (if the devices are properly labelled) which ones are relevant to the user's speech, and actuate accordingly. Semantic glue.
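As a rough illustration, here's what that prompt assembly might look like. Again, I haven't looked at the actual architecture, so the registry layout and helper names below are my own invention:

```python
# Hypothetical mappings a real system would pull from its device registry.
AREAS = {
    "office_mic": "office",
}

DEVICES_BY_AREA = {
    "office": [
        {"entity_id": "light.office_lamp", "name": "reading lamp"},
        {"entity_id": "fan.office_ceiling", "name": "ceiling fan"},
    ],
}

def build_prompt(mic_id: str, transcript: str) -> str:
    # The transcript would come from the speech-to-text step (e.g. Whisper).
    area = AREAS[mic_id]
    device_lines = "\n".join(
        f"- {d['name']} ({d['entity_id']})" for d in DEVICES_BY_AREA[area]
    )
    return (
        f"You control a smart home. The user spoke in the {area}.\n"
        f"Devices in this room:\n{device_lines}\n\n"
        f"User said: \"{transcript}\"\n"
        "Decide which device(s), if any, to actuate and respond with a tool call."
    )

print(build_prompt("office_mic", "it's a little dark in here"))
```

Nothing in "it's a little dark in here" names a device; the room and device list are what let the model resolve the request to the reading lamp.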

This is a decently complex arrangement. It’s pretty cool.

This is where I’d like to bring in the idea of multiple contexts.

Let's add just one more: the idea that you might also have a voice-tagging system, so the assistant can figure out who is speaking. This can become a type of authentication (security-conscious folks, I feel it too; there has to be another factor, or at least an anti-voice-generation layer, but let's set that aside for the moment). Okay, let's add even more!

Speaker diarization exists, and Apple has for years been able to lock a device's Siri commands to its owner. Combine the two, add support for multiple users, and suddenly one system can provide bespoke interactions across a household or a corporation.
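A back-of-the-napkin version of the voice-tagging piece might compare an utterance's speaker embedding against enrolled profiles. The enrollment vectors below are placeholders; a real system would get them from a speaker-encoder model:

```python
import numpy as np

# Placeholder enrollment embeddings; a real system would produce these
# with a speaker-embedding model (e.g. an x-vector-style encoder).
ENROLLED = {
    "alice": np.random.rand(192),
    "bob": np.random.rand(192),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(utterance_embedding: np.ndarray,
             threshold: float = 0.75) -> str | None:
    # Return the best-matching enrolled speaker above the threshold.
    best_name, best_score = None, threshold
    for name, profile in ENROLLED.items():
        score = cosine(utterance_embedding, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name  # None means "unknown speaker" -- don't authenticate
```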

You might also use contexts to generate context. Identifying a speaker may cause a query to run through personal calendars, emails, and notes; with a properly large window, you can pull them all in at once. What if you asked a personal AI to schedule a meeting for Tuesday? And what if, instead of just telling you you were full up, it intelligently replied with a renegotiation: hold it on Thursday instead, when your timing and the other attendees' timing align better? In this particular case, it'd be handling these contexts (a toy scheduling sketch follows the list):

  • Current speaker
  • Current room location (for replying through the correct speaker system)
  • Current calendar
  • Calendars for other, involved people

And that's not even including the idea that it could suggest cancelling a more ancillary meeting on Tuesday in favor of your new one, intuiting priorities from myriad other contexts. That works if it has access to your emails and enough side data to determine your current pursuits and how you might rank them (this could come from historical meeting recordings parsed into diarized, speaker-identified logs).
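The calendar-negotiation part, at least, is plain algorithmic work. A toy version of finding a common slot across attendees might look like this, where busy intervals are assumed to come from whatever calendar APIs the assistant has been granted:

```python
from datetime import datetime, timedelta

# A calendar is just a list of (start, end) busy intervals here.
Busy = list[tuple[datetime, datetime]]

def is_free(calendar: Busy, start: datetime, end: datetime) -> bool:
    # Free if the proposed slot overlaps no busy interval.
    return all(end <= b_start or start >= b_end for b_start, b_end in calendar)

def first_common_slot(calendars: list[Busy], day: datetime,
                      duration: timedelta, work_start: int = 9,
                      work_end: int = 17) -> datetime | None:
    # Scan the working day in 30-minute steps for a slot everyone has open.
    slot = day.replace(hour=work_start, minute=0, second=0, microsecond=0)
    day_end = day.replace(hour=work_end, minute=0, second=0, microsecond=0)
    while slot + duration <= day_end:
        if all(is_free(cal, slot, slot + duration) for cal in calendars):
            return slot
        slot += timedelta(minutes=30)
    return None  # nobody aligns that day; try another (e.g. Thursday)

# e.g. first_common_slot([my_busy, their_busy],
#                        datetime(2025, 1, 16), timedelta(hours=1))
```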

We can also do sentiment analysis. What if the assistant were able to change its wording? Maybe it's a particularly stressful day, so it gingerly mentions the scheduling issue. Or maybe it has a personality analysis and knows you prefer straight talk regardless, so it's more frank. Contexts and context windows suddenly become a significant factor in how anthropic an assistant can be.
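Even the tone adjustment can start out dead simple: fold a sentiment estimate and a known preference into the system prompt. The score range and threshold below are made up, and the sentiment itself would come from whatever classifier you run over recent interactions:

```python
def tone_directive(sentiment: float, prefers_frankness: bool) -> str:
    # sentiment: assumed scale of -1.0 (stressed/negative) to 1.0 (positive).
    if prefers_frankness:
        return "Be direct and concise, even when delivering bad news."
    if sentiment < -0.3:
        return ("The user seems stressed today. Raise scheduling conflicts "
                "gently and lead with alternatives, not problems.")
    return "Use a neutral, friendly tone."

def build_system_prompt(sentiment: float, prefers_frankness: bool) -> str:
    return "You are a personal assistant. " + tone_directive(
        sentiment, prefers_frankness)
```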

So, thinking about all of this: many, if not all, of these parts already exist on their own. I don't mean to make it sound like a small feat. But I do want to say that we have all the pieces; it's just a matter of applying them to each other in intelligent and efficient ways.