Fundamentals of Assisted Intelligence

AI is moving fast, apparently killing startups left and right. As an ex-Viv engineer (working with the ex-Siri team), let me help ease everyone's future trauma with the Fundamentals of Assisted Intelligence.

Yes, OpenAI is building a new kind of computer, beyond just an LLM for a middleware / frontend. Key parts they'll need to pull it off:

Persistent User Preferences

The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
This is the "wow" moment where computers stop being scary and start feeling truly helpful.
We did this in 2016 on Viv when our AI knew what you liked for each and every service you used via Viv and mixed that in with context like what kind of flowers you told us your mom liked.
This will need to include access to your personal information to infer preference as well.
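
To make that concrete, here's a minimal sketch of a store that keeps preferences per service and mixes in remembered context at request time. PreferenceStore, its method names, and the sample data are all hypothetical; this is an illustration, not how Viv was actually built.

```python
from collections import defaultdict


class PreferenceStore:
    """Hypothetical per-user store: service-level preferences plus free-form context."""

    def __init__(self):
        self._by_service = defaultdict(dict)  # e.g. "flowers" -> {"budget_usd": 60}
        self._context = {}                    # e.g. "mom.favorite_flower" -> "peonies"

    def set_preference(self, service, key, value):
        self._by_service[service][key] = value

    def remember(self, key, value):
        self._context[key] = value

    def resolve(self, service, context_keys=()):
        """Merge a service's preferences with any relevant remembered context."""
        resolved = dict(self._by_service.get(service, {}))
        for key in context_keys:
            if key in self._context:
                resolved[key] = self._context[key]
        return resolved


store = PreferenceStore()
store.set_preference("flowers", "budget_usd", 60)
store.remember("mom.favorite_flower", "peonies")
order_hints = store.resolve("flowers", context_keys=["mom.favorite_flower"])
```

A real system would infer most of these entries rather than have them set by hand, which is where the access to personal information comes in.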

External, Real-time Data

The base training and RLHF fine-tuning get an LLM perhaps 50% of the way to being useful; much of the rest comes from extending its available data with external sources.
Zapier, Airbyte and others will help, but expect deep integration with 3rd party apps / data pipelines.
"Chat w/ PDF" is a tiny, tiny part of this. If you're only building that, think much bigger.

Actual Computing on Virtual Machines

Context windows are limiting, so AI providers will keep benefiting from running tasks directly in a Python or Node/Deno virtual env that can consume huge amounts of data, just like a computer can today.
Today these are short-lived envs used by Data Analyst / Julius, but over time they'll become a new type of Dropbox where your data is persisted long term for additional processing or cross-file inference / insights.

Agent Task / Flow Planning

Planning can't function without intent. Understanding intent has always been a holy grail, and LLMs finally helped us unlock what we spent years approximating at Viv with NLP tricks.
Once intent is accurate, planning can start. Creating an agent planner is incredibly nuanced and will take significant integration with user preferences, 3rd party data sets, knowledge of compute capabilities, etc.
The bulk of the real magic of Viv was the dynamic planner / mixer that would pull all these data and APIs together and generate both a workflow AND dynamic UI on top of them for a normal consumer to execute.
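
As a toy illustration of the "intent first, then plan" idea, the sketch below maps a recognized intent to steps with dependencies and orders them with Python's standard graphlib module. The intent name, the step names, and the STEPS_FOR_INTENT catalog are made up for the example; a real planner would generate these dynamically instead of looking them up.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog: each step maps to the set of steps whose output it needs.
STEPS_FOR_INTENT = {
    "send_flowers": {
        "lookup_recipient_address": set(),
        "load_flower_preferences": set(),
        "search_florists": {"lookup_recipient_address", "load_flower_preferences"},
        "confirm_with_user": {"search_florists"},
        "place_order": {"confirm_with_user"},
    },
}


def plan(intent):
    """Return the steps for an intent in an order that respects dependencies."""
    if intent not in STEPS_FOR_INTENT:
        raise ValueError(f"no plan known for intent: {intent}")
    return list(TopologicalSorter(STEPS_FOR_INTENT[intent]).static_order())


ordered_steps = plan("send_flowers")
```

The two lookup steps have no dependency on each other, so their relative order is not guaranteed; everything downstream of them is.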

An App Store of Experts

Apple initially made the mistake of building a closed app store; then realized they could monetize a cornucopia of creativity if they opened it.
Regardless of OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll rescope it and enable a long tail of specialized assistants.
Builders will be able to compose multiple tools together into specialized workflows.
And AIs over time will be able to auto-compose these tools together as well, learning from the builders that came before them.

Persistent, Contextual Memory

Embeddings are helpful, but they are missing fundamental parts like context switching, conversational centroids, summarization, enrichment, etc.
Most of the cost of LLMs today comes from prompts, but as history and persistence are embedded and the inference cached, this will unlock the ability to have long term memory with pointers to critical subjects, topics, feelings, tone, etc.
Core memory is just the beginning. We still need all the rich information our minds conjure when we think about a past sunset, a breakup, a scientific understanding, or sensitive context for people we interact with.

Long Polling Tasks

"Agent" is a loaded word, but part of the intent is to have tasks that can be scheduled and self-completing regardless of the time horizon required.
E.g. "Let me know when flights from Montréal to Hawaii are less than $500"
This will require coordination of compute across API providers, as well as virtual envs in the cloud.
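
The flight example boils down to a watch loop like the sketch below. The watch_price function and the fake fare feed are hypothetical stand-ins. A production version would presumably persist the task and hand it to a scheduler rather than block in a loop, but the shape is the same: check, compare, wait, repeat.

```python
import time


def watch_price(fetch_price, threshold, interval_seconds=3600,
                max_checks=None, sleep=time.sleep):
    """Poll fetch_price() until it returns a value below threshold.

    Returns the matching price, or None if max_checks runs out first.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        price = fetch_price()
        checks += 1
        if price < threshold:
            return price
        sleep(interval_seconds)
    return None


# Stand-in for a real fare API: replays a fixed list of made-up prices.
fake_fares = iter([640, 580, 495])
alert_price = watch_price(
    lambda: next(fake_fares),
    threshold=500,
    max_checks=10,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```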

Dynamic UI

Chat is not the final, end-all interface. There's a reason apps have affordances like buttons, date pickers, images. They simplify and clarify.
AI will be a copilot, but to be a good one it'll need to adjust to what works best for a given user. Optimizing for each person requires personalization, so UI will be dynamic.
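
One simple way to think about generated UI: derive the widgets from the shape of the data a task needs. The sketch below is an assumption-heavy toy; the WIDGET_FOR_TYPE mapping, the widget names, and build_form are all invented for illustration.

```python
from datetime import date

# Hypothetical mapping from a parameter's Python type to a UI affordance.
WIDGET_FOR_TYPE = {
    bool: "toggle",
    int: "stepper",
    str: "text_field",
    date: "date_picker",
}


def build_form(params):
    """Turn {name: type_or_choices} into a list of widget descriptions."""
    form = []
    for name, spec in params.items():
        if isinstance(spec, (list, tuple)):
            # A fixed set of options reads better as buttons than as free text.
            form.append({"field": name, "widget": "button_group", "options": list(spec)})
        else:
            form.append({"field": name, "widget": WIDGET_FOR_TYPE.get(spec, "text_field")})
    return form


flight_form = build_form({
    "origin": str,
    "depart_on": date,
    "passengers": int,
    "cabin": ["economy", "business"],
})
```

Personalization would then be a matter of swapping the mapping per user, e.g. someone who prefers typing dates gets a text field instead of a picker.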

API & Tool Composition

Expect AIs to generate custom "apps" in the future where we can build our own workflows and compose together APIs, without waiting for a big startup to do so.
Fewer apps and startups will be needed to generate frontends, and AI will be better at composing an array of tools and APIs together coupled with a gas fee / tax.
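
At its simplest, composition is just chaining: the output of one tool feeds the next. Here's a minimal sketch where the three "tools" are hypothetical stand-ins for real API calls and the fares are made up.

```python
from functools import reduce


def compose(*tools):
    """Chain tools left to right: each tool's output becomes the next tool's input."""
    return lambda value: reduce(lambda acc, tool: tool(acc), tools, value)


# Hypothetical stand-ins for real API calls.
def search_flights(route):
    return [{"route": route, "price": 520}, {"route": route, "price": 470}]


def cheapest(flights):
    return min(flights, key=lambda flight: flight["price"])


def summarize(flight):
    return f"{flight['route']}: ${flight['price']}"


find_deal = compose(search_flights, cheapest, summarize)
deal = find_deal("YUL-HNL")
```

The interesting part is who writes the compose(...) line: a builder today, an AI picking from a catalog of tools tomorrow.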

Assistant-to-Assistant Interaction

There will be countless assistants in the future, with each assisting humans and other assistants towards some greater intent.
Alongside this, assistants will need to learn to interface across text, APIs, file systems, and other modalities used both by agents / startups and humans as integration flows deeper into our world.

Plugin / Tool Stores

Specialized assistants can only be made possible by composing tools, APIs, prompts, data, preferences, and much more.
The current plugin store is super early days, so expect much more work to come, and expect many of those plugins to be rolled in-house as they become more mission critical.

Tip of the Iceberg

Much, much more is needed behind the scenes including:

just-in-time software (e.g. no human-in-the-loop), driven by traffic/virality
dynamically generated APIs, mostly for other AIs to use (and to commoditize existing walled garden vendors), and connectors
auto-scaffolded data stores, enriched and optimized based upon the same sort of “PageRank” click-back heuristics to tune and improve retrieval accuracy
reward functions for AI agent-driven development (e.g. digital Darwinism: which AI can build the best AIs and software, judged by humans and other AIs)
crawlers which collate domain-specific data, especially data lurking within human communities like “Slack for iOS developers”, and generate fine-tuned models or RAG systems
coordinators for an “OAuth into you” that enrich requests with private preference data, controlled by you (or your AI)
auto-distillation of internet data, along the lines of what Perplexity is lazy loading but done by AIs
and an infinite feedback loop of all this into more of itself
community (for intent, building, RLHF, etc)
gas fees to effortlessly pay for compute and access
context ingestion via glasses / earbuds / etc
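
To pick one item from that list, the “OAuth into you” coordinator could look roughly like this: a request gets enriched only with the preference fields you've granted to that requester. The data, the scope names, and enrich_request are all hypothetical; real OAuth scopes and consent flows involve a lot more than a set lookup.

```python
PRIVATE_PREFERENCES = {  # made-up data that the user controls
    "diet": "vegetarian",
    "home_airport": "YUL",
    "shirt_size": "M",
}


def enrich_request(request, granted_scopes, preferences=PRIVATE_PREFERENCES):
    """Attach only the preference fields the user has granted to this requester."""
    shared = {key: value for key, value in preferences.items() if key in granted_scopes}
    return {**request, "preferences": shared}


enriched = enrich_request({"intent": "book_dinner"}, granted_scopes={"diet"})
```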

Now imagine that across every single domain: art, biology, behavioral science, materials science, medicine, culinary science & restaurants, avionics, software, consumer goods.

For now, AI is a human copilot and accelerant. But what's the accelerant for the accelerant?

If you think it's too late to be in AI, just know the above is about 25% of what it'll actually take, with much more to come as we iterate and get even more creative.

-- Rob