I want to build J.A.R.V.I.S.
Y’know, J.A.R.V.I.S.—Tony Stark’s personal AI assistant from the film Iron Man? J.A.R.V.I.S., or Jarvis as I’ll call it from now on, is an acronym for “Just A Rather Very Intelligent System”. It is a formless, highly intelligent computer system that assists Tony Stark in his day-to-day activities. Jarvis is capable of speech, natural language understanding, planning, and suggesting solutions; it has a personality and access points to interact with the physical world.
A similar concept could be TARS from Interstellar. TARS’ most distinctive trait is its personality, which is a blend of humour, sarcasm, and honesty.
Before ChatGPT, building an AI personal assistant seemed like a rather challenging task. Back then, machine interpretation of natural language was mid at best. Now with ChatGPT, we have an overpowered autocomplete tool that generates consistent, coherent responses and fakes understanding convincingly.
“Fake it till you make it”? How can a conman act smart without actually making some smart comments? Similarly, how can ChatGPT fake understanding and intelligence so well without having the slightest bit of potential for “true intelligence”?
I believe that we are on the right track with ChatGPT, Gemini, Claude, and many other frontier LLMs towards our own personal assistant. Understanding a command is the first step to getting it done.
Martin Skow Røed, CTO of Databutton, once told me during my internship interview (something along the lines of) “after all, there’s definitely a prompt to every right answer”. We have the tools to create such an AI personal assistant.
We simply have to ask the right questions.
Understanding the Situation
1. Seek inspiration from actual secretaries
- Humans are habitual creatures. We do things subconsciously in patterns unique to us. In this regard, horizontal software like ChatGPT cannot be as effective as vertical software tailored to the user’s habits. Jarvis has to optimise / finetune its responses and understanding to its user’s habits.
- Requirement: We need our personal assistant to observe, learn, and record our patterns.
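A minimal sketch (in Python; every name here is hypothetical) of the observe-and-record half of this requirement: log each interaction, then surface the most frequent intent-and-hour pairs as the user’s habits. A real Jarvis would feed these patterns back into its prompts or finetuning.

```python
# Hypothetical sketch: record interactions, then report habitual patterns.
import json
import time
from collections import Counter
from pathlib import Path

LOG_PATH = Path("jarvis_interactions.jsonl")  # assumed local log file


def record_interaction(intent: str, request: str, hour: int | None = None) -> None:
    """Append one observed interaction (what was asked, and when) to the log."""
    entry = {
        "ts": time.time(),
        "hour": hour if hour is not None else time.localtime().tm_hour,
        "intent": intent,  # e.g. "summarise_news", "draft_email"
        "request": request,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def habitual_patterns(top_n: int = 3) -> list[tuple[tuple[str, int], int]]:
    """Return the most common (intent, hour) pairs, i.e. the user's habits."""
    if not LOG_PATH.exists():
        return []
    counts: Counter = Counter()
    for line in LOG_PATH.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        counts[(entry["intent"], entry["hour"])] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    record_interaction("summarise_news", "What moved the markets overnight?", hour=7)
    record_interaction("summarise_news", "Anything new on semiconductors?", hour=7)
    print(habitual_patterns())  # e.g. [(('summarise_news', 7), 2)]
```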
2. Hierarchical agents
- A large language model is just a supercharged autocomplete tool. To “fake intelligence”, it has to give actually intelligent responses. But overload a single model with too many tasks and it will perform none satisfactorily.
- The first step, then, is to break down a large, complex task into many smaller, simpler tasks.
- Different models have different strengths and weaknesses. For instance, OpenAI’s o1 model may be good at handling logic tasks, while OpenAI’s gpt4o may be better at completing literary tasks. Alternatively, Claude 3.5 Sonnet may be better than Google’s Gemini 2.0 at coding tasks, but Gemini 2.0 may be better at understanding humour or sarcasm.
- Different models are hence suited for different tasks.
- Requirement: We need our personal assistant to be able to delegate tasks to different models.
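As a rough illustration of both points (decomposition and delegation), here is a sketch where a placeholder planner splits a task and a router sends each subtask to whichever model we guess suits it. The routing table, the keyword classifier, and call_model() are stand-ins for real planner and provider APIs; the model assignments simply mirror the speculation above.

```python
# Hypothetical sketch: break a task down, then delegate each piece to a model.

# Which model we *believe* handles each category best (assumptions, not benchmarks).
MODEL_ROUTES = {
    "logic": "o1",
    "writing": "gpt-4o",
    "coding": "claude-3.5-sonnet",
    "humour": "gemini-2.0",
}


def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real provider call (OpenAI, Anthropic, Google, ...)."""
    return f"[{model}] response to: {prompt}"


def decompose(task: str) -> list[str]:
    """Placeholder planner: a real version would ask a model to split the task."""
    return [part.strip() for part in task.split(" and ") if part.strip()]


def classify_task(subtask: str) -> str:
    """Crude keyword classifier; a real system would use an LLM or embeddings."""
    lowered = subtask.lower()
    if any(k in lowered for k in ("prove", "deduce", "step by step")):
        return "logic"
    if any(k in lowered for k in ("function", "bug", "refactor", "code")):
        return "coding"
    if any(k in lowered for k in ("joke", "sarcasm")):
        return "humour"
    return "writing"


def delegate(task: str) -> list[str]:
    """Route each subtask to the model we think suits it best."""
    return [call_model(MODEL_ROUTES[classify_task(s)], s) for s in decompose(task)]


if __name__ == "__main__":
    for answer in delegate("refactor this parsing function and draft an email about the change"):
        print(answer)
```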
3. System 1 and System 2
- System 1 is fast, intuitive, and emotional. System 2 is slow, logical, and deliberate.
- System 1 is like ChatGPT, quickly spitting out whatever it perceives as most likely, at the risk of being inaccurate. System 2 is like an LLM supplemented with Retrieval-Augmented Generation (RAG), which references ground truth when generating responses. System 2 is slower, but more accurate.
- Requirement: For tasks that have a known solution, the assistant should look it up and reference it before answering. Otherwise, it should admit that it doesn’t know the answer and suggest a possible solution by comparing many System 1 responses. It could also reference similar tasks and their solutions.
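One way this two-speed behaviour could be wired up, with retrieve() standing in for a real RAG lookup and call_model() for any LLM call (both hypothetical):

```python
# Hypothetical sketch: answer fast when that is fine, retrieve first when it matters.

def call_model(prompt: str) -> str:
    """Placeholder for any LLM call."""
    return f"(model) {prompt}"


# Stand-in knowledge base; in practice this would be a vector store or search index.
KNOWLEDGE_BASE = {
    "risk-free rate": "Use the current 10-year Treasury yield as the proxy.",
}


def retrieve(query: str) -> str | None:
    """Look for ground truth relevant to the query."""
    for key, passage in KNOWLEDGE_BASE.items():
        if key in query.lower():
            return passage
    return None


def system1(query: str) -> str:
    """Fast and intuitive: answer straight from the model."""
    return call_model(query)


def system2(query: str, samples: int = 3) -> str:
    """Slow and deliberate: ground the answer in retrieved text, or else
    compare several System 1 drafts and flag the uncertainty."""
    evidence = retrieve(query)
    if evidence is not None:
        return call_model(f"Answer using only this reference:\n{evidence}\n\nQ: {query}")
    drafts = [system1(query) for _ in range(samples)]
    return call_model(
        "No ground truth was found; reconcile these drafts and state the uncertainty:\n"
        + "\n".join(drafts)
    )


if __name__ == "__main__":
    print(system2("What should I use as the risk-free rate?"))
```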
4. Brain with hands
- A hyper-intelligent mind without the ability to communicate its thoughts or act on them is useless.
- Jarvis cannot simply be a thinker, it has to be a doer.
- Having access to tools that are better suited to (or required for) certain tasks will allow us to create a more capable, all-rounded personal assistant. For instance, instead of generating the answer to a mathematical problem, we could simply use tools embedded with known mathematical methods to solve it.
- Why reinvent the wheel or act smarter than we are? Use the tools that have been developed by the ingenious minds before us.
- Requirement: We have to equip our personal assistant with tools.
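To make this concrete: arithmetic, for example, is better dispatched to a tool than generated token by token. Below is a minimal sketch in which a tiny, safe calculator is registered as a tool and a (hypothetical) model-chosen tool call is dispatched to it; the registry and the call format are assumptions for illustration.

```python
# Hypothetical sketch: the assistant dispatches maths to a tool instead of guessing.
import ast
import operator

# A deliberately tiny, safe arithmetic evaluator (supports + - * / only).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a simple arithmetic expression with a real algorithm, not a guess."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


# The tool registry the assistant can draw on.
TOOLS = {"calculator": calculator}


def act(tool_name: str, tool_input: str):
    """Dispatch a model-chosen tool call to the actual implementation."""
    return TOOLS[tool_name](tool_input)


if __name__ == "__main__":
    # e.g. three years of 7% growth on $1,000, computed rather than generated
    print(act("calculator", "1000 * 1.07 * 1.07 * 1.07"))
```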
An all-purpose, all-intelligent personal AI assistant is too ambitious and too big of a project to take on at once. Like our assistant, we too have to break it down into smaller, more manageable tasks.
Let’s begin with building a personal research assistant to make sound investing decisions. We’ll call this Jarvis-I.
Jarvis-I will be able to:
- Learn what the user looks out for when analysing financial statements.
- Learn the user’s investment philosophy.
- Cover the user’s blind spots, neutralising the user’s biases.
- Ask insightful questions and discuss with the user on the soundness of the investment idea.
- Create graphics to help the user visualise key information (making it more human-friendly).
- Take notes about key points and finetune the thesis along the way.
To be continued.