


Why does Microsoft use DSL to build copilot, and how can you do the same in your GenAI app?

From: Substack.com https://shchegrikovich.substack.com/p/why-does-microsoft-use-dsl-to-build
Shchegrikovich
Jan 04, 2024

I like this video from the Ignite conference - "How Microsoft 365 Copilot works | BRK256". The video covers the process and the challenges of building an AI assistant; I'll focus only on the DSL part, by reviewing one use case: adding a new slide to a presentation.

To add a new slide to the presentation, the LLM needs two things from the user: a slide deck and a task to execute. As a result, the LLM will return actions to perform on the deck. The question is: in what form should the actions be returned? In the video, a DSL is used in the response from the LLM:

add_slide("AI Predictions 2024")
add_list("Small LLMs", "Graphs/Ontology/DSL", "Open-source")

The DSL helps to debug and explain the whole user flow by raising the level of abstraction: reading high-level code for slide manipulation is easier than reading low-level API calls. In addition, the DSL can be restricted to safe operations only, instead of executing code produced by the LLM directly at run time. Finally, the DSL helps to auto-recover from errors: if the LLM generates code with errors, we can check that code and return an error message to the LLM for re-generation. It is this last verification step I want to focus on.
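As a sketch of what "restricted to safe operations" could look like - the operation names mirror the snippet above, but the dispatch table and deck representation are my assumptions, not Copilot's actual design:

```python
# Hypothetical whitelist of safe DSL operations. The names come from the
# article's snippet; the deck structure is invented for illustration.
SAFE_OPS = {
    "add_slide": lambda deck, title: deck.append({"slide": title, "items": []}),
    "add_list": lambda deck, *items: deck[-1]["items"].extend(items),
}

def run_dsl(deck, op, *args):
    """Execute one DSL operation, rejecting anything outside the whitelist."""
    if op not in SAFE_OPS:
        raise ValueError(f"unsupported operation: {op}")
    SAFE_OPS[op](deck, *args)

deck = []
run_dsl(deck, "add_slide", "AI Predictions 2024")
run_dsl(deck, "add_list", "Small LLMs", "Graphs/Ontology/DSL", "Open-source")
```

Anything the LLM emits that is not in the whitelist - file access, network calls, arbitrary Python - simply cannot run.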

What we really want from the LLM in this scenario is to create a plan. The problem is that LLMs can be quite bad at planning - "Our results show that LLM's ability to autonomously generate executable plans is quite meagre, averaging only about a 3% success rate" - https://arxiv.org/abs/2302.06706. But if we provide the LLM with feedback, then in several iterations we can produce the desired plan. Every time the LLM returns code with errors, we send the error message back as part of the prompt and ask the LLM to re-generate the plan. This is why we need a formal language, and a DSL is the best candidate.
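The loop can be sketched like this - generate, validate, and if validation fails, put the error message back into the prompt. The generate and validate callables are placeholders for a real LLM client and a real DSL checker:

```python
def plan_with_feedback(generate, validate, task, max_rounds=3):
    """Ask the model for a DSL plan; on validation errors, feed the
    error message back into the prompt and try again."""
    prompt = task
    for _ in range(max_rounds):
        plan = generate(prompt)
        error = validate(plan)  # None means the plan passed all checks
        if error is None:
            return plan
        prompt = f"{task}\nYour previous plan failed: {error}. Regenerate it."
    raise RuntimeError("no valid plan after max_rounds attempts")

# Stub model for demonstration: fails once, then returns a valid plan.
responses = iter(['add_slide(', 'add_slide("AI Predictions 2024")'])
generate = lambda prompt: next(responses)
validate = lambda plan: None if plan.endswith(')') else "unbalanced parentheses"
plan = plan_with_feedback(generate, validate, "Add a slide about 2024 predictions")
```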

It looks like a DSL as an intermediate language for the LLM might become the de facto standard, given how well LLMs work with languages. To implement this technique, we need to create a DSL for the problem, teach the LLM the new language via fine-tuning or few-shot prompting, and implement support for it in our code. Python has the ply package (Python Lex-Yacc), which provides lexing and parsing for building an abstract syntax tree and a one-pass compiler. The alternative is ANTLR.
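ply and ANTLR give full grammar control, but because the DSL above uses Python-like call syntax, even the standard-library ast module is enough to sketch the verification step. This is my illustration, not Microsoft's implementation; the allowed operation names are taken from the snippet above:

```python
import ast

ALLOWED = {"add_slide", "add_list"}  # hypothetical safe-operation set

def check_plan(source):
    """Parse a DSL plan (Python call syntax) and return an error message,
    or None if the plan is valid. The message can be fed back to the LLM."""
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return f"syntax error: {e.msg}"
    for node in tree.body:
        if not (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id in ALLOWED):
            return "only calls to allowed operations are permitted"
    return None
```

A plan that parses and uses only allowed operations passes; anything else - a syntax error, an import, a call to an unknown function - yields a message suitable for the re-generation loop.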

The image is from the "How Microsoft 365 Copilot works | BRK256" video.

Resources:

How Microsoft 365 Copilot works | BRK256

Avenging Polanyi's Revenge (Exploiting the Approximate Omniscience of LLMs in Planning..)

https://arxiv.org/abs/2401.00812 - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

https://cacm.acm.org/blogs/blog-cacm/276268-can-llms-really-reason-and-plan/ - Can LLMs Really Reason and Plan?

https://arxiv.org/pdf/2206.10498.pdf - PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change

https://arxiv.org/abs/2302.06706 - On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

https://github.com/dabeaz/ply - Python Lex-Yacc

https://github.com/antlr/antlr4 - ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
