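Column: 计算机视觉深度学习和自动驾驶 (Computer Vision, Deep Learning and Autonomous Driving)

Discussing technical developments and challenges in computer vision, deep learning, and autonomous driving

AutoGPT Framework Walkthrough

计算机视觉深度学习和自动驾驶 · WeChat official account · 2024-04-13 12:41

Main text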

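Overall flow

The overall AutoGPT flow is as follows:

In this flow the LLM mainly serves as the reasoning brain. That reasoning is used in two places:

1. Generating a set of goals.

2. Thinking about those goals, given the commands available for use, and returning structured data.

Each step that uses the LLM as the reasoning brain follows essentially the same standardized pipeline: prompt processing, then LLM inference, then response parsing.

Prompt processing and response parsing deserve some extra explanation:

1. Prompt processing: typically many prompts are preset, and the user's input is filled into the matching prompt like a fill-in-the-blank.

2. Response parsing: typically the LLM is asked to produce output in a specified format, and the response is then parsed according to that predefined format.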
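The three-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not AutoGPT's code: TASK_TEMPLATE, fake_llm, and run_llm_step are hypothetical names, and the model call is replaced by a stub that returns a canned reply.

```python
from typing import Callable

# Hypothetical preset prompt with a single blank for the user's input.
TASK_TEMPLATE = "Task: '{user_prompt}'\nRespond in the form 'Answer: <text>'."


def fake_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; returns a canned reply."""
    return "Answer: draft a plan"


def run_llm_step(user_prompt: str, llm: Callable[[str], str]) -> str:
    # 1. Prompt processing: fill the user's input into the preset prompt.
    prompt = TASK_TEMPLATE.format(user_prompt=user_prompt)
    # 2. LLM inference: send the prompt to the model.
    raw = llm(prompt)
    # 3. Response parsing: read the reply according to the agreed format.
    prefix = "Answer:"
    if not raw.startswith(prefix):
        raise ValueError(f"Unexpected response format: {raw!r}")
    return raw[len(prefix):].strip()


result = run_llm_step("Help me with marketing my business", fake_llm)
print(result)
```

Passing the model call in as a function keeps the three stages separable, so the parsing step can be exercised without network access.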

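Goal generation

AutoGPT's first step is to turn the user's input into a set of goals. Per the flow above this is one LLM inference step, so it follows the standardized pipeline:

1. Prompt processing

2. LLM inference

3. Response parsing

LLM inference simply means calling the OpenAI ChatGPT or GPT-4 API, which is not especially complex from an engineering standpoint, so we focus on prompt processing and response parsing:

Prompt processing

The code below is the prompt-processing logic. It uses two preset prompts:

DEFAULT_SYSTEM_PROMPT_AICONFIG_AUTOMATIC
DEFAULT_TASK_PROMPT_AICONFIG_AUTOMATIC

and fills the user's input into the blank in the DEFAULT_TASK_PROMPT_AICONFIG_AUTOMATIC prompt: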

    system_prompt = DEFAULT_SYSTEM_PROMPT_AICONFIG_AUTOMATIC
    prompt_ai_config_automatic = Template(
        DEFAULT_TASK_PROMPT_AICONFIG_AUTOMATIC
    ).render(user_prompt=user_prompt)
    # Call LLM with the string as user input
    output = create_chat_completion(
        ChatSequence.for_model(
            config.fast_llm,
            [
                Message("system", system_prompt),
                Message("user", prompt_ai_config_automatic),
            ],
        ),
        config,
    ).content

The two preset prompts

The DEFAULT_SYSTEM_PROMPT_AICONFIG_AUTOMATIC prompt uses one-shot prompting: it gives a single example that fixes the response format.

DEFAULT_SYSTEM_PROMPT_AICONFIG_AUTOMATIC = """
Your task is to devise up to 5 highly effective goals and an appropriate role-based name (_GPT) for an autonomous agent, ensuring that the goals are optimally aligned with the successful completion of its assigned task.

The user will provide the task, you will provide only the output in the exact format specified below with no explanation or conversation.

Example input:
Help me with marketing my business

Example output:
Name: CMOGPT
Description: a professional digital marketer AI that assists Solopreneurs in growing their businesses by providing world-class expertise in solving marketing problems for SaaS, content products, agencies, and more.
Goals:
- Engage in effective problem-solving, prioritization, planning, and supporting execution to address your marketing needs as your virtual Chief Marketing Officer.

- Provide specific, actionable, and concise advice to help you make informed decisions without the use of platitudes or overly wordy explanations.

- Identify and prioritize quick wins and cost-effective campaigns that maximize results with minimal time and budget investment.

- Proactively take the lead in guiding you and offering suggestions when faced with unclear information or uncertainty to ensure your marketing strategy remains on track.
"""

DEFAULT_TASK_PROMPT_AICONFIG_AUTOMATIC = (
    "Task: '{{user_prompt}}'\n"
    "Respond only with the output in the exact format specified in the system prompt, with no explanation or conversation.\n"
)

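Response parsing

Because the prompt above specifies the response format through a one-shot example, the three fields ai_name, ai_role, and ai_goals can be parsed out according to that format.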

    ai_name = re.search(r"Name(?:\s*):(?:\s*)(.*)", output, re.IGNORECASE).group(1)
    ai_role = (
        re.search(
            r"Description(?:\s*):(?:\s*)(.*?)(?:(?:\n)|Goals)",
            output,
            re.IGNORECASE | re.DOTALL,
        )
        .group(1)
        .strip()
    )
    ai_goals = re.findall(r"(?<=\n)-\s*(.*)", output)
    api_budget = 0.0  # TODO: parse api budget using a regular expression

    return AIConfig(ai_name, ai_role, ai_goals, api_budget)

At this point, from the task given in the user's input, LLM inference has produced the set of goals to accomplish.
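To see the parsing regexes at work, the sketch below applies the same three patterns to a shortened sample reply. The sample text and the parse_ai_config wrapper are assumptions made for illustration, and the None check is an addition that the excerpt above does not have.

```python
import re

# Shortened sample reply in the format of the system prompt's example.
sample_output = (
    "Name: CMOGPT\n"
    "Description: a professional digital marketer AI.\n"
    "Goals:\n"
    "- Solve marketing problems.\n"
    "- Identify quick wins.\n"
)


def parse_ai_config(output: str) -> tuple[str, str, list[str]]:
    """Extract (name, role, goals) using the same patterns as the excerpt."""
    name_match = re.search(r"Name(?:\s*):(?:\s*)(.*)", output, re.IGNORECASE)
    role_match = re.search(
        r"Description(?:\s*):(?:\s*)(.*?)(?:(?:\n)|Goals)",
        output,
        re.IGNORECASE | re.DOTALL,
    )
    if name_match is None or role_match is None:
        # Added guard: the excerpt calls .group(1) directly on the match.
        raise ValueError("Response does not follow the expected format")
    # Every line that starts with "-" right after a newline is one goal.
    goals = re.findall(r"(?<=\n)-\s*(.*)", output)
    return name_match.group(1).strip(), role_match.group(1).strip(), goals


name, role, goals = parse_ai_config(sample_output)
print(name, role, goals)
```

Because the goal pattern depends on a preceding newline, a reply that puts the first goal on the same line as "Goals:" would lose that goal, which is one reason the prompt insists on the exact format.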

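Completing the goals with the think-execute loop

The goals to accomplish have now been obtained through LLM inference. The next question is how to accomplish them. For this, AutoGPT uses a think-execute loop that works toward the goals step by step.

Below is the think-execute loop in AutoGPT. User input is also woven in partway through so that the user keeps control over the loop, but we set that aside for now and focus on think and execute.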

def run_interaction_loop(
    agent: Agent,
) -> None:
    """Run the main interaction loop for the agent.

    Args:
        agent: The agent to run the interaction loop for.

    Returns:
        None
    """

    #########################
    # Application Main Loop #
    #########################

    while cycles_remaining > 0:
        logger.debug(f"Cycle budget: {cycle_budget}; remaining: {cycles_remaining}")

        ########
        # Plan #
        ########
        # Have the agent determine the next action to take.
        with spinner:
            command_name, command_args, assistant_reply_dict = agent.think()

        ###############
        # Update User #
        ###############
        # Print the assistant's thoughts and the next command to the user.
        update_user(config, ai_config, command_name, command_args, assistant_reply_dict)

        ###################
        # Execute Command #
        ###################
        # Decrement the cycle counter first to reduce the likelihood of a SIGINT
        # happening during command execution, setting the cycles remaining to 1,
        # and then having the decrement set it to 0, exiting the application.
        if command_name != "human_feedback":
            cycles_remaining -= 1
        result = agent.execute(command_name, command_args, user_input)

        if result is not None:
            logger.typewriter_log("SYSTEM: ", Fore.YELLOW, result)
        else:
            logger.typewriter_log("SYSTEM: ", Fore.YELLOW, "Unable to execute command")
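Stripped of logging and user interaction, the loop above reduces to alternating think and execute under a cycle budget. The sketch below shows that skeleton with a stub agent. StubAgent, run_loop, and the "finish" command used to stop early are hypothetical and are not taken from the AutoGPT source.

```python
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical agent: think() replays scripted commands instead of calling an LLM."""

    script: list[tuple[str, dict]]
    log: list[str] = field(default_factory=list)

    def think(self) -> tuple[str, dict]:
        # Hand out the next scripted command; fall back to "finish" when empty.
        return self.script.pop(0) if self.script else ("finish", {})

    def execute(self, name: str, args: dict) -> str:
        result = f"ran {name} with {args}"
        self.log.append(result)
        return result


def run_loop(agent: StubAgent, cycle_budget: int) -> int:
    """Alternate think and execute until the budget runs out or the agent finishes."""
    cycles_remaining = cycle_budget
    while cycles_remaining > 0:
        name, args = agent.think()
        if name == "finish":  # assumed stop signal for this sketch
            break
        cycles_remaining -= 1
        agent.execute(name, args)
    return cycles_remaining


agent = StubAgent(
    script=[("web_search", {"query": "AutoGPT"}), ("write_file", {"name": "notes.txt"})]
)
remaining = run_loop(agent, cycle_budget=5)
print(remaining, agent.log)
```

The budget is what bounds the loop when the model never decides it is done, which is why the counter is decremented before each execution.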

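think

think is the second place where LLM inference is used, so we again follow the standard pipeline and look at prompt processing and response parsing.

Prompt processing

Prompt processing in think is more complex, because this prompt carries many responsibilities. It is assembled from several preset prompts, each with its own role, mainly:

1. A base instruction that drives the loop forward.

2. Specifying the goals, declaring the available commands (tools), and imposing some constraints.

3. Specifying the response format.

1. A base instruction that drives the loop forward.

This is the base instruction for think, passed in as cycle_instruction: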

DEFAULT_TRIGGERING_PROMPT = (
    "Determine exactly one command to use based on the given goals "
    "and the progress you have made so far, "
    "and respond using the JSON schema specified previously:"
)

This instruction is simple: it is just appended before each LLM inference:

def construct_prompt(
        self,
        cycle_instruction: str,
        thought_process_id: ThoughtProcessID,
    ) -> ChatSequence:
        """Constructs and returns a prompt with the following structure:
        1. System prompt
        2. Message history of the agent, truncated & prepended with running summary as needed
        3. `cycle_instruction`

        Params:
            cycle_instruction: The final instruction for a thinking cycle
        """

        if not cycle_instruction:
            raise ValueError("No instruction given")

        cycle_instruction_msg = Message("user", cycle_instruction)
        cycle_instruction_tlength = count_message_tokens(
            cycle_instruction_msg, self.llm.name
        )

        append_messages: list[Message] = []

        response_format_instr = self.response_format_instruction(thought_process_id)
        if response_format_instr:
            append_messages.append(Message("system", response_format_instr))

        prompt = self.construct_base_prompt(
            thought_process_id,
            append_messages=append_messages,
            reserve_tokens=cycle_instruction_tlength,
        )

        # ADD user input message ("triggering prompt")
        prompt.append(cycle_instruction_msg)

        return prompt
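The ordering in the docstring above (system prompt, then history, then the cycle instruction last) can be sketched as follows. This is a simplified illustration under stated assumptions: Msg, count_tokens, and build_prompt are hypothetical, tokens are approximated by word count, and the running summary mentioned in the docstring is left out.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    role: str
    content: str


def count_tokens(msg: Msg) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(msg.content.split())


def build_prompt(
    system: str, history: list[Msg], cycle_instruction: str, max_tokens: int
) -> list[Msg]:
    if not cycle_instruction:
        raise ValueError("No instruction given")
    system_msg = Msg("system", system)
    trigger = Msg("user", cycle_instruction)
    # Reserve room for the system prompt and the final instruction up front.
    budget = max_tokens - count_tokens(system_msg) - count_tokens(trigger)
    kept: list[Msg] = []
    # Walk the history from newest to oldest, keeping what still fits.
    for msg in reversed(history):
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.insert(0, msg)
        budget -= cost
    return [system_msg, *kept, trigger]


history = [Msg("assistant", "old old old old old"), Msg("user", "recent reply")]
prompt = build_prompt("You are an agent", history, "Pick one command", max_tokens=10)
print([m.content for m in prompt])
```

Reserving tokens for the final instruction before trimming history mirrors the reserve_tokens argument in the excerpt: the instruction must always fit, so history is what gets cut.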

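2. Specifying the goals, declaring the available commands (tools), and imposing some constraints.

As shown, this part appends the previously generated goals to the prompt, adds extra information such as the operating system, and adds the Constraints, Commands, Resources, and Best practices sections.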

def construct_full_prompt(
        self, config: Config, prompt_generator: Optional[PromptGenerator] = None
    ) -> str:
        """
        Returns a prompt to the user with the class information in an organized fashion.

        Parameters:
            None

        Returns:
            full_prompt (str): A string containing the initial prompt for the user
              including the ai_name, ai_role, ai_goals, and api_budget.
        """

        from autogpt.prompts.prompt import build_default_prompt_generator

        prompt_generator = prompt_generator or self.prompt_generator
        if prompt_generator is None:
            prompt_generator = build_default_prompt_generator(config)
            prompt_generator.command_registry = self.command_registry
            self.prompt_generator = prompt_generator

        # Construct full prompt
        full_prompt_parts = [
            f"You are {self.ai_name}, {self.ai_role.rstrip('.')}.",
            "Your decisions must always be made independently without seeking "
            "user assistance. Play to your strengths as an LLM and pursue "
            "simple strategies with no legal complications.",
        ]

        if config.execute_local_commands:
            # add OS info to prompt
            os_name = platform.system()
            os_info = (
                platform.platform(terse=True)
                if os_name != "Linux"
                else distro.name(pretty=True)
            )

            full_prompt_parts.append(f"The OS you are running on is: {os_info}")

        additional_constraints: list[str] = []
        if self.api_budget > 0.0:
            additional_constraints.append(
                f"It takes money to let you run. "
                f"Your API budget is ${self.api_budget:.3f}"
            )

        full_prompt_parts.append(
            "## Constraints\n"
            "You operate within the following constraints:\n"
            f"{self._generate_numbered_list(self.constraints + additional_constraints)}\n\n"
            "## Commands\n"
            "You have access to the following commands:\n"
            f"{self._generate_commands()}\n\n"
            "## Resources\n"
            "You can leverage access to the following resources:\n"
            f"{self._generate_numbered_list(self.resources + additional_resources)}\n\n"
            "## Best practices\n"
            f"{self._generate_numbered_list(self.best_practices + additional_best_practices)}"
        )

        if self.ai_goals:
            full_prompt_parts.append(
                "\n".join(
                    [
                        "## Goals",
                        "For your task, you must fulfill the following goals:",
                        *[f"{i+1}. {goal}" for i, goal in enumerate(self.ai_goals)],
                    ]
                )
            )

        return "\n\n".join(full_prompt_parts).strip("\n")

def _generate_commands(self) -> str:
        command_strings = []
        if self.command_registry:
            command_strings += [
                str(cmd)
                for cmd in self.command_registry.commands.values()
                if cmd.enabled
            ]

        # Add commands from plugins etc.
        command_strings += [str(cmd) for cmd in self.commands]

        return self._generate_numbered_list(command_strings)

Constraints, Resources, Best practices, and so on all have preset prompts, as shown below:

constraints: [
  '~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.',
  'If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.',
  'No user assistance',
  'Exclusively use the commands listed below e.g. command_name'
]
resources: [
  'Internet access for searches and information gathering.',
  'Long Term memory management.',
  'File output.',
  'Command execution'
]
best_practices: [
  'Continuously review and analyze your actions to ensure you are performing to the best of your abilities.',
  'Constructively self-criticize your big-picture behavior constantly.',
  'Reflect on past decisions and strategies to refine your approach.',
  'Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.'
]
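To make the assembly concrete, the sketch below turns lists like these into the sectioned prompt text. numbered_list and build_sections are hypothetical helpers written for this illustration. They imitate the shape of the excerpt, not its exact wording, and the sample items are a small subset of the presets.

```python
def numbered_list(items: list[str], start: int = 1) -> str:
    """Render items as '1. ...' lines, one per item."""
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start))


def build_sections(
    constraints: list[str],
    commands: list[str],
    resources: list[str],
    best_practices: list[str],
    goals: list[str],
) -> str:
    parts = [
        "## Constraints\n" + numbered_list(constraints),
        "## Commands\n" + numbered_list(commands),
        "## Resources\n" + numbered_list(resources),
        "## Best practices\n" + numbered_list(best_practices),
    ]
    if goals:
        # Goals come from the earlier goal-generation step.
        parts.append("## Goals\n" + numbered_list(goals))
    return "\n\n".join(parts)


prompt = build_sections(
    constraints=["No user assistance"],
    commands=["web_search: Searches the web"],
    resources=["File output.", "Command execution"],
    best_practices=["Reflect on past decisions and strategies to refine your approach."],
    goals=["Identify quick wins"],
)
print(prompt)
```

Numbering each item gives the model stable references and makes the boundaries between rules unambiguous in the final prompt.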

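Commands deserve a separate note. Commands are the tools we make available to the LLM. Each command is declared in the format below, with a name, description, and parameters, and this information is combined into the prompt.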

@command(
    "web_search",
    "Searches the web",
    {
        "query": {
            "type": "string",
            "description": "The search query",
            "required": True,
        }
    },
    aliases=["search"],
)
def web_search(query: str, agent: Agent, num_results: int = 8) -> str:
    """Return the results of a Google search

    Args:
        query (str): The search query.
        num_results (int): The number of results to return.

    Returns:
        str: The results of the search.
    """
    search_results = []
    attempts = 0

    while attempts < DUCKDUCKGO_MAX_ATTEMPTS:
        if not query:
            return json.dumps(search_results)

        results = DDGS().text(query)
        search_results = list(islice(results, num_results))

        if search_results:
            break

        time.sleep(1)
        attempts += 1

    results = json.dumps(search_results, ensure_ascii=False, indent=4)
    return safe_google_results(results)
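How a decorator like this can collect metadata and render it for the prompt is sketched below. REGISTRY, this simplified command decorator, and describe are hypothetical. AutoGPT's own decorator and its string rendering may differ, and the search function here is a placeholder that performs no real search.

```python
from typing import Any, Callable

# Hypothetical registry: command name -> metadata collected by the decorator.
REGISTRY: dict[str, dict[str, Any]] = {}


def command(
    name: str, description: str, parameters: dict[str, dict[str, Any]]
) -> Callable[[Callable], Callable]:
    def decorator(func: Callable) -> Callable:
        # Record the declaration so it can later be listed in the prompt.
        REGISTRY[name] = {
            "description": description,
            "parameters": parameters,
            "func": func,
        }
        return func

    return decorator


@command(
    "web_search",
    "Searches the web",
    {"query": {"type": "string", "description": "The search query", "required": True}},
)
def web_search(query: str) -> str:
    return f"results for {query}"  # placeholder, no real search


def describe(name: str) -> str:
    """Render one command as a single prompt line (the format is an assumption)."""
    meta = REGISTRY[name]
    params = ", ".join(f"{p}: {spec['type']}" for p, spec in meta["parameters"].items())
    return f"{name}: {meta['description']}. Params: ({params})"


line = describe("web_search")
print(line)
```

Keeping the function and its description in one place means the text the LLM sees and the code that runs cannot drift apart.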

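3. Specifying the response format.

The response format specified during think is shown below. Requiring the LLM to fill in fields such as plan and criticism is itself a way to push the LLM toward more thorough thinking.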

def response_format_instruction(self, thought_process_id: ThoughtProcessID) -> str:
        if thought_process_id != "one-shot":
            raise NotImplementedError(f"Unknown thought process '{thought_process_id}'")

        RESPONSE_FORMAT_WITH_COMMAND = """```ts
        interface Response {
            thoughts: {
                // Thoughts
                text: string;
                reasoning: string;
                // Short markdown-style bullet list that conveys the long-term plan
                plan: string;
                // Constructive self-criticism
                criticism: string;
                // Summary of thoughts to say to the user
                speak: string;
            };
            command: {
                name: string;
                args: Record<string, any>;
            };
        }
        ```"""

        RESPONSE_FORMAT_WITHOUT_COMMAND = """```ts
        interface Response {
            thoughts: {
                // Thoughts
                text: string;
                reasoning: string;
                // Short markdown-style bullet list that conveys the long-term plan
                plan: string;
                // Constructive self-criticism
                criticism: string;
                // Summary of thoughts to say to the user
                speak: string;
            };
        }
        ```"""

        response_format = re.sub(
            r"\n\s+",
            "\n",
            RESPONSE_FORMAT_WITHOUT_COMMAND
            if self.config.openai_functions
            else RESPONSE_FORMAT_WITH_COMMAND,
        )

        use_functions = self.config.openai_functions and self.command_registry.commands
        return (
            f"Respond strictly with JSON{', and also specify a command to use through a function_call' if use_functions else ''}. "
            "The JSON should be compatible with the TypeScript type `Response` from the following:\n"
            f"{response_format}\n"
        )

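One extra note: because OpenAI provides a Function Calling API, there are two cases for the response format. If Function Calling is used, the command is not returned in the JSON and comes back through the fixed Function Calling fields instead. If Function Calling is not used, the response format specifies the command. I introduced Function Calling in an earlier article: "Function Calling-从prompt到fine-tune".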

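Response parsing

Having seen the specified response format above, parsing in think simply follows that format. The parsed fields serve different purposes: some are shown to the user so they can follow the task's progress, while the one that affects the whole think-execute loop is the parsed command, because it determines whether execute can proceed.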
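The parsing described above can be sketched for the case without Function Calling, where the command is part of the JSON. parse_think_response is a hypothetical function, not AutoGPT's extract_command, and returning an "error" pseudo-command on malformed input is an assumption made for this sketch.

```python
import json


def parse_think_response(raw: str) -> tuple[str, dict, dict]:
    """Return (command_name, command_args, thoughts) from a JSON reply."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return "error", {"message": f"Invalid JSON: {exc.msg}"}, {}
    if not isinstance(reply, dict):
        return "error", {"message": "Response is not a JSON object"}, {}
    # The thoughts block is for the user; the command block drives execute.
    thoughts = reply.get("thoughts", {})
    command = reply.get("command")
    if not isinstance(command, dict) or "name" not in command:
        return "error", {"message": "Missing 'command' or 'name'"}, thoughts
    return command["name"], command.get("args", {}), thoughts


raw = (
    '{"thoughts": {"text": "search first", "plan": "- search"}, '
    '"command": {"name": "web_search", "args": {"query": "AutoGPT"}}}'
)
name, args, thoughts = parse_think_response(raw)
print(name, args, thoughts["text"])
```

Returning a recognizable error value instead of raising lets the loop report the problem and ask the model to try again on the next cycle.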

def extract_command(
