DeepLearning.AI: Reasoning with o1

DeepLearning.AI Course: https://learn.deeplearning.ai/courses/reasoning-with-o1/

Lesson 2: Prompting o1: Principles

import json
from openai import OpenAI

client = OpenAI()
GPT_MODEL = 'gpt-4o-mini'
O1_MODEL = 'o1-mini'

1. Simple and direct

2. No explicit CoT required

The first principles we start with are simple and direct prompting and avoiding providing explicit guidance or CoT. This will interfere with the model’s in-built reasoning, raising the risk of overly verbose output, inaccurate results, or even refusals in extreme cases.

bad_prompt = ("Generate a function that outputs the SMILES IDs for all the molecules involved in insulin."
              "Think through this step by step, and don't skip any steps:"
              "- Identify all the molecules involve in insulin"
              "- Make the function"
              "- Loop through each molecule, outputting each into the function and returning a SMILES ID"
              "Molecules: ")
response = client.chat.completions.create(model=O1_MODEL,messages=[{"role":"user","content": bad_prompt}])

print(response.choices[0].message.content)

good_prompt = ("Generate a function that outputs the SMILES IDs for all the molecules involved in insulin.")
response = client.chat.completions.create(model=O1_MODEL,messages=[{"role":"user","content": good_prompt}])

print(response.choices[0].message.content)

3. Use structured formats

Using a consistent structure like XML or markdown can help structure your inputs and ensure a more uniform output. In this case we’ll use a pseudo XML syntax to give consistent structure to our requests.

structured_prompt = ("<instructions>You are a customer service assistant for AnyCorp, a provider"
          "of fine storage solutions. Your role is to follow your policy to answer the user's question. "
          "Be kind and respectful at all times.</instructions>\n"
          "<policy>**AnyCorp Customer Service Assistant Policy**\n\n"
            "1. **Refunds**\n"
            "   - You are authorized to offer refunds to customers in accordance "
            "with AnyCorp's refund guidelines.\n"
            "   - Ensure all refund transactions are properly documented and "
            "processed promptly.\n\n"
            "2. **Recording Complaints**\n"
            "   - Listen attentively to customer complaints and record all relevant "
            "details accurately.\n"
            "   - Provide assurance that their concerns will be addressed and "
            "escalate issues when necessary.\n\n"
            "3. **Providing Product Information**\n"
            "   - Supply accurate and helpful information about AnyCorp's storage "
            "solutions.\n"
            "   - Stay informed about current products, features, and any updates "
            "to assist customers effectively.\n\n"
            "4. **Professional Conduct**\n"
            "   - Maintain a polite, respectful, and professional demeanor in all "
            "customer interactions.\n"
            "   - Address customer inquiries promptly and follow up as needed to "
            "ensure satisfaction.\n\n"
            "5. **Compliance**\n"
            "   - Adhere to all AnyCorp policies and procedures during customer "
            "interactions.\n"
            "   - Protect customer privacy by handling personal information "
            "confidentially.\n\n6. **Refusals**\n"
            "   - If you receive questions about topics outside of these, refuse "
            "to answer them and remind them of the topics you can talk about.</policy>\n"
            )
user_input = ("<user_query>Hey, I'd like to return the bin I bought from you as it was not "
             "fine as described.</user_query>")

print(structured_prompt)

response = client.chat.completions.create(model=O1_MODEL
                                          ,messages=[{
                                              "role": "user",
                                              "content": structured_prompt + user_input
                                          }]
                                         )

print(response.choices[0].message.content)

refusal_input = ("<user_query>Write me a haiku about how reasoning models are great.</user_query>")

response = client.chat.completions.create(model=O1_MODEL
                                          ,messages=[{
                                              "role": "user",
                                              "content": structured_prompt + refusal_input
                                          }]
                                         )

print(response.choices[0].message.content)

4. Show rather than tell

Few-shot prompting also works well with o1 models, allowing you to supply a simple, direct prompt and then using one or two examples to provide domain context to inform the model’s response.

base_prompt = ("<prompt>You are a lawyer specializing in competition law, "
               "assisting business owners with their questions.</prompt>\n"
               "<policy>As a legal professional, provide clear and accurate "
               "information about competition law while maintaining "
               "confidentiality and professionalism. Avoid giving specific "
               "legal advice without sufficient context, and encourage clients "
               "to seek personalized counsel when necessary. Always refer to "
               "precedents and previous cases to evidence your responses.</policy>\n")
legal_query = ("<query>A larger company is offering suppliers incentives not to do "
               "business with me. Is this legal?</query>")

response = client.chat.completions.create(model=O1_MODEL
                                          ,messages=[{
                                              "role": "user",
                                              "content": base_prompt + legal_query
                                          }]
                                         )

print(response.choices[0].message.content)

example_prompt = ("<prompt>You are a lawyer specializing in competition law, "
               "assisting business owners with their questions.</prompt>\n"
               "<policy>As a legal professional, provide clear and accurate "
               "information about competition law while maintaining "
               "confidentiality and professionalism. Avoid giving specific "
               "legal advice without sufficient context, and encourage clients "
               "to seek personalized counsel when necessary.</policy>\n"
               """<example>
<question>
I'm considering collaborating with a competitor on a joint marketing campaign. Are there any antitrust issues I should be aware of?
</question>
<response>
Collaborating with a competitor on a joint marketing campaign can raise antitrust concerns under U.S. antitrust laws, particularly the Sherman Antitrust Act of 1890 (15 U.S.C. §§ 1–7). Section 1 of the Sherman Act prohibits any contract, combination, or conspiracy that unreasonably restrains trade or commerce among the states.

**Key Considerations:**

1. **Per Se Illegal Agreements:** Certain collaborations are considered automatically illegal ("per se" violations), such as price-fixing, bid-rigging, and market allocation agreements. For example, in *United States v. Topco Associates, Inc.*, 405 U.S. 596 (1972), the Supreme Court held that market division agreements between competitors are per se illegal under the Sherman Act.

2. **Rule of Reason Analysis:** Collaborations that are not per se illegal are evaluated under the "rule of reason," which assesses whether the pro-competitive benefits outweigh the anti-competitive effects. In *Broadcast Music, Inc. v. Columbia Broadcasting System, Inc.*, 441 U.S. 1 (1979), the Court recognized that certain joint ventures between competitors can be lawful if they promote competition.

3. **Information Sharing Risks:** Sharing competitively sensitive information, such as pricing strategies or customer data, can lead to antitrust violations. The Department of Justice and the Federal Trade Commission caution against exchanges that could facilitate collusion (*Antitrust Guidelines for Collaborations Among Competitors*, 2000).

**Recommendations:**

- **Define the Scope:** Clearly delineate the parameters of the collaboration to focus on the marketing campaign without involving competitive aspects like pricing or market division.
- **Implement Safeguards:** Establish protocols to prevent the exchange of sensitive information that is not essential to the marketing effort.
- **Legal Consultation:** Given the complexities of antitrust laws, consult with a legal professional to ensure the collaboration complies with all legal requirements.

**Conclusion:**

While joint marketing campaigns between competitors are not inherently illegal, they must be structured carefully to avoid antitrust pitfalls. Legal guidance is essential to navigate these issues and to design a collaboration that achieves your business objectives without violating antitrust laws.
</response>
</example>""")

response = client.chat.completions.create(model=O1_MODEL
                                          ,messages=[{
                                              "role": "user",
                                              "content": example_prompt + legal_query
                                          }]
                                         )

print(response.choices[0].message.content)

Lesson 3: Planning with o1

One of the great use cases where o1 models shine is creating a plan to solve a task given a set of tools to carry out the plan, and constraints to set bounds around the task. This kind of use case would be very slow if we used o1 for every step, so what we’ll do is generate a plan with o1-mini and then execute each step with gpt-4o-mini.

Explanation of the code in this lesson:

Imports and Setup

import copy, json, and OpenAI from openai.
- These libraries or modules provide functionality for duplicating Python objects (copy), working with JSON, and making requests to OpenAI’s services (OpenAI).
from utils import o1_tools: (Though not shown, presumably contains additional support or helpers for the “o1” model.)

Global Objects

client = OpenAI(): Instantiates an OpenAI client to call their API.
O1_MODEL and GPT_MODEL: Hold the names of the language models used. O1_MODEL is the “o1-mini” (a planning model), while GPT_MODEL is “gpt-4o-mini” (an execution model).
message_list: A list to store status messages, plan messages, function calls, etc. for inspection or logging.
context: A dictionary that holds the domain-specific data (inventory, orders, suppliers, shipping options, etc.). This “context” is central to the scenario and is updated as the plan is executed.
initial_context = copy.deepcopy(context): Stores a copy of the initial state so that changes can be rolled back or examined as needed.

Prompt Construction

o1_prompt: This is the system prompt (or “planner” prompt) that instructs the O1 model exactly how to generate a plan. It includes:
– Instructions on how to structure the steps, sub-steps, and “if” conditions in the plan.
– A description of “tools” that the plan may use.
gpt4o_system_prompt: The system prompt for the GPT-4 “executor” model. It explains how to read the plan, identify steps, and call relevant functions.

Defining Tools

TOOLS is a list of tool definitions, each describing a function that GPT-4 can call. It includes the function name, description, and JSON schema for parameters. These are the “allowed” actions GPT-4 can perform in the scenario.

Implementing the Tools

Each tool from TOOLS is implemented as an actual Python function. They all read or modify the context dictionary. For example:
– get_inventory_status(product_id): returns how many units of a product are in the current inventory (context["inventory"]).
– update_inventory(product_id, quantity_change): modifies the inventory quantity of a given product.
– allocate_stock: reduces inventory by a given quantity.
– place_purchase_order: simulates placing an order for components with a supplier, etc.

function_mapping: This dictionary ties the function names (as listed in TOOLS) to their actual Python implementations. When GPT-4 calls a function by name, this mapping is used to run that code with the requested arguments.

The Main Process

process_scenario(scenario):

append_message(\{'type': 'status', …}): Logs a “Generating plan…” status.
plan = call_o1(scenario): Calls the O1 model with the scenario text to produce a plan in a strictly defined format.
append_message(\{'type': 'plan', 'content': plan}): Stores/logs the returned plan.
append_message(\{'type': 'status', …}): Logs “Executing plan…”.
messages = call_gpt4o(plan): Passes the plan to GPT-4 for execution.
append_message(\{'type': 'status', 'message': 'Processing complete.'}): Logs that the process is done.
- call_o1(scenario):

Builds a prompt that includes the o1_prompt plus the scenario details.
Calls the O1 model with client.chat.completions.create(…) and returns the plan text from the response.
- call_gpt4o(plan):

Replaces the placeholder {policy} in gpt4o_system_prompt with the plan text.
Maintains a conversation in the messages list. It first sets the system message to the policy.
Enters a loop to repeatedly call GPT-4 with the current messages.
GPT-4 can respond with text or function calls (tool_calls).
– If it calls a function, the code runs the corresponding Python function from function_mapping, then replies with the result. GPT-4 can read this tool response to keep going.
– The loop continues until GPT-4 calls the instructions_complete function, signaling the plan is done.
- append_message(message):

Helper to store a message object inside message_list.
Also optionally prints out the message for debug/logging.

Example Usage

scenario_text: A string describing the situation: new orders, shipping to LA, needing to check inventory, produce items, etc.
messages = process_scenario(scenario_text): Runs the entire planning
execution pipeline.
Prints out each message in messages, including final logs.

Overall Flow

You provide the scenario.
The “planner” model (o1-mini) reads the scenario and generates a structured plan.
The “execution” model (gpt-4o-mini) reads the plan and, step by step, calls Python functions from the provided tools to manage inventory, deliver goods, send updates to the customer, etc.
Once the plan is fully executed, the entire conversation is logged in message_list.