Omar Abid

GPT-3.5 and the latest models

There is a lot of discussion about LLM benchmarks. The question is: if models keep getting better at benchmarks, why are they not getting smarter in real use cases?

A simple test I ran lately made me curious to see what GPT-3.5-turbo would answer. It is the oldest model still available on the OpenAI API and was released in early 2023. That's two years ago.

First, the code. Nothing out of the ordinary, but kind of random. I am creating a custom Event Bus in Rust for a Wasm worker. The goal is to tag a function with proc-macros and have it executed when a certain event is emitted.

However, the code at this stage is still a very early prototype. The EventBus struct holds two HashMaps, one for events and one for handlers. Can you spot the issue with the handlers HashMap?

mod hooks;
mod macros;

use std::{
    any::Any,
    cell::{OnceCell, RefCell},
    collections::HashMap,
    sync::{LazyLock, Mutex},
};

pub static EVENT_BUS: LazyLock<EventBus> = LazyLock::new(|| EventBus::new());

pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>,
}

unsafe impl Sync for EventBus {}
unsafe impl Send for EventBus {}

impl EventBus {
    fn new() -> Self {
        Self {
            events: RefCell::new(HashMap::new()),
            handlers: RefCell::new(HashMap::new()),
        }
    }

    pub fn emit(&self, event_name: &str, args: Vec<Box<dyn Any>>) {
        self.events
            .borrow_mut()
            .insert(event_name.to_string(), args);
    }

    pub fn register(&self, event_name: &str, func: Box<dyn Fn(Vec<Box<dyn Any>>)>) {
        self.handlers
            .borrow_mut()
            .insert(event_name.to_string(), func);
    }

    pub fn collect(&self) {
        // register whatever functions we have in the app
        self.register("WORKER_START", Box::new(hooks::print_something));

        // run through the emitted events
        let events = self.events.replace(HashMap::new());
        events.into_iter().for_each(|(event_name, args)| {
            if let Some(handler) = self.handlers.borrow().get(&event_name) {
                handler(args);
            }
        });
    }
}

A single event can have multiple handlers, so the current design has a flaw: the handlers HashMap stores exactly one handler per event name. If you register another handler for the same event, insert replaces the previous one.
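To make the flaw concrete, here is a minimal sketch outside the EventBus. The `Handler` alias and the two function names are made up for illustration, and the handlers return a label instead of taking arguments so the result is easy to inspect.

```rust
use std::collections::HashMap;

// Hypothetical handler type for this sketch: returns a label we can inspect.
type Handler = Box<dyn Fn() -> &'static str>;

/// One handler per event name: the second `insert` replaces the first.
pub fn single_slot() -> Vec<&'static str> {
    let mut handlers: HashMap<String, Handler> = HashMap::new();
    handlers.insert("WORKER_START".to_string(), Box::new(|| "first"));
    handlers.insert("WORKER_START".to_string(), Box::new(|| "second"));
    handlers.values().map(|handler| handler()).collect()
}

/// A Vec of handlers per event name: both registrations survive, in order.
pub fn multi_slot() -> Vec<&'static str> {
    let mut handlers: HashMap<String, Vec<Handler>> = HashMap::new();
    handlers
        .entry("WORKER_START".to_string())
        .or_default()
        .push(Box::new(|| "first"));
    handlers
        .entry("WORKER_START".to_string())
        .or_default()
        .push(Box::new(|| "second"));
    handlers["WORKER_START"].iter().map(|handler| handler()).collect()
}
```

Calling `single_slot()` only ever runs the last registered handler, while `multi_slot()` runs both.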

Can the AI spot this issue? To avoid steering the model in a direction, I will provide a rather minimal prompt:

do you think the type of handlers is correct?

Claude 3.5

Looking at the code, there might be an issue with the type of handlers in the EventBus struct. Currently, it's defined as:

handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>

The issue is that this type forces all handlers to accept exactly a Vec<Box<dyn Any>>, which is quite restrictive and not type-safe. When you emit an event with specific types, you lose type information when storing them as Box<dyn Any>, and there's no guarantee that the handler will receive the correct types.

A more flexible and type-safe approach would be to use generics. Here's a suggestion:

file:src/pubsub/mod.rs line:13-16

pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Box<dyn Any>>>,
}

And then modify the register method to accept generic types:

file:src/pubsub/mod.rs line:35-39

pub fn register<F, Args>(&self, event_name: &str, func: F)
where
    F: Fn(Args) + 'static,
    Args: 'static,
{
    self.handlers
        .borrow_mut()
        .insert(event_name.to_string(), Box::new(func));
}

However, this would require additional changes to the collect method to properly handle type casting and matching. Would you like me to suggest a complete type-safe implementation?

I am not really sure what the model is trying to do here. So I ask it to "carry on".

I'll propose a more type-safe implementation that handles proper type matching between events and their handlers. Here's the complete revision:

file:src/pubsub/mod.rs line:13-52

pub struct EventBus {
    events: RefCell<HashMap<String, Box<dyn Any>>>,
    handlers: RefCell<HashMap<String, Box<dyn Any>>>,
}

impl EventBus {
    fn new() -> Self {
        Self {
            events: RefCell::new(HashMap::new()),
            handlers: RefCell::new(HashMap::new()),
        }
    }

    pub fn emit<T: 'static>(&self, event_name: &str, args: T) {
        self.events
            .borrow_mut()
            .insert(event_name.to_string(), Box::new(args));
    }

    pub fn register<F, T>(&self, event_name: &str, handler: F)
    where
        F: Fn(T) + 'static,
        T: 'static,
    {
        self.handlers
            .borrow_mut()
            .insert(event_name.to_string(), Box::new(handler));
    }

    pub fn collect(&self) {
        // register whatever functions we have in the app
        self.register("WORKER_START", hooks::print_something);

        // run through the emitted events
        let events = self.events.replace(HashMap::new());
        
        for (event_name, args) in events {
            if let Some(handler) = self.handlers.borrow().get(&event_name) {
                if let (Some(handler), Some(args)) = (
                    handler.downcast_ref::<Box<dyn Fn(&T)>>(),
                    args.downcast_ref(),
                ) {
                    handler(args);
                }
            }
        }
    }
}

Key changes:

  1. Changed events to store single values instead of vectors
  2. Modified emit to accept a generic type T instead of Vec<Box<dyn Any>>
  3. Modified register to accept a generic handler F that takes a specific type T
  4. Updated collect to properly downcast both the handler and arguments to their correct types

However, there's still a challenge with type erasure and matching. We might need to add type information to ensure proper matching between events and handlers. Would you like me to show how to handle this with type IDs or a more sophisticated type matching system?

It is clearly going nowhere. Let's move to Claude 3.7.
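Before moving on, a quick note on why that last collect cannot work as written. `T` is not declared anywhere in `collect`, so the snippet would not compile. Even with a concrete type in its place, the suggested register boxes the concrete `F` as `dyn Any`, so asking for a `Box<dyn Fn(..)>` back does not match what was stored. A minimal sketch of that mismatch, with a stand-in `print_something` that takes an `i32` (an assumption, the real hook's signature is different):

```rust
use std::any::Any;

// Stand-in for the real hook; the `i32` parameter is an assumption.
fn print_something(_value: i32) {}

/// Stores the handler the way the suggested `register` does (the concrete
/// function type erased to `dyn Any`) and tries to get a boxed closure back.
pub fn recover_as_boxed_fn() -> bool {
    let stored: Box<dyn Any> = Box::new(print_something);
    // The erased value is the function item itself, not a `Box<dyn Fn(i32)>`,
    // so the downcast finds a different type.
    stored.downcast_ref::<Box<dyn Fn(i32)>>().is_some()
}

/// Boxes the handler as `Box<dyn Fn(i32)>` first and erases that instead.
pub fn recover_when_type_matches() -> bool {
    let handler: Box<dyn Fn(i32)> = Box::new(print_something);
    let stored: Box<dyn Any> = Box::new(handler);
    stored.downcast_ref::<Box<dyn Fn(i32)>>().is_some()
}
```

The first function returns false and the second returns true: a downcast only succeeds when the requested type is exactly the type that was erased.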

Claude 3.7

Looking at your EventBus implementation, I see a potential issue with the handlers type. The current implementation only allows for a single handler per event name, which might not be what you want in a typical pub/sub system. Usually, multiple subscribers should be able to listen to the same event.

The current type:

handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>

would be better as:

handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(Vec<Box<dyn Any>>)>>>>

This would allow multiple handlers to be registered for the same event name. If that's what you're aiming for, you'd need to update your register and collect methods accordingly to handle multiple handlers per event.

Would you like me to show you how to modify the implementation for multiple handlers per event?

Thank you. Claude 3.7 was more succinct, and even though it didn't provide a working solution, it did steer us in the right direction.

What about DeepSeek, the thinking model?

DeepSeek R1 (copilot)

The handler type Box<dyn Fn(Vec<Box<dyn Any>>)> has two potential issues:

  1. Thread Safety: For a static EventBus that might be used across threads, handlers should be Send + Sync
  2. Argument Type: Vec<Box<dyn Any>> requires manual downcasting - consider Box<dyn Any> instead if single argument is intended

Recommended fix for thread safety:

file:src/pubsub/mod.rs line:14-16

    events: Mutex<HashMap<String, Vec<Box<dyn Any + Send>>>>,
    handlers: Mutex<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any + Send>>) + Send + Sync>>>,

This change:

  • Replaces RefCell with Mutex for thread safety
  • Adds Send bound to trait objects for cross-thread use
  • Adds Send + Sync to handler functions

Would you like me to provide the full thread-safety refactor for the implementation?

No dice. Its thinking tokens were not any better. It essentially started making up all kinds of nonsense scenarios, though I usually ignore these since most of the time it converges back to sanity.

Still, its suggestions have no value and steer us nowhere.

What about Google?

Gemini 2.0 Pro (Google Studio)

Gemini's answer is too long to put here. You can see it in this Gist.

Gemini's answer is what I would describe as a Rust developer who went mentally insane. Gemini 2.0 is great if you want to bring absolute chaos to your code base. The model is completely unhinged. What about OpenAI models?

O1 (copilot)

GitHub Copilot:
If you only need single-threaded closures that take Vec<Box<dyn Any>> and return nothing, it’s fine. However, since you mark EventBus as Sync, you might want Fn + Send + Sync for true thread safety, and replace RefCell with a thread-safe type like Mutex.

Okay... Let's try on the API platform

O1 (API/Medium Thinking)

Request too large for o1 in organization org-6g94fxw3wiuDxcFSOYvukNgE on tokens per min (TPM): Limit 30000, Requested 100431. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.

Okay... Pretty sure we are below the 30K token limit, unless it can predict how many tokens it will output?

GPT-4o

The type of handlers in your EventBus struct is defined as RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>. This means that you expect each event handler to be a function or closure that takes a Vec<Box<dyn Any>> as its argument.

In many event-driven architectures, event handlers often take a more specific type or a reference to an event object to allow the handler to extract and use the information more effectively. By using Vec<Box<dyn Any>>, while flexible, you are forcing handlers to perform potentially unsafe type casts to retrieve the actual values they need.

A potentially more type-safe approach could involve defining a trait or a struct for event arguments. However, if you prefer to keep the system this flexible, then the current handler type is technically correct, albeit less type-safe.

If you do want to enhance type safety, consider using strongly-typed event arguments, such as:

pub trait Event: Any + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

pub struct MyEvent {
    pub data: String,
}

impl Event for MyEvent {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

type Handler = Box<dyn Fn(&dyn Event)>;

This approach requires more design upfront but results in greater type safety.

Just some nonsense.

GPT-4.5-preview

No, the type signature you've used in your handlers field is problematic. Let's closely examine why it's problematic and how you can correct it:

Original problematic definition:

handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>,

The current signature represents a handler function that takes ownership of the vector of boxed arguments (Vec<Box<dyn Any>>). While owning arguments is possible, this design may have these issues:

  1. Ownership and usability:
    Once you pass arguments to handlers, they take ownership from the event dispatcher permanently. This means your handlers may consume the arguments, and after that point, those arguments cannot be reused or accessed anymore in any manner.

  2. Single-handler restriction:
    Your current signature implies that each event key ("WORKER_START") maps to a single function handler. Typically, an event system should be able to register multiple handlers per event.

1) Multiple Handlers per Event

Usually, you'd expect multiple listeners or subscribers for any given event:

handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,

This new definition allows multiple handlers for the same event by using a vector (Vec) of callbacks.

2) Arguments Passed by Reference

Passing a reference (&[Box<dyn Any>]) rather than ownership improves usability and efficiency, and allows multiple handlers to inspect the arguments:

Improved definition:

handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,

The corrected EventBus structure will become:

pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,
}

Corresponding corrected methods:

Emission method:

pub fn emit(&self, event_name: &str, args: Vec<Box<dyn Any>>) {
    self.events
        .borrow_mut()
        .insert(event_name.to_string(), args);
}

You emit the event with arguments once.
When collecting, you iterate through stored events and for each corresponding handler, call the handlers with references.

Registration method:

pub fn register<F>(&self, event_name: &str, func: F)
where
    F: Fn(&[Box<dyn Any>]) + 'static,
{
    self.handlers
        .borrow_mut()
        .entry(event_name.to_string())
        .or_default()
        .push(Box::new(func));
}

This provides a safer, more idiomatic Rust API allowing multiple handler registration.

Collecting method updated accordingly:

pub fn collect(&self) {
    // Register functions
    self.register("WORKER_START", hooks::print_something);

    // Handle emitted events
    let emitted_events = self.events.replace(HashMap::new());

    emitted_events.into_iter().for_each(|(event_name, args)| {
        if let Some(handlers) = self.handlers.borrow().get(&event_name) {
            for handler in handlers {
                handler(&args);
            }
        }
    });
}

Conclusion:

Your current handlers type signature is not ideal:

  • You'd usually want multiple handlers for each event
  • You normally want handlers to borrow arguments via reference rather than taking ownership

So, updating handlers to:

handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,

will give you cleaner, idiomatic, and reusable code.

Actually, this is pretty good. It caught two issues: the original one and another that has to do with ownership. It was a bit pricey though, costing $0.12 for this single run.
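The ownership point is worth a small sketch. With the original `Fn(Vec<Box<dyn Any>>)` signature, calling a handler moves the arguments into it, so a second handler for the same event has nothing left to receive. Borrowing a slice lets every handler inspect the same arguments. The two handler functions below are hypothetical, and they return a value only so the result can be checked.

```rust
use std::any::Any;

// Hypothetical handler type: borrows the arguments instead of owning them.
type Handler = Box<dyn Fn(&[Box<dyn Any>]) -> Option<i32>>;

// Reads the first argument if it is an i32.
fn read_value(args: &[Box<dyn Any>]) -> Option<i32> {
    args.first()?.downcast_ref::<i32>().copied()
}

// Reads the same argument and doubles it.
fn double_value(args: &[Box<dyn Any>]) -> Option<i32> {
    read_value(args).map(|n| n * 2)
}

/// Dispatches one set of arguments to every registered handler.
pub fn dispatch_to_all() -> Vec<Option<i32>> {
    let mut handlers: Vec<Handler> = Vec::new();
    handlers.push(Box::new(read_value));
    handlers.push(Box::new(double_value));

    let args: Vec<Box<dyn Any>> = vec![Box::new(21_i32)];
    // `&args` is only borrowed, so it stays usable for the next handler.
    handlers.iter().map(|handler| handler(&args)).collect()
}
```

Both handlers see the value 21 here. With owned arguments, the loop would need to clone them (and `Box<dyn Any>` is not `Clone`) or give up after the first handler.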

Now how do these compare to the original GPT-3.5?

GPT-3.5

The handler is functionally correct for accepting a single vector of trait objects. Depending on your desired use case, the handler could be altered to also include the event name needed to parse the types of args or changed to handle the arguments as timely as they arrive. However, with the correct trait bounds added to the argument and implementation of handler, it should work fine.

Pretty useless, though it is toned down in the insanity department. Can it be steered to a correct answer?

we are registering multiple functions for an event

If you want to register multiple functions for an event, you can modify the type of the handlers HashMap to allow storing multiple functions for a single event. One way to achieve this is to change the value type of the handlers HashMap to a vector of Box<dyn Fn(Vec<Box<dyn Any>>)>. Here's how you can modify the EventBus struct to support registering multiple functions for an event:

pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(Vec<Box<dyn Any>>)>>>,
}

impl EventBus {
    // Other methods remain the same

    pub fn register(&self, event_name: &str, func: Box<dyn Fn(Vec<Box<dyn Any>>)>) {
        // Check if the event_name already exists in the handlers HashMap
        if let Some(event_handlers) = self.handlers.borrow_mut().get_mut(event_name) {
            // If the event_name exists, add the new function to the existing vector
            event_handlers.push(func);
        } else {
            // If the event_name does not exist, create a new vector with the function
            self.handlers.borrow_mut().insert(event_name.to_string(), vec![func]);
        }
    }

    // Additional method to unregister a function from an event
    pub fn unregister(&self, event_name: &str, func: &dyn Fn(Vec<Box<dyn Any>>)) {
        if let Some(event_handlers) = self.handlers.borrow_mut().get_mut(event_name) {
            event_handlers.retain(|handler| !std::ptr::eq(handler.as_ref(), func));
        }
    }
}

With this modification, you can now register multiple functions for the same event using the register method. The unregister method allows you to remove a specific function from the list of handlers for an event.

Yes, though it adds an unregister method that we didn't ask for.
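One detail I would double check before pasting that register: the if let holds the first borrow_mut() guard, and as far as I understand, in Rust editions before 2024 that guard is still alive when the else branch runs, so the second borrow_mut() would panic at runtime. A sketch that avoids the question entirely by using the entry API with a single borrow (the `Registry` type and `handler_count` helper are made up for this example):

```rust
use std::any::Any;
use std::cell::RefCell;
use std::collections::HashMap;

type Handler = Box<dyn Fn(Vec<Box<dyn Any>>)>;

// Hypothetical, trimmed-down stand-in for the handlers half of EventBus.
#[derive(Default)]
pub struct Registry {
    handlers: RefCell<HashMap<String, Vec<Handler>>>,
}

impl Registry {
    pub fn register(&self, event_name: &str, func: Handler) {
        // A single `borrow_mut` covers both the lookup and the insertion.
        self.handlers
            .borrow_mut()
            .entry(event_name.to_string())
            .or_default()
            .push(func);
    }

    /// Number of handlers currently registered for an event.
    pub fn handler_count(&self, event_name: &str) -> usize {
        self.handlers.borrow().get(event_name).map_or(0, Vec::len)
    }
}
```

Registering two closures under "WORKER_START" leaves `handler_count("WORKER_START")` at 2, and an unknown event name reports 0.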

Conclusion

  1. Claude 3.7 and GPT-4.5-preview were both successful. GPT-4.5 gets two bonus points for its ownership finding while remaining on topic. Two, because spotting the ownership issue requires thinking ahead.
  2. GPT-3.5 ranks second for not going insane. O1 can rank just below it for not saying much.
  3. Claude 3.5, DeepSeek R1 and GPT-4o were not only unsuccessful but also steered us in the wrong direction.
  4. Gemini 2.0 Pro should not be used.
