There is a lot of discussion about LLM benchmarks. The question is: if models keep getting better at benchmarks, why are they not getting smarter in real use cases?
A simple test I ran lately made me curious about what GPT-3.5-turbo would answer. It is the oldest model still available on the OpenAI API and was released in early 2023. That's two years ago.
First, the code. Nothing out of the ordinary but kinda random. I am creating a custom Event Bus in Rust for a Wasm worker. The goal is to be able to tag a function with proc-macros and have it executed when a certain event is emitted.
However, the code at this stage is still a very early prototype. The EventBus struct holds two HashMaps, one for events and one for handlers. Can you spot the issue with the handlers HashMap?
mod hooks;
mod macros;

use std::{
    any::Any,
    cell::{OnceCell, RefCell},
    collections::HashMap,
    sync::{LazyLock, Mutex},
};

pub static EVENT_BUS: LazyLock<EventBus> = LazyLock::new(|| EventBus::new());

pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>,
}

unsafe impl Sync for EventBus {}
unsafe impl Send for EventBus {}

impl EventBus {
    fn new() -> Self {
        Self {
            events: RefCell::new(HashMap::new()),
            handlers: RefCell::new(HashMap::new()),
        }
    }

    pub fn emit(&self, event_name: &str, args: Vec<Box<dyn Any>>) {
        self.events
            .borrow_mut()
            .insert(event_name.to_string(), args);
    }

    pub fn register(&self, event_name: &str, func: Box<dyn Fn(Vec<Box<dyn Any>>)>) {
        self.handlers
            .borrow_mut()
            .insert(event_name.to_string(), func);
    }

    pub fn collect(&self) {
        // register whatever functions we have in the app
        self.register("WORKER_START", Box::new(hooks::print_something));
        // run through the emited events
        let events = self.events.replace(HashMap::new());
        events.into_iter().for_each(|(event_name, args)| {
            if let Some(handler) = self.handlers.borrow().get(&event_name) {
                handler(args);
            }
        });
    }
}
As a single event can have multiple handlers, the current design has a flaw in it. It only allows one handler per event. If you register more handlers for the same event, the previous one will be overwritten.
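To make the flaw concrete, here is a minimal sketch of what happens when two handlers are registered under the same event name. It is not the worker code: the `Handler` alias and the `register_twice` function are hypothetical stand-ins, and the handlers take no arguments to keep the example short. The only thing it relies on is the documented behavior of `HashMap::insert`, which replaces the existing value for a key and returns the old one.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the article's handler type.
type Handler = Box<dyn Fn() -> &'static str>;

/// Registers two handlers under one event name with a plain `insert`,
/// the same way `EventBus::register` does, and returns the resulting map.
fn register_twice() -> HashMap<String, Handler> {
    let mut handlers: HashMap<String, Handler> = HashMap::new();
    handlers.insert("WORKER_START".to_string(), Box::new(|| "first"));
    // `insert` hands back the value it replaced: the first handler is gone.
    let previous = handlers.insert("WORKER_START".to_string(), Box::new(|| "second"));
    assert!(previous.is_some());
    handlers
}

fn main() {
    let handlers = register_twice();
    let survivor = &handlers["WORKER_START"];
    println!("handlers registered: {}", handlers.len());
    println!("surviving handler: {}", survivor());
}
```

Only the last registered handler survives, and nothing warns about the one that was dropped.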
Can the AI spot this issue? To avoid steering the model in a direction, I will provide a rather minimal prompt:
do you think the type of handlers is correct?
Claude 3.5
Looking at the code, there might be an issue with the type of handlers in the EventBus struct. Currently, it's defined as:
handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>
The issue is that this type forces all handlers to accept exactly a Vec<Box<dyn Any>>, which is quite restrictive and not type-safe. When you emit an event with specific types, you lose type information when storing them as Box<dyn Any>, and there's no guarantee that the handler will receive the correct types.
A more flexible and type-safe approach would be to use generics. Here's a suggestion:
file:src/pubsub/mod.rs line:13-16
pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Box<dyn Any>>>,
}
And then modify the register method to accept generic types:
file:src/pubsub/mod.rs line:35-39
pub fn register<F, Args>(&self, event_name: &str, func: F)
where
    F: Fn(Args) + 'static,
    Args: 'static,
{
    self.handlers
        .borrow_mut()
        .insert(event_name.to_string(), Box::new(func));
}
However, this would require additional changes to the collect method to properly handle type casting and matching. Would you like me to suggest a complete type-safe implementation?
I am not really sure what the model is trying to do here. So I ask it to "carry on".
I'll propose a more type-safe implementation that handles proper type matching between events and their handlers. Here's the complete revision:
file:src/pubsub/mod.rs line:13-52
pub struct EventBus {
    events: RefCell<HashMap<String, Box<dyn Any>>>,
    handlers: RefCell<HashMap<String, Box<dyn Any>>>,
}
impl EventBus {
    fn new() -> Self {
        Self {
            events: RefCell::new(HashMap::new()),
            handlers: RefCell::new(HashMap::new()),
        }
    }
    pub fn emit<T: 'static>(&self, event_name: &str, args: T) {
        self.events
            .borrow_mut()
            .insert(event_name.to_string(), Box::new(args));
    }
    pub fn register<F, T>(&self, event_name: &str, handler: F)
    where
        F: Fn(T) + 'static,
        T: 'static,
    {
        self.handlers
            .borrow_mut()
            .insert(event_name.to_string(), Box::new(handler));
    }
    pub fn collect(&self) {
        // register whatever functions we have in the app
        self.register("WORKER_START", hooks::print_something);
        // run through the emitted events
        let events = self.events.replace(HashMap::new());
        for (event_name, args) in events {
            if let Some(handler) = self.handlers.borrow().get(&event_name) {
                if let (Some(handler), Some(args)) = (
                    handler.downcast_ref::<Box<dyn Fn(&T)>>(),
                    args.downcast_ref(),
                ) {
                    handler(args);
                }
            }
        }
    }
}
Key changes:
- Changed events to store single values instead of vectors
- Modified emit to accept a generic type T instead of Vec<Box<dyn Any>>
- Modified register to accept a generic handler F that takes a specific type T
- Updated collect to properly downcast both the handler and arguments to their correct types
However, there's still a challenge with type erasure and matching. We might need to add type information to ensure proper matching between events and handlers. Would you like me to show how to handle this with type IDs or a more sophisticated type matching system?
It is clearly going nowhere. Let's move to Claude 3.7.
Claude 3.7
Looking at your EventBus implementation, I see a potential issue with the handlers type. The current implementation only allows for a single handler per event name, which might not be what you want in a typical pub/sub system. Usually, multiple subscribers should be able to listen to the same event.
The current type:
handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>
would be better as:
handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(Vec<Box<dyn Any>>)>>>>
This would allow multiple handlers to be registered for the same event name. If that's what you're aiming for, you'd need to update your register and collect methods accordingly to handle multiple handlers per event.
Would you like me to show you how to modify the implementation for multiple handlers per event?
Thank you. Claude 3.7 was more succinct, and even though it did not provide a working solution, it did steer us in the right direction.
What about DeepSeek? The thinking model.
DeepSeek R1 (Copilot)
The handler type Box<dyn Fn(Vec<Box<dyn Any>>)> has two potential issues:
- Thread Safety: For a static EventBus that might be used across threads, handlers should be Send + Sync
- Argument Type: Vec<Box<dyn Any>> requires manual downcasting - consider Box<dyn Any> instead if single argument is intended
Recommended fix for thread safety:
file:src/pubsub/mod.rs line:14-16
events: Mutex<HashMap<String, Vec<Box<dyn Any + Send>>>>,
handlers: Mutex<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any + Send>>) + Send + Sync>>>,
This change:
- Replaces RefCell with Mutex for thread safety
- Adds Send bound to trait objects for cross-thread use
- Adds Send + Sync to handler functions
Would you like me to provide the full thread-safety refactor for the implementation?
No dice. Its thinking tokens were no better either. It essentially started making up all kinds of nonsense scenarios, though I usually ignore these, as most of the time it converges back to sanity.
Still, its suggestions have no value and lead nowhere.
What about Google?
Gemini 2.0 Pro (Google Studio)
Gemini's answer is too long to put here. You can see it in this Gist.
Gemini's answer reads like the work of a Rust developer who has gone mentally insane. Gemini 2.0 is great if you want to bring absolute chaos to your code base. The model is completely unhinged. What about OpenAI models?
O1 (Copilot)
GitHub Copilot:
If you only need single-threaded closures that take Vec<Box<dyn Any>> and return nothing, it’s fine. However, since you mark EventBus as Sync, you might want Fn + Send + Sync for true thread safety, and replace RefCell with a thread-safe type like Mutex.
Okay... Let's try on the API platform
O1 (API/Medium Thinking)
Request too large for o1 in organization org-6g94fxw3wiuDxcFSOYvukNgE on tokens per min (TPM): Limit 30000, Requested 100431. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.
Okay... Pretty sure we are below the 30K token limit, unless it can predict how many tokens it will output?
GPT-4o
The type of handlers in your EventBus struct is defined as RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>. This means that you expect each event handler to be a function or closure that takes a Vec<Box<dyn Any>> as its argument.
In many event-driven architectures, event handlers often take a more specific type or a reference to an event object to allow the handler to extract and use the information more effectively. By using Vec<Box<dyn Any>>, while flexible, you are forcing handlers to perform potentially unsafe type casts to retrieve the actual values they need. A potentially more type-safe approach could involve defining a trait or a struct for event arguments.
However, if you prefer to keep the system this flexible, then the current handler type is technically correct, albeit less type-safe.
If you do want to enhance type safety, consider using strongly-typed event arguments, such as:
pub trait Event: Any + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}
pub struct MyEvent {
    pub data: String,
}
impl Event for MyEvent {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
type Handler = Box<dyn Fn(&dyn Event)>;
This approach requires more design upfront but results in greater type safety.
Just some nonsense.
GPT-4.5-preview
No, the type signature you've used in your handlers field is problematic. Let's closely examine why it's problematic and how you can correct it:
Original problematic definition:
handlers: RefCell<HashMap<String, Box<dyn Fn(Vec<Box<dyn Any>>)>>>,
The current signature represents a handler function that takes ownership of the vector of boxed arguments (Vec<Box<dyn Any>>). While owning arguments is possible, this design may have these issues:
Ownership and usability:
Once you pass arguments to handlers, they take ownership from the event dispatcher permanently. This means your handlers may consume the arguments, and after that point, those arguments cannot be reused or accessed anymore in any manner.
Single-handler restriction:
Your current signature implies that each event key ("WORKER_START") maps to a single function handler. Typically, an event system should be able to register multiple handlers per event.
Recommended approach:
1) Multiple Handlers per Event
Usually, you'd expect multiple listeners or subscribers for any given event:
handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,
This new definition allows multiple handlers for the same event by using a vector (Vec) of callbacks.
2) Arguments Passed by Reference
Passing a reference (&[Box<dyn Any>]) rather than ownership improves usability and efficiency, and allows multiple handlers to inspect the arguments:
Improved definition:
handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,
The corrected EventBus structure will become:
pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,
}
Corresponding corrected methods:
Emission method:
pub fn emit(&self, event_name: &str, args: Vec<Box<dyn Any>>) {
    self.events
        .borrow_mut()
        .insert(event_name.to_string(), args);
}
You emit the event with arguments once.
When collecting, you iterate through stored events and for each corresponding handler, call the handlers with references.
Registration method:
pub fn register<F>(&self, event_name: &str, func: F)
where
    F: Fn(&[Box<dyn Any>]) + 'static,
{
    self.handlers
        .borrow_mut()
        .entry(event_name.to_string())
        .or_default()
        .push(Box::new(func));
}
This provides a safer, more idiomatic Rust API allowing multiple handler registration.
Collecting method updated accordingly:
pub fn collect(&self) {
    // Register functions
    self.register("WORKER_START", hooks::print_something);
    // Handle emitted events
    let emitted_events = self.events.replace(HashMap::new());
    emitted_events.into_iter().for_each(|(event_name, args)| {
        if let Some(handlers) = self.handlers.borrow().get(&event_name) {
            for handler in handlers {
                handler(&args);
            }
        }
    });
}
Conclusion:
Your current handlers type signature is not ideal:
- You'd usually want multiple handlers for each event
- You normally want handlers to borrow arguments via reference rather than taking ownership
So, updating handlers to:
handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(&[Box<dyn Any>])>>>>,
will give you cleaner, idiomatic, and reusable code.
Actually, this is pretty good. It caught two issues: the original one and another that has to do with ownership. It was a bit pricey though, costing $0.12 for this single run.
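To see both findings working together, here is a small self-contained sketch: a `Vec` of handlers per event, and handlers that borrow the arguments as `&[Box<dyn Any>]` so each of them can read the same payload. It is an illustration under assumptions, not the worker code: the `Bus` struct, the `demo` function and the `String` payload are hypothetical, and the proc-macro registration and the global static are left out.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Handlers borrow the arguments, so several handlers can inspect the same event.
type Handler = Box<dyn Fn(&[Box<dyn Any>])>;

// Hypothetical, single-threaded stand-in for the article's EventBus.
#[derive(Default)]
struct Bus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Vec<Handler>>>,
}

impl Bus {
    fn emit(&self, name: &str, args: Vec<Box<dyn Any>>) {
        self.events.borrow_mut().insert(name.to_string(), args);
    }

    fn register(&self, name: &str, func: impl Fn(&[Box<dyn Any>]) + 'static) {
        // The entry API appends to the existing list instead of overwriting it.
        self.handlers
            .borrow_mut()
            .entry(name.to_string())
            .or_default()
            .push(Box::new(func));
    }

    fn collect(&self) {
        // Take the pending events, leaving an empty map behind.
        let events = self.events.take();
        let handlers = self.handlers.borrow();
        for (name, args) in events {
            for handler in handlers.get(&name).into_iter().flatten() {
                handler(args.as_slice());
            }
        }
    }
}

/// Registers two handlers for one event and returns what each of them saw.
fn demo() -> Vec<String> {
    let bus = Bus::default();
    let log = Rc::new(RefCell::new(Vec::new()));

    for label in ["first", "second"] {
        let log = Rc::clone(&log);
        bus.register("WORKER_START", move |args| {
            // The payload is still type-erased, so a manual downcast is needed.
            if let Some(msg) = args.first().and_then(|a| a.downcast_ref::<String>()) {
                log.borrow_mut().push(format!("{label}: {msg}"));
            }
        });
    }

    let payload: Box<dyn Any> = Box::new(String::from("hello"));
    bus.emit("WORKER_START", vec![payload]);
    bus.collect();

    let seen = log.borrow().clone();
    seen
}

fn main() {
    for line in demo() {
        println!("{line}");
    }
}
```

Both handlers run and both read the same `String`, which would not be possible if the first handler took ownership of the `Vec`. One caveat of this sketch: `collect` keeps the handler map borrowed while handlers run, so a handler that calls `register` on the same bus would panic.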
Now how do these compare to the original GPT-3.5?
GPT-3.5
The handler is functionally correct for accepting a single vector of trait objects. Depending on your desired use case, the handler could be altered to also include the event name needed to parse the types of args or changed to handle the arguments as timely as they arrive. However, with the correct trait bounds added to the argument and implementation of handler, it should work fine.
Pretty useless, though it is toned down in the insanity department. Can it be steered to a correct answer?
we are registering multiple functions for an event
If you want to register multiple functions for an event, you can modify the type of the handlers HashMap to allow storing multiple functions for a single event. One way to achieve this is to change the value type of the handlers HashMap to a vector of Box<dyn Fn(Vec<Box<dyn Any>>)>. Here's how you can modify the EventBus struct to support registering multiple functions for an event:
pub struct EventBus {
    events: RefCell<HashMap<String, Vec<Box<dyn Any>>>>,
    handlers: RefCell<HashMap<String, Vec<Box<dyn Fn(Vec<Box<dyn Any>>)>>>,
}
impl EventBus {
    // Other methods remain the same
    pub fn register(&self, event_name: &str, func: Box<dyn Fn(Vec<Box<dyn Any>>)>) {
        // Check if the event_name already exists in the handlers HashMap
        if let Some(event_handlers) = self.handlers.borrow_mut().get_mut(event_name) {
            // If the event_name exists, add the new function to the existing vector
            event_handlers.push(func);
        } else {
            // If the event_name does not exist, create a new vector with the function
            self.handlers.borrow_mut().insert(event_name.to_string(), vec![func]);
        }
    }
    // Additional method to unregister a function from an event
    pub fn unregister(&self, event_name: &str, func: &dyn Fn(Vec<Box<dyn Any>>)) {
        if let Some(event_handlers) = self.handlers.borrow_mut().get_mut(event_name) {
            event_handlers.retain(|handler| !std::ptr::eq(handler.as_ref(), func));
        }
    }
}
With this modification, you can now register multiple functions for the same event using the register method. The unregister method allows you to remove a specific function from the list of handlers for an event.
Yes, though it does add an additional unregister method that we didn't ask for.
Conclusion
- Claude 3.7 and GPT-4.5-preview were both successful. GPT-4.5 gets two bonus points for its ownership finding while remaining on topic. Two because the ownership issue requires thinking forward into the future.
- GPT-3.5 ranks second for not going insane. O1 can rank just below it for not saying much.
- Claude 3.5, DeepSeek R1 and GPT-4o were not only unsuccessful but also steered us in the wrong direction.
- Gemini 2.0 Pro should not be used.