🪴 Potted
-
The Curious Case of LLM Evaluations
Our modeling, scaling, and generalization techniques grew faster than our benchmarking abilities, which in turn has resulted in poor evaluation and hyped capabilities.
-
A Personal Test Suite for LLMs
Most LLM benchmarks are either academic or do not capture what I use LLMs for. So, inspired by some other people, this is my own test suite.
-
Restaurants I Like
I love to try out new places to eat, and I wanted to share them here. I am also looking forward to suggestions, so tell me what you love in these cities.
-
Games I Enjoy
I have never been a gamer, but I am slowly becoming someone who enjoys games.
-
Inktober 2023 - No Reference?
I decided to try Inktober without using any references. I did not end up completing it, but this is a logbook of what I did accomplish (and where the rest will go if I ever get around to completing it).
-
Demonstrating Gender Bias in GPT4
A brief demonstration of gender bias in GPT4, as observed from various downstream task perspectives ft. Taylor Swift