Sometimes I come across random research ideas across the twitter and social media universe that really resonate with me at that moment. Often they get lost in doom scrolling, so I am considering compiling those into a running log.
2024
April 2024
I'm skeptical that Chatbot Arena is really as informative as people make it out to be, but I'd be glad to learn that I am wrong: 1. Different chatbots have really distinct talking styles. Isn't it easy to tell whether something comes from GPT-4 or Grok? Then it's not really…
Exciting news - the latest Arena result are out! @cohere's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere's incredible work & valuable contribution…
It's known that finetuning can incidentally remove RLHF guards arxiv.org/abs/2310.03693 . Can you solve this by including examples with refusals mixed into the data? Does it matter if those refusals are in-distribution for the original RLHF? Does the domain of the FT task matter?
March 2024
I'm super interested in heatmap vizualization of LLMs' per-token attention, for the sake of building intuition when prompt-building. (Which previous tokens influenced each output token the most / which tokens are ignored) Who is working on this type of tool? Any pointers?
It's wild examples like this exist in the commonsenseQA: huggingface.co/datasets/tau/c… huggingface.co/datasets/tau/c…
Do RLHF'd behaviors transfer between languages? Can we align a LM the norms of one culture by aligning an LLM in their language and have those norms reflected in a different language? Can we simultaneously align a single LM to multiple different cultures in different languages?
The only two numbers worth looking at here are GPQA and HumanEval On GPQA the result is very impressive. On HumanEval, they compare to GPT-4's perf at launch. GPT-4 is now much better- see the EvalPlus leaderboard, where it gets 88.4 I bet OpenAI will respond with GPT-4.5 soon
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
Feb 2024
Build a benchmark that measures and contrasts translation (exactly mapping content between langs) and localization (mapping content onto corresponding concepts in other langs). This would be useful for evaluating models, but even more than that for evaluating benchmarks.
here's a research question (which i am not going to work on but will be happy if others): is there a function, computed from model weights, that allows to estimate the number of parameter updates the model received? and what if we allow input to be 2 or 3 snapshots of same model?
is anyone doing research on out-of-distribution/unnatural prompts and how aligned models respond to them? Something clearly went wrong in Gemini training, but no one should be ashamed! would be super cool if they wrote a post-mortem that researches how this behavior arises 🙏
every single person who worked on this should take a long hard look in the mirror. absolutely appalling.
Bias in ML systems can come from bias in the training data. But that's only one possible source of bias among many. Arguably, prompt engineering can be an even worse source of bias. Literally any part of your system can introduce biases. Even non-model parts, like your…
Does anyone have a favorite task where gpt-4 has near chance accuracy when zero or few-shot prompted? I’m looking for recommendations for tasks like this
random research idea: Latent Text Tansformer (LTT) in a nutshell: replace sequence of *vectors* as hidden states of the transformer with sequences of *tokens*, so we can read the model's "thoughts" directly 🤖 then train a transformer that uses longer sequences of *discrete…
Is there a post-hoc correlation that can be applied to a scaling laws study done Kaplan et al.-style to get one done Hoffman et al.-style? Note that this would be very high value because it means you would need to do far fewer training runs to do scaling laws calculations.
In the Pythia paper we find intervening on the training data to change the model’s gender bias late in training is only effective for large models. Is this because the small ones are converged in bias space? arxiv.org/abs/2304.01373
Jan 2024
Is “train the model with gradient ascent on bad data” an effective technique for machine unlearning? The extent to which the answer is "no" is a measure of how non-exchangeable the training process is. Is that a useful measurement for anything? en.m.wikipedia.org/wiki/Exchangea…
As an exercise in open science, gonna tweet the research problem I’m stuck on: i want to align two text embedding spaces in an unsupervised way. The motivation is that in my previous vec2text work, we have to know the embedding model and be able to query it. this is fine in…
Do you still remember the 𝗖𝗼𝗺𝗺𝗼𝗻𝗚𝗲𝗻 task? Given a few concepts (nouns/verbs), an LM needs to generate a sentence describing a common scenario covering all given concepts. How well do LLMs perform? Will they outperform humans? I curated a subset and test some popular…
Coming soon to your favorite word processor Ctrl-alt-V: "paste and paraphrase" also, "paste and match writing style"
Just like fundamentally photoshop is a pixel editor, and we use game engines to edit interactive scenes, we’ll have the equivalent for text editors (and audio / other media) : “game engines” for writing that operate at higher levels of abstraction than editing individual…
And now, for the first question: Do a serious study of prompt extraction attacks by writing prompts for publicly released models and then checking how reliably they can be stolen in a blackbox setting.
Important research direction 2024 — efficient adaptation of English pretrained models to serve other languages. How to efficiently adapt embeddings without requiring continued pretraining? If anyone is currently working on this, share your work. Would be fun to collaborate.
In the Pythia paper we explore the effect of term frequency on fact learning over the course of training. If you squint at Fig. 4, it seems like there is weak evidence that the curves are converging. Is that correct? Maybe log-space the checkpoints? arxiv.org/abs/2304.01373
Do machine unlearning techniques make the resulting models similar to models trained from scratch but without the data that was unlearned? AFAIK, no LLM machine unlearning technique has ever been validated by comparing to the same model trained without the unlearned data
Can you tell the difference between SFT DPO and PPO models that had the same base model and are identical up to the algorithm? How much access do you need to make this feasible? What about in a verifiable computing context where the model provider helps by providing "proof"?
Research Idea 1 2024: the little prince has been translated into more languages than any other book except the bible (505 languages). and the book has entered the public domain -- I am surprised no-one has structured it into a instruction style translation dataset?
Starting off on a good note 😅 @Wetassprior tells me Yiming Zheng and @daphneipp did this already! Here's a new problem: how much pretraining is required to make a LLM fall into a particular loss basin? In particular, until its path-independent? arxiv.org/abs/2307.06865
2023
Dec 2023
Wow, I just got @AnthropicAI 's sparse autoencoder-based feature decomposition technique to work* for text embeddings 🎆 Screenshot below. In order, this output shows: 1. max-activating examples for that feature from the Minipile dataset 2. min-activating examples from the same…
people keep saying AI is moving so fast. some days I agree, but some days I'm not sure – so many papers published, but I don't feel like we're making that many fundamental breakthroughs. to cap off 2023, here's a list of things we still don't know about language models: - how…
- how can we build long-term memory across interactions? - how can we continually integrate recent information? - how can we remove/modify specific knowledge? - how can we make LLMs self-consistent? (e.g. avoid the reverse curse) Some have been attempted, all v far from solved
Error Analysis and other tools can be found here github.com/microsoft/resp… In fact, we did a similar investigation to what @ChristophMolnar is describing on housing data here: github.com/microsoft/resp…
November 2023
PhD students: can you please solve the problem of long text evaluation? It is one of the biggest bottlenecks in the quality iteration of LLMs. Which response is more creative? safer? more factual?
It's not the first time! A dream team of @enfleisig (human eval expert), Adam Lopez (remembers the Stat MT era), @kchonyc (helped end it), and me (pun in title) are here to teach you the history of scale crises and what lessons we can take from them. 🧵arxiv.org/abs/2311.05020