Blurbs
-
Recraft AI for Presentation Graphics
I am having so much fun playing around with recraft.ai's Line & Fill generator (I get credits if you use this link on desktop and create your first image). Somehow, it fits my vibe perfectly.
All of these were created using the Line & Fill style, after which I used the background removal option. I didn't actually need that step, though, because Line & Fill is a vectorized style.
-
Trying Bluesky
Like everyone else disappointed with the political stranglehold over Twitter, I wanted to try Bluesky too. So, here is my account:
But also, here are some tools that you might find helpful:
- Most tools mentioned here
- Looking to find people on Bluesky that you used to follow on Twitter? You can use this extension, which unfortunately is not perfect, but it is a good starting point:
- Want to move your tweets over from Twitter to Bluesky? You can try blueark, which is paid (but really, really cheap, and maintains original post dates), or use a script.
- Follow everyone that someone else is following (see the sketch after this list):
- Did you subscribe to a block list (I am not going to recommend any here) and realize later that the list was maliciously created? Once that list has been deleted by Bluesky, there is no easy way to “unsubscribe” from that blocklist. For that, you might need:
- Embed your bluesky feed on your website:
- And finally, starter packs. These are the most common ones for NLP/ML as of Nov 24, 2024, but they might be deleted or modified in the future.
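For the follow-everyone item above, here is a rough sketch of how it could be scripted with the atproto Python SDK. Treat it as an assumption-laden sketch rather than a vetted tool: the handles are placeholders, and you should double-check the SDK's method names and Bluesky's rate limits before running anything like it.

```python
# Hypothetical sketch: follow every account that another user follows,
# using the atproto Python SDK (pip install atproto). The handles and
# app password below are placeholders.
from atproto import Client

client = Client()
client.login("me.bsky.social", "my-app-password")  # use an app password, not your main one

cursor = None
while True:
    # Page through the follows of the account you want to mirror
    resp = client.get_follows(actor="someone-else.bsky.social", cursor=cursor)
    for profile in resp.follows:
        client.follow(profile.did)  # follow each account by its DID
    cursor = resp.cursor
    if not cursor:
        break
```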
-
Configure citation style and bibliography style separately
On Nov 18, 2024, Yoav asks:
latex q: i need all bibliography items to be numbered, but i want the text cites to still be author-year. how?
And I wanted to document Jackson’s solution here just in case anyone else needs it again:
```latex
\documentclass{article}
\usepackage[
  citestyle=authoryear,
  bibstyle=numeric,
  natbib=true
]{biblatex}
\addbibresource{references.bib}

\title{Mixed Citations}
\author{Jackson Petty}
\date{November 2024}

\begin{document}
\maketitle

As argued in \citet{macdonald-2008-syntactic}, inner aspect occupies a
functional AspP projection intermediate between the \emph{v}P and VP phrases.

\printbibliography
\end{document}
```
with the comment that:
natbib=true isn't necessary, it's just to provide natbib-style commands like \citet and \citep which people tend to be more familiar with over the biblatex-native commands
and this gives us:
-
Show Only Appendix ToC
If you have ever wanted to create a Table of Contents section for just the appendix, without the contents of the main paper, here is how you can do it.
```latex
% Note: this relies on the titletoc package (\usepackage{titletoc})
% for \startcontents and \printcontents.
\appendix
\clearpage % Start on a new page

% Local command to print the header without affecting ToC or bookmarks
\newcommand{\localheader}[1]{%
  \vspace*{-\topskip}\vspace*{\dimexpr\topskip-\baselineskip\relax}
  {\large\bfseries #1\par}
  \vspace{\baselineskip}
}

% Print the local header
\localheader{Appendix Contents}

\begin{center}
  \begin{minipage}{0.9\textwidth}
    \startcontents[appendix] % Start a new table of contents for the appendix
    \printcontents[appendix]{}{0}{% Print the appendix ToC
      \renewcommand{\contentsname}{}% Remove default "Contents" heading
    }
  \end{minipage}
\end{center}
```
The good part is that it still maintains proper bookmarks in the exported PDF, and you can still use \tableofcontents while drafting to get the complete Table of Contents.
-
Saving Money by Rounding Expenses Up
I honestly am not a person who is on TikTok or Instagram; I never even installed them, and I don't have accounts there. But YouTube is something that I use often, and YouTube Shorts are hard to avoid. One of the most interesting things I learned from them was saving money by rounding things up.
I've always been an extremely meticulous budgeter. Like, I record every single dollar that goes in and out, and I categorize it. Automatic apps don't work for me because I often use Venmo, Splitwise, and things like those. I've always been the person who ignores money coming in from, say, research studies, just so I have this nice net of savings that doesn't remain visible to me. I've always been this way, even with my allowance as a kid.
But one of the newest ways I've found to save money is rounding my transactions up. If I spend $5, I mark it as $10 in my budgeting database in Notion. The database still stores the $5, but what I see is the ceiled amount, and that ceiled amount is $10. Instead of $45 it is $50; instead of $60 it is $100. It's the same idea as saving your change in pennies, just in whole dollars. I really like that, and I just wanted to share it.
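A minimal sketch of the bookkeeping, in case it helps (the fixed $10 step is my own simplification; my actual rounding is looser, e.g. $60 becomes $100):

```python
# Toy illustration: keep the actual amount, but display a rounded-up
# ("ceiled") amount. Rounds up to the next multiple of `step`.
import math

def ceiled(amount: float, step: int = 10) -> int:
    """Round `amount` up to the next multiple of `step`."""
    return math.ceil(amount / step) * step

for actual in [5, 45, 63.20]:
    print(f"actual ${actual:.2f} -> displayed ${ceiled(actual)}")
# actual $5.00 -> displayed $10
# actual $45.00 -> displayed $50
# actual $63.20 -> displayed $70
```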
-
I am Stuck in a Loop of Datasets ↔ Techniques
I keep jumping between “I do not trust the evaluation, the data is poor” and “the dataset I created only has 100 samples.”
Over the past three years, starting in 2021, I've had multiple ideas about the best ways to evaluate certain processes we currently do with LLMs. Whether it be general evaluation on one single task, multi-run evaluations, theory of mind evaluations, or something like RAG where you're retrieving over a set of documents, I've always had those ideas and really wanted to implement them.
I think the problem I keep getting stuck on is that datasets don't actually make sense to me. Sure, there's a lot of conversation about how many of these datasets are useless and companies need private datasets, etc. But you can't publish a paper on a company's private dataset; you publish it on public datasets, and so many public datasets right now are very much just generated by large models. That makes no sense to me, because you're using them to evaluate those same large models.
Every single time I “look at the data” (the thing everyone asks you to do, and the thing I've been doing throughout my PhD), I find it disappointing. Then I go into this realm of "oh, so I should create it myself," which is something I really like to do and have done before. What holds me back is this idea of how many samples a dataset needs to be viable: viable to publish, and viable for the research done using it to be published and useful.
I think I have been inflating the number in my head, whereas some of the datasets that I like are barely 500 samples. Those I have liked recently are more like 100 to 150 samples. I need to change my mindset about creating a dataset and putting it out there: if I'm running experiments on it and putting the subsequent publication out there, it doesn't need to be a dataset with thousands of samples.
I do have an eye for good data and good data curation. I should be able to create a good dataset that is just 150 samples and is still viable in use, telling us something useful about the model we evaluate with it.
-
Please Don’t Have 10 Textareas in Review Forms
Another summer, another round of paper reviewing issues. I kinda just wanted to record this Twitter conversation somewhere, so that I have someplace to point people to when they ask me what I want changed in reviewing.
My hot take is reviews should be a single text box. Asking people to fill in 10 text boxes for a paper increases the chances that no one wants to do it.
Especially looking at NeurIPS Datasets & Benchmarks, why does it have NINE text areas other than "Summary And Contributions" and "Review"?
ACs have said that it allows inexperienced reviewers to decompose the review.
I guess it is the way I review that makes this format really hard for me. I feel guilty about writing a single line in the text boxes there, but I review line by line on an iPad (similar to Overleaf comments, just in PDF form) and would very much prefer to extract those comments and paste them in. For example, I often write “typo: fix this word” or “cite X here, which negates the results you are obtaining; talk about why that might be the case” inline. Splitting this into separate content areas (especially ones with overlapping purposes) feels really overwhelming.
Yanai here makes a point about highlighting the global, a.k.a. main, strengths/weaknesses of the paper: eventually, these are the things the AC should focus on, and reviewers should make it easier for the AC to detect them.
And I do not disagree, but with a review load of more than four papers in a month, I end up dreading the second-pass requirement instead of actually providing the review. I have always wondered if there's an opportunity to run reviewing experiments in newer conferences like @COLM_conf, where people can review the paper inline (like @hypothes_is/@askalphaxiv) and tag comments with their kind (suggestion, missing citation, etc.). Would that improve the reviewing culture? It would still be easy for ACs to pick out the main global points based on the tags, but it wouldn't require a second pass. As a cherry on top, inline reviewing is also hard to fake, unlike the GPT-based summary reviews of the paper.
Something I did not mention in that conversation: I often want to extract these highlights and ask GPT/Claude/LLM-of-the-month to separate them into categories to paste into those text areas, but the outcome never sounds natural, and there are always slight hallucinations that are difficult to catch. So, if that doesn't work either, maybe we should try to change the review UX itself?
-
Add packages to ChatGPT code interpreter environment
TIL you can add unavailable packages to the ChatGPT code interpreter environment (for example, if you want your output to use seaborn). You can use a custom GPT that calls remote actions in the background to pip install these libraries. From Simon Willison’s blog post:
If there's a package that you want to use you can grab the .whl files for it from PyPI, upload them to ChatGPT and it will install them into its own environment for the duration of that chat session.
If a library has dependencies you have to upload those, too.
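To collect a library's wheel plus its dependency wheels in one go, pip download should do the trick. Here's a sketch; the seaborn target, platform tag, and Python version are my assumptions rather than anything from Simon's post:

```python
# Download seaborn's wheel and its dependency wheels into ./wheels,
# then upload all of the resulting .whl files to ChatGPT.
# Code Interpreter runs on x86_64 Linux, so we target manylinux wheels
# instead of wheels built for the local OS.
import subprocess

subprocess.run(
    [
        "pip", "download", "seaborn",
        "--only-binary", ":all:",              # wheels only, no source dists
        "--platform", "manylinux2014_x86_64",  # target Linux, not the local OS
        "--python-version", "3.11",            # assumed interpreter version
        "--dest", "wheels",                    # where the .whl files land
    ],
    check=True,
)
```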
-
Some people have a knack for interesting math problems
I have read this and worked through the calculation up to a point, but it is still so unintuitive to me.
Daniel Litt on Twitter asks:
Flip a fair coin 100 times-it gives a sequence of heads (H) and tails (T). For each HH in the sequence of flips, Alice gets a point; for each HT, Bob does, so e.g. for the sequence THHHT Alice gets 2 points and Bob gets 1 point. Who is most likely to win?
And the answer is:
The correct answer is “Bob.” Congrats to the 10% who got it right — those few brave dreamers.
But how? Why? This is so beyond my mathematical intuition.
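Since I could not get there analytically, a quick Monte Carlo simulation at least confirms the claim (a throwaway sketch; the trial count is arbitrary):

```python
# Simulate the game: over 100 fair flips, Alice scores a point per HH pair
# and Bob scores a point per HT pair; tally who wins across many games.
import random

def play(n_flips=100):
    flips = random.choices("HT", k=n_flips)
    alice = sum(1 for a, b in zip(flips, flips[1:]) if a + b == "HH")
    bob = sum(1 for a, b in zip(flips, flips[1:]) if a + b == "HT")
    return alice, bob

wins = {"Alice": 0, "Bob": 0, "Tie": 0}
for _ in range(100_000):
    alice, bob = play()
    if alice > bob:
        wins["Alice"] += 1
    elif bob > alice:
        wins["Bob"] += 1
    else:
        wins["Tie"] += 1

print(wins)  # Bob wins the most games, even though expected points are equal
```

The standard intuition, as far as I can tell: both players expect the same number of points (99 adjacent pairs, each HH or HT with probability 1/4, so 24.75 each), but HH occurrences can overlap and cluster (HHH contains two HHs) while HT occurrences cannot. Alice's score is therefore more spread out: she wins by bigger margins when she wins, but loses the head-to-head comparison more often.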
-
The 500 Day Duolingo Streak
Honestly, I am so proud of myself for this. I actually got here even with my PhD defense and job search. Yes, it took multiple streak freezes, and yes, it took some cheesing (do Lesson 1 of Unit 1 on days you are really not in the mood to learn anything new), but I have gotten so close before and missed. This time I did it!