
6a2739 Doku 2025-08-17 20:10:36
# 17
I planned to make a project over the weekend, but I didn't end up making it. I described it a bit earlier - the project of asking LLMs to compare images and selecting which one is better.
Here's what I've done and what I learned.
I asked DSV3 to help me with the outline, and that's what I got in the end - just some clarification on my ideas. The whole system is:
1. A webpage with a form that selects a file (surprisingly simple).
2. A service that downloads the file, saves it, and analyzes the contents (confirming they're images). Not done yet, probably quite simple.
3. A service that takes two images and asks the LLM for a comparison (prototype is done).
4. The LLM server.
5. The part that integrates the responses into an aggregate score, ranks the images, and returns the result to the webpage (not started).
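To make the five pieces concrete, here's a hypothetical sketch of how they could fit together. None of these function names come from the post, and the LLM call is stubbed out with a placeholder heuristic:

```python
# Hypothetical pipeline sketch; every function name here is invented.
from itertools import combinations

def validate_images(files: list[bytes]) -> list[bytes]:
    """Step 2: keep only payloads that look like images (JPEG/PNG magic bytes)."""
    magics = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")
    return [f for f in files if f.startswith(magics)]

def compare(img_a: bytes, img_b: bytes) -> int:
    """Step 3: ask the LLM which image is better; stubbed here.
    Returns 0 if img_a wins, 1 if img_b wins."""
    return 0 if len(img_a) >= len(img_b) else 1  # placeholder, not a real judge

def rank(images: list[bytes]) -> list[int]:
    """Step 5: run all pairwise comparisons and rank by win count."""
    wins = [0] * len(images)
    for i, j in combinations(range(len(images)), 2):
        winner = i if compare(images[i], images[j]) == 0 else j
        wins[winner] += 1
    # indices of images, best first
    return sorted(range(len(images)), key=lambda k: wins[k], reverse=True)
```

The all-pairs loop is quadratic in the number of images, which is part of why the speed of a single comparison matters so much later on.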
I knew the key part would be the LLM service so I started with that. Aggregation might also be difficult, but I can simplify it.
I ran Gemma 3 4B-IT, the officially quantized version, using koboldcpp. The documentation isn't great there, and I kept wondering whether I was passing images correctly to the LLM. I'm still not sure, since the resolution wasn't what Gemma officially supports. But, well, judging by the results, it could see the images, so that part was fine.
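For reference, this is roughly how images can be passed to a local koboldcpp server through its OpenAI-compatible endpoint. The URL, default port, and payload shape are my assumptions about koboldcpp's API, not details from the post, so double-check against its docs:

```python
# Sketch: sending two images to a local koboldcpp server via its
# OpenAI-compatible chat endpoint. Port 5001 is koboldcpp's default;
# the payload shape follows the OpenAI vision message format.
import base64
import json
from urllib import request

KOBOLD_URL = "http://localhost:5001/v1/chat/completions"  # assumed default

def to_data_uri(img: bytes, mime: str = "image/jpeg") -> str:
    """Base64-encode raw image bytes into a data URI."""
    return f"data:{mime};base64," + base64.b64encode(img).decode("ascii")

def build_payload(img1: bytes, img2: bytes, prompt: str) -> dict:
    """Build one user message carrying the prompt plus both images."""
    return {
        "max_tokens": 200,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": to_data_uri(img1)}},
                {"type": "image_url", "image_url": {"url": to_data_uri(img2)}},
            ],
        }],
    }

def ask(img1: bytes, img2: bytes, prompt: str) -> str:
    """POST the comparison request and return the model's text reply."""
    req = request.Request(
        KOBOLD_URL,
        data=json.dumps(build_payload(img1, img2, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```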
Figuring out the prompt was fun. I could have just asked it for a single output - image 1 or image 2. But I wanted it to think, to analyze a bit, so I gave it a more complex prompt: first give its thoughts about the images it sees, then rank them. However, this phrasing was a mistake. I don't know much about LLM psychology, so I don't understand the reasoning, but the model picked the first image very often, even when I swapped the pair around so that the previous second image was now the first. It's probably something like this:
- It starts describing the images, beginning with the first one.
- It says how good the first image is.
- Now everything is in the context of how good the first image is.
- It describes the second image, but its mind is already made up.
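One cheap way to catch this kind of position bias is to run every comparison twice with the order swapped and only trust answers that agree. A sketch of that check, where `judge` is a hypothetical callback wrapping whatever actually queries the model:

```python
# Order-swap consistency check for pairwise LLM judgments.
# `judge(a, b)` is any function returning 0 if the first image it is
# shown wins, 1 if the second wins (hypothetical interface).
from typing import Callable, Optional

def debiased_compare(judge: Callable[[bytes, bytes], int],
                     img_a: bytes, img_b: bytes) -> Optional[int]:
    """Return 0 if img_a wins both orders, 1 if img_b does,
    None if the verdict flips with presentation order."""
    first = judge(img_a, img_b)   # img_a shown first
    second = judge(img_b, img_a)  # img_b shown first
    if first == 0 and second == 1:
        return 0   # img_a won regardless of position
    if first == 1 and second == 0:
        return 1   # img_b won regardless of position
    return None    # the model just picked by position
```

A judge that always picks whichever image comes first returns `None` for every pair, which makes the bias easy to measure.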
So I replaced that part of the prompt with "describe your general ideas, not the images one by one". After that, it started picking the same image consistently (though I disagree with its pick, but now it's a matter of taste, I guess).
One of the problems is that it takes a lot of time to run this comparison. At first the LLM wasn't helping with this problem - it was writing long descriptions of the images and its ideas. Now I'm asking it for very brief thoughts, and that works. Though I wonder if it would work just as well with a single number as the output.
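As for the aggregation step, it could be as simple as counting wins, or a lightweight Elo update so that information accumulates across pairings. A minimal Elo sketch - the K-factor and starting rating are arbitrary choices of mine, not anything from the project:

```python
# Minimal Elo-style aggregation of pairwise results into per-image scores.
# k and start are invented defaults, not parameters from the post.
def elo_scores(n_images: int, results: list[tuple[int, int]],
               k: float = 32.0, start: float = 1000.0) -> list[float]:
    """`results` is a list of (winner_index, loser_index) pairs."""
    ratings = [start] * n_images
    for winner, loser in results:
        # Expected win probability from the current rating gap.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1.0 - expected)
        ratings[loser] -= k * (1.0 - expected)
    return ratings

def ranking(ratings: list[float]) -> list[int]:
    """Image indices, best first."""
    return sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
```

Plain win-counting would also work for a small batch; Elo mostly helps if comparisons are sampled rather than exhaustive.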
It's still slow, so I might use an API. I just learned that Fal.ai (the only site where I managed to buy some credits) has LLMs - I thought it only did image generation, not vision or anything like that. So I may have found a reason to burn some of my balance there.