Maybe "wrong" is a reasonable replacement for "spam". Not using my spam tag seems a little worrisome. "Wrong" might call my attention to not using it when what I'm saying is clearly right.

I worked on gnuradio blocks for a bit! Yay! Radios are exciting.

I'm stuck around my cognitive inhibitions with a stats problem. I'm trying to estimate which among a set of population histograms a sample is most likely to fit, and I keep freakin' out trying to make my brain do it right. Really I'd like to give a number to each option, so the user has an idea of how certain to be, and about what.

I have two half-clang heuristics. One is to use a Bernoulli distribution for each bin to give the portion of samplings that would be a worse guess for the bin, and take the product of that metric across all the bins. I don't remember it well: it gave a nice similarity metric from 0%-100%. The other half-clang heuristic is a matrix solution. I make a matrix out of all the population histograms and solve for the vector that multiplies them to give the histogram in question. (I'll try to sketch the matrix one below.) It comes out with a nice result where 1.0 lands on choices that are precisely the same, but it makes incredibly poor guesses in situations where there is no really good choice. I made the heuristics while playing around with actual probability and statistics, but it isn't quite cognitively working for me. At this point I can barely think about it! I can barely review my existing heuristics, even.

Ideally, I'd like to output the actual probability that each histogram is the one, among the set, that the sampled histogram was drawn from. I've never taken many probability or stats classes, and the few I did take seemed to just be parroting things that were already obvious, so I didn't attend to them well, and I don't have much experience or training with these things. When I google things like this, it usually tells me to go through a song and dance involving confidence intervals and significance and such. Those are things I value, but since I guess I was a hacker, I really value understanding the things I use and picking the best solution based on that understanding.

One thing I've noticed that makes it a little easier for me is that stats descriptions can leave out which property they are describing the statistic of, which can make them more confusing. The probability of something happening is different from the probability of your guess about it being right, which is different from the probability of it happening if undescribed information about it is known, etc. etc. A stats page might mention the distinction once and then assume everyone remembers, and that's hard for me nowadays. Notably, the probability of something happening given data is different from the probability of it happening in the real world. That's confusing to me. It could be fun to model everything I measure to give it a good prior thingy, but that's not even what I want: I don't want to know what is most likely overall, I want to know what the data indicates is most likely. I'm not totally sure how to think about that, but the fundamental concept of summing favorable events and dividing by total outcomes clearly assumes a uniform distribution of outcomes, and my brain does that too when I consider things around me. If you want to make a fair comparison of things, assume they have a uniform distribution so the comparison acts fairly. I don't really know.
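For reference, here's roughly how I'd reconstruct the matrix heuristic today. I don't have the old code handy, so this is a guess at the approach, not the original: it assumes numpy, assumes the histograms are equal-length arrays of bin counts, and the function name is made up for this note.

import numpy as np

def matrix_heuristic(population_hists, sampled_hist):
    """Solve for the weights that mix the population histograms into the
    sampled one; a weight near 1.0 on a single candidate suggests the
    sample matches that candidate almost exactly."""
    # columns are the candidate histograms, normalized so total counts
    # don't matter
    A = np.stack([np.asarray(h, dtype=float) / np.sum(h)
                  for h in population_hists], axis=1)
    b = np.asarray(sampled_hist, dtype=float) / np.sum(sampled_hist)
    # least squares, since the system is usually not square and rarely
    # has an exact solution
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights

Written this way it's clearer why it behaves badly when nothing fits well: least squares happily hands back large or negative weights just to minimize the residual, and those say nothing about probability.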
So, if we want to figure out the likelihood of one histogram being generated from another among a set of distribution histograms, we could obviously enumerate all the possible histograms that could be sampled from each one in the set, count how many from each are the same as the one we actually sampled, and divide the count for the one in question by the total count of matches. I think I'm leaving something out there, statistically, when I consider it, but it seems like a helpful grounding point. The goal is clearly possible. With small histograms and relatively few samples, it would even be possible to simulate the above brute-force solution to give the probability that a histogram fits among a set, chart some data along every variable, and empirically derive an equation for the probability. But it seems like such a basic thing that, had I the education, I expect there would be some formula solution one would just know from the problem.
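Just to pin the brute-force idea down, here's a minimal sketch of the simulation, assuming numpy, assuming each candidate histogram is treated as the exact distribution it was measured from, and assuming the uniform "fair comparison" between candidates from above (the function name is made up for this note). The simulation also quietly handles the part I suspected I was leaving out: more-likely histograms simply show up more often among the random draws, so they get weighted correctly on their own.

import numpy as np

def match_probabilities(candidate_hists, sampled_hist, trials=200_000, seed=0):
    """For each candidate, draw `trials` simulated histograms with the same
    number of samples as `sampled_hist`, count exact matches, then divide
    each candidate's match count by the total matches across candidates."""
    rng = np.random.default_rng(seed)
    sampled_hist = np.asarray(sampled_hist)
    n = int(sampled_hist.sum())
    matches = []
    for hist in candidate_hists:
        p = np.asarray(hist, dtype=float)
        p = p / p.sum()                       # treat the counts as a distribution
        sims = rng.multinomial(n, p, size=trials)
        matches.append(np.sum(np.all(sims == sampled_hist, axis=1)))
    matches = np.asarray(matches, dtype=float)
    return matches / matches.sum() if matches.sum() > 0 else matches

And I think the formula solution I was expecting exists and is the multinomial one: the chance of drawing an exact histogram (k1, ..., kB) in n samples from bin probabilities (p1, ..., pB) is n! / (k1! * ... * kB!) * p1^k1 * ... * pB^kB, which scipy.stats.multinomial.pmf computes directly. Dividing each candidate's pmf value for the sampled histogram by the sum over all candidates should give the same numbers the simulation converges to, with no simulating.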