Maybe wrong is a reasonable replacement for spam. Not using my spam tag seems a little worrisome. Wrong might call my attention to not use it if what I'm saying is clearly right.
I worked on gnuradio blocks for a bit! Yay! Radios are exciting.
I'm stuck around my cognitive inhibitions with a stats problem. I'm trying to estimate which among a set of population histograms a sample is most likely to fit, and I keep on freakin' out trying to make my brain do it rightly. Really I'd like to give a number to each option, so the user has an idea of how certain to be, and about what.
I have two half-clang heuristics. One is to use a Bernoulli distribution for each bin to give the portion of samplings that would be a worse guess for the bin, and take the product of that metric for all the bins. I don't remember it well: it gave a nice similarity metric from 0% to 100%. The other half-clang heuristic is a matrix solution. I make a matrix out of all the population histograms, and solve for what vector multiplies them to get the histogram in question. This comes out with a nice result where 1.0 occupies choices that are precisely the same, but makes incredibly poor guesses for situations where there is no really good choice.
I made the heuristics while playing around with actual probability and statistics, but it isn't quite cognitively working for me. At this point I can barely think about it! I can barely review my existing heuristics, even. Ideally, I'd like to output the actual probability, for each histogram in the set, that it is the one the sampled histogram was sampled from.
I've never taken many probability or stats classes, and the few I did take seemed to be just parroting things that were already obvious, so I didn't attend to them well, and don't have much experience or training with these things. When I google things like this, it usually tells me to go through a song and dance involving confidence intervals and significance and such.
These are things I value, but as I guess I was a hacker I really value understanding the things I use, and picking the best solution based on that understanding.
One thing I've noticed, which makes it a little easier for me, is that stats descriptions can leave out the properties they are describing the statistic of, which can make them more confusing. The probability of something happening is different than the probability of your guess about it happening being right, which is different than the probability of it happening if undescribed information about it is known, etc etc. A stats page might mention the distinction once and then assume everyone remembers, and that's hard for me nowadays.
Notably, the probability of something happening given data is different from the probability of it happening in the real world. That's confusing to me. It could be fun to model everything I measure to give it a good prior thingy, but that's not even what I want: I don't want to know what is most likely overall, I want to know what the data indicates is most likely. I'm not totally sure how to think about that, but the fundamental concept of summing favorable events and dividing them by total outcomes clearly assumes a uniform distribution of outcomes, and my brain does that too when I consider things around me. If you want to make a fair comparison of things, assume they have a uniform distribution so the comparison acts fairly. I don't really know.
So, if we want to figure out the likelihood of one histogram being made from another among a set of distribution histograms, we could obviously enumerate all possible histograms that could be sampled from each one in the set, count how many from each are the same as the one we sampled, and divide the number for the one in question by the total that are the same. I think I'm leaving something out there, statistically, when I consider it, but it seems like a helpful grounding point. The goal is clearly possible.
With small histograms and relatively few samples, it would even be possible to simulate the above brute-force solution, to give the probability a histogram fits among a set, chart some data along every variable, and empirically derive an equation of probability. But it seems like such a basic thing that, had I the education, I expect there would be some formula solution one would know from the problem.
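The simulation idea above can be tried on toy data. This is only a minimal sketch under stated assumptions, not the code from the project: the names `sample_histogram`, `simulate_match_fractions`, `candidates` and `observed` are made up for illustration, every candidate population is assumed equally likely beforehand, and sampling is assumed to be with replacement.

```python
import random
from collections import Counter

def sample_histogram(population, sampcount, rng):
    """Draw sampcount events with replacement; return per-bin counts."""
    bins = range(len(population))
    draws = rng.choices(bins, weights=population, k=sampcount)
    counts = Counter(draws)
    return tuple(counts.get(b, 0) for b in bins)

def simulate_match_fractions(candidates, observed, trials=20000, seed=0):
    """Estimate, per candidate, its share of simulated samplings that
    reproduce the observed histogram exactly."""
    rng = random.Random(seed)
    observed = tuple(observed)
    sampcount = sum(observed)
    matches = []
    for population in candidates:
        hits = sum(
            sample_histogram(population, sampcount, rng) == observed
            for _ in range(trials)
        )
        matches.append(hits)
    total = sum(matches)
    if total == 0:
        return None  # nothing matched; more trials (or fewer bins) needed
    return [m / total for m in matches]

# Hypothetical toy data: two 2-bin populations and a 4-event sample.
candidates = [(3, 1), (1, 3)]
observed = (3, 1)
print(simulate_match_fractions(candidates, observed))
```

Exact matches get rare quickly as the bin count and sample count grow, so this only stays practical for small cases, which is the limitation the paragraph above already expects.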
I have a habit, when a concept isn't working for me, of repeatedly accessing it in different ways, so that eventually my brain might get the picture and respond reliably. That isn't always working that well for me any more, but it is just something that I am so familiar with. When engaging this, I get confused when I think of the stats concept of "combinations". I'm not sure they're relevant here, but I can barely understand what they are. I think I
As I was saying, if we knew all the possible histograms that could be sampled from the candidates, we could just compare and count the answer. But this can be too many to reasonably do with large data.
So one thing one considers a lot is taking each bin separately, and combining them. I don't know how hard either of those is to do, but to consider whether that is reasonable to do or not, we imagine trying it. If I have one bin from a histogram, and I consider that this histogram was sampled from a population described by another, there might be only so many ways of sampling the other histogram to fill the one.
Sample a known histogram of a population.
Say there are nbuckets bins, and poptotal events.
When we sample a histogram, I think there are ... some number of total ways to do it. The poptotal. So there are poptotal total options for the first sample and poptotal^samptotal total ordered samplings: and many of those samplings give identical histograms, indicating those histograms are more probable.
I am definitely making a number of weighted choices repeatedly.
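For a tiny population the counting described here can be done literally. A minimal sketch, assuming sampling with replacement and treating every population event as distinct; `enumerate_samplings` is a hypothetical helper name, and the population `(2, 1)` is invented toy data.

```python
from collections import Counter
from itertools import product

def enumerate_samplings(population, samptotal):
    """Count how many ordered samplings produce each sampled histogram."""
    # Label every population event with the bin it belongs to.
    events = [b for b, height in enumerate(population) for _ in range(height)]
    tally = Counter()
    for draws in product(events, repeat=samptotal):  # with replacement
        hist = [0] * len(population)
        for b in draws:
            hist[b] += 1
        tally[tuple(hist)] += 1
    return tally

population = (2, 1)  # nbuckets = 2, poptotal = 3
tally = enumerate_samplings(population, samptotal=2)
print(sum(tally.values()))  # equals poptotal ** samptotal, here 9
print(dict(tally))          # histograms reached by more samplings are more probable
```

The loop visits poptotal^samptotal tuples, so it is only a grounding tool for very small numbers.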
On Sun, May 30, 2021, 3:21 PM Karl <gmkarl@gmail.com> wrote:
I am _so_ _confused_ and I don't understand the confusion. Usually when confused I find ways to continue. The goal is: discern how accurately it is reasonable to measure the most likely choice of population histogram given a sample, and do so. To meet a goal we need steps to reach it.
On Sun, May 30, 2021, 3:23 PM Karl <gmkarl@gmail.com> wrote:
Steps are easy to find by considering steps that have worked for things that are similar, or looking for behaviors that have similarity to parts of the goal. A great way to look for things is to consider things that are conceptually near for similar tasks, and if you can remember why you're looking you can do it even better. A step towards discerning this is to try out untried avenues, and preserve information on whether they are likely to work or not.
I don't remember whether it is likely to work to consider the probability of each population being correct given the content of each individual bin. I think I've tried this a couple times, but I'm not sure my work was sound. I'm trying to understand the theory of it. Is it reasonable to count the number of ways a bin could be sampled from each population, and divide? If this is reasonable, it may give the most accurate answer. It may even be easy to do. If it involves factorials, most of them may cancel when placed in ratio, or there may be a distribution that gives good results when the sample count is large.
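The hunch about factorials cancelling can be checked on a small case. The sketch below assumes the per-bin probability has the standard binomial form for sampling with replacement (the thread only works toward that form later), and it assumes the second population's bin fraction is strictly between 0 and 1 so the division is defined. The function names and the numbers are made up for illustration.

```python
from fractions import Fraction
from math import comb

def bin_probability(popbinheight, popcount, sampbinheight, sampcount):
    """Binomial probability of landing in one bin exactly sampbinheight times."""
    p = Fraction(popbinheight, popcount)
    return (comb(sampcount, sampbinheight)
            * p ** sampbinheight
            * (1 - p) ** (sampcount - sampbinheight))

def ratio_without_factorials(pop_a, pop_b, sampbinheight, sampcount):
    """Ratio of the two bin probabilities with the comb() factor cancelled."""
    p_a = Fraction(*pop_a)  # (popbinheight, popcount) of population A
    p_b = Fraction(*pop_b)  # assumed to satisfy 0 < p_b < 1
    misses = sampcount - sampbinheight
    return (p_a / p_b) ** sampbinheight * ((1 - p_a) / (1 - p_b)) ** misses

pop_a = (3, 10)  # hypothetical: bin holds 3 of 10 events
pop_b = (5, 10)  # hypothetical: bin holds 5 of 10 events
full = bin_probability(*pop_a, 2, 6) / bin_probability(*pop_b, 2, 6)
short = ratio_without_factorials(pop_a, pop_b, 2, 6)
print(full == short, short)
```

The combinations factor depends only on the sample, not on the population, which is why it drops out when two populations are compared against the same sample.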
Considering the avenue of counting how many different samplings give a bin with the sampled height: is there or is there not a reasonable way to measure this? What would a stats teacher tell you? Enumerating all the samplings involves considering each population bin, with each next population bin, and each next population bin. This pattern is why the total number of possible histogram samplings is popcount^sampcount or some such. If this bin has a height of z, that means that exactly z of those iterations landed on it, and sampcount-z of them did not. The sampcount-z iterations that missed contribute a product that will be the same for every way of filling the bin. So the big question is, how many ways are there of filling the bin?
How many ways are there of filling the bin? The ways involving non-bin parts of the histogram are a constant factor. The bin is filled by a selection from the sampling events. Any five (say) of the sampcount events may be involved in filling the bin. The sampcount events happen always in the same order, so I don't think we want permutations of them ... I'm guessing that this is a combinations situation, where using the concept of combinations will give the proportional number of samplings that fill the bin to that height.
So if I am making n decisions, and exactly five of them are a specific decision, how many different total decision combinations result in that? I think that the answer is combinations, nCr, but I'm really confused around that. Combinations says, if I put n things in a bag, and try out all the ways of taking r of them out, and consider the same items in a different order to be the same .... Anyway I think it is permutations, not combinations ...
We have popcount^sampcount samplings. We're interested in the ones which have binheight selections of bin. The samplings are made in order. The number is an exponent, so however many ways there are of selecting those binheight ones,
What is a permutation? Permutations say if I have a bag, and I select items from the bag and put them in an _ordered list_, how many different ordered lists can I get? Combinations does unordered sets, permutations does ordered lists.
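The two definitions can be compared directly with the Python standard library. A small sketch; the four-item `bag` is invented for illustration, and both counts are for picking without putting items back.

```python
from itertools import combinations, permutations
from math import comb, perm

bag = ["a", "b", "c", "d"]
r = 2

ordered_lists = list(permutations(bag, r))   # ("a", "b") and ("b", "a") both appear
unordered_sets = list(combinations(bag, r))  # only ("a", "b") appears

print(len(ordered_lists), perm(len(bag), r))
print(len(unordered_sets), comb(len(bag), r))
```

Each unordered set of r items corresponds to r! ordered lists, so the permutation count is always the combination count times r!.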
So, with the bin filling decisions, do we want how many ordered lists there are, or how many unordered sets there are? Changing the order of the decisions doesn't make a new possible outcome. To calculate how many possible outcomes there are, we keep the decision order fixed. So, we want combinations here. The number of ways to fill a histogram bin to binheight appears to be (sampcount C binheight) popcount ^ (sampcount - binheight), but I'm not sure!
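The combinations part of this guess, meaning only the choice of which decisions land in the bin, can be checked by brute force. A minimal sketch with hypothetical helper names; it deliberately says nothing about the exponent factor in the expression above.

```python
from itertools import combinations, product
from math import comb

def hit_position_sets(sampcount, binheight):
    """Every way to choose which decisions (by index) land in the bin."""
    return list(combinations(range(sampcount), binheight))

def count_hit_miss_sequences(sampcount, binheight):
    """Brute force: ordered hit/miss sequences with exactly binheight hits."""
    return sum(1 for seq in product((True, False), repeat=sampcount)
               if sum(seq) == binheight)

sampcount, binheight = 6, 2  # toy numbers
print(len(hit_position_sets(sampcount, binheight)),
      count_hit_miss_sequences(sampcount, binheight),
      comb(sampcount, binheight))
```

All three numbers agree because a fixed-order sequence of decisions is fully described by the set of positions that hit.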
It's really satisfying to me to have such a better guess around. I've been really lucky the past week or two to be able to do hobby algorithm dev for that project I have that is now heading towards gnuradio. My next task is to get back to my computer to work on it more, after getting through that cognitive issue by posting online.
Sorry for posting more, but my cognition is so contextual, and obviously that answer is wrong. The wrong part I found is that I am assuming binheight specific samples need to be chosen, when really exactly sampbinheight samples need to be chosen from a popbinheight constant set among popcount. The sampcount - sampbinheight samples that are not involved are still the similar (popcount - popbinheight) ^ (sampcount - sampbinheight) total outcomes. Then we have popbinheight many population samples that are selected exactly sampbinheight times from sampcount selections. Maybe the combinations can give us how many different decisions it can occur in, and then maybe ... Noting that simulating this is important because it is so hard for me to verify logical things. If combinations are the factor, then the next question would be how many different ways can we select sampbinheight items from popbinheight items. Here, do we want the count of ordered lists, or the count of unordered sets?
The question is whether we want how many ordered lists or how many unordered sets maybe? Combinations got involved for picking which decisions will be used. Now we want to know which items the decisions pick. The decisions are already made in the same order. Since they are all made in the same order, picking things in a different order is different. I think we want permutations here, because each ordering is a different set of events. So it's an exponent, times some combinations, times some smaller permutations.
The count for ways of doing each way of matching the bins was (popcount - popbinheight) ^ (sampcount - sampbinheight). If we already know which sampbinheight bins are
On Sun, May 30, 2021, 5:16 PM Karl <gmkarl@gmail.com> wrote:
We also figured some things possibly around permutations and combinations, both. And remember to simulate it to find more and more mistakes.
On Sun, May 30, 2021, 5:22 PM Karl <gmkarl@gmail.com> wrote:
If you are looking for how many ways there are to fill a sampbinheight bin from only a popbinheight bin, and selecting one in a different decision is a different event, then the answer could be (popbinheight P sampbinheight) .... Oh no I think this is wrong! Permutations assume items go away when you pick them, but the distribution does not change when sampled from.
When we were sampling from decisions, there were only so many, but from a population summarised by a histogram, there are enough to use replacement. Is this permutations with replacement? Is that even a thing?
Okay. Permutations with replacement are indeed just n^r, which I have in my mind already a little bit.
So if we have a popbinheight bin we were only picking from, in a set of ordered decisions, what are we doing?
In the first decision, we can pick from popbinheight options. Same in the rest. The total number of outcomes is then each subset of outcomes, for each excluded one. That's popbinheight^sampbinheight, permutations with replacement, I think.
So, we might have a big exponent, times combinations, times a small exponent.
The needed scale factors appear to all be present, which is heartening. Are the combinations still looking ok, now that the system is described differently? What is the whole expression?
Remember to simulate to find more mistakes.
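The n^r count for ordered picks with replacement is easy to confirm by enumeration, next to the depleting kind of permutation for contrast. A small sketch with hypothetical function names and toy numbers.

```python
from itertools import permutations, product
from math import perm

def ordered_picks_with_replacement(n, r):
    """Count ordered lists of r picks when items go back in the bag."""
    return sum(1 for _ in product(range(n), repeat=r))

def ordered_picks_without_replacement(n, r):
    """Count ordered lists of r picks when picked items are used up."""
    return sum(1 for _ in permutations(range(n), r))

n, r = 3, 4  # more picks than items
print(ordered_picks_with_replacement(n, r), n ** r)
print(ordered_picks_without_replacement(n, r), perm(n, r))
```

With more picks than items, the depleting count drops to zero while the with-replacement count keeps growing, which matches the worry that ordinary permutations do not describe sampling from a distribution.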
On Sun, May 30, 2021, 5:37 PM Karl <gmkarl@gmail.com> wrote:
(popcount - popbinheight) ^ (sampcount - sampbinheight)
On Sun, May 30, 2021, 5:39 PM Karl <gmkarl@gmail.com> wrote:
I think the combinations thing had to do with selecting sampbinheight decisions from among all the decisions made while sampling. So (sampcount C sampbinheight).
(popbinheight^sampbinheight) (popcount - popbinheight) ^ (sampcount - sampbinheight) (sampcount C sampbinheight)
Are the combinations still looking ok, now that the system is described differently?
The question is whether or not the expression counts all the possible different decision sets, without duplication, that result in exactly sampbinheight samples from popbinheight population portions.
Remember to simulate to find more mistakes.
On Sun, May 30, 2021, 5:52 PM Karl <gmkarl@gmail.com> wrote:
Goal: validate expression. Note: expression is useful on computer, not mailing list. Mailing list can build behaviors of posting hilarious attempts to think as fictional snippets.
Remember to simulate to find more mistakes.
On Sun, May 30, 2021, 5:58 PM Karl <gmkarl@gmail.com> wrote:
We can break the outcomes into sets based on which of the decisions are the ones that pick the bin of interest. This is a selection of sampbinheight decisions from among sampcount total decisions, which cannot be reordered. (sampcount C sampbinheight). Each of these sets has the same two unknown spaces, which do not overlap: which non-relevant decisions are made, and which of the popbinheight samples are selected. The non-relevant decisions are made from a nondepleting group that is (popcount - popbinheight) large. The sample choices are each made from a homogeneous group that is popbinheight large. Selecting a different item from popbinheight is a different decision, the same way selecting a different non-relevant item is ... I think? So we have (popcount - popbinheight) ^ (sampcount - sampbinheight) and we have popbinheight ^ sampbinheight . These are the same expressions listed above. I'm sure I made another error, but I don't know what it is, and it's nice to have a better guess.
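One way to validate the expression on a computer is to compare it with a literal count over every ordered sampling. A minimal sketch, assuming sampling with replacement and treating each population event as distinct; the function names and the test tuples are made up.

```python
from itertools import product
from math import comb

def count_by_formula(popbinheight, popcount, sampbinheight, sampcount):
    """(sampcount C sampbinheight) * hits-in-bin factor * misses factor."""
    return (comb(sampcount, sampbinheight)
            * popbinheight ** sampbinheight
            * (popcount - popbinheight) ** (sampcount - sampbinheight))

def count_by_enumeration(popbinheight, popcount, sampbinheight, sampcount):
    """Brute force over all popcount ** sampcount ordered samplings."""
    # Events 0 .. popbinheight-1 are taken to belong to the bin of interest.
    hits = 0
    for draws in product(range(popcount), repeat=sampcount):
        in_bin = sum(1 for event in draws if event < popbinheight)
        if in_bin == sampbinheight:
            hits += 1
    return hits

# Arguments: (popbinheight, popcount, sampbinheight, sampcount), toy values.
for args in [(2, 5, 1, 3), (3, 4, 2, 4), (1, 6, 0, 3)]:
    print(args, count_by_formula(*args), count_by_enumeration(*args))
```

Dividing either count by popcount ** sampcount turns it into the binomial probability of that bin height, which is where the standard textbook formula and this counting argument meet.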
Remember to simulate to find more mistakes.
(popbinheight^sampbinheight) (popcount - popbinheight) ^ (sampcount - sampbinheight) (sampcount C sampbinheight)
I'm thinking that if we wanted to consider the number of samplings that made the whole precise histogram, we would replace the larger exponent with further copies of the rest of the expression, where sampcount is reduced appropriately in each. Is there an error, because popcount is not present in the result? Not sure. Popcount is needed to know the probability of getting one of the bins, but we are already assuming that we got them all ....
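The whole-histogram idea can be sketched and checked the same way. This is an illustration under assumptions rather than a confirmed result from the thread: sampling is with replacement, every candidate population is treated as equally likely beforehand (the uniform comparison suggested earlier), and all names and numbers are hypothetical. In this sketch popcount comes back in when the count is divided by the popcount ** sampcount total samplings.

```python
from fractions import Fraction
from itertools import product
from math import comb

def count_whole_histogram(population, sample):
    """Chain the per-bin expression, shrinking the remaining sample count."""
    remaining = sum(sample)
    ways = 1
    for popbinheight, sampbinheight in zip(population, sample):
        ways *= comb(remaining, sampbinheight) * popbinheight ** sampbinheight
        remaining -= sampbinheight
    return ways

def count_by_enumeration(population, sample):
    """Brute force over every ordered sampling of the population events."""
    events = [b for b, height in enumerate(population) for _ in range(height)]
    hits = 0
    for draws in product(events, repeat=sum(sample)):
        hist = [0] * len(population)
        for b in draws:
            hist[b] += 1
        hits += tuple(hist) == tuple(sample)
    return hits

def candidate_probabilities(candidates, sample):
    """Share of each candidate, assuming all candidates are equally likely."""
    sampcount = sum(sample)
    likelihoods = [
        Fraction(count_whole_histogram(pop, sample), sum(pop) ** sampcount)
        for pop in candidates
    ]
    total = sum(likelihoods)  # zero if no candidate can produce the sample
    return [lk / total for lk in likelihoods]

print(count_whole_histogram((2, 1, 3), (1, 1, 2)),
      count_by_enumeration((2, 1, 3), (1, 1, 2)))
print(candidate_probabilities([(3, 1), (1, 3)], (3, 1)))
```

The chained combinations multiply out to the multinomial coefficient, which is the same for every candidate and so cancels when the candidates are compared against one sample.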