text analysis

Mok-Kong Shen mok-kong.shen at stud.uni-muenchen.de
Thu Aug 6 04:44:34 PDT 1998


CyberPsychotic wrote:

> text). Anyway, when it comes to two-character sets, I need a 1024-element
> table, and so on, and it seems unreasonable to me to allocate memory
> for elements that will probably never be found in the text... I was
> thinking of another solution and came to doubly linked lists (correct
> term?), i.e. I have some structure like:
> 
> struct element {
> char value[ELEMENT_LENGTH];
> unsigned int frequency;
> struct element *previous;
> struct element *next;
> };
>  and could dynamically allocate memory for each newly found element, but
> this would slow down the whole code as the list of new elements grows.

I think memory is currently cheap enough that you could do frequency
counts of at least trigrams with a one-dimensional array.
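
A minimal sketch of the flat-array idea, under assumptions that are not in
the original post: the alphabet is taken to be the 26 letters a-z in an
ASCII-compatible encoding, case is ignored, and any other character breaks
a trigram. The names ALPHABET, trigram_count, letter_index, count_trigrams
and trigram_frequency are made up for illustration. With these assumptions
the table has 26*26*26 = 17576 counters and each lookup is a single index
computation, with no list traversal.

```c
#include <stddef.h>

#define ALPHABET  26                                 /* assumed: letters a-z only */
#define NTRIGRAMS (ALPHABET * ALPHABET * ALPHABET)   /* 17576 counters */

/* One flat table: trigram (a, b, c) lives at index (a*26 + b)*26 + c. */
static unsigned int trigram_count[NTRIGRAMS];

/* Map a letter to 0..25, ignoring case; return -1 for anything else.
   Assumes an ASCII-compatible character set. */
static int letter_index(char ch)
{
    if (ch >= 'a' && ch <= 'z')
        return ch - 'a';
    if (ch >= 'A' && ch <= 'Z')
        return ch - 'A';
    return -1;
}

/* Count every run of three consecutive letters in text.
   A non-letter (space, digit, punctuation) breaks the run. */
static void count_trigrams(const char *text)
{
    int a = -1, b = -1;   /* indices of the two preceding characters */

    for (; *text != '\0'; ++text) {
        int c = letter_index(*text);
        if (a >= 0 && b >= 0 && c >= 0)
            trigram_count[(a * ALPHABET + b) * ALPHABET + c]++;
        a = b;
        b = c;
    }
}

/* Look up the count of a three-letter string; 0 if it is not three letters. */
static unsigned int trigram_frequency(const char *tri)
{
    int a, b, c;

    if ((a = letter_index(tri[0])) < 0)
        return 0;
    if ((b = letter_index(tri[1])) < 0)
        return 0;
    if ((c = letter_index(tri[2])) < 0)
        return 0;
    return trigram_count[(a * ALPHABET + b) * ALPHABET + c];
}
```

Calling count_trigrams() on each chunk of input and then
trigram_frequency("the") gives the count for one trigram. A larger
alphabet only changes ALPHABET and the mapping in letter_index(); the
table grows with the cube of the alphabet size.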

M. K. Shen





