text analysis

Mok-Kong Shen mok-kong.shen at stud.uni-muenchen.de
Thu Aug 6 04:44:34 PDT 1998


CyberPsychotic wrote:

> text). Anyway, when it comes to sets of 2 characters, I have to get a 1024
> character set, and so on, and it looks quite unreasonable to me to allocate
> memory for elements which will probably never be found in the text... I was
> thinking of another solution and came to two-way connected lists (correct
> term?), i.e. I have some structure like:
> 
> struct element {
>     char value[ELEMENT_LENGTH];
>     unsigned int frequency;
>     struct element *previous;
>     struct element *next;
> };
> and could dynamically allocate memory for each newly found element, but
> this would slow down the whole code as the list of new elements grows.

I think memory is currently cheap enough that you could do
frequency counts of at least trigrams with a one-dimensional array.

M. K. Shen
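
The one-dimensional array suggestion can be sketched roughly as follows.
This is only an illustration under assumptions not stated in the thread: the
alphabet is taken to be the 26 letters a-z (case folded, ASCII encoding), and
the names ALPHABET, NUM_TRIGRAMS, trigram_index and count_trigrams are
hypothetical. Each trigram is treated as a three-digit number in base 26, so
the whole table needs 26*26*26 = 17576 counters and no list traversal.

```c
#include <ctype.h>
#include <stddef.h>

#define ALPHABET 26                                    /* assumed: letters a-z only */
#define NUM_TRIGRAMS (ALPHABET * ALPHABET * ALPHABET)  /* 17576 counters */

/* Map three letter codes (each in 0..25) to one index into a flat array. */
static size_t trigram_index(int a, int b, int c)
{
    return ((size_t)a * ALPHABET + (size_t)b) * ALPHABET + (size_t)c;
}

/* Count every run of three consecutive letters in text.
 * counts must have NUM_TRIGRAMS entries, zeroed by the caller.
 * Any non-letter resets the window, so no trigram spans a space
 * or punctuation mark. Assumes 'a'..'z' are contiguous (ASCII). */
static void count_trigrams(const char *text, unsigned int *counts)
{
    int window[3] = {0, 0, 0};  /* codes of the last three letters seen */
    int filled = 0;             /* how many valid letters are in the window */

    for (; *text != '\0'; ++text) {
        int code = tolower((unsigned char)*text) - 'a';

        if (code < 0 || code >= ALPHABET) {
            filled = 0;         /* not a letter: start over */
            continue;
        }

        /* Slide the window one letter to the left and append the new code. */
        window[0] = window[1];
        window[1] = window[2];
        window[2] = code;
        if (filled < 3)
            ++filled;

        if (filled == 3)
            ++counts[trigram_index(window[0], window[1], window[2])];
    }
}
```

To use it, declare something like `unsigned int counts[NUM_TRIGRAMS] = {0};`,
call `count_trigrams(text, counts)`, and read a single count back with
`counts[trigram_index('t' - 'a', 'h' - 'a', 'e' - 'a')]`. The table is fixed
in size and wastes the slots of trigrams that never occur, which is the
trade-off against the linked-list approach in the quoted message.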





