Given a dictionary, say one of the frequent words lists of the University of Leipzig, given a series of words: How can you check which ones belong to the list ?

Another option would be to use the operator available since Perl 5.10: :::perl if ($word ~~ @list) {…} But this gets very slow if the size of the list increases. I wrote a naive implementation of the binary search algorithm in Perl that I would like to share. It is not that fast though. Basic but it works.

First of all the wordlist gets read:

my $dict = 'leipzig10000';
open (DICTIONARY, $dict) or die "Error...: $!\n";
my @list = ;
close (DICTIONARY) or die "Error...: $!\n";
chomp(@list);
my $number = scalar @list;

Then you have to initialize the values (given a list @words) and the upper bound:

my $start= scalar @words;
my $log2 = 0;
my $n = $number;
my ($begin, $end, $middle, $i, $word);
my $a = 0;
while ($n > 1){
    $n = $n / 2;
    $log2 = $log2 + 1;
}

Then the binary search can start:

foreach $mot (@mots) {
    $begin = 0;
    $end = $number - 1;
    $middle = int($number/2);
    $word =~ tr/A-Z/a-z/;
    $i = 0;
    while($i < $log2 + 1){
        if ($word eq lc($list[$middle])){
            $a++;
            last;
        }
        elsif ($word gt lc($list[$middle])){
            $begin = $middle;
            $middle = int(($middle + $end)/2);
        }
        else {
            $end = $middle;
            $middle = int(($begin + $middle)/2);
        }
        $i++;
    }
}

At the end it is sometimes interesting to generate a ratio:

my $ratio = ($a/$start)*100;
print "number of words: $start\n";
print "in the list: $a\n";
print "ratio: $ratio\n";`