Given a dictionary, say one of the frequent words lists of the University of Leipzig, given a series of words: How can you check which ones belong to the list ?
Another option would be to use the operator available since Perl 5.10: :::perl if ($word ~~ @list) {…} But this gets very slow if the size of the list increases. I wrote a naive implementation of the binary search algorithm in Perl that I would like to share. It is not that fast though. Basic but it works.
First of all the wordlist gets read:
my $dict = 'leipzig10000';
open (DICTIONARY, $dict) or die "Error...: $!\n";
my @list = ;
close (DICTIONARY) or die "Error...: $!\n";
chomp(@list);
my $number = scalar @list;
Then you have to initialize the values (given a list @words) and the upper bound:
my $start= scalar @words;
my $log2 = 0;
my $n = $number;
my ($begin, $end, $middle, $i, $word);
my $a = 0;
while ($n > 1){
$n = $n / 2;
$log2 = $log2 + 1;
}
Then the binary search can start:
foreach $mot (@mots) {
$begin = 0;
$end = $number - 1;
$middle = int($number/2);
$word =~ tr/A-Z/a-z/;
$i = 0;
while($i < $log2 + 1){
if ($word eq lc($list[$middle])){
$a++;
last;
}
elsif ($word gt lc($list[$middle])){
$begin = $middle;
$middle = int(($middle + $end)/2);
}
else {
$end = $middle;
$middle = int(($begin + $middle)/2);
}
$i++;
}
}
At the end it is sometimes interesting to generate a ratio:
my $ratio = ($a/$start)*100;
print "number of words: $start\n";
print "in the list: $a\n";
print "ratio: $ratio\n";`