Text::Soundex - Implementation of the soundex algorithm. Basic Usage: Soundex is used to do a one way transformation of a name, converting a character string given as input into a set of codes representing the identifiable sounds those characters might make in the output. For example: use Text::Soundex; print soundex("Mark"), "\n"; # prints: M620 print soundex("Marc"), "\n"; # prints: M620 print soundex("Hansen"), "\n"; # prints: H525 print soundex("Hanson"), "\n"; # prints: H525 print soundex("Henson"), "\n"; # prints: H525 In many situations, code such as the following: if ($name1 eq $name2) { ... } Can be substituted with: if (soundex($name1) eq soundex($name2)) { ... } Installation: Once the archive has been unpacked then the following steps are needed to build, test and install the module (to be done in the directory which contains the Makefile.PL) perl Makefile.PL make make test If the make test succeeds then the next step may need to be run as root (on a Unix-like system) or with special privileges on other systems. make install If you do not want to use the XS code (for whatever reason) do the following instead of the above: perl Makefile.PL --no-xs make make test make install If any of the tests report 'not ok' and you are running perl 5.6.0 or later then please contact Mark Mielke <mark@mielke.cc> History: Version 3.03: Updated to allow the XS implementation to work properly under an EBCDIC/EBCDIC-UTF8 character set environment. Updated documentation to better describe the history of the soundex algorithm and how it applies to this module. Version 3.02: 3.01 and 3.00 used the 'U8' type incorrectly causing some strict compilers to complain or refuse to compile the XS code. Also, Unicode support did not work properly for Perl 5.6.x. Both of these problems are now fixed. Version 3.01: A bug with non-UTF 8 strings that contain non-ASCII alphabetic characters was fixed. The soundex_unicode() and soundex_nara_unicode() wrapper routines were included and the documentation refers the user to the excellent Text::Unidecode module to perform soundex encodings using unicode strings. The Perl versions of the routines have been further optimized, and correct a border case involving non-alphabetic characters at the beginning of the string. Version 3.00: Support for UTF-8 strings (unicode strings) is now in place. Note that this allows UTF-8 strings to be passed to the XS version of the soundex() routine. The Soundex algorithm treats characters outside the ascii range (0x00 - 0x7F) as if they were not alphabetical. The interface has been simplified. In order to explicitly use the non-XS implementation of soundex(): use Text::Soundex (); $code = Text::Soundex::soundex_noxs($name); In order to use the NARA soundex algorithm: use Text::Soundex 'soundex_nara'; $code = soundex_nara($name); Use of the ':NARA-Ruleset' import directive is now obsolete. To emulate the old behaviour: use Text::Soundex (); *soundex = \&Text::Soundex::soundex_nara; $code = soundex($name); Version 2.20: This version includes support for the algorithm used to index the U.S. Federal Censuses. There is a slight descrepancy in the definition for a soundex code which is not commonly known or recognized involved similar sounding letters being seperated by the characters H or W. This is defined as the NARA ruleset, as this descrepency was discovered by them. (Calling it "the US Census ruleset" was too unwieldy...) NARA can be found at: http://www.nara.gov/genealogy/ The algorithm used by NARA can be found at: http://home.utah-inter.net/kinsearch/Soundex.html Version 2.00: This version is a full re-write of the 1.0 engine by Mark Mielke. The goal was for speed... and this was achieved. There is an optional XS module which can be used completely transparently by the user which offers a further speed increase of a factor of more than 7.5X. Version 1.00: This version can be found in the perl core distribution from at least Perl 5.8.0 and down. It was written by Mike Stok. It can be identified by the fact that it does not contain a $VERSION in the beginning of the module, and as well it uses an RCS tag with a version of 1.x. This version, before some perl5'ish packaging was introduced, was actually written for perl4.