Kentucky Department for Libraries and Archives

KY Census Years with Available Soundex Indexing
  • 1880 - only households with children ten years of age or under
  • 1900
  • 1910
  • 1920

The Soundex Index System

Image: "1940 Census employee uses a punch machine to tally results by hand." (U.S. Census Bureau)

When you are having trouble deciphering a name, it sometimes helps to say it aloud and work out all the possible spelling variations. But what genealogist has time to come up with all of them? One would have to guess at the nationality, dialect, and level of literacy of the enumerator, none of which can be known with any certainty, especially in records more than 100 years old.

A major solution to this problem came in 1918, when Robert C. Russell of Pittsburgh, Pennsylvania, created a system for indexing names by how they sound rather than alphabetically, known as "soundexing." The U.S. government adopted this system during the 1930s for the Social Security Administration, which needed to identify individuals who would be eligible to apply for old-age benefits. Because early birth records are unavailable in many states, census manuscripts became the most dependable means of verifying the dates of birth of people who would qualify. The Soundex system made it possible to find a surname even if it had been recorded under various spellings.

Soundex Coding

Soundex index entries are arranged on cards, first in Soundex code order and then alphabetically by first name of the head of household. For each person said to be living in the house at the time of the census, the Soundex card should show name, race, month and year of birth, age, citizenship status, place of residence by state and county, civil division, and for those living in towns, the city name, house number, and street name. The cards also include the volume number, enumeration district number, and page and line numbers of the original schedules.

A surname's Soundex code consists of the first letter of the name plus a three-digit code based on the "key letters" that follow it. (See the chart below.)

Code Key Letters and Equivalents
  • B,P,F,V - 1
  • C,S,K,G,J,Q,X,Z - 2
  • D,T - 3
  • L - 4
  • M, N - 5
  • R - 6
  • A,E,I,O,U,W,Y,H - omitted from the code, unless the letter begins the name
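As a rough sketch, the chart can be written as a Python lookup table, along with a first pass of the basic rule: keep the first letter, replace each later key letter with its digit, skip the omitted letters, and keep three digits. The names `KEY_LETTERS`, `CODES`, `OMITTED`, and `basic_soundex` are hypothetical. This first pass ignores the padding, double-letter, and side-by-side rules explained later in this section, so it is only reliable for names such as Johnson and Wilson that need none of them.

```python
# Digit for each key letter, transcribed from the chart above.
KEY_LETTERS = [
    ("BPFV", "1"),
    ("CSKGJQXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
]
CODES = {letter: digit for letters, digit in KEY_LETTERS for letter in letters}

# Letters that are omitted unless they are the first letter of the name.
OMITTED = set("AEIOUWYH")


def basic_soundex(surname: str) -> str:
    """First pass only: first letter plus the digits of the next three key letters."""
    name = surname.upper()
    # Omitted letters are not in CODES, so they are simply skipped here.
    digits = [CODES[ch] for ch in name[1:] if ch in CODES]
    return name[0] + "-" + "".join(digits[:3])


print(basic_soundex("Johnson"))  # J-525
print(basic_soundex("Wilson"))   # W-425
```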

Examples:

Johnson - J-525
J + "N" + "S" + "N" = J + 5 + 2 + 5

Wilson - W-425
"W" + "L" + "S" + "N" = W + 4 + 2 + 5

If a surname does not have enough key letters to produce a letter and three-digit code, zeroes are added at the end to produce a four-character code. Key letters beyond the third coded one are disregarded.

Examples:

Smith - S-530 - "I" and "H" are normally disregarded. Since this produces only a two-digit code, a zero is added at the end.

Jones - J-520 - "O" and "E" are normally disregarded. Since this produces only a two-digit code, a zero is added at the end.

Lee - L-000 - "E" is normally disregarded. Since no key letters follow the first letter, three zeroes are added after it.
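The padding rule amounts to "keep at most three digits, then fill with zeroes to exactly three." A small sketch, assuming the key-letter digits have already been worked out as in the examples above (`four_character_code` is a hypothetical name):

```python
def four_character_code(first_letter: str, digits: str) -> str:
    """Keep at most three digits and pad with zeroes to exactly three."""
    return first_letter.upper() + "-" + digits[:3].ljust(3, "0")


print(four_character_code("S", "53"))     # Smith: M=5, T=3 -> S-530
print(four_character_code("J", "52"))     # Jones: N=5, S=2 -> J-520
print(four_character_code("L", ""))       # Lee: no key letters -> L-000
print(four_character_code("W", "42525"))  # Wilkinson: extra digits dropped -> W-425
```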

If the surname has any double letters, they should be treated as one letter.

Examples:

Williams - W-452 - The double "L's" have been represented by one number, "4."

Harris - H-620 - The double "R's" have been represented by one number, "6." Since this produces only a two-digit code, a zero is added at the end.
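One way to apply the double-letter rule is to merge each run of the same letter before assigning digits, so Williams is coded as if it were WILIAMS and Harris as HARIS. Without that step, Williams would come out as W-445. In this sketch, `collapse_double_letters` and `code_surname` are hypothetical names; it handles only identical doubled letters and does not yet merge different letters that share a digit, so it is not a complete Soundex coder.

```python
from itertools import groupby

# Digit for each key letter, from the code chart.
CODES = {
    letter: digit
    for letters, digit in [("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for letter in letters
}


def collapse_double_letters(surname: str) -> str:
    """Replace each run of the same letter (e.g. "LL", "RR") with one letter."""
    return "".join(letter for letter, _run in groupby(surname.upper()))


def code_surname(surname: str) -> str:
    """Collapse double letters, code the key letters, then pad or trim to three digits."""
    name = collapse_double_letters(surname)
    digits = "".join(CODES[ch] for ch in name[1:] if ch in CODES)
    return name[0] + "-" + digits[:3].ljust(3, "0")


print(collapse_double_letters("Williams"))  # WILIAMS
print(code_surname("Williams"))             # W-452
print(code_surname("Harris"))               # H-620
```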

If the surname has different letters side by side that have the same number in the Soundex coding guide, they should be treated as one letter.

If a vowel (A, E, I, O, U) separates two consonants that have the same Soundex code, the consonant to the right of the vowel is coded.

If "H" or "W" separates two consonants that have the same Soundex code, the consonant to the right of the "H" or "W" is not coded.

Examples:

Jackson - J-250 - The "C," "K," and "S" are side by side and all have the code 2, so they are coded only once. Since this produces only a two-digit code, a zero is added at the end.

Bascomb - B-251 - The "C" is ignored since it has the same code as the "S" beside it, 2.

Lucas - L-220 - The "C" and "S" both have the code 2, but since the vowel "A" separates them, both are coded. Since this produces only a two-digit code, a zero is added at the end.

Ashcroft - A-261 - The "C" is ignored since it has the same code as the "S," and the two are separated only by the letter "H."
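Putting the rules together, a complete coder only needs to remember the digit of the last key letter it saw. A vowel clears that memory, so a repeated digit after a vowel is coded again, while "H" and "W" leave it untouched, so a repeated digit after them is skipped. The sketch below (the function name `soundex` is hypothetical) makes two assumptions the text above does not address: "Y" separates consonants the way the vowels do, and the first letter's own digit blocks an identical digit right after it (so Pfister would come out as P-236).

```python
# Digit for each key letter, from the code chart.
CODES = {
    letter: digit
    for letters, digit in [("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for letter in letters
}
VOWELS = set("AEIOUY")  # assumption: Y separates consonants like a vowel


def soundex(surname: str) -> str:
    """Code a surname as a letter plus three digits, e.g. "Ashcroft" -> "A-261"."""
    name = "".join(ch for ch in surname.upper() if ch.isalpha())
    if not name:
        raise ValueError("surname must contain at least one letter")

    digits = []
    # Assumption: the first letter's own digit counts as the previous code.
    previous = CODES.get(name[0])
    for ch in name[1:]:
        code = CODES.get(ch)
        if code is not None:
            # Double letters and side-by-side letters with the same digit
            # are coded only once.
            if code != previous:
                digits.append(code)
            previous = code
        elif ch in VOWELS:
            # A vowel separates consonants, so a repeated digit after it is coded.
            previous = None
        # H and W leave `previous` unchanged, so a repeated digit after them is skipped.

    return name[0] + "-" + "".join(digits[:3]).ljust(3, "0")


print(soundex("Jackson"))   # J-250
print(soundex("Bascomb"))   # B-251
print(soundex("Ashcroft"))  # A-261
```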

If a surname has a prefix, such as Van, Con, De, Di, La, or Le, code both with and without the prefix because the surname might be listed under either code. Mc and Mac are not considered prefixes.
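Because prefixed names should be coded both ways, a lookup helper might return every code worth checking. In this sketch, `codes_to_search` and `PREFIXES` are hypothetical names, and `soundex` is a compact coder following the rules above (again assuming "Y" acts like a vowel and the first letter's digit blocks an identical next digit). Spelling alone cannot tell a true prefix from an ordinary name that happens to begin the same way (Dean, Lewis), so the sketch simply returns both codes and leaves the judgment to the researcher.

```python
from typing import List

CODES = {
    letter: digit
    for letters, digit in [("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for letter in letters
}
VOWELS = set("AEIOUY")  # assumption: Y separates consonants like a vowel
PREFIXES = ("VAN", "CON", "DE", "DI", "LA", "LE")  # Mc and Mac are not prefixes


def soundex(name: str) -> str:
    """Compact Soundex coder following the rules described above."""
    digits, previous = [], CODES.get(name[0])
    for ch in name[1:]:
        code = CODES.get(ch)
        if code is not None:
            if code != previous:
                digits.append(code)
            previous = code
        elif ch in VOWELS:
            previous = None  # H and W fall through and keep `previous`
    return name[0] + "-" + "".join(digits[:3]).ljust(3, "0")


def codes_to_search(surname: str) -> List[str]:
    """Code the full surname, plus the surname without a listed prefix if present."""
    name = "".join(ch for ch in surname.upper() if ch.isalpha())
    codes = [soundex(name)]
    for prefix in PREFIXES:
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest:
            codes.append(soundex(rest))
            break  # none of the listed prefixes begins another, so one match suffices
    return codes


print(codes_to_search("Van Buren"))  # ['V-516', 'B-650']
print(codes_to_search("McDonald"))   # ['M-235']
```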

With this indexing approach, many different surnames will be included within the same Soundex code. For example, similar sounding surnames such as Allen, Alan, Allan, and Allyn are coded as A-450. This way, no matter how an enumerator spelled a surname, the records can be accessed from one central location. Within this code, the individual and family cards are arranged alphabetically by given name, or by known nicknames, middle names, or abbreviations of the first name. Also, the Soundex can be a means of determining how a family, or those sharing the same surname, is distributed throughout a state.
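The card arrangement described here and earlier (Soundex code order first, then given name) can be sketched by grouping cards under their codes. `arrange_cards` and `Card` are hypothetical names, the cards are simplified to (surname, given name) pairs, and `soundex` is the same compact coder as in the rules above (assuming "Y" acts like a vowel). Real cards were also filed by nicknames, middle names, and abbreviations, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CODES = {
    letter: digit
    for letters, digit in [("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for letter in letters
}
VOWELS = set("AEIOUY")  # assumption: Y separates consonants like a vowel

Card = Tuple[str, str]  # hypothetical (surname, given name) index card


def soundex(surname: str) -> str:
    """Compact Soundex coder following the rules described above."""
    name = "".join(ch for ch in surname.upper() if ch.isalpha())
    digits, previous = [], CODES.get(name[0])
    for ch in name[1:]:
        code = CODES.get(ch)
        if code is not None:
            if code != previous:
                digits.append(code)
            previous = code
        elif ch in VOWELS:
            previous = None  # H and W fall through and keep `previous`
    return name[0] + "-" + "".join(digits[:3]).ljust(3, "0")


def arrange_cards(cards: List[Card]) -> Dict[str, List[Card]]:
    """Group cards by Soundex code in code order, each group sorted by given name."""
    groups: Dict[str, List[Card]] = defaultdict(list)
    for surname, given in cards:
        groups[soundex(surname)].append((surname, given))
    return {code: sorted(groups[code], key=lambda card: card[1])
            for code in sorted(groups)}


cards = [("Allyn", "Mary"), ("Smith", "John"), ("Allen", "Charles"),
         ("Smyth", "Anna"), ("Alan", "Eliza")]
for code, group in arrange_cards(cards).items():
    print(code, group)
```

Spelling variants such as Allen, Alan, and Allyn land in the same A-450 group, and Smith and Smyth share S-530.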

Please note, the names of nuns, Native Americans and Asians may be a challenge to locate. Phonetically spelled Asian and Native American names were either coded as one continuous name or by what seemed to be a surname. Nuns were coded as if "Sister" were the surname, and they appear in each state's Soundex under the code S-236.

For those totally overwhelmed with Soundex coding, a free online converter for surnames into soundex codes is available within the RootsWeb site at http://resources.rootsweb.com/cgi-bin/soundexconverter.

Information Updated: 04/21/2005