200 Words – Parsing Non-Alphanumeric Characters and Ligatures in SQL Server

Date February 6, 2009 by: Brendan

200_words

I’m always running into situations where special characters need to be ignored when my applications communicate with SQL Server. I have library functions that I’ve written that parse out special characters, whether it be a specific set that are unique to the application, or simply all non-alphanumeric. While the functions work for simple cases, I often run into scenarios where I need to manage these filters at the View level. Having even a single function within your View can affect the performance of your queries. Instead, I leave the base View alone, and do the formatting locally within the stored procedure.

I came across an elegant solution, putting character codes into a static table (Utility.CharCode in this case) within your database. Then, you can update your variable or field with a simple UPDATE statement:

UPDATE Utility.CharCode
SET @txt_value = REPLACE(@txt_value, CHAR(CharCode), ”)

However, the two-letter combinations “oe” and “ae” are translated to specific character codes. Codes 140, 156, 198, and 230 must be eliminated from your table. Œ, and æ are ligatures in the Latin alphabet. Depending on your collation, SQL will translate the character code for these ligatures to the English letter equivalents, stripping any occurrences from your string.

  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • Live
  • StumbleUpon
  • Technorati
  • TwitThis
  • NewsVine
  • Pownce
  • Slashdot
  • e-mail
  • Reddit
  • Sphinn
  • Mixx
  • Fark
  • Blogosphere News

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>