        Compute MD5 hash of a UTF8 string
                  Problem description

                  I have a SQL table that stores large string values that must be unique. To enforce uniqueness, I have a unique index on a column that holds a string representation of the MD5 hash of the large string.

                  The C# app that saves these records uses the following method to do the hashing:

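                  using System.Linq;
                  using System.Security.Cryptography;

                  public static string CreateMd5HashString(byte[] input)
                  {
                      var hashBytes = MD5.Create().ComputeHash(input);
                      // "X" does not zero-pad, so a byte below 0x10 becomes one hex digit
                      return string.Join("", hashBytes.Select(b => b.ToString("X")));
                  }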
                  

                  To call it, I first convert the string to a byte[] using UTF-8 encoding:

                  // this is what I use in my app
                  CreateMd5HashString(Encoding.UTF8.GetBytes("abc"))
                  // result: 90150983CD24FB0D6963F7D28E17F72
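
As a cross-check outside the app, the following Python sketch mirrors what CreateMd5HashString does, assuming that the "X" format specifier emits each byte as uppercase hex without zero-padding. The helper names `create_md5_hash_string` and `standard_md5_hex` are hypothetical. It shows why the result above has 31 characters rather than the usual 32: the second digest byte is 0x01 and is written as a single `1`.

```python
import hashlib


def create_md5_hash_string(data: bytes) -> str:
    """Mimic the C# method: MD5, then each byte as uppercase hex with no zero-padding."""
    digest = hashlib.md5(data).digest()
    return "".join(format(b, "X") for b in digest)


def standard_md5_hex(data: bytes) -> str:
    """Conventional 32-character uppercase hex digest, for comparison."""
    return hashlib.md5(data).hexdigest().upper()


if __name__ == "__main__":
    utf8_bytes = "abc".encode("utf-8")
    print(create_md5_hash_string(utf8_bytes))  # unpadded form, as the app stores it
    print(standard_md5_hex(utf8_bytes))        # zero-padded form
```

Any SQL-side reimplementation would need to reproduce the same unpadded formatting if its output is compared against the strings already stored by the app.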
                  

                  Now I would like to implement this hashing function in SQL using the HASHBYTES function, but I get a different value:

                  print hashbytes('md5', N'abc')
                  -- result: 0xCE1473CF80C6B3FDA8E3DFC006ADC315
                  

                  This is because SQL computes the MD5 of the UTF-16 representation of the string. I get the same result in C# if I do CreateMd5HashString(Encoding.Unicode.GetBytes("abc")).
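
The difference is visible in the bytes themselves. A small Python sketch, assuming that Encoding.Unicode corresponds to UTF-16 little-endian without a byte-order mark (which is what Python's `utf-16-le` codec produces):

```python
import hashlib

text = "abc"

utf8_bytes = text.encode("utf-8")       # one byte per ASCII character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, no BOM

print(utf8_bytes.hex())   # 616263
print(utf16_bytes.hex())  # 610062006300

# Different input bytes give unrelated MD5 digests
utf8_md5 = hashlib.md5(utf8_bytes).hexdigest()
utf16_md5 = hashlib.md5(utf16_bytes).hexdigest()
print(utf8_md5 == utf16_md5)  # False
```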

                  I cannot change the way hashing is done in the application.

                  Is there a way to get SQL Server to compute the MD5 hash of the UTF-8 bytes of the string?

                  I looked up similar questions and tried using collations, but have had no luck so far.

                  Recommended answer

                  You need to create a UDF that converts the NVARCHAR data to its UTF-8 byte representation. Say it is called dbo.NCharToUTF8Binary. Its second argument selects modified UTF-8 when set to 1, which encodes U+0000 as two bytes; pass 0 for standard UTF-8, which is what Encoding.UTF8 produces. Then you can do:

                  hashbytes('md5', dbo.NCharToUTF8Binary(N'abc', 0))
                  

                  Here is a UDF that does that:

                  create function dbo.NCharToUTF8Binary(@txt NVARCHAR(max), @modified bit)
                  returns varbinary(max)
                  as
                  begin
                  -- Note: This is not the fastest possible routine. 
                  -- If you want a fast routine, use SQLCLR
                      set @modified = isnull(@modified, 0)
                      -- First shred into a table.
                      declare @chars table (
                      ix int identity primary key,
                      codepoint int,
                      utf8 varbinary(6)
                      )
                      declare @ix int
                      set @ix = 0
                      while @ix < datalength(@txt)/2  -- datalength (bytes / 2) rather than len, so trailing spaces are counted
                      begin
                          set @ix = @ix + 1
                          insert @chars(codepoint)
                          select unicode(substring(@txt, @ix, 1))
                      end
                  
                      -- Now look for surrogate pairs.
                      -- If we find a pair (lead followed by trail) we combine them.
                      -- High (lead) surrogate is U+D800 to U+DBFF
                      -- Low (trail) surrogate is U+DC00 to U+DFFF
                      -- Each half carries 10 payload bits:
                      -- codepoint = (high & 0x3ff) * 0x400 + (low & 0x3ff) + 0x10000
                      update c1 set codepoint = ((c1.codepoint & 0x03ff) * 0x0400) + (c2.codepoint & 0x03ff) + 0x10000
                      from @chars c1 inner join @chars c2 on c1.ix = c2.ix -1
                      where c1.codepoint >= 0xD800 and c1.codepoint <=0xDBFF
                      and c2.codepoint >= 0xDC00 and c2.codepoint <=0xDFFF
                      -- Get rid of the trailing half of the pair where found
                      delete c2 
                      from @chars c1 inner join @chars c2 on c1.ix = c2.ix -1
                      where c1.codepoint >= 0x10000
                  
                      -- Now we utf-8 encode each codepoint.
                      -- Lone surrogate halves will still be here
                      -- so they will be encoded as if they were not surrogate pairs.
                      update c 
                      set utf8 = 
                      case 
                      -- One-byte encodings (modified UTF8 outputs zero as a two-byte encoding)
                      when codepoint <= 0x7f and (@modified = 0 OR codepoint <> 0)
                      then cast(substring(cast(codepoint as binary(4)), 4, 1) as varbinary(6))
                      -- Two-byte encodings
                      when codepoint <= 0x07ff
                      then substring(cast((0x00C0 + ((codepoint/0x40) & 0x1f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                      -- Three-byte encodings
                      when codepoint <= 0x0ffff
                      then substring(cast((0x00E0 + ((codepoint/0x1000) & 0x0f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x40) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                      -- Four-byte encodings 
                      when codepoint <= 0x1FFFFF
                      then substring(cast((0x00F0 + ((codepoint/0x00040000) & 0x07)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x1000) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x40) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                  
                      end
                      from @chars c
                  
                      -- Finally concatenate them all and return.
                      declare @ret varbinary(max)
                      set @ret = cast('' as varbinary(max))
                      select @ret = @ret + utf8 from @chars c order by ix
                      return  @ret
                  
                  end
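
To sanity-check the bit arithmetic, the same three steps (split into UTF-16 code units, pair up surrogates, encode each code point) can be sketched in Python and compared with Python's built-in UTF-8 encoder. This is a minimal sketch of the standard UTF-16 and UTF-8 rules, not something verified against SQL Server; the function names `combine_surrogates`, `encode_code_point`, and `nchar_to_utf8` are hypothetical.

```python
def combine_surrogates(units):
    """Merge UTF-16 lead/trail surrogate pairs into single code points."""
    out = []
    i = 0
    while i < len(units):
        cu = units[i]
        if (0xD800 <= cu <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # 10 payload bits from each half, offset by 0x10000
            out.append(((cu & 0x3FF) << 10) + (units[i + 1] & 0x3FF) + 0x10000)
            i += 2
        else:
            out.append(cu)  # BMP character or lone surrogate, kept as-is
            i += 1
    return out


def encode_code_point(cp, modified=False):
    """UTF-8 encode one code point; modified=True writes U+0000 as two bytes."""
    if cp <= 0x7F and (not modified or cp != 0):
        return bytes([cp])
    if cp <= 0x7FF:
        return bytes([0xC0 | ((cp >> 6) & 0x1F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | ((cp >> 12) & 0x0F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | ((cp >> 18) & 0x07),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def nchar_to_utf8(text, modified=False):
    """Convert text to UTF-8 by way of its UTF-16 code units."""
    raw = text.encode("utf-16-le", "surrogatepass")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    return b"".join(encode_code_point(cp, modified)
                    for cp in combine_surrogates(units))


if __name__ == "__main__":
    sample = "abc \u00e9 \u20ac \U0001F600"
    print(nchar_to_utf8(sample) == sample.encode("utf-8"))
```

The sample string covers one-, two-, three-, and four-byte encodings, so a mismatch in any branch would show up in the comparison.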
                  
